Comparing images using the Vision framework

Photo by Eric Prouzet / Unsplash

Photography is one of my hobbies. There is something magical about taking pictures; it's like stopping time at a particular moment. The only problem is that I'm no expert and not all of my photographs are great, so I often take more than one photo, just to be safe. "Being safe" in my case means I end up with a bunch of redundant pictures that serve no purpose. I should remove them, but I always have a good reason not to: there are more interesting and important things to do or places to be.

Last time I complained that I'm terrible at selfies but managed to solve my little problem thanks to technology. Check my Detecting face capture quality using the Vision framework article for more details. This time is no different.

The Vision request I will introduce today is unique. VNGenerateImageFeaturePrintRequest behaves slightly differently from the other requests, but we will get to that. For now, it's enough to know that this inconspicuous request can compare images for us. Let's begin with:

import Vision

Then create a request:

let request = VNGenerateImageFeaturePrintRequest()

And pass it to the request handler:

guard let cgImage = image.cgImage else { return nil }
let request = VNGenerateImageFeaturePrintRequest()

let requestHandler = VNImageRequestHandler(cgImage: cgImage,
                                           orientation: .init(image.imageOrientation),
                                           options: [:])
do {
    try requestHandler.perform([request])
} catch {
    print("Can't make the request due to \(error)")
}

This is explained in detail in my Detecting body pose using Vision framework article.
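One detail worth flagging: CGImagePropertyOrientation doesn't ship with an initializer that accepts a UIImage.Orientation, so the `.init(image.imageOrientation)` call above relies on a small bridging extension. A version along the lines of Apple's Vision sample code could look like this:

```swift
import UIKit
import ImageIO

// Bridges the UIKit image orientation to the Core Graphics orientation
// that VNImageRequestHandler expects. The case names match one-to-one,
// but the raw values differ, so an explicit mapping is needed.
extension CGImagePropertyOrientation {
    init(_ uiOrientation: UIImage.Orientation) {
        switch uiOrientation {
        case .up: self = .up
        case .down: self = .down
        case .left: self = .left
        case .right: self = .right
        case .upMirrored: self = .upMirrored
        case .downMirrored: self = .downMirrored
        case .leftMirrored: self = .leftMirrored
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}
```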

As a result, we get a VNFeaturePrintObservation:

guard let result = request.results?.first else { return nil }
return result

The complete function:

func process(_ image: UIImage) -> VNFeaturePrintObservation? {
    guard let cgImage = image.cgImage else { return nil }
    let request = VNGenerateImageFeaturePrintRequest()
    
    let requestHandler = VNImageRequestHandler(cgImage: cgImage,
                                               orientation: .init(image.imageOrientation),
                                               options: [:])
    do {
        try requestHandler.perform([request])
    } catch {
        print("Can't make the request due to \(error)")
    }
    
    guard let result = request.results?.first else { return nil }
    return result
}

Note: I'm working with this code in a playground, where a synchronous function makes things easier, but Vision requests will block the main thread of your application. Make sure to execute them on a background queue, e.g.:

private let visionQueue = DispatchQueue.global(qos: .userInitiated)

But remember to return to the main queue to display the results.
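A minimal sketch of that pattern, wrapping the synchronous process(_:) from above (the processAsync name and the completion-handler shape are my choices, not part of the original code):

```swift
func processAsync(_ image: UIImage,
                  completion: @escaping (VNFeaturePrintObservation?) -> Void) {
    visionQueue.async {
        // The blocking Vision request runs off the main thread here.
        let observation = process(image)
        DispatchQueue.main.async {
            // Back on the main queue, safe for UI updates.
            completion(observation)
        }
    }
}
```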

And that's it. The work is done.

This is the place where you should ask:

How is it done? What about comparing the images?

This is why this request is unique. VNFeaturePrintObservation holds the calculated feature print data. We can't easily parse it and compare it ourselves, but the observation can do it for us: we use it to compute the distance between two feature prints.

To compare the similarity of two images we need to:

  • Run the request on both of them to get a VNFeaturePrintObservation containing the feature print data for each.
  • Use this function, provided by the observation, to compute the distance between the feature prints they hold:
open func computeDistance(_ outDistance: UnsafeMutablePointer<Float>, to featurePrint: VNFeaturePrintObservation) throws

Notice the UnsafeMutablePointer<Float>, which is our outDistance. Returning a value is not the only way a function can provide its result:

You use instances of the UnsafeMutablePointer type to access data of a specific type in memory. The type of data that a pointer can access is the pointer’s Pointee type. UnsafeMutablePointer provides no automated memory management or alignment guarantees. You are responsible for handling the life cycle of any memory you work with through unsafe pointers to avoid leaks or undefined behavior.

UnsafeMutablePointer Documentation

This is a pointer to a Float located somewhere in memory, which the function will use to pass back the result.

Pointers are not something we use in Swift on a daily basis. I remember pointers mainly from the C / C++ and... Objective-C days. You can find more information here.

We now know that pointers let us access specific locations in memory, but this still doesn't explain what we need to do to get our distance.
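Before wiring this up with Vision, here is a tiny, self-contained illustration of the pattern: a function that writes its result through a pointer instead of returning it (writeAnswer is a made-up example, not part of any API):

```swift
// A function that "returns" through an out-parameter, the same shape
// computeDistance uses.
func writeAnswer(_ out: UnsafeMutablePointer<Float>) {
    out.pointee = 42.0 // write the result through the pointer
}

var answer: Float = .infinity
writeAnswer(&answer) // & passes the address of `answer`
print(answer) // 42.0
```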

First, we need to have something to compare. This code will produce observations for two images:

let balloon1 = UIImage(named: "balloon_1.jpg")!
let balloon1FeaturePrint = process(balloon1)!

let balloon2 = UIImage(named: "balloon_2.jpg")!
let balloon2FeaturePrint = process(balloon2)!

I use ! for simplicity. I don't recommend this approach in production code.

Then we need to make a variable for our distance:

var balloon1ToBallon2Distance: Float = .infinity

It will hold the distance between the feature prints of the balloon_1 and balloon_2 images. The smaller the distance, the more similar the images; identical images have a distance of 0.0 between their feature prints. That's why we use .infinity as the initial value.

Now it's time to face the UnsafeMutablePointer<Float>. We are used to providing input values to functions, but this is different: we need to provide a pointer which the function can later use to hand us the result. To do it we use &, known from C and C++ as the address-of operator. Instead of passing balloon1ToBallon2Distance we pass &balloon1ToBallon2Distance, which gives the function the address of the place in memory where this variable lives:

do {
    try balloon1FeaturePrint.computeDistance(&balloon1ToBallon2Distance, to: balloon2FeaturePrint)
} catch {
    print("Couldn't compute the distance")
}

And after the function completes, we have the result in this variable. Pointers are powerful, but also terrifying and dangerous. Behold their power:

print(balloon1ToBallon2Distance)
11.10728

This variable was .infinity a while ago, and now it contains the distance between the feature prints of the balloon_1 and balloon_2 images. The distance is small, which means the images are similar.

See it for yourself. I took a few photographs of this year's Christmas tree:

As you can see, I took four photos of a balloon and one of a heart. The plane photo is for demonstration purposes; the plane is a photo by Lacie Slezak.

Let's imagine we are working on an application that helps the user remove redundant photographs. The user takes a few photos and we want to know whether they are similar or not. If we find similar photographs, we can help the user select the best one and automatically get rid of the rest.

First, we need to have these feature prints for each image:

let balloon1 = UIImage(named: "balloon_1.jpg")!
let balloon1FeaturePrint = process(balloon1)!

let balloon2 = UIImage(named: "balloon_2.jpg")!
let balloon2FeaturePrint = process(balloon2)!

let balloon3 = UIImage(named: "balloon_3.jpg")!
let balloon3FeaturePrint = process(balloon3)!

let balloon4 = UIImage(named: "balloon_4.jpg")!
let balloon4FeaturePrint = process(balloon4)!

let heart = UIImage(named: "heart.jpg")!
let heartFeaturePrint = process(heart)!

let plane = UIImage(named: "plane.jpg")! // Original photo by https://unsplash.com/@nbb_photos
let planeFeaturePrint = process(plane)!

Then we need to prepare variables for the distance:

var balloon1ToBallon2Distance: Float = .infinity
var balloon1ToBallon3Distance: Float = .infinity
var balloon1ToBallon4Distance: Float = .infinity
var balloon1ToHeartDistance: Float = .infinity
var balloon1ToPlaneDistance: Float = .infinity

The last piece is to calculate distances using observations:

do {
    try balloon1FeaturePrint.computeDistance(&balloon1ToBallon2Distance, to: balloon2FeaturePrint)
    try balloon1FeaturePrint.computeDistance(&balloon1ToBallon3Distance, to: balloon3FeaturePrint)
    try balloon1FeaturePrint.computeDistance(&balloon1ToBallon4Distance, to: balloon4FeaturePrint)
    try balloon1FeaturePrint.computeDistance(&balloon1ToHeartDistance, to: heartFeaturePrint)
    try balloon1FeaturePrint.computeDistance(&balloon1ToPlaneDistance, to: planeFeaturePrint)
} catch {
    print("Couldn't compute the distance")
}

The idea is simple: we take the first image and compare it to the rest. When the distance is small, we can assume the images are similar and therefore redundant.

Let's see the results:

11.10728 // balloon_1 vs balloon_2
11.60783 // balloon_1 vs balloon_3
10.46046 // balloon_1 vs balloon_4
21.83002 // balloon_1 vs heart

The Christmas tree is the same and the heart is located near the balloon, but the distance has doubled. This is good, because this photograph is different from the previous four: the background is similar but the salient object has changed.

28.88646 // balloon_1 vs plane

The plane's distance from balloon_1 is almost three times larger than the distances to the other balloons.

Thanks to this request you can group similar images together. Once grouped, you can analyze them one by one to find the best-looking, least blurry image of them all. You can also apply other Vision requests: saliency detection can tell which one has its objects positioned better, and the face capture quality request can find the one with better-captured faces. For more requests, check the Vision series.
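As a sketch of such grouping, the distance computation can be wrapped behind a threshold check (both the areSimilar helper and the 15.0 threshold are my own illustration; a useful threshold depends on your images):

```swift
func areSimilar(_ lhs: VNFeaturePrintObservation,
                _ rhs: VNFeaturePrintObservation,
                threshold: Float = 15.0) -> Bool {
    var distance = Float.infinity
    do {
        try lhs.computeDistance(&distance, to: rhs)
    } catch {
        return false // treat a failed comparison as "not similar"
    }
    return distance < threshold
}
```

With the distances measured above, this helper would group the four balloon photos together and leave the heart and the plane out.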

You can find the code here:

import UIKit
import Vision

func process(_ image: UIImage) -> VNFeaturePrintObservation? {
    guard let cgImage = image.cgImage else { return nil }
    let request = VNGenerateImageFeaturePrintRequest()
    
    let requestHandler = VNImageRequestHandler(cgImage: cgImage,
                                               orientation: .init(image.imageOrientation),
                                               options: [:])
    do {
        try requestHandler.perform([request])
    } catch {
        print("Can't make the request due to \(error)")
    }
    
    guard let result = request.results?.first else { return nil }
    return result
}

let balloon1 = UIImage(named: "balloon_1.jpg")!
let balloon1FeaturePrint = process(balloon1)!

let balloon2 = UIImage(named: "balloon_2.jpg")!
let balloon2FeaturePrint = process(balloon2)!

let balloon3 = UIImage(named: "balloon_3.jpg")!
let balloon3FeaturePrint = process(balloon3)!

let balloon4 = UIImage(named: "balloon_4.jpg")!
let balloon4FeaturePrint = process(balloon4)!

let heart = UIImage(named: "heart.jpg")!
let heartFeaturePrint = process(heart)!

let plane = UIImage(named: "plane.jpg")! // Original photo by https://unsplash.com/@nbb_photos
let planeFeaturePrint = process(plane)!

var balloon1ToBallon2Distance: Float = .infinity
var balloon1ToBallon3Distance: Float = .infinity
var balloon1ToBallon4Distance: Float = .infinity
var balloon1ToHeartDistance: Float = .infinity
var balloon1ToPlaneDistance: Float = .infinity

do {
    try balloon1FeaturePrint.computeDistance(&balloon1ToBallon2Distance, to: balloon2FeaturePrint)
    try balloon1FeaturePrint.computeDistance(&balloon1ToBallon3Distance, to: balloon3FeaturePrint)
    try balloon1FeaturePrint.computeDistance(&balloon1ToBallon4Distance, to: balloon4FeaturePrint)
    try balloon1FeaturePrint.computeDistance(&balloon1ToHeartDistance, to: heartFeaturePrint)
    try balloon1FeaturePrint.computeDistance(&balloon1ToPlaneDistance, to: planeFeaturePrint)
} catch {
    print("Couldn't compute the distance")
}

If you want to play with Vision and see it for yourself you can check the latest version of my vision demo application here. You can find the code used in this article here.

If you have any feedback, or just want to say hi, you are more than welcome to write me an e-mail or tweet at @tustanowskik.

If you want to be up to date and always be the first to know what I'm working on, follow @tustanowskik on Twitter.

Thank you for reading!

Kamil Tustanowski is an iOS developer from Poland, a dinosaur who remembers the times when Objective-C was "the only way", memory management was done by hand, and whole iPhones were smaller than the screens of current models.