Detecting face capture quality using the Vision framework

Photo by Andriyko Podilnyk / Unsplash

Selfies. I'm terrible at selfies. Even my daughter laughs at me when I try. I guess:

I'm too old for this 💩!

Roger Murtaugh - Lethal Weapon

Potentially.

What I'm certain of is that technology can help me with my little problem. I may even solve more than one problem at a time.

I like photography. I like to take pictures. The problem is that I don't like selecting the best ones, so I end up with redundant photos. What if I could assess selfie quality? I could take fewer pictures if I knew the quality upfront, or check the quality of multiple selfies at once and keep only the best one.

This is what we will do today. Despite sounding mundane, this is one of those features that are easy to add and can bring a huge benefit to the user.

To make it happen we need VNDetectFaceCaptureQualityRequest, which produces a float number telling us how good the face capture quality is, together with a bounding box indicating where the face is.

As usual, we start with:

import Vision

Then we create the request:

let faceQualityRequest = VNDetectFaceCaptureQualityRequest()

And we pass it to the request handler:

guard let cgImage = image.cgImage else { return }
let faceQualityRequest = VNDetectFaceCaptureQualityRequest()
let requestHandler = VNImageRequestHandler(cgImage: cgImage,
                                           orientation: .init(image.imageOrientation),
                                           options: [:])
do {
    try requestHandler.perform([faceQualityRequest])
} catch {
    print("Can't make the request due to \(error)")
}

This is explained in detail in my Detecting body pose using Vision framework article.
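One note: the orientation parameter needs a CGImagePropertyOrientation created from a UIImage.Orientation. This initializer is not part of the SDK - if you don't have it from the previous articles, a version based on Apple's sample code looks like this:

extension CGImagePropertyOrientation {
    init(_ uiOrientation: UIImage.Orientation) {
        switch uiOrientation {
        case .up: self = .up
        case .down: self = .down
        case .left: self = .left
        case .right: self = .right
        case .upMirrored: self = .upMirrored
        case .downMirrored: self = .downMirrored
        case .leftMirrored: self = .leftMirrored
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}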

When a request is performed we get the results:

guard let results = faceQualityRequest.results else { return }

The result type is VNFaceObservation. Starting with iOS 15, we no longer have to typecast the results.
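If you support versions older than iOS 15, where results is still typed as [Any]?, a cast is needed - something along these lines:

guard let results = faceQualityRequest.results as? [VNFaceObservation] else { return }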

We know this type because we worked with it before. This time we are interested in the VNFaceObservation's faceCaptureQuality property, which gives us a float number:

The value ranges from 0 to 1. Faces with quality closer to 1 are better lit, sharper, and more centrally positioned than faces with quality closer to 0.

VNDetectFaceCaptureQualityRequest documentation

Additionally, each observation has a bounding box indicating which face the quality refers to, because more than one face can be assessed at a time.

It's time to prepare the results for presentation. This is done in a few steps because of all the translations we need to do:

let boxesAndNames = results
    .map { (box: $0.boundingBox.rectangle(in: image),
            name: "\($0.faceCaptureQuality ?? 0.0)") }

First, we associate the face capture quality with the bounding box and we project the bounding box CGRect onto the image to get non-normalized values.
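The rectangle(in:) helper comes from the previous articles. If you don't have it at hand, a minimal sketch - assuming it's just a thin wrapper over Vision's conversion function - could look like this:

extension CGRect {
    func rectangle(in image: UIImage) -> CGRect {
        // Vision returns normalized coordinates (0...1) - project them onto the image size.
        VNImageRectForNormalizedRect(self,
                                     Int(image.size.width),
                                     Int(image.size.height))
    }
}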

let rectangles = boxesAndNames.map { $0.box }
    .map { CGRect(origin: $0.origin.translateFromCoreImageToUIKitCoordinateSpace(using: image.size.height - $0.size.height),
                  size: $0.size) }

Then we translate the non-normalized CGRect to UIKit coordinate space and populate the DisplayableText we used in previous articles:

let displayableTexts = zip(rectangles,
                           boxesAndNames.map { $0.name })
    .map { DisplayableText(frame: $0.0,
                           text: $0.1) }

This is described in more detail in Barcode detection using Vision framework and Detecting body pose using Vision framework.
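For completeness, the two small helpers used above - the CGPoint translation and DisplayableText - can be sketched like this; their exact shape may differ slightly from the versions in my demo application:

extension CGPoint {
    func translateFromCoreImageToUIKitCoordinateSpace(using height: CGFloat) -> CGPoint {
        // Vision and Core Image put the origin in the bottom-left corner,
        // UIKit in the top-left, so the y axis has to be flipped.
        CGPoint(x: x, y: height - y)
    }
}

struct DisplayableText {
    let frame: CGRect?
    let text: String
}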

The last part is to prepare an updated image and set it in our user-facing image view:

self?.imageView.image = image.draw(rectangles: rectangles,
                                   displayableTexts: displayableTexts)

But to do that we need the draw function we remember from previous articles. First, we draw the rectangles indicating faces:

extension UIImage {
    func draw(rectangles: [CGRect],
              displayableTexts: [DisplayableText],
              strokeColor: UIColor = .primary,
              lineWidth: CGFloat = 2) -> UIImage? {
        let renderer = UIGraphicsImageRenderer(size: size)
        return renderer.image { context in
            draw(in: CGRect(origin: .zero, size: size))

            context.cgContext.setStrokeColor(strokeColor.cgColor)
            context.cgContext.setLineWidth(lineWidth)
            rectangles.forEach { context.cgContext.addRect($0) }
            context.cgContext.drawPath(using: .stroke)

And then we display texts for each face:

            let textAttributes = [NSAttributedString.Key.font: UIFont.systemFont(ofSize: 20, weight: .bold),
                                  NSAttributedString.Key.foregroundColor: strokeColor,
                                  NSAttributedString.Key.backgroundColor: UIColor.black]
            
            displayableTexts.forEach { displayableText in
                displayableText.text.draw(with: displayableText.frame!,
                                          options: [],
                                          attributes: textAttributes,
                                          context: nil)
            }
        }
    }
}

And this is it.

Friendly reminder: it's good to perform Vision requests on a separate queue because they run synchronously and would block the main thread.

Finally, the exciting part:

Original photo by Aaron Andrew Ang
Original photo by Ben White
Original photo by x

You can see a pattern here. A clear, sharp picture of a face looking towards the camera gets a much better score.

This request can detect the quality of multiple faces:

Original photo by Jason Goodman

I would like to show you one last photo:

Original photo by Audrey Fretz

It has the lowest score. But... I like it. My bet is that the rating went down because the face is obstructed, a bit blurry, the eyes are closed, and it's too close. But... I like it.

This request doesn't tell you whether the selfie is good or bad. It tells you whether the face is properly lit, positioned, sharp, and so on. You will know which photo, technically speaking, did a better job at exposing the face - not which photo is more beautiful.
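If you wanted to act on this, for example to keep only the best selfie out of a handful of candidates, a rough sketch could look like the function below. bestSelfie(from:) is my own hypothetical helper, not part of the demo application, and keep in mind the scores are meant to be compared between captures of the same person:

func bestSelfie(from images: [UIImage]) -> UIImage? {
    var best: (image: UIImage, quality: Float)?
    for image in images {
        guard let cgImage = image.cgImage else { continue }
        let request = VNDetectFaceCaptureQualityRequest()
        let handler = VNImageRequestHandler(cgImage: cgImage,
                                            orientation: .init(image.imageOrientation),
                                            options: [:])
        // Remember to call this off the main thread - the request is synchronous.
        try? handler.perform([request])
        // If there is more than one face, take the best one in the image.
        guard let quality = request.results?
                .compactMap({ $0.faceCaptureQuality })
                .max() else { continue }
        if quality > (best?.quality ?? 0) {
            best = (image, quality)
        }
    }
    return best?.image
}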

⚠️ A word of advice: if you use this request to let users purge all their bad-quality selfies, make sure you ask for confirmation before you delete anything.

Below you can find the complete code. You need to provide the image you want to use for the request and an imageView to display the updated image:

let visionQueue = DispatchQueue.global(qos: .userInitiated)

func process(_ image: UIImage) {
    guard let cgImage = image.cgImage else { return }
    let faceQualityRequest = VNDetectFaceCaptureQualityRequest()
    
    let requestHandler = VNImageRequestHandler(cgImage: cgImage,
                                               orientation: .init(image.imageOrientation),
                                               options: [:])

    saveImageButton.isHidden = false
    visionQueue.async { [weak self] in
        do {
            try requestHandler.perform([faceQualityRequest])
        } catch {
            print("Can't make the request due to \(error)")
        }

        guard let results = faceQualityRequest.results else { return }
        
        let boxesAndNames = results
            .map { (box: $0.boundingBox.rectangle(in: image),
                    name: "\($0.faceCaptureQuality ?? 0.0)") }
            
        let rectangles = boxesAndNames.map { $0.box }
            .map { CGRect(origin: $0.origin.translateFromCoreImageToUIKitCoordinateSpace(using: image.size.height - $0.size.height),
                          size: $0.size) }

        let displayableTexts = zip(rectangles,
                                   boxesAndNames.map { $0.name })
            .map { DisplayableText(frame: $0.0,
                                   text: $0.1) }
        
        DispatchQueue.main.async {
            self?.imageView.image = image.draw(rectangles: rectangles,
                                               displayableTexts: displayableTexts)
        }
    }
}

extension UIImage {
    func draw(rectangles: [CGRect],
              displayableTexts: [DisplayableText],
              strokeColor: UIColor = .primary,
              lineWidth: CGFloat = 2) -> UIImage? {
        let renderer = UIGraphicsImageRenderer(size: size)
        return renderer.image { context in
            draw(in: CGRect(origin: .zero, size: size))

            context.cgContext.setStrokeColor(strokeColor.cgColor)
            context.cgContext.setLineWidth(lineWidth)
            rectangles.forEach { context.cgContext.addRect($0) }
            context.cgContext.drawPath(using: .stroke)

            let textAttributes = [NSAttributedString.Key.font: UIFont.systemFont(ofSize: 20, weight: .bold),
                                  NSAttributedString.Key.foregroundColor: strokeColor,
                                  NSAttributedString.Key.backgroundColor: UIColor.black]
            
            displayableTexts.forEach { displayableText in
                displayableText.text.draw(with: displayableText.frame!,
                                          options: [],
                                          attributes: textAttributes,
                                          context: nil)
            }
        }
    }
}

⚠️ I made a playground version of this code first but noticed that it was producing incorrect output. The same code produces correct output when run in the application on a device, but if you run it in a playground or in the simulator, the results are wrong.

If you want to play with Vision and see it for yourself you can check the latest version of my vision demo application here. You can find the code used in this article here.

If you have any feedback, or just want to say hi, you are more than welcome to write me an e-mail or tweet to @tustanowskik

If you want to be up to date and always be the first to know what I'm working on, follow @tustanowskik on Twitter

Thank you for reading!

This article was featured in Awesome Swift Weekly #289 🎉

If you want to help me stay on my feet during the night when I'm working on my blog - now you can.
