Animal detection using the Vision framework

Last time we were decoding barcodes, and I thought we could do something more "lively" now. Do you like animals? I hope you do, but even if you don't, I think this will be interesting.

In a moment I will show you how to detect animals in images. This is another step in our journey to better understanding the contents of images. We can locate people in images and even try to reason about their actions, but we can always do better. Knowing the "context" is important.

Do you remember the exercise we did in an article about detecting the human body, hand, and face?

Let's try this again. Imagine we detected a human walking in a photo. This is the only thing we know, and we don't have the data to make any more assumptions.
After applying animal detection, a dog appears to the left of the human. We can assume that this person is walking the dog. This can lead to another assumption: that this person likes and owns an animal.
Knowing that, we can automatically propose animal-friendly locations if the person wants to travel, ask to add this animal as a pet if the person hasn't done so yet, and so on.

This is "not our first rodeo" and the code we need for performing the request and displaying the results is similar to the code we were using for barcode detection therefore I will focus on the differences.

As usual, the heart of our task is using the correct request. This time we are interested in VNRecognizeAnimalsRequest. This request differs from the other requests we were working with. The difference is not huge, but in comparison this request feels closer to CoreML than the other ones.

Let's start with checking the list of animals this request can detect:

let knownAnimals = try? VNRecognizeAnimalsRequest.knownAnimalIdentifiers(forRevision: VNRecognizeAnimalsRequestRevision1).map { $0.rawValue }

This returns the list of animals that this request, using revision 1 (currently the only one), can detect:

["Cat", "Dog"]

It can detect cats and dogs. Period. At least for now. This is somewhat disappointing but, on the other hand, more people have a cat or a dog as a pet than a tiger or an elephant.

It's time to create the request:

let animalsRequest = VNRecognizeAnimalsRequest()

And pass it to the request handler to perform it. Please check Barcodes detection using Vision framework and my previous Vision articles if you need more information.
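
For reference, here is a minimal sketch of this step, assuming we already have a cgImage of the photo we want to process (the complete implementation is in the listing at the end of the article):

let requestHandler = VNImageRequestHandler(cgImage: cgImage,
                                           options: [:])

do {
    try requestHandler.perform([animalsRequest])
} catch {
    print("Can't make the request due to \(error)")
}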

Now it's time to analyze the results. There is another difference here - the VNRecognizedObjectObservation is more generic and closer to CoreML than previous observations:

guard let results = animalsRequest.results as? [VNRecognizedObjectObservation] else { return }

This observation has three properties we are interested in (a short sketch of reading them follows this list):

  • The boundingBox which returns a CGRect, in normalized coordinates, representing the area of the image where the animal is located.
  • The labels array containing VNClassificationObservations. Each of these observations has an identifier which tells us, in our case, whether a cat or a dog was found.
  • The confidence which contains a value from 0.0 to 1.0 describing how certain Vision is of this observation.
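
Reading these properties could look roughly like this (a minimal sketch; the print is only for illustration, the full flow is shown later in the article):

for observation in results {
    let boundingBox = observation.boundingBox          // normalized coordinates
    let animal = observation.labels.first?.identifier  // e.g. "Cat" or "Dog"
    let confidence = observation.confidence            // 0.0 ... 1.0
    print("Found \(animal ?? "unknown") in \(boundingBox) with confidence \(confidence)")
}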

Presenting the results of our animal request is similar to presenting the results of the barcodes request from last week. Because of that, we will reuse the code. The rendering code will stay the same, and we will only update the code which provides the data to display.

First, we make tuples pairing a CGRect with the animal's position in the image and a String with its name:

let boxesAndNames = results
    .map { (box: $0.boundingBox.rectangle(in: image),
            name: $0.labels.first?.identifier ?? "n/a") }

To get the animal name we need to access labels. One element is all we need, therefore we use first. This leaves us face to face with a VNClassificationObservation, which has the identifier we are looking for.
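
The rectangle(in:) helper comes from the previous Vision articles and isn't shown here - it converts the normalized bounding box into the coordinate space of the processed image. A minimal sketch, assuming it simply wraps VNImageRectForNormalizedRect:

extension CGRect {
    func rectangle(in image: UIImage) -> CGRect {
        // Scale the normalized rectangle (0.0-1.0) to the size of the image.
        VNImageRectForNormalizedRect(self,
                                     Int(image.size.width),
                                     Int(image.size.height))
    }
}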

Next, we prepare data in a format acceptable by our rendering function:

let rectangles = boxesAndNames.map { $0.box }
    .map { CGRect(origin: $0.origin.translateFromCoreImageToUIKitCoordinateSpace(using: image.size.height - $0.size.height),
                  size: $0.size) }

let displayableTexts = zip(rectangles,
                           boxesAndNames.map { $0.name })
    .map { DisplayableText(frame: $0.0,
                           text: $0.1) }
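
Two more pieces come from the previous articles and the demo application: the coordinate space translation helper and the DisplayableText type. Below are minimal sketches of both, assuming the helper simply flips the y axis (Vision and Core Image use a bottom-left origin while UIKit uses a top-left one) and that DisplayableText only pairs a frame with a string - the real definitions in the repository may differ:

extension CGPoint {
    // Hypothetical reconstruction of the helper used above.
    func translateFromCoreImageToUIKitCoordinateSpace(using height: CGFloat) -> CGPoint {
        CGPoint(x: x, y: height - y)
    }
}

struct DisplayableText {
    // Hypothetical sketch - the actual type lives in the demo application.
    let frame: CGRect
    let text: String
}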

The rest of the code is identical to barcode handling. Below is all the code needed:

extension ImageProcessingViewController {
    func process(_ image: UIImage) {
        guard let cgImage = image.cgImage else { return }
        let animalsRequest = VNRecognizeAnimalsRequest()
        
        let requestHandler = VNImageRequestHandler(cgImage: cgImage,
                                                   orientation: .init(image.imageOrientation),
                                                   options: [:])

        saveImageButton.isHidden = false
        visionQueue.async { [weak self] in
            do {
                try requestHandler.perform([animalsRequest])
            } catch {
                print("Can't make the request due to \(error)")
            }

            guard let results = animalsRequest.results as? [VNRecognizedObjectObservation] else { return }
            
            let boxesAndNames = results
                .map { (box: $0.boundingBox.rectangle(in: image),
                        name: $0.labels.first?.identifier ?? "n/a") }
                
            let rectangles = boxesAndNames.map { $0.box }
                .map { CGRect(origin: $0.origin.translateFromCoreImageToUIKitCoordinateSpace(using: image.size.height - $0.size.height),
                              size: $0.size) }

            let displayableTexts = zip(rectangles,
                                       boxesAndNames.map { $0.name })
                .map { DisplayableText(frame: $0.0,
                                       text: $0.1) }
            
            DispatchQueue.main.async {
                self?.imageView.image = image.draw(rectangles: rectangles,
                                                   displayableTexts: displayableTexts)
            }
        }
    }
}

extension UIImage {
    func draw(rectangles: [CGRect],
              displayableTexts: [DisplayableText],
              strokeColor: UIColor = .primary,
              lineWidth: CGFloat = 2) -> UIImage? {
        let renderer = UIGraphicsImageRenderer(size: size)
        return renderer.image { context in
            draw(in: CGRect(origin: .zero, size: size))

            context.cgContext.setStrokeColor(strokeColor.cgColor)
            context.cgContext.setLineWidth(lineWidth)
            rectangles.forEach { context.cgContext.addRect($0) }
            context.cgContext.drawPath(using: .stroke)

            let textAttributes = [NSAttributedString.Key.font: UIFont.systemFont(ofSize: 20, weight: .bold),
                                  NSAttributedString.Key.foregroundColor: strokeColor,
                                  NSAttributedString.Key.backgroundColor: UIColor.black]
            
            displayableTexts.forEach { displayableText in
                displayableText.text.draw(with: displayableText.frame,
                                          options: [],
                                          attributes: textAttributes,
                                          context: nil)
            }
        }
    }
}

If you diff this code against the code from the previous article, you will see that not much has changed, because these requests produce similar outputs: a rectangle and a description.

Please check Barcodes detection using Vision framework for details on the implementation because it's all explained there and I don't want to bore you to tears by repeating myself.

I think it's time to see the result of our work:

Say hello to Fanta. She's the most adorable princess-dog in the whole world. At least my kids tell me this. Every day 🤣

If you want to play with Vision and see it for yourself, you can check the latest version of my vision demo application here. If you want to check the code that was used in this article, check version 0.4.0. The code is located in this file.

If you have any feedback, or just want to say hi, you are more than welcome to write me an e-mail or tweet to @tustanowskik.

If you want to be up to date and always be the first to know what I'm working on, tap follow @tustanowskik on Twitter.

Thank you for reading!

This article was featured in iOS Dev Weekly #526 🎉

Kamil Tustanowski