A Practical Comparison of Face Detection and Recognition Tools

As an IT company, Diatom Enterprises has been producing custom software for 15 years. Over the past year, however, we have become deeply interested in IoT, AI and robotics, and we selected the robot Pepper as an ideal platform on which to integrate all of Diatom’s developments and bring them to the business environment.

Recently, we needed face detection and recognition in one of our experimental projects for the robot Pepper and ran into several challenges with this feature.
Pepper’s built-in face detection and recognition functions have several issues:

– Lengthy face detection process: it takes up to 15 seconds to detect a person’s face.
– Unstable face recognition: in good lighting conditions, 6 of 10 attempts succeed; in low light conditions, 4 of 10.

We decided to find a way to work around these shortcomings.

Our new approach builds on Pepper’s person-tracking feature, which indicates when a person is nearby. Once we know that a person is in front of Pepper, we take a picture from Pepper’s video stream, assuming that the person’s face will be in it.
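The trigger logic described above can be sketched as a small polling loop. This is only an illustration under stated assumptions: `person_present` and `grab_frame` are hypothetical callables standing in for Pepper's person-tracking signal and video stream. They are not real Pepper SDK calls, and the polling interval and timeout are arbitrary.

```python
import time
from typing import Callable, Optional, TypeVar

Frame = TypeVar("Frame")


def capture_when_person_present(
    person_present: Callable[[], bool],  # hypothetical: True when tracking reports a person
    grab_frame: Callable[[], Frame],     # hypothetical: returns one image from the video stream
    poll_interval: float = 0.1,
    timeout: float = 10.0,
) -> Optional[Frame]:
    """Poll the tracking signal and take one picture as soon as a person is reported.

    Returns None if nobody shows up before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if person_present():
            # A person is in front of the robot, so we assume the face is in the frame.
            return grab_frame()
        time.sleep(poll_interval)
    return None


if __name__ == "__main__":
    # Stubbed sensors: the "person" appears on the third poll.
    readings = iter([False, False, True])
    frame = capture_when_person_present(
        lambda: next(readings),
        lambda: b"fake-image-bytes",
        poll_interval=0.01,
    )
    print(frame)
```

Passing the sensors in as plain callables keeps the sketch independent of any particular robot API, so it can be tried on a desktop with stubs before being wired to real hardware.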

The next step is to recognize the face. We extended a Microsoft web API for face recognition to pre-learn new faces from images. Once we upload new face images to the Microsoft Face API, the person is ready to be recognized.

Using our web API, we upload a picture taken by Pepper to the Microsoft Face API service and get JSON data about the person in response if the image was recognized. We can get the name, age, emotion, gender and facial features such as glasses, moustache, beard and sideburns in return. Pepper then uses this info on its own.
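To illustrate how a client might consume such a response, here is a minimal sketch. The JSON layout is an assumption made for this example: the field names are loosely modeled on the attributes listed above, not taken from our wrapper's actual output, and the 0.5 threshold for facial hair is arbitrary.

```python
import json

# Hypothetical response body; the real wrapper's field names may differ.
SAMPLE_RESPONSE = """
{
  "name": "Alice",
  "age": 31.0,
  "gender": "female",
  "glasses": "ReadingGlasses",
  "emotion": {"happiness": 0.92, "neutral": 0.07, "surprise": 0.01},
  "facialHair": {"moustache": 0.0, "beard": 0.0, "sideburns": 0.0}
}
"""


def summarize_person(payload: str) -> dict:
    """Reduce the raw JSON to the few values a robot dialogue would need."""
    data = json.loads(payload)
    emotions = data.get("emotion", {})
    # Pick the emotion with the highest score, if any scores were returned.
    dominant = max(emotions, key=emotions.get) if emotions else None
    # Keep only the facial-hair features whose score passes the assumed threshold.
    facial_hair = [k for k, v in data.get("facialHair", {}).items() if v >= 0.5]
    return {
        "name": data.get("name"),
        "age": data.get("age"),
        "gender": data.get("gender"),
        "glasses": data.get("glasses"),
        "dominant_emotion": dominant,
        "facial_hair": facial_hair,
    }


if __name__ == "__main__":
    print(summarize_person(SAMPLE_RESPONSE))
```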

Once we had evaluated this method of face detection and recognition, we decided to look for other available solutions as well. We found that there are basically two workable approaches: a web-based API service for face recognition, or a computer-hosted application built on a facial recognition tool. We tried a few of the popular tools available for face detection and recognition.

Below is a short summary of our results.

Microsoft Face API

This is a web-based service for face recognition and detection. We created our own wrapper for the available Microsoft Face API methods. The wrapper has some additional functionality we needed in order for it to work with Pepper.

This method produced the following results:

– Hybrid approach: Face detection is on Pepper (computer); recognition takes place over the web API service.
– Overall, face detection and recognition now take up to six seconds – two seconds to take the picture on Pepper and three to four seconds to transfer it over the internet, recognize it and send the result back to Pepper.
– Overall time to detect and recognize a person – three to seven seconds
– Face recognition is now very stable: 18 of 20 attempts succeed.
– Cost: the Microsoft Face API costs $1.50 per 1,000 transactions for 0–1,000,000 transactions; face storage costs $0.50 per 1,000 images per month.
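As a rough aid for budgeting, the pricing quoted above can be turned into a small estimator. This sketch assumes the rates apply linearly (prorated per transaction and per image) and only covers the 0–1,000,000 transaction tier mentioned here; the function name and the example volumes are made up for illustration.

```python
TRANSACTION_PRICE_PER_1000 = 1.50  # USD, as quoted above for 0-1,000,000 transactions
STORAGE_PRICE_PER_1000 = 0.50      # USD per 1,000 stored images per month, as quoted above
TIER_LIMIT = 1_000_000


def estimate_monthly_cost(transactions: int, stored_images: int) -> float:
    """Estimate one month's bill in USD, assuming linear per-unit pricing."""
    if transactions < 0 or stored_images < 0:
        raise ValueError("volumes must be non-negative")
    if transactions > TIER_LIMIT:
        raise ValueError("only the 0-1,000,000 transaction tier is covered here")
    transaction_cost = transactions * TRANSACTION_PRICE_PER_1000 / 1000
    storage_cost = stored_images * STORAGE_PRICE_PER_1000 / 1000
    return round(transaction_cost + storage_cost, 2)


if __name__ == "__main__":
    # Example volumes (hypothetical): 50,000 recognitions and 2,000 stored faces.
    print(estimate_monthly_cost(50_000, 2_000))
```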

Emgu CV .NET wrapper of the OpenCV open source library

This approach runs on a computer as a standalone application. The computer has to have a camera connected to it. WebIP cameras also work well for this.

We used a Windows-based desktop application to detect and recognize faces. Face detection is very stable and works at distances of up to four meters. Face recognition uses a proprietary database, in which each person can have several faces stored. Unfortunately, although face recognition is fast, it is too unstable to be used in production projects.

It produced the following results:

– Local computer-hosted: both face detection and recognition run on the computer.
– Face detection – one second
– Face detection stability – 18 of 20
– Face recognition – one second
– Face recognition stability – 16 of 20
– Working distance to detect and recognize faces – up to four meters
– Overall time to detect and recognize a person – two seconds
– Cost: a commercial license costs $399 for a single developer or $799 for a workgroup of 25 developers.

The face database stores several versions of each person’s face as greyscale images in a folder.

A hybrid approach based on Emgu CV face detection and Microsoft Face API face recognition

We adjusted the existing Windows-based desktop application to use the Emgu CV library for face detection and the Microsoft Face API for face recognition. This improved the stability of face matching.

This approach produced the following results:

– Hybrid: face detection is local computer-hosted; face recognition is over a web service.
– Face detection – one second
– Face detection stability – 18 of 20
– Face recognition – four seconds
– Face recognition stability – 18 of 20
– Working distance to detect and recognize face – up to 3.5 meters
– Overall time to detect and recognize a person – five to seven seconds
– Cost: the Microsoft Face API and Emgu CV costs listed above both apply.

Luxand FaceSDK library

The Luxand FaceSDK library is a local computer-hosted solution. We used the existing Windows-based demo desktop application to test the functionality. Face recognition uses a proprietary database, and an SQL database can be used to store facial data. Each person can have several faces stored in the database. The library gives fast and stable output.

This solution produced the following results:

– Local computer-hosted
– Face detection – one second
– Face detection stability – 19 of 20
– Face recognition – one second
– Face recognition stability – 19 of 20
– Working distance to detect and recognize faces – up to 3.5 meters
– Overall time to detect and recognize a person – one to three seconds
– Cost: available from Luxand on request; it depends on the actual usage scenario.

Example of live recognition

Summary:

Both web-based and computer-hosted methods for face detection and recognition have their own usage scenarios.
Both can be integrated into existing IT solutions.

The better the video camera you use for recognition, the more stable the results will be.

In our case, the solution based on the Luxand FaceSDK library seems the best choice in terms of both ease of development and stability of recognition. The cost of the solution can be requested from Luxand and depends on the actual usage scenario.
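For a side-by-side view, the figures reported in the sections above can be collected and sorted. The numbers are the ones from our tests (recognition successes out of 20 attempts, and the upper bound of the overall time). The ranking rule, stability first and then speed, is just one possible way to order them and is an assumption of this sketch.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    tool: str
    recognized: int   # successful recognitions
    attempts: int     # total attempts
    max_seconds: int  # upper bound of overall detect-and-recognize time


# Values copied from the result lists in this article.
REPORTED = [
    Result("Microsoft Face API on Pepper", 18, 20, 7),
    Result("Emgu CV", 16, 20, 2),
    Result("Emgu CV + Microsoft Face API", 18, 20, 7),
    Result("Luxand FaceSDK", 19, 20, 3),
]


def rank(results: List[Result]) -> List[Result]:
    """Order by recognition rate (highest first), then by overall time (lowest first)."""
    return sorted(results, key=lambda r: (-r.recognized / r.attempts, r.max_seconds))


if __name__ == "__main__":
    for r in rank(REPORTED):
        rate = r.recognized / r.attempts
        print(f"{r.tool:32s} {rate:.0%}  up to {r.max_seconds} s")
```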