While OpenCV’s CvVideoCamera is great for fast prototyping and provides very easy access to the camera frames of an iOS device, its performance is rather poor. This is especially true when only the luminance pixels (grayscale data) of a camera frame are needed for image processing. This article shows how to use the native iOS camera APIs to achieve this.
The basic setup for processing camera frames is described in a corresponding Technical Q&A by Apple and in the iOS example project AVCam. It is important to set the correct pixel format for the video output frames via
AVCaptureVideoDataOutput.videoSettings. All supported values can be queried via
AVCaptureVideoDataOutput.availableVideoCVPixelFormatTypes, which returns an array of FourCC pixel format codes as integers. If you want to print these codes to the console, you can use the following function to convert them to a string:
The documentation states that the first element in the array is the most efficient pixel format to use (i.e. it requires no further conversion work). On recent iOS hardware, it should have the value “420v”, which represents a YUV 4:2:0 “bi-planar” pixel format also known as NV12. An image with such a pixel format contains two “planes” of pixel data: the first plane holds the luminance (grayscale) information (Y) with 8 bits per pixel, while the second plane holds the subsampled chroma information (blue-difference U and red-difference V) in an interleaved manner. If we’re only interested in the luminance information – which is often the case in image processing – this format is very efficient, since we can directly access the Y plane as a whole instead of first converting color pixels to grayscale (as with RGB).
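Requesting this format on the video output could look like the following sketch (standard AVFoundation calls; error handling omitted):

```objc
// Sketch: ask the video data output for the bi-planar YUV ("420v") format.
AVCaptureVideoDataOutput *videoOutput = [[AVCaptureVideoDataOutput alloc] init];
videoOutput.videoSettings = @{
    (id)kCVPixelBufferPixelFormatTypeKey :
        @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange)  // FourCC "420v"
};
```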
When the video output pixel format is set to the mentioned format, we will receive YUV frames once we have implemented the method
- captureOutput:didOutputSampleBuffer:fromConnection: of the
AVCaptureVideoDataOutputSampleBufferDelegate protocol. These frames can be accessed through the sample buffer of type
CMSampleBufferRef. It is quite easy to copy the grayscale pixels from such a buffer to a
cv::Mat (or any other byte buffer):
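A sketch of the delegate method in Objective-C++ (the variable names are mine; plane 0 of the bi-planar format is the Y plane):

```objc
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef imgBuf = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(imgBuf, kCVPixelBufferLock_ReadOnly);

    // Plane 0 holds the 8-bit luminance (Y) pixels.
    void  *baseAddr = CVPixelBufferGetBaseAddressOfPlane(imgBuf, 0);
    size_t width    = CVPixelBufferGetWidthOfPlane(imgBuf, 0);
    size_t height   = CVPixelBufferGetHeightOfPlane(imgBuf, 0);
    size_t stride   = CVPixelBufferGetBytesPerRowOfPlane(imgBuf, 0);

    // Wrap the plane without copying; clone() so the data outlives
    // the locked buffer.
    cv::Mat gray((int)height, (int)width, CV_8UC1, baseAddr, stride);
    cv::Mat grayCopy = gray.clone();

    CVPixelBufferUnlockBaseAddress(imgBuf, kCVPixelBufferLock_ReadOnly);

    // ... process grayCopy ...
}
```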
As mentioned in this StackOverflow answer, it is very important to use
CVPixelBufferGetBaseAddressOfPlane() instead of
CVPixelBufferGetBaseAddress() for YUV frames – otherwise the image gets strangely repeated on the left side.
You can then work with this
cv::Mat as before; you will just notice that the frame rate is much higher compared to using
CvVideoCamera. A full example can be found in my GitHub repository.