CvVideoCamera vs. native iOS camera APIs

Posted at 24 Jun 2014
Tags: opencv, ios, imageproc

While OpenCV’s CvVideoCamera is great for fast prototyping and provides very easy access to the camera frames of an iOS device, its performance is rather poor. This is especially the case when only the luminance pixels (grayscale data) of a camera frame are needed for image processing. This article shows how to use the native iOS camera APIs to get at this data efficiently.

The basic setup to process camera frames is described in a corresponding Q&A by Apple and in the iOS example project AVCam. It is important to set the correct pixel format for the video output frames via AVCaptureVideoDataOutput.videoSettings. All supported values can be queried via AVCaptureVideoDataOutput.availableVideoCVPixelFormatTypes, which returns an array of FourCC pixel format codes as integers. If you want to print these codes to the console, you can use the following function to convert them to a string:

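// Converts a FourCC pixel format code to a NUL-terminated 4-character string
void fourCCStringFromCode(int code, char fourCC[5]) {
    for (int i = 0; i < 4; i++) {
        fourCC[3 - i] = (char)(code >> (i * 8));   // most significant byte comes first
    }
    fourCC[4] = '\0';
}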

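To make the byte order concrete, here is a minimal C++ sketch of the same conversion in both directions. The helper names fourCCFromString and fourCCToString are made up for this illustration and are not part of any Apple or OpenCV API; the only assumption is that the first character of the code sits in the most significant byte, as in the function above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: pack four ASCII characters into a 32-bit FourCC code.
// The first character ends up in the most significant byte.
std::uint32_t fourCCFromString(const std::string &s) {
    assert(s.size() == 4);
    std::uint32_t code = 0;
    for (unsigned char c : s) {
        code = (code << 8) | c;
    }
    return code;
}

// Hypothetical helper: unpack a FourCC code into a 4-character string,
// using the same shifts as fourCCStringFromCode above.
std::string fourCCToString(std::uint32_t code) {
    std::string s(4, ' ');
    for (int i = 0; i < 4; i++) {
        s[3 - i] = static_cast<char>((code >> (i * 8)) & 0xFF);
    }
    return s;
}
```

With these helpers, the code 0x34323076 unpacks to "420v", since 0x34, 0x32, 0x30 and 0x76 are the ASCII values of '4', '2', '0' and 'v'.
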
The documentation states that the first element in the array is the most efficient pixel format to use (i.e. it requires no further conversion work). On recent iOS hardware, it should have the value “420v”, which represents a YUV 4:2:0 “bi-planar” pixel format that is also called NV12. An image in this pixel format contains two “planes” of pixel data. The first plane holds the luminance (grayscale) information (Y) with 8 bits per pixel. The second plane holds the subsampled chroma information (blue-difference U and red-difference V) in an interleaved manner. If we’re only interested in the luminance information, which is often the case in image processing, this format is very efficient, since we can access the Y plane as a whole instead of first converting color pixels to grayscale pixels (as with RGB).
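The plane layout can be sketched with a bit of arithmetic. The following C++ snippet is only an illustration under simplifying assumptions: the struct and function names are hypothetical, the dimensions are even, and the rows are tightly packed (real pixel buffers may pad each row, so the actual sizes can be larger).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a tightly packed NV12 frame
struct Nv12Layout {
    std::size_t ySize;      // bytes in the luminance (Y) plane
    std::size_t cbcrSize;   // bytes in the interleaved chroma plane
    std::size_t totalSize;  // bytes for the whole frame
};

Nv12Layout nv12Layout(std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    Nv12Layout l;
    l.ySize = width * height;                     // one byte per pixel
    l.cbcrSize = (width / 2) * (height / 2) * 2;  // one U/V byte pair per 2x2 block
    l.totalSize = l.ySize + l.cbcrSize;           // 12 bits per pixel on average
    return l;
}

// Byte offset of the luminance value of pixel (x, y) inside the Y plane
std::size_t lumaOffset(std::size_t width, std::size_t x, std::size_t y) {
    return y * width + x;
}

// Byte offset of the first chroma byte for pixel (x, y) inside the chroma
// plane; the second chroma byte follows directly at offset + 1
std::size_t chromaOffset(std::size_t width, std::size_t x, std::size_t y) {
    return (y / 2) * width + (x / 2) * 2;
}
```

For a 640×480 frame this gives 307200 bytes of luminance data and 153600 bytes of chroma data, so a grayscale consumer only has to touch two thirds of the frame.
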

When the video output pixel format is set to this format, we receive YUV frames once we have implemented the method -captureOutput:didOutputSampleBuffer:fromConnection: of the AVCaptureVideoDataOutputSampleBufferDelegate protocol. These frames can be accessed via the sample buffer of type CMSampleBufferRef. It is quite easy to copy the grayscale pixels from such a buffer to a cv::Mat (or any other byte buffer):

+ (void)convertYUVSampleBuffer:(CMSampleBufferRef)buf toGrayscaleMat:(cv::Mat &)mat {
    CVImageBufferRef imgBuf = CMSampleBufferGetImageBuffer(buf);

    // lock the buffer
    CVPixelBufferLockBaseAddress(imgBuf, 0);

    // get the address to the image data
    void *imgBufAddr = CVPixelBufferGetBaseAddressOfPlane(imgBuf, 0);

    // get image properties
    int w = (int)CVPixelBufferGetWidth(imgBuf);
    int h = (int)CVPixelBufferGetHeight(imgBuf);

    // create the cv mat
    mat.create(h, w, CV_8UC1);              // 8 bit unsigned chars for grayscale data
    memcpy(mat.data, imgBufAddr, w * h);    // the first plane contains the grayscale data
                                            // therefore we use <imgBufAddr> as source

    // unlock again
    CVPixelBufferUnlockBaseAddress(imgBuf, 0);
}

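One caveat: a single memcpy of w * h bytes assumes that every row of the plane is exactly w bytes long. Pixel buffers may pad their rows, in which case the row length in bytes (available via CVPixelBufferGetBytesPerRowOfPlane()) is larger than the width. Whether padding actually occurs depends on the device and resolution. A more defensive copy could look like the following sketch in plain C++; copyPlane and its parameters are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a (possibly row-padded) 8-bit plane into a tightly packed buffer.
// src:       start of the source plane
// srcStride: bytes per source row, including any padding
// dst:       destination with room for width * height bytes
void copyPlane(const std::uint8_t *src, std::size_t srcStride,
               std::uint8_t *dst, std::size_t width, std::size_t height) {
    assert(srcStride >= width);

    if (srcStride == width) {
        // no padding: the whole plane can be copied at once
        std::memcpy(dst, src, width * height);
        return;
    }

    // padded rows: copy only the visible part of each row
    for (std::size_t y = 0; y < height; y++) {
        std::memcpy(dst + y * width, src + y * srcStride, width);
    }
}
```

In the method above, src would be imgBufAddr, srcStride the value returned by CVPixelBufferGetBytesPerRowOfPlane(imgBuf, 0), and dst the data pointer of the freshly created cv::Mat.
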
As mentioned in this StackOverflow answer, it is very important to use CVPixelBufferGetBaseAddressOfPlane() instead of CVPixelBufferGetBaseAddress() for YUV frames. Otherwise, the image gets strangely repeated on the left side, like this:

[Image: cvpixelbuffer fail]

You can then work with this cv::Mat as before; you will just notice that the frame rate is much higher than with CvVideoCamera. A full example can be found in this GitHub repository of mine.



If you spotted a mistake or want to comment on this post, please contact me: post(-at-)mkonrad(-dot-)net.