Is Intel within ARM’s reach? Pedestrian Detection shows the way
Comparing smartphone performance - and the SoCs and processor cores that drive it - has been a hotly discussed topic of late. More so now that Intel is trying to challenge ARM in the low-power mobile space with the Atom processor, while ARM is trying to challenge Intel in the server space with the Cortex-A53 and A57. Articles have already compared the performance of ARM-based phones vs. Atom-based ones - and there are many benchmarks too - but perhaps none that compares the Cortex-A15 vs. the A9 vs. the Intel Core i3 from an actual developer's perspective. In this post I will share my experience optimizing computer vision algorithms, pedestrian detection in particular, and compare the performance of the same code on the three processors.
Computer vision, in layman's terms, can be described as analytics that can be gleaned from a video or an image. Face detection and recognition are easily understood examples of computer vision.
So why use a computer vision algorithm for benchmarking? Vision algorithms are very compute-intensive, and with the increasing availability of processing power, they are finding more applications in the real world. Unlike the compute-intensive tasks of earlier days, such as graphics and video codecs, vision algorithms usually don't have a dedicated engine inside the processor to accelerate them.
Consequently, all the computation is done on the CPU itself. Vision algorithms are usually a good mix of fixed- and floating-point number crunching with a good amount of conditional execution too, which makes them a decent choice for benchmarking CPUs.
Pedestrian detection - the industry term for detecting humans in videos - is one of the challenging problems that automakers are working on for next-generation, camera-based safety systems. The most popular algorithm for pedestrian detection is HOG (Histogram of Oriented Gradients), proposed by Navneet Dalal and Bill Triggs. It's available as part of OpenCV - an open source computer vision library with a permissive license - and is pretty well optimized. However, it's still not fast enough to run in real time on most processor architectures.
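To make the name "Histogram of Oriented Gradients" concrete, here is a minimal sketch of the innermost step: building a magnitude-weighted orientation histogram for one small cell of pixels. It is a simplification for illustration only, not the OpenCV implementation. The function name `cell_histogram`, the 9 unsigned bins over 0-180 degrees, and the choice to skip border pixels are assumptions made here, and bin interpolation and block normalization are left out.

```python
import math

def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a list of rows of grayscale intensities. Border pixels are
    skipped so that every gradient uses a centred difference.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180): opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp in case rounding pushes the angle to exactly 180.0.
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += magnitude
    return hist

# A cell whose brightness increases from left to right: every gradient
# points along the x axis, so all the weight lands in the first bin.
ramp = [[10 * x for x in range(6)] for _ in range(6)]
print(cell_histogram(ramp))
```

Even this toy version needs a square root and an arctangent per pixel plus a data-dependent bin update, which is the mix of arithmetic and conditional work described above.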
On an Intel Core i3 machine, with a single core running at full clock, an exhaustive search for humans with a minimum size of 64x128 pixels on a VGA-sized image is 13 times slower than desired. To make things worse, the automotive industry usually can't afford to put a Core i3 processor in a car because of its power consumption. So most of the processors that are used have an ARM core (Cortex-A8, A9 and now A15) plus a DSP/FPGA. The need of the day is clearly to optimize HOG further - a lot further. We chose the ARM core for optimization because of its ubiquity.
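A rough count of detection windows shows why an exhaustive search is so expensive. The sketch below assumes a 640x480 (VGA) image and the 64x128 minimum window mentioned above. The 8-pixel stride and the 1.05 scale step between pyramid levels are illustrative assumptions, not figures from our measurements, and the helper names are hypothetical.

```python
def windows_at_scale(img_w, img_h, win_w=64, win_h=128, stride=8):
    """Number of window positions that fit in an img_w x img_h image."""
    if img_w < win_w or img_h < win_h:
        return 0
    across = (img_w - win_w) // stride + 1
    down = (img_h - win_h) // stride + 1
    return across * down

def count_windows(img_w=640, img_h=480, scale_step=1.05, **window_args):
    """Total windows over an image pyramid, shrinking the image each level."""
    total = 0
    scale = 1.0
    while True:
        n = windows_at_scale(int(img_w / scale), int(img_h / scale), **window_args)
        if n == 0:          # the window no longer fits: pyramid is finished
            break
        total += n
        scale *= scale_step
    return total

print(windows_at_scale(640, 480))  # 73 * 45 = 3285 windows at full resolution
print(count_windows())             # all pyramid levels combined
```

Under these assumptions the full-resolution level alone has 3285 windows, and each one needs its own descriptor and classifier evaluation before the smaller pyramid levels are even considered.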
Optimizing an algorithm basically involves two steps. The first step is what we call C optimizations, which are mostly processor-agnostic. In the second step, we focus more on the ARM architecture, primarily the NEON instructions, which are similar to the SSE instructions in x86. The basic algorithm remains the same throughout, and so does the output (detection accuracy).
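As a generic illustration of a processor-agnostic optimization that leaves the output unchanged, the sketch below replaces a per-pixel arctangent with a precomputed lookup table. This is a common technique for this kind of code and is not necessarily one of the optimizations we applied. The names `bin_direct`, `bin_lut` and `LUT` are hypothetical, and the example assumes 8-bit pixels, so centred-difference gradients fall in the range -255 to 255. It is written in Python for brevity; the same idea carries over directly to C.

```python
import math

BINS = 9
MAX_G = 255  # largest gradient magnitude per axis for 8-bit pixels

def bin_direct(gx, gy):
    """Reference version: compute the orientation bin with atan2 every time."""
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    return min(int(angle // (180.0 / BINS)), BINS - 1)

# Built once up front: one entry for every possible (gx, gy) pair.
LUT = [[bin_direct(gx, gy) for gx in range(-MAX_G, MAX_G + 1)]
       for gy in range(-MAX_G, MAX_G + 1)]

def bin_lut(gx, gy):
    """Optimized version: a table lookup instead of trigonometry."""
    return LUT[gy + MAX_G][gx + MAX_G]

# The two versions agree on every sampled gradient, so detection output
# is unaffected by the change.
samples = range(-MAX_G, MAX_G + 1, 17)
print(all(bin_lut(gx, gy) == bin_direct(gx, gy) for gx in samples for gy in samples))
```

The trade-off is memory for speed: the table has 511 x 511 entries, and whether that pays off on a given core depends on its cache behaviour.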