Q&A with Ambarella's Les Kohn
Innovators 2008: Processor innovator discusses video encoding and decoding, multiprocessor programming, and more.
By Brian Dipert, Senior Technical Editor -- EDN, June 26, 2008
Please give us a rundown of your professional background, as well as a description of your current responsibilities at Ambarella.
I was intensely interested in electronics and computers growing up. I majored in physics at Caltech but also took classes in electrical engineering and computer science, including a research project to develop a novel system-programming language. This led me to look at computer instruction sets for efficient compilation. I was lucky enough to get a microprocessor architect job out of school at National Semiconductor in 1977, at the time that National decided to compete with Intel in the high-end microprocessor market. A friend from Caltech and I were architects for the NS32000 microprocessor family, one of the first 32-bit microprocessors. We then started a networked NS32K workstation project, with a brand-new programming language, object-based OS, and windowed IDE. In hindsight, it was a bit overambitious (this was before Sun was started).
I joined Intel's microprocessor architecture group in 1982 to work on one of the three 32-bit CISC microprocessors Intel was developing, and tried to persuade Intel to develop a RISC processor as a quicker and more efficient alternative. After several attempts, the Intel 860 was brought to market as the first RISC microprocessor for floating-point-intensive applications. Other principals and I left Intel in 1989 to found a startup developing a multiprocessor multimedia workstation utilizing the 860.
Unfortunately, we were unable to get sufficient funding, so we made an 860-based PC graphics add-in board product instead.
I worked at Sun from 1991 to 1994 as chief architect of Ultrasparc I and II, Sun's first two 64-bit microprocessors. Ultrasparc included the VIS instruction set extension for image processing and video codec acceleration.
As a result of this experience, I realized that instruction set extensions for general-purpose processors were ill-suited to the hard real-time constraints of video processing.
I was at C-Cube from 1995 to 2000 as chief architect of Dvexpert, the first single-chip MPEG-2 codec; Dvexpress, the first dual-format DV/MPEG codec; and Domino, the first HD decode/multistream SD encode SOC. C-Cube taught me a lot about what is required to do a high-quality real-time video codec, as well as the need to develop full SOC solutions to address high-volume markets. I found that a general-purpose processor for high-level processing, plus dedicated hardware coprocessors for low-level processing, provided the right mix of flexibility and efficiency. This requires a good understanding of the video encoding algorithms up front so that the right low-level operations are designed. Fortunately, C-Cube had a world-class video algorithm team.
In mid 2000 I became CTO and cofounder at Afara. Afara developed a high throughput server microprocessor based on using a relatively large number of power efficient multithreaded cores. Afara was acquired by Sun in 2002, and the design became the basis of the Sun T1 server family.
In 2004, I moved back into the video space and cofounded Ambarella. The premise of the company was that low power H.264 encoding would enable true hybrid cameras with high quality video and still picture stored in flash memory. As CTO, I work on product roadmap, algorithm, architecture and implementation issues, especially things that cross the boundary between chip design, system hardware and software.
Video is one technology that I follow with notably intimate scrutiny. That's because its evolutions (frame rate, per-frame size, codec, delivery method, etc) uniquely impact numerous areas of the tech spectrum: storage (both magnetic and optical), transmission and reception (LAN and WAN), and processing (encoding, decoding, transcoding, artifact suppression, scaling, and so on, both in dedicated hardware, and in software running on CPUs, GPUs and DSPs). As such, although I'll ask application-specific questions next, I'm curious first to hear your overall impressions of the standard- to high-definition transition, and the MPEG-2 to MPEG-4 transition. Have they occurred at the rate you might have predicted at their inception (which predated your time at Ambarella; certainly C-Cube was evaluating high-definition and MPEG-4 during your time there)? If not, why? And how will the progressions play out in the future?
Major technology transitions including HD and H.264 do generally take longer than initially predicted. There is a large amount of infrastructure investment in existing formats, and large scale adoption of new formats depends on bringing the new format price point down close enough to the old format. This creates a chicken and egg problem because cost depends on volume which in turn depends on cost. Getting the whole ecosystem in place takes time. For example, full HD H.264 decoding on the PC is not yet widespread because hardware acceleration is required. Overall though, there has been tremendous progress in solving these problems over the last couple of years. For example, a full HD H.264 camcorder is now available for $200, less than the price of many SD camcorders.
Going forward, I think it will be a while before there is a mainstream replacement for H.264. MPEG-2 lasted over ten years with continuous improvement in the encoders during that period and I expect to see something similar for H.264. As for evolution of HD formats, I think distribution formats such as broadcast or optical media will also evolve slowly due to the cost of upgrading the infrastructure. However, with flash and hard disk storage and internet content delivery, consumer formats need not be constrained by broadcast or optical disk distribution formats. The next step in video formats is 1080P60. This format avoids the limitations of current HD formats including interlace artifacts of 1080i60 and motion judder artifacts of 1080P30 and it is supported by most current HD TVs via HDMI input. Ambarella recently introduced the first consumer 1080P60 camcorder chip (A390). LCD displays and sensors that support greater than 2MP video are already available, making consumer HD video beyond 1080P resolution possible in the next few years. The key for widespread adoption of higher resolution formats will be availability of decoders for TVs and PCs.
Now for the application-specific focus. The miniDV to high-def camcorder transition is one that I've observed with particular closeness; in no small part, I admit, due to personal interest in the topic. The first HD camcorders for consumers and cost-conscious professionals employed the DV tape-based, high-definition MPEG-2 codec-based HDV format originally developed by JVC. In recent years, JVC has broadened HDV's ubiquity beyond tape to encompass HDDs, and the HD camcorder codec variety has similarly broadened to also encompass H.264 (aka MPEG-4 AVC, aka MPEG-4 Part 10)-based AVCHD, stored on HDDs, optical discs, and flash-memory cards.
Initially, HDV was hampered by scant and feature-deficient support within video-editing packages. Nowadays, HDV support is much more robust than at its inception, while AVCHD is in an embryonic state that mimics its HDV predecessor's (and continued competitor's) past condition, in spite of the fact that AVCHD employs a more modern and therefore (at least conceptually) superior codec from a quality-per-bitrate basis. Meanwhile, miniDV sales remain strong. Consumers, with a shoot-it-and-store-it YouTube mentality, do very little video editing. And, ironically, the extremely entry-level (with entry-level pricetag to match) Pure Digital Systems Flip solid-state camcorder line has, in a fairly short timeframe, reportedly captured more than 10% of the total camcorder market.
Ambarella, being an H.264-based silicon supplier, has a vested interest in AVCHD's success. Do you agree with my analysis of the camcorder market as it has evolved over time, and as it currently stands? How do you see the market evolving in the future, and what factors (such as cost-effective, easy-to-use format support within video-editing software, powerful and inexpensive computers powering that software, and the thankful end of the blue-laser format wars) will influence this evolution?
Tape-based camcorders including DV and HDV are actually a rapidly declining segment of the camcorder market. Last year miniDV was only 29% of unit shipments and is projected to be below 20% this year. Optical-based formats are also fading due to recording time and performance limitations, leaving hard-disk and flash-based storage as the media of choice. Flash-based storage has many advantages over small hard disks, including lower power, smaller size, greater ruggedness, and better performance, and it is becoming compelling as flash prices continue to drop dramatically.
HD camcorders accounted for 12% of unit shipments last year and are projected to reach around 40% by 2010. New HD camcorders have moved to H.264 to improve recording time. Different container formats are being used, including .mp4 and AVCHD. In addition to the traditional camcorder market, I expect that hybrid DSC/HD video cameras will become increasingly popular as the incremental cost for HD video drops. The barrier to entry for flash-based camcorders is much lower than for traditional tape-based camcorders, and the cost of basic HD camcorder/hybrid DSC models will drop below $100 as low-price brands compete in the market. Basic, easy-to-use camcorders such as Pure Digital's will evolve to HD for the same reason.
Internet-based video sharing is becoming as ubiquitous as photo sharing (see http://en.wikipedia.org/wiki/List_of_video_sharing_websites). Some video sharing sites have already started to migrate to HD as network bandwidths improve (Vimeo and Vidyyou), but even low-resolution video has improved quality if the source is captured in HD and downsampled for sharing.
Interestingly, I actually see consumers doing more video editing as video sharing proliferates. This in turn will drive more innovation in quick and easy-to-use video-editing tools. A number of HD H.264 editing applications are available today, including those from Arcsoft, Cyberlink, Nero, Sonic, and Apple, but there is room for improvement in ease of use and performance. This should happen as PCs with H.264 hardware-decoding acceleration become widely available, and as editing packages evolve.
The "megapixel race" for digital still cameras is beginning to level off, and the market is quite saturated in most of the "first world." The healthiest segment of the market right now seems to be digital SLRs, as point-and-shoot camera owners upgrade their equipment in search of value-added features like removable lenses. Meanwhile, even the lowest-priced still cameras exceed the 1920×1080-pixel per-frame resolutions captured by high-def video camcorders. In recent years, I've repeatedly explored (both online and in print) the potential for a single camera capable of capturing >4M-pixel still images and HD video clips to obviate the need for separate still and video cameras, along with the barriers to entry for such a device. I'm curious to hear your thoughts on such a product.
I'm glad we think alike, because enabling hybrid cameras has been Ambarella's objective from our first chip. My experience with dragging both an SLR and a camcorder around on vacations convinced me that hybrid cameras are a great concept, and the ultimate camera, in my opinion, would be a hybrid SLR.
Recently, cameras based on the Ambarella A2 have been introduced that can capture 5M-pixel still images and 720P60/1080P30 HD video formats. These cameras retail for around $200 (my picture for this article was shot on one such model), so a lot of progress has been made.
Ideally speaking, a hybrid camera would have no compromise in still or video quality relative to a dedicated device. But this has not yet been achieved. Developing such a product has been a challenge for a number of reasons:
- Smooth motion video requires the sensor to read out at 60 FPS. This means a sensor that can support 8MP stills should support a readout rate of 480M pixels per second. Most high-resolution sensors and image processors are incapable of processing at this rate, so binning (summing pixels together into a single output value before demosaicing) is employed. An 8M-pixel sensor might combine four pixels together to read out 2M pixels at 60 FPS rather than 8M pixels. Although this may sound like "full HD" resolution, binning introduces jaggy-edge artifacts and a significant loss of resolution compared with reading out the full 8M pixels at 60 FPS and downsampling after demosaicing with a high-quality filter. Fortunately, high-resolution CMOS sensors have recently been introduced that are capable of reading the full sensor resolution at the 60 FPS rate, and Ambarella's recently introduced A390 can process pixels at this rate. The combination provides video quality that exceeds conventional camcorders, while providing still-picture resolution in excess of 6M pixels. The fast frame rate can also be applied to still captures, including seamless capture of a high-resolution still while shooting a video sequence.
- Camcorder lenses traditionally support a larger zoom ratio (10× or more) combined with a faster maximum aperture (f2.0 or lower) than a traditional DSC lens. To achieve this, sensor size and resolution are normally reduced relative to a DSC. Personally, I would be happy to trade off some zoom range to get a higher-resolution still image while still keeping a reasonable lens size. Still camera makers tend to focus on increasing megapixel count every generation, even though this has reached the point of diminishing returns. It requires some courage for a camera company to buck industry trends and make the right compromises to get a product that can provide great video and still quality.
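The readout arithmetic in the first point above can be checked with a short sketch. The function names are illustrative only and do not come from any Ambarella tool; the figures (8M pixels, 60 FPS, four-pixel binning) are the ones quoted in the answer, with "8M" taken as a round 8,000,000.

```python
# Sensor readout arithmetic, using the round figures quoted in the interview.
# Function names are hypothetical and exist only for this illustration.

FPS = 60  # frame rate needed for smooth motion video


def readout_rate(pixels_per_frame: int, fps: int = FPS) -> int:
    """Pixels per second the sensor and image processor must handle."""
    return pixels_per_frame * fps


def binned_pixels(pixels_per_frame: int, bin_factor: int) -> int:
    """Pixels left per frame after summing bin_factor pixels into one value."""
    return pixels_per_frame // bin_factor


if __name__ == "__main__":
    full_frame = 8_000_000  # "8M" sensor, treated as a round number

    print(readout_rate(full_frame))                    # 480000000
    print(binned_pixels(full_frame, 4))                # 2000000
    print(readout_rate(binned_pixels(full_frame, 4)))  # 120000000
    print(1920 * 1080)                                 # 2073600
```

Four-pixel binning cuts the required rate from 480M to 120M pixels per second, and the binned frame has roughly the pixel count of a 1920×1080 frame, which is why it can sound like "full HD" even though the resolution delivered is lower.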
Several industry observers, myself included, saw Blu-ray's triumph over HD DVD as something of a hollow victory. Sony and its partners poured a tremendous amount of time and money into the development and competitive positioning of Blu-ray, and it's not clear to me that the Blu-ray backers will ever get enough return on their investment to break even, much less turn a tangible profit. In part, this skepticism is a function of the longevity and acceptability of red-laser DVD content, especially if the high-def alternative comes with an incremental price markup. And in part, the cynicism is a function of the blossoming online delivery alternative. Thanks to progressive download capability, a standard-def film is "ready to watch" less than two minutes after I press "rent," over my 2.5-Mbps link. Folks with >10-Mbps broadband connections have a similar near-instant-gratification experience with high-def material.
I'm curious, in light of these industry trends, to hear your well-educated thoughts on how video distribution is going to evolve over time. And, as part of your answer, please share if, when, and over what delivery and playback schemes you think video resolutions will grow beyond 1080 lines in the future.
The biggest problem with any new player format is getting a critical mass of content to make it interesting for consumers. It took several years for red-laser DVD to reach critical mass without a format war. In addition, Blu-ray players and content are significantly more expensive than DVD, and the DRM is more restrictive. At the same time, there is no Internet HD movie download service available, since most Internet connections do not yet support >10-Mbps download speed, and DRM security/business issues are a concern for the content providers. When these issues are resolved, I would expect Internet download to be a strong alternative to Blu-ray, but it may not completely replace Blu-ray, just as CD is still widely used today.
Of course an increasing amount of HD content is available today via the broadcast infrastructure (terrestrial, cable, satellite and IPTV), and Comcast has announced that HD VOD movies will be available this year.
Competition between the service providers has resulted in more channels of HD content being delivered, with satellite and IPTV using H.264 to achieve this.
I doubt you will see Hollywood embrace consumer formats beyond 1080 lines anytime soon because:
- Current HDTVs do not support higher resolution.
- They still have a lot of content to release in 1080 HD first.
- A lot of infrastructure would need to be upgraded.
I would expect that Internet delivery will be mature enough by the time that such content is available that it will be the preferred delivery method.
What other applications for Ambarella video processors beyond HD camcorders currently exist, and will emerge and evolve over time? In which of these applications does H.264 compete against alternative codecs, such as HD MPEG-2, WMV/VC-1, On2's VP6 and VP7, and wavelets, and how does it stack up against them across common evaluation criteria (storage size and transmission bandwidth for a given quality level across varying types of content, encode and decode processing 'muscle' and latency, industry standardization, etc)? I'm curious to hear both about applications in which data interchange is a critical function, therefore interoperability is key, and others of a more 'closed box' nature (such as security systems) in which unique features of a proprietary codec might rise to the top of the priority list.
There are many applications for Ambarella chips beyond camcorders (and hybrid cameras) including:
- Broadcast infrastructure. Ambarella chips are being used in the majority of broadcast H.264 encoder systems sold today. Although it was not our initial objective to address this market, we found that our low-power, single-chip encoder quality was better than high-end multichip DSP/FPGA solutions.
- Security market. This includes both HD IP cameras as well as multichannel DVR and NVR encoders.
- HD video capture for PC platform.
- Time shifting PVR.
- HD videoconferencing.
Ambarella's biggest challenge today is not the number of applications or potential design wins, but picking the right high-volume opportunities to focus on.
In today's interconnected world, there are few applications that are completely closed systems. For instance, the security market generally requires interoperability to send compressed video to viewers that may be remote from the recording device, and PVRs may need to send bitstreams to external decoders. The good news is that H.264's compression efficiency is better than any other commercial video codec that I am aware of, including the codecs that you listed. That's why the next-generation video format for Adobe Flash is H.264 rather than a proprietary video codec. H.264 is quite flexible in supporting latency/quality tradeoffs, and has different profiles to address different price/feature points. The cost for H.264's higher compression efficiency is that a more complex encoder/decoder is required, but with current technology this is primarily an issue for the PC platform. I believe this will be addressed by using graphics-chip acceleration to get robust HD playback.
The only video application I can think of that is not addressed by H.264 is compression of RAW data coming from a sensor. This type of compression is used for high-end digital-cinema cameras, where the image pipeline processing from RAW to YUV data is done offline on a PC, so that parameters such as exposure and white balance can be adjusted in post production (much as people use RAW exposures in SLR cameras). Digital-cinema vendors such as RED have developed proprietary wavelet-based compression for this application.
The downside is that software image pipeline processing is quite a bit slower than a hardware image pipeline, and the data rate required is considerably higher than H.264.
Your industry experience in video (first at C-Cube, and now at Ambarella) is notable, but your expertise is much broader than that. You:
- Were the chief architect of Sun's first two 64-bit microprocessors: the UltraSPARC I and II
- Served as chief architect of the Intel i860
- Co-architected the National 32000 microprocessor family
- Were CTO and co-founder at Afara Websystems, subsequently acquired by Sun Microsystems in 2002, wherein you became a Sun fellow and crafted Sun's roadmap for its UltraSPARC processors.
As such, if you'll indulge me, I'd like to pick your brain on a broader set of computing questions. As you know, Moore's Law ran headfirst into the Laws of Thermodynamics, specifically with respect to leakage-current-driven power consumption, and as such, increases in clock speed have been replaced by increases in core counts. The parallel processing afforded by such architectures, while it might be ideal for specialized applications such as imaging and graphics, has to date not translated into benefits for the broader set of general computing tasks.
Multitasking operating systems enable simultaneous juggling of more than one concurrently running process, of course, but multithreading within a process has to date proven to be quite difficult to accomplish, with underwhelming return on development investments. Is the problem here simply one of development-tool limitations, coupled with generations of software developers who haven't been trained in the necessary skill sets? Or are we facing a fundamental code-limitation roadblock? And if the latter, how can we exploit the burgeoning per-die cost-effective transistor counts which Moore's Law still delivers, aside from simply tossing ever-increasing amounts of cache on-board?
Software developers have dealt successfully with using parallel processing in the server computer space for some time. Major data center applications such as Web services, application servers, and database servers are designed for MP scalability using both loosely and tightly coupled processors. The key to this type of scalability is that there are a large number of concurrent clients which can be serviced in parallel, and the software architecture is optimized around removing dependencies between transactions.
Large-scale scientific code has also been successfully parallelized by appropriate partitioning of the large data sets involved.
The major problem is primarily with client-side applications that do not have a simple paradigm for dividing up the work evenly across many processors. I don't think this problem can be solved by development tools or more programmer training. It will require intensive reengineering of the code by analyzing performance bottlenecks and developing case-by-case solutions. In many cases, it will not be worth the effort to do this and the applications will live with a gradual performance improvement from increasing cache sizes, clock-rate scaling, and microarchitecture optimizations.
On the other hand, the most interesting and compute-intensive tasks on the client side are things that can be parallelized, such as image, graphics and video processing. Also, since the human brain is a highly parallel processing engine, it seems clear that AI-type tasks such as speech recognition and computer vision can also be parallelized. I am quite optimistic that people will figure out how to use whatever transistor budget is provided by future process technology.
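The two patterns the answer credits with successful parallelism, independent client transactions and partitioned data sets, can be sketched in a few lines. This is only an illustration under stated assumptions: `handle_request`, `serve`, `partition`, and `parallel_sum_of_squares` are hypothetical names, and the work done per item is a trivial stand-in. In standard CPython, threads show the structure of the approach rather than a CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(client_id: int) -> str:
    """Hypothetical stand-in for one client transaction with no shared state."""
    return f"client {client_id}: done"


def serve(client_ids, workers: int = 4):
    """Service independent requests concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, client_ids))


def partition(data, parts: int):
    """Split data into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """Work done on one chunk; it depends on no other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, parts: int = 4) -> int:
    """Process each chunk separately, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(sum_of_squares, partition(data, parts)))


if __name__ == "__main__":
    print(serve([1, 2, 3]))
    print(parallel_sum_of_squares(list(range(10))))
```

Both functions scale for the same reason: no unit of work depends on another, so nothing needs to be coordinated until the results are collected. Client-side code without such a natural division is the hard case described above.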
Thanks, Les, for sharing your insights with EDN's readers. In closing, do you have any added thoughts on video-related or other topics we haven't already addressed?
The first microprocessor I worked on 30 years ago utilized a 4.5-micron process. Now 0.045-micron microprocessors are entering production, with a 10,000× density improvement and bigger die size. However, I believe the performance gain is only around 2000×. Clearly the rate of performance increase in general-purpose processing is slowing down, and the drive for power efficiency and performance will require more specialized processing engines. At the same time, the cost of developing state-of-the-art SOCs and associated software is continuing to rise. There are not many markets that have the volume to justify this level of investment, and it has never been more challenging to be in a semiconductor startup. I often feel like I did on a trip to Mount Everest base camp: The air is thin, but exhilarating.
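The 10,000× density figure follows from the two process sizes if transistor density is assumed to scale with the inverse square of the feature size. That idealized scaling rule is an assumption made for this sketch, not something stated in the answer, and the function names are illustrative. The 2000× performance gain is the estimate given above.

```python
# Process-scaling arithmetic behind the closing remarks.
# Assumption: density scales with the inverse square of feature size.
# Feature sizes are given in nanometres to keep the arithmetic exact.


def linear_shrink(old_nm: int, new_nm: int) -> float:
    """Factor by which the linear feature size has shrunk."""
    return old_nm / new_nm


def ideal_density_gain(old_nm: int, new_nm: int) -> float:
    """Density gain under ideal area scaling (linear shrink squared)."""
    return linear_shrink(old_nm, new_nm) ** 2


if __name__ == "__main__":
    old_nm, new_nm = 4500, 45       # 4.5 micron and 0.045 micron
    estimated_perf_gain = 2000      # the estimate quoted in the interview

    density = ideal_density_gain(old_nm, new_nm)
    print(linear_shrink(old_nm, new_nm))   # 100.0
    print(density)                         # 10000.0
    print(density / estimated_perf_gain)   # 5.0
```

Under that assumption, density has grown about five times faster than the estimated performance, before counting the larger die size.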






















