Voices: Marvell's Nikhil Balram: A "visual-pipeline" view
In a wide-ranging Q&A session with EDN's Brian Dipert, a noted digital-video expert shares his views on trends in video processing, flat-panel displays, digital TV, blue-laser DVDs, video codecs, media distribution, and much more.
By Brian Dipert, Senior Technical Editor -- EDN, May 10, 2007
Nikhil Balram is vice president and general manager of the digital-entertainment-business unit, communications and consumer-business group at Marvell. Previously the chief technology officer for National Semiconductor's displays group, he most recently served as the general manager of the company's high-definition products division. He has more than 18 years of experience in digital image, video, and display processing and has served as an executive with Faroudja, Sage, Genesis Microchip, and SonicBlue. Balram holds bachelor's, master's, and doctorate degrees in electrical engineering from Carnegie-Mellon University (Pittsburgh).
I've reread the introduction to yourself that you wrote on EDN's '' (consumer-electronics) blog. You've certainly had an interesting career so far. Are there any introductory comments you'd like to make about your employment history to date—any experiences that stick out or valuable lessons you've learned? One interesting aspect of your career thus far, to me, is that you alternated between being a semiconductor supplier and being a semiconductor customer. And, initially at Faroudja, you wore both hats: IC developer and user. You're in the supplier role now. How has being on 'the other side of the table' in the past been helpful to you as an IC architect and now as a market-development person? And to what degree are you still wearing both hats at Marvell? I'm thinking of the increasing trend for OEMs—your customers—to take IC suppliers' reference hardware and software designs directly or nearly directly into production?
As evident in my career history, I like to take a system, or 'pipeline,' point of view. From a technical perspective as an imaging/video/display engineer, this [viewpoint] means understanding the whole 'visual pipeline' from pixel creation to pixel rendering. Applying the same mindset as a consumer-electronics professional means understanding the product pipeline—from algorithm and feature invention to silicon and software creation to design of the system product, all the way to the sale and how the consumer uses the product. Early on, I felt strongly that I could not be successful as a silicon developer and supplier unless I understood the endgame. So, I seized the opportunity at Faroudja and again at SonicBlue to understand the development of, the sale of, and the usage of the end product and to apply this knowledge to the development of my next generation of silicon. As you point out, with the increasingly complex nature of CE [consumer-electronics] products, it has become important for the silicon provider to provide the OEMs with complete solutions that they can quickly take to market exactly as is or with special customizations. This [approach] is exactly the approach Marvell is taking with the Marvell Total Solutions program.
Speaking of SonicBlue, I've mentioned to you in the past that our ReplayTV 4000-series PVR is by far my wife's favorite piece of technology gear. I know that you worked on the follow-on ReplayTV 5000 series, but I suspect its predecessor's technology acted to at least some degree as its foundation. So, I have to ask: Just how does the commercial-skipping feature work and work so well? With the exception of especially schizophrenic programs, such as 24, its ability to detect and jump past commercials is uncanny.
The basic principles of commercial-skipping actually predate the ReplayTV 4000 PVR and go back to certain models of VCRs. Without commenting specifically on ReplayTV or any other specific implementation, let me explain the general principles behind commercial-skipping: In general, commercials inserted into broadcast TV have certain special characteristics: They are usually bracketed by one black frame on either side, their length is usually a multiple of 15 seconds (15, 30, 45 or 60), and the audio volume is usually higher than the regular programming. [The system uses] some or all of these characteristics to determine if a segment of the recorded material is a commercial. One reason that commercial-skipping is less reliable for shows like 24 or Alias is because they tend to have a lot of dark scenes, which could sometimes be confused as the bracketing frame around a commercial.
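The general cues described above can be sketched in a few lines of Python. This is only an illustration of the stated principles, not the logic of ReplayTV or any other product. It assumes the recording has already been split into segments at detected black frames, and the names `Segment`, `looks_like_commercial`, `tolerance_s`, and the loudness units are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    duration_s: float      # time between two consecutive detected black frames
    mean_loudness: float   # average audio level of the segment (arbitrary units)


def looks_like_commercial(seg, program_loudness, tolerance_s=0.5, max_len_s=60):
    """Combine two of the cues: length near a multiple of 15 s, and louder audio."""
    nearest = round(seg.duration_s / 15) * 15
    length_ok = (15 <= nearest <= max_len_s
                 and abs(seg.duration_s - nearest) <= tolerance_s)
    louder = seg.mean_loudness > program_loudness
    return length_ok and louder
```

A dark scene in a drama can still produce a false black-frame boundary, but the resulting segment rarely also has a 15-second-multiple length and raised volume, which is why combining cues helps.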
One of my biggest professional challenges is to not adopt a Northern California-centric—that is, leading-edge-technology availability, pervasive usage, automobile-centric culture, moderate yearlong climate—and Brian-centric—early adopter well up the learning curve on a number of technologies, comparatively high-versus-norm discretionary income and educational background, home-based office worker, infrequent automobile user, infrequent television watcher—view of the world. By virtue of your frequent travel, both domestic and international, I'd posit that you're more easily able to adopt a broad perspective on technology opportunities and hinge factors. You'll have the opportunity to give a geography-tailored slant on technologies in your answers to subsequent questions, but, for now, are there any broad-stroke geography-specific comments you'd like to note?
I will be writing about this topic in EDN's ' ' blog, so let me not steal my own thunder. The one comment I'll make for now is that, when I travel around the world looking at what is happening in CE in different regions, it feels like being in a time machine. I see things happening in less developed markets that are very similar to what happened in more developed markets, such as Western Europe and the United States, over the last decade. So, the history of the developed markets provides a crystal ball into the future of the developing markets—but with one big caveat: Regional cultural and social differences and differences in perception act as modifying factors, so that what happens in these new markets has great similarities but at the same time important differences, as well. To cite one example, in India, many of the features required in a step-up TV are what one would have predicted from the US experience but with one big difference: audio power. In the United States, TVs tend to have low-power audio speakers because people who care about audio migrate to using a separate home-audio system, and most houses have small numbers of people and low noise levels. In India, households tend to be large and noisy, and people who spend a little extra money for a better TV expect it to put out enough audio power to be heard above the din. There are many other examples of 'almost the same but different in one or two ways.' It reminds me of the old Ray Bradbury science-fiction story A Sound of Thunder, in which a tour operator takes people back to prehistoric times and allows them to kill dinosaurs, which are about to die off anyway. But, when one of the customers runs in a panic and steps on a butterfly, they come back to a world that is familiar yet quite different.
VIDEO PROCESSING
The video-processing technology progression that you've participated in and, in no small part, driven is one of the best examples of the power of Moore's Law that comes to my mind. Gear you used to design at Faroudja, which cost niche videophile-market customers tens to hundreds of thousands of dollars and was constructed of tens to hundreds of ICs and other components, now sells in single-IC form to masses of consumers for a few dollars. I suspect you concur with my observation, but it never hurts to ask: Do you agree, and do you have any thoughts to share?
This is true. Moore's Law is one of the best examples of a branch of technology with a continuous stream of significant benefit to the consumer. This [thought] is exactly the one that went through my mind when I came to Faroudja. In 1997, S3 recruited me to lead its Faroudja Project. It had licensed Faroudja technology with the goal of bringing cinema-quality video to the desktop, and it wanted someone who had video, IC, architecture, and management skills to lead this [project]. It was a tremendous challenge to take a stand-alone box that was used to a single simple operating mode—plug in power and video and connect to projector—and create an integrated module inside an SOC [system on chip] to work in a complex environment with many types of video sources, many of which were nonisochronous. We were able to meet the technical challenges of the project, but the business challenges defeated us. The PC multimedia market was driven by 3-D-graphics performance, and video performance was viewed as too 'nichey' and too subjective. However, during the course of this development, it became clear to me that Moore's Law had caught up with the Faroudja boxes; we could cost-effectively put the whole pipeline into fewer ICs. I had an epiphany that the coming of pixelated displays would transform video-format conversion from a niche market into a mainstream requirement, and I saw that, thanks to Moore's Law, we could provide Faroudja quality for the mainstream. I approached Faroudja with this vision, and the new chief executive officer immediately bought into it and hired me to help lead the company in this new direction.
I'd like to next ask about one aspect of the Moore's Law trend that we discussed in the previous question. To what degree do you think that semiconductor vendors' integration-cultivated price drops created the mass market for video processing, versus the other way around? In other words, which came first: the customer demand or the product supply?
This one is easy. The chicken came first. Seriously, what I mean is that, for semiconductors, the end product (system) usually comes first and establishes itself as a desirable consumer product, and this [occurrence] motivates semiconductor companies to invest in improvements in quality, feature set, and cost. Once they start to invest, it starts a strong self-reinforcing positive cycle, which gives the impression of the chicken and the egg. But, if you go back to the beginning, the starting point is always the chicken. You can see this [scenario] in the case of the PC and again in the case of the flat-panel TV. Flat-panel TVs came along and captured the imagination of the consumer, even though they were horrendously expensive. The invention of the flat-panel TV and the validation of consumer interest in the product category spurred the suppliers to make their components better and cheaper. This [situation] was as true for the panel component as it was for the electronics components. On the flip side, we are all aware of product categories that did not take off, simply because there wasn't sufficient consumer interest to start the cycle.
Video processing, in many people's minds, first and foremost—if not completely—means interlaced- to progressive-scan conversion. Early techniques for accomplishing this process were crude—discarding one of the two interlaced fields and brute-force 'doubling up' the other, for example, or preserving the alternating fields but employing a 'bob' line-doubling algorithm to create the missing lines in each resulting converted frame, or 'weaving' the two fields together, and, if necessary, repeatedly displaying the same weaved frame to create the desired frame cadence. More elaborate techniques that appeared over time intelligently selected among and blended these approaches in a content-specific manner. All of these techniques have now broadened beyond standard-definition video to encompass high definition. Do you have any thoughts on how the deinterlacing evolution has occurred to date and how it may further evolve in the future? For example, will the need to comprehend interlaced-image capture always be a legacy requirement for video processing, or will its necessity ever disappear as, for example, video archives are progressive-scan-transformed and 480i and 1080i videocameras evolve into progressive-scan counterparts?
I still recall bob and weave … in the PC industry in the late '90s, as if they were the pinnacle of deinterlacing techniques. I gave a talk at WinHEC [Windows Hardware Engineering Conference] in 1998 that provided a theoretical foundation to explain what artifacts are produced by various deinterlacing techniques, including bob and weave, and that demonstrated the difference between these techniques and per-pixel motion-adaptive deinterlacing. It is remarkable to think how many types of CE and PC-CE devices came to market, even as late as 2001 and 2002, that employ bobbing as their primary deinterlacing technique. In the case of LCDs, the slow response proved fortuitous because it damped out much of the objectionable line flicker produced by bobbing, making it easier to get away with such inferior deinterlacing. By the time response times had improved sufficiently to make line flicker very visible, most TVs had moved to using various forms of 3-D motion-adaptive deinterlacing.
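To make the three techniques concrete, here is a toy Python sketch of weave, bob, and a per-pixel motion-adaptive blend of the two. It is a simplification written for illustration, not the algorithm used by Faroudja or any product. Fields are plain lists of rows, motion is detected by comparing the bottom field with the previous bottom field, and the function names and `threshold` value are assumptions.

```python
def weave(top, bottom):
    """Interleave two fields into one frame: sharp when static, combs on motion."""
    frame = []
    for t, b in zip(top, bottom):
        frame.append(list(t))
        frame.append(list(b))
    return frame


def bob(field):
    """Line-double one field, interpolating each missing line from its neighbours."""
    frame = []
    for i, row in enumerate(field):
        frame.append(list(row))
        nxt = field[i + 1] if i + 1 < len(field) else row
        frame.append([(a + b) / 2 for a, b in zip(row, nxt)])
    return frame


def motion_adaptive(top, bottom, prev_bottom, threshold=10):
    """Per pixel: keep the woven value where static, use the bobbed value where moving."""
    out = weave(top, bottom)
    bobbed = bob(top)
    for i, (cur, prev) in enumerate(zip(bottom, prev_bottom)):
        for x, (c, p) in enumerate(zip(cur, prev)):
            if abs(c - p) > threshold:
                out[2 * i + 1][x] = bobbed[2 * i + 1][x]
    return out
```

The per-pixel decision is what lets static detail keep full vertical resolution while moving regions avoid the combing ("feathering") that pure weave produces.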
History does repeat itself. Price pressure from plasma displays forced RP-TV and LCD-TV makers to adopt 1080p resolution earlier than most had anticipated. As a result, many first-generation 1080p sets in the 2004-to-2005 time frame did not have 3-D motion-adaptive deinterlacing for 1080i implemented in their electronics and thus ended up handling 1080i with bobbing techniques.
I think interlace will be with us for a while since it is a part of all major HD standards, so at least video 'sink' devices, such as TVs, need to support deinterlacing for the foreseeable future. In theory, everything should and will move toward progressive, but, from a practical perspective, there will always be a temptation to drop to interlace for some situations in which you need to make a trade-off. If I need to transfer video via a standard HD interface, I have to choose between 1080i60, 1080p24, and 720p60. In this case, the one that seems to offer the best compromise in transferring the most resolution and motion, while promising the full reconstruction of both on the display side, is 1080i60.
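A rough way to see the trade-off among the three formats is to compare their active-pixel throughput. The short Python sketch below assumes 1920-by-1080 and 1280-by-720 active rasters and ignores blanking; 1080i60 is counted as 60 fields per second of 540 lines each. The names `FORMATS`, `pixels_per_second`, and `RATES` are made up for this example.

```python
# (active width, lines per transmitted picture, pictures per second)
FORMATS = {
    "1080i60": (1920, 540, 60),    # 60 fields/s, each field carries half the lines
    "1080p24": (1920, 1080, 24),
    "720p60": (1280, 720, 60),
}


def pixels_per_second(width, lines_per_picture, pictures_per_second):
    """Active pixels delivered per second, blanking ignored."""
    return width * lines_per_picture * pictures_per_second


RATES = {name: pixels_per_second(*spec) for name, spec in FORMATS.items()}
```

Under these assumptions 1080i60 moves the most active pixels per second of the three while still sampling motion 60 times per second, which matches the compromise described above. Recovering full 1080-line frames from it still depends on good deinterlacing in the display.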
My next question involves another legacy video-processing technique—this time, responding to the difference in frame rates between film of 24 frames/sec and video of 30 frames/sec in NTSC (National Television System Committee) markets, for example. Particularly as we move into the era of digital cinema, thereby obsoleting traditional silver-halide film and the equipment that projects it, will there ever be a time when we can eliminate the need for telecine and inverse-telecine support? Or, conversely, in the blossoming digital capture, edit, and projection era, will the number of possible frame rates similarly blossom, thereby increasing the need for robust frame-rate-conversion processing throughout the chain?
I think there will always be multiple frame rates because of differences in priorities for different types of applications. In some cases, higher resolution is more important than higher frame rate, and, in others, it's exactly the opposite. Because the ATSC (Advanced Television Systems Committee) standard and other HD standards have allowed for multiple frame rates, I expect digital TVs will continue to support multiple frame rates from an input point of view. However, most display technologies tend to provide best front-of-screen performance when the frame rate is fixed, so the onus falls on the electronics to do proper frame-rate conversion from various input rates to the desired display rate.
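The simplest form of frame-rate conversion is frame repetition, and the familiar 3-2 cadence for showing 24-fps film at 60 Hz falls out of integer arithmetic. The Python sketch below is illustrative only; real display electronics may interpolate new frames rather than repeat them, and the function name `repeat_cadence` is hypothetical.

```python
def repeat_cadence(source_fps, display_fps, n_display_frames):
    """Index of the source frame shown at each display refresh (repetition only)."""
    return [k * source_fps // display_fps for k in range(n_display_frames)]
```

For 24 fps on a 60-Hz display, source frames are held for three refreshes, then two, alternately. On a 120-Hz display every film frame is held for exactly five refreshes and every 60-Hz frame for exactly two, so the cadence is even.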
From our past conversations, I know that you (for obvious reasons) closely follow trends in display technologies. What are your thoughts about the 120-Hz-refresh displays that are now entering the market? Since their refresh rate is an integer multiple of both 24-fps film-sourced material and 30-fps video-sourced content, their suppliers claim that these displays can handle both types of content in a more artifact-free manner. What's the reality behind the marketing hype?
The integer ratio that you mentioned—120=5×24=2×60—is not the primary driver behind the claim that 120 Hz provides a better image. This high-frame-rate driving is being proposed as a solution to the so-called perceptual-blur problem of the LCD. The LCD has two major blurring mechanisms, one of which is physical and the other of which is perceptual. The physical mechanism is the inertia in the motion of the liquid-crystal molecules that are being rotated to reduce or increase the amount of light that is allowed to pass through. The inertia leads to the molecule's not reaching its desired position within a frame time, and this slow 'dynamic response' leads to moving images that appear washed out. This problem is most acute with small changes in gray scale that require small changes in the position of the molecules. This issue has been largely overcome through a mechanism called overdrive, in which the LCD electronics store the current image and use knowledge of the current value at each pixel, versus the value desired in the next frame, and accordingly apply a compensated drive that is higher or lower, so that the pixel reaches the desired value within the next frame time. You can see that new LCD TVs report 'gray-to-gray' response times of 8 msec or less.
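The overdrive idea can be sketched as follows in Python. This is a deliberately simplified linear model written for illustration; practical LCD controllers typically use calibrated tables rather than a single gain, and the `overdrive` function and its `gain` parameter are assumptions, not something taken from any panel's documentation.

```python
def overdrive(current, target, gain=0.5, lo=0, hi=255):
    """Drive level for the next frame: overshoot in the direction of the change."""
    drive = target + gain * (target - current)
    return max(lo, min(hi, round(drive)))
```

A rising transition is driven above the target and a falling one below it, so the slow pixel lands closer to the desired value within one frame time. When the stored current value equals the target, the drive is unchanged.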
Aside from this physical mechanism, an additional source of blurring is the fact that LCDs are so-called scan-and-hold, or 'always-on,' displays, where the image is visible on the screen throughout each frame time, unlike emissive displays, such as the CRT, PDP [plasma-display panel], or OLED [organic light-emitting diode], where there is a dark period. This always-on behavior conflicts with the way the human eye tracks and perceives motion. The eye works best when presented with a series of images with a dark period in between. For example, in a movie theater, a viewer's eyes track an object of interest as it appears in consecutive frames and fill in the image of its intermediate points during the intervening dark periods. In the case of an always-on display, such as an LCD, the moving object appears to stay in one place and then jump to the new location. The disparity between the fixed location of the object and the intermediate position that the eye expects to see leads to the perception of blurring. There are various ways of solving this problem, and it is a very active research area, but the currently most favored approach is to interpolate new frames and run the display at a higher frame rate, so that the jumps in the moving object are much smaller. There is nothing magical about 120 Hz except that the integer ratio that it has with 60 and 24 Hz makes it a convenient number from the perspective of various commonly used motion-compensated frame-interpolation algorithms.
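The size of the jump is easy to estimate. The Python sketch below assumes the display shows a new, correctly interpolated position at every refresh, so the jump is simply the object's speed divided by the refresh rate. The function name and the example speed are hypothetical.

```python
def jump_per_refresh(speed_px_per_s, refresh_hz):
    """Distance in pixels that a moving object jumps between consecutive refreshes."""
    return speed_px_per_s / refresh_hz
```

For an object crossing the screen at 960 pixels per second, the jump is 16 pixels at 60 Hz and 8 pixels at 120 Hz, which is why the held image disagrees less with the position the tracking eye expects.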
What is the reality behind the marketing hype? There is no doubt that one can be made to see the differences between a 60-Hz LCD and a 120-Hz LCD. The simplest demo is done with a horizontally scrolling image, where you can see a major difference in the sharpness of various vertically oriented edges. The difference is less perceptible for real video and film because, in the scenes that have motion, the edges of most moving objects suffer from optical motion blur, and the presence or absence of perceptual blur is less visible. However, in cases where there is a single major object of interest that the camera is closely tracking, the difference between 60 and 120 Hz is apparent. In other words, I can certainly create demo material to show you the value of a good 120-Hz display versus a standard 60-Hz display.
Differences in frame rate between source and destination are only part of the video-processing equation; comprehending differences in frame size between source and destination is also important. Scaling down incoming content is a fairly straightforward process, although it can introduce artifacts when the source and destination pixel dimensions aren't even multiples of each other or when the horizontal and vertical dimensions don't share a consistent multiplier. Scaling up, thereby interpolating missing pixels from content in as pleasant a manner as possible, is far more challenging. Take, for example, the scenario of playing back substandard definition content from YouTube on a large-screen 1080p LCD without creating a blurry mess. And converting 4-to-3-aspect-ratio material into wide-screen material without vertical black bars or immediately apparent short and fat results—and vice versa—is a tremendous challenge, regardless of whether scaling down, scaling up, or not scaling is also occurring. What insight can you give into these aspects of video processing?
Downscaling and upscaling are challenging but well-understood operations that [experts can do well]. However, as new companies enter the CE-video-processing market, they often learn the same lessons the experts did by making the same mistakes. As you point out, applying scaling for simple upconversions or downconversions in resolution is much easier than applying it for aspect-ratio conversion. There is no perfect solution to an electronic aspect-ratio conversion. One can avoid scaling by applying horizontal (pillar box for showing 4-to-3 on 16-to-9 display) or vertical padding (letterbox for showing 16-to-9 on 4-to-3 display or 2.35-to-1 on 16-to-9) or by applying 'continuous' scaling, which maintains the aspect ratio in the center region and smoothly adjusts it as one approaches the left/right/top/bottom sides of the image. My opinion is that, as displays get really large and even the average consumer is able to watch video on 40-inch and larger TVs, it is better to give up some real estate in the form of letterbox or pillar box and watch the content in its original unprocessed aspect ratio.
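The padding approach can be expressed as a small calculation. The Python sketch below picks pillar box or letterbox so that the picture keeps its original aspect ratio inside the display raster; it assumes square pixels, and the function name `fit_with_padding` and its return layout are choices made for this example.

```python
def fit_with_padding(src_w, src_h, dst_w, dst_h):
    """Return (mode, picture_w, picture_h, pad_each_side) preserving aspect ratio."""
    if src_w * dst_h < dst_w * src_h:
        # Source is narrower than the display: bars on the left and right.
        pic_w = dst_h * src_w // src_h
        return "pillarbox", pic_w, dst_h, (dst_w - pic_w) // 2
    if src_w * dst_h > dst_w * src_h:
        # Source is wider than the display: bars on the top and bottom.
        pic_h = dst_w * src_h // src_w
        return "letterbox", dst_w, pic_h, (dst_h - pic_h) // 2
    return "full", dst_w, dst_h, 0
```

Showing 4-to-3 material on a 1920-by-1080 panel gives a 1440-by-1080 picture with 240-pixel bars on each side, and showing 16-to-9 material on a 640-by-480 raster gives a 640-by-360 picture with 60-line bars above and below.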
You mentioned the scenario of playing substandard content from YouTube on a large 1080p display. I believe this is an important point. As we move to a seamless networked-home-media experience, consumers will want to watch all types of video content on their big-screen TV, and they will be surprised and disappointed at how poor the various low-resolution Internet-video sources appear on this type of display. I think improving the appearance of low-resolution video on large high-resolution displays is the new frontier in video processing. This has been an area of significant focus for us.
You might want to incorporate other potential video-processing algorithms, such as edge sharpening, color enhancement, and 'block'- and 'mosquito'-compression-artifact suppression. How do you know when to draw the line on what you implement, both in an absolute quality and a quality-versus-implementation cost basis? Does this decision vary with your assumption about what types of content the end user will view?
The philosophy … 'first do no harm' is at the heart of all the video processing that my team develops. We call it quiet video processing. It is really important for a CE-video/display-electronics developer to keep in mind that the most important objective is to preserve the immersion of the viewer. I think of the content developers, the electronics developers, and the TV developers as having the shared goal of suspending reality for our viewer for a brief period of time—transporting the viewer to a magical universe where they don't need to worry about overdue bills or domineering bosses but can just flow with the story that unfolds and wake up refreshed and ready to go back to dealing with the painful realities of the world they live in. My job as the video processor is to make this make-believe world look as appealing as possible. This means applying the best possible improvements in each portion of the video-processing pipeline: from the quality of the analog-to-digital conversion for an analog signal; to the 3-D spatio-temporal AWGN [additive-white-gaussian-noise] reduction; to content-adaptive compression-artifact reduction; 3-D deinterlacing with arbitrary cadence detection; and vector interpolation with ultrashallow edge handling, smooth nonlinear scaling, adaptive edge-enhancement, locally adaptive contrast enhancement, and intelligent color remapping. The combined effect of all of these functions needs to produce a visually appealing image that the viewer can become lost in. However, if the processing produces sparkling images most of the time but occasionally breaks down with moiré, 'jaggies,' flicker, or other unnatural artifacts, the magical world is lost. The passing car with the dancing moiré on the radiator grill or the brick wall with the flickering lines abruptly reminds the viewer that the world is artificial. It needs to be a primary goal of the video-processing artist to prevent the introduction of such disruptive artifacts at all times.
In other words, like the doctor, we want to apply our knowledge with the goal of healing the patient and helping him look and feel good at all times with the underlying principle that we should never make things worse. This is the philosophy we apply to any solution we offer for any CE product. The implementation itself is tuned to the application and type of product.
A fundamental tension in the semiconductor industry, which I encounter time and time again in a diversity of technology and product sectors, is the tug of war between what you do in hardware versus what you do in software. Hardware-implemented algorithms are arguably lower cost than software-housed alternatives, as well as lower power and higher performance, but they're also less flexible, and development time is longer both for the first iteration of the silicon and for subsequent spins to fix bugs and add features. Particularly popular DSPs and processors—x86 CPUs, for example—with their tremendous volume-cost learning advantages and leading-edge fabrication facilities, further blur the distinction between hardware and software. Your thoughts?
This is a basic question that plays out in every major market but whose answers are very predictable. Periodically, we have companies that insist on applying the answer that fits their strength or their preferred business model to every market they enter. But if their preferred answer is at odds with the most fundamental aspect of the market, they fail. The fundamental aspect I am referring to is the need for flexibility. In the case of applications that require flexibility as their most important attribute, the market is willing to pay a premium for a software solution, opening the door to CPU and DSP options. This [situation] is true for most connected products, in which the OEM wants or needs to retain the ability to upgrade the features of the device in the field. In the case of applications where the product ships and then is not touched again by its creator, except through routine warranty/service, the cost dynamics of any large consumer market prevail, and the product uses hardware solutions because they offer the lowest cost. The simplest example of this is the traditional TV. Programmable video processing has generally been unsuccessful in mainstream TV markets because it is always cheaper to use a well-designed ASIC that has enough register-based programmability of its fixed-function hardware to adjust the settings to get the desired effects, ship the product, and move on to the next one. On the other hand, for a network-connected DTV, it may make more sense to use a programmable solution, at least in the early stages of the market, so that the OEM can upgrade the set—for example, to support new codecs. The cell-phone market is another example where a CPU or DSP is required instead of a fixed function, because of the large and continuously increasing software value added. Bottom line: If one analyzes the target market objectively without being predisposed to a specific answer, it is usually pretty clear which approach is better.
Regardless of whether the video processing occurs in hardware, software, or—more likely—partitioned between both, another notable characteristic I've noted is its heavy reliance on real-life testing to supplement lab simulation. For example, FPGA-based prototyping of designs that will end up being implemented in ASICs is pervasive in the video-processing arena. Do you agree with my observations? If so, what are the factors driving the necessity for robust real-life testing before chip and system production, and do you see any transformation in this validation step in the future?
Your observation is correct. One of the most important reasons is that video processing is significantly different from image processing because of its temporal aspect. One mistake repeatedly made by image-processing experts who enter this field is that they assume they can simply carry everything over from the 2-D world to the 3-D world of video. In actuality, the temporal aspect dominates the effects of the processing. A simple example is image enhancement. There are some very effective edge-enhancement algorithms that can really make a still image pop. For example, one could pause a video at a scene with a highly textured wall and apply enhancement algorithms that really show off the details. But then when you play the video, you discover that the same enhancement causes flickering as the camera pans across the scene, and the negative effect of the flickering dominates the scene far more than the positive effect of the sharpness produced by the edge enhancement.
At the same time, increasing resolution and frame rates make it more expensive and time-consuming to build complete FPGA-prototyping systems. For example, it is very expensive and time-consuming to build an FPGA system to demonstrate 1080p 120-Hz video. In such cases, it may be more practical to use a combination of simulations running on specially configured PC systems, FPGA validation of specific portions of an algorithm, and instinct that comes from years of video experience to develop test silicon. Over the years, the number of times that I have been surprised by the results of an FPGA simulation has steadily declined.
Speaking of real-life testing, let's next talk about benchmarking. The Video2000 benchmark that you developed with now-Futuremark (formerly, MadOnion) was an innovation in the industry in cultivating end-user awareness of the need for high-quality video processing. I can't count the number of times I've seen that waving-flag-video-clip snapshot on product presentations and in demos by a diverse collection of IC and software suppliers. Tell us about the history of Video2000, about some of the Faroudja-proprietary follow-on benchmarks that you developed, how the video industry has used and abused benchmarks in general over the years, and any other thoughts you might have on this topic.
Ten years ago at S3, I led a team whose charter was to bring Faroudja cinema-quality video to the PC platform. When I realized that the PC platform and market at that time was not set up for video as a major value-added element, I went to Faroudja to drive the technology into the CE market. At Faroudja, we realized that a fundamental problem with introducing high-quality video was that most people—the target end consumer, the media that informed them, and, in many cases, our target OEM customers—did not have a clear metric for judging video quality. There was a common perception that video quality is entirely subjective and there is no common theme or baseline: This person likes one thing, and that person likes something entirely different. I understood that the key enabler for differentiation was the target audience's understanding of the baseline. It became clear to me that the industry needed an objective video benchmark to help drive video quality. Mark Farley, director of video marketing at S3 and a former colleague of mine, felt the same way.
In early 1999, we decided to collaborate on the creation of a comprehensive video benchmark for the PC platform. The game plan was to help create a methodology and set of test patterns and then apply them in the CE market. FutureMark signed up to create the video benchmark, which the company named Video2000, as a complement to its popular 3-D-graphics benchmark. We needed to develop a robust and repeatable methodology for benchmarking video quality. So, I pulled in Jim Larimer, a world-renowned human-visual-systems expert who was a principal scientist at NASA, to help develop a sound approach to scoring. I had previously participated in various benchmarking committees that tried human-visual models and various measurement metrics, but the practical problem with all of these approaches was defining a repeatable measure for image quality. Finally, Larimer and I created a binary scoring system that used the human viewer (test taker) as the measurement instrument. The basic idea was to create test patterns that clearly showed various common artifacts and ask the viewer to select a one or a zero based on the absence or presence of the artifact—the presence of artifacts [was] zero, or bad, and the absence of artifacts [was] one, or good. So flicker, jaggies, and moiré are all bad, and the absence of them is all good. This [test] resulted in a very robust and repeatable benchmark, albeit one that was somewhat tedious to go through; it was analogous to spending a half-hour at the optometrist. I wrote descriptions of each test pattern, and the FutureMark programmers created them. After some iterations and comical errors, we ended up with a very nice suite of test patterns. We announced the benchmark at WinHEC, in 1999, and FutureMark, which had changed its name to MadOnion by then, formally launched it later that same year.
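The binary scoring idea described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the actual Video2000 implementation: the pattern names, the function name, and the choice to report a simple sum of the scores are all assumptions made for the example.

```python
# Hypothetical sketch of a binary artifact-scoring sheet.
# Pattern names and the aggregate (a plain sum) are illustrative assumptions.

def score_patterns(artifact_seen):
    """Map each test pattern to 1 (artifact absent) or 0 (artifact present)."""
    return {name: 0 if seen else 1 for name, seen in artifact_seen.items()}

# What the viewer reported for each pattern: True means the artifact was visible.
viewer_report = {
    "line_flicker": False,
    "jaggies": True,
    "moire": False,
}

scores = score_patterns(viewer_report)
total = sum(scores.values())
print(scores)
print(f"{total} of {len(scores)} patterns passed")
```

Because each answer is a forced yes-or-no choice, two viewers looking at the same pattern are more likely to agree than they would be on a graded quality scale, which is the repeatability property the interview emphasizes.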
Once Video2000 was complete, I focused on creating a private set of benchmarks for the CE market that leveraged the test patterns and ideas developed during the Video2000 activity. I wanted to create simple examples that even nontechnical consumers could relate to when the test pattern was explained to them. I spent months sitting in the Faroudja home theatre late at night going through the huge archives of materials that Yves had collected over the years. Sometimes, I would play the same clip over and over with different settings of the video deinterlacer. I was looking for natural material that would clearly show the absence or presence of each of the major deinterlacing artifacts, including line flicker, feathering, lack of inverse 3-to-2 pulldown, and jaggies. One of the clips was the US flag fluttering in the breeze in front of a Bank of America building, filmed by a Faroudja team member with a handheld camcorder. This sequence clearly stuck in my mind. I had already laid out a plan to launch my FLIxxx ICs with a new mass-market branding that would leverage the premium Faroudja brand without pulling it down. I thought of it as the 'Camry-with-the-Lexus-engine' approach. We created the DCDi-by-Faroudja branding with the flag sequence as the key benchmark to highlight the 'produces-smooth-and-natural-images' story. We ended up creating several test disks, each more refined than its predecessor, eliminating all but the clearest test patterns and always featuring the flag sequence. Knowledgeable folks like you then started using my benchmarks to compare various deinterlacing solutions.
Other video-IC companies in this space saw the success of DCDi branding and the proliferation of special test-pattern disks to demonstrate key features and started doing their own versions. They used the same approach I had adopted and worked with key influencers and reviewers to encourage them to adopt their test-pattern disks in their evaluation.
The use of test disks has become an industry trend. My opinion is that, in the early days, when we introduced the above-mentioned test patterns, they served a nonpartisan purpose by acting as an education tool that helped raise awareness across the industry. I think many of the test patterns being used today are simply benchmarking gamesmanship—just good marketing aiming to create artificial differentiators to separate out various deinterlacing approaches that are all pretty close in performance on typical consumer material. I believe there is a need for new test patterns/examples to show new types of video features, rather than jockeying to show off the next 1% improvements in standard-video features like deinterlacing and AWGN reduction using contrived examples.
DISPLAYS
Display technology is something that from our past conversations I know you've closely followed over the years. I'm going to toss out some topics for which industry controversy exists, and I'd welcome your thoughts in each case on how you see the current situation as well as its future evolution. First, what about direct-view displays: CRT versus LCD versus plasma, and what about SED (surface-conduction electron-emitter display)?
I think we all know about the rapid decline of CRTs. Comparing LCD to plasma from a technology-characteristic perspective has its merits. When people ask me for a recommendation, I always ask about how they plan to use it. For dark home-theater viewing, an emissive display that produces real black is preferred, while, in a bright setting, the high light output possible with transmissive displays is preferable. So, I recommend plasma for home-theater-like viewing and LCD for bright-room multiuse. Why not one of each? Both LCD and plasma continue to get better with each generation, with plasma having overcome its high-resolution weakness and achieved 1080p at 50-inch diagonal size and LCD having addressed its black-level and dynamic-contrast-ratio issues by using adaptive backlight dimming.
As far as predictions of success of SED or any other new display technologies are concerned, the main guidance I can offer is a simple rule based on display industry history: New display technologies are successful if, and only if, they enable a new application that is not well-served by the incumbents. LCD was able to enter the market by enabling new portable applications, the laptop, that were not well-served by the CRT, the 800-pound gorilla of the day. DLP [digital-light processing] did the same by enabling a new class of application, the ultraportable front projector, that was not well-served by transmissive-light-valve technologies that offered wonderful image quality but not true portability. Plasma was able to enter the TV market by enabling large flat-panel displays for commercial and later consumer use that were larger than anything offered by the reigning TV technologies of CRT and LCD. In each case, enabling a new application gave the new technology a space to fund its development and grow in maturity before going on to face the incumbents in an existing market segment. On the other hand, new display technologies that have sought to enter a large existing market and compete head on by 'intercepting' it at some projected performance and cost point at some target future date have always failed, because the incumbent was always better and cheaper than projected. A good example of this is the FED [field-emission display], which attempted to intercept LCD to serve the laptop-display market but found that the LCD had gotten better and cheaper ahead of everyone's projections.
What about LED versus CCFL (cold-cathode-fluorescent-lamp) backlights, multicolor LED or CCFL backlights versus white, and more elaborate, white- or multicolor-LED-array approaches, such as those that Brightside (now Dolby) advocates?
I think LED has a very promising future because it produces clearly visible advantages, especially wide-color gamut, 2-D dimming for the best dynamic contrast ratio, and power efficiency. At the same time, there is a chicken-and-egg problem because LED backlights are currently much more expensive than CCFL backlights. I think this will change over the next three years as commoditization forces Tier 1 OEMs to adopt visibly better technologies, such as LED, and as increasing adoption and focus help reduce the cost difference.
What about projection displays, such as CRT, DLP, LCD, and LCOS (liquid crystal on silicon)?
Clearly, CRT projection is declining and rapidly being replaced by DLP and LCOS. There are some very good DLP and LCOS RPTVs [rear-projection televisions] in the market.
What do you think of direct-view versus projection displays?
I think the consumer has voted with his wallet. Despite the fact that projection displays offer the best value proposition in cost per diagonal inch and that several manufacturers have launched some beautifully designed TVs, the large price reductions in LCD and plasma have limited the rate of adoption of projection displays. I think this trend will continue. In addition, only the United States and China have had significant adoption of RPTVs. In most other countries, the space requirements have led consumers to smaller screens and more compact flat-panel displays.
MEDIA DISTRIBUTION
Displays are meaningless without content to display on them. What do you think the potential is for widespread ascendancy of high-definition optical discs versus the current champion, DVD? What are the factors behind your opinion, and what are the influencing variables that may cause you to revise that opinion in the future?
I think good, crisp marketing of blue-laser HD media in conjunction with marketing of 1080p displays can start a self-reinforcing virtuous cycle with consumers. Blue laser got off to a rocky start last year with the format war and less-than-perfect product launches. But things look much more promising this year. I am already hearing that, in Europe, the promise of HD-optical media is a major driver for the adoption of 'HD-ready' TVs, since the road map for broadcast of HD is still fragmented and uncertain. My key rule of thumb for whether something will be successful in the consumer market is: Can the benefit be shown in a convincing manner to the average consumer? The answer depends on a number of things, both technical and nontechnical. For example, the sophistication level of the retail channels in the region makes a big difference in what features are sellable.
To what degree will ATSC, specifically high-definition ATSC, influence consumers' desires for other high-definition content? What is the likelihood that the two-years-out NTSC-to-ATSC conversion in the United States will occur as scheduled? I know you've just come back from India, where you observed some interesting trends. Do you care to share them? What other non-US trends have you noticed in your travels and other customer interactions that bear mentioning?
I believe HD ATSC clearly influences consumers' desires for other HD. I think we have all heard of consumers who see HD for the first time at home when they finally have their DTVs properly connected and are so hooked on it that they never want to watch any SD channels again. At the same time, there are some caveats here. In my opinion, large 1080p direct-view TVs show most digitally compressed content in an unflattering light. Most material that I have seen from broadcast-HD shows significant block- and mosquito-noise artifacts when viewed on a large, bright 1080p display, such as a 46-inch LCD TV. So the main driver for such displays will be blue-laser content where it is possible to clearly show the large increase in resolution and quality of the 1080p material on the disk as compared to the 480p material on the DVD version.
Regarding non-US trends, the one thing that has been clear is that the biggest transition happening in TV worldwide now is not a transition from analog to digital transmission but a transition from analog-CRT displays to digital LCDs. Unlike the United States with its FCC mandate relentlessly driving the entire TV market to integrated HDTV, in most parts of the world I have been to, the primary transition is to HD-ready TV. The uncertain economics of HD broadcast (since advertisers have not indicated a willingness to pay more for an HD version of their ad versus an SD version) leads to a slow path on the broadcast/transmission side. But consumers are buying HD-ready LCD and plasma TVs in fast-growing numbers worldwide.
What are your thoughts on Blu-ray versus HD DVD? Do you think there's any significant near- or long-term potential for an advanced video codec partnered with multilayer red-laser media?
I am very bullish on the success of blue-laser media, so the answer to the question of potential for a new codec with red laser is 'no.' I think this year will see the start of a positive self-reinforcing cycle between 1080p displays and blue-laser players. The launch of LG's first Super Multi-Blue player creates a simple solution for any consumers who are confused about the selection of BD [Blu-ray Disc] versus HD-DVD. At the same time, HD-DVD players are already cheap, and there are indications that Blu-ray players may have significant cost declines this year. So, I believe this will be a good year for blue laser platforms with a great one to come next.
Speaking of video codecs, how do you handicap the competition between H.264 (also known as MPEG-4 AVC or MPEG-4 Part 10), VC-1 (also known as Windows Media Video 9), and high-definition MPEG-2? What other notable video codecs are on your radar screen?
What I have seen is consistent with what has been reported many times over the years, namely that H.264 and VC-1 provide a significant advantage over MPEG-2 at lower resolutions, lower bit rates, or both (for example, a two- to three-times lower bit rate at the same image quality for SD content at 2 Mbps or lower), but the advantage gets smaller as the resolution and bit rate increase. In applications, this [fact] means that H.264 and VC-1 appear to have considerable advantage for media streaming and storage at SD and HD, where the desired bit rates are usually less than 10 Mbps for HD and 4 Mbps for SD. On the other hand, when we go toward higher bit rates, such as the 45 Mbps maximum allowed by Blu-ray or HD-DVD, the advantage gets smaller and eventually becomes almost insignificant: 10 to 15%. For example, compare the bit rates of the Blu-ray and HD-DVD versions of the movie Mission Impossible III, which are considered to be approximately equal in image quality. I believe the Blu-ray version of the movie has been encoded using MPEG-2 at a bit rate of approximately 19 to 20 Mbps, while the HD-DVD version has been encoded using VC-1 at a video bit rate of approximately 17 Mbps.
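The 10 to 15% figure follows directly from the approximate bit rates quoted for the two releases. A minimal worked calculation in Python is shown here; the function and variable names are hypothetical, and the inputs are the approximate figures from the answer above rather than measured values.

```python
# Worked arithmetic for the approximate bit rates quoted in the interview.
mpeg2_mbps_low, mpeg2_mbps_high = 19.0, 20.0  # Blu-ray version, MPEG-2
vc1_mbps = 17.0                                # HD-DVD version, VC-1

def saving_percent(reference_mbps, other_mbps):
    """Percentage by which other_mbps is lower than reference_mbps."""
    return 100.0 * (reference_mbps - other_mbps) / reference_mbps

low = saving_percent(mpeg2_mbps_low, vc1_mbps)
high = saving_percent(mpeg2_mbps_high, vc1_mbps)
print(f"VC-1 saving versus MPEG-2: {low:.1f}% to {high:.1f}%")
```

With these inputs the saving works out to roughly 10.5% against the 19-Mbps figure and 15% against the 20-Mbps figure, which matches the range quoted.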
In the case of Internet content, there are other codecs, such as On2, which is used in Flash. It's hard to compare image quality because the material encoded using these codecs is usually amateur content, so it's not fair to compare it to professionally created content, such as a movie trailer, that is usually encoded with H.264 or VC-1. In general, since the Internet is such a dynamic medium and the PC is so flexible, it's likely that we will continue to see new codecs being used, especially for user-created content. However, in the case of fixed media, such as broadcast and packaged optical media, I do not anticipate any changes in codecs beyond the triumvirate for the next several years.
Some prognosticators claim that the mind- and market-share battle between Blu-ray and HD DVD is irrelevant—that purchased or rented Internet-downloadable movies will obsolete them both before they have the chance to establish any tangible long-term beachhead. What do you think?
I don't think these [technologies] will be mutually exclusive. A lot of money is made by efficiently managing the rollout of professionally created entertainment content: starting with the theatrical release, moving to the hotel/cable pay-per-view release, then the airline release, then the release to DVD, and finally the broadcast release. I believe the use of carefully created release windows will continue to drive sales and rentals of the most popular, premium content in the form of packaged optical media for the growing audience with great HD home-theater systems. Also, from a practical point of view, it will be a while before one can download a high-definition new release fast enough to compete with the practice of simply going to the nearby video store and renting a copy. One possible demarcation is that we may see Internet delivery used for easy access to a lot of niche and older content. I also think that, for a while, Internet downloaded/streamed commercial content will be primarily SD for low cost, responsiveness, and robustness. So, convenience and choice will be the strong suit for Internet-delivered content, and image quality and latest/premium content will be the strong suit for blue laser.
Today's dominant standard- and high-definition-broadcast-video-delivery approaches are over the air, cable, and satellite. The emerging IPTV alternative is well-established in specific markets, such as with SureWest in the Sacramento, CA, area, with the beginning stages of nationwide rollouts under way by AT&T and Verizon. Fiber, depending on the degree to which it's implemented—to the curb or to the premises—requires tremendous infrastructure investment but also delivers tremendous bandwidth for both video and ancillary services. To what degree do you believe IPTV will succeed as an alternative video-distribution vehicle, and to what degree does your response depend on how and how much IPTV's established competitors respond?
Like many others, I believe that, in the long term, IP delivery will be the primary mode for all content: data, voice, and video. It is simply the most efficient mechanism for allocation of channel bandwidth. However, we all know that for a very long time, the hype has exceeded the deployment. But the deployment is very real, and, in my opinion, it is only a matter of time before this is the preferred mode of delivery over all types of physical media, whether it is over fiber or ether. Of course, it could take a very long time for the deployment to be pervasive since there are many complex factors that need to be worked out, including regulation, infrastructure, and cost. Bottom line: I think the deployment will be fast in some regions of the world and very slow in others.
For several years now, Microsoft has been promoting its Media Center Edition variant of the Windows XP operating system, whose functions the company now builds directly into several Windows Vista versions, as both a way to employ multimedia content that's played elsewhere in the home and as a direct-playback node in, for example, college dormitories, apartments, and living rooms. The company recently broadened its reach into the home by unveiling the Windows Home Server at CES (Consumer Electronics Show), currently in beta and due out this year. As you survey various potential uses for your video-processing ICs, I suspect that there are two dominant usage models vying for consumers' wallets: stand-alone CE-playback devices and the PC-centric model that encompasses comparatively 'dumb' networked-CE playback devices. Where are you placing your bets and why?
I believe strongly that the world is moving toward seamlessly networked media devices in the home, in the car, and everywhere we go: what we at Marvell call the connected digital lifestyle. I don't see this [scenario] as a PC-centric or non-PC-centric issue. I think the CE market will transition to connected media devices but with radically different rates of transition in different geographical regions. These connected devices will use Wi-Fi, embedded CPUs, and operating systems and will run rich software applications. The PC will be one of these connected devices, but I don't see it necessarily as the sole server or master in the system. I expect there to be a variety of intelligent connected devices, some with large storage for collection and serving and others with small amounts of storage for lower cost and portability. This is an old vision articulated by many people and companies for many years, but it is finally starting to happen, thanks primarily to dramatic reductions in semiconductor cost and improvements in wireless bandwidth.
MOBILE VIDEO
The Nokia N800 tablet measures 5.67×2.95×0.51 in. and has a crisp, 800×480-pixel display that spans nearly the entire front of the unit. We all know by now about Apple's iPhone. The latest generation of Pocket PCs also touts VGA LCDs, albeit not wide screen, and both cell phones and portable media players, such as Microsoft's Zune, are also inevitably going the high-resolution route. Yet, especially in large portions of the United States and other areas of the world lacking widespread mass transit, the fundamental usage model is unclear to me. Unlike with a dedicated audio player, you can't use one of these devices while you're multitasking—walking or exercising, for example, and most definitely not while driving. And the small screens, no matter how crisp, seem incompatible with aging populations' dimming eyesight. What's your take on the potential for mobile-video success, and is it to some degree geography-specific? In Asia, for example, mass transit is pervasive, and various cultural factors also encourage isolated multimedia-content consumption.
I agree that the popularity of mobile video is very strongly connected to geography and culture. Almost all of us who work in urban areas have 'lost time' that is spent commuting. Clearly this lost time can't be used to watch video in a region like Silicon Valley where you have one person per car. On the other hand, it is an interesting option for people using mass transit in dense-population areas like Tokyo.
However, I think the real story with mobile video is the portability of your personal video library. As you know, the iPod and other MP3 players opened the door for consumers to carry their entire audio libraries with them so that they could listen to their preferred content anytime and anywhere. The introduction of a variety of audio docks created a rich and complete audio ecosystem, enabling the consumer to listen to favorite music at home through a high-performance AV system, in a bedroom or dorm room through a small speaker system, or anywhere on the road through headphones.
Hard drives continue to increase in density and decrease in cost. One can already buy PMPs [portable media players] that have 80 to 160 Gbytes of storage. If you assume your content is in SD resolution (640×480 pixels) encoded in H.264 at 1.5 Mbps, as is typical of iTunes movie content, you can store 100 hours on an 80-Gbyte video iPod and 200 hours on a 160-Gbyte PMP. In the likely case of a mix of QVGA and VGA content, one could store even more than that. The implication of so much storage in such a convenient form factor at such a low cost is that a consumer could carry his video library with him at all times, just as he does today with his audio library. Why would he do so? Certainly not just to watch video on a tiny screen on the road. He might, however, if there were an easy way of watching the content anywhere, including on a TV in his home, in a hotel room, or at a friend's house. To do this, he needs a new class of dock: an HD-video version of the traditional audio-PMP dock.
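The storage estimate above can be checked with a short calculation. The sketch below is written under stated assumptions that are not in the interview: 1 Gbyte is taken as 10^9 bytes, the whole drive is assumed to hold video, and 1.5 Mbps is treated as the total stream rate with no allowance for audio or container overhead. The function name is hypothetical.

```python
def hours_of_video(capacity_gbytes, bitrate_mbps):
    """Hours of content that fit, assuming 1 Gbyte = 1e9 bytes and no overhead."""
    bits = capacity_gbytes * 1e9 * 8          # capacity in bits
    seconds = bits / (bitrate_mbps * 1e6)     # playback time at the given rate
    return seconds / 3600

for capacity in (80, 160):
    hours = hours_of_video(capacity, 1.5)
    print(f"{capacity} Gbytes at 1.5 Mbps: about {hours:.0f} hours")
```

Under these idealized assumptions the result is somewhat above the 100 and 200 hours quoted, so the figures in the answer read as conservative round numbers that leave headroom for audio tracks, file-system overhead, and other content on the device.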
This [type of dock] is exactly what we demonstrated at CES with our lead partner, Meridian, a very-high-end audio-video company. At CES in January, we launched the 88DE2710 adaptive digital-video-format converter that can upconvert all types of video and graphics content from QVGA to 1080p. As mentioned earlier, one of the key focus areas for us is to make low-resolution video look good on a large, high-resolution display. At the same show, Meridian demonstrated a prototype of an iPod HD dock that used our 88DE2710 to upconvert iTunes video content from the iPod to 1080p displayed on a Sharp 46-inch 1080p LCD TV. This concept and demo were very favorably received, and we have been working closely with a number of manufacturers that want to bring this device to market this year.
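To give a sense of how much upconversion is involved, the sketch below computes the scale factors needed to fit QVGA (320×240) and VGA (640×480) sources onto a 1920×1080 panel. It assumes an aspect-preserving fit with unused side columns left blank; the interview does not describe how the 88DE2710 actually handles aspect ratio, so that choice and the function name are assumptions for illustration only.

```python
def fit_to_display(src_w, src_h, dst_w=1920, dst_h=1080):
    """Return (scale, out_w, out_h) for an aspect-preserving fit to the display."""
    scale = min(dst_w / src_w, dst_h / src_h)  # limited by the tighter axis
    return scale, round(src_w * scale), round(src_h * scale)

sources = {"QVGA": (320, 240), "VGA": (640, 480)}
for name, (w, h) in sources.items():
    scale, out_w, out_h = fit_to_display(w, h)
    growth = (out_w * out_h) / (w * h)  # output pixels per source pixel
    print(f"{name}: {w}x{h} -> {out_w}x{out_h} "
          f"(x{scale:.2f} per axis, x{growth:.1f} pixels)")
```

For a QVGA source, each source pixel ends up covering about 20 output pixels, so any compression artifacts or noise in the source are magnified accordingly. This is one way to see why making low-resolution video look good on a large display is treated as a processing problem in its own right.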
To summarize, I think a device like the HD dock makes 'mobile video' into a much bigger and all-encompassing category, with mobile consumption or viewing being only a small piece.
WHAT'S NEXT?
Marvell had a diverse product line when it launched its new video-processing IC family at CES. For example, the company offered wireless networking, storage ICs, and the ARM-based CPU line it acquired from Intel and recently expanded. Is it too much of a stretch to assume that Marvell's future plans encompass combining your imaging expertise with other building blocks in the company's portfolio to create new devices for particular markets and applications? What can you share about Marvell's integration vision?
Marvell's diverse technology and product portfolio enables our platform approach for these new product categories, which can encompass a variety of Marvell silicon. A great example is the Seagate DAVE [digital-audio-video-experience] microdrive targeted at the cell-phone market. DAVE has both a Marvell applications processor and Marvell Wi-Fi.
From the video perspective, as a company, we are working on the entire visual pipeline: from image acquisition to preprocessing, compression and decompression, storage, transfer between wired or wireless devices, postprocessing, and display. As you know, various CE devices require some or all pieces of this pipeline. The fully featured cell phone is one example of a CE platform that could use this entire pipeline.





















