Looking for quantitative measures of video quality: ITU J.144 and beyond in silicon
As the growth in video sources, video storage media, and video display devices continues, we are rapidly replicating the Tower of Babel. Video data formats, compressed data formats, transport containers, decompression algorithms, and—maybe most critically—error recovery algorithms are all appearing like fruit flies above an over-ripe peach on a hot August afternoon.
One of the results of this proliferation is that the specifications for boxes that have to sit in the midst of the delivery channel from content creation to production to head-end to consumer are beginning to sound like the specs for a Star Trek universal translator, or perhaps more accurately, like the description of Douglas Adams’s Babel Fish. All you have to do is accept packets of containerized video in whatever format, somehow get at the video, convert it into whatever format the client requires, get it back into whatever kind of container the client expects, in whatever kind of packets the network demands, and send it along without adding latency or damaging the original data.
But skeptics point out that designers are more likely to run across a healthy Babel Fish than they are to meet this requirement. Not only is the sheer number of combinations daunting, but algorithms are not all lossless, codecs have bugs, storage and transmission errors happen, and error-recovery routines are not always ideal. It is becoming extraordinarily difficult for a designer at any point in the delivery system to estimate the quality of the video that the ultimate consumer will see.
This problem is further compounded by the lack of a quantitative meaning for that word "quality." The International Telecommunication Union has offered us J.144 as a standard for measuring video image quality, but that has not stopped the angst in the industry over this part of the problem. (For a somewhat dated review of the situation, see, for instance, here.) J.144 is a full-reference method: it compares a compressed video against a reference copy of the video stream, which makes it useful at the source but useless in the living room. Hence J.144 has become only another tool that someone must implement, not a complete solution or a golden metric.
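To make the full-reference idea concrete, here is a minimal sketch in Python. It does not implement J.144, whose models are perceptual and considerably more elaborate. It substitutes peak signal-to-noise ratio (PSNR), a much simpler full-reference measure, purely to show why a metric of this kind needs the original frame in hand. The function names, and the representation of a frame as a list of rows of 8-bit luma values, are assumptions made for illustration.

```python
import math


def mean_squared_error(reference, degraded):
    """Mean squared error between two equally sized frames.

    Each frame is a list of rows; each row is a list of 8-bit luma values.
    """
    if len(reference) != len(degraded):
        raise ValueError("frames differ in height")
    total = 0
    count = 0
    for ref_row, deg_row in zip(reference, degraded):
        if len(ref_row) != len(deg_row):
            raise ValueError("frames differ in width")
        for ref_px, deg_px in zip(ref_row, deg_row):
            total += (ref_px - deg_px) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty frame")
    return total / count


def psnr(reference, degraded, peak=255):
    """PSNR in dB; infinite when the two frames are identical."""
    err = mean_squared_error(reference, degraded)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)


if __name__ == "__main__":
    original = [[100, 100], [100, 100]]
    received = [[110, 90], [100, 100]]
    print(f"PSNR: {psnr(original, received):.2f} dB")
```

Both arguments are required. Take away `reference`, as happens at the consumer's end of the channel, and there is nothing left to compute.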
But difficulty won’t make the problem go away. The product that everyone wants to deliver, from the high-school student newly armed with an HDTV camcorder to the cable system operator to the phone company, is a quality viewing experience. Part of that experience is content, and I won’t go there. But a big part of it is image quality. Otherwise there would be almost no market for large-screen 1080p monitors. But how do you deliver what you can’t measure?
The problem is sufficiently widespread that it is attracting the attention of experienced video systems developers. One of them, Indian design house eInfochips, has recently completed the design of a J.144-based automated, continuous quality-measuring system intended to sit on the output of a network head-end. This rather elaborate embedded video processor can’t tell a cable operator what the customer is receiving, but it can at least estimate the quality of the compressed signal the operator is transmitting.
From there, the problem breaks into two separate issues, according to eInfochips president and CEO Pratul Shroff. First, algorithmic decisions can result in an intentional loss in image quality, the severity of which will depend on the type of content, the viewer’s attentiveness, and whatever has happened to the data since it left the compression algorithm. Second, plain old bugs anywhere along the delivery channel can introduce errors into the video stream. So can uncorrectable bit errors, and so can conscious choices in error-response algorithms.
These two types of problems are rather different, but both contribute to the degradation of the viewer’s experience, whether it be a peculiar border on the top of the screen, loss of audio sync, funny transient macroblock artifacts trailing out behind a fast-moving image, or complete break-up of the frame.
Since there is no reference copy of the original video stream in a customer’s living room—let alone on his iPhone—the industry can’t trust J.144 to evaluate all these kinds of degradations. The only judge at the user’s end of the channel is a human pair of golden eyes. And this, of course, has to be a sampling measurement, as even large equipment vendors only have a few human master video-viewers they trust. Shipping the golden eyes around to living rooms is pretty much out of the question. Yet recognizing errors and artifacts in a video image without reference to the source would appear to require something close to a human expectation of what the content should look like.
So here is a design opportunity. How does one implement an automated, quantitative measure of video quality within the confines of the terminal display unit? An IP block that can identify, classify, and quantify individual degradations in a video viewing experience and feed that information back into the channel could be a very valuable tool in delivering what the end customer is paying for—a good experience. That means revenue for everyone, not least the IP developer. And that might be a good enough value proposition to convince the system operators to even dedicate a tiny slice of bandwidth to transmitting metadescriptions that could aid an evaluation algorithm in understanding the intent of the video stream. It seems worth exploring.
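As a very rough illustration of what one narrow, no-reference measurement might look like, the Python sketch below estimates blockiness from a single decoded frame. It is not taken from J.144 or from any product mentioned here. It assumes an 8-pixel block grid aligned to the left edge of the frame, looks only at horizontal luma steps, and uses a hypothetical function name. It compares the average step across block boundaries with the average step everywhere else; a ratio well above 1 hints at visible block edges.

```python
import math


def blockiness(frame, block=8):
    """Ratio of the mean luma step at block edges to the mean step elsewhere.

    `frame` is a list of rows of 8-bit luma values. The block grid is
    assumed (not detected) to start at column 0 with period `block`.
    """
    edge_sum = edge_count = 0
    inner_sum = inner_count = 0
    for row in frame:
        for x in range(1, len(row)):
            step = abs(row[x] - row[x - 1])
            if x % block == 0:
                # This step straddles a boundary between two blocks.
                edge_sum += step
                edge_count += 1
            else:
                inner_sum += step
                inner_count += 1
    if edge_count == 0 or inner_count == 0:
        raise ValueError("frame too narrow for the chosen block size")
    edge_mean = edge_sum / edge_count
    inner_mean = inner_sum / inner_count
    if inner_mean == 0:
        # Perfectly flat block interiors: any edge step at all is blocky.
        return math.inf if edge_mean > 0 else 1.0
    return edge_mean / inner_mean


if __name__ == "__main__":
    smooth = [list(range(32)) for _ in range(4)]
    blocky = [[(x // 8) * 40 + (x % 8) for x in range(32)] for _ in range(4)]
    print("smooth ramp:", blockiness(smooth))
    print("blocky ramp:", blockiness(blocky))
```

A measure this crude cannot tell a compression artifact from content that really does contain a grid, such as a spreadsheet or a tiled wall, which is exactly the gap that knowledge of the video's intent would have to fill.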