
EDN hands-on story: Video characterization creates hands-on headaches

How well do video codecs decrease bit rates and still preserve quality? And can you ascertain their credentials without recruiting an army of test subjects? The answer: a qualified "yes." Read on for the sordid details in this issue and in EDN's August 8 edition.

By Brian Dipert, Technical Editor -- EDN, July 25, 2002

AT A GLANCE
  • JNDmetrix provides an intriguing alternative to traditional video-quality-testing options, but its intolerance for discarded frames and altered frame resolutions means it's not a panacea.

  • File-format incompatibilities lead to frustration and, sometimes, to transcoding artifacts.

  • Depending on the video content you encode and your prioritization of quality, file size, and encoding speed, your interpretation of this study's results may significantly differ from that of your peers.


Sidebars:
Specs and settings

A Web-only addendum to the following article contains reference clips, software settings, test results, test clips, other video-compression studies, software, and other useful links.

Based on your enthusiastic response to last year's two-part audio-compression hands-on project, I have extended my foray into multimedia, now focusing on video (references 1 and 2). Whereas last year's articles focused exclusively on determining how the codecs work, this two-part article expands the focus to how well the codecs work. But I don't necessarily want to follow in the footsteps of my predecessors in codec analysis, whose methods exhibit some "blind spots" and provide no baseline process, data, and results that you can build on.

Do a Web search for video-compression studies, and you most often come across frame-versus-frame snapshots comparing various codecs, usually based on a single video sequence, such as an action-filled scene from The Matrix. Unfortunately, studies such as these often provide too little information for you to draw any definitive conclusion about results on that sequence, much less extrapolate those results to the entire film or other types of film- and video-source material, which might better match the clips you want to compress. For all you know, you might be comparing a key picture from one codec's bit stream with a delta picture from another: an apples-to-oranges assessment (Reference 3).

If the tester "rips" a movie from a DVD for use as the reference clip, that clip has already undergone one generation of lossy MPEG-2 encoding and consequently contains compression artifacts that unduly complicate a codec's job. DVD's interlaced video and the artifacts the tester creates in improperly deinterlacing it also may not correlate to the film- or progressive-scan video-captured source you likely are using in your work (Reference 4). These studies rarely provide you with detailed listings of the encoder settings the tester used; those settings can significantly influence the results.

Image quality is also rarely the only parameter of concern. Unfortunately, many of these studies provide no detailed information on the computer hardware and operating system, on the compression- and decompression-software versions, on the time that encoding and decoding takes, or on the resultant compressed-file-storage and -transmission sizes and the amount of scratchpad and buffer memory you need to decode them. What good is a codec that produces outstanding images but requires a supercomputer and a week to compress a 1-minute clip, or one that a 2-GHz Pentium 4 processor can decode at only 5 frames/sec? And how good are codecs that have bloated storage footprints, that force streaming-delivery viewers to use a T1 line, or that require 1 Gbyte of buffer DRAM at the receiving end?

Video moving past your eyes at 30 frames/sec can hide a lot of artifact "sins" that still-image analysis unnecessarily magnifies, a fact that elaborate subjective studies attempt to account for. These studies also strive to encompass both genders, broad age ranges, diverse ethnic backgrounds, and other examples of societal variety within their testing groups. Tests such as these, unfortunately, are expensive and time-consuming. They are also uncontrolled and unreliable unless experts trained in psychophysics perform them and base them on international standards, such as ITU-R Recommendation BT.500-10, "Methodology for the subjective assessment of the quality of television pictures." Without adequate pretraining, just as with audio subjective testing, test subjects may not accurately discern the differences between clips (Reference 5). And if the testing sessions take too long, fatigue may cloud viewers' judgments.

Subjective test results also depend on numerous factors besides the video material and the viewers. Such factors include the viewing environment's ambient lighting conditions; the displays' type, brightness, color gamut and depth, and distance from viewers; and whether the decoding hardware connects to the display via lossless RGB, artifact-ridden composite video, or some other interface with quality between these two extremes (Reference 6). You also have to consider whether a codec's postdecode processing tweaks the image colors in a manner that, although inaccurate, is more pleasing to the test culture's eyes and brains or whether it applies "blur" filters or frame-and-pixel interpolation to patch the degradation that has occurred during compression or transmission.

PSNR (peak signal-to-noise ratio) testing, also known as MSE (mean-square-error) testing, is on the opposite end of the objective-versus-subjective spectrum from the study groups. In this case, computer-based analysis undertakes pixel-by-pixel and frame-by-frame comparisons of a reference clip with a test clip, data-logging the luminance and chrominance deviations in the test clip. You can quickly perform inexpensive and repeatable PSNR studies. Unfortunately, they don't often correlate to how the human visual system detects and responds to image inaccuracies.
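As a rough illustration of the pixel-by-pixel comparison just described, the following Python sketch computes MSE and PSNR for a single plane of 8-bit samples. The function names and the four-sample "frames" are hypothetical and exist only for illustration; a real analysis tool would loop over every frame of the clip and log luminance and chrominance separately.

```python
import math

def mse(reference, test):
    """Mean-square error between two equal-length sequences of 8-bit samples."""
    if len(reference) != len(test):
        raise ValueError("frames must have identical dimensions")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)

def psnr(reference, test, peak=255):
    """PSNR in decibels; infinite when the two frames are identical."""
    error = mse(reference, test)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)

# Hypothetical four-sample luminance "frames" (illustrative values only).
ref_frame = [16, 64, 128, 235]
test_frame = [18, 60, 131, 235]
print(f"MSE  = {mse(ref_frame, test_frame):.2f}")
print(f"PSNR = {psnr(ref_frame, test_frame):.2f} dB")
```

Note that the length check is the code-level form of a restriction that matters later in this article: the calculation is meaningless unless every reference sample has a counterpart in the test clip.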

Whether the eyes and brain judge an error as significant depends on numerous factors, such as where the error resides in the image, both absolutely and relative to where the eyes' attention is at that point in time; what portions of the color spectrum the error's pixels represent; what percentage of the total frame it consumes; and how many consecutive frames it corrupts. PSNR analysis may judge as significant an error that the human eye barely registers, if at all. Conversely, a minor PSNR aberration may be unacceptable in practice (Figure 1, see complete-article PDF for all figures referenced in this article). For example, large amounts of noise may be nearly invisible in areas of high detail and texture, such as within grassy fields, whereas even subtle "blocky" artifacts are visible in flat areas, such as the sky.

Another significant shortcoming of PSNR analysis is that it requires the reference and test clips to have identical numbers of frames, in the same order, and with identical resolutions. Discarding and reducing the resolution of frames are two common techniques that lossy encoders use to shoehorn a tolerable-quality video presentation into a constrained storage and transmission environment. And those trade-offs are valid: People who are comfortable with the images that VHS tape delivers will accept a full-screen-interpolated CIF streaming-video presentation, and higher-than-20-frame/sec frame rates are necessary only for material containing objects moving at high speeds. Any frame mismatch between the reference and the test clips adversely impacts PSNR results, however.

I first became aware of Sarnoff Corp's JNDmetrix technology during last year's video-deinterlacing hands-on project (Reference 7). Although that study didn't end up incorporating test-clip files that JNDmetrix could access, I remained intrigued by the theory behind JNDmetrix and hoped to incorporate it into future work. This project is an opportunity to fulfill that goal. JNDmetrix, like PSNR analysis, is computer-based. But like subjective analysis, it factors in human-visual-system characteristics when determining analysis results. Quoting from the vendor's documentation:

"The Sarnoff JNDmetrix-IQ visual quality analysis software objectively measures the degree to which the human eye can see distortion in an image or video sequence. This tool computationally measures the perceptibility of differences between two video sequences, producing measurements that accurately reflect human subjective assessments of video fidelity. Based on rigorous scientifically controlled experiments measuring video fidelity in terms of the just noticeable difference (JND), the JNDmetrix-IQ software supplies a reliable, objective way to assess video fidelity.

"The JNDmetrix-IQ analysis software takes as input a pair of images or video sequences to be compared, the original (reference) and a processed (test) version. It outputs measurements quantifying the visible differences between the pair. These measurements quantitatively characterize the perceptibility and spatio-temporal distribution of distortion in the processed video. The most basic unit of measurement used by the JNDmetrix-IQ tool is the just noticeable difference (JND). The JND is a long-established unit for measuring and modeling sensory systems (first used by psychophysicists in the latter half of the 19th century). A JND value of 1 indicates the threshold amount of difference required between two stimuli in order for 75% of subjective observers to be able to just barely notice that difference.

"JNDmetrix-IQ allows for a wide variety of input image and video file formats, and output measurement modes. It also provides parameters to define the characteristics of the input sequences (e.g., field/frame structure and video rate) and to specify the viewing conditions and display characteristics. JNDmetrix-IQ generates image/video distortion measures at the pixel, field, frame, and sequence level (field-level distortion measures are computed for interlaced image/video, frame distortion measures are computed for progressive image/video data). Distortion measures can be generated separately for luminance and chrominance, at pixel, field and frame resolutions. The luminance pixel-level JND distortion values can be further broken down into seven separate spatio-temporal frequency channel measurements."

JNDmetrix delivers many of the combined advantages of both subjective and objective (that is, PSNR) testing. Unfortunately, it also retains a few of the PSNR-analysis method's shortcomings and, as such, isn't a panacea for the compression-analysis problem. For example, the reference and test clips must be of the same resolution and same average and peak frame rates. Also, you can't intermingle interlaced- and progressive-scan clips. These restrictions and JNDmetrix's limited file-format support complicated the project.

I used JNDmetrix to indirectly compare one codec's set of test clips with the test clips of another codec by evaluating both sets against the same reference clips. As the JND abbreviation suggests, there must be a "just noticeable difference" between the test clips. Overcompression of the reference clips results in JND numbers that allow for no differentiation, reflecting quality degradation that, although different for each codec, is equally "bad."

In other words, "Perceptual analysis of clearly flawed video does not tell you much," says JNDmetrix product manager Doug Dixon. "A big issue with low bit rates is when codecs knowingly make 'bad' video, particularly by dropping some frames, with the knowledge that the player will 'fix' it up and hide the damage, for example by interpolating missing frames or smoothing over blocky artifacts," he says. "Frame-by-frame JND comparison doesn't work in these cases, since some frames are missing or deliberately lower quality and show up as spikes off the chart. The key is to test the codecs with just enough bandwidth to do a reasonable job but not starve them so they break down. You don't need a JND score to distinguish between jumpy and blocky video streams; they're all unacceptable."

Speaking of reference clips, what should I use? In April's video-compression-theory article, I confessed that I hadn't yet plugged my Panasonic PV-DV101 MiniDV camcorder into my computer's ADS Technologies FireWire card. I've subsequently and successfully tackled that challenge, so I could create my own clips. But I'm no expert on what's easy and difficult for compression algorithms or typical and atypical in video content. And, because the DV format stores images on tape using Motion JPEG-like encoding, I'd be subjecting the codecs to compression-artifact unrealism similar to the earlier-described ripping-from-DVD testing scenario.

Instead, I relied on industry-standard testing clips from the University of Hannover, which MPEG-4's developers used, and from the VQEG (Video Quality Experts Group, Table 1). My target encoding parameters for the Hannover clips retained the originals' CIF resolution and 30-frame/sec playback rate and targeted a 300-kbps average bit rate and CBR (constant-bit-rate) compression (Figure 2, see complete-article PDF for all figures referenced in this article). I intended to replicate a typical broadband streaming-video scenario. ETesting Labs used the same Hannover clips and the same YUVTOAVI conversion utility for its subjective study but targeted a more aggressive 250-kbps encoded bit rate (Reference 8).

I maintained the VQEG clips' original D1 resolutions (720×576 pixels for PAL and 720×486 pixels for NTSC) and their respective 25- and 30-frame/sec playback rates, and I compressed them to an 800-kbps bit rate with VBR (variable-bit-rate) encoding (Figure 3, see complete-article PDF for all figures referenced in this article). Here, my target usage scenarios were progressive-download, full-download-and-play, and play-from-mass-storage (that is, CD or DVD media) environments. Comparing the encoding targets, note that D1's per-frame resolution is roughly four times that of CIF, but the allowable encoded bit rate is only about 2.7 times larger.
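The resolution-versus-bit-rate comparison can be checked with a few lines of arithmetic. This Python sketch assumes the standard 352×288-pixel CIF frame size, which the text above does not spell out, and uses the nominal 30-frame/sec NTSC rate quoted in this article; the helper names are hypothetical.

```python
# Frame sizes in pixels (width, height).
CIF = (352, 288)       # assumed standard CIF size for the Hannover clips
D1_PAL = (720, 576)    # VQEG PAL clips
D1_NTSC = (720, 486)   # VQEG NTSC clips

def pixels(size):
    """Pixels per frame."""
    width, height = size
    return width * height

def bits_per_pixel(bit_rate_bps, size, frames_per_sec):
    """Average encoded bits available for each pixel of each frame."""
    return bit_rate_bps / (pixels(size) * frames_per_sec)

resolution_ratio = pixels(D1_PAL) / pixels(CIF)
bit_rate_ratio = 800_000 / 300_000

print(f"PAL D1 vs. CIF pixels per frame: {resolution_ratio:.2f}x")
print(f"800 kbps vs. 300 kbps:           {bit_rate_ratio:.2f}x")
print(f"CIF, 300 kbps, 30 frames/sec:     {bits_per_pixel(300_000, CIF, 30):.4f} bits/pixel")
print(f"PAL D1, 800 kbps, 25 frames/sec:  {bits_per_pixel(800_000, D1_PAL, 25):.4f} bits/pixel")
print(f"NTSC D1, 800 kbps, 30 frames/sec: {bits_per_pixel(800_000, D1_NTSC, 30):.4f} bits/pixel")
```

Under these assumptions, the D1 targets leave the encoder fewer bits per pixel than the CIF target does, which is the point of the comparison.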

Working in the codecs' favor in both cases is the fact that none of the reference clips have an audio track. The codecs can devote the entire 300- or 800-kbps bit rate (minus a few percentage points for "wrapper" overhead) to image compression. This reference-clip quirk harks back to the clips' original use in subjective studies. Test subjects viewing audio-plus-video clips often pick the better-sounding candidate, even if it has subpar image quality.

I also intended to replicate a narrowband-streaming scenario: QCIF resolution, a 15-frame/sec frame rate, and an approximately 34-kbps bit rate. But this scenario would have required me to transform the original reference clips' rate and resolution. Feedback from industry experts predicted that the resulting artifacts would have been so pervasive as to, echoing Dixon, "not tell you much." And, in spite of RealNetworks' recent demos, I remain unenthused about the practicality of narrowband video streaming.

The original clips are in a raw YUV format, common in research settings but not directly importable to mainstream video applications, so I converted them to the more ubiquitous AVI format. After much Internet searching, I found a copy of the YUVTOAVI utility that Microsoft bundles with its MPEG-4 tool kit for converting the 4:2:0 component-video Hannover clips. The VQEG clips, on the other hand, came in a 4:2:2 component-video format. After even more searching, I came across the YUV2AVI utility developed by University of Surrey Research Fellow Stewart Worrall. Worrall's program converts component video to RGB video during the YUV-to-AVI transcoding process. Worrall also provided me with a utility to convert the VQEG clips, which store YUV data in a pixel-by-pixel fashion, into the plane-by-plane format that YUV2AVI expects.
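To make the pixel-by-pixel versus plane-by-plane distinction concrete, here is a minimal Python sketch of that kind of conversion. It is not Worrall's utility. It assumes a packed 4:2:2 byte order of U, Y, V, Y for each pixel pair, which is one common layout but is an assumption here, and the function name is hypothetical.

```python
def packed_422_to_planar(frame, width, height):
    """Split one packed 4:2:2 frame into separate Y, U, and V planes.

    Assumes the bytes for each pixel pair arrive in U, Y, V, Y order.
    """
    expected = width * height * 2   # 4:2:2 averages 2 bytes per pixel
    if len(frame) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame)}")
    y_plane = frame[1::2]   # every second byte, starting at offset 1
    u_plane = frame[0::4]   # one U sample per pixel pair
    v_plane = frame[2::4]   # one V sample per pixel pair
    return y_plane, u_plane, v_plane

# A hypothetical 4x1-pixel frame: two pixel pairs, eight bytes.
packed = bytes([100, 1, 200, 2, 101, 3, 201, 4])
y, u, v = packed_422_to_planar(packed, width=4, height=1)
planar = y + u + v   # plane-by-plane layout: all Y, then all U, then all V
print(list(planar))
```

A converter for a whole clip would apply the same split to each frame-sized chunk of the file in turn.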

JNDmetrix requires that both the reference and the test clip be either field- or frame-based, and most of the codecs I was testing would convert any input video to progressive. Therefore, I needed to deinterlace the VQEG AVIs; the Hannover clips are already in a progressive-scan format. I hoped to use Discreet's Cleaner, which offers high-quality adaptive deinterlacing, to accomplish this task. But Worrall's YUV2AVI produced AVI files that Cleaner and several other video-playback programs didn't accept. (He has subsequently fixed this bug and others my testing uncovered.) Instead, I used Adobe Premiere 6, which did import the AVIs but has less advanced deinterlacing capability. This nonoptimal deinterlacing process unfortunately left artifacts that complicated the codecs' efforts.
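For readers unfamiliar with what a basic, nonadaptive deinterlacer does, the following Python sketch keeps one field and rebuilds the other field's lines by averaging their vertical neighbors. It is offered only as an illustration of the general idea, not as the algorithm that Premiere or Cleaner implements; the function name and sample values are hypothetical.

```python
def deinterlace_interpolate(frame):
    """Keep the even rows (one field) and rebuild each odd row from its neighbors.

    `frame` is a list of rows, each row a list of 8-bit sample values.
    """
    height = len(frame)
    out = [list(row) for row in frame]
    for row in range(1, height, 2):
        above = frame[row - 1]
        # At the bottom edge there is no row below, so reuse the row above.
        below = frame[row + 1] if row + 1 < height else frame[row - 1]
        out[row] = [(a + b) // 2 for a, b in zip(above, below)]
    return out

# Hypothetical 2x4 frame whose odd rows come from a different moment in time.
interlaced = [[10, 10], [99, 99], [30, 30], [77, 77]]
print(deinterlace_interpolate(interlaced))
```

Because this approach discards half of the vertical detail even in static areas, it softens the image; an adaptive deinterlacer avoids that loss where there is no motion.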

Jumping ahead in the process, I also performed some postcompression transformation of the Windows Media Video clips. JNDmetrix neither directly imports the WMV (Windows Media Video) format nor straightforwardly handles many other proprietary file wrappers. So, I'd have to convert the WMVs back to uncompressed AVIs before JNDmetrix could analyze them. Fortunately, Sonic Foundry's Vegas Video 3 does import WMV and exports AVI. Vegas Video is generally a good program, but its default 32-bit AVI output is incompatible with JNDmetrix; I had to disable alpha-channel creation. Also, when decoding WMV, Vegas Video 3 first output two copies of the first frame and then decoded normally up to the next-to-last frame, which it repeated numerous times. Finally, it output two copies of the last frame. Figuring out this bizarre—but, fortunately, consistent—decode sequence required many hours of painstaking analysis. Fortunately, JNDmetrix enabled me both to specify starting frame numbers, which can be different, for the reference and test clips and to specify the number of frames to inspect. So, by disregarding the last frame I still got meaningful WMV-analysis results.
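The frame-offset workaround is easier to follow with a small model. This Python sketch mimics the decode pattern described above and then pairs frames using separate starting offsets and a frame count. The function names, the parameter names, and the number of repeated next-to-last frames are all hypothetical; they are not JNDmetrix options or measured Vegas Video behavior.

```python
def model_decoded_sequence(frames, extra_repeats=2):
    """Model the pattern: first frame twice, normal decode up to the
    next-to-last frame, extra repeats of that frame, then the last frame twice."""
    first, last = frames[0], frames[-1]
    middle = frames[1:-1]
    return [first, first] + middle + [frames[-2]] * extra_repeats + [last, last]

def aligned_pairs(reference, test, ref_start, test_start, count):
    """Pair `count` frames from each clip, beginning at independent offsets."""
    return list(zip(reference[ref_start:ref_start + count],
                    test[test_start:test_start + count]))

reference = ["f0", "f1", "f2", "f3", "f4"]
decoded = model_decoded_sequence(reference)

# Skip the duplicated first frame in the test clip and ignore the last frame.
pairs = aligned_pairs(reference, decoded, ref_start=0, test_start=1,
                      count=len(reference) - 1)
print(decoded)
print(pairs)
```

In this model, starting the test clip one frame late and comparing one frame fewer than the clip length keeps every compared pair in step.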

Before compressing the reference AVIs, I defragmented the large hard-disk drive I'd bought for this project (see sidebar "Specs and settings"). I also terminated all background-running applications via the Windows 98 Task Manager to gain an accurate measure of encoding speed. I compressed the reference AVI clips to WMV8 (Windows Media Video Version 8) format using Microsoft's command-line encoder, a free download from the vendor's Web site. Aside from specifying a target bit rate and the encoding method (CBR or VBR and one- or two-pass encoding), I initially left all other settings at default. Inspecting the resultant report files, I noticed that the encoder had dropped frames from the challenging "funfair_cif" clip to achieve the default "75" quality setting at the target 300-kbps bit rate. To eliminate dropped frames, I re-encoded this clip at a lower "45" quality setting that Microsoft recommended.

After completing the WMV encoding, I turned my attention to MPEG-2, using Ligos' LSX-MPEG Version 2 plug-in for Adobe Premiere. Unfortunately, I could only encode the 720×576-pixel PAL clips; the vertical dimension of the 720×486 NTSC clips is not divisible by 16, so LSX-MPEG rejected them. I could have cropped the NTSC clips to a 720×480-pixel resolution, but then their frame size would be incompatible with both the reference clips and the WMV test clips, leading to bogus JNDmetrix errors. I initially encoded the PAL clips to MPEG-2 4:2:2 Profile, Main Level but then realized not only that most MPEG-2 decoders would be unable to handle them, but also that the format I chose was inconsistent with today's most common MPEG-2 applications: DVD (Main Profile, Main Level) and DTV (Main Profile, High Level). I therefore re-encoded to Main Profile, Main Level, consistent with the clips' D1 resolution.
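The divisible-by-16 constraint follows from MPEG-2's 16×16-pixel macroblocks and is simple to verify. The Python sketch below checks both D1 frame sizes and shows the height that cropping to a macroblock multiple would produce; the function names are hypothetical.

```python
MACROBLOCK = 16   # MPEG-2 macroblocks cover 16x16 luminance pixels

def fits_macroblocks(width, height, block=MACROBLOCK):
    """True when both dimensions are whole multiples of the macroblock size."""
    return width % block == 0 and height % block == 0

def cropped_height(height, block=MACROBLOCK):
    """Largest multiple of the macroblock size that does not exceed `height`."""
    return height - height % block

for name, (width, height) in {"PAL D1": (720, 576), "NTSC D1": (720, 486)}.items():
    print(name, fits_macroblocks(width, height), cropped_height(height))
```

The 486-line NTSC frame leaves a remainder of six lines, which is why a 480-line crop is the obvious, but here unusable, alternative.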

When I began using the Ligos software, I discovered that it currently supports only MPEG-4 Simple Profile. (Advanced Simple Profile development is in progress.) Simple Profile bit streams are intended for processing-deficient decoding applications, such as power-stingy cellular phones, and, in exchange, deliver lower image quality than Advanced Simple Profile (Reference 9). Simple Profile, for example, doesn't support B-frames, global motion compensation, or quarter-pixel motion estimation. Simple Profile also doesn't support frame sizes larger than CIF, so I couldn't use it with my D1 resolution clips. Therefore, the results of this study do not generally indict MPEG-4, especially because Advanced Simple Profile will likely be the format of choice for 300-kbps-video streaming.

For analysis, Sarnoff offers both a GUI-driven JNDmetrix Analyzer program and a command-line-driven JNDbatch utility. Parameter passing to JNDmetrix can occur both via an initialization file and via settings on the command line. I grappled with bugs in the program—or, perhaps, in its documentation—dealing with command-line syntax and incorrect parameter passing from the initialization file to JNDbatch. These bugs stymied me until I found combinations of options that would produce the correct result. I also found that, unless I maximized available system resources by terminating all background-running applications before starting a JNDbatch command-line session, the program would crash at random. (I highly recommend a more robust operating system than the DOS-based Windows 9x or ME, such as Windows 2000, for your analysis work.)

Because JNDmetrix doesn't directly accept the WMV file extension, I decoded the WMV files to AVIs in Vegas Video before comparing them with the reference AVIs. In contrast, JNDmetrix's heritage as an MPEG-2 tool made me confident that I'd be able to directly feed it the MPEG-2 output of LSX-MPEG. The process ended up being more difficult than I thought. JNDmetrix initially refused to parse the MPEG-2 bit streams that LSX-MPEG created, although numerous video-playback programs on my system, including Windows Media Player and CyberLink PowerDVD, easily decoded and displayed them.

My initial work-around was an unsupported back-door mode in the Ligos decoder, which enabled me to pull the MPEG-2 clips back into Adobe Premiere and transcode them into uncompressed AVIs, as I'd done with the WMVs. However, I later found that, for some baffling reason, this MPEG-2-to-AVI transcoding severely degraded the image quality. Eventually, I discovered that JNDmetrix would import all of the MPEG-2 clips if, in the background, I also had one of the clips open in Windows Media Player. I suspect that a conflict between the LSX-MPEG and PowerDVD MPEG-2 DirectShow filters on my system is behind JNDmetrix's difficulty with the MPEG-2 files.

JNDmetrix also doesn't yet directly accept the MPEG-4 clips, although I can play them back in both Windows Media Player and DivXNetworks' Player. Ligos and Sarnoff hope to fix this and other incompatibilities by the time you read this article. Fortunately, Apple Computer recently released the Public Preview version of QuickTime 6, which also accepts the MPEG-4 clips. After getting a green light from Ligos, I transcoded the MP4s to uncompressed AVIs in QuickTime and then ran them through JNDbatch to get the numbers in Table 1 in the second part of this article (August 8) and in the Web addendum.




Other companies and organizations mentioned in this article
ADS Technologies
www.adstech.com
Apple Computer
www.apple.com
Canalweb
www.canalweb.net
Canopus
www.canopuscorp.com
CyberLink
www.gocyberlink.com
Dicas
www.dicas.de
Discreet
www.discreet.com
DivXNetworks
www.divxnetworks.com
Envivio
www.envivio.com
IBM
www.ibm.com
Intel
www.intel.com
Internet Streaming Media Alliance
www.isma.tv
Interra
www.interra.tv
MPEG-4 Industry Forum
www.m4if.org
NEC
www.nec.com
On2 Technologies
www.on2.com
Panasonic
www.panasonic.com
Princeton Graphics
www.prgr.com
RealNetworks
www.realnetworks.com
Seagate
www.seagate.com
SGI
www.sgi.com
Sorenson Media
www.sorenson.com
TechSmith
www.techsmith.com
University of Hannover
www.uni-hannover.de
University of Surrey (Stewart Worrall)
www.ee.surrey.ac.uk/Personal/S.Worrall/
Video Quality Experts Group
www.vqeg.org
http://ftp.crc.ca/test/pub/crc/vqeg/
http://media.xiph.org/vqeg/
ViewSonic
www.viewsonic.com





References
  1. Dipert, Brian, "Digital audio gets an audition: part 1, lossless compression," EDN, Jan 4, 2001, pg 48.

  2. Dipert, Brian, "Digital audio gets an audition: part 2, lossy compression," EDN, Jan 18, 2001, pg 87.

  3. Dipert, Brian, "Video compression slims down for spring," EDN, April 4, 2002, pg 59.

  4. Dipert, Brian, "Video improvements obviate big bit streams," EDN, March 15, 2001, pg 83.

  5. Dipert, Brian, "Security scheme doesn't hold water(marking)," EDN, Dec 21, 2000, pg 35.

  6. Dipert, Brian, "A crash course in color conversion," EDN, June 7, 2001, pg 46.

  7. Dipert, Brian, "Video quality: a hands-on view," EDN, June 7, 2001, pg 83.

  8. "Video quality comparison study," eTesting Labs, May 2001, www.etestinglabs.com/main/reports/msvideo.pdf.

  9. Waggoner, Ben, "MPEG-4 codecs compared," DV magazine, June 2002.

Author Information
Technical Editor Brian Dipert's eyes are slowly but surely transforming into 4-by-3- and 16-by-9-aspect-ratio rectangles. As long as the retinas remain progressive-scan, though, he's not too worried. You can reach him at 1-916-454-5242, bdipert@edn.com, and www.bdipert.com.

For more information...

When you contact any of the following manufacturers directly, please let them know you read about their products in EDN.

Adobe Systems
1-408-536-6000
www.adobe.com

Ligos
1-415-249-0100
www.ligos.com

Microsoft
1-425-882-8080
www.microsoft.com

Sarnoff
1-609-734-2000
www.sarnoff.com

Sonic Foundry
1-608-256-3133
www.sonicfoundry.com

 
