Swarm Multimedia Inc.

A Very Brief Compression Primer and Related Technologies
Introduction
A long, deep discussion of digital video compression would be neither practical nor time well spent here. After all, there are many great web-sites that cover the many aspects of video compression in considerable detail. It is appropriate, however, to point out the relatively simple question that video compression is all about: what information can I throw away and still keep a semblance of the original content when the encoded image is finally decoded and displayed? The degree of 'semblance', or quality, is a function of two things: how the encoding method defines what is and is not redundant in the image (what gets thrown away), and how that removal is implemented (how it is done without degrading the final image too much while still achieving a compression ratio worth having). Of course, the matching decoder must be designed to complement the encoder's processes, only in reverse. The following brief outline attempts to explain this task.
 
Why Have Compression?
Information compression has been, and still is, evolving. End users' demand to squeeze more information through existing bandwidth drives the technology toward more efficient coding schemes, even though wider-bandwidth services such as the wired technologies of ISDN, DSL and cable, or the wireless offerings of 802.11x, CDMA2000, GSM and GPRS, are available or coming on-line. It never seems to be enough.

In the old days of dial-up modems, 9600 baud was considered hot technology. The information carried over this medium was mostly text and simple vector graphics, and it was not expected to arrive in real time. We have come a long way since then: the information age dictates that more information be available in a timely manner, and data bandwidths have grown to keep pace with both data size and real-time requirements. The types of data now at our fingertips, like video, require near real-time or real-time operation. Video inherently carries a lot of information relative to other types, like text or audio. Compression schemes that reduce the amount of relevant information per unit time (the bit rate) while yielding similar or better image quality than their predecessors are moving the technology toward some as yet undefined Holy Grail. Perhaps it will be schemes that use light energy rather than digital bits, painting images directly onto display surfaces using variations in colour energy levels extracted from the images. Who knows?

Compression in video technology really took off with the combination of the personal computer, CD-ROM technology and the industry's underlying desire to merge television and telephony with the computer. This didn't happen overnight, but it was bound to happen over time as these overlapping technologies began sharing similar digital processing techniques to advance their respective products.

To illustrate the relationship between available bandwidth of a system and the parameters that define a compressed digital video stream, a general equation has been developed. In the course of the definition it will become obvious as to why compression is needed.

Take a single digitized video image of 640 x 480 pixels, that is, 640 pixels wide by 480 pixels high (this resolution is close to the full-screen NTSC broadcast size defined by CCIR-601). Each pixel has a full colour depth of 24 bits: red is 8 bits, green is 8 bits and blue is 8 bits, commonly known as 24-bit RGB. The video frame rate is about 30fps (really 29.97fps) and no compression is applied. Let's look at the data size for a single frame.

General raw formula

Image Data Size = width x height x colour depth (bytes) x number of frames
              or
Image Data Size = 640 x 480 x 3 x 1 = 921,600 bytes (about 920 Kilobytes) per frame

For 30 frames per sec then:

Byte rate = 921,600 x 30 = approx 27.6 Mbytes/sec

Or in bit rate terms, multiply by 8

Bit rate = approx 220Mbits/sec
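
As a rough check on the arithmetic above, here is a minimal Python sketch. The function name raw_video_rate and its parameter names are my own, chosen for illustration only; they do not come from any library or standard.

```python
def raw_video_rate(width, height, bytes_per_pixel, fps):
    """Return (bytes per frame, bytes per second, bits per second)
    for uncompressed video."""
    frame_bytes = width * height * bytes_per_pixel
    byte_rate = frame_bytes * fps
    bit_rate = byte_rate * 8
    return frame_bytes, byte_rate, bit_rate


# 640 x 480, 24-bit RGB (3 bytes per pixel), 30 frames per second
frame_bytes, byte_rate, bit_rate = raw_video_rate(640, 480, 3, 30)
print(frame_bytes)  # 921600    (about 920 Kilobytes per frame)
print(byte_rate)    # 27648000  (about 27.6 Mbytes/sec)
print(bit_rate)     # 221184000 (about 220 Mbits/sec)
```

The exact figures are a little higher than the rounded ones quoted in the text, which is why the text says "approx".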

Clearly one would need a very wide data pipe to push that much data through in that amount of time; over a transmission link this is neither realistic nor practical, even by today's standards. With hardware accelerators, high bandwidth buses and faster processors, it is not a problem as long as the data is processed locally. However, moving this amount of information over a WAN type transmission link requires compression.

If a compression scheme were applied to reduce the bit rate by a factor of, say, 100:1, we would be closer to WAN connection speeds like those offered by cable and DSL vendors. Even though the original bit rate of 220Mbits/sec has been reduced to 2.2Mbits/sec, it is still at the high end for most consumer cable/DSL users (unless you want to pay hefty premiums for dedicated services), but it is well within the reach of wireless LAN (802.11x) or wired intranet LAN environments.

Today there are vast improvements in efficiency in both video CODECs and transmission modulation schemes. Arguably, these have helped carry video over dial-up and other narrow band pipes with relative success. Compression schemes offered by streaming vendors using MPEG-4, Real-Video, Divx, etc., get good quality at 32kbit/s or higher in QCIF (176x144) or CIF (352x288) frame sizes. CIF video requires higher bit rates to maintain quality (128kbit/s to 384kbit/s).

In the early days, when compression schemes were not as efficient, simple techniques like reducing the frame size to 320x240 (QVGA) and lowering the frame rate to 15fps were very common. Relative to 640x480 at 30fps, a quarter of the pixels and half the frames automatically gives a data rate reduction of about 8 to 1 without applying any fancy mathematics. The video CODECs of the time were fashioned to run on double-speed CD-ROM players. That was the focus.
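
The 8 to 1 figure is easy to verify. The sketch below is only an illustration; reduction_factor is a hypothetical helper name, and it assumes the colour depth is the same for both settings.

```python
def reduction_factor(full, reduced):
    """Ratio of raw data rates for two (width, height, fps) settings,
    assuming the same colour depth for both."""
    full_w, full_h, full_fps = full
    red_w, red_h, red_fps = reduced
    return (full_w * full_h * full_fps) / (red_w * red_h * red_fps)


# Full-size 640x480 at 30fps versus QVGA 320x240 at 15fps
print(reduction_factor((640, 480, 30), (320, 240, 15)))  # 8.0
```

Halving the width and height gives a factor of 4, and halving the frame rate doubles it to 8.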

So, after all is said and done, the general compression equation boils down to something very simple (there are other factors that this equation doesn't take into account, but they represent small variations in the final outcome).

We'll use the bit rate as the result of the formula:

Bit rate = (Width x Height x 8 x Colour Depth x Frame Rate x Q) / Compression Ratio

(Colour Depth is in bytes per pixel here, hence the factor of 8.)


So what does this mean? The bit rate is inversely related to the compression ratio: for the same parameters in the numerator, the higher the compression ratio, the lower the bit rate, and the poorer the quality is likely to be. Equally, reducing the frame size, frame rate or colour depth reduces the bit rate for the same compression ratio. Alternatively, with smaller frame sizes and frame rates the compression ratio can be reduced while the bit rate stays the same, and so on. The parameters shown can be changed as needed to yield the results desired.

The human eye is more sensitive to changes in brightness than to changes in colour. Many compression techniques use this characteristic to their advantage: the colour (chrominance) component of an image is sampled at a lower resolution than its luminance cousin. As such, the colour depth can be adjusted to 4, 8, 12, 16 or 24 bits per pixel with little or no difference in perceived quality after compression is applied. Colour conversion schemes like YUV4:2:2 (16 bits) and YUV4:2:0 (12 bits) reduce the number of bits per pixel before the encoding algorithms are applied. MPEG video uses the YUV4:2:0 colour space prior to applying any encoding schemes. JPEG could use YUV4:2:2 or YUV4:2:0 prior to compression. YUV4:2:0 yields a poorer quality image simply because there are fewer bits to process, but it requires fewer processing cycles to create an encoded image. For low bit rate video, however, this is not an issue.
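
To see where the 16-bit and 12-bit figures come from, here is a small Python sketch. It assumes the usual reading of the J:a:b notation (a block J pixels wide and 2 rows high, with a chroma samples in the first row and b in the second) and 8 bits per sample. The names bits_per_pixel and subsample_420 are hypothetical, and averaging each 2x2 block is just one simple way to subsample; real encoders may filter differently.

```python
def bits_per_pixel(j, a, b, bits_per_sample=8):
    """Average bits per pixel for a J:a:b sampling scheme.

    In a block J pixels wide and 2 rows high there are 2*J luma samples
    and (a + b) samples for each of the two chroma channels.
    """
    luma_samples = 2 * j
    chroma_samples = 2 * (a + b)
    return bits_per_sample * (luma_samples + chroma_samples) / (2 * j)


def subsample_420(plane):
    """Reduce a chroma plane (list of rows, even dimensions) to quarter
    size by averaging each 2x2 block."""
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[y]), 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(total // 4)
        out.append(row)
    return out


print(bits_per_pixel(4, 4, 4))  # 24.0 (no subsampling)
print(bits_per_pixel(4, 2, 2))  # 16.0
print(bits_per_pixel(4, 2, 0))  # 12.0
print(subsample_420([[10, 20, 30, 40],
                     [10, 20, 30, 40]]))  # [[15, 35]]
```

Going from 24 bits to 12 bits per pixel halves the raw data before any compression algorithm has even started.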

The Q factor shown in the equation is a fiddle factor that could represent the quality of the original video before compression or the efficiency of the compressor. It is normalized to one: the highest quality is 1, what is deemed average quality may be 0.5, and so on. Remember that quality remains a subjective assessment made by the viewer of the video.
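
The general equation can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text: colour depth is given in bytes per pixel, Q is normalized to one, and the function names compressed_bit_rate and required_ratio are my own.

```python
def compressed_bit_rate(width, height, bytes_per_pixel, fps,
                        compression_ratio, q=1.0):
    """Bit rate in bits/sec from the general compression equation."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_bits_per_sec = width * height * 8 * bytes_per_pixel * fps
    return raw_bits_per_sec * q / compression_ratio


def required_ratio(width, height, bytes_per_pixel, fps,
                   target_bit_rate, q=1.0):
    """The same equation rearranged: the compression ratio needed to
    reach a target bit rate."""
    raw_bits_per_sec = width * height * 8 * bytes_per_pixel * fps
    return raw_bits_per_sec * q / target_bit_rate


# 640x480, 24-bit RGB, 30fps, compressed 100:1
print(compressed_bit_rate(640, 480, 3, 30, 100))  # 2211840.0 (about 2.2 Mbits/sec)

# CIF, 12 bits (1.5 bytes) per pixel, 30fps, squeezed into 384 kbit/s
print(round(required_ratio(352, 288, 1.5, 30, 384_000)))  # 95
```

The second call suggests that even a small CIF picture needs a compression ratio in the region of 95:1 to fit a 384 kbit/s pipe at full frame rate.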
 
What are the principles of compression?
Regardless of what compression scheme is conceived and implemented, the aim is always the same: throw out redundancies or inconsequential content and keep only what is important. The trick lies in how efficiently this type of content is identified and removed, and in how well the reverse process can reconstruct the compressed object from the information that is left and keep it recognizable, that is, with minimum artifacts. What's an artifact? Any undesired noise that detracts from the perceived quality of the image, like blurs, blockiness, snow, graininess, smudging or even skipped frames, particularly when various degrees of motion are in play.

Compression schemes differ for different types of information. Audio, video, text and graphics are the main information types, and each exhibits unique characteristics that can be exploited for compression, ultimately resulting in the development of an associated CODEC.

Audio, for the most part, is really represented by gaps of nothingness; it's air. Speech and music exhibit low duty cycle information that can be exploited. Video content is similar from frame to frame, and text consists of repeated character patterns or sequences. Graphics are similar to video, except that the redundancies are intra-frame rather than inter-frame. Video takes advantage of both inter-frame and intra-frame characteristics simply because of its inherent temporal properties. Intra-frame compression is applied to the redundancies contained within each frame on its own, while inter-frame compression removes the redundancies between frames, coding only the motion content that changes from frame to frame.
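
The two kinds of redundancy can be shown with a toy Python sketch. Run-length coding stands in for redundancy within a single frame (intra-frame), and a simple list of changed pixels stands in for redundancy between frames (inter-frame). Neither is what a real CODEC such as MPEG actually does; the function names and the one-row "frames" are assumptions made purely for illustration.

```python
def run_length_encode(row):
    """Within one frame: collapse runs of identical pixel values
    into (value, count) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def frame_delta(previous, current):
    """Between frames: keep only the (position, value) pairs that changed."""
    return [(i, cur) for i, (prev, cur) in enumerate(zip(previous, current))
            if prev != cur]


def apply_delta(previous, delta):
    """Decoder side: rebuild the current frame from the previous frame
    plus the stored changes."""
    frame = list(previous)
    for position, value in delta:
        frame[position] = value
    return frame


frame1 = [0, 0, 0, 5, 5, 0]
frame2 = [0, 0, 0, 0, 5, 5]  # the bright object moved one pixel right

print(run_length_encode(frame1))     # [(0, 3), (5, 2), (0, 1)]
delta = frame_delta(frame1, frame2)
print(delta)                         # [(3, 0), (5, 5)]
print(apply_delta(frame1, delta) == frame2)  # True
```

In both cases the decoder can rebuild the picture from less data than the raw pixels, which is the whole point.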

Let's focus on video for the moment. Television images are created by rapidly sequencing images on the TV picture tube, fast enough to produce the effect of moving pictures. Each image or picture is painted onto the TV screen face line by line, top to bottom, in two separate interlaced sequences called fields.

In the digital video world, the process is for the most part the same as in the TV world, except that there is no interlacing; it is what is called progressively scanned video (each line is scanned only once, but the same number of lines is processed to complete a picture frame). The progressive scan results from a conversion that takes place after a single image is captured by the digitizer and stored in memory. Once the image is in memory, the video data is easily shuffled out line by line for further processing, in order this time, top to bottom, without interlace.
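
As an illustration of the relationship between fields and a progressive frame, the Python sketch below splits a frame into two fields and weaves them back together. It assumes a frame with an even number of lines stored as a simple list, and the names split_fields and weave are hypothetical; a real digitizer does this in hardware or driver code.

```python
def split_fields(frame):
    """Split a frame (list of scan lines) into its two interlaced fields:
    the even-numbered lines and the odd-numbered lines."""
    return frame[0::2], frame[1::2]


def weave(top_field, bottom_field):
    """Interleave two fields line by line to rebuild a progressive frame
    in top-to-bottom order."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


frame = ["line0", "line1", "line2", "line3"]
top, bottom = split_fields(frame)
print(top)                 # ['line0', 'line2']
print(bottom)              # ['line1', 'line3']
print(weave(top, bottom))  # ['line0', 'line1', 'line2', 'line3']
```

Simple weaving like this works cleanly only when nothing moved between the two fields; with motion, the two fields were captured at slightly different times and more care is needed.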

Picture elements, or pixels, form the smallest part of a picture. Each pixel can be represented by any number of bits, depending on the quality and richness wanted. More bits per pixel usually means better quality, but also more information to compute (more digital signal processing per pixel block) and to move from frame to frame. The images in successive frames are for the most part similar in content; the only difference is a small shift due to the motion of objects in the images. Identifying the stuff that stays the same or is moving, then deciding whether to throw it out or code it so that other frames can use it to construct themselves, is the magic of compression. Dreaming up these coding schemes has given us readily recognized video CODECs like MPEG-x, various AVI flavours, Quicktime and DIVX, among many more. Each of these CODECs uses either an algorithm sanctioned by a standards body or a proprietary compression scheme that has been privately funded.
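
Finding that "small shift in content" is the job of motion estimation. The Python sketch below is a one-dimensional toy version: it slides a block of pixels from the current frame over the previous frame and picks the shift with the smallest sum of absolute differences (SAD). Real CODECs search in two dimensions over blocks of pixels and use far more refined strategies; the function names, the search range and the sample data here are all assumptions for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-length pixel blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_shift(previous, current, start, size, search=2):
    """Find the shift d in [-search, search] for which the block
    previous[start+d : start+d+size] best matches
    current[start : start+size]. Returns (shift, cost)."""
    block = current[start:start + size]
    best = None
    for d in range(-search, search + 1):
        s = start + d
        if s < 0 or s + size > len(previous):
            continue  # candidate block would fall outside the frame
        cost = sad(block, previous[s:s + size])
        if best is None or cost < best[1]:
            best = (d, cost)
    return best


previous = [0, 0, 9, 8, 7, 0, 0, 0]
current = [0, 0, 0, 9, 8, 7, 0, 0]  # the object moved one pixel right

print(best_shift(previous, current, start=3, size=3))  # (-1, 0)
```

The result says the block is found one position to the left in the previous frame with zero error, so the encoder could store just that shift instead of the pixels themselves.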

To get maximum efficiency, a video compression scheme must compress both a key frame and its motion-affected neighbour frames. The amount of compression applied is a function of the desired bit rate, the quality yielded at that bit rate and the efficiency of the encoder. MPEG and Quicktime video work this way. For a more detailed look at how an MPEG sequence is constructed, see http://www.mpeg.org/MPEG/starting-points.html, one example of the many web-sites dedicated to the technology.

For a beginner's insight into digital video and compression in general, Adobe has a great series of papers available on their web-site. Of course they plug their products, but the overviews are well worth reading.

http://www.adobe.com/products/premiere/pdfs/dvprimer.pdf

And if you are into video streaming, check out this primer from Adobe: http://www.adobe.com/products/aftereffects/pdfs/AdobeStr.pdf
 
CODECs and CODEC types
What is a CODEC? It is short for CODer and DECoder, or compressor and decompressor. The job of the coder part of the CODEC is to take input information from a camera or a tape deck and apply its compression algorithm to it. It produces a reduced equivalent of the original image, expressed as a combination of mathematical and look-up-table terms and stored in a defined way. The job of the decoder part is to take the encoded file and reconstruct the image from the information it has. The decoder knows nothing about the display destination or how to paint the image to the screen. That is the job of the various drivers supported by the operating system in which the CODEC works.

The CODEC itself is an independent piece of code. It should not be operating system (O/S) dependent but, as it turns out, some CODECs have aspects of their design tailored to the O/S under which they operate. The job of capturing video from a camera or displaying video on screen comes directly under the influence of the O/S and its supported hardware, via its drivers.

In the Windows case, for example, many CODECs have been tuned to work under Microsoft's Video for Windows (VFW) environment. The CODECs used with AVI (Audio Video Interleave) files form one family that works exclusively through VFW. VFW is an environment where compliant CODECs work seamlessly with VFW compliant hardware (sound cards, capture cards, etc.). The glue that binds the CODECs to the hardware is the set of VFW-centric drivers, which form a layer between the CODEC, the operating system and the hardware.

Non-VFW CODECs have to work in their own environment if they are to operate as a video component under Windows. The many MPEG flavours, Quicktime and other proprietary CODECs that use Windows as their operating base are examples of non-VFW CODECs. That is not to say that these same CODECs can't be converted to operate the VFW way.

Here is a sample running list of some CODECs and what they operate under:

MPEG-x  ...........  ISO standard
Quicktime  ...........  Apple standard
Indeo  ...........  AVI
Cinepak  ...........  AVI
MJPEG  ...........  AVI
Sorenson  ...........  proprietary
Divx  ...........  AVI but an MPEG-4 bent
TrueMotionS  ...........  AVI
Smacker  ...........  AVI
 
Compression for DVD
DVD (Digital Video Disk, or sometimes Digital Versatile Disk) is a well defined standard in terms of supported video and audio format types. A fully compliant DVD title must support the several system methods that define a DVD. Video, audio, navigation, stills and control functions must all be integrated to an established standard to be called a DVD product (this integration occurs during the compile phase of a DVD project, using a DVD authoring tool). Meeting these standards is meant to ensure compatibility with the many set-top and DVD-ROM players on the market. At least that's the plan. Interpretation of the specification is the issue, and it is being ironed out as the technology matures. As a result, there are still some incompatibilities between players when it comes to DVD-Rs, though this issue is quickly fading.

Setting aside the other DVD specifications, like DVD-ROM, DVD-RAM, etc., we'll only refer to DVD-Video in this note. As an aside, there are two consortiums vying for acceptance in the DVD market-place. Each has a different take on how DVD should be recorded, deviating from the original DVD specification and creating derivatives of DVD. They are not compatible, except on the playback front: DVD- and DVD+ recordables will work on some, and a growing number of, set-top DVD players. Most set-top players appear to support the DVD- flavour, which buyers of DVD equipment seem to prefer over the "plus" formats like DVD+R or DVD+RW. It remains to be seen how the market will shake out.

When producing a DVD title on DVD-R media there are some points to bring up:
a)   a DVD-R authored as DVD-Video, will play on most set-top players and a great many DVD-ROM players. See compatibility link.
b)   there are two types of DVD-R disks, a 'General' type and an 'Authoring' type. The main difference is that the Authoring type can be used as a master for disk replication if the CMF (Cutting Master Format) option is enabled during the burn cycle. The media is also more expensive. Otherwise, a DLT tape must be used to store the DVD content to meet the disk replication shop's requirements. You will need an authoring burner like the Pioneer DVR-S201 to create Authoring disks. The cheaper set-top burners like the Pioneer A03, A04, DVR-103 and DVR-2000, or a Panasonic equivalent such as the DMR-E30, burn only 'General' type disks. Either type of DVD-R disk will play on DVD set-top players and DVD-ROM players.

The following is a super site for anything to do with DVD: latest news, technology, etc. It is a must-read and worth keeping in touch with.
http://www.dvdwriters.co.uk/

I also highly recommend you read these two white papers from Pioneer. They are not that long and have a ton of very useful information on the DVD-R media.

The first paper discusses DVD-R recordable technology at an overview level.
http://www.pioneerelectronics.com/Pioneer/Files/DVDR47_WhitePaper.pdf

This paper discusses the difference between Authoring and General type disks.
http://www.pioneerelectronics.com/Pioneer/Files/DVDR_whitepaper.pdf
Further reading on the Pioneer site will give you good insight on the players and the technology. Save these files on your machine or print them out for future reference.

The media size recommended for burning DVD-R's is the 4.7GB option vs the 3.9GB.

When creating DVD-Video titles, ISO MPEG-2 video is used for best results. The standard does support the lower quality MPEG-1 if needed. You can mix them in the one title. Audio can be Dolby AC3, DTS, AIFF or PCM RAW.

Video run time is dependent on the bit rate chosen at encode time. Usually bit rates between 5 and 7Mbits/sec are chosen in VBR mode with an IPB setting of 15:3. Graphics used for the front-end picture can be a 24-bit bitmap 720x480 in size, TIFF (non LZW compressed), Adobe *.psd or JPEG.
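
To get a feel for how bit rate translates into run time, here is a rough Python sketch. It makes several assumptions that are mine rather than the DVD specification's: the 4.7GB disk is taken as 4.7 billion bytes, the VBR stream is treated as if it ran at its average bit rate, navigation and file-system overhead are ignored, and the "15:3" IPB setting is read as a group of 15 pictures with an I or P picture every 3rd frame. The function names are hypothetical.

```python
def run_time_minutes(capacity_bytes, video_bps, audio_bps=0):
    """Approximate run time in minutes for a disk of the given capacity
    at the given average video and audio bit rates (bits/sec)."""
    total_bps = video_bps + audio_bps
    if total_bps <= 0:
        raise ValueError("total bit rate must be positive")
    return capacity_bytes * 8 / total_bps / 60


def gop_pattern(n=15, m=3):
    """Picture types in display order for a group of n pictures with an
    anchor picture (I or P) every m frames."""
    return "".join(
        "I" if i == 0 else ("P" if i % m == 0 else "B")
        for i in range(n)
    )


# Video only, at the low, middle and high end of the usual range
for mbits in (5, 6, 7):
    print(mbits, round(run_time_minutes(4.7e9, mbits * 1_000_000)))

print(gop_pattern(15, 3))  # IBBPBBPBBPBBPBB
```

Adding an audio track lowers the run time accordingly; pass its bit rate as audio_bps to see the effect.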

The following sites have very deep stuff on DVD if you are interested in furthering your knowledge.

Technical
http://www.unik.no/~robert/hifi/dvd/
http://www.mpeg.org/MPEG/DVD/
http://www.dvdforum.org/
http://www.mpeg.org/MPEG/starting-points.html

FAQ
http://www.dvddemystified.com/
http://www.dvddemystified.com/dvdfaq.html - Jim Taylor's stuff is great to start on
http://www.intervideo.com/jsp/Support.jsp - WinDVD software player

Hacker Stuff
http://www.afterdawn.com/
http://www.afterdawn.com/links/
http://www.dvdwriters.co.uk/
http://www.divx.com/
 

© Swarm Multimedia Inc. 2006