MPEG2 Audio for DVD:

The Compromise Choice

 

October, 1996

 

Introduction

Digital Video Disc (DVD) is intended to meet the needs of a wide range of consumers. The delivery of high quality audio will be as much a part of the format as high quality video, and it must be able to conform to starkly different, and sometimes contradictory, reproduction requirements. From a single soundtrack, it must be possible to provide:

 

· high intelligibility of the audio under low listening levels or in noisy environments by restricting the reproduced dynamic range;

· full dynamic range for superb home theatre performance;

· 5.1-channel discrete multichannel sound;

· compatible matrix-encoded two-channel audio for conventional Dolby Surround; and

· mono audio suitable for RF remodulation.

It must be possible to achieve any of the above features without compromising the others.

 

MPEG2 layer 2 audio is being proposed for various consumer formats such as U.K. DVB and PAL DVD. It is generally understood that the first programs will be delivered in conventional stereo or matrixed surround, and that discrete multichannel programs would follow at some future date using MPEG2's multichannel extension method. Since there are no MPEG2 audio decoder ICs yet available, it is assumed that consumer hardware would be made initially using the existing MPEG1 stereo audio decoders already available, but with the option for more data rates and sample rates. This decision locks the audio coding method to the Backward Compatible (BC) mode of MPEG2, wherein a "compatibility matrix" is used to ensure that two-channel MPEG1 decoders receive the entire audio soundtrack when multichannel programs are delivered.

 

Note: While other versions of MPEG2 audio bitstreams are considered backwards compatible because the coding technology is similar to MPEG1, the audio recovered from a 2-channel MPEG1 decoder is not usable unless the MPEG2 multichannel encoder activates the backward compatible (BC) matrix feature. Without this option, 2-channel audio would still be reproduced, but it would not be a complete downmix of the entire 5-channel audio soundtrack. This BC matrix presents MPEG2 audio with several quality and functional difficulties, as described below.

 

By introducing MPEG audio in this "stereo today; multichannel tomorrow" two-step strategy, the limitations of multichannel MPEG2 may remain largely unknown until sometime in the future, long after the decision has been made to use MPEG2 in the format. This may delay the realization that MPEG2 may actually be unsatisfactory for the delivery of high quality multichannel audio, thus effectively preventing the delivery format from ever fulfilling its intended promise of having multichannel sound.

The compatibility matrix

MPEG2 BC delivers five-channel audio in the following way. The five input channels are first combined into two-channel stereo by a compatibility matrix, with one matrixed output combining the Left, Center, L-Surround and R-Surround channels and the other combining the Right, Center, L-Surround and R-Surround channels in appropriate proportions. This main 2-channel data is accompanied by auxiliary data containing separate Center, L-Surround and R-Surround channels, three of the same five signals already carried in the matrixed data. Conventional MPEG1 decoders see only the main stereo data and ignore the rest, so they reproduce the entire soundtrack according to the particular mixing formula of the compatibility matrix. In an MPEG2 decoder, the auxiliary signals may also be decoded and electrically subtracted from the main signals, canceling the redundant signal counterparts and leaving only the individual Left and Right channels. The five original channels are thereby recovered from the MPEG2 decoder with theoretically complete separation, but they have not necessarily arrived unscathed. Coding artefacts may be exposed (unmasked) by the matrix extraction process. There are also difficulties in handling the peak signal levels caused by the compatibility matrix, which must affect the sound quality of either the MPEG1 or the MPEG2 decoder. The producers must choose which one to compromise.
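
The matrixing and extraction steps described above can be sketched in a few lines. This is a minimal illustration only: the function names and the coefficient values KC and KS (about -3 dB) are assumptions made for the example, not values taken from the MPEG2 specification, and no audio coding is applied between the two steps.

```python
import math

# Assumed mixing coefficients (about -3 dB); real encoders select their own.
KC = 1 / math.sqrt(2)   # center contribution (hypothetical value)
KS = 1 / math.sqrt(2)   # surround contribution (hypothetical value)

def bc_matrix(l, r, c, ls, rs, kc=KC, ks=KS):
    """Fold five channels into the main stereo pair seen by an MPEG1 decoder."""
    lo = l + kc * c + ks * (ls + rs)
    ro = r + kc * c + ks * (ls + rs)
    return lo, ro

def bc_dematrix(lo, ro, c, ls, rs, kc=KC, ks=KS):
    """Subtract the auxiliary channels to recover the discrete Left and Right."""
    l = lo - kc * c - ks * (ls + rs)
    r = ro - kc * c - ks * (ls + rs)
    return l, r

# One sample per channel: Left, Right, Center, L-Surround, R-Surround.
sample = (0.2, -0.1, 0.5, 0.3, -0.4)
lo, ro = bc_matrix(*sample)
l_out, r_out = bc_dematrix(lo, ro, *sample[2:])
```

Without any lossy coding in the chain the subtraction is exact, which is the "theoretically complete separation" case. The difficulties arise once the main and auxiliary signals have each been through the coder.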

 

The subject of coding artefact unmasking has been detailed in various CCIR and IEEE papers. It occurs when two coded audio signals cancel each other, exposing coding artefacts that were formerly masked. This happens in the MPEG2 decoder when the Center, L-Surround and R-Surround signals are canceled from the matrixed stereo composite in order to extract the original Left and Right signals. Various methods of mitigating this problem have been proposed by MPEG's architects, but none has been shown to be totally effective, and they usually incur a bitrate penalty to regain the subjective quality the system achieved before the compatibility matrix was applied. Because the Center, L-Surround and R-Surround information is encoded redundantly for the sake of backward compatibility with MPEG1 decoders, the system loses efficiency, with an attendant quality compromise for the multichannel listener. Therefore, a compatibility matrix can never be regarded as an advantage to the final quality of the MPEG2 multichannel sound.
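
A toy numerical example may help show why cancellation exposes coding error. In the sketch below a coarse rounding quantizer stands in for the lossy coder; this is an assumption made purely for illustration and is not how an MPEG coder works. The coefficient K, the step size and the function names are likewise hypothetical, and the surround channels are omitted for brevity.

```python
K = 0.7071  # assumed center mixing coefficient

def quantize(x, step):
    """Crude stand-in for lossy coding: round to the nearest multiple of step."""
    return round(x / step) * step

def decode_left(left, center, step):
    """Code the matrixed and auxiliary signals separately, then dematrix."""
    lo = left + K * center            # matrixed channel
    lo_coded = quantize(lo, step)     # main data, coded on its own
    c_coded = quantize(center, step)  # auxiliary data, coded on its own
    return lo_coded - K * c_coded     # extraction by subtraction

# Left is silent and Center is loud. After extraction, Left should be silent.
residual = decode_left(0.0, 0.83, step=0.1)
```

The two quantization errors do not cancel along with the signal, so the recovered Left channel carries a residual even though the original Left was silent, and there is no Left program material present to mask it.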

 

Other difficulties arise due to the summation of audio signals in the compatibility matrix. If just two in-phase audio signals at 0 dBFS are added together, the sum would reach +6 dBFS (well into signal clipping), and must be scaled down by at least 6 dB to ensure it does not exceed 0 dBFS and cause audible distortion. In the more typical case of downmixing five channels into two, an attenuation of about 7-10 dB would be needed to avoid clipping when the signals reach full scale. The whole multichannel soundtrack could be scaled down for coding and then scaled back up again after extracting the original discrete signals in the MPEG2 decoder without much trouble. However, the audio coming from the MPEG1 decoder cannot be rescaled to the original level, as it still contains the increased peak levels resulting from summing the multiple input channels. This would mean that the dialogue reference levels would be noticeably lower than those found in conventional two-channel programs. People listening in stereo would experience a disturbing drop in listening level as a result of receiving a multichannel program created with a compatibility matrix, while conventional stereo programs would exhibit no level drop. The change in level from one program to another would not be acceptable to consumers or broadcasters.
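
The attenuation figures above follow from simple worst-case arithmetic, sketched here. The helper name and the five-channel coefficient set are assumptions chosen for illustration; the actual requirement depends on the matrix coefficients in use.

```python
import math

def headroom_db(coefficients):
    """Worst-case level rise in dB when full-scale, in-phase signals are summed."""
    return 20 * math.log10(sum(abs(c) for c in coefficients))

# Two full-scale signals summed at unity gain.
two_signals = headroom_db([1.0, 1.0])

# One matrixed output fed by a front channel, the center and two surrounds,
# using hypothetical coefficients of 1.0, 0.7071, 0.5 and 0.5.
five_into_two = headroom_db([1.0, 0.7071, 0.5, 0.5])
```

With these assumed values the first case comes to roughly 6 dB and the second falls inside the 7-10 dB range quoted above.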

 

To avoid the use of overall level scaling, peak level compression may be applied to the multichannel source signals when using the compatibility matrix, thus preventing clipping only as needed. However, in order for the BC matrix extraction process to work properly for the multichannel decoder, the auxiliary audio channels must also have exactly the same compression applied as those in the matrixed channels. This is required so the subtraction process will be accurate, even though the auxiliary signals would not otherwise require limiting to prevent clipping. While this solution allows the MPEG1 listener to receive programs of consistent average loudness, the five-channel discrete mode of MPEG2 would now deliver audio with peak program reductions from time to time. Home theater enthusiasts may not appreciate this alteration of the program.
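
The need for matched compression can be illustrated with a deliberately simplified per-sample limiter. Real limiters use attack and release smoothing, and the function names, the coefficient K and the 1.0 ceiling here are all assumptions made for the sketch. Surround channels are again omitted.

```python
K = 0.7071  # assumed center mixing coefficient

def limiter_gain(peak, ceiling=1.0):
    """Gain that brings a peak down to the ceiling, or unity if none is needed."""
    return min(1.0, ceiling / peak) if peak > 0 else 1.0

def encode_with_limiting(left, center, match_aux=True):
    """Limit the matrixed channel, optionally applying the same gain to the aux."""
    lo = left + K * center
    g = limiter_gain(abs(lo))
    lo_limited = g * lo
    aux_center = g * center if match_aux else center
    return lo_limited, aux_center, g

def recover_left(lo_limited, aux_center):
    """MPEG2-style extraction of Left by subtracting the auxiliary Center."""
    return lo_limited - K * aux_center

lo, aux, g = encode_with_limiting(0.8, 0.9)                        # matched gain
lo_bad, aux_bad, g_bad = encode_with_limiting(0.8, 0.9, match_aux=False)
```

With matched gain, the recovered Left is a clean but reduced copy of the original (g times 0.8), which is the peak reduction heard by the five-channel listener. With unmatched gain, the subtraction no longer lines up and some Center signal leaks into the recovered Left.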

 

The notion of offering 7.1-channel soundtracks with MPEG only exacerbates these quality compromises. Considering that domestic reproduction does not have the problems of extreme screen width posed in certain movie theaters, there is no basis for adding two more front speakers (half-left and half-right) to a system that already has the enhanced positional accuracy delivered by three front speakers (left, center, and right).

Other compromises

Only one set of BC matrix coefficients may be used for any particular program being delivered by MPEG2. It is therefore necessary to decide which matrix mix best serves the anticipated audience. This assumes there is one type of listener in the mass audience, which of course cannot be the case unless the program is in mono. If the program on the disc is a movie soundtrack, then the matrix must be made compatible with Dolby Surround decoders by including the proper ±90 degree phase-shifted surround components in the two-channel "Lt/Rt" signal. The method of accomplishing this with high quality is technically challenging, and Dolby Laboratories has devised a precise method to do so in a cost-effective manner, using a patented combination of encoder and decoder processes. It has not been publicly demonstrated how MPEG2 will attempt to meet this requirement, but tests conducted by video software companies in Hollywood in July 1996 showed serious flaws in the system's sound quality.

 

If mono reproduction is the primary goal, however, the out-of-phase surround signals in an Lt/Rt encoded signal would cancel and disappear, so a different compatibility matrix would be desired. If good stereo/mono compatibility is needed, the MPEG "Lo/Ro" matrix can achieve this by avoiding the phase encoding of the surround signals, but this now prevents surround decoders from working properly. No matter which matrix is chosen, some users will not obtain optimal reproduction.
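
The mono trade-off between the two matrices can be shown with phasors, where multiplying by 1j or -1j represents a +90 or -90 degree phase shift of a steady tone. The sign convention, the coefficient K and the function names are assumptions for this sketch, not a description of any particular encoder.

```python
K = 0.7071  # assumed mixing coefficient for center and surround

def lt_rt(l, r, c, s):
    """Phasor sketch: surround shifted -90 degrees in Lt and +90 degrees in Rt."""
    lt = l + K * c - 1j * K * s
    rt = r + K * c + 1j * K * s
    return lt, rt

def lo_ro(l, r, c, ls, rs):
    """Stereo downmix with no phase encoding of the surrounds."""
    lo = l + K * c + K * ls
    ro = r + K * c + K * rs
    return lo, ro

def mono(a, b):
    """Simple mono sum of a two-channel signal."""
    return a + b

# A surround-only signal, summed to mono through each matrix.
mono_from_lt_rt = mono(*lt_rt(0.0, 0.0, 0.0, 1.0))
mono_from_lo_ro = mono(*lo_ro(0.0, 0.0, 0.0, 1.0, 1.0))
```

In the Lt/Rt case the two surround terms are 180 degrees apart and vanish in the mono sum. In the Lo/Ro case they survive, but there is no longer a phase difference for a surround decoder to detect.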

 

Table 1. Summary of MPEG audio processing.

| Audio process | MPEG1 decoding | MPEG2 decoding | Quality degraded? |
|---|---|---|---|
| Compatibility matrix | favors compatibility for certain users over others; audio quality degraded | causes artifact unmasking | MPEG1, MPEG2 |
| Level scaling (no compression) | causes loudness variations between different programs | can be rescaled to original levels | MPEG1 |
| Peak limiting | maintains average loudness with other 2-ch programs | causes unnecessary peak level reduction | MPEG2 |

A positive note

DVD may emerge as the only viable format to deliver multichannel audio to consumers in PAL video territories for the foreseeable future. The DVD standard allows Dolby Digital to be added as an option on PAL discs, thus assuring they can attain the same overall program quality found on the NTSC version. This parity ensures that consumers will not perceive the NTSC version as having inherent advantages over the PAL version, a perception that has been shown to be disruptive for the laser disc format and for copyright owners.

 

Dolby is working with the video software companies and post production studios to ensure that PAL titles include Dolby Digital bitstreams when appropriate.

 

DVD players sold in PAL countries need only include the digital output connector to provide the Dolby Digital bitstream to an external Dolby Digital decoder, which recovers the full 5.1-channel soundtrack. Most DVD players are planned to include this connector.

Dolby Digital avoids the compromises

By starting from a "clean sheet" approach to the design of multichannel audio coding, Dolby was able to see the inherent difficulties of adding more channels to a system originally designed as a 2-channel carrier. Therefore, while MPEG1 decoders can only look at a portion of the total multichannel bitstream being delivered, all Dolby Digital decoders receive and make use of the entire bitstream. This offers the opportunity to obtain the full quality of the source content and to optimize the reproduction of the sound in different ways to suit different listeners. It also allows Dolby Digital to dispense with the use of a "compatibility matrix" and thus completely avoids the problems of coder unmasking.

 

It is possible for mono, stereo and full 5.1-channel Dolby Digital decoders to reproduce all audio programs with consistent average loudness levels, no matter how many audio channels are present in the received bitstream or how many are reproduced by the audio system. The system can achieve this whether or not it is downmixing to fewer output channels, and can decode the audio with no peak level compression at all if full dynamic range is desired, or with variable peak compression and dynamic range control at the option of the product designer or end user. Since the type of downmix used for stereo or mono reproduction is not preselected in the delivered program, all reproduction options are possible for all users with no compromises to other users of the same bitstream.
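
The idea of choosing the downmix in the decoder rather than in the delivered program can be sketched as follows. This is a conceptual illustration only, not the Dolby Digital decoding algorithm: the function name, the mode names and the coefficient K are hypothetical, and loudness matching and dynamic range control are left out.

```python
K = 0.7071  # assumed downmix coefficient

def decode(channels, mode):
    """Render one sample of a discrete five-channel program for a given listener.

    channels: dict with keys "L", "R", "C", "Ls", "Rs".
    mode: "5ch", "stereo" or "mono" (hypothetical mode names).
    """
    l, r, c, ls, rs = (channels[k] for k in ("L", "R", "C", "Ls", "Rs"))
    if mode == "5ch":
        return [l, r, c, ls, rs]            # discrete channels, untouched
    stereo = [l + K * c + K * ls, r + K * c + K * rs]
    if mode == "stereo":
        return stereo
    if mode == "mono":
        return [stereo[0] + stereo[1]]
    raise ValueError(f"unknown mode: {mode}")

program = {"L": 0.1, "R": 0.2, "C": 0.3, "Ls": 0.4, "Rs": 0.5}
outputs = {mode: decode(program, mode) for mode in ("5ch", "stereo", "mono")}
```

Because every decoder starts from the same discrete channels, the five-channel output is unaffected by whichever downmix another listener selects.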

 

Table 2. Summary of Dolby Digital audio processing.

| Audio process | 2-channel decoding | 5.1-channel decoding | Quality degraded? |
|---|---|---|---|
| Decoder downmixing | optimizes sound mix for each user | no compromise | no |
| Level scaling (no compression) | all programs level matched | all programs level matched | no |
| Peak limiting | maintains typical dynamics with other 2-ch programs | improves subjective quality at lower volumes | no |

 

Conclusion

MPEG2 audio is ill-equipped for new consumer formats intending to deliver discrete multichannel audio. Its limitations are fundamental to its structure and stem primarily from the inherent compatibility requirements of MPEG1 audio decoding, so they cannot be finessed or eliminated by breakthroughs in coder design.

 

Dolby Digital was designed from the outset as a multichannel coder to avoid these types of limitations, and has proven itself a practical solution to the delivery of compatible, high-quality multichannel audio.