Two Operation Schemes of Digital Audio Signal Compression Technology

Release time: 2023-11-22 08:18

After an audio signal has been digitally encoded, one of the problems it faces is the storage and transmission of massive amounts of data. Digital audio signal compression technology is a very important part of the digital television broadcasting system: compression efficiency and compression quality directly affect the transmission efficiency of digital television broadcasting and the transmission quality of the audio and video. This article gives a brief analysis of digital audio compression technology.

Compared with analog signals, digital signals have obvious advantages, but they also have a corresponding disadvantage: a greater demand for storage capacity, and for channel capacity during transmission. Audio compression refers to the application of appropriate digital signal processing techniques to the original digital audio stream (PCM encoding) to reduce (compress) its bit rate without loss of useful information, or with only negligible loss; it is also known as compression coding. It must have a corresponding inverse transform, called decompression or decoding. Generally speaking, audio compression techniques can be divided into two categories: lossless data compression and lossy data compression.

Lossless Data Compression

The original data can be recovered bit for bit after decompression with a lossless compression scheme. Such schemes eliminate the statistical redundancy in the audio signal by predicting sample values from past samples. A small compression ratio, approximately 2:1, may be achieved, depending on the complexity of the original audio signal. Lossless compression is made feasible by time-domain predictive coding techniques. They are:

1. Differential algorithm

An audio signal contains repetitive sounds as well as a large amount of redundant and perceptually irrelevant content. Duplicate information is removed during encoding and reintroduced during decoding. The audio signal is first decomposed into a number of sub-bands containing discrete tones. DPCM is then applied using a predictor suited to the short-term periodic signal. The encoding is adaptive: the quantization step size is modified according to the input signal energy. This leads to the so-called adaptive DPCM (ADPCM).
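The idea can be sketched as follows: a hypothetical 4-bit ADPCM with a simple previous-sample predictor and an illustrative step-adaptation rule (not the tables of any published codec).

```python
def adpcm_encode(samples, step=1.0):
    """Quantize the prediction error (difference from the previous
    reconstructed sample) to 4 bits, adapting the step size to the
    signal energy. Hypothetical adaptation rule, for illustration."""
    codes, predicted = [], 0.0
    for x in samples:
        code = max(-8, min(7, round((x - predicted) / step)))
        codes.append(code)
        predicted += code * step  # track the decoder's reconstruction
        step = step * 1.5 if abs(code) >= 6 else max(0.5, step * 0.9)
    return codes

def adpcm_decode(codes, step=1.0):
    """Mirror the encoder: same predictor, same step adaptation."""
    out, predicted = [], 0.0
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = step * 1.5 if abs(code) >= 6 else max(0.5, step * 0.9)
    return out
```

Because the encoder predicts from its own reconstructed output, the decoder stays in lockstep with it and the error never accumulates beyond the current quantization step.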

2. Entropy Encoder

Redundancy in the representation of the quantized sub-band coefficients is exploited to improve entropy coding efficiency. The coefficients are sent sequentially in order of increasing frequency, producing large values at low frequencies and, at high frequencies, small values followed by long runs of near-zero values. The variable-length codes (VLC) are taken from different Huffman tables that statistically match the low- and high-frequency values.
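The run-length structure those long runs of zeros produce can be sketched as follows (a toy illustration of the (zero-run, value) pairing commonly applied before a variable-length code; the Huffman table selection itself is omitted):

```python
def rle_zeros(coeffs):
    """Collapse runs of zeros into (run_before, value) pairs, as is
    commonly done before applying a variable-length code."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append((run, 0))  # trailing run of zeros
    return pairs
```

High-frequency coefficient sequences such as `[5, 0, 0, 0, 2, 0, 0]` collapse to three short pairs, which a Huffman table can then code compactly.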

3. Block floating point system

The binary values from the A/D conversion process are grouped into data blocks, either in the time domain, by using adjacent samples at the A/D converter output, or in the frequency domain, by using adjacent frequency coefficients at the FDCT output. The binary values in the data block are then scaled up so that the largest value is just below full scale. The scaling factor is called the exponent and is common to all values in the block.

Therefore, each value can be represented by a mantissa (a sample value) and an exponent. The bit allocation is derived from a model of the human auditory system (HAS), and data rate compression is achieved by sending the exponent only once per data block. Coding performance is good, but the noise is correlated with the signal content. Masking techniques help to reduce this audible noise.
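A minimal sketch of the scheme, assuming samples normalized to [-1.0, 1.0) and a power-of-two exponent:

```python
def block_float_encode(block, mantissa_bits=8):
    """One shared exponent per block: shift the block up until its
    peak is at least half of full scale, then quantize the mantissas
    to mantissa_bits bits."""
    peak = max(abs(v) for v in block) or 1.0  # all-zero block -> exponent 0
    exponent = 0
    while peak * 2 ** exponent < 0.5:
        exponent += 1
    levels = 2 ** (mantissa_bits - 1)
    mantissas = [round(v * 2 ** exponent * levels) / levels for v in block]
    return exponent, mantissas

def block_float_decode(exponent, mantissas):
    """Shift the mantissas back down by the shared exponent."""
    return [m / 2 ** exponent for m in mantissas]
```

Only one exponent is stored per block, so low-level blocks keep fine mantissa resolution at a small side-information cost.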

Lossy Data Compression

Lossy data compression is achieved by combining two or more processing techniques that exploit the HAS's inability to detect specific spectral components in the presence of other, higher-amplitude components. In this way, high-performance data compression schemes with much higher compression ratios, from 2:1 to 20:1, can be obtained, depending on the complexity of the encoding/decoding process and the audio quality requirements.

Lossy data compression systems use perceptual coding techniques. The basic principle is that all signals below the threshold curve are discarded, eliminating perceptual redundancy in the audio signal. These lossy data compression systems are therefore also referred to as perceptually lossless. Perceptually lossless compression is made feasible by a combination of several techniques, such as:

1. Time- and frequency-domain masking of signal components

2. Masking of quantization noise by each audible tone

By allocating enough bits, the quantization noise level is kept below the masking curve at all times. At frequencies close to an audible signal, an SNR of 20 or 30 dB is acceptable.

3. Joint coding

This technique exploits redundancy in multi-channel audio systems. A large amount of identical data is often present in all channels, so data compression can be obtained by encoding such data once and signalling to the decoder that it must be repeated in the other channels.
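One common joint-coding technique, mid/side stereo, can be sketched as a sum/difference transform: the side channel is near zero whenever both channels carry the same material, and so costs almost no bits to code.

```python
def ms_encode(left, right):
    """Code the average (mid) and half-difference (side) instead of
    two nearly identical channels."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover left/right exactly from mid/side."""
    return ([m + s for m, s in zip(mid, side)],
            [m - s for m, s in zip(mid, side)])
```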

Implementation of Audio Decoding Process

Important masking effects occur in the frequency domain. To take advantage of this property, the audio signal spectrum is decomposed into multiple sub-bands with a time and frequency resolution that matches the critical bandwidths of the HAS.

The structure of the perceptual encoder consists of the following parts:

1. Multi-band filter

Usually called a filter bank, its role is to decompose the spectrum into sub-bands.

2. Bit allocator

It is used to estimate the masking threshold and to allocate bits based on the spectral energy of the audio signal and a psychological model.

3. Conversion and Quantization Processor

4. Data Multiplexer

Used to receive quantized data and add side information (bit allocation and scaling factor information) for the decoding process.

3.1 Filter Banks (there are three types)

(1) Sub-band groups. The signal spectrum is divided into equal-width frequency sub-bands. This resembles the frequency analysis performed by the HAS, which divides the audio spectrum into critical bands. The width of a critical band is variable: below 500 Hz it is about 100 Hz, and above 10 kHz it increases to several kHz. Sub-bands below 500 Hz therefore contain several critical bands. The sub-band filters overlap slightly and typically operate on adjacent time samples. Each sub-band signal is then uniformly quantized with the bits allocated to that sub-band so as to maintain a positive mask-to-noise ratio (MNR); the ratio is positive when the masking curve lies above the noise curve.
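The variable critical bandwidth can be illustrated with Zwicker's commonly cited approximation (an assumption here; the formula is not part of this article):

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker's approximation of the critical bandwidth around a
    centre frequency f (in Hz): roughly 100 Hz at low frequencies,
    rising to several kHz above 10 kHz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69
```

The numbers agree with the text: around 100 Hz the bandwidth is close to 100 Hz, while above 10 kHz it exceeds 2 kHz.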

(2) Transform groups. A modified DCT (MDCT) algorithm is commonly used to convert the time-domain audio signal into a large number of sub-bands (256 to 1024). There is also some overlap in such filter banks.
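A direct (O(N²)) sketch of the MDCT and its inverse with a sine window shows how the overlap works: each block of 2N windowed time samples yields only N coefficients, and overlap-adding the inverse transforms of adjacent 50 %-overlapping blocks cancels the time-domain aliasing.

```python
import math

def mdct(block):
    """Forward MDCT: 2N sine-windowed time samples -> N coefficients."""
    n2 = len(block)
    n = n2 // 2
    w = [math.sin(math.pi / n2 * (i + 0.5)) for i in range(n2)]  # sine window
    u = [w[i] * block[i] for i in range(n2)]
    return [sum(u[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(n2))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N windowed output samples;
    adjacent blocks must be overlap-added to reconstruct the signal."""
    n = len(coeffs)
    n2 = 2 * n
    w = [math.sin(math.pi / n2 * (i + 0.5)) for i in range(n2)]
    return [w[i] * (2.0 / n) * sum(
                coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for k in range(n))
            for i in range(n2)]
```

Overlap-adding the second half of one inverse block with the first half of the next reconstructs the middle samples exactly, which is the time-domain aliasing cancellation property the MDCT is built on.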

(3) Hybrid filter banks. They consist of a sub-band filter followed by an MDCT filter. This combination provides a finer frequency resolution.

3.2 Perceptual Models, Masking Curves, and Bit Allocation

An accurate psychoacoustic analysis of the input PCM signal is performed with respect to its frequency and energy content, using the fast Fourier transform. Masking curves are calculated from the hearing threshold and the frequency-masking properties of the HAS. The shape and level of the masking curve depend on the signal content. The difference between the spectral envelope of the signal and the masking curve determines the number of bits (on a 6 dB per bit basis) required to encode all spectral components of the audio signal. This bit allocation process ensures that the quantization noise stays below the audible threshold.
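The 6 dB-per-bit rule can be sketched as a per-band allocation (a simplified illustration; real encoders iterate this under a total bit budget):

```python
import math

def allocate_bits(signal_db, mask_db, max_bits=16):
    """Give each band enough bits that its quantization noise,
    roughly 6.02 dB per bit below the signal, falls under the
    masking threshold for that band."""
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m  # signal-to-mask ratio in dB
        bits.append(0 if smr <= 0 else min(max_bits, math.ceil(smr / 6.02)))
    return bits
```

A band whose signal already lies below its masking threshold (negative SMR) receives no bits at all, which is where most of the compression comes from.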

The masking threshold for each sub-band is derived from the masking curve. Each threshold determines the maximum noise energy acceptable in its sub-band before the system noise becomes audible, which is the condition for perceptually lossless compression.

3.3 Converters and Quantizers

The filter output samples from each sub-band are scaled and quantized in one of two ways:

(1) Block floating-point system. The system normalizes the largest value in a data block to full scale. The block's scaling factor is transmitted within the data stream and is used by the decoder to scale down all data values in the block. In MPEG Layer 1, a data block consists of 12 consecutive samples, and an audio frame consists of 384 samples (32 sub-bands, 12 samples per sub-band). The values in each data block are then quantized, with the quantization step size determined by the bit allocator.
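The per-block scale factor can be sketched as follows (illustrative only: here the scale factor is simply the block's peak magnitude, whereas Layer 1 draws it from a quantized table):

```python
def scale_blocks(subband_samples, block_len=12):
    """Split one sub-band's sample stream into 12-sample blocks and
    attach a per-block scale factor (the peak magnitude), so the
    decoder can scale every value in the block back down."""
    blocks = []
    for i in range(0, len(subband_samples), block_len):
        blk = subband_samples[i:i + block_len]
        scale = max((abs(v) for v in blk), default=1.0) or 1.0
        blocks.append((scale, [v / scale for v in blk]))
    return blocks
```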

(2) Noise allocation and scalar quantization. In the previous method, each sub-band has a different scaling factor.

The second method uses the same scaling factor for several frequency bands of approximately critical bandwidth. The value of this scaling factor is not derived from a standard procedure but is part of the noise-allocation process. No bit allocation is performed here. After the masking threshold for each sub-band has been estimated, the scaling factor is used to modify all quantization step sizes in the scale-factor band, shaping the quantization noise to better match the frequency contour of the threshold. A non-uniform quantization process adapts the quantization noise to the signal amplitude in an optimized manner. The audio spectral values are then Huffman coded, yielding further data compression.
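The non-uniform quantization step can be illustrated with a power-law companding quantizer of the kind used in perceptual coders (the 0.75 exponent is the one used by MPEG AAC; the rest is a sketch):

```python
import math

def quantize(x, step=1.0):
    """Power-law (|x|^0.75) quantization: larger values get coarser
    steps, so the quantization noise scales with signal amplitude."""
    return int(math.copysign(round(abs(x / step) ** 0.75), x))

def dequantize(q, step=1.0):
    """Invert the companding with the reciprocal exponent 4/3."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)
```

The relative (not absolute) error stays roughly constant across amplitudes, which is what matching noise to signal level requires.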

3.4 Data Multiplexer

Blocks of 12 data samples from each quantizer output are multiplexed with the corresponding scale factors and bit-allocation information to form an audio frame in the encoded bitstream. Optional auxiliary data may also be inserted into the bitstream; the MPEG standard specifies neither the types of data that can be transmitted nor how they are formatted in the bitstream.
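A toy illustration of the multiplexing step (a hypothetical field layout, not the MPEG frame syntax): side information goes first so the decoder knows how to interpret the sample data that follows.

```python
def mux_frame(bit_alloc, scale_factors, sample_blocks, ancillary=()):
    """Serialize side information first, then the quantized sample
    blocks, then any optional auxiliary data."""
    frame = []
    frame.extend(bit_alloc)        # bits per sub-band
    frame.extend(scale_factors)    # one scale factor per sub-band block
    for block in sample_blocks:
        frame.extend(block)        # quantized 12-sample blocks
    frame.extend(ancillary)        # optional auxiliary data
    return frame

def demux_frame(frame, n_subbands, block_len=12):
    """Read the side information back out, then slice the sample
    blocks using the known counts."""
    bit_alloc = frame[:n_subbands]
    scale_factors = frame[n_subbands:2 * n_subbands]
    body = frame[2 * n_subbands:2 * n_subbands + n_subbands * block_len]
    blocks = [body[i:i + block_len] for i in range(0, len(body), block_len)]
    return bit_alloc, scale_factors, blocks
```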
