Two operation schemes of digital audio signal compression technology

  • Categories: Industry News
  • Time of issue: 2018-04-12 11:39

Once an audio signal has been digitally encoded, one of the biggest problems it faces is the storage and transmission of massive amounts of data. Digital audio compression is therefore a critical link in the digital TV broadcasting chain: compression efficiency and compression quality directly affect the transmission efficiency of digital TV broadcasting and the quality of the delivered audio and video. This article analyzes digital audio compression technology.
 
Compared with analog signals, digital signals have obvious advantages, but they also have a corresponding drawback: a greater demand for storage capacity and for channel capacity during transmission. Audio compression technology applies appropriate digital signal processing to the original digital audio stream (PCM encoded) to reduce (compress) its bit rate with no loss, or only a negligible loss, of useful information; this is called compression coding. It must have a corresponding inverse transform, called decompression or decoding. Generally speaking, audio compression techniques can be divided into lossless data compression and lossy data compression.
 
Lossless data compression
 
A lossless compression scheme can restore the original data bit for bit after decompression. Such schemes eliminate the statistical redundancy in the audio signal by predicting each value from past samples. Only a modest compression ratio can be achieved, at best about 2:1, depending on the complexity of the original audio signal. Lossless compression is made feasible by time-domain predictive coding techniques, which include:
 
   1. Difference algorithm
 
Audio signals contain repetitive sounds, as well as a great deal of redundant and perceptually irrelevant content. Duplicate information is removed during encoding and reintroduced during decoding. The audio signal is first decomposed into several sub-bands containing discrete tones; DPCM is then applied using a predictor suited to short-term periodic signals. The coding is adaptive: it observes the input signal energy and adjusts the quantization step size accordingly, which leads to adaptive DPCM (ADPCM).
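As an illustration of this adaptive difference idea, here is a deliberately simplified ADPCM-style codec in Python. The first-order predictor, the step-adaptation rule and all constants are illustrative assumptions, not the tables of any standardized ADPCM codec.

```python
# Hedged sketch: simplified ADPCM-style encoder/decoder.
# Assumptions: 16-bit PCM samples, a first-order predictor (previous sample),
# and a step size that adapts to the magnitude of the last residual code.

def adpcm_encode(samples, initial_step=16):
    """Encode PCM samples as coarsely quantized prediction residuals."""
    step = initial_step
    predicted = 0
    codes = []
    for x in samples:
        residual = x - predicted              # difference (DPCM) stage
        code = int(round(residual / step))    # coarse quantization of the residual
        codes.append(code)
        predicted += code * step              # decoder-side reconstruction, kept in sync
        # Adapt the step size: grow it for large residuals, shrink it for small ones.
        step = max(1, int(step * 1.5)) if abs(code) > 2 else max(1, int(step * 0.9))
    return codes

def adpcm_decode(codes, initial_step=16):
    """Invert the encoder using the same adaptation rule."""
    step = initial_step
    predicted = 0
    out = []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = max(1, int(step * 1.5)) if abs(code) > 2 else max(1, int(step * 0.9))
    return out

# Usage: a slowly varying signal produces small residual codes.
pcm = [0, 40, 85, 120, 150, 170, 180, 175]
codes = adpcm_encode(pcm)
print(codes, adpcm_decode(codes))
```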
 
   2. Entropy encoder
 
"Using the redundancy in the representation of quantized subband coefficients to improve the efficiency of entropy coding. These coefficients are sent in order of increasing frequency, producing a larger value at low frequencies, and a long stroke close to zero after producing smaller high frequencies. The VLC is taken from a different Huffman table that is most consistent with the statistics of the low-frequency value and the high-frequency value.
 
  3. Block floating point system
 
The binary values from the A/D conversion process are grouped into data blocks, either in the time domain, by taking adjacent samples at the A/D converter output, or in the frequency domain, by taking adjacent frequency coefficients at the FDCT output. The values in each block are then scaled up proportionally so that the largest value sits just below full scale. This common scale factor is called the exponent and is shared by all values in the block.
 
Each value is therefore represented by a mantissa (the sample value) and an exponent. The bit allocation is derived from a model of the human auditory system (HAS), and data-rate compression is achieved because the exponent is sent only once per data block. Coding performance is good, but the noise is related to the signal content; masking helps keep this noise inaudible.
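A minimal sketch of the block floating point idea, assuming 16-bit samples, blocks of 12 values and an 8-bit mantissa chosen purely for illustration:

```python
# Hedged sketch: block floating point representation of a block of samples.

def block_float_encode(block, mantissa_bits=8, sample_bits=16):
    """Return (exponent, mantissas): one shared exponent per block."""
    peak = max(abs(s) for s in block) or 1
    # Number of left shifts that still keeps the peak below full scale.
    exponent = 0
    while peak << (exponent + 1) < (1 << (sample_bits - 1)):
        exponent += 1
    shift = sample_bits - mantissa_bits        # bits dropped after scaling up
    mantissas = [(s << exponent) >> shift for s in block]
    return exponent, mantissas

def block_float_decode(exponent, mantissas, mantissa_bits=8, sample_bits=16):
    shift = sample_bits - mantissa_bits
    return [(m << shift) >> exponent for m in mantissas]

block = [12, -7, 30, 25, -18, 3, 0, 9, -2, 15, 27, -30]
exp, mant = block_float_encode(block)
print(exp, mant)
print(block_float_decode(exp, mant))   # approximately the original block
```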
 
Lossy data compression
 
"The way to achieve lossy data compression is to combine two or more processing techniques to take advantage of HAS's inability to detect other high-amplitude specific spectral components. In this way, high-performance data compression schemes and much higher compression ratios from 2:1 to 20:1 can be obtained, depending on the complexity of the encoding/decoding process and audio quality requirements.
 
Lossy data compression systems use perceptual coding. The basic principle is to discard everything below the masking threshold curve, eliminating the perceptual redundancy in the audio signal; such systems are therefore also called perceptually lossless. Perceptually lossless compression is made feasible by combining several techniques, such as:
 
1. Time-domain and frequency-domain masking of signal components
 
2. Masking of quantization noise by each audible tone
 
Enough bits are allocated to ensure that the quantization noise level always stays below the masking curve. At frequencies close to an audible signal, an SNR of 20 or 30 dB is acceptable.
 
  3. Joint coding
 
This technique exploits the redundancy across the channels of a multi-channel audio system. A large amount of identical data is typically present in all channels, so compression is obtained by encoding the shared data once and signalling to the decoder that it must be repeated in the other channels.
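One common joint-coding scheme is mid/side stereo coding; the hedged sketch below shows why a highly correlated channel pair becomes cheap to code. It illustrates the principle only, not the MPEG joint-stereo syntax.

```python
# Hedged sketch: mid/side joint stereo coding.
# Assumption: the two channels are highly correlated, so the side signal is
# small and can be coded with few bits.
import numpy as np

def ms_encode(left, right):
    mid = (left + right) / 2.0    # shared content, coded once at full precision
    side = (left - right) / 2.0   # difference, usually small for similar channels
    return mid, side

def ms_decode(mid, side):
    return mid + side, mid - side  # exact reconstruction of left and right

left = np.sin(np.linspace(0, 2 * np.pi, 8))
right = 0.98 * left               # nearly identical channel
mid, side = ms_encode(left, right)
l2, r2 = ms_decode(mid, side)
print(np.max(np.abs(side)))       # small: the side channel needs few bits
print(np.allclose(left, l2), np.allclose(right, r2))
```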
 
The realization of the audio encoding process
 
The most important masking effects appear in the frequency domain. To take advantage of this property, the audio spectrum is decomposed into multiple sub-bands with a time and frequency resolution matched to the critical bandwidths of the HAS.
 
The perceptual encoder is composed of the following parts:
 
   1. Multi-band filter
 
This is usually called a filter bank; its role is to decompose the frequency spectrum into sub-bands.
 
2. Bit allocator
 
This estimates the masking threshold and allocates bits based on the spectral energy of the audio signal and the psychoacoustic model.
 
  3. Conversion and quantization processor
 
   4. Data multiplexer
 
This receives the quantized data and adds side information (bit allocation and scale factor information) needed for the decoding process.
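To show how the four stages connect, here is a toy end-to-end sketch. Every stage is a crude stand-in with made-up thresholds and names, meant only to illustrate the data flow, not a real codec.

```python
# Hedged sketch: the four encoder stages wired together on a toy scale.
import numpy as np

def filter_bank(frame, n_bands=8):
    # 1. Multi-band filter: split the spectrum into equal-width groups of FFT bins.
    return np.array_split(np.fft.rfft(frame), n_bands)

def psychoacoustic_model(bands, floor_db=-60.0):
    # Masking threshold per band: here simply a fixed 20 dB below the band energy.
    energies_db = [10 * np.log10(np.sum(np.abs(b) ** 2) + 1e-12) for b in bands]
    return [max(e - 20.0, floor_db) for e in energies_db]

def allocate_bits(bands, thresholds):
    # 2. Bit allocator: roughly one bit per 6 dB of signal-to-mask ratio.
    energies_db = [10 * np.log10(np.sum(np.abs(b) ** 2) + 1e-12) for b in bands]
    return [max(0, int(np.ceil((e - t) / 6.0))) for e, t in zip(energies_db, thresholds)]

def quantize(bands, bits):
    # 3. Conversion and quantization: normalize each band by its peak, then round.
    out, scale_factors = [], []
    for band, b in zip(bands, bits):
        peak = np.max(np.abs(band)) or 1.0
        levels = 2 ** max(b, 1)
        out.append(np.round(band / peak * (levels - 1)))
        scale_factors.append(peak)
    return out, scale_factors

def multiplex(quantized, bits, scale_factors):
    # 4. Data multiplexer: bundle coded data with its side information.
    return {"bits": bits, "scale_factors": scale_factors, "data": quantized}

frame = np.sin(2 * np.pi * 440 * np.arange(384) / 48000)   # one 384-sample test frame
bands = filter_bank(frame)
thresholds = psychoacoustic_model(bands)
bits = allocate_bits(bands, thresholds)
quantized, scale_factors = quantize(bands, bits)
stream = multiplex(quantized, bits, scale_factors)
print(stream["bits"])
```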
 
3.1 Filter bank (there are three types of filter bank)
 
(1) Sub-band group. The signal spectrum is divided into frequency sub-bands of equal width. This is similar to the frequency analysis performed by the HAS, which divides the audio spectrum into critical bands. The width of a critical band is variable: below 500 Hz it is about 100 Hz, and around 10 kHz and above it increases to several kHz, so a sub-band below 500 Hz spans several critical bands. The sub-band filters overlap slightly and usually operate on adjacent time samples. Each sub-band signal is then uniformly quantized using that sub-band's bit allocation so as to maintain a positive mask-to-noise ratio (MNR); the ratio is positive when the masking curve lies above the noise curve.
 
(2) Conversion group. The modified DCT (MDCT) is usually used to convert the time-domain audio signal into a large number of sub-bands (256 to 1024). This filter bank also has some overlap (see the MDCT sketch after this list).
 
(3) Hybrid filter bank. This consists of sub-band filters followed by MDCT filters; the combination provides finer frequency resolution.
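A hedged sketch of the conversion group: a direct (slow) MDCT of one windowed block. The 512-sample block length and the sine window are illustrative choices.

```python
# Hedged sketch: direct MDCT of one block, mapping 2N time samples to N coefficients.
import numpy as np

def mdct(block):
    """X[k] = sum_n x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)), k = 0..N-1."""
    two_n = len(block)
    n = two_n // 2
    k = np.arange(n)[:, None]          # output coefficient index
    t = np.arange(two_n)[None, :]      # input sample index
    basis = np.cos(np.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
    return basis @ block

# Successive blocks overlap by half their length, so a 512-sample window
# yields 256 coefficients per hop.
x = np.sin(2 * np.pi * 1000 * np.arange(512) / 48000)
window = np.sin(np.pi * (np.arange(512) + 0.5) / 512)   # common sine window
coeffs = mdct(window * x)
print(coeffs.shape)   # (256,)
```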
 
3.2 Perceptual model, masking curve and bit allocation
 
A precise psychoacoustic analysis of the input PCM signal is performed on its frequency and energy content, the tool used being the fast Fourier transform. The masking curve is computed from the hearing threshold and the frequency-masking properties of the HAS; its shape and level depend on the signal content. The difference between the signal's spectral envelope and the masking curve determines the maximum number of bits required to encode all spectral components of the audio signal (on the basis of roughly 6 dB per bit). This bit allocation process ensures that the quantization noise stays below the audibility threshold.
 
The masking threshold of each sub-band is derived from the masking curve. Each threshold sets the maximum quantization noise energy acceptable in that sub-band; at the threshold, the noise of a perceptually lossless compression system begins to become audible.
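A small sketch of the bit allocation arithmetic, using the roughly 6 dB per bit rule mentioned above; the sub-band levels are made-up example numbers.

```python
# Hedged sketch: per-sub-band signal-to-mask ratio converted into a bit count.
import math

def bits_for_subband(signal_db, threshold_db, db_per_bit=6.02):
    """Bits needed to push quantization noise below the masking threshold."""
    smr = signal_db - threshold_db          # signal-to-mask ratio in dB
    return max(0, math.ceil(smr / db_per_bit))

# Example: three sub-bands with different masking thresholds.
for sig, thr in [(60.0, 30.0), (45.0, 42.0), (20.0, 35.0)]:
    print(sig, thr, "->", bits_for_subband(sig, thr), "bits")
# 60/30 -> 5 bits, 45/42 -> 1 bit, 20/35 -> 0 bits (fully masked sub-band)
```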
 
  3.3 Converter and quantizer
 
The output samples from each sub-band filter are converted and quantized in one of two ways:
 
(1) Block floating point system. The maximum value in each data block is normalized to full scale. The block scale factor is transmitted in the data stream, and the decoder uses it to scale all values in the block back down. In MPEG Layer I the data block consists of 12 consecutive samples, and an audio frame consists of 384 samples (32 sub-bands of 12 samples each). The values in every data block are then quantized with a step size determined by the bit allocator.
 
(2) Noise allocation and scalar quantization. In the previous method, each sub-band has its own scale factor.
 
This second method uses the same scale factor for several frequency bands of approximately critical bandwidth. The value of this scale factor is not derived from a standard procedure but is part of the noise allocation process, and no explicit bit allocation is performed. After the masking threshold of each sub-band is estimated, the scale factor modifies all quantization step sizes within its scale factor band, shaping the quantization noise to better follow the threshold across frequency. Non-uniform quantization then adapts the quantization noise to the signal amplitude in an optimized way, and Huffman coding of the spectral values provides further data compression.
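As a sketch of non-uniform quantization within a scale factor band, the snippet below uses the 3/4 power law employed by MPEG audio coders; the scale factor value is an arbitrary illustrative choice.

```python
# Hedged sketch: non-uniform (power-law) quantization of spectral values.
import numpy as np

def quantize_band(spectrum, scale_factor):
    # Larger values are quantized relatively more coarsely than small ones.
    return np.sign(spectrum) * np.round(np.abs(spectrum / scale_factor) ** 0.75)

def dequantize_band(codes, scale_factor):
    return np.sign(codes) * (np.abs(codes) ** (4.0 / 3.0)) * scale_factor

band = np.array([0.02, -0.5, 3.0, -7.5, 0.0, 1.2])
codes = quantize_band(band, scale_factor=0.1)
print(codes)                          # small integers, suitable for Huffman coding
print(dequantize_band(codes, 0.1))    # approximate reconstruction
```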
 
  3.4 Data multiplexer
 
A block of 12 data samples output from each quantizer is multiplexed with the corresponding scale factors and bit allocation information to form an audio frame in the encoded bit stream. Optional ancillary data can also be inserted into the bit stream; the MPEG standard does not specify what kinds of data may be carried there or how they are formatted.
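A toy sketch of the multiplexing step: packing bit allocation, scale factor indices and quantized samples into one byte string. The field widths and layout are illustrative assumptions, not the actual MPEG frame syntax.

```python
# Hedged sketch: packing one frame's side information and samples into bytes.
import struct

def pack_frame(bit_alloc, scale_idx, samples):
    """bit_alloc/scale_idx: one small int per sub-band; samples: ints per block."""
    header = struct.pack("<H", len(bit_alloc))                 # number of sub-bands
    side_info = bytes(bit_alloc) + bytes(scale_idx)            # one byte per field
    payload = b"".join(struct.pack("<h", s) for s in samples)  # 16-bit samples
    return header + side_info + payload

frame = pack_frame(bit_alloc=[4, 3, 0, 2],
                   scale_idx=[12, 20, 0, 31],
                   samples=[105, -98, 77, 0, -12, 33])
print(len(frame), "bytes in this toy frame")
```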

