Audio and Speech Processing

Tuesday, February 12, 2008

Notes - Feb 12, 2008

DSP terminology

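Precision: word length and bus width
Resolution: smallest non-zero magnitude that can be represented
Dynamic Range: ratio of the maximum to the minimum signal (usually expressed in dB)
Overflow & Saturation: what happens when a result exceeds the representable range (wrap-around vs. clamping to the limit)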

Analog to Digital

Analog ---> Sampling ---> Quantisation ---> Encoder ---> Digital Signal

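Before quantisation, the sampled data can be converted back to the original analog signal without loss, as long as the Nyquist theorem is followed (the sampling rate must be more than twice the highest frequency in the signal).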

Friday, February 01, 2008

$$H(z) = \frac{b_0 z^{-2} + b_1 z^{-1} + b_2}{a_0 z^{-2} + a_1 z^{-1} + a_2}$$
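Here is a minimal Direct Form I sketch of this second-order filter in C. It keeps the coefficient ordering used in H(z) above, so b₂ and a₂ multiply the current sample and b₀ and a₀ the sample two steps back. Cross-multiplying gives the difference equation a₂y[n] + a₁y[n−1] + a₀y[n−2] = b₂x[n] + b₁x[n−1] + b₀x[n−2]. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Direct Form I biquad with the ordering of H(z) above:
   a2*y[n] + a1*y[n-1] + a0*y[n-2] = b2*x[n] + b1*x[n-1] + b0*x[n-2] */
typedef struct {
    double b[3], a[3];      /* b[0..2] = b0..b2, a[0..2] = a0..a2 */
    double x1, x2, y1, y2;  /* previous two inputs and outputs */
} Biquad;

void biquad_init(Biquad *f, const double b[3], const double a[3])
{
    assert(a[2] != 0.0);  /* a2 normalises the current output */
    for (int i = 0; i < 3; i++) {
        f->b[i] = b[i];
        f->a[i] = a[i];
    }
    f->x1 = f->x2 = f->y1 = f->y2 = 0.0;
}

double biquad_process(Biquad *f, double x)
{
    double y = (f->b[2] * x + f->b[1] * f->x1 + f->b[0] * f->x2
                - f->a[1] * f->y1 - f->a[0] * f->y2) / f->a[2];
    f->x2 = f->x1;  /* shift the delay lines */
    f->x1 = x;
    f->y2 = f->y1;
    f->y1 = y;
    return y;
}
```

With b = {0, 0, 1} and a = {0, −0.5, 1}, the filter reduces to y[n] = x[n] + 0.5·y[n−1], whose impulse response is 1, 0.5, 0.25, ...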

Thursday, January 31, 2008

ADAPTIVE NOISE CANCELLATION AND SIGNAL SEPARATION WITH ...

http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/vgerven/phd/phd_twopage_nodutch.ps&gz

Tuesday, October 03, 2006

Using Sox as a library to read audio files

OK, so I finally decided to test what I have been studying in audio processing using C/C++. After searching the web, I figured out that sox is pretty cool: it already has code to take care of most file operations. I thought, let's try to reuse some of the source files. I tried and failed, running into compiler and linker errors :-(

But wait - as it turns out, sox has a library, which is pretty cool, and it's easy to use (I had it working within a few hours). Here are my notes from using sox as a library both under Linux and under Cygwin on Windows XP (my laptop has XP and I wanted to watch TV while trying to compile the damn thing).

Trying to compile Sox for Linux
I ran into a couple of issues. The usage of libst.a as documented in libst.txt was as follows:

ft_t st_open_input(const char *path, const st_signalinfo_t *info, const char *filetype);
Note:
path = "-": data is read from stdin
info = NULL: the header is checked to provide the file type
filetype = NULL: the header/file extension is used to figure out the type

ft_t st_open_output(const char *path, const st_signalinfo_t *info, const char *filetype, const char *comment);

As it turns out, in the latest library the open APIs have been renamed to "st_open_read" and "st_open_write".

Then I ran into linker errors because the required Ogg Vorbis libraries were not being linked in:

libst.a(vorbis.o)(.text+0x99): In function `st_vorbisstartread':
/home/users/v/vi/vivekkumar/sox/audio/src/vorbis.c:116: undefined reference to `ov_open_callbacks'
libst.a(vorbis.o)(.text+0xb7):/home/users/v/vi/vivekkumar/sox/audio/src/vorbis.c:124: undefined reference to `ov_info'
libst.a(vorbis.o)(.text+0xcb):/home/users/v/vi/vivekkumar/sox/audio/src/vorbis.c:125: undefined reference to `ov_comment'
/home/users/v/vi/vivekkumar/sox/audio/src/vorbis.c:376: undefined reference to `vorbis_encode_init_vbr'

Using the following line, I was finally able to compile everything:
gcc -lm -lvorbisfile -lvorbisenc dummy.c -o dummy libst.a

Compiling Sox library with Cygwin

I later got an __assert linker error while trying to compile the dummy project. Some googling showed that it happens when some objects are compiled with -mno-cygwin and some without it. As it turns out, by default the sox project in Cygwin is configured to compile with the "-mno-cygwin" option, so the dummy project has to be compiled with it too:

gcc -lm -mno-cygwin dummy.c -o dummy libst.a

Saturday, September 23, 2006

FFT Implementations

FFTW - Fastest Fourier Transform in the West, on SourceForge

Embedded Systems article on FFT
http://www.embedded.com/columns/showArticle.jhtml?articleID=172302493

Embedded Systems article on voice enhancements

Overcome the technical challenges of typical voice enhancement devices
By Perry Peiyuan He and Roman Anthony Dyba, Freescale Semiconductor Inc., Courtesy of Network Systems Designline
Sep 6 2006 (2:21 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=192501961

Friday, August 18, 2006

Understanding the measurement of sound intensity in decibels (dB)

Sound intensity is frequently expressed in decibels (dB). The decibel (1/10 of a bel) was named in honour of Alexander Graham Bell. A level in decibels is 10 times the base-10 logarithm of the ratio between two sound intensities, or equivalently between two squared sound pressures.

The internationally recognised reference intensity for sound is 10⁻¹² watts per m², which corresponds to a sound pressure of 2 × 10⁻⁵ N/m² (pascal). For a pure tone at 1000 Hz, this is close to the threshold of hearing for young, healthy adults.
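Intensity is proportional to the square of the sound pressure. As a result, the intensity and pressure forms of the level give the same number of decibels, with I₀ = 10⁻¹² W/m² and p₀ = 2 × 10⁻⁵ Pa:

$$L = 10\log_{10}\frac{I}{I_0} = 10\log_{10}\frac{p^2}{p_0^2} = 20\log_{10}\frac{p}{p_0}\ \text{dB}$$

For example, p = 2 × 10⁻¹ Pa gives 20 log₁₀(10⁴) = 80 dB.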

Source	Level	Sound pressure (N/m² or Pa)	Relative sound pressure
Jet aeroplane, 80 ft from tail; hair cell damage	120 dB	20	10⁶
Busy traffic, shouting	80 dB	2 × 10⁻¹	10⁴
Conversational speech	60 dB	2 × 10⁻²	10³
Residential area at night	40 dB	2 × 10⁻³	10²
Whisper at 5 ft	20 dB	2 × 10⁻⁴	10¹
Threshold of hearing, 1000 Hz, young adult	0 dB	2 × 10⁻⁵	1

Wednesday, July 19, 2006

MPEG Audio

MPEG 1 - ISO/IEC 11172-3

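Up to about 1.5 Mbit/s
Layer 3 (MP3)
Layer 1 & 2 frames are self-contained and can be split at any frame boundary (unlike Layer 3, they do not borrow bits from other frames)
A Layer 3 decoder can also decode Layer 1 & Layer 2 data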

MPEG 2

Supports lower sampling rates (16, 22.05 and 24 kHz)

Compliance - ISO/IEC 13818-4 : 2004(E)

Steps for an MP3 Encoder

1. Break the stream down into frames.
2. Split each frame into subbands to determine its "spectral energy distribution."
3. Given the encoding bitrate, compare the frequency spread of each frame against a psychoacoustic model (reference table) and allocate bits accordingly.
* Borrow bits from the bit reservoir if needed.
4. Huffman coding (lossless): table lookups substitute shorter codes for frequent values.
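Step 3 can be illustrated with a greedy allocator of the kind used for MPEG-1 Layer I/II bit allocation. It keeps giving one more bit to the subband whose quantisation noise is most audible until the budget runs out. This is a simplified sketch, not the Layer III rate/distortion loops or the bit reservoir. The signal-to-mask ratios are assumed to come from a psychoacoustic model, and the 6.02 dB-per-bit rule of thumb is also an assumption.

```c
#include <assert.h>

/* Greedy bit allocation sketch (assumed model, not the actual MP3 loops):
   smr_db[i] is a hypothetical signal-to-mask ratio for subband i, and each
   extra bit is taken to lower that subband's noise by about 6.02 dB. */
void allocate_bits(const double smr_db[], int bits[], int n, int budget)
{
    assert(n > 0 && budget >= 0);
    for (int i = 0; i < n; i++)
        bits[i] = 0;

    while (budget > 0) {
        /* Find the subband with the largest noise-to-mask ratio. */
        int worst = -1;
        double worst_nmr = 0.0;
        for (int i = 0; i < n; i++) {
            double nmr = smr_db[i] - 6.02 * bits[i];
            if (nmr > worst_nmr) {
                worst_nmr = nmr;
                worst = i;
            }
        }
        if (worst < 0)
            break;  /* noise is below the mask everywhere: stop spending bits */
        bits[worst]++;
        budget--;
    }
}
```

With SMRs of {20, 10, 0, −5} dB and a budget of 5 bits, the sketch allocates {3, 2, 0, 0}.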

Steps for an MP3 Decoder

1. Examine the bitstream of header and data frames for spectral components and the side information stored alongside them.
2. Reconstruct this information to create an audio signal.

MP3 - Imperceptibility

1. While certain frequencies may not be distinctly perceptible, their cumulative effect contributes to the overall "presence" and ambience of recorded music.
2. Joint Stereo - exploits the similarity between the left and right channels (e.g. mid/side coding) and the ear's weaker sense of direction at high frequencies (intensity stereo)
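Mid/side coding is one joint-stereo technique. When the left and right channels are similar, the side signal is small and cheap to code. Below is a minimal C sketch using a 1/2 scaling; Layer III's M/S mode scales by 1/√2 instead. The function names are hypothetical.

```c
#include <assert.h>

/* Left/right -> mid/side: mid carries what both channels share,
   side carries only their difference. */
void lr_to_ms(const double *left, const double *right,
              double *mid, double *side, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        mid[i]  = (left[i] + right[i]) / 2.0;
        side[i] = (left[i] - right[i]) / 2.0;
    }
}

/* Mid/side -> left/right: exact inverse of lr_to_ms. */
void ms_to_lr(const double *mid, const double *side,
              double *left, double *right, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        left[i]  = mid[i] + side[i];
        right[i] = mid[i] - side[i];
    }
}
```
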

Misc Info

* MPEG-1 Layer II/III frame - always 1,152 samples per frame
o 44.1 kHz sampling = 26.12 ms per frame (about 38.28 fps, aka frames per second)
* FrameSize (bytes) = 144 * BitRate / SampleRate + Padding (MPEG-1 Layer III)
o 44.1 kHz sampling + 128 kbps -> FrameSize = 144 * 128000 / 44100 + 0 = 417.96, truncated to 417 bytes
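Here is the frame arithmetic above as a C sketch. The function names are hypothetical, and the constants 144 and 1,152 assume MPEG-1 Layer III.

```c
#include <assert.h>

#define SAMPLES_PER_FRAME 1152  /* MPEG-1 Layer II and Layer III */

/* MPEG-1 Layer III frame length in bytes, header included. Integer division
   truncates to whole bytes; the padding bit (0 or 1) adds one byte to some
   frames so that the average bitrate is met. */
int mp3_frame_size(int bitrate_bps, int sample_rate_hz, int padding)
{
    assert(sample_rate_hz > 0 && (padding == 0 || padding == 1));
    return 144 * bitrate_bps / sample_rate_hz + padding;
}

/* Duration of one frame in milliseconds. */
double frame_duration_ms(int sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz;
}

/* Number of frames per second of audio. */
double frames_per_second(int sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (double)sample_rate_hz / SAMPLES_PER_FRAME;
}
```

At 128 kbps and 44.1 kHz, this gives 417 bytes per frame (418 with padding), 26.12 ms per frame and 38.28 frames per second.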

Links

* How MP3 Works: Inside the Codec - http://www.mp3-converter.com/mp3codec/
* For more information on psychoacoustics, read any of the excellent papers on the subject at www.cpl.umn.edu/auditory.htm
* www.mp3-tech.org/
* www.iso.ch.

Human Ear

The human ear is largely insensitive to the location of the source of sounds at the very low and very high ends of the frequency spectrum, which is why a subwoofer can be placed almost anywhere.

Peak sensitivity: ~2 kHz to 4 kHz
Dynamic range: ~90 dB

Friday, April 07, 2006

An efficient method of Huffman decoding for MPEG-2 AAC and its performance analysis

Jae-Sik Lee, Jong-Hoon Jeong, Tae-Gyu Chang
Sch. of Electr. & Electron. Eng., Chung-Ang Univ., Seoul, South Korea

This paper appears in: Speech and Audio Processing, IEEE Transactions on
Publication Date: Nov. 2005
Volume: 13, Issue: 6
On page(s): 1206-1209
ISSN: 1063-6676
INSPEC Accession Number: 8622572
Digital Object Identifier: 10.1109/TSA.2005.852989
Posted online: 2005-10-17 08:45:36.0

Abstract
This paper presents a new method for Huffman decoding specially designed for the MPEG-2 AAC audio. The method significantly enhances the processing efficiency of the conventional Huffman decoding realized with the ordinary binary tree search method. A data structure of one-dimensional array is newly designed based on the numerical interpretation of the incoming bit stream and its utilization for the offset oriented nodes allocation. The Huffman tree implemented with the proposed data structure allows the direct computation of the branching location, eliminating the need for the pipeline-violating "compare and jump" instructions. The experimental results show the average performance enhancement of 67% and 285%, compared to those of the conventional binary tree search method and the sequential search method, respectively. The proposed method also shows slightly better processing efficiency, while requiring much less memory space, compared even with those up-to-date efficient search methods of Hashemian and its variants.
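Here is a small C sketch of Huffman decoding over a tree stored in a one-dimensional array, to make the idea of computing the branch location directly more concrete. Each step adds the incoming bit to an offset instead of comparing the input against stored code words; only the leaf test remains. The array layout, the three-symbol code (A = 0, B = 10, C = 11) and the names are assumptions for illustration. This is not the paper's data structure or its tuned implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat-array layout for the code A = 0, B = 10, C = 11:
     entry >= 0 : internal node; children at i + entry (bit 0)
                  and i + entry + 1 (bit 1)
     entry <  0 : leaf holding the symbol -entry (so symbol 0 cannot be stored) */
static const int tree[] = { 1, -'A', 1, -'B', -'C' };
#define TREE_SIZE (sizeof tree / sizeof tree[0])

typedef struct {
    const unsigned char *data;
    size_t nbytes;
    size_t bitpos;  /* next bit to read, MSB first */
} BitReader;

static int read_bit(BitReader *br)
{
    assert(br->bitpos < 8 * br->nbytes);
    int bit = (br->data[br->bitpos / 8] >> (7 - br->bitpos % 8)) & 1;
    br->bitpos++;
    return bit;
}

/* Walk from the root: the next index is computed arithmetically from the
   current entry and the incoming bit. */
static char decode_symbol(BitReader *br)
{
    size_t i = 0;
    while (tree[i] >= 0) {
        i += (size_t)tree[i] + (size_t)read_bit(br);
        assert(i < TREE_SIZE);
    }
    return (char)(-tree[i]);
}
```

Decoding the single byte 0x58 (bits 0 10 11 0 ...) yields A, B, C, A after six bits.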