EECS 225d (3 units)
Audio Signal Processing in Humans and Machines
Initial class (Jan. 16): 299 Cory, Tu 3:30-5
Afterwards: ICSI, 1947 Center Street, 6th floor, TuTh 3:30-5
Spring Semester, 2007
Professor Morgan

The focus of the course is on engineering models for speech and audio processing. These models are used to design systems for analysis, synthesis, and recognition. For each of these topics there will be an emphasis on physiological and psychoacoustic properties of the human auditory and speech generation systems, particularly from an engineer's perspective: how can we make use of knowledge about these natural systems when we design artificial ones?

Topics typically include: an introduction to pattern recognition; speech coding, synthesis, and recognition; models of speech production and perception; signal processing for speech analysis; pitch perception and auditory spectral analysis, with applications to speech processing; a historical survey of speech synthesizers from the 18th century to the present; robustness to environmental variation in speech recognizers; vocoders; and statistical speech recognition, including introductions to Hidden Markov Model and neural network approaches.

This year (2007), the course has been modified to include time-frequency analysis and array processing. Speech synthesis and coding will not be covered in depth, although projects in these areas are acceptable for the required work. Prerequisites: EE123 or equivalent and Stat 200A or equivalent, or graduate standing and consent of the instructor.

Required text: B. Gold and N. Morgan, "Speech and Audio Signal Processing," Wiley, 1999.

Supplementary reading:

EECS 225d Tentative Schedule, Spring 2007

HISTORICAL BACKGROUND
WEEK 1:
1. Overall Introduction Jan 16 (Morgan)
2. Brief History of Synthetic Audio/Speech Analysis and Synthesis Jan 18 (Morgan)
WEEK 2:
3. Brief History of Automatic Speech Recognition Jan 23 (Morgan)
4. Speech Recognition Overview Jan 25 (Morgan)
WEEK 3:
5. Human Speech Recognition/Intro to Acoustics Jan 30 (Morgan)
6. Room Acoustics Feb 1 (Morgan)

MATHEMATICAL BACKGROUND
WEEK 4:
7. Pattern Classification Feb 6 (Gastpar)
8. Statistical Pattern Classification Feb 8 (Gastpar)
WEEK 5:
9. Time-frequency/wavelets Feb 13 (Gastpar)
10. Array Processing Feb 15 (Gastpar)

ENGINEERING APPLICATIONS
WEEK 6:
11. Musical Topics in Audio Feb 20 (Lazzaro)
12. Microphone Array Processing Feb 22 (M. Seltzer, Microsoft)
WEEK 7:
13. Filter Bank, Cepstral and LPC Analysis Feb 27 (Morgan)
14. Feature Extraction for ASR Mar 1 (Morgan)
WEEK 8:
15. Deterministic Sequence Recognition Mar 6 (Morgan)
16. Statistical Sequence Recognition Mar 8 (Mirghafori)
WEEK 9:
17. Statistical Model Training Mar 13 (Mirghafori)
18. Linguistic Categories for ASR Mar 15 (Johnson)
WEEK 10:
19. Complete ASR Systems Mar 20 (Mirghafori)
20. Speaker Verification Mar 22 (Mirghafori)
WEEK 11:
Spring Break Mar 26-30
WEEK 12:
21. Speech Synthesis Apr 3 (K. Silverman, Apple)
22. Discriminant Acoustic Probability Estimation Apr 5 (Morgan)
WEEK 13:
23. Speech Understanding Apr 10 (Hakkani-Tur)
24. Spoken Dialog Systems Apr 12 (Hakkani-Tur)

MAMMALIAN PROCESSING
WEEK 14:
25. Auditory Pathway (chapter 14) Apr 17 (Ghitza)
26. Psychophysics (chapter 15) Apr 19 (Ghitza)
WEEK 15:
27. Pitch Perception (chapter 16) Apr 24 (Ghitza)
28. Speech Perception (chapter 17) Apr 26 (Ghitza)
WEEK 16:
29. Auditory Processing in Songbirds May 1 (Gastpar)
30. More About Ancient Speech Synthesis Methods May 3 (Ohala)
WEEK 17:
31. Student Presentations (double session, starting at 2:00) May 8



Maintained by:
N. Morgan
morgan@ICSI.Berkeley.EDU
$Date: 2007/1/7 14:00:00 $