EECS 225d (3 units)
Audio Signal Processing in Humans and Machines
Initial class (Jan. 19): 299 Cory, W 3:30-5
Afterwards: ICSI, 1947 Center Street, 6th floor, MW 3:30-5
Spring Semester, 2005
Professor Morgan

The focus of the course is on engineering models for speech and audio processing. These models are used to design systems for analysis, synthesis, and recognition. For each of these topics there will be an emphasis on physiological and psychoacoustic properties of the human auditory and speech generation systems, particularly from an engineer's perspective: how can we make use of knowledge about these natural systems when we design artificial ones?

Topics include: an introduction to pattern recognition; speech coding, synthesis, and recognition; models of speech production and perception; signal processing for speech analysis; pitch perception and auditory spectral analysis, with applications to speech processing; a historical survey of speech synthesizers from the 18th century to the present; robustness to environmental variation in speech recognizers; vocoders; and statistical speech recognition, including an introduction to Hidden Markov Model and Neural Network approaches.

Prerequisites: EE123 or equivalent and Stat 200A or equivalent; or graduate standing and consent of the instructor.

Required text: B. Gold and N. Morgan, "Speech and Audio Signal Processing," Wiley, 1999.

Supplementary reading:

EECS 225d Tentative Schedule, Spring 2005 [NOTE: THIS HAS BEEN UPDATED AS OF JAN 15]

HISTORICAL BACKGROUND
WEEK 1:
1. Overall Introduction Jan 19
WEEK 2:
2. Brief History of Synthetic Audio Jan 24
3. Introduction to Speech Analysis and Synthesis Jan 26
WEEK 3:
4. Brief History of Automatic Speech Recognition Jan 31
5. Speech Recognition Overview; Quick DSP Review Feb 2

MATHEMATICAL BACKGROUND
WEEK 4:
6. Pattern Classification Feb 7
7. Statistical Pattern Classification Feb 9
WEEK 5:
8. Basic Acoustics Feb 14
9. Filter Bank, Cepstral and LPC Analysis Feb 16

ENGINEERING APPLICATIONS
WEEK 6:
President's Day Holiday Feb 21
10. Feature Extraction I Feb 23 (Hermansky)
WEEK 7:
11. Feature Extraction II Feb 28
12. Deterministic Sequence Recognition Mar 2
WEEK 8:
13. Statistical Sequence Recognition Mar 7 (Mirghafori)
14. Statistical Model Training Mar 9 (Mirghafori)
WEEK 9:
15. Complete ASR Systems Mar 14 (Mirghafori)
16. Linguistic Categories for ASR Mar 16 (Ohala and Wooters)
WEEK 10:
Spring Break March 21-25
WEEK 11:
17. Speaker Adaptation and other Transformations Mar 28 (Peskin)
18. Speaker Verification Mar 30 (Mirghafori)
WEEK 12:
19. Discriminant Acoustic Probability Estimation Apr 4
20. Speech Synthesis Apr 6 (Silverman)
WEEK 13:
21. Speech Recognition on Cell Phones Apr 11 (Cohen, CTO of VoiceSignal)
22. Speech Coding Apr 13

HUMAN PROCESSING
WEEK 14:
23. Auditory Pathway (chapter 14) Apr 18 (Ghitza)
24. Psychophysics (chapter 15) Apr 20 (Ghitza)
WEEK 15:
25. Pitch Perception (chapter 16) Apr 25 (Ghitza)
26. Speech Perception (chapter 17) Apr 27 (Ghitza)
WEEK 16:
27. Audio Compression May 2 (John Strawn)
28. TBD May 4
WEEK 17:
29. Student presentations (double session, starting at 2) May 9



Maintained by:
N. Morgan
morgan@ICSI.Berkeley.EDU
Last updated: 2005/01/15 16:21:47