EECS 225d (3 units)
Audio Signal Processing in Humans and Machines
299 Cory, MWF 2-3
Spring Semester, 2001
Professors Morgan and Gold

The focus of the course is on engineering models for speech and audio processing. These models are used to design systems for analysis, synthesis, and recognition. For each of these topics there will be an emphasis on physiological and psychoacoustic properties of the human auditory and speech generation systems, particularly from an engineer's perspective: how can we make use of knowledge about these natural systems when we design artificial ones?

Topics include: an introduction to pattern recognition; speech coding, synthesis, and recognition; models of speech and music production and perception; signal processing for speech analysis; pitch perception and auditory spectral analysis with applications to speech and music; a historical survey of speech synthesizers from the 18th century to the present; robustness to environmental variation in speech recognizers; vocoders and music synthesizers; and statistical speech recognition, including an introduction to Hidden Markov Model and Neural Network approaches.

Prerequisites: EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor

Required text: B. Gold and N. Morgan, "Speech and Audio Signal Processing", Wiley Press, 1999.

Supplementary reading:

EECS 225d Tentative Schedule, Spring 2001

HISTORICAL BACKGROUND
WEEK 1:
1. Overall Introduction (Morgan) Jan 17
2. Brief History of Automatic Speech Recognition (ASR) (Morgan) Jan 19
WEEK 2:
3. ASR Background, continued (Morgan) Jan 22
4. Early History of Synthetic Audio (Gold) Jan 24
5. Speech Analysis/Synthesis Overview (Gold) Jan 26

MATHEMATICAL BACKGROUND
WEEK 3:
6. Speech Recognition Overview (Morgan) Jan 29
7. DSP Refresher (Gold) Jan 31
8. DSP Refresher, continued (Gold) Feb 2
WEEK 4:
9. Pattern Classification (Morgan) Feb 5
10. Statistical Pattern Classification (Morgan) Feb 7
11. Expectation Maximization (EM) (Morgan) Feb 9

ACOUSTICS
WEEK 5:
12. Wave Basics (Gold) Feb 12
13. Speech Production Models (Gold) Feb 14
14. Music Production Models (Gold) Feb 16
WEEK 6:
President's Day Holiday Feb 19

AUDITORY PERCEPTION
15. Room Acoustics (Morgan) Feb 21
16. Ear Physiology (Gold) Feb 23
WEEK 7:
17. Psychoacoustics (Gold) Feb 26
18. Models of Pitch Perception (Gold) Feb 28
19. Models of Speech Perception (Gold) March 2 at 1 PM
19a. Human Speech Recognition (Morgan) March 2 at 2 PM (second lecture)

SPEECH FEATURES
WEEK 8:
20. The Auditory System as a Filter Bank (Gold) March 5
21. Filter Banks and Cepstral Analysis (Gold) Mar 7
22. LPC for Speech Analysis (Morgan) Mar 9

SYNTHESIS AND CODING
WEEK 9:
23. Speech Synthesis (Gold) Mar 12
24. Pitch Detection of Speech and Music (Gold) Mar 14
25. Channel Vocoders and Predictive Coding (Gold) Mar 16

WEEK 10:
26. MIDTERM (on material from WEEKS 1-8) March 19
27. Low Rate Coding (Gold) Mar 21
28. High Rate Coding (CELP, STC, etc.) (Gold) Mar 23

WEEK 11:
Spring Break March 26-30

WEEK 12:
29. Music Synthesis (Gold) Apr 2
30. Audio Transformations (Gold) Apr 4

AUTOMATIC SPEECH RECOGNITION
31. Acoustic Phonetics: A Brief Introduction (Plauche) Apr 6

WEEK 13:
32. Feature Extraction for ASR (Morgan) Apr 9
33. Deterministic Sequence Recognition (Morgan) Apr 11
34. Statistical Sequence Recognition (Morgan) Apr 13

WEEK 14:
35. HMM Training 1 (Morgan) Apr 16
36. HMM Training 2 (Morgan) Apr 18
37. Probability Estimation (Morgan) Apr 20

WEEK 15:
38. Discriminant Training (Morgan) Apr 23
39. Complete ASR Systems (Morgan) Apr 25
40. MPEG Audio (Lazzaro) Apr 27

WEEK 16:
41. Speaker Verification (Morgan) Apr 30
42. Q and A, Some Current Research Topics (Morgan) May 2
43. Student presentations May 4
WEEK 17:
44. No class (double-length class on May 4)



Maintained by:
N. Morgan
morgan@ICSI.Berkeley.EDU
$Date: 2001/01/17 18:05:47 $