NATO Advanced Study Institute

Computational Hearing

July 1 - July 12, 1998
Il Ciocco (Tuscany), Italy

Lecture Summaries and Outlines

Anatomy of the Peripheral Auditory Pathway

Marianne Vater

The cochlea performs the first important steps in the filtering operations underlying feature detection in the auditory pathway. Its functional organization is optimized (i) for sensitive and sharply tuned hydromechanical frequency analysis, (ii) to capture timing information, and (iii) to convey sensory information in topographically ordered channels to the central auditory pathway. An understanding of cochlear function requires knowledge of cochlear structure. This session will address the structural basis of mechanical cochlear filtering operations, and describe the interface with the central auditory pathway: the auditory nerve and the descending (efferent) pathways. The description will focus on the mammalian cochlea. It will highlight the basic functional-anatomical design common to all mammals without neglecting important species differences that have evolved as specializations for specific computational tasks in adaptation to different bioacoustic environments. Particular attention is paid to those structural features that are relevant for cochlear modelling.

Selected topics:

Overview of the three-dimensional architecture of the cochlea.

Receptor cell arrangements and ultrastructure, with particular emphasis on mechanoelectrical transduction, reverse transduction, and the role of outer hair cells as local, active mechanical elements.

Functional organization of extrasensory components: basilar membrane, tectorial membrane, supporting cells (ultrastructure, geometry and material properties).

Longitudinal anatomical gradients in relation to the cochlear frequency map.

Organization of cochlear output systems.

Descending olivocochlear systems.
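As an illustration of the cochlear frequency map named in the topics above, the sketch below uses the Greenwood place-frequency function, a widely used empirical fit. This is an assumption for illustration only: the summary does not say which map the lecture uses, the constants are the commonly quoted human values, and other species need different constants. The function names are hypothetical.

```python
import math

# Commonly quoted human constants for the Greenwood function (assumed here).
A_HZ = 165.4   # scaling constant in Hz
SLOPE = 2.1    # slope constant for position expressed as a fraction of length
K = 0.88       # low-frequency correction term

def greenwood_frequency(position):
    """Characteristic frequency (Hz) at a fractional distance from the apex.

    position = 0.0 is the apex (low frequencies), 1.0 is the base.
    """
    return A_HZ * (10.0 ** (SLOPE * position) - K)

def greenwood_position(frequency_hz):
    """Inverse map: fractional distance from the apex for a given frequency."""
    return math.log10(frequency_hz / A_HZ + K) / SLOPE

if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"position {x:.2f} -> {greenwood_frequency(x):9.1f} Hz")
```

With these constants the map spans roughly 20 Hz at the apex to a little over 20 kHz at the base, and is close to logarithmic over most of the cochlear length.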

Anatomy of the Central Auditory Pathway

Jeffery A. Winer

The aim of this presentation is to provide an introductory summary of the major features of the central auditory system. The exposition will consider the biological basis of hearing from the cochlear nucleus to the cerebral cortex from an anatomical and physiological perspective. The goal is to introduce the principal nuclei and parts of the auditory system in a systematic way, and to compare their principles of organization. The exposition aims to make the auditory system accessible to the nonspecialist and interesting and provocative to those familiar with the main issues.

The emphasis of the presentation is explicitly functional and it will treat the auditory pathway as six interrelated parts, each with common as well as unique features. The parts are:

Consideration of each of these structures will begin with a brief summary of their anatomical organization. This will include an account of the types of neurons present and it will lead to a brief analysis of representative connections. The treatment of connections will be used as a basis for characterizing one or more circuits in the structure. The discussion of circuits will lead to the central issue in the exposition: relating structural arrangements to physiological results. The physiological perspective will be enhanced with a discussion of the resident neurotransmitters as well as species differences. Finally, the effects of damage on behavioral and perceptual processes will be integrated into the presentation to arrive at a larger picture of how the structure contributes more globally to auditory performance in the intact animal.

The second part of the presentation and the ensuing discussion will concentrate on thematic issues of larger significance. For example, which auditory nuclei are conserved in phylogeny? To what degree is the auditory "system" a single system or is it really several interacting neural networks? What is the significance of multiple overlapping functional representations in a nucleus or cortical area? How valid is it to use tonotopic organization as a primary feature when many parts of the auditory system have little systematic representation of frequency? Is it reasonable to speak of functions as resident to a particular part of the auditory system, or are they more widely distributed? Which features distinguish the human auditory system from that in other species? Do any principles of central auditory system organization have parallels with patterns of processing in the visual and somatic sensory system?

Mechanics and Biophysics of the Inner Ear

Jont Allen

1) Linear cochlear models
2) Micromechanical models
3) Experiments with Matlab
4) Nonlinear models
5) Experiments with Matlab

Physiology of the Auditory Periphery

Ted Evans


Encoding of stimulus parameters of simple and complex stimuli at cochlear nerve level, illustrated with a quasi-real-time LabVIEW model of the cochlea and cochlear nerve. (This is available on our ftp site, as indicated below.) Relevance of cochlear filtering for psychophysical frequency selectivity and comparisons between physiological and psychophysical measures. Consequences for auditory processing of physiological vulnerability of cochlear tuning. Pharmacological dissection of inhibition in the cochlear nucleus and its relevance for determining cochlear nucleus response properties.

Practical Session

This will be devoted to studying, and building with, LabVIEW models of the cochlear nerve, cochlear nucleus and superior olive. The models run sufficiently close to real time and allow stimuli and responses to be manipulated and perceived over a sound card. The construction of the cochlear nerve and cochlear nucleus models will be examined, along with the contribution of each stage to the final response. Then, experiments will be performed on the models, examining the effects of changing stimulus parameters for simple and complex stimuli. Finally, as time permits, students will be encouraged to build their own models from the provided "building blocks".

Physiology of the Auditory Brainstem

Eric D. Young

The auditory system differs from other sensory systems by having an elaborate and complex series of nuclei in the brainstem and midbrain. This system consists of several parallel pathways that add at least one synapse to the path from primary nucleus (cochlear nucleus) to thalamus (medial geniculate); some pathways through the brainstem add two or three synapses. One role of these extra synapses is carrying out the interaural calculations necessary for binaural sound-localization processing, but there are probably additional roles as well. In this session, the neural organization of the brainstem auditory system will be discussed using a framework of neural modeling.

The brainstem auditory system, particularly the cochlear nucleus, is one of the most intensively studied parts of the brain. The cochlear nucleus contains a heterogeneous variety of cell types which integrate incoming information in the auditory nerve in a variety of different ways. Because the cochlear nucleus has been studied so thoroughly, a great many details of the anatomical organization and electrophysiological properties of its cells are known. This knowledge allows the physiological principles underlying synaptic integration in the cochlear nucleus to be understood in both qualitative and quantitative terms. The response characteristics of cochlear nucleus cells are also well-studied and particular physiological response types have been identified as deriving from known anatomical cell types. This understanding has allowed information to be combined across several levels of analysis to achieve an understanding of the acoustic response properties of various cochlear nucleus cell types in terms of the anatomical and electrophysiological properties of the underlying neurons.

The presentation will use three well-studied principal-cell systems of the cochlear nucleus as examples of neural integration: multipolar cells and bushy cells from the ventral cochlear nucleus (VCN) and pyramidal cells of the dorsal cochlear nucleus (DCN). The presentation will build on information about anatomy and physiology provided in previous sessions (Winer and Evans) to provide a complete picture of the organization of these systems. In each case, a model of the cell will be constructed based on its anatomical and membrane-level properties. The synaptic and voltage-gated channels that are present in the cells will be described and Hodgkin-Huxley style models will be used to show how the cells' responses to sound in-vivo can be understood in terms of the lower-level properties. In the case of pyramidal cells, the discussion will be extended to show how the interneuronal network in which the cells are embedded helps in generating their response properties. Network models of the DCN will be presented to demonstrate how the complex nonlinear properties of its cells arise from a two-interneuron circuit.
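To make the idea of a Hodgkin-Huxley style action-potential generator concrete, here is a minimal single-compartment sketch. It is only an illustration under stated assumptions: the conductances, reversal potentials and rate functions are the classic squid-axon values, not parameters fitted to cochlear nucleus cells, and it contains neither a dendritic tree nor the low-threshold or transient K+ channels discussed for specific cell types. All function names are hypothetical.

```python
import math

# Classic squid-axon constants (assumed for illustration, NOT cochlear nucleus data).
C_M = 1.0                                # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3        # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387    # reversal potentials, mV

def _lin_exp(x, scale):
    """x / (1 - exp(-x / scale)), using the limit value `scale` at x = 0."""
    if abs(x) < 1e-7:
        return scale
    return x / (1.0 - math.exp(-x / scale))

def rates(v):
    """Opening (alpha) and closing (beta) rates in 1/ms for the m, h and n gates."""
    a_m = 0.1 * _lin_exp(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _lin_exp(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n

def simulate_spike_times(current, duration_ms=100.0, dt=0.01):
    """Forward-Euler integration; returns times (ms) of upward 0 mV crossings.

    current is a constant injected current density in uA/cm^2.
    """
    v = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
    # Start every gate at its steady-state value for the resting potential.
    m = a_m / (a_m + b_m)
    h = a_h / (a_h + b_h)
    n = a_n / (a_n + b_n)
    spikes = []
    for step in range(int(duration_ms / dt)):
        a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
        i_ion = (G_NA * m ** 3 * h * (v - E_NA)
                 + G_K * n ** 4 * (v - E_K)
                 + G_L * (v - E_L))
        v_new = v + dt * (current - i_ion) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        if v < 0.0 <= v_new:
            spikes.append((step + 1) * dt)
        v = v_new
    return spikes

if __name__ == "__main__":
    print("spike times (ms):", simulate_spike_times(10.0))
```

With a steady suprathreshold current this generator fires repetitively, while with no injected current it stays at rest. A cell-specific model would replace or extend the channel set and add synaptic input rather than a constant current.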

The importance of the various circuits for auditory information processing will be discussed based on 1) the representations of sound that they provide and 2) the higher auditory centers to which they project. For example, in the case of the bushy cells, their projection to the superior olivary complex suggests that their role in hearing is to support binaural processing, and their response properties can be understood on this basis. In addition, bushy cell models can be used to understand and model the properties of cells in the medial superior olivary nucleus.

All of these systems converge in the inferior colliculus, which is the input to the auditory thalamocortical system. The colliculus constructs the lower auditory representation that is presented to the thalamus and cortex in the same way as the retina constructs the initial visual image and the dorsal column nuclei construct the initial somatosensory image. The nature of the convergence of lower systems in the colliculus is still only partially understood and the properties of the representation of the acoustic environment in the colliculus have been studied mainly with reference to binaural hearing and a few other special cases. The nature of the acoustic representation in the colliculus will be briefly discussed, mainly to promote discussion as a lead-in to later sessions on the thalamus and cortex (Schreiner).

Specific topics

Introduction - presentation of the anatomy of the brainstem auditory system focusing on function. The parallel systems of the cochlear nucleus, their cell types, their auditory nerve inputs, and their projection sites. This discussion will specialize and augment the presentation by Winer as needed to support the material in the rest of the session.

Multipolar cells - structure and membrane properties of T-multipolars: synaptic inputs on the dendrites, mainly delayed-rectifier K+ channels. Default model: Hodgkin-Huxley action potential generator with dendritic tree. Implications for responses to sound of 1) a regular-firing action-potential generator; 2) filtering in the dendritic tree; 3) placement of inputs from different spontaneous rate groups. The multipolar cell (chopper neuron) as a robust rate encoder.

Bushy cells - structure and membrane properties of spherical and globular bushy cells. Importance of the low-threshold potassium channel for synaptic integration. Model: Hodgkin-Huxley action potential generator with low-threshold K+ channel. Implications of the large synapses from auditory nerve fibers and bushy-cell electrophysiological properties for encoding of auditory information. The problems of regularity and phase-locking in bushy-cell models.

If time permits, the similarities between the bushy cells and their major target neurons in the superior olivary complex will be discussed, along with the applicability of the bushy cell model to interaural-time-difference sensitivity in the medial superior olive.

The dorsal cochlear nucleus - structure and membrane properties of pyramidal cells; the transient K+ channel and pauser/buildup behavior. Contrasting properties of DCN interneurons: vertical and cartwheel cells and perhaps also VCN D-multipolars. Details of the circuitry of the DCN; the two-interneuron circuit model as an explanation for the complex nonlinear acoustic response properties of DCN principal cells. The additional circuitry in the superficial DCN and the implications of the non-auditory inputs that reach the principal cells via that route will be discussed if time permits.

Organization of the sessions. Given the amount of material, it is likely that the sessions will be organized as follows:

Introductory materials. The following review chapters will serve as background for this session.

Physiology of Auditory Thalamus and Cortex

Christoph Schreiner

1. Spectral receptive field properties

2. Temporal receptive field properties

3. Spectral-temporal receptive fields

4. Spatial organization of receptive field properties

5. Consequences of receptive field distributions in AI for simple signals

6. Effects of background noise on cortical receptive fields

7. Intrinsic connections in AI relate to functional organization

8. Comparison to other auditory fields

9. Effects of representational plasticity on AI

Frequency Analysis and Loudness

Jont Allen

1) History of modern psychophysics

2) Intensity and frequency JNDs

3) Loudness

4) Masking as loudness uncertainty (noise)

5) Experiments with Matlab

Integrating Perception and Physiology

Roy Patterson


For years, it has been common to hear psychoacousticians and physiologists argue about the sorts of models that should be used in hearing research. The psychoacousticians typically argue that any model which explains the data of an experiment is as good as any other, and they have been happy to use models in which squaring and autocorrelation are applied directly to stimulus waves. To physiologists this is anathema; models of hearing should include the processes that we know occur, and in the order that they occur. Their models, however, are usually restricted to the cochlea or a single nerve cell in the brain stem.

As we approach the 21st century, it would be nice to think that we might begin to draw these two poles together and construct a reasonably comprehensive, multi-channel, auditory model based on physiological data that is able to explain a substantial range of perceptual data associated with peripheral auditory processing. I will describe recent attempts to assemble such models with emphasis on the most important constraints at each stage of processing, and constraints on the order of the processes.

Malcolm Slaney


Pitch is defined as "that attribute of auditory sensation in terms of which sounds may be ordered on a musical scale." This definition is based on perception, not vocal production or mathematical properties of the signal. Sounds simple, but how is it measured? How do humans hear pitch? Why is this an important question?

Pitch is a simple attribute of many sounds. Yet it has been the center of arguments by acousticians for many decades. Over the years it has been a fertile area for computational models, with many detailed predictions. This lecture will summarize what is known about the perception of pitch and describe many styles of computational models.

Pitch is important for three reasons. First, it is simple to measure people's perceptions. Secondly, it is important for speech and music perception. Finally, it appears to be an important grouping mechanism for auditory object formation and sound separation. It is important to build good models of pitch perception to verify our knowledge of the neurophysiology and psychoacoustics, and to build the next level of models of perception on top of a good pitch model.

Models of psychoacoustic pitch are described as spectral or temporal. Spectral models of pitch assume that the cochlea provides a detailed description of the spectral content of the signal--the pitch of the signal can be read from the peaks of the signal's frequency content. Temporal models of pitch, on the other hand, note that periodicities in the signal are important. Pitch is easily modeled by some sort of temporal correlation, where the periodicity with the most energy determines the pitch interval. Both types of models will be discussed in this lecture.
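The temporal view described above can be sketched with a plain autocorrelation of the waveform: the lag with the largest correlation is taken as the pitch period. This is a deliberately simplified illustration, not one of the models presented in the lecture. It works on the raw signal with no cochlear filterbank, and the function name, search range and test stimulus are assumptions chosen for the example.

```python
import math

def autocorrelation_pitch(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate pitch (Hz) as sample_rate / lag for the lag with maximal autocorrelation.

    Only lags corresponding to pitches between f_min and f_max are searched.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag = min_lag
    best_value = float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Raw (unnormalized) autocorrelation over the overlapping samples.
        value = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if value > best_value:
            best_value = value
            best_lag = lag
    return sample_rate / best_lag

def missing_fundamental_tone(f0, harmonics, sample_rate, duration_s):
    """Sum of equal-amplitude sinusoids at the given harmonic numbers of f0."""
    n_samples = int(sample_rate * duration_s)
    return [
        sum(math.sin(2.0 * math.pi * h * f0 * t / sample_rate) for h in harmonics)
        for t in range(n_samples)
    ]

if __name__ == "__main__":
    fs = 8000
    # Harmonics 3, 4 and 5 of 200 Hz: there is no energy at 200 Hz itself.
    tone = missing_fundamental_tone(200.0, (3, 4, 5), fs, 0.1)
    print("estimated pitch (Hz):", autocorrelation_pitch(tone, fs))
```

The stimulus contains only 600, 800 and 1000 Hz components, yet the strongest periodicity lies at the 5 ms period of the absent 200 Hz fundamental, which is the kind of result that a purely peak-picking spectral reading would miss.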

DETAILED OVERVIEW

The lecture will be accompanied by audio demos and many examples of computational models of pitch.

Temporal Analysis and Periodicity Processing

Gerald Langner

Periodicity is a temporal feature of an important class of acoustic signals characterized by fast periodic modulations with a rate defined by the fundamental frequency. These signals elicit the perception of the so-called periodicity pitch, which is independent of their waveform or frequency composition and remains the same as long as the signal period is the same. Typically, the frequency range of harmonic sounds is sufficiently broad to activate different frequency channels of the auditory system. By analyzing the periodicity and comparing the result over different frequency channels, a neuronal correlation analysis may provide a solution for the auditory binding problem and an explanation for the so-called 'cocktail-party effect'. As a result of coincidence detection, temporal information is transferred into a rate code, and periodicity is represented in the inferior colliculus (IC) topographically orthogonal to the tonotopic organization. Evidence for a topographic representation of periodicity information was found in the auditory midbrain and cortex of various animals. Magnetoencephalographic recordings show that periodicity pitch and frequency are arranged orthogonally also in the human auditory cortex.

Temporal Representation of Periodicity

Chopper and pauser neurons in the cochlear nucleus are known to project directly to the inferior colliculus (IC) and their projection areas in the frequency planes do at least partly overlap. For simulations in a computer model it is assumed that coincidence of oscillator and reducer responses may explain why neurons in the IC respond best to a particular modulation frequency. According to the theory, coincidence neurons in the IC will respond best when the difference between the delays at their inputs is equal to (i.e. compensated by) the period of the signal.

Spatial Representation of Periodicity

A three-dimensional reconstruction of center frequencies (CF) obtained in many parallel electrode tracks revealed a fine structure of tonotopic organization in "frequency-band laminae". In each lamina, CF increased over a small range of frequencies orthogonal to the main frequency gradient of the IC. In addition, best modulation frequencies (BMF) increased along an isofrequency line, with the highest BMFs located at the lateral border of the IC. Evidence for a topographic representation of periodicity information was also found in the auditory midbrains of guinea fowl, chinchilla, and gerbils. In the forebrain it was first demonstrated in the mynah bird that envelope periodicity is represented roughly orthogonal to the frequency gradient. Further evidence comes from optical recording in the cat auditory cortex. Using a neuromagnetometer it was possible to study the response of the cortex of human subjects to harmonic and pure tone sounds. It was found that periodicity pitch and frequency are arranged orthogonally - in accordance with the topographic arrangements found in the auditory systems of animals.

Auditory Processing of Speech

Steven Greenberg and Hynek Hermansky

Peripheral Auditory Representations of Speech

Eric Young

How to connect this to models?

Central Auditory Representations of Speech and Complex Signals

Christoph Schreiner and Shihab Shamma

1. Representation of animal vocalizations in AI (cat and monkey)

1.1 single units

1.2 spatio-temporally distributed representation

1.3 effects of modification of carrier content and bandwidth

1.4 effects of modification of temporal envelope content

2. Representation of speech-sounds in AI

2.1 level dependence

2.2 background noise dependence

2.3 behavioral training effects

Auditory Scene Analysis

Phil Green and Dan Ellis

Part I: The basic function of auditory organization (Dan Ellis)

1. Introduction to auditory organization & its modeling

- the problem confronting hearing systems

- descriptive levels for perceptual systems (Marr)

- an overview of auditory organization: principles & cues

2. A brief history of the psychology and modelling of auditory organization

- recognizing the problem: Cherry, Bregman, Warren

- psychological experiments: paradigms and results (Bregman, Darwin ...)

- computational modelling: paradigms and results (Parsons, Weintraub, Cooke)

3. Streaming and fusion

- Bregman & Pinker and its implications

- basic cues to fusion: onset, harmonicity, spatial location

- basic cues to streaming: proximity in time/frequency; van Noorden

- models of streaming

4. The double vowel paradigm

- basic perceptual results: Scheffers, Lea

- modeling the basic results: Assmann/Summ, Meddis/Hewitt, de Cheveigne

- complicating factors: beats, vowels vs. sentences

5. The role of expectations

- inference & illusions in hearing: continuity, restoration, competition

- models of top-down influence: blackboards, agents, probabilistic inference

Matlab demos:

- Generating a variety of basic stimuli (streaming, fusion, mistuning, double vowels) so that users can experience different effects.

- Demonstrations of some simple streaming, double-vowel and inference


6. Neural substrate for auditory grouping

The binding problem and its auditory version

The correlation theory of von der Malsburg.

Neurophysiological evidence in perception.

Neural Oscillator models: e.g. Brown & Cooke (streaming); Brown & Wang (double vowels)


C. von der Malsburg, & W. Schneider (1986), "A neural cocktail-party processor", Biol. Cybern., 54, pp.29-40.

G.J. Brown and M.P. Cooke (1996) "Temporal synchronisation in a neural oscillator model of primitive auditory stream segregation". Readings in Computational Auditory Scene Analysis, Edited by H. Okuno and D. Rosenthal, Lawrence Erlbaum, in press.

G.J. Brown, M.P. Cooke & E. Mousset (1996), "Are neural oscillations the substrate of auditory grouping?", ESCA Tutorial and Workshop on the Auditory Basis of Speech Perception, Keele University, July 15-19.

G.J. Brown and D.L. Wang (1997), "Modelling the perceptual segregation of double vowels with a network of neural oscillators", Neural Networks, 10 (9), pp. 1547-1558.

Matlab demos: neural oscillators in action. See

7. Auditory organisation and speech perception

Recognition of distorted speech and the missing data hypothesis

Hands-on Matlab demos: distorted speech.


M.P. Cooke and P.D. Green, "Missing Data and Masked Data in Speech Perception and Robust Automatic Speech Recognition", to appear in 'Listening to Speech', S. Greenberg and W. A. Ainsworth (eds).

Lippmann, R.P. & Carlson, B.A. (1997), "Using missing feature theory to actively select features for robust speech recognition with interruptions, filtering and noise", Proc. Eurospeech'97, pp.37-40.


8. Application to robust Automatic Speech Recognition

Formulating the missing data problem for speech recognition

Adapting Hidden Markov Model Recognisers for missing data

Results from robust ASR
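One common way to formulate the missing data problem is marginalisation: when scoring a feature vector against a class model, feature components flagged as unreliable are simply integrated out. The sketch below illustrates this for diagonal-covariance Gaussian class models. It is a minimal example under stated assumptions, not the recogniser described in the references: in an HMM recogniser the same idea would apply to the state output densities, and the class models, mask and function names here are hypothetical.

```python
import math

def marginal_log_likelihood(features, reliable, means, variances):
    """Log-likelihood of the reliable components under a diagonal Gaussian.

    Components whose `reliable` flag is False are marginalised out, which for
    a diagonal Gaussian means they contribute nothing to the sum.
    """
    total = 0.0
    for x, ok, mu, var in zip(features, reliable, means, variances):
        if ok:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total

def classify(features, reliable, class_models):
    """Return the label whose model gives the highest marginal log-likelihood."""
    return max(
        class_models,
        key=lambda label: marginal_log_likelihood(
            features, reliable, class_models[label][0], class_models[label][1]
        ),
    )

if __name__ == "__main__":
    # Two hypothetical classes described by (means, variances).
    models = {
        "A": ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
        "B": ([3.0, 3.0, 3.0], [1.0, 1.0, 1.0]),
    }
    # A class-A vector whose third component has been swamped by noise.
    observed = [0.1, -0.2, 9.0]
    print("all features trusted:", classify(observed, [True, True, True], models))
    print("third feature masked:", classify(observed, [True, True, False], models))
```

When every component is trusted, the corrupted value pulls the decision towards the wrong class; masking it restores the correct decision. How the reliability mask is obtained is a separate problem that this sketch does not address.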


M.P. Cooke, A.C. Morris and P.D. Green (1997), "Missing data techniques for robust speech recognition", Proc. ICASSP-97.

A.C. Morris, M.P. Cooke and P.D. Green (1998), "Some solutions to the missing feature problem in data classification, with application to noise-robust ASR", Proc. ICASSP-98, Seattle.

Lippmann, R.P. & Carlson, B.A. (1997), "Using missing feature theory to actively select features for robust speech recognition with interruptions, filtering and noise", Proc. Eurospeech'97, pp.37-40.

Hands-on Matlab demos: missing data ASR. See