Slides from my talks

NB: This page has been superseded by the version at Columbia; please go there for more recent talks.

This page points to the slide packs for talks I've given since the start of 1997. They are almost all in Acrobat PDF format, derived from my FrameMaker originals. Many of them have links to sound files, but I'm afraid they won't work (except where noted).

Recognition and Organization of Speech and Audio
(hosted by Brian Whitman at NEC Research Institute, Princeton NJ, 2001-08-16)
Latest version of my "get to know me" talk with more emphasis on the non-speech-recognition aspects of LabROSA's work.

Computational Models of Auditory Organization
(At the EU Advanced Course in Computational Neuroscience, Abdus Salam Center for Theoretical Physics, Trieste, Italy, 2001-08-09)
Also available as a ZIP file including all the linked sound examples.
A talk on high-level auditory perception and efforts to model it. The audience were experts in neural system modeling, but the talk barely touches that field - a case of "and now for something completely different..."

Recognition and Organization of Speech and Audio
(hosted by Jont Allen at AT&T Shannon Labs, Florham Park NJ, 2001-06-21)
Updated version of talk introducing my new lab at Columbia and some of the current and planned projects. Includes slides on new projects such as multisource decoding, lyrics recognition and acoustic detection of meeting participant motion.

Tandem modeling investigations
(at the Respite project meeting, Saillon, Switzerland, 2001-01-25)
Brief slide pack updating my colleagues in the Respite project on recent work by my collaborators and me on Tandem acoustic modeling.

Recognition and Organization of Speech and Audio
(at Chin Lee's group, Lucent, Murray Hill, 2001-01-12)
A slightly improved version of my LabROSA introductory talk, including detail on the Tandem modeling approach. (I reused these slides for a talk at IDIAP in Martigny, Switzerland on 2001-02-01.)

Recognition and Organization of Speech and Audio
(at the Columbia Univ. EE dept., 2000-10-13)
This is a combined tutorial and seminar on the speech and audio processing themes that I will be researching in my new lab, dubbed LabROSA.

RESPITE: Tandem & multistream research
(at the Sphear/RESPITE research workshop, Mons, Belgium, 2000-09-16)
Review of work performed at ICSI (and OGI, CMU, Columbia...) in the preceding 6 months relevant to the RESPITE multistream speech recognition theme. Covers the latest experiments with Tandem modeling for the large-vocabulary SPINE task, as well as online normalization and foreign languages. Also mentions Barry Chen's work on multistream mixtures-of-experts, and Mike Shire's work on multicondition feature design.

Tandem Acoustic Modeling: Neural nets for mainstream ASR?
(at Sheffield University Speech & Hearing Group, 2000-06-20, then at ICSI Real lunch, 2000-06-27)
A discussion of the 'Tandem modeling' approach (feeding neural network outputs as features into HTK to do better than either approach alone). This is based on my ICASSP-2000 poster on the same topic, but has some new figures, partly in response to comments received during the poster session.

Improved recognition by combining different features and different systems
(at AVIOS 2000, San Jose, 2000-05-24)
This was meant to be a relatively general-interest talk on the various ways that speech recognition can be improved by combining different approaches to the same problems. AVIOS (the American Voice Input Output Society) is a very applications-focussed conference.

Content-based analysis and indexing for speech, sound & multimedia
(at the ICSI Real Lunch meeting, 2000-04-04)
I hadn't given a talk about my own work to my own group for a long time, so this was meant as an overview of the things I have been thinking about for the past year or so, and the direction in which I plan to go. Specifically: applying information retrieval to multimedia content, particularly sound mixtures that are broken up into objects using computational auditory scene analysis.

Speech Interfaces
(at the Human Centered Computing retreat, UCB, 2000-02-24)
John Canny of Berkeley CS has been organizing an initiative in Human Centered Computing -- roughly, the intersection of computer science, social sciences and design. This talk was to provide an overview of the state of speech recognition and some current projects at ICSI, emphasizing our highly collaborative nature. (n.b. it opens in full-screen mode - Ctl-L (or something like that) returns you to normal window view).

Sound Content Analysis
(at Shih-Fu Chang's group, Columbia University, 2000-02-08)
I was visiting this group at Columbia, who are working on content-based indexing and retrieval based on image and video cues; it's an obvious match to my interest in audio content-based retrieval. This talk spanned speech recognition, auditory scene analysis, and my ideas for content-based analysis. (Be prepared to hit Ctl-L to get out of full-screen mode).

Jan 2000 European tour review
(at the ICSI real lunch meeting, 2000-02-01)
Before you know it, I'm back in Europe, attending the final meeting of the Thisl project and the end-of-year-one meeting for the Respite project. These slides give an overview of these meetings, as I presented it to the rest of the home team; they are based on the slides I used at the meetings.

European tour review
(at the ICSI real lunch meeting, 1999oct05)
This was my brief slide pack reviewing the parts that I found interesting at Eurospeech and the two EU project meetings, as well as updating my colleagues on what we will be doing in those projects.

Thisl update
(at the Thisl meeting, Les Marecottes, Switzerland, 1999sep20)
My second meeting in Switzerland was a brief progress review of the Thisl project on spoken document retrieval. This is a very brief slide pack summarizing work at ICSI since the last meeting in June.

AURORA with a neural net etc.
(at the RESPITE/SPHEAR workshop, Les Marecottes, Switzerland, 1999sep13)
This was a private workshop for participants in the two European projects being managed by Phil Green of Sheffield. I am involved in RESPITE, and this brief talk described the work I've recently been doing on addressing the AURORA noisy digits task with neural net acoustic models, as well as a couple of other multistream-related projects going on at ICSI.

An overview of speech recognition research at ICSI
(at the Tampere University of Technology, 1999sep02)
My second talk at TUT gave a little background to ICSI and the realization group, a brief introduction to connectionist speech recognition, and a lightning tour of some research projects in speech recognition currently happening within the group.

CASA: Principles, practice & applications
(at the Tampere University of Technology, 1999sep01)
As the guest of Anssi Klapuri and the Tampere International Center for Signal Processing, I spent a few days at TUT and gave a couple of talks. This one is intended as an introduction to auditory scene analysis, computational modeling thereof, and some applications - including some speculation about content-based retrieval for nonspeech audio.

ICSI/Thisl progress report
(at the Thisl meeting, Sheffield UK, 1999jun24)
This brief report summarized the work at ICSI on the THISL project since the previous report in February. Specifically, the Thomson NLP parser was integrated into the GUI, we trained an MSG acoustic model on the BBC data, and I reported on some related projects and developments.

European projects update
(at ICSI Real Lunch Meeting, 1999feb11)
On my return from the European trip described below, I gave a lunch talk describing the meetings, and what I and others had said at them.

Current work at ICSI
(BBC R&D, London, and ICP Grenoble, France, 1999feb03-08)
I spent ten days in Europe attending meetings of the THISL and RESPITE projects - EU funded collaborations with European labs and ICSI - and another meeting to discuss a possible future project proposal involving many of the same partners. These slides were the ones I used when presenting our work at these meetings. I called it 'current work at ICSI', but of course it was a very limited subset, just the work related to those projects.

Broadcast News: Features & acoustic modelling
(SPRACH final review meeting, INESC Lisbon, Portugal, 1998dec15)
The 3-year EU collaborative SPRACH project ended in December 1998; our final review meeting was mainly taken up with a description of the system we had collectively submitted to the Broadcast News evaluation. I described just ICSI's contribution to the acoustic modelling (modulation-filtered spectrogram features and very large multi-layer perceptron classifiers).

Some aspects of the ICSI 1998 Broadcast News effort
(Part of the BN overview Real Lunch, 1998nov25)
After the crazy rush to fulfill our part in submitting a full LVCSR system to the 1998 NIST/DARPA Broadcast News evaluation, Morgan, Eric, Adam and I gave a lunch talk to the rest of the group to explain what all the fuss had been about. My part was about feature choice, large nets, and some preliminary work on whole-utterance filters (i.e. nonlinear segment normalization) and gender-dependence.

Speech Recognition at ICSI: Broadcast News and Beyond
(at Erv Hafter's Ear Club, UC Berkeley Psychology, 1998sep21)
Erv Hafter runs a seminar series as part of his UCB psychoacoustics group which I agreed to address. In the event, it turned out to be a fairly general talk about the Broadcast News task, our efforts (in conjunction with our European partners) to field a system in this year's evaluations, and other aspects of speech recognition that I thought would interest hearing scientists.

Review of September SPRACH/Thisl meetings
(at the Realization Group Lunch Meeting, ICSI, 1998sep09)
Morgan and I went to another pair of meetings in Cambridge, UK, for these two EU-funded projects. On return, I gave a brief review of what had been discussed and the projects' status at our lunch meeting, using these eight slides.

Auditory Scene Analysis: Phenomena, theories and computational models
(at the NATO Advanced Studies Institute on Computational Hearing, Il Ciocco, Italy, 1998jul11)
NATO has a fund to support scientific meetings with a 'tutorial' aspect. My colleague Steve Greenberg organized this 12-day meeting on hearing, which ranged from anatomy and physiology through to speech recognition and auditory organization. I gave a 90-minute talk on auditory scene analysis on the last full day.

SPRACH/ThisL review
(parts of the EC project review meetings, 1998mar24/25, Mons, Belgium)
Projects funded by the European Commission 'Framework' program have annual progress review meetings with external reviewers. We are currently subcontractors on two, SPRACH and ThisL, and we had back-to-back reviews for them. These are the few slides I contributed to each day's proceedings covering aspects of the work done at ICSI under these grants, and the single slide I used to summarize the ThisL project when making a trip report to the rest of our group on return.

ICSI Speech Technology
(at Randy Katz's group meeting, UCB CS dept, 1998feb26)
This group on campus is interested in using speech recognition in some demo applications for their work in scalable and mobile networking. I presented this one-page summary of what we do at ICSI and the tools we could share with them.

Visualization tools & demos and the ICSI Realization group
(ICSI Real Lunch, 1998feb12)
One of my projects since being at ICSI has been to encourage and support the proliferation of accessible demos of the research we do. To this end, I've developed a number of specific visualization tools within a Tcl/Tk + extensions framework. This talk served to publicize these tools, and to share my vision of "a demo on every desk".

Automatic audio analysis for content description & indexing
(MPEG-7 Symposium, San Jose, 1998feb04)
MPEG-7 is to be a new standard for the description and indexing of the content of multimedia 'assets' such as video and audio. I was invited to talk about my work in computational auditory scene analysis as one approach to extracting the kind of information that the standard might want to cover. You can learn more about MPEGs 1, 2, 4 and 7 at the MPEG home page.

ICSI/ThisL status report
(IDIAP, Switzerland 1997dec11)
The ThisL project (Thematic Indexing of Spoken Language) had an informal meeting. I went representing ICSI, and I gave a very brief presentation of some relevant work at ICSI: visualization/user-interface tools, recognizing in reverb by combining information at different time scales, and my speech-mixtures stuff.

Problems and future work for ASR-in-CASA
(Stanford Hearing Seminar 1997nov20 / Berkeley Ear Club 1997nov24)
A replacement for section 5 of the original Mohonk '97 talk, for the extended version I gave of that talk when I got back to California.

On the importance of illusions for artificial listeners
(Haskins 1997oct24 / NUWC Newport 1997oct25)
Forgive the title trying to be cute. This pack (again in Acrobat PDF) comprises slides for two talks I gave while 'out east' for Mohonk, basically just introducing my work. One talk was to Haskins Lab in New Haven - a bunch of very serious speech, hearing and language scientists who probably see this stuff as too applied, and the next was to a group of Navy sonar researchers who probably think this hearing modelling is extremely left-field, blue-sky stuff. Most slides were the same in both talks, although the voice-over differed!

Computational Auditory Scene Analysis exploiting Speech Recognizer knowledge
(Mohonk 1997oct22)
This is the actual presentation I made at the 1997 IEEE Mohonk Audio Workshop. The slides are in Acrobat PDF format - I got sick of translating them to HTML; hope that's OK.

Exploiting ASR in CASA
(ICSI 1997may21)
This was a lunchtime talk I gave to my colleagues at ICSI describing the paper I had submitted to IEEE WASPAA'97 on an idea for integrating a speech recognition engine into a computational auditory scene analysis system that is anticipating a mixture of speech and nonspeech sounds. The first big problem that came up was working the speech recognizer 'backwards' to recover an estimate of the speech spectrum from the recognized phoneme labels; the talk focusses mainly on this aspect.

Digital Audio
(Lego 1997may06)
This was a talk I gave at a mini workshop on digital audio hosted by Lego (the plastic brick people) at their headquarters in Billund, Denmark. They are looking into future generations of computer-based toys, and brought together a collection of researchers from industry and academia to brainstorm about audio in toys. My talk was supposed to provide an introduction and framework, focussing on synthesis.

Divisive issues in Computational Auditory Scene Analysis
(Stanford Hearing Seminar, 1997mar06)
This was a talk I gave at Malcolm Slaney's Stanford Hearing Seminar. It was intended to be a brief overview of research into computational models of auditory scene analysis, focussing on the distinctions between the different projects in this field.


Updated: 2001/08/09 14:02:11
Dan Ellis
International Computer Science Institute, Berkeley CA