Audio-Based Multimedia Event Classification With Neural Networks

Benjamin Elizalde

ICSI

Thursday, July 30, 2015
4:00 p.m., ICSI Lecture Hall

Multimedia Event Detection (MED) aims to identify events, also called scenes, in videos, such as a flash mob or a wedding ceremony. Audio content information complements cues such as visual content and text. In this talk, we explore the optimization of neural networks (NNs) for audio-based multimedia event classification, and discuss some insights toward using this paradigm more effectively for MED. We explore different architectures, varying the number of layers and the number of neurons per layer. We also assess the performance impact of pre-training with Restricted Boltzmann Machines (RBMs) in contrast with random initialization, and explore the effect of varying the context window for the input to the NNs. Our experiments use the publicly available event-annotated YLI-MED dataset.
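The announcement does not include code, but the input pipeline it describes can be sketched briefly. The sketch below (a minimal NumPy illustration, not the talk's actual implementation) stacks each audio feature frame with a window of neighboring frames for context and feeds the result through a small, randomly initialized feed-forward NN; all layer sizes, the context width, and the feature dimensionality are illustrative assumptions.

```python
import numpy as np

def stack_context(frames, context):
    """Stack each frame with +/- `context` neighboring frames (edges padded)."""
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(frames)] for i in range(2 * context + 1)])

def build_mlp(sizes, rng):
    """Randomly initialized weights/biases for the given layer sizes
    (the talk also compares this against RBM pre-training)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Sigmoid hidden layers, softmax output over event classes."""
    for w, b in layers[:-1]:
        x = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    w, b = layers[-1]
    logits = x @ w + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 20))              # 100 frames, 20-dim features (illustrative)
x = stack_context(frames, context=5)                 # +/-5 frames -> 20 * 11 = 220 inputs
layers = build_mlp([x.shape[1], 256, 256, 10], rng)  # 2 hidden layers, 10 event classes
probs = forward(layers, x)                           # per-frame class probabilities
```

Widening or narrowing the context window changes only the input dimensionality here, which is what makes it a convenient knob to vary alongside depth and layer width.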