Hands On: Multimedia Methods for Large Scale Video Analysis

Fall 2012
CS294 Lecture (CCN 26906) -- Room 405 Soda -- Wednesdays and Fridays 1:00-2:30pm
Dr. Gerald Friedland

Every day, thousands of videos are uploaded to the web, creating an ever-growing demand for methods to make them easier to retrieve, search, and index. YouTube alone claims that 72 hours' worth of video material is uploaded every minute. Most of these videos are consumer-produced, “unconstrained” videos from social media networks, such as YouTube uploads or Flickr content. Since many of these videos reflect people's everyday life experience, they constitute a corpus of never-before-seen scale for empirical research. But what methods can cope with this amount of data? How does one approach the problems in a research setting, i.e. without thousands of compute cores at one's disposal? What are the most pressing research questions? This class will provide a practical perspective on large-scale video analysis.

The class consists of lectures and a hands-on component. The lectures provide a practical introduction to multimedia methods for large-scale video analysis, i.e. methods that infer content from every possible cue that might exist in a video, including the visual content, the temporal structure, acoustic content, metadata, user comments, etc. Guest speakers will enrich the class with their experiences.

In order to allow for hands-on experiments on actual data (e.g. to fulfill the requirement of the class project, see below), students enrolled in the class will be given temporary accounts at the International Computer Science Institute, giving access to the compute cluster as well as to two large collections of consumer-produced videos, namely the TrecVID MED corpus (100k videos) and the MediaEval corpus (25k videos+metadata). In addition, every enrolled student will receive at least $100 worth of Amazon EC2 time.


Topics include:
- Acoustic methods for video analysis
- Visual methods for video analysis
- Meta-data and tag-based methods for video analysis
- Information fusion and multimodal integration
- Coping with memory and computational issues
- Crowd sourcing for ground truth annotation

The slides of the lectures are available here:
- 2012-08-24
- 2012-08-29
- 2012-08-31
- 2012-09-05
- 2012-09-07
- 2012-09-12
- 2012-09-14
- 2012-09-19
- 2012-09-21
- 2012-09-26 (team presentations, no slides)
- 2012-09-28
- 2012-10-03
- 2012-10-05
- 2012-10-10
- 2012-10-12
- 2012-10-17 (guest presentation, ICSI)
- MIDTERM EXAM: 2012-10-19
- 2012-10-24 (team presentations, no slides)
- 2012-10-26
- 2012-10-31 (team presentations, no slides)
- 2012-11-02 (ACM Multimedia, no class)
- 2012-11-07 (guest presentation, Stanford)
- 2012-11-09
- 2012-11-14 (guest presentation, SRI)
- 2012-11-16 (guest presentation, YouTube, no slides)
- 2012-11-21 (team presentations, no slides)
- 2012-11-23 (Thanksgiving, no class)
- 2012-11-28 (team presentations, no slides)
- 2012-11-30
- 2012-12-05
- 2012-12-07 (guest presentation, TU Delft)

Reading Materials

Supportive materials used for this class consist of contemporary research articles from conferences and journals. Details will be presented in class. In addition, students have early access to the textbook materials for “Introduction to Multimedia Computing” by G. Friedland and R. Jain, forthcoming from Cambridge University Press, by clicking here.


To pass the class for a grade, students must attend regularly, pass the mid-term, and complete a project related to the class. Further details will be discussed on the first day of class.


Dr. Gerald Friedland
International Computer Science Institute
1947 Center Street, Suite 600
fractor at icsi.berkeley.edu