Every day, thousands of videos and images are uploaded to the web, creating an ever-growing demand for methods that make them easier to retrieve, search, and index. YouTube alone claims that 100 hours' worth of video material is uploaded every minute. Most of these videos are consumer-produced, “unconstrained” videos from social media networks, such as YouTube uploads or Flickr content. Since many of them reflect people's everyday life experience, they constitute a corpus of unprecedented scale for empirical research. But which methods can cope with this amount of data? How does one approach the problems in a research setting, with or without thousands of compute cores at one's disposal? How does one budget and estimate the time needed to perform such experiments? What rigor has to be employed to make experiments robust in evaluation? What about reproducibility?

It turns out there are many myths in machine learning that this class will dispel. For example, speech recognition does not need the cloud, and many computer vision tasks do not need a GPU.


This class will provide a theoretical and practical perspective on experimental design for machine learning experiments on multimedia data.

The class consists of lectures and a hands-on component. The lectures provide a theoretical introduction to machine learning design and signal processing for various media types, including visual content, temporal structure, acoustic content, metadata, and user comments. Moreover, the lectures will discuss contemporary work in the field of multimedia content analysis. Guest speakers will enrich the class with their experiences.

For the hands-on component, students will receive Amazon EC2 credits to implement a project discussed with the instructor. Experiments can be performed using the Multimedia Commons infrastructure on the YFCC100M corpus.

For more information, see the Multimedia Commons project.


1) Application Motivation: Lecture slides

2) Scientific process revisited, deterministic and statistical machine learning and when to prefer which: Lecture slides

3) Fundamentals of Machine Learning: Capacity: Lecture slides

4) Fundamentals of Machine Learning: Generalization I: Lecture slides

5) Fundamentals of Signal Processing: Generalization II, Adversarial Examples and the Physics of Noise: Lecture slides

6) Demos: TensorFlow Meter and capacity estimation tools.

7) Guest Talk: Bo Li

8) Perceptual Data Best Practices for Audio I: Lecture slides

9) Perceptual Data Best Practices for Audio II: Lecture slides

10) Perceptual Data Best Practices for Images and Video: Lecture slides

11) Student presentations I

12) Student presentations II

13) Reproducibility, Summary, and Project FAQ (whiteboard)


To pass, students have to attend regularly and complete a project related to the class, as outlined below.

Project Requirements

Teams of 1 to 3 students enrolled in the class choose a project to either produce or reproduce a scientific result that entails machine learning on at least two modalities, of which at least one is a rich sensory modality. Examples: tags and images, video frames and audio, audio and text, etc. Whether the result is produced or reproduced, it needs to be repeatable by the instructor and other teams in the class. I suggest teams with both graduate and undergraduate student members.

The project needs to:

- Comply with ACM reproducibility guidelines and the ACM SIG Multimedia reproducibility guidelines.

- Create a report on the project that documents all measurements taken, in accordance with the Machine Learning Experimental Process overview sheet and as outlined in class. The report can be in written form, in slide form, as a video, or a mixture of these. Refer to the 10 questions sheet for details.

In this class, we do not care about accuracy as much as we care about generalization and reproducibility. A project will not fail due to low accuracy.
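In practice, repeatability starts with something as simple as fixing and reporting every random seed, so that a rerun produces the same data splits and hence the same numbers. A minimal sketch (the `split` helper below is illustrative, not required tooling for the project):

```python
import random

# Minimal sketch: a fixed, reported seed makes the train/test split
# (and therefore the reported results) repeatable by the instructor
# and by other teams in the class.
SEED = 42  # report this value alongside your results

def split(items, test_fraction=0.2, seed=SEED):
    """Deterministic shuffle-and-split controlled by a seed."""
    rng = random.Random(seed)      # local RNG: no hidden global state
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Two runs with the same seed yield identical splits.
train_a, test_a = split(list(range(100)))
train_b, test_b = split(list(range(100)))
assert train_a == train_b and test_a == test_b
```

Using a local `random.Random(seed)` instance rather than the module-level functions keeps the split independent of any other code that touches the global random state.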

Optional: Submit a paper to ACM Multimedia, ACM ICMR, IEEE MIPR, IEEE ICASSP, or another conference about your excellent results.


EE, CS, and data science MS and PhD students can directly enroll. Undergraduates should contact me for enrollment details. Priority will be given to URAP students of the multimedia group.


The class requires solid programming skills and assumes familiarity with fundamental statistical concepts such as the central limit theorem, probability distributions, and information measures. Familiarity with basic signal processing and computer architecture is helpful. Furthermore, a team-working attitude and open-mindedness towards interdisciplinary approaches are essential.
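For those brushing up on the statistical prerequisites, the central limit theorem can be observed in a few lines of simulation. This is an illustrative sketch only; the sample sizes are arbitrary choices:

```python
import random
import statistics

# Central limit theorem, empirically: the mean of n i.i.d. draws from
# Uniform(0, 1) concentrates around the true mean 0.5, with a spread
# that shrinks like 1/sqrt(n).
random.seed(0)

def sample_mean(n):
    """Mean of n draws from Uniform(0, 1)."""
    return sum(random.random() for _ in range(n)) / n

# Distribution of the sample mean for n = 100, over 1000 trials.
means = [sample_mean(100) for _ in range(1000)]

print(statistics.mean(means))   # close to the true mean 0.5
print(statistics.stdev(means))  # close to sqrt(1/12) / sqrt(100)
```

The uniform distribution has standard deviation sqrt(1/12) ≈ 0.289, so the sample mean at n = 100 should scatter with a standard deviation of roughly 0.029.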


The Machine Learning Experimental Design Cheat Sheet helps with the ML fundamentals of the class.

The Machine Learning Experimental Process is an overview of the suggested experimental process.

The 10 questions sheet.

David MacKay's fantastic book (especially Chapter 40) can be consulted for depth.

In general, supportive materials used for this class consist of contemporary research articles from conferences and journals. Details will be presented in class. I humbly recommend my textbook from Cambridge University Press. An overview of research on a large-scale video analysis task is given in the Springer book. Also check out our demo on (deep) neural network capacity.

Lectures from the 2012 version of the class (before deep learning) can be accessed here.

Experimental Design for Machine Learning on Multimedia Data

Spring 2019

CS294-082 Lecture (CCN 33112) -- Room 320 Soda -- Fridays 2:00-3:30pm

Prof. Gerald Friedland

Updates: Please pay attention to Piazza.