Predicting the Predictor: Engineering Laws for Sizing Neural Networks

Presented by Gerald Friedland

Wednesday, September 27, 2018
1:00 p.m.
ICSI Lecture Hall

Abstract:

Most current approaches to machine learning experiments involve a significant amount of work on hyperparameter tuning for any given problem. This is especially problematic for very large-scale experiments, such as those on multimedia or molecular dynamics data. This talk presents a line of research on measuring and predicting the experimental design for neural networks. The presentation has three parts. First, based on MacKay's information-theoretic model of supervised machine learning (MacKay, 2003), I present four easily applicable engineering rules for analytically determining the capacity of neural network architectures. These rules make it possible to compare the efficiency of different architectures independently of a task. Second, I introduce and experimentally validate a heuristic method for estimating the neural network capacity requirement for a given dataset and labeling. This yields an estimate of the maximum and expected capacity that a neural network requires for a given problem. Third, I outline a generalization process that successively reduces the capacity, starting at the estimate, and conclude with a discussion of the consequences of sizing a network incorrectly.

The measurement and estimation tools and all experiments presented are available on GitHub: https://github.com/fractor/nntailoring
An interactive sizing demo based on TensorFlow is available at: http://tfmeter.icsi.berkeley.edu

References:
- D.J.C. MacKay: "Information Theory, Inference, and Learning Algorithms", Cambridge University Press, 2003.
- G. Friedland, A. Metere, M. Krell: "A Practical Approach to Sizing Neural Networks", LLNL Technical Report LLNL-TR-758456, to appear on arXiv.