Previous Work: Streaming Algorithms for Fundamental Computations in Numerical Linear Algebra

Principal Investigator(s): 
Michael Mahoney

Streaming algorithms that use every input datum once (single-pass) or scan the input a small number of times (multiple passes) are gaining importance due to the increasing volumes of data that are available for business, scientific, and security applications. Performing large-scale data analysis and machine learning often requires addressing numerical linear algebra primitives, such as L2 regression, singular value decompositions, L1 regression, and canonical correlations. This project aims to improve significantly the theory and practice of streaming algorithms for these fundamental linear algebra kernels.

Currently, these computations are performed either using inexact incremental single-pass algorithms, or by expensive multi-pass algorithms. Although existing inexact algorithms often work well enough in practice, the worst-case behavior of applications relying on these building blocks has not been characterized. This is especially troubling in the security and anomaly-detection areas, where a malicious party could conceivably exploit such inexactness. In this project, researchers are developing a set of provably-accurate single-pass algorithms for numerical linear algebra. They are also exploring alternative algorithmic routes, mainly multi-pass randomized algorithms, both for the core problems (L2 regression and SVD) and for the more challenging ones (L1 regression and canonical correlations). They are characterizing the accuracy/performance tradeoffs associated with these computations, where performance refers mostly to the number of passes but also to the total computational effort (including communication). This investigation is being done using benchmarks from significant applications, as well as theoretical lower bounds on single-pass algorithms.

This project is a collaboration between scientists at Tel Aviv University, The Hebrew University, UC Berkeley, and ICSI and is funded by NSF and BSF (The Israel Binational Science Foundation). Funding for ICSI is provided by NSF for travel to Israel in order to faciliate this collaboration.