Previous Work: Local Algorithms for Large Informatics Graphs

Principal Investigator(s): 
Michael Mahoney

A serious problem with many existing machine learning and data analysis tools in the complex networks area is that they are often very brittle and/or do not scale well to larger networks. As a consequence, analysts often develop intuition on small networks, with 102 or 103 nodes, and then try to apply these methods on larger networks, with 105 or 107 or more nodes. Larger networks, however, often have very different static and dynamic properties than smaller networks. These properties are often deeply counterintuitive and surprisingly subtle to identify and test, but they often have consequences that are very practically relevant.

At root, the problem is that existing algorithmic/statistical methods fail to capture properly the "coupling" between very local structure and very global structure in these networks. For example, local structure might be the properties of ego networks in large social graphs, and global structure might be the diameter and other large-scale properties of the entire network. These failings clearly indicate that the metrics that are intuitive to people, based on their experience with small-scale networks, and that are used to control inference or obtain domain-speci c insight, are often simply inappropriate for very large-scale graphs. Relatedly, existing algorithmic/statistical tools make local-global assumptions that are strongly violated by realistic networks. This has very detrimental consequences for our ability to extract domain insight from algorithmic tools applied to these networks.

ICSI researchers will develop and implement existing and novel local and locally-biased algorithms on large networks to evaluate how those algorithms perform for downstream tasks of interest on realistic social/information networks to develop insights for use in downstream network analysis. In particular, primary research directions that will be considered include the following. 

  • Local algorithms for other network problems: these algorithms can be applied to a small part of a very large graph, and they can come with improved algorithmic and/or statistical guarantees.
  • Implementations on large-scale graphs: the behavior of these local algorithms depends strongly on implementation details, for graphs consisting of hundreds of thousands up to hundreds of millions of nodes. 
  • Relationship with meaningful metrics of domain interest: these algorithms often have implicit embedding properties that can meaningfully be interpreted in terms of metrics of interest to the domain scientist.
  • Intermediate-scale structure: as the size of the local region grows, these algorithms can be used to characterize intermediate-scale structure, e.g., whether a small contagion will propagate to the entire graph. 
  • Algorithmic-statistical tradeoffs: the improved algorithmic/statistical guarantees of these algorithms are sometimes synergistic and sometimes not, and characterizing these tradeoffs is essential for their use.

The research work will focus on these methodological developments with an eye toward solving important practical problems such as scalable inference on graphs, understanding and controlling contagion and viral propagation on networks, and developing tools that are useful to the domain scientist at extracting interpretable and actionable insight from noisy and complex networked data.

Funded by ARO.