New York Tech Journal
Tech news from the Big Apple

Once upon a #Graph

Posted on April 21st, 2016

#ColumbiaDataScienceInstitute

04/20/2016 @ Davis Auditorium, Columbia U., NY

20160420_171328[1]

Jennifer Chayes @Microsoft talked about research characterizing large #networks. The is especially relevant given the apparently non-random friendship patterns on #Facebook with stars having exceptionally large numbers of connects and clusters based on geography of socioeconomic status, etc. We assume that it is unlikely that a totally randomly generated graph would have a similar structure. If it’s not random, then how do we characterize the network? Since the networks are extremely large, many of the classical measures of network structure such as maximum distances, etc. need to viewed for the POV of their limit properties as the network grows very large over time.

One key concept is the stochastic #block models used to describe social structure in groups (Holland & Leinhardt). Here the points in the network are divided into k species with different propensities to link to members in their own group and with members in each other group.

She reviewed work done on dense graphs and finite graphs, highlighting the cut norm (Frieze-Kannan) which characterizes the network structure by the number of edges that need to be cut to divide the network into two parts, considering all possible permutations of the points in the network. By estimating a parameter W one can characterize the network as its size increases to a limit.

Social networks, however, are sparse. This affects the estimation of W since W converges to zero as a sparse network increases to a limit. To get around this problem, Jennifer proposed two methods to weight the edges by the network density. The measures produce different estimates, but they converge to non-zero values for sparse matrices when joined with extend stochastic block models.

The last question she explored by what statistical information can be released without violating privacy. The key concept is having statistics in which the deletion of any point (individual) does not change the network statistics by more than some epsilon.

 

posted in:  ColumbiaDataScienceInstitute, data, Personal Data    / leave comments:   No comments yet