New York Tech Journal
Tech news from the Big Apple

#Blockstack: an introduction

Posted on May 4th, 2016

#BlockstackNYC

05/04/2016 @ AWS popup loft, 350 West Broadway, NY


Blockstack offers secure identification based on blockchain cryptography and confirmations. Six speakers described the underlying machinery and applications.

As in Bitcoin, Blockstack promises secure identification and transactions without using a central verifying agent.

The Blockstack application stack, from the bottom up, contains:

  1. Blockchain – uses the most secure chain available, which is currently Bitcoin.
  2. Naming – an intermediate pseudonym layer
  3. Identity – establishes who you are
  4. Authentication – proves identity with an electronic signature (see the sketch after this list)
  5. Storage – the blockchain holds only pointers, so the actual information needs separate storage
  6. Apps built on top of the stack
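
A minimal sketch of how the identity and authentication layers might fit together, using the third-party Python `cryptography` package. The key type, challenge message, and flow are illustrative assumptions, not Blockstack's actual protocol:

```python
# Sketch: identity as a keypair, authentication as a signature check.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Identity: a keypair; the public key can be bound to a name on-chain.
private_key = ec.generate_private_key(ec.SECP256K1())
public_key = private_key.public_key()

# Authentication: sign a challenge and let anyone verify it.
challenge = b"login-challenge-2016-05-04"   # invented message, not a real format
signature = private_key.sign(challenge, ec.ECDSA(hashes.SHA256()))

try:
    public_key.verify(signature, challenge, ec.ECDSA(hashes.SHA256()))
    print("signature valid: identity authenticated")
except InvalidSignature:
    print("signature invalid")
```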

Layers

  1. Cryptocurrency blockchain
  2. Virtual blockchain – gives flexibility to migrate to another cryptocurrency.
  3. Routing – pointers to data locations. Initially implemented with a DHT (distributed hash table).
  4. Data on cloud servers. Could be Dropbox, S3, …

Layers 1 and 2 form the control plane; everything above is the data plane.
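
A toy illustration of the split: the control plane stores only a hash pointer to a zone file, the routing layer maps that hash to the zone file, and the data plane serves the actual bytes. The record layout and lookup flow here are assumptions for illustration, not Blockstack's wire format:

```python
# Sketch: control plane holds a hash pointer; data plane holds the data.
import hashlib

zone_file = b"storage-uri: https://example-bucket.s3.amazonaws.com/alice.json"
profile_data = b'{"name": "alice", "bio": "..."}'

# Control plane (layers 1-2): name -> hash of the zone file, written on-chain.
chain_record = {"alice.id": hashlib.sha256(zone_file).hexdigest()}

# Routing layer (3): hash -> zone file, e.g. served from a DHT.
routing = {hashlib.sha256(zone_file).hexdigest(): zone_file}

# Data plane (4): storage URI -> actual data (Dropbox, S3, ...).
storage = {"https://example-bucket.s3.amazonaws.com/alice.json": profile_data}

# Lookup: resolve the name, verify the zone file against the on-chain hash,
# then fetch the data from whatever cloud server the zone file names.
zf_hash = chain_record["alice.id"]
zf = routing[zf_hash]
assert hashlib.sha256(zf).hexdigest() == zf_hash, "zone file tampered with"
uri = zf.split(b"storage-uri: ")[1].decode()
print(storage[uri])
```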

The current implementation uses a Bitcoin wallet for identity and requires 10 blockchain confirmations to set up a person's identity.
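
A sketch of what waiting for those confirmations could look like. `get_confirmations` is a caller-supplied placeholder (e.g., wrapping a Bitcoin node's RPC), not a real API:

```python
import time

def wait_for_confirmations(txid, get_confirmations, required=10, poll_seconds=30):
    """Poll until the transaction has at least `required` confirmations."""
    while True:
        seen = get_confirmations(txid)
        if seen >= required:
            return seen
        time.sleep(poll_seconds)

# Simulated usage: pretend one confirmation arrives per poll.
counter = {"n": 0}
def fake_confirmations(txid):
    counter["n"] += 1
    return counter["n"]

wait_for_confirmations("txid-placeholder", fake_confirmations, poll_seconds=0)
```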

Applications presented

  1. OpenBazaar (a place to buy and sell without an intermediary) has a long identification string for each buy/sell. Blockstack provides a secure mapping of these IDs to simpler, human-readable IDs.
  2. Mediachain is a data network for information on creative works, in which contributed information is validated by the contributor’s identity. All objects are IPFS + IPLD objects, with information saved to Merkle trees. They are working on the challenge of private key management: high volumes of registrations and the need to register on behalf of third parties.
  3. IPFS (the InterPlanetary File System) proposes to
    1. Create a DNS based on package content, which allows copies to be located at several locations in the network
    2. Give individuals greater control over DNS names, independent of any centralized naming body
    3. Use three levels of naming (see the sketch after this list):
      1. Content-defined: the content defines the address through its hash. But if the blog changes, the address changes.
      2. Key name: a mutable layer of names that is stable even as the content changes
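
A sketch of the two naming levels described above: an immutable content-derived address and a mutable key-derived name. In IPFS the key-name record is signed by the key holder; signing is elided here, and the key material is a placeholder:

```python
import hashlib

def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

blog_v1 = b"post: hello world"
blog_v2 = b"post: hello world, edited"

# Level 1: the content defines the address, so edits change the address.
addr_v1 = content_address(blog_v1)
addr_v2 = content_address(blog_v2)
assert addr_v1 != addr_v2

# Level 2: a stable name derived from a public key; its record is updated
# to point at the latest content hash as the blog changes.
public_key = b"alice-public-key-bytes"          # placeholder key material
key_name = hashlib.sha256(public_key).hexdigest()
name_records = {key_name: addr_v1}              # initial publish
name_records[key_name] = addr_v2                # blog edited; name unchanged
print(key_name, "->", name_records[key_name])
```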


posted in:  applications, Blockstack, Open source, Personal Data, security, startup

Once upon a #Graph

Posted on April 21st, 2016

#ColumbiaDataScienceInstitute

04/20/2016 @ Davis Auditorium, Columbia U., NY


Jennifer Chayes @Microsoft talked about research characterizing large #networks. This is especially relevant given the apparently non-random friendship patterns on #Facebook, with stars having exceptionally large numbers of connections and with clusters based on geography, socioeconomic status, etc. We assume it is unlikely that a totally randomly generated graph would have a similar structure. If it’s not random, then how do we characterize the network? Since these networks are extremely large, many of the classical measures of network structure, such as maximum distances, need to be viewed from the POV of their limit properties as the network grows very large over time.

One key concept is the stochastic #block model used to describe social structure in groups (Holland & Leinhardt). Here the points in the network are divided into k groups (“species”), each with different propensities to link to members of its own group and to members of each other group.
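
A quick sketch of sampling from such a model; the group sizes and propensity matrix are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [50, 30, 20]                 # three groups ("species")
P = np.array([[0.30, 0.02, 0.01],    # P[a, b] = link propensity between
              [0.02, 0.25, 0.03],    # a member of group a and group b
              [0.01, 0.03, 0.40]])

labels = np.repeat(np.arange(len(sizes)), sizes)
n = labels.size
probs = P[labels[:, None], labels[None, :]]      # per-pair link probability
upper = np.triu(rng.random((n, n)) < probs, 1)   # sample each pair once
A = (upper | upper.T).astype(int)                # symmetric adjacency matrix

print("within-group edge density: ", A[labels[:, None] == labels[None, :]].mean())
print("between-group edge density:", A[labels[:, None] != labels[None, :]].mean())
```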

She reviewed work done on dense and finite graphs, highlighting the cut norm (Frieze-Kannan), which characterizes network structure by the number of edges that must be cut to divide the network into two parts, considered over all possible splits of the points in the network. By estimating a limit object W (a graphon), one can characterize the network as its size increases to a limit.

Social networks, however, are sparse. This affects the estimation of W, since W converges to zero as a sparse network grows to its limit. To get around this problem, Jennifer proposed two methods that weight the edges by the network density. The two measures produce different estimates, but both converge to non-zero values for sparse networks when combined with extended stochastic block models.
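
A brute-force sketch of the cut norm, plus one simple density rescaling of the kind described; the exact estimators from the talk may differ:

```python
import itertools
import numpy as np

def cut_norm(B):
    """max over vertex subsets S, T of |sum_{i in S, j in T} B_ij| / n^2.
    For a fixed S the best T keeps the columns whose partial sums share a
    sign, so only the 2^n choices of S are enumerated.  Small n only."""
    n = B.shape[0]
    best = 0.0
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            col_sums = B[list(S), :].sum(axis=0)
            best = max(best,
                       col_sums[col_sums > 0].sum(),
                       -col_sums[col_sums < 0].sum())
    return best / n**2

rng = np.random.default_rng(0)
n = 12
A = np.triu(rng.random((n, n)) < 0.1, 1)   # a small sparse random graph
A = (A | A.T).astype(float)

p = A.sum() / (n * (n - 1))                # edge density
print(cut_norm(A - p))                     # shrinks toward 0 for sparse graphs
print(cut_norm((A - p) / p))               # density-weighted version stays O(1)
```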

The last question she explored was what statistical information can be released without violating privacy. The key concept is releasing statistics for which the deletion of any point (individual) changes the statistic by no more than some epsilon.
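
One standard way to achieve this is the Laplace mechanism. The sketch below releases an edge count with noise scaled to an assumed degree bound, so that deleting any one individual is masked; here epsilon plays its usual role as the differential-privacy budget:

```python
import numpy as np

rng = np.random.default_rng(1)

def private_edge_count(A, epsilon, max_degree):
    """Deleting one individual removes at most `max_degree` edges, so
    Laplace noise of scale max_degree / epsilon masks any single person.
    The degree bound is a promised input, assumed for illustration."""
    true_count = A.sum() / 2                       # undirected edge count
    noise = rng.laplace(loc=0.0, scale=max_degree / epsilon)
    return true_count + noise

n = 200
A = np.triu(rng.random((n, n)) < 0.05, 1)
A = (A | A.T).astype(int)
print("true edges:", int(A.sum() / 2))
print("released:  ", round(private_edge_count(A, epsilon=0.5, max_degree=25), 1))
```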


posted in:  ColumbiaDataScienceInstitute, data, Personal Data

The State of the Personal Data Economy

Posted on July 28th, 2014

Doc Searls and The State of the Personal Data Economy

#PDNYC

Doc Searls, Sean Bohan, Rick Heitzmann – panelists

07/28/2014 WeWork Labs, 175 Varick St., New York

Personal data collected on the internet was discussed from the points of view of privacy and commercial value. Privacy issues are of heightened concern given the scrubbing of Google’s database in Europe, the spying revelations brought forth by the phone-hacking scandal at News of the World, and Snowden’s documentation of NSA spying. Paired with this is the idea that personal data would have its greatest value to the individual.

Doc Searls, in his keynote talk, presented a vision in which individuals would have control of their data which they could exchange for better, customized service. The panelists, Doc, Sean, and Rick, amplified this theme by talking about how a browser can use a combination of ad blockers and password stores to control what sites see and retain the data for personal consumption. There was also a discussion about companies that had an ad-based business model (such as Google) versus those who might be sympathetic toward these privacy initiatives (such as Microsoft) since they are not driven primarily by ad revenues.

A topic that was not explored was the willingness of third parties to give preference to those providing personal data. One issue is that every individual will provide different data (probably in a different format), with different items censored or embellished. Vendors would need a common data format to avoid the expense of data cleaning, and their models would need to be flexible enough to handle different types of data (and to have some way of evaluating its quality in order to determine the discount or service offered).
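
To make the format point concrete, here is one hypothetical shape such a shared record could take; every field name is invented, since no concrete format was proposed at the event:

```python
# A hypothetical record in a shared personal-data format (illustrative only).
record = {
    "schema": "personal-data/v0",      # common format vendors could parse
    "fields": {
        "age_range": "25-34",
        "zip_prefix": "100",           # coarsened by the user, not a full ZIP
        "interests": ["cycling", "jazz"],
    },
    "withheld": ["email", "browsing_history"],   # items the user censored
    "provenance": "self-reported",     # a hook for scoring data quality
}
```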

posted in:  data, Personal Data