New York Tech Journal
Tech news from the Big Apple

Data Driven NYC: the future of #DataScience

Posted on December 17th, 2014


12/16/2014 @Bloomberg, 731 Lexington Ave, NY

After Michael Li @TheDataIncubator talked about the skills needed by data scientists to transition from academia to industry, Matt Turck interviewed two pioneers in the use of big data and deep learning:

Professor Yann LeCun, Director of AI Research at Facebook (Professor LeCun is world-renowned for his work in machine learning and computer vision)

Mike Olson, Co-Founder and Chairman of Cloudera (one of the key companies of the Hadoop/Big Data ecosystem; has raised $1.2 Billion to date)


Michael Li runs The Data Incubator, a program that trains scientists and engineers with advanced degrees to work as data scientists. A screening exam is part of the admissions process, and he showed findings from these tests. Among his findings:

  1. Many students who claimed to know Python programming or linear regression struggled to solve the exam problems correctly.
  2. Graduates of name universities only marginally outperformed graduates of “lesser” schools.
  3. Math and economics majors performed best on the exam.

Mike Olson @Cloudera talked about his experiences prior to founding Cloudera, which included dropping out of Berkeley several times, working for Mike Stonebraker, working on Berkeley DB, and serving as CEO of Sleepycat.

Mike spoke about the industry's initial misunderstanding of the purpose of map-reduce, which persisted until people realized that Hadoop was not built to solve the problems that databases were solving.

He next spoke about how map-reduce is evolving and how Cloudera is positioned as the technology develops. Key to their positioning is that map-reduce is open source, but Cloudera retains proprietary tools to manage the process. This means that the underlying big-data technology will advance quickly from contributions throughout the industry. The admin tools give them a revenue stream, thereby avoiding the price compression faced by a company with a completely open-source system (such as Sleepycat).

In addition, Cloudera sees itself as providing the platform whose usage will grow as new applications are created. It is therefore in their interest to encourage this application layer and not compete with the application developers. By encouraging a rich ecosystem of applications and an industry-wide foundation, Cloudera grows the application of big data, which increases the demand for Cloudera’s services.

Mike also mentioned why they sold a meaningful stake of the company to Intel. This sale creates a mutually beneficial partnership and provides additional defenses against potential acquirers such as IBM.

He sees the demand for big-data analytics exploding, with the biggest increase coming in the demand for new applications.

Yann LeCun @Facebook spoke about his work on neural nets over the past 30 years. He described how neural nets evolved as new hardware and data allowed convolutional networks first to handle handwriting recognition and eventually visual and speech recognition, leading to the current successes achieved by deep learning algorithms.

Going forward, he sees more and more of our interactions mediated by intelligent processes. At Facebook, algorithms decide which items will be most interesting for us to read. Among the challenges he sees are creating algorithms that understand the structure of language and making progress in unsupervised learning.

His closing observations were about whether there should be some limits on the usage of artificial intelligence and the autonomy of AI systems to initiate actions.
