New York Tech Journal
Tech news from the Big Apple

Intro to #DeepLearning using #PyTorch

Posted on February 21st, 2017

#ACM NY

02/21/2017 @ NYU Courant Institute (251 Mercer St, New York, NY)

Soumith Chintala @Facebook first talked about trends in the cutting edge of machine learning. His main point was that the world is moving from fixed agents to dynamic neural nets in which agents restructure themselves over time. Currently, the ML world is dominated by static datasets + static model structures which learn offline and do not change their structure without human intervention.

He then talked about PyTorch, which is the next generation of ML tools after Lua #Torch. In creating PyTorch they wanted to keep the best features of LuaTorch, such as performance and extensibility, while eliminating rigid containers and allowing for execution on multi-GPU systems. PyTorch is also designed so programmers can create dynamic neural nets.

Other features include

  1. Kernel fusion – take several operations and fuse them into a single kernel
  2. Order of execution – reorder operations for faster execution
  3. Automatic work placement when you have multiple GPUs

PyTorch is available for download on http://pytorch.org and was released Jan 18, 2017.

Currently, PyTorch runs only on Linux and OSX.

posted in: ACM, data analysis, Programming

#VideoStreaming, #webpack, #diagrams

Posted on January 18th, 2017

#CodeDrivenNYC

01/17/2017 @FirstMarkCapital, 100 Fifth Ave, NY 3rd floor

Tim Whidden, VP Engineering at 1stdibs: Webpack Before It Was Cool – Lessons Learned

Sarah Groff-Palermo, Designer and Developer: Label Goes Here: A Talk About Diagrams

Dave Yeu, VP Engineering at Livestream: A Primer to Video on the Web: Video Delivery & Its Challenges

Dave Yeu @livestream talked about some of the challenges of streaming large amounts of video and livestreaming: petabytes of storage, I/O, CPU, and latency (for live video).

Problems

  1. Long-lived connections – there are several solutions
    1. HLS (HTTP Live Streaming), which cuts video into small segments and uses HTTP as the delivery vehicle. It was originally developed by Apple as a way to deliver video to iPhones as their coverage moves from cell tower to cell tower. It uses the power of the HTTP protocol: a playlist plus small chunks that are separate URLs, with m3u8 files that point to the actual files.
      1. But there are challenges – if you need 3 chunks in your buffer, then you have a 15 second delay (with 5-second chunks). As you decrease the size of each chunk, the playlist gets longer, so you need to make more requests for the m3u8 file.
    2. DASH – segments follow a template which reduces index requests
    3. RTMP – persistent connections, extremely low latency, used by Facebook
  2. Authorization – you want to control who can view the stream and keep viewers from rebroadcasting it (there is no key, so this is not DRM).
    1. Move authentication to cache level – use Varnish.
    2. Add token to the playlist, Varnish vets the token and serves the content. => all things come through their api.
    3. But – you expand the scope of your app = cache + server.
  3. Geo-restrictions
    1. Could do this: IP address + restrictions. But in this case you need to put geo-block behind the cache and server.
    2. Instead, the API generates a geo-block config, which Varnish loads into a memory map and checks against
    3. If there is a geo violation, then Varnish returns a modified url, so the server can decide how to respond
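
The HLS buffering trade-off in item 1 can be made concrete with a little arithmetic. This is only a sketch: the 5-second chunk length is inferred from the 15 second figure above, the function names are hypothetical, and the assumption that a player re-fetches the playlist once per chunk is a simplification rather than something stated in the talk.

```python
def buffer_delay_seconds(chunks_in_buffer, chunk_seconds):
    """Minimum live delay: the player holds this much video before it starts."""
    return chunks_in_buffer * chunk_seconds


def playlist_requests_per_minute(chunk_seconds):
    """Assumption: the player re-fetches the .m3u8 playlist once per chunk."""
    return 60 / chunk_seconds


for chunk_seconds in (5, 2, 1):
    delay = buffer_delay_seconds(3, chunk_seconds)
    requests = playlist_requests_per_minute(chunk_seconds)
    print(f"{chunk_seconds}s chunks: {delay}s delay, {requests:.0f} playlist requests/min")
```

Shorter chunks cut the delay but multiply the number of playlist requests, which is the tension described above.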

++

Tim Whidden @1stdibs, an online marketplace for curated goods (“eBay for rich people”), spoke about Webpack, a front-end module bundler. He described how modules increase the usability of functions and how Webpack performs other tasks such as code compression.

++

Finally, Sarah Groff-Palermo @sarahgp.com spoke about how diagrams help her clarify the code she has written and provide documentation for her and others in the future.

She described a classification of learning types from sequential learners (who like tutorials) to global learners (who like to see the big picture first) (see http://www4.ncsu.edu/unity/lockers/users/f/felder/public/ILSdir/styles.htm). Sarah showed several diagrams and pointed out how they help her get and keep the global picture. She especially likes the paradigm from Ben Shneiderman: overview first, zoom and filter, then details-on-demand.

For further ideas she recommended

  1. the book Going Forth – lots of diagrams
  2. Now You See It by Stephen Few
  3. FlowingData – blog by Nathan Yau
  4. Keynote is a good tool to use for diagrams

posted in: applications, Code Driven NYC, video

NYAI#7: #DataScience to Operationalize #ML (Matthew Russell) & Computational #Creativity (Dr. Cole)

Posted on November 22nd, 2016

#NYAI

11/22/2016 Risk, 43 West 23rd Street, NY 2nd floor


Speaker 1: Using Data Science to Operationalize Machine Learning – (Matthew Russell, CTO at Digital Reasoning)

Speaker 2: Top-down vs. Bottom-up Computational Creativity  – (Dr. Cole D. Ingraham DMA, Lead Developer at Amper Music, Inc.)

Matthew Russell @DigitalReasoning spoke about understanding language using NLP, relationships among entities, and temporal relationships. For human language understanding, he views technologies such as knowledge graphs and document analysis as becoming commoditized. The only way to get an advantage is to improve the efficiency of using ML: the KPI for data analysis is the number of experiments (each testing a hypothesis) that can be run per unit time. The key is to use tools such as:

  1. Vagrant – sets up a reproducible environment
  2. Jupyter Notebook – like a lab notebook
  3. Git – version control
  4. Automation –

He wants highly repeatable experiments. The goal is to increase the number of experiments that can be conducted per unit time.

He then talked about using machines to read medical reports and determine the issues. Negatives can be extracted, but issues are harder to find. He uses an ontology to classify entities.

He talked about experiments on models using ontologies. The use of a fixed ontology depends on the content: the ontology of terms for anti-terrorism evolves over time and needs to be experimentally adjusted over time. A medical ontology is probably the most static.

In the second presentation, Cole D. Ingraham @Ampermusic talked about top-down vs bottom-up creativity in the composition of music. Music differs from other audio forms since it has a great deal of very large structure as well as the smaller structure. ML does well at generating good audio on a small time frame, but Cole thinks it is better to apply theories from music to create the larger whole. This is a combination of

Top-down: novel & useful, rejects previous ideas – code driven, “hands on”, you define the structure

Bottom-up: data driven, “hands off”, you learn the structure

He then talked about music composition at the intersection of Generation vs. analysis (of already composed music) – can do one without the other or one before the other

To successfully generate new and interesting music, one needs to generate variance. Composing music using a purely probabilistic approach is problematic as there is a lack of structure. He likes an approach similar to replacing words with their synonyms, which does not fundamentally change the meaning of the sentence, but still makes it different and interesting.

It’s better to work on deterministically defined variance than it is to weed out undesired results from nondeterministic code.

As an example he talked about WaveNet (a Google DeepMind project), whose input and output are both raw audio. This approach works well for improving speech synthesis, but less well for music generation as there is no large scale structural awareness.

Cole then talked about Amper, a web site that lets users create music with no experience required: fast, believable, collaborative

They like a mix of top-down and bottom-up approaches:

  1. Want speed, but neural nets are slow
  2. Music has a lot of theory behind it, so it’s best to let the programmers code these rules
  3. Can change different levels of the hierarchical structure within music: style, mood, can also adjust specific bars

The runtime is written in Haskell – a functional language, so it’s great for music

posted in: AI, Big data, data analysis, Data science, NewYorkAI, Programming

#Genomic analysis and #BigData using #FPGA’s

Posted on November 17th, 2016

#BigDataGenomicsNYC

11/17/2016 @ Phosphorus, 1140 Broadway, NY, 11th floor


Rami Mehio @Edico Genome spoke about the fast analysis of a human genome. They initially did secondary analysis, which is similar to telecommunications (errors in the channel), as errors come from the process due to repeats and mistakes in the sequencer.

Genomic data doubles every 7 months historically, but the computational speed to do the analysis lags, as Moore’s law has a doubling every 18 months. With standard CPUs, mapping takes 10 to 30 hours on a 24 core server. Quality control adds several hours.
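
The gap between the two doubling rates compounds quickly. The sketch below simply applies the 7-month and 18-month figures quoted above; the 126-month horizon is an arbitrary choice made because it is a multiple of both periods, and the function name is hypothetical.

```python
def growth_factor(months, doubling_period_months):
    """How many times larger a quantity becomes after `months` of steady doubling."""
    return 2 ** (months / doubling_period_months)


months = 126  # 10.5 years, a multiple of both 7 and 18
data_growth = growth_factor(months, 7)       # 18 doublings
compute_growth = growth_factor(months, 18)   # 7 doublings
print(data_growth, compute_growth, data_growth / compute_growth)
```

Under these assumptions the data grows by 18 doublings while compute grows by 7, so the analysis backlog per unit of compute grows by 11 doublings over the same period.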

In addition, a human genome file is an 80GB FASTQ file. (This is only for a rough look at the genome at 30x coverage, meaning each position is read about 30 times on average.)

Using FPGAs reduced the analysis time to 20 minutes. Also the files in CRAM compression are reduced to 50GB.

The server code is in C/C++. The FPGAs are not programmed in the usual sense; instead, their connections are specified using the VITAL or VHDL languages.

HMM and Smith-Waterman algorithms require the bulk of the processing time, so both are implemented in the FPGAs. Another challenge is getting sufficient data to feed the FPGA, which means the software needs to run in parallel. Also, the FPGAs are configured so they can change the algorithm selectively to take advantage of what needs to be done at the time.
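
For readers unfamiliar with Smith-Waterman, here is a minimal software sketch of the local alignment score it computes. The talk did not describe the FPGA implementation; the scoring values (match, mismatch, gap) and the linear gap penalty are illustrative assumptions only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Flooring at 0 lets an alignment restart anywhere (local alignment).
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman_score("XXACGTXX", "ACGT"))
```

Every cell depends only on its three neighbours, so cells along an anti-diagonal can be computed at the same time, which is one reason the algorithm is a common target for hardware acceleration.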

posted in: Big data, data, Genome, hardware

Listening to Customers as you develop, assembling a #genome, delivering food boxes

Posted on September 21st, 2016

#CodeDrivenNYC

09/21/2016 @FirstMark, 100 Fifth Ave, NY, 3rd floor


JJ Fliegelman @WayUp (formerly CampusJob) spoke about the development process used by their application which is the largest market for college students to find jobs. JJ talked about their development steps.

He emphasized the importance of speccing out ideas on what they should be building and of talking to users.

They use tools to stay in touch with their customers

  1. HelpScout – see all support tickets. Get the vibe
  2. FullStory – DVR software – plays back video recordings of how users are using the software

They also put ideas in a repository using Trello.

To illustrate their process, he examined how they worked to improve job search relevance.

They look at impact per unit of effort to measure value. They do this across new features over time. This lets them prioritize and collect multiple estimates. It’s a probabilistic measure.

Assessing impact – are people dropping off? Do people click on it? What are the complaints? They talk to experts using cold emails. They also cultivate a culture of educated guesses

Assess effort – get it wrong often and get better over time

They prioritize impact/effort with the least technical debt
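
One way to picture this kind of ranking is sketched below. The feature names, the numbers, and the choice to average several estimates before taking the ratio are all hypothetical; the talk did not say how WayUp actually combines estimates, and technical debt is left out of the sketch.

```python
from statistics import mean

# Hypothetical features with several independent estimates each (arbitrary units).
estimates = {
    "better search ranking": {"impact": [8, 6, 7], "effort": [3, 5, 4]},
    "saved searches": {"impact": [4, 5, 3], "effort": [2, 2, 2]},
    "new onboarding flow": {"impact": [9, 7, 8], "effort": [8, 10, 9]},
}


def impact_per_effort(feature):
    """Average the estimates, then take expected impact over expected effort."""
    return mean(feature["impact"]) / mean(feature["effort"])


ranked = sorted(estimates, key=lambda name: impact_per_effort(estimates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {impact_per_effort(estimates[name]):.2f}")
```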

They Spec & Build – (product, architecture, kickoff) to get organized

Clubhouse is their project tracker: readable by humans

Architecture spec to solve today’s problem, but look ahead. E.g., the initial architecture used WordNet and Elasticsearch, but they found that Elasticsearch was too slow, so they moved to a graph database.

Build – build as little as possible; prototype; adjust your plan

Deploy – they will deploy things that are not worse (e.g. a button that doesn’t work yet)

They do code reviews to avoid deploying bad code

Paul Fisher @Phosphorus (from Recombine; formerly focused on carrier screening in the fertility space, now emphasizing diagnostic DNA sequencing) talked about the processes they use to analyze DNA sequences. With the rapid development of laboratory techniques, it’s a computer science question now. They use Scala, Ruby, and Java.

Sequencers produce hundreds of short reads of 50 to 150 base pairs. They use a reference genome to align the reads. They want multiple reads (depth of reads) to create a consensus sequence.

To lower cost and speed up their analysis, they focus on particular areas to maximize their read depth.
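
The idea of building a consensus from overlapping reads can be sketched as a per-position majority vote. This is a toy illustration, not Phosphorus’s pipeline: it assumes the reads have already been aligned to reference positions, ignores base quality scores and indels, and uses made-up reads and a hypothetical function name.

```python
from collections import Counter


def consensus(reference_length, aligned_reads):
    """aligned_reads: list of (start_position, bases) already placed on the reference."""
    columns = [Counter() for _ in range(reference_length)]
    for start, bases in aligned_reads:
        for offset, base in enumerate(bases):
            columns[start + offset][base] += 1
    # Most common base per position; "N" where no read covers the position.
    sequence = "".join(c.most_common(1)[0][0] if c else "N" for c in columns)
    depth = [sum(c.values()) for c in columns]
    return sequence, depth


reads = [(0, "ACGT"), (2, "GTTA"), (1, "CGTT"), (2, "GATA")]
sequence, depth = consensus(8, reads)
print(sequence, depth)
```

Position 3 is covered by four reads, three of which say T, so a single discordant read is outvoted. Positions with low depth give no such protection, which is why read depth matters.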

They use a variant viewer to understand variants between the person’s and the reference genome:

  1. SNPs – one base is changed – degree of pathogenicity varies
  2. Indels – insertions & deletions
  3. CNVs – copy number variations

They use several different file formats: FASTQ, BAM/SAM, VCF

Current methods have evolved to use Spark, Parquet (a columnar storage format), and ADAM (which uses the Avro framework for nested collections)

They use Zeppelin to share documentation: documentation that you can run.

Finally, Andrew Hogue @BlueApron spoke about the challenges he faces as the CTO. These include

Demand forecasting – use machine learning (random forest) to predict per user what they will order. Holidays are hard to predict. People order less lamb and avoid catfish. There was also a dip in orders and orders with meat during Lent.

Fulfillment – more than just inventory management since recipes change, food safety, weather, …

Subscription mechanics – weekly engagement with users. So opportunities to deepen engagement. Frequent communications can drive engagement or churn. A/B experiments need more time to run

BlueApron runs 3 fulfillment centers for their weekly food deliveries (NJ, Texas, CA), shipping 8 million boxes per month.

posted in: applications, Big data, Code Driven NYC, data, data analysis, startup

DataDrivenNYC: bringing the power of #DataAnalysis to ordinary users, #marketers, #analysts.

Posted on June 18th, 2016

#DataDrivenNYC

06/13/2016 @AXA Equitable Center (787 7th Avenue, New York, NY 10019)


The four speakers were

Adam @NarrativeScience talked about how people with different personalities and jobs may require/prefer different takes on the same data. His firm ingests data and has systems to generate natural language reports customized to the subject area and the reader’s needs.

They currently develop stories with the guidance of experts, but eventually will move to machine learning to automate new subject areas.

Next, Neha @Confluent talked about how they created Apache Kafka: a streaming platform which collects data and allows access to these data in real time.


posted in: data, data analysis, Data Driven NYC, Data science, databases, Open source

Advanced #DeepLearning #NeuralNets: #TimeSeries

Posted on June 16th, 2016

#DataScience+FintechJC

06/15/2016 @Qplum, 185 Hudson Street, Jersey City, NJ, suite 1620

Sumit Chopra @Facebook first gave a general review of neural nets and then talked about how neural nets can be adapted to text analysis and time series data.

For a review of neural net learning models, see my previous post from this series and other posts on machine learning.

Sumit then broke the learning process into two steps: feature extraction and classification. Starting with raw data, the feature extractor is the deep learning model that prepares the data for the classifier, which may be a simple linear model or random forest. In supervised training, errors in the prediction output by the classifier are fed back into the system using back propagation to tune the parameters of the feature extractor and the classifier.

In the remainder of the talk Sumit concentrated on how to improve the performance of the feature extractor.

In general text classification (unlike image or speech recognition), the input can be very long and variable in length. In addition, analysis of text by general deep learning models

  1. does not capture order of words or predictions in time series
  2. can handle only small sized windows or the number of parameters explodes
  3. cannot capture long term dependencies

So, the feature extractor is cast as a time delay neural network (#TDNN). In TDNN, the text is viewed as a string of words. Kernel matrices (usually from 3 to 5 units long) are defined, each of which computes a dot product with the weights of the words in a contiguous block of text. The kernel matrix is shifted one word and the process is repeated until all words are processed. A second kernel matrix creates another set of features and so forth for a 3rd kernel, etc.

These features are then pooled using the mean or max of the features. This process is repeated to get additional features. Finally a point-wise non-linear transformation is applied to get the final set of features.
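
The sliding-kernel, pooling, and non-linearity steps described above can be sketched in a few lines. Everything concrete here is an assumption for illustration: the 2-dimensional word vectors, the two kernels, the choice of max pooling, and tanh as the point-wise non-linearity.

```python
import math


def convolve(embeddings, kernel):
    """Slide the kernel one word at a time; each step is a dot product over a block of words."""
    width = len(kernel)
    features = []
    for start in range(len(embeddings) - width + 1):
        block = embeddings[start:start + width]
        features.append(sum(w * x
                            for k_row, e_row in zip(kernel, block)
                            for w, x in zip(k_row, e_row)))
    return features


def extract_features(embeddings, kernels):
    """One max-pooled, tanh-squashed feature per kernel."""
    return [math.tanh(max(convolve(embeddings, kernel))) for kernel in kernels]


# Hypothetical 2-dimensional word vectors for a 5-word text.
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],  # kernel 1, 3 words wide
    [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],  # kernel 2, 3 words wide
]
print(convolve(text, kernels[0]))
print(extract_features(text, kernels))
```

Because pooling collapses the position axis, the number of output features depends on the number of kernels rather than on the length of the text, which is what lets the same extractor handle inputs of variable length.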

Unlike traditional neural network structures, these methods are new, so no one has done a study of what is revealed in the first layer, second layer, etc. Also theoretical work is lacking on the optimal number of layers for a text sample of a given size.

Historically, #TDNN has struggled with a series of problems including convergence issues, so recurrent neural networks (#RNN) were developed in which the encoder looks at the latest data point along with its own previous output. One example is the Elman Network, in which each feature is the weighted sum of the kernel function output (one encoder is used for all points on the time series) and the previously computed feature value. Training is conducted as in a standard #NN using back propagation through time, with the gradient accumulated over time before the encoder is re-parameterized, but RNNs have a lot of issues:

  1. exploding or vanishing gradients – depending on the largest eigenvalue
  2. cannot capture long-term dependencies
  3. training is somewhat brittle
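
A scalar version of the recurrence makes both the Elman update and the gradient problem easy to see. This is a simplified sketch: real networks use weight matrices, the tanh squashing is an assumption, and in the scalar case the recurrent weight `w_rec` stands in for the largest eigenvalue mentioned above.

```python
import math


def elman_features(series, w_in, w_rec):
    """h_t = tanh(w_in * x_t + w_rec * h_{t-1}), with the same weights at every time step."""
    h, outputs = 0.0, []
    for x in series:
        h = math.tanh(w_in * x + w_rec * h)
        outputs.append(h)
    return outputs


def gradient_scale(w_rec, steps):
    """Rough size of a gradient after flowing back `steps` steps, ignoring the tanh slope."""
    return abs(w_rec) ** steps


print(elman_features([1.0, 0.5, -0.5], w_in=0.8, w_rec=0.5))
print(gradient_scale(0.5, 50), gradient_scale(1.5, 50))
```

With a recurrent weight below 1 the factor shrinks toward zero over 50 steps (vanishing), and above 1 it blows up (exploding), which is why long-term dependencies are hard to learn this way.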

The fix is called long short-term memory (#LSTM), which has additional memory “cells” to store short-term activations. It also has additional gates to alleviate the vanishing gradient problem (see Hochreiter et al. 1997). Now each encoder is made up of several parts as shown in his slides. It can also have a forget gate that turns off all the inputs and can peep back at the previous values of the memory cell. At Facebook, NLP, speech recognition, and vision recognition are all users of LSTM models.

LSTM models, however, still don’t have a long term memory. Sumit talked about creating memory networks, which take an input and store its key features in a memory cell. A query runs against the memory cell and then concatenates the output vector with the text. A second query will retrieve the memory.

He also talked about using a dropout method to fight overfitting. Here, there are cells that randomly determine whether a signal is transmitted to the next layer.
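
A minimal sketch of dropout applied to one layer’s outputs follows. It uses the common “inverted dropout” variant, in which surviving activations are scaled up so the expected total is unchanged; that scaling, the 0.5 drop probability, and the sample values are assumptions, not details from the talk.

```python
import random


def dropout(activations, drop_probability, rng):
    """Zero each activation with the given probability; scale survivors to keep the expected sum."""
    keep = 1.0 - drop_probability
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layer_output = [0.2, 1.5, -0.7, 0.9, 0.4, -1.1]
print(dropout(layer_output, drop_probability=0.5, rng=rng))
```

Dropout is applied only during training; at prediction time all signals are passed through.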

Autoencoders can be used to pretrain the weights within the NN to avoid reaching solutions that are only locally optimal instead of globally optimal.

[Many of these methods are similar in spirit to existing methods. For instance, kernel functions in RNN are very similar to moving average models in technical trading. The different features correspond to averages over different time periods and higher level features correspond to crossovers of the moving averages.

The dropout method is similar to the techniques used in random forest to avoid overfitting.]

posted in: data analysis, Data science, Programming

#Visualization Metaphors: unraveling the big picture

Posted on May 19th, 2016

05/18/2016 @TheGraduateCenter CUNY, 365 5th Ave, NY


Manuel Lima (@mslima) @Parsons gave examples of #data representations. He first looked back 800 years and talked about Ars Memorativa, the art of memory, a set of mnemonic principles to organize information: e.g. spatial orientation, order of things on paper, chunking, association (to reinforce relations), affect, repetition. (These are also foundational principles of #Gestalt psychology.)

Of the many metaphors, trees are most used: e.g. the tree of life and the tree of good and evil, genealogy, evolution, laws, …

Manuel then talked about how #trees work well for hierarchical systems, but we are looking more frequently at more complex systems. In science, for instance:

17-19th century – single variable relationships

20th century – systems of relationships (trees)

21st century – organized complexity (networks)

Even the tree of life can be seen as a network once bacteria’s interaction with organisms is overlaid on the tree.

He then showed 15 distinct typologies for mapping networks and showed works of art inspired by networks (the new networkism): 2-d: Emma McNally; 3-d: Tomas Saraceno and Chiharu Shiota.

The following authors were suggested as references on network visualization: Edward Tufte, Jacques Bertin (French cartographer), and Pat Hanrahan (a computer science professor at Stanford who extended Bertin’s work and is one of the founders of Tableau)

posted in: Art, Big data, data, data analysis, Data science, UX

Industrial #Wearables & #IoT

Posted on May 17th, 2016

#SQLNYC

05/17/2016 @ Manhattan Ballroom, 29 W 36th Street, NY, 2nd floor


Anupam Sengupta @GuardHat (an industrial safety helmet fitted with sensors monitoring 42 conditions) spoke about the helmet and the data back end. The hat is fitted with a camera and microphone along with sensors for biometrics, geolocation, toxic gas, etc. The helmet is not sold, but will be available as a B2B service by year end.

Over an 8 hour shift it transmits 20 Mbytes of data. A typical work site would have from 100 to 300 workers and up to 3 shifts per day.
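
These figures imply a fairly modest data rate, as the back-of-the-envelope sketch below shows. It assumes that 1 MB is 10^6 bytes, that the 100 to 300 workers is a head count per shift, and that every helmet sends the full 20 MB; none of these was stated in the talk.

```python
MB_PER_HELMET_PER_SHIFT = 20
SHIFT_HOURS = 8


def site_mb_per_day(workers, shifts_per_day):
    """Assumes `workers` is the head count per shift and every helmet sends the full 20 MB."""
    return workers * shifts_per_day * MB_PER_HELMET_PER_SHIFT


average_bytes_per_second = MB_PER_HELMET_PER_SHIFT * 1_000_000 / (SHIFT_HOURS * 3600)
print(round(average_bytes_per_second))  # average rate for one helmet
print(site_mb_per_day(100, 1))          # small site, one shift
print(site_mb_per_day(300, 3))          # large site, three shifts
```

Under these assumptions a helmet averages roughly 700 bytes per second, and a site produces between 2 GB and 18 GB per day.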

There is local processing on the device and data are sent in real time for aggregation. Time to detect an event is 2 seconds. At the aggregation point, external data are added: weather data and location data as the building site changes.

HPCC is the back end with Lambda architecture so the data can be processed both in real time and for historical analysis.

Design considerations include:

Lack of reliability in data channel; “event stop” – data volume exceeds plan (here several people are involved with the event); devices don’t send the data stream every time; schema varies over time with conditions; temporal sequence not guaranteed

AES-256 encryption for data transmission and storage

Radiation shielding within the helmet

Tracking limitations as agreed upon by unions and companies to preserve privacy

posted in: data, Internet of Things, startup

#DeepLearning and #NeuralNets

Posted on May 16th, 2016

#DataScience+FinTechJC

05/16/2016 @Qplum, 185 Hudson Street, Suite 1620  Plaza 5, Jersey City, NJ


#Raghavendra Boomaraju @ Columbia described the math behind neural nets and how back propagation is used to fit models.

Observations on deep learning include:

  1. The universal approximation theorem says you can fit any model with one hidden layer, provided the layer has a sufficient number of units. But multiple hidden layers work better. The more layers, the fewer units you need in each layer to fit the data.
  2. To optimize the weights, back-propagate the loss function. But one does not need to optimize the g() function since g()’s are designed to have a very general shape (such as the logistic)
  3. Traditionally, fitting has been done by using all training examples for each weight update (deterministic) or one example at a time (stochastic). More recently, researchers update on subsets of the examples (minibatches).
  4. Convolution operators are used to standardize inputs by size and orientation by rotating and scaling.
  5. There is a way to visualize what happens within a neural network – see http://colah.github.io/
  6. The gradient method used to optimize weights needs to be tuned so it is neither too aggressive nor too slow. Adaptive learning (Adam algorithm) is used to adjust the step size.
  7. Deep learning libraries include Theano, Caffe, Google TensorFlow, and Torch.
  8. To do unsupervised deep learning – take inputs through a series of layers that at some point have fewer units than the number of inputs. The ensuing layers expand so the number of points on the output layer matches that of the input layer. Optimize the net so that the outputs match the inputs. The layer with the smallest number of points describes the features in the data set.
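
The three update schemes in item 3 differ only in how many training examples feed each weight update. The sketch below fits a single slope by gradient descent on squared error. The data, learning rate, and function name are illustrative assumptions, and a fixed step size is used rather than Adam.

```python
import random


def fit_slope(xs, ys, batch_size, epochs=200, learning_rate=0.05, seed=0):
    """Fit y ≈ w * x by gradient descent on squared error, updating once per batch."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch with respect to w.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
    return w


xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [3.0 * x for x in xs]  # noise-free data with true slope 3

print(fit_slope(xs, ys, batch_size=len(xs)))  # all examples per update (deterministic)
print(fit_slope(xs, ys, batch_size=1))        # one example per update (stochastic)
print(fit_slope(xs, ys, batch_size=2))        # subsets per update (minibatch)
```

On this toy problem all three settings recover the slope. On real data the choice trades the cost of each update against the noise in each gradient estimate.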

 

posted in: Big data, data, data analysis, Data science