New York Tech Journal
Tech news from the Big Apple

NYAI#5: Neural Nets (Jason Yosinski) & #ML For Production (Ken Sanford)

Posted on August 24th, 2016

#NYAI, New York #ArtificialIntelligence

08/24/2016 @Rise 43 West 23rd Street, NY, 2nd floorPreview Changes

IMG_20160824_200640[1] IMG_20160824_203211[1]

Jason Yosinski@GeometricTechnology spoke about his work on #NeuralNets to generate pictures. He started by talking about machine learning with feedback to train a robot to move more quickly and using feedback to computer-generate pictures that are appealing to humans.

Jason next talked about AlexNet, based on work by Krizhevsky et al 2012, to classify images using a neural net with 5 convolutional layers (interleaved with max pooling and contrast layers) plus 3 fully connected layers at the end. The net with 60 million parameters was training on ImageNet which contains over 1mm images. His image classification Code is available on http://Yosinski.com.

Jason talked about how the classifier thinks about categories when it is not being trained to identify that category. For instance, the network may learn about faces even though there is no human category since it helps the system detect things such as hats (above a face) to give it context. It also identifies text to give it context on other shapes it is trying to identify.

He next talked about generating images by inputting random noise and randomly changing pixels. Some changes will cause the goal (such as a ‘lions’) to increase in confidence. Over many random moves, the goal increases in its confidence level. Jason showed many random images that elicited high levels of confidence, but the images often looked like purple-green slime. This is probably because the network, while learning, immediately discards the overall color of the image and is therefore insensitive to aberrations from normal colors.  (See Erhan et al 2009)

[This also raises the question of how computer vision is different from human vision. If presented with a blue colored lion, the first reaction of a human might be to note how the color mismatches objects in the ‘lion’ category. One experiment would be to present the computer model with the picture of a blue lion and see how it is classified. Unlike computers, humans encode information beyond their list of items they have learned and this encoding includes extraneous information such as color or location. Maybe the difference is that humans incorporate a semantic layer that considers not only the category of the items, but other characteristics that define ‘lion-ness’.  Color may be more central to human image processing as it has been conjectured that we have color vision so we can distinguish between ripe and rotten fruits. Our vision also taps into our expectation to see certain objects within the world and we are primed to see those objects in specific contexts, so we have contextual information beyond what is available to the computer when classifying images.]

To improve the generated pictures of ‘lions’, he next used a generator to create pictures and change them until they get a picture which has high confidence of being a ‘lion’. The generator is designed to create identifiable images. The generator can even produce pictures on objects that it has not been trained to paint. (Need to apply regularization to get better pictures for the target.)

Slides at http://s.yosinski.com/nyai.pdf

In the second talk, Ken Sanford @Ekenomics and H20.AI talked about the H2O open source project. H2O is a machine learning engine that can run in R, Python,Java, etc.

Ken emphasized how H2O (a multilayer feed forward neural network) provides a platform that uses the Java Score Code engine. This easies the transition from the model developed in training and the model used to score inputs in a production environment.

He also talked about the Deep Water project which aims to allow other open source tools, such as MXNET, Caffe, Tensorflow,… (CNN, RNN, … models) to run in the H2O environment.

posted in:  AI, Big data, Data science, NewYorkAI, Open source    / leave comments:   No comments yet

Advanced #DeepLearning #NeuralNets: #TimeSeries

Posted on June 16th, 2016

#DataScience+FintechJC

06/15/2016 @Qplum, 185 Hudson Street, Jersey City, NJ, suite 1620

20160615_182158[1] 20160615_183355[1] 20160615_185542[1] 20160615_190038[1] 20160615_184353[1]
Sumit Chopra@facebook first gave a general review of neural nets and then talked about how neural nets can be adapted to text analysis and time series data.

For a review of neural net learning models, see my previous post from this series and other posts on machine learning.

Sumit then broke the learning process into two steps: feature extraction and classification. Starting with raw data, the feature extractor is the deep learning model that prepares the data for the classifier which may be a simple linear model or random forest. In supervised training, errors in the prediction output by the classifier are feed back into the system using back propagation to tune the parameters of the feature extractor and the classifier.

In the remainder of the talk Sumit concentrated on how to improve the performance of the feature extractor.

In the general text classification (unlike image or speech recognition) the length of the input can be very long (and variable in length). In addition, analysis of text by general deep learning models

  1. does not capture order of words or predictions in time series
  2. can handle only small sized windows or the number of parameters explodes
  3. cannot capture long term dependencies

So, the feature extractor is cast as a time delay neural networks (#TDNN). In TDNN, the words are text is viewed as a string of words. Kernel matrices (usually of from 3 to 5 unit long) are defined which compute a dot products of the weights of the words in a contiguous block of text. The kernel matrix is shifted one word and the process is repeated until all words are processed. A second kernel matrix creates another set of features and so forth for a 3rd kernel, etc.

These features are then pooled using the mean or max of the features. This process is repeated to get additional features. Finally a point-wise non-linear transformation is applied to get the final set of features.

Unlike traditional neural network structures, these methods are new, so no one has done a study of what is revealed in the first layer, second layer, etc. Also theoretical work is lacking on the optimal number of layers for a text sample of a given size.

Historically, #TDNN has struggled with a series of problem including convergence issues, so recurrent neural networks (#RNN) were developed in which the encoder looks at the latest data point along with its own previous output. One example is the Elman Network, which each feature is the weighted sum of the kernel function (one encoder is used for all points on the time series) output with the previously computed feature value. Training is conducted as in a standard #NN using back propagation through time with the gradient accumulated over time before the encoder is re-parameterized, but RNN has a lot issues
1, exploding or vanishing gradients – depending on the largest eigenvalue
2. cannot capture long-term dependencies
3. training is somewhat brittle

The fix is called Long short-term memory. #LSTM, has additional memory “cells” to store short-term activations. It also has additional gates to alleviate the vanishing gradient problem.
(see Hochreiter et al . 1997). Now each encoder is made up of several parts as shown in his slides. It can also have a forget gate that turns off all the inputs and can peep back at the previous values of the memory cell. At Facebook, NLP and speech and vision recognition are all users of LSTM models

LSTM models, however still don’t have a long term memory. Sumit talked about how creating memory networks which will take a store and store the key features in a memory cell. A query runs against the memory cell and then concatenates the output vector with the text. A second query will retrieve the memory.
He also talked about using a dropout method to fight overfitting. Here, there are cells that randomly determine whether a signal is transmitted to the next layer

Autocoders can be used to pretrain the weights within the NN to avoid problems of creating solution that are only locally optimal instead of globally optimal.

[Many of these methods are similar in spirit to existing methods. For instance, kernel functions in RNN are very similar to moving average models in technical trading. The different features correspond to averages over different time periods and higher level features correspond to crossovers of the moving averages.

The dropoff method is similar to the techniques used in random forest to avoid overfitting.]

posted in:  data analysis, Data science, Programming    / leave comments:   No comments yet

#DeepLearning and #NeuralNets

Posted on May 16th, 2016

#DataScience+FinTechJC

05/16/2016 @Qplum, 185 Hudson Street, Suite 1620  Plaza 5, Jersey City, NJ

20160516_180118[1]

#Raghavendra Boomaraju @ Columbia described the math behind neural nets and how back propagation is used to fit models.

Observations on deep learning include:

  1. Universal approximation theory says you can fit any model with one hidden layer, provided the layer has a sufficient number of levels. But multiple hidden layers work better. The more layers, the fewer levels you need in each layer to fit the data.
  2. To optimize the weights, back-propagate the loss function. But one does not need to optimize the g() function since g()’s are designed to have a very general shape (such as the logistic)
  3. Traditionally, fitting has been done by changing all inputs simultaneously (deterministic) or changing one input at a time during optimization (stochastic inputs) . More recently, researchers are changing subsets of the inputs (minibatches).
  4. Convolution operators are used to standardize inputs by size and orientation by rotating and scaling.
  5. There is a way to Visualize within a neural network – see http://colah.github.io/
  6. The gradient method used to optimize weights needs to be tuned so it is neither too aggressive nor too slow. Adaptive learning (Adam algorithm) is used to determine the optimal step size.
  7. Deep learning libraries include Theano, café, Google tensor flow, torch.
  8. To do unsupervised deep learning – take inputs through a series of layers that at some point have fewer levels than the number of inputs. The ensuing layers expand so the number of points on the output layer matches that of the input layer. Optimize the net so that the inputs match the outputs. The layer with the smallest number of point s describes the features in the data set.

 

posted in:  Big data, data, data analysis, Data science    / leave comments:   1 comment

CodeDrivenNYC: #Web #Annotation, #NeuralNets #DeepLearning, #WebGL #Anatomy

Posted on December 17th, 2015

#CodeDrivenNYC

12/16/2015 @FirstMarkCap, 100 5th Ave, NY

Three speakers talked about challenging programming problems and how they solved them

20151216_181952[1] BioDigital20151216_185209[1] 20151216_190526[1]

Matt Brown @Genius talked about how they implemented their product which allows users to annotate text on web pages. The challenge is locating the text that was annotated on a web page and the web page may be modified after the annotation was added. In this case, the text fragment and the location of the fragment may have changed, but the annotation should still point to the same part of the text. This means that the location of the text in the dom may have changed and the fragment itself may have been modified.

To restore the annotation they use fuzzy matching in the following steps

  1. Identify regions that may hold the text
  2. Conduct a fuzzy search to find possible starting and ending points for the matching text
  3. Highlight the text that is the closest match from the candidates in the fuzzy search

The user highlights text in the original web page and the program stores the highlighted fragment along with text showing the context both before and after the fragment.

When the user loads the web page, the following steps are performed to locate the fragment

  1. Use jQuery body.text to extract all text from the web site
  2. Build a list of infrequently used words and locate these words in the web site text
  3. Use the JavaScript implementation of Google’s diff-match-patch library to find the fragment in the text (The library uses the Bitap algorithm to find the best fuzzy match). The algorithm finds starting locations for text matches to the fragment. If the fragment is longer than 64 characters, only the first 64 characters are used. Searches are conducted using the before-context with the fragment to determine the general location in the text and using only the fragment to determine the possible starting points of the fragment in the text.
  4. Reverse the order of characters in both the fragment and the text. Repeat the previous step to determine possible ending points of the fragment in the text.
  5. Extract candidate locations for the fragment and pick the location which has the minimum Levenshtein distance (fewest character substitutions/inserts/removals).
  6. Highlight the text in that location. Repeat this process for each stored fragment.

Next, Peter Brodsky @HyperScience spoke about how his company is making the training of neural nets more efficient. HyperScience trains neural nets (containing up to 6 layers) on a variety of tasks (e.g. looking for abnormal employee behavior, reassembling shredded documents, eliminating porn from web sites).

The problems they want to overcome are

  1. Local minimum solutions are obtained instead of a global minimum
  2. Expensive to train
  3. Poor reuse

To overcome these problems they do the following. Once the nets are trained, they examine the nets and extract subnets that have similar patterns of weights. They test whether these subnets are performing common functions by swapping subnets across neural networks. If the performance does not change then they assume that the subnets are performing a common task. Over time they create libraries of subnets.

They can then describe the internal structure of the net in terms of the functions of subnets instead of in terms of nodes. This improves their ability to understand the processing within the net.

This has several advantages.

  1. They can create larger and more complex networks
  2. They can start with a weight vector and guide the net away from local minima and toward the global minimum.
  3. Their networks should learn faster since the standard building blocks are already in place and do not need to be reinvented.

In the third presentation, Tarek Sherif @BioDigital talked about how BioDigital is implementing anatomical content for the web. The challenge is to create 3d, interactive pictures showing human bodies in motion or in sections, layers, etc.

BioDigital uses webGL to render their content in HTML/CSS/JS on all browsers and mobile devices. Due to the computational load, optimization of memory management and JavaScript code is important.

The content can be static, animated or a series of animations. The challenge is to keep the size down for quick downloads, but have the user experience the beauty of the images.

Displaying anatomical content is challenging since it can be

  1. Deeply nested – e.g. brain inside skull
  2. Hierarchical – is the click on the hand or the arm?
  3. Scale – from cells to the whole body

User interactions can include –highlighting, dissection, isolation, transparency, annotation, rotation,…

Mobile is even more challenging

  1. Limited memory and GPU
  2. Variety of devices
    1. GL variable limits
    2. Shader precision
    3. Available extensions

To allow their images to be plugged into web sites, they create an API

  1. Create an iframe to embed into a page
  2. Allows basic interactions
  3. The underlying JavaScript can be customized

API challenges

  1. 3d terminology and concepts
  2. 3d navigation
  3. Anatomical concepts
  4. Architecture of the Human

Examples can be seen at https://developer.biodigital.com

The artists primarily use Maya and Zbrush as their creative tools.

Models can be customized for specific patients.

posted in:  Animation, applications, Code Driven NYC, Programming    / leave comments:   No comments yet

Intro to #DeepLearning using #PyTorch

Posted on February 21st, 2017

#ACM NY

02/21/2017 @ NYU Courant Institute (251 Mercer St, New York, NY)

Soumith Chintala @Facebook first talked about trends in the cutting edge of machine learning. His main point was that the world is moving from fixed agents to dynamic neural nets in which agents restructure themselves over time. Currently, the ML world is dominated by static datasets + static model structures which learn offline and do not change their structure without human intervention.

He then talked about PyTorch which is the next generation of ML tools after Lua #Torch. In creating PyTorch they wanted to keep the best features of LuaTorch, such as performance and extensibility while eliminating rigid containers and allowing for execution on multiple-GPU systems. PyTorch is also designed so programmers can create dynamic neural nets.

Other features include

  1. Kernel fusion – take several objects and fuse them into a single object
  2. Order of execution – reorder objects for faster execution
  3. Automatic work placement when you have multiple GPUs

PyTorch is available for download on http://pytorch.org and was released Jan 18, 2017.

Currently, PyTorch runs only on Linux and OSX.

posted in:  ACM, data analysis, Programming, Python    / leave comments:   No comments yet

NYAI#7: #DataScience to Operationalize #ML (Matthew Russell) & Computational #Creativity (Dr. Cole)

Posted on November 22nd, 2016

#NYAI

11/22/2016 Risk, 43 West 23rd Street, NY 2nd floor

img_20161122_1918271 img_20161122_2039491

Speaker 1: Using Data Science to Operationalize Machine Learning – (Matthew Russell, CTO at Digital Reasoning)

Speaker 2: Top-down vs. Bottom-up Computational Creativity  – (Dr. Cole D. Ingraham DMA, Lead Developer at Amper Music, Inc.)

Matthew Russell @DigitalReasoning  spoke about understanding language using NLP,  relationships among entities, and temporal relationship. For human language understanding he views technologies such as knowledge graphs and document analysis is becoming commoditized. The only way to get an advantage is to improve the efficiency of using ML: KPI for data analysis is the number of experiments (tests an hypothesis) that can be run per unit time. The key is to use tools such as:

  1. Vagrant – allow an environmental setup.
  2. Jupyter Notebook – like a lab notebook
  3. Git – version control
  4. Automation –

He wants highly repeatable experiments. The goal is to speed up the number of experiments that can be conducted per unit time.

He then talked about using machines to read medical report and determine the issues. Negatives can be extracted, but issues are harder to find. Uses an ontology to classify entities.

He talked about experiments on models using ontologies. The use of a fixed ontology depends on the content: the ontology of terms for anti-terrorism evolves over time and needs to be experimentally adjusted over time. Medical ontology is probably most static.

In the second presentation, Cole D. Ingraham @Ampermusic talked about top-down vs bottom-up creativity in the composition of music. Music differs from other audio forms since it has a great deal of very large structure as well as the smaller structure. ML does well at generating good audio on a small time frame, but Cole thinks it is better to apply theories from music to create the larger whole. This is a combination of

Top-down: novel&useful, rejects previous ideas – code driven, “hands on”, you define the structure

Bottom-up: data driven – data driven, “hands off”, you learn the structure

He then talked about music composition at the intersection of Generation vs. analysis (of already composed music) – can do one without the other or one before the other

To successfully generate new and interesting music, one needs to generate variance. Composing music using a purely probabilistic approach is problematic as there is a lack of structure. He likes the approach similar to replacing words with their synonyms which do not fundamentally change the meaning of the sentence, but still makes it different and interesting.

It’s better to work on deterministically defined variance than it is to weed out undesired results from nondeterministic code.

As an example he talked about Wavenet (google deepmind project) which input raw audio and output are raw audio. This approach works well for improving speech synthesis, but less well for music generation as there is no large scale structural awareness.

Cole then talked about Amper, as web site that lets users create music with no experience required: fast, believable, collaborative

They like a mix of top-down and bottom-up approaches:

  1. Want speed, but neural nets are slow
  2. Music has a lot of theory behind it, so it’s best to let the programmers code these rules
  3. Can change different levels of the hierarchical structure within music: style, mood, can also adjust specific bars

Runtime written in Haskell – functional language so its great for music

posted in:  AI, Big data, data analysis, Data science, NewYorkAI, Programming    / leave comments:   No comments yet

#Unsupervised Learning (Soumith Chintala) & #Music Through #ML (Brian McFee)

Posted on July 26th, 2016

#NYAI

07/25/2016 @Rise, 28 West 24rd Street, NY, 2nd floor

IMG_20160725_192101[1] IMG_20160725_192826[1] IMG_20160725_193256[1] IMG_20160725_200916[1] IMG_20160725_201046[1]

Two speakers spoke about machine learning

In the first presentation, Brian McFee @NYU spoke about using ML to understanding the patterns of beats in music. He graphs beats identified by Mel-frequency cepstral coefficients (#MFCCs)

Random walk theory combines two representations of points in the graph.

  1. Local: In the graph, each point is a beat, edge connect adjacent beats. Weight edges by MFCC .
  2. Repetition: Link k-nearest neighbor by repetition = same sound – look for beats. Weight by similarity (k is set to the square root of the number of beats)
  3. Combination: A = mu * local + (1-mu)*repetition; optimize mu for a balanced random walk , so probability of a local move – probability of a repetition move over all vertices. Use a least squares optimization to find mu so the two parts of the equation make equal contributions across all points to the value of A.

The points are then partitioned by spectral clustering: normalize Laplacian – take bottom eigenvectors which encode component membership for each beat; cluster the eigenvectors Y of L to reveal the structure. Gives hierchical decomposition of the time series. m=1, the entire song. m=2 gets the two components of the song. As you add more eigenvectors, the number of segments within the song increases.

Brain then showed how this segmentation can create compelling visualizations of the structure of a song.

The Python code used for this analysis is available in the msaf library.

He has worked on convolutional neural nets, but find them to be better at handing individual notes within the song  (by contrast, rhythm is over a longer time period)

In the second presentation, Soumith Chintala talked about #GenerativeAdversarialNetworks (GAN).

Generative networks consist of a #NeuralNet “generator” that produces an image. It takes as input a high dimensional matrix (100 dimensions) of random noise. In a Generative Adversarial Networks a generator creates an image which is optimized over a loss function which evaluates “does it look real”. The decision of whether the image looks real is determined by a second neural net “discriminator” that tries to pick the fake image from a set of other real images plus the output of the generator.

Both the generator and discriminator NN’s are trained by gradient descent to optimize their individual performance: Generator = max game; discriminator = min game. The process optimizes Jensen-Shannon divergence.

Soumith then talked about extensions to GAN. These include

Class-conditional GANS – take noise + class of samples as input to the generator.

Video prediction GANS –predict what happens next given the previous 2 or 3 frames. Added a MSE loss (in addition to the discriminator classification loss) which compares what happened to what is predicted

Deep Convolution GAN – try to make the learning more stable by using a CNN.

Text-conditional GAN – input =noise + text. Use LSTM model on the text input. Generate images

Disentangling representations – InfoGAN – input random noise + categorical variables.

GAN is still unstable especially for larger images, so work to improve it includes

  1. Feature matching – take groups of features instead of just the whole image.
  2. Minibatch learning

No one has successfully used GAN for text-in to text-out

The meeting was concluded by a teaser for Watchroom – crowd funded movie on AI and VR.

posted in:  AI, data analysis, NewYorkAI    / leave comments:   No comments yet

Automatically scalable #Python & #Neuroscience as it relates to #MachineLearning

Posted on June 28th, 2016

#NYAI: New York Artificial Intelligence

06/28/2016 @Rise, 43 West 23rd Street, NY, 2nd floor

IMG_20160628_192420[1] IMG_20160628_200539[1] IMG_20160628_201905[1]

Braxton McKee (@braxtonmckee ) @Ufora first spoke about the challenges of creating a version of Python (#Pyfora) that naturally scales to take advantage of the hardware to handle parallelism as the problem grows.

Braxton presented an example in which we compute the minimum distance from target points a larger universe of points base on their Cartesian coordinates. This is easily written for small problems, but the computation needs to be optimized when computing this value across many cpu’s.

However, the allocation across cpu’s depends on the number of targets relative to the size of the point universe. Instead of trying to solve this analytically, they use a #Dynamicrebalancing strategy that splits the task and adds resources to the subtasks creating bottlenecks.

This approach solves many resource allocation problems, but still faces challenges

  1. nested parallelism. They look for parallelism within the code and look for bottlenecks at the top level of parallelism and split the task into subtasks at that level, …
  2. the data do not fit in memory. They break tasks into smaller tasks. They also have each task know which other caches hold data, so they can be accessed directly without going to slower main memory
  3. different types of architectures (such as gpu’s) require different types of optimization
  4. the optimizer cannot look inside python packages, so cannot optimize a bottleneck within a package.

Pyfora

  1. is a just-in-time compiler that moves stack frames from machine-to-machine and senses how to take advantage of parallelism
  2. tracks what data a thread is using
  3. dynamically schedules threads and data
  4. takes advantage of mutability which allows the compiler to assume that functions do no change over time so the compiler can look inside the function when optimizing execution
  5. is written on top of another language which allows for the possibility of porting the method to other languages

In the second presentation, Jeremy Freeman @Janelia.org spoke about the relationship between neuroscience research and machine learning models. He first talking about the early works on understanding the function of the visual cortex.

Findings by Hubel & Wiesel in1959 have set the foundation for visual processing models for the past 40 years. They found that Individual neurons in the V1 area of the visual cortex responded to the orientation of lines in the visual field. These inputs fed neurons that detect more complex features, such as edges, moving lines, etc.

Others also considered systems which have higher level recognition and how to train a system. These include

Perceptrons by Rosenblatt, 1957

Neocognitrons by Fukushima, 1980

Hierarchical learning machines, Lecun, 1985

Back propagation by Rumelhart, 1986

His doctoral research looked at the activity of neurons in V2 area. They found they could generate high order patterns that some neurons discriminate among.

But in 2012, there was a jump in performance of neural nets – U. of Toronto

By 2014, some of the neural network algos perform better than humans and primates, especially in the area of image processing. This has lead to many advances such as Google deepdream which combines images and texture to create an artistic hybrid image.

Recent scientific research allows one to look at thousands of neurons simultaneously. He also talked about some of his current research which uses “tactile virtual reality” to examine the neural activity as a mouse explores a maze (the mouse walks on a ball that senses it’s steps as it learns the maze).

Jeremy also spoke about Model-free episodic control for complex sequential tasks requiring memory and learning. ML research has created models such as LSTM and Neural Turing Nets which retain state representations. Graham Taylor has looked at neural feedback modulation using gates.

He also notes that there are similar functionalities between the V1 area in the visual cortex, the A1 auditory area, and the S1, tactile area.

To find out more, he suggested visiting his github site: Freeman-lab and looking the web site neurofinder.codeneuro.org.

posted in:  AI, data analysis, Data science, NewYorkAI, Open source, Python    / leave comments:   No comments yet

DataDrivenNYC: bringing the power of #DataAnalysis to ordinary users, #marketers, #analysts.

Posted on June 18th, 2016

#DataDrivenNYC

06/13/2016 @AXA Equitable Center (787 7th Avenue, New York, NY 10019)

20160613_183900 20160613_185245 20160613_191943 20160613_194901

The four speakers were

Adam @NarrativeScience talked about how people with different personalities and jobs may require/prefer different takes on the same data. His firm ingests data and has systems to generate natural language reports customized to the subject area and the reader’s needs.

They current develop stories with the guidance of experts, but eventually will more to machine learning to automate new subject areas.

Next, Neha @Confluent talked about how they created Apache Kafka: a streaming platform which collects data and allows access to these data in real time.

Read more…

posted in:  data, data analysis, Data Driven NYC, Data science, databases, Open source    / leave comments:   No comments yet

#DataDrivenNYC: #FaultTolerant #Web sites, #Finance, Predicting #B2B buying behavior, training #DeepLearning

Posted on May 18th, 2016

#DataDrivenNYC

05/18/2016 @AXA auditorium, 787 7th  Avenue, NY

20160518_182345[1] 20160518_184230[1] 20160518_191103[1] 20160518_193745[1]

Four speakers presented:

First, Nicolas Dessaigne @Algolia (Subscription service to access a search API) talked about the challenges building a highly fault-tolerant world-wide service. The steps resulted from their understanding of points of failure within their systems and the infrastructure their systems depend on.

Initially, they concentrated on their software development process including failed updates.  To overcome these problems, they update one server at a time (with a rack of servers), do partial updates, use Chef to automate deployment.

Then they migrated their DNS provider from .io to .net TLD to avoid slow response times they had seen intermittently in Asia. This was followed by the upgrades:

Feb 2015. Set up clusters of servers world-wide , so users have a server in their region:  lower latency

March 2015. Physically separate server clusters within a region to different providers

May 2015. Create fallback DNS servers

July 2015. Put a third data center online to make indexing robust

April 2016. Implement  a 1 second granularity for their system monitoring

Next, Matt Turck interviewed Louis DiModugno @AXA . In the US, AXA’s main focus is on predictive underwriting of insurance process. They also have projects to incorporate sensors into products and correctly route queries to call centers based on the demographics of the customer. World-wide they have three analysis hubs: France, US, Singapore (coming online).

Louis oversees both data and analytics in the U.S. and both he and the CTO report to the CIO.  They are interested in expanding their capabilities in areas such as creating unstructured databases from life insurance data that are currently on microfiche.

In the third presentation, Amanda Kahlow @6Sense talked about their business model  to provide information to customers in B2B commerce. They analyze business searches, customer web sites, visits to publisher’s (e.g. Forbes) web sites. Their goal is to determine the timing of customer purchases.

B2B purchases are different from B2C purchases since

  1. Businesses research their purchases online before they buy
  2. The research takes time (long sales cycle)
  3. The decision to buy involves multiple people within the company

So, there are few impulse buys and buyer behavior signals that a purchase is imminent.

The main CMO question is when (not who).

6sense ties data across searches (anonymous data). The goal is to identify when companies are in a specific part of the buying cycle, so sales can approach them now. (Example: show click-to-chat when the analytics says that the customer is ready to buy)

Lastly, Peter Brodsky @HyperScience  spoke about tools they are developing to speed machine learning. These include

  1. Tools to make it easier to add new data sets
  2. need to match fields, such as date which may be in different formats
  3. what to do with missing data
  4. need labeled data – lots of examples
  5. Speed up training time

The speed up is done by identifying subnets within the larger neural network. The subnets perform distinct functions. To determine if two subnets (in different networks) are equivalent, move one subnet from one network to replace another subnet in another network and see if the function is unchanged: Freeze the weights within the subnet and outside the subnet. Retrain the interface between the net and the subnet.

This creates building blocks which can be combined into larger blocks. These blocks can be applied to jump start the training process.

 

posted in:  AI, applications, Big data, data analysis, Data Driven NYC, startup    / leave comments:   No comments yet