NYAI#5: Neural Nets (Jason Yosinski) & #ML For Production (Ken Sanford)
Posted on August 24th, 2016
08/24/2016 @Rise, 43 West 23rd Street, NY, 2nd floor
Jason Yosinski @GeometricTechnology spoke about his work on #NeuralNets to generate pictures. He started by talking about machine learning with feedback, used both to train a robot to move more quickly and to computer-generate pictures that are appealing to humans.
Jason next talked about AlexNet (Krizhevsky et al., 2012), which classifies images using a neural net with 5 convolutional layers (interleaved with max pooling and contrast normalization layers) plus 3 fully connected layers at the end. The net, with 60 million parameters, was trained on ImageNet, which contains over 1 million images. His image classification code is available at http://Yosinski.com.
Jason talked about how the classifier learns about categories it is not being trained to identify. For instance, the network may learn to detect faces even though there is no human category, since faces give it context for detecting things such as hats (which sit above a face). It also learns to identify text, which gives it context for other shapes it is trying to identify.
He next talked about generating images by starting from random noise and randomly changing pixels. Some changes cause the confidence in the goal class (such as ‘lion’) to increase, so over many random moves the goal’s confidence level rises. Jason showed many such images that elicited high levels of confidence, but they often looked like purple-green slime. This is probably because the network, while learning, discards the overall color of the image and is therefore insensitive to aberrations from normal colors. (See Erhan et al., 2009.)
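The random-change procedure described above is a simple hill climb. Here is a minimal sketch, assuming a tiny stand-in for the trained network: the hypothetical `toy_confidence` is just a logistic function of the image's dot product with a fixed template, not a real classifier. The search starts from noise, proposes a random change to one pixel at a time, and keeps the change only if the target class confidence goes up.

```python
import math
import random

def toy_confidence(image, template):
    """Hypothetical stand-in for a trained classifier's 'lion' score:
    a logistic function of the image's dot product with a fixed template."""
    score = sum(p * t for p, t in zip(image, template))
    return 1.0 / (1.0 + math.exp(-score))

def random_search(confidence, n_pixels, steps=2000, step_size=0.1, seed=0):
    """Start from random noise; propose random single-pixel changes and keep
    a change only if it raises the target class confidence."""
    rng = random.Random(seed)
    image = [rng.uniform(-1.0, 1.0) for _ in range(n_pixels)]
    best = confidence(image)
    for _ in range(steps):
        i = rng.randrange(n_pixels)
        old = image[i]
        # Perturb one pixel, clamped to the valid range [-1, 1].
        image[i] = min(1.0, max(-1.0, old + rng.gauss(0.0, step_size)))
        new = confidence(image)
        if new > best:
            best = new          # keep the improving change
        else:
            image[i] = old      # revert the change
    return image, best
```

Nothing in the loop asks the image to look natural, which is one way to see why such searches can end in high-confidence "slime": any pixel pattern that pushes the score up is accepted.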
[This also raises the question of how computer vision is different from human vision. If presented with a blue colored lion, the first reaction of a human might be to note how the color mismatches objects in the ‘lion’ category. One experiment would be to present the computer model with the picture of a blue lion and see how it is classified. Unlike computers, humans encode information beyond their list of items they have learned and this encoding includes extraneous information such as color or location. Maybe the difference is that humans incorporate a semantic layer that considers not only the category of the items, but other characteristics that define ‘lion-ness’. Color may be more central to human image processing as it has been conjectured that we have color vision so we can distinguish between ripe and rotten fruits. Our vision also taps into our expectation to see certain objects within the world and we are primed to see those objects in specific contexts, so we have contextual information beyond what is available to the computer when classifying images.]
To improve the generated pictures of ‘lions’, he next used a generator to create pictures and changed them until he got a picture with high confidence of being a ‘lion’. The generator is designed to create identifiable images, and it can even produce pictures of objects that it has not been trained to paint. (Regularization is needed to get better pictures for the target.)
Slides at http://s.yosinski.com/nyai.pdf
In the second talk, Ken Sanford @Ekenomics of H2O.ai talked about the H2O open source project. H2O is a machine learning engine that can run in R, Python, Java, etc.
Ken emphasized how H2O (whose deep learning module is a multilayer feed-forward neural network) provides a platform that generates Java scoring code. This eases the transition from the model developed in training to the model used to score inputs in a production environment.
He also talked about the Deep Water project, which aims to allow other open source tools, such as MXNet, Caffe, TensorFlow,… (CNN, RNN, … models), to run in the H2O environment.
Board #Game #Design
Posted on August 15th, 2016
Central Jersey Mensa @Mensa
08/12/2016 @ APA Hotel Woodbridge, 120 S Wood Ave, Iselin, NJ
Gil Hova @FormalFerretGames, a designer and publisher of #BoardGames, talked about how to design a good board game.
Unlike transformative games (which you play only once, but which change you), his games aim to entertain. He emphasized, however, that fun is not a general term: it needs to be defined for a specific audience.
To describe his approach, Gil talked about four key terms:
- flow – the feeling of being in the zone. Ways to get players into the flow include:
- clear set of goals
- immediate feedback
- goals neither too difficult nor too easy
- need to make challenges progressively more difficult
- fiero – the feeling after triumphing over adversity: an emotional peak and a counterbalance to flow. It’s a fleeting moment; for instance, the “take that” mechanism in which you punish another player. It ties into the concept of meaningful play; see Jane McGonigal, “Reality Is Broken”.
- heuristics – rules of thumb that are not part of the rules, but ways that players figure out to play the game, e.g. bluffing in poker. The developer needs to see how rule changes change behavior. Players start with “zero-level heuristics” that they use the first time they play a game. As they play more, they “climb the heuristic tree”. This is also called “lenticular design”, as we see new things every time we play the game. The heuristic tree can have many shapes:
- bush – e.g. tic-tac-toe, where only 1 or 2 heuristics (e.g. take the center square) are needed to master the game
- palm tree – a long climb before you understand how to play the game and then there are a lot of tools at your disposal
- sequoia – lots of heuristic levels with new concepts & tools at each level (e.g. chess)
- core engagement – the core that appeals to players: the one thing on which the game is based, e.g.
- Scrabble – mastery of words
- Bridge – communication with partner
The key thing is to incentivize interesting behavior: “game design is mind control”
If the game is too random, play stops being meaningful (e.g. Fluxx).
The game needs to reward good play and needs to get players to experience the 4 key terms.
Theme and Mechanism – it doesn’t matter which comes first, but it helps if they support each other.
The theme is a promise to the players, so make the mechanism consistent with the player’s expectations from the theme.
If there is no theme, then the rules had better be simple to explain.
MDA – mechanics, dynamics, aesthetics – the dynamics are the intersection of the aesthetics and the mechanism.
Players start with the theme and drill down to the mechanism.
Designer starts with the mechanism and moves to the theme.
To unite the theme and mechanism, Gil advises removing the flavor text on a card (the text used to describe the card), since the flavor of the card should be implied by how the card plays.
Gil then talked about the game development process he uses: 4 stages of play testing
- proof of concept – play solo. Is this a game? Is it interesting?
- alpha – plain, broad strokes. Talk about it and play with other designers. Discuss why it broke. Discard after each play test.
- beta – functional and balanced; show it. It’s now a functional game. Use Google Images for graphics.
- gamma – beautiful; graphic tests; release to market.
and 3 types of playtesters that he uses during the play testing stages
- silent tester – just a silent opponent
- brilliant tester – “what if you could do this”
- crazy tester – play with an opponent who tries things you have not considered.
Gil closed by talking generally about the game-development industry:
you cannot play a great idea!
no one will steal your game.
do not ask for an NDA
don’t be attached to your game
let your game be what it wants to be
he recommends listening to the podcast Flip the Table, which looks at obscure board games.
You will need 75 to 100 tests overall to get from idea to published game.
#Unsupervised Learning (Soumith Chintala) & #Music Through #ML (Brian McFee)
Posted on July 26th, 2016
07/25/2016 @Rise, 28 West 24th Street, NY, 2nd floor
Two speakers spoke about machine learning.
In the first presentation, Brian McFee @NYU spoke about using ML to understand the patterns of beats in music. He builds a graph of the beats, with each beat described by its Mel-frequency cepstral coefficients (#MFCCs).
A random walk combines two representations of the points (beats) in the graph:
- Local: each point is a beat, and edges connect adjacent beats. Edges are weighted by MFCC similarity.
- Repetition: each beat is linked to its k nearest neighbors among beats with the same sound (repetitions), with edges weighted by similarity (k is set to the square root of the number of beats).
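- Combination: $A = \mu\,\Delta + (1-\mu)\,R$, where $\Delta$ is the local graph and $R$ is the repetition graph. $\mu$ is chosen for a balanced random walk, so that at every vertex $i$ the probability of a local move roughly equals the probability of a repetition move: $\mu\, d_\Delta(i) \approx (1-\mu)\, d_R(i)$, where $d(i)$ is the degree (sum of edge weights) of vertex $i$. A least squares optimization, $\mu^* = \arg\min_\mu \sum_i \big(\mu\, d_\Delta(i) - (1-\mu)\, d_R(i)\big)^2$, finds the $\mu$ for which the two parts of the equation make equal contributions to $A$ across all points.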
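The least squares problem for $\mu$ has a closed form: writing the residual as $\mu\,(d_\Delta(i) + d_R(i)) - d_R(i)$ gives $\mu^* = \sum_i d_R(i)\,(d_\Delta(i)+d_R(i)) \,/\, \sum_i (d_\Delta(i)+d_R(i))^2$. Here is a small sketch that takes the two weighted adjacency matrices as given (building them from MFCCs and k-NN is left out) and computes $\mu^*$ and the combined graph $A$. The function names are illustrative, not the msaf API.

```python
def balance_mu(local, repetition):
    """Least-squares mu so that mu * d_local(i) ~= (1 - mu) * d_rep(i) at every vertex.
    Closed form: mu* = sum d_rep * (d_local + d_rep) / sum (d_local + d_rep)^2."""
    d_loc = [sum(row) for row in local]        # vertex degrees in the local graph
    d_rep = [sum(row) for row in repetition]   # vertex degrees in the repetition graph
    num = sum(r * (l + r) for l, r in zip(d_loc, d_rep))
    den = sum((l + r) ** 2 for l, r in zip(d_loc, d_rep))
    return num / den

def combine(local, repetition, mu):
    """A = mu * local + (1 - mu) * repetition, element by element."""
    return [[mu * a + (1 - mu) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(local, repetition)]
```

For example, a 3-beat chain (degrees 1, 2, 1) combined with a single repetition edge between the first and last beats (degrees 1, 0, 1) gives $\mu^* = 4/12 = 1/3$; when both graphs have identical degrees, $\mu^* = 0.5$.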
The points are then partitioned by spectral clustering: compute the normalized Laplacian L of A and take its bottom eigenvectors, which encode component membership for each beat; clustering the rows of the eigenvector matrix Y reveals the structure. This gives a hierarchical decomposition of the time series: with m=1 eigenvector, the only segment is the entire song; m=2 gives the two components of the song. As you add more eigenvectors, the number of segments within the song increases.
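Here is a compact sketch of that step with NumPy, assuming the symmetric normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$, row-normalized bottom-$m$ eigenvectors, and a plain k-means with $m$ clusters. This is one common spectral clustering recipe; the function name `segment` is hypothetical and it is not the msaf implementation. It also assumes every beat has at least one edge, so no degree is zero.

```python
import numpy as np

def segment(A, m, iters=50):
    """Label each beat with one of m segments using the bottom m eigenvectors of L."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                                   # assumes every degree > 0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                         # eigenvalues in ascending order
    Y = vecs[:, :m]                                     # bottom m eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)    # normalize each beat's row
    # k-means on the rows of Y: farthest-point initialization, then Lloyd iterations.
    centers = [Y[0]]
    for _ in range(1, m):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(m)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

Calling `segment(A, m)` for m = 1, 2, 3, … yields the increasingly fine segmentations described above; with m = 1 every beat gets the same label.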
Brian then showed how this segmentation can create compelling visualizations of the structure of a song.
The Python code used for this analysis is available in the msaf library.
He has worked on convolutional neural nets, but finds them better at handling individual notes within a song (rhythm, by contrast, plays out over a longer time period).
In the second presentation, Soumith Chintala talked about #GenerativeAdversarialNetworks (GAN).
Generative networks consist of a #NeuralNet “generator” that produces an image. It takes as input a high-dimensional vector (100 dimensions) of random noise. In a Generative Adversarial Network, the generator creates an image that is optimized over a loss function which evaluates “does it look real?” The decision of whether the image looks real is made by a second neural net, the “discriminator”, which tries to pick out the fake images from a set of real images plus the output of the generator.
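Both the generator and the discriminator are trained by gradient descent to optimize their individual performance in a minimax game: the discriminator plays the maximizing side (getting better at telling real from fake) and the generator plays the minimizing side. With an optimal discriminator, the generator’s objective is equivalent to minimizing the Jensen-Shannon divergence between the real and generated distributions.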
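The standard GAN objective (Goodfellow et al., 2014) is $\min_G \max_D V(D,G)$ with $V(D,G) = \mathbb{E}_{x\sim p_{data}}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]$. For a fixed generator, the best discriminator is $D^*(x) = p_{data}(x) / (p_{data}(x) + p_g(x))$, and substituting it gives $V(D^*, G) = -\log 4 + 2\,\mathrm{JSD}(p_{data} \,\|\, p_g)$. The sketch below checks that identity numerically, assuming small discrete distributions as stand-ins for the real and generated image distributions; no networks are trained.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def gan_value_at_optimal_d(p_data, p_gen):
    """V(D*, G) = sum p_data log D* + sum p_gen log(1 - D*), with D* = p_data / (p_data + p_gen)."""
    value = 0.0
    for pd, pg in zip(p_data, p_gen):
        if pd + pg == 0:
            continue
        d_star = pd / (pd + pg)          # optimal discriminator output at this point
        if pd > 0:
            value += pd * math.log(d_star)
        if pg > 0:
            value += pg * math.log(1 - d_star)
    return value
```

When the generator matches the data exactly, the value is $-\log 4$ (the discriminator can do no better than 0.5 everywhere); when the two distributions do not overlap, it reaches its maximum of 0.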
Soumith then talked about extensions to GAN. These include:
Class-conditional GANs – take noise + the class of the samples as input to the generator.
Video prediction GANs – predict what happens next given the previous 2 or 3 frames. An MSE loss (in addition to the discriminator classification loss) compares what happened to what was predicted.
Deep Convolutional GAN – tries to make the learning more stable by using a CNN.
Text-conditional GAN – input = noise + text. An LSTM model is used on the text input to generate images.
Disentangling representations – InfoGAN – input = random noise + categorical variables.
GAN training is still unstable, especially for larger images, so work to improve it includes:
- Feature matching – train the generator to match statistics of the discriminator’s intermediate features instead of just the verdict on the whole image.
- Minibatch learning
No one has successfully used GAN for text-in to text-out
The meeting concluded with a teaser for Watchroom, a crowdfunded movie on AI and VR.
Automatically scalable #Python & #Neuroscience as it relates to #MachineLearning
Posted on June 28th, 2016
06/28/2016 @Rise, 43 West 23rd Street, NY, 2nd floor
Braxton McKee (@braxtonmckee) @Ufora first spoke about the challenges of creating a version of Python (#Pyfora) that scales naturally, taking advantage of the hardware to handle parallelism as the problem grows.
Braxton presented an example in which we compute the minimum distance from each target point to a larger universe of points, based on their Cartesian coordinates. This is easily written for small problems, but the computation needs to be optimized when it is spread across many CPUs.
However, the best allocation across CPUs depends on the number of targets relative to the size of the point universe. Instead of trying to solve this analytically, they use a #Dynamicrebalancing strategy that splits the tasks creating bottlenecks and adds resources to the resulting subtasks.
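To make the idea concrete, here is a minimal sketch of the nearest-distance problem with on-demand splitting: work is a shared queue of index ranges, and any range larger than a grain size is split in half and put back so idle workers can pick up the pieces. This is only an illustration of dynamic splitting with Python threads (the function and its `grain` parameter are hypothetical); it is not how Pyfora works internally, and because of the GIL it shows the task decomposition rather than a real speedup.

```python
import math
import queue
import threading

def nearest_distances(targets, universe, workers=4, grain=16):
    """For each target point, the distance to the closest point in `universe`."""
    if not universe:
        raise ValueError("universe must contain at least one point")
    results = [None] * len(targets)
    work = queue.Queue()

    def worker():
        while True:
            item = work.get()
            if item is None:                  # shutdown signal
                work.task_done()
                return
            lo, hi = item
            if hi - lo > grain:
                # Too big: split in half and requeue both halves for idle workers.
                mid = (lo + hi) // 2
                work.put((lo, mid))
                work.put((mid, hi))
            else:
                for i in range(lo, hi):
                    tx, ty = targets[i]
                    results[i] = min(math.hypot(tx - ux, ty - uy) for ux, uy in universe)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    work.put((0, len(targets)))
    work.join()                               # returns once every range is processed
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return results
```

Halves are queued before the parent range is marked done, so `work.join()` cannot return while split-off work is still pending.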
This approach solves many resource allocation problems, but still faces challenges:
- nested parallelism. They look for parallelism within the code and look for bottlenecks at the top level of parallelism and split the task into subtasks at that level, …
- the data do not fit in memory. They break tasks into smaller tasks. They also have each task know which other caches hold data, so they can be accessed directly without going to slower main memory
- different types of architectures (such as gpu’s) require different types of optimization
- the optimizer cannot look inside Python packages, so it cannot optimize a bottleneck within a package.
Pyfora:
- is a just-in-time compiler that moves stack frames from machine to machine and senses how to take advantage of parallelism
- tracks what data a thread is using
- dynamically schedules threads and data
- takes advantage of immutability, which allows the compiler to assume that functions do not change over time, so the compiler can look inside a function when optimizing execution
- is written on top of another language which allows for the possibility of porting the method to other languages
In the second presentation, Jeremy Freeman @Janelia.org spoke about the relationship between neuroscience research and machine learning models. He first talked about early work on understanding the function of the visual cortex.
Findings by Hubel & Wiesel in 1959 have set the foundation for visual processing models for the past 40 years. They found that individual neurons in the V1 area of the visual cortex respond to the orientation of lines in the visual field. These inputs feed neurons that detect more complex features, such as edges, moving lines, etc.
Others also considered systems capable of higher-level recognition and how to train such systems. These include:
Perceptrons by Rosenblatt, 1957
Neocognitrons by Fukushima, 1980
Hierarchical learning machines by LeCun, 1985
Back propagation by Rumelhart, 1986
His doctoral research looked at the activity of neurons in the V2 area. He and his colleagues found they could generate higher-order patterns that some neurons discriminate among.
Then, in 2012, there was a jump in the performance of neural nets (AlexNet, from the University of Toronto).
By 2014, some neural network algorithms performed better than humans and primates, especially in image processing. This has led to many advances, such as Google DeepDream, which combines images and textures to create an artistic hybrid image.
Recent scientific research allows one to look at thousands of neurons simultaneously. He also talked about some of his current research, which uses “tactile virtual reality” to examine neural activity as a mouse explores a maze (the mouse walks on a ball that senses its steps as it learns the maze).
Jeremy also spoke about model-free episodic control for complex sequential tasks requiring memory and learning. ML research has created models such as LSTMs and Neural Turing Machines, which retain state representations. Graham Taylor has looked at neural feedback modulation using gates.
He also noted that there are similar functionalities among the V1 visual area, the A1 auditory area, and the S1 tactile area.
To find out more, he suggested visiting his GitHub site, Freeman-lab, and looking at the website neurofinder.codeneuro.org.
Utilising #Blockchain #Technology in Financial Markets
Posted on June 26th, 2016
06/22/2016 @Downtown Conference Center, 157 William Street, NY
George Samman (@sammantic) @KPMG and four panelists:
Elliot Noma, Managing Director, @GarrettAssetManagement
Christopher Burniske, Analyst & Blockchain Products lead, @ARKInvestmentManagement
Jared Harwayne-Gidansky, @BlockchainSME
Christopher Boivin, CFA, Vice President, Markets Strategy, @BNYMellon Markets
spoke about the diverse types of blockchains and their differing uses. Many of the issues covered are discussed here.
Much of the discussion focused on the recent problem at Ethereum, in which a hacker used a malicious smart contract to steal $53 million in Ether cryptocurrency. The ensuing discussion brought forth many issues, including public vs. private, permissioned vs. permissionless, the legal status of smart contracts, and the fragility and errors in all software.
George emphasized the number of decisions one needs to make to determine the blockchain structure needed for any specific application.
Other blockchain presentations by George can be found here.
DataDrivenNYC: bringing the power of #DataAnalysis to ordinary users, #marketers, #analysts.
Posted on June 18th, 2016
06/13/2016 @AXA Equitable Center (787 7th Avenue, New York, NY 10019)
The four speakers were
- Nitay Joffe, Founder and CTO of ActionIQ (next-generation data platform for marketing and consumer data)
- Adam Kanouse, CTO of Narrative Science (transforms data into meaningful and insightful narratives)
- Neha Narkhede, Founder and CTO of Confluent (real-time data platform built around Apache Kafka)
- Christopher Nguyen, Founder and CEO of Arimo (data intelligence platform)
Adam @NarrativeScience talked about how people with different personalities and jobs may require/prefer different takes on the same data. His firm ingests data and has systems to generate natural language reports customized to the subject area and the reader’s needs.
They currently develop stories with the guidance of experts, but eventually will move to machine learning to automate new subject areas.
Next, Neha @Confluent talked about how they created Apache Kafka: a streaming platform which collects data and allows access to these data in real time.
Advanced #DeepLearning #NeuralNets: #TimeSeries
Posted on June 16th, 2016
06/15/2016 @Qplum, 185 Hudson Street, Jersey City, NJ, suite 1620
Sumit then broke the learning process into two steps: feature extraction and classification. Starting with raw data, the feature extractor is the deep learning model that prepares the data for the classifier, which may be a simple linear model or a random forest. In supervised training, errors in the prediction output by the classifier are fed back into the system using back propagation to tune the parameters of the feature extractor and the classifier.
In the remainder of the talk Sumit concentrated on how to improve the performance of the feature extractor.
In general text classification (unlike image or speech recognition), the input can be very long and variable in length. In addition, analysis of text by general deep learning models:
- does not capture order of words or predictions in time series
- can handle only small sized windows or the number of parameters explodes
- cannot capture long term dependencies
So the feature extractor is cast as a time delay neural network (#TDNN). In a TDNN, the text is viewed as a string of words. Kernel matrices (usually 3 to 5 words wide) are defined, each computing a dot product of its weights with the word vectors in a contiguous block of text. The kernel matrix is shifted by one word and the process is repeated until all words are processed. A second kernel matrix creates another set of features, and so forth for a 3rd kernel, etc.
These features are then pooled using the mean or max of the features. This process is repeated to get additional features. Finally a point-wise non-linear transformation is applied to get the final set of features.
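The sliding-kernel, pooling, and non-linearity steps above can be sketched in plain Python. This is an illustration under simple assumptions: each word is already a vector of length d (the embeddings are given), each kernel is a list of `width` weight vectors, pooling is over all window positions, and tanh stands in for the point-wise non-linearity. The name `tdnn_features` is hypothetical, and the text must be at least as long as the widest kernel.

```python
import math

def tdnn_features(embeddings, kernels, pool="max"):
    """embeddings: one vector per word (all of length d).
    kernels: each kernel is a list of `width` weight vectors of length d.
    Returns one pooled, tanh-transformed feature per kernel."""
    features = []
    for kernel in kernels:
        width = len(kernel)
        responses = []
        for start in range(len(embeddings) - width + 1):   # shift one word at a time
            window = embeddings[start:start + width]
            # Dot product of the kernel weights with the words in this block of text.
            responses.append(sum(
                sum(w * x for w, x in zip(w_vec, x_vec))
                for w_vec, x_vec in zip(kernel, window)))
        pooled = max(responses) if pool == "max" else sum(responses) / len(responses)
        features.append(math.tanh(pooled))                  # point-wise non-linearity
    return features
```

Because pooling collapses all window positions into one number per kernel, the output size depends only on the number of kernels, which is how the variable-length input problem is sidestepped.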
Unlike traditional neural network structures, these methods are new, so no one has done a study of what is revealed in the first layer, second layer, etc. Also theoretical work is lacking on the optimal number of layers for a text sample of a given size.
Historically, #TDNN has struggled with a series of problems, including convergence issues, so recurrent neural networks (#RNN) were developed, in which the encoder looks at the latest data point along with its own previous output. One example is the Elman network, in which each feature is the weighted sum of the kernel function output (one encoder is used for all points in the time series) and the previously computed feature value. Training is conducted as in a standard #NN, using back propagation through time, with the gradient accumulated over time before the encoder is re-parameterized. But RNNs have many issues:
1. exploding or vanishing gradients – depending on the largest eigenvalue (in magnitude) of the recurrent weight matrix
2. cannot capture long-term dependencies
3. training is somewhat brittle
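A scalar Elman network makes the gradient problem easy to see. In this sketch (names hypothetical, scalar weights assumed for readability) the same weights are reused at every step, $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$, and back propagation through time multiplies one factor $w_h (1 - h_t^2)$ per step. In the scalar case the "largest eigenvalue" is just $w_h$, so the gradient shrinks or grows geometrically with sequence length.

```python
import math

def elman_forward(xs, w_x, w_h, b, h0=0.0):
    """Scalar Elman RNN: the same (w_x, w_h, b) are reused at every time step."""
    h = h0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

def grad_h_T_wrt_h0(xs, w_x, w_h, b, h0=0.0):
    """d h_T / d h_0 by the chain rule (back propagation through time):
    each step contributes a factor w_h * (1 - h_t**2)."""
    grad = 1.0
    for h in elman_forward(xs, w_x, w_h, b, h0):
        grad *= w_h * (1.0 - h * h)
    return grad
```

With zero inputs and zero bias the state stays at 0, so after 20 steps the gradient is exactly $w_h^{20}$: about $10^{-6}$ for $w_h = 0.5$ (vanishing) and over 3000 for $w_h = 1.5$ (exploding).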
The fix is long short-term memory (#LSTM), which has additional memory “cells” to store short-term activations and additional gates to alleviate the vanishing gradient problem (see Hochreiter & Schmidhuber, 1997). Now each encoder is made up of several parts, as shown in his slides. It can also have a forget gate that resets the memory cell, and peephole connections that let the gates look back at the previous values of the memory cell. At Facebook, NLP, speech recognition, and vision recognition all use LSTM models.
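A single step of an LSTM cell can be sketched with scalar weights (a simplifying assumption; real cells use vectors and matrices). This is the common variant with input, forget, and output gates and no peephole connections. The key line is the additive update `c = f * c_prev + i * g`: the derivative of `c` with respect to `c_prev` is just the forget gate `f`, which can stay close to 1, so gradients need not vanish over many steps.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell.
    `p` maps each gate name to a (w_x, w_h, bias) triple (a hypothetical layout)."""
    def pre(gate):
        w_x, w_h, bias = p[gate]
        return w_x * x + w_h * h_prev + bias
    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much of the old cell to keep
    o = sigmoid(pre("output"))       # how much of the cell to expose
    g = math.tanh(pre("candidate"))  # candidate value to write
    c = f * c_prev + i * g           # additive memory update
    h = o * math.tanh(c)
    return h, c
```

With the forget gate biased strongly open and the input gate biased strongly closed, the cell holds its value essentially unchanged over a hundred steps.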
LSTM models, however, still don’t have a long-term memory. Sumit talked about memory networks, which store the key features of the input in memory cells. A query runs against the memory, and the output vector is then combined with the query text. A second query retrieves further memory.
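One common way to implement such a memory read is soft attention, as in end-to-end memory networks: score each memory slot against the query, turn the scores into weights with a softmax, read out the weighted sum, and add it to the query before the next hop. The sketch below follows that formulation under simplifying assumptions (keys and values are given vectors of the same length as the query, and the names are hypothetical); it is not necessarily the exact architecture from the talk.

```python
import math

def softmax(scores):
    top = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_hop(query, keys, values):
    """Weight each memory value by how well its key matches the query,
    then combine the retrieved vector with the query for the next hop."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    retrieved = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(query))]
    return [q + r for q, r in zip(query, retrieved)], weights

def answer(query, keys, values, hops=2):
    """Run several hops; each hop can retrieve different memories."""
    u = query
    for _ in range(hops):
        u, _ = memory_hop(u, keys, values)
    return u
```

A query that matches one key much better than the others puts almost all of its weight on that slot, so the read is close to a hard lookup while remaining differentiable.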
He also talked about using dropout to fight overfitting: during training, units are randomly switched off, so whether a signal is transmitted to the next layer is decided at random.
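Here is a minimal sketch of dropout on a list of activations, using the widely used "inverted" variant (an assumption; the talk did not specify): each activation is zeroed with probability `p_drop` during training, and survivors are scaled by `1 / (1 - p_drop)` so the expected activation is unchanged and nothing needs rescaling at test time.

```python
import random

def dropout(activations, p_drop, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p_drop during training
    and scale the survivors by 1 / (1 - p_drop); pass everything through at test time."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Because each forward pass sees a different random subset of units, no unit can rely on any particular other unit being present, which is the regularizing effect.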
Autoencoders can be used to pretrain the weights within the NN, to avoid creating solutions that are only locally optimal instead of globally optimal.
[Many of these methods are similar in spirit to existing methods. For instance, kernel functions in RNN are very similar to moving average models in technical trading. The different features correspond to averages over different time periods and higher level features correspond to crossovers of the moving averages.
The dropout method is similar to the techniques used in random forests to avoid overfitting.]
Hardwired: product #design and delivering #magic
Posted on June 11th, 2016
06/07/2016 @ WeWork, 115 West 18th Street, NY, 4th floor
New Lab and Techstars talked briefly before the four speakers:
- Martin Broen, VP of Global Product Design at Pepsi
- Chris Allen, Founder and CEO of iDevices (connected home products)
- Bob Coyne, Founder and CTO of WordsEye (create 3D scenes simply by describing them in words)
- Josh Clark, Founder of Big Medium (design strategy and user experience for a mobile, multiscreen world). Josh will talk about “magical UX for IoT”.
In the first presentation, Bob Coyne @Wordseye talked about his utility that takes a text description of a scene and creates an image matching that description. This allows users to create 3-D images without complicated #3-d graphics programs.
They parse sentences to create a semantic map which can include commands to place items, change the lighting, reorient objects, etc. They see uses in education, gaming, and image search.
[Graphics are currently primitive and the manipulations are rough, but the product is only 7 months old. It has promise for creating avatars and scenes for game prototypes. Text lacks the subtlety of gestures, so text may need to be supplemented by gestures or other inputs.]
In the second presentation, Chris Allen @iDevices (developers of connected home products and software) talked about the evolution of the company from its initial product in 2009, a connected grill.
Since then they have raised $20 million, were asked by Apple to develop products for HomeKit, and currently market 7 HomeKit-enabled products.
Experiences he communicated:
- Do your own research (don’t rely on conventional wisdom): despite being told that $99 was too high a price, they discovered that reducing the price to $75 did not increase sales.
- Resist pivoting away from your vision, especially when you have no intellectual property advantage: a waterproof case for phones failed.
- Create a great work environment and give your workers equity
- They build products that are compatible across platforms, but concentrate on just the three main platforms: Siri, Google, Amazon.
Next, Josh Clark @BigMedium talked about his vision of the future of interfaces: they will leap off the screen, combining #speech and #gestures. They will be as magical as the devices in the world of Harry Potter. Unlike Google Glass, which was always an engineering project, we should be asking how we can make any object (even a coffee cup) do more: design for the thing’s essential ‘thingness’.
Technology should be invisible, but magical:
- You can stand in front of a memory mirror and see how you look in a different color dress, replay a video of what you look like when you turn around, or do a side-by-side comparison with a previously worn dress.
- Asthmapolis site – when you have an asthma attack, you tap an app. Over time, you can see across individuals where they are when they have an attack.
- A hackathon app using the Kinect in which one gestures to grab an image off a video so a still image from that moment appears on the phone.
It’s a challenge of imagination.
If the magic fails, we need to make sure the analogue device still works.
[In some cases, magic may not be enough. For instance, Asthmapolis pivoted away from asthma alone and now concentrates on a broader range of symptoms.]
In the last presentation, Martin Broen @Pepsi talked about how his design team uses #prototyping to lead the development of new ideas.
Different groups within Pepsi have different perspectives and different priorities, so each views ideas differently, but to reach a consensus they all need to interact with the new product so they can see it, touch it, …
At each phase of development you use different tools, concentrating on the look of it, the feel of it, the functionality, etc. At each stage people need to interact with it to test it out. Don’t wait until you have a finished product. Don’t skip steps. Consider the full journey of the consumer.
Employ the least expensive way to try it out
They are not selling product, they are selling experiences: they create a test kitchen for the road.
#TensorFlow and Cloud Machine Learning
Posted on June 7th, 2016
06/06/2016 @Audible Inc, 1 Washington Place, Newark, NJ 15th floor
Joshua Gordon @Google talked about #MachineLearning and the TensorFlow package. TensorFlow is an open source library for machine learning. Using the library, you manipulate tensors (multidimensional arrays) by defining computation graphs of functions that operate on them.
The library runs on Linux and OS X, and on Windows using Docker. Support for Android is on the way. Joshua showed several applications, including one that repaints an image in van Gogh’s style by merging layers from the network identifying colors from the original image with layers from a second network trained on the painter’s style.
Next, Yufeng Guo @Google talked about out-of-the-box machine learning APIs to classify images. Google has a cloud vision API and will shortly release a speech API.
The Vision API accepts a JPEG file and outputs a description in JSON format, including the items identified and the confidence that the items are correctly identified. It also gives the coordinates of items identified and links to the full description of the items in Google’s database. The face detection routine also outputs information such as rollAngle, joyLikelihood, etc. The service is free for up to 1,000 requests per month.
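To show the shape of the exchange, here is a sketch that builds a request body for the v1 `images:annotate` endpoint and pulls labels and face attributes out of a response. It makes no network call, and the sample response is a hand-written, trimmed illustration of the field names (including a placeholder `mid`), not real API output; check the current API reference before relying on any field.

```python
import base64
import json

def build_request(image_bytes, max_labels=5):
    """Request body for POST https://vision.googleapis.com/v1/images:annotate (not sent here)."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": max_labels},
                {"type": "FACE_DETECTION"},
            ],
        }]
    }

def summarize(response_json):
    """Extract (label, confidence) pairs and (rollAngle, joyLikelihood) per face."""
    result = json.loads(response_json)["responses"][0]
    labels = [(a["description"], a["score"]) for a in result.get("labelAnnotations", [])]
    faces = [(f.get("rollAngle"), f.get("joyLikelihood"))
             for f in result.get("faceAnnotations", [])]
    return labels, faces

# Hand-written, trimmed example of the response shape (not real API output).
SAMPLE = json.dumps({"responses": [{
    "labelAnnotations": [{"mid": "/m/placeholder", "description": "dog", "score": 0.97}],
    "faceAnnotations": [{"rollAngle": 2.5, "joyLikelihood": "VERY_LIKELY"}],
}]})
```

The `mid` field is the identifier that links a label to its entry in Google's knowledge base; the likelihood fields are enum strings rather than numbers.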
How to Build a Bulletproof #SDK
Posted on June 3rd, 2016
06/02/2016 @Yahoo, 229 West 43rd Street, NY, 10th floor
The speaker from @Flurry emphasized four main themes on the way to making happy developers using your SDK:
- Respect users (hardware, not people in this case)
- Respect developers (people)
- Clarify assumptions (more about developers)
- Things you can’t control
Within each theme
- Respect users.
- Be considerate of battery life. Actions include
- Limit network calls
- Illuminate the screen only when necessary
- Network time is expensive – keep it to a minimum by downloading once and keeping the download in memory
- Phone space is limited – when you are done with the data you have downloaded, delete it.
- Minimize startup time
- Use techniques to keep startup time to below 2 seconds
- Don’t block the main thread
- If possible defer loading until after startup
- Respect developers
- Don’t do anything that causes the app to be rejected from the store, such as renaming system variables in iOS or using Id’s in Android that you should not reference
- Don’t violate any store policies
- Don’t request information that is off limits
- Don’t call private APIs
- Don’t put all your good ideas in a single SDK
- Bloatware is not welcome (see phone space and startup time above)
- It’s often better to have several small SDKs
- Create slim SDKs and ones that don’t leak
- Clarify assumptions
- Document all your assumptions
- Even better, design the API’s so developers can’t violate assumptions
- If the SDK fails, complain LOUDly in the debug logs
- Things you can’t control
- You need to be vigilant for system changes
- There is nothing you can do about them, but react quickly
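Several of the "respect users" points (limit network calls, download once and keep it in memory, delete data when done) fit in one small helper. This is a language-neutral sketch in Python, not Flurry's SDK code: `DownloadCache` is hypothetical, and the `fetch` callable is an injected stand-in for whatever network call the platform provides.

```python
class DownloadCache:
    """Hypothetical helper: download each resource at most once, serve repeat requests
    from memory, and free the memory as soon as the caller is done with it."""

    def __init__(self, fetch):
        self._fetch = fetch      # injected network call (assumed), e.g. an HTTP GET
        self._data = {}

    def get(self, url):
        if url not in self._data:             # only touch the network on a miss
            self._data[url] = self._fetch(url)
        return self._data[url]

    def release(self, url):
        self._data.pop(url, None)             # space is limited: drop data when done

    def __len__(self):
        return len(self._data)
```

Injecting `fetch` also makes the helper easy to test without a network, which is in the same spirit as designing APIs so developers cannot violate their assumptions.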
There are differences between #iOS and #Android that require some modifications in the SDK. One example is in the speed of the exit from an app. Apple devices tend to have less memory, so they are more aggressive in terminating apps quickly and reclaiming memory. This is less so in Android.