Posted on June 29th, 2017
06/29/2017 @Galvanize, 315 Hudson Street, NY, 2nd floor
Gary Kazantsev @Bloomberg spoke about trends, predictions, and issues related to #MachineLearning.
He first outlined the major trends in #AI
- Interdisciplinary research – inputs from psychology, neurobiology
- Openness in research – sharing of the latest models
- AI Safety movement
- Accelerating change and research in the field
Gary next spoke about the lack of consensus on when machines will achieve general intelligence to match that of humans. The distribution of predictions is bimodal: ½ say by 2050, ½ say in 100 years or never.
He went over how AI is changing many fields including law, medicine, transportation,… He also talked about current trends of research.
He emphasized the importance of being able to interpret how a model arrived at a decision. This push for interpretability comes from two directions.
- Models, especially in medicine and finance, need to be understandable and modifiable. In medicine, a model may indicate that some acute cases need not be put in the ICU, but this is because these patients had good outcomes as a result of the better treatment they received for being critical. In finance, investors need to know when a new situation arises that was not in the training set.
- In 2016 the #EuropeanUnion established rules (#GeneralDataProtectionRegulation) that give consumers the right to an explanation and the right to be forgotten. These rules will require a complete overhaul of algorithms so that their decisions can be explained to consumers, including the constituents of the training set, the parameters of the algorithm, etc.
AI ventures need more scientific due diligence
Posted on June 17th, 2017
06/14/2017 @Ebay, 625, 6th Ave, NY 3rd floor
Praveen Paritosh @Google gave a thought-provoking presentation arguing that the current popularity of machine learning may be short lived unless additional rigor is introduced into the field. Such a fall in interest happened in the late 1980’s, a period which became known as the “#AI winter”. He argues that greater openness is needed in sharing the successful methods applied to data sets and that we need standardization in the benchmarks of success.
I believe that the main issue is a lack of theory explaining how the successful methods work and why they are more successful than other methods. The theory needs to use a model of our understanding of the structure of the world to show why a particular method succeeds and why other methods are less successful. This paradigm would also give us a better understanding of the limits of such methods and why the world is structured as it is. It will also give us a cumulative knowledge base upon which to grow new methods.
This point of view is founded on the work of Karl Popper who argued that a theory in the empirical sciences can never be proven, but it can be falsified, meaning that it can and should be scrutinized by decisive experiments. Here, theory is essential for science since without theory there is no ability to test the validity of an approach that claims to be science.
One path to generating theory starts with the nature of the physical world and the way humans perceive the world. We assume that the physical world is made up of basic building blocks that assemble themselves in a large, but restricted, number of ways such as that generated by a fractal organization. Organisms, including humans, that take advantage of these regularities have a competitive advantage and have developed effective structures and DNA.
Appeals to greater standardization of the methods of testing machine learning are based on an inductivist approach which argues that science proceeds by incremental refinements in theory as theory and observations bootstrap themselves using enumerative induction toward universal laws. This approach is generally considered no longer tenable given the 20th century work of Popper, Thomas Kuhn, and other postpositivist philosophers of science including Paul Feyerabend, Imre Lakatos, and Larry Laudan.
What #VideoGames can do for #AI
Posted on May 28th, 2017
05/25/2017 @ Galvanize, 315 Hudson Street, NY, 2nd floor
Julian Togelius @NYU spoke about the state of competitions to create controllers to play video games. Much of what he talked about is contained in his paper on The #Mario AI Championship 2009-2012
The first winner in 2009 used an A* search of the action space. The A* algorithm is a best-first search of the graph of possible actions that prioritizes nodes by the distance from the origin to each current node + the estimated distance from that node to the goal.
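To make the priority rule concrete, here is a minimal Python sketch of A* on a small grid. The grid, the unit step cost, and the Manhattan-distance heuristic are illustrative assumptions; the Mario controller searched over game actions, not grid cells.

```python
import heapq

def a_star(grid, start, goal):
    """Return the length of the shortest 4-connected path, or None if unreachable.

    grid is a list of equal-length strings; '#' marks a wall (a made-up layout).
    """
    def h(cell):
        # Estimated distance to the goal: Manhattan distance (never overestimates here).
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    best_g = {start: 0}                # cheapest known distance from the origin
    while frontier:
        f, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale queue entry; a cheaper route was found later
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

GRID = ["..#.",
        "..#.",
        "...."]
print(a_star(GRID, (0, 0), (0, 3)))  # prints 7
```

The priority queue always expands the node with the smallest sum of distance travelled and estimated distance remaining, which is the rule described above.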
The contest in 2010 was won by Bojarski & Congdon – #Realm using a rule based agent
The competition has expanded to include trying to create Bayesian networks to play Mario Brothers like a human: Togelius & Yannakakis 2012. See https://pdfs.semanticscholar.org/2d0b/34e31f02455c2d370a84645b295af6d59702.pdf
Another part of the competition seeks to create programs that can play multiple games and carry their learning from one game to the next, as opposed to custom programs that can only play a single game.
Therefore they created a general video game playing competition – games written in Video Game Description Language. (http://people.idsia.ch/~tom/publications/pyvgdl.pdf) Programs are written in Java and access a competition API.
The programs are split into two competitions
- Get the framework, but cannot train – solutions are variations on search
- Do not get the framework, but can train the network – solutions are closer to neural nets
#Post-Selection #StatisticalInference in the era of #MachineLearning
Posted on May 6th, 2017
05/04/2017 @ ColumbiaUniversity, DavisAuditorium, CEPSR
Robert Tibshirani @StanfordUniversity talked about adjusting the cutoffs for statistical significance testing of multiple null hypotheses. The #Bonferroni Correction has been used to adjust for testing multiple hypotheses when the hypotheses are statistically independent. However, with the advent of #MachineLearning techniques, the number of possible tests and their interdependence has exploded.
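As a reminder of what the Bonferroni adjustment does, here is a small Python sketch: with m tests and an overall level alpha, each individual test is compared against alpha / m. The p-values are made up for illustration and the function name is hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one flag per hypothesis: True when p <= alpha / m."""
    m = len(p_values)
    cutoff = alpha / m  # per-test cutoff shrinks as the number of tests grows
    return [p <= cutoff for p in p_values]

p_values = [0.001, 0.012, 0.03, 0.2]  # made-up p-values
print(bonferroni_reject(p_values))    # prints [True, True, False, False]
```

With four tests the cutoff is 0.0125, so a p-value of 0.03 that would pass an unadjusted 0.05 test is no longer significant.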
This is especially true with the application of machine learning algorithms to large data sets with many possible independent variables, which often use forward stepwise or Lasso regression procedures. Machine learning methods often use #regularization methods and data splitting into training, test and validation sets to avoid #overfitting the data. For big data applications, these may be adequate since the emphasis is on prediction, not inference. Also, the large size of the data set offsets issues such as the lower power of statistical tests conducted on a subset of the data.
Robert proposed a model for incremental variable selection in which each sequential test slices off parts of the distribution for subsequent tests, creating a truncated normal upon which one can assess the probability of the null hypothesis. This method of polyhedral selection works for a stepwise regression as well as a lasso regression with a fixed lambda.
When the value of lambda is determined by cross-validation, one can use this method by adding 0.1 * sigma noise to the y values. This adjustment retains the power of the test and does not underestimate the probability of accepting the null hypothesis. This method can also be extended to other methods such as logistic regression, the Cox proportional hazards model, and the graphical lasso.
The method can also be extended to consider the number of factors to use in the regression. The goals of this methodology are similar to those described by Bradley #Efron in his 2013 JASA paper on bootstrapping (http://statweb.stanford.edu/~ckirby/brad/papers/2013ModelSelection.pdf) and random matrix theory used to determine the number of principal components in the data as described by the #Marchenko-Pastur distribution.
There is a package in R: selectiveInference
Further information can be found in a chapter on ‘Statistical Learning with Sparsity’ by Hastie, Tibshirani, Wainwright (online pdf) and ‘Statistical Learning and selective inference’ (2015) Jonathan Taylor and Robert J. Tibshirani (PNAS)
Building #ImageClassification models that are accurate and efficient
Posted on April 28th, 2017
04/28/2017 @NYUCourantInstitute, 251 Mercer Street, NYC, room 109
Laurens van der Maaten @Facebook spoke about some of the new technologies used by Facebook to increase accuracy and lower processing needed in image identification.
He first talked about residual networks which they are developing to replace standard convolutional neural networks. Residual networks can be thought of as a series of blocks each of which is a tiny #CNN:
- 1×1 layer, like a PCA
- 3×3 convolution layer
- 1×1 layer, inverse PCA
The raw input is added to the output of this mini-network, followed by a ReLU transformation.
These transformations extract features while keeping information that is input into the block, so the map is changed, but does not need to be re-learned from scratch. This eliminates some problems with vanishing gradients in the back propagation as well as the unidentifiability problem.
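The skip connection can be sketched in a few lines of plain Python. This is only an illustration of the idea y = ReLU(x + F(x)): the three small weight matrices are hypothetical stand-ins for the 1×1, 3×3 and 1×1 layers, and a real block uses convolutions over feature maps instead of dense matrices.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def matvec(w, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def residual_block(x, w_down, w_mid, w_up):
    """Compute ReLU(x + F(x)) where F is reduce -> transform -> expand."""
    h = relu(matvec(w_down, x))  # reduce dimension (stand-in for the first 1x1 layer)
    h = relu(matvec(w_mid, h))   # transform in the reduced space (stand-in for 3x3)
    f = matvec(w_up, h)          # expand back to the input dimension (second 1x1)
    return relu([xi + fi for xi, fi in zip(x, f)])

x = [1.0, 2.0, 3.0, 4.0]
w_down = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
w_mid = [[1.0, 0.0], [0.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, w_down, w_mid, w_up))  # prints [2.0, 4.0, 3.0, 4.0]
```

If all the weights are zero the block simply passes its (non-negative) input through, which is why the input does not have to be re-learned from scratch.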
Blocks when executed in sequence gradually add features, but removing a block after training hardly degrades performance (Huang et al 2016). From this observation they concluded that the blocks were performing two functions: detect new features and pass through some of the information in the raw input. Therefore, this structure could be made more efficient if they pass through the information yet allowed each block to only extract features.
DenseNets give each block in each layer access to all features in the layer before it. The number of feature maps increases in each layer, so there is the possibility of a combinatorial explosion of units with each layer. Fortunately, this does not happen, as each layer adds only 32 new feature maps and the computation is more efficient, so the aggregate amount of computation for a given level of accuracy decreases when using DenseNet instead of ResNet while accuracy improves.
Next Laurens talked about making image recognition more efficient, so a larger number of images could be processed with the same level of accuracy in a shorter average time.
He started by noting that some images are easier to identify than others. So, the goal is to quickly identify the easy images and only spend further processing time on the harder, more complex images.
The key is noting that easy images can be classified using only a coarse grid, but with a coarse grid alone the harder images would not be classifiable. On the other hand, using a fine grid makes it harder to classify the easy images.
Laurens described a hybrid 2-d network in which there are layers analyzing the image using the coarse grid and layers analyzing the fine grid. The fine grain blocks occasionally feed into the coarse grain blocks. At each layer outputs are tested to see if the confidence level for any image exceeds a threshold. Once the threshold is exceeded, processing is stopped and the prediction is output. In this way, when the decision is easy, this conclusion is arrived at quickly. Hard images continue further down the layers and require more processing.
By estimating the percentage of images exiting at each threshold, they can tune the threshold levels so that more images can be processed within a given time budget.
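The early-exit rule described above can be sketched as follows. This is a simplified illustration, not the network from the talk: the per-stage probabilities are precomputed and made up, whereas a real network would only compute later stages when needed, and the function name and threshold are assumptions.

```python
def classify_with_early_exit(stage_outputs, threshold=0.9):
    """Return (label, stage) from the first stage whose top probability clears threshold.

    stage_outputs is a list of class-probability lists, ordered from the
    cheapest classifier to the most expensive one.
    """
    for stage, probs in enumerate(stage_outputs):
        confidence = max(probs)
        if confidence >= threshold:
            return probs.index(confidence), stage
    # No stage was confident enough: fall back to the final stage's prediction.
    last = stage_outputs[-1]
    return last.index(max(last)), len(stage_outputs) - 1

easy_image = [[0.95, 0.03, 0.02], [0.97, 0.02, 0.01]]
hard_image = [[0.40, 0.35, 0.25], [0.30, 0.60, 0.10], [0.05, 0.92, 0.03]]
print(classify_with_early_exit(easy_image))  # prints (0, 0)
print(classify_with_early_exit(hard_image))  # prints (1, 2)
```

Raising the threshold sends more images to the later, more expensive stages; lowering it trades accuracy for speed, which is the knob that gets tuned against the time budget.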
During the Q&A, Laurens said
- To avoid overfitting the model, they train the network on both the original images and the same images after small transformations have been applied to each image.
- They are still working to expand the #DenseNet to see its upper limits on accuracy
- He is not aware of any neurophysiological structures in the human brain that correspond to the structure of blocks in #ResNet / DenseNet.
From #pixels to objects: how the #brain builds rich representation of the natural world
Posted on April 15th, 2017
04/06/2017 @RutgersUniversity, Easton Hub Auditorium, Fiber Optics Building, Busch Campus
Jack Gallant @UCBerkeley presented a survey of current research on mapping the neurophysiology of the visual system in the brain. He first talked about the overall view of visual processing since the Felleman and Van Essen article in Cerebral Cortex in 1992. Their work on macaque monkeys showed that any brain area has a 50% chance of being connected to any other part of the brain. Visual processing can be split into 3 areas:
1. Early visual areas – 2. intermediate visual areas – 3. high level visual areas
with pooling nonlinear transformations between areas (the inspiration for the non-linear mappings in convolutional neural nets (CNN)). The visual areas were identified by retinotopic maps – about 60 areas in humans with macaques having 10 to 15 areas in the V1 area.
Another important contribution was by David J. Field who argued that the mammalian visual system can only be understood relative to the images it is exposed to. In addition, natural images have a very specific structure – 1/f noise in the power spectrum – due to the occlusion of images which can be viewed from any angle (see Olshausen & Field, American Scientist, 2000).
This led to research resolving natural images by characterizing them by the correlation of pairs of points. Beyond pairs of points that approach becomes too computationally intensive. In summary, natural images are only a small part of the universe of images (most of which humans classify as white noise).
Until 2012, researchers needed to specify the characteristics to identify items in images, but LeCun, Bengio & Hinton, Nature, 2015 showed that AlexNet could resolve many images using multiple layer models, faster computation, and lots of data. These deep neural nets work well, but the reasons for their success have yet to be worked out (he estimates it will take 5 to 10 years for the math to catch up).
One interesting exercise is running a CNN and then looking for activation in a structure in the brain: mapping the convolutional layers and feature layers to the corresponding layers in the visual cortex. This reveals that V1 has bi- or tri-phasic functions – Gabor functions in different orientations. This is highly efficient as a sparse code needs to activate as few neurons as possible.
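For readers unfamiliar with Gabor functions, here is a short Python sketch of the standard 2-D form: a cosine grating at a chosen orientation multiplied by a Gaussian envelope. The parameter names and default values are illustrative assumptions, not values from the talk.

```python
import math

def gabor(x, y, wavelength=4.0, theta=0.0, sigma=2.0, phase=0.0, gamma=1.0):
    """Value at (x, y) of a 2-D Gabor function with orientation theta (radians)."""
    # Rotate coordinates so the grating varies along the xr axis.
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = math.cos(2 * math.pi * xr / wavelength + phase)
    return envelope * carrier

# A small 9x9 filter at 45 degrees, e.g. for convolving with an image patch.
kernel = [[gabor(x, y, theta=math.pi / 4) for x in range(-4, 5)]
          for y in range(-4, 5)]
print(gabor(0, 0))  # prints 1.0
```

Changing theta rotates the filter, which gives the family of orientations mentioned above; the alternating positive and negative lobes of the cosine under the envelope are what make the profile bi- or tri-phasic.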
Next they used motion-energy models to see how mammals detect motion in voxels in V1 (Shinji Nishimoto). They determined that monitoring takes 10 to 20ms using Utah arrays to monitor single neurons. They have animals watch movies and analyze the input images using a combination of complex and simple cell models (using Keras) to model neurons in V1 and V2 on a 16ms time scale.
High level visual areas
Jack then talked about research to identify neurons in high level visual areas that respond to specific stimuli. Starting with fMRI, his group (Huth, Nishimoto, Vu & Gallant, Neuron, 2012) has identified many categories: face areas vs. objects; place minus face. By presenting images and mapping which voxels in the brain are activated, one can see how the 2000 categories are mapped in the brain using WordNet as the labels. Similar concepts are mapped to similar locations in the brain, but specific items in the semantic visual system interact with the semantic language areas – so a ‘dog’ can activate many areas so it can be used in different ways and can be unified as needed. Each person will have a different mapping depending on their previous good and bad experiences with dogs.
He talked about other topics including the challenges of determining how things are stored in places: Fourier power, object categories, subjective distance. In order to activate any of these areas in isolation, one needs enough stimulus to activate the earlier layers. They have made progress by building a decoder from the knowledge of the voxels, which runs from the brain area backwards to recreate the stimulus. A blood flow model is used with a 2 second minimum sampling period. But there is lots of continuity so they can reconstruct a series of images.
Intermediate visual area
Intermediate visual areas between the lower and higher levels of processing are hard to understand – he looked at V4. They respond to shapes of intermediate complexity, but not much else, like a curvature detector. Using fMRI they know what image features correlate with specific areas, but there is no strong indication differentiating one layer from another. With Utah array recordings, they need to do a log-polar transform to improve prediction in V4. Using a receptive field model, they can create a predictor frame and match brain activity to images that gave the largest response.
However, the images are messy and predicting V4 is not the same as understanding V4.
Finally, he talked about attention and tuning effects on single neurons. In an experiment in which subjects watched a movie and were asked to search for either humans or vehicles, there were changes in the semantic map based on the search criterion. These tuning shift effects are a function of distance to visual periphery: attentional effects are small in V1 and get larger in the ensuing layers.
In the Q&A, he made the following points:
- The visual word form area in the brain becomes active as you learn to read. This change does not occur for people who are illiterate.
- One of the experimental assumptions is that the system is stationary, so there is no adaptation. If adaptation does occur, then they cannot compute a noise ceiling for the signals.
[Neural nets take inspiration from the neurobiology, especially the creation of convolutional neural nets, but there is now feedback with neurobiology using the tools created in machine learning to explore possible models of brain mapping. Does the pervasive existence of Gabor filters lead to an argument that their presence indicates that natural images are closely allied with fractal patterns?]
#Driverless #Trucks will come before driverless #cars
Posted on April 13th, 2017
04/12/2017 @MetroTech 6, NYU, Brooklyn, NY
Seth Clevenger – technology editor, Transport Topics News, @sethclevenger, talked about the rollout of driverless trucks. His main message was that there are many intermediate stages from adaptive cruise control (already exists in some cars) to fully autonomous operation.
Truck manufacturers are concentrating on systems that assist rather than replace drivers. These include
- Truck platooning – could roll out by year-end – synchronized braking; trucks can draft off each other for a 10% increase in efficiency. Brakes are linked, but still need drivers. (Peloton Technology plans to begin fleet trials)
- Connected vehicles – just starting to be regulated. (V2V, V2I). For instance, safety messages sent by each vehicle.
- auto docking at loading docks
- traffic jam assist – move forward slowly without driver assistance
Startups include: Uber/Otto, Embark, Starsky Robotics, Driver.ai
[One of my major concerns is the integrity of the software controlling the vehicle. A failure in software could cause accidents; however, my main concern is the potential insertion of a malicious virus as a sleeper cell within the millions of lines of code. In this case, the results could be catastrophic as all braking and acceleration systems could be programmed to fail on a specific date in the future. At that moment, all vehicles on the road would be out of control, potentially resulting in millions of accidents and thousands of deaths and injuries. Preventing such an event will require coordination among suppliers and enforcement of strict software standards. The large number of suppliers makes this job especially complicated. This sleeper cell could lie dormant for years before it is activated.]
Applications of #DeepLearning in #Healthcare
Posted on March 28th, 2017
03/28/2017 @NYU Courant Institute (251 Mercer St, New York, NY)
Sumit Chopra, the head of A.I. Research @Imagen Technologies, introduced the topic by saying that the two areas in our lives that will be most affected by AI are healthcare and driverless cars.
Healthcare data can be divided into
- Clinical data – incomplete since hospitals don’t share their datasets; digital form with privacy concerns
- Payer data – from insurance providers; more complete unless the patient switches payers, but less detailed
- Other – cell phones, etc.
He focuses on medical imaging – mainly diagnostic radiology – 600 million studies in the U.S., but there is a shortage of skilled radiologists and a prevalence of errors. The images are very large, high resolution and low contrast, with highly subtle cues => radiology is hard to do well
Possible solution: pre-train a standard model: Alexnet/VGG/… on a small number of images, but this might not work since the signal is subtle.
Also radiology reports, which could be used for supervised training, are unstructured and it’s hard to tell what the report tells you. => weak labels at best
Much work has been done on this problem, usually using deep convolutional neural nets.
First step: image registration = rotate & crop.
Train a deep convolutional network (registration network), then send its output to a detection network for binary segmentation.
Could use generative models for images to train doctors
Leverage different modalities of data
Sumit has found that a random search of hyperparameter space works better than either grid search or optimizer search.
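A random search over hyperparameters is simple to write down. The sketch below is a generic illustration, not Sumit's setup: the search space, the toy loss function, and all names are hypothetical, and in practice the objective would train a model and return its validation loss.

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyperparameter uniformly from its range; keep the best trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_loss(p):
    # Hypothetical stand-in for a validation loss, minimized at lr=0.01, dropout=0.3.
    return (p["lr"] - 0.01) ** 2 + (p["dropout"] - 0.3) ** 2

space = {"lr": (0.0, 0.1), "dropout": (0.0, 0.5)}
best, loss = random_search(toy_loss, space)
print(best, loss)
```

Unlike a grid, every trial tries a new value of every hyperparameter, so the budget is not wasted repeating the same few values along dimensions that turn out to matter most.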
Building an #AI #AutonomousAgent using #SupervisedLearning with @DennisMortensen
Posted on March 23rd, 2017
03/21/2017 @ Rise, 43 West 23rd Street, NY, 2nd floor
In mid 2013, Dennis Mortensen started x.ai to employ machine learning to set up meetings. After an introduction to the software, Dennis talked about the challenges of creating a conversational agent to act as your assistant setting up business meetings.
He talked about the 3 processes within the agent: NLU + reasoning + NLG
Natural Language Understanding needs to define the universe – what is it we can do and what is it that we cannot do and will not do?
Natural Language Understanding (NLU) Challenges
- Define intents then hire AI trainers. Need to get the intents right since it’s expensive to change to a different scheme
- What data set do we align to? What are the guidelines for labeling? Coders need to learn and remember the rules defining all the intents. Need to keep it compact, but not too much so
- They have 101 AI trainers full time. On what software do they label the words? Need a custom-built annotation platform. Spent 2 years building it.
- How do people want the agent to behave? Manually determine what is supposed to happen. This will create a new intent, but this often requires changes in the coding of the NLU
- Some of the things humans want to do are very complicated. Especially common sense
- don’t do meetings after 6:00, but if there is one at 6:15, there is a reason for this happening.
- a 6:30 PM call to Singapore might be a good idea.
- When to have a meeting and when to have a phone call
Natural Language Generation (NLG) challenges
- They have 2 interaction designers
- Need to inject empathy if it’s appropriate. For instance, if there is a change in schedule, we need to respond appropriately: understanding initially and more assertive if the change needs to be reversed. Also need to honor requests to speak in a given language.
They evaluate the performance of the software when being used by a client by
- customer-centric metrics, such as the number of schedule changes
- is the customer happy?
NYAI#7: #DataScience to Operationalize #ML (Matthew Russell) & Computational #Creativity (Dr. Cole)
Posted on November 22nd, 2016
11/22/2016 @ Rise, 43 West 23rd Street, NY, 2nd floor
Speaker 1: Using Data Science to Operationalize Machine Learning – (Matthew Russell, CTO at Digital Reasoning)
Speaker 2: Top-down vs. Bottom-up Computational Creativity – (Dr. Cole D. Ingraham DMA, Lead Developer at Amper Music, Inc.)
Matthew Russell @DigitalReasoning spoke about understanding language using NLP, relationships among entities, and temporal relationships. For human language understanding he views technologies such as knowledge graphs and document analysis as becoming commoditized. The only way to get an advantage is to improve the efficiency of using ML: the KPI for data analysis is the number of experiments (tests of a hypothesis) that can be run per unit time. The key is to use tools such as:
- Vagrant – allows an environment setup.
- Jupyter Notebook – like a lab notebook
- Git – version control
- Automation –
He wants highly repeatable experiments. The goal is to speed up the number of experiments that can be conducted per unit time.
He then talked about using machines to read medical reports and determine the issues. Negatives can be extracted, but issues are harder to find. He uses an ontology to classify entities.
He talked about experiments on models using ontologies. The use of a fixed ontology depends on the content: the ontology of terms for anti-terrorism evolves over time and needs to be experimentally adjusted over time. Medical ontology is probably most static.
In the second presentation, Cole D. Ingraham @Ampermusic talked about top-down vs bottom-up creativity in the composition of music. Music differs from other audio forms since it has a great deal of very large structure as well as the smaller structure. ML does well at generating good audio on a small time frame, but Cole thinks it is better to apply theories from music to create the larger whole. This is a combination of
Top-down: novel&useful, rejects previous ideas – code driven, “hands on”, you define the structure
Bottom-up: data driven, “hands off”, you learn the structure
He then talked about music composition at the intersection of Generation vs. analysis (of already composed music) – can do one without the other or one before the other
To successfully generate new and interesting music, one needs to generate variance. Composing music using a purely probabilistic approach is problematic as there is a lack of structure. He likes the approach similar to replacing words with their synonyms which do not fundamentally change the meaning of the sentence, but still makes it different and interesting.
It’s better to work on deterministically defined variance than it is to weed out undesired results from nondeterministic code.
As an example he talked about WaveNet (a Google DeepMind project), whose input and output are both raw audio. This approach works well for improving speech synthesis, but less well for music generation as there is no large scale structural awareness.
Cole then talked about Amper, a web site that lets users create music with no experience required: fast, believable, collaborative
They like a mix of top-down and bottom-up approaches:
- Want speed, but neural nets are slow
- Music has a lot of theory behind it, so it’s best to let the programmers code these rules
- Can change different levels of the hierarchical structure within music: style, mood, can also adjust specific bars
Runtime written in Haskell – functional language so it’s great for music