AI ventures need more scientific due diligence
Posted on June 17th, 2017
06/14/2017 @Ebay, 625, 6th Ave, NY 3rd floor
Praveen Paritosh @Google gave a thought provoking presentation arguing that the current popularity of machine learning may be short lived unless additional rigor is introduced into the field. Such a fall in interest happened in the late 1980’s which became known as the “#AI winter”. He argues that greater openness is needed in sharing the successful methods applied to data sets and we need standardization in the benchmarks of success.
I believe that the main issue is a lack of theory explaining how the success methods work and why they are more successful than other methods. The theory needs to use a model of our understanding of the structure of the world to show why a particular method succeeds and why other methods are less successful. This paradigm would also give us a better understanding of the limits of such methods and why the world is structured as it is. It will also give us a cumulative knowledge base upon which to grow new methods.
This point of view is founded on the work of Karl Popper who argued that a theory in the empirical sciences can never be proven, but it can be falsified, meaning that it can and should be scrutinized by decisive experiments. Here, theory is essential for science since without theory there is not ability to test the validity of an approach that claims to be science.
One path to generating theory starts with the nature of the physical world and the way humans perceive the world. We assume that the physical world is made up of basic building blocks that assemble themselves in a large, but restricted, number of ways such as that generated by a fractal organization. Organisms, including humans, that take advantage of these regularities have a competitive advantage and have developed effective structures and DNA.
Appeals to greater standardization of the methods of testing machine learning are based on an inductivist approach which argues that science proceeds by incremental refinements in theory as theory and observations bootstrap themselves using enumerative induction toward universal laws. This approach is generally considered no longer tenable given the 20th century work of Popper, Thomas Kuhn, and other postpostivist philosophers of science including Paul Feyerabend, Imre Lakatos, and Larry Laudan.
Investing using #DeepLearning, #MacroTrading and #Chatbots
Posted on June 2nd, 2017
Qplum, 185 Hudson Street , Jersey City, suite 1620
Mansi Singhal and Gaurav Chakravorty @Qplum gave two presentations on how Qplum uses machine learning within a systematic macro investment strategy. Mansi talked about how a macro economic world view is used to focus the ML team on target markets. She walked the audience through an economic analysis of the factors driving the U.S. residential housing market and how an understanding of the drivers (interest rates, GDP, demographics,…) and anticipation of future economic trends (e.g. higher interest rates) would lead them to focus on (or not consider) that market for further analysis by the ML group.
Gaurav (http://slides.com/gchak/deep-learning-making-trading-a-science#/) talked about how they use an AutoEncoder to better understand the factors driving a statistical arbitrage strategy. Here, instead of using a method like principal components analysis, they use a deep learning algorithm to determine the factors driving the prices of a group of stocks. The model uses a relatively shallow neural net. To understand the underlying factors, they look at which factors are the largest driver of current market moves and determine the historical time periods when this factor has been active. One distinction between their factor models and classic CAPM models is that non-linearities are introduced by the activation functions within each layer of the neural net.
Next, Aziz Lookman talked about his analysis showing that an analysis of county-by-county unemployment rates affects the default rates (and therefore the investment returns) on loans within Lending Club.
Lastly, Hardik Patel @Qplum talked about the opportunities and challenges of creating a financial chatbot. The opportunity is that the investment goals and concerns are unique for each customer, so each will have different questions and need different types of information and advice.
The wide variety of questions and answers challenges the developer so their approach has been to develop and LSTM model of the questions which will point the bot to a template that will generate the answer. Their initial input will use word vectors and bag of words methods to map questions to categories.
What #VideoGames can do for #AI
Posted on May 28th, 2017
05/25/2017 @ Galvanize, 315 Hudson Street, NY, 2nd floor
Julian Togelius @NYU spoke about the state of competitions to create controllers to play video games. Much of what he talked about is contained in his paper on The #Mario AI Championship 2009-2012
The first winner in 2009 used an A* search of the action space. The A* algorithm is a complete search of the graph of possible actions prioritizing the search based on the distance from the origin to each current node + the estimated distance from each current node to the goal.
The contest in 2010 was won by Bojarski & Congdon – #Realm using a rule based agent
The competition has expanded to include a trying to create Bayesian networks to play Mario Brothers like a human: Togelius & Yannakakis 2012. See https://pdfs.semanticscholar.org/2d0b/34e31f02455c2d370a84645b295af6d59702.pdf
Another part of the competition seeks to create programs that can play multiple games and carry their learning from one game to the next as opposed to custom programs can only play a single game
Therefore they created a general video game playing competition – games written in Video Game Description Language. (http://people.idsia.ch/~tom/publications/pyvgdl.pdf) Programs are written in Java and access a competition API.
The programs are split into two competitions
- Get the framework, but cannot train – solutions are variations on search
- Do not get the framework, but can train the network – solutions are closer to neural nets
#Post-Selection #StatisticalInference in the era of #MachineLearning
Posted on May 6th, 2017
05/04/2017 @ ColumbiaUniversity, DavisAuditorium, CEPSR
Robert Tibshirani @StanfordUniversity talked about the adjusting the cutoffs for statistical significance testing of multiple null hypotheses. The #Bonferroni Correction has been used to adjustments for testing multiple hypothesis when the hypotheses are statistically independent. However, with the advent of #MachineLearning techniques, the number of possible tests and their interdependence has exploded.
This is especially true with the application of machine learning algorithms to large data sets with many possible independent variables which often use forward stepwise or Lasso regression procedures. Machine learning methods often use #regularization methods to avoid #overfitting the data such as data splitting into training, test and validation sets. For big data applications, these may be adequate since the emphasis on is prediction, not inference. Also the large size of the data set offsets issues such as the lower of power in the statistical tests conducted on a subset of the data.
Robert proposed a model for incremental variable selection in which each sequential test sliced off parts of the distribution for subsequent tests creating a truncated normal upon which one can assess the probability of the null hypothesis. This method of polyhedral selection works for a stepwise regression and well as a lasso regression with a fixed lambda.
When the value of lambda is determined by cross-validation, can use this method by adding 0.1 * sigma noise to the y values. This adjustment retains the power of the test and does not underestimate the probability of accepting the null hypothesis. This method can also be extended to other methods such as logistic regression, Cox proportional hazards model, graphics lasso.
The method can also be extended to consider the number of factors to use in the regression. This goals of this methodology are similar to those described by Bradley #Efron in his 2013 JASA paper on bootstrapping (http://statweb.stanford.edu/~ckirby/brad/papers/2013ModelSelection.pdf) and random matrix theory used to determine the number of principal components in the data as described by the #Marchenko-Pastur distribution.
There is a package in R: selectiveInference
Further information can be found in a chapter on ‘Statistical Learning with Sparsity’ by Hastie, Tibshirani, Wainwright (online pdf) and ‘Statistical Learning and selective inference’ (2015) Jonathan Taylor and Robert J. Tibshirani (PNAS)
Posted on May 6th, 2017
05/04/2017 @Ebay, 625 6th Ave, NY 3rd floor
John Novak @QxBranch talked about the process in developing quantum computers. The theory is based on Adiabatic optimization. With each qubit is started at low energy levels along with couplings with the energy levels amplified so there is a high probability that the correct solution state will be the realized output when the quantum field collapses.
In the architecture of the D-Wave computer, qubits are organized in 4 x 4 cells in a pattern called a Chimera graph. These nodes are joined together to increase the number of digits. This raises certain challenges since all nodes are not connected to all other nodes: some logical nodes need to be represented multiple items in the physical computer.
Other challenges are running the quantum computer for a sufficiently long time to refine the probabilistic output. Challenges to increase the number of digits in the computer include the need to supercool more wires and adding error correction circuits. Eventually room –temperature superconductors will need to be developed.
Building #ImageClassification models that are accurate and efficient
Posted on April 28th, 2017
04/28/2017 @NYUCourantInstitute, 251 Mercer Street, NYC, room 109
Laurens van der Maaten @Facebook spoke about some of the new technologies used by Facebook to increase accuracy and lower processing needed in image identification.
He first talked about residual networks which they are developing to replace standard convolutional neural networks. Residual networks can be thought of as a series of blocks each of which is a tiny #CNN:
- 1×1 layer, like a PCA
- 3×3 convolution layer
- 1×1 layer, inverse PCA
The raw input is added to the output of this mini-network followed by a RELU transformation.
These transformations extract features while keeping information that is input into the block, so the map is changed, but does not need to be re-learned from scratch. This eliminates some problems with vanishing gradients in the back propagation as well as the unidentifiabiliy problem.
Blocks when executed in sequence gradually add features, but removing a block after training hardly degrades performance (Huang et al 2016). From this observation they concluded that the blocks were performing two functions: detect new features and pass through some of the information in the raw input. Therefore, this structure could be made more efficient if they pass through the information yet allowed each block to only extract features.
DenseNets gives each block in each layer access to all features in the layer before it. The number of feature maps increases in each layer, so there is the possibility of a combinatorial explosion of units with each layer. Fortunately, this does not happen as each layer adds 32 new modules but the computation is more efficient, so the aggregate amount of computation for a given level of accuracy decreases when using DenseNet in favor of ResNet while accuracy improves.
Next Laurens talked about making image recognition more efficient, so a larger number of images could be processed with the same level of accuracy in a shorter average time.
He started by noting that some images are easier to identify than others. So, the goal is to quickly identify the easy images and only spend further processing time on the harder, more complex images.
The key is noting that easy images are classified using only a coarse grid, but then harder images would not be classifiable. On the other hand, using a fine grid makes it harder to classify the easy image.
Laurens described a hybrid 2-d network in which there are layers analyzing the image using the coarse grid and layers analyzing the fine grid. The fine grain blocks occasionally feed into the coarse grain blocks. At each layer outputs are tested to see if the confidence level for any image exceeds a threshold. Once the threshold is exceeded, processing is stopped and the prediction is output. In this way, when the decision is easy, this conclusion is arrived at quickly. Hard images continue further down the layers and require more processing.
By estimating the percentage of the classifier exiting at each threshold, then can time the threshold levels so that more images can be processed within a given time budget
During the Q&A, Laurens said
- To avoid overfitting the model, they train the network on both the original images as well as these same images after small transformation have been done on each image.
- They are still working to expand the #DenseNet to see its upper limits on accuracy
- He is not aware of any neurophysiological structures in the human brain that correspond to the structure of blocks in #ResNet / DenseNet.
#ExtremeEvents and short term reversals in #RiskAversion
Posted on April 17th, 2017
04/17/2017 @ 101 NJ Hall, 75 Hamilton Street, New Brunswick, NJ
Kim Oosterlinck @FreeUniversityOfBrussels presented work done by Matthieu Gilson, Kim Oosterlinck, Andrey Ukhov. Kim started by reviewing the literature that shows no consensus on whether risk aversion increases or decreases in following extreme events such as war. In addition, these studies often have only two points on which to make this evaluation.
He presented a method for tracking overall risk aversion within a population on a daily basis for several years. His analysis values the lottery part of Belgian bonds which consisted of a fixed coupon bond with the opportunity to win a cash prize every month. These bonds were sold to retail customers and made up 11% of Belgian bond market in 1938. By discounting the cash flows based on the yields for other, fixed coupon Belgian bonds, one can compare the risk neutral price (RNP) relative to the market price (MP).
When MP/RNP > 1 this indicates the average holder is risk loving.
There are three periods in their observations from 1938 to 1948.
- Risk neutral to risk averse from 1938 to 1940, when German invaded and occupied Belgian
- Risk aversion to risk seeking from 1940 to 1945 during the German occupation
- Risk seeking to risk neutral from 1945 to 1948.
Lots of competing theories on when people become more or less risk averse
These data give the strongest support is habituation to background risk as the best explanation of the increase in risk aversion. Prospect theory also does well as an explanation.
[The findings of increased risk seeking form 1940 to 1945 could also be consistent with a flat yield curve at 3% from 1month to 3 years in 1940 to a steep yield curve in 1945 going from 0% at 1 month to 3% at 3 years. ]
From #pixels to objects: how the #brain builds rich representation of the natural world
Posted on April 15th, 2017
04/06/2017 @RutgersUniversity, Easton Hub Auditorum, Fiber Optics Building, Busch Campus
Jack Galliant @UCBerkeley presented a survey of current research on mapping the neurophysiology of the visual system in the brain. He first talked about the overall view of visual processing since the Felleman and Van Essen article in Cerebral Cortex in 1992. Their work on macaque monkey showed that any brain area has a 50% chance of being connected to any other part of the brain. Visual processing can be split into 3 areas
1.Early visual area – 2.intermediate visual areas – 3.high level visual areas
With pooling nonlinear transformations between areas (the inspiration for the non-linear mappings in convolutional neural nets (CNN)). The visual areas were identified by retinotopic maps – about 60 areas in humans with macaques having 10 to 15 areas in the V1 area.
Another important contribution was by David J. Field who argued that the mammalian visual system can only be understood relative to the images it is exposed to. In addition, natural images have a very specific structure – 1/f noise in the power spectrum – due to the occlusion of images which can be viewed from any angle (see Olshausen & Field, American scientist, 2000)
This lead to research resolving natural images by characterizing them by the correlation of pairs of points. Beyond pairs of points that approach becomes too computational intensive. In summary, natural images are only a small part of the universe of images (most of which humans classify as white noise)
Until 2012, researchers needed to specify the characteristics to identify items in images, but LeCun, Bengio & Hinton, Nature, 2015 showed that Alexnet could resolve many images using multiple layer models, faster computation, and lots of data. These deep neural nets work well, but the reasons for their success have yet to be worked out (He estimates it will take 5 to 10 years for the math to catch up).
One interesting exercise is running a CNN and then looking for activation in a structure in the brain: mapping the convolutional layers and feature layers to the correspondence on layers in the visual cortex. This reveals that V1 has bi-or tri-phasic functions – Gabor functions in different orientations. This is highly efficient as a sparse code needs to activate as few neurons as possible.
Next they used motion-energy models to see how mammals detect motion in the brain Voxels in V1 (Shinji Nishimoto). They determined that monitoring takes 10 to 20ms using Utah arrays to monitor single neurons. They have animal watch movies and analyze the input images using combination of complex and simple cell models (use Keras) to model neurons in V1 and V2 using a 16ms time scale.
High level visual areas
Jack then talked about research identify neurons in high level visual areas that respond to specific stimuli. Starting with fMRI his groups (Huth, Nishimoto, Vu & Gallant, Neuron, 2012) has identified many categories: face areas vs. objects; place minus face. By presented images and mapping which voxels in the brain are activated one can see how the 2000 categories are mapped in the brain using wordmap as the labels. Similar concepts are mapped to similar locations in the brain, but specific items in the semantic visual system interact with the semantic language areas – so a ‘dog’ can active many areas so it can be used in different ways and can be unified as needed. Each person will have a different mapping depending on their previous good and bad experiences with dogs.
He talked about other topics including the challenges of determining how things are stored in places: Fourier power, object categories, subjective distance. In order to activate any of these areas in isolation, one needs enough stimulus to activate the earlier layers. They have progress by building a decoder from the knowledge of the voxel which run from the brain area backwards to create stimulus. A blood flow model are used with a 2 second minimum sampling period. But there is lots of continuity so they can reconstruct a series of images.
Intermediate visual area
Intermediate visual areas between the lower and higher levels of processing are hard to understand – looks at V4. They respond to shapes of intermediate complexity, but not much else like a curvature detector. Using fMRI they know what image features correlate with specific areas, but there is no strong indication differentiating one layer from another. Using the Utah array, they need to do a log-polar transform to improve prediction in V4. Using a receptor field model, they can create a predictor frame and match brain activity to images that gave the largest response.
To improve prediction on V4, Utah arrays need to do a log-polar transform. However, the images are messy and predicting V4 is not the same as understanding V4.
Finally, he talked about attenuation and tuning effects on single neurons. In an experiment in subjects watched a movie and were asked to search for either humans or vehicles, there were changes in the semantic map based on the search criterion. These tuning shift effects are a function of distance to visual periphery: Attentional effects are small in V1 and get larger in the ensuing layers.
In the Q&A, he made the following points:
- The visual word form area in the brain becomes active as you learn to read. This change does not occur for people who are illiterate.
- One of the experimental assumptions is that the system is stationary, so there is not adaptation. If adaptation does occur, then they cannot compute a noise ceiling for the signals.
[Neural nets take inspiration from the neurobiology, especially the creation of convolutional neural nets, but there is now feedback with neurobiology using the tools created in machine learning to explore possible models of brain mapping. Does the pervasive existence of Gabor filters lead to an argument that their presence indicates that natural images are closely allied with fractal patterns?]
How to build a #MixedReality experience for #Hololens
Posted on April 14th, 2017
4/14/2017 @MicrosoftReactorAtGrandCentral, 335 Madison Ave, NY, 4th floor
Mike Pell and John gave a roadmap for generating #MixedReality content. They started with general rules for generating content and how these rules apply to building MR content.
- Know your audience –
- Role of emotion in design – we want to believe in what is shown in a hologram.
- Think situation – where am I? at home you are comfortable doing certain things, but there are different needs and different things you are comfortable in public
- Think spatially – different if you can walk around the object
- Think inclusive – widen your audience
- Know Your medium
- For now you look ridiculous when wearing a VR headset– but maybe this eventually becomes like a welder shield which you wear when you are doing something specialized
- Breakthrough experience – stagecraft – so one can see what the hololens user is seeing
- Know Your palette
Interactive Story Design – a fast way to generate MR content
- Who is your “spect-actor” (normally someone who observers – have a sense of who the individual is for this moment – avoid blind spot, so pick a specific person. )
- Who are your “interactors” – will change as a result of the interaction – can be objects, text, people
- This creates a story
- Location – design depends on where this occurs
- Journey – how does participant change
How to bring the idea to life: how to develop the script for the MR experience
3-step micro sprints – 3 to 6 minute segments – so you don’t get attached to something that doesn’t work. Set 1 to 2 minute time limit for each step
- Parameters – limited resources help creative development
- Personify everything including text has a POV, feelings, etc.
- 3 emotional responses – what is the emotional response of a chair when you sit in it?
- 3 conduits
- Facial expression – everything has a face including interfaces and objects
- Body language
- Playtest – do something with it
- 3 perspectives
- Interacters – changes in personality over time
- 3 perspectives
- Audience – who is watching
- PMI – evaluative process – write on index cards (not as a feedback session) so everyone shares their perspective. Next loop back to the parameters (step 1)
- Plus – this is interesting
- Minus – this weak
- Interesting – neither of the above “this is interesting”
How to envision and go fast:
- Filming on location – randomly take pictures – look for things that speak to you as creating an interesting experience.
- Understand the experience – look at the people (i.e. people viewing art)
- Visualize it – put people into the scene (vector silhouette in different poses) put artwork into scene along with viewers.
- Build a prototype using Unity. Put on the Hololens and see how it feels
They then went through a example session in which a child is inside looking at a T-Rex in the MOMA outdoor patio. The first building block was getting three emotional responses for the T-Rex:
- Positive – joy looking at a potential meal: the child
- Negative – too bad the glass barrier is here
- Neutral – let me look around to see what is around me
To see where we should be going, look at what children want to do with the technology
Beyond Big: Merging Streaming & #Database Ops into a Next-Gen #BigData Platform
Posted on April 13th, 2017
04/13/2017 @Thoughtworks, 99 Madison Ave, New York, 15th floor
Amir Halfon, VP of Strategic Solutions, @iguazio talked about methods for speeding up a analytics linked to a large database. He started by saying that a traditional software stack accessing a db was designed to minimize the time taken to access slow disk storage. This is resulted in layers of software. Amir said that with modern data access and db architecture, processing is accelerated by a unified data engine that eliminate many of the layers. This also allows for the creation of a generic access of data stored in many different formats and a record-by-record security protocol.
To simplify development they only use AWS and only interface with Kafka, Hadoop, Spark. They are not virtualization (eventually reaches a speed limit), they do the actual store.
Another important method is to use “Predicate pushdown” =’ select … where … <predicate>’; usually all data are retrieved and then culled; instead if the predicate is pushed down, only the relevant data is retrieved. A.k.a. as an “offload-engine”.
MapR is a competitor using the HDFS database, as opposed to rebuilding the system from scratch.