#Post-Selection #StatisticalInference in the era of #MachineLearning
Posted on May 6th, 2017
05/04/2017 @ ColumbiaUniversity, DavisAuditorium, CEPSR
Robert Tibshirani @StanfordUniversity talked about adjusting the cutoffs for statistical significance testing of multiple null hypotheses. The #Bonferroni Correction has been used to adjust for testing multiple hypotheses when the hypotheses are statistically independent. However, with the advent of #MachineLearning techniques, the number of possible tests and their interdependence have exploded.
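As a reminder of what the Bonferroni correction does, here is a minimal sketch: with m hypotheses tested at overall level alpha, each individual test is compared against alpha / m. The p-values below are made up for illustration and do not come from the talk.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one True/False per hypothesis: reject when p <= alpha / m."""
    m = len(p_values)
    cutoff = alpha / m  # each test gets an equal share of the error budget
    return [p <= cutoff for p in p_values]


# Hypothetical p-values from four tests; the per-test cutoff is 0.05 / 4 = 0.0125.
p_values = [0.001, 0.012, 0.03, 0.2]
decisions = bonferroni_reject(p_values)
```

Note that the third test (p = 0.03) would pass an unadjusted 0.05 cutoff but fails the corrected one.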
This is especially true with the application of machine learning algorithms to large data sets with many possible independent variables, which often use forward stepwise or Lasso regression procedures. Machine learning methods often rely on #regularization and on splitting the data into training, test and validation sets to avoid #overfitting the data. For big data applications, these may be adequate since the emphasis is on prediction, not inference. Also, the large size of the data set offsets issues such as the lower power of statistical tests conducted on a subset of the data.
Robert proposed a model for incremental variable selection in which each sequential test slices off parts of the distribution for subsequent tests, creating a truncated normal upon which one can assess the probability of the null hypothesis. This method of polyhedral selection works for a stepwise regression as well as a lasso regression with a fixed lambda.
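To see why the truncation matters, here is a simplified one-dimensional sketch (my own illustration, not the selectiveInference implementation): if a statistic is only tested because it exceeded some selection cutoff, its p-value should be computed from a normal distribution truncated at that cutoff. The cutoff of 2.0 and the observed value of 2.5 are assumed numbers.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_normal_p_value(z, lower, upper=math.inf, sigma=1.0):
    """One-sided p-value for z ~ N(0, sigma^2), conditioned on lower <= z <= upper."""
    tail = normal_cdf(upper / sigma) - normal_cdf(z / sigma)
    mass = normal_cdf(upper / sigma) - normal_cdf(lower / sigma)
    return tail / mass


z = 2.5                                    # observed statistic (assumed)
naive_p = 1.0 - normal_cdf(z)              # ignores the selection step
selective_p = truncated_normal_p_value(z, lower=2.0)  # conditions on z >= 2.0
```

The naive p-value looks highly significant, while the selection-adjusted one is far larger, which is the kind of over-optimism the truncated normal is meant to correct.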
When the value of lambda is determined by cross-validation, one can use this method by adding 0.1 * sigma noise to the y values. This adjustment retains the power of the test and does not underestimate the probability of accepting the null hypothesis. This method can also be extended to other methods such as logistic regression, the Cox proportional hazards model, and the graphical lasso.
The method can also be extended to consider the number of factors to use in the regression. The goals of this methodology are similar to those described by Bradley #Efron in his 2013 JASA paper on bootstrapping (http://statweb.stanford.edu/~ckirby/brad/papers/2013ModelSelection.pdf) and to the random matrix theory used to determine the number of principal components in the data, as described by the #Marchenko-Pastur distribution.
There is a package in R: selectiveInference
Further information can be found in a chapter of ‘Statistical Learning with Sparsity’ by Hastie, Tibshirani, and Wainwright (online pdf) and in ‘Statistical learning and selective inference’ (2015) by Jonathan Taylor and Robert J. Tibshirani (PNAS).
Posted on May 6th, 2017
05/04/2017 @Ebay, 625 6th Ave, NY 3rd floor
John Novak @QxBranch talked about the process of developing quantum computers. The theory is based on adiabatic optimization. Each qubit is started at a low energy level along with its couplings, and the energy levels are then amplified so that there is a high probability that the correct solution state will be the realized output when the quantum field collapses.
In the architecture of the D-Wave computer, qubits are organized in 4 x 4 cells in a pattern called a Chimera graph. These nodes are joined together to increase the number of digits. This raises certain challenges since not all nodes are connected to all other nodes: some logical nodes need to be represented multiple times in the physical computer.
Other challenges include running the quantum computer for a sufficiently long time to refine the probabilistic output. Challenges in increasing the number of digits in the computer include the need to supercool more wires and to add error correction circuits. Eventually room-temperature superconductors will need to be developed.
Building #ImageClassification models that are accurate and efficient
Posted on April 28th, 2017
04/28/2017 @NYUCourantInstitute, 251 Mercer Street, NYC, room 109
Laurens van der Maaten @Facebook spoke about some of the new technologies used by Facebook to increase accuracy and reduce the processing needed in image identification.
He first talked about residual networks which they are developing to replace standard convolutional neural networks. Residual networks can be thought of as a series of blocks each of which is a tiny #CNN:
- 1×1 layer, like a PCA
- 3×3 convolution layer
- 1×1 layer, inverse PCA
The raw input is added to the output of this mini-network, followed by a ReLU transformation.
These transformations extract features while keeping the information that is input into the block, so the map is changed but does not need to be re-learned from scratch. This eliminates some problems with vanishing gradients in back propagation as well as the unidentifiability problem.
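The skip connection can be sketched in a few lines. This is a toy version on plain Python lists: the `transform` argument stands in for the 1×1 / 3×3 / 1×1 mini-network, and both example transforms are hypothetical placeholders rather than real convolutions.

```python
def relu(values):
    """Element-wise ReLU: negative entries become 0."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Add the raw input to the transformed input, then apply ReLU."""
    fx = transform(x)
    return relu([a + b for a, b in zip(x, fx)])


def zero_transform(x):
    """Placeholder mini-network that has learned nothing yet."""
    return [0.0 for _ in x]


def flip_transform(x):
    """Placeholder mini-network that outputs -2 times its input."""
    return [-2.0 * v for v in x]


passthrough = residual_block([1.0, 2.0], zero_transform)
changed = residual_block([1.0, -1.0], flip_transform)
```

With the zero transform, a non-negative input passes through unchanged, which is why a block can be harmless even before it has learned anything useful.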
Blocks, when executed in sequence, gradually add features, but removing a block after training hardly degrades performance (Huang et al 2016). From this observation they concluded that the blocks were performing two functions: detecting new features and passing through some of the information in the raw input. Therefore, this structure could be made more efficient if the information were passed through separately, allowing each block to only extract features.
DenseNets give each block in each layer access to all features in the layers before it. The number of feature maps increases in each layer, so there is the possibility of a combinatorial explosion of units with each layer. Fortunately, this does not happen: each layer adds only 32 new modules and the computation is more efficient, so the aggregate amount of computation for a given level of accuracy decreases when using DenseNet instead of ResNet, while accuracy improves.
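A quick bit of arithmetic shows why the growth stays manageable. In this sketch each layer sees every earlier feature map and contributes a fixed number of new ones, so the input width grows linearly rather than combinatorially. The growth of 32 is the figure from the talk; the starting width of 64 is an assumption for illustration.

```python
def dense_layer_input_widths(initial_maps, growth_rate, num_layers):
    """Number of feature maps visible to each layer in a densely connected block."""
    return [initial_maps + i * growth_rate for i in range(num_layers)]


# Assumed starting width of 64 maps, with 32 new maps added per layer.
widths = dense_layer_input_widths(initial_maps=64, growth_rate=32, num_layers=4)
```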
Next Laurens talked about making image recognition more efficient, so a larger number of images could be processed with the same level of accuracy in a shorter average time.
He started by noting that some images are easier to identify than others. So, the goal is to quickly identify the easy images and only spend further processing time on the harder, more complex images.
The key is noting that easy images can be classified using only a coarse grid, but harder images would then not be classifiable. On the other hand, using a fine grid makes it harder to classify the easy images.
Laurens described a hybrid 2-d network in which there are layers analyzing the image using the coarse grid and layers analyzing the fine grid. The fine grain blocks occasionally feed into the coarse grain blocks. At each layer, outputs are tested to see if the confidence level for any image exceeds a threshold. Once the threshold is exceeded, processing is stopped and the prediction is output. In this way, when the decision is easy, the conclusion is arrived at quickly. Hard images continue further down the layers and require more processing.
By estimating the percentage of images exiting the classifier at each threshold, they can tune the threshold levels so that more images can be processed within a given time budget.
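The early-exit logic can be sketched as follows. This is only an illustration of the control flow: the per-exit labels and confidences are hypothetical numbers, where a real network would compute them layer by layer.

```python
def classify_with_early_exit(exit_outputs, threshold):
    """Return (label, exit_depth) for the first exit whose confidence meets the threshold.

    exit_outputs is a list of (label, confidence) pairs, ordered from the
    shallowest exit to the deepest. If no exit is confident enough, the
    deepest exit's answer is used.
    """
    for depth, (label, confidence) in enumerate(exit_outputs, start=1):
        if confidence >= threshold:
            return label, depth
    label, _ = exit_outputs[-1]
    return label, len(exit_outputs)


# Hypothetical confidences for an easy image and a hard image.
easy_image = [("cat", 0.95), ("cat", 0.99), ("cat", 0.99)]
hard_image = [("dog", 0.40), ("cat", 0.60), ("cat", 0.92)]

easy_result = classify_with_early_exit(easy_image, threshold=0.9)
hard_result = classify_with_early_exit(hard_image, threshold=0.9)
```

Lowering the threshold makes more images exit early, trading some accuracy for a lower average processing time.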
During the Q&A, Laurens said:
- To avoid overfitting the model, they train the network on both the original images and these same images after small transformations have been applied to each image.
- They are still working to expand the #DenseNet to see its upper limits on accuracy
- He is not aware of any neurophysiological structures in the human brain that correspond to the structure of blocks in #ResNet / DenseNet.
#ExtremeEvents and short term reversals in #RiskAversion
Posted on April 17th, 2017
04/17/2017 @ 101 NJ Hall, 75 Hamilton Street, New Brunswick, NJ
Kim Oosterlinck @FreeUniversityOfBrussels presented work done by Matthieu Gilson, Kim Oosterlinck, and Andrey Ukhov. Kim started by reviewing the literature, which shows no consensus on whether risk aversion increases or decreases following extreme events such as war. In addition, these studies often have only two points on which to make this evaluation.
He presented a method for tracking overall risk aversion within a population on a daily basis for several years. His analysis values the lottery part of Belgian bonds, which consisted of a fixed coupon bond with the opportunity to win a cash prize every month. These bonds were sold to retail customers and made up 11% of the Belgian bond market in 1938. By discounting the cash flows based on the yields for other, fixed coupon Belgian bonds, one can compare the risk neutral price (RNP) with the market price (MP).
When MP/RNP > 1, this indicates that the average holder is risk loving.
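A rough sketch of the comparison, under simplifying assumptions of my own: the monthly prize draw is collapsed into a single expected prize per year, cash flows are annual, and all the numbers are invented. The authors' actual valuation is surely more detailed.

```python
def risk_neutral_price(coupon, face_value, years, yield_rate, expected_prize):
    """Discounted value of coupons plus expected lottery winnings, plus face value at maturity."""
    price = 0.0
    for year in range(1, years + 1):
        price += (coupon + expected_prize) / (1.0 + yield_rate) ** year
    price += face_value / (1.0 + yield_rate) ** years
    return price


def risk_attitude(market_price, rnp):
    """Classify the average holder from the MP/RNP ratio."""
    ratio = market_price / rnp
    if ratio > 1.0:
        return "risk loving"
    if ratio < 1.0:
        return "risk averse"
    return "risk neutral"


# Made-up bond: coupon 3, expected prize 1 per year, 2 years, zero discount rate.
rnp = risk_neutral_price(coupon=3.0, face_value=100.0, years=2,
                         yield_rate=0.0, expected_prize=1.0)
attitude_high = risk_attitude(market_price=110.0, rnp=rnp)
attitude_low = risk_attitude(market_price=105.0, rnp=rnp)
```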
There are three periods in their observations from 1938 to 1948.
- Risk neutral to risk averse from 1938 to 1940, when Germany invaded and occupied Belgium
- Risk aversion to risk seeking from 1940 to 1945 during the German occupation
- Risk seeking to risk neutral from 1945 to 1948.
There are many competing theories on when people become more or less risk averse.
These data give the strongest support to habituation to background risk as the best explanation of the increase in risk aversion. Prospect theory also does well as an explanation.
[The findings of increased risk seeking from 1940 to 1945 could also be consistent with a flat yield curve at 3% from 1 month to 3 years in 1940 changing to a steep yield curve in 1945 going from 0% at 1 month to 3% at 3 years.]
From #pixels to objects: how the #brain builds rich representation of the natural world
Posted on April 15th, 2017
04/06/2017 @RutgersUniversity, Easton Hub Auditorium, Fiber Optics Building, Busch Campus
Jack Gallant @UCBerkeley presented a survey of current research on mapping the neurophysiology of the visual system in the brain. He first talked about the overall view of visual processing since the Felleman and Van Essen article in Cerebral Cortex in 1992. Their work on macaque monkeys showed that any brain area has a 50% chance of being connected to any other part of the brain. Visual processing can be split into 3 areas:
1. Early visual areas – 2. Intermediate visual areas – 3. High level visual areas
There are pooling nonlinear transformations between areas (the inspiration for the non-linear mappings in convolutional neural nets (CNN)). The visual areas were identified by retinotopic maps – about 60 areas in humans, with macaques having 10 to 15 areas in the V1 area.
Another important contribution was by David J. Field, who argued that the mammalian visual system can only be understood relative to the images it is exposed to. In addition, natural images have a very specific structure – 1/f noise in the power spectrum – due to the occlusion of images which can be viewed from any angle (see Olshausen & Field, American Scientist, 2000).
This led to research resolving natural images by characterizing them by the correlation of pairs of points. Beyond pairs of points, that approach becomes too computationally intensive. In summary, natural images are only a small part of the universe of images (most of which humans classify as white noise).
Until 2012, researchers needed to specify the characteristics used to identify items in images, but LeCun, Bengio & Hinton, Nature, 2015 showed that AlexNet could resolve many images using multiple layer models, faster computation, and lots of data. These deep neural nets work well, but the reasons for their success have yet to be worked out (he estimates it will take 5 to 10 years for the math to catch up).
One interesting exercise is running a CNN and then looking for activation in a structure in the brain: mapping the convolutional layers and feature layers to the corresponding layers in the visual cortex. This reveals that V1 has bi- or tri-phasic functions – Gabor functions in different orientations. This is highly efficient, as a sparse code needs to activate as few neurons as possible.
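For readers who have not met them, a Gabor function is a sinusoid multiplied by a Gaussian envelope. The sketch below is a one-dimensional version with parameter values I picked for illustration (the talk concerned oriented two-dimensional filters). The alternating positive and negative lobes are what make the response bi- or tri-phasic.

```python
import math


def gabor_1d(x, sigma=1.0, frequency=1.0, phase=0.0):
    """Gaussian envelope of width sigma times a cosine carrier."""
    envelope = math.exp(-(x * x) / (2.0 * sigma * sigma))
    carrier = math.cos(2.0 * math.pi * frequency * x + phase)
    return envelope * carrier


center = gabor_1d(0.0)      # central positive lobe
side_lobe = gabor_1d(0.5)   # half a cycle away: a negative lobe
far_away = gabor_1d(5.0)    # the envelope has decayed to almost nothing
```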
Next they used motion-energy models to see how mammals detect motion, looking at voxels in V1 (Shinji Nishimoto). They determined that monitoring takes 10 to 20ms using Utah arrays to monitor single neurons. They have animals watch movies and analyze the input images using a combination of complex and simple cell models (using Keras) to model neurons in V1 and V2 on a 16ms time scale.
High level visual areas
Jack then talked about research identifying neurons in high level visual areas that respond to specific stimuli. Starting with fMRI, his group (Huth, Nishimoto, Vu & Gallant, Neuron, 2012) has identified many categories: face areas vs. objects; place minus face. By presenting images and mapping which voxels in the brain are activated, one can see how the 2000 categories are mapped in the brain using wordmap as the labels. Similar concepts are mapped to similar locations in the brain, but specific items in the semantic visual system interact with the semantic language areas – so a ‘dog’ can activate many areas, so it can be used in different ways and can be unified as needed. Each person will have a different mapping depending on their previous good and bad experiences with dogs.
He talked about other topics, including the challenges of determining how things are stored in places: Fourier power, object categories, subjective distance. In order to activate any of these areas in isolation, one needs enough stimulus to activate the earlier layers. They have made progress by building a decoder from knowledge of the voxels, which runs from the brain area backwards to recreate the stimulus. A blood flow model is used with a 2 second minimum sampling period, but there is lots of continuity, so they can reconstruct a series of images.
Intermediate visual area
Intermediate visual areas between the lower and higher levels of processing are hard to understand – he looked at V4. These areas respond to shapes of intermediate complexity, much like a curvature detector, but not to much else. Using fMRI they know what image features correlate with specific areas, but there is no strong indication differentiating one layer from another. Using the Utah array, they need to do a log-polar transform to improve prediction in V4. Using a receptive field model, they can create a predictor frame and match brain activity to the images that gave the largest response.
However, the images are messy, and predicting V4 is not the same as understanding V4.
Finally, he talked about attention and tuning effects on single neurons. In an experiment in which subjects watched a movie and were asked to search for either humans or vehicles, there were changes in the semantic map based on the search criterion. These tuning shift effects are a function of distance to the visual periphery: attentional effects are small in V1 and get larger in the ensuing layers.
In the Q&A, he made the following points:
- The visual word form area in the brain becomes active as you learn to read. This change does not occur for people who are illiterate.
- One of the experimental assumptions is that the system is stationary, so there is no adaptation. If adaptation does occur, then they cannot compute a noise ceiling for the signals.
[Neural nets take inspiration from the neurobiology, especially the creation of convolutional neural nets, but there is now feedback with neurobiology using the tools created in machine learning to explore possible models of brain mapping. Does the pervasive existence of Gabor filters lead to an argument that their presence indicates that natural images are closely allied with fractal patterns?]
How to build a #MixedReality experience for #Hololens
Posted on April 14th, 2017
4/14/2017 @MicrosoftReactorAtGrandCentral, 335 Madison Ave, NY, 4th floor
Mike Pell and John gave a roadmap for generating #MixedReality content. They started with general rules for generating content and how these rules apply to building MR content.
- Know your audience
- Role of emotion in design – we want to believe in what is shown in a hologram.
- Think situation – where am I? At home you are comfortable doing certain things, but there are different needs and different things you are comfortable doing in public
- Think spatially – different if you can walk around the object
- Think inclusive – widen your audience
- Know Your medium
- For now you look ridiculous when wearing a VR headset – but maybe this eventually becomes like a welder’s shield, which you wear when you are doing something specialized
- Breakthrough experience – stagecraft – so one can see what the HoloLens user is seeing
- Know Your palette
Interactive Story Design – a fast way to generate MR content
- Who is your “spect-actor”? (Normally someone who observes – have a sense of who the individual is for this moment – avoid blind spots, so pick a specific person.)
- Who are your “interactors” – will change as a result of the interaction – can be objects, text, people
- This creates a story
- Location – design depends on where this occurs
- Journey – how does participant change
How to bring the idea to life: how to develop the script for the MR experience
3-step micro sprints – 3 to 6 minute segments – so you don’t get attached to something that doesn’t work. Set 1 to 2 minute time limit for each step
- Parameters – limited resources help creative development
- Personify everything – everything, including text, has a POV, feelings, etc.
- 3 emotional responses – what is the emotional response of a chair when you sit in it?
- 3 conduits
- Facial expression – everything has a face including interfaces and objects
- Body language
- Playtest – do something with it
- 3 perspectives
- Interactors – changes in personality over time
- Audience – who is watching
- PMI – evaluative process – write on index cards (not as a feedback session) so everyone shares their perspective. Next loop back to the parameters (step 1)
- Plus – this is interesting
- Minus – this is weak
- Interesting – neither of the above “this is interesting”
How to envision and go fast:
- Filming on location – randomly take pictures – look for things that speak to you as creating an interesting experience.
- Understand the experience – look at the people (e.g. people viewing art)
- Visualize it – put people into the scene (vector silhouette in different poses) put artwork into scene along with viewers.
- Build a prototype using Unity. Put on the HoloLens and see how it feels
They then went through an example session in which a child is inside looking at a T-Rex in the MOMA outdoor patio. The first building block was getting three emotional responses for the T-Rex:
- Positive – joy looking at a potential meal: the child
- Negative – too bad the glass barrier is here
- Neutral – let me look around to see what is around me
To see where we should be going, look at what children want to do with the technology
Beyond Big: Merging Streaming & #Database Ops into a Next-Gen #BigData Platform
Posted on April 13th, 2017
04/13/2017 @Thoughtworks, 99 Madison Ave, New York, 15th floor
Amir Halfon, VP of Strategic Solutions, @iguazio talked about methods for speeding up analytics linked to a large database. He started by saying that a traditional software stack accessing a db was designed to minimize the time taken to access slow disk storage. This resulted in layers of software. Amir said that with modern data access and db architecture, processing is accelerated by a unified data engine that eliminates many of the layers. This also allows for generic access to data stored in many different formats and for a record-by-record security protocol.
To simplify development they only use AWS and only interface with Kafka, Hadoop, and Spark. They do not use virtualization (which eventually reaches a speed limit); they do the actual storage themselves.
Another important method is “predicate pushdown”, as in ‘select … where … <predicate>’. Usually all data are retrieved and then culled; if the predicate is instead pushed down, only the relevant data are retrieved. This is also known as an “offload engine”.
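The difference can be illustrated with a toy example. `ToyStore` below is a hypothetical stand-in for a storage layer (it is not iguazio's API), and it simply counts how many rows it ships to the client with and without the predicate being pushed down.

```python
class ToyStore:
    """Hypothetical storage layer that counts the rows it sends to the client."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_shipped = 0

    def scan(self, predicate=None):
        """Return rows, applying the predicate inside the store when one is given."""
        matched = [row for row in self.rows if predicate is None or predicate(row)]
        self.rows_shipped += len(matched)
        return matched


def is_eu(row):
    return row["region"] == "EU"


rows = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
    {"id": 4, "region": "APAC"},
]

# Without pushdown: retrieve everything, then cull on the client.
no_pushdown = ToyStore(rows)
client_filtered = [row for row in no_pushdown.scan() if is_eu(row)]

# With pushdown: the store applies the predicate and ships only the matches.
pushdown = ToyStore(rows)
store_filtered = pushdown.scan(predicate=is_eu)
```

Both routes give the same answer; the saving is in how much data crosses the boundary between storage and compute.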
MapR is a competitor using the HDFS database, as opposed to rebuilding the system from scratch.
#Driverless #Trucks will come before driverless #cars
Posted on April 13th, 2017
04/12/2017 @MetroTech 6, NYU, Brooklyn, NY
Seth Clevenger – technology editor, Transport Topics News, @sethclevenger, talked about the rollout of driverless trucks. His main message was that there are many intermediate stages from adaptive cruise control (already exists in some cars) to fully autonomous operation.
Truck manufacturers are concentrating on systems that assist rather than replace drivers. These include
- Truck platooning – could roll out by year-end. Braking is synchronized, and trucks can draft off each other for a 10% increase in efficiency. Brakes are linked, but drivers are still needed. (Peloton Technology plans to begin fleet trials.)
- Connected vehicles – just starting to be regulated. (V2V, V2I). For instance, safety messages sent by each vehicle.
- auto docking at loading docks
- traffic jam assist – move forward slowly without driver assistance
Startups include: Uber/Otto, Embark, Starsky Robotics, Drive.ai
[One of my major concerns is the integrity of the software controlling the vehicle. A failure in software could cause accidents; however, my main concern is the potential insertion of a malicious virus as a sleeper cell within the millions of lines of code. In this case, the results could be catastrophic, as all braking and acceleration systems could be programmed to fail on a specific date in the future. At that moment, all vehicles on the road would be out of control, potentially resulting in millions of accidents and thousands of deaths and injuries. Preventing such an event will require coordination amongst suppliers and enforcement of strict software standards. The large number of suppliers makes this job especially complicated. This sleeper cell could lie dormant for years before it is activated.]
#Self-learned relevancy with Apache Solr
Posted on March 31st, 2017
03/30/2017 @ Architizer, 1 Whitehall Street, New York, NY, 10th Floor
Trey Grainger @ Lucidworks covered a wide range of topics involving search.
He first reviewed the concept of an inverted index in which terms are extracted from documents and placed in an index which points back to the documents. This allows for fast searches of single terms or combinations of terms.
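A minimal sketch of the idea, using a few made-up documents and naive whitespace tokenization (Solr's analyzers are far more sophisticated): each term maps to the set of document ids that contain it, and a multi-term query intersects those sets.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, terms):
    """Return ids of documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical mini-corpus.
docs = {
    1: "solr search engine",
    2: "search relevancy",
    3: "solr relevancy tuning",
}
index = build_inverted_index(docs)
both_terms = search_all(index, ["solr", "relevancy"])
single_term = search_all(index, ["search"])
```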
Next Trey covered classic relevancy scores emphasizing
tf-idf = how well a term describes the document * how important the term is overall
He noted, however, that the value of tf-idf may be limited since it does not make use of domain-specific knowledge.
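One common textbook variant of the score can be sketched as below: term frequency within the document multiplied by the log of the inverse document frequency. This is an illustrative formula on a made-up corpus, not the exact scoring function Solr uses.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Term frequency in the document times log inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0  # the term appears nowhere in the corpus
    idf = math.log(len(corpus) / df)
    return tf * idf


# Hypothetical tokenized corpus.
corpus = [
    ["solr", "search", "engine"],
    ["search", "relevancy"],
    ["search", "index", "tuning"],
]
common_term_score = tf_idf("search", corpus[0], corpus)  # appears in every document
rare_term_score = tf_idf("solr", corpus[0], corpus)      # appears in one document
```

A term that appears in every document scores zero, while a term unique to one document scores highest there, which matches the intuition behind the two factors.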
Trey then talked about reflected intelligence = self-learning search, which uses
- Collaboration – how have others interacted with the system
- Context – information about the user
He said this method increases relevance by boosting items that are highly requested by others. Since the items boosted are those currently relevant to others, this allows the method to adapt quickly without need for manual curation of items.
Next he talked about semantic search, which uses an understanding of terms in the domain.
(Solr can connect to an RDF database to leverage an ontology.) For instance, one can run word2vec to extract terms and phrases for a query and then determine a set of keywords/phrases to best match the query to the contents of the db.
Also, querying a semantic knowledge graph can expand the search by traversing to other relevant terms in the db.
Applications of #DeepLearning in #Healthcare
Posted on March 28th, 2017
03/28/2017 @NYU Courant Institute (251 Mercer St, New York, NY)
Sumit Chopra, the head of A.I. Research @Imagen Technologies, introduced the topic by saying that the two areas in our lives that will be most affected by AI are healthcare and driverless cars.
Healthcare data can be divided into
- Clinical data – incomplete since hospitals don’t share their datasets; digital form with privacy concerns
- Payer data – from the insurance provider; more complete unless the patient switches payers, but with less detail
- Other – cell phones, etc.
He focuses on medical imaging – mainly diagnostic radiology – with 600 million studies in the U.S. but a shortage of skilled radiologists and a prevalence of errors. The images are very large, high resolution and low contrast, with highly subtle cues => radiology is hard to do well
Possible solution: pre-train a standard model (AlexNet/VGG/…) on a small number of images, but this might not work since the signal is subtle.
Also, radiology reports, which could be used for supervised training, are unstructured, and it’s hard to tell what the report tells you => weak labels at best
Much work has been done on this problem, usually using deep convolutional neural nets.
First step: image registration = rotate & crop.
Train a deep convolutional network (registration network), then send the output to a detection network for binary segmentation.
Could use generative models for images to train doctors
Leverage different modalities of data
Sumit has found that a random search of hyperparameter space works better than either grid search or optimizer search.
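For reference, random search itself is very simple. The sketch below samples each hyperparameter uniformly from a range and keeps the best-scoring combination. The hyperparameter names, the ranges, and the toy objective are all hypothetical, standing in for a real validation loss.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials points uniformly from space and return (best_params, best_score)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_loss(params):
    """Made-up loss with its minimum at learning_rate=0.3, dropout=0.5."""
    return (params["learning_rate"] - 0.3) ** 2 + (params["dropout"] - 0.5) ** 2


space = {"learning_rate": (0.0, 1.0), "dropout": (0.0, 1.0)}
few_params, few_score = random_search(toy_validation_loss, space, n_trials=10, seed=1)
many_params, many_score = random_search(toy_validation_loss, space, n_trials=200, seed=1)
```

Unlike a grid, every trial tries a fresh value of every hyperparameter, so the important dimensions are explored more densely for the same budget.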