Intro to #DeepLearning using #PyTorch
Posted on February 21st, 2017
02/21/2017 @ NYU Courant Institute (251 Mercer St, New York, NY)
Soumith Chintala @Facebook first talked about trends in the cutting edge of machine learning. His main point was that the world is moving from fixed agents to dynamic neural nets in which agents restructure themselves over time. Currently, the ML world is dominated by static datasets + static model structures which learn offline and do not change their structure without human intervention.
He then talked about PyTorch which is the next generation of ML tools after Lua #Torch. In creating PyTorch they wanted to keep the best features of LuaTorch, such as performance and extensibility while eliminating rigid containers and allowing for execution on multiple-GPU systems. PyTorch is also designed so programmers can create dynamic neural nets.
Other features include
- Kernel fusion – take several objects and fuse them into a single object
- Order of execution – reorder objects for faster execution
- Automatic work placement when you have multiple GPUs
PyTorch is available for download on http://pytorch.org and was released Jan 18, 2017.
Currently, PyTorch runs only on Linux and OSX.
Automatically scalable #Python & #Neuroscience as it relates to #MachineLearning
Posted on June 28th, 2016
06/28/2016 @Rise, 43 West 23rd Street, NY, 2nd floor
Braxton McKee (@braxtonmckee ) @Ufora first spoke about the challenges of creating a version of Python (#Pyfora) that naturally scales to take advantage of the hardware to handle parallelism as the problem grows.
Braxton presented an example in which we compute the minimum distance from target points a larger universe of points base on their Cartesian coordinates. This is easily written for small problems, but the computation needs to be optimized when computing this value across many cpu’s.
However, the allocation across cpu’s depends on the number of targets relative to the size of the point universe. Instead of trying to solve this analytically, they use a #Dynamicrebalancing strategy that splits the task and adds resources to the subtasks creating bottlenecks.
This approach solves many resource allocation problems, but still faces challenges
- nested parallelism. They look for parallelism within the code and look for bottlenecks at the top level of parallelism and split the task into subtasks at that level, …
- the data do not fit in memory. They break tasks into smaller tasks. They also have each task know which other caches hold data, so they can be accessed directly without going to slower main memory
- different types of architectures (such as gpu’s) require different types of optimization
- the optimizer cannot look inside python packages, so cannot optimize a bottleneck within a package.
- is a just-in-time compiler that moves stack frames from machine-to-machine and senses how to take advantage of parallelism
- tracks what data a thread is using
- dynamically schedules threads and data
- takes advantage of mutability which allows the compiler to assume that functions do no change over time so the compiler can look inside the function when optimizing execution
- is written on top of another language which allows for the possibility of porting the method to other languages
In the second presentation, Jeremy Freeman @Janelia.org spoke about the relationship between neuroscience research and machine learning models. He first talking about the early works on understanding the function of the visual cortex.
Findings by Hubel & Wiesel in1959 have set the foundation for visual processing models for the past 40 years. They found that Individual neurons in the V1 area of the visual cortex responded to the orientation of lines in the visual field. These inputs fed neurons that detect more complex features, such as edges, moving lines, etc.
Others also considered systems which have higher level recognition and how to train a system. These include
Perceptrons by Rosenblatt, 1957
Neocognitrons by Fukushima, 1980
Hierarchical learning machines, Lecun, 1985
Back propagation by Rumelhart, 1986
His doctoral research looked at the activity of neurons in V2 area. They found they could generate high order patterns that some neurons discriminate among.
But in 2012, there was a jump in performance of neural nets – U. of Toronto
By 2014, some of the neural network algos perform better than humans and primates, especially in the area of image processing. This has lead to many advances such as Google deepdream which combines images and texture to create an artistic hybrid image.
Recent scientific research allows one to look at thousands of neurons simultaneously. He also talked about some of his current research which uses “tactile virtual reality” to examine the neural activity as a mouse explores a maze (the mouse walks on a ball that senses it’s steps as it learns the maze).
Jeremy also spoke about Model-free episodic control for complex sequential tasks requiring memory and learning. ML research has created models such as LSTM and Neural Turing Nets which retain state representations. Graham Taylor has looked at neural feedback modulation using gates.
He also notes that there are similar functionalities between the V1 area in the visual cortex, the A1 auditory area, and the S1, tactile area.
To find out more, he suggested visiting his github site: Freeman-lab and looking the web site neurofinder.codeneuro.org.
#MachineLearning with #scikit-Learn
Posted on January 21st, 2016
01/21/2016 @Pivotal, 625 6th Ave, NY
Andreas Mueller @NYU spoke about scikit, a #python package that implements many of the most-used machine learning algorithms. The package makes use of the 2-d #numpy data structures and runs in Python 2.
The package accepts 2-d floating point array where samples = rows, features = columns. All models are encapsulated into classes, so the syntax to run supervised learning (here for a #RandomForest model) is:
clf = RandomForestClassifier()
y_pred = clf.predict(x_test)
The code is similar for support vector machine with the addition of
from sklearn.svm import LinearSVC
Unsupervised learning, such as pca, can also be done.
Andreas next covered cross validation. Scikit has a cross validation package that split data into 5 sets. Four are used to train the algorithm and one is used as an out-of-sample test. The training and test sets are rotated to create five parameter estimates and out-of-sample tests. Scikit has methods to do these tests, cross_val_score, and change the way to data is split, ShuffleSplit. One can also test fits for different values of C and Gamma in the model.
Pipelines, which are chains transformers to input to a classifier, can also be easily implemented and you can write your own transform methods.
As an example of an analysis using scikit, he did a Sentiment Analysis on IMDB movie reviews data. He used a bag-of-words representation: 1. tokenize, 2. build a vocabulary over all documents, 3. sparse matrix encoding
The 0-1 sparse matrix encoding of the occurrence of words in each review can then be submitted to the machine learning algorithms (he used SVC) to determine the key words differentiating positive from negative reviews. The analysis indicated that negative words were a good predictor of an overall negative review.
Lastly, Andreas talked about methods to handle terabyte data sets. He especially liked an incremental approach where the model is built using a “small” data set and data are incrementally added to improve the fit using the full data set.