Intro to #DeepLearning using #PyTorch
Posted on February 21st, 2017
02/21/2017 @ NYU Courant Institute (251 Mercer St, New York, NY)
Soumith Chintala @Facebook first talked about trends in the cutting edge of machine learning. His main point was that the world is moving from fixed agents to dynamic neural nets in which agents restructure themselves over time. Currently, the ML world is dominated by static datasets + static model structures which learn offline and do not change their structure without human intervention.
He then talked about PyTorch, the next generation of ML tools after Lua #Torch. In creating PyTorch, the team wanted to keep the best features of LuaTorch, such as performance and extensibility, while eliminating rigid containers and allowing execution on multi-GPU systems. PyTorch is also designed so programmers can create dynamic neural nets.
Other features include:
- Kernel fusion – take several operations and fuse them into a single kernel
- Order of execution – reorder operations for faster execution
- Automatic work placement when you have multiple GPUs
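As a minimal sketch of what "dynamic" means here (the network below and its sizes are invented for illustration, not taken from the talk): in PyTorch the graph is defined by ordinary Python control flow on each forward pass, so the structure can change from run to run.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """A toy network whose depth varies per forward pass --
    the kind of data-dependent structure a static graph cannot express."""
    def __init__(self, hidden=16):
        super().__init__()
        self.inp = nn.Linear(4, hidden)
        self.mid = nn.Linear(hidden, hidden)   # reused a random number of times
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        h = torch.relu(self.inp(x))
        # Ordinary Python control flow defines the graph on the fly.
        for _ in range(int(torch.randint(1, 4, (1,)))):
            h = torch.relu(self.mid(h))
        return self.out(h)

net = DynamicNet()
y = net(torch.randn(8, 4))   # a batch of 8 four-feature inputs
```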
PyTorch is available for download on http://pytorch.org and was released Jan 18, 2017.
Currently, PyTorch runs only on Linux and OSX.
NYAI#7: #DataScience to Operationalize #ML (Matthew Russell) & Computational #Creativity (Dr. Cole)
Posted on November 22nd, 2016
11/22/2016 Risk, 43 West 23rd Street, NY 2nd floor
Speaker 1: Using Data Science to Operationalize Machine Learning – (Matthew Russell, CTO at Digital Reasoning)
Speaker 2: Top-down vs. Bottom-up Computational Creativity – (Dr. Cole D. Ingraham DMA, Lead Developer at Amper Music, Inc.)
Matthew Russell @DigitalReasoning spoke about understanding language using NLP, relationships among entities, and temporal relationships. For human language understanding, he views technologies such as knowledge graphs and document analysis as becoming commoditized. The only way to get an advantage is to improve the efficiency of using ML: the KPI for data analysis is the number of experiments (tests of a hypothesis) that can be run per unit time. The key is to use tools such as:
- Vagrant – allows reproducible environment setup
- Jupyter Notebook – like a lab notebook
- Git – version control
- Automation – to make experiments repeatable
He wants highly repeatable experiments. The goal is to speed up the number of experiments that can be conducted per unit time.
He then talked about using machines to read medical reports and determine the issues. Negatives can be extracted, but issues are harder to find. He uses an ontology to classify entities.
He talked about experiments on models using ontologies. Whether a fixed ontology works depends on the content: the ontology of terms for anti-terrorism evolves over time and needs to be experimentally adjusted, while a medical ontology is probably the most static.
In the second presentation, Cole D. Ingraham @Ampermusic talked about top-down vs. bottom-up creativity in the composition of music. Music differs from other audio forms since it has a great deal of large-scale structure in addition to small-scale structure. ML does well at generating good audio on a small time frame, but Cole thinks it is better to apply theories from music to create the larger whole. This is a combination of:
- Top-down: novel & useful, rejects previous ideas – code driven, “hands on”, you define the structure
- Bottom-up: data driven – “hands off”, you learn the structure
He then talked about music composition at the intersection of generation and analysis (of already-composed music) – one can do either without the other, or one before the other.
To successfully generate new and interesting music, one needs to generate variance. Composing music using a purely probabilistic approach is problematic as there is a lack of structure. He likes the approach similar to replacing words with their synonyms which do not fundamentally change the meaning of the sentence, but still makes it different and interesting.
It’s better to work on deterministically defined variance than it is to weed out undesired results from nondeterministic code.
As an example he talked about WaveNet (a Google DeepMind project), whose input and output are both raw audio. This approach works well for improving speech synthesis, but less well for music generation, as there is no large-scale structural awareness.
Cole then talked about Amper, a web site that lets users create music with no experience required: fast, believable, collaborative.
They like a mix of top-down and bottom-up approaches:
- Want speed, but neural nets are slow
- Music has a lot of theory behind it, so it’s best to let the programmers code these rules
- Can change different levels of the hierarchical structure within music: style, mood, can also adjust specific bars
The runtime is written in Haskell – a functional language, so it’s great for music.
Advanced #DeepLearning #NeuralNets: #TimeSeries
Posted on June 16th, 2016
06/15/2016 @Qplum, 185 Hudson Street, Jersey City, NJ, suite 1620
Sumit then broke the learning process into two steps: feature extraction and classification. Starting with raw data, the feature extractor is the deep learning model that prepares the data for the classifier, which may be a simple linear model or random forest. In supervised training, errors in the prediction output by the classifier are fed back into the system using backpropagation to tune the parameters of the feature extractor and the classifier.
In the remainder of the talk Sumit concentrated on how to improve the performance of the feature extractor.
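A minimal sketch of this two-stage setup, assuming PyTorch (sizes and data are made up for illustration): a learned feature extractor feeds a simple linear classifier, and one backward pass tunes both.

```python
import torch
import torch.nn as nn

# Hypothetical two-stage model: a learned feature extractor feeding a
# simple linear classifier, trained end to end by backpropagation.
extractor = nn.Sequential(nn.Linear(100, 32), nn.ReLU())
classifier = nn.Linear(32, 2)
opt = torch.optim.SGD(
    list(extractor.parameters()) + list(classifier.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(16, 100), torch.randint(0, 2, (16,))
logits = classifier(extractor(x))   # classifier sees extracted features only
loss = loss_fn(logits, y)
loss.backward()   # errors flow back through classifier and extractor alike
opt.step()
```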
In general text classification (unlike image or speech recognition), the input can be very long and variable in length. In addition, analysis of text by general deep learning models:
- does not capture order of words or predictions in time series
- can handle only small sized windows or the number of parameters explodes
- cannot capture long term dependencies
So, the feature extractor is cast as a time delay neural network (#TDNN). In a TDNN, the text is viewed as a string of words. Kernel matrices (usually 3 to 5 words long) are defined which compute dot products of the weights with the words in a contiguous block of text. The kernel matrix is shifted one word and the process is repeated until all words are processed. A second kernel matrix creates another set of features, and so forth for a third kernel, etc.
These features are then pooled using the mean or max of the features. This process is repeated to get additional features. Finally a point-wise non-linear transformation is applied to get the final set of features.
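The sliding-kernel-plus-pooling pipeline can be sketched as a 1-D convolution over word embeddings (this is the standard convolutional reading of a TDNN with hypothetical sizes, not Sumit's actual code):

```python
import torch
import torch.nn as nn

# Width-3 kernels slide over word embeddings, producing one feature per
# position; max pooling then summarizes each feature over the whole text.
# Vocabulary size, embedding size, and feature counts are illustrative.
embed = nn.Embedding(1000, 50)           # word ids -> 50-d vectors
conv = nn.Conv1d(50, 64, kernel_size=3)  # 64 kernels, each 3 words wide
words = torch.randint(0, 1000, (1, 20))  # a 20-word "sentence"
e = embed(words).transpose(1, 2)         # (batch, channels, length)
features = torch.tanh(conv(e))           # point-wise non-linearity
pooled = features.max(dim=2).values      # max over positions
```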
Unlike traditional neural network structures, these methods are new, so no one has done a study of what is revealed in the first layer, second layer, etc. Also theoretical work is lacking on the optimal number of layers for a text sample of a given size.
Historically, #TDNN has struggled with a series of problems, including convergence issues, so recurrent neural networks (#RNN) were developed, in which the encoder looks at the latest data point along with its own previous output. One example is the Elman network, in which each feature is the weighted sum of the kernel function output (one encoder is used for all points in the time series) and the previously computed feature value. Training is conducted as in a standard #NN using backpropagation through time, with the gradient accumulated over time before the encoder is re-parameterized. But RNNs have a number of issues:
1. exploding or vanishing gradients – depending on the largest eigenvalue
2. cannot capture long-term dependencies
3. training is somewhat brittle
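The Elman recurrence itself is compact enough to write out. This NumPy sketch (with invented sizes and random weights) applies one shared encoder at every step, combining the current input with the previous output:

```python
import numpy as np

# Minimal Elman-style recurrence: one shared encoder is applied at every
# time step, mixing the current input with the previous hidden state.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(8, 4))   # input weights (shared across time)
W_h = rng.normal(size=(8, 8))   # recurrent weights
h = np.zeros(8)
for x_t in rng.normal(size=(10, 4)):      # a 10-step, 4-feature series
    h = np.tanh(W_x @ x_t + W_h @ h)      # new feature from input + memory
# h now summarizes the whole sequence; in training, gradients would flow
# back through every step (backpropagation through time).
```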
The fix is called long short-term memory (#LSTM; see Hochreiter et al. 1997), which has additional memory “cells” to store short-term activations. It also has additional gates to alleviate the vanishing gradient problem. Now each encoder is made up of several parts, as shown in his slides. It can also have a forget gate that turns off all the inputs, and can peep back at the previous values of the memory cell. At Facebook, NLP, speech, and vision recognition are all users of LSTM models.
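In a framework like PyTorch, all of these gates come packaged in a single layer; a minimal usage sketch (sizes are illustrative):

```python
import torch
import torch.nn as nn

# An LSTM layer: the input, forget, and output gates and the memory cells
# are all built into nn.LSTM.
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(4, 25, 10)        # batch of 4 sequences, 25 time steps each
out, (h_n, c_n) = lstm(x)
# out holds the hidden state at every step; c_n holds the final memory
# cells -- the "short-term activations" that carry information across time.
```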
LSTM models, however, still don’t have a long-term memory. Sumit talked about creating memory networks, which take a story and store its key features in a memory cell. A query runs against the memory cell and then concatenates the output vector with the text. A second query will retrieve the memory.
He also talked about using a dropout method to fight overfitting. Here, cells randomly determine whether a signal is transmitted to the next layer.
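A small sketch of the mechanism, using PyTorch's built-in dropout layer (the probability is chosen for illustration):

```python
import torch
import torch.nn as nn

# Dropout: during training, each unit's output is zeroed at random (here
# with probability 0.5); survivors are rescaled so the expected activation
# is unchanged. At inference the layer passes data through untouched.
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)
drop.train()
y_train = drop(x)      # entries are either 0.0 or 2.0
drop.eval()
y_eval = drop(x)       # identical to x
```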
Autoencoders can be used to pretrain the weights within the NN to avoid creating solutions that are only locally optimal instead of globally optimal.
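A hedged sketch of the idea, with made-up sizes: first train an encoder/decoder pair to reconstruct the input, then reuse the encoder's weights as the starting point for supervised training.

```python
import torch
import torch.nn as nn

# Autoencoder pretraining sketch: train an encoder and decoder to
# reconstruct the input, then reuse the encoder to initialize the network.
torch.manual_seed(0)
encoder = nn.Linear(30, 10)
decoder = nn.Linear(10, 30)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)
x = torch.randn(64, 30)

def recon_loss():
    # mean squared reconstruction error
    return ((decoder(torch.relu(encoder(x))) - x) ** 2).mean()

start = recon_loss().item()
for _ in range(100):
    opt.zero_grad()
    loss = recon_loss()
    loss.backward()
    opt.step()
end = recon_loss().item()
# encoder now starts supervised training from a data-informed
# initialization rather than a random one.
```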
[Many of these methods are similar in spirit to existing methods. For instance, kernel functions in RNN are very similar to moving average models in technical trading. The different features correspond to averages over different time periods and higher level features correspond to crossovers of the moving averages.
The dropout method is similar to the techniques used in random forests to avoid overfitting.]
Evolving from #RDBMS to #NoSQL + #SQL
Posted on May 3rd, 2016
05/03/2016 @Thoughtworks, 99 Madison Ave, 15th floor, NY
Jim Scott @MAPR spoke about #ApacheDrill, which has a query language that extends ANSI SQL. Drill provides an interface that uses this SQL extension to access data in underlying databases that are SQL, NoSQL, CSV, etc.
The OJAI API has the following advantages:
- Gson (in #Java) uses two lines of code to serialize #JSON to place into the data. One line to deserialize
- Idempotent – so you don’t need to worry about replaying actions twice if there is an issue.
- Drill requires Java, but not Hadoop, so it can run on a desktop
- Schema on the fly – will take different data formats and join them together: e.g. csv + JSON
- Data is accessed directly from the underlying databases without needing to first transform it into a metastore
- Security – plugs into authentication mechanism of the underlying dbs. Mechanisms can go through multiple chains of ownership. Security can be done on row level and column level.
- Commands extend SQL to allow access to lists in a JSON structure
- Can create views to output to Parquet, CSV, and JSON formats
- FLATTEN – explodes an array in a JSON structure into multiple rows with all other fields duplicated
#NoSQL Databases & #Docker #Containers: From Development to Deployment
Posted on April 26th, 2016
04/26/2016 @ThoughtWorks 99 Madison Ave., 15th Floor, New York, NY
Alvin Richards, VP of Product, @Aerospike spoke about employing Aerospike in Docker containers.
He started by saying that database performance demands, including caches and data lakes, have made deployment complex and inefficient. Containers were developed to simplify deployment. They are similar to virtual machines, but describe the OS, programs, and environmental dependencies in a standard-format file. Components are:
- Dockerfile – names + directories + processes to run for setup. OCI is the open container standard.
- Docker Compose orchestrates containers
- Docker Swarm orchestrates clustering of machines
- Docker Machine provisions machines.
Containers share root images (such as the Python image file).
Aerospike is a key-value store built on the bare hardware (it does not call the OS) for speed. It also automates data replication across nodes.
When Aerospike is run in Docker containers
- All nodes perform the same function – automated replication.
- The nodes self discover other nodes to balance the load & replication
- Application needs to understand the topology as it changes
In development, the data are often kept in the container since one usually wants to delete the development data when the development server is decommissioned. However, production servers usually don’t hold the data since these servers may be brought up and down, but the data is always retained.
Harness the power of #Web #Audio
Posted on April 20th, 2016
04/20/2016 @TechStars, 1407 Broadway, NY
Titus Blair @Dolby demonstrated the importance of sound in the mood and usability of a web page. He then showed the audience how to incorporate higher quality audio into a web site.
He first showed a video of a beach scene. Different audio tracks changed the mood from excitement to mystery to romantic to suspenseful to tropical.
By sending a WAV file to the Dolby development site, one creates a high-quality audio file in MP4 format which can be downloaded and played through selected browsers (currently including Edge and Safari).
Titus then showed two examples, a #video game and a frequency spectrum display, and walked the audience through the code needed to play an audio file
- Web code needs to test if the browser can handle the Dolby digital plus file
- Parameters in the backgroundSound variable adjust the playback rate and other qualities
- To get a frequency spectrum, an AudioContext variable does an FFT which can be plotted
Finally, Titus illustrated our sensitivity to sound by playing the video “How to make someone sound like an idiot”.
Slides for this presentation are available on http://Bit.ly/dolbynycjs
Hacking with the #RaspberryPi and #Windows 10 #IoT Core
Posted on March 24th, 2016
03/23/2016 @Microsoft, 101 Wood Ave South, Iselin, NJ
Nick Landry showed how to use the Windows 10 operating system to control devices in the Internet of Things.
He first talked about IoT = things + connectivity + data + analytics. He demonstrated software running on the Raspberry Pi, but emphasized that Windows 10 IoT allows developers to create code that runs on platforms from ARM devices (IoT) to phones to tablets to laptops to desktops to large displays. Within the IoT space, Windows 10 runs on
- Raspberry Pi 2 & 3 – ARM processor – Wi-Fi, Bluetooth,…
- Intel Atom E3800 processor x86 – (Tablet) – Ethernet,…
- Qualcomm Snapdragon 410 – (cell phone) – GPS, WiFi,…
W10 also has many levels of functionality to accommodate differences in interfaces (headed = screen interface, headless = no screen interface) and differences in hardware by using a single C# development core with different SDKs to access the different capabilities of devices.
The Windows 10 stack has the W10 operating system, on which Win32 sits, as does UWP. The majority of UWP APIs are shared across devices, including desktop, phones, IoT, etc.
Nick then walked through the steps to replace the Linux OS with Windows 10 on a Raspberry Pi from the http://dev.windows.com/iot web site. He noted that the latest Raspberry Pi, the Pi 3, requires you to download the ‘Insider preview version’ to successfully flash the hardware.
The Raspberry Pi 3 includes Wi-Fi and Bluetooth; the current version of Windows 10 does not handle those functionalities natively, but will eventually do so.
He next showed the Raspberry Pi and talked about how sensors and controls are connected through the GPIO pins and how the Windows 10 IoT extension SDK gives you access to those pins.
Programming the device using C# in Visual Studio uses different conventions than using Sketch in the Arduino IDE
- Instead of a ‘startup’ and ‘loop’, one sets up a timer whose interrupt routine serves the same function as the ‘loop’ routine in Arduino
- Downloading the code requires one to select ‘ARM’ device and ‘Remote Machine’. The Arduino IDE only needs a COM port number.
- Event handling is done using the C# programming syntax
- Visual Studio has full access to services offered across a wide range of devices. Nick demonstrated how the text-to-speech routine can be called in the same way one would call text-to-speech when developing a smartphone app.
- Simple programs require more code, but that code can be used across devices.
- You can execute Arduino Sketches in Visual Studio and you can even combine Sketch and C# code in the same application.
Nick concluded by talking about the #FezHat (from GHI Electronics). The Fez Hat is a development board which fits on the Raspberry Pi and includes controls for DC and servo motors, terminal blocks, a light sensor, LEDs, a temperature sensor, user buttons, etc., all for $35. It is analogous to Shield boards for the Arduino.
For further information, Nick suggested
If you’re having problems installing Windows 10 on a Raspberry Pi see.
CodeDrivenNYC: Tools and methods to make development teams more productive
Posted on March 23rd, 2016
03/22/2016 @FirstMarkCapital, 100 Fifth Ave, NY
The speakers were:
- James Turnbull, CTO at Kickstarter: From Rails for Reasons?
- Evan Whalen, Engineering Manager at Blue Apron: 8 Habits for Productive Teams
- Dustin Lucien, CTO at Betterment: Fluid Teams: How Betterment Builds Product
Evan Whalen@BlueApron (recipes and the ingredients for those recipes delivered to your door) talked about habits of productive teams. He emphasized three points:
- Psychological safety
- 8 habits of productive teams
- Feedback
Psychological Safety. The Google Aristotle Project concluded that successful teams fostered a feeling that team members support each other. They called this psychological safety as members were secure that other team members wouldn’t embarrass them.
8 Habits. Evan takes his inspiration from Stephen R. Covey. His main points were:
- Be proactive – clearly define responsibilities to set expectations. Members need to share responsibility
- Begin with the end in mind – convey purpose, not urgency. Future-proof APIs and schemas
- Put first things first – make engineer happiness a priority – support passion projects
- Think win-win – have regular communication with key players to create mutually beneficial solutions
- Seek first to understand, then to be understood – encourage face-to-face conversation, brown bag tech talks, group code reviews, etc.
- Synergize – empower through delegation. Cross-team communication : avoid working in silos, be transparent on work priorities
- Sharpen the saw – give immediate feedback – radical candor (see Kim Scott post) – challenge directly but care personally
- Find your voice and inspire others to find theirs – balance support with delegation
Feedback. Every 6 months, developers are anonymously surveyed (using Glimpse) about their happiness and empowerment. From this information, they create a task list of three areas for improvement.
In the second presentation, Dustin Lucien @Betterment (financial planning leveraging automation) spoke about their dynamic process to best match the needs of the company with the skills and interests of individuals. They do this with a mix of teams responsible for specific products/functions (home teams) and small, mission-driven teams (pods) working on specific projects.
The flexibility to move developers from home teams to short-term focused pods avoids silos and spreads knowledge and expertise through the company.
To allocate individuals to these special projects, Betterment uses an auction system. Inspired by work done at Pandora, each customer is given a value of $5/month and the total revenue for customers affected by new products/services is auctioned. Team leads submit projects and individuals bid on them. Management determines the makeup of the pods based on the individual interest and level of enthusiasm for each project (as indicated by the bidding). This process of creating new projects and assigning individuals to pods is repeated every 60 days. Management also makes sure that there are still sufficient resources in the home team (20% to 40%) so the home teams can continue their functions.
But as the company has grown, challenges to this model have increased.
- They have 3 strong lines of business, so fungibility of skills across the organization is more difficult.
- The maturing of products demands more stability in the resource allocations.
- As a result they will probably move to a 90 day cycle.
- They will adjust plans to emphasize ROI on OKRs.
- On the other hand, teams are now large enough that they can adjust their own resources to accommodate new projects.
Dustin noted the role of management. Pods are created with at least one person with leadership aspirations. Also the company is still small enough (currently 150 people) that everyone knows everyone else. New hires are put in “bands” to encourage rapid assimilation into the company. Groups in pods have often worked together before.
He also noted that pods are allowed to deviate from the original plan. But the pods and the teams need to operate under the leadership of an architecture group (which is outside the teams and pods). That group determines the overall system architecture and reviews the development process and outputs.
To close the presentations, James Turnbull@Kickstarter talked about how they went about upgrading the tools used in their development stack.
When Kickstarter was started they chose Rails as their development tool. As usage of the site has grown, they have come to realize that they needed to rearchitect the site. James said the issue was not scaling, it was resilience as breaks in one part of the code often created issues with other parts of the code. They decided to replace the monolithic Rails application. They did this using a process involving the entire development team. All members of the development teams worked on the following steps:
- Specify broad conditions that the new system needed to satisfy
- Do a broad paper bake off – compare many languages: JRuby, Clojure, Go… – consider the community, prior art, etc.
- Create a short list – Ruby, Java, Clojure
- Do a real-world bake-off: create the code for the comment subsystem to test authentication, monitoring, etc. Ask: are their developers familiar with the language? Will it run faster? Does it scale? How convenient is it to use? Is there a body of people who have solved these problems before?
- Made a decision. The development team conducted a town hall meeting in which groups who had worked on the bake-off code presented pros and cons to the whole team. They decided to use Java.
- The big win was developing a process to make decisions. For future development, individuals or small groups can propose experiments on technology that the group as a whole could use. They can then conduct a smaller version of the above process so the group as a whole can learn from the smaller group’s experiences.
Posted on March 14th, 2016
Nick Van Hoogenstyn spoke about HackerRank, a tool for testing programmers. Common uses are
- Screening of job candidates
- Resource when holding hackathons
- Training tool to evaluate the learning of skills
Nick described three products
- HackerRank.com – openly available to all for coding challenges and timed contests with prizes
- HackerRank for Work – paid candidate assessment program.
- Questions can be sent to candidates for their completion
- A collaborative coding session allows interviewers to see how candidates solve questions
- HackerRank Jobs – mobile app that developers use to prove their skills (automatically generates a score and gives feedback)
Nick then walked through a series of problems that a candidate might face. He then talked about the measures that an interviewer/evaluator can use
- A plagiarism flag for suspicious code
- A slider to replay the individual keystrokes
- Execution time and memory used by the completed program
- Breakdown on time spent on each question
He talked about how questions can be taken from a library or can be custom generated. The evaluator can select the programming language(s) that the candidate can use.
#D3, #React and #Clustergrams
Posted on February 22nd, 2016
02/22/2016 @ Pivotal, 625 6th Ave, NY
Two speakers talked about combining the functionality of D3 + React and taking advantage of the power of D3 to create graphic displays for data analysis.
In the first presentation, Pan Wangperawong @_panw talked about how to integrate the power of D3 for graphics and React for dynamic updating of the DOM. Both D3 and React espouse functional programming and both want to manipulate the DOM, but only one can do this without conflict. React creates a virtual DOM to make changes to the display efficiently, so to use this facility, React should manipulate the DOM. This means that D3 does the math and creates objects that are passed to React, but D3 cannot directly access the DOM.
Pan illustrated this by walking through code behind an example application: http://starwars.meteor.com/. The details are on Github.
An overview is that React creates DOM elements and takes responsibility for interactive manipulations. D3 is called within a single function to set up these elements. The D3 calls start in line 27 of his code.
In the second presentation, Nicolas Fernandez introduced clustergrams (heat maps with rows and columns of the matrix ordered by a cluster analysis) and talked about how D3 gave him the programming tools to create interactive displays of clustergrams and related data analysis pictures.
He talked about related methods for displaying data, including force-directed graphs, adjacency matrices, and raw item-by-characteristic matrices. He illustrated each with a matrix of characters and the chapters in which they appear in Les Miserables. A matrix cell is colored when a character (column) appears in a chapter (row). The number of overlapping chapter appearances can be converted to a similarity measure, which can be plotted as an adjacency matrix, force-directed graph, etc.
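The chapter-overlap-to-similarity step can be sketched like this (a toy incidence matrix stands in for the real Les Miserables data, and cosine is one of several reasonable similarity choices):

```python
import numpy as np

# Toy character-by-chapter incidence matrix (rows: chapters, cols:
# characters); a 1 means the character appears in that chapter.
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 1],
              [0, 1, 1]])
# Character-character co-occurrence: entry (i, j) counts shared chapters.
co = M.T @ M
# A simple similarity: cosine normalization of the co-occurrence counts.
norms = np.sqrt(np.diag(co))
sim = co / np.outer(norms, norms)
```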
For each of these, Nicolas demonstrated how an interactive graph helps us better understand the structure of the book.
Next he talked about how Python (using the SciPy tool) and D3 give him the tools to make the interactive plots. Specifically, he talked about how D3 object constancy makes it possible to zoom in with the following effects:
- remove labels for rows that will fall off the end of the page
- reposition the remaining labels to cover the height of the page
- expand the matrix to vertically fill the page
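The row/column reordering behind a clustergram can be sketched with SciPy's hierarchical clustering (the random data and "average" linkage method are illustrative choices, not necessarily Nicolas's):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Hierarchically cluster rows and columns, then reorder the matrix by the
# resulting leaf order so similar rows/columns sit next to each other in
# the heat map.
rng = np.random.default_rng(1)
data = rng.normal(size=(12, 8))
row_order = leaves_list(linkage(data, method="average"))
col_order = leaves_list(linkage(data.T, method="average"))
ordered = data[np.ix_(row_order, col_order)]   # ready to draw as a heat map
```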
He also illustrated other effects made easy to implement in D3.
His code is available on