Posted on May 6th, 2017
05/04/2017 @Ebay, 625 6th Ave, NY 3rd floor
John Novak @QxBranch talked about the process of developing quantum computers. The theory is based on adiabatic optimization: each qubit starts at a low energy level, and the energy levels and the couplings between qubits are then amplified so that there is a high probability that the correct solution state is the realized output when the quantum state collapses.
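To make the idea of a "correct solution state" concrete, here is a minimal sketch that assumes the problem is written as an Ising-style energy function (per-qubit fields plus pairwise couplings), which is the usual way annealing problems are posed but was not spelled out in these notes. The names `ising_energy` and `ground_state` and the example numbers are hypothetical, and the brute-force search only stands in for what the annealer does physically.

```python
from itertools import product


def ising_energy(spins, fields, couplings):
    """Energy of one spin configuration: field terms plus pairwise coupling terms."""
    energy = sum(fields[i] * s for i, s in enumerate(spins))
    energy += sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
    return energy


def ground_state(fields, couplings):
    """Brute-force the lowest-energy configuration (feasible only for a few spins)."""
    n = len(fields)
    return min(product((-1, 1), repeat=n),
               key=lambda s: ising_energy(s, fields, couplings))


# Hypothetical 3-qubit chain: positive couplings favour opposite neighbours,
# and the field on qubit 0 favours the value -1.
fields = [0.5, 0.0, 0.0]
couplings = {(0, 1): 1.0, (1, 2): 1.0}
best = ground_state(fields, couplings)
best_energy = ising_energy(best, fields, couplings)
```

The search space doubles with every added qubit, which is why a classical brute-force search like this one stops being practical very quickly.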
In the architecture of the D-Wave computer, qubits are organized in 4 x 4 cells in a pattern called a Chimera graph. These nodes are joined together to increase the number of digits. This raises certain challenges since not all nodes are connected to all other nodes: some logical nodes need to be represented multiple times in the physical computer.
Another challenge is running the quantum computer for a sufficiently long time to refine the probabilistic output. Challenges in increasing the number of digits in the computer include the need to supercool more wires and to add error correction circuits. Eventually, room-temperature superconductors will need to be developed.
Building #ImageClassification models that are accurate and efficient
Posted on April 28th, 2017
04/28/2017 @NYUCourantInstitute, 251 Mercer Street, NYC, room 109
Laurens van der Maaten @Facebook spoke about some of the new technologies used by Facebook to increase accuracy and lower the processing needed in image identification.
He first talked about residual networks which they are developing to replace standard convolutional neural networks. Residual networks can be thought of as a series of blocks each of which is a tiny #CNN:
- 1×1 layer, like a PCA
- 3×3 convolution layer
- 1×1 layer, inverse PCA
The raw input is added to the output of this mini-network, followed by a ReLU transformation.
These transformations extract features while keeping the information that is input into the block, so the map is changed but does not need to be re-learned from scratch. This eliminates some problems with vanishing gradients in back propagation as well as the unidentifiability problem.
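The block structure described above can be sketched in a few lines. This is a simplified illustration, not Facebook's implementation: it works on a plain feature vector, and small dense matrices stand in for the 1×1 and 3×3 convolutions (an assumption made to keep the example dependency-free). All names and weights are hypothetical.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, reduce_w, mix_w, expand_w):
    """Compute relu(x + expand(mix(reduce(x))))."""
    h = relu(matvec(reduce_w, x))   # "1x1 layer, like a PCA": fewer channels
    h = relu(matvec(mix_w, h))      # dense stand-in for the 3x3 convolution
    f = matvec(expand_w, h)         # "1x1 layer, inverse PCA": back to input width
    return relu([xi + fi for xi, fi in zip(x, f)])  # add raw input, then ReLU


x = [1.0, 2.0, 3.0, 4.0]
reduce_w = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
mix_w = [[1.0, 0.0], [0.0, 1.0]]

# With an all-zero expansion the block passes its input through unchanged.
zero_expand = [[0.0, 0.0] for _ in range(4)]
passthrough = residual_block(x, reduce_w, mix_w, zero_expand)

# With non-zero weights the block adds features on top of the input.
expand_w = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
updated = residual_block(x, reduce_w, mix_w, expand_w)
```

The pass-through case shows why the map does not have to be re-learned from scratch: a block that has learned nothing simply returns its (non-negative) input.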
Blocks executed in sequence gradually add features, but removing a block after training hardly degrades performance (Huang et al. 2016). From this observation they concluded that the blocks were performing two functions: detecting new features and passing through some of the information in the raw input. Therefore, the structure could be made more efficient if the information were passed through directly, leaving each block to only extract features.
DenseNets give each block in each layer access to all features in the layer before it. The number of feature maps increases in each layer, so there is the possibility of a combinatorial explosion of units with each layer. Fortunately, this does not happen: each layer adds only 32 new modules and the computation is more efficient, so the aggregate amount of computation for a given level of accuracy decreases when using DenseNet instead of ResNet, while accuracy improves.
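A small sketch of why the growth stays manageable, assuming the "32 new modules" are 32 new feature maps per layer and that features are concatenated rather than added. The starting width of 64 and the placeholder layer function are hypothetical; a real layer would be a learned convolution.

```python
def make_layer(growth_rate):
    """Hypothetical stand-in for a learned layer: emits growth_rate new features."""
    def layer(features):
        mean = sum(features) / len(features)
        return [mean] * growth_rate
    return layer


def dense_block(x, layers):
    """Each layer sees every feature produced so far and appends its own."""
    features = list(x)
    input_widths = []
    for layer in layers:
        input_widths.append(len(features))
        features = features + layer(features)  # concatenate, do not add
    return features, input_widths


features, input_widths = dense_block([1.0] * 64, [make_layer(32)] * 4)
```

Here `input_widths` comes out as 64, 96, 128, 160: the width grows linearly with depth, by a fixed 32 per layer, rather than multiplying.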
Next Laurens talked about making image recognition more efficient, so a larger number of images could be processed with the same level of accuracy in a shorter average time.
He started by noting that some images are easier to identify than others. So, the goal is to quickly identify the easy images and only spend further processing time on the harder, more complex images.
The key observation is that easy images can be classified using only a coarse grid, but harder images would then not be classifiable. On the other hand, using a fine grid makes it harder to classify the easy images.
Laurens described a hybrid 2-d network in which there are layers analyzing the image using the coarse grid and layers analyzing the fine grid. The fine grain blocks occasionally feed into the coarse grain blocks. At each layer outputs are tested to see if the confidence level for any image exceeds a threshold. Once the threshold is exceeded, processing is stopped and the prediction is output. In this way, when the decision is easy, this conclusion is arrived at quickly. Hard images continue further down the layers and require more processing.
By estimating the percentage of images exiting at each threshold, they can tune the threshold levels so that more images can be processed within a given time budget.
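The early-exit logic can be sketched as follows. This is an illustration of the control flow only: the "stages" are hypothetical functions that return refined features plus class probabilities, and the toy stage below simply doubles the scores so that confidence rises with depth. It uses a single threshold for all stages, whereas the talk described tuning a threshold per exit.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sharpening_stage(features):
    """Toy stage (assumption): doubling the scores makes the prediction more confident."""
    refined = [2.0 * f for f in features]
    return refined, softmax(refined)


def predict_with_early_exit(features, stages, threshold):
    """Return (predicted class, index of the stage where processing stopped)."""
    if not stages:
        raise ValueError("at least one stage is required")
    for depth, stage in enumerate(stages):
        features, probs = stage(features)
        if max(probs) >= threshold:
            break  # confident enough: stop spending computation on this image
    return probs.index(max(probs)), depth


stages = [sharpening_stage] * 4
easy_result = predict_with_early_exit([2.0, 0.0], stages, threshold=0.9)
hard_result = predict_with_early_exit([0.2, 0.0], stages, threshold=0.9)
```

The easy input exits after the first stage, while the hard one needs all four; lowering the threshold trades accuracy for a smaller average cost per image.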
During the Q&A, Laurens said
- To avoid overfitting the model, they train the network on both the original images and the same images after small transformations have been applied to each one.
- They are still working to expand the #DenseNet to see its upper limits on accuracy
- He is not aware of any neurophysiological structures in the human brain that correspond to the structure of blocks in #ResNet / DenseNet.
Intro to #DeepLearning using #PyTorch
Posted on February 21st, 2017
02/21/2017 @ NYU Courant Institute (251 Mercer St, New York, NY)
Soumith Chintala @Facebook first talked about trends in the cutting edge of machine learning. His main point was that the world is moving from fixed agents to dynamic neural nets in which agents restructure themselves over time. Currently, the ML world is dominated by static datasets + static model structures which learn offline and do not change their structure without human intervention.
He then talked about PyTorch which is the next generation of ML tools after Lua #Torch. In creating PyTorch they wanted to keep the best features of LuaTorch, such as performance and extensibility while eliminating rigid containers and allowing for execution on multiple-GPU systems. PyTorch is also designed so programmers can create dynamic neural nets.
Other features include
- Kernel fusion – take several objects and fuse them into a single object
- Order of execution – reorder objects for faster execution
- Automatic work placement when you have multiple GPUs
PyTorch is available for download on http://pytorch.org and was released Jan 18, 2017.
Currently, PyTorch runs only on Linux and OSX.
NYAI#7: #DataScience to Operationalize #ML (Matthew Russell) & Computational #Creativity (Dr. Cole)
Posted on November 22nd, 2016
11/22/2016 Risk, 43 West 23rd Street, NY 2nd floor
Speaker 1: Using Data Science to Operationalize Machine Learning – (Matthew Russell, CTO at Digital Reasoning)
Speaker 2: Top-down vs. Bottom-up Computational Creativity – (Dr. Cole D. Ingraham DMA, Lead Developer at Amper Music, Inc.)
Matthew Russell @DigitalReasoning spoke about understanding language using NLP, relationships among entities, and temporal relationships. For human language understanding, he views technologies such as knowledge graphs and document analysis as becoming commoditized. The only way to get an advantage is to improve the efficiency of using ML: the KPI for data analysis is the number of experiments (tests of a hypothesis) that can be run per unit time. The key is to use tools such as:
- Vagrant – allows an environment to be set up
- Jupyter Notebook – like a lab notebook
- Git – version control
- Automation –
He wants highly repeatable experiments. The goal is to speed up the number of experiments that can be conducted per unit time.
He then talked about using machines to read medical reports and determine the issues. Negatives can be extracted, but issues are harder to find. He uses an ontology to classify entities.
He talked about experiments on models using ontologies. The use of a fixed ontology depends on the content: the ontology of terms for anti-terrorism evolves over time and needs to be experimentally adjusted over time. The medical ontology is probably the most static.
In the second presentation, Cole D. Ingraham @Ampermusic talked about top-down vs bottom-up creativity in the composition of music. Music differs from other audio forms since it has a great deal of very large structure as well as the smaller structure. ML does well at generating good audio on a small time frame, but Cole thinks it is better to apply theories from music to create the larger whole. This is a combination of
Top-down: novel & useful, rejects previous ideas – code driven, “hands on”, you define the structure
Bottom-up: data driven, “hands off”, you learn the structure
He then talked about music composition at the intersection of Generation vs. analysis (of already composed music) – can do one without the other or one before the other
To successfully generate new and interesting music, one needs to generate variance. Composing music using a purely probabilistic approach is problematic as there is a lack of structure. He likes an approach similar to replacing words with their synonyms, which does not fundamentally change the meaning of the sentence but still makes it different and interesting.
It’s better to work on deterministically defined variance than it is to weed out undesired results from nondeterministic code.
As an example he talked about WaveNet (a Google DeepMind project), whose input and output are both raw audio. This approach works well for improving speech synthesis, but less well for music generation as there is no large scale structural awareness.
Cole then talked about Amper, a web site that lets users create music with no experience required: fast, believable, collaborative.
They like a mix of top-down and bottom-up approaches:
- Want speed, but neural nets are slow
- Music has a lot of theory behind it, so it’s best to let the programmers code these rules
- Can change different levels of the hierarchical structure within music: style, mood, can also adjust specific bars
Runtime written in Haskell – a functional language, so it’s great for music
Advanced #DeepLearning #NeuralNets: #TimeSeries
Posted on June 16th, 2016
06/15/2016 @Qplum, 185 Hudson Street, Jersey City, NJ, suite 1620
Sumit then broke the learning process into two steps: feature extraction and classification. Starting with raw data, the feature extractor is the deep learning model that prepares the data for the classifier, which may be a simple linear model or a random forest. In supervised training, errors in the prediction output by the classifier are fed back into the system using back propagation to tune the parameters of the feature extractor and the classifier.
In the remainder of the talk Sumit concentrated on how to improve the performance of the feature extractor.
In general text classification (unlike image or speech recognition), the input can be very long and variable in length. In addition, analysis of text by general deep learning models
- does not capture order of words or predictions in time series
- can handle only small sized windows or the number of parameters explodes
- cannot capture long term dependencies
So, the feature extractor is cast as a time delay neural network (#TDNN). In a TDNN, the text is viewed as a string of words. Kernel matrices (usually 3 to 5 units long) are defined, each of which computes a dot product of its weights with the words in a contiguous block of text. The kernel matrix is shifted one word and the process is repeated until all words are processed. A second kernel matrix creates another set of features, and so forth for a 3rd kernel, etc.
These features are then pooled using the mean or max of the features. This process is repeated to get additional features. Finally a point-wise non-linear transformation is applied to get the final set of features.
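The sliding-kernel, pooling and non-linearity steps can be sketched as below. To keep the example short, each word is represented by a single number (an assumption; real models use embedding vectors), max pooling is used, and `tanh` is an arbitrary choice of point-wise non-linearity. The function names and values are hypothetical.

```python
import math


def conv1d(sequence, kernel):
    """Slide the kernel one word at a time and take a dot product at each position."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, sequence[i:i + k]))
            for i in range(len(sequence) - k + 1)]


def tdnn_features(sequence, kernels):
    """One pooled, squashed feature per kernel; sequence must be at least kernel-length."""
    pooled = [max(conv1d(sequence, kernel)) for kernel in kernels]  # max pooling
    return [math.tanh(p) for p in pooled]                           # point-wise non-linearity


words = [1.0, 0.0, -1.0, 2.0, 0.5]          # hypothetical one-number word encodings
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
window_scores = conv1d(words, kernels[0])
features = tdnn_features(words, kernels)
```

Because pooling collapses the positions, the output has one feature per kernel regardless of how long the text is, which is what lets the extractor handle variable-length input.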
Unlike traditional neural network structures, these methods are new, so no one has done a study of what is revealed in the first layer, second layer, etc. Also theoretical work is lacking on the optimal number of layers for a text sample of a given size.
Historically, #TDNN has struggled with a series of problems including convergence issues, so recurrent neural networks (#RNN) were developed in which the encoder looks at the latest data point along with its own previous output. One example is the Elman Network, in which each feature is the weighted sum of the kernel function output (one encoder is used for all points on the time series) and the previously computed feature value. Training is conducted as in a standard #NN using back propagation through time, with the gradient accumulated over time before the encoder is re-parameterized, but RNNs have a lot of issues:
1. exploding or vanishing gradients – depending on the largest eigenvalue
2. cannot capture long-term dependencies
3. training is somewhat brittle
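A scalar sketch of an Elman-style recurrence illustrates the first issue in the list above. With a single hidden unit the recurrent weight plays the role of the largest eigenvalue; the `tanh` activation and all parameter values are assumptions chosen for illustration.

```python
import math


def elman_forward(inputs, w_in, w_rec, bias=0.0, h0=0.0):
    """h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias), same weights at every step."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


def gradient_through_time(states, w_rec):
    """d h_T / d h_0: product over steps of w_rec * (1 - h_t^2)."""
    grad = 1.0
    for h in states:
        grad *= w_rec * (1.0 - h * h)
    return grad


zero_inputs = [0.0] * 20
vanishing = gradient_through_time(elman_forward(zero_inputs, 1.0, 0.5), 0.5)
exploding = gradient_through_time(elman_forward(zero_inputs, 1.0, 1.5), 1.5)
```

Over only 20 steps, a recurrent weight of 0.5 shrinks the gradient below one millionth, while 1.5 inflates it past a thousand, so information from early in the series is either lost or swamps the update.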
The fix is called long short-term memory (#LSTM), which has additional memory “cells” to store short-term activations. It also has additional gates to alleviate the vanishing gradient problem (see Hochreiter et al. 1997). Now each encoder is made up of several parts as shown in his slides. It can also have a forget gate that turns off all the inputs and can peep back at the previous values of the memory cell. At Facebook, NLP, speech recognition and vision recognition are all users of LSTM models.
LSTM models, however, still don’t have a long-term memory. Sumit talked about creating memory networks, which store the key features in a memory cell. A query runs against the memory cell and then concatenates the output vector with the text. A second query will retrieve the memory.
He also talked about using a dropout method to fight overfitting. Here, there are cells that randomly determine whether a signal is transmitted to the next layer.
Autoencoders can be used to pretrain the weights within the NN to avoid ending up with solutions that are only locally optimal instead of globally optimal.
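A minimal sketch of the random gating described above. It uses the common "inverted dropout" variant, in which surviving activations are rescaled so the expected signal is unchanged; that rescaling is an assumption, since the talk only described the random on/off behaviour. The function name and parameters are hypothetical.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero activations during training; pass them through otherwise."""
    if not training or drop_prob == 0.0:
        return list(activations)
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    # Survivors are scaled by 1/keep so the expected value matches the input.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)                      # seeded for repeatability
layer_output = [1.0] * 1000
dropped = dropout(layer_output, 0.5, rng)
at_inference = dropout(layer_output, 0.5, rng, training=False)
```

At inference time the gate is switched off, so every signal reaches the next layer.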
[Many of these methods are similar in spirit to existing methods. For instance, kernel functions in RNN are very similar to moving average models in technical trading. The different features correspond to averages over different time periods and higher level features correspond to crossovers of the moving averages.
The dropout method is similar to the techniques used in random forests to avoid overfitting.]
Evolving from #RDBMS to #NoSQL + #SQL
Posted on May 3rd, 2016
05/03/2016 @Thoughtworks, 99 Madison Ave, 15th floor, NY
Jim Scott @MAPR spoke about #ApacheDrill, which has a query language that extends ANSI SQL. Drill provides an interface that uses this SQL extension to access data in underlying db’s that are SQL, NoSQL, csv, etc.
The Ojai API has the following advantages
- Gson (in #Java) uses two lines of code to serialize #JSON to place into the data. One line to deserialize
- Idempotent – so you don’t need to worry about replaying actions twice if there is an issue.
- Drill requires Java, but not Hadoop, so it can run on a desktop
- Schema on the fly – will take different data formats and join them together: e.g. csv + JSON
- Data is directly accessed from the underlying databases without needing to first transform it to a metastore
- Security – plugs into authentication mechanism of the underlying dbs. Mechanisms can go through multiple chains of ownership. Security can be done on row level and column level.
- Commands extend SQL to allow access to lists in a JSON structure
- Can create views to output to parquet, csv, json formats
- FLATTEN – explode an array in a JSON structure to display as multiple rows with all other fields duplicated
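The behaviour of FLATTEN described in the last bullet can be mimicked in a few lines of Python. This is a sketch of the semantics as summarized in these notes, not Drill code, and the record layout is hypothetical.

```python
def flatten(rows, column):
    """Emit one output row per array element, duplicating all other fields."""
    flattened = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)      # copy the other fields unchanged
            new_row[column] = item   # replace the array with a single element
            flattened.append(new_row)
    return flattened


orders = [
    {"customer": "ann", "items": ["tea", "milk"]},
    {"customer": "bob", "items": ["rice"]},
]
flat_orders = flatten(orders, "items")
```

In this sketch the two input records become three output rows, with `customer` repeated for each of ann's items.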
#NoSQL Databases & #Docker #Containers: From Development to Deployment
Posted on April 26th, 2016
04/26/2016 @ThoughtWorks 99 Madison Ave., 15th Floor, New York, NY
Alvin Richards, VP of Product, @Aerospike spoke about employing Aerospike in Docker containers.
He started by saying that database performance demands including cache and dataLakes have made deployment complex and inefficient. Containers were developed to simplify deployment. They are similar to virtual machines, but describe the OS, programs and environmental dependencies in a standard format file. Components are
- Docker file with names + directory + processes to run to setup. OCI is the open container standard.
- Docker Compose orchestrates containers
- Docker Swarm orchestrates clustering of machines
- Docker Machine provisions machines.
Containers share root images (such as the Python image file).
Aerospike is a key value store which is built on the bare hardware (does not call the OS) for speed. It also automates data replication across nodes.
When Aerospike is run in Docker containers
- All nodes perform the same function – automated replication.
- The nodes self discover other nodes to balance the load & replication
- Application needs to understand the topology as it changes
In development, the data are often kept in the container since one usually wants to delete the development data when the development server is decommissioned. However, production servers usually don’t hold the data since these servers may be brought up and down, but the data is always retained.
Harness the power of #Web #Audio
Posted on April 20th, 2016
04/20/2016 @TechStars, 1407 Broadway, NY
Titus Blair @Dolby demonstrated the importance of sound in the mood and usability of a web page. He then showed the audience how to incorporate higher quality audio into a web site.
He first showed a video of a beach scene. Different audio tracks changed the mood from excitement to mystery to romantic to suspenseful to tropical.
By sending a wav file to the Dolby development site, one creates a high-quality audio file in mp4 format which can be downloaded and played through selected browsers (currently including Edge and Safari).
Titus then showed two examples, a #video game and a frequency spectrum display, and walked the audience through the code needed to play an audio file.
- Web code needs to test if the browser can handle the Dolby digital plus file
- Parameters in the backgroundSound variable adjust the playback rate and other qualities
- To get the frequency spectrum, an audiocontext variable does an FFT, whose output can be plotted
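To show what the spectrum step computes, here is a language-neutral sketch in Python rather than the browser code from the talk. It uses a naive discrete Fourier transform, which gives the same result as an FFT but more slowly; the function name and the test tone are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitude of each frequency bin from 0 up to half the sample count."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        spectrum.append(abs(total) / n)
    return spectrum


n = 64
tone = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # 5 cycles per window
spectrum = magnitude_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

Plotting `spectrum` against the bin index gives the kind of frequency display shown in the demo; for this pure tone all the energy lands in bin 5.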
Finally, Titus illustrated our sensitivity to sound by playing the video “How to make someone sound like an idiot”.
Slides for this presentation are available on http://Bit.ly/dolbynycjs
Hacking with the #RaspberryPi and #Windows 10 #IoT Core
Posted on March 24th, 2016
03/23/2016 @Microsoft, 101 Wood Ave South, Iselin, NJ
Nick Landry showed how to use the Windows 10 operating system to control devices in the Internet of Things.
He first talked about IoT = things + connectivity + data + analytics. He demonstrated software running on the Raspberry Pi, but emphasized that Windows 10 IoT allows developers to create code that runs on platforms from ARM devices (IoT) to phones to tablets to laptops to desktops to large displays. Within the IoT space, Windows 10 runs on
- Raspberry Pi 2 & 3 – ARM processor – Wi-Fi, Bluetooth,…
- Intel Atom E3800 processor x86 – (Tablet) – Ethernet,…
- Qualcomm Snapdragon 410 – (cell phone) – GPS, WiFi,…
W10 also has many levels of functionality to accommodate differences in interfaces (headed = screen interface, headless = no screen interface) and differences in hardware by using a single C# development core with different SDKs to access the different capabilities of devices.
The Windows 10 stack has the W10 operating system, on which both Win32 and UWP sit. The majority of UWP APIs are shared across devices including desktops, phones, IoT, etc.
Nick then walked through the steps to replace the Linux OS with Windows 10 on a Raspberry Pi from the http://dev.windows.com/iot web site. He noted that the latest Raspberry Pi, the Pi 3, requires you to download the ‘Insider preview version’ to successfully flash the hardware.
The Raspberry Pi 3 includes Wi-Fi and Bluetooth. The current version of Windows 10 does not handle those functionalities natively, but will eventually do so.
He next showed the Raspberry Pi and talked about how sensors and controls are connected through the GPIO pins and how the Windows 10 IoT extension SDK gives you access to those pins.
Programming the device using C# in Visual Studio uses different conventions than using Sketch in the Arduino IDE:
- Instead of a ‘setup’ and ‘loop’, one needs to set up a timer whose interrupt routine serves the same function as the ‘loop’ routine in Arduino
- Downloading the code requires one to select ‘ARM’ device and ‘Remote Machine’. The Arduino IDE only needs a COM port number.
- Event handling is done using the C# programming syntax
- Visual Studio has full access to services offered across a wide range of devices. Nick demonstrated how the text-to-speech routine can be called in the same way one would call text-to-speech when developing a smartphone app.
- Simple programs require more code, but that code can be used across devices.
- You can execute Arduino Sketches in Visual Studio and you can even combine Sketch and C# code in the same application.
Nick concluded by talking about the #FezHat (from ghielectronics). The Fez Hat is a development board which fits on the Raspberry Pi and includes controls for DC and servo motors, terminal blocks, a light sensor, LEDs, a temperature sensor, user buttons, etc., all for $35. It is analogous to Shield boards for the Arduino.
For further information, Nick suggested
If you’re having problems installing Windows 10 on a Raspberry Pi see.
CodeDrivenNYC: Tools and methods to make development teams more productive
Posted on March 23rd, 2016
03/22/2016 @FirstMarkCapital, 100 Fifth Ave, NY
The speakers were:
- James Turnbull, CTO at Kickstarter: From Rails for Reasons?
- Evan Whalen, Engineering Manager at Blue Apron: 8 Habits for Productive Teams
- Dustin Lucien, CTO at Betterment: Fluid Teams: How Betterment Builds Product
Evan Whalen @BlueApron (recipes and the ingredients for those recipes delivered to your door) talked about habits of productive teams. He emphasized three points:
- Psychological safety
- 8 habits of productive teams
- Feedback
Psychological Safety. Google’s Project Aristotle concluded that successful teams fostered a feeling that team members support each other. They called this psychological safety, as members were secure that other team members wouldn’t embarrass them.
8 Habits. Evan takes his inspiration from Stephen R. Covey. His main points were:
- Be proactive – clearly define responsibilities to set expectations. Members need to share responsibility
- Begin with the end in mind – convey purpose, not urgency. Future-proof APIs and schemas
- Put first things first – make engineer happiness a priority – support passion projects
- Think win-win – have regular communication with key players to create mutually beneficial solutions
- Seek first to understand, then to be understood – encourage face-to-face conversation, brown bag tech talks, group code reviews, etc.
- Synergize – empower through delegation. Cross-team communication: avoid working in silos, be transparent on work priorities
- Sharpen the saw – give immediate feedback – radical candor (see Kim Scott post) – challenge directly but care personally
- Find your voice and inspire others to find theirs – balance support with delegation
Feedback. Every 6 months developers are anonymously surveyed (using Glimpse) as to their happiness and empowerment. From this information, they create a task list of 3 areas for improvement.
In the second presentation, Dustin Lucien @Betterment (financial planning leveraging automation) spoke about their dynamic process to best match the needs of the company with the skills and interests of individuals. They do this with a mix of teams responsible for specific products/functions (home teams) and small, mission-driven teams (pods) working on specific projects.
The flexibility to move developers from home teams to short-term focused pods avoids silos and spreads knowledge and expertise through the company.
To allocate individuals to these special projects, Betterment uses an auction system. Inspired by work done at Pandora, each customer is given a value of $5/month and the total revenue for customers affected by new products/services is auctioned. Team leads submit projects and individuals bid on them. Management determines the makeup of the pods based on the individual interest and level of enthusiasm for each project (as indicated by the bidding). This process of creating new projects and assigning individuals to pods is repeated every 60 days. Management also makes sure that there are still sufficient resources in the home team (20% to 40%) so the home teams can continue their functions.
But as the company has grown, challenges have increased to this model.
- They have 3 strong lines of business, so fungibility of skills across the organization is more difficult.
- The maturing of products demands more stability in the resource allocations.
- As a result they will probably move to a 90 day cycle.
- They will adjust plans to emphasize ROI on OKR.
- On the other hand, teams are now large enough that they can adjust their own resources to accommodate new projects.
Dustin noted the role of management. Pods are created with at least one person with leadership aspirations. Also the company is still small enough (currently 150 people) that everyone knows everyone else. New hires are put in “bands” to encourage rapid assimilation into the company. Groups in pods have often worked together before.
He also noted that pods are allowed to deviate from the original plan. But the pods and the teams need to operate under the leadership of an architecture group (which is outside the teams and pods). That group determines the overall system architecture and reviews the development process and outputs.
To close the presentations, James Turnbull @Kickstarter talked about how they went about upgrading the tools used in their development stack.
When Kickstarter was started they chose Rails as their development tool. As usage of the site has grown, they have come to realize that they needed to rearchitect the site. James said the issue was not scaling, it was resilience as breaks in one part of the code often created issues with other parts of the code. They decided to replace the monolithic Rails application. They did this using a process involving the entire development team. All members of the development teams worked on the following steps:
- Specify broad conditions that the new system needed to satisfy
- Do a broad paper bake off – compare many languages: JRuby, Clojure, Go… – consider the community, prior art, etc.
- Create a short list – Ruby, Java, Clojure
- Do a real world bake-off: create the code for the comment subsystem to test authentication, monitoring, etc. Are their developers familiar with the language? Will it run faster? Does it scale? How convenient is it to use? Is there a body of people who have solved these problems before?
- Make a decision. The development team conducted a town hall meeting in which groups who had worked on the bake-off code presented pros and cons to the whole team. They decided to use Java.
- The big win was developing a process to make decisions. For future development, individuals or small groups can propose experiments on technology that the group as a whole could use. They can then conduct a smaller version of the above process so the group as a whole can learn from the smaller group’s experiences.