New York Tech Journal
Tech news from the Big Apple

#VideoStreaming, #webpack, #diagrams

Posted on January 18th, 2017


01/17/2017 @FirstMarkCapital, 100 Fifth Ave, NY 3rd floor

Tim Whidden, VP Engineering at 1stdibs: Webpack Before It Was Cool – Lessons Learned

Sarah Groff-Palermo, Designer and Developer: Label Goes Here: A Talk About Diagrams

Dave Yeu, VP Engineering at Livestream: A Primer to Video on the Web: Video Delivery & Its Challenges

Dave Yeu @livestream talked about some of the challenges of streaming large amounts of video and of livestreaming: petabytes of storage, I/O, CPU, and latency (for live video)


  1. Long-lived connections – there are several solutions
    1. HLS (HTTP Live Streaming), which cuts video into small segments and uses HTTP as the delivery vehicle. Originally developed by Apple as a way to deliver video to iPhones as coverage moves from cell tower to cell tower. It uses the power of the HTTP protocol: a playlist plus small chunks, each a separate URL – m3u8 files that point to the actual media files.
      1. But there are challenges – if you need 3 chunks in your buffer and each chunk is 5 seconds, you have a 15-second delay. As you decrease the size of each chunk, the playlist gets longer, so you need to make more requests for the m3u8 file.
    2. DASH – segments follow a template which reduces index requests
    3. RTMP – persistent connections, extremely low latency, used by Facebook
  2. Authorization – viewers must be authorized, but should not be able to rebroadcast (there is no key, so this is not DRM).
    1. Move authentication to the cache level – use Varnish.
    2. Add a token to the playlist; Varnish vets the token and serves the content => all requests come through their API.
    3. But you expand the scope of your app = cache + server.
  3. Geo-restrictions
    1. Could do this: IP address + restrictions. But then you need to put the geo-block behind the cache and server.
    2. Instead, the API generates a geo-block config. Varnish loads it into a memory map and checks it.
    3. If there is a geo violation, Varnish returns a modified URL, so the server can decide how to respond.
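The token-in-playlist scheme described above can be sketched as follows. This is a minimal illustration, not Livestream's implementation: the signing key, token, and segment names are hypothetical, and a real cache like Varnish would verify the signature instead of calling back to the API.

```python
import hashlib
import hmac

SECRET = b"hypothetical-signing-key"  # assumption: shared between the API and the cache

def sign(path: str, token: str) -> str:
    """Append an HMAC that the cache (e.g. Varnish) can verify without calling the API."""
    sig = hmac.new(SECRET, f"{path}:{token}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?token={token}&sig={sig}"

def build_playlist(segments, token, target_duration=5):
    """Emit a minimal HLS (m3u8) playlist: short chunks, each served from its own URL."""
    lines = ["#EXTM3U", f"#EXT-X-TARGETDURATION:{target_duration}"]
    for seg in segments:
        lines.append(f"#EXTINF:{target_duration},")
        lines.append(sign(seg, token))
    return "\n".join(lines)

playlist = build_playlist(["seg0.ts", "seg1.ts", "seg2.ts"], token="viewer123")
print(playlist)
```

Note the buffering math from the talk: with 5-second chunks, a 3-chunk buffer implies roughly 15 seconds of delay.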


Tim Whidden @1stdibs, an online marketplace for curated goods (“eBay for rich people”), spoke about Webpack, a front-end module system. He described how modules increase the reusability of code and how Webpack performs other functions such as code compression.


Finally, Sarah Groff-Palermo spoke about how diagrams help her clarify the code she has written and provide documentation for her and others in the future.

She described a classification of learning types ranging from sequential learners (who like tutorials) to global learners (who like to see the big picture first). Sarah showed several diagrams and pointed out how they help her get and keep the global picture. She especially likes the paradigm from Ben Shneiderman: overview first, zoom and filter, then details-on-demand.

For further ideas she recommended

  1. the book Going Forth – lots of diagrams
  2. Now You See It by Stephen Few
  3. Flowing Data – blog by Nathan Yau
  4. Keynote is a good tool to use for diagrams

posted in:  applications, Code Driven NYC, video

#ComputerScience and #DigitalHumanities

Posted on December 8th, 2016


12/08/2016 @Princeton University Computer Science Building, Small Auditorium, Room CS 105, Olden and William Streets, Princeton NJ


Brian Kernighan @Princeton University spoke about how computers can assist in understanding research topics in the humanities.

He started by presenting examples of web sites with interactive tools for exploring historical material

  1. Explore a northern and a southern town during the Civil War
  2. An “Expedia” for a traveler across ancient Rome
  3. The court records in London from 1674-1913
  4. Hemingway and other literary stars in Paris, from the records of Sylvia Beach

Brian then talked about the challenges of converting the archival data: digitize, meta tag, store, query, present results, make available to the public

In preparation for teaching a class this fall on digital humanities, he talked about his experience extracting information from a genealogy of the descendants of Nicholas Cady (in the U.S. from 1645 to 1910). He talked about the challenges of standard OCR transcription of page images to text: dropped characters and misplaced entries. There were then the challenges of understanding the abbreviations in the birth and death dates for individuals and the limitations of off-the-shelf software in highlighting important relations in the data.

Brian highlighted some facts derived from the data:

  1. Mortality in the first five years of life was very high
  2. Names of children within a family were often recycled if an earlier child had died very young

posted in:  ACM, data analysis

NYAI#7: #DataScience to Operationalize #ML (Matthew Russell) & Computational #Creativity (Dr. Cole)

Posted on November 22nd, 2016


11/22/2016 Risk, 43 West 23rd Street, NY 2nd floor


Speaker 1: Using Data Science to Operationalize Machine Learning – (Matthew Russell, CTO at Digital Reasoning)

Speaker 2: Top-down vs. Bottom-up Computational Creativity  – (Dr. Cole D. Ingraham DMA, Lead Developer at Amper Music, Inc.)

Matthew Russell @DigitalReasoning spoke about understanding language using NLP: relationships among entities and temporal relationships. For human-language understanding, he views technologies such as knowledge graphs and document analysis as becoming commoditized. The only way to get an advantage is to improve the efficiency of using ML: the KPI for data analysis is the number of experiments (tests of a hypothesis) that can be run per unit time. The key is to use tools such as:

  1. Vagrant – allows a reproducible environment setup
  2. Jupyter Notebook – like a lab notebook
  3. Git – version control
  4. Automation

He wants highly repeatable experiments; the goal is to increase the number of experiments that can be conducted per unit time.

He then talked about using machines to read medical reports and determine the issues. Negatives can be extracted, but issues are harder to find. They use an ontology to classify entities.

He talked about experiments on models using ontologies. Whether a fixed ontology works depends on the content: the ontology of terms for anti-terrorism evolves and needs to be experimentally adjusted over time, while a medical ontology is probably the most static.

In the second presentation, Cole D. Ingraham @Ampermusic talked about top-down vs. bottom-up creativity in the composition of music. Music differs from other audio forms since it has a great deal of large-scale structure as well as smaller structure. ML does well at generating good audio over a small time frame, but Cole thinks it is better to apply theories from music to create the larger whole. This is a combination of:

Top-down: novel & useful, rejects previous ideas – code-driven, “hands on”; you define the structure

Bottom-up: data-driven, “hands off”; you learn the structure

He then talked about music composition at the intersection of generation vs. analysis (of already-composed music) – you can do one without the other, or one before the other.

To successfully generate new and interesting music, one needs to generate variance. Composing music using a purely probabilistic approach is problematic, as there is a lack of structure. He likes an approach similar to replacing words with synonyms that do not fundamentally change the meaning of the sentence but still make it different and interesting.

It’s better to work on deterministically defined variance than it is to weed out undesired results from nondeterministic code.

As an example he talked about WaveNet (a Google DeepMind project), whose input and output are both raw audio. This approach works well for improving speech synthesis, but less well for music generation, as there is no large-scale structural awareness.

Cole then talked about Amper, a web site that lets users create music with no experience required: fast, believable, collaborative

They like a mix of top-down and bottom-up approaches:

  1. Want speed, but neural nets are slow
  2. Music has a lot of theory behind it, so it’s best to let the programmers code these rules
  3. Can change different levels of the hierarchical structure within music: style, mood, can also adjust specific bars

The runtime is written in Haskell – a functional language, so it’s great for music

posted in:  AI, Big data, data analysis, Data science, NewYorkAI, Programming

#Genomic analysis and #BigData using #FPGA’s

Posted on November 17th, 2016


11/17/2016 @ Phosphorus, 1140 Broadway, NY, 11th floor


Rami Mehio @Edico Genome spoke about the fast analysis of a human genome. (They initially did secondary analysis, which is similar to telecommunications – errors in the channel – as errors come from the process due to repeats and mistakes in the sequencer.)

Genomic data has historically doubled every 7 months, but the computational speed to do the analysis lags, as Moore’s law implies a doubling every 18 months. With standard CPUs, mapping takes 10 to 30 hours on a 24-core server. Quality control adds several hours.
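The gap between the two doubling rates quoted above compounds quickly; a back-of-the-envelope calculation over a 3-year horizon:

```python
# Compare the two doubling times quoted above over a 3-year horizon.
months = 36
data_growth = 2 ** (months / 7)   # genomic data doubles every 7 months
cpu_growth = 2 ** (months / 18)   # Moore's law: compute doubles every ~18 months
print(round(data_growth), round(cpu_growth))  # data grows ~35x while compute grows 4x
```

So in three years the data grows roughly 35-fold while compute grows only 4-fold, which is why the talk turns to hardware acceleration.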

In addition, a human genome file is an 80GB FASTQ file. (This is only for a rough look at the genome at 30x coverage = the number of times the DNA is multiplied = the number of times the analysis is redone.)

Using FPGAs reduced the analysis time to 20 minutes. Also, with CRAM compression the files shrink to 50GB.

The server code is in C/C++. The FPGAs are not programmed in the usual sense; their logic and connections are specified using the VITAL or VHDL languages.

The HMM and Smith-Waterman algorithms require the bulk of the processing time, so both are implemented in the FPGAs. Another challenge is to get sufficient data to feed the FPGAs, which means the software needs to run in parallel. Also, the FPGAs are configured so they can change the algorithm selectively to take advantage of what needs to be done at the time.
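Smith-Waterman is the classic local-alignment dynamic program; a compact pure-Python version of the scoring recurrence (the FPGA implementations parallelize exactly this inner loop, and the scoring parameters here are illustrative defaults):

```python
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best local-alignment score between sequences a and b. The floor at 0
    in the recurrence is what makes the alignment local rather than global."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "GATTACA"))  # 14: a perfect 7-base match at 2 per base
```

The O(len(a) × len(b)) table is why this dominates CPU time on whole-genome data and why it is a natural fit for FPGA parallelism.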

posted in:  Big data, data, Genome, hardware

#InternetOfThings, #Drones, #Robots, and #Music

Posted on November 16th, 2016


11/16/2016 @WeWork, 115 West 18th Street, NY, 4th floor


The speakers were

Charlie Key @Losant talked about asset tracking: fleet management, shipment tracing, equipment tracking, heavy-duty parenting. The package often consists of two parts: GPS tracking + communication (usually cellular). Hologram allows purchase of data by the byte, with data sent every 5 minutes.

They use Google’s API to look up locations, then check whether the device is inside a geofence.
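The geofence check itself is simple great-circle arithmetic; a minimal sketch of a circular fence (the fence coordinates and radius below are hypothetical):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two GPS fixes (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def inside_geofence(lat, lon, fence_lat, fence_lon, radius_km):
    """A circular geofence: is the reported position within radius_km of the center?"""
    return haversine_km(lat, lon, fence_lat, fence_lon) <= radius_km

# A fix ~0.8 km north of a hypothetical 5 km fence center is inside it.
print(inside_geofence(40.7200, -74.0060, 40.7128, -74.0060, 5.0))  # True
```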

David Lyman @BetterView captures and analyzes drone data. They have analyzed 4,200 rooftops for insurance companies. Experts currently analyze the images; they are moving toward deep learning. The main drivers of the increased use of drones are regulation, hardware, and experience.

He sees a longer-term opportunity: 5 million workers who should have a drone in their trucks – fence installers, HVAC maintenance.

Vaughn @Temboo: SAAS to connect actuators and sensors to the cloud, gave several examples of IoT in industry:

  1. Monginis Foods Ltd. – cakes and pastry in India, UK, EMEA: retrofit equipment and processes to implement IoT. Examples include
    1. retrofitting x-ray machines that scan every cake and pastry – automate alerts.
    2. Monitor freezers and refrigerators to reduce food spoilage.
    3. Place temperature sensors as oven monitors
    4. Integrate with payment and logistics systems to make everything more efficient.
  2. One customer monitors soil moisture, electrical conductivity, light – in agriculture
  3. Aircraft repair company – monitor parts storage and temperature and humidity of storage for audit. Tracks technical manuals.
  4. Manufacturer of lawn mowers includes sensors in motors

The usual configuration is Sensor monitoring – triggered notifications — actuator control. Vaughn gave the following advice to IoT startups:

  1. Start with a small but real, concrete problem
  2. Focus on saving time or money to create real value at the start
  3. Quick wins help build confidence and expertise
  4. Get internal backing based on having a working system
  5. See how the data and functionality create additional uses
  6. See how existing applications can be modified for other users
  7. Build new IoT capabilities on top of existing ones

Leif @RightHand Robotics (intelligent robotic order-picking systems) talked about opportunities he sees in the industrial robotics space.

Existing industrial ecosystem: build components + system integrators -> end application

Most of the cost is in integration, so he is looking for systems that are configurable by end users (simpler integration). Examples include Universal Robots (UR5), Rethink Robotics (Sawyer), and Franka, which produce collaborative robots that users can program.

He gave some examples of industrial robotic applications:

  1. Robots as a service – a machine that thins the small lettuce plants. Farmers can rent when they need it.
  2. Navii is used by Lowes to tell customers where to find items in inventory.

He sees the key as having machines learn to handle variation, as manual labor is hard to scale.

Finally, Roli, demonstrated a music technology that increases the flexibility and capabilities of accomplished musicians while being easy enough for beginners to create their own music.

Their original device, in 2012, replaced the keyboard with a continuous sensitive surface: the Seaboard. They are introducing a more general device (the Block) that has the flexibility to play the sounds of multiple instruments in a simple and elegant package.

posted in:  Drones, hardware, Hardwired NYC, Internet of Things, startup

Color on mobile phone ads, color preferences revealed, Programming and humor

Posted on October 10th, 2016


10/10/2016 @FirstMarkCapital, 100 5th ave, NY 3rd floor


Robert Haining @Paypal spoke about API theming of mobile apps: building software for retailers to sell outside their web site. He concentrates on iOS development. Theming involves color (e.g. the color of the buy button), image, and style.

They configure the user’s site using controls in their control panel. For example, they default to text, but companies can upload a logo. The information is stored in a CSS file. They translate the JSON descriptions into the Objective-C SDK.
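The default-unless-overridden theming flow could be sketched as below; the theme keys and defaults are hypothetical, not PayPal's actual schema:

```python
import json

# Hypothetical defaults; the real schema and keys are not public.
DEFAULTS = {"buy_button_color": "#0070ba", "logo_url": None, "style": "text"}

def load_theme(raw_json: str) -> dict:
    """Merge a retailer's stored theme over the defaults, so anything the
    company did not customize falls back to the default look."""
    theme = dict(DEFAULTS)
    theme.update(json.loads(raw_json))
    return theme

print(load_theme('{"logo_url": "https://example.com/logo.png"}'))
```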

They use Apple’s NSNotificationCenter to update whenever the page is refreshed. They locally cache themes, but download from the API when possible. For fonts, they only use embedded fonts that come with the phone, in preference to the Apple fonts.

They initially show companies a reduced set of options.

They use Oauth for verification for that particular session.

Next, Arun Ranganathan @Pinterest spoke about their API, with an emphasis on finding things you like (as opposed to explicitly searching for something). He concentrates on platforms for companies.

Pinterest has its own internal APIs. They also have ad APIs (whitelisted to partners).

Finally, they have public developer APIs. IFTTT allows interaction with pins on Pinterest. The APIs are also used by the following (making use of a hex coding of the overall color of each picture):

  1. Topshop (UK retailer) used the pins to deduce your color preferences to market to you.
  2. Valspar (paint) uses the API to better understand the colors you would like for your house.
  3. Burberry created a custom board with unique pins.
  4. Tok&stok (Brazil furniture) allowed physical buttons to be pushed to remind you of your preferences (via Bluetooth LE) as you walk through a store.

Finally, Ben Halpern @Argo gave a highly entertaining presentation about becoming The Practical Dev. He applies humor to dev life on Twitter as @ThePracticalDev, tweeting on serious and humorous topics.

posted in:  Code Driven NYC

Listening to Customers as you develop, assembling a #genome, delivering food boxes

Posted on September 21st, 2016


09/21/2016 @FirstMark, 100 Fifth Ave, NY, 3rd floor


JJ Fliegelman @WayUp (formerly CampusJob) spoke about the development process used by their application, the largest market for college students to find jobs. JJ talked about their development steps.

He emphasized the importance of speccing out ideas on what they should be building and of talking to your users.

They use tools to stay in touch with their customers:

  1. HelpScout – see all support tickets; get the vibe
  2. FullStory – DVR-like software that plays back video recordings of how users are using the software

They also put ideas in a repository using Trello.

To illustrate their process, he examined how they worked to improve job-search relevance.

They look at impact per unit effort to measure value, and do this across new features over time so they can prioritize and get multiple estimates. It’s a probabilistic measure.

Assessing impact – are people dropping off? Do people click on it? What are the complaints? They talk to experts using cold emails. They also cultivate a culture of educated guesses.

Assessing effort – get it wrong often and get better over time.

They prioritize by impact/effort with the least technical debt.
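The impact-per-unit-effort ranking can be sketched in a few lines; the backlog items and estimates below are hypothetical, not WayUp's actual backlog:

```python
def prioritize(features):
    """Rank candidate features by estimated impact per unit effort."""
    return sorted(features, key=lambda f: f["impact"] / f["effort"], reverse=True)

backlog = [  # hypothetical estimates
    {"name": "improve search relevance", "impact": 8, "effort": 5},
    {"name": "fix signup copy", "impact": 2, "effort": 1},
    {"name": "rewrite search backend", "impact": 9, "effort": 20},
]
for f in prioritize(backlog):
    print(f["name"], round(f["impact"] / f["effort"], 2))
```

In practice each impact and effort number is itself an uncertain estimate, which is why the text calls the measure probabilistic.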

They Spec & Build – (product, architecture, kickoff) to get organized

They use Clubhouse as their project tracker: readable by humans.

The architecture spec solves today’s problem but looks ahead. E.g., the initial architecture used WordNet and Elasticsearch, but they found Elasticsearch too slow, so they moved to a graph database.

Build – build as little as possible; prototype; adjust your plan

Deploy – they will deploy things that are not worse (e.g. a button that doesn’t work yet)

They do code reviews to avoid deploying bad code

Paul Fisher @Phosphorus (from Recombine, which formerly focused on the fertility space: carrier screening; they now emphasize diagnostic DNA sequencing) talked about the processes they use to analyze DNA sequences. With the rapid development of laboratory techniques, it’s now a computer-science question. They use Scala, Ruby, and Java.

Sequencers produce hundreds of short reads of 50 to 150 base pairs. They use a reference genome to align the reads, and want multiple reads (depth of reads) to create a consensus sequence.
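Consensus calling from aligned reads is, at its simplest, a majority vote at each position; a toy sketch (real pipelines also weigh base-quality scores, handle gaps, and work at far deeper coverage):

```python
from collections import Counter

def consensus(reads, length):
    """Majority-vote consensus over aligned reads; each read is (start, sequence).
    Depth of coverage at a position = number of reads overlapping it."""
    piles = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            piles[start + offset][base] += 1
    return "".join(p.most_common(1)[0][0] if p else "N" for p in piles)

# Three short reads over a 10-base region; one read carries a sequencing error (X),
# which the depth of 3 at that position votes down.
reads = [(0, "ACGTAC"), (2, "GTACGT"), (4, "AXGTAC")]
print(consensus(reads, 10))  # ACGTACGTAC
```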

To lower cost and speed their analysis, they focus on particular areas to maximize their read depth.

They use a variant viewer to understand variants between the person’s genome and the reference genome:

  1. SNPs – one base is changed – degree of pathogenicity varies
  2. Indels – insertions & deletions
  3. CNVs – copy variations
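A rough sketch of how these three categories fall out of a VCF record's REF/ALT fields (simplified; real classification also considers INFO fields and breakpoints):

```python
def classify_variant(ref: str, alt: str) -> str:
    """Rough classification of a VCF REF/ALT pair; CNVs are typically reported
    via symbolic alleles such as <DUP> or <DEL> (simplified here)."""
    if alt.startswith("<"):
        return "CNV"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    return "indel"

for ref, alt in [("A", "G"), ("AT", "A"), ("A", "<DUP>")]:
    print(ref, alt, classify_variant(ref, alt))
```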

They use several different file formats: FASTQ, BAM/SAM, VCF.

Current methods have evolved to use Spark, Parquet (a columnar storage format), and ADAM (which uses the Avro framework for nested collections).

They use Zeppelin to share documentation: documentation that you can run.

Finally, Andrew Hogue @BlueApron spoke about the challenges he faces as the CTO. These include

Demand forecasting – they use machine learning (random forests) to predict, per user, what they will order. Holidays are hard to predict. People order less lamb and avoid catfish, and there was also a dip in orders, and in orders with meat, during Lent.

Fulfillment – more than just inventory management since recipes change, food safety, weather, …

Subscription mechanics – weekly engagement with users brings opportunities to deepen engagement. Frequent communications can drive engagement or churn. A/B experiments need more time to run.

Blue Apron runs 3 fulfillment centers (NJ, Texas, CA) for their weekly food deliveries, shipping 8 million boxes per month.

posted in:  applications, Big data, Code Driven NYC, data, data analysis, startup

NYAI#5: Neural Nets (Jason Yosinski) & #ML For Production (Ken Sanford)

Posted on August 24th, 2016

#NYAI, New York #ArtificialIntelligence

08/24/2016 @Rise, 43 West 23rd Street, NY, 2nd floor


Jason Yosinski@GeometricTechnology spoke about his work on #NeuralNets to generate pictures. He started by talking about machine learning with feedback to train a robot to move more quickly and using feedback to computer-generate pictures that are appealing to humans.

Jason next talked about AlexNet, based on work by Krizhevsky et al. (2012), which classifies images using a neural net with 5 convolutional layers (interleaved with max-pooling and contrast layers) plus 3 fully connected layers at the end. The net, with 60 million parameters, was trained on ImageNet, which contains over 1 million images. His image-classification code is available online.

Jason talked about how the classifier thinks about categories when it is not being trained to identify that category. For instance, the network may learn about faces even though there is no human category since it helps the system detect things such as hats (above a face) to give it context. It also identifies text to give it context on other shapes it is trying to identify.

He next talked about generating images by inputting random noise and randomly changing pixels. Some changes cause the confidence in the goal (such as ‘lion’) to increase; over many random moves, the confidence level grows. Jason showed many random images that elicited high levels of confidence, but the images often looked like purple-green slime. This is probably because the network, while learning, immediately discards the overall color of the image and is therefore insensitive to aberrations from normal colors. (See Erhan et al. 2009.)
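The procedure described above is essentially random hill climbing on pixels; here is a toy version with a stand-in scorer in place of the trained network (the template and scorer are illustrative only, not the actual classifier):

```python
import random

random.seed(0)

TARGET = [1.0] * 16  # stand-in for what the real classifier responds to

def confidence(img):
    """Toy scorer standing in for the net's class confidence (e.g. 'lion')."""
    return -sum((p - t) ** 2 for p, t in zip(img, TARGET))

img = [random.random() for _ in range(16)]  # start from random noise
start = score = confidence(img)
for _ in range(2000):
    i = random.randrange(16)            # pick a random pixel...
    old = img[i]
    img[i] = min(1.0, max(0.0, old + random.uniform(-0.2, 0.2)))  # ...nudge it...
    if confidence(img) > score:         # ...and keep the change only if confidence rises
        score = confidence(img)
    else:
        img[i] = old

print(score > start)  # hill climbing raised the scorer's confidence
```

With a real network as the scorer, exactly this loop produces the high-confidence "slime" images Jason showed.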

[This also raises the question of how computer vision is different from human vision. If presented with a blue colored lion, the first reaction of a human might be to note how the color mismatches objects in the ‘lion’ category. One experiment would be to present the computer model with the picture of a blue lion and see how it is classified. Unlike computers, humans encode information beyond their list of items they have learned and this encoding includes extraneous information such as color or location. Maybe the difference is that humans incorporate a semantic layer that considers not only the category of the items, but other characteristics that define ‘lion-ness’.  Color may be more central to human image processing as it has been conjectured that we have color vision so we can distinguish between ripe and rotten fruits. Our vision also taps into our expectation to see certain objects within the world and we are primed to see those objects in specific contexts, so we have contextual information beyond what is available to the computer when classifying images.]

To improve the generated pictures of ‘lions’, he next used a generator to create pictures and change them until he gets a picture with high confidence of being a ‘lion’. The generator is designed to create identifiable images, and can even produce pictures of objects that it has not been trained to paint. (Regularization is needed to get better pictures for the target.)

Slides at

In the second talk, Ken Sanford @Ekenomics and H2O.ai talked about the H2O open-source project. H2O is a machine learning engine that can run in R, Python, Java, etc.

Ken emphasized how H2O (a multilayer feed-forward neural network) provides a platform that uses the Java Score Code engine. This eases the transition from the model developed in training to the model used to score inputs in a production environment.

He also talked about the Deep Water project, which aims to allow other open-source tools such as MXNet, Caffe, TensorFlow, … (CNN, RNN, … models) to run in the H2O environment.

posted in:  AI, Big data, Data science, NewYorkAI, Open source

Board #Game #Design

Posted on August 15th, 2016

Central Jersey Mensa @Mensa

08/12/2016 @ APA Hotel Woodbridge, 120 S Wood Ave, Iselin, NJ


Gil Hova @FormalFerretGames, a designer and publisher of #BoardGames talked about how to design a good board game.

Unlike transformative games (which you play only once, but which change you), his games are entertaining; he emphasized that fun is not a general term but needs to be applied to a specific audience.

To describe his approach, Gil talked about four key terms

  1. flow – feeling of being in the zone. Ways to get players in the flow include
    1. clear set of goals
    2. immediate feedback
    3. goals neither too difficult nor too easy
    4. need to make challenges progressively more difficult
  2. fiero – the feeling after triumphing over adversity; an emotional peak; a counterbalance to flow. It’s a fleeting moment. For instance, the “take that” mechanism, in which you punish another player. The concept is about meaningful play; see Jane McGonigal, “Reality Is Broken”.
  3. heuristics – rules of thumb that are not part of the rules, but ways that players figure out to play the game – e.g. bluffing in poker. The developer needs to see how rule changes change behavior. Players start with “zero-level heuristics” that they use the first time they play a game. As they play more, they “climb the heuristic tree”. It’s also called “lenticular design”, as we see new things every time we play the game. The heuristic tree can have many shapes:
    1. bush (e.g. tic-tac-toe, only 1 or 2 heuristic rules to win –e.g. take the center square)
    2. palm tree – a long climb before you understand how to play the game and then there are a lot of tools at your disposal
    3. sequoia – lots of heuristic levels with new concepts & tools at each level (e.g. chess)
  4. core engagement – the core that appeals to players. The one thing on which game is based.
    1. Scrabble – mastery of words
    2. Bridge – communication with partner

The key thing is to incentivize interesting behavior: “game design is mind control”

If the game is too random, then play becomes not meaningful, e.g. Fluxx.
The game needs to reward good play, and to get players into the 4 key terms above.
Theme and mechanism – it doesn’t matter which comes first, but it helps if they support each other.
The theme is a promise to the players, so make the mechanism consistent with the players’ expectations from the theme.
If there is no theme, then the rules had better be simple to explain.
MDA – mechanics, dynamics, aesthetics – dynamics is the intersection of the aesthetics and the mechanics.
Players start with the theme and drill down to the mechanism.
Designer starts with the mechanism and moves to the theme.
In the goal of uniting theme and mechanism, Gil advises removing the flavor text on the card (used to describe the card), since the flavor of the card should be implied by how the card plays

Gil then talked about the game development process he uses: 4 stages of play testing

  1. proof of concept – play solo. Is this a game? Is it interesting?
  2. alpha – plain broad strokes; talk about it, play with other designers, discuss why it broke, discard after each play test
  3. beta – functional, balanced; show it. It’s now a functional game. Use Google Images for graphics
  4. gamma – beautiful; graphic tests; release to market

and 3 types of playtesters that he uses during the play testing stages

  1. silent tester – just a silent opponent
  2. brilliant tester – “what if you could do this”
  3. crazy tester – plays as an opponent who tries things you have not considered.

Gil closed by talking generally about the game-development industry

you cannot play a great idea!
no one will steal your game.
do not ask for an NDA
don’t be attached to your game
let your game be what it wants to be
he recommends listening to the podcast Flip the Table, which looks at obscure board games.

You will need 75 to 100 tests overall to get from idea to published game.


posted in:  Games forum, Mensa

#Unsupervised Learning (Soumith Chintala) & #Music Through #ML (Brian McFee)

Posted on July 26th, 2016


07/25/2016 @Rise, 28 West 24th Street, NY, 2nd floor


Two speakers spoke about machine learning

In the first presentation, Brian McFee @NYU spoke about using ML to understand the patterns of beats in music. He graphs beats identified by Mel-frequency cepstral coefficients (#MFCCs).

Random walk theory combines two representations of points in the graph.

  1. Local: in the graph, each point is a beat; edges connect adjacent beats, weighted by MFCC similarity.
  2. Repetition: link k-nearest neighbors by repetition (= same sound); weight by similarity (k is set to the square root of the number of beats).
  3. Combination: A = mu * local + (1 - mu) * repetition; optimize mu for a balanced random walk, so the probability of a local move matches the probability of a repetition move over all vertices. Use a least-squares optimization to find the mu at which the two parts of the equation make equal contributions across all points to the value of A.

The points are then partitioned by spectral clustering: form the normalized Laplacian and take the bottom eigenvectors, which encode component membership for each beat; clustering the eigenvectors Y of L reveals the structure. This gives a hierarchical decomposition of the time series: m = 1 is the entire song; m = 2 gives the two components of the song. As you add more eigenvectors, the number of segments within the song increases.
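The pipeline (combine local and repetition affinities, form the normalized Laplacian, cluster the bottom eigenvectors) can be sketched on a toy 8-beat "song"; the one-number features and fixed mu below are hypothetical stand-ins for MFCC-based weights and the tuned balance:

```python
import numpy as np

# Toy "song": 8 beats in two repeated sections; one feature per beat stands in for MFCCs.
feats = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
n = len(feats)

local = np.zeros((n, n))  # edges between adjacent beats, weighted by feature similarity
for i in range(n - 1):
    w = np.exp(-((feats[i] - feats[i + 1]) ** 2))
    local[i, i + 1] = local[i + 1, i] = w

rep = (np.abs(feats[:, None] - feats[None, :]) < 0.5).astype(float)  # "same sound" links
np.fill_diagonal(rep, 0)

mu = 0.5  # stand-in; the talk tunes mu by least squares for a balanced random walk
A = mu * local + (1 - mu) * rep

d = A.sum(axis=1)
L = np.eye(n) - (A / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]  # normalized Laplacian
vals, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
labels = (vecs[:, 1] > 0).astype(int)         # sign of 2nd eigenvector: the m = 2 split
print(labels)
```

On this toy input the sign of the second eigenvector cleanly separates the first four beats from the last four, i.e. the two sections of the song.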

Brian then showed how this segmentation can create compelling visualizations of the structure of a song.

The Python code used for this analysis is available in the msaf library.

He has worked on convolutional neural nets, but finds them better at handling individual notes within the song (by contrast, rhythm plays out over a longer time period).

In the second presentation, Soumith Chintala talked about #GenerativeAdversarialNetworks (GAN).

Generative networks consist of a #NeuralNet “generator” that produces an image. It takes as input a high-dimensional (100-dimension) vector of random noise. In a Generative Adversarial Network, the generator creates an image that is optimized over a loss function evaluating “does it look real?”. The decision of whether the image looks real is made by a second neural net, the “discriminator”, which tries to pick the fake image out of a set of real images plus the output of the generator.

Both the generator and discriminator networks are trained by gradient descent to optimize their individual performance: generator = max game; discriminator = min game. The process optimizes the Jensen-Shannon divergence.
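Since the minimax game optimizes the Jensen-Shannon divergence between the data distribution and the generator's distribution, it helps to see what that quantity is; a small sketch for discrete distributions:

```python
from math import log2

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions (base 2)."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: symmetric, bounded in [0, 1] with base-2 logs.
    At the GAN optimum the generator's distribution drives this toward 0."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js([0.5, 0.5], [0.5, 0.5]))             # 0.0: "generated" matches "real" exactly
print(round(js([1.0, 0.0], [0.0, 1.0]), 3))   # 1.0: completely disjoint distributions
```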

Soumith then talked about extensions to GAN. These include

Class-conditional GANs – take noise + the class of samples as input to the generator.

Video-prediction GANs – predict what happens next given the previous 2 or 3 frames. An MSE loss (in addition to the discriminator’s classification loss) compares what happened to what was predicted.

Deep convolutional GAN – tries to make the learning more stable by using a CNN.

Text-conditional GAN – input = noise + text. An LSTM model runs on the text input to generate images.

Disentangling representations – InfoGAN – input = random noise + categorical variables.

GANs are still unstable, especially for larger images, so work to improve them includes

  1. Feature matching – take groups of features instead of just the whole image.
  2. Minibatch learning

No one has successfully used GAN for text-in to text-out

The meeting concluded with a teaser for Watchroom, a crowd-funded movie on AI and VR.

posted in:  AI, data analysis, NewYorkAI