New York Tech Journal
Tech news from the Big Apple

Intro to #DeepLearning using #PyTorch

Posted on February 21st, 2017

#ACM NY

02/21/2017 @ NYU Courant Institute (251 Mercer St, New York, NY)

Soumith Chintala @Facebook first talked about trends in the cutting edge of machine learning. His main point was that the world is moving from fixed agents to dynamic neural nets in which agents restructure themselves over time. Currently, the ML world is dominated by static datasets + static model structures which learn offline and do not change their structure without human intervention.

He then talked about PyTorch which is the next generation of ML tools after Lua #Torch. In creating PyTorch they wanted to keep the best features of LuaTorch, such as performance and extensibility while eliminating rigid containers and allowing for execution on multiple-GPU systems. PyTorch is also designed so programmers can create dynamic neural nets.

Other features include

  1. Kernel fusion – take several operations and fuse them into a single kernel
  2. Order of execution – reorder operations for faster execution
  3. Automatic work placement when you have multiple GPUs
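
To make the kernel fusion idea concrete, here is a minimal pure-Python sketch. It is not PyTorch code, and the function names are hypothetical. The unfused version makes three passes over the data and builds two temporary lists, while the fused version produces the same result in a single pass.

```python
def unfused(xs, scale, shift):
    """Three separate elementwise 'kernels', each making a full pass."""
    scaled = [x * scale for x in xs]        # pass 1, temporary list
    shifted = [s + shift for s in scaled]   # pass 2, temporary list
    return [max(v, 0.0) for v in shifted]   # pass 3, clamp negatives to zero


def fused(xs, scale, shift):
    """The same three operations fused into a single pass, no temporaries."""
    return [max(x * scale + shift, 0.0) for x in xs]
```

On a GPU the saving would come from launching one kernel instead of three and from not writing intermediate results to memory; the lists here only stand in for that.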

PyTorch is available for download on http://pytorch.org and was released Jan 18, 2017.

Currently, PyTorch runs only on Linux and OSX.

posted in:  ACM, data analysis, Programming, Python

#Holograms, #VR, #Technology for Kids, #HomeSecurity

Posted on February 15th, 2017

#HardwiredNYC

02/15/2017 @ Wework Chelsea, 115 West 18th Street, NY, 4th floor

The speakers were

The first speaker, David @ PRSONAS, spoke about their product, a hologram persona that can serve as a greeter at retail stores and can provide product information, financial guidance, intake of medical symptoms, etc. The greeter is a flat holographic image of a person in what David called a 2½-D display.

The software behind the hologram can provide appropriate hand gestures, show videos, instruct users to input data on a tablet, etc.

David talked about how they have customized the image to avoid falling into the uncanny valley (close enough to human-looking that it feels creepy) by modeling the image as a non-human character.

++

Next, Sophia @ SVFR spoke about how her company is striving to become the common site for distribution of VR, #AR and #MR videos. She likened the present day for VR to the early 1990s for Yahoo, when distribution of web content was still in its infancy.

She talked about the barriers to widespread VR production. These include a lack of universally available hardware to record VR and a lack of editing tools, but most importantly, we don’t yet know how to tell a story that takes advantage of the VR experience.

++

Next, Bethany @TechnologyWillSaveUs spoke about how her company is creating kits for students to experiment in creating their own technology. The kits contain sensors, motors, etc. and are linked to a programming language on their web portal which is an extension of Scratch.

As an example she demonstrated a programmable wrist band that can react to motion, etc.

Bethany then talked about their company strategy which emphasizes a range of products.

  1. Create a range of products: a variety of prices, with the option to create bundled products
  2. Product-market fit: hardware is more difficult, so development is put on a tight production schedule with lots of feedback. Monitor ROI for the various products.
  3. By having a range of products, there are activities for all parts of the company at any given time.

She talked about how the company strives to stay ahead of the competition (Little Bits, Lego Mindstorms) by carefully targeting price points and creating a wide range of products for different age groups.

Finally, John @Canary talked about their stand-alone, in-home security system which is connected to an app on your phone.

He emphasized the importance of Product design = Relationship design

You need

  1. Quality time – the app needs to be interactive. They made it easier to access the timeline of videos taken by the system
  2. Crisis management – the system can contact the police if there is a notification and helps the homeowner overcome a crisis: they assist the homeowner in filing an insurance claim.
  3. Trust – connected-home customers are concerned about privacy. Use the device security protocols recently released by ICSA Labs
  4. A little magic – surprise and delight. A good example is Netflix onboarding, which asks for your movie preferences and then starts recommending movies upon the first use.

John also mentioned that they store videos of arrivals and departures, temperature, air quality, and how active the occupants are. Videos are stored for 24 hours or 1 month depending on the contract. They are partnering with insurance companies to get homeowner discounts for using Canary.

 

posted in:  applications, Hardwired NYC, Internet of Things, startup, VR

The #Cannabis #Entrepreneur

Posted on January 18th, 2017

#digitalNYC

01/18/2017 @AlleyChelsea, 119 West 24th St., NY 4th floor

https://cannabisentrepreneur.splashthat.com/

A panel of four talked about the current and future state of the marijuana industry

Ryan Smith @Leaflink – (https://www.leaflink.com/) Join More Than 750 Dispensaries & 75 Leading Brands on the Platform for Orders, Sales & Relationship Management. He stressed the need to find the right people, especially to deal with the complexity, regulation, and compliance requirements that are not present in many other industries. This is especially true of technology. They scale across states by buying supplies within each state (product cannot go across state lines) and by partnering with other businesses, but the better businesses are structured as franchises so that they are separate across states. They are a technology company, so they don’t touch the product. Jeff Sessions will probably focus on immigration, not drugs.

Melissa Meyer @Women Grow (http://womengrow.com/) leadership summit – connect patients with data. Uses Angel for recruiting.

Morgan Paxhia @ Poseidon asset management (http://poseidonassetmanagement.com/). Operational issues persist: 3 years ago the industry was very undercapitalized. It’s still undercapitalized, but less so (salaries are still low). Some generic software packages are used, but customized packages need to be developed. In California, small farmers create co-ops, but taxes are still too high, so the black market is still large. Marijuana policy program – need to figure out how to work with the administration.

Lauren Rudick @Hiller (http://www.hillerpc.com/) The first question is: is it legal? Washington allows lawyers to own businesses. Seed-to-sale tracking is needed. Washington & Massachusetts – easy to qualify for medical. Colorado, Oregon – easy for businesses to operate: stable. For profitability, Puerto Rico might be the best location since there is no federal tax. Medical sales need to be HIPAA compliant.

 

posted in:  Cannabis, startup

#VideoStreaming, #webpack,#diagrams

Posted on January 18th, 2017

#CodeDrivenNYC

01/17/2017 @FirstMarkCapital, 100 Fifth Ave, NY 3rd floor

Tim Whidden, VP Engineering at 1stdibs: Webpack Before It Was Cool – Lessons Learned

Sarah Groff-Palermo, Designer and Developer: Label Goes Here: A Talk About Diagrams

Dave Yeu, VP Engineering at Livestream: A Primer to Video on the Web: Video Delivery & Its Challenges

Dave Yeu @livestream talked about some of the challenges of streaming large amounts of video and livestreaming: petabytes of storage, I/O, CPU, and latency (for live video).

Problems

  1. Long-lived connections – there are several solutions
    1. HLS (HTTP Live Streaming), which cuts video into small segments and uses HTTP as the delivery vehicle. It was originally developed by Apple as a way to deliver video to iPhones as their coverage moves from cell tower to cell tower. It uses the power of the HTTP protocol: a playlist plus small chunks that are separate URLs, with m3u8 files that point to the actual files.
      1. But there are challenges – if you need 3 chunks in your buffer, then you have a 15 second delay (assuming 5-second chunks). As you decrease the size of each chunk, the playlist gets longer, so you need to do more requests for the m3u8 file.
    2. DASH – segments follow a template which reduces index requests
    3. RTMP – persistent connections, extremely low latency, used by Facebook
  2. Authorization – they want you to be able to watch, but don’t want you to rebroadcast (there is no key, so this is not DRM).
    1. Move authentication to the cache level – use Varnish.
    2. Add a token to the playlist; Varnish vets the token and serves the content. => all things come through their API.
    3. But – you expand the scope of your app = cache + server.
  3. Geo-restrictions
    1. Could do this: IP address + restrictions. But in this case you need to put the geo-block behind the cache and server.
    2. Instead, the API generates a geo-block config. Varnish loads it into a memory map and checks requests against it
    3. If there is a geo violation, then Varnish returns a modified url, so the server can decide how to respond
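
The HLS trade-off described under long-lived connections above can be sketched in a few lines of Python. This is only an illustration: the 5-second chunk length is inferred from the "3 chunks = 15 seconds" figure, the function names are hypothetical, and the playlist uses only the basic tags from the HLS specification.

```python
import math


def buffer_delay_seconds(chunk_seconds, chunks_in_buffer):
    """Minimum delay behind live if the player buffers whole chunks first."""
    return chunk_seconds * chunks_in_buffer


def playlist_entries(window_seconds, chunk_seconds):
    """Number of chunk URLs needed to cover a fixed window of video."""
    return math.ceil(window_seconds / chunk_seconds)


def media_playlist(chunk_urls, chunk_seconds, first_sequence=0):
    """Build a minimal HLS media playlist (.m3u8) that points at the chunks."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(chunk_seconds)}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for url in chunk_urls:
        lines.append(f"#EXTINF:{chunk_seconds:.3f},")  # duration of this chunk
        lines.append(url)                              # separate URL per chunk
    return "\n".join(lines) + "\n"
```

For example, `buffer_delay_seconds(5, 3)` gives the 15 second delay from the talk, while shrinking chunks to 2 seconds cuts the delay to 6 seconds but raises `playlist_entries(60, 2)` to 30 entries per minute of video.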

++

Tim Whidden @1stdibs, an online marketplace for curated goods (“ebay for rich people”), spoke about Webpack, a front-end module system. He described how modules increase the usability of functions and how Webpack also performs other functions such as code compression.

++

Finally, Sarah Groff-Palermo @sarahgp.com spoke about how diagrams help her clarify the code she has written and provide documentation for her and others in the future.

She described a classification of learning types from sequential learners (who like tutorials) to global learners (who like to see the big picture first) (see http://www4.ncsu.edu/unity/lockers/users/f/felder/public/ILSdir/styles.htm). Sarah showed several diagrams and pointed out how they help her get and keep the global picture. She especially likes the paradigm from Ben Shneiderman: overview first, zoom and filter, then details on demand.

For further ideas she recommended

  1. the book Going Forth – lots of diagrams
  2. Now you see it by Stephen Few
  3. Flowing data – blog by Nathan Yau
  4. Keynote is a good tool to use for diagrams

posted in:  applications, Code Driven NYC, video

#ComputerScience and #DigitalHumanities

Posted on December 8th, 2016

PRINCETON #ACM / #IEEE-CS CHAPTERS DECEMBER 2016 JOINT MEETING

12/08/2016 @Princeton University Computer Science Building, Small Auditorium, Room CS 105, Olden and William Streets, Princeton NJ

Brian Kernighan @Princeton University spoke about how computers can assist in understanding research topics in the humanities.

He started by presenting examples of web sites with interactive tools for exploring historical material

  1. Explore a northern and a southern town during the Civil War: http://valley.lib.virginia.edu/
  2. Expedia for a traveler across ancient Rome: http://orbis.stanford.edu/
  3. The court records in London from 1674-1913: https://www.oldbaileyonline.org/
  4. Hemingway and other literary stars in Paris from the records of Sylvia Beach

Brian then talked about the challenges of converting the archival data: digitize, meta tag, store, query, present results, make available to the public

In preparation for teaching a class this fall on digital humanities, he talked about his experience extracting information from a genealogy based on the descendants of Nicholas Cady (https://archive.org/details/descendantsofnic01alle) in the U.S. from 1645 to 1910. He talked about the challenges of standard OCR transcription of page images to text: dropped characters and misplaced entries. There were then the challenges of understanding the abbreviations in the birth and death dates for individuals and the limitations of off-the-shelf software in highlighting important relations in the data.

Brian highlighted some facts derived from the data:

  1. Mortality in the first five years of life was very high
  2. Names of children within a family were often recycled if an earlier child had died very young

posted in:  ACM, data analysis

NYAI#7: #DataScience to Operationalize #ML (Matthew Russell) & Computational #Creativity (Dr. Cole)

Posted on November 22nd, 2016

#NYAI

11/22/2016 Risk, 43 West 23rd Street, NY 2nd floor

Speaker 1: Using Data Science to Operationalize Machine Learning – (Matthew Russell, CTO at Digital Reasoning)

Speaker 2: Top-down vs. Bottom-up Computational Creativity  – (Dr. Cole D. Ingraham DMA, Lead Developer at Amper Music, Inc.)

Matthew Russell @DigitalReasoning spoke about understanding language using NLP, relationships among entities, and temporal relationships. For human language understanding, he views technologies such as knowledge graphs and document analysis as becoming commoditized. The only way to get an advantage is to improve the efficiency of using ML: the KPI for data analysis is the number of experiments (tests of a hypothesis) that can be run per unit time. The key is to use tools such as:

  1. Vagrant – allows a repeatable environment setup.
  2. Jupyter Notebook – like a lab notebook
  3. Git – version control
  4. Automation –

He wants highly repeatable experiments. The goal is to increase the number of experiments that can be conducted per unit time.

He then talked about using machines to read medical reports and determine the issues. Negatives can be extracted, but issues are harder to find. He uses an ontology to classify entities.

He talked about experiments on models using ontologies. The use of a fixed ontology depends on the content: the ontology of terms for anti-terrorism evolves over time and needs to be experimentally adjusted over time. Medical ontology is probably most static.

In the second presentation, Cole D. Ingraham @Ampermusic talked about top-down vs bottom-up creativity in the composition of music. Music differs from other audio forms since it has a great deal of very large structure as well as the smaller structure. ML does well at generating good audio on a small time frame, but Cole thinks it is better to apply theories from music to create the larger whole. This is a combination of

Top-down: novel & useful, rejects previous ideas – code driven, “hands on”, you define the structure

Bottom-up: data driven, “hands off”, you learn the structure

He then talked about music composition at the intersection of Generation vs. analysis (of already composed music) – can do one without the other or one before the other

To successfully generate new and interesting music, one needs to generate variance. Composing music using a purely probabilistic approach is problematic as there is a lack of structure. He likes the approach similar to replacing words with their synonyms which do not fundamentally change the meaning of the sentence, but still makes it different and interesting.

It’s better to work on deterministically defined variance than it is to weed out undesired results from nondeterministic code.

As an example he talked about WaveNet (a Google DeepMind project), whose input and output are both raw audio. This approach works well for improving speech synthesis, but less well for music generation as there is no large-scale structural awareness.

Cole then talked about Amper, a web site that lets users create music with no experience required: fast, believable, collaborative

They like a mix of top-down and bottom-up approaches:

  1. Want speed, but neural nets are slow
  2. Music has a lot of theory behind it, so it’s best to let the programmers code these rules
  3. Can change different levels of the hierarchical structure within music: style, mood, can also adjust specific bars

The runtime is written in Haskell – a functional language, so it’s great for music

posted in:  AI, Big data, data analysis, Data science, NewYorkAI, Programming

#Genomic analysis and #BigData using #FPGA’s

Posted on November 17th, 2016

#BigDataGenomicsNYC

11/17/2016 @ Phosphorus, 1140 Broadway, NY, 11th floor

Rami Mehio @Edico Genome spoke about the fast analysis of a human genome. They initially did secondary analysis, which is similar to telecommunications (errors in the channel), as errors come from the process due to repeats and mistakes in the sequencer.

Genomic data doubles every 7 months historically, but the computational speed to do the analysis lags, as Moore’s law has a doubling every 18 months. With standard CPUs, mapping takes 10 to 30 hours on a 24 core server. Quality control adds several hours.
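
The gap between the two doubling rates can be worked out directly. The sketch below uses only the 7-month and 18-month figures quoted above; the function names are illustrative.

```python
def growth_factor(months, doubling_months):
    """Growth after `months` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


def gap_factor(months, data_doubling=7.0, compute_doubling=18.0):
    """How much faster data has grown than compute after `months`."""
    return growth_factor(months, data_doubling) / growth_factor(months, compute_doubling)
```

After 126 months (10.5 years), data has doubled 18 times and compute 7 times, so `gap_factor(126)` is 2 to the power 11, or 2048.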

In addition, a human genome file is an 80GB FASTQ file. (This is only for a rough look at the genome at 30x, where 30x is the number of times the DNA is multiplied, i.e. the number of times the analysis is redone.)

Using FPGAs reduced the analysis time to 20 minutes. Also the files in CRAM compression are reduced to 50GB.

The server code is in C/C++. The FPGAs are not programmed, but their connectors are specified using the VITAL or VHDL languages.

HMM and Smith-Waterman algorithms require the bulk of the processing time, so both are implemented in the FPGAs. Another challenge is getting sufficient data to feed the FPGA, which means the software needs to run in parallel. Also, the FPGAs are configured so they can change the algorithm selectively to take advantage of what needs to be done at the time.
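
For readers unfamiliar with Smith-Waterman, here is a minimal software sketch of its local-alignment scoring recurrence. It is not Edico Genome's FPGA implementation; the scoring values (match, mismatch, linear gap penalty) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols          # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)  # 0 lets an alignment restart
            best = max(best, curr[j])
        prev = curr
    return best
```

Every cell depends only on its three neighbours, so the cells along each anti-diagonal can be computed in parallel, which is one reason this algorithm is a common candidate for hardware acceleration.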

posted in:  Big data, data, Genome, hardware

#InternetOfThings, #Drones, #Robots, and #Music

Posted on November 16th, 2016

#HardwiredNYC

11/16/2016 @WeWork, 115 West 18th Street, NY, 4th floor

The speakers were

Charlie Key @Losant talked about asset tracking: fleet management, shipment tracing, equipment tracking, heavy-duty parenting. The package often consists of two parts: GPS tracking + communication (usually cellular). Hologram allows purchase of data by the byte, with data sent every 5 minutes.

Use Google’s API to look up locations. Then check if the location is inside a geofence.
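
A geofence check can be sketched as follows, assuming a circular fence defined by a center and a radius. This does not call Google's API or reflect Losant's implementation; the function names and the fence shape are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(lat, lon, center_lat, center_lon, radius_m):
    """True if the GPS fix lies within radius_m of the fence center."""
    return haversine_m(lat, lon, center_lat, center_lon) <= radius_m
```

A tracker reporting every 5 minutes would run `inside_geofence` on each fix and raise an alert when the result changes.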

David Lyman @BetterView captures and analyzes drone data. They have analyzed 4200 rooftops for insurance companies. Experts currently analyze the images. They are moving toward deep learning. The main drivers of the increased use of drones are regulation, hardware, and experience.

He sees a longer-term opportunity: 5 million workers who should have a drone in their trucks – fence installers, HVAC maintenance.

Vaughn @Temboo: SAAS to connect actuators and sensors to the cloud, gave several examples of IoT in industry:

  1. Monginis Foods Ltd. – cakes and pastry in India, UK, EMEA: retrofit equipment and processes to implement IoT. Examples include
    1. retrofitting x-ray machines that scan every cake and pastry – automate alerts.
    2. Monitor freezers and refrigerators to reduce food spoilage.
    3. Place temperature sensors as oven monitors
    4. Integrate with payment and logistics systems to make everything more efficient.
  2. One customer monitors soil moisture, electrical conductivity, light – in agriculture
  3. Aircraft repair company – monitor parts storage and temperature and humidity of storage for audit. Tracks technical manuals.
  4. Manufacturer of lawn mowers includes sensors in motors

The usual configuration is Sensor monitoring – triggered notifications — actuator control. Vaughn gave the following advice to IoT startups:

  1. Start with a small but real, concrete problem
  2. Focus on saving time or money to create real value at the start
  3. Quick wins help build confidence and expertise
  4. Get internal backing based on having a working system
  5. See how the data and functionality create additional uses
  6. See how existing applications can be modified for other users
  7. Build new IoT capabilities on top of existing ones
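
The usual configuration mentioned above (sensor monitoring, triggered notification, actuator control) can be sketched as a single rule. This is a hypothetical example, not Temboo's API: the freezer threshold, the callback names, and the `cooling_on` command are all assumptions.

```python
def evaluate_reading(temp_c, max_temp_c, notify, actuate):
    """Check one sensor reading; notify and actuate if it is over the limit."""
    if temp_c > max_temp_c:
        notify(f"Temperature {temp_c:.1f} C exceeds limit {max_temp_c:.1f} C")
        actuate("cooling_on")  # hypothetical actuator command
        return True
    return False
```

Passing `notify` and `actuate` as plain callables keeps the rule independent of how alerts are sent or which device is controlled.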

Leif @RightHand Robotics (intelligent robotic order-picking systems) talked about opportunities he sees in the industrial robotics space.

Existing industrial ecosystem: build components + system integrators -> end application

Most of the cost is in integration, so he is looking for systems that are configurable by end users (simpler integration). Examples include Universal Robots (UR5), Rethink Robotics (Sawyer), and Franka, which produce collaborative robots that users can program.

He gave some examples of industrial robotic applications:

  1. Robots as a service – a machine that thins the small lettuce plants. Farmers can rent when they need it.
  2. Navii is used by Lowes to tell customers where to find items in inventory.

He sees the key as having machines learn to handle variation, since manual labor is hard to scale.

Finally, Roli demonstrated a music technology that increases the flexibility and capabilities of accomplished musicians while being easy enough for beginners to create their own music.

Their original device in 2012 replaced a keyboard with a continuous sensitive surface: the Seaboard. They are introducing a more general device (the Block) that has the flexibility to play the sounds of multiple instruments, but in a simple and elegant package.

posted in:  Drones, hardware, Hardwired NYC, Internet of Things, startup

Color on mobile phone ads, color preferences revealed, Programming and humor

Posted on October 10th, 2016

#CodeDrivenNYC

10/10/2016 @FirstMarkCapital, 100 5th ave, NY 3rd floor

Robert Haining @Paypal spoke about API theming of mobile apps: building software for retailers to sell outside their web site. He concentrates on iOS development. Theming involves color (e.g. the color of the buy button), image, and style.

Retailers configure their site using controls in a control panel. For example, the default is text, but companies can upload a logo. The information is stored in a CSS file. They translate the JSON descriptions for the Objective-C SDK.

They use Apple’s NSNotificationCenter to update whenever the page is refreshed. They locally cache themes, but download from the API when possible. For fonts, they only use embedded fonts that come with the phone, in preference to the Apple fonts.
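
The "cache locally, prefer the API" behavior can be sketched as below. The talk's SDK is written in Objective-C; this Python version only illustrates the fallback order, and every theme key and default value is hypothetical.

```python
import json

# Hypothetical defaults: plain text title until the retailer uploads a logo.
DEFAULT_THEME = {
    "buy_button_color": "#0070ba",
    "title": {"kind": "text"},
    "font": "system",
}


def resolve_theme(cached_json, fetched_json):
    """Start from defaults, then apply the freshest JSON theme available.

    fetched_json is the API response (None if the download failed);
    cached_json is the locally cached copy (None if nothing is cached).
    """
    theme = dict(DEFAULT_THEME)
    source = fetched_json if fetched_json is not None else cached_json
    if source is not None:
        theme.update(json.loads(source))
    return theme
```

With no network and no cache the app still renders with the defaults; a fetched theme always wins over a cached one.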

They initially show companies a reduced set of options.

They use OAuth for verification for that particular session.

Next, Arun Ranganathan @Pinterest spoke about their API. Emphasis on finding things you like (as opposed to explicitly searching for something).  Concentrates on platforms for companies.

Pinterest has its own internal APIs. It also has ad APIs (whitelisted to partners).

Finally, they have public development APIs. IFTTT allows an interaction with pins in Pinterest. The APIs are also used by the following (making use of a hex coding of the overall color of each picture):

  1. Topshop (UK retailer) used the pins to deduce your color preferences to market to you.
  2. Valspar (paint) uses the API to better understand the colors you would like for your house.
  3. Burberry created a custom board with unique pins.
  4. Tok&stok (Brazil furniture) allowed physical buttons to be pushed to remind you of your preferences (via Bluetooth LE) as you walk through a store.
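
One simple way to produce a hex coding of a picture's overall color is to average its pixels. How Pinterest actually computes the value was not described, so the averaging below is an assumption and the function name is illustrative.

```python
def average_hex(pixels):
    """Average a list of (r, g, b) tuples (0-255) into a '#rrggbb' string."""
    if not pixels:
        raise ValueError("need at least one pixel")
    n = len(pixels)
    r = sum(p[0] for p in pixels) // n  # integer mean per channel
    g = sum(p[1] for p in pixels) // n
    b = sum(p[2] for p in pixels) // n
    return f"#{r:02x}{g:02x}{b:02x}"
```

A retailer could then compare these hex values across a user's pins to estimate which colors that user prefers.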

Finally, Ben Halpern @Argo gave a highly entertaining presentation about becoming the practical dev. He applied humor to the dev life on twitter: @ThePracticalDev. He tweets on serious and humorous topics.

posted in:  Code Driven NYC

Listening to Customers as you develop, assembling a #genome, delivering food boxes

Posted on September 21st, 2016

#CodeDrivenNYC

09/21/2016 @FirstMark, 100 Fifth Ave, NY, 3rd floor

JJ Fliegelman @WayUp (formerly CampusJob) spoke about the development process used by their application which is the largest market for college students to find jobs. JJ talked about their development steps.

He emphasized the importance of speccing out ideas on what they should be building and of talking to your users.

They use tools to stay in touch with their customers

  1. HelpScout – see all support tickets. Get the vibe
  2. FullStory – DVR software – plays back video recordings of how users are using the software

They also put ideas in a repository using Trello.

To illustrate their process, he examined how they work to improve job search relevance.

They look at Impact per unit Effort to measure the value. They do this across new features over time. Can prioritize and get multiple estimates. It’s a probabilistic measure.

Assessing impact – are people dropping off? Do people click on it? What are the complaints? They talk to experts using cold emails. They also cultivate a culture of educated guesses

Assess effort – get it wrong often and get better over time

They prioritize impact/effort with the least technical debt
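
The impact-per-unit-effort ranking can be sketched as a simple sort. The field names, the numeric scales, and the use of technical debt as a tie-breaker are assumptions made for illustration, not WayUp's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: float            # estimated benefit, any consistent unit
    effort: float            # estimated cost, must be positive
    tech_debt: float = 0.0   # estimated debt added, lower is better


def prioritize(ideas):
    """Sort by impact/effort (highest first), breaking ties by least tech debt."""
    return sorted(ideas, key=lambda i: (-(i.impact / i.effort), i.tech_debt))
```

Since both estimates are educated guesses, the ranking is best treated as a rough ordering that gets revisited as estimates improve.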

They Spec & Build – (product, architecture, kickoff) to get organized

Clubhouse is their project tracker: readable by humans

Architecture spec to solve today’s problem, but look ahead. E.g., the initial architecture used WordNet and Elasticsearch, but they found that Elasticsearch was too slow, so they moved to a graph database.

Build – build as little as possible; prototype; adjust your plan

Deploy – they will deploy things that are not worse (e.g. a button that doesn’t work yet)

They do code reviews to avoid deploying bad code

Paul Fisher @Phosphorus (from Recombine – formerly focused on the fertility space: carrier-screening. Now emphasize diagnostic DNA sequencing) talked about the processes they use to analyze DNA sequences. With the rapid development of laboratory technique, it’s a computer science question now. Use Scala, Ruby, Java.

Sequencers produce hundreds of short reads of 50 to 150 base pairs. They use a reference genome to align the reads. Want multiple reads (depth of reads) to create a consensus sequence
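
The idea of building a consensus sequence from overlapping reads can be sketched as a per-position majority vote. This toy version assumes the reads are already aligned to reference positions and ignores insertions, deletions, and base quality scores; the function name and `min_depth` parameter are illustrative.

```python
from collections import Counter


def consensus(reference_length, aligned_reads, min_depth=2):
    """Majority-vote consensus from (start_position, read_string) pairs.

    Positions covered by fewer than min_depth reads are reported as 'N'.
    """
    columns = [Counter() for _ in range(reference_length)]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < reference_length:
                columns[pos][base] += 1
    out = []
    for col in columns:
        depth = sum(col.values())  # read depth at this position
        out.append(col.most_common(1)[0][0] if depth >= min_depth else "N")
    return "".join(out)
```

Greater read depth gives each vote more support, which is why focusing sequencing on particular regions can improve confidence there.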

To lower cost and speed their analysis, they focus on particular areas to maximize their read depth.

They use a variant viewer to understand variants between the person’s and the reference genome:

  1. SNPs – one base is changed – degree of pathogenicity varies
  2. Indels – insertions & deletions
  3. CNVs – copy number variations
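
The first two variant types can be told apart from the reference and alternate alleles alone, in the style of a VCF record's REF and ALT fields. The sketch below is a simplification with a hypothetical function name; CNVs need read-depth information and are not handled.

```python
def classify_variant(ref, alt):
    """Classify a variant from its reference and alternate allele strings."""
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"        # one base is changed
    if len(alt) > len(ref):
        return "insertion"  # indel that adds bases
    if len(alt) < len(ref):
        return "deletion"   # indel that removes bases
    return "other"          # identical alleles or multi-base substitution
```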

They use several different file formats: FASTQ, BAM/SAM, VCF

Current methods have evolved to use Spark, Parquet (columnar storage format), and ADAM (uses the Avro framework for nested collections)

They use Zeppelin to share documentation: documentation that you can run.

Finally, Andrew Hogue @BlueApron spoke about the challenges he faces as the CTO. These include

Demand forecasting – use machine learning (random forest) to predict per user what they will order. Holidays are hard to predict. People order less lamb and avoid catfish. There was also a dip in orders and orders with meat during Lent.

Fulfillment – more than just inventory management since recipes change, food safety, weather, …

Subscription mechanics – weekly engagement with users. So opportunities to deepen engagement. Frequent communications can drive engagement or churn. A/B experiments need more time to run

BlueApron runs 3 fulfillment centers for their weekly food deliveries (NJ, Texas, CA), shipping 8 million boxes per month.

posted in:  applications, Big data, Code Driven NYC, data, data analysis, startup