New York Tech Journal
Tech news from the Big Apple

How to build a #MixedReality experience for #Hololens

Posted on April 14th, 2017

#NYSoftwareEngineers, @NYSE

4/14/2017 @MicrosoftReactorAtGrandCentral, 335 Madison Ave, NY, 4th floor

Mike Pell and John gave a roadmap for generating #MixedReality content. They started with general rules for creating content and showed how these rules apply to building MR content.

  1. Know your audience
    1. Role of emotion in design – we want to believe in what is shown in a hologram.
    2. Think situation – where am I? At home you are comfortable doing certain things, but in public there are different needs and different things you are comfortable doing.
    3. Think spatially – it is different if you can walk around the object.
    4. Think inclusive – widen your audience.
  2. Know your medium
    1. For now you look ridiculous wearing a VR headset – but maybe this eventually becomes like a welder’s shield, which you wear when you are doing something specialized.
    2. Breakthrough experience – stagecraft – so one can see what the HoloLens user is seeing.
  3. Know your palette

Interactive Story Design – a fast way to generate MR content

  1. Character
    1. Who is your “spect-actor” (normally someone who observes)? Have a sense of who the individual is for this moment – avoid blind spots, so pick a specific person.
    2. Who are your “interactors”? They will change as a result of the interaction – they can be objects, text, or people.
    3. This creates a story
  2. Location – design depends on where this occurs
  3. Journey – how does participant change

How to bring the idea to life: how to develop the script for the MR experience

3-step micro sprints – 3- to 6-minute segments – so you don’t get attached to something that doesn’t work. Set a 1- to 2-minute time limit for each step.

  1. Parameters – limited resources help creative development
    1. Personify everything – even text has a POV, feelings, etc.
    2. 3 emotional responses – what is the emotional response of a chair when you sit in it?
      1. Positive
      2. Negative
      3. Neutral
    3. 3 conduits
      1. Language
      2. Facial expression – everything has a face, including interfaces and objects
  2. Playtest – do something with it
    1. 3 perspectives
      1. Participant
      2. Interactors – changes in personality over time
  3. PMI – an evaluative process – write on index cards (not as a feedback session) so everyone shares their perspective. Next, loop back to the parameters (step 1)
    1. Plus – this is interesting
    2. Minus – this is weak
    3. Interesting – neither of the above: “this is interesting”

How to envision and go fast:

  1. Filming on location – randomly take pictures – look for things that speak to you as creating an interesting experience.
  2. Understand the experience – look at the people (i.e. people viewing art)
  3. Visualize it – put people into the scene (vector silhouettes in different poses); put artwork into the scene along with the viewers.
  4. Build a prototype using Unity. Put on the HoloLens and see how it feels.

They then went through an example session in which a child is inside looking at a T-Rex on the MoMA outdoor patio. The first building block was getting three emotional responses for the T-Rex:

  1. Positive – joy looking at a potential meal: the child
  2. Negative – too bad the glass barrier is here
  3. Neutral – let me look around to see what is around me

To see where we should be going, look at what children want to do with the technology

posted in:  Animation, Art, UI, video

Beyond Big: Merging Streaming & #Database Ops into a Next-Gen #BigData Platform

Posted on April 13th, 2017

#SQLNYC

04/13/2017 @Thoughtworks, 99 Madison Ave, New York, 15th floor

Amir Halfon, VP of Strategic Solutions @iguazio, talked about methods for speeding up analytics linked to a large database. He started by saying that a traditional software stack accessing a db was designed to minimize the time taken to access slow disk storage. This resulted in layers of software. Amir said that with modern data access and db architecture, processing is accelerated by a unified data engine that eliminates many of the layers. This also allows for generic access to data stored in many different formats and a record-by-record security protocol.

To simplify development they use only AWS and interface only with Kafka, Hadoop, and Spark. They do not do virtualization (which eventually reaches a speed limit); they implement the actual data store.

Another important method is “predicate pushdown” (as in ‘select … where <predicate>’): usually all data are retrieved and then culled; if the predicate is pushed down instead, only the relevant data are retrieved. This is also known as an “offload engine”.
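
iguazio’s engine itself was not shown; as a rough sketch of the pushdown idea, here is how it surfaces in Spark (one of the interfaces Amir mentioned), with a hypothetical Parquet path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pushdown-demo").getOrCreate()

    # Reading Parquet, Spark pushes the WHERE predicate down to the file scan,
    # so only matching data crosses the storage interface.
    trades = spark.read.parquet("s3://bucket/trades.parquet")  # hypothetical path
    recent = trades.filter(trades.trade_date > "2017-01-01")

    # The physical plan lists the predicate under "PushedFilters",
    # confirming it runs at the scan rather than after retrieval.
    recent.explain()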

MapR is a competitor using the HDFS database, as opposed to rebuilding the system from scratch.

posted in:  Big data, databases

#Self-learned relevancy with Apache Solr

Posted on March 31st, 2017

#NYCapache

03/30/2017 @Architizer, 1 Whitehall Street, New York, NY, 10th Floor

Trey Grainger @Lucidworks covered a wide range of topics involving search.

He first reviewed the concept of an inverted index in which terms are extracted from documents and placed in an index which points back to the documents. This allows for fast searches of single terms or combinations of terms.
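
A minimal sketch of an inverted index (illustrative Python, not Solr’s actual implementation):

    from collections import defaultdict

    docs = {
        1: "apache solr powers search",
        2: "solr relevancy scoring",
        3: "apache kafka streams data",
    }

    # Invert: map each term to the set of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)

    print(index["solr"])                    # {1, 2}: single-term lookup
    print(index["apache"] & index["solr"])  # {1}: an AND query is a set intersection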

Next, Trey covered classic relevancy scoring, emphasizing

tf-idf = (how well a term describes the document) × (how important the term is overall)

He noted, however, that tf-idf’s value may be limited, since it does not make use of domain-specific knowledge.
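
One common formulation of the score (there are several variants; this sketch is illustrative, not Solr’s exact default):

    import math

    def tf_idf(term, doc, corpus):
        tf = doc.count(term) / len(doc)           # how well the term describes this doc
        df = sum(1 for d in corpus if term in d)  # how many docs contain the term
        idf = math.log(len(corpus) / (1 + df))    # how distinctive the term is overall
        return tf * idf

    corpus = [d.split() for d in
              ("apache solr powers search", "solr relevancy scoring", "kafka streams data")]
    print(tf_idf("solr", corpus[0], corpus))    # common term -> score of 0.0 here
    print(tf_idf("search", corpus[0], corpus))  # rarer term -> higher score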

Trey then talked about reflected intelligence = self-learning search, which uses

  1. Content
  2. Collaboration – how have others interacted with the system
  3. Context – information about the user

He said this method increases relevance by boosting items that are highly requested by others. Since the items boosted are those currently relevant to others, this allows the method to adapt quickly without need for manual curation of items.
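
The talk did not include code for this; as a toy sketch of the boosting idea, blend each item’s content-based score with a crowd signal (the function and weights here are made up for illustration):

    def boosted_score(base_score, item_clicks, total_clicks, weight=0.3):
        # Items that others currently interact with get a boost, which
        # adapts as click counts shift, with no manual curation.
        popularity = item_clicks / total_clicks if total_clicks else 0.0
        return (1 - weight) * base_score + weight * popularity

    print(boosted_score(0.8, item_clicks=40, total_clicks=100))  # 0.68
    print(boosted_score(0.8, item_clicks=5, total_clicks=100))   # 0.575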

Next he talked about semantic search, which uses an understanding of the terms in the domain.

(Solr can connect to an RDF database to leverage an ontology). For instance, one can run word2vec to extract terms and phrases for a query and then determine a set of keywords/phrases to best match the query to the contents of the db.
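
As a rough illustration of the word2vec step (assuming the gensim library, version 4.x API; the corpus here is invented):

    from gensim.models import Word2Vec

    # A real system would train on the document collection itself.
    sentences = [
        ["solr", "search", "relevancy"],
        ["semantic", "search", "ontology"],
        ["solr", "semantic", "knowledge", "graph"],
    ] * 100  # repeat so the tiny model has enough examples

    model = Word2Vec(sentences, vector_size=32, window=3, min_count=1, epochs=20)

    # Terms with nearby vectors are candidates for expanding the query.
    print(model.wv.most_similar("search", topn=3))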

Also, querying a semantic knowledge graph can expand the search by traversing to other relevant terms in the db

posted in:  Big data, databases, Open source

#Web and #Mobile #Development Panel

Posted on March 2nd, 2017

#StrtupBoost

03/02/2017 @TheYard, 106 West 32nd Street, NY, 2nd floor

A panel consisting of the following people spoke about setting up your startup.

Moderator, Founder & CEO of StrtupBoost + Sportswonks, Jason Malki

Founder & CEO, Torops, Konstantine Sukherman

Founder & President, Mango Concept, Michael Daniels

Founder & Creative Director, Awesome, Firat Parlak

Managing Partner, New Logic Technology, Alex Sokoletsky

Founder & CEO, bromin7, Sergey Belov

 

Recommended platforms for creating an MVP website? WordPress, but it depends on the client, how fast you want to get up and running, and who the clients are. Also, some funding shops have custom platforms. Drupal & WordPress are good alternatives. A couple of days are all that is needed to create an MVP. The platform depends on the product. WordPress is not a fully scalable application; invest in custom development if your product needs to scale to be successful. 70% of the web is WordPress. 1 million visits/day is often the point when scalability becomes a problem.

When does UI/UX become important? You should define the UX before you build. You will then do a better job of predicting your costs and features along with the time frame.  It’s the most important part of the startup. A prototype will make your pitch easier.

How do you build a dev team? Web sites: Dribbble, Behance, WorkingNotWorking. Initially reach out to your network. For a founder it’s different – what skill set are you looking for?

How do you hire a CTO? An alternative is to outsource or get a technical advisor (a few hours/week). One of the co-founders is better if they have a technical background. If you are just starting, you will need to offer the CTO a lot of equity; if you need a technical co-founder, you might offer equal or even more equity. At a later stage, you will need to give less equity. Also, the CTO might be good technically but need not know all the areas of dev. Get people excited.

An internal team will give you greater control, but partnering might be most cost-efficient. You want to build a long-term relationship – the outside team must be interested in the product. In-house developers must have equity. Don’t squeeze devs too much – it’s about building a relationship. Everyone should be happy. Good devs are hard to find and cost > $100k/year. Get it out in the market as quickly as possible. You need proof of concept in the market. Get people in house to manage the outsourced developers. 40% in-house and 60% outside can be a good mix. Find a senior designer to start (don’t leave it to a junior designer).

How do you choose a dev shop? It needs to be more than a dev shop – you need business analysts. Startups have great ideas, but need a partner to help on strategy. The shop needs technical knowledge but should also have interest in the field. Personalities need to match. It is a good idea to keep some people who built the project even when you are able to hire an in-house team. You need to build a communication channel. Select a shop or build an off-shore team – it depends on whether you need senior staff. Interview the developers who will be working on the project. You can scale faster if you get the right partner.

It is okay to start without understanding the code, but you need to get someone who will take over the task. You should talk to your developers 2x-3x/day; this will help you generate ideas. Every good dev team will give you an estimate, but it’s just an estimate. Don’t try to push your agenda on your developers – if it takes longer, they are trying to make the product better. Get out as quickly as possible if the dev team is not producing or communicating.

Will an angel invest in a company which is using a dev shop? A VC at some point will ask you to build an internal team. If a large amount is being raised, you may be asked to put together an in-house team; otherwise, you may do either. Investors are looking for a good idea! Be clear to the dev shop about the amount of money that is available for dev.

The technology stack is key if you are developing in house. If the CTO is using an outside team, then it is sufficient for the CTO to have a computer science degree and a decent resume. You mainly need someone who can explain complex things to non-technical people. If AI is involved, the CTO should understand the core ideas of AI.

Never speak technical language to investors. Investors want to know the idea, revenue streams, the team.  Investors like to know how product will function, so they want to know on-boarding and scaling, but they don’t need to know the dev stack. When you want to raise millions, investors will do a tech review so they know if the product will scale.

Usually need some type of MVP, but it can be small and cheap to develop. Raise money for this from friends and family.

Make sure you understand the scope – a detailed scope will help prevent dev overruns. If you need to go outside the scope, then cut back features. Break it down into multiple phases and emphasize the key features. Always have a 20% buffer. Design dictates the development, so lock in the design, since changes are expensive for developers.

Any good dev shop will have a flexible contract, so you will need to pay for design changes, but they will need to absorb charges if they misestimate the time. Also it depends on the amount of overage and the relationship.

You cannot really protect the idea; if you can, patent it, and you might be able to negotiate letting the shop share the tech in return for a lower price. Don’t be afraid that someone will steal your idea – executing is the difficult part.

 

posted in:  startup, technology, UI, UX

Intro to #DeepLearning using #PyTorch

Posted on February 21st, 2017

#ACM NY

02/21/2017 @ NYU Courant Institute (251 Mercer St, New York, NY)

Soumith Chintala @Facebook first talked about trends in the cutting edge of machine learning. His main point was that the world is moving from fixed agents to dynamic neural nets in which agents restructure themselves over time. Currently, the ML world is dominated by static datasets + static model structures which learn offline and do not change their structure without human intervention.

He then talked about PyTorch, which is the next generation of ML tools after Lua #Torch. In creating PyTorch they wanted to keep the best features of LuaTorch, such as performance and extensibility, while eliminating rigid containers and allowing for execution on multiple-GPU systems. PyTorch is also designed so programmers can create dynamic neural nets.
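
As a minimal sketch (not from the talk) of what “dynamic” means here: the network below changes its control flow on every forward pass, and PyTorch rebuilds the graph each time:

    import random
    import torch
    import torch.nn as nn

    class DynamicNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(16, 16)

        def forward(self, x):
            # Ordinary Python control flow: the depth varies per call,
            # and autograd still tracks whatever graph was built.
            for _ in range(random.randint(1, 4)):
                x = torch.relu(self.layer(x))
            return x

    net = DynamicNet()
    out = net(torch.randn(8, 16))
    out.sum().backward()  # gradients flow through the dynamically built graph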

Other features include

  1. Kernel fusion – take several operations and fuse them into a single kernel
  2. Order of execution – reorder objects for faster execution
  3. Automatic work placement when you have multiple GPUs

PyTorch is available for download on http://pytorch.org and was released Jan 18, 2017.

Currently, PyTorch runs only on Linux and OSX.

posted in:  ACM, data analysis, Programming, Python

#VideoStreaming, #webpack, #diagrams

Posted on January 18th, 2017

#CodeDrivenNYC

01/17/2017 @FirstMarkCapital, 100 Fifth Ave, NY 3rd floor

Tim Whidden, VP Engineering at 1stdibs: Webpack Before It Was Cool – Lessons Learned

Sarah Groff-Palermo, Designer and Developer: Label Goes Here: A Talk About Diagrams

Dave Yeu, VP Engineering at Livestream: A Primer to Video on the Web: Video Delivery & Its Challenges

Dave Yeu @livestream talked about some of the challenges of streaming large amounts of video and livestreaming: petabytes of storage, I/O, CPU, and latency (for live video).

Problems

  1. Long-lived connections – there are several solutions
    1. HLS (HTTP Live Streaming), which cuts video into small segments and uses HTTP as the delivery vehicle. Originally developed by Apple as a way to deliver video to the iPhone as its coverage moves from cell tower to cell tower. It uses the power of the HTTP protocol: a playlist & small chunks which are separate URLs – m3u8 files that point to the actual media files (an illustrative playlist appears after this list).
      1. But there are challenges – if you need 3 chunks in your buffer, then you have a 15-second delay (at 5 seconds per chunk). As you decrease the size of each chunk, the playlist gets longer, so you need to make more requests for the m3u8 file.
    2. DASH – segments follow a template which reduces index requests
    3. RTMP – persistent connections, extremely low latency, used by Facebook
  2. Authorization – they want to authorize viewers but prevent rebroadcast (no key, so not DRM).
    1. Move authentication to the cache level – use Varnish.
    2. Add a token to the playlist; Varnish vets the token and serves the content => everything comes through their API.
    3. But – you expand the scope of your app = cache + server.
  3. Geo-restrictions
    1. Could do this with IP address + restrictions, but in this case you need to put the geo-block behind the cache and server.
    2. Instead, the API generates a geo-block config; Varnish loads it in a memory map and checks it.
    3. If there is a geo violation, then Varnish returns a modified URL, so the server can decide how to respond.
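
To make the HLS mechanics concrete, here is an illustrative (made-up) m3u8 media playlist: three 5-second segments, each a separate URL. A client that buffers all three before playing sits roughly 15 seconds behind the live edge:

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:5
    #EXT-X-MEDIA-SEQUENCE:1024
    #EXTINF:5.000,
    segment1024.ts
    #EXTINF:5.000,
    segment1025.ts
    #EXTINF:5.000,
    segment1026.ts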

++

Tim Whidden @1stdibs, an online marketplace for curated goods – “eBay for rich people” – spoke about webpack, a front-end module system. He described how modules increase the reusability of functions and how webpack performs other functions like code compression.

++

Finally, Sarah Groff-Palermo @sarahgp.com spoke about how diagrams help her clarify the code she has written and provide documentation for her and others in the future.

She described a classification of learning types, from sequential learners (who like tutorials) to global learners (who like to see the big picture first) (see http://www4.ncsu.edu/unity/lockers/users/f/felder/public/ILSdir/styles.htm). Sarah showed several diagrams and pointed out how they help her get and keep the global picture. She especially likes the paradigm from Ben Shneiderman – overview first, zoom and filter, then details-on-demand.

For further ideas she recommended

  1. the book Going Forth – lots of diagrams
  2. Now You See It by Stephen Few
  3. FlowingData – the blog by Nathan Yau
  4. Keynote – a good tool for making diagrams

posted in:  applications, Code Driven NYC, video

NYAI#7: #DataScience to Operationalize #ML (Matthew Russell) & Computational #Creativity (Dr. Cole)

Posted on November 22nd, 2016

#NYAI

11/22/2016 Risk, 43 West 23rd Street, NY 2nd floor


Speaker 1: Using Data Science to Operationalize Machine Learning – (Matthew Russell, CTO at Digital Reasoning)

Speaker 2: Top-down vs. Bottom-up Computational Creativity  – (Dr. Cole D. Ingraham DMA, Lead Developer at Amper Music, Inc.)

Matthew Russell @DigitalReasoning spoke about understanding language using NLP, relationships among entities, and temporal relationships. For human language understanding, he views technologies such as knowledge graphs and document analysis as becoming commoditized. The only way to get an advantage is to improve the efficiency of using ML: the KPI for data analysis is the number of experiments (tests of a hypothesis) that can be run per unit time. The key is to use tools such as:

  1. Vagrant – allows a repeatable environment setup
  2. Jupyter Notebook – like a lab notebook
  3. Git – version control
  4. Automation

He wants highly repeatable experiments. The goal is to speed up the number of experiments that can be conducted per unit time.

He then talked about using machines to read medical reports and determine the issues. Negatives can be extracted, but issues are harder to find. They use an ontology to classify entities.

He talked about experiments on models using ontologies. The use of a fixed ontology depends on the content: the ontology of terms for anti-terrorism evolves and needs to be experimentally adjusted over time; a medical ontology is probably the most static.

In the second presentation, Cole D. Ingraham @Ampermusic talked about top-down vs bottom-up creativity in the composition of music. Music differs from other audio forms since it has a great deal of large-scale structure as well as smaller structure. ML does well at generating good audio on a small time frame, but Cole thinks it is better to apply theories from music to create the larger whole. This is a combination of

Top-down: novel & useful, rejects previous ideas – code driven, “hands on”, you define the structure

Bottom-up: data driven, “hands off”, you learn the structure

He then talked about music composition at the intersection of generation vs. analysis (of already-composed music) – you can do one without the other, or one before the other.

To successfully generate new and interesting music, one needs to generate variance. Composing music using a purely probabilistic approach is problematic, as there is a lack of structure. He likes an approach similar to replacing words with synonyms that do not fundamentally change the meaning of the sentence but still make it different and interesting.

It’s better to work on deterministically defined variance than it is to weed out undesired results from nondeterministic code.

As an example he talked about WaveNet (a Google DeepMind project), whose input and output are raw audio. This approach works well for improving speech synthesis, but less well for music generation, as there is no large-scale structural awareness.

Cole then talked about Amper, a web site that lets users create music with no experience required: fast, believable, collaborative.

They like a mix of top-down and bottom-up approaches:

  1. Want speed, but neural nets are slow
  2. Music has a lot of theory behind it, so it’s best to let the programmers code these rules
  3. Can change different levels of the hierarchical structure within music: style, mood, can also adjust specific bars

The runtime is written in Haskell – a functional language, so it’s great for music.

posted in:  AI, Big data, data analysis, Data science, NewYorkAI, Programming

#Genomic analysis and #BigData using #FPGAs

Posted on November 17th, 2016

#BigDataGenomicsNYC

11/17/2016 @Phosphorus, 1140 Broadway, NY, 11th floor


Rami Mehio @Edico Genome spoke about the fast analysis of a human genome. (They initially did secondary analysis, which is similar to telecommunications – errors in the channel – as errors come from the process due to repeats and mistakes in the sequencer.)

Genomic data has historically doubled every 7 months, but the computational speed to do the analysis lags, as Moore’s law has a doubling every 18 months. With standard CPUs, mapping takes 10 to 30 hours on a 24-core server. Quality control adds several hours.

In addition, a human genome file is an 80GB FASTQ file. (This is only for a rough look at the genome at 30x = the number of times the DNA is read = the number of times the analysis is redone.)

Using FPGAs reduced the analysis time to 20 minutes. Also, with CRAM compression the files are reduced to 50GB.

The server code is in C/C++. The FPGAs are not programmed, but their connectors are specified using the VITAL or VHDL languages.

The HMM and Smith-Waterman algorithms require the bulk of the processing time, so both are implemented in the FPGAs. Another challenge is to get sufficient data to feed the FPGA, which means the software needs to run in parallel. Also, the FPGAs are configured so they can change the algorithm selectively to take advantage of what needs to be done at the time.
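
For reference, a toy Python version of Smith-Waterman local-alignment scoring (the FPGA implementations are far more elaborate); it fills a dynamic-programming matrix and tracks the best local score:

    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
        H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        best = 0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                best = max(best, H[i][j])
        return best

    print(smith_waterman("GATTACA", "GCATGCT"))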

posted in:  Big data, data, Genome, hardware

Listening to Customers as you develop, assembling a #genome, delivering food boxes

Posted on September 21st, 2016

#CodeDrivenNYC

09/21/2016 @FirstMark, 100 Fifth Ave, NY, 3rd floor


JJ Fliegelman @WayUp (formerly CampusJob) spoke about the development process used by their application, which is the largest marketplace for college students to find jobs. JJ talked about their development steps.

He emphasized the importance of speccing out ideas on what they should be building and talking to their users.

They use tools to stay in touch with their customers:

  1. HelpScout – see all support tickets. Get the vibe
  2. FullStory – DVR software – plays back video recordings of how users are using the software

They also put ideas in a repository using Trello.

To illustrate their process, he examined how they worked to improve job-search relevance.

They look at Impact per unit Effort to measure the value. They do this across new features over time. Can prioritize and get multiple estimates. It’s a probabilistic measure.

Assessing impact – are people dropping off? Do people click on it? What are the complaints? They talk to experts using cold emails. They also cultivate a culture of educated guesses

Assess effort – get it wrong often and get better over time

They prioritize impact/effort with the least technical debt

They spec & build (product, architecture, kickoff) to get organized.

They use Clubhouse as their project tracker: readable by humans.

The architecture spec solves today’s problem, but looks ahead. E.g., the initial architecture used WordNet and Elasticsearch, but they found that Elasticsearch was too slow, so they moved to a graph database.

Build – build as little as possible; prototype; adjust your plan

Deploy – they will deploy things that are not worse (e.g. a button that doesn’t work yet)

They do code reviews to avoid deploying bad code

Paul Fisher @Phosphorus (from Recombine – formerly focused on the fertility space: carrier screening; now emphasizing diagnostic DNA sequencing) talked about the processes they use to analyze DNA sequences. With the rapid development of laboratory techniques, it’s a computer science question now. They use Scala, Ruby, and Java.

Sequencers produce hundreds of short reads of 50 to 150 base pairs. They use a reference genome to align the reads. They want multiple reads (read depth) to create a consensus sequence.

To lower cost and speed their analysis, they focus on particular areas to maximize their read depth.

They use a variant viewer to understand variants between the person’s and the reference genome:

  1. SNPs – one base is changed – degree of pathogenicity varies
  2. Indels – insertions & deletions
  3. CNVs – copy-number variations

They use several different file formats: FASTQ, BAM/SAM, VCF.

Current methods have evolved to use Spark, Parquet (a columnar storage format), and ADAM (which uses the Avro framework for nested collections).

They use Zeppelin to share documentation: documentation that you can run.

Finally, Andrew Hogue @BlueApron spoke about the challenges he faces as the CTO. These include

Demand forecasting – they use machine learning (random forests) to predict, per user, what they will order. Holidays are hard to predict. People order less lamb and avoid catfish. There was also a dip in orders, and in orders with meat, during Lent.
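
Blue Apron’s actual model was not shown; a minimal scikit-learn sketch of the approach, with synthetic data standing in for real per-user features:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)

    # Hypothetical per-user, per-week features, e.g. past 4-week order count,
    # week of year, holiday flag, weeks subscribed.
    X = rng.random((5000, 4))
    y = rng.integers(0, 3, 5000)  # boxes ordered (synthetic labels)

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, y)
    print(model.predict(X[:3]))  # forecast next week's orders for three users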

Fulfillment – more than just inventory management since recipes change, food safety, weather, …

Subscription mechanics – weekly engagement with users. So opportunities to deepen engagement. Frequent communications can drive engagement or churn. A/B experiments need more time to run

Blue Apron runs 3 fulfillment centers for their weekly food deliveries (NJ, Texas, CA), shipping 8 million boxes per month.

posted in:  applications, Big data, Code Driven NYC, data, data analysis, startup

DataDrivenNYC: bringing the power of #DataAnalysis to ordinary users, #marketers, #analysts.

Posted on June 18th, 2016

#DataDrivenNYC

06/13/2016 @AXA Equitable Center (787 7th Avenue, New York, NY 10019)


The four speakers were

Adam @NarrativeScience talked about how people with different personalities and jobs may require/prefer different takes on the same data. His firm ingests data and has systems to generate natural language reports customized to the subject area and the reader’s needs.

They currently develop stories with the guidance of experts, but eventually will move to machine learning to automate new subject areas.

Next, Neha @Confluent talked about how they created Apache Kafka: a streaming platform which collects data and allows access to these data in real time.
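
For a flavor of Kafka’s produce/consume model (not covered at this level of detail in the talk), a minimal sketch with the confluent-kafka Python client, assuming a broker at localhost:9092 and a topic named "events":

    from confluent_kafka import Consumer, Producer

    # Produce: append a record to the "events" topic.
    producer = Producer({"bootstrap.servers": "localhost:9092"})
    producer.produce("events", key="user-42", value="page_view")
    producer.flush()

    # Consume: read the stream back in (near) real time.
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "demo",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["events"])
    msg = consumer.poll(5.0)
    if msg is not None and msg.error() is None:
        print(msg.key(), msg.value())
    consumer.close()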

Read more…

posted in:  data, data analysis, Data Driven NYC, Data science, databases, Open source