New York Tech Journal
Tech news from the Big Apple

#VideoStreaming, #webpack, #diagrams

Posted on January 18th, 2017

#CodeDrivenNYC

01/17/2017 @FirstMarkCapital, 100 Fifth Ave, NY 3rd floor

Tim Whidden, VP Engineering at 1stdibs: Webpack Before It Was Cool – Lessons Learned

Sarah Groff-Palermo, Designer and Developer: Label Goes Here: A Talk About Diagrams

Dave Yeu, VP Engineering at Livestream: A Primer to Video on the Web: Video Delivery & Its Challenges

Dave Yeu @livestream talked about some of the challenges of streaming large amounts of video and of livestreaming: petabytes of storage, I/O, CPU, and latency (for live video).

Problems

  1. Long-lived connections – there are several solutions
    1. HLS (HTTP Live Streaming), which cuts video into small segments and uses HTTP as the delivery vehicle. Originally developed by Apple as a way to deliver video to iPhones as their coverage moves from cell tower to cell tower. It uses the power of the HTTP protocol: a playlist plus small chunks that are separate URLs. The m3u8 files point to the actual files.
      1. But there are challenges – if you need 3 chunks in your buffer, then you have a 15 second delay (for example, with 5-second chunks). As you decrease the size of each chunk, the playlist gets longer, so you need to make more requests for the m3u8 file.
    2. DASH – segments follow a template which reduces index requests
    3. RTMP – persistent connections, extremely low latency, used by Facebook
  2. Authorization – they want to control who can watch and don’t want viewers to rebroadcast (there is no key, so this is not DRM).
    1. Move authentication to the cache level – use Varnish.
    2. Add a token to the playlist. Varnish vets the token and serves the content, so everything comes through their API.
    3. But – you expand the scope of your app to cache + server.
  3. Geo-restrictions
    1. Could do this with IP address + restrictions. But in this case you need to put the geo-block behind the cache and server.
    2. Instead, the API generates a geo-block config. Varnish loads it into a memory map and checks each request against it.
    3. If there is a geo violation, then Varnish returns a modified URL, so the server can decide how to respond.
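The HLS trade-off in item 1 above can be put into numbers. The following is a minimal sketch, not Livestream's code: the function names and the segment file names are made up for illustration, and it assumes a player that buffers whole segments and re-fetches the live playlist roughly once per new segment.

```python
import math


def buffer_delay_seconds(segment_seconds, buffered_segments):
    """Minimum delay behind live when the player buffers N whole segments."""
    return segment_seconds * buffered_segments


def playlist_requests_per_minute(segment_seconds):
    """Assumption: a live player re-fetches the .m3u8 about once per new segment."""
    return 60.0 / segment_seconds


def build_media_playlist(segment_urls, segment_seconds, first_sequence=0):
    """Render a minimal live HLS media playlist (.m3u8) as text."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        # Target duration is an integer upper bound on the segment length.
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_seconds)}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for url in segment_urls:
        lines.append(f"#EXTINF:{segment_seconds:.3f},")
        lines.append(url)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical segment names, for illustration only.
    print(build_media_playlist(["seg100.ts", "seg101.ts", "seg102.ts"], 5, 100))
    for seconds in (5, 2, 1):
        print(seconds, "s segments:",
              buffer_delay_seconds(seconds, 3), "s delay,",
              playlist_requests_per_minute(seconds), "playlist requests/min")
```

Shrinking the segments from 5 seconds to 1 second cuts the three-segment delay from 15 seconds to 3, but multiplies the playlist requests per viewer by five.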

++

Tim Whidden @1stdibs, an online marketplace for curated goods (“eBay for rich people”), spoke about Webpack, a front-end module system. He described how modules increase the usability of functions and how Webpack performs other functions such as code compression.

++

Finally, Sarah Groff-Palermo @sarahgp.com spoke about how diagrams help her clarify the code she has written and provide documentation for her and others in the future.

She described a classification of learning types from sequential learners (who like tutorials) to global learners (who like to see the big picture first) (see http://www4.ncsu.edu/unity/lockers/users/f/felder/public/ILSdir/styles.htm). Sarah showed several diagrams and pointed out how they help her get and keep the global picture. She especially likes the paradigm from Ben Shneiderman – overview first, zoom and filter, then details-on-demand.

For further ideas she recommended

  1. the book Going Forth – lots of diagrams
  2. Now You See It by Stephen Few
  3. FlowingData – blog by Nathan Yau
  4. Keynote is a good tool to use for diagrams

posted in: applications, Code Driven NYC, video

Color on mobile phone ads, color preferences revealed, Programming and humor

Posted on October 10th, 2016

#CodeDrivenNYC

10/10/2016 @FirstMarkCapital, 100 5th ave, NY 3rd floor


Robert Haining @Paypal spoke about API theming of mobile apps: building software for retailers to sell outside their web site. He concentrates on iOS development. Theming involves color (e.g. the color of the buy button), images, and style.

They configure the user’s site using controls in their control panel. For example, they default to text, but companies can upload a logo. The information is stored in a CSS file. They translate the JSON descriptions to the Objective-C SDK.

They use Apple’s NSNotificationCenter to update whenever the page is refreshed. They locally cache themes, but download from the API when possible. For fonts, they only use embedded fonts that come with the phone, in preference to the Apple fonts.

They initially show companies a reduced set of options.

They use OAuth for verification for that particular session.

Next, Arun Ranganathan @Pinterest spoke about their API. Emphasis on finding things you like (as opposed to explicitly searching for something).  Concentrates on platforms for companies.

Pinterest has its own internal APIs. It also has ad APIs (whitelisted to partners).

Finally, they have public developer APIs. IFTTT allows an interaction with pins in Pinterest. The APIs are also used by the following (making use of a hex coding of the overall color of each picture):

  1. Topshop (UK retailer) used the pins to deduce your color preferences to market to you.
  2. Valspar (paint) uses the API to better understand the colors you would like for your house.
  3. Burberry created a custom board with unique pins.
  4. Tok&stok (Brazil furniture) allowed physical buttons to be pushed to remind you of your preferences (via Bluetooth LE) as you walk through a store.
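The talk did not say how the hex code for a picture's overall color is derived. As a rough sketch, assuming the overall color is simply the per-channel mean of the pixels (the function names below are made up for illustration), the idea of turning pins into a color preference could look like this:

```python
from collections import Counter


def average_hex_color(pixels):
    """Collapse (r, g, b) pixels with 0-255 channels into one hex color.

    Assumption: the overall color is the per-channel mean.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("need at least one pixel")
    n = len(pixels)
    r, g, b = (sum(p[i] for p in pixels) // n for i in range(3))
    return f"#{r:02x}{g:02x}{b:02x}"


def most_common_color(pin_colors):
    """Pick the hex color that appears most often among a user's pins."""
    return Counter(pin_colors).most_common(1)[0][0]


if __name__ == "__main__":
    picture = [(10, 20, 30), (30, 40, 50)]
    print(average_hex_color(picture))  # -> #141e28
    print(most_common_color(["#141e28", "#ff0000", "#141e28"]))  # -> #141e28
```

A retailer would more likely bucket similar shades together before counting, since exact hex values rarely repeat.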

Finally, Ben Halpern @Argo gave a highly entertaining presentation about becoming the practical dev. He applied humor to the dev life on twitter: @ThePracticalDev. He tweets on serious and humorous topics.

posted in: Code Driven NYC

Listening to Customers as you develop, assembling a #genome, delivering food boxes

Posted on September 21st, 2016

#CodeDrivenNYC

09/21/2016 @FirstMark, 100 Fifth Ave, NY, 3rd floor


JJ Fliegelman @WayUp (formerly CampusJob) spoke about the development process used by their application which is the largest market for college students to find jobs. JJ talked about their development steps.

He emphasized the importance of speccing out ideas on what they should be building and of talking to your users.

They use tools to stay in touch with their customers:

  1. HelpScout – see all support tickets. Get the vibe
  2. FullStory – DVR software – plays back video recordings of how users are using the software

They also put ideas in a repository using Trello.

To illustrate their process, he examined how they worked to improve job search relevance.

They look at Impact per unit Effort to measure the value. They do this across new features over time. Can prioritize and get multiple estimates. It’s a probabilistic measure.

Assessing impact – are people dropping off? Do people click on it? What are the complaints? They talk to experts using cold emails. They also cultivate a culture of educated guesses

Assess effort – get it wrong often and get better over time

They prioritize impact/effort with the least technical debt
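The impact-per-unit-effort ranking described above can be sketched in a few lines. This is an illustration only, not WayUp's tooling: the `Idea` class, the idea names, and all the numbers are invented, and averaging several estimates is just one simple way to combine them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Idea:
    name: str
    impact_estimates: list  # hypothetical unit, e.g. extra applications per week
    effort_estimates: list  # hypothetical unit, e.g. engineer-days

    @property
    def score(self):
        """Impact per unit effort, using the mean of the collected estimates."""
        return mean(self.impact_estimates) / mean(self.effort_estimates)


def prioritize(ideas):
    """Highest impact per unit effort first."""
    return sorted(ideas, key=lambda idea: idea.score, reverse=True)


if __name__ == "__main__":
    backlog = [
        Idea("synonym expansion", [40, 60], [5, 5]),
        Idea("full search rewrite", [200, 100], [60, 40]),
        Idea("fix typo handling", [10, 20], [1, 1]),
    ]
    for idea in prioritize(backlog):
        print(f"{idea.name}: {idea.score:.1f}")
```

Because both inputs are educated guesses, the score is best read as a rough ordering to be revisited as estimates improve.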

They Spec & Build – (product, architecture, kickoff) to get organized

Clubhouse is their project tracker: readable by humans

Architecture spec – solve today’s problem, but look ahead. E.g. the initial architecture used WordNet and Elasticsearch, but they found that Elasticsearch was too slow, so they moved to a graph database.

Build – build as little as possible; prototype; adjust your plan

Deploy – they will deploy things that are not worse (e.g. a button that doesn’t work yet)

They do code reviews to avoid deploying bad code

Paul Fisher @Phosphorus (from Recombine, which formerly focused on carrier screening in the fertility space and now emphasizes diagnostic DNA sequencing) talked about the processes they use to analyze DNA sequences. With the rapid development of laboratory techniques, the analysis is now a computer science question. They use Scala, Ruby, and Java.

Sequencers produce hundreds of short reads of 50 to 150 base pairs. They use a reference genome to align the reads. Want multiple reads (depth of reads) to create a consensus sequence
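To make "depth of reads" and "consensus sequence" concrete, here is a toy sketch. It assumes the reads have already been aligned, so each one arrives with its start position on the reference, and it uses a plain majority vote per position. Production pipelines use dedicated aligners and variant callers; the function and parameter names here are illustrative.

```python
from collections import Counter


def consensus(reference_length, aligned_reads, min_depth=2):
    """Majority-vote consensus from aligned reads.

    aligned_reads: list of (start, sequence), with a 0-based start on the reference.
    Positions covered by fewer than min_depth reads are reported as 'N'.
    """
    piles = [Counter() for _ in range(reference_length)]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < reference_length:
                piles[position][base] += 1

    called = []
    for pile in piles:
        depth = sum(pile.values())
        if depth < min_depth:
            called.append("N")  # not enough reads to trust a call
        else:
            called.append(pile.most_common(1)[0][0])
    return "".join(called)


if __name__ == "__main__":
    reads = [(0, "ACGT"), (2, "GTAC"), (1, "CGTA"), (0, "ACCT")]
    print(consensus(6, reads))  # -> ACGTAN
```

In the example, the stray `C` at position 2 of the last read is outvoted by three reads carrying `G`, and the final position is left uncalled because only one read covers it. That is why more depth in the regions of interest gives more reliable calls.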

To lower cost and speed their analysis, they focus on particular areas to maximize their read depth.

They use a variant viewer to understand variants between the person’s and the reference genome:

  1. SNPs – one base is changed – degree of pathogenicity varies
  2. Indels – insertions & deletions
  3. CNVs – copy number variations

They use several different file formats: FASTQ, BAM/SAM, VCF

Current methods have evolved to use Spark, Parquet (columnar storage), and ADAM (which uses the Avro framework for nested collections).

They use Zeppelin to share documentation: documentation that you can run.

Finally, Andrew Hogue @BlueApron spoke about the challenges he faces as the CTO. These include

Demand forecasting – use machine learning (random forest) to predict per user what they will order. Holidays are hard to predict. People order less lamb and avoid catfish. There was also a dip in orders and orders with meat during Lent.

Fulfillment – more than just inventory management since recipes change, food safety, weather, …

Subscription mechanics – weekly engagement with users. So opportunities to deepen engagement. Frequent communications can drive engagement or churn. A/B experiments need more time to run

BlueApron runs 3 fulfillment centers for their weekly food deliveries (NJ, Texas, CA), shipping 8 million boxes per month.

posted in: applications, Big data, Code Driven NYC, data, data analysis, startup

CodeDrivenNYC: Tools and methods to make development teams more productive

Posted on March 23rd, 2016

#CodeDrivenNYC

03/22/2016 @FirstMarkCapital, 100 Fifth Ave, NY


Three speakers spoke.

Evan Whalen @BlueApron (recipes and the ingredients for those recipes delivered to your door) talked about habits of productive teams. He emphasized three points

  1. Psychological safety
  2. 8 habits of productive teams
  3. Feedback

Psychological Safety. Google’s Project Aristotle concluded that successful teams fostered a feeling that team members support each other. They called this psychological safety, as members were secure that other team members wouldn’t embarrass them.

8 Habits. Evan takes his inspiration from Stephen R. Covey. His main points were:

  1. Be proactive – clearly define responsibilities to set expectations. Members need to share responsibility
  2. Begin with the end in mind – convey purpose, not urgency. Future-proof APIs and schemas
  3. Put first things first – make engineer happiness a priority – support passion projects
  4. Think win-win – have regular communication with key players to create mutually beneficial solutions
  5. Seek first to understand, then to be understood – encourage face-to-face conversation, brown bag tech talks, group code reviews, etc.
  6. Synergize – empower through delegation. Cross-team communication : avoid working in silos, be transparent on work priorities
  7. Sharpen the saw – give immediate feedback – radical candor (see Kim Scott post) – challenge directly but care personally
  8. Find your voice and inspire others to find theirs – balance support with delegation

Feedback. Every 6 months developers are anonymously surveyed (using Glimpse) as to their happiness and empowerment. From this information create a task list of 3 areas for improvement.

In the second presentation, Dustin Lucien @Betterment (financial planning leveraging automation) spoke about their dynamic process to best match the needs of the company with the skills and interests of individuals. They do this with a mix of teams responsible for specific products/functions (home teams) and small, mission-driven teams (pods) working on specific projects.

The flexibility to move developers from home teams to short-term, focused pods avoids silos and spreads knowledge and expertise through the company.

To allocate individuals to these special projects, Betterment uses an auction system. Inspired by work done at Pandora, each customer is given a value of $5/month and the total revenue for customers affected by new products/services is auctioned. Team leads submit projects and individuals bid on them. Management determines the makeup of the pods based on the individual interest and level of enthusiasm for each project (as indicated by the bidding). This process of creating new projects and assigning individuals to pods is repeated every 60 days. Management also makes sure that there are still sufficient resources in the home team (20% to 40%) so the home teams can continue their functions.

But as the company has grown, challenges have increased to this model.

  1. They have 3 strong lines of business, so fungibility of skills across the organization is more difficult.
  2. The maturing of products demands more stability in the resource allocations.
  3. As a result they will probably move to a 90 day cycle.
  4. They will adjust plans to emphasize ROI on OKR.
  5. On the other hand, teams are now large enough that they can adjust their own resources to accommodate new projects.

Dustin noted the role of management. Pods are created with at least one person with leadership aspirations. Also the company is still small enough (currently 150 people) that everyone knows everyone else. New hires are put in “bands” to encourage rapid assimilation into the company. Groups in pods have often worked together before.

He also noted that pods are allowed to deviate from the original plan. But the pods and the teams need to operate under the leadership of an architecture group (which is outside the teams and pods). That group determines the overall system architecture and reviews the development process and outputs.

To close the presentations, James Turnbull @Kickstarter talked about how they went about upgrading the tools used in their development stack.

When Kickstarter was started they chose Rails as their development tool. As usage of the site has grown, they have come to realize that they needed to rearchitect the site. James said the issue was not scaling, it was resilience as breaks in one part of the code often created issues with other parts of the code. They decided to replace the monolithic Rails application. They did this using a process involving the entire development team. All members of the development teams worked on the following steps:

  1. Specify broad conditions that the new system needed to satisfy
  2. Do a broad paper bake off – compare many languages: JRuby, Clojure, Go… – consider the community, prior art, etc.
  3. Create a short list – Ruby, Java, Clojure
  4. Do a real-world bake-off: create the code for the comment subsystem to test authentication, monitoring, etc. Are their developers familiar with the language? Will it run faster? Does it scale? How convenient is it to use? Is there a body of people who have solved these problems before?
  5. Make a decision. The development team conducted a town hall meeting in which groups who had worked on the bake-off code presented pros and cons to the whole team. They decided to use Java.
  6. The big win was developing a process to make decisions. For future development, individuals or small groups can propose experiments on technology that the group as a whole could use. They can then conduct a smaller version of the above process so the group as a whole can learn from the smaller group’s experiences.

posted in: Code Driven NYC, Programming

Code Driven NYC: building internal tools and organizational strength

Posted on February 19th, 2016

#CodeDrivenNYC

02/16/2016 @FirstmarkCapital, 100 5th Ave, NY


Three speakers talked about internal company tools and organization.

Mahmoud Arram @Bluecore (email server for retailers) talked about creating an Extract, Transform and Load (#ETL) tool to facilitate the export of information to clients. Due to the wide variety of types of requests (encrypted, unzipped, etc.) and the size of the outputs, they elected to build their own ETL tool. Chronometer is a scheduler that runs a pipeline made up of steps, each defined by a YAML description.

Their interface does syntax checks on the YAML descriptions, and the system allows them to control which clusters are deployed.
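The talk did not show Chronometer's step format, so the following is only a guess at the general shape of a step-based pipeline with a syntax check up front. The step types (`filter`, `select`, `to_csv`), their parameters, and the sample rows are all invented for illustration, and the description is written as a Python dict standing in for a parsed YAML file.

```python
import csv
import io


def step_filter(rows, field, equals):
    """Keep only the rows whose field matches the given value."""
    return [row for row in rows if row.get(field) == equals]


def step_select(rows, fields):
    """Keep only the listed columns."""
    return [{name: row[name] for name in fields} for row in rows]


def step_to_csv(rows):
    """Serialize the rows to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical registry of step types.
STEPS = {"filter": step_filter, "select": step_select, "to_csv": step_to_csv}


def check_pipeline(description):
    """Return a list of problems found in the description (empty if it looks valid)."""
    steps = description.get("steps")
    if not isinstance(steps, list):
        return ["'steps' must be a list"]
    errors = []
    for number, step in enumerate(steps, start=1):
        if "type" not in step:
            errors.append(f"step {number}: missing 'type'")
        elif step["type"] not in STEPS:
            errors.append(f"step {number}: unknown type '{step['type']}'")
    return errors


def run_pipeline(description, rows):
    """Validate first, then feed each step's output into the next step."""
    errors = check_pipeline(description)
    if errors:
        raise ValueError("; ".join(errors))
    data = rows
    for step in description["steps"]:
        params = {key: value for key, value in step.items() if key != "type"}
        data = STEPS[step["type"]](data, **params)
    return data


if __name__ == "__main__":
    pipeline = {"steps": [
        {"type": "filter", "field": "state", "equals": "NY"},
        {"type": "select", "fields": ["email", "spend"]},
        {"type": "to_csv"},
    ]}
    customers = [
        {"email": "a@x.com", "state": "NY", "spend": 10},
        {"email": "b@x.com", "state": "CA", "spend": 5},
    ]
    print(run_pipeline(pipeline, customers))
```

Checking the description before running anything means a typo in a step name is caught at edit time rather than halfway through a large export.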

They use StatHat to monitor their systems and PagerDuty to forward alarms.

Next, Kenny Chen @DigitalOcean (cloud hosting) spoke about the organizational challenges DigitalOcean faces as they streamline their infrastructure. One of the issues is the rapid change in tools since five years ago, when they started using Rails, Perl and MySQL.

A bigger challenge is deciding whether to break apart the monolithic code, which has become less appropriate with increased complexity, added features and a larger staff. They are migrating toward #MartinFowler’s idea of Bounded Context: dividing the code into models with explicit interrelationships.

They started with models which monitor the servers. These can fail without affecting customers. This requires a change in administrative structure to smaller programming teams which concentrate on individual models. Current teams have 4 to 6 programmers.

In the third presentation, Gil Shklarski @FlatironHealth spoke about the importance of career ladders to both the company and the individual. Some of the ideas were inspired by #KevinScott who wrote “how I structured engineering teams at LinkedIn and AdMob for success”.

A career ladder starts with asking how we

  1. Make things
  2. Operate things
  3. Function as a team

By understanding this we can create a list of needs that are important to both the individual and the company. These include the Ladder competencies which are split into 3 swimlanes

  1. Technical skill
  2. Get shit done
  3. Contribution to culture

One’s ability to grow along these swimlanes gives direction, clarity and accountability to guide one to become more productive. It also promotes the consistency, fairness and clarity needed for compensation and titles.

Gil noted that an engineering ladder is a living document, so it needs to change as the company changes. He said that there is a right time to create the document. Create it too early and you lock yourself into a specific model before the business has established itself. Create it too late and you foment confusion and conflict between new and established ways to promote and compensate.

posted in: Code Driven NYC, Programming

CodeDrivenNYC: #Web #Annotation, #NeuralNets #DeepLearning, #WebGL #Anatomy

Posted on December 17th, 2015

#CodeDrivenNYC

12/16/2015 @FirstMarkCap, 100 5th Ave, NY

Three speakers talked about challenging programming problems and how they solved them


Matt Brown @Genius talked about how they implemented their product, which allows users to annotate text on web pages. The challenge is locating the annotated text when the web page has been modified after the annotation was added. In that case, both the fragment itself and its location in the DOM may have changed, but the annotation should still point to the same part of the text.

To restore the annotation they use fuzzy matching in the following steps

  1. Identify regions that may hold the text
  2. Conduct a fuzzy search to find possible starting and ending points for the matching text
  3. Highlight the text that is the closest match from the candidates in the fuzzy search

The user highlights text in the original web page and the program stores the highlighted fragment along with text showing the context both before and after the fragment.

When the user loads the web page, the following steps are performed to locate the fragment

  1. Use jQuery’s text() on the page body to extract all text from the web page
  2. Build a list of infrequently used words and locate these words in the web site text
  3. Use the JavaScript implementation of Google’s diff-match-patch library to find the fragment in the text (The library uses the Bitap algorithm to find the best fuzzy match). The algorithm finds starting locations for text matches to the fragment. If the fragment is longer than 64 characters, only the first 64 characters are used. Searches are conducted using the before-context with the fragment to determine the general location in the text and using only the fragment to determine the possible starting points of the fragment in the text.
  4. Reverse the order of characters in both the fragment and the text. Repeat the previous step to determine possible ending points of the fragment in the text.
  5. Extract candidate locations for the fragment and pick the location which has the minimum Levenshtein distance (fewest character substitutions/inserts/removals).
  6. Highlight the text in that location. Repeat this process for each stored fragment.
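The final selection step (pick the candidate with the minimum Levenshtein distance) can be sketched as follows. This is not the Genius implementation: instead of using the Bitap search from diff-match-patch to find candidate start and end points, this toy version simply tries every window whose length is within a few characters of the stored fragment, which is fine for a short demo text but far too slow for real pages. The function names and the `slack` parameter are illustrative.

```python
def levenshtein(a, b):
    """Edit distance: fewest single-character substitutions, inserts, or removals."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # remove char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def locate_fragment(text, fragment, slack=3):
    """Return (distance, start, end) of the window in text closest to fragment.

    Brute-force stand-in for the candidate search: every start position and
    every length within `slack` characters of the fragment length is tried.
    """
    best = None
    shortest = max(1, len(fragment) - slack)
    longest = len(fragment) + slack
    for start in range(len(text)):
        for length in range(shortest, longest + 1):
            end = start + length
            if end > len(text):
                break
            distance = levenshtein(text[start:end], fragment)
            if best is None or distance < best[0]:
                best = (distance, start, end)
    return best


if __name__ == "__main__":
    page_text = "The quick brown fox jumped over the lazy dog."
    stored_fragment = "brown fox jumps"  # the page was edited after annotating
    distance, start, end = locate_fragment(page_text, stored_fragment)
    print(distance, repr(page_text[start:end]))
```

The stored before and after context would be used the same way to break ties when the fragment appears more than once on the page.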

Next, Peter Brodsky @HyperScience spoke about how his company is making the training of neural nets more efficient. HyperScience trains neural nets (containing up to 6 layers) on a variety of tasks (e.g. looking for abnormal employee behavior, reassembling shredded documents, eliminating porn from web sites).

The problems they want to overcome are

  1. Local minimum solutions are obtained instead of a global minimum
  2. Expensive to train
  3. Poor reuse

To overcome these problems they do the following. Once the nets are trained, they examine the nets and extract subnets that have similar patterns of weights. They test whether these subnets are performing common functions by swapping subnets across neural networks. If the performance does not change then they assume that the subnets are performing a common task. Over time they create libraries of subnets.

They can then describe the internal structure of the net in terms of the functions of subnets instead of in terms of nodes. This improves their ability to understand the processing within the net.

This has several advantages.

  1. They can create larger and more complex networks
  2. They can start with a weight vector and guide the net away from local minima and toward the global minimum.
  3. Their networks should learn faster since the standard building blocks are already in place and do not need to be reinvented.

In the third presentation, Tarek Sherif @BioDigital talked about how BioDigital is implementing anatomical content for the web. The challenge is to create 3d, interactive pictures showing human bodies in motion or in sections, layers, etc.

BioDigital uses WebGL to render their content in HTML/CSS/JS on all browsers and mobile devices. Due to the computational load, optimization of memory management and JavaScript code is important.

The content can be static, animated or a series of animations. The challenge is to keep the size down for quick downloads, but have the user experience the beauty of the images.

Displaying anatomical content is challenging since it can be

  1. Deeply nested – e.g. brain inside skull
  2. Hierarchical – is the click on the hand or the arm?
  3. Scale – from cells to the whole body

User interactions can include highlighting, dissection, isolation, transparency, annotation, rotation, …

Mobile is even more challenging

  1. Limited memory and GPU
  2. Variety of devices
    1. GL variable limits
    2. Shader precision
    3. Available extensions

To allow their images to be plugged into web sites, they create an API

  1. Create an iframe to embed into a page
  2. Allows basic interactions
  3. The underlying JavaScript can be customized

API challenges

  1. 3d terminology and concepts
  2. 3d navigation
  3. Anatomical concepts
  4. Architecture of the Human

Examples can be seen at https://developer.biodigital.com

The artists primarily use Maya and ZBrush as their creative tools.

Models can be customized for specific patients.

posted in: Animation, applications, Code Driven NYC, Programming

CodeDrivenNYC: caching web pages, #NLP, bringing #coding to the masses

Posted on November 20th, 2015

#CodeDrivenNYC

11/19/2015 @FirstMark, 100 Fifth Ave, NY


The first of the three presenters, David Mauro @Buzzfeed, spoke about creating Mattress, their first open source iOS framework. Mattress caches web pages for later, offline consumption. It also makes the page appear to load more quickly when online.

David spoke about the hurdles implementing this product

  1. How do we download an entire web page?
  2. How do we provide the content back to user

Their first decision was to download the URL using UIWebView and then capture all requests as they come through using NSURLProtocol. UIWebView runs on the main thread and is resource intensive, but the alternative is to manually parse the HTML and the JS. WKWebView does not handle NSURLProtocol, and there is a bug so you cannot just save another NSURLCache. They use CommonCrypto to retain the URL, with the name hashed so even the longest name is uniquely identified.

They also need to know when a page is done downloading. Automated solutions have tendencies to either terminate prematurely or not terminate at all. Instead, they ask the user when the download is done.

How to provide the content back to NSURLProtocol? First ask the user if they are offline. If so, they retrieve the page from the custom offline cache. If they are online, the system reloads the initial request.
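Mattress itself is an iOS framework, so the following Python sketch only illustrates two of the ideas above: hashing the URL into a fixed-length cache key, and choosing between the offline cache and the network. SHA-256 is an assumption (the talk only mentioned CommonCrypto), and the class and function names are invented for illustration.

```python
import hashlib


def cache_key(url):
    """Hash the URL so even a very long one maps to a short, fixed-length key."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class OfflineCache:
    """In-memory stand-in for the custom offline cache."""

    def __init__(self):
        self._pages = {}

    def save(self, url, body):
        self._pages[cache_key(url)] = body

    def load(self, url):
        return self._pages.get(cache_key(url))


def load_page(url, cache, is_offline, fetch_online):
    """Serve from the offline cache when offline; otherwise reload the request."""
    if is_offline:
        return cache.load(url)
    return fetch_online(url)


if __name__ == "__main__":
    cache = OfflineCache()
    cache.save("https://example.com/article", b"<html>saved copy</html>")
    long_url = "https://example.com/?q=" + "x" * 5000
    print(len(cache_key(long_url)))  # -> 64
    print(load_page("https://example.com/article", cache, True, lambda url: b"live"))
```

The key is 64 hex characters regardless of URL length, which keeps it safe to use as a file name.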

The system was designed as a simple API that can be run either in foreground or in background fetch. The background fetch needs to be monitored carefully so it does not use too much of the battery or slow the processing excessively.

The second speaker, Rob Spectre @Twilio, demonstrated how easily applications can be made interactive using the natural language processing tool TextBlob, running in Python.

Rob showed how to create an app that receives SMS text messages and changes its response based on your message. In just a few lines of code, Rob showed how the response can be differentiated based on the length of the message, its sentiment, its sentence structure, etc.

Ryan Bubinski @Codecademy asked the question “What is code?”

As an overview of the many ways to answer that question, he recommended the 38,000-word article written by Paul Ford in Bloomberg in June 2015.

He summarized his view by saying that code is a lever that is becoming more powerful every day. As an example, he mentioned OpenFace, an open source program which uses a neural net for face recognition.

Making this lever available to more people requires either

  1. Making coding easier or
  2. Making it easier to learn how to code

 

posted in: Code Driven NYC, iOS, Natural User Interface, Open source, Programming, UI

CodeDrivenNYC: bridging the #Culture gap between #software and #hardware, modern #SQL, #managing engineers

Posted on October 28th, 2015

#CodeDrivenNYC

10/28/2015 @FirstMark, 100 5th Ave, NY


Three speakers spoke

Colin Vernon @littleBits spoke about bridging the cultural gap between software and hardware engineers. This is especially important at a company that creates modules for users to assemble into standalone prototype devices (Internet of Things). From his position as Director of Platform, Colin talked about the wide diversity of skills needed to manage the hardware/software stack.

This diversity also leads to differences:

  1. Two cultures: software – agile; hardware – not agile
  2. Two paradigms: software – abstractions; hardware – simplest method is best.
  3. Communication styles: software – chatty; hardware – brevity and clarity

He recommends letting the cultures be different and to concentrate on touch points

  1. Identify what you have in common and strengthen it.
  2. Compromise, but don’t pick anyone’s last choice – e.g. they are currently doing things in Go
  3. Don’t meet in the middle (no overlap). Have both sides stretch to create an overlap.

Next, Spencer Kimball @Cockroach Labs compared their database to other popular databases. Cockroach aims to combine the best characteristics of SQL databases with the advantages of replication across many nodes, such as survivability, consistency, and ease of deployment.

They like SQL since it is widely adopted, schemas are useful to clarify your thinking, and the relational structure can be used for complex data analytics. They have also extended SQL by allowing their databases to scale out across servers, adding hierarchical tables and other modern features, and not locking the database when the schema changes.

Spencer next talked about the key foundational ideas they have incorporated

  1. Started by considering Transactional Key Value store as a foundational building block.
  2. Provide fully distributed transactions.
    1. serializable by default.
    2. No strict locking – makes things faster, but increases chance of defaulting on shared resources.
  3. Others use a consistent hashing scheme to locate where data is stored, but this slows sorting, which makes it problematic for relational databases.
  4. Use a bi-level index to get the best of a range-segmented key space yet allow the db to expand
  5. Raft is a consensus algorithm that is simpler to understand than Paxos. It can replicate data which makes it robust and gives consistent answers. It is designed for strong consistency (as opposed to eventual consistency).
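Items 3 and 4 above contrast hash-based placement with a range-segmented key space. The sketch below is a simplified, single-level illustration of why ranges help sorted access. It is not CockroachDB's actual bi-level index, the hash placement is plain modulo hashing rather than a full consistent-hash ring, and the split keys and node names are invented.

```python
import bisect
import hashlib


class RangeIndex:
    """Sorted split keys divide the key space into contiguous ranges, one per node."""

    def __init__(self, split_keys, nodes):
        if len(nodes) != len(split_keys) + 1:
            raise ValueError("need exactly one more node than split keys")
        self.split_keys = sorted(split_keys)
        self.nodes = list(nodes)

    def node_for(self, key):
        """Find the node whose range contains key (binary search)."""
        return self.nodes[bisect.bisect_right(self.split_keys, key)]

    def nodes_for_scan(self, start, end):
        """Nodes that hold keys in [start, end); always a contiguous slice."""
        low = bisect.bisect_right(self.split_keys, start)
        high = bisect.bisect_left(self.split_keys, end)
        return self.nodes[low:high + 1]


def hash_node(key, nodes):
    """Hash placement (simplified): neighboring keys land on unrelated nodes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


if __name__ == "__main__":
    index = RangeIndex(["g", "n", "t"], ["n1", "n2", "n3", "n4"])
    print(index.node_for("apple"))         # -> n1
    print(index.nodes_for_scan("h", "p"))  # -> ['n2', 'n3']
    # With hashing, a scan from "h" to "p" cannot rule out any node in advance.
    print({key: hash_node(key, index.nodes) for key in ["h", "i", "j", "k"]})
```

Splitting a range that has grown too large only adds one split key, which is how a range-segmented layout can keep expanding.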

Finally, Duncan Glazier @Shopkeep talked about his methods to improve organizational efficiency and produce happy engineers. His main point was that the goals of engineers and managers should be aligned.

Everyone in an organization should have goals including challenges & a metric of success. By making these goals visible to all others in the company, everyone can see how their goals match those of management and others in the firm. He also feels that it is important to get feedback from managers and peers.

posted in: Code Driven NYC, databases, Programming

CodeDrivenNYC: #management of software teams, #JSON schema

Posted on September 17th, 2015

CodeDriven NYC

09/16/2015 @FirstMark, 100 5th Ave, NY

Two speakers talked about different aspects of software development.


In the first talk, Seth Purcell @Signpost spoke about various aspects of management in a software company. His emphasis was on the idea that the goals of managing a team are no different from your individual pursuit of quality outputs: efficient allocation of resources (people vs your time) and developing skills (mentoring vs self-education).

Seth talked about ways a manager can fail: common failure modes

  1. Doing your old job – staying in the code
  2. Being a reluctant manager
  3. Not hiring the best
  4. Engineers gone wild
  5. “we’re a very flat organization” == we don’t know what management is

He closed by talking about steps one can take toward becoming a manager. It is hard, and less fun than engineering. Take ownership of a project. Then take ownership of a complex project which requires others.

Seth recommended several books including:

High Output Management by Andrew Grove

The Effective Executive by Peter Drucker


In the second presentation, Michael Boufford @Greenhouse proposed adopting universal schemas to streamline the interpretation of JSON output from RESTful APIs. The schemas would be standard templates for data such as names, with web sites publishing the schemas along with code to automate verification that the JSON follows these templates. He also proposed that schema definitions could be nested.
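A minimal sketch of what such nested templates and automated verification could look like. The `SCHEMAS` registry, the field names, and the `validate` function are invented for illustration. This is not the JSON Schema standard and not anything Greenhouse published.

```python
import json

# Hypothetical shared templates. A string value refers to another template by
# name, which is how one schema nests inside another.
SCHEMAS = {
    "name": {"given": str, "family": str},
    "person": {"name": "name", "email": str},
}


def validate(value, schema_name, schemas=SCHEMAS):
    """Return a list of problems (empty if the value matches the template)."""
    if not isinstance(value, dict):
        return [f"{schema_name}: expected an object"]
    errors = []
    for field, expected in schemas[schema_name].items():
        if field not in value:
            errors.append(f"{schema_name}.{field}: missing")
        elif isinstance(expected, str):
            # Nested template: validate the sub-object against it.
            errors.extend(validate(value[field], expected, schemas))
        elif not isinstance(value[field], expected):
            errors.append(f"{schema_name}.{field}: expected {expected.__name__}")
    return errors


if __name__ == "__main__":
    good = json.loads('{"name": {"given": "Ada", "family": "Lovelace"}, '
                      '"email": "ada@example.com"}')
    bad = json.loads('{"name": {"given": "Ada"}, "email": 42}')
    print(validate(good, "person"))  # -> []
    print(validate(bad, "person"))
```

Because `person` refers to `name` by name, any API that publishes names in the shared template can be checked with the same code.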

In the discussion, an audience member suggested that APIBlueprint, which generates a JSON schema from documentation, is already a step in this direction.

posted in:  Code Driven NYC, Programming

Code Driven NYC: P2P #Security, #DistributedSystems, #AI system

Posted on June 22nd, 2015

CodeDrivenNYC

06/22/2015 @Bluecore, 124 Rivington St, NY

Three speakers presented at the initial meeting of @CodeDrivenNYC:

Max Krohn, founder of Keybase (previously founder and CTO of OkCupid and SparkNotes): “The Quest for User-Friendly Crypto”

James Socol, Platform at Groundwork (and formerly of Bit.ly / Mozilla): “Lessons Learned Building Distributed Systems”

Alex Poon, Co-Founder and COO of x.ai (AI-powered personal assistant): “Building a startup with Scala and JavaScript”

Each talked about topics that are uniquely of interest to developers. These include design of systems and development decisions such as choice of language and database for a particular application.

Max @Keybase talked about the steps needed to make person-to-person communication over the internet more secure. He first divided internet communications into two types:

  1. You communicate to a central location
  2. Person to person communications

Banks and other financial institutions have developed secure communication methods to and from a central location. However, person-to-person communications still do not have that level of security. Max then spoke about the p2p challenges we face:

  1. Public identity – how do we know that “Bob” is really “Bob” before we get his public key? Maybe virally recruiting users solves this problem.
  2. Secret key management – “Bob” cannot easily control his keys, since it is hard to move a key to all of his devices. He can also lose the code or lose a device on which the key is stored.
  3. Simplified polished app – the general user population will not use anything other than an easy-to-use utility.

James Socol, formerly @Mozilla and @Bitly, spoke about the lessons he has learned building distributed systems at Bitly. The system challenges at Bitly (the business model is covered in notes on Bitly) include handling 230B clicks every day. This can only be done on a distributed system. Characteristics of such a system include:

  1. Concurrency of components – but can be hard to coordinate
  2. lack of a global clock – server clocks can drift and become out of sync
  3. independent failure of components – how do you make sure things get done even if a service is temporarily unavailable?
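The notes do not say how Bitly copes with the missing global clock. One widely used technique for ordering events without trusting server clocks is a logical (Lamport) clock, sketched below purely as an illustration; the LamportClock class and the two nodes a and b are hypothetical.

```python
class LamportClock:
    """Logical clock: counts events instead of reading wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Any local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is a local event; the returned stamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past the sender's stamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()                     # a local event on node a
stamp = a.send()             # node a sends a message
b_time = b.receive(stamp)    # node b receives it
print(stamp, b_time)
```

Even though node b had seen no events of its own, its clock ends up later than the stamp on the message, so "sent" always sorts before "received" regardless of how the machines' real clocks drift.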

To master such a large system, each task must be small and focused, preferring a multitude of cheap, generic components over a small set of expensive parts. Methods to handle this system include:

  1. Async is better than synchronous, but it’s important to know your requirements to determine which components can be run asynchronously. For instance, critical path items need to be synchronous (e.g. creation of the shortlink).
  2. Events are better than commands – events flow one way, commands need a response. It’s better to concentrate on what happened than on who asked for it and how to respond to the request.
  3. Annotations are better than filters – keep everything since it might be used later e.g. geocode everything and filter downstream.
  4. Dealing with failure – handle backpressure with a retry mechanism. Monitoring is needed to know where there is an issue.
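Two of the methods above, annotating instead of filtering and retrying on failure, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bitly's code: annotate, with_retry, fake_geocode and flaky_publish are hypothetical names, and the exponential backoff schedule is my own choice since the talk only mentioned "retry".

```python
import time

def annotate(event, geocode):
    """Add a field to a copy of the event rather than dropping events."""
    enriched = dict(event)
    enriched["country"] = geocode(event["ip"])
    return enriched

def with_retry(fn, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call fn(); on ConnectionError wait (exponential backoff) and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to monitoring
            sleep(base_delay * 2 ** attempt)

# Hypothetical geocoder standing in for a real lookup service.
def fake_geocode(ip):
    return "US" if ip.startswith("10.") else "unknown"

# Hypothetical downstream service that fails twice before succeeding.
calls = {"n": 0}
def flaky_publish():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("downstream busy")
    return "published"

# An event records what happened; nothing here expects a reply.
click = {"type": "click", "link": "abc123", "ip": "10.0.0.7"}
enriched = annotate(click, fake_geocode)
result = with_retry(flaky_publish, sleep=lambda seconds: None)
print(enriched, result, calls["n"])
```

The annotated event keeps every original field, so a consumer further downstream can still decide to filter on country later.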

The last speaker, Alex Poon @x.ai (which uses AI to create an agent that negotiates and schedules meetings; the business model is covered in my notes on x.ai), talked about the language and architecture decisions made when creating the system.

They use Scala and JS: Scala for intensive processing, and JS for the UI and to leverage AngularJS.

They use Mongo as the DB as it is schema-less (and good for development as they are not prematurely locked into a structure as they develop the product). They use the Mongoose library so the DB is accessible from both Scala and JS.

They use a queue-based architecture: web apps -> AWS -> AI allocators -> classifiers. Some delay in the queue is acceptable because the system does not need to be strictly real-time.
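As a rough picture of what a queue-based pipeline buys you, here is a single-process Python sketch. It is only a stand-in: the notes do not say which AWS queue service x.ai uses, and the stage function, the classify rule and the sample messages are all hypothetical.

```python
import queue
import threading

def stage(inbox, outbox, work):
    """Consume items from inbox, apply work, and pass results downstream."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: stop and tell the next stage
            outbox.put(None)
            break
        outbox.put(work(item))

# Hypothetical classifier: a keyword rule standing in for the real models.
def classify(message):
    label = "schedule" if "meet" in message.lower() else "other"
    return (message, label)

incoming, classified = queue.Queue(), queue.Queue()
worker = threading.Thread(target=stage, args=(incoming, classified, classify))
worker.start()

for text in ["Can we meet Tuesday?", "Thanks!"]:
    incoming.put(text)            # the producer returns as soon as the item is queued
incoming.put(None)
worker.join()

results = []
while True:
    item = classified.get()
    if item is None:
        break
    results.append(item)
print(results)
```

The producer never waits for classification to finish, which is why a system built this way can tolerate delays in the queue.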

They use RESTful APIs using JSON to communicate across processes.

Alex noted that over time certain features have migrated from JS to Scala to take advantage of Scala’s AI engine, since some of the business logic has become more complex. One example is a “surrender meeting”: if your friend does not respond to requests, at what point do you give up trying to schedule the meeting?
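A “surrender meeting” rule might look something like the sketch below. This is a guess at the shape of the logic, written in Python rather than x.ai's Scala; the should_surrender function and both thresholds (three requests, seven days) are assumptions made for illustration and were not given in the talk.

```python
from datetime import datetime, timedelta

def should_surrender(requests_sent, first_request_at, now,
                     max_requests=3, max_wait=timedelta(days=7)):
    """Give up once enough requests went unanswered or enough time has passed.

    Both thresholds are hypothetical defaults, not values from the talk.
    """
    too_many = requests_sent >= max_requests
    too_long = now - first_request_at >= max_wait
    return too_many or too_long

start = datetime(2015, 6, 1, 9, 0)
early = should_surrender(1, start, start + timedelta(days=2))
nagged = should_surrender(3, start, start + timedelta(days=1))
stale = should_surrender(1, start, start + timedelta(days=8))
print(early, nagged, stale)
```

Even a rule this small has two interacting conditions, which hints at why this kind of business logic grew complex enough to move out of the UI layer.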

posted in:  Code Driven NYC, Programming