The Rise of the #DataArtist
Posted on March 9th, 2016
03/09/2016 @ PivotalLabs, 625 6th ave, NY
Olivier Meyer & Ryan Haber @Zoomdata talked about the advantages of interactive #DataAnalysis. They opened with Charles Minard’s graphic of Napoleon’s 1812 invasion of Russia, showing how a single picture can convey the ruin of an army through cold and casualties; six time series are displayed to great effect.
Next, they talked about the complexity of displaying facts buried in large data sets. This complexity creates a new role: the data artist, who sits between the business analyst and the data scientist.
They demonstrated how their program facilitates the interactive search for patterns by retrieving only the relevant subset of the data when it is needed for display. They call this approach microservices & data sharpening: initially a rough picture is presented, and the results are refined as you watch.
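Data sharpening can be pictured as progressive refinement: a coarse estimate is computed from an initial batch of rows and sharpened as more of the table is scanned. A minimal sketch of that idea (the function name, batching scheme, and data are our illustration, not Zoomdata’s API):

```python
import random

random.seed(0)
data = [random.random() for _ in range(100_000)]  # stand-in for a large table

def sharpened_mean(data, batch=10_000):
    """Yield successively refined estimates of the mean as batches arrive."""
    total = count = 0
    for i in range(0, len(data), batch):
        chunk = data[i:i + batch]
        total += sum(chunk)
        count += len(chunk)
        yield total / count  # each yield is a sharper estimate

estimates = list(sharpened_mean(data))
# The first estimate appears after one batch; the last equals the exact mean.
```

The user sees `estimates[0]` almost immediately, and the display converges to the exact answer as the scan completes.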
Many interesting points were brought up in the discussion.
- Before diving into the data, one needs hypotheses of what is relevant to decision making
- Care must be taken, since interactive graphics (as in all graphics – see Darrell Huff “How to Lie with Statistics”) can inspire misleading or unfounded conclusions
- The data artist is obligated to present graphics that are truthful
- Generic templates may not be the best data presentation
- One needs to balance the customization of the data presentation with the time & effort expended to create an improved graphic
- Graphically inspired conclusions need to be supported by relevant statistics
- Frequently, statistics (alone) are not the best way to present findings
- The best way to communicate is dependent on the audience.
- The tools for data exploration may or may not be different from those for presenting conclusions.
Massively Collaborative Problem Solving
Posted on November 11th, 2015
11/11/2015 @Pivotal Labs, 625 6th Ave, NY
Matt Weber @zoomdata started by describing how simple rules can create complex, interesting systems:
- #Conway’s game of life – simple rule
- The #Delphi Method (Rand Corporation) – collaboration
He next described his use of #Amazon Mechanical Turk in 2009 to obtain interesting answers to complex problems. His example question asked for ways to make the U.S. energy self-sufficient. He used
Simple rules + iterative collaboration = massively #collaborative #ProblemSolving
Answers were selected using three simple tasks:
- Create – each worker creates a list of 7 proposals – repeated by 50 workers
- Rate – each proposal is rated on a 1–10 scale – done by 20 workers
- Atomize – take the 7 proposals with the highest aggregate scores and ask 50 workers which of them need more detail
The limit of 7 echoes George A. Miller, who proposed in his paper “The Magical Number Seven, Plus or Minus Two” that roughly seven items is the maximum that can be kept in working memory.
End of round one.
- Take the top proposals and ask another set of workers to make a plan of action for this proposal
- 20 workers rate the sub-proposals on a 1–10 scale
- Select the top sub-proposals
Repeat for each of the top tasks.
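The selection step in each round reduces to aggregate-and-rank over worker ratings. A sketch of that step (the proposals and scores below are invented for illustration, not from the talk):

```python
# Toy ratings: each proposal is scored 1-10 by several workers.
ratings = {
    "build grid-scale storage": [8, 9, 7],
    "expand solar capacity":    [6, 7, 8],
    "corn ethanol subsidies":   [3, 4, 2],
}

def top_proposals(ratings, k=2):
    """Rank proposals by aggregate (summed) score, highest first."""
    ranked = sorted(ratings, key=lambda name: sum(ratings[name]), reverse=True)
    return ranked[:k]

print(top_proposals(ratings))
# -> ['build grid-scale storage', 'expand solar capacity']
```

The surviving proposals are then sent back out as new create/rate tasks, which is what makes the process iterative.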
Matt then displayed the answers and commented on how many proposals were reasonable and well thought out.
He next talked about design considerations when determining what problems could be successfully addressed by this method. The main consideration is to pick a general topic and let the crowd guide the process. The problem should be of general interest and be framed so it is:
- Human readable – yet still processable by computers
- Short text – can be written and consumed fast
- Relevant and engaging – people need to feel involved
The problem needs to be encapsulated so it is bite-sized and does not require outside context.
The #UX of Events Data: helping event organizers understand their audience
Posted on October 14th, 2015
10/14/2015 @Pivotal Labs, 625 6th Ave, NY
Chett Rubenstein @InsightXM spoke about InsightXM’s work on understanding attendance and registration of events such as trade shows, conferences, and festivals.
Chett described how InsightXM can analyze the data organizers already collect to help them achieve goals such as increasing attendance or reaching a target market.
He then talked about how InsightXM improved its process to help clients solve their problems. They proceeded in three iterations:
- First iteration – build a platform to upload data with some basic analytics
- Second iteration – build tools to help clients visualize files with large numbers of fields. Build mouse-overs so you can see the contents of the data fields. One of their interactive graphs shows the cumulative registrations over time, a map of the geographic distribution of registrations, and a slider and filters to slice the data by time and customer characteristics.
- Third iteration – make the data upload and categorization easy. The deliverables are bullet points summarizing any graphics presented to the client. InsightXM does the analysis behind the scenes.
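The cumulative-registrations view from the second iteration is just a running sum over per-day counts, which the time slider then windows. A minimal sketch with made-up numbers:

```python
from itertools import accumulate

# Registrations per day leading up to an event (illustrative numbers).
daily = [5, 12, 9, 20]

# Running total over time - the curve behind the cumulative chart.
cumulative = list(accumulate(daily))
print(cumulative)  # -> [5, 17, 26, 46]
```

Filters by customer characteristics would simply recompute `daily` over the selected subset before accumulating.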
Chett closed with current and future directions for InsightXM and event marketing in general:
- Increased use of behavioral analytics to better know the customer
- Linguistic analysis of marketing materials
- Real-time demographic and behavioral prediction of customer preferences. For example, once a badge is scanned at a booth, you will know the individual’s behavioral preferences.
- Demographic lead scoring within CRM systems
- Referral engines at conferences suggesting sessions to attend based on individual preferences and behavior patterns of other attendees
Beaker Notebook: the #UX of Iterative Data Exploration
Posted on August 12th, 2015
08/12/2015 @ Pivotal Labs, 625 6th Ave, NY
Beaker was developed by Two Sigma, an investment manager, to give their researchers a tool to analyze markets and document their findings. It is now an open-source product.
The notebook is divided into sections and sections can be grouped hierarchically into larger sections. Within a section, an analysis can be performed in Python, for instance, and the output is saved to Beaker variables. These variables can be analyzed using R, Python or any of the supported languages. Beaker can also produce interactive graphics using its own native charting package. The notebook with code, data, and graphs can be saved for further analysis.
Jeff and Scott next talked about the design challenges when creating Beaker. These include:
- Full support for every language
- Open source
- Environment independence
To create an expandable library of supported languages, they built an intermediate Beaker language with plug-ins to handle each programming language. To ensure Beaker can run on different operating systems, on and off the cloud, the user interface is text-based with little formatting.
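The plug-in approach can be pictured as a registry that maps each language name to an evaluator sharing a common environment. This is a generic sketch of the pattern, not Beaker’s actual plug-in API:

```python
# Registry mapping a language name to its evaluator function.
plugins = {}

def register(lang):
    """Decorator that registers an evaluator under a language name."""
    def wrap(fn):
        plugins[lang] = fn
        return fn
    return wrap

@register("python")
def eval_python(src, env):
    # Execute source in the shared environment so later cells see its results.
    exec(src, env)
    return env

shared = {}
eval_python("y = 2 + 2", shared)  # shared now holds y == 4
```

Adding a new language then means registering one more evaluator, leaving the notebook core untouched.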
To accommodate the wide range of programming and data-analysis experience across users, they developed several interfaces, from verbose (shows the language employed, etc.) to terse. To help all levels of users, they adapted the web interface to provide key features available on local desktops but frequently not in browsers: menus in the upper margins, windows that can be repositioned on the desktop, and file dialogs.
To give the web app these functions, they used the Electron framework, which is built on Chromium and incorporates tools from Node.js.
Data and data structures are passed across languages using #JSON. This offers generality, but with some loss of accuracy for floating-point numbers (in the future they plan to pass values using binary files). They are currently working on methods to share notebook sections (and possibly forked versions).
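The floating-point caveat is easy to demonstrate: JSON does not mandate a precision, so a writer that emits a limited number of digits loses bits, whereas a shortest-round-trip writer (as in Python’s `json` module) preserves the double exactly. A small illustration in Python:

```python
import json
import math

x = math.pi  # a double-precision value to pass between notebook cells

# Python's json module emits the shortest string that round-trips exactly,
# so the value survives serialization intact.
assert json.loads(json.dumps(x)) == x

# A writer that emits only 6 significant digits loses precision -
# the kind of drift that a binary transfer format would avoid.
truncated = float(f"{x:.6g}")
print(truncated == x)  # -> False
```

Whether a cross-language round trip is lossless therefore depends on each plug-in’s JSON writer, which is presumably why binary transfer is on their roadmap.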
The audience was invited to try out the system at BeakerNotebook.com.
The #UX of #StatisticalSoftware for #MobileDevices
Posted on December 10th, 2014
12/10/2014 @Pivotal, 625 6th Ave, NY
Sungjoon Nam @NumberAnalytics talked about the software he has developed for the analysis of business data without the clutter of standard statistical interfaces.
When he started teaching at Rutgers Business School in Newark, he realized how hard it was for the students to navigate the interface of SPSS and decipher the statistical tables output by the package.
He found similar issues with SAS and R, so he developed a web interface (using R as the statistical engine) that takes in data and produces outputs that directly address the business decision. In addition, the analyses are clearly labeled by the business question addressed, so users can go directly to the needed analysis without deciding whether it calls for regression, clustering, or another statistical technique.
One technique used to guide the user to the important factors is color: instead of tables, the software uses graphs that color-code which variables are statistically significant, and it also provides a text description of what is significant.
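The idea reduces to a p-value-to-color mapping plus a generated sentence. A sketch of both (the variable names, p-values, and function names are invented for illustration):

```python
# Hypothetical regression output: variable -> p-value.
p_values = {"price": 0.003, "region": 0.41, "promotion": 0.048}

ALPHA = 0.05

def color_for(p, alpha=ALPHA):
    # Significant variables get a highlight color; the rest are muted.
    return "green" if p < alpha else "gray"

def describe(p_values, alpha=ALPHA):
    # Plain-language summary, mirroring the software's text description.
    sig = [v for v, p in p_values.items() if p < alpha]
    return f"Statistically significant at {alpha}: {', '.join(sig)}"

print(describe(p_values))
# -> Statistically significant at 0.05: price, promotion
```

The same mapping drives both the chart colors and the sentence, so the two presentations can never disagree.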
Sungjoon then talked about the challenges when moving the application to the iPad. One challenge was that local storage may not be sufficient to handle some data sets, so alternatives such as dropbox must be available. Screen space is also limited, so they adopted a rule where user interactions move left to right on the screen and cover only one topic per page.
He closed by listing some lessons learned from presenting the software at a training class in China:
- Google does not work in China – avoid Google-hosted charts.
- Make sure it runs on Windows XP with IE 6.0 – Chrome is also unavailable, since it’s from Google
- Internet speed varies widely from provider-to-provider – make sure the site works in all environments
- Internet server speeds may vary over time.
- Use a local contact
One interesting design decision was not to include data-cleaning facilities in the software. This greatly simplifies the interface and the technical demands on the user. The assumption is that the user will analyze clean data from sources such as Salesforce and Alibaba.