New York Tech Journal
Tech news from the Big Apple

Point and click graphics, Warehouse management, Spark

Posted on March 26th, 2016

#CenterNJDataScience

03/26/2016 @South Brunswick Library, 110 Kingston Lane, Monmouth Junction, NJ


In the first presentation, Joel Wattacheril demonstrated how easily one can generate graphics using #Tableau. He used two data sets with country-by-country statistics: guns per 100 people and homicides per 100 people. He showed how one can generate bar, scatter and choropleth plots using the point-and-click interface.

The ensuing discussion noted other plotting tools: #QlikView, #Spotfire, #PowerBI, #MicroStrategy, #MicrosoftSpar. There was also discussion of ways to use R or SQL to compact data prior to delivering it to Tableau.
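As a rough illustration of compacting data with SQL before handing it to a plotting tool, the sketch below collapses row-level records into one summary row per country. It uses Python's built-in sqlite3 module. The table name, column names and values are all made up for the example and do not come from the talk.

```python
import sqlite3

# Hypothetical row-level data: (country, year, incidents). Values are invented.
rows = [
    ("A", 2014, 3),
    ("A", 2015, 5),
    ("B", 2014, 1),
    ("B", 2015, 2),
    ("B", 2015, 4),
]


def aggregate_by_country(records):
    """Collapse row-level records into one summary row per country."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE incidents (country TEXT, year INTEGER, n INTEGER)")
    conn.executemany("INSERT INTO incidents VALUES (?, ?, ?)", records)
    # GROUP BY does the compaction: many input rows become one row per country.
    summary = conn.execute(
        "SELECT country, SUM(n) AS total, COUNT(*) AS row_count "
        "FROM incidents GROUP BY country ORDER BY country"
    ).fetchall()
    conn.close()
    return summary


summary = aggregate_by_country(rows)
for country, total, row_count in summary:
    print(country, total, row_count)
```

The summary table is much smaller than the raw rows, so the plotting tool only has to load the figures it will actually draw.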

Next, Christian Brennan @Halls talked about the challenges of optimizing the placement of items in a warehouse. When items arrive in the warehouse, they are stored in a grid of shelves. The time to put away items and the time to pick up items in the grid can be minimized by strategically placing the items.
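One simple way to picture strategic placement is a greedy rule: put the most frequently picked items in the slots closest to the dock. The sketch below is not the method presented in the talk. It assumes a single dock at grid position (0, 0), travel cost equal to row plus column, one slot per item, and invented pick counts.

```python
from itertools import product


def assign_slots(pick_counts, n_rows, n_cols):
    """Assign the most frequently picked items to the slots nearest the dock."""
    # Slots ordered by distance from the dock at (0, 0); ties broken by position.
    slots = sorted(
        product(range(n_rows), range(n_cols)),
        key=lambda slot: (slot[0] + slot[1], slot),
    )
    # Items ordered from most to least picked; ties broken by name.
    items = sorted(pick_counts, key=lambda item: (-pick_counts[item], item))
    if len(items) > len(slots):
        raise ValueError("more items than slots")
    return dict(zip(items, slots))


def total_travel(assignment, pick_counts):
    """Sum of (pick count x distance from the dock) over all items."""
    return sum(
        pick_counts[item] * (row + col) for item, (row, col) in assignment.items()
    )


# Hypothetical pick counts per item.
pick_counts = {"bolts": 50, "nuts": 30, "gears": 5, "belts": 1}
layout = assign_slots(pick_counts, n_rows=2, n_cols=2)
print(layout, total_travel(layout, pick_counts))
```

Under these assumptions, pairing the largest pick counts with the smallest distances minimizes total travel. A real warehouse adds constraints such as item size, shelf height and congestion, which this sketch ignores.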

Topics in the discussion included methods of statistical analysis and decisions that affect the time to put away and pick items.

Chris also noted that HighJump is working on a #HoloLens application (augmented reality) for picking items in storage.

In the last presentation, Phil D’Agostino @Qubble spoke about how one can take advantage of real-time analysis using #Spark. Spark uses streaming methods to process blocks of data which are then analyzed using Machine Learning tools. He then described the following use cases:

  1. Fraud detection (useful since processing is done in seconds or minutes)
  2. Churn prediction – often done in telco, finance,…
  3. IoT – such as highway congestion data collected from phone GPS
  4. Recommender – lambda architecture, collaborative filters, look for trending data
  5. AdTech data pipeline – especially for started, but incomplete transactions
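The idea of processing a stream in blocks, as in the fraud detection use case, can be sketched in plain Python. This is not Spark code and uses no Spark API. Spark Streaming groups records by time interval, while the sketch uses a fixed record count to stay self-contained. The fixed amount threshold stands in for a trained model, and all transaction values are invented.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Yield successive fixed-size blocks from an event stream."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def flag_suspicious(batch, limit):
    """Stand-in for a trained model: flag transactions above a fixed amount."""
    return [event["id"] for event in batch if event["amount"] > limit]


# Hypothetical transaction stream.
events = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": 9500.0},
    {"id": 3, "amount": 35.5},
    {"id": 4, "amount": 12.0},
    {"id": 5, "amount": 7200.0},
]

alerts = []
for batch in micro_batches(events, batch_size=2):
    # Each block is scored as soon as it is complete, not after the whole stream.
    alerts.extend(flag_suspicious(batch, limit=5000.0))

print(alerts)
```

Because each block is scored as it arrives, an alert can be raised within one batch interval of the suspicious transaction, which is the property the fraud detection use case relies on.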

Phil also talked about how Qubble facilitates the use of Spark and #Hadoop by minimizing costs of storage and processing on #AWS.

posted in:  Big data, CenterNJDataScience, data analysis, Map Reduce

Accelerating Data Science with Spark and Zeppelin

Posted on February 17th, 2016

#NewYorkApacheStormUserGroup

02/17/2016 @ADP, 135 West 18th Street, NY


Two presenters talked about tools designed to work with Spark and other data analysis tools.

Oleg @Hortonworks spoke about #Apache #NiFi, an interactive GUI to orchestrate the flow of data. The drag-and-drop interface specifies how cases are streamed across applications, and it offers limited capabilities to filter or edit the data streams. The tool handles asynchronous processes and can be used to document how each case passes from process to process.
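The combination of routing, light filtering or editing, and a record of each case's path can be pictured with a small plain-Python sketch. This is not NiFi's API. The function run_flow, the step names and the sample records are all hypothetical, and the sketch runs synchronously, unlike the asynchronous processing described above.

```python
def run_flow(records, steps):
    """Pass each record through named steps, logging the path it takes.

    A step returns the (possibly edited) record, or None to drop it.
    """
    delivered, provenance = [], []
    for record_id, record in enumerate(records):
        path = []
        for name, step in steps:
            record = step(record)
            if record is None:
                path.append(name + ":dropped")
                break
            path.append(name)
        else:
            # Reached only when no step dropped the record.
            delivered.append(record)
        provenance.append((record_id, path))
    return delivered, provenance


# Hypothetical two-step flow: filter out blank records, then edit the rest.
steps = [
    ("drop_empty", lambda r: r if r.strip() else None),
    ("uppercase", lambda r: r.strip().upper()),
]
delivered, provenance = run_flow(["  ok ", "   ", "fine"], steps)
print(delivered)
print(provenance)
```

The provenance list plays the role of the documentation trail: for every case it shows which steps were visited and where, if anywhere, the case was dropped.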

Next, Vinay Shukla talked about #Zeppelin, a notebook similar to #Jupyter (IPython) for displaying simple graphics (such as bar, line, and pie charts) in a browser window. It supports multiple languages on the back end (such as #Python) and is integrated with #Spark and #Hadoop.

Support for #R and visualizations at the level of #ggplot are scheduled for introduction in the second half of 2016.

Several audience members asked questions about who would use NiFi and Zeppelin since the tools do not have the analytic power of R or other analysis tools, yet their use requires more data sophistication than when using Excel or other business presentation tools.

posted in:  data, Map Reduce, NY Storm Users Group, Programming