# Hadoop / Java 8 Stream Debugging
Posted on July 27th, 2015
07/27/2015 @ LinkedIn, 350 5th Ave (Empire State Building), NY
Two speakers presented.
Keith D’Souza of LinkedIn described how LinkedIn integrates MapReduce into its workflow. Specifically, he talked about the systems and tools used to generate the people and job recommendations presented to users. Their main offline tools include:
- Project Takeout – handles user requests to download an archive of their data.
- Gobblin – ingests data from multiple sources into Hadoop.
- Azkaban – a workflow scheduler for Hadoop that manages job dependencies and tracks job runs.
- Dr. Elephant – analyzes jobs that have already run to assess whether they could be made more efficient.
In the second talk, Tim Fagan presented five techniques for debugging Java 8 streams. In a stream, a source is followed by a chain of operations that transform its elements. He illustrated the techniques on a “one-line program” that removes every word containing the letters x, y, or z from a string of text and produces an alphabetically sorted list of the remaining words without duplicates.
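A minimal sketch of such a one-line program is shown below. The talk's actual code is not reproduced here; this version assumes lowercase input separated by whitespace and drops a word if it contains any of x, y, or z. The class and method names (`WordFilter`, `process`) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class WordFilter {
    // Hypothetical helper: drops words containing x, y or z, removes duplicates,
    // and returns the remaining words in alphabetical order.
    static List<String> process(String text) {
        return Arrays.stream(text.trim().split("\\s+")) // source: one element per word
                .filter(w -> !w.matches(".*[xyz].*"))   // keep words without x, y or z
                .distinct()                              // remove duplicate words
                .sorted()                                // natural (alphabetical) order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [brown, dog, jumps, over, quick, the]
        System.out.println(process("the quick brown fox jumps over the lazy dog"));
    }
}
```

Because every step is chained into a single expression, there is no obvious place to set a breakpoint or print an intermediate result, which is what the debugging techniques address.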
Tim presented the following debugging methods:
- Collect – collect and print the output of each step of the pipeline as a list. This is verbose, since the code must be split apart with an intermediate variable for each step.
- Peek & guess – insert peek(out::println), an operation that passes elements through unchanged but prints each one to the console. The output can be confusing because elements flow through the steps one at a time, so lines printed by different steps are interleaved.
- Peek & collect – use peek(debug1::add), where debug1 is an ArrayList. This is much easier to read than peek & guess, since the output is grouped by the position of each peek in the stream.
- Peek with labels – create a log method that attaches a label to each peek, e.g. peek(log("step 1: ")).
- Decorate – wrap the initial object in a call to a debug() function before the chain of operations starts. This adds tags that are printed to the console after each operation.
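To make the collect technique concrete, here is one possible sketch applied to the word-filtering pipeline described above (remove words containing x, y, or z, deduplicate, sort). Each step is materialized into a named list and printed; the class and variable names are illustrative, not from the talk. The extra variables show why the approach was called verbose.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollectDebug {
    static List<String> process(String text) {
        // Step 1: split the text into words.
        List<String> words = Arrays.asList(text.trim().split("\\s+"));
        System.out.println("split:    " + words);

        // Step 2: drop words containing x, y or z.
        List<String> filtered = words.stream()
                .filter(w -> !w.matches(".*[xyz].*"))
                .collect(Collectors.toList());
        System.out.println("filtered: " + filtered);

        // Step 3: remove duplicates.
        List<String> unique = filtered.stream()
                .distinct()
                .collect(Collectors.toList());
        System.out.println("distinct: " + unique);

        // Step 4: sort alphabetically.
        List<String> sorted = unique.stream()
                .sorted()
                .collect(Collectors.toList());
        System.out.println("sorted:   " + sorted);
        return sorted;
    }

    public static void main(String[] args) {
        process("the quick brown fox jumps over the lazy dog");
    }
}
```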
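The two peek variants can be contrasted on the same word-filtering pipeline. In this sketch, `out::println` relies on a static import of `System.out`, and the lists `debug1`, `debug2`, `debug3` borrow the talk's naming; placing one list after each step and passing them in as parameters is an assumption made for illustration.

```java
import static java.lang.System.out;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PeekDebug {
    // Peek & guess: print each element after the split and after the filter.
    // Elements move through the pipeline one at a time, so lines from the
    // two peeks come out interleaved.
    static List<String> peekAndGuess(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .peek(out::println)
                .filter(w -> !w.matches(".*[xyz].*"))
                .peek(out::println)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // Peek & collect: each peek appends to its own list, so the trace is
    // grouped by where the peek sits in the pipeline.
    static List<String> peekAndCollect(String text, List<String> debug1,
                                       List<String> debug2, List<String> debug3) {
        return Arrays.stream(text.trim().split("\\s+"))
                .peek(debug1::add)                       // after split
                .filter(w -> !w.matches(".*[xyz].*"))
                .peek(debug2::add)                       // after filter
                .distinct()
                .peek(debug3::add)                       // after distinct
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        peekAndGuess(text);

        List<String> debug1 = new ArrayList<>();
        List<String> debug2 = new ArrayList<>();
        List<String> debug3 = new ArrayList<>();
        peekAndCollect(text, debug1, debug2, debug3);
        out.println("after split:    " + debug1);
        out.println("after filter:   " + debug2);
        out.println("after distinct: " + debug3);
    }
}
```

For input "b a x b", peek & guess prints b, b, a, a, x, b, b, which is hard to attribute to a step, while peek & collect yields [b, a, x, b], [b, a, b] and [b, a] for the three positions.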
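For peek with labels, a `log` method can simply return a `Consumer` that prefixes each element with its label. The sketch below is one way to write it, not the talk's code; `log` and the `trace` list (kept so the output can be inspected as well as printed) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

class LabeledPeek {
    // Every labeled line is kept here as well as printed, so it can be inspected later.
    static final List<String> trace = new ArrayList<>();

    // Hypothetical log helper: returns a Consumer that prints each element
    // prefixed with the given label.
    static <T> Consumer<T> log(String label) {
        return t -> {
            String line = label + t;
            trace.add(line);
            System.out.println(line);
        };
    }

    static List<String> process(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .peek(log("step 1: "))                   // after split
                .filter(w -> !w.matches(".*[xyz].*"))
                .peek(log("step 2: "))                   // after filter
                .distinct()
                .peek(log("step 3: "))                   // after distinct
                .sorted()
                .peek(log("step 4: "))                   // after sort
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        process("the quick brown fox jumps over the lazy dog");
    }
}
```

The lines are still interleaved, but each one now says which step produced it; "step 4" lines appear only at the end because `sorted()` must see every element before emitting any.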
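The talk did not spell out how debug() was built. One possible sketch, assuming a hypothetical `DebugStream` wrapper: `debug(...)` wraps the source stream, and each wrapped operation delegates to the real `Stream` and then adds a tagging peek named after that operation. Only `filter`, `distinct` and `sorted` are wrapped here, and the `trace` list is an illustrative addition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical decorator: each operation delegates to the wrapped Stream and
// then tags every element that leaves that operation.
final class DebugStream<T> {
    private final Stream<T> inner;
    private final List<String> trace;

    private DebugStream(Stream<T> inner, List<String> trace) {
        this.inner = inner;
        this.trace = trace;
    }

    // Entry point: wrap the initial object before the chain of operations starts.
    static <T> DebugStream<T> debug(Stream<T> source, List<String> trace) {
        return new DebugStream<>(source.peek(tag("source", trace)), trace);
    }

    // Builds a Consumer that records and prints "[operation] element".
    private static <T> Consumer<T> tag(String op, List<String> trace) {
        return t -> {
            String line = "[" + op + "] " + t;
            trace.add(line);
            System.out.println(line);
        };
    }

    DebugStream<T> filter(Predicate<? super T> p) {
        return new DebugStream<>(inner.filter(p).peek(tag("filter", trace)), trace);
    }

    DebugStream<T> distinct() {
        return new DebugStream<>(inner.distinct().peek(tag("distinct", trace)), trace);
    }

    DebugStream<T> sorted() {
        return new DebugStream<>(inner.sorted().peek(tag("sorted", trace)), trace);
    }

    // Terminal operation: runs the pipeline and collects the result.
    List<T> toList() {
        return inner.collect(Collectors.toList());
    }
}

class DecorateDemo {
    static List<String> process(String text, List<String> trace) {
        return DebugStream.debug(Arrays.stream(text.trim().split("\\s+")), trace)
                .filter(w -> !w.matches(".*[xyz].*"))
                .distinct()
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        process("the quick brown fox jumps over the lazy dog", new ArrayList<>());
    }
}
```

The trade-off in this design is that every stream operation the pipeline uses has to be mirrored in the wrapper, but the pipeline code itself is unchanged apart from the debug(...) call, so no peeks or labels need to be added by hand.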
Tim recommended the decorate method and acknowledged the flexibility of the peek methods.