Beyond Big: Merging Streaming & #Database Ops into a Next-Gen #BigData Platform
Posted on April 13th, 2017
@Thoughtworks, 99 Madison Ave, New York, 15th floor
Amir Halfon, VP of Strategic Solutions at @iguazio, talked about methods for speeding up analytics against a large database. He started by saying that the traditional software stack accessing a database was designed to minimize the time spent on slow disk storage, which resulted in many layers of software. Amir said that with modern data access and database architecture, processing is accelerated by a unified data engine that eliminates many of those layers. This also allows for generic access to data stored in many different formats, and for a record-by-record security protocol.
To simplify development, they run only on AWS and only interface with Kafka, Hadoop, and Spark. They do not do virtualization (which eventually hits a speed limit); they implement the actual data store.
Another important method is “predicate pushdown”: in a query like ‘SELECT … WHERE <predicate>’, the usual approach retrieves all the data and then culls it; if the predicate is instead pushed down to the storage layer, only the relevant data is retrieved. This is also known as an “offload engine”.
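The difference between retrieve-then-filter and pushdown can be sketched in a few lines. This is a minimal illustration, not iguazio's actual API; the record layout, function names, and predicate are all hypothetical.

```python
# Hypothetical in-memory "storage layer": 1000 records (illustrative data).
RECORDS = [
    {"id": i, "region": "us" if i % 2 == 0 else "eu", "amount": i * 10}
    for i in range(1000)
]

def scan_all():
    """Naive path: the storage layer returns every record;
    filtering happens later in the query engine."""
    return list(RECORDS)

def scan_with_pushdown(predicate):
    """Pushdown path: the predicate travels down to the storage
    layer, so only matching records cross the boundary."""
    return [r for r in RECORDS if predicate(r)]

# The WHERE clause as a predicate function (hypothetical query).
predicate = lambda r: r["region"] == "eu" and r["amount"] > 5000

# Naive: all 1000 records are transferred, then culled.
naive = [r for r in scan_all() if predicate(r)]

# Pushdown: only the matching records are ever transferred.
pushed = scan_with_pushdown(predicate)

assert naive == pushed  # same result, far less data moved
```

Both paths return the same rows; the difference is how many records cross the storage boundary before the filter is applied, which is where the speedup comes from.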
MapR is a competitor that builds on the HDFS file system, as opposed to rebuilding the system from scratch.