Evolving from #RDBMS to #NoSQL + #SQL
Posted on May 3rd, 2016
05/03/2016 @Thoughtworks, 99 Madison Ave, 15th floor, NY
Jim Scott @MAPR spoke about #ApacheDrill which has a query language that extends ANSI SQL. Drill provides an interface that uses this SQL-extension to access data in underlying db’s that are SQL, noSQL, csv, etc.
The Ojai API has the following advantages
- Gson (in #Java) uses two lines of code to serialize #JSON to place into the data. One line to deserialize
- Idempotent – so don’t need to worry about replaying actions things twice if there is an issue.
- Drill does not requires Java, but not Hadoop so it can run on a desktop
- Schema on the fly – will take different data formats and join them together: e.g. csv + JSON
- Data is directly access from the underlying databases without needing to first transform them to a metastore
- Security – plugs into authentication mechanism of the underlying dbs. Mechanisms can go through multiple chains of ownership. Security can be done on row level and column level.
- Commands extend SQL to allow access lists in a JSON structure
- Can create views to output to parquet, csv, json formats
- FLATTEN – explode an array in a JSON structure to display as multiple rows with all other fields duplicated