New York Tech Journal
Tech news from the Big Apple

Evolving from #RDBMS to #NoSQL + #SQL

Posted on May 3rd, 2016


05/03/2016 @Thoughtworks, 99 Madison Ave, 15th floor, NY


Jim Scott @MAPR spoke about #ApacheDrill, whose query language extends ANSI SQL. Drill provides an interface that uses this SQL extension to access data in underlying stores, whether those are SQL databases, NoSQL databases, CSV files, etc.

The OJAI API and Drill have the following advantages:

  1. Gson (in #Java) needs two lines of code to serialize an object to #JSON so it can be placed into the data store, and one line to deserialize it
  2. Idempotent – so there is no need to worry about replaying actions twice if there is an issue.
  3. Drill requires Java, but not Hadoop, so it can run on a desktop
  4. Schema on the fly – will take different data formats and join them together: e.g. csv + JSON
  5. Data is accessed directly from the underlying databases without first needing to be transformed and loaded into a metastore
  6. Security – plugs into the authentication mechanisms of the underlying dbs. Mechanisms can go through multiple chains of ownership. Security can be applied at the row level and the column level.
  7. Commands extend SQL to allow access to lists in a JSON structure
    2. SUM
    3. Can create views to output to parquet, csv, json formats
    4. FLATTEN – explode an array in a JSON structure to display as multiple rows with all other fields duplicated
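
To make the FLATTEN behavior described above concrete, here is a minimal sketch in plain Python. It is not Drill code and does not call any Drill API. The `flatten` helper and the `orders` sample data are made up for illustration; the sketch only mimics the row-exploding idea from the talk.

```python
def flatten(rows, field):
    """Return one output row per element of row[field].

    Illustrative helper only (not part of Drill): every other field
    in the row is copied into each output row.
    """
    out = []
    for row in rows:
        # In this sketch, a missing or empty array yields no output rows.
        for item in row.get(field, []):
            new_row = dict(row)      # duplicate all the other fields
            new_row[field] = item    # replace the array with one element
            out.append(new_row)
    return out


# Hypothetical sample records, shaped like JSON documents with an array field.
orders = [
    {"order_id": 1, "customer": "ann", "items": ["pen", "ink"]},
    {"order_id": 2, "customer": "bob", "items": ["pad"]},
]

flat = flatten(orders, "items")
for row in flat:
    print(row)
```

The two input records become three rows, one per array element, with `order_id` and `customer` repeated on each. How Drill itself treats empty or missing arrays is not covered here, so check the Drill documentation for the exact semantics.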

posted in:  data, data analysis, databases, Open source, Programming