Oalva Hadoop Data Migrator©

Patent Pending

The days of the old, beastly, expensive, monolithic data warehouse appliance are over. The new, open-source, modern data architecture has arrived. Oalva was one of the first (if not the very first) to adopt Hive’s LLAP and is actively moving customers from Teradata, Netezza, Greenplum, Exadata and PivotalHD to Hive LLAP.

Unparalleled Record-Setting Performance

We recently completed an implementation for a customer in which we sustained 62 concurrent queries with an average response time of 990 ms. That’s right, you read that correctly: that is over 220,000 queries per hour (QPH) with a sub-second average response time. We achieved the highest throughput on record, more than double the previous record of 100,000 QPH set by Yahoo! Japan. And this was not a simple workload. For this customer, the vast majority of queries performed a SELECT with 5-10 on-the-fly sums and averages against a join of three tables: one table of 10-50 million records and two tables of 10-20 thousand records each. In all, the customer had 10,000 tables.
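The queries-per-hour figure follows from Little's Law: with N queries in flight and an average response time R, a system completes N / R queries per unit of time. The sketch below reproduces that arithmetic, assuming the 62 queries stay continuously in flight; the function name is ours, chosen for illustration.

```python
def queries_per_hour(concurrent_queries: int, response_time_ms: float) -> float:
    """Steady-state throughput from Little's Law (N / R), scaled to one hour."""
    queries_per_second = concurrent_queries / (response_time_ms / 1000.0)
    return queries_per_second * 3600.0


# 62 concurrent queries at a 990 ms average response time
qph = queries_per_hour(62, 990)
print(f"{qph:,.0f} queries per hour")
```

The result lands a little above 225,000, consistent with the "over 220,000 QPH" figure quoted above.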

Furthermore, there is no hockey stick as the query workload increases: the response time increases linearly.

Number of Concurrent Queries vs. Response Time, high-end (220,000 queries per hour) query load:

Concurrent Queries    Response Time
47                    836 ms
54                    903 ms
60                    983 ms

[Chart: Response Time (milliseconds) vs. Concurrent Queries]
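One way to put a number on the trend is to fit a straight line through the three measurements listed above and read off the slope, which is the extra response time per additional concurrent query. The sketch below does this with a plain least-squares fit; with only three published points it is a rough check, and the helper name is ours.

```python
# (concurrent queries, response time in ms), taken from the measurements above
measurements = [(47, 836), (54, 903), (60, 983)]


def least_squares_slope(points):
    """Slope of the best-fit line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


slope = least_squares_slope(measurements)
print(f"about {slope:.1f} ms added per extra concurrent query")

# Implied throughput at each load level (Little's Law: N / R)
for concurrent, response_ms in measurements:
    qph = concurrent / (response_ms / 1000.0) * 3600.0
    print(f"{concurrent} concurrent -> {qph:,.0f} queries per hour")
```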

Migration Automation and Acceleration

We have a large collection of shell, SQL and Python scripts that create the Hive tables based on the Teradata or Netezza schema. The code also performs a one-time historical data load. Finally, we parse the Teradata or Netezza query log to create a list of queries to replay against Hive LLAP and validate performance.
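To give a feel for what schema creation involves, here is a minimal sketch that turns a list of source columns into a Hive CREATE TABLE statement. It is not Oalva's actual code: the type mapping, the fallback to STRING, the choice of ORC storage and the function names are all assumptions made for illustration.

```python
# Illustrative Teradata -> Hive type mapping (an assumption, not Oalva's rules).
TERADATA_TO_HIVE = {
    "BYTEINT": "TINYINT",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "FLOAT": "DOUBLE",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}

# Types whose arguments, such as DECIMAL(18,2), are carried over unchanged.
PASS_THROUGH = {"DECIMAL", "VARCHAR"}


def to_hive_type(source_type: str) -> str:
    """Map one source column type to a Hive type, defaulting to STRING."""
    normalized = source_type.strip().upper()
    base = normalized.split("(", 1)[0].strip()
    if base in PASS_THROUGH:
        return normalized
    return TERADATA_TO_HIVE.get(base, "STRING")


def hive_create_table(table: str, columns: list) -> str:
    """Build a Hive DDL statement from (column_name, source_type) pairs."""
    column_lines = ",\n".join(
        f"  `{name}` {to_hive_type(source_type)}" for name, source_type in columns
    )
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n{column_lines}\n)\nSTORED AS ORC"


# Hypothetical table, for illustration only
ddl = hive_create_table(
    "sales",
    [("id", "INTEGER"), ("amount", "DECIMAL(18,2)"), ("note", "CHAR(10)")],
)
print(ddl)
```

A real migration would also need to handle partitioning, character sets, and source types that have no direct Hive equivalent.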

Automated:

  • Schema Creation
  • On-Demand Historical Data Load
  • Query Log Replay against Hadoop
  • Data Comparison (down to row-column data values)
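As an illustration of what comparison down to row-column values can look like, the sketch below matches rows from the source and target systems on a key column and reports every cell that differs, as well as rows present on only one side. It works on in-memory rows for simplicity; the function name and sample data are hypothetical and this is not the product's implementation.

```python
def compare_rows(source_rows, target_rows, key):
    """Return (key_value, column, source_value, target_value) for each difference."""
    source_by_key = {row[key]: row for row in source_rows}
    target_by_key = {row[key]: row for row in target_rows}
    differences = []
    for key_value in sorted(source_by_key.keys() | target_by_key.keys()):
        src = source_by_key.get(key_value)
        tgt = target_by_key.get(key_value)
        if src is None or tgt is None:
            # The whole row is missing on one side.
            differences.append((key_value, "<row>", src, tgt))
            continue
        for column in sorted(src.keys() | tgt.keys()):
            if src.get(column) != tgt.get(column):
                differences.append(
                    (key_value, column, src.get(column), tgt.get(column))
                )
    return differences


# Hypothetical sample data, for illustration only
teradata_rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
hive_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 25.0},
    {"id": 3, "amount": 5.0},
]

diffs = compare_rows(teradata_rows, hive_rows, key="id")
for diff in diffs:
    print(diff)
```

At warehouse scale the same idea would typically be applied per partition or on row checksums first, drilling down to individual cells only where the checksums disagree.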


All the Benefits of Hadoop

By moving to Hadoop, you get to keep your existing data architecture while gaining a modern platform that supports in-memory databases, real-time analytics processing and machine learning, at a fraction of the cost. Our customers have achieved average savings of 49%.

 

References: https://hortonworks.com/products/data-center/hdp/

What is Hive LLAP?

LLAP is a query executor for Apache Hive that works with the Tez execution engine. It keeps pre-built YARN containers running with a query cache, which allows very fast query execution against structured data sets. Introduced by Hortonworks as a Tech Preview in HDP 2.5.3 and made generally available in HDP 2.6, it offers unparalleled query performance. Oalva was one of the very first (if not the very first) Hadoop Solutions Integrators (SIs) to adopt it, and we built our EDW offload accelerator to make it very easy for customers to move from their archaic appliances to Hive LLAP.

Hive 2 with LLAP: Architecture Overview

 

References: https://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/

Taking the work out of it for you so you can Run FAST

Note: If you have an expensive appliance other than Teradata or Netezza, please call us at 1-888-556-7693 – we would love to add a third source platform to our Hadoop Data Migrator product.