All About Programming: BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

BlinkDB is a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. It allows users to trade-off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas: (1) An adaptive optimization framework that builds and maintains a set of multi-dimensional samples from original data over time, and (2) A dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy and/or response time requirements. We have evaluated BlinkDB on the well-known TPC-H benchmarks, a real-world analytic workload derived from Conviva Inc. and are in the process of deploying it at Facebook Inc.

Read full article from BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

No comments:

Post a Comment

Labels

Popular Posts