
Apache Drill 0.4.0 Released: An Analysis System for Large Datasets

2014-8-14 15:25 | Posted by: joejoe0332 | Views: 2086 | Comments: 0 | Original author: oschina | Source: oschina


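  Apache Drill is built for interactive analysis of large datasets and is an open-source version of Google's Dremel. Its goal is to analyze large datasets efficiently: it is designed to run on more than 1,000 servers and to process petabytes of data and trillions of records within seconds. Drill is currently incubating at Apache. Apache Drill 0.4.0 was released recently as a developer preview. The release brings a large number of improvements and new features: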


  • A new way to work with data and metadata: Drill is the first query engine to champion advanced Apache Parquet format files for self-describing data, completely avoiding a central metadata repository.

  • A completely new columnar execution engine that leverages both runtime code compilation and advanced memory management for query execution.

  • Advanced cost-based query optimization that works with or without statistics and provides complex distributed query planning.

  • Focus on full SQL capability with support for correlated subqueries, complex subexpressions and scalar subqueries.

  • The first query engine to support JSON everything, enabling instant analysis of semi-structured and partially schemed data without setup or extra effort.

  • Full complex data semantics combined with complete SQL data types allow you to use JavaScript notation to access and interact with complex fields and data structures.  This includes support for exact Decimal, Date, Time and Interval types.

  • In-query dynamic schema discovery allows you to redefine blob fields as complex objects, using advanced CONVERT_FROM and CONVERT_TO semantics.

  • Support for more than 150 data formats and thousands of existing function libraries through strong integration with Hive Serdes and UDFs.

  • Additional support for high performance native Drill storage plugins and UDFs.

  • A friendly web interface with query and profiling tools including an advanced query plan visualizer and execution flow visualizations.

  • A complete set of interfaces and APIs, including support for JDBC, C++, Java, ODBC*, REST and CLI.

  • Advanced dynamic analysis capabilities on top of HBase including dynamic schema discovery, high speed parallel scanning and operator pushdown.

  • Support for in-memory and beyond-memory datasets with an innovative multi-stage sort algorithm that delivers a faster time to first sorted record than traditional query engines.

  • Ability to meet query SLAs and avoid resource starvation with multiple query resource queues.

  • Support for wide rows with thousands of columns within a single query.

  • An advanced modular design with extensibility points at storage, query, planning and operator execution to work for a large set of standalone or embedded setups.

  • Full scaling: run embedded on Linux, Mac or PC for development purposes or scale up to a full cluster on any platform.

  • Support for use of Zookeeper and HBase for Drill configuration and profiling management.

  • The only open source distributed query engine architected to work with all types of big data, not just Hadoop data sources.
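
  To make the idea of self-describing data and in-query schema discovery from the list above more concrete, here is a minimal Python sketch. It is not Drill's implementation and does not use any Drill API. The function name `discover_schema`, the dotted-path notation and the sample records are all assumptions made for illustration. It simply reads JSON records with differing shapes and collects the field paths and value types it observes, without any schema being declared up front.

```python
import json


def discover_schema(records):
    """Collect every field path seen in the records and the value types observed at it.

    Illustrative only: a toy stand-in for the idea of schema discovery,
    not how Apache Drill does it internally.
    """
    schema = {}

    def walk(value, path):
        if isinstance(value, dict):
            # Nested objects extend the path with dotted notation.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Array elements share one path, marked with [].
            for child in value:
                walk(child, f"{path}[]")
        else:
            schema.setdefault(path, set()).add(type(value).__name__)

    for record in records:
        walk(record, "")
    return schema


# Hypothetical sample data: the two records do not share a fixed schema.
lines = [
    '{"id": 1, "user": {"name": "ann", "tags": ["a", "b"]}}',
    '{"id": "2", "user": {"name": "bob"}, "score": 3.5}',
]

schema = discover_schema(json.loads(line) for line in lines)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

  Note that `id` appears once as an integer and once as a string, and `score` appears in only one record. A query engine working on semi-structured data has to cope with exactly this kind of variation at query time rather than rejecting it against a predefined table definition.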

