Druid and related time-series databases: inverted-index optimizations are used here too
Published: 2019-06-19


Cattell [6] maintains a great summary of existing scalable SQL and NoSQL data stores. Hu [18] contributed another great summary for streaming databases. Feature-wise, Druid sits somewhere between Google's Dremel [28] and PowerDrill [17]. Druid has most of the features implemented in Dremel (Dremel handles arbitrary nested data structures, while Druid only allows for a single level of array-based nesting) and many of the interesting compression algorithms mentioned in PowerDrill. Although Druid builds on many of the same principles as other distributed columnar data stores [15], many of these data stores are designed to be more generic key-value stores [23] and do not support computation directly in the storage layer. There are also other data stores designed for some of the same data warehousing issues that Druid is meant to solve. These systems include in-memory databases such as SAP's HANA [14] and VoltDB [43]. These data stores lack Druid's low latency ingestion characteristics. Druid also has native analytical features baked in, similar to ParAccel [34]; however, Druid allows system-wide rolling software updates with no downtime.
 
Druid is similar to C-Store [38] and LazyBase [8] in that it has two subsystems, a read-optimized subsystem in the historical nodes and a write-optimized subsystem in real-time nodes. Real-time nodes are designed to ingest a high volume of append-heavy data, and do not support data updates. Unlike the two aforementioned systems, Druid is meant for OLAP workloads and not OLTP transactions.
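
The split between a write-optimized and a read-optimized subsystem can be illustrated with a small sketch. This is not Druid's implementation: the class names `RealtimeBuffer` and `ImmutableSegment` and the `handoff` method are hypothetical, and events are assumed to be simple `(timestamp, payload)` tuples. The buffer only accepts appends, and the frozen segment is sorted once so that time-range reads can use binary search.

```python
from bisect import bisect_left


class ImmutableSegment:
    """Read-optimized (hypothetical): sorted by timestamp, never modified."""

    def __init__(self, events):
        # Sort once at creation time so that reads can use binary search.
        self._events = tuple(sorted(events, key=lambda e: e[0]))
        self._timestamps = [e[0] for e in self._events]

    def query(self, start, end):
        """Return events with start <= timestamp < end."""
        lo = bisect_left(self._timestamps, start)
        hi = bisect_left(self._timestamps, end)
        return list(self._events[lo:hi])


class RealtimeBuffer:
    """Write-optimized (hypothetical): append-only, no updates or deletes."""

    def __init__(self):
        self._events = []

    def append(self, timestamp, payload):
        # Appending is O(1); ordering is deferred until handoff.
        self._events.append((timestamp, payload))

    def handoff(self):
        """Freeze the buffered events into a segment and start a fresh buffer."""
        segment = ImmutableSegment(self._events)
        self._events = []
        return segment


# Illustrative data only: events arrive out of order and are sorted at handoff.
buffer = RealtimeBuffer()
buffer.append(3, "c")
buffer.append(1, "a")
buffer.append(2, "b")
segment = buffer.handoff()
in_range = segment.query(1, 3)
```

Deferring the sort to handoff keeps the write path cheap, while the immutable segment pays that cost once and then serves range reads without locking or re-sorting.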
 
Druid's low latency data ingestion features share some similarities with Trident/Storm [27] and Spark Streaming [45]; however, both systems are focused on stream processing whereas Druid is focused on ingestion and aggregation. Stream processors are great complements to Druid as a means of pre-processing the data before the data enters Druid.
 
There is a class of systems that specialize in queries on top of cluster computing frameworks. Shark [13] is such a system for queries on top of Spark, and Cloudera's Impala [9] is another system focused on optimizing query performance on top of HDFS. Druid historical nodes download data locally and only work with native Druid indexes. We believe this setup allows for faster query latencies.
 
Druid leverages a unique combination of algorithms in its architecture. Although we believe no other data store has the same set of functionality as Druid, some of Druid's optimization techniques, such as using inverted indices to perform fast filters, are also used in other data stores [26].
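
As a rough illustration of how an inverted index speeds up filtering, the sketch below maps each dimension value to the set of row ids that contain it, so an AND filter becomes a set intersection instead of a full scan. It is a minimal sketch, not Druid's actual index format: the function names, the `page` and `city` dimensions, and the sample rows are all made up for the example, and plain Python sets stand in for whatever compressed structure a real store would use.

```python
from collections import defaultdict


def build_inverted_index(rows, dimension):
    """Map each value of `dimension` to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[dimension]].add(row_id)
    return index


def filter_rows(indexes, **conditions):
    """Return sorted row ids matching every equality condition (logical AND)."""
    matching = None
    for dimension, value in conditions.items():
        # .get avoids creating an empty entry for values that never occur.
        ids = indexes[dimension].get(value, set())
        matching = ids if matching is None else matching & ids
    return sorted(matching) if matching is not None else []


# Illustrative data only.
rows = [
    {"page": "home", "city": "Berlin"},
    {"page": "home", "city": "Tokyo"},
    {"page": "docs", "city": "Lima"},
    {"page": "docs", "city": "Berlin"},
]
indexes = {dim: build_inverted_index(rows, dim) for dim in ("page", "city")}
docs_from_berlin = filter_rows(indexes, page="docs", city="Berlin")
```

Only the rows whose ids survive the intersection need to be read afterwards, which is where the saving over scanning every row comes from.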
 
Druid white paper: http://static.druid.io/docs/druid.pdf
This article is reposted from 张昺华-sky's blog on cnblogs (博客园). Original link: http://www.cnblogs.com/bonelee/p/6433333.html (to repost it, please contact the original author).