Apache HBase: Difference between revisions

Apache HBase
Developer(s)	Apache Software Foundation
Stable release	1.1.0.1 / 21 May 2015
Repository	gitbox.apache.org/repos/asf/hbase.git
Written in	Java
Operating system	Cross-platform
License	Apache License 2.0
Website	hbase.apache.org

Revision as of 20:22, 29 September 2015

HBase is an open-source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data: small amounts of information caught within a large collection of empty or unimportant data. Typical tasks include finding the 50 largest items in a group of 2 billion records, or finding the non-zero items that make up less than 0.1% of a huge collection.
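
As a rough illustration of what "sparse" means here, the following Python sketch stores only the cells that were actually written, keyed by row and column, and scans rows in sorted key order. The class name `SparseTable` and its methods are hypothetical and are not part of the HBase API; real HBase also versions cells by timestamp and distributes rows across servers, which this toy model ignores.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory sparse table (hypothetical, not the HBase API).

    Only cells that were written occupy memory; empty cells cost nothing.
    """

    def __init__(self):
        # row key -> {column name -> value}; absent cells are simply not stored
        self._rows = defaultdict(dict)

    def put(self, row, column, value):
        """Write one cell."""
        self._rows[row][column] = value

    def get(self, row, column, default=None):
        """Read one cell; missing rows or columns return `default`."""
        # .get on the outer mapping avoids creating empty rows on reads
        return self._rows.get(row, {}).get(column, default)

    def stored_cells(self):
        """Number of cells physically stored."""
        return sum(len(columns) for columns in self._rows.values())

    def scan(self, start_row=None, stop_row=None):
        """Yield (row, columns) in sorted row-key order over [start_row, stop_row)."""
        for row in sorted(self._rows):
            if start_row is not None and row < start_row:
                continue
            if stop_row is not None and row >= stop_row:
                break
            yield row, dict(self._rows[row])


table = SparseTable()
table.put("user#0001", "info:name", "Ada")
table.put("user#0500", "stats:logins", 3)
```

Here two rows with different columns occupy two stored cells in total, however many columns other rows might use.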

HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original BigTable paper.^[1] Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and can be accessed through the Java API as well as through REST, Avro, or Thrift gateway APIs.
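
To illustrate the Bloom filter idea mentioned above: a Bloom filter is a compact bit array that answers either "definitely not present" or "possibly present" for a key, which lets a read skip files that cannot contain the requested row. The Python sketch below is a minimal, generic implementation under assumed parameters (`num_bits`, `num_hashes`, SHA-256-based hashing); it does not reproduce HBase's own implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (generic; parameters are assumptions)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        """Record a key by setting all of its bit positions."""
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bloom = BloomFilter(num_bits=4096, num_hashes=3)
keys = [f"row-{i}".encode() for i in range(100)]
for key in keys:
    bloom.add(key)
```

A key that was added always reports "possibly present"; a key that was never added usually reports "definitely not present", with a small false-positive rate that depends on the filter size and the number of keys.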

HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase as well as a JDBC driver that can be integrated with various analytics and business intelligence applications.

HBase now serves several data-driven websites,^[2]^[3] including Facebook's Messaging Platform.^[4]^[5]

In the parlance of Eric Brewer’s CAP theorem, HBase is a CP-type system: it favors consistency and partition tolerance over availability.

History

Apache HBase began as a project by the company Powerset out of a need to process massive amounts of data for the purposes of natural language search. It is now a top-level Apache project.

Facebook elected to implement its new messaging platform using HBase in November 2010.^[4]

Use cases & production deployments

HBase is used in production by a number of large web companies.

Enterprises that use HBase

The following is a list of notable enterprises that have used or are using HBase:

Amadeus IT Group uses HBase as its main long-term storage database.
Facebook uses HBase for its messaging platform.
Netflix^[6]
Sophos uses HBase for some of its back-end systems.
Spotify uses HBase as a base for Hadoop and machine learning jobs.^[7]
Tuenti uses HBase for its messaging platform.

References

[1] Chang, et al. (2006). Bigtable: A Distributed Storage System for Structured Data.
[2] Powered By HBase.
[3] StumbleUpon HBase Presentation.
[4] The Underlying Technology of Messages.
[5] Facebook: Why our 'next-gen' comms ditched MySQL. Retrieved: 17 December 2010.
[6] Cheolsoo Park and Ashwin Shankar. "Netflix: Integrating Spark at Petabyte Scale".
[7] Josh Baer. "How Apache Drives Spotify's Music Recommendations".

Bibliography

Dimiduk, Nick; Khurana, Amandeep (28 November 2012). HBase in Action (1st ed.). Manning Publications. p. 350. ISBN 978-1617290527.
George, Lars (20 September 2011). HBase: The Definitive Guide (1st ed.). O'Reilly Media. p. 556. ISBN 978-1449396107.
Jiang, Yifeng (16 August 2012). HBase Administration Cookbook (1st ed.). Packt Publishing. p. 332. ISBN 978-1849517140.

@@ Line 39: / Line 39: @@
 * [[Amadeus IT Group]], as its main long-term storage DB.
 * [[Facebook]] uses HBase for its messaging platform.
+* [[Netflix]]<ref>{{cite web|url=http://apachebigdata2015.sched.org/event/2a65daf0baa4cfbc227a8cb74a9103a2?iframe=no&w=i:100;&sidebar=yes&bg=no |title=Netflix: Integrating Spark at Petabyte Scale |author=Cheolsoo Park and Ashwin Shankar}}</ref>
-* [[Spotify]] uses HBase as base for Hadoop and machine learning jobs.
-* [[Sophos]] uses HBase for some of their back-end systems.
+* [[Sophos]], for some of their back-end systems.
+* [[Spotify]] uses HBase as base for Hadoop and machine learning jobs.<ref>{{cite web|url=http://apachebigdata2015.sched.org/event/2a65daf0baa4cfbc227a8cb74a9103a2?iframe=no&w=i:100;&sidebar=yes&bg=no |title=How Apache Drives Spotify's Music Recommendations |author=Josh Baer}}</ref>
 * [[Tuenti]] uses HBase for its messaging platform.
