Apache HBase: Difference between revisions

From Wikipedia, the free encyclopedia
No edit summary
Monkbot (talk | contribs)
m Task 18 (cosmetic): eval 18 templates: hyphenate params (6×);
Line 12: Line 12:
|branch1 = 1.4.x
|branch1 = 1.4.x
|version1 = 1.4.13
|version1 = 1.4.13
|date1 = {{Start date and age|df=yes|2020|02|29}}<ref name="releases">{{cite web|url=https://hbase.apache.org/downloads.html|title=Apache HBase – Apache HBase Downloads|accessdate=8 December 2020}}</ref>
|date1 = {{Start date and age|df=yes|2020|02|29}}<ref name="releases">{{cite web|url=https://hbase.apache.org/downloads.html|title=Apache HBase – Apache HBase Downloads|access-date=8 December 2020}}</ref>
|branch2 = 1.6.x
|branch2 = 1.6.x
|version2 = 1.6.0
|version2 = 1.6.0
Line 36: Line 36:
HBase is not a direct replacement for a classic [[SQL]] [[database]], however [[Apache Phoenix]] project provides a SQL layer for HBase as well as [[JDBC]] driver that can be integrated with various [[analytics]] and [[business intelligence]] applications. The [[Apache Trafodion]] project provides a SQL query engine with [[ODBC]] and [[JDBC]] drivers and [[ACID#Distributed transactions|distributed ACID transaction protection]] across multiple statements, tables and rows that use HBase as a storage engine.
HBase is not a direct replacement for a classic [[SQL]] [[database]], however [[Apache Phoenix]] project provides a SQL layer for HBase as well as [[JDBC]] driver that can be integrated with various [[analytics]] and [[business intelligence]] applications. The [[Apache Trafodion]] project provides a SQL query engine with [[ODBC]] and [[JDBC]] drivers and [[ACID#Distributed transactions|distributed ACID transaction protection]] across multiple statements, tables and rows that use HBase as a storage engine.


HBase is now serving several data-driven websites<ref>{{cite web|url=http://hbase.apache.org/poweredbyhbase.html|title=Apache HBase – Powered By Apache HBase™|website=hbase.apache.org|accessdate=8 April 2018}}</ref> but [[Facebook]]'s Messaging Platform recently migrated from HBase to [[MyRocks]].<ref name="the-underlying-technology-of-messages">{{cite web|url=https://code.fb.com/data-infrastructure/migrating-messenger-storage-to-optimize-performance/|title=Migrating Messenger storage to optimize performance|website=www.facebook.com|accessdate=5 July 2018}}</ref><ref name="theregister">[https://www.theregister.co.uk/2010/12/17/facebook_messages_tech/ Facebook: Why our 'next-gen' comms ditched MySQL] Retrieved: 17 December 2010</ref> Unlike relational and traditional databases, HBase does not support SQL scripting; instead the equivalent is written in Java, employing similarity with a MapReduce application.
HBase is now serving several data-driven websites<ref>{{cite web|url=http://hbase.apache.org/poweredbyhbase.html|title=Apache HBase – Powered By Apache HBase™|website=hbase.apache.org|access-date=8 April 2018}}</ref> but [[Facebook]]'s Messaging Platform recently migrated from HBase to [[MyRocks]].<ref name="the-underlying-technology-of-messages">{{cite web|url=https://code.fb.com/data-infrastructure/migrating-messenger-storage-to-optimize-performance/|title=Migrating Messenger storage to optimize performance|website=www.facebook.com|access-date=5 July 2018}}</ref><ref name="theregister">[https://www.theregister.co.uk/2010/12/17/facebook_messages_tech/ Facebook: Why our 'next-gen' comms ditched MySQL] Retrieved: 17 December 2010</ref> Unlike relational and traditional databases, HBase does not support SQL scripting; instead the equivalent is written in Java, employing similarity with a MapReduce application.


In the parlance of Eric Brewer's [[CAP Theorem]], HBase is a CP type system.
In the parlance of Eric Brewer's [[CAP Theorem]], HBase is a CP type system.
Line 54: Line 54:
* [[23andMe]]
* [[23andMe]]
* [[Adobe Systems|Adobe]]
* [[Adobe Systems|Adobe]]
* [[Airbnb]] uses HBase as part of its AirStream realtime stream computation framework<ref>{{cite web|url=http://www.slideshare.net/HBaseCon/apache-hbase-at-airbnb|title=Apache HBase at Airbnb|last=HBaseCon|date=2 August 2016|website=slideshare.net|accessdate=8 April 2018}}</ref>
* [[Airbnb]] uses HBase as part of its AirStream realtime stream computation framework<ref>{{cite web|url=http://www.slideshare.net/HBaseCon/apache-hbase-at-airbnb|title=Apache HBase at Airbnb|last=HBaseCon|date=2 August 2016|website=slideshare.net|access-date=8 April 2018}}</ref>
* [[Alibaba Group]]
* [[Alibaba Group]]
* [[Amadeus IT Group]], as its main long-term storage DB.
* [[Amadeus IT Group]], as its main long-term storage DB.
Line 62: Line 62:
* [[Flurry (company)|Flurry]]
* [[Flurry (company)|Flurry]]
* [[HubSpot]]
* [[HubSpot]]
* [[Imgur]] uses HBase to power its notifications system<ref>{{cite web|url=https://dzone.com/articles/why-imgur-dropped-mysql-in-favor-of-hbase|title=Why Imgur Dropped MySQL in Favor of HBase - DZone Database|website=dzone.com|accessdate=8 April 2018}}</ref><ref>{{cite web|url=http://blog.imgur.com/2015/09/15/tech-tuesday-imgur-notifications-from-mysql-to-hbase/|title=Tech Tuesday: Imgur Notifications: From MySQL to HBase - The Imgur Blog|website=blog.imgur.com|accessdate=8 April 2018}}</ref>
* [[Imgur]] uses HBase to power its notifications system<ref>{{cite web|url=https://dzone.com/articles/why-imgur-dropped-mysql-in-favor-of-hbase|title=Why Imgur Dropped MySQL in Favor of HBase - DZone Database|website=dzone.com|access-date=8 April 2018}}</ref><ref>{{cite web|url=http://blog.imgur.com/2015/09/15/tech-tuesday-imgur-notifications-from-mysql-to-hbase/|title=Tech Tuesday: Imgur Notifications: From MySQL to HBase - The Imgur Blog|website=blog.imgur.com|access-date=8 April 2018}}</ref>
* [[Kakao]]<ref>{{cite web|url=http://apachebigdata2015.sched.org/event/de6abfbd8f0b9e66b1c03feb2b9e2078?iframe=yes&w=i:100;&sidebar=yes&bg=no |title=S2Graph : A Large-Scale Graph Database with HBase |author=Doyung Yoon}}</ref>
* [[Kakao]]<ref>{{cite web|url=http://apachebigdata2015.sched.org/event/de6abfbd8f0b9e66b1c03feb2b9e2078?iframe=yes&w=i:100;&sidebar=yes&bg=no |title=S2Graph : A Large-Scale Graph Database with HBase |author=Doyung Yoon}}</ref>
*[[Meesho]]
*[[Meesho]]

Revision as of 00:18, 29 December 2020

Apache HBase
Original author(s): Powerset
Developer(s): Apache Software Foundation
Initial release: 28 March 2008; 16 years ago (2008-03-28)
Stable release:
1.4.x: 1.4.13 / 29 February 2020; 4 years ago (2020-02-29)[1]
1.6.x: 1.6.0 / 6 March 2020; 4 years ago (2020-03-06)[1]
2.2.x: 2.2.6 / 4 September 2020; 4 years ago (2020-09-04)[1]
Preview release: 2.3.3 / 2 November 2020; 4 years ago (2020-11-02)[1]
Repository: HBase Repository
Written in: Java
Operating system: Cross-platform
Type: Distributed database
License: Apache License 2.0
Website: hbase.apache.org

HBase is an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).
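
The idea of sparse storage can be illustrated with a toy in-memory model. The sketch below is not the HBase API; the class `SparseTable` and its methods are hypothetical names chosen for illustration. It follows the Bigtable-style view of a table as a sorted map from row key to column to timestamp to value, so that cells that were never written occupy no space at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a sparse, sorted, versioned table (hypothetical; not the HBase API). */
class SparseTable {
    // row key -> column ("family:qualifier") -> timestamp -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell; only written cells take up space. */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if the cell was never written. */
    String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null) {
            return null;
        }
        // A version map is only created together with its first entry, so it is never empty.
        return versions.lastEntry().getValue();
    }

    /** Returns the row keys in [startRow, stopRow) in sorted order. */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    /** Counts the cell versions actually stored. */
    int storedCells() {
        int count = 0;
        for (NavigableMap<String, NavigableMap<Long, String>> columns : rows.values()) {
            for (NavigableMap<Long, String> versions : columns.values()) {
                count += versions.size();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SparseTable table = new SparseTable();
        table.put("user#0001", "info:name", 1L, "Ada");
        table.put("user#0001", "info:name", 2L, "Ada L.");
        table.put("user#0900", "stats:logins", 1L, "42");
        System.out.println(table.get("user#0001", "info:name"));
        System.out.println(table.scan("user#0000", "user#0500"));
        System.out.println(table.storedCells());
    }
}
```

Because rows are kept in key order, a range scan is a cheap sub-map view, and a row with two populated columns costs the same whether the table notionally has ten columns or ten million.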

HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original Bigtable paper.[2] Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and can be accessed through the Java API as well as through REST, Avro, or Thrift gateway APIs. HBase is a wide-column store and has been widely adopted because of its lineage with Hadoop and HDFS. It is well suited for fast read and write operations on large datasets with high throughput and low input/output latency.
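
A Bloom filter answers "might this key be present?" with no false negatives, which lets a store skip data that definitely does not contain a requested key. The following is a minimal sketch of the general technique, not HBase's implementation; the class name `SimpleBloomFilter`, the FNV-1a-style hash, and the seed constants are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Minimal Bloom filter sketch (hypothetical; not HBase's implementation). */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** FNV-1a-style hash: XOR each byte into the state, then multiply by the FNV prime. */
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    /** Derives the i-th bit position from two base hashes (double hashing). */
    private int index(byte[] key, int i) {
        int h1 = fnv1a(key, 0x811C9DC5);
        int h2 = fnv1a(key, 0x9747B28C); // arbitrary second seed, chosen for illustration
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Sets every bit position derived from the key. */
    void add(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(bytes, i));
        }
    }

    /** Returns false only if the key was definitely never added; true may be a false positive. */
    boolean mightContain(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(bytes, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        filter.add("row-17");
        System.out.println(filter.mightContain("row-17"));
        System.out.println(filter.mightContain("row-99"));
    }
}
```

A `false` answer is always reliable, so a lookup can skip the underlying data entirely; a `true` answer still requires the real read, since it may be a false positive.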

HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase as well as a JDBC driver that can be integrated with various analytics and business intelligence applications. The Apache Trafodion project provides a SQL query engine with ODBC and JDBC drivers and distributed ACID transaction protection across multiple statements, tables, and rows that use HBase as a storage engine.

HBase now serves several data-driven websites,[3] but Facebook's Messaging Platform migrated from HBase to MyRocks in 2018.[4][5] Unlike traditional relational databases, HBase does not support SQL scripting; instead, the equivalent is written in Java, in a style similar to a MapReduce application.

In the parlance of Eric Brewer's CAP Theorem, HBase is a CP-type system: it favors consistency and partition tolerance over availability.

History

Apache HBase began as a project by the company Powerset out of a need to process massive amounts of data for the purposes of natural-language search. It has been a top-level Apache project since 2010.

Facebook elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018.[4]

The 2.2.z series is the current stable release line; it supersedes earlier release lines.

Use cases & production deployments

Enterprises that use HBase

The following is a list of notable enterprises that have used or are using HBase:

See also

References

  1. ^ a b c d "Apache HBase – Apache HBase Downloads". Retrieved 8 December 2020.
  2. ^ Chang, et al. (2006). Bigtable: A Distributed Storage System for Structured Data
  3. ^ "Apache HBase – Powered By Apache HBase™". hbase.apache.org. Retrieved 8 April 2018.
  4. ^ a b "Migrating Messenger storage to optimize performance". www.facebook.com. Retrieved 5 July 2018.
  5. ^ "Facebook: Why our 'next-gen' comms ditched MySQL". Retrieved 17 December 2010.
  6. ^ HBaseCon (2 August 2016). "Apache HBase at Airbnb". slideshare.net. Retrieved 8 April 2018.
  7. ^ "Near Real Time Search Indexing".
  8. ^ "Is data locality always out of the box in Hadoop?".
  9. ^ "Why Imgur Dropped MySQL in Favor of HBase - DZone Database". dzone.com. Retrieved 8 April 2018.
  10. ^ "Tech Tuesday: Imgur Notifications: From MySQL to HBase - The Imgur Blog". blog.imgur.com. Retrieved 8 April 2018.
  11. ^ Doyung Yoon. "S2Graph : A Large-Scale Graph Database with HBase".
  12. ^ Cheolsoo Park and Ashwin Shankar. "Netflix: Integrating Spark at Petabyte Scale".
  13. ^ Engineering, Pinterest (30 March 2018). "Improving HBase backup efficiency at Pinterest". Medium. Retrieved 14 April 2020.
  14. ^ "Hbase at Salesforce.com".
  15. ^ Josh Baer. "How Apache Drives Spotify's Music Recommendations".
  16. ^ "Tuenti Group Chat: Simple, yet complex".
  17. ^ "Tuenti Asyncthrift".

Bibliography