
Sqoop: Difference between revisions

From Wikipedia, the free encyclopedia
Added Informatica ETL tool which supports Sqoop connector
 
(29 intermediate revisions by 22 users not shown)
Line 1: Line 1:
{{Infobox software
{{Infobox software
| name = Apache Sqoop
| name = Apache Sqoop
| logo =
| logo = Apache Sqoop logo.svg
| released = {{Start date and age|2009|06|01|df=yes}} <!-- https://blog.cloudera.com/blog/2009/06/introducing-sqoop/ -->
| screenshot =
| screenshot =
| caption =
| caption =
| developer = [[Apache Software Foundation]]
| developer = [[Apache Software Foundation]]
| status = Active
| discontinued = yes
| latest release version = 1.4.6
| latest release date = {{release date|2015|05|11}}
| latest release version = 1.4.7
| latest release date = {{Start date and age|2017|12|06}}
| latest preview version =
| latest preview version =
| latest preview date =
| latest preview date =
| operating system = [[Cross-platform]]
| operating system = [[Cross-platform]]
| repo = {{URL|https://gitbox.apache.org/repos/asf?p{{=}}sqoop.git|Sqoop Repository}}
| programming language = [[Java (programming language)|Java]]
| programming language = [[Java (programming language)|Java]]
| genre = [[Data management]]
| genre = [[Data management]]
| license = [[Apache License]] 2.0
| license = [[Apache License 2.0]]
| website = {{URL|https://sqoop.apache.org}}
| website = {{URL|https://sqoop.apache.org}}
}}
}}
'''Sqoop''' is a [[command-line interface]] application for transferring data between [[relational database]]s and [[Hadoop]].<ref name="mainpage">{{cite web |url=https://sqoop.apache.org|title=Hadoop: Apache Sqoop|access-date=Sep 8, 2012}}</ref>


The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic.<ref>{{Cite web|title=moving Sqoop to the Attic|url=http://mail-archives.apache.org/mod_mbox/sqoop-user/202106.mbox/browser|access-date=2021-06-27|website=mail-archives.apache.org}}</ref>
'''Sqoop''' is a [[command-line interface]] application for transferring data between [[relational database]]s and [[Hadoop]].<ref name="mainpage">{{cite web |url=https://sqoop.apache.org|title=Hadoop: Apache Sqoop|accessdate=Sep 8, 2012}}</ref> It supports incremental loads of a single table or a free form [[SQL query]] as well as saved jobs which can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in [[Apache Hive|Hive]] or [[HBase]].<ref>{{cite web |url=https://blogs.apache.org/sqoop/entry/apache_sqoop_overview|title=Apache Sqoop - Overview|accessdate=Sep 8, 2012}}</ref> Exports can be used to put data from Hadoop into a relational database. Sqoop got the name from sql+hadoop.
Sqoop became a top-level [[Apache Software Foundation|Apache]] project in March 2012.<ref>{{cite web |url=https://blogs.apache.org/sqoop/entry/apache_sqoop_graduates_from_incubator|title=Apache Sqoop Graduates from Incubator|accessdate=Sep 8, 2012}}</ref>


==Description==
Informatica Big Data Edition provides Sqoop based connector from version 10.1. Informatica supports Import and Export for most of the relational data bases. Which is helpfull to to the ETL Use cases.
Sqoop supports incremental loads of a single table or a free form [[SQL query]] as well as saved jobs which can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in [[Apache Hive|Hive]] or [[HBase]].<ref>{{cite web |url=https://blogs.apache.org/sqoop/entry/apache_sqoop_overview|title=Apache Sqoop - Overview|access-date=Sep 8, 2012}}</ref> Exports can be used to put data from Hadoop into a relational database. Sqoop got the name from "SQL-to-Hadoop".<ref>{{cite web |url=https://blog.cloudera.com/blog/2009/06/introducing-sqoop/|title=Introducing Sqoop|access-date=Jan 1, 2019}}</ref>
Sqoop became a top-level [[Apache Software Foundation|Apache]] project in March 2012.<ref>{{cite web |url=https://blogs.apache.org/sqoop/entry/apache_sqoop_graduates_from_incubator|title=Apache Sqoop Graduates from Incubator|access-date=Sep 8, 2012}}</ref>


[[Informatica]] provides a Sqoop-based [[Connector (computer science)|connector]] from version 10.1.
[[Pentaho]] provides [[open source]] Sqoop based connector steps, ''Sqoop Import''<ref name="2015-12-10_PSI" /> and ''Sqoop Export'',<ref name="2015-12-10_PSE"/> in their [[Extract, transform, load|ETL]] suite [[Pentaho Data Integration]] since version 4.5 of the software.<ref name="2012-07-27_dbta" /> [[Microsoft]] uses a Sqoop-based connector to help transfer data from [[Microsoft SQL Server]] databases to Hadoop.<ref>{{cite web |url=https://www.microsoft.com/en-us/download/details.aspx?id=27584|title=Microsoft SQL Server Connector for Apache Hadoop|accessdate=Sep 8, 2012}}</ref>
[[Pentaho]] provides [[open-source software|open-source]] Sqoop based connector steps, ''Sqoop Import''<ref name="2015-12-10_PSI" /> and ''Sqoop Export'',<ref name="2015-12-10_PSE"/> in their [[Extract, transform, load|ETL]] suite [[Pentaho Data Integration]] since version 4.5 of the software.<ref name="2012-07-27_dbta" /> [[Microsoft]] uses a Sqoop-based connector to help transfer data from [[Microsoft SQL Server]] databases to Hadoop.<ref>{{cite web |url=https://www.microsoft.com/en-us/download/details.aspx?id=27584|title=Microsoft SQL Server Connector for Apache Hadoop|website=[[Microsoft]] |access-date=Sep 8, 2012}}</ref>
[[Couchbase, Inc.]] also provides a [[Couchbase Server]]-Hadoop connector by means of Sqoop.<ref>{{cite web |url=http://www.couchbase.com/develop/connectors/hadoop|title=Couchbase Hadoop Connector|accessdate=Sep 8, 2012}}</ref>
[[Couchbase, Inc.]] also provides a [[Couchbase Server]]-Hadoop connector by means of Sqoop.<ref>{{cite web|url=http://www.couchbase.com/develop/connectors/hadoop|title=Couchbase Hadoop Connector|access-date=Sep 8, 2012|url-status=dead|archive-url=https://web.archive.org/web/20120825184036/http://www.couchbase.com/develop/connectors/hadoop|archive-date=2012-08-25}}</ref>

In 2015 [[Ralph Kimball]] described '''Sqoop''' as follow under the heading ''The Future of [[Extract, transform, load|ETL]]'':<ref name="Kimball_2015-12-01_KG">{{cite web | url = http://www.kimballgroup.com/2015/12/design-tip-180-the-future-is-bright/ | title = Design Tip #180 The Future Is Bright | last = Kimball | first = Ralph | authorlink = Ralph Kimball | publisher = [[Kimball Group]] | date = 2015-12-01 | accessdate = 2015-12-03 | archiveurl = https://web.archive.org/web/20151203201607/http://www.kimballgroup.com/2015/12/design-tip-180-the-future-is-bright/ | archivedate = 2015-12-03 | quote = Several big changes must take place in the ETL environment. First, the data feeds from original sources must support huge bandwidths, at least gigabytes per second. Learn about Sqoop loading data into Hadoop. If these words mean nothing to you, you have some reading to do! Start with Wikipedia. }}</ref>
{{quote|Several big changes must take place in the ETL environment. First, the data feeds from original sources must support huge bandwidths, at least gigabytes per second. Learn about Sqoop loading data into Hadoop. If these words mean nothing to you, you have some reading to do! Start with Wikipedia.}}


==See also==
==See also==
*[[Apache Hive|Hive]]
*[[Apache Hadoop]]
*[[Apache Accumulo|Accumulo]]
*[[Apache Hive]]
*[[HBase]]
*[[Apache Accumulo]]
*[[Apache HBase]]


==References==
==References==
Line 41: Line 44:
| publisher = [[Database Trends and Applications]] (dbta.com)
| publisher = [[Database Trends and Applications]] (dbta.com)
| date = 2012-07-27
| date = 2012-07-27
| accessdate = 2015-12-08
| access-date = 2015-12-08
| archiveurl = https://web.archive.org/web/20151208144234/http://www.dbta.com/Editorial/News-Flashes/Big-Data-Analytics-Vendor-Pentaho-Announces-Tighter-Integration-with-Cloudera-Extends-Visual-Interface-to-Include-Hadoop-Sqoop-and-Oozie-84025.aspx
| archive-url = https://web.archive.org/web/20151208144234/http://www.dbta.com/Editorial/News-Flashes/Big-Data-Analytics-Vendor-Pentaho-Announces-Tighter-Integration-with-Cloudera-Extends-Visual-Interface-to-Include-Hadoop-Sqoop-and-Oozie-84025.aspx
| archivedate = 2015-12-08
| archive-date = 2015-12-08
| quote = Pentaho’s Business Analytics 4.5 is now certified on Cloudera’s latest releases, Cloudera Enterprise 4.0 and CDH4. Pentaho also announced that its visual design studio capabilities have been extended to the Sqoop and Oozie components of Hadoop.
| quote = Pentaho’s Business Analytics 4.5 is now certified on Cloudera’s latest releases, Cloudera Enterprise 4.0 and CDH4. Pentaho also announced that its visual design studio capabilities have been extended to the Sqoop and Oozie components of Hadoop.
}}</ref>
}}</ref>
Line 52: Line 55:
| publisher = [[Pentaho]]
| publisher = [[Pentaho]]
| date = 2015-12-10
| date = 2015-12-10
| accessdate = 2015-12-10
| access-date = 2015-12-10
| archiveurl = https://web.archive.org/web/20151210171525/http://wiki.pentaho.com/display/EAI/Sqoop+Export
| archive-url = https://web.archive.org/web/20151210171525/http://wiki.pentaho.com/display/EAI/Sqoop+Export
| archivedate = 2015-12-10
| archive-date = 2015-12-10
| quote = The Sqoop Export job allows you to export data from Hadoop into an RDBMS using Apache Sqoop.
| quote = The Sqoop Export job allows you to export data from Hadoop into an RDBMS using Apache Sqoop.
}}</ref>
}}</ref>
Line 63: Line 66:
| publisher = [[Pentaho]]
| publisher = [[Pentaho]]
| date = 2015-12-10
| date = 2015-12-10
| accessdate = 2015-12-10
| access-date = 2015-12-10
| archiveurl = https://web.archive.org/web/20151210170913/http://wiki.pentaho.com/display/EAI/Sqoop+Import
| archive-url = https://web.archive.org/web/20151210170913/http://wiki.pentaho.com/display/EAI/Sqoop+Import
| archivedate = 2015-12-10
| archive-date = 2015-12-10
| quote = The Sqoop Import job allows you to import data from a relational database into the Hadoop Distributed File System (HDFS) using Apache Sqoop.
| quote = The Sqoop Import job allows you to import data from a relational database into the Hadoop Distributed File System (HDFS) using Apache Sqoop.
}}</ref>
}}</ref>
Line 73: Line 76:
==Bibliography==
==Bibliography==
{{Refbegin}}
{{Refbegin}}
*{{Cite book
*{{Cite book |first1 = Tom
| first1 = Tom
|last1 = White
|title = Hadoop: The Definitive Guide
| last1 = White
|edition = 2nd
| title = Hadoop: The Definitive Guide
|chapter = Chapter 15: Sqoop
| edition = 2nd
|year = 2010
| chapter = Chapter 15: Sqoop
| publisher = [[O'Reilly Media]]
|publisher = [[O'Reilly Media]]
| pages = 477–495
|pages = [https://archive.org/details/hadoopdefinitive0000whit/page/477 477–495]
| isbn = 978-1-449-38973-4
|isbn = 978-1-449-38973-4
|chapter-url = https://archive.org/details/hadoopdefinitive0000whit/page/477
| url = http://oreilly.com/catalog/9780596521974
}}
}}
{{Refend}}
{{Refend}}
Line 89: Line 92:
*{{Official website|https://sqoop.apache.org}}
*{{Official website|https://sqoop.apache.org}}
*[https://cwiki.apache.org/confluence/display/SQOOP/Home Sqoop Wiki]
*[https://cwiki.apache.org/confluence/display/SQOOP/Home Sqoop Wiki]
*[http://qnalist.com/q/sqoop-user Sqoop Users Mailing List Archives]
*[https://web.archive.org/web/20140202154003/http://qnalist.com/q/sqoop-user Sqoop Users Mailing List Archives]


{{Apache}}
{{Apache Software Foundation}}


[[Category:Apache Attic|Sqoop]]
[[Category:Cloud applications]]
[[Category:Cloud applications]]
[[Category:Hadoop]]
[[Category:Hadoop]]
[[Category:Apache Software Foundation projects]]

Latest revision as of 19:04, 17 July 2024

Apache Sqoop
Developer(s): Apache Software Foundation
Initial release: 1 June 2009; 15 years ago (2009-06-01)
Final release: 1.4.7 / December 6, 2017; 7 years ago (2017-12-06)
Repository: Sqoop Repository
Written in: Java
Operating system: Cross-platform
Type: Data management
License: Apache License 2.0
Website: sqoop.apache.org

Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.[1]

The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic.[2]

Description


Sqoop supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in Hive or HBase.[3] Exports can be used to put data from Hadoop into a relational database. The name Sqoop comes from "SQL-to-Hadoop".[4] Sqoop became a top-level Apache project in March 2012.[5]
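
As an illustration of the operations described above, the following minimal sketch builds the command lines for an incremental import, a saved job wrapping that import, and an export. It only assembles argument lists and does not run Sqoop. The helper names, the JDBC URL, the table names and the HDFS paths are hypothetical placeholders. The flags shown (--incremental, --check-column, --last-value, --export-dir, and "sqoop job --create") are the ones recalled from the Sqoop 1.4.x user guide and should be checked against the installed version.

```python
import shlex
from typing import List


def build_incremental_import(jdbc_url: str, table: str, check_column: str,
                             last_value: str, target_dir: str) -> List[str]:
    """Argument list for an append-mode incremental import of one table.

    Only rows whose check_column is greater than last_value are imported.
    (Hypothetical helper, not part of Sqoop.)
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", last_value,
    ]


def build_saved_job(job_name: str, import_argv: List[str]) -> List[str]:
    """Wrap an import in a saved job so that it can be re-run later.

    Assumes import_argv starts with the "sqoop" executable, which is
    dropped because "sqoop job" takes the tool name after the "--".
    """
    return ["sqoop", "job", "--create", job_name, "--"] + import_argv[1:]


def build_export(jdbc_url: str, table: str, export_dir: str) -> List[str]:
    """Argument list for exporting files in HDFS into a database table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", export_dir,
    ]


def to_command_line(argv: List[str]) -> str:
    """Render an argument list as a single shell-quoted command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Placeholder connection details, for illustration only.
    url = "jdbc:mysql://db.example.com/shop"
    imp = build_incremental_import(url, "orders", "id", "0", "/data/orders")
    print(to_command_line(imp))
    print(to_command_line(build_saved_job("orders_sync", imp)))
    print(to_command_line(build_export(url, "order_totals", "/results/totals")))
```

Building the command as a list rather than a single string keeps each value a separate argument, so a table name or path containing spaces is not split by the shell. A saved job created this way would then be run with "sqoop job --exec orders_sync".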

Informatica provides a Sqoop-based connector from version 10.1. Pentaho has provided open-source Sqoop-based connector steps, Sqoop Import[6] and Sqoop Export,[7] in its ETL suite Pentaho Data Integration since version 4.5 of the software.[8] Microsoft uses a Sqoop-based connector to help transfer data from Microsoft SQL Server databases to Hadoop.[9] Couchbase, Inc. also provides a Couchbase Server-Hadoop connector by means of Sqoop.[10]

See also

  * Apache Hadoop
  * Apache Hive
  * Apache Accumulo
  * Apache HBase

References

  1. ^ "Hadoop: Apache Sqoop". Retrieved Sep 8, 2012.
  2. ^ "moving Sqoop to the Attic". mail-archives.apache.org. Retrieved 2021-06-27.
  3. ^ "Apache Sqoop - Overview". Retrieved Sep 8, 2012.
  4. ^ "Introducing Sqoop". Retrieved Jan 1, 2019.
  5. ^ "Apache Sqoop Graduates from Incubator". Retrieved Sep 8, 2012.
  6. ^ "Sqoop Import". Pentaho. 2015-12-10. Archived from the original on 2015-12-10. Retrieved 2015-12-10. The Sqoop Import job allows you to import data from a relational database into the Hadoop Distributed File System (HDFS) using Apache Sqoop.
  7. ^ "Sqoop Export". Pentaho. 2015-12-10. Archived from the original on 2015-12-10. Retrieved 2015-12-10. The Sqoop Export job allows you to export data from Hadoop into an RDBMS using Apache Sqoop.
  8. ^ "Big Data Analytics Vendor Pentaho Announces Tighter Integration with Cloudera; Extends Visual Interface to Include Hadoop Sqoop and Oozie". Database Trends and Applications (dbta.com). 2012-07-27. Archived from the original on 2015-12-08. Retrieved 2015-12-08. Pentaho's Business Analytics 4.5 is now certified on Cloudera's latest releases, Cloudera Enterprise 4.0 and CDH4. Pentaho also announced that its visual design studio capabilities have been extended to the Sqoop and Oozie components of Hadoop.
  9. ^ "Microsoft SQL Server Connector for Apache Hadoop". Microsoft. Retrieved Sep 8, 2012.
  10. ^ "Couchbase Hadoop Connector". Archived from the original on 2012-08-25. Retrieved Sep 8, 2012.

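Bibliography

  * White, Tom (2010). "Chapter 15: Sqoop". Hadoop: The Definitive Guide (2nd ed.). O'Reilly Media. pp. 477–495. ISBN 978-1-449-38973-4.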