Data hub

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 202.108.130.138 (talk) at 08:53, 13 June 2013. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A data hub (data management system, or DMS) is software for collaborating on gathering, sharing and using data.[1]

The term usually refers to the newer, web-based generation of such products. They can be either general-purpose platforms that handle many different kinds of data, or vertical products that specialise in one particular field.

Features

At its core, a DMS is a list of datasets with diverse schemas.

Beyond that, users expect the following features, or tight integration with tools that provide them:[2]

  • Load and update data from any source (ETL)
  • Store datasets and index them for querying
  • View, analyse and update data in a tabular interface (spreadsheet)
  • Visualise data, for example with charts or maps
  • Analyse data, for example with statistics and machine learning
  • Organise many people to enter or correct data (crowd-sourcing)
  • Measure and ensure the quality of data, and its provenance
  • Manage permissions, so that data can be open, private or shared
  • Find datasets, and organise them to help others find them
  • Sell data, sharing processing costs between users
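
The core idea above (a list of datasets with diverse schemas) together with the permission and discovery features can be illustrated with a minimal sketch. It does not reflect any particular product: the names `Dataset`, `DataHub`, `Visibility`, `readable_by` and `find`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    """The three permission levels named in the feature list."""
    OPEN = "open"
    PRIVATE = "private"
    SHARED = "shared"


@dataclass
class Dataset:
    """One catalogue entry (hypothetical structure, not a real product's model)."""
    name: str
    owner: str
    rows: list[dict]  # no fixed columns: every dataset brings its own schema
    visibility: Visibility = Visibility.PRIVATE
    shared_with: set[str] = field(default_factory=set)
    tags: set[str] = field(default_factory=set)

    def schema(self) -> set[str]:
        """Column names, inferred from the rows themselves."""
        return {key for row in self.rows for key in row}

    def readable_by(self, user: str) -> bool:
        """Open data is readable by anyone; owners can always read their own."""
        if self.visibility is Visibility.OPEN or user == self.owner:
            return True
        return self.visibility is Visibility.SHARED and user in self.shared_with


class DataHub:
    """A list of datasets, keyed by name, with tag-based discovery."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}

    def add(self, dataset: Dataset) -> None:
        self._datasets[dataset.name] = dataset

    def find(self, user: str, tag: str) -> list[str]:
        """Sorted names of datasets carrying `tag` that `user` may read."""
        return sorted(
            d.name
            for d in self._datasets.values()
            if tag in d.tags and d.readable_by(user)
        )


# Made-up sample data: three datasets with unrelated schemas.
hub = DataHub()
hub.add(Dataset("rainfall", "alice", [{"station": "A1", "mm": 12.5}],
                Visibility.OPEN, tags={"weather"}))
hub.add(Dataset("budget", "bob", [{"dept": "roads", "gbp": 1000}],
                Visibility.SHARED, shared_with={"alice"}, tags={"finance"}))
hub.add(Dataset("forecast", "bob", [{"day": 1, "temp_c": 18}],
                Visibility.PRIVATE, tags={"weather"}))

print(hub.find("alice", "weather"))
print(hub.find("bob", "weather"))
```

In this sketch the schema is inferred from the rows rather than declared up front, which is one simple way to let datasets of differing shape sit in the same list. Loading, visualisation, quality measurement and the other listed features are left out.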

List of data hubs

According to ScraperWiki, a desktop operating system (e.g. Unix, OS X, Windows) can be regarded as the legacy DMS in current use, carrying out tasks that a purpose-built DMS would handle better.[2]

References

  1. "Data Hubs, Data Management Systems and CKAN | OKFN Notebook". Open Knowledge Foundation. 2011-04-27. Retrieved 2012-03-08.
  2. "From CMS to DMS: C is for Content, D is for Data". ScraperWiki. 2012-03-09. Retrieved 2012-03-12.