Data publishing: Difference between revisions
Fgnievinski (no edit summary)
(minor edit) Open access bot: hdl updated in citation with #oabot.
(20 intermediate revisions by 13 users not shown)
Line 2 / Line 2:
'''Data publishing''' (also '''data publication''') is the act of releasing [[research data]] in [[academic publishing|published form]] for use by others. It consists of preparing [[data]] or [[data set]]s for public use so that anyone can use them as they wish.
This practice is an integral part of the [[open science]] movement.
There is broad, multidisciplinary consensus on the benefits of this practice.<ref name="Costello2009">{{Cite journal|author=Costello MJ|year=2009|title=Motivating online publication of data|journal=BioScience|volume=59|issue=5|pages=418–427|doi=10.1525/bio.2009.59.5.9|s2cid=55591360|hdl=2292/7173|hdl-access=free}}</ref><ref name="Smith2009">{{Cite journal|author=Smith VS|year=2009|title=Data publication: towards a database of everything|journal=BMC Research Notes|volume=2|issue=113|pages=113|doi=10.1186/1756-0500-2-113|pmc=2702265|pmid=19552813 |doi-access=free }}</ref><ref>{{Cite journal|author1=Lawrence, B|author2=Jones, C.|author3=Matthews, B.|author4=Pepler, S.|author5=Callaghan, S.|year=2011|title=Citation and Peer Review of Data: Moving Towards Formal Data Publication|url=http://www.ijdc.net/index.php/ijdc/article/view/181|journal=International Journal of Digital Curation|volume=6|issue=2|pages=4–37|doi=10.2218/ijdc.v6i2.205|doi-access=free}}</ref>
The main goal is to establish data as first-class research outputs.<ref name="CallaghanEtAl2012">{{Cite journal|vauthors=Callaghan S, Donegan S, Pepler S, Thorley M, Cunningham N, Kirsch P, Ault L, Bell P, Bowie R, Leadbetter A, Lowry R, Moncoiffé G, Harrison K, Smith-Haddon B, Weatherby A, Wright D |year=2012|title=Making data a first class scientific output: Data citation and publication by NERCs environmental data centres|url=http://ijdc.net/index.php/ijdc/article/view/208|journal=International Journal of Digital Curation|volume=7|issue=1|pages=107–113|doi=10.2218/ijdc.v7i1.218|doi-access=free}}</ref> A number of initiatives are underway, and there are points of consensus as well as issues still in contention.<ref name="KratzStrasser2014">{{Cite journal|vauthors=Kratz J, Strasser C|year=2014|title=Data publication consensus and controversies|journal=F1000Research|volume=3|issue=94|pages=94|doi=10.12688/f1000research.4518|pmid=25075301|pmc=4097345 |doi-access=free }}</ref>
There are several distinct ways to make research data available, including:
* publishing data as supplemental material associated with a [[research article]], typically with the data files hosted by the publisher of the article
* hosting data on a publicly available website, with files available for download
* hosting data in a repository that has been developed to support data publication, e.g. [[figshare]], [[Dryad (repository)|Dryad]], [[Dataverse]], [[Zenodo]]. Many general-purpose and specialised (for example, subject-specific) data repositories exist.<ref name="AssanteEtAl2016">{{Cite journal|author1=Assante, M.|author2=Candela, L.|author3=Castelli, D.|author4=Tani, A.|year=2016|title=Are Scientific Data Repositories Coping with Research Data Publishing?|journal=Data Science Journal|volume=15|doi=10.5334/dsj-2016-006|doi-access=free}}</ref> For example, the [[UK Data Service]] enables users to deposit [[data collection]]s and re-share them for research purposes.
* publishing a data paper about the dataset, which may be published as a preprint, in a regular [[Scientific journal|journal]], or in a data journal that is dedicated to supporting data papers. The data may be hosted by the journal or hosted separately in a data repository.
Publishing data both makes it available for others to use and allows datasets to be cited in the same way as other types of research publication (such as articles or books), so that the producers of datasets can gain academic credit for their work.
The motivations for publishing data range from a desire to make research more accessible or datasets citable to research funder or publisher mandates that require open data publishing. The UK Data Service is one key organisation working with others to promote the importance of citing data correctly<ref>{{cite web |last1=Service |first1=UK Data |title=New to using data |url=https://www.ukdataservice.ac.uk/citethedata.aspx |website=UK Data Service}}</ref> and to help researchers do so.
Solutions for preserving privacy in data publishing have been proposed, including privacy protection algorithms, data "masking" methods, and algorithms for calculating regional privacy levels.<ref>{{Cite book|last1=Zhang|first1=Longbin|last2=Wang|first2=Yuxiang|last3=Xu|first3=Xiaoliang|title=2017 Fifth International Conference on Advanced Cloud and Big Data (CBD) |chapter=Logic-Partition Based Gaussian Sampling for Online Aggregation |date=August 2017|chapter-url=http://dx.doi.org/10.1109/cbd.2017.39|pages=182–187|publisher=IEEE|doi=10.1109/cbd.2017.39|isbn=978-1-5386-1072-5|s2cid=40025084}}</ref>
== Methods for publishing data ==
{{More citations needed section|date=April 2022}}
=== Data files as supplementary material ===
Many journals and publishers allow supplementary material, including datasets, to be attached to research articles. Although historically such material might have been distributed only on request or on [[microform]] to libraries, journals today typically host it online. Supplementary material is available to journal subscribers or, if the article or journal is open access, to everyone.
Line 29 / Line 26:
===Data papers{{anchor|Papers}}{{anchor|Paper}}===
'''Data papers''' or '''data articles''' are “scholarly publication of a searchable metadata document describing a particular on-line accessible dataset, or a group of datasets, published in accordance to the standard academic practices”.<ref name="ChavanPenev2011">{{Cite journal |author1=Chavan, V. |author2= Penev, L. |name-list-style=amp |title=The data paper: a mechanism to incentivize data publishing in biodiversity science |journal=BMC Bioinformatics |year=2011 |volume=12 |issue=15 |doi=10.1186/1471-2105-12-S15-S2 |pages=S2 |pmid=22373175 |pmc=3287445 |doi-access= free }}</ref> Their final aim is to provide “information on the what, where, why, how and who of the data”.<ref name="CallaghanEtAl2012"/> The intent of a data paper is to offer descriptive information on the related dataset(s), focusing on data collection, distinguishing features, access and potential reuse rather than on data processing and analysis.<ref name="NewmanCorke2009">{{Cite journal |author1=Newman Paul |author2=Corke Peter |title=Data papers — peer reviewed publication of high quality data sets|journal=International Journal of Robotics Research|year=2009|volume=28|issue=5|pages=587|doi=10.1177/0278364909104283|s2cid=209308576 |doi-access=free}}</ref> Because data papers are considered academic publications like any other type of paper, they allow scientists who share data to receive credit in a currency recognised by the academic system, thus "making data sharing count".<ref name="Gorgolewski2013">{{Cite journal |vauthors=Gorgolewski KJ, Margulies DS, Milham MP |title=Making data sharing count: a publication-based solution|journal=Frontiers in Neuroscience|year=2013|volume=7|pages=9|doi=10.3389/fnins.2013.00009|pmid=23390412|pmc=3565154|doi-access=free}}</ref> This not only provides an additional incentive to share data but also, through the [[peer review]] process, improves the quality of metadata and thus the reusability of the shared data.
Thus data papers represent the [[scholarly communication]] approach to [[data sharing]]. Despite their potential, data papers are not a complete solution to every data sharing and reuse problem, and in some cases they are considered to create false expectations in the research community.<ref name="ParsonsFox2013">{{Cite journal |author1=Parsons, M.A. |author2=Fox, P.A.|title=Is data publication the right metaphor?|journal=Data Science Journal|year=2013|volume=12|pages=WDS31–WDS46|doi=10.2481/dsj.WDS-042|url=https://www.jstage.jst.go.jp/article/dsj/12/0/12_WDS-042/_article|doi-access=free}}</ref>
===Data journals{{anchor|Journals}}{{anchor|Journal}}===
Line 40 / Line 34:
Data papers are supported by a rich array of '''data journals'''. Some are "pure", i.e. dedicated to publishing data papers only, while others (the majority) are "mixed", i.e. they publish several article types, including data papers.
A comprehensive survey of data journals is available.<ref name="CandelaEtAl2015">{{Cite journal |vauthors=Candela L, Castelli D, Manghi P, Tani A |title=Data Journals: A Survey|journal=Journal of the Association for Information Science and Technology|year=2015|volume=66|issue=1|pages=1747–1762|doi=10.1002/asi.23358|s2cid=31358007|url=https://zenodo.org/record/18377}}</ref> A non-exhaustive list of data journals has been compiled by staff at the University of Edinburgh.<ref>{{Cite web|url=https://www.wiki.ed.ac.uk/display/datashare/Sources+of+dataset+peer+review|title = Sources of dataset peer review - datashare - Wiki Service}}</ref>
Examples of "pure" data journals are: ''[[Earth System Science Data]]'', ''[[Journal of Open Archaeology Data]]'', ''[[Open Health Data]]'', ''[[Polar Data Journal]]'', and ''[[Scientific Data (journal)|Scientific Data]]''.
Examples of "mixed" journals publishing data papers are: ''[[Biodiversity Data Journal]]'', ''[[F1000Research]]'', ''[[GigaScience]]'', ''[[Gigabyte (journal)|GigaByte]]'', ''[[PLOS ONE]]'', and ''[[SpringerPlus]]''.
===Data citation{{anchor|Citation}}===
[[File:Data Dryad citation on Wikipedia.png|thumb|A data citation example]]
{{main|Data citation}}
'''Data citation''' is the provision of accurate, consistent and standardised references for [[Data set|datasets]], just as bibliographic [[citation]]s are provided for other published sources such as [[research article]]s or [[monograph]]s. Typically, the well-established [[Digital object identifier|Digital Object Identifier (DOI)]] approach is used: a DOI takes users to a [[website]] that contains the [[metadata]] on the dataset and the dataset itself.<ref name="awareness2012" /><ref name="ball2011" />
====History of development====
A 2011 paper reported that it was not possible to determine how often data were cited in the social sciences.<ref name="mooney2011" />
Papers from 2012 and 2013 reported that data citation was becoming more common but was not yet standardised in practice.<ref name="edmunds2012" /><ref name="outofcite2013" /><ref name="mooney2012" />
In 2014, [[FORCE11|FORCE 11]] published the Joint Declaration of Data Citation Principles, covering the purpose, function and attributes of data citation.<ref name="synthgroup2014" />
In October 2018, [[CrossRef]] expressed its support for cataloging datasets and recommending their citation.<ref name="lin2018" />
In April 2019, the data journal ''Scientific Data'' announced that it would now use data citations.<ref name="citeneeded2019" />
A June 2019 paper argued that wider data citation would benefit everyone by encouraging data sharing and by increasing the prestige of those who share data.<ref name="pierce2019" />
Data citation is an emerging topic in [[computer science]], where it has been defined as a computational problem.<ref name="buneman2016" /> Citing data poses significant challenges to computer scientists, and the main problems to address relate to:<ref name="silvello2018" />
* the use of heterogeneous data models and formats, e.g. relational databases, Comma-Separated Values (CSV), [[Extensible Markup Language]] (XML),<ref name="buneman2010" /><ref name="silvello2017" /> and [[Resource Description Framework]] (RDF);<ref name="silvello2015" />
* the transience of data;
* the need to cite data at different levels of coarseness, i.e. deep citations;<ref name="buneman2006" />
* the need to automatically generate citations to data with variable granularity.
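The following Python sketch makes the idea of deep citations with variable granularity more concrete. It generates a citation for any node of a small hierarchical dataset. Citation fields are collected along the path from the root, and fields set on deeper nodes override those inherited from ancestors. A version label is added to address the transience of data. This is a toy built on stated assumptions and loosely in the spirit of rule-based approaches to citing hierarchical data. It does not reproduce any of the cited systems, and the `Node` structure, field names, example tree and output format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A node in a hypothetical hierarchical dataset, optionally carrying citation fields."""
    name: str
    citation: Optional[Dict[str, str]] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def cite(root: Node, path: List[str], version: str) -> str:
    """Cite the node reached by `path`; deeper citation fields override inherited ones."""
    fields: Dict[str, str] = dict(root.citation or {})
    node = root
    for step in path:
        node = node.children[step]  # raises KeyError for a path that does not exist
        if node.citation:
            fields.update(node.citation)
    if "title" not in fields:
        raise ValueError("no title found on the path to this node")
    authors = fields.get("authors", "Unknown")
    locator = "/".join(path) if path else "(whole dataset)"
    # The version label pins the citation to one state of an evolving dataset.
    return f"{authors}. {fields['title']}, version {version}. Item: {locator}."


# Hypothetical curated database: credit for one entry goes to its own curator.
EXAMPLE = Node(
    "db",
    {"title": "Example Curated Database", "authors": "Curation Team"},
    {
        "families": Node(
            "families",
            None,
            {"F42": Node("F42", {"authors": "A. Curator"})},
        )
    },
)

if __name__ == "__main__":
    print(cite(EXAMPLE, [], "2024-04"))                   # coarse: the whole dataset
    print(cite(EXAMPLE, ["families", "F42"], "2024-04"))  # deep: a single entry
```

Citing the empty path yields a coarse citation of the whole dataset. The path `["families", "F42"]` yields a deep citation that credits that entry's curator. In both cases the version label keeps the citation pointing at a fixed state of the data.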
==See also==
*[[Data archiving]]
*[[Disciplinary repository]]
*[[Open science data]]
*[[Registry of Research Data Repositories]]
==References==
{{Reflist|refs=
<!-- DO NOT ADD "}}" above it will break the reflist and cause cite errors-->
<ref name="awareness2012">[http://www.ands.org.au/guides/data-citation-awareness.html Australian National Data Service: Data Citation Awareness] {{Webarchive|url=https://web.archive.org/web/20120307190949/http://www.ands.org.au/guides/data-citation-awareness.html |date=2012-03-07}} (Accessed 20 March 2012)</ref>
<ref name="ball2011">Ball, A., Duke, M. (2011). 'Data Citation and Linking'. DCC Briefing Papers. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/briefing-papers/</ref>
<ref name="mooney2011">{{cite journal
| last1 = Mooney
| first1 = Hailey
| s2cid = 34513423
| title = Citing data sources in the social sciences: do authors do it?
| journal = Learned Publishing
| date = April 2011
| volume = 24
| issue = 2
| pages = 99–108
| doi = 10.1087/20110204
| doi-access= free
}}</ref>
<ref name="edmunds2012">{{Cite journal
| last1 = Edmunds
| first1 = Scott C.
| last2 = Pollard
| first2 = Tom J.
| last3 = Hole
| first3 = Brian
| last4 = Basford
| first4 = Alexandra T.
| date = 2012-07-02
| title = Adventures in data citation: sorghum genome data exemplifies the new gold standard
| journal = BMC Research Notes
| volume = 5
| issue = 1
| pages = 223
| doi = 10.1186/1756-0500-5-223
| issn = 1756-0500
| pmc = 3392744
| pmid = 22571506
| doi-access = free
}}</ref>
<ref name="outofcite2013">{{cite journal
| title = Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data
| journal = Data Science Journal
| date = 2013
| volume = 12
| pages = CIDCR1–CIDCR75
| doi = 10.2481/dsj.OSOM13-043
| doi-access= free
}}</ref>
<ref name="mooney2012">{{cite journal
| last1 = Mooney
| first1 = Hailey
| last2 = Newton
| first2 = Mark P.
| title = The Anatomy of a Data Citation: Discovery, Reuse, and Credit
| journal = Academic Commons
| date = 2012
| volume = 1
| issue = 1
| pages = eP1035
| doi = 10.7916/D8MW2STM
| publisher = Columbia University
}}</ref>
<ref name="synthgroup2014">{{cite journal
| author1 = Data Citation Synthesis Group
| editor1-last = Martone
| editor1-first = M.
| title = Joint Declaration of Data Citation Principles
| date = 2014
| doi = 10.25490/a97f-egyk
| url = https://www.force11.org/datacitationprinciples
| publisher = [[Force11 Scholarly Communication Institute]]
| location = San Diego
}}</ref>
<ref name="lin2018">{{cite web
| last1 = Lin
| first1 = Jennifer
| title = Data citation: let's do this
| url = https://www.crossref.org/blog/data-citation-lets-do-this/
| website = Crossref
| language = en
| date = 4 October 2018
}}</ref>
<ref name="citeneeded2019">{{cite journal
| title = Data cita{{not a typo|tion nee}}ded
| journal = Scientific Data
| date = 10 April 2019
| volume = 6
| issue = 1
| pages = 27
| doi = 10.1038/s41597-019-0026-5
| pmid = 30971699
| pmc = 6472333
| bibcode = 2019NatSD...6...27.
}}</ref>
<ref name="pierce2019">{{cite journal
| last1 = Pierce
| first1 = Heather H.
| last2 = Dev
| first2 = Anurupa
| last3 = Statham
| first3 = Emily
| last4 = Bierer
| first4 = Barbara E.
| title = Credit data generators for data reuse
| journal = Nature
| date = 4 June 2019
| volume = 570
| issue = 7759
| pages = 30–32
| doi = 10.1038/d41586-019-01715-4
| pmid = 31164773
| bibcode = 2019Natur.570...30P
| s2cid = 174809246
| doi-access= free
}}</ref>
<ref name="buneman2016">{{cite journal
| last1 = Buneman
| first1 = Peter
| last2 = Davidson
| first2 = Susan
| last3 = Frew
| first3 = James
| title = Why data citation is a computational problem
| journal = Communications of the ACM
| issn = 0001-0782
| date = September 2016
| volume = 59
| issue = 9
| pages = 50–57
| doi = 10.1145/2893181
| pmid = 29151602
| pmc = 5687090
}}</ref>
<ref name="silvello2018">Silvello, G. (2018). 'Theory and Practice of Data Citation'. Journal of the Association for Information Science and Technology (JASIST) (AIS Review), vol. 69 issue 1, pp. 6–20, 2018. Available online (open access): https://onlinelibrary.wiley.com/doi/full/10.1002/asi.23917</ref>
<ref name="buneman2010">Buneman, P. and Silvello, G. (2010). 'A Rule-Based Citation System for Structured and Evolving Datasets'. IEEE Bulletin of the Technical Committee on Data Engineering, Vol. 3, No. 3. IEEE Computer Society, pp. 33–41, September 2010. Available online: http://sites.computer.org/debull/A10sept/buneman.pdf</ref>
<ref name="silvello2017">Silvello, G. (2017). 'Learning to Cite Framework: How to Automatically Construct Citations for Hierarchical Data'. Journal of the Association for Information Science and Technology (JASIST), Volume 68 issue 6, pp. 1505–1524, June 2017. Available online: http://www.dei.unipd.it/~silvello/papers/2016-DataCitation-JASIST-Silvello.pdf</ref>
<ref name="silvello2015">Silvello, G. (2015). 'A Methodology for Citing Linked Open Data Subsets'. D-Lib Magazine 21 (1/2), 2015. Available online: http://www.dlib.org/dlib/january15/silvello/01silvello.html</ref>
<ref name="buneman2006">Buneman, P. (2006). 'How to Cite Curated Databases and how to Make Them Citable'. In Proc. of the 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006, pages 195–203, 2006.</ref>
}}
{{Data}}
[[Category:Data publishing| ]]
Latest revision as of 05:58, 15 April 2024
Data publishing (also data publication) is the act of releasing research data in published form for use by others. It consists of preparing data or data sets for public use so that anyone can use them as they wish. This practice is an integral part of the open science movement. There is broad, multidisciplinary consensus on the benefits of this practice.[1][2][3]
The main goal is to establish data as first-class research outputs.[4] A number of initiatives are underway, and there are points of consensus as well as issues still in contention.[5]
There are several distinct ways to make research data available, including:
- publishing data as supplemental material associated with a research article, typically with the data files hosted by the publisher of the article
- hosting data on a publicly available website, with files available for download
- hosting data in a repository that has been developed to support data publication, e.g. figshare, Dryad, Dataverse, Zenodo. Many general-purpose and specialised (for example, subject-specific) data repositories exist.[6] For example, the UK Data Service enables users to deposit data collections and re-share them for research purposes.
- publishing a data paper about the dataset, which may be published as a preprint, in a regular journal, or in a data journal that is dedicated to supporting data papers. The data may be hosted by the journal or hosted separately in a data repository.
Publishing data both makes it available for others to use and allows datasets to be cited in the same way as other types of research publication (such as articles or books), so that the producers of datasets can gain academic credit for their work.
The motivations for publishing data range from a desire to make research more accessible or datasets citable to research funder or publisher mandates that require open data publishing. The UK Data Service is one key organisation working with others to promote the importance of citing data correctly[7] and to help researchers do so.
Solutions for preserving privacy in data publishing have been proposed, including privacy protection algorithms, data "masking" methods, and algorithms for calculating regional privacy levels.[8]
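The following Python sketch gives a rough illustration of what data "masking" can involve. It replaces a direct identifier with a salted hash (pseudonymisation) and coarsens two quasi-identifiers (age and postcode). It then measures k-anonymity, i.e. the size of the smallest group of records that share the same quasi-identifier values. This is a minimal sketch under stated assumptions, not the algorithm of the cited work. The record fields, the 10-year age bands and the "keep only the outward postcode" rule are hypothetical choices. Salted hashing on its own is pseudonymisation rather than full anonymisation.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical toy records; the field names are illustrative assumptions.
RECORDS: List[Dict] = [
    {"name": "Alice", "age": 34, "postcode": "EH8 9YL", "diagnosis": "flu"},
    {"name": "Bob", "age": 37, "postcode": "EH8 1AB", "diagnosis": "cold"},
    {"name": "Carol", "age": 52, "postcode": "G12 8QQ", "diagnosis": "flu"},
    {"name": "Dan", "age": 58, "postcode": "G12 0XX", "diagnosis": "asthma"},
]


def pseudonymise(value: str, salt: str) -> str:
    """Replace a direct identifier with a truncated salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def generalise_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalise_postcode(postcode: str) -> str:
    """Keep only the part before the space (assumed UK-style outward code)."""
    return postcode.split()[0]


def mask(records: Sequence[Dict], salt: str) -> List[Dict]:
    """Drop the name, pseudonymise it into an id, and generalise quasi-identifiers."""
    return [
        {
            "id": pseudonymise(r["name"], salt),
            "age": generalise_age(r["age"]),
            "area": generalise_postcode(r["postcode"]),
            "diagnosis": r["diagnosis"],
        }
        for r in records
    ]


def k_anonymity(records: Sequence[Dict], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


if __name__ == "__main__":
    masked = mask(RECORDS, salt="example-salt")
    print(k_anonymity(RECORDS, ["age", "postcode"]))  # raw data: every record is unique, so k = 1
    print(k_anonymity(masked, ["age", "area"]))       # masked data: two groups of two, so k = 2
```

Generalising more aggressively raises k but discards detail that reusers may need. Choosing that trade-off is the central design decision in this kind of masking.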
Methods for publishing data
This section needs additional citations for verification. (April 2022)
Data files as supplementary material
Many journals and publishers allow supplementary material, including datasets, to be attached to research articles. Although historically such material might have been distributed only on request or on microform to libraries, journals today typically host it online. Supplementary material is available to journal subscribers or, if the article or journal is open access, to everyone.
Data repositories
There are many data repositories, covering both general and specialised topics. Many are disciplinary repositories focused on a particular research discipline; for example, the UK Data Service is a trusted digital repository of social, economic and humanities data. Repositories may let researchers upload data free of charge or may charge a one-time or ongoing fee for hosting it. They offer a publicly accessible web interface for searching and browsing hosted datasets, and may include additional features such as a digital object identifier for permanent citation of the data and links to associated published papers and code.
Data papers
Data papers or data articles are “scholarly publication of a searchable metadata document describing a particular on-line accessible dataset, or a group of datasets, published in accordance to the standard academic practices”.[9] Their final aim is to provide “information on the what, where, why, how and who of the data”.[4] The intent of a data paper is to offer descriptive information on the related dataset(s), focusing on data collection, distinguishing features, access and potential reuse rather than on data processing and analysis.[10] Because data papers are considered academic publications like any other type of paper, they allow scientists who share data to receive credit in a currency recognised by the academic system, thus "making data sharing count".[11] This not only provides an additional incentive to share data but also, through the peer review process, improves the quality of metadata and thus the reusability of the shared data.
Thus data papers represent the scholarly communication approach to data sharing. Despite their potential, data papers are not a complete solution to every data sharing and reuse problem, and in some cases they are considered to create false expectations in the research community.[12]
Data journals
Data papers are published by a wide range of data journals. Some are "pure", i.e. dedicated exclusively to data papers, while others (the majority) are "mixed", i.e. they publish several article types, including data papers.
A comprehensive survey of data journals is available.[13] A non-exhaustive list of data journals has been compiled by staff at the University of Edinburgh.[14]
Examples of "pure" data journals are: Earth System Science Data, Journal of Open Archaeology Data, Open Health Data, Polar Data Journal, and Scientific Data.
Examples of "mixed" journals publishing data papers are: Biodiversity Data Journal, F1000Research, GigaScience, GigaByte, PLOS ONE, and SpringerPlus.
Data citation
Data citation is the provision of accurate, consistent and standardised references to datasets, in the same way that bibliographic citations are provided for other published sources such as research articles or monographs. Typically the well-established Digital Object Identifier (DOI) system is used, with each DOI taking users to a web page that contains the dataset's metadata and the dataset itself.[15][16]
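The DOI approach can be illustrated with a short Python sketch. It is a minimal, hypothetical example: the `DatasetRecord` class, its field names and the simplified "Creator (Year). Title. Publisher. Link" layout are assumptions made for illustration, not a formal citation standard, and the DOI in the usage line is made up. The only general convention it relies on is that a DOI becomes a resolvable link when it is prefixed with `https://doi.org/`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata for a published dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str  # e.g. the repository hosting the data
    doi: str        # bare DOI such as "10.5072/example.5678", no resolver prefix


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a link resolved by the doi.org proxy."""
    return "https://doi.org/" + doi.strip()


def format_data_citation(rec: DatasetRecord) -> str:
    """Build a citation in a simplified Creator (Year). Title. Publisher. Link layout."""
    authors = "; ".join(rec.creators)
    return f"{authors} ({rec.year}). {rec.title}. {rec.publisher}. {doi_url(rec.doi)}"


if __name__ == "__main__":
    # Made-up record and DOI, for illustration only.
    rec = DatasetRecord(["Doe, J.", "Roe, R."], 2020,
                        "Example ocean temperature readings",
                        "Example Data Repository", "10.5072/example.5678")
    # Prints: Doe, J.; Roe, R. (2020). Example ocean temperature readings.
    #         Example Data Repository. https://doi.org/10.5072/example.5678
    print(format_data_citation(rec))
```

Real citation styles may add further elements, such as a dataset version, but the persistent identifier is what lets a reader reach the same dataset and metadata regardless of where the repository later moves its web pages.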
History of development
A 2011 paper reported being unable to determine how often data citation occurred in the social sciences.[17]
Papers from 2012–2013 reported that data citation was becoming more common but that the practice was not yet standardised.[18][19][20]
In 2014, FORCE11 published the Joint Declaration of Data Citation Principles, covering the purpose, function and attributes of data citation.[21]
In October 2018, Crossref expressed its support for cataloging datasets and recommended that they be cited.[22]
In April 2019, the journal Scientific Data announced that it would now use data citations.[23]
A June 2019 paper argued that increased data citation would make the practice more valuable for everyone, both by encouraging data sharing and by increasing the prestige of those who share.[24]
Data citation is an emerging topic in computer science, where it has been framed as a computational problem.[25] Citing data poses significant challenges, and the main problems to address relate to:[26]
- the use of heterogeneous data models and formats – e.g., relational databases, Comma-Separated Values (CSV), Extensible Markup Language (XML),[27][28] Resource Description Framework (RDF);[29]
- the transience of data, which may change or be removed after it has been cited;
- the need to cite data at different levels of coarseness, i.e. deep citations;[30]
- the need to generate citations automatically for data of variable granularity.
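Deep citation and variable granularity can be illustrated with a toy Python sketch, loosely inspired by rule-based approaches such as the citation system described by Buneman and Silvello.[27] It is not their system: the `Node` class, the `cite` function and the inheritance rule (citation fields attached to a node override those inherited from its ancestors) are hypothetical simplifications. Citing the root yields a whole-dataset citation, while citing a path into the hierarchy yields a finer-grained citation that also records which part of the data is meant. The `version` field in the example gestures at the transience problem, since an evolving dataset has to be cited as it was at a given time.

```python
from typing import Dict, List, Optional


class Node:
    """A node in a hypothetical hierarchical dataset (e.g. an XML-like tree)."""

    def __init__(self, name: str, fields: Optional[Dict[str, str]] = None):
        self.name = name
        self.fields = fields or {}  # citation fields attached at this level
        self.children: Dict[str, "Node"] = {}

    def add(self, child: "Node") -> "Node":
        self.children[child.name] = child
        return child


def cite(root: Node, path: List[str]) -> Dict[str, str]:
    """Build a citation for the node reached by following `path` from `root`.

    Fields are inherited from the root downwards; a field set on a deeper
    node overrides the same field inherited from its ancestors, so a
    fine-grained citation can credit, e.g., the curator of one section.
    Raises KeyError if the path does not exist.
    """
    node = root
    citation = dict(node.fields)  # start from the dataset-level citation
    for step in path:
        node = node.children[step]
        citation.update(node.fields)
    # Record which part of the dataset is being cited (the "deep" part).
    citation["path"] = "/" + "/".join([root.name] + path)
    return citation


if __name__ == "__main__":
    # Hypothetical curated database with one separately curated section.
    root = Node("db", {"title": "Example Curated Database",
                       "authors": "Curation Team", "version": "3"})
    section = root.add(Node("familyA", {"authors": "A. Smith"}))
    section.add(Node("item42"))
    print(cite(root, []))                       # whole-dataset citation
    print(cite(root, ["familyA", "item42"]))    # deep citation
```

Keeping the rules in the data (fields on nodes) rather than in the citing code is what allows citations of any granularity to be generated automatically from a single description of the dataset.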
References
- ^ Costello MJ (2009). "Motivating online publication of data". BioScience. 59 (5): 418–427. doi:10.1525/bio.2009.59.5.9. hdl:2292/7173. S2CID 55591360.
- ^ Smith VS (2009). "Data publication: towards a database of everything". BMC Research Notes. 2 (113): 113. doi:10.1186/1756-0500-2-113. PMC 2702265. PMID 19552813.
- ^ Lawrence, B; Jones, C.; Matthews, B.; Pepler, S.; Callaghan, S. (2011). "Citation and Peer Review of Data: Moving Towards Formal Data Publication". International Journal of Digital Curation. 6 (2): 4–37. doi:10.2218/ijdc.v6i2.205.
- ^ a b Callaghan S, Donegan S, Pepler S, Thorley M, Cunningham N, Kirsch P, Ault L, Bell P, Bowie R, Leadbetter A, Lowry R, Moncoiffé G, Harrison K, Smith-Haddon B, Weatherby A, Wright D (2012). "Making data a first class scientific output: Data citation and publication by NERCs environmental data centres". International Journal of Digital Curation. 7 (1): 107–113. doi:10.2218/ijdc.v7i1.218.
- ^ Kratz J, Strasser C (2014). "Data publication consensus and controversies". F1000Research. 3 (94): 94. doi:10.12688/f1000research.4518. PMC 4097345. PMID 25075301.
- ^ Assante, M.; Candela, L.; Castelli, D.; Tani, A. (2016). "Are Scientific Data Repositories Coping with Research Data Publishing?". Data Science Journal. 15. doi:10.5334/dsj-2016-006.
- ^ UK Data Service. "New to using data". UK Data Service.
- ^ Zhang, Longbin; Wang, Yuxiang; Xu, Xiaoliang (August 2017). "Logic-Partition Based Gaussian Sampling for Online Aggregation". 2017 Fifth International Conference on Advanced Cloud and Big Data (CBD). IEEE. pp. 182–187. doi:10.1109/cbd.2017.39. ISBN 978-1-5386-1072-5. S2CID 40025084.
- ^ Chavan, V.; Penev, L. (2011). "The data paper: a mechanism to incentivize data publishing in biodiversity science". BMC Bioinformatics. 12 (15): S2. doi:10.1186/1471-2105-12-S15-S2. PMC 3287445. PMID 22373175.
- ^ Newman, Paul; Corke, Peter (2009). "Data papers — peer reviewed publication of high quality data sets". International Journal of Robotics Research. 28 (5): 587. doi:10.1177/0278364909104283. S2CID 209308576.
- ^ Gorgolewski KJ, Margulies DS, Milham MP (2013). "Making data sharing count: a publication-based solution". Frontiers in Neuroscience. 7: 9. doi:10.3389/fnins.2013.00009. PMC 3565154. PMID 23390412.
- ^ Parsons, M.A.; Fox, P.A. (2013). "Is data publication the right metaphor?". Data Science Journal. 12: WDS31 – WDS46. doi:10.2481/dsj.WDS-042.
- ^ Candela L, Castelli D, Manghi P, Tani A (2015). "Data Journals: A Survey". Journal of the Association for Information Science and Technology. 66 (1): 1747–1762. doi:10.1002/asi.23358. S2CID 31358007.
- ^ "Sources of dataset peer review - datashare - Wiki Service".
- ^ Australian National Data Service. "Data Citation Awareness". Archived 2012-03-07 at the Wayback Machine. Accessed 20 March 2012.
- ^ Ball, A.; Duke, M. (2011). "Data Citation and Linking". DCC Briefing Papers. Edinburgh: Digital Curation Centre. http://www.dcc.ac.uk/resources/briefing-papers/
- ^ Mooney, Hailey (April 2011). "Citing data sources in the social sciences: do authors do it?". Learned Publishing. 24 (2): 99–108. doi:10.1087/20110204. S2CID 34513423.
- ^ Edmunds, Scott C.; Pollard, Tom J.; Hole, Brian; Basford, Alexandra T. (2012-07-02). "Adventures in data citation: sorghum genome data exemplifies the new gold standard". BMC Research Notes. 5 (1): 223. doi:10.1186/1756-0500-5-223. ISSN 1756-0500. PMC 3392744. PMID 22571506.
- ^ "Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data". Data Science Journal. 12: CIDCR1 – CIDCR75. 2013. doi:10.2481/dsj.OSOM13-043.
- ^ Mooney, Hailey; Newton, Mark P. (2012). "The Anatomy of a Data Citation: Discovery, Reuse, and Credit". Academic Commons. 1 (1). Columbia University: eP1035. doi:10.7916/D8MW2STM.
- ^ Data Citation Synthesis Group (2014). Martone, M. (ed.). "Joint Declaration of Data Citation Principles". San Diego: Force11 Scholarly Communication Institute. doi:10.25490/a97f-egyk.
- ^ Lin, Jennifer (4 October 2018). "Data citation: let's do this". Crossref.
- ^ "Data citation needed". Scientific Data. 6 (1): 27. 10 April 2019. Bibcode:2019NatSD...6...27.. doi:10.1038/s41597-019-0026-5. PMC 6472333. PMID 30971699.
- ^ Pierce, Heather H.; Dev, Anurupa; Statham, Emily; Bierer, Barbara E. (4 June 2019). "Credit data generators for data reuse". Nature. 570 (7759): 30–32. Bibcode:2019Natur.570...30P. doi:10.1038/d41586-019-01715-4. PMID 31164773. S2CID 174809246.
- ^ Buneman, Peter; Davidson, Susan; Frew, James (September 2016). "Why data citation is a computational problem". Communications of the ACM. 59 (9): 50–57. doi:10.1145/2893181. ISSN 0001-0782. PMC 5687090. PMID 29151602.
- ^ Silvello, G. (2018). "Theory and Practice of Data Citation". Journal of the Association for Information Science and Technology. 69 (1): 6–20. doi:10.1002/asi.23917.
- ^ Buneman, P.; Silvello, G. (September 2010). "A Rule-Based Citation System for Structured and Evolving Datasets". IEEE Data Engineering Bulletin. IEEE Computer Society. 33 (3): 33–41. http://sites.computer.org/debull/A10sept/buneman.pdf
- ^ Silvello, G. (June 2017). "Learning to Cite Framework: How to Automatically Construct Citations for Hierarchical Data". Journal of the Association for Information Science and Technology. 68 (6): 1505–1524. http://www.dei.unipd.it/~silvello/papers/2016-DataCitation-JASIST-Silvello.pdf
- ^ Silvello, G. (2015). "A Methodology for Citing Linked Open Data Subsets". D-Lib Magazine. 21 (1/2). http://www.dlib.org/dlib/january15/silvello/01silvello.html
- ^ Buneman, P. (2006). "How to Cite Curated Databases and How to Make Them Citable". Proceedings of the 18th International Conference on Scientific and Statistical Database Management (SSDBM 2006). pp. 195–203.