Wikipedia:GLAM/NZThesisProject
Revision as of 23:50, 15 November 2022
Background
This project is focussed on uploading metadata for New Zealand academic theses to Wikidata, in order for them to be more openly citable and accessible. We believe this is the first attempt to upload a national dataset of theses.
The project came about while Giantflightlessbirds was a Wikipedian in Residence at Lincoln University. During that short residency, librarian Zeborah raised the possibility of adding Lincoln University's theses to Wikidata. She had an opportunity to present to her academic librarian colleagues on adding thesis metadata to Wikidata at the online conference Aotearoa Institutional Repositories Community Days (30 September – 1 October 2021). In preparation for this presentation she reached out to Giantflightlessbirds, who in turn invited Ambrosia10 and DrThneed to join the discussion. This group met several times to discuss the proposal to upload all New Zealand academic theses to Wikidata and to prepare for the conference presentation.
Discussion documents, slides and other project documentation are being collated in Google Docs folders, as some of the participating academic librarians are not Wikimedians. Some of this documentation is linked from the Documentation section of this page.
Scope
The intention is to collect metadata for theses from New Zealand universities and polytechnics and, in the first instance, upload a core set of statements for each thesis. Once this core set of statements has been uploaded, there is potential for further work to increase the findability and linkage of the theses. For example, the data includes keywords, often from controlled vocabularies such as ANZSRC, which could be mapped to main subject statements. We also have data connecting theses to degree programmes and advisors.
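As an illustration of the keyword idea, here is a minimal sketch of turning thesis keywords into Wikidata "main subject" (P921) statements. It assumes a keyword-to-item lookup table already exists. The table and its QIDs below are hypothetical placeholders, not real ANZSRC mappings. Keywords without a match are returned separately so that they can be reviewed, for example in Mix'n'match.

```python
from typing import Dict, List, Tuple

MAIN_SUBJECT = "P921"  # Wikidata property "main subject"

# Hypothetical keyword -> item table; the QIDs are placeholders, not real mappings.
KEYWORD_TO_QID: Dict[str, str] = {
    "ecology": "Q_ECOLOGY",
    "soil science": "Q_SOIL_SCIENCE",
}


def normalise(keyword: str) -> str:
    """Lower-case and collapse whitespace so 'Soil  Science' matches 'soil science'."""
    return " ".join(keyword.lower().split())


def main_subject_statements(
    thesis_qid: str, keywords: List[str]
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Return (statements, unmatched keywords) for one thesis item."""
    statements: List[Tuple[str, str, str]] = []
    unmatched: List[str] = []
    seen = set()
    for kw in keywords:
        qid = KEYWORD_TO_QID.get(normalise(kw))
        if qid is None:
            unmatched.append(kw)  # needs a human decision
        elif qid not in seen:  # avoid duplicate statements
            seen.add(qid)
            statements.append((thesis_qid, MAIN_SUBJECT, qid))
    return statements, unmatched
```

Keeping the unmatched list explicit means the mapping can be extended gradually rather than guessed.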
A dataset of approximately 66,500 theses from 13 New Zealand institutions has been compiled. The theses range from diploma and bachelor's theses through to Doctor of Science, and span the period 1907 to 2022. Whilst many of the theses are digitised and available through an institutional repository, others are represented only by their metadata. Because the data varies both within and between institutions, a lot of cleanup and standardisation is required. Deborah Fitchett has done significant work aggregating and collating the data in Excel, and DrThneed will clean it up in OpenRefine and upload it to Wikidata. Some problems will likely need to be resolved with institutions, for example where a thesis is held in more than one library and has been modelled differently by each.
Funding
The thesis dataset is large and complex. It has 66.5k items in several languages, includes some apparent duplicate items within and between institutions that need to be clarified with the academic librarians involved, and contains some incomplete data that may need follow-up. Standardising and cleaning up the inconsistencies in data format between institutions will take a lot of time. For instance, we have counted more than 50 ways of indicating in a title that a work is a thesis. These additions need to be removed so that the title of each thesis is as the author intended and makes a good citation.
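Stripping "this is a thesis" additions from titles can be handled with a list of regular expressions anchored to the end of the title. In this minimal sketch the three patterns are hypothetical examples and not the project's actual list of 50+ variants. In practice, a step like this might run as a transform in OpenRefine or before import.

```python
import re

# Illustrative patterns only; the real dataset has many more variants.
THESIS_MARKERS = [
    r"\s*[\[(]\s*thesis\s*[\])]\s*$",  # "... (Thesis)" or "... [thesis]"
    r"\s*:\s*a thesis submitted\b.*$",  # ": a thesis submitted in partial fulfilment ..."
    r"\s*[\[(]\s*(?:phd|ph\.d\.|masters?|msc|ma)\s+thesis\s*[\])]\s*$",  # "[PhD thesis]"
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in THESIS_MARKERS]


def clean_title(title: str) -> str:
    """Remove trailing thesis markers, leaving the title as the author wrote it."""
    cleaned = title.strip()
    for pattern in _COMPILED:
        cleaned = pattern.sub("", cleaned)
    return cleaned.strip()
```

Anchoring every pattern at `$` means that a title which genuinely begins with or contains the word "thesis" is left alone.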
We estimate the data cleaning, checking and upload to Wikidata to take approximately 200 hours of work by an experienced data wrangler. At an hourly rate of $NZ25 this amounts to $NZ5000.
We are approaching Wikimedia Aotearoa New Zealand for support in hiring a contractor to complete this work.
Progress
Ambrosia10 and DrThneed used a small sample dataset to work on mapping the thesis data to Wikidata properties, and Ambrosia10 developed a Wikidata Cradle schema for an academic thesis in consultation with the other members of the group as well as the academic librarians contributing the data. This ontology will likely need to be modified during the project.
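The mapping from spreadsheet columns to Wikidata properties can be pictured as a simple lookup. In this minimal sketch the column names are hypothetical, and the property choices are only plausible examples rather than the project's Cradle schema. A real upload also needs typed values, such as a language-tagged title and dates with a precision. OpenRefine's Wikidata schema handles those, and this sketch does not.

```python
from typing import Dict, List, Tuple

# Hypothetical column names -> Wikidata properties (not the project's actual schema).
COLUMN_TO_PROPERTY: Dict[str, str] = {
    "title": "P1476",            # title
    "author_name": "P2093",      # author name string (until the author is matched)
    "year": "P577",              # publication date
    "institution_qid": "P4101",  # dissertation submitted to
    "repository_url": "P953",    # full work available at URL
}


def row_to_statements(row: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (property, value) pairs for the mapped columns that are non-empty."""
    statements = []
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row.get(column, "").strip()
        if value:  # skip blanks rather than uploading empty statements
            statements.append((prop, value))
    return statements
```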
Zeborah undertook significant work collating and aggregating the data and passed the dataset on to DrThneed at the beginning of March 2022. DrThneed then spent time exploring the dataset and began a small trial upload of 116 theses to Wikidata, to test both the proposed workflow and the previously created schema.
Feedback is being gathered from the participating institutions, and as at April 2022 DrThneed is continuing to prepare the dataset for upload to Wikidata. The upload of a core set of statements for the full thesis dataset is expected to be complete in May/June 2022.
A small team met before Christmas to work on ANZSRC vocabularies in Wikidata, which would be a useful prelude to adding keywords to the thesis items. Progress on the ANZSRC Mix'n'match catalogues has been slow, but we intend to return to this work after the core statements for the main dataset have been uploaded.
DrThneed has created a dashboard that measures edits to Wikidata items with the statement "on focus list of NZThesisProject".
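The dashboard relies on items carrying the "on focus list of Wikimedia project" property (P5008). As a related check, the number of items on the focus list can be queried from the Wikidata Query Service. This is a minimal sketch and not how the dashboard itself is built. The project QID is a placeholder, and the User-Agent string should be replaced with your own contact details.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def focus_list_count_query(project_qid: str) -> str:
    """SPARQL counting items whose P5008 (on focus list of Wikimedia project) is the project."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item wdt:P5008 wd:{project_qid} .\n"
        "}"
    )


def run_query(query: str) -> dict:
    """Send a query to the Wikidata Query Service and return the parsed JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder agent string; WDQS asks clients to identify themselves.
            "User-Agent": "NZThesisProject-example/0.1 (replace with contact details)",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (needs network access; "Q..." stands for the project's focus-list item):
# result = run_query(focus_list_count_query("Q..."))
# print(result["results"]["bindings"][0]["count"]["value"])
```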
Events
- First meeting with librarians, 2021.
- Second update with project members and librarians, 25 March 2022: DrThneed presented her findings to the project participants and contributors and requested feedback from the contributing libraries on issues raised by this trial upload.
- Third update, 28 July 2022: showed how theses are connected in Wikidata and cited in Wikipedia, some of the data visualisations now possible, and tools to improve the data.
Documentation
- Cradle model
- Data schema for theses, authors and advisors, on Wikidata
- Google doc documenting the process and progress of the project
- Documentation giving recommendations to librarians when providing data
- This Month in GLAM March 2022 report
- YouTube video of DrThneed's report on project progress.
Tools
DrThneed has made some Wikidata property dashboards to track progress on the project, and they are linked from the Wikidata project page. One table shows properties for theses and another shows properties for people (thesis authors). A third table shows some properties we don't expect to find, such as volume number and published in. This helps check that our thesis items haven't been inappropriately merged with other types of publications.
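The logic behind the "unexpected properties" table can be sketched as a set intersection. In this minimal sketch, item data is assumed to have already been fetched into a hypothetical mapping from QID to the set of property IDs on that item. The two properties shown are volume (P478) and published in (P1433). The real table may check more.

```python
from typing import Dict, List, Set

# Properties a thesis item is not expected to carry.
UNEXPECTED_FOR_THESES: Set[str] = {"P478", "P1433"}  # volume, published in


def flag_suspect_items(items: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return QID -> sorted unexpected properties, for items that have any."""
    flagged: Dict[str, List[str]] = {}
    for qid, props in items.items():
        hits = sorted(props & UNEXPECTED_FOR_THESES)
        if hits:  # possibly merged with a journal article or similar
            flagged[qid] = hits
    return flagged
```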
The Wikidata project page also links to some Histropedia timelines and to SPARQL queries that visualise the data, e.g. a map of where authors have been educated or employed, bubble charts of main subjects or author occupations, and links between advisors and students.
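The following minimal sketch builds a query for the advisor–student links. It assumes that thesis items are on the project's focus list (P5008), that they are linked to their authors with author (P50), and that advisors are recorded on the author item with doctoral advisor (P184). The project's actual queries may model this differently, and the project QID is a placeholder. The generated text can be pasted into the Wikidata Query Service.

```python
def advisor_links_query(project_qid: str, limit: int = 500) -> str:
    """SPARQL listing student-advisor pairs for authors of focus-list theses."""
    return f"""SELECT DISTINCT ?student ?studentLabel ?advisor ?advisorLabel WHERE {{
  ?thesis wdt:P5008 wd:{project_qid} ;
          wdt:P50 ?student .
  ?student wdt:P184 ?advisor .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""
```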
Tasks
If you would like to help, some easy tasks are making sure the theses are cited on the relevant authors' Wikipedia pages, and matching author name strings to Wikidata items in the Mix'n'match tool.
Citing theses on Wikipedia
This Google Sheet lists theses by authors who have Wikipedia pages, with links to those pages. Instructions for citing using the Cite Q template are on the first sheet. If you click on a thesis URL and find that the thesis has been digitised, consider adding a "full work available at URL" statement to Wikidata. This links the title of the thesis on Wikipedia straight to the repository item.
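For a single thesis, the "full work available at URL" (P953) statement is easiest to add by hand on the Wikidata item. When there are many, a batch tool may help. QuickStatements, which this page does not mention, is one option, and the following minimal sketch formats one tab-separated QuickStatements line per thesis. Check the output against the current QuickStatements documentation before running a batch.

```python
def full_work_statement(thesis_qid: str, url: str) -> str:
    """Format a tab-separated QuickStatements line adding P953 (full work available at URL)."""
    if not thesis_qid.startswith("Q") or not thesis_qid[1:].isdigit():
        raise ValueError(f"not a Wikidata item ID: {thesis_qid!r}")
    if '"' in url:
        raise ValueError("URL must not contain double quotes")
    # String-like values are wrapped in double quotes in this format.
    return "\t".join([thesis_qid, "P953", f'"{url}"'])
```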
Mix'n'match
The Mix'n'match tool is a way to match the author name strings from the thesis project to authors on Wikidata. If you search Wikidata and do not find the author, try removing middle names, initials, etc. If you are sure the person is not in Wikidata, click the 'new' button to create an item for them. You may be able to find other identifiers to add to the new item, e.g. ORCID or ResearchGate. If they have a university profile page, you can add the university as an 'employer' statement and use their profile URL as the reference URL for that statement. You do NOT need to link the author to the thesis item. DrThneed will periodically download matches from the Mix'n'match catalogue, link the authors to their theses, and add other information such as advisors.
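The "remove middle names, initials, etc." advice can be made systematic by generating progressively simpler search strings for each name. This minimal sketch is a hypothetical helper that is not part of Mix'n'match. It also handles catalogue-style "Surname, Given" names.

```python
from typing import List


def search_variants(full_name: str) -> List[str]:
    """Return progressively simpler search strings for a personal name."""
    if "," in full_name:  # "Smith, Mary J." -> "Mary J. Smith"
        surname, _, given = full_name.partition(",")
        full_name = f"{given} {surname}"
    parts = full_name.replace(".", " ").split()
    if len(parts) < 2:
        return [full_name.strip()]
    first, last, middles = parts[0], parts[-1], parts[1:-1]
    candidates = [
        " ".join(parts),                                            # full name
        " ".join([first, *[m for m in middles if len(m) > 1], last]),  # drop middle initials
        f"{first} {last}",                                          # drop all middle names
        f"{first[0]} {last}",                                       # first initial + surname
    ]
    unique: List[str] = []
    for c in candidates:  # de-duplicate, keeping order
        if c not in unique:
            unique.append(c)
    return unique
```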
If you are not familiar with the Mix'n'match tool, this screen capture shows how to match items, using the Alexander Turnbull Library catalogue as an example.
Participants
- Obedmakolo
- Zeborah
- Ambrosia10
- Giantflightlessbirds
Outcomes and impact
- July 2022: After DrThneed presented to the librarian community who provided the thesis data, Ambrosia10 wrote a Twitter thread explaining the project and its progress to the wider Wikidata community and others on Twitter. Dr Amanda Whitmire, librarian at Stanford University's Hopkins Marine Station, responded that she would like the station's theses to be added to Wikidata. This led to an exchange in which DrThneed and Ambrosia10 encouraged and supported Dr Whitmire in preparing the theses data for upload to Wikidata. As at 9 August 2022, Dr Whitmire has made over 1000 edits to Wikidata, including adding 353 Stanford theses by people who worked at Hopkins Marine Station. She has also created numerous items for the authors of those theses and has learned how to cite them on Wikipedia.
- August 2022: After DrThneed made a YouTube video about the project and her OpenRefine workflow, she was contacted by a PhD student in Leipzig who is researching dissertations.
- September 2022: DrThneed's Twitter and Wikidata outreach raised awareness of the NZThesisProject, and the London School of Economics Wikidata Thesis project was able to adapt queries and visualisations from the NZThesisProject for its own use (and vice versa).
- September 2022: User:Schwede66 wanted to work on New Zealand Rhodes scholars. DrThneed scraped a list of scholars, imported it into OpenRefine, matched it to existing Wikidata items, and then created a Mix'n'match catalogue for the remaining scholars to be matched or created. Because most of the scholars completed a degree at a New Zealand university and many returned to teach at New Zealand institutions, there is a large overlap between Rhodes scholars and the thesis project. We have also been able to match some scholars to their Oxford theses.
- October 2022: DrThneed presented the project to the Australia Wikimedia Community Meeting and encouraged anyone who knows of an institution keen to put its thesis data into Wikidata to contact her.