The term "research data" is difficult to define because each discipline has its own understanding of what counts as research data in its field. In general, research data arises in the course of subject-specific research processes. It includes both raw data and the research results derived from them in the form of publications, together with the associated metadata and documentation.
Life cycle of research data
The data lifecycle model describes data from production, analysis and long-term storage to accessibility and re-use. Various models exist for this purpose.
Scientific research produces a wide variety of digital data. For the permanent and comprehensible preservation of data and for the possible reuse of primary research data by third parties, research data management is required. This concerns every step in the handling of research data, from project planning to the permanent storage of data.
Third-party funding bodies such as the German Research Foundation (DFG) or the European Union attach great importance to research data management and set guidelines for it, and for open data, in their funding programs.
A data management plan (DMP) describes the digital research data generated within a research project and how this data is to be handled. The description includes, for example, information on the type of data and how it is generated, the standards and metadata used, and the planned measures for archiving and data preservation. In addition, a DMP can contain information about access options, licenses, persistent identifiers for the data records, or possible uses beyond the original purpose. A DMP is a living document that is adapted as the project changes.
Various tools already exist for creating data management plans:
Research Data Management Organizer (RDMO) JGU: https://rdmo.zdv.uni-mainz.de/
https://dmponline.dcc.ac.uk/ Digital Curation Centre (UK) (adapted to Horizon 2020)
https://dmptool.org/ University of California (USA) (templates)
To avoid data loss, data should be stored on several storage media where possible. Storing data on the university server is advisable because that server is backed up regularly. File names should be standardized so that data records can still be clearly identified after a long time. The metadata associated with the data records should also be recorded and saved. Where necessary (e.g. to protect personal data), access protection must be provided.
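One way to keep file names standardized is to generate them from a fixed pattern. The following is a minimal sketch in Python. The pattern (date, project, description, version) and the function name `standardized_name` are assumptions chosen for illustration, not a JGU convention.

```python
import re
from datetime import date


def standardized_name(project: str, description: str, created: date,
                      version: int, extension: str) -> str:
    """Build a file name of the form YYYY-MM-DD_project_description_vNN.ext.

    The pattern itself is a hypothetical convention used for illustration.
    """
    def slug(text: str) -> str:
        # Lowercase the text and replace every run of characters other
        # than a-z and 0-9 with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    suffix = extension.lstrip(".").lower()
    return (f"{created.isoformat()}_{slug(project)}_"
            f"{slug(description)}_v{version:02d}.{suffix}")


if __name__ == "__main__":
    print(standardized_name("Soil Survey", "Field measurements",
                            date(2018, 6, 13), 2, ".CSV"))
    # prints 2018-06-13_soil-survey_field-measurements_v02.csv
```

An ISO date at the start of the name makes files sort chronologically, and the zero-padded version number keeps v02 ahead of v10 in a directory listing.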
Metadata
From a library point of view, metadata is additional descriptive data about objects such as a book or a journal. It is used to describe resources in a uniform and structured manner. Examples are information about the author or the year of publication. If research data is to be placed in a repository, metadata is required to understand the data set, to make it reusable, and to make the repository searchable. General standards such as Dublin Core and subject-specific standards such as ISO 19115 (geosciences) exist for recording metadata uniformly.
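To illustrate what a uniform, structured description can look like, the sketch below writes a few Dublin Core elements as XML using Python's standard library. The wrapper element `metadata`, the function name, and the sample values are assumptions for illustration. A real repository will prescribe its own schema and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 elements in the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize a flat mapping of Dublin Core elements to an XML string."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values are invented for this example.
    print(dublin_core_record({
        "title": "Example survey data",
        "creator": "Doe, Jane",
        "date": "2019",
        "type": "Dataset",
    }))
```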
Long-term archiving
The permanent storage of digital data is a great challenge. The data should not only be stored as a bitstream in the long term, but should also remain readable and traceable. Since software environments and storage media are constantly changing, two things must be ensured: the software environments needed to read the data can be emulated, and the bitstream is retained in its exact sequence so that it corresponds to the original bitstream. It is important to use non-proprietary, documented file formats. Long-term archiving can only succeed if it follows standardized processes.
JGU archive: https://researchdata.uni-mainz.de/wiki-archiving-research-data/
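Whether a stored bitstream still corresponds to the original is commonly checked with checksums. The sketch below shows the general idea with SHA-256 from Python's standard library. It is an illustration only, not a description of how the JGU archive works, and the function names are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_checksum: str) -> bool:
    """Compare the current checksum with the one recorded at deposit time."""
    return sha256_of(path) == recorded_checksum
```

The checksum is recorded once when the file is archived and stored with the metadata. Recomputing it later and comparing the two values reveals any change to the bitstream, including a single flipped bit.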
Data publication
Data repositories make it possible to publish research (raw) data. Published this way, research (raw) data can be cited and reused by other scientists. JGU offers a repository for the publication and storage of research data.
re3data offers an overview of about 1500 different subject-specific data repositories.
"Zenodo", which is maintained by CERN, can be used as a general repository for the publication of articles and primary research data. Datasets up to 50 gigabytes in size can be stored in Zenodo. In addition to the open data option, it is also possible to store data with access protection for a certain period of time.
Persistent identifiers (PID)
PIDs are used to identify digital objects uniquely and permanently so that they remain locatable. PIDs prevent broken links, for example when a publisher's web address changes. There are different types of PIDs, such as DOI, Handle, URN, and PURL.
Digital Object Identifier (DOI): http://www.doi.org/ The DOI is the best-known PID and the most widely used internationally. DOIs are free of charge for academic purposes.
Handle: http://www.handle.net/ Handle is another PID that is widely used internationally. For example, the DOI system is based on Handle. Its use may require a small fee.
Uniform Resource Name (URN): http://www.dnb.de/DE/Netzpublikationen/URNService/urnservice_node.html In Germany, URNs are issued and administered free of charge by the German National Library. URNs are assigned to publications and are probably familiar to most people from libraries. URNs are more common in Europe.
Persistent Uniform Resource Locator (PURL): https://archive.org/services/purl/ PURLs are mainly used in North America (e.g. in libraries) and work similarly to HTTP redirection.
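As an illustration of how a PID stays locatable, a DOI is turned into a link by appending it to the resolver address https://doi.org/, which then forwards to the object's current location. The sketch below normalizes a DOI string and builds that link. The validation pattern is a common heuristic rather than the full DOI syntax, and the DOI in the example is made up.

```python
import re
from urllib.parse import quote

# Heuristic only: prefix "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
KNOWN_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def doi_to_url(raw: str) -> str:
    """Turn a DOI, with or without a common prefix, into a resolver URL."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    # "10.1234/example.5678" is an invented DOI used for illustration.
    print(doi_to_url("doi:10.1234/example.5678"))
    # prints https://doi.org/10.1234/example.5678
```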
In connection with open data, it is often required that data be FAIR, i.e.:
Findable
Accessible
Interoperable
Reusable
Further information:
https://www.go-fair.org/fair-principles/
https://blogs.tib.eu/wp/tib/2017/09/12/the-fair-data-principles-for-research-data/
Urheberrechtsgesetz (Germany) - Open Access publication of journal articles
§ 38 paragraph 4: The author of a scientific contribution that resulted from research activity at least half of which was publicly funded, and that appeared in a collection published at least twice a year, has the right, even if he or she has granted the publisher or editor an exclusive right of use, to make the contribution publicly accessible in the accepted manuscript version twelve months after its first publication, provided this serves no commercial purpose. The source of the first publication must be indicated. Any agreement that deviates from this to the author's disadvantage is ineffective.
After one year, journal articles can therefore be made available, e.g. in JGU's institutional repository (https://openscience.ub.uni-mainz.de/). This is relevant, for example, for fulfilling the Open Access obligation in Horizon 2020 projects.
SHERPA/RoMEO
The website SHERPA/RoMEO (http://www.sherpa.ac.uk/romeo/index.php) provides information on the copyright policies and self-archiving guidelines of various scientific publishers and journals.
Creative Commons licences (CC licences)
CC licences (https://de.creativecommons.org/) are standardized licence agreements that let authors release their digital objects for use to varying degrees. CC BY is common in the scientific field.
Examples:
CC 0: No rights reserved
CC BY: Attribution
CC BY SA: Attribution-ShareAlike
CC BY ND: Attribution-NoDerivatives
Open-source software
Information on open-source software and a JGU guide to licensing software under open-source licenses can be found at http://www.uni-mainz.de/forschung/Dateien/Open_Source_Lizenzen.pdf.
When dealing with personal data, e.g. in interviews or surveys, the German federal and state data protection acts (Bundes- und Landesdatenschutzgesetz) and the European General Data Protection Regulation (https://www.eugdpr.org/) must be observed.
If you have any questions about data protection, please contact the data protection officer responsible for you. At JGU, the contact details are available at https://organisation.uni-mainz.de/datenschutzbeauftragter/.
Lectures
Aktionstag Forschungsdaten, June 13, 2018
"Vorstellung des Kompetenzteams Forschungsdaten"; Dr. Anne Vieten
"Von Forschungsdaten zu Projekten"; Aline Deicke
"Wer sammelt Forschungsdaten? Das richtige Repositorium mit re3data finden"; Frank Tristram
Aktionstag Forschungsdaten, June 26, 2019
"NFDI Status Quo"; Prof. Dr. Kai-Christian Bruhn
"Wer braucht schon Datenmanagementpläne?"; Franziska Helbing
"Datenschutzrechtliche Fragen beim Umgang mit Forschungsdaten"; Dr. Sebastian Golla