This document outlines some of the basic definitions of data as the term is used within the academic environment. Research data come in many different shapes and sizes, but the term covers “any information collected, stored, and processed to produce and validate original research results.”1
Working with data throughout a research project is challenging. Research Data Management is the process of organising and documenting the data processes (collection, description, de-identification, curation, archiving and publication) within a project. Professional data management practices can make research more coherent and shareable, or FAIR: Findable, Accessible, Interoperable and Reusable. Even if you cannot make your data completely accessible, practising good research data management makes your research more efficient.
Research and Development (R&D) is defined as “creative work, undertaken on a systematic basis, in order to increase the stock of knowledge, including knowledge of man [sic], culture and society, and to devise new applications of available knowledge.”2
The Organisation for Economic Co-operation and Development (OECD) (2015)
According to the Frascati Manual (OECD, 2015), an R&D activity can be distinguished from a non-R&D activity if five core criteria are met; namely, the activity must be novel, creative, uncertain, systematic, and transferable and/or reproducible.
According to the OECD (2015), “All five criteria must be met, at least in principle, every time an R&D activity is undertaken whether on a continuous or occasional basis.”
R&D specifically excludes educational, training, and administrative work undertaken as part of normal operational processes, as well as certain large-scale data gathering, analysis and/or processing activities (such as the national census, country-level topographical surveys, etc.) that can only be undertaken at the governmental level. Table 3 of the National Intellectual Property Management Office (NIPMO) (2012) Guideline 1 document provides further information about what is excluded from the definition of R&D.
UCT’s current data definitions are shown below (Casrai, n.d.; Department of Science and Technology (DST), 2012; University of South Carolina Libraries, n.d.):
|Term||Synonyms or related terms||Definition|
|Anonymity||N/A||A situation in which the identity of the research participants is neither collected nor shared, i.e. where no one, including the researcher, knows the identity of the research participants. NOT synonymous with confidentiality. Examples include anonymous surveys, tip-offs, etc.|
|Coded data||N/A||Data tagged or assigned with identifiers as the precursor to analysis.|
|Confidentiality||N/A||A situation in which the identity of the research participants is collected but not shared. Many kinds of data including personal interviews can be made confidential through removing disclosive data.|
|Confidential data||Disclosive data||Data that contains sensitive personal information that should not be shared. See direct identifier and indirect identifier.|
|Data de-identification||Anonymisation, confidentiality||The process of removing information that could reveal research participants’ identities. Can include the removal of direct identifiers and indirect identifiers through omission, abstraction, redaction or perturbation.|
|Direct identifier||N/A||Unit of information that can be used in isolation to identify an individual. E.g. ID number, name and surname, telephone number, email address.|
|Experimental data||Laboratory data||Data collected in an environment with high control over variables, such as chemical reactions.|
|Field data||Field studies||Data collected in an uncontrolled/in-situ setting, such as field notes, participant observation, ethology (observed animal behaviours), etc.|
|Indirect identifier||N/A||Unit of information that can be used in conjunction with other units to identify an individual. E.g. position + date of study, first name + position + institution, subject specialisation + institution.|
|Metadata||Categories, keywords, descriptive information, study type||Data that provides information about another object or resource (which can itself be data). May include information about authorship, creation or modification (object logs), unique identification (DOIs), categorisation (keywords, subject categories), and organisation (hierarchical information).|
|Microdata||Unit record data||The “thing” of data: the data which informs analysis. E.g. interview transcripts, census records, astronomical data, video recordings of a theatrical performance, etc. Most commonly used for tabular data.|
|‘Open’ Data||Shared data, public data||Data that is published with few or no restrictions constraining its reuse. Typically shared under an Open Government licence, Creative Commons, or GNU Open licence.|
|Personal data||N/A||Data pertaining to an individual’s identity, activities or characteristics. See confidential data.|
|Primary data||Core data, main data||The data from which the core analysis for a research project is drawn.|
|Processed data||Cleaned data||Data which has undergone some process of clarification, enhancement, error-checking, outlier removal, conversion into different formats, etc. May or may not include disclosive information (see Confidential data).|
|Qualitative data||N/A||Data collected about the quality of an object, interaction, or process, and/or to understand a particular thought process or perception. Typically represented in language and not by numbers.|
|Quantitative data||N/A||Data that can be expressed numerically and/or granularly, or that is analysed according to statistical models. Often expressed in tabular or similar formats, composed of lists of variables.|
|Quasi-statistics||N/A||Conducting or supplementing qualitative analysis with simple numerical analysis. E.g. “30% of the research participants referred to their working conditions negatively.”|
|Raw data||Original data||Data/information captured directly from the collecting instrument, before processing. Examples include interview audio recordings, laboratory machine readouts, field notes, etc.|
|Research data||Data||Facts, measurements, recordings, records, or observations about the world collected by scientists and others, with a minimum of contextual interpretation. Data may be in any format or medium taking the form of writings, notes, numbers, symbols, text, images, films, video, sound recordings, pictorial reproductions, drawings, designs or other graphical representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing algorithms, or statistical records. NOT synonymous with Enterprise data, although the two may overlap in some cases. Also see Research & Development (R&D).|
|Secondary data||Ancillary data, supplementary data||Additional data collected that may or may not form part of the analysis.|
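The data de-identification entry in the table above names four techniques: omission, abstraction, redaction and perturbation. As a rough illustration only, the Python sketch below applies each of them to one made-up participant record. The field names, the set of fields treated as direct identifiers, the ten-year age bands and the noise range are all assumptions chosen for this example; they are not part of UCT's definitions, and a real project would set them according to its own ethics approval.

```python
import random
import re

# Hypothetical choice of which fields count as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "id_number", "email"}


def abstract_age(age: int) -> str:
    """Abstraction: replace an exact age with a ten-year band."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def redact_names(text: str, names: list[str]) -> str:
    """Redaction: mask known names wherever they appear in free text."""
    for name in names:
        text = re.sub(re.escape(name), "[REDACTED]", text, flags=re.IGNORECASE)
    return text


def perturb(value: float, spread: float, rng: random.Random) -> float:
    """Perturbation: add a small amount of random noise to a number."""
    return round(value + rng.uniform(-spread, spread), 1)


def deidentify(record: dict, rng: random.Random) -> dict:
    # Omission: drop the direct identifiers entirely.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["age"] = abstract_age(record["age"])
    cleaned["notes"] = redact_names(record["notes"], record["name"].split())
    cleaned["income_thousands"] = perturb(record["income_thousands"], 2.0, rng)
    return cleaned


# Entirely fictional participant record, used only for illustration.
participant = {
    "name": "Thandi Example",
    "id_number": "0000000000000",
    "email": "thandi@example.org",
    "age": 37,
    "notes": "Thandi said the lab was understaffed.",
    "income_thousands": 42.0,
}

result = deidentify(participant, random.Random(0))
print(result)
```

Perturbation trades accuracy for privacy, so the noise range should be chosen with the planned analysis in mind. Indirect identifiers such as position or institution would need similar treatment, since they can identify a person when combined.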
Casrai. n.d. “Casrai Standard Dictionary of Research Administration Information.” https://bit.ly/2PtoW1i.
Department of Science and Technology-DST. 2012. “Act No. 28 of 2013: Intellectual Property Laws Amendment Act 2013.” Government Gazette, 570. 2012. https://bit.ly/2RGTHBB.
National Intellectual Property Management Office-NIPMO. 2012. “Guideline 1 of 2012: Interpretation of the Scope of Intellectual Property Rights from Publicly-Financed Research and Development Act (Act 51 of 2008): Setting the Scene.” Pretoria: NIPMO. 2012. https://www.ru.ac.za/media/rhodesuniversity/content/research/documents/South_African_IPR-PFRD_Act,_2008_(Act_51_of_2008).pdf.
The Organisation for Economic Co-operation and Development-OECD. 2015. “Frascati Manual 2015: Guidelines for Collecting and Reporting Data on Research and Experimental Development, the Measurement of Scientific, Technological and Innovation Activities.” Paris: OECD Publishing. 2015. https://bit.ly/2NBY9Oz.
University of South Carolina Libraries. n.d. “Glossary of Research Terms.” https://bit.ly/2tA8iEf.
1 LibGuides@ Macalester University. Available at: https://libguides.macalester.edu/c.php?g=527786&p=3608583
2 OECD (The Organisation for Economic Co-operation and Development). (2015). Frascati Manual 2015: Guidelines for Collecting and Reporting Data on Research and Experimental Development, the Measurement of Scientific, Technological and Innovation Activities. Paris: OECD Publishing. Available at: https://bit.ly/2NBY9Oz