Data Management

Documentation & Metadata

Documentation makes your research data easier to organize, discover, and use in the future. Metadata is "data about data": descriptive information about a particular data set, object, or resource, including how it is formatted and when and by whom it was collected. Adding complete, careful metadata helps ensure that the data is accessible to any user and makes it easier for other researchers to cite your work.

It is important to begin documenting information related to the project from the start. Documenting salient information early makes it much easier to ensure that all aspects of the project are relayed properly to your users. It also aids long-term preservation of the data by showing who was involved and what steps were taken to reach your results.

A wide range of metadata standards exists for researchers to choose from when documenting a project. One example of a metadata standard is the DDI (Data Documentation Initiative), a standard designed for numeric data. For further help in documenting your data, please contact our team.

Below are some general elements that should be documented regardless of discipline or project type. At the very least, this metadata should be stored with the project data in a readme.txt file. If the information is included in an article or presentation, you can reference that item so that the information can be accessed there. When recording this information, think about how you would search for similar projects, and include those terms so that other researchers in the field can easily find the materials.

Most essential fields to augment discovery

  • Title: Name of dataset or research project that produced it. (Include both if applicable.)
  • Creator(s): Names and addresses of the individuals or organizations that created the data.
  • Identifier: Unique identifier or number that is used to identify the data. This could be an internal project number or code to reference the data.
  • Abstract or Description: A brief synopsis of the project or data that another researcher can review quickly to see the relevance of the project to what they are seeking.
  • Dates: All the dates associated with the project. The most important is probably the release date of the data, but you'll eventually want to include the start and end dates of the project, the time period covered by the data or project, the maintenance cycle and update schedule of the data, and any other important dates that will help document the process and aid in preservation.
  • Rights: Any known intellectual property rights held for the data or project.
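
As an illustration of storing the essential fields in a readme.txt file, here is a minimal Python sketch. The function names (render_readme, write_readme), the field order, and the "Field: value" layout are assumptions made for this example, not a required format, and the sample values are placeholders.

```python
from pathlib import Path

# Field names follow the essential fields listed above.
# The order and the "Field: value" layout are assumptions for this sketch.
ESSENTIAL_FIELDS = [
    "Title",
    "Creator(s)",
    "Identifier",
    "Abstract or Description",
    "Dates",
    "Rights",
]


def render_readme(metadata: dict[str, str]) -> str:
    """Return readme text with one 'Field: value' line per essential field."""
    missing = [name for name in ESSENTIAL_FIELDS if not metadata.get(name)]
    if missing:
        # Refuse to produce a readme that silently omits an essential field.
        raise ValueError("Missing essential fields: " + ", ".join(missing))
    lines = [f"{name}: {metadata[name]}" for name in ESSENTIAL_FIELDS]
    return "\n".join(lines) + "\n"


def write_readme(metadata: dict[str, str], directory: str) -> Path:
    """Write readme.txt into the given project data directory."""
    path = Path(directory) / "readme.txt"
    path.write_text(render_readme(metadata), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder values only; replace them with your project's details.
    example = {
        "Title": "Example dataset title (placeholder)",
        "Creator(s)": "Example Researcher, Example Department (placeholder)",
        "Identifier": "PROJECT-0001 (placeholder)",
        "Abstract or Description": "One-paragraph synopsis (placeholder)",
        "Dates": "Release date; project start and end dates (placeholder)",
        "Rights": "Known intellectual property rights (placeholder)",
    }
    print(render_readme(example))
```

Raising an error when a field is empty is a design choice for this sketch: it makes a missing essential field visible at the time the readme is written rather than when someone later tries to find or cite the data.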

Recommended fields

  • Contributor(s): Names and addresses of additional individuals who contributed to the project.
  • Subject: Keywords, phrases, or subject headings that will describe the subject or content of the data. In adding these, think of how you would search for the materials.
  • Funders: Organizations or agencies that funded the research or project.
  • Access Information: The location of the data and how the researcher can access the materials. Confidentiality can be addressed here as well.
  • Language: The language(s) of the content.
  • Location: If the data relates to a physical location, the spatial coverage should be documented.
  • Methodology: How the data was generated, including the equipment and software used (with version numbers), the experimental protocol, data validation and quality assurance procedures, and other information that would be documented during the process.
  • Data Processing: A record of the alterations made to the data, including who made each change, when, and for what reason. This record will aid in preservation of the data.
  • Sources: Citations for the sources that were used during the project. Include where the other data or material was stored and how it was accessed when appropriate.
  • List of File Names: List all of the data files associated with the project and include the file extensions. (Example:
  • File Formats: Format(s) of the data and any software, including the version, that is required to read the data. (Example: TIFF, FITS, JPEG, HTML)
  • File Structure: Organization of the data file(s) and the layout of the variables when applicable.
  • Variable List: List of variables in the data files, when applicable.
  • Code Lists: Explanation of codes or abbreviations used in the file names, the variables of the data, or the project overall that will help the user understand the project. (Example: 999 indicates a missing value in the data)
  • Versions: A date/time stamp for each file, with a separate identifier for each version.
  • Checksums: Values used to test whether a file has changed over time. They aid in the long-term preservation of the data and help protect its integrity by revealing alterations.
  • Related Materials: Links or location of materials that are related to the project. (Examples: Articles, presentations, papers)
  • Citation: The recommended way to cite the data, or the information needed to do so. Visit Citing Your Data for more information.
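
The List of File Names and Checksums fields can be produced together. The following Python sketch is one possible approach, not a prescribed tool: it walks a data directory, records each file name with its extension, and computes a SHA-256 checksum with the standard hashlib module. The function names (sha256_of, build_manifest, changed_files) and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's relative name (with extension) to its checksum."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were added, removed, or altered between manifests."""
    names = old.keys() | new.keys()
    return sorted(name for name in names if old.get(name) != new.get(name))
```

A manifest built when the data is deposited can be saved alongside the readme.txt file. Rebuilding it later and passing both manifests to changed_files shows which files differ, which is the check the Checksums field is meant to support.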