Harvard Forest Data Archive

Guidelines for Submission of Data & Metadata

Researchers are asked to comply with current HF Data Policies for all projects based at the Forest. Data and metadata may be submitted via email to the Information Manager. Special arrangements should be made for data files too large to send as email attachments.

HF information management staff are available to work with contributors to resolve any data or metadata issues. Once a new dataset is posted in the HF Data Archive, the contributor is notified and given an opportunity to request changes or corrections. The dataset is then submitted to the EDI (Environmental Data Initiative) repository, which retains all submitted versions and assigns a digital object identifier (DOI) to each version.

Datasets

The HF Data Archive is organized by dataset. Each dataset contains one or more data files with accompanying metadata. Datasets vary widely in size and scope. Data files in a single dataset typically share common objectives, methods, study sites, funding, and personnel.

Each dataset is assigned a unique ID number in the HF Data Archive (e.g., HF001). This number serves to identify the dataset for information management purposes and has no ordinal significance.

Metadata

Metadata provide the information required to locate, access, interpret, and use data correctly.

For new projects, please download, complete, and submit the HF Metadata Form. This form may also be used to submit metadata updates (if any) for ongoing projects. Variable-level metadata for individual data files may be included in this form or optionally submitted as separate files.

Keywords should be selected from the LTER Controlled Vocabulary and HF Controlled Vocabulary. See Browse by Keyword for a complete list of all keywords currently in use.

Once received, metadata will be converted by HF information management staff to EML (Ecological Metadata Language).

Data

Each dataset should contain the data necessary to support the published findings and to recreate the original analysis.

Tabular data. The preferred formats for submission are Excel spreadsheets or comma-delimited text files. Please do not use spaces or tabs as delimiters. The first line of each file should contain concise variable names in R-compatible format (preferably lower case, separated by underscores). Missing values should be indicated by NA. Please avoid using empty fields. Codes must be clearly explained in the accompanying metadata. Numerical values with long decimal fractions may be rounded to be consistent with measurement accuracy.
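As an illustration of the tabular conventions above, the following Python sketch writes a comma-delimited file with lower-case, underscore-separated variable names and NA for missing values. It is only one possible approach, not an HF tool: the function names to_r_name and write_table, the rule of prefixing names that start with a digit with "x", and the sample columns are all assumptions made for this example.

```python
import csv
import io
import re


def to_r_name(name: str) -> str:
    """Convert a column label to a lower-case, underscore-separated name."""
    # Replace every run of non-alphanumeric characters with one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    # R names may not begin with a digit, so prefix those with "x"
    # (an assumption of this sketch, not an HF requirement).
    if not cleaned or cleaned[0].isdigit():
        cleaned = "x" + cleaned
    return cleaned


def write_table(header, rows, out) -> None:
    """Write rows as comma-delimited text, using NA for missing values."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([to_r_name(h) for h in header])
    for row in rows:
        # Treat None and blank strings as missing so no field is left empty.
        writer.writerow(
            ["NA" if v is None or str(v).strip() == "" else v for v in row]
        )


if __name__ == "__main__":
    # Hypothetical sample data, used only to show the output format.
    buffer = io.StringIO()
    write_table(
        ["Plot ID", "DBH (cm)", "Species"],
        [["A1", 23.4, "acru"], ["A2", None, ""]],
        buffer,
    )
    print(buffer.getvalue(), end="")
```

Units such as "(cm)" are dropped from the variable name here, so they would need to be recorded in the accompanying variable-level metadata.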

Spatial data. For GIS data, please specify the map projection and units. For remote sensing data, please specify the resolution and bounding coordinates.

Other data types. Files of almost any type can be archived. Whenever possible, please use open file formats and avoid proprietary formats.

Large data files. Files larger than 1 GB may be stored in EDI only, with links from the HF Data Archive.

Data in other repositories. For data files archived in other repositories (e.g., genetic sequence data in GenBank), please provide an inventory file (as an Excel spreadsheet or comma-delimited text file) that describes each data file (one row per file), including its identifier in the other repository, DOI, title or short description, file name, file format, file size, and repository URL. The inventory file (but not the original data file or files) will be archived in the HF Data Archive and EDI.
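A comma-delimited inventory file of this kind could be produced along the following lines. This is a minimal sketch under stated assumptions: the column names in INVENTORY_COLUMNS, the function write_inventory, and the sample record (including its identifier and URL) are hypothetical placeholders chosen for illustration, not names prescribed by HF or by any repository.

```python
import csv
import io

# Hypothetical column names covering the fields listed above;
# HF does not prescribe these exact names.
INVENTORY_COLUMNS = [
    "repository_id",
    "doi",
    "title",
    "file_name",
    "file_format",
    "file_size_bytes",
    "repository_url",
]


def write_inventory(records, out) -> None:
    """Write one row per externally archived file, with NA for missing fields."""
    writer = csv.DictWriter(out, fieldnames=INVENTORY_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Reject misspelled or unexpected keys instead of silently dropping them.
        unknown = set(record) - set(INVENTORY_COLUMNS)
        if unknown:
            raise ValueError(f"unexpected inventory fields: {sorted(unknown)}")
        writer.writerow(
            {
                col: "NA" if record.get(col) in (None, "") else record[col]
                for col in INVENTORY_COLUMNS
            }
        )


if __name__ == "__main__":
    # Placeholder record; the identifier and URL are not real.
    example = [
        {
            "repository_id": "EXAMPLE0001",
            "title": "Example sequence file",
            "file_name": "example_sequences.fasta",
            "file_format": "FASTA",
            "file_size_bytes": 2048,
            "repository_url": "https://example.org/records/EXAMPLE0001",
        }
    ]
    buffer = io.StringIO()
    write_inventory(example, buffer)
    print(buffer.getvalue(), end="")
```

Fields that are not supplied (the DOI in the sample record) are written as NA, consistent with the guidance for tabular data.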