About the Archive

The Odum Institute Data Archive policies are informed by policies and guidelines developed collaboratively by members of the Data Preservation Alliance for the Social Sciences (Data-PASS). The Odum Institute is a founding member of Data-PASS, which is a voluntary partnership of institutions dedicated to social science data archiving, cataloging, and preservation. Odum Institute Data Archive policies establish procedural standards for the Data Archive and outline acceptable uses of Data Archive services.

Collection Development Policy

Digital Preservation Policy

UNC Dataverse Terms of Use

Data Security Guidelines

Metadata Guidelines

Data Deposit Form

The Odum Institute Data Archive systems and processes were developed with industry standards and best practices in mind. The Data Archive continually strives to meet or exceed these standards in order to earn and maintain its status as a trusted digital data repository.
In demonstration of its commitment to achieving these standards, the Odum Institute Data Archive has earned the 2014-2017 Data Seal of Approval (DSA), which is awarded to repositories that are recognized among the archives community as trustworthy sources of data. The DSA is a peer-reviewed self-assessment that includes 16 requirements based on the following five criteria:

  1. The data can be found on the Internet

  2. The data are accessible

  3. The data are in a usable format

  4. The data are reliable

  5. The data are identified in a unique and persistent way

View the Odum Data Archive’s DSA Assessment here.

Data Curation Workflow

The Odum Institute Data Archive data curation workflow consists of four primary stages: deposit, triage, processing, and access. During each stage, Data Archive staff executes tasks in accordance with archival standards and best practices to ensure data are properly processed and preserved for long-term discovery, access, and reuse. The data curation workflow is in part an implementation of the Odum Institute Data Archive Collection Development Policy, Digital Preservation Policy, and UNC Dataverse Terms of Use.

[Figure: Data curation workflow]

 

I. Deposit

The Data Curation Workflow begins when the data contributor delivers files to the Odum Institute Data Archive for long-term preservation and access. These files constitute the submission information package, or SIP. Along with the SIP, the data contributor also submits a completed and signed Data Deposit Form.

II. Triage

Triage involves deciding how data curation should proceed, based on a thorough review of SIP contents and the information provided on the Data Deposit Form.

REVIEW FILES. The Archive conducts a comprehensive review of SIP files to determine whether the data type and file format meet, or can be brought to meet, preservation and data quality standards, and whether the accompanying documentation is sufficient to appropriately and accurately describe, interpret, and use the data. The file review also includes an evaluation of data file contents to identify potential confidentiality risks due to the presence of personally identifiable information, protected health information, or otherwise sensitive data. If the SIP fails to meet preservation and data quality standards, or if the data present confidentiality issues, the Archive will return the SIP to the data contributor with recommendations for resolution.
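
Automated screening can supplement this manual review. The sketch below flags columns whose names or sample values suggest personally identifiable information; the file name, column patterns, and helper are illustrative assumptions, not the Archive's actual review procedure.

    # Illustrative PII screen -- not the Archive's actual review tooling.
    import re
    import pandas as pd

    SUSPECT_NAMES = re.compile(r"ssn|social|birth|name|address|phone|email", re.I)
    SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

    def screen_for_pii(path):
        """Return columns that warrant manual confidentiality review."""
        df = pd.read_csv(path, dtype=str)
        flagged = {c for c in df.columns if SUSPECT_NAMES.search(c)}
        for col in df.columns:
            sample = df[col].dropna().head(100)
            if sample.map(lambda v: bool(SSN_PATTERN.match(v))).any():
                flagged.add(col)
        return sorted(flagged)

    print(screen_for_pii("deposit/survey_data.csv"))  # hypothetical SIP file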

DEVELOP PLANS. Based on findings from the file review, the Archive develops a plan for file processing and archiving. This plan may include directives on file normalization, use of specified controlled vocabularies and standard metadata schemes, and implementation of file access restrictions. Specifics of the plan, which may be developed in collaboration with the data contributor, are informed by anticipated uses of the data by the designated user community.

III. Processing

Processing files involves several actions to ensure that data ingested into the archive adhere to archival standards and best practices for long-term preservation, access, and reuse. During processing, the Archive implements file version control procedures to maintain the integrity of the SIP and AIP.
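
One common way to implement such integrity checks is a checksum manifest recorded at deposit and verified at each later stage. The sketch below assumes this approach; it is not a description of the Archive's actual tooling.

    # Minimal fixity sketch (an assumed approach, not the Archive's tooling).
    # A checksum manifest recorded at deposit lets later stages verify that
    # files have not been altered or corrupted between SIP and AIP.
    import hashlib
    from pathlib import Path

    def sha256(path):
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def build_manifest(package_dir):
        """Map each file in the package to its SHA-256 digest."""
        root = Path(package_dir)
        return {str(p.relative_to(root)): sha256(p)
                for p in root.rglob("*") if p.is_file()}

    def verify(package_dir, manifest):
        """Return files whose current digest no longer matches the manifest."""
        current = build_manifest(package_dir)
        return [name for name, digest in manifest.items()
                if current.get(name) != digest]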

NORMALIZE FILES. For files that are not in Odum Institute Data Archive preferred file formats or other sustainable formats as defined by the most current Library of Congress Recommended Formats Statement, the Archive normalizes the files to the appropriate formats.
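
For example, a statistical file deposited in a proprietary format might be normalized to plain text. The sketch below converts an SPSS .sav file to CSV with the pyreadstat library, writing the embedded variable labels out separately so no documentation is lost; the file names are hypothetical and the Archive's actual pipeline may differ.

    # Sketch: normalize a proprietary SPSS file to CSV for preservation.
    import pyreadstat

    df, meta = pyreadstat.read_sav("deposit/survey_data.sav")
    df.to_csv("processing/survey_data.csv", index=False)

    # Write the embedded variable labels out separately so the plainer
    # format loses no documentation.
    with open("processing/survey_data_labels.txt", "w") as out:
        for name, label in zip(meta.column_names, meta.column_labels):
            out.write(f"{name}\t{label}\n")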

CLEAN DATA. To ensure the archive system displays the data in a manner that optimizes understanding and use of the data, the Archive performs data cleaning. Data cleaning may include editing data files to add complete variable labels and value labels in accordance with codebooks, standardizing missing and null values, resolving data inconsistencies and errors (e.g., out-of-range or wild codes), and applying anonymization strategies (e.g., redaction, statistical disclosure control) to confidential data.
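
The sketch below illustrates a few of these cleaning steps in pandas; the variable names, missing-value codes, and labels are invented for the example, since real work follows the deposit's codebook.

    # Illustrative cleaning pass; variables, codes, and labels are invented
    # for the example -- real work follows the deposit's codebook.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("processing/survey_data.csv")

    # Standardize documented missing-value codes to a true null.
    df["age"] = df["age"].replace({-9: np.nan, -8: np.nan, 98: np.nan, 99: np.nan})

    # Flag out-of-range ("wild") codes for curator review rather than
    # silently altering them.
    wild = df[(df["age"] < 18) | (df["age"] > 110)]
    print(f"{len(wild)} out-of-range age values flagged for review")

    # Apply value labels from the codebook so categories display meaningfully.
    df["party_label"] = df["party"].map({1: "Democrat", 2: "Republican", 3: "Independent"})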

BUILD DOCUMENT SET. The Archive assembles the documents requisite for interpretation and reuse of the data. These documents may include codebooks or data dictionaries, methodology reports, questionnaire instruments, articles, annotated analysis code, and other documents that provide additional context, rich description of the data, and guidance on appropriate uses of the data.

APPLY STANDARD VOCABULARY. Based on the type and disciplinary context of the data, the Archive selects and applies a standard vocabulary to be used when generating descriptive metadata. Because the Odum Institute Data Archive primarily houses social science data, the Archive has adopted standard vocabularies issued by the Data Documentation Initiative (DDI). For all other types of data, the Archive selects and applies a standard vocabulary appropriate for the disciplinary domain associated with the data.
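
As a rough illustration, a minimal DDI Codebook-style study description can be assembled with standard library tools. The element names below follow DDI conventions, but this is a sketch rather than a validated DDI instance, and the study details are placeholders.

    # Sketch of a minimal DDI Codebook-style study description; not a
    # validated DDI 2.5 instance. Study details are placeholders.
    import xml.etree.ElementTree as ET

    codebook = ET.Element("codeBook")
    citation = ET.SubElement(ET.SubElement(codebook, "stdyDscr"), "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = "Example State Poll Dataset"
    rsp_stmt = ET.SubElement(citation, "rspStmt")
    ET.SubElement(rsp_stmt, "AuthEnty").text = "Example Principal Investigator"

    ET.ElementTree(codebook).write("processing/study_ddi.xml",
                                   encoding="utf-8", xml_declaration=True)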

The subsequent data curation workflow tasks are executed within the Dataverse archival platform:

GENERATE METADATA. In the Dataverse system, the Archive inputs values for core metadata in a standardized form that aligns inputs with the DataCite Metadata Schema. The DataCite Metadata Schema includes elements that ensure full citation of the data for discovery and identification purposes. The Dataverse also provides additional metadata blocks to allow for input of rich, domain-specific data description using standards that include the DDI Metadata Specification, Virtual Observatory Discovery and Provenance Metadata, and ISA-TAB Specification.
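
For reference, the sketch below lays out the DataCite schema's required properties as a plain mapping; every value is a placeholder of the kind the Dataverse form would capture (10.5072 is a reserved test DOI prefix).

    # The DataCite schema's required properties, sketched as a plain mapping;
    # all values are placeholders.
    import json

    datacite_core = {
        "identifier": {"identifierType": "DOI",
                       "identifier": "10.5072/FK2/EXAMPLE"},
        "creators": [{"creatorName": "Example Investigator"}],
        "titles": [{"title": "Example State Poll Dataset"}],
        "publisher": "Odum Institute Dataverse",
        "publicationYear": "2017",
        "resourceType": {"resourceTypeGeneral": "Dataset"},
    }

    print(json.dumps(datacite_core, indent=2))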

ASSEMBLE AIP. The Archive assembles the data and documentation files and ingests them into the archive system via the Dataverse interface. These files, along with the standardized metadata describing the data, comprise the archival information package, or AIP, which is stored for long-term preservation.
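
Dataverse also exposes equivalent operations through its native API. The sketch below shows one way such an ingest could be scripted; the server URL, API token, collection alias, and file paths are placeholders, and the Dataverse API guide documents the full dataset JSON structure.

    # Sketch of scripted ingest via the Dataverse native API.
    import requests

    BASE = "https://dataverse.example.edu"      # placeholder server
    HEADERS = {"X-Dataverse-key": "xxxx-xxxx"}  # placeholder API token

    # Create the dataset record from prepared metadata.
    with open("processing/dataset_metadata.json") as f:
        resp = requests.post(f"{BASE}/api/dataverses/example-collection/datasets",
                             headers=HEADERS, data=f.read())
    pid = resp.json()["data"]["persistentId"]

    # Attach a data file to the new dataset.
    with open("processing/survey_data.csv", "rb") as f:
        requests.post(f"{BASE}/api/datasets/:persistentId/add",
                      params={"persistentId": pid},
                      headers=HEADERS, files={"file": f})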

IV. Access

To facilitate access to the data, the Archive performs final tasks to ensure that users are able to discover and request access to data that have successfully undergone all preceding data curation workflow tasks, and that the system provides the level of access appropriate to the dataset's terms of use.

BRAND DATAVERSE. The Archive adds logos and/or text to the Dataverse dataset record to enable users to correctly identify the data by associating the data with a particular individual and/or organization.

SET TERMS OF USE. If required by the data contributor (or another stakeholder with authority over the allowable uses of the data), the Archive will include specified terms of use language to be displayed to users. These terms of use may define or set limitations on allowable uses of the data. If terms of use are not specified, the default for datasets in the Dataverse is the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, which waives all rights to the work under copyright law.

SET ACCESS RESTRICTIONS. If a dataset requires access to be restricted for reasons that may include funding agency rules, embargo, licensing conditions, or confidentiality risks, the Archive will set access restrictions in the system that allow only authorized individuals to access the data.
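
In Dataverse, file-level restriction can also be set through the native API, as in the sketch below; the server, token, and file identifier are placeholders.

    # Sketch: restrict an ingested file via the Dataverse native API so
    # only authorized users may download it.
    import requests

    BASE = "https://dataverse.example.edu"
    HEADERS = {"X-Dataverse-key": "xxxx-xxxx"}
    FILE_ID = 42  # placeholder database ID of the ingested file

    requests.put(f"{BASE}/api/files/{FILE_ID}/restrict",
                 headers=HEADERS, data="true")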

REVIEW AND TEST. Prior to data publication, the Archive performs a final inspection of the AIP and executes Dataverse user functions (e.g., dataset search, file download, data analysis, access request) on the AIP to ensure the data render and display properly to authorized users and that system parameters are set correctly for the dataset according to the file processing and archiving plan.
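
Parts of this final check can be scripted against the Dataverse API. The sketch below confirms that a draft dataset's files are attached and that the record surfaces in search; the persistent identifier, query, and server are placeholders as before.

    # Sketch of a scripted pre-publication check.
    import requests

    BASE = "https://dataverse.example.edu"
    HEADERS = {"X-Dataverse-key": "xxxx-xxxx"}
    pid = "doi:10.5072/FK2/EXAMPLE"  # placeholder persistent identifier

    # Draft versions are visible only with the curator's API token.
    files = requests.get(
        f"{BASE}/api/datasets/:persistentId/versions/:draft/files",
        params={"persistentId": pid}, headers=HEADERS).json()
    assert files["status"] == "OK" and files["data"], "no files attached"

    hits = requests.get(f"{BASE}/api/search",
                        params={"q": "Example State Poll"},
                        headers=HEADERS).json()
    print(hits["data"]["total_count"], "matching records")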

PUBLISH DATA. Once the final review and test are completed, the Archive publishes the dataset record, making the data discoverable and accessible to authorized users. The data, metadata, and other supplementary information presented and delivered to the user constitute the dissemination information package, or DIP.
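
Publication itself can likewise be triggered through the native API, as sketched below with placeholder identifiers; releasing a major version makes the record publicly discoverable.

    # Sketch: publish the reviewed dataset through the Dataverse native API.
    import requests

    BASE = "https://dataverse.example.edu"
    HEADERS = {"X-Dataverse-key": "xxxx-xxxx"}
    pid = "doi:10.5072/FK2/EXAMPLE"

    requests.post(f"{BASE}/api/datasets/:persistentId/actions/:publish",
                  params={"persistentId": pid, "type": "major"},
                  headers=HEADERS)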

 

Archival Infrastructure

The technical infrastructure supporting the Odum Institute Data Archive was developed and is managed in accordance with the Reference Model for an Open Archival Information System (OAIS). Along with archival storage, the infrastructure provides operating system, network, and security services to enable implementation of archival functions and enforcement of policies.

The Odum Institute Data Archive was formally established in 1969 with grant funds awarded in 1967 by the National Science Foundation. The purpose of the grant was to establish an “academic center of excellence in science,” which would include computing facilities for a new Social Science Statistical Laboratory and Data Center. Data archiving was already a familiar role for the Odum Institute, which had become home to the Louis Harris Data Center in 1965 under an agreement between Louis Harris (a 1942 UNC alumnus) and the University of North Carolina at Chapel Hill.

Since then, the Odum Institute has expanded its catalog, considered one of the largest collections of machine-readable social science data in the United States, to include other significant social science data collections, among them the National Network of State Polls, the Carolina Poll, the Southern Focus Poll, and the most complete collection of 1970 U.S. Census datasets. These holdings reflect the distinguished history of the Odum Institute and its founder, Howard W. Odum, known for progressive research on populations in the American South. Continuing in this tradition, the Data Archive welcomes contributions of datasets that complement its collections of social science data related to the Southern United States as well as state-level polling data.

In addition to its role as steward of archival data collections, the Odum Institute Data Archive provides comprehensive data management and curation services. It offers an extensive range of data tools, resources, and training programs to support researchers throughout all phases of the research lifecycle from project planning to data archiving and sharing. The Odum Institute hosts the UNC Dataverse data repository, which allows researchers to self-archive and share their data as well as to discover and download files for almost 25,000 datasets. Professional data curators are available to assist researchers in their efforts to manage, archive, and share their data.