About the Archive
In demonstration of its commitment to these standards, the Odum Institute Data Archive has earned the 2014-2017 Data Seal of Approval (DSA), which is awarded to repositories that are recognized among the archives community as a trustworthy source of data. The DSA is a peer-reviewed self-assessment that includes 16 requirements based on the following five criteria:
The data can be found on the Internet
The data are accessible
The data are in a usable format
The data are reliable
The data are identified in a unique and persistent way
View the Odum Data Archive’s DSA Assessment here.
Data Curation Workflow
The Data Curation Workflow begins when the data contributor delivers files to the Odum Institute Data Archive for long-term preservation and access. These files constitute the submission information package, or SIP. Along with the SIP, the data contributor also submits a completed and signed Data Deposit Form.
Triage involves decision-making processes on how data curation should proceed based on a thorough review of SIP contents and information provided on the Data Deposit Form.
REVIEW FILES. The Archive conducts a comprehensive review of SIP files to determine whether the data type and file format meet, or can be made to meet, preservation and data quality standards, and whether the accompanying documentation is sufficient to appropriately and accurately describe, interpret, and use the data. The file review also includes an evaluation of data file contents to identify potential confidentiality risks due to the presence of personally identifiable information, protected health information, or otherwise sensitive data. If the SIP fails to meet preservation and data quality standards, or if the data present confidentiality issues, the Archive will return the SIP to the data contributor with recommendations for resolution.
DEVELOP PLANS. Based on findings from the file review, the Archive develops a plan for file processing and archiving. This plan may include directives on file normalization, use of specified controlled vocabularies and standard metadata schemes, and implementation of file access restrictions. Specifics of the plan, which may be developed in collaboration with the data contributor, are informed by anticipated uses of the data by the designated user community.
Processing files involves several actions to ensure that data ingested into the archive adhere to archival standards and best practices for long-term preservation, access, and reuse. During processing, the Archive implements file version control procedures to maintain integrity of the SIP and AIP.
NORMALIZE FILES. For files that are not in Odum Institute Data Archive preferred file formats or other sustainable formats as defined by the most current Library of Congress Recommended Formats Statement, the Archive normalizes the files to the appropriate formats.
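The normalization decision can be pictured as a simple lookup from incoming format to archive-preferred target format. The sketch below is a hypothetical illustration; the extension mappings are assumptions for the example, not the Archive's actual policy table.

```python
# Hypothetical file-normalization lookup: map incoming file extensions
# to preferred, sustainable target formats. The specific mappings are
# illustrative assumptions only.
PREFERRED_FORMATS = {
    ".sav": ".csv",    # SPSS system file -> plain-text tabular data
    ".dta": ".csv",    # Stata data file -> plain-text tabular data
    ".xlsx": ".csv",   # Excel workbook -> plain-text tabular data
    ".doc": ".pdf",    # legacy Word document -> archival PDF
}

def normalization_target(filename: str):
    """Return the preferred target extension for a file, or None if
    its format is already considered sustainable."""
    for ext, target in PREFERRED_FORMATS.items():
        if filename.lower().endswith(ext):
            return target
    return None
```

In practice a curator would consult the Library of Congress Recommended Formats Statement rather than a hard-coded table, but the control flow is the same: identify the format, decide whether conversion is needed, and record the target.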
CLEAN DATA. To ensure the archive system displays the data in a manner that optimizes understanding and use of the data, the Archive performs data cleaning. Data cleaning may include editing data files to add complete variable labels and value labels in accordance with codebooks, standardizing missing and null values, resolving data inconsistencies and errors (e.g., out-of-range or wild codes), and applying anonymization strategies (e.g., redaction, statistical disclosure control) to confidential data.
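The cleaning steps above can be sketched for a single variable, assuming a codebook that defines valid codes, value labels, and missing-value codes. The variable name, codes, and labels below are hypothetical examples, not taken from any Archive holding.

```python
# Illustrative codebook entry for one survey variable (all values
# are hypothetical examples).
CODEBOOK = {
    "q1_satisfaction": {
        "labels": {1: "Very dissatisfied", 2: "Dissatisfied",
                   3: "Neutral", 4: "Satisfied", 5: "Very satisfied"},
        "missing_codes": {-9, 98, 99},  # e.g., refused / don't know / blank
    }
}

def clean_variable(name, values):
    """Standardize missing codes to None and flag out-of-range
    ('wild') codes for curator review."""
    spec = CODEBOOK[name]
    cleaned, wild = [], []
    for v in values:
        if v in spec["missing_codes"]:
            cleaned.append(None)   # standardized null value
        elif v in spec["labels"]:
            cleaned.append(v)      # valid, labeled code
        else:
            cleaned.append(None)   # out-of-range code: null it and
            wild.append(v)         # flag it for review
    return cleaned, wild
```

Anonymization steps such as redaction or statistical disclosure control would follow the same pattern of codebook-driven, reviewable transformations, though they involve judgment calls that this sketch does not capture.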
BUILD DOCUMENT SET. The Archive assembles the documents requisite for interpretation and reuse of the data. These documents may include codebooks or data dictionaries, methodology reports, questionnaire instruments, articles, annotated analysis code, and other documents that provide additional context, rich description of the data, and appropriate uses of the data.
APPLY STANDARD VOCABULARY. Based on the type and disciplinary context of the data, the Archive selects and applies a standard vocabulary to be used when generating descriptive metadata. Because the Odum Institute Data Archive primarily houses social science data, the Archive has adopted standard vocabularies issued by the Data Documentation Initiative (DDI). For all other types of data, the Archive selects and applies a standard vocabulary appropriate for the disciplinary domain associated with the data.
The subsequent data curation workflow tasks are executed within the Dataverse archival platform:
GENERATE METADATA. In the Dataverse system, the Archive inputs values for core metadata in a standardized form that aligns inputs with the DataCite Metadata Scheme. The DataCite Metadata Scheme includes elements that ensure full citation of the data for discovery and identification purposes. The Dataverse also provides additional metadata blocks to allow for input of rich, domain-specific data description using standards that include the DDI Metadata Specification, Virtual Observatory Discovery and Provenance Metadata, and ISA-TAB Specification.
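As a rough illustration, the core citation metadata aligned with the DataCite Metadata Scheme can be assembled as a small structured record. The field names below follow DataCite's required properties; the example values are hypothetical, and a real Dataverse deposit would supply these through the system's metadata form rather than code.

```python
# Minimal sketch of a DataCite-style citation metadata record.
# Field names follow DataCite's required properties (identifier,
# creators, titles, publisher, publicationYear, resourceType);
# all example values are hypothetical.
def build_datacite_record(doi, title, creators, publisher, year):
    return {
        "identifier": {"identifier": doi, "identifierType": "DOI"},
        "creators": [{"creatorName": c} for c in creators],
        "titles": [{"title": title}],
        "publisher": publisher,
        "publicationYear": str(year),
        "resourceType": {"resourceTypeGeneral": "Dataset"},
    }

record = build_datacite_record(
    doi="10.1234/example",            # hypothetical DOI
    title="Example State Poll, 2024", # hypothetical dataset title
    creators=["Doe, Jane"],
    publisher="Odum Institute Data Archive",
    year=2024,
)
```

Domain-specific description (for example, DDI study- and variable-level metadata) would extend this core record through the additional metadata blocks Dataverse provides.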
ASSEMBLE AIP. The Archive assembles the data and documentation files and ingests them into the archive system via the Dataverse interface. These files, along with the standardized metadata describing the data, comprise the archival information package, or AIP, which is stored for long-term preservation.
BRAND DATAVERSE. The Archive adds logos and/or text to the Dataverse dataset record to enable users to correctly identify the data by associating the data with a particular individual and/or organization.
SET ACCESS RESTRICTIONS. If a dataset requires access to be restricted for reasons that may include funding agency rules, embargo, licensing conditions, or confidentiality risks, the Archive will set access restrictions in the system that allow only authorized individuals to access the data.
REVIEW AND TEST. Prior to data publication, the Archive performs a final inspection of the AIP and executes Dataverse user functions (e.g., dataset search, file download, data analysis, access request) on the AIP to ensure that the data render and display properly to authorized users and that system parameters are set correctly for the dataset according to the file processing and archiving plan.
PUBLISH DATA. Once the final review and test are completed, the Archive publishes the dataset record, making the data discoverable and accessible to authorized users. The data, metadata, and other supplementary information presented and delivered to the user constitute the dissemination information package, or DIP.
The technical infrastructure supporting the Odum Institute Data Archive was developed and is managed in accordance with the Reference Model for an Open Archival Information System (OAIS). Along with archival storage, the infrastructure provides operating system, network, and security services to enable implementation of archival functions and enforcement of policies.
The Odum Institute Data Archive was formally established in 1969 with grant funds awarded in 1967 by the National Science Foundation. The purpose of the grant was to establish an “academic center of excellence in science,” which would include computing facilities for a new Social Science Statistical Laboratory and Data Center. Data archiving was already a familiar role for the Odum Institute, which had become home to the Louis Harris Data Center in 1965 as part of an agreement between Louis Harris (a 1942 UNC alum) and the University of North Carolina at Chapel Hill.
Since then, the Odum Institute has expanded its catalog—considered one of the largest for machine-readable social science data in the United States—to include other significant social science data collections including the National Network of State Polls, the Carolina Poll, the Southern Focus Poll, and the most complete collection of 1970 U.S. Census datasets. These holdings reflect the distinguished history of the Odum Institute and its founder, Howard W. Odum, known for progressive research on populations in the American South. Continuing in this tradition, the Data Archive welcomes contributions of datasets that complement its collections of social science data related to the Southern United States as well as state-level polling data.
In addition to its role as steward of archival data collections, the Odum Institute Data Archive provides comprehensive data management and curation services. It offers an extensive range of data tools, resources, and training programs to support researchers throughout all phases of the research lifecycle from project planning to data archiving and sharing. The Odum Institute hosts the UNC Dataverse data repository, which allows researchers to self-archive and share their data as well as to discover and download files for almost 25,000 datasets. Professional data curators are available to assist researchers in their efforts to manage, archive, and share their data.