
ODUM INSTITUTE DATA ARCHIVE

Providing trusted long-term preservation and stewardship of research data assets to broaden scientific inquiry, promote research reproducibility, and foster data fluency now and into the future.

COVID-19 Updates

Please note that Odum Institute Data Archive physical offices will be closed beginning Monday, March 16, with services available online as we work remotely to do our part to help reduce the transmission of the COVID-19 virus. We will continue to remain responsive to email inquiries and are happy to schedule virtual meetings via telephone or Zoom upon request.

Please be assured that Odum Institute Data Archive staff remain committed to providing our regular data management and curation services so that researchers, students, and staff can access assistance with writing and implementing data management plans, depositing data into the UNC Dataverse (https://dataverse.unc.edu/) or other trustworthy repository, finding high-quality research data for secondary analysis, and other data-related activities to support your research projects.

We apologize for the inconvenience, and hope that you and your families stay safe and healthy during this time.
 


About the Archive


The Odum Institute Data Archive is a leader in research data stewardship, with over 50 years of experience beginning with the acquisition of the Louis Harris Data Center in 1965. Our longstanding commitment to data access and research transparency has been a driving force behind ongoing efforts to enhance our infrastructure, workflows, and policies to ensure that the data assets in our care remain FAIR (findable, accessible, interoperable, and reusable), now and into the future.

The Odum Institute Data Archive is home to one of the largest catalogs of social science research data in the U.S. that includes the Harris Polls, North Carolina Vital Statistics, and the most complete collection of 1970s U.S. Census data. In addition, we manage and provide access to the UNC Dataverse repository to enable individual scientists, research teams, scholarly journals, and other members of the research community to archive and publish their own datasets.

The Odum Institute Data Archive recognizes that scientific reproducibility is a primary concern among members of the research community and its stakeholders. Because good data management and curation is a precondition for scientific reproducibility, we offer services for data management plan development and implementation, finding & accessing data, data management training & education, and data curation for reproducibility training.

Contact us at odumarchive@unc.edu to learn more about how we can work together to make your data FAIR!
 

Odum Institute Data Archive policies and guidelines establish procedural standards for the Data Archive and outline acceptable uses of Data Archive services. Odum Institute Data Archive policies are informed by policies and guidelines developed collaboratively by the members of the Data Preservation Alliance for the Social Sciences (Data-PASS), which is a voluntary partnership of institutions dedicated to data archiving, cataloging, and preservation.

 
The Odum Institute Data Archive systems and processes were developed with industry standards and best practices for digital data repositories in mind. The Data Archive continually strives to meet and/or exceed these standards in order to earn and maintain trustworthy data repository status.


Archive Experts




Thu-Mai Christian

Assistant Director for Archives


odumarchive@unc.edu



Mandy Gooch

Research Data Archivist


odumarchive@unc.edu



Cheryl Thompson

Research Data Archivist


odumarchive@unc.edu



Data Management


Data management refers to the activities that support long-term preservation, access, and use of data. This includes, but is not limited to:

  • Planning for data management
  • Describing, formatting, and storing data
  • Curating, archiving, and sharing data
  • Using reproducible research best practices

 
The goal of data management is to ensure that your data are discoverable, interpretable, and re-usable by future researchers. In addition, data management and reproducible research practices help sustain the value of your data and allow others to verify and build upon published results.

The Odum Institute Data Archive offers consultations and support services on all aspects of research data management. Consultations are offered at no charge to UNC students, faculty, and staff. We offer a review of data management plans, assistance with data archiving solutions, training on reproducible research and data management best practices, and other guidance on planning for and implementing best practices for data management across the research lifecycle for projects and research groups.

ARCHIVE SERVICES

The Odum Institute Data Archive provides the following services to assist UNC faculty, students and staff in their courses and research.

Researchers applying for grant funding who need Odum Data Archive support throughout the life of the project should contact odumarchive@unc.edu for a consultation to discuss collaboration and inclusion in the grant proposal.

Odum archivists can provide assistance and support in developing both a data management policy and a data management workflow for research groups and projects.

We offer data verification implementation services for journals with data verification policies, and for journals interested in strengthening their current data policies by requiring verification of manuscript results before publication. Data verification services include data curation, data archiving, and data verification workflows. Quotes are available by request. Please contact odumarchive@unc.edu for more information.

Workshops, training sessions, and in-class lectures are provided on a variety of topics such as finding & accessing data, data management best practices, reproducible research best practices, and archiving and sharing your data with UNC Dataverse. We can customize our training and education sessions to specific audiences and/or needs with sessions ranging from 20 minutes to half-day workshops.

The Odum Institute Data Archive is home to one of the largest catalogs of social science research data in the U.S. that includes the Harris Polls, North Carolina Vital Statistics, the 1970s U.S. Census data, as well as a variety of state and public opinion polls. We also accept data donations to the Odum Institute Data Archive and ask that you please review our Collection Development Policy to see if your social science dataset(s) are within our mission and scope. If you have any questions, please feel free to contact us for more information.

We can assist you with data management policy compliance to ensure you are meeting the requirements established by funding agencies. Our services range from data management training to archiving and sharing your data via a trustworthy repository. Let us help you navigate the data management landscape so you can focus more on your research and stress less about your data management plan.

If you are having difficulty locating data relevant to your research needs and interests, please contact us for assistance. It is recommended you provide us with as much detail as possible about the types of data, geography, demographics, sample sizes, and any other pertinent variables related to your research needs. This will assist us in finding the most appropriate resources for you to explore.

For researchers collecting qualitative data, we provide qualitative data management support, training, and institutional access to the Qualitative Data Repository. Our qualitative data expertise ensures your data are managed, curated, shared, and preserved for the long-term to increase access and re-use of your research.

 


Data Repository: UNC Dataverse


The Odum Institute hosts and manages UNC Dataverse. As a trusted repository for research data, the Odum Institute Data Archive makes UNC Dataverse available to researchers for long-term data archiving and data sharing. Researchers can also find valuable datasets that others have made publicly available, including those donated to the Odum Institute’s collection of historical social science data.

WHY UNC DATAVERSE?

The Dataverse, which was developed by the Institute for Quantitative Social Science (IQSS) at Harvard University, is an open-source repository software application for archiving, sharing, and accessing research data. The Dataverse system by design supports the FAIR Principles for scientific data management and stewardship by ensuring data are findable, accessible, interoperable, and reusable. As such, Dataverse offers several value-added features to enhance data discovery, access, and use including those listed below.

  • Automatic generation of data citation with DOI
  • Standardized descriptive metadata
  • Customizable terms of use
  • Support for restricted file access
  • User activity tracking and download metrics
  • Faceted browse and advanced search
  • Dataset version tracking
  • Data exploration tools
  • Rich data support for tabular data
  • File format normalization for preservation
  • APIs for tool interoperability and integration
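As one illustration of the API feature listed above, the following minimal sketch builds a request URL for the Dataverse Search API and summarizes a decoded JSON response. It assumes the standard `/api/search` endpoint with its `q`, `type`, and `per_page` parameters and a response whose results sit under `data.items`; the helper names `build_search_url` and `summarize_results` are hypothetical, and the sketch makes no network call itself.

```python
from urllib.parse import urlencode

# UNC Dataverse address as given on this page.
BASE_URL = "https://dataverse.unc.edu"


def build_search_url(query, result_type="dataset", per_page=10, base_url=BASE_URL):
    """Return a Search API URL for the given query (hypothetical helper)."""
    params = {"q": query, "type": result_type, "per_page": per_page}
    return f"{base_url}/api/search?{urlencode(params)}"


def summarize_results(payload):
    """Extract (name, global_id) pairs from a decoded Search API response.

    Assumes results are listed under payload["data"]["items"].
    """
    items = payload.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]


if __name__ == "__main__":
    # Fetch this URL with any HTTP client, then pass the decoded JSON
    # to summarize_results().
    print(build_search_url("Harris poll"))
```

Keeping URL construction separate from response handling lets the same helpers be reused with whichever HTTP client a project already depends on.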
 

Because of these value-added features and its built-in data preservation and access functionality, Dataverse has been adopted by individuals, teams, and organizations around the world. Spanning countries across six continents, these users, developers, service providers, and supporters form a growing community dedicated to the continual enhancement and sustainability of the Dataverse application.

The Odum Institute makes UNC Dataverse available to the research community at no cost. This user-friendly repository platform enables you to archive and share your data. Please see Deposit Your Data in UNC Dataverse to get started.

If you would prefer the guidance of Odum Institute Data Archive staff to help ensure compliance with archival standards and best practices, the Archive offers Guided Service. This mid-level tier minimizes the initial set-up of the Dataverse and allows you to perform the less complicated tasks of the data archiving process. Our knowledgeable staff will work with you to determine the most appropriate organization of data files, descriptive metadata, and file access levels. At your request, we will also provide initial training on the data deposit process.

The Lifecycle Service level provides comprehensive data curation performed by Data Archive staff dedicated to ensuring that your data management needs are being met throughout all stages of the research lifecycle from project planning to data archiving.

The Odum Institute Data Archive offers additional services for data that require special provisions for storage, management and/or publication. For individuals and organizations working to establish data management best practices, the Archive also offers training and policy development.

Examples of specialized services include:

  • De-identification of datasets containing personally identifiable information (PII) and/or protected health information (PHI)
  • Organizational data management policy development and implementation
  • Data management and curation training
  • Data curation and verification for enforcement of journal-based data policies

    Service                                       Self-Service    Guided Service               Lifecycle Service
    Price                                         $0              $3,000 + per dataset fee*    $5,000 + per dataset fee*
    UNC Dataverse tool access                     X               X                            X
    Data citation generation                      X               X                            X
    Persistent identification (DOI)               X               X                            X
    Basic utilization reporting                   X               X                            X
    Long-term preservation                        X               X                            X
    Standardized metadata                         X               X                            X
    User support                                  limited         standard                     dedicated
    Introductory Dataverse software training      -               X                            X
    Dataset collection arrangement                -               X                            X
    Metadata template development                 -               X                            X
    Data Management Plan implementation           -               -                            X
    File format normalization                     -               -                            X
    Data file review                              -               -                            X
    Access policy enforcement                     -               -                            X
    Education and training program development    -               -                            X

     

    The Odum Institute Data Archive is home to one of the largest catalogs of social science research data in the U.S. that includes the Harris Polls, North Carolina Vital Statistics, and the most complete collection of 1970s U.S. Census data. We actively seek donations of data that complement the scope of our collection, in particular those datasets that focus on topics related to the Southern region of the United States and state-level public opinion polls. The Data Archive also prioritizes data considered to be at risk of being lost.

    To demonstrate its commitment to achieving standards for trusted digital data repositories, the Odum Institute Data Archive has earned the 2014-2017 Data Seal of Approval (DSA). The DSA is awarded to repositories that have been recognized among the archives community as a trusted source of data based on evidence of their organizational infrastructure, digital object management, and technology support.

    • Data Curation Workflows
    • Archive Infrastructure

    The Odum Institute Data Archive encourages researchers to deposit their research data in a professional data repository. There is no cost to researchers to self-deposit their data in UNC Dataverse, which has no filetype or disciplinary domain restrictions. If you would prefer that the Odum Institute handle data file preparation and deposit, please see the description of our Data Management Services.

    Please review the information below and Archive Policies as you prepare data files for archiving and sharing to help ensure that your data meet standards for long-term data preservation, access, and reuse.

    Data Documentation
    Just as important as data files, contextual information is essential for appropriate interpretation and use of data. Documentation such as README files, methodology reports, codebooks/data dictionaries, and instruments must be deposited alongside dataset files. Data depositors should consider what types of information would be required for others to understand the content of data files as well as critical information about the creation, manipulation, and structure of the data.
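A pre-deposit check for the documentation types named above could be sketched as follows. This is only an illustration: the filename keywords are assumptions chosen for the example, not Odum Institute naming requirements, and `missing_documentation` is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical filename keywords for each documentation type named in the text.
DOC_KEYWORDS = {
    "README": ("readme",),
    "codebook/data dictionary": ("codebook", "dictionary"),
    "methodology report": ("method",),
    "instrument": ("instrument", "questionnaire"),
}


def missing_documentation(filenames):
    """Return the documentation types for which no file name matches."""
    lowered = [Path(name).name.lower() for name in filenames]
    missing = []
    for doc_type, keywords in DOC_KEYWORDS.items():
        found = any(keyword in name for name in lowered for keyword in keywords)
        if not found:
            missing.append(doc_type)
    return missing


if __name__ == "__main__":
    deposit = ["README.txt", "survey_codebook.pdf", "data.csv"]
    for doc_type in missing_documentation(deposit):
        print(f"No {doc_type} found in the deposit folder")
```

A name-based check only flags obvious omissions; whether the documentation is sufficient for others to interpret the data still requires human review.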

    In addition to documentation, it is important that datasets be described using machine-readable standardized metadata in order to facilitate and enhance findability and interoperability. UNC Dataverse requires a minimum set of descriptive metadata based on the DataCite Metadata Schema standard, which facilitates accurate identification of the data for discovery and citation purposes.

    Additional domain-specific metadata blocks are available to allow for rich description to enhance independent and informed use of the data.

    Data depositors are strongly encouraged to provide as much metadata as possible when depositing datasets to UNC Dataverse. Refer to Archive Policies for more information on metadata standards and requirements for datasets housed in UNC Dataverse.

    Recommended File Formats
    The file formats used by researchers are often informed by individual research practices and domain-specific standards. However, to avoid risks to long-term data preservation, access, and use that can arise from software obsolescence, the Odum Institute Data Archive recommends that data files be submitted in formats that are widely adopted, non-proprietary, free of external software dependencies, and well-documented.

    DATASETS

    UNC Dataverse automatically generates tab-delimited preservation copies of data files for specified file formats. This optimizes UNC Dataverse preservation capabilities and allows users to explore tabular data using the UNC Dataverse interface and download data in multiple file formats. Therefore, we prefer the following formats for data files.

    Format                    Extension                          Notes
    IBM SPSS                  .por OR .sav                       Versions 7 to 22
    Stata                     .dta                               Versions 4 to 13
    R                         .RData                             Versions 1 to 3
    Microsoft Excel           .xlsx                              .xls is not supported
    Comma-separated values    .csv                               Limited support
    Geospatial data           .shp AND .shx AND .dbf AND .prj    All four files are the minimum required to enable visualization
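The preferred dataset formats listed above, including the four-file rule for geospatial data, can be checked before deposit with a short script. The sketch below is a minimal example under stated assumptions: it looks only at file extensions, does not verify software versions, treats any .dbf file as part of a shapefile set, and uses a hypothetical helper name, `check_data_files`.

```python
from pathlib import Path

# Preferred dataset extensions taken from the format list above.
PREFERRED_DATA_EXTENSIONS = {".por", ".sav", ".dta", ".rdata", ".xlsx", ".csv"}
# All four parts are needed for geospatial visualization.
SHAPEFILE_PARTS = {".shp", ".shx", ".dbf", ".prj"}


def check_data_files(filenames):
    """Return (preferred, other, incomplete_shapefiles) for a list of file names.

    incomplete_shapefiles maps a shapefile base name to its missing extensions.
    """
    preferred, other = [], []
    parts_found = {}  # base name -> set of shapefile extensions seen
    for name in filenames:
        path = Path(name)
        ext = path.suffix.lower()
        if ext in SHAPEFILE_PARTS:
            parts_found.setdefault(path.stem, set()).add(ext)
            preferred.append(name)
        elif ext in PREFERRED_DATA_EXTENSIONS:
            preferred.append(name)
        else:
            other.append(name)
    incomplete = {
        stem: sorted(SHAPEFILE_PARTS - found)
        for stem, found in parts_found.items()
        if found != SHAPEFILE_PARTS
    }
    return preferred, other, incomplete


if __name__ == "__main__":
    files = ["survey.sav", "legacy.xls", "counties.shp", "counties.shx", "counties.dbf"]
    preferred, other, incomplete = check_data_files(files)
    print("Preferred formats:", preferred)
    print("Consider converting:", other)
    print("Incomplete shapefile sets:", incomplete)
```

Files reported under "Consider converting" are still accepted by UNC Dataverse; the check only highlights candidates for conversion to a preferred format.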

    DOCUMENTS

    To ensure that supporting documentation necessary for others to appropriately interpret and use the data are rendered properly, we prefer the following file formats for document files such as README files, codebooks/data dictionaries, instruments, and methodology reports.

    Format                            Extension
    Text                              .txt
    Adobe Portable Document Format    .pdf/ua OR .pdf/a OR .pdf

    OTHER FILE TYPES

    While UNC Dataverse accepts all file formats and preserves them at the bit level, data depositors should still consider the long-term sustainability of every file format. Software dependencies, proprietary status, limited adoption, lack of public documentation, and other factors can negatively impact the ability to render files in the future. Refer to the Library of Congress Recommended Formats Statement when selecting file formats not listed above for UNC Dataverse deposit, or contact us for guidance on identifying a suitable file format or converting files to a preferred format for long-term preservation.

    Confidential and Sensitive Data
    Odum Institute Data Archive systems and workflows are designed to uphold all applicable laws and regulations governing the protection of human subjects. Standards and procedures for handling and storage of sensitive and confidential data are outlined in our Archive Policies.

    Please note that UNC Dataverse is NOT a suitable repository for storing data containing personally identifiable information (PII) and/or protected health information (PHI). The UNC Dataverse Terms of Use stipulate that prior to deposit into UNC Dataverse, all data must be de-identified in such a way that study participants represented in the data cannot be individually identified. For information on available resources for archiving confidential data, please contact odumarchive@unc.edu.

    In the case of datasets that have been de-identified but that still require access restrictions, depositors can limit dataset and/or individual file access to authorized users using UNC Dataverse permissions functionality. For instructions on how to set restrictions and grant permissions for datasets in UNC Dataverse, refer to the Dataverse User Guide.

    UNC Dataverse makes sharing data easy by providing a user-friendly interface for uploading dataset files, generating standardized machine-readable metadata, and tracking dataset versions. You can deposit your data in UNC Dataverse in 5 simple steps.

    For additional details on UNC Dataverse functionality or for technical assistance, refer to the Dataverse User Guide or contact odumarchive@unc.edu.



    Data Archive Research Agenda


    In addition to offering research support services, the Odum Institute Data Archive conducts its own research through grant-funded projects and collaborations with partner institutions and colleagues. Our research keeps us abreast of the needs of our users and engaged with our own communities, and it maintains and improves our knowledge and skills in data management, reproducible research, and data repository development. Our projects build upon current and emerging research and give back to the users and communities we support and work with every day.
     

    The American Journal of Political Science and State Politics & Policy Quarterly, among others, have adopted pre-publication verification policies under which authors must submit their research materials so that the analytic results can be confirmed. While archives and publishers are adopting audit workflows to support these verification policies, many opponents express concerns about the additional effort, time, and specialized expertise demanded of authors. We are exploring the challenges that researchers face in complying with these policies and conducting a novel analysis of the factors affecting the time spent in verification.

    The Odum Institute Data Archive has partnered with the American Journal of Political Science and State Politics & Policy Quarterly to implement and enforce their data policies, which require authors to submit their data and code to a designated repository for independent verification of computational reproducibility of reported findings prior to final article acceptance and publication.

    The Sloan Foundation-funded Confirmable Reproducible Research (CoRe2) environment supports and promotes computational reproducibility by linking open science tools to streamline manuscript publication and data curation + verification workflows.

    The Odum Institute Data Archive is a founding member of the Curating for Reproducibility (CURE) Consortium, which is sponsored in part by the Institute of Museum and Library Services. CURE supports curation of research data and review of code and associated digital scholarly objects for the purpose of facilitating the digital preservation of the evidence base necessary for future understanding, evaluation, and reproducibility of scientific claims.

    With generous support from the Robert Wood Johnson Foundation, this project sought to identify the most effective and efficient methods for implementing journal-based data policies that incorporate review and verification.

    The CRADLE project developed a suite of face-to-face (workshops and one-on-one guides) and online (MOOC) courses which focus on the knowledge, skills, and competencies required of library and archives leaders to effectively respond to the growing and more complex data management needs of their institutions.

    This working group will be a focal point within RDA for working on guidelines and standards for curating for reproducible and FAIR data and code and engaging the RDA community on the issue.


    Resources


    Planning for research data management is not only good practice, but also a requirement for much of funded research. The following are lists of selected resources that may be of interest as you plan for and engage in data management and sharing activities.

    Research data management helps to ensure that others–or your future self–will be able to interpret and use your data for verification, extension, and teaching. To be effective, however, data management tasks should be performed during all stages of the research lifecycle. You can find research data management training and resources at the links below.

    • DataONE Data Management Modules: The Data Observation Network for Earth (DataONE) offers a series of education modules on essential data management topics including data management planning, quality control, metadata, and data protection.
    • Guide to Social Science Data Preparation and Archiving: This guide published by the Inter-university Consortium for Political and Social Research (ICPSR) provides in-depth information on data management tasks required for eventual data archiving and sharing.
    • MANTRA Research Data Management and Training: A self-paced online course for students, research faculty, information professionals, and others responsible for managing research data.
    • Project TIER: Project TIER (Teaching Integrity in Empirical Research) offers tools, methods, and instruction on transparent research practices, including the TIER Protocol. The TIER protocol is a framework for documenting and organizing research artifacts to support reproducibility.
    • Research Data Management and Sharing: This 5-week Coursera massive open online course (MOOC) provides an introduction to research data management and sharing concepts and strategies.

    Many government and private funding agencies have issued policies requiring researchers to submit a data management plan (DMP) with their proposal packages. The resources below can help you understand funders’ DMP requirements and write a DMP that meets those requirements.

    • DMPTool: The Odum Institute Data Archive sponsors the DMPTool, which is a free online “wizard” tool for creating data management plans that align with funding agency policy requirements.
    • Data Management Plan Examples: The example data management plans below were written in accordance with NSF policy requirements. Researchers writing data management plans for other funding agencies should find the language used in these examples still useful.
    • Data Management Plan Checklist: The Digital Curation Centre has developed a comprehensive data management plan checklist based on funding agency requirements.

     
    Funding Agency Data Policies:
    Federal agency policies can be found in the Data Sharing Requirements by Federal Agency database hosted by the Scholarly Publishing and Academic Resources Coalition (SPARC). Below are links to data policies issued by top federal agency funders of UNC research.


    Several tools are available to assist with data management. Below is a list of selected tools that support important aspects of data management at various points throughout the research lifecycle.

    • Carolina Data Acquisition and Reporting Tool (CDART): Developed by the UNC Collaborative Studies Coordinating Center and NC TraCS, CDART is a clinical research data management tool for project planning and design, data capture, quality assurance, analysis, and reporting.
    • Code Ocean: Code Ocean is an online tool for collaborative research that enables analysis code scripting, execution, and sharing in the cloud.
    • DMPTool: The Odum Institute Data Archive sponsors the DMPTool, which is a free online “wizard” tool for creating data management plans that align with funding agency policy requirements.
    • Open Science Framework (OSF): The Open Science Framework is a free online tool that supports collaborative research project management by providing a platform for sharing materials and data within a research team or among distributed teams.
    • Project TIER: Project TIER (Teaching Integrity in Empirical Research) offers tools, methods, and instruction on transparent research practices, including the TIER Protocol. The TIER protocol is a framework for documenting and organizing research artifacts to support reproducibility.
    • Qualitative Data Repository: UNC is an institutional member of QDR, which provides qualitative data curation and repository services along with resources on managing qualitative data. UNC affiliates who would like to submit their qualitative data to QDR for curation and publication should contact odumarchive@unc.edu.
    • The UNC Information Security Office provides information on requirements for protecting sensitive and proprietary data in accordance with ITS policies.
    • UNC Dataverse is an open-source web-based repository platform for archiving, sharing, and accessing research data. See Data Repository: UNC Dataverse for more information.
    • Carolina Digital Repository (CDR) is a digital archive for scholarly materials produced by members of the University of North Carolina at Chapel Hill community. The main goal of the CDR is to keep UNC digital scholarly output safe, accessible and discoverable for as long as needed.
    • REDCap is a secure web platform for building and managing online databases and surveys. REDCap’s streamlined process for rapidly creating and designing projects offers a vast array of tools that can be tailored to virtually any data collection strategy.

    Below is a list of several sources for downloadable datasets that can be used for secondary analysis, instruction, and other research purposes.

    • UNC Dataverse: UNC Dataverse provides access to almost 25,000 research datasets including the Louis Harris polls, North Carolina Vital Statistics, and the most complete collection of the 1970 United States Census.
    • Inter-university Consortium for Political and Social Research (ICPSR): UNC is a member of ICPSR, which maintains an extensive archive of data related to education, aging, criminal justice, substance abuse and other fields in the social sciences.
    • Roper Center: The Odum Institute sponsors UNC-affiliate access to Roper Center resources. The Roper Center offers both U.S. and international public opinion polling data dating back to the 1930s.

     

    Additional Data Sources: Below are links and information about other useful sources of research data.

    • AddHealth (https://www.cpc.unc.edu/projects/addhealth): AddHealth is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States starting in 1994-95 and following their lives through adulthood. Follow-up interviews were also conducted from 2016-2018 to collect data as the cohort enters their fourth decade of life.
    • American Community Survey (ACS) (https://www.census.gov/programs-surveys/acs/): The ACS is an annual survey that collects individual-level demographic, housing, social, and economic data.
    • American Housing Survey (AHS) (https://www.census.gov/programs-surveys/ahs/data.html): The AHS is a biennial survey that collects housing unit data that includes information on size and composition of housing, vacancies, characteristics of occupants, housing costs, etc.
    • American National Election Studies (ANES) (http://www.electionstudies.org/studypages/download/datacenter_all_NoData.php): The ANES has collected individual-level data on voting, public opinion, and political participation for most general and midterm election years.
    • Association of Religion Data Archives (ARDA) (http://www.thearda.com/Archive/browse.asp): ARDA provides access to religion-related data that include both U.S. and international-based datasets.
    • Behavioral Risk Factor Surveillance System (BRFSS) (https://www.cdc.gov/brfss/annual_data/annual_data.htm): The BRFSS collects annual individual-level survey data related to health-related risk behaviors, chronic health conditions, and use of preventive services.
    • Bureau of Economic Analysis (BEA) (https://www.bea.gov/itable/): The BEA maintains national, industry, international, and regional economic data that includes GDP, personal income, industry input-output, and international transactions and direct investments.
    • Bureau of Labor Statistics (https://www.bls.gov/data): The Bureau of Labor Statistics provides access to data on inflation and prices, employment, pay and benefits, spending, occupations, and other economics-related topics.
    • Data.gov (http://Data.gov): Data.gov is a database portal with links to access datasets produced by federal, state, and city governments, as well as other institutions in the United States.
    • Data.gov.uk (http://Data.gov.uk): Data.gov.uk is a database portal with links to access data produced by government agencies, public bodies, and local authorities in the U.K.
    • Demographic & Health Surveys (DHS) Program (http://dhsprogram.com/data/): The DHS program has collected nationally representative survey data related to population, health, HIV, and nutrition from over 90 countries.
    • Eurostat (http://ec.europa.eu/eurostat/data/database): Eurostat provides access to economic, demographic, and environmental data about European countries and regions.
    • FRED Economic Data (https://fred.stlouisfed.org/): FRED provides access to a wide variety of economic data including banking, labor markets, national accounts, and prices data, both U.S.- and international-based.
    • General Social Survey (GSS) (http://gss.norc.org/Get-The-Data): The GSS collects individual-level survey data from Americans to study trends in attitudes and behaviors towards a variety of topics including crime, civil liberties, morality, and well-being.
    • Global Health Observatory (GHO) Data (http://apps.who.int/gho/data/node.home): The GHO provides access to World Health Organization data for its 194 member states. Data include over 1,000 health-related indicators.
    • HealthData.gov (https://www.healthdata.gov/): HealthData.gov is an access portal to health-related data produced by government agencies, public bodies, and local authorities in the U.S.
    • Inter-university Consortium for Political and Social Research (ICPSR) (https://www.icpsr.umich.edu/icpsrweb/): ICPSR maintains a data archive of social and behavioral sciences datasets. It also hosts several thematic collections of data related to education, criminal justice, arts and culture, and aging.
    • Latin American Public Opinion Project (LAPOP) (http://www.vanderbilt.edu/lapop/about.php): LAPOP has collected public opinion surveys in the 28 countries of North, Central, and South America, as well as countries in the Caribbean. IRB approval is required for access.
    • National Center for Education Statistics (NCES) (https://nces.ed.gov/datatools/): NCES collects data on the condition of education in the U.S. Data topics include literacy, early childhood, elementary and secondary education, and postsecondary education. Collections include international data.
    • National Centers for Environmental Information (NCEI) (https://www.ncei.noaa.gov/): NCEI provides access to environmental data including weather and climate, coastal, oceanic, and geophysics data.
    • Pew Research Center (http://www.pewresearch.org/): Pew Research Center provides access to data produced from Pew Research public opinion polling and other empirical research studies. Data topics include politics, media, religion, Hispanic trends, and other contemporary issues.
    • ProQuest Statistical Abstract of the U.S. (http://statabs.proquest.com/sa/index.html): The ProQuest Statistical Abstract of the U.S. provides summary statistics on social, political, and economic conditions of the U.S.
    • Sheps Center (https://www.shepscenter.unc.edu/data/): The Cecil G. Sheps Center for Health Services Research seeks to improve the health of individuals, families, and populations by understanding the problems, issues and alternatives in the design and delivery of health care services.
    Survey of Consumer Finances (SCF)https://www.federalreserve.gov/econres/scfindex.htmEvery three years, the Federal Reserve Board conducts a cross-sectional survey of U.S. families to collect data on income, demographics, pensions, and balance sheets.
    UCI Machine Learning Repositoryhttps://archive.ics.uci.edu/ml/index.phpThe UCI Machine Learning Repository offers a collection of data from various disciplines used for empirical analysis of machine learning algorithms.
    UK Data Archivehttp://www.data-archive.ac.uk/The UK Data Archive hosts the UK’s largest collection of social and economic data. Collections include UK surveys, international macrodata, and census data.
    UNICEF Datahttps://data.unicef.org/UNICEF data hosts data related to children and women collected from more than 100 countries worldwide through the Multiple Indicator Cluster Surveys (MICS) household survey program.
    Uniform Crime Reporting (UCR) Statisticshttps://www.ucrdatatool.gov/The FBI’s UCR Program collects crime data on the national, state, city, and county levels.
    U.S. Censushttps://data.census.gov/cedsci/The Census Bureau is the leading source of quality data about the nation's people and economy.
    Voting and Elections Collectionhttp://library.cqpress.com/elections/download-data.phpDownloadable national- or county-level data for current or historical elections for the offices of president, house, senate, and governor.
    World Bank | https://www.worldbank.org/ | The World Bank Group is one of the world’s largest sources of funding and knowledge for developing countries.
    Term | Definition
    Archival Information Package (AIP) | “an information package that is used to transmit archival objects into a digital archival system, store the objects within the system, and transmit objects from the system.” (ISO 14721: The Reference Model for an Open Archival Information System)
    Data | The National Science Foundation’s definition describes data as something “determined by the community of interest through the process of peer review and program management” (National Science Foundation).
    Data management | As defined by NOAA's Administrative Order 212-15, data management “consists of two major activities conducted in coordination: data management services and data stewardship. They constitute a comprehensive end-to-end process including movement of data and information from the observing system sensors to the data user. This process includes the acquisition, quality control, metadata cataloging, validation, reprocessing, storage, retrieval, dissemination, and archival of data.”
    Data Curation | As defined by the University of Illinois’ Graduate School of Library and Information Science, data curation is “the active and ongoing management of data through its life cycle of interest and usefulness to scholarship, science, and education. Data curation activities enable data discovery and retrieval, maintain its quality, add value, and provide for reuse over time, and this new field includes authentication, archiving, management, preservation, retrieval, and representation.”
    Designated Community | As defined by ISO 14721: The Reference Model for an Open Archival Information System (OAIS), “An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities. A Designated Community is defined by the Archive, and this definition may change over time.”
    Dissemination Information Package (DIP) | The materials delivered to the data consumer through the archive system make up the dissemination information package. This package is the final product after the submission information package has been successfully transformed into an archival information package.
    Ingest | Establishes evidence of authenticity and ensures that files within the Submission Information Package (SIP) are in proper formats and include necessary documentation. Data repositories may also perform additional tasks such as checks for confidential information and data quality reviews. (ISO 14721 Reference Model for an Open Archival Information System)
    Metadata | Defined as structured information that describes, explains, locates, and otherwise represents something else. Metadata allows data to be found and interpreted. At a minimum, one needs to know who created the data, when the data were created or published, and a title or descriptive name used to refer to the dataset.
    Open Access | As defined by SPARC, “Open Access is the free, immediate, online availability of research articles combined with the rights to use these articles fully in the digital environment. Open Access is the needed modern update for the communication of research that fully utilizes the Internet for what it was originally built to do—accelerate research.”
    Open Archival Information System (OAIS) | The OAIS model presents a high-level framework for understanding archival concepts, defines elements and processes within digital repositories, and establishes a set of responsibilities for the long-term preservation of digital information. (ISO 14721 Reference Model for an Open Archival Information System)
    Open data | As defined by the Open Definition, “Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”
    Personally identifiable information (PII) | Data that include variables and observations containing information that could potentially be used to identify a participant or participants within a study. This information is highly sensitive and should either be removed from or recoded within the data in such a way that deductive disclosure risk is minimal.
    Protected Health Information (PHI) | Similar to personally identifiable information, protected health information within data could include variables or observations that contain specific health information that might lead to deductive disclosure of one or more participants.
    Reproducibility | As defined by Pröll & Rauber, “An experiment is reproducible, if and only if consistent, scientific results can be obtained, by processing the same data with the same algorithms using the same tools. For an experiment to be reproducible, we need to have knowledge of at least the following information:
    • Research data and metadata used
    • Methods applied in the experiment
    • Tools, software and execution environment used in the experiment.”
    Sensitive data | Data that contain sensitive variables and observations (PII and/or PHI) that could lead to the identification of the participant(s) of the study. Any human subjects research must undergo a review before being approved to ensure that sensitive data are properly managed, secured, and stored with limited or completely restricted access.
    Submission Information Package (SIP) | The files received directly from the data producer are called the submission information package. These files will undergo processing in order to create an Archival Information Package (AIP).
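
The Metadata entry above names three minimum elements: who created the data, when the data were created or published, and a title or descriptive name. As a rough illustration only, the sketch below checks a record for those three elements. The class name, field names, and sample values are hypothetical and are not drawn from any Odum Institute or Dataverse schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical minimal record; field names are illustrative only."""
    title: Optional[str] = None
    creator: Optional[str] = None
    date_created: Optional[date] = None


def missing_minimum_fields(record: DatasetMetadata) -> List[str]:
    """Return the names of the minimum elements that are absent or blank."""
    missing = []
    if not (record.title and record.title.strip()):
        missing.append("title")
    if not (record.creator and record.creator.strip()):
        missing.append("creator")
    if record.date_created is None:
        missing.append("date_created")
    return missing


# Made-up sample records for demonstration.
complete = DatasetMetadata("Example Survey 2020", "A. Researcher", date(2020, 3, 16))
partial = DatasetMetadata(title="Untitled draft")

print(missing_minimum_fields(complete))  # []
print(missing_minimum_fields(partial))   # ['creator', 'date_created']
```

A repository would normally require far richer metadata than this; the check only mirrors the minimum stated in the definition.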
