COMMON ISSUES

This section identifies common problems and errors that the Odum Verification Team sees in verification packages. For each, we identify the problem and describe what to look for in your materials, along with potential solutions where appropriate. The problems are ordered by frequency, starting with the most frequent:

1. Variable documentation errors relate to the descriptions, values, labels, or construction of variables in the codebook. Common mistakes include missing variable descriptions and unexplained values or labels, such as missing/blank values (e.g., 999, NA) and the value labels for binary, ordinal, or categorical variables (e.g., 1=Strongly agree, 2=Agree, etc.). This information supports accurate interpretation and future use of the data. We recommend that you consider what a future researcher who had no involvement in the study would need to know to use these variables. A sample codebook entry appears below.
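
For illustration, a complete codebook entry might look like the following (the variable, values, and source item are hypothetical):

    trust_gov    Trust in state government (2020 survey wave)
                 1 = Strongly agree
                 2 = Agree
                 3 = Disagree
                 4 = Strongly disagree
                 999 = Missing / no response
                 Constructed from raw item Q14 in the data preparation code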

2. Discrepancies in numeric results occur in the in-text results, tables, and output. These errors include rounding errors and value mismatches; often these are minor differences, but we sometimes see major discrepancies. We flag a result discrepancy if the value that we produce is even one digit off in the final decimal place of your manuscript (e.g., if we produce the value 12.34 and your manuscript reports 12.35, we will flag it as a result discrepancy). We recommend that you re-run your analyses and ensure the output matches the numeric results in the manuscript exactly, paying attention to the SPPQ guidelines for rounding and decimal places. If you have questions about the SPPQ policy on rounding and number of decimal places, please contact the SPPQ Editors. One way to avoid rounding mismatches is sketched below.
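
One way to avoid rounding mismatches is to format statistics in code to the exact number of decimal places reported in the manuscript, rather than rounding values by hand. A minimal sketch in Python (the values are hypothetical stand-ins for your estimates):

    # Hypothetical estimates standing in for your model output.
    coef, se = 0.12345, 0.06789

    # Format to the decimal places used in the manuscript (here, 2),
    # so the printed output matches the reported result exactly.
    print(f"coefficient = {coef:.2f}, SE = {se:.2f}")
    # coefficient = 0.12, SE = 0.07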

3. Compute environment errors involve inaccurate or unspecified software, packages, versions, or dependencies in the compute environment. Our verifiers use this information to build a compute environment similar to yours. We have found that the compute environment can affect code execution, output, and results. We ask that authors document the software, packages, and compute environment in the README and include package installation commands in the code. If you are using high performance or high throughput systems (HPC, HTC), details on the HPC system, memory requirements, modules, build, etc. are required, along with the anticipated run time. An example of recording this information appears below.
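
One way to capture this information, sketched here in Python (other languages have equivalents, such as sessionInfo() in R), is to print the platform and package versions at the top of your script and pin exact versions in a requirements file:

    import sys
    import platform
    import pandas as pd

    # Record the compute environment in the output log.
    print(platform.platform())  # operating system and version
    print(sys.version)          # Python version
    print(pd.__version__)       # version of each key package

    # In requirements.txt, pin exact versions, for example:
    # pandas==2.2.1
    # numpy==1.26.4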

4. Typos creep into your materials and documentation, and we will ask that they be corrected. Often, we see typos in the filenames listed in the README, in variable names or copy-and-paste errors in variable descriptions in the codebook, and in the commands that read files in the code. Please double-check your files and/or ask a fresh set of eyes to review these materials.

5. Misalignment of variables between the codebook and analytical data is a common issue. SPPQ requires that authors provide a codebook documenting the variables in the analytical data set. These errors go two ways: variables listed in the codebook but not present in the data, and variables in the data that are not listed in the codebook. Please open your data files and codebook to compare the variables; a quick programmatic check is sketched below. If any variables are created by your code, please note this in the codebook.
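
A quick check, sketched in Python under the assumption that your analytical data is a CSV and your codebook stores variable names in a column named "variable" (the filenames are hypothetical):

    import pandas as pd

    data_vars = set(pd.read_csv("analytical_data.csv").columns)
    codebook_vars = set(pd.read_csv("codebook.csv")["variable"])

    # Both sets should be empty if the codebook and data align.
    print("In codebook but not in data:", codebook_vars - data_vars)
    print("In data but not in codebook:", data_vars - codebook_vars)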

6. The README file should contain documentation about your Dataverse package and its files. Often, we find problems with the list of files and file descriptions. Examples include files that are not listed, missing file descriptions, or a missing note about restricted-access files (i.e., proprietary or sensitive data that cannot be shared in Dataverse). We recommend that you check that the file list in the README matches the files you uploaded to SPPQ Dataverse before you submit; one way to generate the actual file list is sketched below.
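
For example, you can list every file in your package directory and compare the output against the README. A sketch in Python (the directory name is hypothetical):

    from pathlib import Path

    # Print every file in the package, relative to its root folder.
    for f in sorted(Path("replication_package").rglob("*")):
        if f.is_file():
            print(f.relative_to("replication_package"))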

7. Missing code sections or files are pervasive in the submissions that we review. This error means that we are not able to re-execute your workflow from start to finish. For instance, authors may omit code for the appendices, for a specific table or output, or for a step in their workflow such as creating the analytical data from the raw data sets. We suggest testing your workflow on a different, clean computer; this will catch files on your own computer that your code depends on to run and highlight any manuscript results that are not produced. A master script that runs every step, like the sketch below, also helps.
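
One way to make the full workflow re-executable is a single master script that runs each step in order. A minimal sketch in Python (the step scripts are hypothetical):

    import subprocess

    # Run each step of the workflow in order, from raw data to final output.
    steps = [
        "01_build_analytical_data.py",  # raw data -> analytical data
        "02_main_analysis.py",          # tables and in-text results
        "03_figures.py",                # figures
        "04_appendix.py",               # appendix tables and results
    ]
    for step in steps:
        subprocess.run(["python", step], check=True)  # stop if a step fails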

8. File paths affect the re-execution of your workflow outside of your local compute environment. We often see absolute file paths and unpreserved file and folder structures in the packages that we review. We recommend that you use relative file paths in your code to improve long-term interoperability and reproducibility. See the example below for the difference between absolute and relative file paths. If your code depends on a file and folder structure, please upload the files with their folder structure into Dataverse, either by creating a .zip archive or by adding the file path to your files in Dataverse.
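
A sketch of the difference in Python (the paths and filename are hypothetical):

    import pandas as pd

    # Absolute path: breaks on any machine other than the author's.
    # df = pd.read_csv("C:/Users/author/Dropbox/project/data/analytical_data.csv")

    # Relative path: works anywhere the folder structure is preserved.
    df = pd.read_csv("data/analytical_data.csv")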

9. Data citations are an important requirement of the SPPQ policy; they acknowledge the original data producer's work and ensure that a future verifier can discover the data that you used. We are looking for a formally styled citation, similar to a citation for a journal article, that includes the original data producer, data title, year, identifier or version, and access information. The errors that we see are missing data citations, incomplete citations such as missing identifiers, or prose about the data source rather than a styled citation. A sample citation appears below.
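
For illustration, a complete data citation might look like the following (every detail here is a hypothetical placeholder, not a real dataset):

    Author, A., and B. Author. 2020. "Example State Legislative Dataset."
    Harvard Dataverse, V2. https://doi.org/10.XXXX/EXAMPLE (accessed June 1, 2023).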

10. The tables and figures produced by your code must match the tables and figures in your manuscript exactly. We find visual differences in outputs, including differences in axis scales, values, lines, colors, formatting, and shading, among others. We recommend that you compare the output that your code produces to the tables and figures in your manuscript to ensure they match exactly; saving figures from code with explicit settings, as sketched below, helps keep output stable.
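
One way to reduce visual differences is to save figures from code with an explicit size, resolution, and format rather than exporting them by hand. A sketch using Python's matplotlib (the data and filename are hypothetical):

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))        # fixed size in inches
    ax.plot([0, 1, 2], [0, 1, 4], color="black")  # hypothetical data
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig("figure_1.png", dpi=300)          # fixed resolution and format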

11. Code documentation is problematic when it is unclear what the code is supposed to be doing or how the output aligns with your results (e.g., which values are for Table 2). Documenting code has become a best practice for software and programming. We recommend that you provide comments in your code identifying and explaining the individual steps in your data preparation and analytical approach. We also suggest that you provide comments or printed information to help us identify which output corresponds to which result, especially for tables and any in-text results (e.g., table numbers with row or column names, and the manuscript page number and/or section where the result can be found), as in the sketch below.
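
For example, a comment plus a labeled print statement can map a piece of output to a specific manuscript result. A sketch in Python (the coefficient is a hypothetical stand-in for a model estimate):

    # Table 2, Column 1: baseline model (manuscript p. 12).
    coef_x1 = 0.42  # hypothetical stand-in for an estimated coefficient

    # Label the printed output so a verifier can match it to the manuscript.
    print("Table 2, Column 1, coefficient on x1 (manuscript p. 12):", coef_x1)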
