Introduction
|
|
Project Structure
|
Organize and name files so that they make intuitive sense to your future self and follow the narrative of the data analysis.
Populate folders with README files that describe the project and give context for the analyses.
Original (raw) data should be kept exactly as received and never modified.
Keep a clear record of every modification that has been made. Ideally, this is in the form of a script that can automatically generate cleaned data from the raw data.
Generated files (processed data, figures, etc.) should be kept separate from files that must be backed up, such as raw data and scripts.
For a project to be reproducible as a whole, every step must be reproducible.
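As an illustration of a script that regenerates cleaned data from the raw data, here is a minimal Python sketch. The directory names `data/raw` and `data/processed`, the file `survey.csv`, the `id` column, and the function names are hypothetical, and the cleaning rules are placeholders for whatever a real project needs. The raw file is only ever opened for reading, and all output goes to a separate directory that can be deleted and rebuilt.

```python
# clean_data.py: regenerate processed data from raw data (a minimal sketch).
# The paths, file names, and the "id" column are hypothetical examples.
import csv
from pathlib import Path

RAW_DIR = Path("data/raw")              # original data: read, never written
PROCESSED_DIR = Path("data/processed")  # generated output: safe to delete and rebuild


def clean_rows(rows, fieldnames):
    """Return cleaned copies of the rows; the input rows are not modified."""
    cleaned = []
    for row in rows:
        # Strip stray whitespace from headers and values; missing values become "".
        new_row = {name.strip(): (row.get(name) or "").strip() for name in fieldnames}
        # Drop records without an identifier.
        if new_row.get("id"):
            cleaned.append(new_row)
    return cleaned


def build_processed_data(raw_dir=RAW_DIR, processed_dir=PROCESSED_DIR):
    """Rebuild the processed file from the raw file and return its path."""
    with open(raw_dir / "survey.csv", newline="") as f:  # opened read-only
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    cleaned = clean_rows(rows, fieldnames)

    processed_dir.mkdir(parents=True, exist_ok=True)
    out_path = processed_dir / "survey_clean.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[name.strip() for name in fieldnames])
        writer.writeheader()
        writer.writerows(cleaned)
    return out_path
```

Calling `build_processed_data()` from the project root rebuilds `data/processed/survey_clean.csv`. Because the script itself is the record of every change, deleting the processed directory loses nothing.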
|
Metadata
|
All projects should include a README file in the top directory.
README files should include the names and contact details of the maintainers, the date, a brief description of the project's intent, and the source of any data files.
Use the README to record changes made over time.
README files should be written in plain text.
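A README skeleton covering the fields listed above can be generated with a short script, which makes it harder to forget one of them. This is a minimal Python sketch; the file name `README.txt`, the section headings, and the `write_readme` function are assumptions chosen for illustration, not a required format.

```python
# A minimal sketch of a plain-text README generator. The file name, section
# headings, and layout are illustrative assumptions, not a required format.
from datetime import date
from pathlib import Path


def write_readme(folder, title, maintainers, description, data_sources,
                 changes=(), created=None):
    """Write README.txt into `folder` and return its path.

    maintainers:  list of (name, contact) pairs
    data_sources: list of strings saying where each data file came from
    changes:      list of (date string, description) pairs, oldest first
    created:      datetime.date for the README; defaults to today
    """
    created = created or date.today()
    lines = [
        title,
        "=" * len(title),
        "",
        f"Date: {created.isoformat()}",
        "",
        "Maintainers:",
        *[f"  - {name} <{contact}>" for name, contact in maintainers],
        "",
        "Purpose:",
        f"  {description}",
        "",
        "Data sources:",
        *[f"  - {source}" for source in data_sources],
        "",
        "Change log:",
        *[f"  {when}: {what}" for when, what in changes],
    ]
    path = Path(folder) / "README.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Passing the change log in as data, rather than editing the file by hand, means the README can be regenerated at any time; recording a new change is a matter of appending a dated entry to `changes`.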
|
Modifying data
|
Cleaned data should have its own README if any manual cleaning was performed.
Any modification should have a clear paper trail.
Modifying data through a GUI often produces unexpected results.
Cleaning data in a GUI seems quick, but doing it with even a modicum of reproducibility quickly becomes laborious.
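One way to keep a paper trail without a GUI is to write each modification as a small named function and log every step as it runs. The sketch below is one such approach, not a prescribed method; the step functions, the `id` and `country` columns, and the tab-separated log format are hypothetical. Each log line records a timestamp, the step name, the row counts before and after, and the step's docstring as a human-readable description.

```python
# A minimal sketch of cleaning with a paper trail. Step names, columns,
# and the log format are hypothetical examples.
from datetime import datetime, timezone
from pathlib import Path


def drop_missing_ids(rows):
    """Remove rows whose 'id' field is empty."""
    return [row for row in rows if row.get("id")]


def normalize_country(rows):
    """Upper-case the 'country' field, e.g. 'us' becomes 'US'."""
    # Builds new dicts, so the caller's rows are left untouched.
    return [{**row, "country": (row.get("country") or "").upper()} for row in rows]


def run_steps(rows, steps, log_path):
    """Apply each step in order, appending one tab-separated log line per step.

    Each step should carry a docstring; it is written to the log verbatim.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        for step in steps:
            before = len(rows)
            rows = step(rows)
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            log.write(f"{stamp}\t{step.__name__}\t{before} -> {len(rows)} rows\t{step.__doc__}\n")
    return rows
```

Because the log is appended to rather than overwritten, repeated runs accumulate a history that a newcomer can read alongside the step functions themselves.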
|
Concluding thoughts
|
Organize files so that they make intuitive sense and follow the narrative of the data analysis.
Populate folders with metadata that describes the folder contents, where those contents came from, and gives context for the analyses that you’re about to perform.
Always make copies of data for modification, and never overwrite the raw data.
Keep a clear record of every modification that has been made. Ideally, this is in the form of a script that can automatically generate cleaned data from the raw data.
If manual cleaning is necessary, create a README file that details every single change that has been made, such that a newcomer could re-create these changes.
|