Project Structure

Here are some characteristics of files that you should pay attention to:

  1. File history
  2. File function
  3. File format
  4. File origin

Key Point: Organize files so that they make intuitive sense and follow the narrative of the data analysis.

File Structure

Pic of folder structure

code directory

  • A common organization strategy is to separate code into its own directory.
  • Common synonymous names for this directory:
    • scripts
    • separate directories by script type:
      • r
      • py
      • sh

README files

  • let users know important information about the project
  • are the starting point for anyone looking at the project
  • should be in plain text format (either .txt or .md)

Useful README File Content

  1. Project name
  2. Today's date (update as project is updated)
  3. Maintainer's contact info
  4. Data Origin
  5. 3-4 sentences about the goal of the project
  6. Dependencies / How to Install / Project Structure / Style Guide
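
The list above can be turned into a reusable template. The following is a minimal sketch, not a required layout: the function name render_readme, its parameters, and the underlined-heading style are all assumptions made for illustration.

```python
from datetime import date


def render_readme(project_name, maintainer, contact, data_origin, goal, today=None):
    """Return plain-text README content covering the items listed above.

    All names and the heading style are hypothetical; adapt them freely.
    """
    today = today or date.today()
    lines = [
        project_name,
        "=" * len(project_name),
        "",
        f"Last updated: {today.isoformat()}",
        f"Maintainer: {maintainer} <{contact}>",
        f"Data origin: {data_origin}",
        "",
        "Goal",
        "----",
        goal,
        "",
        "Dependencies",
        "------------",
        "(list packages and how to install them)",
        "",
        "Project structure",
        "-----------------",
        "(describe the main directories)",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned string to README.txt or README.md keeps the file in plain text, and passing a fresh date each time the project changes keeps item 2 current.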

data directory

  • The data directory is where all your data is kept, so it is incredibly important to think through its structure.
  • Every data directory is unique, and the appropriate structure can vary widely between individuals, projects, and points in time.
  • One universally important concept for this directory is identifying which data files are
    • original (raw)
    • modified
      • outputs from scripts
      • programmatically or manually edited
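
One way to make the raw/modified distinction visible is to encode it in the directory layout. The sketch below assumes a hypothetical layout (code, data/raw, data/processed, output); these names are illustrative, and the right structure depends on the project.

```python
from pathlib import Path

# Hypothetical layout; rename or extend to suit the project.
SUBDIRS = ["code", "data/raw", "data/processed", "output"]


def create_skeleton(root):
    """Create the project directories under root and return them, sorted."""
    root = Path(root)
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


def data_kind(path, root):
    """Classify a file under root/data as 'raw' or 'modified' by its folder."""
    rel = Path(path).relative_to(Path(root) / "data")
    return "raw" if rel.parts[0] == "raw" else "modified"
```

With this layout, anyone can tell from a file's location alone whether it is an original or something derived from one.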

Identify Output Files

Output Files: Files that are generated from other files.

Key Points

  • output files should be clearly identifiable from your project structure or naming.
  • keep raw (input) files raw and never edit original copies.
  • keep a clear record of every modification that has been made to files. Ideally, this is in the form of a script that can automatically generate cleaned data from the raw data.
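
The last point can be sketched as a small cleaning script. This is only an illustration under assumed cleaning rules (strip whitespace from cells, drop empty rows); the function name clean_csv and the rules themselves are hypothetical. The raw file is opened for reading only, and the cleaned copy goes to a separate path.

```python
import csv
from pathlib import Path


def clean_csv(raw_path, out_path):
    """Write a cleaned copy of raw_path to out_path; the raw file is only read.

    Returns the number of rows written (including any header row).
    """
    raw_path, out_path = Path(raw_path), Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            cells = [cell.strip() for cell in row]
            if not any(cells):  # skip rows with no content at all
                continue
            writer.writerow(cells)
            kept += 1
    return kept
```

Because every modification lives in the script, the cleaned file can be deleted and regenerated from the raw data at any time.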

Identify Output Files

Know that all files in these output folders can be repopulated easily.

Pic of folder structure

Helpful Naming Conventions

Using file naming to convey order.

Pic of folder structure
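
One common way to convey order is a zero-padded numeric prefix, so that an alphabetical listing matches the order in which files should be run or read. The sketch below assumes that convention; the helper name numbered_name and the step_label pattern are hypothetical.

```python
def numbered_name(step, label, ext, width=2):
    """Build a file name such as '01_import.py' from a step number and label."""
    return f"{step:0{width}d}_{label}.{ext}"


def in_order(names):
    """Return names in plain alphabetical order, as a file browser would."""
    return sorted(names)
```

Zero padding matters: without it, "10_plot.py" sorts before "2_clean.py" because names are compared character by character, while "02_clean.py" and "10_plot.py" sort as intended.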

Helpful Naming Conventions

Using file naming to convey history.

Pic of folder structure
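
One way to convey history is to put a date in the file name. The sketch below assumes ISO 8601 dates (YYYY-MM-DD) as a prefix, which makes alphabetical order equal chronological order; the helper names and the name pattern are hypothetical, and a version control system may serve the same purpose.

```python
from datetime import date


def dated_name(stem, ext, day=None):
    """Build a file name such as '2021-03-09_report.csv'."""
    day = day or date.today()
    return f"{day.isoformat()}_{stem}.{ext}"


def latest(names):
    """Return the most recent name, relying on the ISO date prefix."""
    return max(names)
```

Formats such as 3-9-21 or March9 do not sort chronologically, which is why the year-month-day order is used here.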

Conclusions

  • Every project is different, requiring different organization strategies.

    • Talk with colleagues about how they organize their projects
    • Browse GitHub repositories for ideas
  • Your project structure will evolve throughout the project's lifetime.

  • Organize your project in a way that makes sense for collaborators and, most importantly, for future you.