Project workspaces
We use a standardized directory structure, called a workspace, where we run and store all the experiments for a given project.
Create a workspace
cookiecutter https://github.com/stracquadaniolab/cookiecutter-workspace-nf.git
The system will ask you a few questions and then create the structure for you.
Directory structure
.
├── conf
├── data
├── logs
├── resources
├── results
└── readme.md
The conf directory contains Nextflow config files to run a pipeline; you must
define the parameters of each experiment in a config file rather than passing
them on the command line.
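As a minimal sketch of this convention, the snippet below writes the parameters of one experiment to a config file. The file name and the parameter names (input, outdir) are hypothetical placeholders; use whatever parameters your pipeline actually defines.

```shell
# Write the parameters of one experiment to its own config file.
# File name and parameter names are illustrative assumptions.
mkdir -p conf
cat > conf/my-first-test.config <<'EOF'
params {
    input  = "data/samples.csv"
    outdir = "results/2022-01-01-my-first-test"
}
EOF

# The file is then passed to Nextflow with the -c option, for example:
#   nextflow run <pipeline> -c conf/my-first-test.config
```

Keeping one config file per experiment means the exact parameters of a run can be recovered from the workspace later.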
The data directory contains data to be processed by a pipeline. This directory
usually contains the raw data (e.g. data from sequencing experiments). You
should take some time to organize it in a meaningful and consistent way.
The resources directory contains data retrieved from external
sources/repositories, like annotation files (e.g. genome GFF) or geneset GMT
files.
The logs directory contains the log of each pipeline run.
The results directory contains the results of experiments. You should take some
time to organize it in a meaningful and consistent way. It is strongly
recommended to store each set of results in a directory whose name starts with
the date, like 2022-01-01-my-first-test.
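One possible way to create such a directory is sketched below. The label is a placeholder, and the date is taken from the system clock, which is an assumption; you may prefer the date the experiment was designed or started.

```shell
# Build a results directory name from today's date (YYYY-MM-DD) and a short label.
label="my-first-test"            # placeholder label, replace with your own
run_dir="results/$(date +%F)-${label}"

# Create the directory (and results/ itself if it does not exist yet).
mkdir -p "${run_dir}"
echo "${run_dir}"
```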
The readme.md file contains a description of the project and how the data
and results folders are organized.
Naming guidelines
The team MUST use the Google naming guidelines, specifically:
- Make file and directory names lowercase.
- Separate words with hyphens, not underscores.
- Use only standard ASCII alphanumeric characters in file and directory names.
IMPORTANT: Raw data or external resources are allowed to keep their naming standard if it makes them easier to identify.
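The guidelines above can be checked mechanically. The helper below is a rough sketch, not part of the workspace template: is_valid_name is a hypothetical function name, and allowing a dot (for file extensions such as readme.md) is an assumption on top of the three rules.

```shell
# Succeed if a name uses only lowercase ASCII letters, digits, hyphens and dots.
# Allowing dots for file extensions is an assumption, not part of the guidelines.
is_valid_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9.-]+$'
}

# Example: report whether two candidate names follow the guidelines.
for name in "2022-01-01-my-first-test" "My_Results"; do
  if is_valid_name "${name}"; then
    echo "ok:  ${name}"
  else
    echo "bad: ${name}"
  fi
done
```

Since raw data and external resources may keep their original names, a check like this is best limited to the conf and results directories.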