Project workflows

We use a standardized workflow structure based on Nextflow, which simplifies developing methods and running experiments across different environments.

Create a new workflow structure

cookiecutter https://github.com/stracquadaniolab/cookiecutter-workflow-nf.git

The system will ask you a few questions and then create the structure for you. Then, create a repository on GitHub with the same name as the directory just created.

Directory structure

├── .github
│  └── workflows
│     └── ci.yml
├── bin
│  ├── fit.py
│  └── plots.py
├── conf
│  └── base.config
├── containers
│  ├── Dockerfile
│  └── environment.yml
├── testdata
│  └── mydata.txt
├── .bumpversion.cfg
├── .devcontainer.json
├── .gitignore
├── main.nf
├── nextflow.config
└── readme.md

The workflow file

The main.nf file contains the entrypoint for the workflow, and it uses Nextflow DSL2 by default. The workflow parameters are stored in the nextflow.config file, which in turn includes other files in the conf directory. Usually, you only have to define the parameters of your specific pipeline, since the conf/base.config file includes profiles to run your workflow in different computing environments, e.g. Slurm, GitHub.

Please refer to the Nextflow documentation for an overview of the framework.

Custom scripts management

Custom code (i.e. your scripts and classes) needed by the pipeline should be added to the bin directory; this directory is automatically added to $PATH when running the pipeline, which makes custom scripts portable and easy to call. If you are using Python, keep one file per class of operations, e.g. a plots.py file for all the plots, and use docopt to provide a standard Unix command-line interface. See the auto-generated pipeline for an example.

Software management

Third-party software is managed by micromamba and specified in an environment.yml file; keep this file up to date and specify the version of each package you are using to ensure reproducibility.
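If you want a quick check that every package is pinned, something like the following sketch may help. It is not part of the template: the function name and the example file content are hypothetical, and the parsing is deliberately naive (it scans lines instead of using a YAML parser, and treats any entry containing = as pinned, including ranges such as >=).

```python
def unpinned_dependencies(environment_yml_text):
    """Return dependency entries that carry no '=' version specifier."""
    unpinned = []
    in_dependencies = False
    for raw_line in environment_yml_text.splitlines():
        line = raw_line.split("#", 1)[0].rstrip()  # drop trailing comments
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith("-"):
            # A key line; only top-level keys switch sections.
            if not line[0].isspace():
                in_dependencies = stripped == "dependencies:"
            continue
        if not in_dependencies:
            continue  # e.g. items of the channels list
        entry = stripped[1:].strip()
        if entry.endswith(":"):
            continue  # nested section such as "- pip:"
        if "=" not in entry:
            unpinned.append(entry)
    return unpinned


# Hypothetical environment.yml content, used only to exercise the function.
EXAMPLE = """\
name: my-workflow
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - pip:
    - docopt==0.6.2
"""

print(unpinned_dependencies(EXAMPLE))  # numpy has no version specifier
```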

To ensure reproducibility and to run experiments on both local machines and HPC clusters, it is strongly recommended to build a Docker image. The bundled Dockerfile can be used to build an image with the software specified in your environment.yml file. To do that, run:

docker build . -t ghcr.io/stracquadaniolab/<my-workflow>:<version> -f containers/Dockerfile

where <my-workflow> is the name of your workflow and <version> is the current version of your workflow, starting from 0.0.0.
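If you script your builds, the placeholder substitution can be sketched as below. The helper names are hypothetical; the registry prefix, the flags and the Dockerfile path are taken from the command above.

```python
import shlex

REGISTRY = "ghcr.io/stracquadaniolab"


def image_reference(workflow, version):
    """Build the image name used for both docker build and docker pull."""
    return f"{REGISTRY}/{workflow}:{version}"


def docker_build_command(workflow, version, dockerfile="containers/Dockerfile"):
    """Return the docker build command as an argument list."""
    return ["docker", "build", ".", "-t", image_reference(workflow, version), "-f", dockerfile]


# Print the command for a workflow called my-workflow at its first version.
print(shlex.join(docker_build_command("my-workflow", "0.0.0")))
```

The argument list can be passed to subprocess.run from the root of the workflow directory, which avoids shell quoting issues.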

The template comes with an auto-generated .devcontainer.json file, which lets you develop your scripts in Visual Studio Code inside a container with all the software specified in environment.yml.

Sometimes you may want to pull a Docker image from the GitHub Container Registry:

docker pull ghcr.io/stracquadaniolab/<my-workflow>:<version>

To pull an image successfully, you first need to authenticate with your personal access token; see "Authenticating with the container registry" in the GitHub documentation.

Testing

It is important to build workflows that can be tested automatically; thus, you will have to add small test data to the testdata directory, and modify the test profile in the conf/base.config configuration file to specify any parameter needed for your workflow to run. See the auto-generated pipeline for an example.

Versioning

All projects must follow a semantic versioning scheme. The format adopted is MAJOR.MINOR.PATCH:

MAJOR: drastic changes that break compatibility with a previous release.
MINOR: adds functionality to the workflow but keeps everything compatible within the MAJOR version.
PATCH: bug fixes or settings updates.
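To make the scheme concrete, the sketch below shows how each kind of release changes a version string. It only illustrates the numbering rules; it is not how bump2version is implemented, and the function name bump is hypothetical.

```python
def bump(version, part):
    """Return the next MAJOR.MINOR.PATCH version for the given release type."""
    major, minor, patch = (int(field) for field in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # lower fields reset to zero
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release type: {part!r}")


# Starting from the initial version of a new workflow.
for release in ("patch", "minor", "major"):
    print(release, bump("0.0.0", release))
```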

To update the version of your workflow, you should run the following command from the command line:

bump2version major  # for a major release
bump2version minor  # for a minor release
bump2version patch  # for a patch release

Push your code to GitHub

As the project is version controlled using Git, you can push your code to GitHub as follows:

git add . 
git commit -am "new: added super cool feature"
git push -u origin master

Importantly, after bumping the version, you also have to push the newly created tag as follows:

git push --tags

Continuous integration

Each pipeline comes with a pre-configured GitHub workflow to automatically test the code and build a Docker image; the workflow is stored in .github/workflows/ci.yml. Please note that a Docker image is only released when you push a tag.

Documentation

Each workflow must have an up-to-date readme.md file, describing:

what the workflow does
how to configure the workflow
how to run the workflow
a description of the output generated

A readme.md file with the required sections is automatically generated by this cookiecutter.