Project workflows
We use a standardized workflow structure based on Nextflow, which simplify developing methods and run experiments across different environments.
Create a new workflow structure
cookiecutter https://github.com/stracquadaniolab/cookiecutter-workflow-nf.git
The system will ask you a few questions and then create the structure for you. Successively, create a repo in GitHub with the name of the directory just created.
Directory structure
├── .github
│ └── workflows
│ └── ci.yml
├── bin
│ ├── fit.py
│ └── plots.py
├── conf
│ └── base.config
├── containers
│ ├── Dockerfile
│ └── environment.yml
├── testdata
│ └── mydata.txt
├── .bumpversion.cfg
├── .devcontainer.json
├── .gitignore
├── main.nf
├── nextflow.config
└── readme.md
The workflow file
The main.nf
file contains the entrypoint for the workflow, and it uses
Nextflow DSL2 by default. The workflow parameters are stored in the
nextflow.config
file, which in turn include other files in the conf
directory; usually, you only have to define the parameters of your specific
pipeline, since the conf/base.conf
file includes profiles to run your workflow
in different computing environment, e.g. Slurm, GitHub.
Please, refer to the Nextflow documentation for an overview of the framework.
Custom scripts management
Custom code (aka your scripts and classes) needed by the pipeline should be
added to the bin
directory; the code in this directory is automatically added
to $PATH
when running the pipeline, which makes custom scripts easily portable
and accessible. If you are using Python, you should have a file for each class
of operations, e.g. a file plots.py
for all the plots, and use
docopt
to have standard Unix command line
interface. See the auto-generated pipeline for an example.
Software management
Third-party software is managed by micromamba
and specified in a
environment.yml
file; keep the yml
file updated and specify the version of
each software you are using in order to ensure reproducibility.
To ensure reproducibility and running experiments on local machines and HPC
clusters, it is strongly recommended to build a Docker image. The bundled
Dockerfile
can be used to build an image with the software specified in your
environment.yml
file. To do that, run:
docker build . -t ghcr.io/stracquadaniolab/<my-workflow>:<version> -f containers/Dockerfile
where <my-workflow>
is the name of your workflow and <version>
is the current version of your workflow, starting from 0.0.0
.
The template comes with an auto-generated .devcontainer.json
file, which
allows developing your scripts inside a container with all the software
specified in environment.yml
using vscode
.
Sometimes you would want to pull a docker image from GitHub container registry:
docker pull ghcr.io/stracquadaniolab/<workflow_name>:<version>
In order to successfully pull an image, first you need to authenticate yourself
with your personal access token, see here: Authenticating with the container
registry
Testing
It is important to build workflows that can be automatically tested; thus, you
will have to add small test data into the testdata
directory, and modify the
test
profile in conf/base.config
configuration file to specify any parameter
needed for your workflow to run. See the auto-generated pipeline for an example.
Versioning
All projects must follow a semantic version scheme. The format adopted is
MAJOR.MINOR.PATCH
:
- MAJOR: drastic changes that make disruptive changes with a previous release.
- MINOR: add functions to the workflow but keeps everything compatible within the MAJOR version.
- PATCH: bug fixes or settings update.
To update the version of your workflow, you should run the following command from the command line:
bump2version major #for major release
bump2version minor #for minor release
bump2version patch #for patch release
Push your code to GitHub
As the project is version controlled using Git, you can push your code to GitHub as follows:
git add .
git commit -am "new: added super cool feature"
git push -u origin master
Importantly, after a bumpversion
, you also have to push the tag just created as follows:
git push --tags
Continuous integration
Each pipeline comes with a pre-configured GitHub workflow to automatically test
the code and build a Docker image; the workflow is stored in
.github/workflows/ci.yml
. Please note that a Docker image is only released
when you push a tag.
Documentation
Each workflow must have an updated readme.md
file, describing:
- what the workflow does
- how to configure the workflow
- how to run the workflow
- a description of the output generated
A readme.md
file with the required sections is automatically generated by this
cookiecutter.