Bioinformatics relies on a vast number of tools (packages, electronic notebooks, programming languages and their libraries) that bioinformaticians need to be able to install, manage and run. Organising data inputs and outputs is a growing challenge, particularly as genomic datasets continue to expand.
This one-day training workshop will introduce key concepts and ways of working that address these challenges and are rapidly being adopted in industry, including:
- Using containers (such as Docker and Singularity) – currently the easiest method for managing and deploying software, which also makes code easier to share and pipelines more reproducible.
- Workflow languages (Nextflow DSL2) – workflow managers provide a framework for running analyses. They intrinsically provide a degree of data provenance and make it easy to re-run analyses with different datasets or parameters in a range of computing environments.
- GNU/Linux command-line
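To make the container idea above concrete, here is a minimal Python sketch of how a tool could be launched inside a container with a local data directory mounted into it. It only builds the command line and does not run anything. The image name `example/assembler:1.0`, the tool name `assembler` and the helper `container_command` are hypothetical placeholders, not software used in the workshop.

```python
import shlex
from pathlib import Path


def container_command(image, tool_args, data_dir, engine="docker"):
    """Build (but do not run) a command that executes a tool in a container.

    The host directory `data_dir` is made visible inside the container as
    /data, which is also used as the working directory.
    """
    host_dir = Path(data_dir).resolve()
    if engine == "docker":
        return [
            "docker", "run", "--rm",        # remove the container when done
            "-v", f"{host_dir}:/data",      # mount host data at /data
            "-w", "/data",                  # working directory in container
            image, *tool_args,
        ]
    if engine == "singularity":
        return [
            "singularity", "exec",
            "--bind", f"{host_dir}:/data",  # bind host data at /data
            "--pwd", "/data",               # working directory in container
            f"docker://{image}", *tool_args,
        ]
    raise ValueError(f"unsupported container engine: {engine}")


# Hypothetical image and tool names, used for illustration only.
cmd = container_command("example/assembler:1.0", ["assembler", "--version"], ".")
print(shlex.join(cmd))
```

Because the same image is used whichever machine runs the command, the software version and its dependencies stay fixed, which is what makes container-based analyses easier to share and reproduce.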
Prerequisites
- You will need a basic understanding of navigating the GNU/Linux command line. You should be able to use commands such as cd, ls, cat and grep.
- You will need a basic understanding of microbial genomics.
- You will need a stable internet connection and a web browser.
Outcomes
By the end of the workshop:
- You will learn how bioinformaticians organise their data and analysis.
- You will learn how to deploy bioinformatics software through Linux containers.
- You will be introduced to chaining bioinformatics software to run in a “pipeline” via Nextflow.
- You will be introduced to writing your own workflows using existing Nextflow modules.
- You will learn how to use these frameworks to run regular bioinformatics analyses such as assembling a microbial genome, creating a phylogenetic tree, and running basic genotyping.
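As a rough illustration of what chaining steps into a "pipeline" means, the sketch below runs two toy steps in order and records a simple provenance log for each. It is plain Python, not Nextflow, and the function names (`run_pipeline`, `filter_short`, `count_bases`) and the `min_length` parameter are hypothetical; a workflow manager does this bookkeeping for you, along with scheduling and resuming.

```python
import hashlib
import json


def run_pipeline(steps, data, params):
    """Run each step in order; return the final result and a provenance log."""
    log = []
    for step in steps:
        data = step(data, params)
        # Record which step ran, with which parameters, and a hash of its output.
        digest = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()
        log.append({"step": step.__name__, "params": dict(params), "output_sha256": digest})
    return data, log


def filter_short(reads, params):
    """Toy step: keep reads at least `min_length` bases long."""
    return [r for r in reads if len(r) >= params["min_length"]]


def count_bases(reads, params):
    """Toy step: summarise the remaining reads."""
    return {"reads": len(reads), "bases": sum(len(r) for r in reads)}


reads = ["ACGT", "ACGTACGT", "AC"]
steps = [filter_short, count_bases]

# Re-running the same pipeline with a different parameter needs no code changes.
for min_length in (4, 8):
    result, log = run_pipeline(steps, reads, {"min_length": min_length})
    print(min_length, result, [entry["step"] for entry in log])
```

Keeping the steps separate from the parameters and inputs is the design choice that makes re-running an analysis on a new dataset, or with new settings, a one-line change.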
Programme (GMT)
09:00 | Orientation and Testing VMs – participants will be given credentials to access their VMs during this Orientation Session.
10:00 | Welcome – Organising Committee
10:10 | How does a modern bioinformatician organise their work? – (Slides) – Nabil-Fareed Alikhan, Quadram Institute Bioscience
10:50 | Getting things done with Conda and Snakemake – Anna Price, Cardiff University
11:30 | The value and use of containers – Anna Price, Cardiff University
12:00 | Lunch Break
13:00 | Practical session 1 – Assemble and examine a microbial genome using containers – Anna Price, Cardiff University
14:30 | Provenance and portability through Nextflow – (Slides) – Andrea Telatin, Quadram Institute Bioscience
15:00 | Practical session 2 – Basic bioinformatics using Nextflow – Andrea Telatin & Nabil-Fareed Alikhan, Quadram Institute Bioscience
16:20 | Afternoon Break
16:50 | Working with Nextflow, DSL2 modules and Bactopia – (Slides) – Robert Petit, Wyoming Public Health Laboratory
17:20 | Discussion Panel and Q&A
18:00 | Final Remarks
Other resources
- https://biocorecrg.github.io/CoursesCRG_Containers_Nextflow_May_2021/
- https://www.h3abionet.org/categories/training/introduction-to-nextflow-workshop-december-2020
- If you haven’t used the shell before: https://swcarpentry.github.io/shell-novice/
- Nextflow and Snakemake head to head comparison: https://github.com/fmaguire/amr_training_workshop_practical
- https://gitlab.com/cgps/ghru/pipelines/dsl2/pipelines
- https://github.com/Bioinfo-skills-2022-CLIMB-VM/
- https://github.com/Bioinfo-skills-2022-CLIMB-VM/Assemble-a-microbial-genome-using-containers
- https://soundcloud.com/microbinfie/bactopia-part-1
- https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
- https://bactopia.github.io/
Organising committee
- Anna Price, PriceA35@cardiff.ac.uk
- Nabil-Fareed Alikhan, nabil-fareed.alikhan@quadram.ac.uk
- Robert Petit, robert.petit@wyo.gov
- Mavis Foster-Nyarko, climb-big-data@quadram.ac.uk
- Lisa Marchioretto, lisa.marchioretto@quadram.ac.uk