<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/rss/styles.xsl" type="text/xsl"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>nf-core blog</title><description>News and updates from the nf-core community.</description><link>https://nf-co.re/</link><item><title>Making nf-core/configs strict syntax compliant</title><link>https://nf-co.re/blog/2026/configs-strict-syntax/</link><guid isPermaLink="true">https://nf-co.re/blog/2026/configs-strict-syntax/</guid><description>Announcement and recommendations for updating nf-core/configs Nextflow strict syntax compliant</description><pubDate>Thu, 05 Mar 2026 08:00:00 GMT</pubDate><content:encoded>## Introduction

Nextflow will very soon be making its new &apos;[Strict Syntax](https://www.nextflow.io/docs/latest/strict-syntax.html#preparing-for-strict-syntax)&apos; mode the default (from version 26.04).
This more restrictive way of writing Nextflow code is being implemented by the Nextflow team to improve error messages and encourage more consistent code.

As the home of best-practice Nextflow code, nf-core is therefore also pushing forward with requiring strict syntax compliance across the board - in pipelines, in modules, and now also in configs.

The core team, led by Nicolas (@nvnieuwk), has surveyed existing configs and [identified 57 configs](https://github.com/nf-core/configs/issues/1020) in the nf-core/configs repository that are not currently strict syntax compliant.
Once users on these infrastructures update to the latest version of Nextflow, their pipelines will no longer execute.

## What are we doing?

To help config maintainers and users avoid potential problems running pipelines on their infrastructure when they update to future versions of Nextflow,
we are taking a multi-pronged approach to spread awareness of the changes:

1. Notify the **config maintainers** of soon-to-be non-compliance on a dedicated [issue](https://github.com/nf-core/configs/issues/1020)
2. Post (this!) blog post for outside visibility, and cross-reference it on the nf-core Slack&apos;s [#announcements](https://nfcore.slack.com/archives/CE6P95170) channel
3. For configs without fixes, the core team will update the config description message displayed at the beginning of nf-core runs to make **users** aware and ask them to contact us
    - e.g. if the original config author is no longer responsible for the config
4. If we do not hear from config maintainers or pipeline users, we will consider deprecating the config

We have also prepared a [&apos;migration&apos; guide](/docs/tutorials/migrate_to_strict_syntax/config_strict_syntax) describing recommended fixes for common &apos;patterns&apos; of non-compliance.
Note this remains an open document, and we welcome any contributions, whether fixes, improvements, or updates!
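As an illustration, one common pattern the guide addresses is replacing custom resource-capping helper functions (function definitions are not permitted in strict config syntax) with the built-in `resourceLimits` process setting. The limits below are placeholders for a hypothetical cluster, not recommendations:

```groovy
// Before (not strict-syntax compliant): a helper function defined in the config
// def check_max(obj, type) { ... }
// process.memory = { check_max( 8.GB * task.attempt, &apos;memory&apos; ) }

// After: cap resources with the built-in resourceLimits setting instead
process {
    resourceLimits = [ cpus: 16, memory: 128.GB, time: 48.h ]
    memory = { 8.GB * task.attempt }
}
```

With `resourceLimits`, any request exceeding the limits is automatically capped, so the helper function is no longer needed.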

## Timeline

We do not have a fixed timeline for this rollout, other than the hard deadline of the Nextflow v26.04 release.

The GitHub issue has already been opened, and we will monitor the speed of responses before moving on to phases 3 and 4.

## Summary

The nf-core core team hopes that these steps will ensure continued smooth and uninterrupted execution of the community&apos;s exciting analyses and research!

If you have any questions or need any help, feel free to ask at any time on Slack in the [#configs](https://nfcore.slack.com/archives/CGRTMASKY) channel.</content:encoded></item><item><title>nf-core and AI</title><link>https://nf-co.re/blog/2026/statement-on-ai/</link><guid isPermaLink="true">https://nf-co.re/blog/2026/statement-on-ai/</guid><description>Core team statement on AI within nf-core</description><pubDate>Wed, 14 Jan 2026 09:00:00 GMT</pubDate><content:encoded>![Badly photoshopped image of a pigeon developing code on a laptop, and Maxime Garcia giving a thumbs up from the laptop screen](../../../assets/images/blog/statement-on-ai/maxime-pigeon.png)

&gt; If a pigeon were to commit good code, I would accept it

\- _Maxime Garcia (@maxulysse), Jan. 2026_

## Short summary

nf-core&apos;s policy remains unchanged. Humans are still ultimately responsible for their submitted code.

If you&apos;re using AI tools, try to stick by these guidelines:

- Keep PRs as small and focused as possible
- Avoid any unnecessary changes, such as moving or refactoring code (unless that is the explicit intention of the PR)
- Review all generated code yourself before opening a PR, and ensure that you understand it
- Engage with the community review process and expect to make revisions

## Background

As everyone reading this is most likely aware, Large Language Models (LLMs) have exploded in usage across many areas of society, usually under the umbrella term of artificial intelligence (AI).
In software development in particular, numerous tools and services have been developed and widely adopted by companies, institutes, and individual developers.

However, the use of AI/LLMs remains a contentious topic, with many conflicting opinions on their use, ethics, and implications.

Within nf-core, two discussions have recently taken place regarding the use of AI in code contributions [^1] [^2], with two main viewpoints emerging: strong enthusiasm versus scepticism.

[^1]: [https://github.com/nf-core/proposals/issues/91](https://github.com/nf-core/proposals/issues/91)

[^2]: [https://github.com/nf-core/proposals/issues/61](https://github.com/nf-core/proposals/issues/61)

Despite differing opinions and experiences, we reached a consensus on an approach that aims to be as inclusive as possible on how AI and LLMs will be treated within nf-core.

## Pros and Cons

To briefly summarise the main arguments for and against the use of AI in code contributions to nf-core:

People who were very much in favour felt that:

- Their use of AI-related tools made their development work more efficient, speeding up development
- The AI tools took over the more boring legwork, allowing them to work on more interesting tasks
- In some cases, LLM tools can help ensure greater consistency in code and quality across the community
- AI systems are ultimately just tools, no different from linters, code formatters, or other automated tools already in wide use - it&apos;s up to the human behind the keyboard not to abuse them

On the flip side, people who were more sceptical worried about:

- AI code and automated reviews having variable and inconsistent quality
- An increased risk of &apos;drive-by PRs&apos; with limited engagement by the author
- An increased risk of low-effort but large contributions with significant code changes outside the original scope
- Moral, ethical, environmental, and legal questions, such as source-data attribution, remaining unaddressed
- Ultimately, a greater waste of reviewer and community maintenance time

## nf-core&apos;s stance

The nf-core core team will not ban or prohibit the use of AI or LLM tools in code contributions, nor force any community member to use AI tools.

Our position is that the human using the AI tools remains responsible for any code they submit[^3].

[^3]: A position also adopted by other projects; a good guidance document is that of [FastAPI](https://fastapi.tiangolo.com/contributing/#closing-automated-and-ai-prs).

As with any other code, **nf-core community members reserve the right to reject contributions that do not meet our guidelines**.
This therefore includes cases where PRs, reviews, or other content have obviously been AI-generated without due care.
Accordingly, repeated drive-by or low-quality PRs will carry the same consequences as any other violation of our contribution guidelines.

We will continue to encourage community members to be open and honest, for example by acknowledging significant use of AI tools in PR descriptions.
When assisting members who wish to apply AI tools to nf-core development, the community will be mindful that we do not impact the experience of others.
For example, we will aim to not clutter the template with many AI helper files where unnecessary (e.g. `AGENT.md`, `CLAUDE.md`, `.mcp.json` etc.), but will consult with the community via blog posts and RFCs on [nf-core/proposals](https://github.com/nf-core/proposals).

We will also aim to further develop and promote best-practice guidelines and etiquette.
For example, we hope to find approaches that encourage smaller, focused, and incremental feature PRs - something that will also benefit &apos;human&apos; contributors, such as through faster review response time.

Finally, we will continue to monitor the situation and community feedback.
If at any point you have concerns about a PR or possible adoption of AI features in the community, feel free to contact the core team.

## Conclusion

To conclude, nothing changes within nf-core. Humans are still responsible for their code, regardless of the tools they use.

_This statement was drafted by James Fellows Yates with help from GitHub Copilot Pro&apos;s GPT-4.1 autocompletion. It was reviewed by the nf-core core team. However, the image is pure, handmade, artisanal (terrible) &apos;photoshopping&apos;._</content:encoded></item><item><title>Running nf-core pipelines on Arm</title><link>https://nf-co.re/blog/2026/arm-pipelines/</link><guid isPermaLink="true">https://nf-co.re/blog/2026/arm-pipelines/</guid><description>How community collaboration is making bioinformatics more efficient</description><pubDate>Mon, 12 Jan 2026 11:00:00 GMT</pubDate><content:encoded>import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;

## Running nf-core pipelines on Arm

The bioinformatics community has long relied on x86 processors, but Arm architecture is rapidly becoming a compelling alternative. Over the past year, a collaborative effort between nf-core, Arm, AWS, Seqera, and the broader Bioconda community has been working to make Arm-native execution a reality for Nextflow pipelines. This post covers what we&apos;ve learned, where we&apos;ve got to, and how you can start running on Arm today.

## Why Arm matters for bioinformatics

Arm-based processors like AWS Graviton offer significant advantages over traditional x86 chips. Our benchmarking with nf-core/rnaseq comparing `c7a.48xlarge` (x86) against `c8g.48xlarge` (Graviton) showed comparable runtimes with **20-25% cost savings**. The Arm instances delivered equivalent performance at a lower price point.

Beyond the financial benefits, Arm processors consume less power, making them a greener choice for compute-intensive workflows.

For developers, there&apos;s another compelling reason: Apple Silicon. With Arm support in Bioconda, you can now run your pipelines natively on Mac M1-M5 machines without emulation overhead. As we discovered during this project, Docker on Mac runs linux/arm64 containers natively (since Docker runs in a Linux VM), so Arm containers are actually faster than emulated x86 ones on Apple Silicon. Oxford Nanopore&apos;s benchmarking showed [4-5x speedups when running native Arm containers on Mac](https://epi2me.nanoporetech.com/inkling/) compared to emulated x86.

## The challenge: software availability

The main barrier to running bioinformatics workflows on Arm has been software availability. Most bioinformatics tools are distributed through Bioconda, which historically only built packages for x86 architectures. While Bioconda announced official support for `linux-aarch64` and `osx-arm64` in July 2024, the actual availability of Arm packages varied significantly across tools.

nf-core pipelines use hundreds of different software packages, and each one needs Arm-compatible builds before a pipeline can run on Graviton instances. This is where the collaboration began.

&lt;YouTube id=&quot;abUIFdGlv9k&quot; poster=&quot;https://i.ytimg.com/vi/abUIFdGlv9k/hqdefault.jpg&quot; /&gt;

_Brendan Bouffler and David Lecomber&apos;s Nextflow Summit 2024 talk on running pipelines on Arm architecture_

## A community-wide porting effort

In October 2024, we created the [`#arm64` channel](https://nf-co.re/join) on the nf-core Slack to coordinate efforts. [David Lecomber](https://github.com/dslarm), then at Arm, led an intensive package porting effort, working alongside [Pablo Gonzalez de Aledo](https://github.com/pabloaledo) then at Seqera and [Angel Pizarro](https://github.com/delagoya) at AWS. Between them, David and Pablo have submitted [_nearly 300_ PRs to Bioconda recipes](https://github.com/bioconda/bioconda-recipes/pulls?q=is%3Apr+author%3Apabloaledo+author%3Adslarm) to enable Arm builds.

The approach was methodical. Phil Ewels built [discovery scripts](https://github.com/ewels/nf-core-arm-discovery) to identify which conda packages were used across all nf-core pipelines, then tested whether each package could build successfully for Arm using Seqera&apos;s Wave container service. Packages that failed were prioritized based on how many pipelines they blocked.

The work involved updating `meta.yaml` files to enable Arm builds:

```yaml
extra:
    additional-platforms:
        - linux-aarch64
        - osx-arm64
```

Many packages built without issues once this flag was added. Others required more detective work. David encountered some memorable challenges along the way, including a package that failed because the `which` command had a buffer overflow bug from 1999 (it assumed paths would never exceed 256 characters - Bioconda&apos;s build directories are typically 250 characters), and another package whose official website had been compromised and replaced with content about a Colombian adult film actress.

&lt;YouTube id=&quot;4eAAzBAle8Q&quot; poster=&quot;https://i.ytimg.com/vi/4eAAzBAle8Q/hqdefault.jpg&quot; /&gt;

_Angel Pizarro&apos;s Nextflow Summit 2025 talk, all about the technical journey of porting packages to Arm_

## Current status: most pipelines are Arm-ready

The results have been remarkable. The community has systematically worked through the top nf-core pipelines, and progress has been rapid. As David Lecomber noted in the Slack channel: &quot;Pipelines are falling like dominos!&quot;

As of late 2025, **61 of the top 101 nf-core pipelines** have full Arm support, with 590 packages successfully building on Arm. Here&apos;s a snapshot of some key pipelines:

| Pipeline              | Arm Build Success        |
| --------------------- | ------------------------ |
| rnaseq                | 100%                     |
| sarek                 | 100% (excluding dragmap) |
| chipseq               | 100%                     |
| atacseq               | 100%                     |
| fetchngs              | 100%                     |
| methylseq             | 100%                     |
| taxprofiler           | 100%                     |
| scrnaseq              | 100%                     |
| nanoseq               | 100%                     |
| ampliseq              | 100%                     |
| oncoanalyser          | 100%                     |
| differentialabundance | 100%                     |

The full status across all nf-core pipelines is tracked in the [nf-core-arm-discovery repository](https://github.com/ewels/nf-core-arm-discovery), with package-level details tracked on our [GitHub project board](https://github.com/orgs/nf-core/projects/89/views/1).

## How to run your pipelines on Arm

If you&apos;re ready to try Arm, here&apos;s how to get started:

### Using Seqera Platform with AWS Graviton

1. Create a compute environment using Graviton instance types (e.g., `r8g`, `m7g`, `c7g`, `c8g`)
2. Enable Wave containers with Arm architecture support
3. Launch your pipeline as normal

The platform will automatically build Arm-native containers for your workflow.

### Using Nextflow directly

You can use Wave&apos;s multi-architecture container support with the following configuration:

```groovy
docker.enabled = true
wave.enabled = true
wave.strategy = [&apos;conda&apos;]
process.arch = &apos;linux/arm64&apos;
```

### Setting up an Arm development environment on AWS

Angel Pizarro has created a [CloudFormation template](https://github.com/delagoya/nextflow-devbox) that sets up an EC2 Graviton instance with VSCode server, Miniconda, Docker, Nextflow, and nf-core pre-installed. This makes it easy to spin up an Arm development environment in about 10-15 minutes.

You can either use the web-based VSCode running on the instance, or connect your local editor to the instance using Remote-SSH. Just be aware that if you use the Remote-SSH extension, you&apos;ll need to install the extensions you need on the remote host, such as the `nf-core-extensionpack`. Full instructions are in the [README file](https://github.com/delagoya/nextflow-devbox/blob/main/README.md).

By the way, you can also use this template to set up an x86_64 instance.

### Local development on Apple Silicon

If you&apos;re developing on a Mac with Apple Silicon, Bioconda packages now install natively. Just ensure your conda/mamba is configured correctly:

```bash
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
```

**Important**: Make sure you&apos;re specifying packages from the correct channel. A common issue we encountered was pipelines referencing packages as `bioconda::packagename` when they had migrated to `conda-forge`. This works on x86 (using old cached binaries) but fails on Arm where only the conda-forge version exists.
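In practice the fix is a one-line change in the module&apos;s Conda environment file, pointing at the channel the package now actually lives in (the package names and versions below are illustrative):

```yaml
channels:
  - conda-forge
  - bioconda
dependencies:
  - conda-forge::pigz=2.8 # was bioconda::pigz - the package migrated to conda-forge
  - bioconda::samtools=1.21
```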

## Contributing to the effort

Some packages still need work, and you can help! David Lecomber created the [bioconda-contrib-notes repository](https://github.com/dslarm/bioconda-contrib-notes) with detailed documentation on how to port packages, which has now been [integrated into the nf-core website](https://nf-co.re/docs/contributing/software_packaging/arm64_builds).

Key resources:

- [nf-core Arm64 builds documentation](https://nf-co.re/docs/contributing/software_packaging/arm64_builds)
- [GitHub project board tracking progress](https://github.com/orgs/nf-core/projects/89)
- [Bioconda aarch64 documentation](https://bioconda.github.io/developer/aarch64.html)
- [nf-core Slack #arm64 channel](https://nf-co.re/join)
- [Package status tracking](https://github.com/dslarm/bioconda-contrib-notes/blob/main/wave_missing.txt)

If you encounter a package that doesn&apos;t build for Arm, the basic process is:

1. Fork the bioconda-recipes repository
2. Add the `additional-platforms` section to the package&apos;s `meta.yaml`
3. Bump the build number
4. Open a pull request

The Bioconda community is welcoming and will help guide your contribution through review. During this project, we found that many packages just needed the platform flag added, while others required fixes for architecture-specific assumptions in the code.
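Steps 2 and 3 together amount to a small edit in the recipe&apos;s `meta.yaml` (the package name, version, and build number here are purely illustrative):

```yaml
package:
  name: examplepkg # hypothetical package
  version: 1.2.3

build:
  number: 1 # bumped from 0 to trigger rebuilds for all platforms

extra:
  additional-platforms:
    - linux-aarch64
    - osx-arm64
```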

## What&apos;s next

With software availability largely solved, the focus is now shifting to:

1. **Automated CI testing**: Edmund Miller has [merged Arm CI testing](https://github.com/nf-core/modules/pull/7747) into nf-core/modules, allowing us to routinely test modules on Arm runners. We&apos;re expanding this to more pipelines, starting with [rnaseq](https://github.com/nf-core/rnaseq/pull/1641).

2. **Tooling improvements**: The [nf-core/tools 3.4.0 release](/blog/2025/tools-3_4_0) added improved Arm64 architecture handling with dedicated `arm64` and `emulate_amd64` profiles.

3. **Performance optimization**: Some tools like bwa-mem2 have Arm-specific optimizations using SVE (Scalable Vector Extension) instructions that can provide [significant speedups on Graviton3 and above](https://github.com/bwa-mem2/bwa-mem2/pull/248).

4. **Commercial software support**: Tools like Sentieon are now [available on Arm](https://developer.arm.com/community/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/gencove-adopts-sentieon-and-aws-graviton), expanding options for production pipelines.

5. **Bioconda ecosystem health**: David has been tracking download statistics and found that of the top 100 most-downloaded Bioconda packages, 99 now work on linux-aarch64 and 93 on osx-arm64.

## The future: automatic Arm support with Seqera Containers

The package porting work described above is just the first phase. The real game-changer will be nf-core&apos;s ongoing migration from BioContainers to Seqera Containers.

You can read the full details in the two-part blog series:

- [Migration from BioContainers to Seqera Containers: Part 1](/blog/2024/seqera-containers-part-1) - Why we&apos;re making this change
- [Migration from BioContainers to Seqera Containers: Part 2](/blog/2024/seqera-containers-part-2) - How it all works

Today, running on Arm requires manual work: checking if packages are available, building containers, configuring pipelines. The package porting effort has been about getting the raw materials in place.

Once the Seqera Containers migration is complete, Arm support will be automatic. When a developer edits an `environment.yml` file in a module, GitHub Actions will automatically build containers for both `linux/amd64` and `linux/arm64`. The pipeline will ship with pre-built Arm containers ready to use.

New profiles will make running on Arm as simple as adding a flag:

```bash
# Run on Arm with Docker
nextflow run nf-core/rnaseq -profile docker_arm

# Run on Arm with Singularity
nextflow run nf-core/rnaseq -profile singularity_arm

# Run on Arm with Conda
nextflow run nf-core/rnaseq -profile conda_arm
```

The containers will be hosted on Seqera Containers infrastructure, built on-demand using Wave, with full build logs, security scans, and conda lock files for reproducibility.

This means the manual package porting work happening now is laying the foundation. Once a tool has Arm support in Bioconda, it will automatically flow through to every nf-core pipeline that uses it - no additional work required.

## Acknowledgments

This work wouldn&apos;t have been possible without the contributions of many people. Special thanks to:

- **David Lecomber (formerly Arm)** for his tireless work on package porting - submitting hundreds of PRs and creating comprehensive documentation
- **Angel Pizarro (AWS)** for benchmarking, advocacy, CloudFormation templates, and testing infrastructure
- **Brendan Bouffler (AWS)** for his belief in the project and coordinating support and resources
- **Pablo Gonzalez de Aledo (formerly Seqera)** for tooling, Wave integration, and early package porting work
- **Edmund Miller** for implementing Arm CI testing in nf-core/modules and pipelines
- **Roman Valls Guimera** for getting oncoanalyser running on Arm and extensive debugging
- **Maxime U Garcia** for pipeline fixes and coordination
- **Arm** for enabling David&apos;s time to work on this project

- The entire **Bioconda maintainer team** for their support and rapid reviews

The collaboration demonstrates what&apos;s possible when the open-source community works together toward a shared goal. What started as a hackathon project in October 2024 has grown into a sustained effort that&apos;s making Arm a standard part of the bioinformatics computing landscape.

&gt; Want to get involved? Join the [#arm64 channel](https://nf-co.re/join) on the nf-core Slack, check out the [project board](https://github.com/orgs/nf-core/projects/89), or dive into the [Arm64 documentation](https://nf-co.re/docs/contributing/software_packaging/arm64_builds). And to understand the bigger picture of where this is all heading, read our blog posts on the [Seqera Containers migration](/blog/2024/seqera-containers-part-1).</content:encoded></item><item><title>Maintainers Minutes: November 2025</title><link>https://nf-co.re/blog/2025/maintainers-minutes-2025-11-28/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/maintainers-minutes-2025-11-28/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Mon, 08 Dec 2025 09:00:00 GMT</pubDate><content:encoded>The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers) by providing brief summaries of the monthly team meetings.

## Overview

This month&apos;s maintainers meeting covered several topics around testing infrastructure, release processes, and community contributions.

## nf-test improvements

We now have four nf-core core members with write access to the nf-test repository: [Edmund](https://github.com/edmundmiller), [Sateesh](https://github.com/sateeshperi), [Nicolas](https://github.com/nvnieuwk), and [Maxime](https://github.com/maxulysse).
This increased access should help us better align nf-test development with nf-core&apos;s needs and push it forward.

We discussed priority improvements we&apos;d like to see in nf-test, including:

- [The ability to exclude tags](https://github.com/askimed/nf-test/pull/283)
- [Topic channels support](https://github.com/askimed/nf-test/issues/258)
- [Compatibility with strict syntax](https://github.com/askimed/nf-test/issues/326)
- [Local storing of downloaded files/containers/conda environments](https://github.com/askimed/nf-test/issues/231)

These enhancements will make testing more efficient across our pipelines and will allow us to remove some boilerplate code.

## Improving the release process

A significant portion of the meeting focused on making our release process more sustainable and less burdensome for maintainers.

### Trunk-based development

We discussed the proposal for [trunk-based development](https://github.com/nf-core/proposals/issues/49).
While this approach could streamline our workflow, it raises concerns about traceability. Since most beginners run pipelines without the `-r TAG` flag, they would automatically pull the most recent version, making it harder to track which version users are actually running.

### Review bottlenecks

Getting code reviews remains challenging, even for small PRs.
Maintainers often need to personally reach out to get reviews, and the exchange of reviews can be quite uneven.

Many contributors also don&apos;t feel confident judging scientific accuracy.
We emphasized that maintainers and the core team should focus primarily on the Nextflow code quality and nf-core guidelines compliance, not necessarily the underlying science, which the community around the pipeline should be responsible for.

### Automating guideline checks

For the `dev` → `main`/`master` release process, we discussed focusing on overall guideline compliance rather than re-reviewing individual PRs. Some ideas to reduce this burden include:

- Using AI to automate guideline checking
- Improving our linting tools
- Creating automated linting reports that get pushed to repository issues or PRs

The goal is to catch and fix linting issues earlier in the development process, rather than discovering them during the release.

## Infrastructure updates

[@mashehu](https://github.com/mashehu) has implemented new CI for the nf-core/test-datasets repository that now warns when files are being deleted.
This should help prevent accidental removal of test data.

## Topic channels

Exciting news - the first batch of modules with topic channels for version reporting are now merged and ready to use!
See the [blog post](https://nf-co.re/blog/2025/version_topics) and [migration guide](https://nf-co.re/docs/tutorials/migrate_to_topics/update_modules) for more information.

## The end

We will continue to work on these topics in the coming months, particularly around making the release process more sustainable and less dependent on individual maintainers.

As always, we are looking for community involvement, so we encourage anyone to join the discussion in relevant PRs and Slack channels and threads!

\- :heart: from your #maintainers team!</content:encoded></item><item><title>The Mass Spectrometry Proteomics Special Interest Group</title><link>https://nf-co.re/blog/2025/sig-massspec-proteomics-introduction/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/sig-massspec-proteomics-introduction/</guid><description>Unifying efforts for mass spectrometry data analysis in nf-core</description><pubDate>Wed, 26 Nov 2025 11:00:00 GMT</pubDate><content:encoded>We are excited to announce the formation of `massspec-proteomics`, the **Mass Spectrometry Proteomics** special interest group within nf-core!

## Introduction

As the use of Nextflow for proteomics data analysis continues to grow, so does the number of pipelines and modules being developed by the community. From established pipelines like `quantms` to newer initiatives like `mspepid` and `WOMBAT-P`, there is a wealth of expertise distributed across different teams. However, this growth also brings a challenge: how do we ensure we aren&apos;t reinventing the wheel?

The new MassSpec-Proteomics SIG has been created to serve as a central hub to coordinate these efforts. Our goal is to establish a cohesive, modular ecosystem in nf-core where components can be shared and pipelines can be interoperable.

## Our Vision

The group&apos;s primary focus is to establish guidelines and standards tailored for mass spectrometry data analysis. We are moving towards a vision where specific data types, such as **DDA** (Data-Dependent Acquisition), **DIA** (Data-Independent Acquisition), and **TMT** (Isobaric Labeling), are handled by dedicated, thin, and modular pipelines that can be easily chained together.

To achieve this, we are prioritizing:

1. **Shared Components:** Systematically refactoring and contributing high-quality modules and subworkflows (e.g., from `quantms` and for tools like FragPipe) to the official `nf-core/modules` repository.

2. **Standardization:** Establishing best practices for QC and benchmarking to ensure all proteomics pipelines meet the same high standards.

3. **Collaboration:** Providing a forum for developers to align on roadmaps and technical solutions.

## Join Us!

Whether you are a developer working on a specific tool, a bioinformatician building pipelines, or a user with feedback on current workflows, we want to hear from you.

- **Slack:** Join the discussion in the `#massspec-proteomics` channel on the [nf-core Slack](https://nf-co.re/join).

- **Meetings:** We will be organizing regular virtual meetings to discuss technical challenges and roadmap planning. Keep an eye on the Slack channel for the schedule.

We look forward to building the future of nf-core proteomics with you!</content:encoded></item><item><title>nf-core/tools - 3.5.0</title><link>https://nf-co.re/blog/2025/tools-3_5_0/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/tools-3_5_0/</guid><description>Hot topic release</description><pubDate>Wed, 19 Nov 2025 10:00:00 GMT</pubDate><content:encoded>This release is small in scope, but introduces major changes to the syntax in nf-core/modules.
These changes are part of the upcoming Nextflow syntax changes, which will be delivered to pipelines gradually through template updates.
For more information on the Nextflow syntax adoption in nf-core pipelines, you can read [the blogpost](https://nf-co.re/blog/2025/nextflow_syntax_nf-core_roadmap) detailing the roadmap.

## Topic channels for version handling in modules

With a heroic push during the hackathon led by [@nvnieuwk](https://github.com/nvnieuwk), we switched to [topic channels](https://www.nextflow.io/docs/latest/reference/channel.html#topic) for version handling in nf-core/modules.
This means we don&apos;t write the tool versions to a `versions.yml` file anymore, but instead use simpler channel logic to broadcast and collect the software versions of the tools used in modules.

The main change happens in the `main.nf` files:

```groovy title=&quot;main.nf&quot;
output:
tuple val(meta), path(&quot;*.html&quot;), emit: html
tuple val(meta), path(&quot;*.zip&quot;) , emit: zip
path  &quot;versions.yml&quot;           , emit: versions // [!code --]
tuple val(&quot;${task.process}&quot;), val(&apos;fastqc&apos;), eval(&apos;fastqc --version | sed &quot;/FastQC v/!d; s/.*v//&quot;&apos;), emit: versions_fastqc, topic: versions // [!code ++]
︙
cat &lt;&lt;-END_VERSIONS &gt; versions.yml // [!code --]
&quot;${task.process}&quot;: // [!code --]
    fastqc: \$( fastqc --version | sed &apos;/FastQC v/!d; s/.*v//&apos; ) // [!code --]
END_VERSIONS // [!code --]
```

We updated the modules template, so if you run `nf-core modules create{:bash}` you will get the new syntax.
For now nf-core linting accepts both ways to collect versions, but gives a warning if you use the old syntax.
In your pipeline you can mix and match modules with the new and old syntax.

:::note
Note that the channel called `versions` has been renamed to `versions_&lt;TOOL NAME&gt;`, so you will have to adapt the channel names if you want to continue using the old logic for collecting versions.
:::
We will slowly migrate all nf-core modules to the new syntax.

For more information on how to migrate and handle these new modules, please refer to the [migration guide](/docs/tutorials/migrate_to_topics/update_modules).
For more information about topic channels in general, we have a dedicated [blogpost about it](/blog/2025/version_topics).
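
In the pipeline, all of these per-module emissions can then be gathered from the topic in a single place. Below is a minimal sketch of the idea only (not the exact nf-core implementation), assuming a Nextflow version with topic channels enabled and a hypothetical `FASTQC` invocation:

```groovy title=&quot;main.nf&quot;
workflow {
    FASTQC(ch_reads)

    // One collection point replaces the per-module versions.yml mixing.
    channel.topic(&apos;versions&apos;)
        .map { process, tool, version -&gt; &quot;${process}:\n    ${tool}: ${version}&quot; }
        .collectFile(name: &apos;versions.yml&apos;, newLine: true)
}
```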

We outlined future adoption of new Nextflow syntax elements in the nf-core infrastructure talk during the Nextflow Summit 2025 and in an upcoming blog post.

## AWS bug fix

With nf-core/tools v3.4.0, we introduced a bug in the `nextflow.config` for pipelines using iGenomes, leading to permission errors when querying AWS S3 buckets.
This has been fixed in this release.

## Changes on sync PRs

When a new release of nf-core/tools is out, all nf-core pipelines receive a sync PR with the updates in the template.
Before, these sync PRs were closed if they were not merged when a new one was opened.
Now, we will keep all PRs open. This will make it easier to do incremental template updates, which will help bring older pipelines up to date.

As a new detail, the sync PRs will also include a link to the tools release blogpost,
to make it easier for pipeline maintainers to access documentation about the updates.

To learn more about these decisions, you can read the above mentioned [blogpost about the Nextflow syntax roadmap for nf-core](https://nf-co.re/blog/2025/nextflow_syntax_nf-core_roadmap).

## Patch release 3.5.1

Before we actually started the automated template sync, we found a small bug in the updated version of `nf-core pipelines sync`, which was fixed in 3.5.1.
These are the only changes in this patch release.

## Changelog

You can find the complete changelog and technical details [on GitHub](https://github.com/nf-core/tools/releases/tag/3.5.0).

As always, if you have any problems or run into any bugs, reach out on the [#tools slack channel](https://nfcore.slack.com/archives/CE5LG7WMB).

## Resolving conflicts on pipeline sync PRs

With this release, there are only a few merge conflicts, but they are still worth mentioning:

### Changing `Channel` to `channel`

As part of the new Nextflow strict syntax, we have changed all the mentions of `Channel` to `channel` (note the lower case `c`).
This might have introduced some merge conflicts in your `*.nf` files if you have modified them from the template,
for example in `subworkflows/local/utils_nfcore_$PIPELINE_pipeline/main.nf`.
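
As an illustration of the kind of change involved, a bulk search-and-replace along these lines can help (a hedged sketch using GNU sed — review the resulting diff afterwards, since `Channel` may also appear in strings or comments):

```shell
# Lowercase the channel factory in all .nf files (GNU sed, in-place edit).
# The \b word boundaries avoid touching identifiers that merely contain
# the word, e.g. MyChannelHelper.
find . -name '*.nf' -exec sed -i 's/\bChannel\b/channel/g' {} +
```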

#### Resolution

Keep your changes, and afterwards change all mentions of `Channel` to `channel`.</content:encoded></item><item><title>Topic: Topics</title><link>https://nf-co.re/blog/2025/version_topics/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/version_topics/</guid><description>How to adopt Topic channels in your favorite nf-core module</description><pubDate>Wed, 19 Nov 2025 10:00:00 GMT</pubDate><content:encoded>import topicspooh from &quot;../../../assets/images/blog/hackathon-topic/topics_pooh.jpg&quot;;
import { Image } from &quot;astro:assets&quot;;

:::tip{title=&quot;TLDR;&quot;}

- We are adopting topic channels to replace the versions.yml.
- For now both options are valid, but this will change.
- No more mixing of version channels.
  :::

&lt;Image
    src={topicspooh}
    class=&quot;d-block m-auto mb-2&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Winnie the Pooh meme on topics usage. Upper winnie is sad working on some topic on the hackathon, lower winnie is sophisticated as working on the topic channels&quot;
/&gt;

# What are topic channels?

Say goodbye to writing a `versions.yml` for each module and mixing the channels together. To address this, we are adopting topic channels.
Topic channels can collect values from multiple sources automatically, just by sharing the same topic name.
This means no more tangled channel wiring or mix operator chains. With topic channels,
you can now broadcast and collect data across your pipeline more naturally.

```groovy title=&quot;main.nf&quot;
process hello {
  output:
  val(&apos;hello&apos;), topic: my_topic

  // ...
}

process bye {
  output:
  val(&apos;bye&apos;), topic: my_topic

  // ...
}
```
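
Assuming the first process above is named `hello`, a workflow can then consume everything published to the topic in one place, with no explicit wiring between the processes (a minimal sketch):

```groovy title=&quot;main.nf&quot;
workflow {
    hello()
    bye()

    // Receives both &apos;hello&apos; and &apos;bye&apos;, without any mix() calls.
    channel.topic(&apos;my_topic&apos;).view()
}
```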

See also the [channels documentation](https://www.nextflow.io/docs/latest/reference/channel.html#topic) and the [migration tutorial](https://nf-co.re/docs/tutorials/migrate_to_topics/update_modules).

# How to update your module

:::tip{title=&quot;Workflow&quot;}

1. Adapt main.nf
2. Perform semi-automatic meta.yml update
3. Catch version in tests/main.nf.test
4. Update snapshot
5. Fix dependent nf-core/subworkflows
   :::

Let&apos;s get to work: grab any nf-core module and follow along.

## 1. Adapt main.nf

No need to write to `versions.yml` files anymore; instead, we emit a topic channel in the output section.

```diff title=&quot;main.nf&quot;
output:
tuple val(meta), path(&quot;*.html&quot;), emit: html
tuple val(meta), path(&quot;*.zip&quot;) , emit: zip
-path  &quot;versions.yml&quot;           , emit: versions
+tuple val(&quot;${task.process}&quot;), val(&apos;fastqc&apos;), eval(&apos;fastqc --version | sed &quot;/FastQC v/!d; s/.*v//&quot;&apos;), emit: versions_fastqc, topic: versions
︙
-cat &lt;&lt;-END_VERSIONS &gt; versions.yml
-&quot;${task.process}&quot;:
-    fastqc: \$( fastqc --version | sed &apos;/FastQC v/!d; s/.*v//&apos; )
-END_VERSIONS
```

## 2. Semi-automatic meta.yml update

Update the `meta.yml` conveniently using nf-core/tools, then fill in the version information:
`nf-core modules lint &lt;MODULE_NAME&gt; --fix{:bash}`

```yaml title=&quot;meta.yml&quot;
  versions_fastqc:
    - - ${task.process}:
          type: string
          description: The process the versions were collected from
      - fastqc:
          type: string
          description: The tool name
      - fastqc --version | sed &quot;/FastQC v/!d; s/.*v//&quot;:
          type: eval
          description: The command used to generate the version of the tool
topics:
    versions:
        - - ${task.process}:
              type: string
              description: The process the versions were collected from
          - fastqc:
              type: string
              description: The tool name
          - fastqc --version | sed &quot;/FastQC v/!d; s/.*v//&quot;:
              type: eval
              description: The command used to generate the version of the tool
```

## 3. Catch version in tests/main.nf.test

This might be a little more tricky for some of your modules, but a good start to catch the version in the snapshot is:

```groovy title=&quot;main.nf.test&quot;
process.out.findAll { key, val -&gt; key.startsWith(&quot;versions&quot;)}
```
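
Wrapped in a snapshot assertion, this could look like the following (a sketch only; adapt it to the `then` block your test already has, and keep snapshotting the regular outputs too):

```groovy title=&quot;main.nf.test&quot;
then {
    assert process.success
    // Snapshot the regular outputs plus whatever versions_* channels exist.
    assert snapshot(
        process.out.html,
        process.out.zip,
        process.out.findAll { key, val -&gt; key.startsWith(&quot;versions&quot;) }
    ).match()
}
```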

## 4. Update snapshot

As no `versions.yml` exists anymore, you will have to update the snapshot.

## 5. Fix dependent nf-core/subworkflows

All nf-core/subworkflows that call your module will now fail, because no `versions.yml` is created to use the `mix` operator on.
You should therefore remove all occurrences of:

```groovy title=&quot;main.nf&quot;
ch_versions = ch_versions.mix(FASTQC.out.versions.first())
```
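
Instead of mixing, the calling pipeline can gather everything from the topic once (a sketch, with `MY_SUBWORKFLOW` standing in for your actual subworkflow):

```groovy title=&quot;main.nf&quot;
workflow {
    MY_SUBWORKFLOW(ch_input)

    // All modules publish to the &apos;versions&apos; topic, so one collection suffices.
    channel.topic(&apos;versions&apos;).view()
}
```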

Make sure to update the snapshots of the subworkflows as well.</content:encoded></item><item><title>The nf-core roadmap for adopting the new Nextflow syntax</title><link>https://nf-co.re/blog/2025/nextflow_syntax_nf-core_roadmap/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/nextflow_syntax_nf-core_roadmap/</guid><description>Not DSL3</description><pubDate>Mon, 17 Nov 2025 10:00:00 GMT</pubDate><content:encoded>## Upcoming changes to Nextflow syntax

During the Nextflow Summit 2025, Ben Sherman introduced the changes coming to Nextflow in the next versions.
There is also a [Nextflow blogpost](https://seqera.io/blog/nextflow-updates-strict-syntax-data-lineage/) talking about the changes that we will start seeing within the next Nextflow releases.

As always, in nf-core, we try to keep up-to-date with the latest Nextflow features and standards.
For this reason, we have planned the best way to deliver all these goodies to our nf-core pipelines through template updates,
while making it as easy as possible for pipeline maintainers to adopt them.

## nf-core syntax adoption

The syntax changes will be introduced into Nextflow gradually, and we will adopt them gradually in nf-core as well:

For each change, we will wait until the following release before adopting it in nf-core.

1. First, we will allow the new functionalities in the pipeline template, and nf-core linting will warn you if you are using the old syntax (without failing!).
2. Then, we will switch linting to a failure for old Nextflow syntax, making it mandatory to update your pipeline to the new one.

Here is a detailed roadmap of the incoming changes:

![Line diagram outlining the different planned adoptions](../../../assets/images/blog/nextflow_syntax_nf-core_roadmap/timeline-light-bg.png)

These are the changes that will be (or have been) included in each Nextflow version:

- Nextflow v25.04:
  - topic channels
  - workflow outputs (preview)
- Nextflow v25.10:
  - workflow params
  - workflow outputs
  - workflow `onComplete:` and `onError:` section
  - type annotations
  - typed process inputs/outputs
- Nextflow v26.04:
  - record types
  - dataflow
  - typed processes
  - strict type checking

In nf-core, we will adopt these changes in the following timeline:

- 4th quarter 2025:
  - Topic channels are allowed (nf-core tools version 3.5.0 released in November 2025)
- 2nd quarter 2026:
  - Topic channels are mandatory
  - Strict syntax is mandatory
  - Static types and records are allowed
  - New process syntax is allowed
- 4th quarter 2026:
  - Static types and records are added to the pipeline template
  - New process syntax is added to the pipeline template
- 2nd quarter 2027:
  - Static types and records are mandatory
  - New process syntax is mandatory

## How to adopt all these changes in a pipeline

In order to make it easier for maintainers to adopt Nextflow syntax changes,
we have implemented a couple of new changes to the pipeline template sync process:

1. The template sync PRs that are opened to pipeline repositories with every new nf-core/tools will remain open,
   even if a new tools release is made.
   - This will allow you to choose if you want to add these changes one at a time,
     or wait until you receive multiple tools releases and do a single template update in bulk.
   - We recommend that you keep up to date with template updates,
     to have smaller changes and make the update and review process smoother.
2. All sync PRs will include a direct link to the nf-core/tools release blog post.
   - These blog posts include a description of the changes made in the release,
     and tips on how to implement these changes in your pipeline,
     as well as guidance on how to resolve merge conflicts.

## Where to ask for help

See the following links to learn more or get help with these updates:

- The [nf-core blog post](https://nf-co.re/blog) for each tools release
- The [Nextflow documentation](https://www.nextflow.io/docs/latest/strict-syntax.html)
- The [nf-core help desk hours](https://nf-co.re/blog/2024/helpdesk)
- The [nf-core bytesize talks](https://nf-co.re/events/bytesize)
- The [nf-core Slack](https://nf-co.re/join) or [Seqera community](https://community.seqera.io/)
import aifur from &quot;@assets/images/blog/retreat-2025/aifur.jpeg&quot;;

Between September 22nd and 26th, the [nf-core core team](https://nf-co.re/governance#core-team) gathered in Stockholm to tackle the big questions facing our growing community.
Over four days, we worked through challenges around scaling, developer experience, and sustainability, and came away with a clear set of priorities for the year ahead.

:::tip{title=&quot;TLDR;&quot;}

- Get help writing grants for nf-core work! Join the new [`#grants`](https://nfcore.slack.com/archives/C09JX8L2L5V) Slack channel.
- Pipeline-release PR review requirements dropping to 1 core/maintainer.
- More emphasis on adherence to PR best practices.
- New nf-core governance team: 📚 Documentation! 🎉 Join [`#team-docs`](https://nfcore.slack.com/archives/C09H9L3R614) to get involved.
  :::

# What did we talk about?

We covered a wide range of topics during the retreat, from governance structure and hackathon strategy to documentation overhaul and technical infrastructure.
Here are the key topics we discussed and the initiatives we&apos;re launching as a result.

## Governance and teams

As nf-core continues to grow, we need to distribute responsibilities more effectively. We discussed team functions and identified new members to join existing teams, with a particular focus on bringing in contributors from regions underrepresented in the community. We&apos;re also establishing an on-call process for core team members to ensure better response times and clearer ownership of ongoing tasks. Spreading the workload more evenly creates more opportunities for upskilling and community leadership across a truly global team, particularly important as we continue to scale.

:::success{.fa-hand-wave title=&quot;New team members&quot;}
We would like to welcome [Kurayi](https://github.com/KurayiChawatama) to the outreach team, and [Kübra](https://github.com/kubranarci), [Arthur](https://github.com/awgymer), and [Jim](https://github.com/prototaxites) to the maintainers team!
:::

One major outcome from these discussions was the creation of a new documentation team.
Community feedback made it clear that our documentation is a significant pain point, and we need a dedicated initiative to address this properly. You can read more about the team&apos;s mission and the plans for [Documentation: v2](#documentation-v2) below.

```mermaid
flowchart BT
    core --&gt; steering
    infrastructure --&gt; core
    outreach --&gt; core
    maintainers --&gt; core
    safety --&gt; steering
    documentation[documentation ⭐ NEW] --&gt; core

    style documentation fill:#22ae6340,stroke:#22ae63,stroke-width:2px
```

## Documentation: v2

Survey results and community feedback made it clear: our documentation needs work. People told us it&apos;s hard to find, sometimes unclear, and occasionally outdated.

We&apos;re establishing [`#team-docs`](https://nfcore.slack.com/archives/C09H9L3R614) with a mandate to rebuild our docs from the ground up. This v2 docs project will implement a style guide, create a more effective structure, write new content while selectively migrating existing pages, and explore versioning (e.g., for component specifications) and automation options.

The result will be documentation that&apos;s easier to find, more consistent and readable, creates a lower barrier to entry for new users, and provides more stability for experts. This will provide clear answers to common questions and free up our Slack community for more nuanced and/or technical questions and discussions (and memes).

See [Documentation](https://nf-co.re/governance#documentation) for more information about the team.

:::success{.fa-hand-wave title=&quot;Join the team!&quot;}
We are actively looking for new documentation team members. Join [`#team-docs`](https://nfcore.slack.com/archives/C09H9L3R614) and reach out to Chris and James to find out more.
:::

## Improving review and development practices

Reviews are critical for maintaining quality, but we&apos;ve heard your frustration. Overwhelmingly large PRs and multiple rounds of review can slow things down, and reviews often fall on the same people.

We&apos;re making several key changes, including:

- Reducing the requirements for pipeline release reviews from two reviewers to one (core team member or maintainer).
- Promoting [conventional](https://www.conventionalcommits.org/en/v1.0.0/) PR titles to enable automated changelogs.
- Adding more release automations.
- Adding CI warnings for excessively large PRs.

These changes will hopefully enable faster development cycles and quicker releases, allowing improvements to reach users more quickly. Review work will be more appealing, distributed more equitably across the community, helping to prevent burnout and ensure a fair workload distribution. New contributors will find it easier to get started and see their contributions merged, while we maintain the quality standards that make nf-core pipelines trustworthy.

See the full [RFC](https://github.com/nf-core/proposals/issues/85) for more information.

:::success{.fa-face-smile title=&quot;Thanks&quot;}
A special thank you to Johan Dahlberg from Pixelgen for joining us for these discussions.
:::

## Very Important Pipelines (VIPs)

Some nf-core pipelines address urgent scientific needs and, through sheer popularity, reach a scale where a single maintainer or small team struggles to keep up. We developed the concept of &apos;VIPs&apos; _(very important pipelines)_, pipelines that adopt team-based development practices, including co-developers, regular development meetings, and documented collaborative maintenance processes.

We&apos;re developing guidance for these practices and will reach out to maintainers of widely used pipelines to suggest adopting them. This approach reduces the burden on individual maintainers, creates more sustainable development of critical pipelines, and lowers the bus factor for important infrastructure.

See the full [RFC](https://github.com/nf-core/proposals/issues/89) for more information.

## Setting clearer pipeline development expectations

Several pipeline proposals are getting stuck in review, and some pipelines have gone silent with unclear progress. Whilst people&apos;s intentions are unfailingly good, this can have a detrimental effect, as it blocks others from starting a similar pipeline under their own steam.

Pipeline proposals will soon require two new components: a description of the MVP (Minimal Viable Pipeline) and a development plan with defined milestones. The idea of the MVP is to describe the smallest and simplest version that can be released initially, and the development plan puts that in the bigger picture. These requirements help developers release their first version quickly and prevent pipelines from stalling during extended development periods. Starting with smaller initial releases and providing regular updates also makes the review process faster and easier for both developers and reviewers.

## Infrastructure roadmap

The infrastructure team has an ambitious [roadmap](https://github.com/orgs/nf-core/projects/73) for the coming year. Key projects include:

- A new locally-running [schema builder](https://github.com/nextflow-io/schema-builder) with improved privacy.
- A [config builder](https://github.com/nf-core/tools/issues/2731).
- Progressive Nextflow syntax adoption (see below).
- Migration to Seqera Containers across modules (see Migration from Biocontainers to Seqera Containers: [Part 1](https://nf-co.re/blog/2024/seqera-containers-part-1) and [Part 2](https://nf-co.re/blog/2024/seqera-containers-part-2) for details).
- Expanding Pulumi automation for infrastructure and [nf-core operations](https://github.com/nf-core/ops).
- Adding [`nf-prov`](https://registry.nextflow.io/plugins/nf-prov) and [`nf-co2footprint`](https://registry.nextflow.io/plugins/nf-co2footprint) tracking to the pipeline template.
- Battle-testing template updates with volunteer pipelines before wide deployment ⚔️

These improvements will deliver better tooling, clearer migration paths for new features, more reliable infrastructure, and reduced friction when updating templates.

## Adopting new Nextflow syntax

Nextflow is evolving, and we need to strike a balance between timely adoption and not changing requirements too frequently.

To manage this, we&apos;ve developed a phased rollout plan:

- In October 2025, we&apos;ll start to introduce topics for modules and pipelines and strict syntax for modules.
- In March 2026, we&apos;ll start to introduce strict syntax for pipelines with workflow outputs and inputs.
- In March 2027, we&apos;ll start to introduce static types with the new process syntax.

```mermaid
%%{init: {&apos;themeVariables&apos;: {&apos;fontSize&apos;:&apos;10px&apos;, &apos;sectionBkgColor&apos;:&apos;rgba(128, 128, 128, 0.5)&apos;, &apos;sectionBkgColor2&apos;:&apos;rgba(128, 128, 128, 0.5)&apos;, &apos;altSectionBkgColor&apos;:&apos;rgba(128, 128, 128, 0.0)&apos;}, &apos;gantt&apos;: {&apos;useWidth&apos;:600, &apos;topAxis&apos;: false}}}%%
gantt
    title Phased rollout
    dateFormat YYYY-MM
    tickInterval 12month
    axisFormat %Y

    section Phase 1
    Topics for pipelines        :2025-10, 12M
    Strict syntax for modules   :2025-10, 12M

    section Phase 2
    Strict syntax for pipelines :2026-03, 12M
    Workflow outputs and inputs :2026-03, 12M

    section Phase 3
    Static types process syntax :2027-03, 12M
```

We&apos;ll communicate these changes clearly through blog posts and documentation well in advance. This approach gives visibility into what&apos;s changing and when, allowing time to adapt without feeling rushed, while also offering the option to jump ahead with an all-in-one migration.

## Making it easier to ask for help

We&apos;ve heard from some people that they&apos;re able to contribute time, but it comes in short bursts.
They want to contribute, but can&apos;t sign up for long-term maintenance.

To help these individuals contribute to nf-core, we&apos;re developing an issue template that pipeline maintainers can use specifically for requesting assistance from others, such as for pipeline “stretch goals” or relatively standalone maintenance tasks.

We&apos;ll also be looking to better document the various ways people can contribute their time and expertise to nf-core, and exploring AI to facilitate better, more descriptive issue generation.
Taken together, we hope that these efforts will create better visibility of where help is needed, making it easier to match contributors with appropriate tasks, and ensuring that we maintain quality standards while welcoming contributions from anyone with the skills and time to help.

## Funding and grants

There are funding opportunities available for open-source projects, but it&apos;s not always clear how to access them to support nf-core activities. We want to empower community members to apply for grants to dedicate focused work to and with nf-core, and support them in doing so.

To this end, we&apos;re adding community guidance for applying for grants related to nf-core, including relevant boilerplate stats and text for applications. We are also creating a new [`#grants`](https://nfcore.slack.com/archives/C09JX8L2L5V) Slack channel for funding support, discussions, coordination, and identifying cross-community collaboration opportunities. This creates easier access to funding opportunities and a stronger, more visible community.

# Looking Ahead

These four days in Stockholm were incredibly productive, and we&apos;re excited about the changes coming to nf-core. Every initiative we discussed was driven by one goal: making nf-core better for everyone — whether you&apos;re a new user trying to run your first pipeline, a developer contributing modules, or a maintainer shepherding a large project.

Stay tuned to the [`#announcements`](https://nfcore.slack.com/archives/CE6P95170) channel in Slack and the blog for updates as these initiatives roll out. And as always, feel free to reach out and thank you to everyone who contributes to making nf-core what it is.

&lt;Image src={aifur} alt=&quot;Much fun was had at the viking-themed restaurant Aifur in Stockholm.&quot; /&gt;

And yes, we maintained the Stockholm tradition of visiting the viking-themed restaurant Aifur. Much planning and some silliness ensued.</content:encoded></item><item><title>nf-core/tools - 3.4.0</title><link>https://nf-co.re/blog/2025/tools-3_4_0/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/tools-3_4_0/</guid><description>Container regex-it</description><pubDate>Thu, 16 Oct 2025 10:00:00 GMT</pubDate><content:encoded>This release brings significant improvements to the download command, a new devcontainer setup, and better ARM64 architecture handling.
As always, if you have any problems or run into any bugs, reach out on the [#tools slack channel](https://nfcore.slack.com/archives/CE5LG7WMB).

# Highlights

- [Refactored download command](#refactored-download-command)
- [New devcontainer setup](#new-devcontainer-setup)
- [Improvements in ARM64 architecture handling](#improvements-in-arm64-architecture-handling)
- [CLI convenience improvements](#cli-convenience-improvements)

## Refactored download command

The `nf-core download` command has received a major overhaul (see the [blog post](/blog/2025/refurbushing-the-pipeline-download) for more details about the motivation and new approach), bringing several powerful new features and improvements:

#### Nextflow inspect for container discovery

The download command now uses `nextflow inspect` to discover containers used in pipelines, replacing the legacy regex-based approach.
This provides more accurate and reliable container detection.

:::note
This requires Nextflow version 25.04.04 or later.
:::

The old regex-based container discovery could sometimes miss containers or pick up false positives.
With `nextflow inspect`, Nextflow itself tells us exactly which containers are used, making the process more robust and future-proof.

#### Docker tar archive support

You can now download Docker images directly into tar archives, making it easier to transfer and deploy pipelines in air-gapped environments:

```bash /--compress tar/
nf-core download &lt;pipeline&gt; --container-system docker --compress tar
```

## New devcontainer setup

With the [sunsetting of Gitpod](https://ona.com/stories/gitpod-classic-payg-sunset) (😢) we recommend using GitHub Codespaces instead.

We therefore updated the devcontainer configuration to make the experience as seamless as possible.

:::note
If you want to run a Nextflow pipeline, we recommend using `-profile singularity` when running in a codespace. Docker is not fully supported yet, and it can lead to unexpected behaviours.
:::

## Improvements in ARM64 architecture handling

The template now includes better support for ARM64 architectures:

- The generic `arm` profile has been replaced with a more specific `arm64` profile.
- A new `emulate_amd64` profile has been added for running AMD64 containers on ARM64 systems using emulation, which is better suited for Apple Silicon.

This provides clearer options for pipeline users working with Apple Silicon Macs and ARM64 servers.

## CLI convenience improvements

### `modules lint` and `modules bump-versions` all sub-tools

If you want to bump the version of a series of modules which are based on the same tool, e.g., samtools, you can now just specify the tool name and the same command will be run on all sub-tools:

```bash
nf-core modules lint samtools
nf-core modules bump-versions samtools
```

Thank you [@nh13](https://github.com/nh13) for adding this feature!

### Command short-hands

All main commands now have shorter aliases:

- `nf-core pipelines{:bash}` -&gt; `nf-core p{:bash}`
- `nf-core modules{:bash}` -&gt; `nf-core m{:bash}`
- `nf-core subworkflows{:bash}` -&gt; `nf-core s{:bash}`
- `test-datasets{:bash}` -&gt; `nf-core tds{:bash}`

Thanks to the new version of [rich-click](https://ewels.github.io/rich-click/1.9/blog/2025/09/16/version-1.9/) for adding this feature (and nicer theming)!

# Pipeline template changes

This release also includes some changes to the pipeline template.

### CI

- We switched the `download_pipeline.yml` action to use the latest version of nf-core/tools instead of `dev`.
- We fixed the failing `aws{full}test.yml` action by switching to organization-wide variables as inputs.

### Release announcements

Mastodon announcements now include the pipeline description, providing more context when pipelines are released.

# Miscellaneous

## Modules linting

Modules can now use Nextflow&apos;s `exec:` blocks and we lint for possible conflicts with `shell:` blocks.

Thanks to [@muffato](https://github.com/muffato) for implementing this feature!

## Linting

The linter now uses the organization specified in `.nf-core.yml` when checking manifest names, homepages, and MultiQC config comments.
This makes it easier to fork and customize nf-core pipelines for your own organization while maintaining proper attribution.

Thank you [@rrahn](https://github.com/rrahn) for implementing this feature!

## Updated dependencies

- **python**: 3.9 reached end of life, so we set 3.10 as the minimum version and also expanded to support Python 3.14
- **nf-schema**: bumped to 2.5.0 with improved help message creation for future Nextflow versions
- **Minimum Nextflow version bumped to 25.04.0** to support the new download logic

# Changelog

You can find the complete changelog and technical details [on GitHub](https://github.com/nf-core/tools/releases/tag/3.4.0).

# Patch release 3.4.1

We found two small bugs (a whitespace error in `nextflow.config` and a faulty devcontainer setup for pipelines) and fixed them in 3.4.1.

# Getting help to update your pipeline template

All the template changes are listed in [the Template section of the changelog](https://github.com/nf-core/tools/releases/tag/3.4.0).
Below, we have collected the most common merge conflicts that you may find and how to resolve them.

### \*.nf.test.snap - nf-test snapshots

##### Changes

If you use MultiQC in your pipeline, your nf-test snapshots will be outdated, because we updated the MultiQC version to 1.31.

##### Resolution

Regenerate the snapshots.

### nextflow.config

##### Changes

- The generic `arm` profile has been replaced with a more specific `arm64` and `emulate_amd64` profile for better ARM64 architecture handling.
- We also bumped the Nextflow version.
- The gitpod profile has been removed.

##### Resolution

- If you had custom changes in your `arm` profile, move them over to the `arm64` profile, otherwise accept the changes from the template.
- Accept the new Nextflow version, unless yours is already higher.
- Accept the removed gitpod profile

### .github/workflows/awstest.yml and .github/workflows/awsfulltest.yml

##### Changes

We replaced many of the `secrets.` variables with organization-wide `vars.` variables.

##### Resolution

Accept the new variable references (`vars.` instead of `secrets.`).
If you previously changed one of these values or added custom changes to these workflows, keep both the new variable references and your modifications.

### .nftignore

##### Changes

Changes related to the MultiQC update have been made to this file.

##### Resolution

Accept all changes from the template and keep any additional patterns you added.
Be sure not to add an empty line, otherwise pre-commit will complain.

### main.nf

##### Changes

New inputs have been added and the `PIPELINE_INITIALIZATION` subworkflow has been updated.

##### Resolution

Accept new inputs and ensure the `PIPELINE_INITIALIZATION` subworkflow changes are integrated.
Keep any custom logic you added while merging the template updates.

### .gitpod.yml

##### Changes

Gitpod configuration has been removed in favor of GitHub Codespaces with devcontainers.

##### Resolution

You can safely delete the `.gitpod.yml` file.

### subworkflows/local/utils\_nfcore\_\*\_pipeline/main.nf

##### Changes

Utility subworkflows have been updated.

##### Resolution

Accept changes from the template for the local utility subworkflows to ensure compatibility with the latest nf-core standards.</content:encoded></item><item><title>Running nf-core pipelines on Google Colab</title><link>https://nf-co.re/blog/2025/nf-core-colab-guide/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/nf-core-colab-guide/</guid><description>A guide to running and interacting with nf-core pipelines using Colab and VS Code</description><pubDate>Tue, 02 Sep 2025 11:00:00 GMT</pubDate><content:encoded>import { Image } from &quot;astro:assets&quot;;
import Profile from &quot;@components/GitHubProfilePictureExtended.astro&quot;;
import colabmeme from &quot;@assets/images/blog/nf-core-colab/colab-meme.jpg&quot;;
import disconnectmeme from &quot;@assets/images/blog/nf-core-colab/colab-disconnected-meme.png&quot;;
import sleepmeme from &quot;@assets/images/blog/nf-core-colab/dont-fall-asleep-meme.png&quot;;

## Running nf-core pipelines in Google Colab

Running and developing nf-core pipelines can be computationally intensive, requiring resources not easily available to students, newcomers, or participants in hands-on training workshops.
Google Colab provides an affordable, sometimes even free, and accessible way to leverage powerful cloud hardware for computational tasks, making it an attractive option for students, researchers, and anyone with limited local resources.

To make it easier for people to access such resources, we have just published a detailed new tutorial for running and developing nf-core/Nextflow pipelines in Google Colab, available at [this link](https://nf-co.re/docs/tutorials/google_colab/nf-core-colab-guide)!

In this blog post, we share our own experiences of running and developing Nextflow and nf-core pipelines in Colab, describing the pros and cons that motivated the creation of the tutorial.

### Why run pipelines in Google Colab

&lt;Image
    src={colabmeme}
    class=&quot;d-block m-auto&quot;
    width={600}
    density={[1.5, 2]}
    alt=&quot;Butch eats GPUs from the garbage while Tom is handfed in luxury&quot;
/&gt;

Google Colab provides free credits for basic computational infrastructure, making it an interesting option for people with limited resources to run computational workflows.
Furthermore, for a small subscription fee, you can even access the latest hardware each year at a fraction of the cost of buying a new PC.

As bioinformatics and data science tasks become more resource-intensive, Colab offers a potentially cost-effective solution for learning and developing large-scale pipelines like those built with Nextflow.
If you’re a student or developer working on a laptop, Colab can dramatically speed up your workflow.
For example, you can write and test your code locally, then run it in Colab to take advantage of faster execution and larger memory—saving time and reducing frustration from crashes on limited hardware.

Colab is also ideal for training workshops where participants may not have access to high-performance computing (HPC) clusters.
Instructors can use real-world, large datasets instead of toy examples, thus giving everyone hands-on experience with industry-scale workflows.

Ultimately, using Colab can help democratize access to advanced pipeline development and best practices, enabling more people to contribute to open-source projects like nf-core.

### Limitations of running pipelines in Colab

&lt;div style={{ display: &quot;flex&quot;, gap: &quot;1rem&quot;, justifyContent: &quot;center&quot;, alignItems: &quot;center&quot;, margin: &quot;2rem 0&quot; }}&gt;
    &lt;Image
        src={disconnectmeme}
        width={300}
        alt=&quot;Colab disconnect meme&quot;
        style={{ borderRadius: &quot;8px&quot;, boxShadow: &quot;0 2px 8px rgba(0,0,0,0.1)&quot; }}
    /&gt;
    &lt;Image
        src={sleepmeme}
        width={300}
        alt=&quot;Colab sleep meme&quot;
        style={{ borderRadius: &quot;8px&quot;, boxShadow: &quot;0 2px 8px rgba(0,0,0,0.1)&quot; }}
    /&gt;
&lt;/div&gt;

While Google Colab is a powerful and accessible platform, it does have some constraints that you should keep in mind.

In the free tier, the user is subject to session timeouts of unpredictable frequency and limited runtime duration.
The paid Pro tier, while it comes with additional benefits, is also subject to session timeouts if the tab your notebook is open in is closed for more than a few minutes.
This can lead to the stereotypical case of waking up to find your notebook timed out a few minutes after you went to sleep because you temporarily lost internet connection, or didn&apos;t plug your laptop all the way in!

The biggest issue that will likely affect someone developing Nextflow pipelines in Colab is the lack of root access.

Due to this lack of root access, it is not possible to run nf-core or any Nextflow pipelines using the typical `-profile docker` or `-profile singularity` container-based configuration profiles.
This is because `sudo` access is needed to install these engines.
Thankfully, we can still run pipelines with conda under `-profile conda`.

However, as Google Colab does not support native conda functionality, you need to install the [condacolab](https://pypi.org/project/condacolab/) Python package to serve as a proxy.
In our experience, this doesn&apos;t seem to perform any differently from a shell-based conda installation.

For a step-by-step guide on setting up Conda in Colab, see the [Setting up Conda for Google Colab section of the official nf-core guide](https://nf-co.re/docs/tutorials/google_colab/nf-core-colab-guide#setting-up-conda-for-google-colab).

## Developing with VS Code and Colab

### Why use VS Code with Colab?

Once you&apos;ve installed your Nextflow, conda, and nf-core pipeline of choice, you&apos;re pretty much good to go to run any pipeline you desire.
However, because the conda profile tends to be somewhat less stable than container-based profiles, you&apos;re bound to have a pipeline crash at some point and need to make a script edit somewhere to solve the issue.

While you could get away with developing pipelines inside Colab&apos;s built-in terminal using editors like vim or nano, VS Code offers a more robust environment.
Thankfully, the [vscode-colab](https://github.com/EssenceSentry/vscode-colab) Python library provides just the toolkit you need to take advantage of Colab&apos;s powerful hardware in the comfort of VS Code&apos;s rich software suite.
This means you will have access to all your favorite extensions and syntax highlighting in a familiar, seamless GUI!
The library makes use of the official [VS Code Remote Tunnels](https://code.visualstudio.com/docs/remote/tunnels) to securely and reliably connect Google Colab as well as Kaggle notebooks to a local or browser-based instance of VS Code.

You can read more about the library and even help contribute to new features on its [GitHub repository](https://github.com/EssenceSentry/vscode-colab).

### Limitations of the VS Code Colab approach

**Main limitations:**

- No root access (no Docker/Singularity)
- Session timeouts
- `MPLBACKEND` issues
- Conda is not native
- Limited GUI for complex workflows

While the `vscode-colab` approach is great, it does have its downsides.

The biggest issue you will face is frequent disconnections or crashing of the connection tunnel.
We have seen that if you make sure to set up the other aspects of your Colab environment before starting up the tunnel, disconnections rarely happen (or at least the number is drastically reduced).
This may be because Colab isn’t designed to reliably support connections from multiple clients or interfaces at the same time.

At the time of writing this blog post, the most annoying issue is that you have to set up the whole VS Code environment with all the extensions from scratch with each run.
The developer of the `vscode-colab` package did indicate that the ability to save profiles as config files is under development and will be added soon, so make sure to keep an eye on the repo for any such developments.

Although not a huge issue, we find the tunnel construction time of 3-5 minutes to be a bit too long to wait.
Other than these, the package works great and just about seamlessly gets the job done.

For instructions on how to set up and use VS Code with Colab, see the [Running and Editing Pipelines in VS Code via Colab section of the official nf-core guide](https://nf-co.re/docs/tutorials/google_colab/nf-core-colab-guide#running-and-editing-pipelines-in-vs-code-via-colab).

## Final Tips for a Smooth Experience

### Preventing Matplotlib Backend Errors in Colab

When we explored the use of Google Colab for our own work, we encountered specific issues with some pipelines that use tools with Matplotlib as a dependency.

If you try to run some nf-core pipelines that use such tools with the conda profile in Colab, but without changing the Matplotlib backend, you may see an error like this:

```text
ValueError: Key backend: &apos;module://matplotlib_inline.backend_inline&apos; is not a valid value for backend; supported values are [&apos;gtk3agg&apos;, &apos;gtk3cairo&apos;, &apos;gtk4agg&apos;, &apos;gtk4cairo&apos;, &apos;macosx&apos;, &apos;nbagg&apos;, &apos;notebook&apos;, &apos;qtagg&apos;, &apos;qtcairo&apos;, &apos;qt5agg&apos;, &apos;qt5cairo&apos;, &apos;tkagg&apos;, &apos;tkcairo&apos;, &apos;webagg&apos;, &apos;wx&apos;, &apos;wxagg&apos;, &apos;wxcairo&apos;, &apos;agg&apos;, &apos;cairo&apos;, &apos;pdf&apos;, &apos;pgf&apos;, &apos;ps&apos;, &apos;svg&apos;, &apos;template&apos;]
```

This happens because these pipelines (such as `nf-core/scdownstream`) and their dependencies (like Scanpy) import Matplotlib or its submodules. In Colab, the `MPLBACKEND` environment variable is often set to `module://matplotlib_inline.backend_inline` to enable inline plotting in notebooks. However, this backend is not available in headless or non-interactive environments, such as when Nextflow runs a process in a separate shell.

When a pipeline process tries to import Matplotlib, it checks the `MPLBACKEND` value. If it is set to an invalid backend, the process will fail with the error above. This is why you may not see the error with simple demo pipelines (which do not use Matplotlib), but you will encounter it with pipelines that use Scanpy or other tools that rely on Matplotlib for plotting or image processing.

To solve this, always set the `MPLBACKEND` environment variable to a valid backend (such as `Agg`) before running your pipeline. This ensures Matplotlib can render plots in a headless environment and prevents backend errors.

You can do this either by running the following in a code cell:

```python title=&quot;Set MPLBACKEND to Agg in a code cell&quot;
%env MPLBACKEND=Agg
```

Or alternatively, by running the following command in the terminal:

```bash title=&quot;Set MPLBACKEND to Agg in the terminal&quot;
export MPLBACKEND=Agg
```
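Both approaches work for the same reason: child processes inherit the parent environment, so any process shell spawned afterwards (such as those Nextflow launches) sees the corrected backend. A minimal standard-library sketch of that inheritance:

```python
import os
import subprocess
import sys

# Set the backend in the parent environment, as the Colab cell or
# terminal export above would.
os.environ["MPLBACKEND"] = "Agg"

# A child process (such as a shell Nextflow launches for a pipeline
# process) inherits the variable from its parent.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MPLBACKEND'])"],
    capture_output=True,
    text=True,
)
backend_seen = child.stdout.strip()
print(backend_seen)  # Agg
```

Note that this is why the variable must be set before the pipeline is started: processes that are already running do not pick up later changes.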

### Overcoming Colab&apos;s Storage Limitations

Google Colab&apos;s storage is temporary and limited to around 100GB in most cases.

It&apos;s important to regularly back up your results to avoid data loss. Mounting your personal Google Drive is convenient for small to moderate outputs, but may not be suitable for large workflow results, which can reach hundreds of gigabytes.
For larger datasets, consider syncing to external cloud storage or transferring results to institutional or project-specific storage solutions.
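As one lightweight option, you can bundle a run&apos;s outputs into a single compressed archive before copying them off the VM; a sketch using only the standard library (the `results` directory and file below are placeholders standing in for real pipeline outputs):

```python
import tarfile
from pathlib import Path

# Placeholder output directory standing in for a real pipeline outdir.
results = Path("results")
results.mkdir(exist_ok=True)
(results / "summary.txt").write_text("example output\n")

# A single compressed archive copies to Google Drive or external storage
# much faster than many small files.
with tarfile.open("results-backup.tar.gz", "w:gz") as archive:
    archive.add(results, arcname=results.name)

archived = tarfile.is_tarfile("results-backup.tar.gz")
print(archived)  # True
```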

Additionally, if you plan on writing and developing your pipelines exclusively in Google Colab, make sure to use `git` and regularly commit and push your code; alternatively, test in Colab but save and commit changes from your local PC, to prevent loss of work.

### Choosing the Right Google Colab VM Instance (Runtime) for Your Workflow

Finally, make sure to pick the VM instance that works best for your task and set up your Nextflow run configuration file accordingly to make the most use of the hardware at your disposal.

Google Colab offers several types of VM instances, each with a different hardware profile.

Choosing the right instance can significantly impact the performance and efficiency of your data analysis:

- **Standard (CPU-only) instances:**
    - Typically provide about 2 vCPUs and 13 GB RAM.
    - Best for lightweight workflows, small datasets, or tasks without GPU needs.

- **GPU-enabled instances:**
    - **Colab Pro** offers access to modern NVIDIA GPUs such as **T4** (16 GB VRAM), **L4** (24 GB VRAM), and sometimes **A100** (40 GB VRAM).
    - These instances usually pair with 2 to 8 vCPUs and 13 GB RAM, or more if High-RAM is enabled.
    - Ideal for deep learning, image analysis, or workflows that explicitly support GPU acceleration.

- **High-RAM instances:**
    - Toggle available in **Pro** plans.
    - Expands RAM from 13 GB to 25–30 GB (sometimes up to 52 GB in Pro+).
    - May also increase the number of vCPUs, commonly 4 to 8.
    - Crucial for memory-heavy workflows such as large-scale genomics, transcriptomics, or single-cell data.

When selecting an instance, consider the following:

- **CPU count:** More vCPUs = better for multi-threaded steps. Colab Pro usually provides 2, but High-RAM and GPU runtimes may give 4 to 8 vCPUs.
- **GPU / VRAM:** Essential only if your tools leverage CUDA; GPUs available include T4 (16 GB), L4 (24 GB), and A100 (40 GB).
- **RAM:** Ensure data and intermediates fit in memory—RAM ranges from 13 GB (standard) to 30 GB (High-RAM in Pro, and up to 52 GB in Pro+).
- **Session limits &amp; compute units:** Even with Pro tiers, sessions time out and hardware is not guaranteed, so plan checkpoints and outputs accordingly.

Match your instance type to your workflow’s requirements. Use GPU instances for compute-heavy tasks that utilize machine learning based approaches, high-RAM for large datasets or memory-intensive pipelines, and standard CPU instances for lighter or highly parallelizable nf-core workflows.
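To match your Nextflow configuration to the hardware, it can help to first confirm what the runtime actually provides; a small standard-library check (the memory read assumes a Linux-style `/proc/meminfo`, as found on Colab VMs):

```python
import os

# Number of vCPUs visible to this runtime.
cpus = os.cpu_count()
print("vCPUs:", cpus)

# Total memory from /proc/meminfo (Linux-only, as on Colab VMs).
try:
    with open("/proc/meminfo") as fh:
        mem_kb = next(
            int(line.split()[1]) for line in fh if line.startswith("MemTotal")
        )
    print("Memory (GB):", round(mem_kb / 1024 / 1024, 1))
except (FileNotFoundError, StopIteration):
    print("Memory: /proc/meminfo not available on this platform")
```

The numbers reported here are what your custom Nextflow configuration should stay within, since requesting more than the VM provides will cause processes to fail.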

---

## Conclusion

Overall, Google Colab can be an interesting option for people looking to run or develop nf-core pipelines who do not have easy access to sufficiently powerful hardware.

We hope this blog post and the tutorial will help kick-start the community members interested in trying out Google Colab for their own nf-core work!
If you have feedback, questions, or tips, please share them via the nf-core Slack. Your input helps improve the community!
Happy pipeline development!</content:encoded></item><item><title>Maintainers Minutes: August 2025</title><link>https://nf-co.re/blog/2025/maintainers-minutes-2025-08-29/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/maintainers-minutes-2025-08-29/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Sun, 31 Aug 2025 09:00:00 GMT</pubDate><content:encoded>The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers) by providing brief summaries of the monthly team meetings.

## Overview

Summer was evidently not over, as we were only a small group on that Friday.

We talked briefly about the aftermath of the previous meeting regarding the test datasets.
Edmund (@edmundmiller) reported that Cloudflare should be cheap, but we would need a wider group of people to discuss the pros and cons of the two leading options, AWS S3 vs. Cloudflare.

And then we talked about plans for migrating the code base of modules.
Upcoming in the nearish™️ future, we are expecting migrations including:

1. Linting and automatic formatting of all module code via the Nextflow language server
2. Adoption of &apos;topic&apos; channels, in particular for `versions` reporting
3. Move towards Seqera containers

In all cases, the new functionality is not 💯 ready for rollout yet: for example, Harshil Alignment does not yet work completely in the Nextflow formatter, and some experimental implementations are not yet fully tested, such as topic channels in the nf-core-utils plugin.

Therefore there is no new news, but watch this space!

At the end, Maxime, being his usual self, showcased what he has been playing around with lately, demoing nf-core-utils and the particular nf-test setup in sarek.

## The end

We will continue to work on these topics in the coming months.

As always, we are looking for community involvement, so we encourage anyone to join the discussion in relevant PRs and Slack channels and threads!

\- :heart: from your #maintainers team!</content:encoded></item><item><title>Why are my pipeline download tests failing?</title><link>https://nf-co.re/blog/2025/refurbushing-the-pipeline-download/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/refurbushing-the-pipeline-download/</guid><description>Refurbishing the pipeline downloads command</description><pubDate>Thu, 28 Aug 2025 11:00:00 GMT</pubDate><content:encoded>import { Image } from &quot;astro:assets&quot;;

import old_nf_version from &quot;@assets/images/blog/pipeline-download-refactor/nf-old-demo.gif&quot;;
import docker_download_in_action from &quot;@assets/images/blog/pipeline-download-refactor/bamtofastq-in-action.gif&quot;;

The `nf-core pipelines download` command, used when you want to run an `nf-core` pipeline in an offline compute environment, has undergone a [substantial refactor](https://github.com/nf-core/tools/pull/3634) that was recently merged into the development version of `nf-core/tools`.
The following blog post outlines the updates to the command and explains why your pipeline download tests might be failing.

### Troubles with container detection and how `nextflow inspect` solves them

The main challenge for the download command is finding all of the containers a pipeline depends on;
it must do this to be able to bundle the software as a standalone package that can be transferred to the offline machine.
Nextflow allows a user to define containers either directly in modules or in config files – as long as the container string is resolved at runtime, Nextflow does not care where it came from.
While this dynamism makes it possible for Nextflow users to write flexible code, it makes it difficult to determine from the source code which containers the pipeline is using.

Until now, the download command has solved this problem by building up a codebase of complex regex patterns used to search through pipeline files for strings resembling container directives.
While this strategy has worked for many years in the absence of a better solution, it has been prone to breaking when new edge cases were discovered or when changes were made to the pipeline template.
For examples of the struggles of writing a catch-all container-string regex, see the [Bowtie quote issue](https://github.com/nf-core/tools/issues/2392) or the [issue of race conditions when processing Seqera containers](https://github.com/nf-core/tools/issues/3285).

However, as of [Nextflow 25.04](https://www.nextflow.io/docs/latest/migrations/25-04.html) the `nextflow inspect` command has been substantially extended to capture all containers used in a pipeline.
With the downloads refactor, the `nextflow inspect` command has now also been integrated into the `nf-core/tools` codebase to replace the complex regex logic used previously.
This makes the command both simpler and more reliable at the same time.
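As an illustration of why this simplifies things, `nextflow inspect` produces a JSON document listing each process together with its resolved container, so collecting the unique images becomes a few lines of code rather than a pile of regexes. A sketch, using an inspect-style document whose process names and image tags are invented for this example:

```python
import json

# Invented stand-in following the inspect-style shape: a top-level
# `processes` list whose entries carry `name` and `container` fields.
sample = json.dumps({
    "processes": [
        {"name": "FASTQC", "container": "community.wave.seqera.io/library/fastqc:0.12.1"},
        {"name": "MULTIQC", "container": "community.wave.seqera.io/library/multiqc:1.31"},
        {"name": "FASTQC_TRIMMED", "container": "community.wave.seqera.io/library/fastqc:0.12.1"},
    ]
})

# Deduplicate, since several processes may share one image.
containers = sorted({p["container"] for p in json.loads(sample)["processes"]})
for image in containers:
    print(image)
```

Because the container strings are resolved by Nextflow itself, dynamic definitions in modules and config files are handled uniformly, with no source-code pattern matching involved.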

:::warning{title=&quot;Why are the pipeline download tests failing?!&quot;}

&lt;Image
    src={old_nf_version}
    class=&quot;d-block m-auto&quot;
    width={800}
    density={[1.5, 2]}
    alt=&quot;Using an old Nextflow version&quot;
    height=&quot;300&quot;
/&gt;

Each `nf-core` pipeline repository has a GitHub workflow that runs the `nf-core pipelines download` command on the pipeline (see for example the [`nf-core/rnaseq` workflow](https://github.com/nf-core/rnaseq/actions/workflows/download_pipeline.yml)).
This workflow checks that the pipeline does not produce any errors when downloaded, to ensure that the pipeline can be used in an offline environment.
The workflow is typically only triggered when a PR is made to the pipeline&apos;s main branch, i.e. only on release of the pipeline.

The GitHub workflow currently uses the **`dev`** branch of `nf-core/tools`; originally to allow maintainers of `nf-core/tools` to quickly push patches when the regex logic broke.
However, this means that _any_ changes made to the development version of `nf-core/tools` will directly take effect in the GitHub workflow.
Since the refactor of the downloads code requires that the pipeline uses the **25.04** version of Nextflow, pipelines that do not comply with this will fail the download test.

In time, there will be a pipeline template update that will require pipelines to use Nextflow `&gt;=` 25.04, and to thus be compatible with the new command.
The test itself will also be updated to use the `main` branch of `nf-core/tools` to avoid similar issues in the future.
Until then, you can either voluntarily update the Nextflow version of your pipelines or ignore the failing test.

:::

### Added support for downloading Docker containers

&lt;Image
    src={docker_download_in_action}
    class=&quot;d-block m-auto&quot;
    width={800}
    density={[1.5, 2]}
    alt=&quot;Docker container download for `bamtofastq` 2.2.0&quot;
/&gt;

The download command has also been extended to support Docker containers,
in addition to Singularity containers, previously the only supported container system.
The container system to use can be selected via the `--container-system` flag of the command, which now accepts the options `singularity`, `docker` and `none`.
Alternatively, the container system can be selected via the interactive CLI prompts.

The change means that `nf-core` pipelines can now be run on offline HPCs that support only Docker or Podman containers, making it easy to run `nf-core` pipelines in even more compute environments!

:::tip{title=&quot;How are Docker images saved?&quot;}

Unlike Singularity containers, which are files kept on your file system, Docker images are generally handled by the Docker daemon.
However, via the [`docker image save`](https://docs.docker.com/reference/cli/docker/image/save/) command, Docker allows packaging images as `tar` archives.
The `tar` archives can subsequently be loaded into another Docker daemon with [`docker image load`](https://docs.docker.com/reference/cli/docker/image/load/), or if you are running Podman with [`podman load`](https://docs.podman.io/en/v5.0.2/markdown/podman-load.1.html).

The `nf-core pipelines download` command creates `tar` archives for each Docker image within the downloaded pipeline.
The saved archives are placed within the `docker-images` directory of the download folder, packaged along with scripts for loading them into Docker or Podman on the offline machine.

:::

### Further details

More details on the changes can be found in the corresponding PRs on `nf-core/tools`
([#3634](https://github.com/nf-core/tools/pull/3634),
[#3706](https://github.com/nf-core/tools/pull/3706),
[#3696](https://github.com/nf-core/tools/pull/3696)).

If you find any bugs in the download command after the recent major changes, please tell us on the [#tools](https://nfcore.slack.com/archives/CE5LG7WMB) Slack channel or create an issue on [nf-core/tools](https://github.com/nf-core/tools/) detailing the problem.
import { Image } from &quot;astro:assets&quot;;
import AdvisorySidebarHeader from &quot;@components/advisory/AdvisorySidebarHeader.astro&quot;;

&lt;figure&gt;
    &lt;figcaption&gt;
        With Advisories, important information about technical issues in nf-core pipelines — like regressions,
        incompatibilities, or security alerts — now travels faster and further than ever.
    &lt;/figcaption&gt;
&lt;/figure&gt;

:::tip{.fa-hourglass-clock title=&quot;Too Long; Didn&apos;t Read&quot;}

- **What is an advisory?**
  A structured, long-lived notice about significant technical issues in nf-core. Advisories help users avoid or resolve known problems by providing clear, searchable, and actionable information.

- **Browse all advisories:**
  See the full list of advisories and filter by category on the [Advisory Listing Page](https://nf-co.re/advisories).

- **How to add an advisory:**
  Anyone can contribute! Learn how to create and publish an advisory in the [nf-core documentation](https://nf-co.re/docs/tutorials/nf-core_components/publish_advisories).

- **Stay up to date:**
  Subscribe to the [advisories RSS feed](https://nf-co.re/advisories/rss.xml) for instant updates about regressions, incompatibilities, or security issues.

:::

## Hello advisories!

As much as we aim for stability — with thorough testing and continuous integration — bugs can still slip through. Once a pipeline version is released, it remains as is, meaning any issues discovered later will continue to affect that version indefinitely.

While fixes are included in newer releases, older versions are still commonly used in practice. Unfortunately, information about known issues and possible workarounds for these versions can easily escape your attention — unless you happen to catch a Slack thread, scroll through a GitHub issue, or comb through a changelog. That’s far from ideal.

**To fix that, we’ve introduced a new feature on the nf-core website: [Advisories](https://nf-co.re/advisories).**

:::note
If you haven’t come across the term advisory before: it’s commonly used in contexts like severe weather alerts, travel warnings, or those bold black-and-white parental advisory labels. In essence, an advisory is a heads-up: something important that might affect you, even if you’re not immediately aware of it. We’re now bringing that idea to nf-core!
:::

Think of these as structured, long-lived notices for significant technical issues in pipelines, modules, subworkflows or configs. They’re searchable, easy to reference, and designed to help users avoid or resolve known problems. If it&apos;s your jam, we’ve also added an [RSS feed](https://nf-co.re/advisories/rss.xml) so you can stay in the loop without needing to monitor chats or repositories.

This isn’t just about fixing things — it’s about being transparent, reducing repeated confusion, and making life easier for both users and maintainers. If you work in a regulated environment, the added traceability and documentation may also come in handy for compliance and audits.

## Background and motivation

May 2025. At least in our corner of the world, Sweden, spring had just arrived. Daylight lingered a little longer, and the first flowers carefully stretched toward the sun. But just as nature hit its stride, nf-core pipelines didn’t.

As the world shook off winter’s sleep, a configuration issue in our pipeline template emerged from hibernation, too. Soon, nf-core&apos;s Slack channels and GitHub issues were buzzing like a disturbed hive — not with bees, but with confused and concerned users.

With the release of Nextflow version 25.04, our software had decided to join the seasonal chaos. Fortunately, identifying the culprit didn’t take long. A subtle configuration bug — previously tolerated up through Nextflow 24.10 — now had wide-reaching effects when pipelines were executed with the newly released version 25.04.

Fixing the configuration for the future was one thing; reaching everyone affected was another. Like many open-source projects, we don’t have a clear picture of who’s using our code, which meant we had no direct line to those caught off guard.

To make matters trickier, resolving the issue fully required releasing new versions of the affected pipelines — a process that can easily take weeks or even months to complete across all of them. In the meantime, users needed to override the default, buggy configuration.

Simple enough — if only they knew.

We had the fix in hand, but no easy way to get it into all the hands that needed it. It took weeks before support requests for that issue finally started to taper off.

The experience left us with a clear lesson: identifying and fixing bugs is only part of the equation — communicating effectively with a diverse, globally distributed user base is just as critical.

Had advisories existed at the time, [we would have created one for this issue](https://nf-co.re/advisories/process-shell-configuration) and quickly shared the link on all our communication channels.

## How do I find advisories?

You can browse all existing advisories through our dedicated [overview page](https://nf-co.re/advisories), which presents each entry with a concise summary of the most important details.

Looking for something specific? Use the category buttons at the top to filter out advisory types that aren’t relevant to your work — whether you&apos;re only interested in security issues, regressions, or anything else.

&lt;figure&gt;
    &lt;div class=&quot;d-block m-auto&quot;&gt;
        &lt;div style=&quot;transform: scale(0.9); transform-origin: top center;&quot;&gt;
            &lt;div class=&quot;card&quot;&gt;
                &lt;div class=&quot;card-header&quot;&gt;
                    &lt;div class=&quot;d-flex align-items-center justify-content-between&quot;&gt;
                        &lt;h4 class=&quot;mb-0&quot;&gt;Example Advisory Title&lt;/h4&gt;
                        &lt;div class=&quot;d-flex align-items-center fs-6 fw-normal text-muted&quot;&gt;
                            &lt;strong&gt;Category: &lt;/strong&gt;
                            &lt;span class=&quot;ms-1&quot;&gt;Pipelines&lt;/span&gt;
                        &lt;/div&gt;
                    &lt;/div&gt;
                &lt;/div&gt;
                &lt;div class=&quot;card-body&quot;&gt;
                    &lt;p class=&quot;mb-4&quot;&gt;
                        This is an example advisory demonstrating how advisories appear in the listing view with
                        structured metadata and severity indicators.
                    &lt;/p&gt;
                    &lt;div class=&quot;mt-2 small&quot;&gt;
                        &lt;strong&gt;Affects:&lt;/strong&gt;
                        &lt;span class=&quot;d-flex align-items-center mb-1&quot;&gt;
                            &lt;span class=&quot;d-flex align-items-center&quot;&gt;
                                &lt;i class=&quot;fas fa-project-diagram&quot; aria-hidden=&quot;true&quot;&gt;&lt;/i&gt;
                                &lt;span class=&quot;ms-1&quot;&gt;Nextflow:&lt;/span&gt;
                            &lt;/span&gt;
                            &lt;code class=&quot;text-muted ms-1&quot;&gt;25.04.0&lt;/code&gt;
                        &lt;/span&gt;
                    &lt;/div&gt;
                &lt;/div&gt;
                &lt;div class=&quot;card-footer&quot;&gt;
                    &lt;div class=&quot;d-flex align-items-center justify-content-between&quot;&gt;
                        &lt;p class=&quot;text-wrap mb-0 text-secondary text-small&quot;&gt;Published 2 months ago&lt;/p&gt;
                        &lt;div class=&quot;d-flex align-items-center flex-wrap gap-1&quot;&gt;
                            &lt;span class=&quot;badge text-bg-warning text-small&quot;&gt;
                                &lt;i class=&quot;fas fa-exclamation-triangle me-1&quot; aria-hidden=&quot;true&quot;&gt;&lt;/i&gt;Incompatibility
                            &lt;/span&gt;
                        &lt;/div&gt;
                        &lt;div class=&quot;d-flex&quot;&gt;
                            &lt;span class=&quot;text-muted me-1 text-small&quot;&gt;Severity:&lt;/span&gt;
                            &lt;span class=&quot;badge text-bg-danger text-small&quot;&gt;
                                &lt;i class=&quot;fas fa-exclamation-triangle me-1&quot; aria-hidden=&quot;true&quot;&gt;&lt;/i&gt;High
                            &lt;/span&gt;
                        &lt;/div&gt;
                    &lt;/div&gt;
                &lt;/div&gt;
            &lt;/div&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;figcaption&gt;
        An example advisory card as it appears in the advisory listing, showing title, description, metadata, and
        severity indicators.
    &lt;/figcaption&gt;
&lt;/figure&gt;

Clicking on a specific advisory brings you to a structured view with two complementary sections. In the main content area, you’ll find a clear, plain-text explanation of the issue: what went wrong, how it was triggered, and what users should expect. It’s written to be human-readable, even if the underlying bug wasn’t.

To the right, a compact info box offers the technical essentials at a glance — including which nf-core components are affected, which versions of dependencies are involved, and when and by whom the advisory was published.

And to make sure important issues don’t go unnoticed, any advisory that impacts a specific pipeline version will also trigger a banner on that pipeline’s page. So even if you miss the [RSS feed](https://nf-co.re/advisories/rss.xml) or the advisories page, you’ll still see the heads-up in context.

## From Bug to Broadcast

So how do advisories actually come to life? In short: anyone can create one. While in practice they’ll most often be written by maintainers of the affected pipelines or modules, the process is open to contributors who spot an issue worth sharing broadly.

Some examples of issues that should be communicated as advisories include:

- A newly discovered issue that leads to incorrect results in a module or pipeline. This includes bugs in tools bundled within a specific pipeline version.
- Known regressions that occur with certain parameter combinations.
- Incompatibilities between a particular pipeline version and a specific Nextflow version.
- Security vulnerabilities or other issues affecting a dependency or container image.
- Problems with executors (e.g., SLURM, AWS Batch) that require special considerations.

These are the kinds of problems where a clear, structured advisory can help users avoid pitfalls and stay informed, but there’s no rigid rule on when to publish one — if you’ve identified a problem, understand its cause, and perhaps even have a solid workaround or fix, that’s a great time to write one.

**Advisories should focus on clarity, relevance, and practical value: if the issue is significant enough that users could waste hours chasing it down, it’s probably worth an advisory. Think of it as a way to save others from stepping into the same pothole.**

[We have extra documentation available](https://nf-co.re/docs/tutorials/nf-core_components/publish_advisories.mdx) if you&apos;d like to author an advisory.

## Looking Ahead

Advisories are our way of making important technical issues more visible, more searchable, and more manageable — for users and maintainers alike. They’re not here to replace existing channels like Slack, GitHub issues, or blog posts, but to supplement them with a structured, lasting format that can stand the test of time (and version numbers). Think of them as bookmarks for lessons learned.

Importantly, they’re not about finger-pointing or perfect code. Software evolves — and so does understanding. By sharing what went wrong, what to watch out for, and how to move forward, we hope to strengthen trust in the nf-core ecosystem and make life just a little easier for everyone working with our pipelines.

So whether you’re contributing your own advisory or simply subscribing to stay informed, we hope this new system will help the community stay better connected, more resilient, and a bit less surprised by the occasional issues.</content:encoded></item><item><title>Empowering bioinformatics communities with Nextflow and nf-core</title><link>https://nf-co.re/blog/2025/paper-v2/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/paper-v2/</guid><description>Next nf-core scientific article now published in Genome Biology!</description><pubDate>Wed, 06 Aug 2025 11:00:00 GMT</pubDate><content:encoded>import Profile from &quot;@components/GitHubProfilePictureExtended.astro&quot;;
import { Image } from &quot;astro:assets&quot;;
import paperv2 from &quot;@assets/images/blog/paper-v2/paper-v2.png&quot;;
import paperv2fig1 from &quot;@assets/images/blog/paper-v2/paper-v2-fig1.png&quot;;
import paperv2fig2A from &quot;@assets/images/blog/paper-v2/paper-v2-fig2A.png&quot;;
import paperv2fig2B from &quot;@assets/images/blog/paper-v2/paper-v2-fig2B.png&quot;;

## Empowering bioinformatics communities with Nextflow and nf-core

We&apos;re thrilled to announce that our latest nf-core community paper has now been published in [**Genome Biology**](https://doi.org/10.1186/s13059-025-03673-9). This follows last year&apos;s [preprint announcement](https://bsky.app/profile/nf-co.re/post/3ksi4mr6uyl25).

&lt;Image
    src={paperv2}
    class=&quot;d-block m-auto&quot;
    width={600}
    density={[1.5, 2]}
    alt=&quot;Genome Biology review title: Empowering bioinformatics communities with Nextflow and nf-core&quot;
    href=&quot;https://doi.org/10.1186/s13059-025-03673-9&quot;
/&gt;

In this publication, we summarised the evolution and impact of Nextflow and nf-core from 2018 to mid-2025, highlighting why Nextflow and nf-core have become central to the bioinformatics ecosystem and beyond.
They empower research communities to adopt FAIR (Findable, Accessible, Interoperable, Reusable) practices for high-quality workflows, infrastructure, and collaboration models.
While we encourage you to read the paper, here we describe the manuscript highlights.

### Technical Advances

The release of Nextflow DSL2 enabled the creation of [an extensive library of modules and subworkflows](https://nf-co.re/modules/), for which, just as for our pipelines, we have developed a strict set of standards to ensure consistency and compatibility across nf-core/modules.
These components are reusable across any Nextflow pipeline, meaning that they can be used by anyone, not just those developing full nf-core pipelines.
The utility of these components is reflected by the growth of this collection, which, at the time of publishing, included over 1,400 modules and around 80 subworkflows.

### Community growth

nf-core has reached 2,600 GitHub contributors (including ~1200 members of the nf-core organization) and over 10,000 Slack users, illustrating the strength, engagement, and constant growth of the community.
We also report that, at the time of the publication, 124 pipelines were available (and we have more since!) and that the community is expanding beyond biological applications, with new pipelines in fields as diverse as astrophysics, earth science and economics.
Thanks in part to the nf-core community, we also report that Nextflow has become the most widely adopted Workflow Management System (WfMS). The bar plot below illustrates this, using citation counts as a measure of WfMS adoption.

&lt;Image
    src={paperv2fig1}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Google Scholar citation counts for bioinformatics workflow management systems. Sum of citations of the major publications of Galaxy, Nextflow, and Snakemake between 2018 and 2024.&quot;
/&gt;

### Pipeline reproducibility and maintenance

Reproducibility is at the core of nf-core, and in the paper, we summarized how the project’s infrastructure accomplishes this through extensive continuous integration (CI) and automated testing.
These CI workflows check code quality, ensure compatibility with dependencies, test execution on multiple platforms, and automatically propagate guideline updates.
This also allows multiple institutions to collaboratively maintain the same pipeline, promoting a distributed maintenance model and ensuring long-term sustainability.

&lt;Image
    src={paperv2fig2A}
    class=&quot;d-block m-auto&quot;
    width={600}
    density={[1.5, 2]}
    alt=&quot;Pipeline maintenance and usage. Major contributions to the nf-core/smrnaseq pipeline over time by different academic institutions or private companies. Data for individual contributors is collapsed to their institution (SciLifeLab: 3; QBiC: 2; Boehringer Ingelheim: 3; Seqera: 4; all the others: 1)&quot;
/&gt;

### FAIRness and cross-community impact

The collaboration with the [EuroFAANG consortium](https://eurofaang.eu/)—who work on decoding genotype-to-phenotype relationships of farm animals—has been fruitful in promoting the adoption of FAIR principles beyond data, extending them to workflows.
This partnership inspired the creation of the [Special Interest Groups](https://nf-co.re/blog/2024/special_interest_groups) at nf-core.
The new publication describes how the adoption of Nextflow and nf-core by EuroFAANG led to more efficient and reproducible cross-institutional collaboration and analysis, as shown by the contribution to and/or use of nf-core pipelines in the figure below.
We also report how other initiatives, such as the Darwin Tree of Life (DToL) project and Genomics England, are adopting nf-core components in their workflows. These adoptions demonstrate both the embrace of FAIR principles and nf-core&apos;s broader utility across diverse research communities.

&lt;Image
    src={paperv2fig2B}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Pipeline maintenance and usage. Nextflow analysis pipelines used in the EuroFAANG consortia for the functional annotation of various species’ genomes&quot;
/&gt;

### Acknowledgements

We would like to thank Björn E. Langer and Cedric Notredame for leading this effort, and all EuroFAANG and nf-core community members who made this work possible.

### Looking to the future

The nf-core community, as always, is in constant evolution and we are already cooking up the next wave of advancements.

Features such as pipeline chaining, the adoption of the Nextflow strict syntax, and the exploration of AI-driven solutions to help our pipelines become even more FAIR are in the works.
We are looking forward to seeing what developments in our next 5-year publication will be showcasing!

**Full Citation**
Langer, B.E., Amaral, A., Baudement, M.-O., Bonath, F., Charles, M., Chitneedi, P.K., Clark, E.L., Di Tommaso, P., Djebali, S., Ewels, P.A., Eynard, S., Fellows Yates, J.A., Fischer, D., Floden, E.W., Foissac, S., Gabernet, G., Garcia, M.U., Gillard, G., Gundappa, M.K., Guyomar, C., Hakkaart, C., Hanssen, F., Harrison, P.W., Hörtenhuber, M., Kurylo, C., Kühn, C., Lagarrigue, S., Lallias, D., Macqueen, D.J., Miller, E., Mir-Pedrol, J., Moreira, G.C.M., Nahnsen, S., Patel, H., Peltzer, A., Pitel, F., Ramayo-Caldas, Y., Ribeiro-Dantas, M. da C., Rocha, D., Salavati, M., Sokolov, A., Espinosa-Carrasco, J., Notredame, C., and the nf-core community, 2025. Empowering bioinformatics communities with Nextflow and nf-core. Genome Biology 26, 228.
[10.1186/s13059-025-03673-9](https://doi.org/10.1186/s13059-025-03673-9)</content:encoded></item><item><title>Maintainers Minutes: July 2025</title><link>https://nf-co.re/blog/2025/maintainers-minutes-2025-07-31/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/maintainers-minutes-2025-07-31/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Thu, 31 Jul 2025 09:00:00 GMT</pubDate><content:encoded>import Profile from &quot;@components/GitHubProfilePictureExtended.astro&quot;;
import datahostingmeme from &quot;@assets/images/blog/maintainers-minutes-2025-07-31/datahosting-meme.jpg&quot;;
import uptodatedocs from &quot;@assets/images/blog/maintainers-minutes-2025-07-31/uptodate-documentation-meme.jpg&quot;;
import dataorganisationmessmeme from &quot;@assets/images/blog/maintainers-minutes-2025-07-31/dataorganisationmessmeme.png&quot;;
import { Image } from &quot;astro:assets&quot;;

The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers) by providing brief summaries of the monthly team meetings.

## Overview

After a short summer break, we returned with a special maintainers meeting dedicated to nf-core/test-datasets.
nf-core/test-datasets is a GitHub repository that holds the majority of the files we use for the CI testing of our modules, subworkflows, and pipelines.

Interacting with this part of our infrastructure is currently one of the less optimal developer experiences.
This was identified by the maintainers and core teams based on qualitative impressions from the community and our own experiences, and confirmed by the [results of the nf-core survey](https://nf-co.re/blog/2025/survey-results-2025) from earlier this year.

As a first step in overhauling this experience, the last major nf-core/tools release added a new sub-command by Julian Flesch ([@JulianFlesch](https://github.com/JulianFlesch)) to help explore the available data files in the nf-core/test-datasets repository.
However, this only alleviates the symptoms of the underlying problems: identifying suitable files, knowing where to put new files, and knowing what is within each file.
Instead, we want to restructure the repository and develop clearer specifications, documentation, and procedures for it.

Therefore this month&apos;s meeting was &apos;taken over&apos; by the [#wg-test-dataset-task-force](https://nfcore.slack.com/archives/C07B5FK9GKA) leads Simon ([@SPPearce](https://github.com/SPPearce)) and James ([@jfy133](https://github.com/jfy133)) to start the process of redesigning the structure and documentation.

## Scope of discussions

Some things we agreed on during the meeting to limit the scope of the discussions were:

1. We are primarily trying to address _modules_ test-datasets (not pipelines)
2. We want to &apos;start from scratch&apos; rather than try to adjust the existing repository

## Location

&lt;Image
    src={datahostingmeme}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Character from lord of the rings saying &apos;one does not simply change hosting providers&apos;&quot;
/&gt;

One of the larger discussions we had was where the test-dataset files should go: should we move them to a new service?

We defined a set of criteria that a new location should meet:

- Much faster to download (or clone)
- Support directories
- Free (or cheap) hosting
- Doesn&apos;t charge for ingress/egress

The pros of continuing to use GitHub were:

- ✅ Familiarity of our users with the interface (e.g. for reviewing)
- ✅ It stays within our existing infrastructure
- ✅ The 10 MB file limit is a _good thing_ (forcing developers to ensure their tests are fast)

The cons of GitHub were:

- ❌ (Currently) makes a very large repository for cloning
- ❌ It only supports HTTPS/SSH interaction, so you cannot pass directories from the repository to Nextflow (where only S3 filesystems are supported for directory input)
- ❌ The 10 MB file limit is a _bad thing_ (some developers cannot physically get their data files that small, e.g. imaging)
- ❌ It is hard to view the contents of any non-raw textfile

Alternative solutions were proposed:

- HuggingFace
    - ✅ Suggested by Edmund ([@edmundmiller](https://github.com/edmundmiller)) as a similar interface to GitHub (thus would be familiar)
    - ✅ Much less restrictive file sizes (up to 5GB per file, and no max number of files)
    - ❌ But is outside our infrastructure
    - ❌ Is actually just `git-lfs` under the hood, so doesn&apos;t provide much of a difference from GitHub (which also supports `git-lfs`)
    - ❌ It would require separate team organisation (not everyone could join and have access)
- AWS S3:
    - ✅ Our test-datasets are actually already &apos;backed up&apos; here
    - ✅ This is already relatively well supported by our infrastructure and Nextflow (e.g. directory inputs)
    - ✅ Anabella ([@atrigila](https://github.com/atrigila)) showed services such as [42basepairs](https://42basepairs.com/) that provides ways to see inside common bioinformatics file formats of files on S3
    - ❌ We were very worried about ingress/egress costs (particularly in our very parallelised CI tests)
    - ❌ We did not have an immediate solution for how community members could &apos;submit&apos; to a controlled bucket (for cost reasons)
    - ❌ We weren&apos;t sure on the longevity of services like 42basepairs
- Cloudflare R2
    - ✅ No ingress/egress fees, &apos;flat rate&apos; for hosting based on amount
    - ✅ S3 filesystem
    - ❌ Would maybe need to ask for open-source credits... but no idea if available

Our main conclusions from these discussions were:

1. We will turn on `git-lfs` for the existing GitHub repository, to make it easier to at least clone it
2. Edmund would investigate the Cloudflare option to get more information on the pros and cons of this option
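
For background on the first conclusion: with `git-lfs`, the file types to track are declared as patterns in a `.gitattributes` file at the repository root. The patterns below are purely illustrative; which types we would actually track has not been decided:

```text
# Illustrative patterns only; the actual file types to track are undecided
*.bam filter=lfs diff=lfs merge=lfs -text
*.cram filter=lfs diff=lfs merge=lfs -text
*.fastq.gz filter=lfs diff=lfs merge=lfs -text
```

Files matching a tracked pattern are stored in git history as lightweight pointers, with the actual content fetched from the LFS store on demand, which keeps clones small.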

## Documentation

&lt;Image
    src={uptodatedocs}
    class=&quot;d-block m-auto&quot;
    width={600}
    density={[1.5, 2]}
    alt=&quot;Character from star wars saying &apos;up to date documentation, I&apos;ve not seen this in a long time&apos;&quot;
/&gt;

Next we moved onto documentation.

Knowing what is in a module test-data file, how it was generated, and how it links to other data files within nf-core/test-datasets is a common pain point for the maintainers and community members.
Currently this relies on both the directory structure of the repository and a haphazardly maintained, inconsistent README file in the root of the modules branch.

We had a brainstorming session on what sort of information we would like to record about each test data file:

- Keywords
- Is it real or simulated data?
- Is it a tool specific file vs a generic file?
- Command(s) used to generate it
- Version of the tool(s) used for generation
- Source location of any upstream files
- Who created it (author)
- Bioinformatics specific metadata
    - Organism derived from
    - Whole genome
    - Chromosomes embedded
    - Individuals
    - Genome version
    - Panel
- Support for &apos;grouped&apos; files (e.g. in bioinformatics: paired-end reads, ped/bim/fam, bam/bai)

We then thought about different ideas for how to store such metadata:

- Using a stricter, more descriptive file-naming scheme to record metadata about each file, plus a table aggregating all the files
- A prose-based `README` markdown file next to each data file
- A `meta.yaml` file next to each data file akin to nf-core/modules YAML files
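
As a purely illustrative sketch of the last idea, such a `meta.yaml` might look as follows. Every field name here is hypothetical, since no schema has been agreed yet:

```yaml
# Hypothetical sketch only; none of these field names are an agreed specification
genome.fasta:
    keywords: [genome, reference, fasta]
    simulated: false
    generated_with: samtools 1.17 # hypothetical provenance
    source: an upstream database URL
    author: a-github-handle
    organism: SARS-CoV-2
```

A machine-readable format like this would also make the aggregated table from the first idea straightforward to generate.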

Our primary conclusion here was that we needed to consult the community as to what other attributes they feel they need for test-data files.
In particular, we will try to contact different disciplines, e.g. via the Special Interest Groups (particularly outside of bioinformatics), to reach a consensus.

## Structure

&lt;Image
    src={dataorganisationmessmeme}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Character from futurama saying &apos;Not sure I&apos;m cleaning a mess or organising a mess&apos;&quot;
/&gt;

Finally, we briefly touched on the structure for the repository.

During the session there was a general feeling that we wanted per-tool documentation rather than one mega README file.
Assuming we stick with a GitHub interface, we also wanted to remove the &apos;empty&apos; `master` branch and have the module test-data files as the primary landing page.

Simon ([@SPPearce](https://github.com/SPPearce)) also proposed having modules and pipeline test-data in separate locations
to make it easier to find the right files and reduce the size.

However, as the structure will depend somewhat on the location we choose, we will wait for the outcome of the location discussions before continuing this.
For example, if we were to follow an object storage concept, it could be that we go for &apos;chaos&apos; with no directory structure at all, with everything organised and guided via the metadata and a user interface layer (as previously proposed by Maxime ([@maxulysse](https://github.com/maxulysse))).

## Additional considerations

Other points that were brought up included:

- We should try to somehow &apos;version&apos; test-data files - e.g. using GitHub URLs pointing to a specific hash, to reduce the risk of tests breaking if someone changes the contents of a test file (although this shouldn&apos;t happen) (Jon ([@pinin4fjords](https://github.com/pinin4fjords)))
- We could maybe consider a &apos;spill over&apos; location in case we stick with GitHub and the 10 MB limit is too restrictive for some tools&apos; test-data (which would reduce costs) (Louis ([@LouisLeNezet](https://github.com/LouisLeNezet)))
- Is there a way to automatically identify data files that have never been used, so we can clean them to save costs (Famke ([@famosab](https://github.com/famosab)))
    - None of the maintainers present were aware of anything like this, but if a community member has an idea please let us know!!
- Should we allow &apos;copying&apos; of a tool&apos;s own test-data files or always make our own derived from our existing files (where possible)
- Could we use an MCP agent to auto-annotate files with metadata as a first pass, some nf-core members have experience with these (Igor ([@itrujnara](https://github.com/itrujnara)))
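
The hash-pinning idea above can be sketched as follows, with a hypothetical commit hash and file path:

```bash
# Pin a raw GitHub URL to a specific commit instead of a branch name,
# so later changes on the branch cannot silently alter the test input.
commit=0123456789abcdef0123456789abcdef01234567 # hypothetical commit hash
path=data/genomics/sarscov2/genome/genome.fasta # hypothetical file path
echo https://raw.githubusercontent.com/nf-core/test-datasets/$commit/$path
```

Anything fetched from such a URL stays stable even if the branch moves on afterwards.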

And of course, we agreed all the decisions above should be converted into [nf-core/proposal RFCs](https://github.com/nf-core/proposals/issues?q=is%3Aissue%20state%3Aopen%20label%3Anew-rfc) to facilitate wider community discussions (these will be announced on GitHub when posted!).

## The end

All of the above are just starting points for discussions, and we will continue to work on these topics in the coming months.
But we will need a large amount of input from the wider community to ensure everyone gets the best possible experience, so we encourage anyone with thoughts and feedback on the above to join the [#wg-test-data-task-force](https://nfcore.slack.com/archives/C07B5FK9GKA) channel and post their ideas there!

As always, if you want to get involved and give your input, join the discussion on relevant PRs and Slack threads!

\- :heart: from your #maintainers team!</content:encoded></item><item><title>Meet the new meta.yml</title><link>https://nf-co.re/blog/2025/modules-meta-yml/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/modules-meta-yml/</guid><description>Changes to the structure of the meta.yml file for nf-core modules</description><pubDate>Mon, 07 Jul 2025 11:30:00 GMT</pubDate><content:encoded>## Introduction

When nf-core modules were introduced, we decided to add a `meta.yml` which contains the metadata of the module.
This file describes:

- The tool(s) used in the module
- The structure of the channels
- The authors of the module

This file has been changing recently: we identified improvements that could be made when describing the inputs and outputs of a module, and we updated how the channels are described.
You can see the [first blog post](https://nf-co.re/blog/2025/modules-ontology) we wrote about these changes for more context.

The first change consisted of grouping the input and output elements by channel. Before, they were all listed at the top level.
This change makes it easier to understand the channel structure of the module.

But one last detail was still missing: we were not distinguishing between tuple channels and single-element channels.
Now, tuple channels correspond to lists in the `meta.yml` file, while single-element channels are not nested inside a list.
You can see the changes in [our PR on the modules repository](https://github.com/nf-core/modules/pull/8747).

This is an example of the `bwa/mem` module.
See the difference between tuple channels and single-element channels such as `val sort_bam` in the input and `versions` in the output.

&lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-12 col-lg-5&quot;&gt;
        ```groovy title=&quot;main.nf&quot; {9}
        process BWA_MEM {

            ...

            input:
            tuple val(meta) , path(reads)
            tuple val(meta2), path(index)
            tuple val(meta3), path(fasta)
            val   sort_bam

            output:
            tuple val(meta), path(&quot;*.bam&quot;),  emit: bam,     optional: true
            tuple val(meta), path(&quot;*.cram&quot;), emit: cram,    optional: true
            tuple val(meta), path(&quot;*.csi&quot;),  emit: csi,     optional: true
            tuple val(meta), path(&quot;*.crai&quot;), emit: crai,    optional: true
            path &quot;versions.yml&quot;,             emit: versions

            ...

        }
        ```
    &lt;/div&gt;
    &lt;div class=&quot;col-12 col-lg-7&quot;&gt;
        ```yml title=&quot;meta.yml&quot; caption=&quot;New file structure&quot; {26-29}
        name: bwa_mem
        ...
        input:
            - - meta:
                    type: map
                    description: Groovy Map containing sample information
                - reads:
                    type: file
                    description: |
                        List of input FastQ files of size 1 and 2 for single-end and paired-end data,
                        respectively.
            - - meta2:
                    type: map
                    description: Groovy Map containing reference information.
                - index:
                    type: file
                    description: BWA genome index files
                    pattern: &quot;*.{amb,ann,bwt,pac,sa}&quot;
            - - meta3:
                    type: map
                    description: Groovy Map containing sample information
                - fasta:
                    type: file
                    description: Reference genome in FASTA format
                    pattern: &quot;*.{fasta,fa}&quot;
            - sort_bam:
                type: boolean
                description: use samtools sort (true) or samtools view (false)
                pattern: &quot;true or false&quot;
        output:
            - bam:
                - - meta:
                        type: map
                        description: Groovy Map containing sample information
                  - &quot;*.bam&quot;:
                        type: file
                        description: Output BAM file containing read alignments
                        pattern: &quot;*.{bam}&quot;
            - cram:
                - - meta:
                        type: map
                        description: Groovy Map containing sample information
                  - &quot;*.cram&quot;:
                        type: file
                        description: Output CRAM file containing read alignments
                        pattern: &quot;*.{cram}&quot;
            - csi:
                - - meta:
                        type: map
                        description: Groovy Map containing sample information
                  - &quot;*.csi&quot;:
                        type: file
                        description: Optional index file for BAM file
                        pattern: &quot;*.{csi}&quot;
            - crai:
                - - meta:
                        type: map
                        description: Groovy Map containing sample information
                  - &quot;*.crai&quot;:
                        type: file
                        description: Optional index file for CRAM file
                        pattern: &quot;*.{crai}&quot;
            - versions:
                - versions.yml:
                    type: file
                    description: File containing software versions
                    pattern: &quot;versions.yml&quot;
        ...
        ```
    &lt;/div&gt;

&lt;/div&gt;

To make the switch to the new structure easier for everyone using nf-core modules, you can add `--fix` to your modules lint command.

```bash
nf-core modules lint --fix bwa/mem
```

This flag will try to fix all the possible lint failures related to the meta.yml file.

### For nf-core/tools contributors

To help users create nf-core modules, we use a [Jinja2](https://jinja.palletsprojects.com) template.
Jinja2 is used to create templates for modules, subworkflows, and pipelines.
With all the recent changes and improvements, this template was starting to get overengineered, with lots of conditionals which made it difficult to follow the code of the file.
For this reason, we decided to simplify the template, and handle the `meta.yml` file with a YAML library.

Now, the module template `meta.yml` is the most [basic template](https://github.com/nf-core/tools/blob/dev/nf_core/module-template/meta.yml) for an nf-core module, and it doesn&apos;t contain any conditionals.
This file is read as YAML during module creation. All the possible conditions are checked in the [`generate_meta_yml_file()`](https://github.com/nf-core/tools/blob/dev/nf_core/components/create.py#L524) function, and the YAML object is updated accordingly.

This will take into account:

- Whether the `--empty` flag was provided, to add `TODO` comments.
- Whether the module should contain a `meta` map.
- It will try to find input and output information on [bio.tools](https://bio.tools/).
- It will try to complete the input and output files with [EDAM ontology](https://edamontology.org/page) terms.</content:encoded></item><item><title>nf-core/tools - 3.3</title><link>https://nf-co.re/blog/2025/tools-3_3/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/tools-3_3/</guid><description>An nf-testic release</description><pubDate>Tue, 03 Jun 2025 10:00:00 GMT</pubDate><content:encoded>import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;

This release brings exciting new features focused on nf-test integration, test dataset management and modules from different repositories in subworkflows. As always, if you have any problems or run into any bugs, reach out on the [#tools slack channel](https://nfcore.slack.com/archives/CE5LG7WMB).

# Highlights

## New nf-core infrastructure members

This release marks the first release with contributions from two new nf-core infrastructure members. [@JulianFlesch](https://github.com/JulianFlesch) and [@ningyuxin1999](https://github.com/ningyuxin1999) are doing their PhDs at [QBiC](https://www.qbic.uni-tuebingen.de/) and will be working 50% for the nf-core infrastructure team.

## nf-test for pipelines

Approximately 2 years ago, nf-core started using [nf-test](https://www.nf-test.com/) to test modules and subworkflows.
Since then, there has been a huge effort from the community to switch all nf-core components from testing with pytest to testing with nf-test.

During the 2024 Barcelona Hackathon in October, a group of contributors worked on a proof of concept to add pipeline-level nf-test tests.
After this first attempt, other pipeline maintainers joined in and started implementing nf-test in their pipelines.
With some of the most used nf-core pipelines, among others, having implemented these tests, it is time for nf-core/tools to catch up and add this to the pipeline template.

To improve the robustness of nf-core pipeline testing workflows and
help developers catch issues early in the development process,
we&apos;ve added pipeline-level nf-tests to the pipeline template.

This can be seen in four new template files:

```bash
nf-test.config       # The pipeline-level nf-test configuration file
tests/
├── .nftignore       # ignored files for nf-test
├── default.nf.test  # The default test for the pipeline, mirroring the setup in config/test.config
└── nextflow.config  # The nextflow configuration for the pipeline tests
```

Additionally, we changed the CI setup to use nf-tests with [sharding](https://www.nf-test.com/docs/cli/test/#sharding) to speed up the testing process.
This means good-bye to good old `ci.yml` and hello to `nf-test.yml`.

:::note
The initial nf-test CI run will fail, because the pipeline repository doesn&apos;t have a snapshot for `default.nf.test` yet.

To fix this, generate a snapshot with:

```bash
nf-test test tests/ --profile=+&lt;docker/singularity/conda etc.&gt;
```

The `=+` notation extends the Nextflow `-profile test` option instead of overwriting it.

Then commit `tests/default.nf.test.snap`.
:::

:::tip
If you do not want to use nf-test for your pipeline, add the following to the `.nf-core.yml` file:

```yaml
skip_features:
    - &quot;nf-test&quot;
```

Additionally, to ignore the linting checks for nf-test, add the following to the `.nf-core.yml` file:

```yaml
lint:
    files_exist:
        - &quot;.github/workflows/nf-test.yml&quot;
        - &quot;.github/actions/get-shards/action.yml&quot;
        - &quot;.github/actions/nf-test/action.yml&quot;
        - &quot;nf-test.config&quot;
        - &quot;tests/default.nf.test&quot;
    nf_test_content: False
```

:::

To lower the load on our GitHub runners (especially during hackathons), we will use self-hosted runners for nf-test GitHub Actions.
These are automatically set up for all nf-core repositories. See our recent blog post on [self-hosted GitHub Actions runners](https://nf-co.re/blog/2025/state-of-nf-core-ci#moving-to-runson) to learn more about our setup.

Big thanks to everyone who contributed to this big addition to the nf-core template, especially
[@maxulysse](https://github.com/maxulysse),
[@adamrtalbot](https://github.com/adamrtalbot),
[@SateeshPeri](https://github.com/SateeshPeri),
[@GallVp](https://github.com/GallVp) and
[@edmundmiller](https://github.com/edmundmiller) for testing and tinkering, especially with the GitHub Actions. :raised_hands:

## New `nf-core test-datasets` command

Our newest infrastructure-team member [@JulianFlesch](https://github.com/JulianFlesch) has added a new CLI command to make it easier to find and integrate data sets from the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.

```bash
nf-core test-datasets list
nf-core test-datasets search
```

This new command provides streamlined functionality for:

- **Dataset discovery**: Easily search and explore available test datasets

    List all datasets for a specific branch:

    ![`nf-core test-datasets list --branch mag`](/images/tools/nf-core-test-datasets-list-mag.svg)

    Search for a specific term and get more detailed information about the dataset:

    ![`nf-core test-datasets search --branch mag minigut_reads`](/images/tools/nf-core-test-datasets-search.svg)

- **Dataset integration**: Easily use test datasets in your pipeline by providing the download URL

    ![nf-core test-datasets list --branch mag --generate-dl-url](/images/tools/nf-core-test-datasets-list-url-out.svg)

    or the path used in the pipeline test files:

    ![nf-core test-datasets list --branch mag --generate-nf-path](/images/tools/nf-core-test-datasets-list-path-out.svg)

See the full documentation here: [`test-datasets list`](https://nf-co.re/docs/nf-core-tools/test-datasets/list), and [`test-datasets search`](https://nf-co.re/docs/nf-core-tools/test-datasets/search).

## Installing subworkflows with components from different remotes

Previously, when installing a subworkflow from a custom repository, all of its components had to be from the same repository.
This release adds support for subworkflows that use components from multiple repositories.

Thanks to [João Cavalcante](https://github.com/jvfe) for implementing this feature and to everyone who provided feedback, especially [Arthur Gymer](https://github.com/awgymer) and [Matthieu Muffato](https://github.com/muffato).

### Changelog

You can find the complete changelog and technical details [on GitHub](https://github.com/nf-core/tools/releases/tag/3.3.0).

# Getting help to update your pipeline template

All the template changes are listed in [the Template section of the changelog](https://github.com/nf-core/tools/releases/tag/3.3.0).
Below, we have collected the most common merge conflicts that you may find and how to resolve them.

## Video walkthrough

&lt;YouTube id=&quot;7Sgz3FzlcRQ&quot; poster=&quot;https://i.ytimg.com/vi/7Sgz3FzlcRQ/maxresdefault.jpg&quot; /&gt;

## `.editorconfig`

The `.editorconfig` file was deleted. Its functionality is now covered by the `trailing-whitespace` and `end-of-file-fixer` pre-commit hooks,
together with additional settings in `.prettierrc.yml`.

#### Resolution

You can delete the `.editorconfig` file and make sure that you accept the changes in `.pre-commit-config.yaml`.

## nf-test files

If your pipeline added nf-test before this template update, you might see some conflicts in the following files:

- `.github/actions/nf-test/action.yml`
- `.github/workflows/nf-test.yml`
- `.nftignore`
- `tests/nextflow.config`
- `nf-test.config`

#### Resolution

It is a good idea to accept the changes made by the template, since they may contain updates and bug fixes discovered thanks to the pipelines that added nf-test earlier.
Make sure to double-check and understand the changes, and don&apos;t delete any extra content that you added to your tests.
If you need help, do not hesitate to reach out on Slack.

## `ci.yml`

The file `ci.yml` was removed and replaced by `nf-test.yml`.

#### Resolution

Once you have implemented nf-test by updating the `default.nf.test` file, you can safely remove `ci.yml`; it is replaced by `nf-test.yml`.

## `CITATIONS.md`

#### Resolution

If you have added additional citations, keep them!

## `modules.json`

This file often has merge conflicts during template updates because it typically contains pipeline-specific information.

#### Resolution

Don&apos;t accept the changes.

Instead, run the command `nf-core modules update --all{:bash}` to update all nf-core modules after resolving the template update conflicts.
This will make sure that all modules are updated and the correct version is specified in this file.

## `nextflow.config`

#### Resolution

Keep all your added parameters.

Accept the additions that come with this template update:

- We have added a new profile `gpu`
- We have added a new value to the `errorStrategy`
- The expression to include config files has changed
- nf-schema can be bumped back to 2.3.0
- If you have modified the contributors, added more tests, etc., don&apos;t remove them.

## `README.md`

We have replaced the Twitter badge with a Bluesky one and added an nf-core template version badge.
This is one of the harder files to sync with the template, because it usually contains custom content.

#### Resolution

Accept the new Bluesky badge (which replaces the Twitter one) and the nf-core template version badge. Keep your Zenodo URL and your custom text.

## `ro-crate-metadata.json`

The file `ro-crate-metadata.json` is updated on `nf-core pipelines sync`.

#### Resolution

Accept the changes.

## GitHub Actions

In the GitHub Action `download_pipeline.yml` we have removed the `pull_request_target` trigger and added a step to upload the `nextflow.log` file.

In all GitHub Actions (files inside `.github/workflows`) you will see several version updates.

#### Resolution

You can accept all version updates.

## Pipeline logos

If you see changes in pipeline logos: accept them, except if you customized them.

## `modules.config`

This file often has merge conflicts during template updates because it typically contains pipeline-specific content.

#### Resolution

Make sure to keep all your custom changes.

## `conf/base.config`

We added the `withLabel: process_gpu` config to the base config.

#### Resolution

Accept the incoming gpu profile but keep your customizations.
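
As a rough illustration of what a label-based GPU block in `conf/base.config` can look like (the directive value here is hypothetical, not the exact template content):

```groovy
// Hypothetical sketch: processes labelled &apos;process_gpu&apos; request an accelerator.
// The actual nf-core template settings may differ.
process {
    withLabel: process_gpu {
        accelerator = 1
    }
}
```

Processes in the pipeline can then opt in simply by declaring `label &apos;process_gpu&apos;`.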

## `.nf-core.yml`

There might be a conflict between the different `nf_core_version` values.

#### Resolution

Accept the newer version.</content:encoded></item><item><title>Maintainers Minutes: May 2025</title><link>https://nf-co.re/blog/2025/maintainers-minutes-2025-05-30/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/maintainers-minutes-2025-05-30/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Tue, 03 Jun 2025 09:00:00 GMT</pubDate><content:encoded>import Profile from &quot;@components/GitHubProfilePictureExtended.astro&quot;;
import sad_maintainers_duck from &quot;@assets/images/blog/maintainers-minutes-2024-06-07/sad-maintainers-duck.png&quot;;
import notifications from &quot;@assets/images/blog/maintainers-minutes-2025-04-24/too-many-notifications.jpg&quot;;
import botoverlords from &quot;@assets/images/blog/maintainers-minutes-2025-04-24/bot-overlords.jpg&quot;;
import saveapolarbear from &quot;@assets/images/blog/maintainers-minutes-2025-04-24/save-a-polar-bear.jpg&quot;;
import { Image } from &quot;astro:assets&quot;;

The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers) by providing brief summaries of the monthly team meetings.

## Overview

What was initially planned as a meeting focused on test data was unfortunately delayed due to key members being absent.

Maxime performed a full takeover and redirected the meeting into a presentation of the recently introduced RFC initiative from the core team.
But before all that, Louis gave us a quick rundown of the pytest to nf-test migration effort that he has recently been spearheading with the help of many community members.
A tools release is on its way, coming with lots of goodness.

## Pytest to nf-test tests migration for nf-core/modules

As of the day the meeting was held, only one module and one subworkflow were left to migrate 😱.
So we are hoping to finally close this issue that was started two years ago.

A HUGE thank you :people_hugging: for everyone who contributed to this huge effort of adding nf-test to [&gt;1500 modules](https://nf-co.re/modules/)!

## RFCs aka Request For Comments

This new initiative from the core team is meant to help formalize and present substantial ideas to improve nf-core.

A discussion can be started on Slack, which can lead to an issue on nf-core/proposals, which in the end can lead to a Pull Request.
For more information, please read [the documentation](https://nf-co.re/docs/contributing/project_proposals).
The important part is that we maintainers will discuss these proposals and coordinate with the core team to help bring all of these to fruition.

## nf-core tools

The next update is HAPPENING and we&apos;re quite happy about it, as a lot of goodness is coming with it.
Please get ready to read the blogpost about it that should be going out soon.

## Maxime takeover

That is all for this Maxime takeover; the meeting was shorter than usual, as there were fewer issues to discuss.

## The end

As always, if you want to get involved and give your input, join the discussion on relevant PRs and Slack threads!

\- :heart: from your #maintainers team!</content:encoded></item><item><title>nf-core supports now `main` branch names</title><link>https://nf-co.re/blog/2025/switching-master-to-main/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/switching-master-to-main/</guid><description>Announcing the possibility to change a pipeline default branch to main</description><pubDate>Mon, 19 May 2025 11:00:00 GMT</pubDate><content:encoded>import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;

# nf-core stays woke

We are happy to announce that nf-core is rolling out full support for the use of `main` branches in nf-core pipeline GitHub repositories!

This brings us in-line with the rest of the GitHub programming community since the switch to the GitHub default back in [2020](https://github.blog/changelog/2020-10-01-the-default-branch-for-newly-created-repositories-is-now-main/).

We recognise that changing the primary branch, which is critical for pulling any Nextflow pipeline, can feel intimidating.

So for Gen Z we have made a short video, and for Millennials and older, a documentation page describing the steps to switch the default branch of existing pipelines:

- Watch the video:

&lt;YouTube id=&quot;WhpPh8PvZp0&quot; poster=&quot;https://i.ytimg.com/vi/WhpPh8PvZp0/maxresdefault.jpg&quot; /&gt;

- Read the [Documentation](https://nf-co.re/docs/tutorials/pipelines/switching_master_to_main)

For new pipelines, you can already select the primary branch on repository creation using `nf-core pipelines create`.

If you have any problems, please shout on the [#tools channel](https://nfcore.slack.com/archives/CE5LG7WMB) on Slack.</content:encoded></item><item><title>Maintainers Minutes: April 2025</title><link>https://nf-co.re/blog/2025/maintainers-minutes-2025-04-24/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/maintainers-minutes-2025-04-24/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Fri, 02 May 2025 09:00:00 GMT</pubDate><content:encoded>import Profile from &quot;@components/GitHubProfilePictureExtended.astro&quot;;
import sad_maintainers_duck from &quot;@assets/images/blog/maintainers-minutes-2024-06-07/sad-maintainers-duck.png&quot;;
import notifications from &quot;@assets/images/blog/maintainers-minutes-2025-04-24/too-many-notifications.jpg&quot;;
import botoverlords from &quot;@assets/images/blog/maintainers-minutes-2025-04-24/bot-overlords.jpg&quot;;
import saveapolarbear from &quot;@assets/images/blog/maintainers-minutes-2025-04-24/save-a-polar-bear.jpg&quot;;
import { Image } from &quot;astro:assets&quot;;

The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers)
by providing brief summaries of the monthly team meetings.

## Overview

After a post-hackathon break to catch our breath in March, in our April meeting we discussed (amongst others) the following topics:

- [Goodbyes and welcomes](#goodbyes-and-welcomes)
- [Use of CODEOWNER files](#use-of-codeowner-files)
- [Use of nf-core/modules issues as wishlists](#use-of-nf-coremodules-issues-as-wishlists)
- [Saving more polar bears via optimised AWS megatest configs](#saving-more-polar-bears-via-optimised-aws-megatest-configs)

## Goodbyes and welcomes

&lt;Image
    src={sad_maintainers_duck}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Sad looking cartoon yellow rubber duck with nf-core logo badge on its body, with a large blue tear coming out of its eye.&quot;
/&gt;

During the yearly spring cleaning, the team leads also take the opportunity to review the maintainers team cohort.

We would like to thank the following people for their service and contributions during their time on the maintainers team; they will be becoming maintainers team alumni (with hopes of this not being permanent!):

- Anders Jemt
- Anders Sune Pedersen
- Carson Miller
- Christopher Mohr
- Lili Andersson-Li
- Sofia Stamouli
- Rob Syme
- Gisela Gabernet
- Harshil Patel

But we are excited to welcome the following new members, who, in light of their active community contributions, have kindly agreed to join the team!

&lt;Profile username=&quot;atrigila&quot;&gt;Anabella Trigila&lt;/Profile&gt;
&lt;Profile username=&quot;LouisLeNezet&quot;&gt;Louis Le Nézet&lt;/Profile&gt;
&lt;Profile username=&quot;pontus&quot;&gt;Pontus Freyhult&lt;/Profile&gt;

## Use of CODEOWNER files

&lt;Image
    src={notifications}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Captain Jack Sparrow from Pirates of the Caribbean running away from a large crowd of people with the texts &apos;being added to the nf-core/modules codeowners file&apos; and &apos;notifications&apos; overlaid on Captain Jack and the crowd respectively.&quot;
/&gt;

Maxime brought up the use of GitHub&apos;s [`CODEOWNER`](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners) files.
These files are used to automate 🛎️ notifications to certain people listed in the file to ensure &apos;the right people&apos; are informed to look at pull requests.

This had previously been used within nf-core/modules, but was [eventually removed](https://github.com/nf-core/modules/pull/6738) due to complaints from prolific module creators getting inundated by notifications on update PRs of modules they were no longer interested in maintaining themselves.

However, Maxime and Edmund proposed resurrecting the use of these files in pipelines (possibly with opt-in).
The rationale was that even though nf-core is generally &apos;permissive&apos; with who is allowed to add code to repositories, when it comes to active and popular pipelines, it is important that the core development team keeps abreast of all developments.
They described a few cases on a large pipeline where code was integrated by a community member and reviewed and merged by another community member without the core development team having an overview or keeping track 😱.

We had a discussion of the pros and cons of such a file, and at the pipeline level there was a general agreement that this should at least be an option for pipeline developers.
We discussed whether this should just be an opt-in during pipeline creation, or rather an opt-out that auto-propagates the list of users from the `manifest` section of the `nextflow.config` (using the `maintainers` contributor tag).
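
For context, such an auto-propagation could read the contributors list from the pipeline manifest. A rough sketch of what that section looks like (names and handles here are purely illustrative, following Nextflow&apos;s `manifest.contributors` syntax):

```groovy
// Hypothetical nextflow.config fragment: contributors tagged as maintainers
// could be propagated into a CODEOWNERS file. All entries are placeholders.
manifest {
    contributors = [
        [
            name: &apos;Jane Doe&apos;,
            github: &apos;https://github.com/janedoe&apos;,
            contribution: [&apos;maintainer&apos;],
        ],
        [
            name: &apos;John Smith&apos;,
            github: &apos;https://github.com/johnsmith&apos;,
            contribution: [&apos;contributor&apos;],
        ],
    ]
}
```

An opt-out mechanism could then collect only the entries tagged as maintainers when generating the `CODEOWNERS` file during template sync.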

There were still a few concerns that some developers might complain of getting too many notifications on their pipelines.

We therefore decided that Maxime would ✒️ write a more formal proposal document; in the meantime, volunteers will manually add the file to their pipelines (to see if their co-developers complain about notifications), and we will check the results in two months.

## Use of nf-core/modules issues as wishlists

&lt;Image
    src={botoverlords}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Screenshot of a news presenter from the cartoon The Simpsons with the text &apos;I for one one welcome our robot overlords&apos;&quot;
/&gt;

Famke raised an issue she and the nf-core/modules sub-team encountered during the spring cleaning.
They found that there was a 📈 huge number of open issues on the repository requesting new modules, some very old, that were never assigned to anyone nor worked on.
The large number made it very hard to go through, check the status of the various issues, and generally reasonably maintain a sense of order.

The sub-team&apos;s impression was that many people were using the issues page as a &apos;wish list&apos;, where someone wants a module integrated into a pipeline but didn&apos;t want to do it themselves (either due to experience or time).

To reduce the maintenance burden for the maintainers team, we discussed a couple of options, including having a singular mega-issue where all module &apos;requests&apos; were recorded, or only allowing a new-module issue to be created if the person was willing to work on it themselves.
James&apos; concern was that the list of &apos;unassigned&apos; modules can sometimes be very useful for training on writing modules or for hackathons where novice participants don&apos;t have something specific to work on (or want to practice before working on a specific pipeline).
Removing or &apos;hiding&apos; these issues could therefore hinder teachers, or make it harder for hackathon newcomers to effectively begin their contributions.

Eventually we found a compromise via a 🤖 &apos;stale bot&apos; concept that was assigned to Famke to implement.
New-module issues that are not immediately assigned to work on by someone will be automatically given a &apos;wishlist&apos; label.
If it remains unassigned after a period of time (initially around a year, to ensure we always have a list of &apos;free&apos; modules for hackathons), the issue will be closed by an automated bot.
The time period would reset if assigned to someone and then subsequently unassigned (if they could not complete it).
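
As a rough sketch of the concept (hypothetical; the actual implementation is still to be written), GitHub&apos;s official `actions/stale` action could be configured along these lines:

```yaml
# Hypothetical workflow sketch using the actions/stale action.
# Labels, schedule and time periods are illustrative, not the final setup.
name: Close stale wishlist module requests
on:
  schedule:
    - cron: &quot;0 4 * * 1&quot; # run weekly
jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v9
        with:
          only-labels: &quot;wishlist&quot;
          exempt-all-assignees: true # assigned issues are never closed
          days-before-issue-stale: 330
          days-before-issue-close: 35
          stale-issue-label: &quot;stale&quot;
          stale-issue-message: &gt;
            This module request has been unassigned for a long time and will be
            closed soon. Comment or assign yourself to keep it open.
```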

Through this automated mechanism we hope to make nf-core/modules (our most popular repository!) more manageable for the community and the maintainers team, freeing up time to work on cooler things (or argue about naming things..., the maintainers team&apos;s favourite pastime 😉).

## Saving more polar bears via optimised AWS megatest configs

&lt;Image
    src={saveapolarbear}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Photo of a polar bear face with the text &apos;Save a polarbear, optimise your AWS megatests&apos; overlaid&quot;
/&gt;

Finally, we briefly looked at a neat [little repository](https://github.com/FloWuenne/megatest-resource-optimization/tree/main/optimized_configs) that Florian (who unfortunately sent his apologies) has made that includes optimised resource-usage configuration files for nf-core pipelines AWS megatests.

This was after observations that many pipeline megatest configurations could be rather wasteful, over- or under-requesting resources on AWS (and in general), and wasting costs and energy by requesting more CPUs and memory than their processes actually use (sad 🐻‍❄️).

We discussed whether there should be an automation to generate an optimised config and updating a pipeline&apos;s `test_full` config with the optimised config, or some other mechanism to incorporate these into our pipelines.

We decided to ask volunteers to implement Florian&apos;s existing optimised configs on a subset of nf-core pipelines, to see if they cause any problems before rolling out, and to discuss further on Slack. (In fact, the internal maintainers team Slack discussions were already fruitful, and @Maxime and @Florian have already come up with a potential non-invasive system that requires no extra work by pipeline developers - watch this space!)

## Misc

Finally a couple of minor other points were briefly announced in the last couple of minutes:

- James asked for &apos;live reviews&apos; of his PR to the [test-data specifications](https://nf-co.re/docs/guidelines/components/test_data#replication-of-test-data) to improve community members&apos; use of nf-core/test-datasets
    - TL;DR: module tests don&apos;t need to make sense, the tool just needs to &apos;work&apos;! Re-use existing test-data as much as possible.
- Júlia briefly summarised an upcoming small patch release for nf-core/tools to fix a couple of CI bugs that have cropped up in the last couple of weeks related to AWS megatests and Nextflow itself.

## The end

As always, if you want to get involved and give your input, join the discussion on relevant PRs and Slack threads!

\- :heart: from your #maintainers team!</content:encoded></item><item><title>State of the nf-core CI</title><link>https://nf-co.re/blog/2025/state-of-nf-core-ci/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/state-of-nf-core-ci/</guid><description>Run, CI, Run!</description><pubDate>Wed, 09 Apr 2025 11:00:00 GMT</pubDate><content:encoded>## Introduction

In order to scale the nf-core organisation, we make heavy use of automation.
Every repository has a `.github/workflows/` directory with a set of YAML files that configure automated
workflows using [GitHub Actions](https://github.com/features/actions).
These do all kinds of things, the most visible being to run continuous integration (CI) tests every time
you open a pull-request or push to `master`.

Continuous integration tests are critical for validation of new code and automated quality checks that
help to prevent code regressions.
These help nf-core pipelines and modules to stay standardised and high-quality, whilst avoiding
a lot of manual code checks and providing fast feedback to developers.

## The waiting game

Of course, when something works well, we tend to use it a lot.
This is certainly true with GitHub Actions and nf-core.

An average pull-request in nf-core/modules can kick off up to 20 automated jobs.
PRs for nf-core pipelines can start up even more jobs.
The nf-core GitHub organisation has an allocation of 60 action runners
(this already includes a double allocation for open source projects).

This mismatch between number of jobs launched and total concurrent capacity can
very quickly create a long queue of jobs waiting to be picked up by a runner.
This is especially noticeable during peaks of community activity, such as hackathons.

One of the most popular memes during nf-core hackathons:
![Pablo Escobar from the series Narcos waiting in different positions with the caption &quot;Waiting&quot;](../../../assets/images/blog/state-of-nf-core-CI/ci-waiting-meme.png)

Over the years we have been working on improving the situation by applying different strategies:

## 1. Self-hosted runners

Thanks to a generous donation of AWS credits, we are in theory able to run these tests on self-hosted runners on AWS.

### Manual AWS instances

We first [launched instances manually, based on demand](https://github.com/nf-core/tools/issues/1940).
This was a good first step, removing a bit of the pressure during hackathons.
But this didn&apos;t scale well, and we soon reached the runner limit also outside of hackathons.

### AWS instances managed by terraform

This led us to search for automatically scaling solutions. After a bit of trial and error, we had a working solution using [philips-labs/terraform-aws-github-runner](https://github.com/philips-labs/terraform-aws-github-runner).
By using the &quot;infrastructure as code&quot; tool [terraform](https://www.terraform.io/), we were able to set up AWS Lambda functions that automatically create and destroy AWS EC2 instances with predefined configurations.
These functions were triggered by a GitHub webhook.
This gave us automatic scaling without needing to add any extra settings files to the different nf-core repositories, unlike other approaches.
We just needed to install the specific GitHub App in the repositories and specify `runs-on: &quot;self-hosted&quot;{:yml}` instead of `runs-on: &quot;ubuntu-latest&quot;{:yml}` in the GitHub Actions workflows.
During the Barcelona hackathon in September 2024 we even managed to [add GPU-enabled runners](https://github.com/nf-core/actions-runners/pull/10), which allowed us to test and add GPU-based tools like [parabricks](https://docs.nvidia.com/clara/parabricks/latest/index.html) to nf-core/modules and nf-core pipelines (e.g. [nf-core/sarek](https://github.com/nf-core/sarek/issues/1853)).

However, the self-hosted runner solution hasn&apos;t been perfect.
We kept running into user-permission errors, because the terraform setup only came with a rootless Docker config. We also found that runners sometimes didn&apos;t pick up a job from GitHub, and debugging this was very tricky. The final straw was when the modules tests kept failing due to remnants of previous test runs and the resulting permission errors.

### Moving to runsOn

As our manual runners became more problematic, we started to search for yet another solution.
This time both [Edmund](https://github.com/edmundmiller) and I stumbled upon [runsOn](https://runs-on.com/).
This service provides an AWS CloudFormation template to create self-hosted runners in your own AWS account. It basically does what we did before, but without the terraform setup, and it comes with really good documentation.
We switched the CI tests for [nf-core/modules](https://github.com/nf-core/modules/pull/7840) to this new solution a week before the March 2025 hackathon and it worked like a charm.
It was so smooth that I had to check several times that the tests were actually running on the self-hosted runners.

Switching to the new solution was as simple as changing:

```diff title=&quot;nf-core/modules/.github/workflows/nf-test.yml&quot;
- runs-on: &quot;self-hosted&quot;
+ runs-on:
+      - runs-on=${{ github.run_id }}
+      - runner=4cpu-linux-x64
```

And if we wanted to add GPU-enabled runners, we just specify a different runner in the workflow.
RunsOn furthermore allowed us to quickly support the initiative to add `linux/arm64` containers to nf-core/modules
(a great effort by other nf-core members) by adding ARM-based runners to the tests, e.g. to the [nf-core/rnaseq pipeline tests](https://github.com/nf-core/rnaseq/pull/1530).

An additional benefit compared to the previous terraform setup is that we now use spot instances, which are both cheaper and faster to start up.

There were still some issues where the runners behaved differently compared to the GitHub-hosted runners:

- We had problems accessing some public AWS resources. We fixed this by setting:

```groovy title=&quot;nextflow.config&quot;
aws.client.anonymous = true
```

- Dependencies were missing for `setup-apptainer` (which was fixed by the [runsOn developer](https://github.com/runs-on/runs-on/releases/tag/v2.7.0)).

But overall this new runner setup works really well.

## 2. nf-test sharding

A further approach on speeding up the CI tests was introduced in [nf-test 0.9.0](https://github.com/askimed/nf-test/releases/tag/v0.9.0).
This release added the option to split up the tests into multiple jobs using the `--shard` flag.
Instead of running all tests one after the other, nf-test will now distribute the tests across multiple jobs and run them in parallel, based on a given maximum number of shards.
This is especially useful for modules/subworkflows/pipelines that trigger a large number of tests.

Let&apos;s look at the tests for the [FASTQC module](https://github.com/nf-core/modules/blob/master/modules/nf-core/fastqc/tests/main.nf.test) as an example.
This module contains 12 tests, e.g. `sarscov2 single-end [fastq]`, `sarscov2 paired-end [fastq]` and their stub versions (and it is also included in 2 subworkflows, which would get tested during an update).
Without sharding, we would run all 12 tests sequentially, which can take quite a while.
With sharding, we can now run, for example, 4 shards/jobs in parallel, each with 3 tests and the tests from the subworkflows distributed over them.
This will not lower the number of needed GitHub Actions runners, but we tackled that problem with self-hosted runners.
Sharding will instead reduce the time it takes to run the tests for a single PR.

To handle this scaling, we added [a sharding step](https://github.com/nf-core/modules/blob/master/.github/actions/get-shards/action.yml) to the CI workflows, which first gets the number of triggered tests by running nf-test in dry-run mode.
We then use this number, together with the `max_shard_size` parameter, to set the number of shards needed, giving us a bit more control over the number of runners used and helping avoid idle runners.

The disadvantage of this approach is that it is not immediately clear which tests failed during a run; instead, one needs to go through the logs of the different jobs.
We tried to mitigate this by adding a summary of the tests in the GitHub Actions, but this needs some more polishing.

Thanks to the work by [@GallVp](https://github.com/GallVp), [@sateeshperi](https://github.com/sateeshperi),
[@edmundmiller](https://github.com/edmundmiller), [@adamrtalbot](https://github.com/adamrtalbot), [@mirpedrol](https://github.com/mirpedrol),
[@maxulysse](https://github.com/maxulysse), and [@mashehu](https://github.com/mashehu),
we now have this dynamic sharding step in the CI workflows for
[nf-core/modules](https://github.com/nf-core/modules/blob/ab281000d296e2a6ab4efceb14c1151bd3a326da/.github/actions/get-shards/action.yml) and in the
upcoming [pipeline template](https://github.com/nf-core/tools/blob/ae7760bcff980809f6dabdcaa96209b60a3d2d5a/nf_core/pipeline-template/.github/actions/get-shards/action.yml) for nf-core/tools 3.3.0.

### A different waiting game

Both of these strategies already removed quite a bit of waiting time and hopefully in the future we can return to another hackathon evergreen meme:

![Bernie Sanders meme with the caption: &quot;I am once again asking for PR reviews&quot;](../../../assets/images/blog/state-of-nf-core-CI/pr-review-bernie-meme.png)</content:encoded></item><item><title>nf-core survey 2025: the results</title><link>https://nf-co.re/blog/2025/survey-results-2025/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/survey-results-2025/</guid><description>Results summary from the nf-core community survey 2025</description><pubDate>Wed, 26 Mar 2025 11:00:00 GMT</pubDate><content:encoded>import worldmap from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-worldmap.png&quot;;
import locationdistribution from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-locationdistribution.png&quot;;
import respondertype from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-respondertype.png&quot;;
import confidence from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-confidence.png&quot;;
import happiness from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-happiness.png&quot;;
import cloudglobalimprove from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-cloudglobalimprove.png&quot;;
import cloudglobalpositive from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-cloudglobalpositive.png&quot;;
import cloudglobalrequests from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-cloudglobalrequests.png&quot;;
import barglobaldetails from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-barglobaldetails.png&quot;;

import fineuserpositive from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-fineuserpositive.png&quot;;
import fineuserimprove from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-fineuserimprove.png&quot;;
import fineuserrequest from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-fineuserrequest.png&quot;;
import finedevpositive from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-finedevpositive.png&quot;;
import finedevimprove from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-finedevimprove.png&quot;;
import finedevrequest from &quot;@assets/images/blog/survey-results-2025/nfcoresurvey2025-finedevrequest.png&quot;;

import { Image } from &quot;astro:assets&quot;;

## Introduction

nf-core has a huge community of 11,000 members (based on Slack users), and this number does not appear to be diminishing.
As the community gets bigger, it is getting harder to keep track of all the discussions happening across all of our Slack channels, and increasingly in face-to-face talks on Slack huddles and gather.town.
Therefore, the core team has decided to hold a yearly community survey to give us an overview of the mood within the community and which way the &apos;wind is blowing&apos; in terms of needs and requirements.

With this information, we plan to develop a roadmap for the continued development and progress of the nf-core ecosystem and community over the next few years.

## Key points

- We had 209 responders, representing 1.8% of our Slack community.
- nf-core has a responder NPS of 54, meaning responders hold a very positive opinion of nf-core and are likely to recommend the community to colleagues and peers.
- The community itself was the biggest positive aspect of nf-core, with responders finding it a welcoming, friendly, and helpful group of people.
- While the existing amount of documentation was appreciated, it remains the biggest target for improvement, primarily in terms of discoverability and consistency.

## Methodology

In mid-February 2025, we sent a simple survey out on Slack to all community members, regardless of whether they are users, developers, newcomers, or veterans of nf-core.
This short seven-question survey asked the following main questions:

1. Community member type (user, developer, or both)
2. Experience within nf-core (1 newcomer - 5 advanced)
3. Country based in (country list)
4. Likelihood to recommend nf-core (0-10 scale)
5. What is liked the most about nf-core (free text)
6. What difficulties have been encountered (free text)
7. Any other feedback (free text)

The survey was open for approximately a month, with one reminder halfway through.

The answers to the first four questions were aggregated and summarised as distributions.

The free-text answers to the latter three questions were assigned one or more tags to group the responses into similar feedback topics.
This tagging was independently performed by two core members and compared.

Tagging was performed within Google Drive, and then all data was cleaned up, processed, and visualised in R with the Tidyverse collection of packages.
The code in the form of a Quarto markdown notebook can be seen [here](https://github.com/jfy133/nf-core-simplesurvey-2025/).

## Response statistics

In total, we received 209 responses.
Assuming these were all unique responders (the survey was anonymous), and with a maximum community size of 11,640 people based on Slack numbers in mid-March 2025, this corresponds to an approximate response rate of 1.8% of the community providing feedback.
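The response-rate figure above is straightforward arithmetic; a quick sketch using the numbers quoted in this section:

```python
responses = 209
community_size = 11_640  # Slack members, mid-March 2025

# Percentage of the community that responded, to one decimal place.
print(round(100 * responses / community_size, 1))  # 1.8
```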

## Geographic Distribution

We had responders based in 36 different countries, spanning the Americas, Europe, Africa, Asia and Oceania.

&lt;Image
    src={worldmap}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;World map with countries from which the survey received at least one response filled in green.&quot;

/&gt;

The top 5 countries by number of responders were the USA, UK, Germany, Sweden, and Spain.
Other than the USA, all other countries in the top 10 by number of responders are in Europe, reflecting the European origins of the community.

&lt;Image
    src={locationdistribution}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Barchart of countries on Y axis and responder counts on the X axis&quot;

/&gt;

## Responder type and experience

To get a better idea of the type of experience or interaction the responders may have had with nf-core, we can look at how they classified themselves.

&lt;Image
    src={respondertype}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Barchart of type of responder (user, developer, both, snakemake developer, no feedback) on Y axis and responder counts on the X axis&quot;

/&gt;

We received a relatively equal spread of users, developers, those who both use and develop nf-core pipelines, and one apparently disgruntled Snakemake developer.

We also wanted to understand the level of experience each of these groups felt they had by asking them to score their confidence in using or developing pipelines.

&lt;Image
    src={confidence}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Three barcharts for each type of responder (user, developer, both), with count on the Y axis and the self-reported confidence as a user or developer on the X axis&quot;

/&gt;

After excluding the responses with no responder type and the Snakemake developer, we can see that the developer and user/developer responders mostly felt relatively confident developing within nf-core, while users were more evenly spread, with a skew towards feeling less confident in using nf-core pipelines.

## Overall happiness with nf-core

The first feedback question we asked was a general &apos;how satisfied&apos; or &apos;happy&apos; are you with the initiative and community as a whole, on a scale of 0-10.

Based on the values, we can calculate a [&apos;net promoter score&apos;](https://en.wikipedia.org/wiki/Net_promoter_score) or NPS, a market research metric used to evaluate responders&apos; general satisfaction and approximate loyalty.
This is calculated by subtracting the percentage of &apos;low score&apos; responders (0-6) from the percentage of &apos;high score&apos; responders (9-10).
Negative values are normally evaluated as general dissatisfaction amongst the responders, while scores of 60-100 represent strong loyalty and happiness.
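The calculation can be sketched as follows (the ratings below are purely illustrative, not the actual survey responses):

```python
def nps(scores):
    # Percentage of promoters (scores 9-10) minus percentage of detractors (0-6).
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))

# Illustrative ratings only:
print(nps([10, 9, 9, 8, 7, 10, 6, 9, 3, 10]))  # 40
```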

The NPS score from the survey is 54.

&lt;Image
    src={happiness}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Three barcharts for each type of responder (user, developer, both), with count on the Y axis and the likely to recommend to a peer score on the X axis&quot;

/&gt;

When looking at the likelihood-to-recommend score distribution across each of our responder types, we see that developers and user/developers - both likely to be more comfortable bioinformatically - have a strong skew towards being very likely to recommend nf-core.
The general trend for users is similar, but slightly more of the less happy users gave only middling scores (between 4 and 7).

Overall, the responders are mostly positive and happy with the nf-core initiative and community.

## General Feedback

So let&apos;s start: what does the community like most about nf-core, across users, developers, and user/developers?

&lt;Image
    src={cloudglobalpositive}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Word cloud with different words corresponding to positive feedback category types as evaluated by the reviewers at different sizes&quot;

/&gt;

By far the most common feedback was the community &apos;feel&apos; itself.

Responders often mentioned that the community was just a nice place to be, with people being friendly and helpful (as reflected by other categories such as references to inclusivity, expertise, and speed of responses).
Furthermore, people appreciated that the diverse range of pipelines is generally high quality and reproducible, as well as the infrastructure around them.

Other appreciated factors were the ease of use of the pipelines and the documentation, as well as the consistency and familiarity of the pipelines (derived from the common template), and the ability to share and re-use components such as nf-core modules in other contexts.

Next let&apos;s look what people felt was their biggest difficulties in using or developing within nf-core.

&lt;Image
    src={cloudglobalimprove}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Word cloud with different words corresponding to improvement feedback category types as evaluated by the reviewers at different sizes&quot;

/&gt;

Interestingly, by far the most common feedback concerned documentation - despite other responders saying they really appreciated it!
There were so many comments on this that the reviewers ended up splitting it into multiple categories, and these remain some of the most commented-on issues encountered when working with nf-core.

These categories ranged from finding the whole ecosystem overwhelming to get started with, to it being unclear how to get involved (for both users and developers), to difficulties setting up nf-core pipelines on local infrastructure.
Another curious discordance with the positive feedback is that while people appreciate the consistency within pipelines due to the common template, people often felt the template was too complex.
Relatedly, some users found the pipelines too large and complicated to get started with.

Other common categories related to the release speed of pipelines and tools/template (typically that releases are too fast and there are too many, although sometimes _not fast enough_).
Communication - in terms of how, and by whom, decisions within nf-core get made - was also sometimes brought up, as were problems with setting up and debugging the recently implemented nf-test tooling.
Finally, and somewhat confusingly, a common issue raised was Nextflow itself; however, the reviewers felt this was either down to a misunderstanding that `Nextflow != nf-core`, or the strength of feeling among Python developers of &apos;how can you _not_ develop something in Python?&apos;

So what were the most common requests?

&lt;Image
    src={cloudglobalrequests}
    class=&quot;d-block m-auto&quot;
    height={800}
    density={[1.5, 2]}
    alt=&quot;Word cloud with different words corresponding to requests feedback category types as evaluated by the reviewers at different sizes&quot;
/&gt;

The answers to this question were much more diverse than for the other two, with the exception of the most common request - unsurprisingly - better documentation.
Another request was better communication, whether through more transparent procedures, clearer standards, or more external advertising to pull in a wider diversity of research software engineers.
A few people also requested pipeline chaining functionality, new pipelines, more merchandise on the [nf-core/shop](https://nf-co.re/shop), and memes (feed the obsession by posting in [#nf-core-memes](https://nfcore.slack.com/archives/C03EZ806PFT)!)

Interested in the actual numbers? See the image below.

:::info{collapse title=&quot;Actual numbers&quot;}

&lt;Image
    src={barglobaldetails}
    class=&quot;d-block m-auto&quot;
    width={1200}

    density={[1.5, 2]}
    alt=&quot;Bar chart of counts of feedback category types as evaluated by the two reviewers&quot;

/&gt;
:::

## Closer Feedback

But what about the feedback from just users or just developers of nf-core?

## Users

Users mostly appreciate the ease of use of pipelines, the community feel and expertise, and documentation and training.

&lt;Image
    src={fineuserpositive}
    class=&quot;d-block m-auto&quot;
    height={800}

    density={[1.5, 2]}
    alt=&quot;Bar chart of counts of feedback category types as evaluated by the two reviewers for users and positive tags&quot;

/&gt;

Even though many are happy with the documentation, many of the difficulties encountered related to documentation - particularly, for new users, how to run pipelines and how to install and configure them for their infrastructure.
Furthermore, some user-only responders were somewhat concerned about the variable quality encountered between pipelines.

&lt;Image
    src={fineuserimprove}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Bar chart of counts of feedback category types as evaluated by the two reviewers for users and improve tags&quot;

/&gt;

Most user-only responders did not feel they needed anything more and only had positive sentiments about the community.
But as with the above, documentation and training improvements were the most important.

&lt;Image
    src={fineuserrequest}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Bar chart of counts of feedback category types as evaluated by the two reviewers for users and request tags&quot;

/&gt;

## Developer Feedback

For developer-only responders, once again the community feel and inclusivity, alongside expertise, was a big draw.
The high quality of the open-source pipelines, modules, and infrastructure was also a big plus point.

&lt;Image
    src={finedevpositive}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Bar chart of counts of feedback category types as evaluated by the two reviewers for developers and positive tags&quot;

/&gt;

The biggest issues encountered by developer-only responders were related to the onboarding of new developers, as well as documentation on handling the complexity of the pipeline template.
Another issue was tool release speed - some people complained there were too many changes too quickly, and others wanted new features faster.

&lt;Image
    src={finedevimprove}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Bar chart of counts of feedback category types as evaluated by the two reviewers for developers and improve tags&quot;

/&gt;

Finally, as with users, the strongest category of extra feedback was just positive vibes.
More merchandise ideas were requested, as well as a few people requesting clearer and more consistent standards and specifications when developing.

&lt;Image
    src={finedevrequest}
    class=&quot;d-block m-auto&quot;
    height={600}

    density={[1.5, 2]}
    alt=&quot;Bar chart of counts of feedback category types as evaluated by the two reviewers for developers and requests tags&quot;

/&gt;

## So what&apos;s next?

The core team will take some time to process the results, including holding discussions with the steering and maintainers teams.
We aim to develop a roadmap with priority areas towards which coordination efforts will be driven to address the feedback.

Once this roadmap is ready, we will present it to the community via another blog post for feedback and implementation.

Thank you to all the responders to the survey, and we hope the results of this survey and future ones will make the community even more proud of our collective efforts to make user-friendly, powerful, and reproducible workflows.
import { Image } from &quot;astro:assets&quot;;
import merged from &quot;@assets/images/blog/springcleaning2025/maintainers-team-merged-stats.png&quot;;

Last week, the maintainers team volunteered their time on behalf of the community to spring clean the whole common nf-core ecosystem in preparation for the upcoming hackathon!

**We would first like to thank everyone involved**, who helped to clean many of the nf-core GitHub repositories by closing issues and updating, merging, or closing PRs.
It was a tremendous effort. Thanks to all of you, nf-core is in a much better and cleaner state than before - and just in time for our largest hackathon ever!

In this blog post we want to briefly summarise the outcome of the spring cleaning, and describe some of the proposals put forward by the maintainers team to improve our processes and procedures.

## Organisation

Prior to the spring cleaning, we split up into different groups focused on particular tasks:

- nf-core/pipelines
- nf-core/modules
- nf-core/documentation
- nf-core/configs
- Hackathon preparation
- Core Team

The teams were extremely productive throughout the week, as can be seen from merged PRs alone in this [pullpo.io](https://pullpo.io/) graph:

&lt;Image
    src={merged}
    alt=&quot;Bar chart of number of PRs merged by the maintainers team between March 10th-16th, with a total of 77 merged PRs throughout the week&quot;
/&gt;

## Progress

### Pipelines

📑 The pipelines cleaning team were tasked with checking the state of each pipeline&apos;s repository.
This involved assessing the number of stale issues, branches, PRs, etc., as well as checking minor things such as that tags and descriptions were all present and correct.
The pipelines team also evaluated the development status of each pipeline. If a pipeline seemed to be very &apos;quiet&apos;, they began contacting the lead developers to ask whether the pipeline was still under active development or maintenance.

✅ By the end of the spring cleaning, almost half of the nf-core pipelines were assessed.
Most of them were evaluated to be in good shape.
Fun fact: of the 50 assessed, 5 were still written in Nextflow DSL1!
Around 10 of them appear to be no longer maintained, and this list was passed to the core team for further evaluation of their status and whether they should possibly be archived.

🛠️ A yearly check seemed to be enough even if not all pipelines were checked this time round.
For next year&apos;s spring cleaning, Jonas and Florian requested we ensure the team has more maintainers to work on this task.

### Modules

📑 The modules team were tasked with tackling the 🦣 modules repository of more than 1,400 modules, and hundreds of open issues and PRs.

✅ An **astounding** number of issues and PRs were reviewed and closed.
Throughout the week this team reviewed 162 PRs, closing at least 62 of them; they also reviewed 213 issues, closing 72.
Special shout-out to the MVPs of Famke (52 PRs and 108 issues!), Luisa, and Simon who started _early_ and powered through the huge number of issues and PRs.

🛠️ During our wrap up session we discussed a number of options to better handle the modules repository using automation.
For example, we discussed making better use of GitHub tags and [types](https://docs.github.com/en/issues/tracking-your-work-with-issues/configuring-issues/managing-issue-types-in-an-organization), both to help prepare spring-cleaning task boards and to potentially &apos;auto close&apos; stale PRs and issues after a certain amount of time.
We also discussed possibly requiring all new module proposals/issues to be made on pipeline repositories and linked to the modules repository, to ensure each module has a direct application, so that the repository&apos;s issues do not fill up with just &apos;ideas&apos;.
The modules maintainer maestros also pointed out inconsistencies in some places where modules were not following nf-core guidelines, so we will potentially ask the infrastructure team to add more linting rules.
They also suggested that a good hackathon task would be to bring older nf-core subworkflows up to speed with the latest [specifications](https://nf-co.re/docs/guidelines/components/subworkflows).

### Configs

📑 In comparison, the configs team of Maxime, Joon, and James had an easier time cleaning up issues and pull requests in nf-core/configs due to the repository&apos;s smaller size.

✅ During the week, 4 issues were closed, the original authors of 9 issues were nudged for further information, 17 stale branches were deleted, and 7 PRs were merged or closed, with 10 outstanding.

🛠️ While clean-up is done more or less regularly by James, Maxime, and community member Pontus Freyhult, and this repo is smaller, it was still good to give it a thorough look through.
Thanks to Joon for going beyond his normal responsibilities for this one!
Despite the low-traffic nature of the repository, we did discuss setting up &apos;stale-bot&apos; automation to remove abandoned PRs and issues here as well.
There is a tendency in this repo towards long-standing abandoned PRs/issues: because the community volunteers don&apos;t have access to all HPCs, these are not something we can test, finalise, and merge ourselves.
Thus auto-closing issues and pull requests makes sense here.

### Hackathon preparation

📑 Nicolas led a team to try and add more tasks and issues to the [March 2025 hackathon project board](https://github.com/orgs/nf-core/projects/99).
We currently have somewhere between 300 and 400 issues!

✅ This is still an ongoing project, and we will continue asking people to add tasks to the project board leading up to the hackathon!

🛠️ So, please keep adding what you plan to work on during the three days! Furthermore, if you are looking for help on a particular task or project, add an issue to help community volunteers find you to help you out!

### Core team

📑 The core team did their routine checks and overhauls of our [&apos;internal&apos; documentation](https://nf-co.re/docs/checklists/community_governance/core_team), and other places that only people &apos;with the keys&apos; can clean up.

✅ GitHub and Slack were checked and updated where necessary, as were the core-team and maintainer checklists, by Nicolas.
Issues with the AWS megatests on the nf-core Seqera Platform workspace were investigated, and all older and broken computing environments were cleared by Rike.
All recent [proposals](https://nfcore.slack.com/archives/C06L02SNLVA) for [special interest groups](https://nf-co.re/special-interest-groups) were reviewed and website PRs opened by Jose.
If you&apos;re interested in &apos;Core facilities&apos;, &apos;Cancer Genomics&apos;, or &apos;Immunology&apos; then you might be interested in these newly accepted groups!
Maxime went through all slack channels and closed unused channels.
We also reviewed over 100 new pipeline proposals from the last 3 years! Of these, 37 proposals were moved to &apos;timed out&apos;, and 16 accepted proposals were flagged for archiving due to a lack of development.

🛠️ We discussed the current new pipeline proposal procedure, as we had many abandoned proposals, and the current GitHub board used for tracking them is not really coping with the history of &gt;200 proposals.
The core team will be investigating another solution to allow for better search, retrieval, and tracking of all the proposals, possibly via a dedicated GitHub repository (which can also use more automation).

### Tools

📑 Finally, Júlia and Matthias led some standard clean-up of nf-core/tools repo issues, branches, and pull requests.

✅ During the week, 29 issues and 8 PRs were closed.
Furthermore, 34 branches were deleted.

🛠️ During the reporting session we discussed how to automate the clean-up of merged branches (those pesky template merge PRs, anyone?!), and everyone was in favour of enabling auto-deletion of merged branches across all nf-core repositories.

## Summary

This was by far the best and most active spring cleaning to date - and the maintainers team leads would like to say thank you on behalf of the whole community for the efforts of the maintainers members for taking the time and effort to do this.

We hope this [TLC](https://en.wikipedia.org/wiki/Tender_loving_care) will set the community up for its strongest year yet!

\- :heart: from your @maintainers-team</content:encoded></item><item><title>nf-core weekly help desk is coming to Asia-Pacific!</title><link>https://nf-co.re/blog/2025/apac-helpdesk-2025/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/apac-helpdesk-2025/</guid><description>Join the APAC community for a weekly help desk.</description><pubDate>Mon, 10 Mar 2025 02:00:00 GMT</pubDate><content:encoded>## nf-core weekly help desk is coming to Asia-Pacific (APAC)!

:::tip{.fa-hourglass-clock title=&quot;Too Long; Didn&apos;t Read&quot;}

**When**:

- Wednesdays, 3.00 PM NZDT
- Wednesdays, 1.00 PM AEDT
- Wednesdays, 12.30 PM ACDT
- Wednesdays, 12.00 PM AEST
- Wednesdays, 11.30 AM ACST
- Wednesdays, 11.00 AM AWST

**Where:**

- [Gather Town](https://app.gather.town/app/b2RCFGS2cIGusuNi/nf-core-office)

**Slack:**

- [`#weekly-helpdesk`](https://nfcore.slack.com/archives/C06KSS9L8M7)
- [`#region-apac`](https://nfcore.slack.com/archives/C08DF6UQZDJ)

:::

In these sessions, seasoned APAC community members will be available to discuss your code, running pipelines, writing configs, or anything else related to nf-core and Nextflow.
They will hang out online during designated hours every week, ready to provide guidance and insights or just chat about your favorite pipeline.

APAC office hours will follow the same informal structure as the European and American office hours.
See [Office hours blog post](https://nf-co.re/blog/2024/office_hours) for more information.

Things that might be discussed include:

- You can&apos;t figure out how to run a pipeline
- You&apos;d like to write an institution config and would like some insights
- A specific concept doesn&apos;t make sense and you&apos;d like to talk it over
- You&apos;re working on a component or pipeline, you&apos;re stuck, and you&apos;d like to get unstuck
- You want to hang out with other Nextflow enthusiasts and get to know the faces behind the Slack handles
- You&apos;d like to swap code reviews with someone to get your PR merged
- Recent sports results

The help desk hours take place in [Gather](https://gather.town/), an online collaborative platform, kind of like Pokemon-meets-Zoom.
Follow our [Gather Town link](https://app.gather.town/app/b2RCFGS2cIGusuNi/nf-core-office) to join. See the [Gather Town bytesize talk](/events/2022/bytesize-37-gathertown) for more information about the platform.

Join the [`#weekly-helpdesk`](https://nfcore.slack.com/archives/C06KSS9L8M7) channel for regular reminders about the help desk.

We are also pleased to launch a new [`#region-apac`](https://nfcore.slack.com/archives/C08DF6UQZDJ) Slack channel, a regional channel for the APAC nf-core community.
Join the channel, say hi, and meet others in the region!

The first week of the APAC help desk is **Wednesday, Mar 12, 2025**.

We&apos;re looking forward to seeing you at the APAC weekly help desk!
import { Image } from &quot;astro:assets&quot;;

The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers)
by providing brief summaries of the monthly team meetings.

## Overview

In the February meeting we discussed (amongst other things) the following topics:

- [Spring cleaning](#spring-cleaning)
- [Merchandise](#merchandise)
- [Hackathon projects](#hackathon)
- [Smooth running!?](#smooth-running)

## Spring cleaning

This month we again discussed the upcoming pre-hackathon spring cleaning across all common nf-core repositories (and more!).

As announced last month, this will be happening between March 10th and March 14th, when we will be deleting merged branches, archiving abandoned pipelines, and fixing general [paper cut bugs](https://en.wikipedia.org/wiki/Paper_cut_bug) to keep everything spick-and-span.

So expect lots of messages, notifications, and PRs from the maintainers team as we 🧹 through!

In our discussion we defined the tasks and started assigning maintainers to different teams to tackle the different areas.

## Merchandise

James called for a final vote on the design of some exclusive merchandise that will be made available only to the maintainers team.

Envious? Want a t-shirt of your favourite pipeline? You want an nf-core/mag mug? You can buy a huge range of nf-core swag on the [nf-core shop](https://nf-co.re/shop)!

_Note that we do not make any profit from any sale - everything is at cost price_.

## Hackathon projects

We also briefly discussed more ideas for &apos;project ideas&apos; for the upcoming online/hybrid [hackathon in March](https://nf-co.re/events/2025/hackathon-march-2025) (it&apos;s free! Local site registrations open until March 9th!).

Hackathon projects are a way for you to team up to solve a particular task, or just sit with like-minded people working on similar things.
This can be anything from adding new features to existing pipelines, adding modules for a particular tool, updating out of date documentation, working on special interest groups - pretty much anything!
If you wish to advertise a project to pull in hackathon participants, please make a [PR on the nf-core website](https://nf-co.re/events/2025/hackathon-march-2025#submit-a-new-project).

## Smooth running!?

Finally, we did our general round of the maintainers team to see if they felt there were any burning issues or problems with the day-to-day activities and running of the nf-core community...

...and there were none?

So, full speed ahead nf-core!

## The end

As always, if you want to get involved and give your input, join the discussion on relevant PRs and slack threads!

\- :heart: from your #maintainers team!</content:encoded></item><item><title>Integrating ontologies into nf-core modules</title><link>https://nf-co.re/blog/2025/modules-ontology/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/modules-ontology/</guid><description>Getting onto the right path</description><pubDate>Mon, 24 Feb 2025 15:30:00 GMT</pubDate><content:encoded>## Introduction

Each nf-core module includes a `meta.yml` file that describes its structure. However, the initial design of this file didn&apos;t reflect the actual organization of the module&apos;s input and output channels. Instead of grouping elements by channel, they were all listed at the top level. This made it difficult to understand the channel structure and led to problems such as missing descriptions for multiple meta maps.

&lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-12 col-lg-5&quot;&gt;
        ```groovy title=&quot;main.nf&quot;
        process BWA_MEM {

            ...

            input:
            tuple val(meta) , path(reads)
            tuple val(meta2), path(index)
            tuple val(meta3), path(fasta)
            val   sort_bam

            output:
            tuple val(meta), path(&quot;*.bam&quot;),  emit: bam,     optional: true
            tuple val(meta), path(&quot;*.cram&quot;), emit: cram,    optional: true
            tuple val(meta), path(&quot;*.csi&quot;),  emit: csi,     optional: true
            tuple val(meta), path(&quot;*.crai&quot;), emit: crai,    optional: true
            path &quot;versions.yml&quot;,             emit: versions

            ...

        }
        ```
    &lt;/div&gt;
    &lt;div class=&quot;col-12 col-lg-7&quot;&gt;
        ```yml title=&quot;meta.yml&quot; caption=&quot;Old file structure&quot;
        name: bwa_mem
        ...
        input:
            - meta:
                type: map
                description: Groovy Map containing sample information
            - reads:
                type: file
                description: List of input FastQ files.
            - index:
                type: file
                description: BWA genome index files
                pattern: &quot;*.{amb,ann,bwt,pac,sa}&quot;
            - fasta:
                type: file
                description: Reference genome in FASTA format
                pattern: &quot;*.{fasta,fa}&quot;
            - sort_bam:
                type: boolean
                description: use samtools/sort (true) or samtools/view (false)
        output:
            - meta:
                type: file
                description: Output BAM file containing read alignments
                pattern: &quot;*.{bam}&quot;
            - bam:
                type: file
                description: Output BAM file containing read alignments
                pattern: &quot;*.{bam}&quot;
            - cram:
                type: file
                description: Output CRAM file containing read alignments
                pattern: &quot;*.{cram}&quot;
            - csi:
                type: file
                description: Optional index file for BAM file
                pattern: &quot;*.{csi}&quot;
            - crai:
                type: file
                description: Optional index file for CRAM file
                pattern: &quot;*.{crai}&quot;
            - versions:
                type: file
                description: File containing software versions
                pattern: &quot;versions.yml&quot;
        ...
        ```
    &lt;/div&gt;

&lt;/div&gt;

_(All files shown in this post are *simplified versions* of `main.nf` and `meta.yml` files, to show the structure of input and output channels.)_

In October 2024 we updated all nf-core modules ([GitHub pull request](https://github.com/nf-core/modules/pull/6674)) to ensure they properly define their input and output channel structures.

&lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-12 col-lg-5&quot;&gt;
        ```groovy title=&quot;main.nf&quot;
        process BWA_MEM {

            ...

            input:
            tuple val(meta) , path(reads)
            tuple val(meta2), path(index)
            tuple val(meta3), path(fasta)
            val   sort_bam

            output:
            tuple val(meta), path(&quot;*.bam&quot;),  emit: bam,     optional: true
            tuple val(meta), path(&quot;*.cram&quot;), emit: cram,    optional: true
            tuple val(meta), path(&quot;*.csi&quot;),  emit: csi,     optional: true
            tuple val(meta), path(&quot;*.crai&quot;), emit: crai,    optional: true
            path &quot;versions.yml&quot;,             emit: versions

            ...

        }
        ```
    &lt;/div&gt;
    &lt;div class=&quot;col-12 col-lg-7&quot;&gt;
        ```yml title=&quot;meta.yml&quot; caption=&quot;New file structure&quot;
        name: bwa_mem
        ...
        input:
            - - meta:
                    type: map
                    description: Groovy Map containing sample information
                - reads:
                    type: file
                    description: |
                        List of input FastQ files of size 1 and 2 for single-end and paired-end data,
                        respectively.
            - - meta2:
                    type: map
                    description: Groovy Map containing reference information.
                - index:
                    type: file
                    description: BWA genome index files
                    pattern: &quot;*.{amb,ann,bwt,pac,sa}&quot;
            - - meta3:
                    type: map
                    description: Groovy Map containing sample information
                - fasta:
                    type: file
                    description: Reference genome in FASTA format
                    pattern: &quot;*.{fasta,fa}&quot;
            - - sort_bam:
                    type: boolean
                    description: use samtools sort (true) or samtools view (false)
                    pattern: &quot;true or false&quot;
        output:
            - bam:
                - meta:
                    type: file
                    description: Groovy Map containing sample information
                - &quot;*.bam&quot;:
                    type: file
                    description: Output BAM file containing read alignments
                    pattern: &quot;*.{bam}&quot;
            - cram:
                - meta:
                    type: file
                    description: Groovy Map containing sample information
                - &quot;*.cram&quot;:
                    type: file
                    description: Output CRAM file containing read alignments
                    pattern: &quot;*.{cram}&quot;
            - csi:
                - meta:
                    type: file
                    description: Groovy Map containing sample information
                - &quot;*.csi&quot;:
                    type: file
                    description: Optional index file for BAM file
                    pattern: &quot;*.{csi}&quot;
            - crai:
                - meta:
                    type: file
                    description: Groovy Map containing sample information
                - &quot;*.crai&quot;:
                    type: file
                    description: Optional index file for CRAM file
                    pattern: &quot;*.{crai}&quot;
            - versions:
                - versions.yml:
                    type: file
                    description: File containing software versions
                    pattern: &quot;versions.yml&quot;
        ...
        ```
    &lt;/div&gt;

&lt;/div&gt;

We also introduced linting checks to nf-core/tools to ensure the proper structure of `meta.yml` files.

Together with these linting checks, we also introduced a new flag to the `nf-core modules lint` command: `--fix`.
This flag will try to fix all the possible lint failures related to the `meta.yml` file.

### Introducing ontologies

Together with these changes, we also added the [`bio.tools`](https://bio.tools/) identifier of the tool. `bio.tools` is a community-driven registry of bioinformatics software and data resources. It provides information about software tools, databases, analysis workflows, and services that are used in bioinformatics and the life sciences.

```yml title=&quot;meta.yml&quot; {14}
name: bwa_mem

    ...

tools:
    - bwa:
          description: |
              BWA is a software package for mapping DNA
              sequences against a large reference genome, such as the human genome.
          homepage: http://bio-bwa.sourceforge.net/
          documentation: https://bio-bwa.sourceforge.net/bwa.shtml
          arxiv: arXiv:1303.3997
          licence: [&quot;GPL-3.0-or-later&quot;]
          identifier: &quot;biotools:bwa&quot;
```

This bio.tools ID opened new possibilities, such as being able to look up the inputs and outputs of a tool and the ontology terms for its input and output files.

We now use this information to suggest input and output channels when creating a module with `nf-core modules create`
(available as of February 2025 in the `dev` version of nf-core/tools; it will be released as part of nf-core/tools 3.3.0),
making it easier to adapt the freshly created module template.

Here we show an example of the template generated when creating a module for the BWA tool, using the same example as before.

```groovy title=&quot;main.nf&quot;
process BWA_MEM {
    tag &quot;$meta.id&quot;
    label &apos;process_single&apos;

    // TODO nf-core: See section in main README for further information regarding finding and adding container addresses to the section below.
    conda &quot;${moduleDir}/environment.yml&quot;
    container &quot;${ workflow.containerEngine == &apos;singularity&apos; &amp;&amp; !task.ext.singularity_pull_docker_container ?
        &apos;https://depot.galaxyproject.org/singularity/bwa:0.7.18--h577a1d6_2&apos;:
        &apos;biocontainers/bwa:0.7.18--h577a1d6_2&apos; }&quot;

    input:
    // TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
    tuple val(meta), path(sequence)
    tuple val(meta2), path(genome_index)

    output:
    // TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
    tuple val(meta), path(&quot;*.{}&quot;), emit: genome_index
    tuple val(meta), path(&quot;*.{}&quot;), emit: alignment
    tuple val(meta), path(&quot;*.{}&quot;), emit: sequence_coordinates
    tuple val(meta), path(&quot;*.{}&quot;), emit: sequence_alignment
    path &quot;versions.yml&quot;          , emit: versions

    ...
}
```

We also populate an `ontologies` section for the files described in the `meta.yml`.

```yml title=&quot;meta.yml&quot; {15-17,30-31}
name: &quot;bwa_mem&quot;
...
input:
    - - meta:
            type: map
            description: |
                Groovy Map containing sample information
                e.g. `[ id:&apos;sample1&apos; ]`

        - sequence:
            # TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
            type: file
            description: sequence file
            pattern: &quot;*.{fastq}&quot;
            ontologies:
            - edam: &quot;http://edamontology.org/data_2044&quot; # Sequence
            - edam: &quot;http://edamontology.org/format_1930&quot; # FASTQ

    - - meta:
            type: map
            description: |
                Groovy Map containing sample information
                e.g. `[ id:&apos;sample1&apos; ]`

        - genome_index:
            # TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
            type: file
            description: genome_index file
            pattern: &quot;*.{}&quot;
            ontologies:
            - edam: &quot;http://edamontology.org/data_3210&quot; # Genome index
```

Ontologies are specified under the ontologies key, which contains a list of dictionaries. Each dictionary represents a single ontology URL, with the key indicating the ontology type (currently [EDAM ontology](https://edamontology.github.io/edam-browser/#topic_0091)) and the value being the URL itself. This structure allows for easy adoption of other ontologies in the future.
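
To make this shape concrete, here is a minimal sketch in plain Python (illustrative only; `parsed_output` and `collect_ontology_urls` are hypothetical names, not nf-core/tools code) of how the ontology URLs could be gathered from a parsed `meta.yml`:

```python
# Sketch: collecting ontology URLs from a parsed meta.yml structure.
# The nested layout mirrors the YAML above: each output channel is a list of
# dicts, and each file entry may carry an "ontologies" list of single-key dicts.
parsed_output = [
    {
        "bam": [
            {"meta": {"type": "map", "description": "Groovy Map containing sample information"}},
            {
                "*.bam": {
                    "type": "file",
                    "pattern": "*.{bam}",
                    "ontologies": [
                        {"edam": "http://edamontology.org/format_2572"},  # BAM
                    ],
                }
            },
        ]
    }
]

def collect_ontology_urls(channels):
    """Walk the channel entries and return (ontology_type, url) pairs."""
    urls = []
    for channel in channels:
        for elements in channel.values():
            for element in elements:
                for info in element.values():
                    for entry in info.get("ontologies", []):
                        urls.extend(entry.items())
    return urls

print(collect_ontology_urls(parsed_output))
# [('edam', 'http://edamontology.org/format_2572')]
```

Because each list entry holds a single ontology key, adding further ontologies later only means adding new single-key dicts, without touching existing entries.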

## Using nf-core helper tools

As mentioned before, we updated the nf-core/tools package to make creating and updating modules easier.

- `nf-core modules create`:
  Will automatically fetch a `bio.tools` ID when possible, and use this information to populate input and output channels in the `main.nf` and `meta.yml`.
- `nf-core modules lint`:
  Will make sure that the channels defined in the `main.nf` are properly described in the `meta.yml`.
- `nf-core modules lint --fix`:
  Will try to correct the inputs and outputs on `meta.yml` to match the `main.nf` file.
  It will add ontology URLs if missing, guessing them from the `pattern` value.
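
As a rough sketch of how such guessing could work (the `EDAM_FORMATS` table and `guess_ontologies` helper below are hypothetical illustrations, not the actual nf-core/tools implementation):

```python
# Hypothetical sketch of guessing an EDAM format URL from a meta.yml "pattern"
# value, loosely mirroring what `nf-core modules lint --fix` does.
import re

# Minimal illustrative mapping; the real tool covers many more formats.
EDAM_FORMATS = {
    "bam": "http://edamontology.org/format_2572",    # BAM
    "fastq": "http://edamontology.org/format_1930",  # FASTQ
    "fasta": "http://edamontology.org/format_1929",  # FASTA
}

def guess_ontologies(pattern):
    """Return EDAM URLs for every known extension found in a glob pattern."""
    # Expand patterns like "*.{fasta,fa}" into individual extensions.
    match = re.search(r"\{([^}]*)\}", pattern)
    extensions = match.group(1).split(",") if match else [pattern.rsplit(".", 1)[-1]]
    return [EDAM_FORMATS[ext] for ext in extensions if ext in EDAM_FORMATS]

print(guess_ontologies("*.{fastq}"))
# ['http://edamontology.org/format_1930']
```

The point here is only the pattern-to-term lookup; unknown extensions simply yield no suggestion, which is why a human check of the guessed URLs is still recommended.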

## Future potential

These additions open the door to new possibilities.
For example, knowing the exact inputs and outputs of modules, subworkflows, and workflows makes automated chaining of these components easier.</content:encoded></item><item><title>Maintainers Minutes: January 2025</title><link>https://nf-co.re/blog/2025/maintainers-minutes-2025-01-31/</link><guid isPermaLink="true">https://nf-co.re/blog/2025/maintainers-minutes-2025-01-31/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Fri, 31 Jan 2025 11:00:00 GMT</pubDate><content:encoded>import Profile from &quot;@components/GitHubProfilePictureExtended.astro&quot;;
import { Image } from &quot;astro:assets&quot;;

The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers)
by providing brief summaries of the monthly team meetings.

## Overview

In the January meeting we discussed (amongst others) the following topics:

- [Spring cleaning](#spring-cleaning)
- [References](#references)
- [Stubs](#stubs)
- [CI with latest-everything](#ci-with-latest-everything)
- [Helpdesk Americas](#helpdesk-americas)
- [Default prefixes in modules](#default-prefixes-in-modules)
- [Best practices for spring cleaning](#best-practices-for-spring-cleaning)

## Spring cleaning

We decided that the spring cleaning that we planned to do before the hackathon should be done two weeks before the hackathon.
So it will happen from March 10th to March 14th.
We decided to extend the duration of the spring cleaning to a full week to give everyone enough time to participate.
The idea is to do some actual spring cleaning in the nf-core GitHub organization.
We will go through the common repos, update old issues (fixed, still relevant, or won&apos;t do...), close old PRs (with warning), and delete branches that have already been merged.
We will check the teams (GitHub/Slack) and make sure they are up to date.
We will also look at pipelines and see if they are still active or need to be archived.

## References

Maxime Garcia wanted to know about the usage of `conf/igenomes.config`, and how it was actually used in the pipelines.
Since hardly anyone is actually using it, he will move forward with his ideas and continue to work on his current POC.

&gt; Whenever Maxime says “I can do whatever I want” I get really scared.

## Stubs

We discussed stubs and how we use them.
One concern is that the Nextflow language server was complaining about unused args in the stub section of modules.
So people started to remove them, but then we end up with a stub block that is wildly different from the script block, and no real way of knowing if the args are passed correctly within stubs.
The discussion about stubs went a bit in circles and off topic, but we still managed to reach an agreement, and for now:

- We will echo args for now instead of removing them
- We will talk with Ben, in particular on the static types PR, to see if we can be smarter about it

## CI with latest-everything

We widely agreed that the `latest-everything` Nextflow version we use in nf-core CI can sometimes break a pipeline due to a change of syntax, behavior, infrastructure usage...
It is important to test with it (that was one of the primary reasons why we started doing so), because we want our code to be up to date with the latest releases.
But we also agreed that we don&apos;t want failures in these tests to block PRs.
The end idea is that we keep the `latest-everything` tests in the CI workflow, but we don&apos;t require them to pass in order to merge the PR.

## Helpdesk Americas

We discussed the helpdesk in the Americas timezone.
Since the start of the year, Florian and Lorena have mainly been on their own with very little audience.
We thought about ideas to improve it, and came up with the following:

- Consider moving the helpdesk to a later time slot so that it is outside of the busiest hours.
- Step up the communication, and make it more visible
  - Explaining clearly what the helpdesk is, and that there are clearly no stupid questions
  - Check with the outreach team if we can get more visibility

Since we have all our sessions planned ahead, we can perhaps communicate about them in a more efficient manner, as we currently rely on the people hosting the helpdesk to advertise it.

## Default prefixes in modules

Our amazing modules aficionado Simon kindly reminded us about [this PR](https://github.com/nf-core/website/pull/2608).
It was decided that four maintainers should review it, so that we can hopefully close the discussion, advance on this topic, and merge the PR.

## Best practices for spring cleaning

We discussed Spring Cleaning once more, and our relentless modules dark knight Simon shared a nice idea:

&gt; We should check our own open PRs and close them if they are no longer relevant.

It would be nice indeed to keep track of all our open PRs, and issues and close them if we can.
That would simplify the spring cleaning, and all of our Justice League of maintainers&apos; own lives.

## The end

As always, if you want to get involved and give your input, join the discussion on relevant PRs and slack threads!

\- :heart: from your #maintainers team!</content:encoded></item><item><title>nf-core/tools - 3.1.0</title><link>https://nf-co.re/blog/2024/tools-3_1_0/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/tools-3_1_0/</guid><description>Everything is going crate</description><pubDate>Tue, 10 Dec 2024 13:00:00 GMT</pubDate><content:encoded>This release comes with some new features and a smaller pipeline template, which shouldn&apos;t cause too much of a headache.
As always, if you have any problems or run into any bugs, reach out on the [#tools slack channel](https://nfcore.slack.com/archives/CE5LG7WMB).

# Highlights

## Research Object (RO) Crates in nf-core pipelines

RO Crate is an open standard that we use to describe our pipelines and their components in a structured way, and it helps with automated provenance tracking.
In our case, it describes all files that are in the pipeline template, as well as the pipeline authors, besides some general metadata.
More information about RO Crates can be found [here](https://www.researchobject.org/ro-crate/).
Thanks to [Phil Ewels](https://github.com/ewels), [Stian Soiland-Reyes](https://github.com/stain) and [Simone Leo](https://github.com/simleo) for kicking off this effort and providing useful feedback!

:::note
This RO Crate describes the pipeline as a whole, not pipeline runs.
For that kind of provenance, use the new [nf-prov plugin](https://github.com/nextflow-io/nf-prov), which is currently in development.
:::

If you want to update the RO Crate for your pipeline, run `nf-core pipelines rocrate` (automatically run with `nf-core pipelines sync`).

For more information about the command and how it generates the RO Crate, see [the docs](https://nf-co.re/docs/nf-core-tools/pipelines/rocrate).

## ORAS container URI support in `nf-core pipelines download`

Previously, you needed to use a `https://` prefix, but thanks to the work by [@MatthiasZepper](https://github.com/MatthiasZepper) we can now use the `oras://` prefix.
This requires Singularity to be installed on your system.

## `main` as optional default branch

With this release we extended the template and tooling to support either `main` or `master` as the default release branch for pipelines.

If you want to use the `main` branch in your pipeline, make sure to set `defaultBranch = &apos;main&apos;` in the `manifest` section of the `nextflow.config` file. Afterwards, you can change the GitHub repository default branch to `main`.

Then, you can run `nf-core pipelines sync` again. This will automatically change all the required links from `master` to `main`.

## `nf-core subworkflows patch`

This new command allows you to patch subworkflows in the same way as you would patch modules.

# Miscellaneous

- When running `nf-core pipelines create` you can now toggle all pipeline features on and off with one switch.
- The template now includes an expanded contributors section in the manifest, as introduced in [Nextflow 24.10.0](https://www.nextflow.io/docs/latest/reference/config.html#manifest).
  We added a new `TODO nf-core` comment, so please add the missing information to these fields.
  Completing these fields will, for example, allow us to improve the data in the RO Crate, especially the ORCID information.

    The newly added fields are:

    ```groovy title=&quot;nextflow.config&quot; {2-11}
    manifest {
        contributors = [
            [
                name: &apos;&apos;,
                affiliation: &apos;&apos;,
                email: &apos;&apos;,
                github: &apos;&apos;,
                contribution: [], // List of contribution types (&apos;author&apos;, &apos;maintainer&apos; or &apos;contributor&apos;)
                orcid: &apos;&apos;
            ],
        ]
    }
    ```

- We moved `includeConfig &apos;conf/modules.config&apos;` next to `includeConfig &apos;conf/base.config&apos;` so that it does not overwrite test profile configurations.
  Thanks to [Louis Le Nézet](https://github.com/LouisLeNezet) for adding this change.

# Possible merge conflicts and how to resolve them

## `.nf-core.yml`

We started to clean null values from the `.nf-core.yml` file, so if you get something like the following, you can accept this change.

```diff
&lt;&lt;&lt;&lt;&lt;&lt;&lt; nf-core-template-merge-3.1.0
+ nf_core_version: 3.1.0
=======
- nf_core_version: 3.0.1
- org_path: null
&gt;&gt;&gt;&gt;&gt;&gt;&gt; dev
```

### Resolution

You can just accept the changes that don&apos;t include null values, i.e. the ones coming from `nf-core-template-merge-3.1.0`.</content:encoded></item><item><title>Maintainers Minutes: November 2024</title><link>https://nf-co.re/blog/2024/maintainers-minutes-2024-11-29/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/maintainers-minutes-2024-11-29/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Fri, 29 Nov 2024 11:00:00 GMT</pubDate><content:encoded>import Profile from &quot;@components/GitHubProfilePictureExtended.astro&quot;;
import { Image } from &quot;astro:assets&quot;;

The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers)
by providing brief summaries of the monthly team meetings.

## Overview

In the November meeting we discussed (amongst others) the following topics:

- [Stricter SemVer versioning](#stricter-semver-versioning)
- [Push for BAM to CRAM?](#push-for-bam-to-cram)
- [Removal of `workflowCitation()`](#removal-of-workflowcitation)
- [Recent CI approach changes](#recent-ci-approach-changes)
- [Modules piling up](#modules-piling-up)

## Stricter SemVer versioning

We immediately dove into what was expected to be a prickly topic: Jon Manning raised concerns about the &apos;loose&apos; use of [Semantic Versioning (SemVer)](https://semver.org/), within nf-core pipeline release versioning.

Jon pointed out that different pipelines take different approaches, with few adhering strictly to ‘true’ SemVer rules.
The primary issue revolved around ‘major releases’.
According to SemVer, transitioning from version 1.0.0 to 2.0.0 signifies a breaking change—a change that alters the software’s functionality such that existing usage patterns no longer work.

However, many nf-core pipelines have used major version bumps to indicate substantial new functionalities or extensive back-end codebase refactoring, even when these changes do not disrupt user interactions.

This sparked a discussion about the pros and cons of alternative versioning systems.

James Fellows Yates and Jose Espinosa suggested that the ‘misuse’ of SemVer within nf-core stems partly from the technical nature of the SemVer specification, which can be challenging to apply to pipelines.

This complexity often leads to misinterpretation, particularly when comparing the formal SemVer definition of ‘major’ releases to how they are currently used in pipelines.

Florian Wuennemann, showcasing his enthusiasm for AI ([Nextflow advent calendar](https://flowuenne.github.io/nextflow_advent_calender/) anyone?), proposed using a large language model (LLM) to scan the codebase on dev branches and determine the appropriate versioning automatically.
While this idea was met with skepticism from some AI critics in the group, it sparked a productive conversation about whether tooling and automation could assist developers in evaluating the appropriate version bump for a release.

The group agreed that Jon would prepare a [PR](https://github.com/nf-core/website/pull/2842) where the community could collectively define ‘rules’ for different release types, adhering to SemVer principles as much as possible.
(Note: The PR has since been merged, but it remains open for updates from the community.)

Following this, Júlia Mir Pedrol and the infrastructure team committed to exploring tooling that could suggest the most appropriate version bump during the linting process before a new release.

## Push for BAM to CRAM?

Maxime Garcia spoke on behalf of Friederike Hanssen (Rike) who raised the question of whether nf-core should initiate a widespread push to replace BAM files with CRAM files across all modules.

[CRAM](&lt;https://en.wikipedia.org/wiki/CRAM_(file_format)&gt;) files are a highly compressed version of SAM and BAM files, offering significant reductions in hard drive usage for reference genome-aligned DNA/RNA sequencing data.

Although CRAM files are natively supported by modern versions of the HTSlib SAMTools suite, many maintainers hesitated to enforce or advocate for their adoption despite the potential benefits.
Concerns centered around the substantial effort that such a shift would entail.
Adam Talbot and James exchanged knowing, bitter laughs over their experiences with ‘C’ programming and ancient DNA tools.

The primary challenge lies in tool compatibility.
Many bioinformatics tools used in nf-core pipelines either lack support for recent versions of HTSlib or have hardcoded BAM as the required input format.
Transitioning to CRAM would not only involve updating nf-core modules but could also necessitate patching the codebases of numerous tools—often written in diverse programming languages, and many of which are no longer actively maintained.

The group agreed to avoid a broad mandate for now.
Instead, Rike was encouraged to propose ways to raise awareness of CRAM’s benefits during the review process.
Additionally, there was consensus on emphasizing CRAM adoption in the module specifications to encourage its use where feasible.

## Removal of `workflowCitation()`

Júlia presented her [proposal](https://github.com/nf-core/modules/pull/7094) to remove the `workflowCitation()` function from the nf-core pipeline template.

This function was originally used to display citation information at the start of a pipeline run, encouraging users to cite the pipeline’s publication, the nf-core project, and relevant dependencies.
However, it had not been actively used elsewhere in the pipeline template for some time.
Although Júlia had previously inquired about its usage on Slack, she wanted to confirm with the maintainers whether anyone was still relying on it before removing it entirely.
The response was a unanimous &apos;[PURGE](https://github.com/nf-core/modules/pull/7094#pullrequestreview-2470115528) IT&apos;.

If you’ve been using this function for your own purposes, take note: it will no longer be part of the nf-core template starting with the next release of nf-core/tools!

## Recent CI approach changes

Sateesh Peri provided an overview of significant changes to the nf-core/modules CI testing approach, implemented over a few weeks in November by a small team that included himself, Edmund Miller, Matthias Hörtenhuber, and a recent addition to the maintainers team, Usman Rashid, among others.

### Key updates to CI testing

1. Introduction of nf-test ‘Shards’
    - The use of nf-test shards enables tests defined in the nf-test script to run in parallel instead of sequentially.
    - This significantly speeds up module development cycles by providing faster feedback, especially for failures, enabling quicker iteration and resolution.
2. `--changed-since` Feature for Smarter Testing
    - The `--changed-since` flag allows tests to focus on code changes made since a specified commit hash or branch name.
    - By default, this parameter uses HEAD, but developers can specify other targets for more targeted validation.
    - This feature improves CI efficiency by avoiding redundant tests and reduces the environmental impact of running tests.
3. Support for GPU-Dependent Module Testing
    - Modules requiring GPU-enabled tools can now be tested effectively.
    - A new gpu tag triggers these tests to run on ‘self-hosted’ runners outside GitHub Actions, utilizing nf-core credits generously donated by AWS.

&gt; The nf-core/modules CI workflows can be found in the modules repo `.github` folder [here](https://github.com/nf-core/modules/tree/master/.github/workflows).

### Pipeline CI workflow updates

Pipelines using GPU-dependent modules must update their CI workflows to support GPU testing.
Currently, nf-core/methylseq serves as a test case for these CI updates.

The updated CI workflow involves two main GitHub Actions:

- nf-test-shard:
    - Performs a dry run of nf-test with the `--changed-since` flag, filtering by tags.
    - It outputs the test shard and total number of shards.
- nf-test:
    - Accepts inputs such as the profile, shard, and total shard count.
    - It filters by tags and runs the tests accordingly.

There are now two YAML workflows in the repository:

- `nf-test.yml`:
    - For non-GPU pipelines, running the nf-test-shard and nf-test actions.
- `nf-test-gpu.yml`:
    - For GPU-dependent pipelines, combining the same shard and test actions but tailored for GPU testing.

&gt; The nf-core/methylseq CI updates are under review and will be included in the upcoming `v2.8.0` release.
&gt; In the meantime, the implementation PR can be viewed [here](https://github.com/nf-core/methylseq/pull/478)

### Current Challenges

While these improvements mark significant progress, they aren’t without challenges.
Some maintainers noted that the test names have become less informative, making it harder to quickly identify failing tests without navigating through several layers of the GitHub Actions interface.
However, the team is already working on further enhancements to address these usability issues.

### Work in Progress

The team emphasized that these changes are just the beginning, with additional refinements and optimizations still in the pipeline - especially around upcoming nf-test CI features such as `--excludeTags` for easier handling of test subsets.

## Modules piling up

To close out the discussion, our friendly neighbourhood module maestro Simon Pearce raised a concern about the growing number of open pull requests (PRs) in the module repository.

Simon encouraged community members to spare some time to help review and merge
as many PRs as possible before the new year. Even 10 minutes of effort could make a significant difference in reducing the
backlog!

Sateesh highlighted another issue: many PRs lack proper labels—such as the `Ready for Review` label—which could help developers filter and prioritize PRs that are ready for review.
Sateesh suggested making an announcement to encourage contributors to add appropriate labels based on the status of their PRs, streamlining the review process.

## The end

As always, if you want to get involved and give your input, join the discussion on relevant PRs and slack threads!

\- :heart: from your #maintainers team!</content:encoded></item><item><title>Announcing: Weekly Helpdesk</title><link>https://nf-co.re/blog/2024/helpdesk/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/helpdesk/</guid><description>Weekly drop-in sessions to connect and work together.</description><pubDate>Mon, 28 Oct 2024 15:14:00 GMT</pubDate><content:encoded>We are restarting nf-core office hours under a new name and broader scope: _nf-core weekly Helpdesk_ on November 5th.
These are weekly virtual drop-in sessions designed for nf-core users and developers seeking an opportunity to connect, chat, and work together.

:::tip{.fa-hourglass-clock title=&quot;Too Long; Didn&apos;t Read&quot;}

**When**:

- Tuesdays, 2 PM Eastern Time
- Thursdays, 10 AM Central European Summer Time

**Where**:

- [Gather Town](https://nf-co.re/join#gather-town)
- [Slack: `#weekly-helpdesk`](https://nfcore.slack.com/channels/weekly-helpdesk)

:::

During the summer we trialed the [nf-core office hours](https://nf-co.re/blog/2024/office_hours). After gathering feedback, we decided to continue with them under a new name and broader scope: **nf-core weekly Helpdesk**.

In these sessions, some of the nf-core maintainer team and other seasoned community members will be available to engage in discussions about your code, running pipelines, writing configs, or anything else related to nf-core and Nextflow.
They will hang out in Gather during designated hours every week, ready to provide guidance and insights or just chat about your favourite pipeline.

Helpdesk sessions won&apos;t be structured.

Things that might be discussed include:

- A specific concept doesn&apos;t make sense and you&apos;d like to talk it over
- You&apos;re working on a component or pipeline, you&apos;re stuck, and you&apos;d like to get unstuck
- You&apos;d like to swap code reviews with someone to get your PR merged
- You&apos;d like to write an institutional config and would like some insights
- You can&apos;t figure out how to run a pipeline
- You want to hang out with other Nextflow enthusiasts and get to know the faces behind the Slack handles

The Helpdesk hours will take place in [Gather](https://gather.town/), an online collaborative platform, kind of like Pokemon-meets-Zoom. You can join by following the Gather Town link [here](https://nf-co.re/join#gather-town).
For more information, see the [Gather Town bytesize talk](/events/2022/bytesize-37-gathertown).

Please join the [#weekly-helpdesk](https://nfcore.slack.com/channels/weekly-helpdesk) channel on the nf-core Slack to get reminders when sessions are about to start,
and discuss the format / provide feedback.</content:encoded></item><item><title>Maintainers Minutes: September/October 2024</title><link>https://nf-co.re/blog/2024/maintainers-minutes-2024-10-25/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/maintainers-minutes-2024-10-25/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Mon, 28 Oct 2024 11:00:00 GMT</pubDate><content:encoded>import Profile from &quot;@components/GitHubProfilePictureExtended.astro&quot;;
import { Image } from &quot;astro:assets&quot;;

The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers)
by providing brief summaries of the monthly team meetings.

## Overview

In this double post, we discussed the following topics from the September and October maintainers meetings:

- [Massive modules update](#massive-modules-update)
- [Pipeline level test with nf-test](#pipeline-level-test-with-nf-test)
- [helpdesk is the new coffee-n-code](#helpdesk-is-the-new-coffee-n-code)
- [New way of passing parameters to modules within nf-test scripts](#new-way-of-passing-parameters-to-modules-within-nf-test-scripts)
- [Stalled PRs due to waiting for review](#stalled-prs-due-to-waiting-for-review)
- [Replacing monolithic `conf/modules.conf` with per-module configs](#replacing-monolithic-confmodulesconf-with-per-module-configs)
- [Modification of modules specifications regarding channels](#modification-of-modules-specifications-regarding-channels)

## Massive modules update

The nf-core/modules update task force has been hard at work, and has :drum: surprise :drum: massively updated the modules:

- https://github.com/nf-core/modules/issues/5828

The list is still pretty long, but we will get there.

## Pipeline level test with nf-test

A new plugin for nf-test was created to facilitate pipeline-level tests: [nft-utils](https://github.com/nf-core/nft-utils/). It&apos;s already implemented in methylseq, rnaseq, sarek, and some other pipelines.
Maxime has made [a bytesize](https://nf-co.re/events/2024/bytesize_pipeline_level_tests_with_nftest) about it.

## helpdesk is the new coffee-n-code

Stay tuned, it&apos;s coming pretty soon.

## New way of passing parameters to modules within nf-test scripts

Our nf-core/modules have greatly benefited from the new nf-test implementation. However, one pet peeve for many is the greatly increased number of config files that now come with each module when it is installed in a pipeline (&gt;100-file pipeline PRs, anyone 😱).

Fortunately &lt;Profile username=&quot;mahesh-panchal&quot;&gt;Mahesh Binzer-Panchal&lt;/Profile&gt; has made a [proof of concept](https://github.com/nf-core/modules/pull/6788/) aimed at reducing the number of config files needed for testing different parameters. This approach consolidates configs that previously required separate files to test different parameters into a single `nextflow.config`, helping streamline testing processes.

An implementation can be seen in the `bismark/align` module used in methylseq pipeline:

- [param declaration in `when` block](https://github.com/nf-core/methylseq/blob/master/modules/nf-core/bismark/align/tests/main.nf.test#L28-L33)
- [test `nextflow.config`](https://github.com/nf-core/methylseq/blob/master/modules/nf-core/bismark/align/tests/nextflow.config)

By moving parameters into `when` blocks within the main nf-test script file, we can have a single `nextflow.config` file with the `ext.args` for the process set to the parameter defined in the `when` block.
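
Schematically, following the linked `bismark/align` example (the parameter name and values here are illustrative):

```groovy
// tests/main.nf.test - the when block declares a param holding the tool arguments
when {
    params {
        module_args = &apos;--score_min L,0,-0.6&apos;
    }
}

// tests/nextflow.config - one shared config wires that param into ext.args
process {
    withName: &apos;BISMARK_ALIGN&apos; {
        ext.args = params.module_args ?: &apos;&apos;
    }
}
```

Each test case can then set a different `module_args` value without needing its own config file.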

The proposal received unanimous approval. Mahesh will now begin to formalise this by updating the [nf-core/modules specifications](https://nf-co.re/docs/guidelines/components/modules), and we can begin adding linting tests to help the transition to this system.

:::info
This system only applies to module-level nf-tests, and should not be used at the pipeline level!
:::

## Stalled PRs Due to Pending Reviews

It has been noted that we occasionally receive almost-complete module PRs that stall because a reviewer leaves comments but does not return for a re-review.
Even if the module author receives a second approving review, they may feel uncomfortable merging without the original reviewer’s approval.

After a brief discussion, we agreed to introduce a new reviewing guideline: if all reasonable comments have been addressed and another reviewer has approved, the PR should be merged after 1 month if the original reviewer has not returned for a re-review.

If the original reviewer was temporarily unavailable during this period (which can happen!), the module can still be updated in a follow-up PR.

## Replacing monolithic `conf/modules.conf` with per-module configs

A contentious point was brought up by the developers of [nf-core/rnaseq](https://nf-co.re/rnaseq) and [nf-core/methylseq](https://nf-co.re/methylseq).

Both pipelines have recently experimented with modifying the ways in which modules and processes can be customised by config files.
Currently, the nf-core template has a single file - `conf/modules.config` - where all module-level configuration occurs (specifying additional arguments, file publication specifications, etc.).

The two pipelines, however, have opted to split this single monolithic file into multiple files, with one config file per module e.g. [in this nf-core/methylseq PR](https://github.com/nf-core/methylseq/commit/7b276c97589b65153ae23b34fcf4f6da86bb8d70).

The main goal of this approach is to make CI testing more efficient by allowing changes to a specific config to trigger only the related tests rather than the entire test suite (since all modules are currently dependent on a single config file).
Another reason is to increase the modularity and portability of modules and their configurations.

This proposal initially caused a lot of consternation among most of the other maintainers present. The primary issue was the potential impact on usability, specifically regarding the ease with which users can locate customized parameters within each module.
Currently, we can direct users to a single file within a single directory, simplifying their search.

Splitting configurations into potentially hundreds of files could make it significantly harder for users to find what they need. It&apos;s also unclear whether another file has an overriding setting for a module, for example one defined in the subworkflow folder rather than the module folder. Additionally, Nextflow does not currently support automatically detecting and loading a `nextflow.config` alongside each module&apos;s `main.nf`, so a substantial amount of manual work would be needed to include each module&apos;s config.

A lengthy discussion ensued, weighing user readability against developer efficiency and which should take priority. For instance, efficiency improvements could be made at the level of nf-core/tools or GitHub CI configurations, where optimizations might achieve similar goals.
Alternatively, waiting for native Nextflow support could eventually reduce the workload on developers.

The sarek developers proposed a middle ground that has worked well for them: triggering tests on smaller chunks while maintaining findability. Their approach involves naming config files after each process and storing all configs within a `conf/modules/` folder. This setup makes configurations easy to locate and include while still achieving modularity. See sarek&apos;s module config organisation [here](https://github.com/nf-core/sarek/tree/master/conf/modules).

The general agreement was that this new system is not yet palatable enough to maintainers to propose pushing split configs to the template. However, `nf-core/rnaseq` and `nf-core/methylseq` can continue to refine this approach based on what the `nf-core/sarek` developers have proposed.

Developers agreed to revisit and review this approach again in a couple of months.

## Modification of modules specifications regarding channels

Finally, we debated the pros and cons of different ways of structuring input channels in modules.

In the initial design of DSL2 nf-core modules, it was decided to require one input channel per file type and not support &apos;mega-tuples&apos;, for readability and portability reasons.

For example, this was found to be less readable:

```nextflow
input:
tuple val(meta), path(bam), path(bai), path(fasta), path(fai), path(dict)
```

Compared to:

```nextflow
input:
tuple val(meta), path(bam)
tuple val(meta), path(bai)
tuple val(meta), path(fasta)
tuple val(meta), path(fai)
tuple val(meta), path(dict)
```

Where ensuring each file was associated with its companion could be facilitated with (relatively) standardised `multiMap()` calls:

```nextflow
combined_channel.multiMap { meta, bam, bai, fasta, fai, dict -&gt;
    bam: tuple( meta, bam )
    bai: tuple( meta, bai )
    fasta: tuple( meta, fasta )
    fai: tuple( meta, fai )
    dict: tuple( meta, dict )
}
.set { process_in }

PROCESS (
    process_in.bam,
    process_in.bai,
    process_in.fasta,
    process_in.fai,
    process_in.dict,
)
```

:::info
`multiMap` should only be used in this way when directly providing process inputs. It&apos;s not a general method of synchronising channel inputs.
:::

By forcing a particular structure of a singular input tuple (former example), it restricted pipeline developers in how they can do their own file shuttling between processes, potentially requiring many custom `map` functions to restructure channels.

On the flipside, there are some instances (think `samtools`) where a particular file will _always_ be accompanied by a secondary file, such as an index file.
Combining these into a single tuple would make it easier to chain such modules with extremely uniform input file requirements.

Recently the module specifications have been relaxed to allow such cases, and in particular to allow different file format variants serving the same function (`.bai` vs `.csi` files, for example) to be represented by the same element of a &apos;collapsed&apos; input channel.

For example, an output channel could look like this:

```nextflow
output:
tuple val(meta), path(&quot;*.bam&quot;), path(&quot;*.{bai,csi}&quot;)
path(fasta)
path(fai)
path(dict)
```

Some maintainers expressed a preference for a stricter approach, advocating for each input file to be provided in its own input channel. Their concern was that shared channels could introduce risks, such as misconfiguration, where the wrong type of index file could inadvertently be supplied, causing the pipeline to fail if a downstream process receives an incompatible file type.

We discussed the potential benefits of ‘typed’ inputs (possibly a future feature in Nextflow) versus the importance of code clarity, which is challenged by the need to repeatedly use `.join` and `.combine` after each module call.

Differing philosophies emerged: some felt it essential to design pipelines that prevent all possible user errors, while others suggested pipelines should support only specific configurations and parameters, with any deviations (e.g., via ext.args) being at the user’s own risk.

This discussion remains unresolved and will likely rear its head again in the future.

## The end

As always, if you want to get involved and give your input, join the discussion on relevant PRs and slack threads!

\- :heart: from your #maintainers team!</content:encoded></item><item><title>nf-core/tools - 3.0.0</title><link>https://nf-co.re/blog/2024/tools-3_0_0/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/tools-3_0_0/</guid><description>What flavour of the nf-core template do you prefer?</description><pubDate>Mon, 07 Oct 2024 13:00:00 GMT</pubDate><content:encoded>import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;

# What and Why?

The nf-core/tools CLI is the backbone of our community.
It encapsulates the pipeline template that we use and packages and simplifies the tasks involved in keeping nf-core pipelines up to date.
The same holds true for modules and subworkflows.
In many ways, it is the linchpin of our collaborative coding.

This week, we are thrilled to announce nf-core/tools version 3.0.0.
With this release we bring much-requested template granularity, better findability for the many commands, and a suite of major updates to the nf-core pipeline template.
This blog post explains what&apos;s new and guides you through the update process.

We promise, it&apos;s worth it! :pray:

## ✨ New features

- Enhanced pipeline template customisation
    - The template has been divided into features that can be selectively included or excluded.
      For example, you can now create a new pipeline without any traces of FastQC.
    - You can strip down the pipeline to the bare minimum and add only the tools you need.
    - For nf-core pipelines, certain core features (e.g., documentation, CI tests) remain mandatory, but you still have significant customisation flexibility.
- New Text User Interface (TUI) for pipeline creation
    - A guided interface helps you through the process when running `nf-core pipelines create{:bash}` (don&apos;t worry - you can still use the CLI by providing all values as parameters).
      {&quot; &quot;}
        &lt;YouTube id={&quot;VwjXNXONHlY&quot;} poster={`https://i.ytimg.com/vi/VwjXNXONHlY/maxresdefault.jpg`} /&gt;
- Switch to [nf-schema](https://nextflow-io.github.io/nf-schema/latest/) for input validation
    - Replaces its predecessor nf-validation as a standard plugin in the template
- CI tests now use the nf-core tools version matching the pipeline&apos;s template version
    - This will reduce errors in opened PRs with each new tools release

## ⛓️‍💥 Breaking changes

- All pipeline commands now require the `pipelines` prefix.
  This change makes the commands more consistent with `nf-core modules{:bash}` and `nf-core subworkflows{:bash}` commands.
  The commands which changed are:

    | old command                         | new command                                   |
    | ----------------------------------- | --------------------------------------------- |
    | `nf-core lint{:bash}`               | `nf-core pipelines lint{:bash}`               |
    | `nf-core launch{:bash}`             | `nf-core pipelines launch{:bash}`             |
    | `nf-core download{:bash}`           | `nf-core pipelines download{:bash}`           |
    | `nf-core create-params-file{:bash}` | `nf-core pipelines create-params-file{:bash}` |
    | `nf-core create{:bash}`             | `nf-core pipelines create{:bash}`             |
    | `nf-core bump-version{:bash}`       | `nf-core pipelines bump-version{:bash}`       |
    | `nf-core sync{:bash}`               | `nf-core pipelines sync{:bash}`               |
    | `nf-core schema build{:bash}`       | `nf-core pipelines schema build{:bash}`       |
    | `nf-core schema docs{:bash}`        | `nf-core pipelines schema docs{:bash}`        |
    | `nf-core schema lint{:bash}`        | `nf-core pipelines schema lint{:bash}`        |
    | `nf-core schema validate{:bash}`    | `nf-core pipelines schema validate{:bash}`    |
    | `nf-core create-logo{:bash}`        | `nf-core pipelines create-logo{:bash}`        |

- Some options have been changed for the `nf-core pipelines download{:bash}` command:
    - The `-t` / `--tower` flag has been renamed to `-p` / `--platform`.
    - We renamed the short flags for consistency, to always use the first letter of the second word in the long flag:
        - The `-d` / `--download-configuration` flag has been renamed to `-c` / `--download-configuration`.
        - The `-p` / `--parallel-downloads` flag has been renamed to `-d` / `--parallel-downloads`.

## 🫡 Deprecations

- The `nf-core licences{:bash}` command is deprecated.
- Python 3.8 reaches end of life in October 2024, so this will probably be the last release supporting Python 3.8.

# Tip: Avoiding Merge Conflicts

The v3 release of nf-core/tools includes the ability to have fine-grained control of which template features to include.
You can use this new functionality to switch off chunks of the template.
Doing so means less code will update in the template, and fewer merge conflicts in files that you don&apos;t care about.

So - if you don&apos;t use any of these template features:

- fastqc
- multiqc
- igenomes
- nf_schema

you can minimize merge conflicts with a quick update and intermediate sync:

1. Start by checking out the `dev` branch:

```bash
git switch dev
```

2. Update the template to the latest version:

```bash
nf-core pipelines sync
```

3. Pull the updated `.nf-core.yml` file from the TEMPLATE branch:

```bash
git checkout TEMPLATE -- .nf-core.yml
```

4. Add the features you want to skip to `skip_features`:

```yaml title=&quot;.nf-core.yml&quot;
template:
    skip_features:
        - fastqc
        - igenomes
        - nf_schema
```

5. Commit the changes:

```bash
git add .nf-core.yml
git commit -m &quot;Skip fastqc, igenomes and nf_schema&quot;
```

6. Add the changes to the `dev` branch:

```bash
git push origin dev
```

7. Merge the changes into the `dev` branch of the nf-core repository.
8. Retrigger the pipeline sync via the [GitHub Actions workflow](https://github.com/nf-core/tools/actions/workflows/sync.yml) using the name of your pipeline as the input.
9. Your template update merge should now have fewer conflicts! 🎉

# Important Template Updates

## The `check_max()` Function Has Been Removed

The `check_max()` function has been replaced by core Nextflow functionality called [`resourceLimits`](https://www.nextflow.io/docs/latest/reference/process.html#resourcelimits).

The `resourceLimits` are specified in the `nextflow.config` file. You can remove all references to `check_max()` and its associated parameters (`max_cpus`, `max_memory`, and `max_time`).
For more information, see the [Nextflow documentation](https://www.nextflow.io/docs/latest/reference/process.html#resourcelimits).
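
For example, a base config might cap resources like this (the limit values are illustrative):

```groovy
process {
    // Hard caps applied after the usual task.attempt-based escalation
    resourceLimits = [cpus: 16, memory: 128.GB, time: 24.h]

    cpus   = { 1    * task.attempt }
    memory = { 6.GB * task.attempt }
    time   = { 4.h  * task.attempt }
}
```

Unlike `check_max()`, the capping happens natively in Nextflow, so no custom function or `max_*` parameters are needed.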

## The nf-validation Plugin Has Been Replaced by nf-schema

The `nf-validation` plugin is deprecated in favour of the new `nf-schema` plugin.

This plugin uses a new JSON schema draft (2020-12), requiring changes to the `nextflow_schema.json` and `assets/schema_input.json` files. Follow the [migration guide](https://nextflow-io.github.io/nf-schema/2.0/migration_guide/) for required changes.

The following validation parameters have been removed from `nextflow.config` and `nextflow_schema.json`:

- `validationFailUnrecognisedParams`
- `validationLenientMode`
- `validationSchemaIgnoreParams`
- `validationShowHiddenParams`
- `validate_params`

Instead, use the `validation` scope for `nf-schema` options.
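
For example (option names taken from the nf-schema documentation; the values are illustrative):

```groovy
validation {
    // Replaces the removed validation* parameters
    lenientMode            = false
    failUnrecognisedParams = false
    defaultIgnoreParams    = [&quot;genomes&quot;]
}
```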

:::note
The plugins definition and `validation` scope have been moved further down `nextflow.config` and are now found after the `manifest` scope. This allows access to `manifest` variables to help improve message customisation.
:::

The `UTILS_NFVALIDATION_PLUGIN` subworkflow has been replaced by `UTILS_NFSCHEMA_PLUGIN`, changing how the input samplesheet is read. See [the documentation](https://nextflow-io.github.io/nf-schema/2.0/migration_guide/#__tabbed_2_2) for details on using the new `samplesheetToList()` function.

# Removing scripting from `nextflow.config`

To prepare for upcoming Nextflow syntax deprecations, for-loops and try/catch blocks have been removed from the template config files:

- nf-core configs are now included without try/catch blocks
- Include statements have been moved after profile definitions to ensure correct profile overriding
- For more information, see these Nextflow issues: [#1792](https://github.com/nextflow-io/nextflow/issues/1792) and [#5306](https://github.com/nextflow-io/nextflow/issues/5306)
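
As a sketch of the pattern (close to, but not necessarily identical to, the template code):

```groovy
// Before: the include was wrapped in a try/catch, which silently swallowed errors
// try {
//     includeConfig &quot;${params.custom_config_base}/nfcore_custom.config&quot;
// } catch (Exception e) {
//     System.err.println(&quot;WARNING: Could not load nf-core/config profiles&quot;)
// }

// After: a plain conditional include, with no scripting blocks
includeConfig !System.getenv(&apos;NXF_OFFLINE&apos;) &amp;&amp; params.custom_config_base ? &quot;${params.custom_config_base}/nfcore_custom.config&quot; : &quot;/dev/null&quot;
```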

# Common Merge Conflicts and Solutions

## Video walk-through

&lt;YouTube class=&quot;mx-auto my-3&quot; id=&quot;cTwFyp7LkOw&quot; poster={`https://i.ytimg.com/vi/cTwFyp7LkOw/maxresdefault.jpg`} /&gt;

## `README.md`

In the `## Credits{:md}` section, you might see conflicts like this:

```diff title=&quot;README.md&quot;
&lt;&lt;&lt;&lt;&lt;&lt;&lt; HEAD
- nf-core/mag was written by [Hadrien Gourlé](https://hadriengourle.com) at [SLU](https://slu.se), [Daniel Straub](https://github.com/d4straub) and - [Sabrina Krakau](https://github.com/skrakau) at the [Quantitative Biology Center (QBiC)](http://qbic.life). [James A. Fellows Yates](https://github.- com/jfy133) and [Maxime Borry](https://github.com/maxibor) at the [Max Planck Institute for Evolutionary Anthropology](https://www.eva.mpg.de) joined in version 2.2.0.
-
- Other code contributors include:
-
- - [Antonia Schuster](https://github.com/AntoniaSchuster)
- - [Alexander Ramos](https://github.com/alxndrdiaz)
- - [Carson Miller](https://github.com/CarsonJM)
- - [Daniel Lundin](https://github.com/erikrikarddaniel)
- - [Danielle Callan](https://github.com/d-callan)
- - [Gregory Sprenger](https://github.com/gregorysprenger)
- - [Jim Downie](https://github.com/prototaxites)
- - [Phil Palmer](https://github.com/PhilPalmer)
- - [@willros](https://github.com/willros)
-
- Long read processing was inspired by [caspargross/HybridAssembly](https://github.com/caspargross/HybridAssembly) written by Caspar Gross [@caspargross](https://github.com/caspargross)
- nf-core/mag was written by [Hadrien Gourlé](https://hadriengourle.com) at [SLU](https://slu.se), [Daniel Straub](https://github.com/d4straub) and [Sabrina Krakau](https://github.com/skrakau) at the [Quantitative Biology Center (QBiC)](http://qbic.life).
=======
+ nf-core/mag was originally written by Hadrien Gourlé, Daniel Straub, Sabrina Krakau, James A. Fellows Yates, Maxime Borry.
&gt;&gt;&gt;&gt;&gt;&gt;&gt; TEMPLATE
```

#### Resolution

Verify that all authors are included. If confirmed, keep your existing credits section and ignore incoming changes.

## `CITATIONS.md`

Citations you added might have been removed in the template update.

#### Resolution

Reject the incoming changes to keep your citations.

## `conf/base.config`

The above mentioned removal of `check_max()` might cause conflicts in the config files

```groovy title=&quot;conf/base.config&quot;
&lt;&lt;&lt;&lt;&lt;&lt;&lt; HEAD
    cpus   = { check_max( 1    * task.attempt, &apos;cpus&apos;   ) } // [!code --]
    memory = { check_max( 6.GB * task.attempt, &apos;memory&apos; ) } // [!code --]
    time   = { check_max( 4.h  * task.attempt, &apos;time&apos;   ) } // [!code --]
=======
    // TODO nf-core: Check the defaults for all processes // [!code ++]
    cpus   = { 1      * task.attempt } // [!code ++]
    memory = { 6.GB   * task.attempt } // [!code ++]
    time   = { 4.h    * task.attempt } // [!code ++]
&gt;&gt;&gt;&gt;&gt;&gt;&gt; TEMPLATE
```

#### Resolution

Double-check the values and accept the incoming changes if correct.

## `.nf-core.yml`

There might be conflicts due to changed order, renamed and new fields, especially in the `template` section.

#### Resolution

Double-check the changes for duplicates and accept the incoming changes.

The new file should follow this structure:

```yaml title=&quot;.nf-core.yml&quot;
bump_version: null
lint: null
nf_core_version: 3.0.0
org_path: null
template:
    author: Author Name
    description: The description of the pipeline
    force: false
    is_nfcore: true
    name: pipelinename
    org: nf-core
    outdir: .
    skip_features: []
    version: 3.0.0
update: null
```

See the [API docs](/docs/nf-core-tools/api_reference/dev/api/utils#pydantic-modelpythonnf_coreutilsnfcoretemplateconfigpython) for a more detailed description of each field and the allowed input values.

## `nextflow.config`

Several merge conflicts due to changes described in [Important Template Updates](#important-template-updates).

#### Resolution

Double-check in the parameters section, that no pipeline-specific parameter would be removed by the incoming changes. In general, you can accept the incoming changes for this file.

## Pipeline utils subworkflow

In `subworkflows/local/utils_nfcore_$PIPELINE_NAME_pipeline/main.nf` the switch to `nf-schema` might cause conflicts in the logic of reading in a samplesheet.

#### Resolution

Be careful in accepting incoming changes. The main changes you should do are:

```groovy title=&quot;subworkflows/local/utils_nfcore_$PIPELINE_NAME_pipeline/main.nf&quot;
include { fromSamplesheet } from &apos;plugin/nf-validation&apos; // [!code --]
include { samplesheetToList } from &apos;plugin/nf-schema&apos; // [!code ++]
```

```groovy title=&quot;subworkflows/local/utils_nfcore_$PIPELINE_NAME_pipeline/main.nf&quot;
Channel.fromSamplesheet(&quot;input&quot;) // [!code --]
Channel.fromList(samplesheetToList(params.input, &quot;path/to/samplesheet/schema&quot;)) // [!code ++]
```

```groovy title=&quot;subworkflows/local/utils_nfcore_$PIPELINE_NAME_pipeline/main.nf&quot;
 UTILS_NFVALIDATION_PLUGIN ( // [!code --]
 UTILS_NFSCHEMA_PLUGIN ( // [!code ++]
```

See the [nf-schema migration guide](https://nextflow-io.github.io/nf-schema/2.0/migration_guide/) for more details in replacing the `fromSamplesheet` function.</content:encoded></item><item><title>Migration from Biocontainers to Seqera Containers: Part 2</title><link>https://nf-co.re/blog/2024/seqera-containers-part-2/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/seqera-containers-part-2/</guid><description>nf-core containers automation: how it&apos;ll all work behind the curtain</description><pubDate>Wed, 02 Oct 2024 08:16:00 GMT</pubDate><content:encoded>import creation_flow from &quot;@assets/images/blog/seqera-containers-part-2/creation_flow.excalidraw.svg?raw&quot;;
import renovate_flow from &quot;@assets/images/blog/seqera-containers-part-2/renovate_flow.excalidraw.svg?raw&quot;;

import { Image } from &quot;astro:assets&quot;;
import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;

# Introduction

In nf-core, we&apos;ve been excited to adopt [Wave](https://seqera.io/wave/) to automate software container builds, and have been
looking for the right way to do it.
With the announcement of [Seqera Containers](https://seqera.io/containers/) we felt it was the right time to put in
the effort to migrate our containers to be built using Wave, using Seqera Containers to host the container images for our modules.
You can read more about our motivation for this change in [Part 1 of this blog post](https://nf-co.re/blog/2024/seqera-containers-part-1).

Here, in Part 2, we will dig into the technical details: how it all works behind the curtain.
You don&apos;t need to know or understand any of this as an end-user of nf-core pipelines,
or even as a contributor to nf-core modules, but we thought it would be interesting to share the details.
It&apos;s mostly to serve as an architectural plan for the nf-core maintainers and infrastructure teams.

:::tip{.fa-hourglass-clock title=&quot;Too Long; Didn&apos;t Read&quot;}

- Module contributors edit `environment.yml` files to update software dependencies
- Containers are automatically built for Docker + Singularity, `linux/amd64` + `linux/arm64`
- Conda lock-files are saved for more reproducible and faster conda environments
- Details are stored in the module&apos;s `meta.yml`
- Pipelines auto-generate Nextflow config files when modules are updated
- Pipeline usage remains basically unchanged

:::

# The end goal

Before we dig into the details of how the automation will work, let&apos;s summarise the end goal of this migration.

## Glossary

- [`linux/amd64`](https://en.wikipedia.org/wiki/X86-64): Regular Intel CPUs (aka `x86_64`)
- [`linux/arm64`](https://en.wikipedia.org/wiki/AArch64): ARM CPUs (eg. AWS Graviton, aka `AArch64`). Not Apple Silicon.
- [Apptainer](https://apptainer.org/): Alternative to Singularity, uses same image format
- [Mamba](https://mamba.readthedocs.io): Alternative to Conda, uses same conda environment files
- [Conda lock files](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#identical-conda-envs):
  Explicit lists of packages, used to recreate an environment exactly.

## Usage summary

Pipeline users will see almost no change in current behaviour, but will have several new configuration profiles available.

| nf-core profile        | Status                                                 | Use case                                                          |
| ---------------------- | ------------------------------------------------------ | ----------------------------------------------------------------- |
| `docker`               | &lt;span class=&quot;badge text-bg-secondary&quot;&gt;Unchanged&lt;/span&gt; | Docker images for `linux/amd64`                                   |
| `podman`               | &lt;span class=&quot;badge text-bg-secondary&quot;&gt;Unchanged&lt;/span&gt; | Docker images for `linux/amd64`                                   |
| `shifter`              | &lt;span class=&quot;badge text-bg-secondary&quot;&gt;Unchanged&lt;/span&gt; | Docker images for `linux/amd64`                                   |
| `charliecloud`         | &lt;span class=&quot;badge text-bg-secondary&quot;&gt;Unchanged&lt;/span&gt; | Docker images for `linux/amd64`                                   |
| `docker_arm`           | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Docker images for `linux/arm64`                                   |
| `podman_arm`           | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Docker images for `linux/arm64`                                   |
| `shifter_arm`          | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Docker images for `linux/arm64`                                   |
| `charliecloud_arm`     | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Docker images for `linux/arm64`                                   |
| `singularity`          | &lt;span class=&quot;badge text-bg-secondary&quot;&gt;Unchanged&lt;/span&gt; | Singularity images for `linux/amd64`                              |
| `apptainer`            | &lt;span class=&quot;badge text-bg-warning&quot;&gt;Updated&lt;/span&gt;     | Singularity images for `linux/amd64` (not Docker, as previously)  |
| `singularity_arm`      | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Singularity images for `linux/arm64`                              |
| `apptainer_arm`        | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Singularity images for `linux/arm64`                              |
| `singularity_oras`     | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Singularity images for `linux/amd64` using the `oras://` protocol |
| `apptainer_oras`       | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Singularity images for `linux/amd64` using the `oras://` protocol |
| `singularity_oras_arm` | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Singularity images for `linux/arm64` using the `oras://` protocol |
| `apptainer_oras_arm`   | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Singularity images for `linux/arm64` using the `oras://` protocol |
| `conda`                | &lt;span class=&quot;badge text-bg-warning&quot;&gt;Updated&lt;/span&gt;     | Conda lock files for `linux/amd64`                                |
| `mamba`                | &lt;span class=&quot;badge text-bg-warning&quot;&gt;Updated&lt;/span&gt;     | Conda lock files for `linux/amd64`, using Mamba                   |
| `conda_arm`            | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Conda lock files for `linux/arm64`                                |
| `mamba_arm`            | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Conda lock files for `linux/arm64`, using Mamba                   |
| `conda_env`            | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Conda with local `environment.yml` resolution                     |
| `mamba_env`            | &lt;span class=&quot;badge text-bg-success&quot;&gt;New&lt;/span&gt;         | Conda with local `environment.yml` resolution, using Mamba        |

## Conda lock files

Conda lock files were mentioned in [Part 1 of this blog post](/blog/2024/seqera-containers-part-1#exceptionally-reproducible).

&gt; These pin the exact dependency stack used by the build, not just the top-level primary tool being requested.
&gt; This effectively removes the need for conda to solve the build and also ships md5 hashes for every package.
&gt; This will greatly improve the reproducibility of the software environments for conda users and the reliability of Conda CI tests.

They look something like this:

```yaml title=&quot;FastQC Conda lock file for linux/amd64&quot;
# micromamba env export --explicit
# This file may be used to create an environment using:
# $ conda create --name &lt;env&gt; --file &lt;this file&gt;
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
https://conda.anaconda.org/conda-forge/linux-64/libgomp-13.2.0-h77fa898_7.conda#abf3fec87c2563697defa759dec3d639
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-13.2.0-h77fa898_7.conda#72ec1b1b04c4d15d4204ece1ecea5978
# .. and so on
```

## Singularity: oras or https?

Unfamiliar with `oras://`? Don&apos;t worry, it&apos;s relatively new in the field.
It&apos;s a new protocol to reference container images, similar to `docker://` or `shub://`.
It allows Singularity to interact with any OCI ([Open Container Initiative](https://opencontainers.org/))
compliant registry to pull images.

Using `oras` has some advantages:

- Singularity handles pulls in the process task, rather than in the Nextflow head job
    - This means less resource usage on the head node, and more parallelisation
- Singularity can use authentication to pull from private registries
  (see [Singularity docs](https://docs.sylabs.io/guides/main/user-guide/cli/singularity_registry.html) for more information).

However, there are some downsides:

- Shared cache Nextflow options such as `$NXF_SINGULARITY_CACHEDIR` and `$NXF_SINGULARITY_LIBRARYDIR` are not used
- Singularity must be installed when downloading images for offline use
- `oras://` is only supported by recent versions of Singularity / Apptainer

As such, we will continue to use `https` downloads for Singularity `SIF` images for now.
However, we will start to provide new `-profile singularity_oras` profiles for anyone who
would prefer to fetch images using the newer `oras` protocol.
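To make the difference concrete, the same FastQC process entry could look like this in the two flavours of autogenerated Singularity config (container addresses taken from the `meta.yml` example later in this post; the file names follow the profile layout described in the Pipelines section):

```groovy
// config/containers_singularity_linux_amd64.config: download the SIF blob over https
process { withName: &apos;NF_PIPELINE:FASTQC&apos; { container = &apos;https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/b2/b280a35770a70ed67008c1d6b6db118409bc3adbb3a98edcd55991189e5116f6/data&apos; } }

// config/containers_singularity_oras_linux_amd64.config: pull from the OCI registry
process { withName: &apos;NF_PIPELINE:FASTQC&apos; { container = &apos;oras://community.wave.seqera.io/library/fastqc:0.12.1--0827550dd72a3745&apos; } }
```

Both entries resolve to the same image content; only the transport differs.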

If you&apos;d like to know more, check out the amazing [bytesize talk](https://nf-co.re/events/2024/bytesize_singularity_containers_hpc)
by Marco Claudio De La Pierre ([@marcodelapierre](https://github.com/marcodelapierre/)) from June 2024:

&lt;YouTube id=&quot;https://www.youtube.com/watch?v=zoCC_dkhjD0&quot; poster=&quot;http://i3.ytimg.com/vi/zoCC_dkhjD0/hqdefault.jpg&quot; /&gt;

## Modules

All nf-core pipelines use a single container per process, and the majority of processes are
encapsulated within shared modules in the [nf-core/modules](https://github.com/nf-core/modules) repository.
As such, we must start with containers at the module level.

:::tip{.fa-brands.fa-github title=&quot;GitHub issue&quot;}
For the latest discussion and progress on _bulk-updating_ existing nf-core modules, see GitHub issue
[nf-core/modules#6698](https://github.com/nf-core/modules/issues/6698).
:::

### Changes to `main.nf`

With this switch, we simplify the `container` declaration, listing only the default container image: Docker, for `linux/amd64`.
There will no longer be any string interpolation or logic within the container string.

The `container` string is **never edited by hand** and is fully handled by the modules automation.

With [the FastQC module](https://github.com/nf-core/modules/blob/f768b283dbd8fc79d0d92b0f68665d7bed94cabc/modules/nf-core/fastqc/main.nf#L6-L8)
as an example:

```diff title=&quot;main.nf&quot;
process FASTQC {
     label &apos;process_medium&apos;

     conda &quot;${moduleDir}/environment.yml&quot;
+    container &quot;fastqc:0.12.1--5cfd0f3cb6760c42&quot; // automatically generated
-    container &quot;${ workflow.containerEngine == &apos;singularity&apos; &amp;&amp; !task.ext.singularity_pull_docker_container ?
-        &apos;https://depot.galaxyproject.org/singularity/fastqc:0.12.1--hdfd78af_0&apos; :
-        &apos;biocontainers/fastqc:0.12.1--hdfd78af_0&apos; }&quot;

     input:
     tuple val(meta), path(reads)
```

We considered removing both `conda` and `container` declarations from the module `main.nf` file entirely.
However, we see benefit in keeping these in this form because:

- It&apos;s clearer to those exploring the code what the module requires.
- The container string is needed to tie the module to the pipeline config files
  (see [Building config files](#building-config-files) below)

Removing the container logic from the string should be a big win for readability.

### Changes to `meta.yml`

Through the magic of automation, we will append and then validate the following fields
within the module&apos;s `meta.yml` file. Following the
[FastQC example](https://github.com/nf-core/modules/blob/f768b283dbd8fc79d0d92b0f68665d7bed94cabc/modules/nf-core/fastqc/meta.yml)
from above:

```yaml title=&quot;meta.yml&quot;
# ..existing meta.yml content above
containers:
    docker:
        linux_amd64:
            name: community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42
            build_id: 5cfd0f3cb6760c42_1
            scan_id: 6fc310277b74
        linux_arm64:
            name: community.wave.seqera.io/library/fastqc:0.12.1--d3caca66b4f3d3b0
            build_id: d3caca66b4f3d3b0_1
            scan_id: d9a1db848b9b
    singularity:
        linux_amd64:
            name: oras://community.wave.seqera.io/library/fastqc:0.12.1--0827550dd72a3745
            https: https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/b2/b280a35770a70ed67008c1d6b6db118409bc3adbb3a98edcd55991189e5116f6/data
            build_id: 0827550dd72a3745_1
        linux_arm64:
            name: oras://community.wave.seqera.io/library/fastqc:0.12.1--b2ccdee5305e5859
            https: https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/76/76e744b425a6b4c7eb8f12e03fa15daf7054de36557d2f0c4eb53ad952f9b0e3/data
            build_id: b2ccdee5305e5859_1
    conda:
        linux_amd64:
            lockfile: https://wave.seqera.io/v1alpha1/builds/5cfd0f3cb6760c42_1/condalock
        linux_arm64:
            lockfile: https://wave.seqera.io/v1alpha1/builds/d3caca66b4f3d3b0_1/condalock
```

All images are built at the same time, avoiding Conda dependency drift.
The build and scan IDs allow us to trace back to the build logs and security scans for these images.

The Conda lock files are a new addition to the nf-core ecosystem and will help reproducibility for Conda users.
These are generated during the Docker image build and are specific to each architecture.
The lock files can be accessed remotely via the Wave API, so we can treat them much in the same
way that we treat remote container images.
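In other words, a pipeline config can point the `conda` directive straight at a remote lock file URL. A minimal sketch, using the FastQC `linux/amd64` lock file URL from the `meta.yml` example above:

```groovy
process {
    withName: &apos;FASTQC&apos; {
        // Remote Conda lock file, served by the Wave API
        conda = &apos;https://wave.seqera.io/v1alpha1/builds/5cfd0f3cb6760c42_1/condalock&apos;
    }
}
```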

## Pipelines

Container information at module-level is great, but it&apos;s not enough.
Nextflow doesn&apos;t know about module `meta.yml` files (they&apos;re an nf-core invention),
so we need to tie these into the pipeline code where they will run.

The heart of the solution is to auto-generate a config file for each software packaging type (Docker, Singularity, Conda)
and platform (`linux/amd64` and `linux/arm64`).
These will be created by the nf-core/tools CLI and never be edited by hand, so no manual merging will be required.
They&apos;ll simply be regenerated and overwritten every time a version of a module is updated.

Each config file will specify the `container` or `conda` directive for every process in the pipeline:

```groovy title=&quot;config/containers_docker_amd64.config&quot;
// AUTOGENERATED CONFIG BELOW THIS POINT - DO NOT EDIT
process { withName: &apos;NF_PIPELINE:FASTQC&apos;         { container = &apos;fastqc:0.12.1--5cfd0f3cb6760c42&apos; } }
process { withName: &apos;NF_PIPELINE:MULTIQC&apos;        { container = &apos;multiqc:1.25--9968ff4994a2e2d7&apos; } }
process { withName: &apos;NF_PIPELINE:ANALYSIS_PLOTS&apos; { container = &apos;express_click_pandas_plotly_typing:58d94b8a8e79e144&apos; } }
//.. and so on, for each process in the pipeline
```

Likewise, the conda config files will point to the lock files for each process:

```groovy title=&quot;config/conda_lockfiles_amd64.config&quot;
// AUTOGENERATED CONFIG BELOW THIS POINT - DO NOT EDIT
process { withName: &apos;NF_PIPELINE:FASTQC&apos;         { conda = &apos;https://wave.seqera.io/v1alpha1/builds/5cfd0f3cb6760c42_1/condalock&apos; } }
process { withName: &apos;NF_PIPELINE:MULTIQC&apos;        { conda = &apos;https://wave.seqera.io/v1alpha1/builds/9968ff4994a2e2d7_1/condalock&apos; } }
process { withName: &apos;NF_PIPELINE:ANALYSIS_PLOTS&apos; { conda = &apos;https://wave.seqera.io/v1alpha1/builds/58d94b8a8e79e144_1/condalock&apos; } }
//.. and so on, for each process in the pipeline
```

The main `nextflow.config` file will import these config files, depending on the [profile selected](#usage-summary)
by the person running the pipeline.

Singularity will have separate config files and associated `-profile`s for both `oras` and `https` containers,
so that users can choose which to use.

Local modules and any edge-case shared modules that cannot use the Seqera Containers automation
will need the pipeline developer to hardcode container names and conda lock files manually.
These can be added to the above config files as long as they remain above the comment line:

```groovy
// AUTOGENERATED CONFIG BELOW THIS POINT - DO NOT EDIT
```
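For example, a hand-maintained entry for a local module would sit above the marker, with the autogenerated lines below it (the `MY_LOCAL_TOOL` process and its image are hypothetical):

```groovy
// Manually maintained: local module, not built via Seqera Containers
process { withName: &apos;NF_PIPELINE:MY_LOCAL_TOOL&apos; { container = &apos;quay.io/nf-core/my-local-tool:1.0.0&apos; } }

// AUTOGENERATED CONFIG BELOW THIS POINT - DO NOT EDIT
process { withName: &apos;NF_PIPELINE:FASTQC&apos; { container = &apos;fastqc:0.12.1--5cfd0f3cb6760c42&apos; } }
```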

We&apos;re also taking this opportunity to update the `apptainer` and `mamba` profiles:
they will import the exact same config files as the `singularity` and `conda` profiles.

Here&apos;s roughly how the `nextflow.config` file with the `-profile` config includes will look:

::::info{.fa-code title=&quot;nextflow.config&quot; collapse}

:::note
Boilerplate code (eg. disabling other container engines) has been removed from this
blog post code snippet for clarity. It will still be included in the pipelines.
We may move this whole code block into its own separate config file with `includeConfig`
so that the main `nextflow.config` file is easier to read.
:::

```groovy title=&quot;nextflow.config&quot;
// Set container for docker amd64 by default
includeConfig &apos;config/containers_docker_amd64.config&apos;

profiles {
    docker {
        docker.enabled      = true // Use the default config/containers_docker_amd64.config
    }
    docker_arm {
        includeConfig       &apos;config/containers_docker_linux_arm64.config&apos;
        docker.enabled      = true
    }
    // podman, shifter, charliecloud the same as docker - also with _arm versions
    singularity {
        includeConfig       &apos;config/containers_singularity_linux_amd64.config&apos;
        singularity.enabled = true
    }
    singularity_arm {
        includeConfig       &apos;config/containers_singularity_linux_arm64.config&apos;
        singularity.enabled = true
    }
    singularity_oras {
        includeConfig       &apos;config/containers_singularity_oras_linux_amd64.config&apos;
        singularity.enabled = true
    }
    singularity_oras_arm {
        includeConfig       &apos;config/containers_singularity_oras_linux_arm64.config&apos;
        singularity.enabled = true
    }
    apptainer {
        includeConfig       &apos;config/containers_singularity_linux_amd64.config&apos;
        apptainer.enabled = true
    }
    apptainer_arm {
        includeConfig       &apos;config/containers_singularity_linux_arm64.config&apos;
        apptainer.enabled = true
    }
    apptainer_oras {
        includeConfig       &apos;config/containers_singularity_oras_linux_amd64.config&apos;
        apptainer.enabled = true
    }
    apptainer_oras_arm {
        includeConfig       &apos;config/containers_singularity_oras_linux_arm64.config&apos;
        apptainer.enabled = true
    }
    conda {
        includeConfig       &apos;config/conda_lockfiles_amd64.config&apos;
        conda.enabled       = true
    }
    conda_arm {
        includeConfig       &apos;config/conda_lockfiles_arm64.config&apos;
        conda.enabled       = true
    }
    conda_env {
        conda.enabled       = true // Use the environment.yml file in the module main.nf
    }
    mamba {
        includeConfig       &apos;config/conda_lockfiles_amd64.config&apos;
        conda.enabled       = true
        conda.useMamba      = true
    }
    mamba_arm {
        includeConfig       &apos;config/conda_lockfiles_arm64.config&apos;
        conda.enabled       = true
        conda.useMamba      = true
    }
    mamba_env {
        conda.enabled       = true // Use the environment.yml file in the module main.nf
        conda.useMamba      = true
    }
}

docker.registry      = &apos;community.wave.seqera.io/library&apos;
podman.registry      = &apos;community.wave.seqera.io/library&apos;
apptainer.registry   = &apos;oras://community.wave.seqera.io/library&apos;
singularity.registry = &apos;oras://community.wave.seqera.io/library&apos;
```

::::

Note that there are a few changes here:

- New profiles with `_arm` suffixes for `linux/arm64` architectures
- New profiles for `_oras` suffixes for using the `oras://` protocol
- The `apptainer` profiles now use the `singularity` config files
- The `conda` profiles now use Conda lock files instead of `environment.yml` files
- New `conda_env` profiles for those wanting to keep the old behaviour
- New `mamba` profiles, using the `conda` config files
- Base registries set to Seqera Containers

Because we&apos;re only defining the image name and making use of the base container registry config option,
it should still be simple to mirror containers to custom Docker registries and overwrite only
`docker.registry` as before.
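For instance, a site mirroring images into its own registry could leave the autogenerated config files untouched and override only the registry setting (the registry address below is hypothetical):

```groovy
// institutional.config: pull images from a local mirror instead of Seqera Containers
docker.registry = &apos;registry.example.org/seqera-mirror&apos;
```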

# Automation - Modules

The nf-core community loves automation.
It&apos;s baked into the core of our community from our shared interest in automating workflows.
We have linting bots, template updates, Slack workflows, pipeline announcements. You name it, we&apos;ve automated it.

In these sections, we&apos;ll cover _how_ we&apos;re going to build all of these shiny new things without manual intervention.

:::tip{.fa-brands.fa-github title=&quot;GitHub issue&quot;}
For the latest updates on modules container automation, see
[nf-core/modules#6694](https://github.com/nf-core/modules/issues/6694).
:::

## Updating conda packages

The automation begins when a contributor wants to add a piece of software to a container.
For instance, suppose they decide that they need samtools installed.
The contributor updates the `environment.yml` file and adds a line for samtools:

```yml title=&quot;environment.yml&quot; {6}
channels:
    - conda-forge
    - bioconda
dependencies:
    - bioconda::fastqc=0.12.1
    - bioconda::samtools=1.16.1
```

That will kick off the container image generation factory
(it could equally be a change to remove a package, or change a pinned version).

A commit will be pushed automatically with an updated `meta.yml` file pointing to the new containers,
plus new nf-test snapshots for the software version checks.

Only the interesting part needs to be edited by the developer (which tools to use) and all other
steps are fully automated.

## Container image creation

As with most automation in nf-core, container creation will happen in GitHub Actions.
Edits to a module&apos;s `environment.yml` file will trigger a workflow that uses the
[`wave-cli`](https://github.com/seqeralabs/wave-cli) to build the container images.

1. GitHub Actions identifies changes in the `environment.yml` file.
2. `wave-cli` is executed on the updated environment file.
3. Seqera Containers builds new containers for various platforms and architectures.
4. GitHub Actions runs stub tests and commits the updated
   [version snapshot](https://github.com/nf-core/modules/blob/1fe2e6de89778971df83632f16f388cf845836a9/modules/nf-core/bowtie/align/tests/main.nf.test.snap#L32-L46).

&lt;div class=&quot;excalidraw&quot;&gt;
    &lt;Fragment set:html={creation_flow} alt=&quot;Container creation flow&quot; /&gt;
&lt;/div&gt;

Once this GitHub Actions run completes, it will push the new commits back to the PR
and the regular nf-test CI will run.

## nf-test versions snapshot

One of the primary reasons that we were so excited to adopt nf-test was the snapshot functionality.
Every test has a snapshot file with the expected outputs, and the outputs are deterministic
(not a binary file, and there are no dates).

In that snapshot, we also capture the versions of the dependencies for the module
([example shown for bowtie2](https://github.com/nf-core/modules/blob/1fe2e6de89778971df83632f16f388cf845836a9/modules/nf-core/bowtie/align/tests/main.nf.test.snap#L32-L46)):

```json title=&quot;main.nf.test.snap&quot; {4-7}
    &quot;versions&quot;: {
        &quot;content&quot;: [
            {
                &quot;BOWTIE_ALIGN&quot;: {
                    &quot;bowtie&quot;: &quot;1.3.0&quot;,
                    &quot;samtools&quot;: &quot;1.16.1&quot;
                }
            }
        ],
        &quot;meta&quot;: {
            &quot;nf-test&quot;: &quot;0.9.0&quot;,
            &quot;nextflow&quot;: &quot;24.04.4&quot;
        },
        &quot;timestamp&quot;: &quot;2024-09-27T10:42:58.892298&quot;
    },
```

This gives a second level of confirmation that the containers were correctly generated.

When updating containers, the nf-test snapshot is parsed and compared to the snapshot
from before the new containers were built.
The snapshot should only vary in the software versions reported.
If any other changes are detected, the snapshot update will be rejected and _not_ committed back to the PR.
PR reviewers will then see failing tests and need to manually update the snapshot file.

This means that we can automatically commit the updated snapshot file in the PR if
the tool output is unchanged, saving the developer from taking this extra step.

:::tip

Note that in the above example, the versions are in the snapshot in plain text,
not using an md5 hash. This is a change that we will roll out for all modules,
as it makes verification in the PR much easier.

:::

## Automatic version bumps with Renovate

We&apos;ve recently adopted [Renovate](https://renovatebot.com/), a tool for automated dependency updates.
It&apos;s multi-platform and multi-language and has become pretty popular in the devops space.
It&apos;s similar to [GitHub&apos;s dependabot](https://docs.github.com/en/code-security/getting-started/dependabot-quickstart-guide#about-dependabot),
but supports more languages and frameworks, and, more importantly for nf-core, lets us define our own custom dependency managers.

Renovate runs on a schedule and automatically updates software versions for us based on the
specifications we&apos;ve laid out in a [common config](https://github.com/nf-core/ops/blob/main/.github/renovate/default.json5).

The magic starts with some nf-core automation to add renovate comments to the `environment.yml` file:

```yml {5,7,9}
channels:
    - conda-forge
    - bioconda
dependencies:
    # renovate: datasource=conda depName=bioconda/bwa
    - bioconda::bwa=0.7.18
    # renovate: datasource=conda depName=bioconda/samtools
    - bioconda::samtools=1.20
    # renovate: datasource=conda depName=bioconda/htslib
    - bioconda::htslib=1.20.0
```

[These comments will be added](https://github.com/nf-core/modules/issues/6504) through
[the batch module updates](https://github.com/nf-core/modules/issues/5828) happening this year.
Future modules will have these comments [added automatically and linted](https://github.com/nf-core/tools/issues/3184)
by the nf-core/tools CLI.

The comments allow some scary regexes to find the conda dependencies and their versions
in nf-core/modules, and check if there&apos;s a new version available.
If there is a new version available, the Renovate bot will create a PR bumping the version,
which in turn will kick off the container creation GitHub Action.

The process will be very similar to the diagram laid out above; however, we can go a step further:
if the new software versions have no effect on the results of the tests, the PR will be automatically merged:

&lt;div class=&quot;excalidraw&quot;&gt;
    &lt;Fragment set:html={renovate_flow} alt=&quot;Renovate version bump flow&quot; /&gt;
&lt;/div&gt;

So: if all tests pass, the pull request is automatically merged without human intervention.
In case of test failures, the Renovate bot automatically requests a review from the appropriate
module maintainer using the `CODEOWNERS` file.
The maintainer then steps in to fix failing tests and request a final review before merging.

This efficient process ensures that software dependencies stay current with minimal manual oversight,
reducing noise and streamlining development workflows.
This will hopefully be the end of the _&quot;can I get a review on this version bump&quot;_ requests in `#review-requests`!

# Automation - Pipelines

We now have up-to-date software packaging for the shared nf-core/modules with minimal manual intervention.
However, we need to propagate these changes to the pipelines that use these modules.
There are two main areas that we need to address:

## Building config files

As [described above](#pipelines), pipelines will have a set of config files automatically generated
that specify the container or conda environment for each process in the pipeline.

These files will be created whenever a module is installed, updated or removed via the `nf-core` CLI.
The config files will be completely regenerated each time, so there will never be any manual merging required.

The trickiest part of this process is linking the module containers to the pipeline processes.
Modules can be imported into pipelines with any alias or scope.
We need to match this against the values that we find in the module `meta.yml` files:

- Run `nextflow inspect` to generate the default Docker `linux/amd64` config
- Copy this file and replace the container names with the relevant container for each platform,
  using the `meta.yml` files that match the Docker container name

This is why we duplicate the default docker container in both `main.nf` and `meta.yml` files -
it allows us to link the module to the pipeline.
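For reference, `nextflow inspect` can already emit this process-to-container mapping in config form, which is what the substitution step starts from (the command line below is indicative; the exact flags may change with the new `inspect` version):

```groovy
// $ nextflow inspect . -profile test,docker -format config
process { withName: &apos;NF_PIPELINE:FASTQC&apos; { container = &apos;community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42&apos; } }
```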

:::warning

Currently, `nextflow inspect` works by executing a dry-run of the pipeline.
We can run using `-profile test`, but any processes that are skipped due to pipeline logic
will not be included in the config file.

This is a known limitation and we are currently working hard with the Nextflow team at Seqera on a new version of `nextflow inspect`
which will return all processes, regardless of whether they are skipped or not.
This will be a requirement for progression with the Seqera Containers migration.

:::

:::note

This process of copying files and using string substitution is a bit of a hack.
If you have any ideas on how to improve this, please let us know!

:::

## Edge cases and local modules

There will always be edge cases that won&apos;t fit the automation described above.
Not all software tools can be packaged on Bioconda (though we encourage it where possible!).
For example, some tools have licensing restrictions that prevent them from being distributed in this way.
Other edge-cases include optional usage of container variants for GPU support, or other hardware.

Local modules won&apos;t be able to benefit from the Wave CLI automation to fetch containers from Seqera Containers
and will have to be manually updated by the pipeline developer.

For these reasons, we will still support custom `container` declarations in modules
without use of Seqera Containers. It will be up to the module contributors to ensure that these
are correctly specified and kept up to date manually.

These can be specified in the `main.nf` file and added to the autogenerated platform-specific config files,
as long as they remain above the comment line:

```groovy
// AUTOGENERATED CONFIG BELOW THIS POINT - DO NOT EDIT
```

If at all possible, software should be packaged with Bioconda and Seqera Containers.
Failing that, custom containers should be stored under the [nf-core account on quay.io](https://quay.io/organization/nf-core).
The only time other Docker registries / accounts should be used is when licensing issues
restrict software redistribution.

Custom containers can also be built using Wave in continuous integration;
they just can&apos;t be pushed to the Seqera Containers registry.
However, they _can_ be pushed to [quay.io](https://quay.io/organization/nf-core) automatically.
We can do this using a similar mechanism to the automation used for changing `environment.yml` files,
simply replacing it with `Dockerfile`s (see [nf-core/modules#4940](https://github.com/nf-core/modules/pull/4940)).

## Downloads

The nf-core CLI has a `download` command that downloads pipeline code and software for offline use.
This has a lot of hardcoded logic around the previous syntax of container strings and will need a significant rewrite.

By the time we get to running this tool, the pipeline has all containers defined in configuration files.
As such, we should be able to run the new and improved `nextflow inspect` command to return the container names
for every process for a given platform. Once collected, we can download the images as before.

The advantage of using this approach is that the download logic can be far simpler.
Complex syntax for container strings is not a problem, as we can rely on Nextflow to resolve these to simple strings.

# Roadmap

This blog post lays out a future vision for this project.
It serves as a rubber-duck for the authors, a place to request feedback from the community,
and a roadmap for developers.

There are many pieces of work that must come together for its completion,
including but not limited to:

{/* TODO: Am I missing any? */}

- Nextflow
    - Improve `nextflow inspect` to return all processes
- nf-core/tools
    - [Add and lint module Renovate comments](https://github.com/nf-core/tools/issues/3184)
    - [Lint for nf-test snapshots](https://github.com/nf-core/tools/issues/2504)
    - Write automation for creating pipeline container config files
    - [Rewrite `nf-core download`](https://github.com/nf-core/tools/issues/3179)
- nf-core/modules
    - [Add renovate comments to environment.yml](https://github.com/nf-core/modules/issues/6504)
    - [Bulk update modules to use Seqera Containers](https://github.com/nf-core/modules/issues/6698)
    - [Build automation for fetching Seqera Containers](https://github.com/nf-core/modules/issues/6694)

If all goes well, we hope to have the majority of this work completed by the end of 2024.</content:encoded></item><item><title>Migration from Biocontainers to Seqera Containers: Part 1</title><link>https://nf-co.re/blog/2024/seqera-containers-part-1/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/seqera-containers-part-1/</guid><description>What Seqera Containers is and why we want to move to it.</description><pubDate>Tue, 17 Sep 2024 08:20:00 GMT</pubDate><content:encoded>import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;

import butwhy from &quot;@assets/images/blog/seqera-containers-part-1/simone-secci-49uySSA678U-unsplash.jpg&quot;;
import mulled from &quot;@assets/images/blog/seqera-containers-part-1/hannah-pemberton-bXi4eg4jyuU-unsplash.jpg&quot;;
import disks from &quot;@assets/images/blog/seqera-containers-part-1/behnam-norouzi-8FsybY-URs0-unsplash.jpg&quot;;
import wave_logs from &quot;@assets/images/blog/seqera-containers-part-1/logs_screenshot.png&quot;;
import leaves from &quot;@assets/images/blog/seqera-containers-part-1/providence-doucet-mE5MBZX5sko-unsplash.jpg&quot;;

import { Image } from &quot;astro:assets&quot;;

## Introduction

Dear nf-core community, the core team would like to inform you about some upcoming changes in how we would like to handle software containers... :package:

Software containers are a fundamental part of modern bioinformatics workflows.
Nextflow supports multiple container platforms, but the two most commonly used are Docker and Singularity.
When using containers, all software requirements for a specific Nextflow process are wrapped up into an image and referenced within the pipeline code.
The process tasks run in isolation from the host environment, and the end user doesn&apos;t need to worry about installing software dependencies for the pipeline.
The software builds are locked in time, making results highly reproducible over many years.

To ensure maximum compatibility across different environments, nf-core pipelines ship with the full range of options: conda environments as well as Docker and Singularity images.
These are typically built using [Bioconda](https://bioconda.github.io/) and [BioContainers](https://biocontainers.pro/).
The Bioconda and BioContainers projects have been invaluable to nf-core&apos;s success.
We&apos;re hugely grateful to all contributors for their work, as well as to Anaconda, Quay.io and the Galaxy project for hosting these resources.

## Change is on the breeze

&lt;Image class=&quot;d-block ms-3 mb-3 float-end rounded w-25 shadow&quot; src={butwhy} alt=&quot;Photo by Simone Secci on Unsplash.&quot; /&gt;

Bioconda and BioContainers have worked very well for nf-core.
However, we are now looking to migrate to a new system: Seqera Containers.

The motivation comes down to a few key reasons:

- Difficulties with BioContainers
    - [Mulled (multi-package) images](#mulled-multi-package-images)
    - [Slow Singularity image availability](#time-to-singularity-image)
    - [BioContainers API](#biocontainers-api)
    - [Reliability of hosting](#reliability-of-hosting)
- New features we&apos;d like
    - [Conda lock files](#exceptionally-reproducible)
    - Simplified developer workflow
    - [Better transparency](#trust-and-transparency)

With the new Seqera Containers setup we should be able to address these problems, as well as provide additional features.

:::tip{title=&quot;What&apos;s changing&quot;}
A detailed description of the proposed mechanism will come in part 2 of this blog post.
Here are the key points:

- Developers will _only_ need to edit the conda `environment.yml` (no process `container`)
- Container images will be built by Wave and stored in Seqera Containers
- Build logs and source files will be shown on the nf-core website module pages
- Conda lock-files will be used for Conda users and CI tests
- Software releases will be automatically bumped in modules, using [Renovate](https://renovatebot.com/)
- There should be almost no change in how end-users run nf-core pipelines
  :::

:::info{.fa-calendar-lines-pen title=&quot;Timeline&quot; collapse}

This move has been in the works for over a year now.
It started with discussions between nf-core maintainers and Seqera developers about what the community needed.
Seqera&apos;s open source [Wave](https://seqera.io/wave/) container tool brings convenience, but long term storage of container images and stable container URIs were identified as hard requirements.
In response, Seqera developed _Seqera Containers_, a free community resource with hosting infrastructure funded by AWS.
This service was launched in spring 2024.

The adoption of Seqera Containers in nf-core was initially raised in the nf-core steering group, followed by discussion in the core team and then maintainers team (see the [nf-core governance structure](/governance)).
At each step, the discussion informed the planned infrastructure and setup for this change, as well as the development of Seqera Containers and Wave.

Several steps are still needed to complete this migration:

- nf-core/modules automation for fetching images and pinning conda-lock files and `meta.yml` references
- nf-core/tools tooling for pipeline config file generation, with pipeline template update
- Bulk update of nf-core/modules to use Seqera Containers
- Update of nf-core pipelines to use the new modules

If all goes to plan, we hope to have all modules updated by the end of 2024.
This work will be tracked in GitHub issues as usual,
starting with [nf-core/modules#5832](https://github.com/nf-core/modules/issues/5832).
This issue has already had a substantial amount of discussion, much of which has informed this blog post.

It&apos;s not too late to add your feedback! So if you have any questions or ideas, let us know on Github or Slack.

Note that images from Seqera Containers can already be used in nf-core/modules, with the old syntax of using a `container` declaration (see [example](https://github.com/nf-core/modules/blob/b6b54f3929b0ba4a7c02a49191308ce1d8351f0d/modules/nf-core/bedtools/genomecov/main.nf#L6-L8)).

:::

## BioContainers and Seqera Containers

Nearly all nf-core modules are bundled with a conda `environment.yml`, listing software package dependencies from [Bioconda](https://bioconda.github.io/) (eg. [FastQC](https://github.com/nf-core/modules/blob/f768b283dbd8fc79d0d92b0f68665d7bed94cabc/modules/nf-core/fastqc/environment.yml)).
The [BioContainers project](https://biocontainers.pro/) conveniently builds Docker and Singularity images for all tools on Bioconda automatically, hosting the images publicly on [quay.io](https://quay.io) and the Galaxy FTP servers, respectively.
This means that the nf-core module developer can find the matching [BioContainer](https://biocontainers.pro/) images for their Bioconda package, and then add the image URLs into the module&apos;s `main.nf` script (see [FastQC example](https://github.com/nf-core/modules/blob/f768b283dbd8fc79d0d92b0f68665d7bed94cabc/modules/nf-core/fastqc/main.nf#L6-L8)).
The nf-core pipeline user then pulls the container images from quay.io or the Galaxy FTP server when they run the pipeline.
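For illustration, a module of this era declares its containers roughly like this (adapted from the FastQC example; treat the exact image tags as placeholders):

```groovy
process FASTQC {
    conda &quot;${moduleDir}/environment.yml&quot;
    container &quot;${ workflow.containerEngine == &apos;singularity&apos; &amp;&amp; !task.ext.singularity_pull_docker_container ?
        &apos;https://depot.galaxyproject.org/singularity/fastqc:0.12.1--hdfd78af_0&apos; :
        &apos;biocontainers/fastqc:0.12.1--hdfd78af_0&apos; }&quot;

    // ... inputs, outputs and script follow
}
```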

&lt;YouTube
    id=&quot;https://www.youtube.com/watch?v=SE6ZCvh1ThQ&quot;
    poster=&quot;http://i3.ytimg.com/vi/SE6ZCvh1ThQ/hqdefault.jpg&quot;
    class=&quot;d-block ms-3 mb-3 float-end rounded w-50 shadow&quot;
/&gt;

[Seqera Containers](https://seqera.io/containers/) was launched in spring 2024
(see [Nextflow Summit talk](https://summit.nextflow.io/2024/boston/agenda/05-23--whats-new-in-the-nextflow/),
[Seqera blog post](https://seqera.io/blog/introducing-seqera-pipelines-containers/),
[AWS blog post + podcast](https://aws.amazon.com/blogs/hpc/announcing-seqera-containers-for-the-bioinformatics-community/),
and the [Nextflow Channels podcast](https://nextflow.io/podcast/2024/ep38_seqera_pipelines_containers.html) for more information).

Seqera Containers allows anyone to request a container image based on Conda or PyPI packages.
The image is built on demand and then saved so that subsequent requests return the exact same container image files.

The Seqera Containers service has a lot in common with BioContainers.
Both generate Docker and Singularity images from Bioconda.
The main difference is _when_ those builds happen.

Seqera Containers is built on top of [Wave](https://seqera.io/wave/) - an open-source tool developed by Seqera for **on-demand** generation of containers.
When a new tool or version is requested, Wave builds a container using Conda and returns it.
In contrast, BioContainers runs a build when a new package is created on Bioconda.
Building on-demand gives greater flexibility and scalability, especially for multi-tool containers.

nf-core pipeline users won&apos;t need to interact with Wave directly.
We simply plan to replace the mechanism for supplying the default container images specified in pipelines.

:::info{.fa-list-check title=&quot;Feature comparison&quot;}

It&apos;s important to distinguish the differences between Wave and Seqera Containers.
Here&apos;s a brief comparison of the features:

&lt;table class=&quot;table table-sm table-hover&quot;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;/td&gt;
            &lt;th&gt;BioContainers&lt;/th&gt;
            &lt;th&gt;Wave&lt;/th&gt;
            &lt;th&gt;Seqera&amp;nbsp;Containers&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;Support Bioconda packages&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Support all conda channels&lt;/td&gt;
            &lt;td&gt;❌&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Support PyPI (pip) packages&lt;/td&gt;
            &lt;td&gt;❌&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Docker + Singularity support&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Linux aarch64 and arm64&lt;/td&gt;
            &lt;td&gt;⏳ In progress&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Multi-package containers&lt;/td&gt;
            &lt;td&gt;✅ Mulled&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Conda lock files generated&lt;/td&gt;
            &lt;td&gt;❌&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Container build logs&lt;/td&gt;
            &lt;td&gt;❌ CI logs short lived&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Docker container security scans&lt;/td&gt;
            &lt;td&gt;✅ quay.io&lt;/td&gt;
            &lt;td&gt;✅ Trivy&lt;/td&gt;
            &lt;td&gt;✅ Trivy&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;SBOM manifests (software bill of materials)&lt;/td&gt;
            &lt;td&gt;❌&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Long storage duration&lt;/td&gt;
            &lt;td&gt;✅ *&lt;/td&gt;
            &lt;td&gt;❌ 72 hours cache&lt;/td&gt;
            &lt;td&gt;✅ *&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Pull delay for conda packages&lt;/td&gt;
            &lt;td&gt;✅ instant&lt;/td&gt;
            &lt;td&gt;❌ ~2-3 minutes for build on first request&lt;/td&gt;
            &lt;td&gt;✅ instant&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Stable image URIs&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;❌ Single-run&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Software required for end user&lt;/td&gt;
            &lt;td&gt;Docker / Singularity&lt;/td&gt;
            &lt;td&gt;Wave CLI / Nextflow, Docker / Singularity&lt;/td&gt;
            &lt;td&gt;Docker / Singularity&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Offline support with downloaded images&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;❌&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;Guaranteed identical conda builds in future image pulls&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
            &lt;td&gt;❌&lt;/td&gt;
            &lt;td&gt;✅&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;small&gt;
    \* Forever is a long time. Seqera is committing to keeping images for a minimum of 5 years from the time of their
    creation. BioContainers has no public policy.
&lt;/small&gt;

:::

### Mulled (multi-package) images

&lt;div class=&quot;clearfix&quot;&gt;

&lt;Image
    class=&quot;d-block ms-3 mb-3 float-end rounded w-25 shadow clearfix&quot;
    src={mulled}
    alt=&quot;Photo by Hannah Pemberton on Unsplash.&quot;
/&gt;

BioContainers is primarily set up to have a 1:1 relationship with Bioconda.
This is great for bioinformatics tools as an image is created every time a package is published.
This works well until you need to use more than one tool in a single process, particularly when the tools are from elsewhere in the conda ecosystem.
For example, many bioinformatics tools leverage [samtools](https://www.htslib.org/) to convert file formats or sort reads on the fly.
Others may pipe output to compression tools like [pigz](https://zlib.net/pigz/).
To resolve this, BioContainers has the concept of “mulled” images (as in [mulled wine](https://en.wikipedia.org/wiki/Mulled_wine)).
Unfortunately, generating mulled containers is not trivial.

&lt;/div&gt;

:::tip{title=&quot;How to make a mulled BioContainer&quot;}
Luke Pembleton, Nextflow Ambassador, has a great blog post summarising the dark art of
[Finding the right mulled biocontainer](https://lpembleton.rbind.io/posts/mulled-biocontainers/).
Galaxy also has [documentation on the topic](https://docs.galaxyproject.org/en/master/admin/container_resolvers.html).

In short, to request an image, you need to scroll to the bottom of a large
[CSV file full of hashes](https://github.com/BioContainers/multi-package-containers/blob/master/combinations/hash.tsv)
and add a line with your requested conda packages.
Once edited, the images are built on CI.
The container then becomes available after a review and merge.
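Each entry in that file is, from memory, little more than a comma-separated list of conda package specs; an illustrative (simplified) line might be (exact column format may differ, so check the file header):

```csv
samtools=1.20,pigz=2.8
```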

It used to be that you had to hunt through the GitHub action log to find the URI for your new container.
However, [Moritz E. Beber](https://github.com/Midnighter) from the nf-core community has created a webpage to
[generate the name of the mulled container](https://midnighter.github.io/mulled) which gently guides you through
the process of finding the name of your container and how to update a container if you want to bump the software versions.

Once the image is built and you know its address, the final step is to go to nf-core/modules,
create a pull-request with the updated containers, and bump the versions in the conda `environment.yml`.
:::

This system works and, despite its complexity, is now a familiar process to many nf-core developers.
However, it does present some problems.
It&apos;s a highly manual process to update software versions and the container declarations - just fetching the software for a module often ends up taking longer than writing the entire module.
There are also significant delays in waiting for the images to become available.
All in all, it can be a frustrating experience and we see this in the volume of Slack messages asking for help.
It seems likely that this puts some people off from contributing to nf-core.

Migrating nf-core to use Seqera Containers will make provisioning multi-package containers easier.
The module developer will only need to edit the module `environment.yml` file and everything else will be fully automated and made available almost immediately.
Packages can be used minutes after their release on Bioconda and we will have automation in place to bump the Conda + Seqera Containers packages when a tool is updated.
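Under the new scheme, provisioning a multi-tool container would need nothing beyond an `environment.yml` like this (package pins are illustrative):

```yaml
channels:
    - conda-forge
    - bioconda
dependencies:
    - bioconda::samtools=1.20
    - conda-forge::pigz=2.8
```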

### Time to Singularity image

BioContainers builds both Docker and Singularity images.
Docker images are pushed to quay.io and are available almost immediately,
whereas Singularity images are pushed to the Galaxy FTP server in a nightly build, taking up to 24 hours to become available.

This delay is frustrating for developers, as it breaks the flow of working on an update.
Once a new version of a tool is released, they have to wait for the Singularity image to become available before they can add it and test the module.
In practice, this often means that module updates get stuck in limbo for days until the developer has time to come back and check if the image is available.

With Seqera Containers, Singularity images are generated on the fly.
Developers will be able to update the nf-core module minutes after the Bioconda package is released.

### BioContainers API

To simplify adding single-package BioContainers images to nf-core modules, the nf-core/tools package queries the BioContainers API for a given tool version and fetches the image URIs.
This process is repeated during linting of nf-core modules and pipelines, to ensure that there is not an accidental mismatch between Conda and Docker / Singularity software versions.
Unfortunately, the BioContainers API has had a history of being slow and unreliable, with a lot of downtime.
This has the knock-on effect of causing many CI tests to fail, which is frustrating and delays the process of pull-request merges.

By migrating to Seqera Containers we will have similar automation when linting modules, but it will use the Seqera Wave API instead.
This has been built to scale to very high volumes and has an extensive and robust back-end with a high degree of monitoring.
Note that the Wave API will _not_ be used when people run nf-core pipelines; it will only be used by developers running linting and GitHub Actions automation during development.

### Reliability of hosting

BioContainers Docker images are hosted publicly on [quay.io](https://quay.io).
This service is provided free of charge; however, in recent years we have had some reliability issues.
This becomes more noticeable as the community grows and the number of users trying to pull images increases.
Because quay.io is a huge service for which we are only a tiny player, we have no recourse when this happens and simply have to wait until it becomes available again.

Seqera serves nf-core as its primary community: they will respond immediately to any problems or needs.
All Seqera Container images (Docker and Singularity) are hosted on custom infrastructure built by Seqera with hosting provided by AWS, so the whole stack is within reach.

## Key features

&lt;Image class=&quot;d-block ms-3 mb-3 float-end rounded w-25 shadow&quot; src={disks} alt=&quot;Photo by Behnam Norouzi on Unsplash.&quot; /&gt;

### Exceptionally reproducible

Seqera container images are hosted on long-term infrastructure and because their URLs will be hardcoded into pipeline configuration, they should be highly reproducible.
Even if new builds of the same software are created in the future using Wave (for example due to an update in the base infrastructure used by Wave) then the old images are still pinned by the pipeline and will continue to be used, much like the BioContainer URIs used currently.

As container URIs will be stored in the module-level `meta.yml` file, any pipelines using the same shared nf-core/module will also be using the exact same container images.

We will further improve reproducibility at the conda level by adopting the use of conda lock-files.
These pin the exact dependency stack used by the build, not just the top-level primary tool being requested.
This effectively removes the need for conda to solve the build and also ships md5 hashes for every package.
This will greatly improve the reproducibility of the software environments for conda users and the reliability of Conda CI tests.

```yaml
# This file may be used to create an environment using:
# $ conda create --name &lt;env&gt; --file &lt;this file&gt;
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
https://conda.anaconda.org/conda-forge/linux-64/libgomp-14.1.0-h77fa898_1.conda#23c255b008c4f2ae008f81edcabaca89
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
https://conda.anaconda.org/conda-forge/linux-64/libgcc-14.1.0-h77fa898_1.conda#002ef4463dd1e2b44a94a4ace468f5d2
# .. truncated ..
```

### Trust and transparency

Reproducibility is one thing, but it&apos;s also important that the images we use can be trusted and are secure.
Users and developers must be able to verify that the contents of the container match the `environment.yml` for the module.

The first point of trust is the `community.wave.seqera.io` base URI.
This is only used for Seqera Containers and only `wave.seqera.io` is able to push images.
It is also more locked-down than regular Wave, only allowing Conda builds and not custom Dockerfiles or augmented images.

Second, the image URI includes a tag that has a hash of the input files used to request the image.
We can use this to retrieve build details, which we will add to the nf-core website module page:

- Build time, architecture and platform (Docker / Singularity)
- `Dockerfile` / Singularity recipe and Conda `environment.yml`
- Full build logs
- Conda lock files with full package URLs and md5sums for entire dependency resolution
- For Docker images: Trivy security scan and software bill of materials (SBOM)

:::info{title=&quot;Example Wave build details page&quot; collapse}

We haven&apos;t built this yet, but here&apos;s what the Wave build details page looks like, which has the same information:

&lt;Image
    class=&quot;d-block ms-3 mb-3 float-end rounded shadow&quot;
    src={wave_logs}
    alt=&quot;Screenshot of a Wave container build details page.&quot;
/&gt;

Wave build details page for the MultiQC v1.24.1 image: `community.wave.seqera.io/library/multiqc:1.24.1--789bc3917c8666da`

The final part is the hash - from this we can infer the Wave Build ID and retrieve the build details: [https://wave.seqera.io/view/builds/789bc3917c8666da_1](https://wave.seqera.io/view/builds/789bc3917c8666da_1)
:::

The new conda lock files also allow enhanced reproducibility for Conda users and more stable CI tests - we&apos;ll come back to these in a future blog post / bytesize.

Finally, the image creation will happen automatically on GitHub Actions when an `environment.yml` file is edited.
This means that the entire flow from conda file through to final containers will be entirely transparent, as the request to Wave itself and its response will be available in the GitHub Actions logs.

### No lock-in

Using Seqera Containers does not mean that nf-core will be locked into using Seqera tooling.
The suggested implementation has two components:

1.  Automated on-demand generation of container images using Wave
2.  Long-term hosting of images on Seqera Containers

The tooling for image generation is [open-source](https://github.com/seqeralabs/wave).
Seqera is hosting this API and build service for the community for free, but there&apos;s nothing to stop us from hosting the same API ourselves in the future.
As such, there is no lock-in on this API or any tooling we will write around it.

Long-term hosting is more fixed, but again Wave is designed to work with _any_ container registry.
So if we want to change in the future we simply flip a configuration value and the generated images will be stored (“frozen”) at an alternative registry of our choosing.
Old pipelines would have their configuration overwritten to use a different base registry.

This is the same situation that we have currently with BioContainers and quay.io / Galaxy hosting.
The nf-core community will remain free to pick and choose hosting solutions as needed.

### Special cases

Not all tools can use BioContainers, and we&apos;re aware of some special-case packages that have only bespoke Docker images.
These will continue to work as they do today.

For people who need to mirror container images to their own custom registry, this will still be possible.
Changing the registry base will continue to work exactly the same way it does today.
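As a sketch, mirroring should keep working via the usual registry overrides in a custom config, assuming your mirror hosts the same image paths and a recent Nextflow version:

```groovy
// Custom institutional config: redirect default image pulls to a private mirror
// (registry.example.org is a placeholder)
docker.registry      = &apos;registry.example.org&apos;
singularity.registry = &apos;registry.example.org&apos;
```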

## What this means for you

### Users

If you&apos;re using nf-core pipelines but not developing them, then nothing should really change! All nf-core modules and pipelines should start supporting linux/arm64 CPU architectures, such as AWS Graviton / Raspberry Pis(!).
Conda users should benefit from faster environment resolution, with more reproducible and stable software thanks to use of the lock files.
Finally, the failure rate of Docker pulls should drop as we phase out our usage of quay.io.

### Developers

If you&apos;re a maintainer of a pipeline that uses nf-core/modules, this means that your life is about to get easier!
Never again will you need to try to figure out how to make a mulled image, or wonder when your Singularity image build will be available.
New software releases can be used with modules within minutes of release and most updates should happen automatically - freeing up maintainers from having to deal with routine package updates and reviews.

In part two of this blog post we will dive into the details of how we intend to build this tooling infrastructure in nf-core, so please check that out if you&apos;re interested.

## In conclusion

We hope that the nf-core community is excited about these proposals. If you have any questions or concerns, please let us know in the nf-core Slack ❤️

&lt;Image
    class=&quot;d-block ms-3 mb-3 float-end rounded w-100 shadow&quot;
    src={leaves}
    alt=&quot;Photo by Providence Doucet on Unsplash.&quot;
/&gt;</content:encoded></item><item><title>Maintainers Minutes: September 2024</title><link>https://nf-co.re/blog/2024/maintainers-minutes-2024-08-30/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/maintainers-minutes-2024-08-30/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Fri, 30 Aug 2024 10:00:00 GMT</pubDate><content:encoded>The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers)
by providing brief summaries of the monthly team meetings.

## Batch modules update

### Seqera Containers

[Phil Ewels](https://github.com/ewels) joined us to chat about migrating to [Seqera Containers](https://seqera.io/containers/) which ties into the [batch modules](https://github.com/nf-core/modules/issues/5828) update we have been discussing in the last few months.
We agreed on a general structure and refactoring plan to incorporate the new container repository, as well as leveraging conda lock files for better reproducibility.
Phil and [Edmund Miller](https://github.com/edmundmiller) will put together blog posts describing the plan in detail.

### Renovate

We also voted on the [structure of the `environment.yml` files](https://github.com/nf-core/modules/issues/6504) generated by Renovate and decided to keep the explicit channel declaration to safeguard against any accidental
package name duplication across multiple channels:

```yaml
channels:
    - conda-forge
    - bioconda
dependencies:
    # renovate: datasource=conda depName=bioconda/bwa
    - bioconda::bwa=0.7.18
    # renovate: datasource=conda depName=bioconda/samtools
    - bioconda::samtools=1.20
    # renovate: datasource=conda depName=bioconda/htslib
    - bioconda::htslib=1.20.0
```

## Testing

In the previous meeting we discussed a more comprehensive testing strategy. As a result, we decided that in addition to the `docker`-based tests, `singularity` and `conda` tests should also be run on PRs made to the `master` branch.
[Júlia Mir](https://github.com/mirpedrol) added this now to `tools` and it will be rolled out with the next template update.

## Office Hours

Earlier this year, we tried out [&quot;office hours&quot;](https://nf-co.re/blog/2024/office_hours). We reviewed its reception and decided to continue it under a new name. More details following soon.

## Expand modules guidelines

[Simon Pearce](https://github.com/SPPearce) suggested changing the guidelines [for the default prefix of modules](https://github.com/nf-core/website/pull/2608). The advantage is that
many developers already do this by default when submitting a module, and the likelihood of a module working without any additional configuration would be higher.
Counter-arguments are that we are just kicking the can down the road: it is still easy to generate output files with the same suffix (i.e. `sorted.bam`).

One idea is to add the task name as the default, which is long and ugly. The newly proposed output definition will alleviate some of these pain points as well.

Please review Simon&apos;s PR above if you want to weigh in on the discussion.

## Update documentation

We noticed that the [nf-test assertions page](https://nf-co.re/docs/contributing/nf-test/assertions) is in need of updating. Maxime Garcia will take the lead on that, pinging people as needed.

\- :heart: from your #maintainers team!</content:encoded></item><item><title>Maintainers Minutes: August 2024</title><link>https://nf-co.re/blog/2024/maintainers-minutes-2024-07-26/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/maintainers-minutes-2024-07-26/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Wed, 14 Aug 2024 10:00:00 GMT</pubDate><content:encoded>The &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers)
by providing brief summaries of the monthly team meetings.

## More (or less) testing

[Júlia](https://github.com/mirpedrol) encountered several failing tests on nf-core/crisprseq with conda and singularity (currently, our CI tests only run with docker).
Moving forward, we agreed that at least some tests should also be run with `singularity` and `conda` at some point during the development process, but before release.

We discussed several ideas.
On the one hand we&apos;d like to make sure everything works at all times, on the other hand running tons of essentially duplicate tests makes development slow, costs money, and is not great for the planet 🌳.
As a compromise, we agreed on the following:

- Singularity &amp; conda tests are run on PR to master
- Make singularity required
- Conda optional opt-out
- Add a `workflow_dispatch` button for easier manual testing during earlier stages of development
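The last point is a small workflow-file change; adding a `workflow_dispatch` trigger exposes a manual &quot;Run workflow&quot; button in the GitHub Actions tab (sketch of the relevant `on:` block):

```yaml
on:
    pull_request:
        branches:
            - master
    workflow_dispatch:
```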

## Nf-test best practices

[Nicolas](https://github.com/nvnieuwk/) brought up that nf-test snapshots (i.e., files that record e.g. md5sums or presence of strings) should be readable to quickly determine that they include everything they should.
He had noticed that some people in module tests had started using a snapshot for every single file (rather than all in one), seemingly to be able to &apos;label&apos; each snapshot.
However, since the order of the snapshots recorded in the `.snap` file does not necessarily correspond to the order they are defined in the test file, having one snapshot _per file_ in the same test scrambles the entries, e.g. out of order relative to the output channels defined in the module.

He will make updates to the guidelines that all assertions should go into a single snapshot per test for modules.
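In nf-test terms, the guideline amounts to a single `snapshot()` call covering all outputs; a minimal sketch:

```groovy
// One snapshot per test keeps the .snap file compact and predictably ordered
then {
    assertAll(
        { assert process.success },
        { assert snapshot(process.out).match() }
    )
}
```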

## Bulk updates of modules

[Júlia](https://github.com/mirpedrol) is coordinating an upcoming bulk update of all modules in https://github.com/nf-core/modules/issues/5828 (join [#wg-modules-update](https://nfcore.slack.com/archives/C07BDPBPLG3) to stay tuned).

This will include a lot of new exciting things:

- Improved `meta.yml` layout accounting for the channel structure (https://github.com/nf-core/modules/pull/5867)
- Subworkflows
- bio.tools identifiers (for better linked metadata across registries)
- edam ontologies for all input/output files (for better standardisation of file types)
- Removing defaults from conda channel (to reduce chances of ToS violations)
- Adding lock files (for better conda dependency version stability)
- Adding version topic channels (for simplified version reporting in pipelines)
- Adding lint for stub tests for all modules (to help us move to having complete &apos;dry run&apos; functionality in pipelines)

### Topic channels for `version`

In addition, we discussed how to incorporate the version aggregation system using Nextflow&apos;s new &apos;[topic](https://www.nextflow.io/docs/latest/channel.html#topic)&apos; channel functionality.
[Edmund](https://github.com/edmundmiller) created a [proof of concept](https://github.com/nf-core/nascent/pull/150) showing that it is possible to handle topic and non-topic version channels together.
This would require a pipeline update by all developers, pinning a recent Nextflow version, and adding the `preview` flag.
It was concluded that this should be a hackathon task.
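
As a rough sketch of the mechanism (process names here are illustrative, and the feature still requires preview mode in current Nextflow releases), a process emits its versions file to a named topic, and the pipeline then collects everything published to that topic in one place:

```groovy
process FASTQC {
    // ... input, script, etc.

    output:
    // publish to the 'versions' topic instead of a named output channel
    path 'versions.yml', topic: versions
}

workflow {
    // aggregate versions from every process that emits to the topic
    channel.topic('versions')
        .collectFile(name: 'all_versions.yml')
        .view()
}
```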

## Obligatory argument

After (somewhat, anyway) settling on how we want to name subworkflows in the previous meeting, we now discussed the `meta.yml` restructuring for subworkflows.
The `meta.yml` currently only includes the channel name.
However, we are now moving to expand it to match the _entire_ channel construct, like &quot;meta, bam, bai&quot; (not just files).
This, of course, led us to start arguing over the overall channel names.

&gt; There are only two hard things in Computer Science: cache invalidation and naming things.
&gt;
&gt; -- Phil Karlton

_from [https://martinfowler.com/bliki/TwoHardThings.html](https://martinfowler.com/bliki/TwoHardThings.html)_

## Versioning of modules guidelines

Various members of the nf-core community had both a long time ago and more recently suggested that the module guidelines should be versioned.
Everyone thinks it&apos;s a good idea, and we discussed how best to approach this, since the guidelines are listed on the website, _but_ linting is tied to nf-core/tools versions, _but_ not everything can be (or is) linted for at the moment.

Some ideas included linking the docs to the tools version, date-stamping the docs, and having a dev docs version to account for incremental updates.
However, it was brought up that linting against the tools dev version may cause problems with changes that occur during development on the dev branch.
One of the major technical challenges flagged was how best to account for various docs versions on the website, with the additional complication that the guidelines page is planned to be split up into one file per guideline point (which [James](https://github.com/jfy133) finds 🤢).
[Matthias](https://github.com/mashehu) will think about how this could be tackled to minimise the number of file copies created when archiving each version.

## Next time

[Mahesh](https://github.com/mahesh-panchal) has figured out an interesting way to handle args in nf-tests without having to make a separate `nextflow.config` file each time for each test ([as brought up here](https://nfcore.slack.com/archives/C049MBCEW06/p1721042240473589)).
We&apos;ll discuss this in detail next time.

\- :heart: from your #maintainers team!</content:encoded></item><item><title>A gentle introduction to nf-core/reportho</title><link>https://nf-co.re/blog/2024/reportho_intro/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/reportho_intro/</guid><description>The nf-core pipeline for comparing ortholog predictions</description><pubDate>Thu, 18 Jul 2024 23:19:00 GMT</pubDate><content:encoded>import reportho_tube_map from &quot;@assets/images/blog/reportho_intro/reportho_tube_map.png&quot;;
import train_tracks from &quot;@assets/images/blog/reportho_intro/train_tracks.png&quot;;
import the_world_if from &quot;@assets/images/blog/reportho_intro/the_world_if.png&quot;;

import { Image } from &quot;astro:assets&quot;;

# Orthologs matter

Orthologs matter. We need them for a variety of reasons. They are key to understanding taxonomy, protein function, evolutionary constraints, and probably more. We would like to have a universal predictor of orthologs. However, the reality is less bright. Even though we have a wide variety of good ortholog predictors, their results differ considerably in real scenarios.

# The Nightmare

&lt;Image
    class=&quot;d-block m-auto&quot;
    src={train_tracks}
    alt=&quot;A very complicated layout of railway tracks with multiple signs and signals&quot;
/&gt;
Imagine you need to find the orthologs of a protein for your study. What do you do? You might decide to get them from
OMA because you heard it is good. Or maybe EggNOG because your boss said so. If you want to be accurate, you might try
to get the predictions from multiple sources and compare them. However, you will probably run into problems. The
interfaces are different, the lists are hard to obtain, and the output formats are tricky to compare. Just looking at
this might make you go home and cry.

# The Dream

&lt;Image
    class=&quot;d-block m-auto&quot;
    src={the_world_if}
    alt=&quot;A futuristic city with flying vehicles, captioned in boldface with &apos;The world if reportho&apos;&quot;
/&gt;
Now imagine you could get all the results you need at the click of a button. Just type in an identifier, press “Go” and
get a neat comparison of all the public predictions. That’s how we imagine nf-core/reportho. We want to isolate all the
complexity of queries, access modes, and data formats, all while giving you maximum flexibility.

# How does it work?

&lt;Image class=&quot;d-block m-auto&quot; src={reportho_tube_map} alt=&quot;A tube map presenting the steps of the pipeline&quot; /&gt;
The workflow is quite straightforward. The pipeline takes a UniProt identifier as input, queries all available databases
for ortholog predictions, puts them in a homogeneous format, and performs comparisons. Working with a new sequence not
annotated in UniProt? No problem! We will run BLAST for you and find a close match in the databases. Once you get a list
of orthologs, you can perform MSA and phylogeny reconstruction with some of the most popular tools on the market. And,
to top it all off, you get a bunch of useful and tidy reports: [a detailed
one](https://nf-co.re/reportho/1.0.1/results/reportho/results-42b305199b903365b71e7a8554cfcc6a822da8a8/report/BicD2_dist/index.html)
for each query protein (NB: works better if you download the folder and use `run.sh`), plus [a summary
report](https://nf-core-awsmegatests.s3-eu-west-1.amazonaws.com/reportho/results-42b305199b903365b71e7a8554cfcc6a822da8a8/multiqc/multiqc_report.html)
of all the queries you used.

# Supported databases

Even though we can combine many formats, there are cases where the task becomes too difficult. As of writing, we support 4 databases: [EggNOG](https://eggnog5.embl.de), [OMA](https://omabrowser.org), [OrthoInspector](https://lbgi.fr/orthoinspector/), and [PANTHER](https://pantherdb.org). We offer two ways to access databases. If you’re working with a small number of queries, you can use API calls, limiting the amount of data saved on your machine. This is currently possible for OMA, OrthoInspector, and PANTHER. If you need to analyze large-scale data or you have doubts about runtime internet access, it is possible to provide the pipeline with local snapshots of the databases instead. These are available for EggNOG, OMA, and PANTHER.

# The present and the future

The pipeline has already had its first release and is available on [the nf-core website](https://nf-co.re/reportho). We have many ideas to expand the pipeline, but we need your feedback to make it exactly what you need. Try the pipeline today, enjoy hassle-free ortholog research, and make sure to send us your thoughts!

import alf_gif from &quot;@assets/images/blog/callingcards/alf_what_is_this.webp&quot;;
import callingcards_vs_chip from &quot;@assets/images/blog/callingcards/callingcards_vs_chip_white.png&quot;;
import callingcards_metro from &quot;@assets/images/blog/callingcards/callingcards_metromap.png&quot;;
import callingcards_mammals from &quot;@assets/images/blog/callingcards/callingcards_mammals.png&quot;;

import { Image } from &quot;astro:assets&quot;;

The [`nf-core/callingcards`][nf-core-callingcards-link] workflow is here!

&lt;Image
    class=&quot;d-block m-auto&quot;
    src={alf_gif}
    alt=&quot;Alf from the TV show Alf, sitting at a table in front of a plate with a burning birthday candle and asking &apos;What is this?&apos;.&quot;
/&gt;

# What is Calling Cards?

Calling cards is the name of a sequencing-based assay developed in Rob Mitra&apos;s
lab (https://mitralab.wustl.edu/) at Washington University in St Louis to interrogate
the DNA binding locations of proteins called transcription factors. Transcription
factors (TFs) are a class of proteins capable of binding to DNA to affect the
transcription of genes. In order to record
where a given TF binds, calling cards harnesses the power of a retrotransposon (a
transcribed genomic sequence capable of copying itself back into the
genome) to deposit a molecular barcode into the genome when a TF binds.

# Calling Cards in Yeast

&lt;figure&gt;
    &lt;Image class=&quot;d-block m-auto&quot; src={callingcards_vs_chip} alt=&quot;A graphic comparing Calling Cards to ChIP-seq&quot; /&gt;
    &lt;figcaption&gt;
        Figure 1: A comparison of Calling Cards to the more familiar method of interrogating transcription factors,
        ChIP-Seq. The calling cards figure is from{&quot; &quot;}
        &lt;a href=&quot;http://www.genome.org/cgi/doi/10.1101/gr.6510207&quot;&gt;Calling cards for DNA-binding-proteins&lt;/a&gt; by Wang et
        al., which originally described calling cards. The ChIP-seq diagram is from{&quot; &quot;}
        &lt;a href=&quot;https://doi.org/10.4161/15384101.2014.949201&quot;&gt;
            Role of ChIP-seq in the discovery of transcription factor binding sites
        &lt;/a&gt;{&quot; &quot;}
        by Mundade et al.
    &lt;/figcaption&gt;
&lt;/figure&gt;

In the yeast version of calling cards, a retrotransposon complex is linked to
the TF of interest. When the TF binds to the genome, the Ty5 sequence is inserted,
leaving behind a “calling card” which records the presence of that transcription factor
at that site. In contrast to calling cards, which provides a longitudinal
record of TF/DNA interaction, in ChIP experiments the proteins are cross-linked to the
DNA, in effect providing a snapshot of the state of the DNA and its associated proteins
at a single point in time.

# Calling Cards in Mammals

Much of the current work and development on the Calling Cards assay has focused on the
mammalian system. This differs in technical detail from the yeast procedure, though the
outcome is similar. Figure 2 below provides a graphic overview of the biochemical
procedure.

&lt;figure&gt;
    &lt;Image
        class=&quot;d-block m-auto&quot;
        src={callingcards_mammals}
        alt=&quot;A graphic describing the calling cards protocol in mammals&quot;
    /&gt;
    &lt;figcaption&gt;
        Figure 2: Barcoding the self-reporting transposon. (A) Schematic overview of the SRT construct, Calling Card
        method, and sequencing library preparation. Candidate sites for barcode insertions are indicated with gold
        stars. The TR-Genome junction, used to map transposon insertions, is circled in dotted magenta line. (B) Barcode
        site 3 is within the piggyBac TR sequence, immediately adjacent to the TR-Genome junction. Underlined
        nucleotides in the 13-bp terminal inverted repeat region (‘CTA’, gold) were targeted for mutagenesis by
        mutagenic PCR. (C) Overview of calling card rapid mutagenesis scheme. Mutant amplicons were transfected into
        cells with piggyBac transposase and integrated calling cards were collected. Nucleotide frequency for each
        mutagenized position of integrated SRTs were calculated. Nucleotide frequency at (D) position 1, (E) position 2
        and (F) position 3 of integrated mutated SRTs. Wild-type sequences are outlined in red. All four possible
        nucleotides were well-represented at all three mutated positions. IR: internal repeat. TR: terminal repeat.
        EF1a: eukaryotic translation elongation factor 1 α promoter. SRT: self-reporting transposon. nt: nucleotide. kb:
        kilobase. PuroR: puromycin resistance cassette. WT: wild-type. Mut: mutant.
        &lt;br&gt;&lt;/br&gt;Figure and caption from:
        &lt;a href=&quot;https://doi.org/10.1093/nargab/lqac061&quot;&gt;
            Matthew Lalli, Allen Yen, Urvashi Thopte, Fengping Dong, Arnav Moudgil, Xuhua Chen, Jeffrey Milbrandt,
            Joseph D Dougherty, Robi D Mitra, Measuring transcription factor binding and gene expression using barcoded
            self-reporting transposon calling cards and transcriptomes
        &lt;/a&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;

# The Calling Cards Processing Pipeline

&lt;Image class=&quot;d-block m-auto&quot; src={callingcards_metro} alt=&quot;A graphic comparing Calling Cards to ChIP-seq&quot; /&gt;

The [`nf-core/callingcards`][nf-core-callingcards-link] processing pipeline for yeast
(_S. cerevisiae_) and mammalian experiments is described in [Calling Cards: A Customizable
Platform to Longitudinally Record Protein-DNA Interactions Over Time in Cells and
Tissues][calling-cards-workflow-paper-link] by Yen et al.

Briefly, in both the yeast and mammalian systems, the first step is to process reads by
trimming off the calling cards barcode sequence. This is appended to the read ID of the
fastq file and is utilized in downstream processing. In yeast, if the experiment is
multiplexed, the reads are split into separate fastq files according to the barcode
which identifies the TF. In both yeast and mammals, the fastq files may be split
for parallel processing at this point.

In the next step, the genome is processed according to the selected aligner
(bwa, bwa-mem, bowtie and bowtie2 are the current options). A BED file
describing regions of the genome to hard mask may be included here – this is important
for yeast, in particular, in order to avoid aligning Sir4 reads to the genome
erroneously. However, this BED file is already provided in the pipeline
assets – simply set `-profile yeast` to correctly set the genome
fasta, gtf and masking regions.

After aligning the reads, the alignments are scanned to validate the calling cards
insertions. Preliminary QC metrics are computed, and the data is output in a
[qBED format][qbed-paper-link], a modified BED6 format, for downstream processing.

# Current Development and Future Directions

Current development in the yeast system focuses on the [callingCardsTools][callingcardstools-github-link] python
package, which handles the calling cards specific read parsing and quantification.
More robust QC metrics are in the works, as are recommended methods for handling
replicates. The Mitra lab is actively working on the mammalian system, including in
[single cell based assays][callingcards-single-cell-paper-link].

[nf-core-callingcards-link]: https://nf-co.re/callingcards
[calling-cards-workflow-paper-link]: https://doi.org/10.1002/cpz1.883
[qbed-paper-link]: https://pubmed.ncbi.nlm.nih.gov/32941613/
[callingcardstools-github-link]: https://github.com/cmatKhan/callingCardsTools
[callingcards-single-cell-paper-link]: https://pubmed.ncbi.nlm.nih.gov/32710817/</content:encoded></item><item><title>Maintainers Minutes: July 2024</title><link>https://nf-co.re/blog/2024/maintainers-minutes-2024-07-02/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/maintainers-minutes-2024-07-02/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Wed, 03 Jul 2024 10:00:00 GMT</pubDate><content:encoded>The &apos;Maintainers Minutes&apos; aims to further give insight into the workings of the [nf-core maintainers team](/governance#maintainers)
by providing brief summaries of the monthly team meetings.

## End of the subworkflow naming saga?

📚 The first point of order was reviewing our homework from last time.

We reviewed the various proposals put forward for how nf-core should name our subworkflows and, after comparison, decided to move away from strict, structured names in favour of more flexible, descriptive ones.
This allows subworkflows to be understandable in what they do, without making their names too long.
In general, short subworkflows with only one tool option in each step can simply list the main modules used, whereas longer subworkflows with more options can be descriptive without explicitly listing all modules.
To compensate for the loss of detail, Matthias (@mashehu) will investigate adding more metadata and keywords to the [nf-core subworkflow](https://nf-co.re/subworkflows) page for findability purposes.

✍ Jon (@pinin4fjords) has been tasked with formalising the discussions into proper guidelines, which will be posted on the nf-core website [guidelines page](https://nf-co.re/docs/guidelines/components/subworkflows) soon.

## Daisy chains

🌻 We discussed a recent pipeline proposal from Maxime (@maxulysse) which acts as one of the first nf-core forays into &apos;daisy-chaining&apos; pipelines (i.e., importing pipelines into other pipelines).

In this case the maintainers team collectively decided against making it an official pipeline at the moment.
This was primarily due to concerns about a current lack of tooling and infrastructure in this context, particularly in how documentation would be &apos;merged&apos; together, but also due to unresolved &apos;philosophical&apos; discussions on the distinction between a subworkflow and a workflow.

🤔 However, given the community&apos;s burning need for such functionality, we decided to set up the first (of two!) &apos;working groups&apos;, led by Rike (@FriederikeHanssen), to address this question. Feel free to join [#wg-meta-pipelines](https://nfcore.slack.com/archives/C07B5FK9GKA) to get involved.

## Cleaning up our closet

🧹 The second working group, named &apos;test-data task-force&apos; was set up to clean up the nf-core test-data repository.

In various areas of nf-core, it has been brought up that submitting and using the test-data repository is not optimal.
Community members have said that the turnaround time for reviews on the test-data repository takes too long.
Furthermore, members have said that knowing where to put new test data, and where to find existing data, is hard due to insufficient documentation.
In particular, the `delete_me/` folder has ballooned, turning into a wild west and contributing to the difficulty of finding data.

⛏ This working group will be led by Simon (@SPPearce), and if you want to give your input or get involved, you can join the [#wg-test-data-task-force](https://nfcore.slack.com/archives/C07B5FK9GKA) slack channel.

## Ensuring quality

☁️ Finally, there was a discussion about AWS megatest runs (runs of full-sized &apos;realistic&apos; data executed on each pipeline release) regularly failing.

It was brought up that this often happens due to conflicts with Seqera&apos;s [Fusion](https://seqera.io/fusion/) file system, but more critically - these were only being discovered _after_ release.

It was decided that documentation will be improved and that pipeline developers will be made aware that they are allowed to trigger a (limited) number of manual full-test runs.

🤖 Matthias (@mashehu) will also investigate whether we can semi-automate triggering a test prior to release (without it executing too many times).

## Upcoming discussions

Next meeting we will be reviewing the progress of the various new task-forces established during the meeting.

\- :heart: from your #maintainers team!</content:encoded></item><item><title>Faster nf-core website builds</title><link>https://nf-co.re/blog/2024/new-website-structure/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/new-website-structure/</guid><description>Divide et impera</description><pubDate>Wed, 03 Jul 2024 06:00:00 GMT</pubDate><content:encoded>import subsites from &quot;@assets/images/blog/new-website-structure/sub-site-meme.png&quot;;
import horsesizedduck from &quot;@assets/images/blog/new-website-structure/horse-sized-duck.jpg&quot;;
import harderstronger from &quot;@assets/images/blog/new-website-structure/harder-stronger.gif&quot;;
import { Image } from &quot;astro:assets&quot;;

# Build big

Nearly two years ago we rebuilt the nf-core website from the ground up with [Astro](https://astro.build/). The website now comprises over 6,500 pages, featuring content for every pipeline release as well as blog posts, documentation, events and more. All these pages are built and deployed to the live server twice a day, in addition to building previews for every pull request against the main branch. The build process became longer and longer as the site grew, eventually taking over 10 minutes. We could reduce the build time a bit, but more than 5 minutes is an annoyingly long time to wait for a deployment preview, and it also wastes a lot of resources when the majority of pages are unchanged.

# Divide and conquer

To address this problem, we came up with a new website structure that will allow us to build the website in a more
modular way. We landed on a monorepo/microsite structure.

&lt;Image
    src={horsesizedduck}
    class=&quot;d-block m-auto mb-2&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;a drawing of a large duck standing next to several tiny horses and the word &apos;vs&apos; in-between.
    The duck has the title &apos;old website structure&apos; and the horses have the title &apos;new structure&apos;.&quot;
/&gt;

The term &quot;monorepo&quot; typically refers to a setup where multiple code bases coexist within a single repository. We chose to divide our website into several segments, focusing on those that frequently change and operate independently from the rest.

The website is now really five separate websites:

- [main-site](https://github.com/nf-core/website/tree/main/sites/main-site)
- [pipelines](https://github.com/nf-core/website/tree/main/sites/pipelines)
- [modules and subworkflows](https://github.com/nf-core/website/tree/main/sites/modules-subworkflows)
- [documentation](https://github.com/nf-core/website/tree/main/sites/docs)
- [configs](https://github.com/nf-core/website/tree/main/sites/configs)

These are hosted in separate Netlify deployments which are stitched together with redirect rules. To the end user the experience is the same, but the build steps can now be handled independently from one another.

Finally, we separated the [pipeline results](https://github.com/nf-core/website/tree/main/sites/pipeline-results) pages, which are server-side rendered per request instead of being generated during the build step like the rest of the pages.

&lt;Image
    src={subsites}
    class=&quot;d-block m-auto mb-2&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Three headed dragon meme cartoon, with the title &apos;nf-co.re sub-sites&apos;. The left head looks angry and has the title &apos;pipelines&apos;, the second head has the title &apos;docs&apos; and looks stern at the third head, which looks very goofy and has the title &apos;pipeline-results&apos;.&quot;
/&gt;

# Results: build fast

The results speak for themselves. This plot shows the build times for each sub-site, compared to the old site structure:

```mermaid
---
config:
    xyChart:
        width: 1000
        height: 600
    themeVariables:
        xyChart:
            plotColorPalette: &apos;#495057,#1a9655&apos;
            backgroundColor: &apos;transparent&apos;
---
xychart-beta
    x-axis &quot;sites&quot; [&quot;old structure&quot;, &quot;sites/configs&quot;, &quot;sites/docs&quot;, &quot;sites/main-site&quot;, &quot;sites/pipelines&quot;, &quot;sites/pipeline-results&quot;]
    y-axis &quot;average build times [in minutes]&quot; 0.5 --&gt; 10
    bar [8, 0.85, 1.5, 1.85, 4.3, 1.5]
    bar [0, 0.85, 1.5, 1.85, 4.3, 1.5]
```

Whilst the total build time is about the same, each sub-site now only builds when needed, and can run in parallel. The end result is that developers only need to wait for a fraction of the time to see their deployment previews.

# Technical setup

We are using [npm workspaces](https://docs.npmjs.com/cli/v7/using-npm/workspaces) to handle the dependencies in our monorepo structure.
We tried [pnpm](https://pnpm.io/), which should be even faster and better optimised for this task, but never got it to run
completely, whilst _npm workspaces_ worked out of the box.
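
For reference, declaring the sub-sites as workspaces only takes a couple of lines in the root `package.json` (a minimal sketch; the actual file in the repository contains more fields than this):

```json
{
    "name": "nf-core-website",
    "private": true,
    "workspaces": ["sites/*"]
}
```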

Every sub-site is a complete Astro project with its own `astro.config.mjs` and `package.json`,
but the components and layouts are shared.
We accomplished this by keeping them in `sites/main-site/src` and setting aliases for the relative links
to `sites/main-site` in the respective sub-sites. This kept the required code changes to a minimum.

Even though each of these sub-sites are their own entity on our host Netlify, they are all reachable under
the same `nf-co.re` domain. This is achieved by setting up redirect rules with a `200` redirect status,
which is a silent redirect, meaning the URL in the browser does not change.
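
In Netlify&apos;s `_redirects` file syntax, such a rule is a one-liner; the deployment URL below is purely illustrative:

```
# 200 status = silent proxy rewrite: the browser URL stays on nf-co.re
/rnaseq/*  https://nf-core-pipelines.example.netlify.app/rnaseq/:splat  200
```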

Data flow for a request to the homepage:

```mermaid
flowchart LR
%%{init: {&quot;flowchart&quot;: {&quot;defaultRenderer&quot;: &quot;elk&quot;}} }%%
    A[fas:fa-link https://nf-co.re] --&gt; B(fas:fa-server sites/main-site)
```

Data flow between sub-sites for a request to a pipeline page:

```mermaid
flowchart LR
%%{init: {&quot;flowchart&quot;: {&quot;defaultRenderer&quot;: &quot;elk&quot;}} }%%

        E(fas:fa-server sites/main-site)  --fetching components--&gt; D(fas:fa-server sites/pipelines)

    C[fas:fa-link https://nf-co.re/rnaseq/latest/docs/usage] --&gt; B(fas:fa-server sites/main-site)  --redirects silently to --&gt; D
```

# Changes to the development workflow

All npm commands should now specify the workspace they are run in.
Instead of running the development server with `npm run dev{:bash}`, you now need to run it for a specific sub-site.

For example, to see the changes to a blog entry, you should now run:

```bash
npm run dev -w sites/main-site
```

:::warn
Be aware that pages that are not part of the previewed sub-site will throw a 404 error.
This only happens when previewing a single sub-site and should not happen on the final built website.
:::

# Additional Changes

We couldn&apos;t update to Astro 4 with the old website structure, because it increased the build time to over 20 minutes, leading to a timeout on Netlify. With the new structure, we are now able to use newer Astro versions and benefit from new features and optimisations, including [content collection caching](https://astro.build/blog/astro-350/#content-collections-build-cache-experimental), decreasing the build time for repeated builds even more.

&lt;Image
    src={harderstronger}
    class=&quot;d-block m-auto mb-2&quot;
    width={600}
    density={[1.5, 2]}
    alt=&quot;Flashing words from the Lyrics of Daft Punk&apos;s &apos;Harder, Better, Faster, Stronger&apos; on a black background. The words are &apos;Work it harder, make it better, do it faster, makes us stronger, more than ever, hour after hour, work is never over.&apos;&quot;
/&gt;</content:encoded></item><item><title>nf-core/fastquorum - first release!</title><link>https://nf-co.re/blog/2024/fastquorum_intro/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/fastquorum_intro/</guid><description>The nf-core fgbio Best Practices FASTQ to Consensus Pipeline</description><pubDate>Sat, 22 Jun 2024 23:19:00 GMT</pubDate><content:encoded>import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;

import but_why from &quot;@assets/images/blog/fastquorum/but_why.gif&quot;;
import i_want_the_truth from &quot;@assets/images/blog/fastquorum/i_want_the_truth.gif&quot;;
import darwin from &quot;@assets/images/blog/fastquorum/darwin.png&quot;;
import duplex_sequencing_schema from &quot;@assets/images/blog/fastquorum/duplex_sequencing_schema.png&quot;;
import explaining_umis from &quot;@assets/images/blog/fastquorum/explaining_umis.png&quot;;
import bioconda_download from &quot;@assets/images/blog/fastquorum/bioconda_download.png&quot;;
import big_deal from &quot;@assets/images/blog/fastquorum/big_deal.jpg&quot;;
import fastquorum_diagram from &quot;@assets/images/blog/fastquorum/fastquorum_diagram.png&quot;;
import fulcrumgenomics_logo from &quot;@assets/images/blog/fastquorum/fulcrumgenomics.svg&quot;;

import { Image } from &quot;astro:assets&quot;;

We are thrilled to announce the first release of the [`nf-core/fastquorum`][nf-core-fastquorum-link] pipeline, which implements the [fgbio Best Practices FASTQ to Consensus Pipeline][fgbio-best-practices-link] to produce consensus reads using unique molecular indexes/barcodes (UMIs).
Developed by [Nils Homer][nilshomer-linkedin-link] at [Fulcrum Genomics][fulcrum-genomics-link], the pipeline utilizes the [`fgbio` Bioinformatics toolkit][fgbio-link] to enable producing ultra-accurate reads for low frequency variant detection in Genomics/DNA Sequencing.

# But Why?

        &lt;Image class=&quot;d-block m-auto&quot; src={but_why} alt=&quot;Ryan Reynolds asking &apos;But why?&apos;&quot; height=&quot;400&quot; /&gt;

As described in [Salk et al 2018][salk-2018-link], finding the one in a thousand frequency variant, or even the one in the million frequency variant, is extremely important across a diversity of applications, including but not limited to:

1. Cancer (ctDNA)
2. Prenatal diagnosis (cffDNA)
3. Mutagenesis
4. Aging
5. Antimicrobial Resistance
6. Forensics

# Sequencing Errors Obscure the Truth

        &lt;Image class=&quot;d-block m-auto&quot; src={i_want_the_truth} alt=&quot;Tom Cruise in &apos;A Few Good Men&apos; saying &apos;I want the truth&apos;&quot; height=&quot;300&quot; /&gt;

Sequencing errors, and errors from the library preparation process itself, make it extremely difficult to obtain this accuracy at such incredible resolution.

# Molecular Barcoding to the Rescue

        &lt;a href=&quot;https://twinstrandbio.com/wp-content/uploads/EMGS-2023-Novel-DNA-Standards-for-Assessing-Technical-Sensitivity-and-Reproducibility-Duplex-Sequencing-Mutagenesis-Assays-1.pdf&quot;&gt;&lt;Image class=&quot;d-block m-auto&quot; src={darwin} alt=&quot;Charles Darwin in three pictures: blurry on the left showing vanilla NGS, in the middle slightly more clear with single-strand error-corrected NGS, and on the right very clear with duplex sequencing.  From : Novel DNA Standards for Assessing Technical Sensitivity and Reproducibility Duplex Sequencing Mutagenesis Assays. TwinStrand Biosciences. Retrieved May 20 2024.&quot; height=&quot;300&quot; /&gt;&lt;/a&gt;

Obtaining such high accuracy has been achieved through molecular barcoding that enables squashing random error through multiple observations of a single DNA source molecule.
These molecular barcodes are commonly referred to as Unique Molecular Indexes, or UMIs.
They are attached to a DNA source molecule to uniquely identify it. After amplification, the multiple observations can be compared to vote on a consensus.
Or form a quorum, if you will.
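
The voting idea itself can be sketched in a few lines of Python (a toy illustration only; real consensus callers such as those in fgbio also weigh base qualities and handle disagreements far more carefully):

```python
from collections import Counter

def consensus(reads):
    """Toy per-position majority vote across reads that share one UMI.

    Assumes the reads are already grouped by UMI and are of equal length.
    """
    # zip(*reads) walks the reads column by column; Counter picks the
    # most common base observed at each position
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))
```

For example, `consensus(["ACGT", "ACGA", "ACGT"])` returns `"ACGT"`: the lone `A` at the final position is outvoted by the two `T`s.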

        &lt;a href=&quot;https://doi.org/10.1038/nprot.2014.170&quot;&gt;&lt;Image class=&quot;d-block m-auto&quot; src={duplex_sequencing_schema} alt=&quot;Duplex Sequencing Schema from Kennedy et al 2014&quot; height=&quot;400&quot; /&gt;&lt;/a&gt;

Molecular barcodes can be added at various points in the library preparation process, including prior to or during amplification, for a single strand or to both strands of a double-stranded duplex molecule.
In particular, [Duplex Sequencing][duplex-sequencing-link] can be used to identify reads from both strands of a double-stranded source molecule, squashing strand-specific errors that can occur during library preparation (e.g. PCR errors, oxidative damage due to hybrid capture).

This means UMIs can be found in the index/sample-barcoding reads (i7/i5 reads in Illumina parlance), or inline in the genomic/template reads themselves (R1/R2). Furthermore, there sometimes exists extra “spacer” sequence which should be removed prior to alignment and downstream analysis.

        &lt;Image class=&quot;d-block m-auto&quot; src={explaining_umis} alt=&quot;Always Sunny in Philadelphia conspiracy meme with caption &apos;Explaining where UMI bases are found&apos;&quot; height=&quot;300&quot; /&gt;

In [fgbio][fgbio-link], as well as in [fqtk][fqtk-link], [sgdemux][sgdemux-link], and [Picard][picard-link], [Read Structures][read-structure-link] are used to describe how the bases in a sequencing run should be allocated into logical reads.
It serves a similar purpose to the `--use-bases-mask` option in Illumina&apos;s `bcl2fastq` software, but provides some additional capabilities.

The following [handful of examples from the fgbio wiki][read-structure-examples-link] demonstrate how to describe a sequencing run in two ways:
firstly as a single Read Structure for the entire run, as you might use with [`picard IlluminaBasecallsToSam`][picard-illuminabasecallstosam-link], and secondly as a set of Read Structures that map one-to-one with the physical reads after FASTQ conversion and optional adapter trimming (which will create variable-length reads).

A few examples:

1. A simple 2x150bp paired end run with no sample or molecular indices:

- `150T150T`
- `[+T, +T]`

2. A 2x75bp paired end run with an 8bp I1 index read:

- `75T8B75T`
- `[+T, 8B, +T]`

3. A 2x150bp paired end run with an 8bp I1 index read and an inline 6bp UMI in read 1:

- `8M142T8B150T`
- `[8M+T, 8B, +T]`

4. A 2x150bp duplex sequencing run with dual sample-barcoding (I1 and I2) and both a 10bp UMI and 5bp monotemplate at the start of both R1 and R2:

- `10M5S135T8B8B10M5S135T`
- `[10M5S+T, 8B, 8B, 10M5S+T]`
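As an illustration of the notation, the segment grammar is simple enough to parse with a regular expression (a minimal sketch, not the official implementation; see the fgbio wiki for the full grammar):

```python
import re

# Minimal parser for fgbio-style Read Structures (illustrative only).
# Operators: T = template, B = sample barcode, M = molecular barcode (UMI),
# S = skip/spacer. A length of "+" means "all remaining bases" and, in fgbio,
# may only appear in the final segment.
TOKEN = re.compile(r"(\d+|\+)([TBMS])")

def parse_read_structure(rs: str) -> list[tuple[str, str]]:
    segments = TOKEN.findall(rs)
    if "".join(n + op for n, op in segments) != rs:
        raise ValueError(f"invalid read structure: {rs!r}")
    if any(n == "+" for n, _ in segments[:-1]):
        raise ValueError("'+' may only appear in the final segment")
    return segments

print(parse_read_structure("8M142T"))   # -> [('8', 'M'), ('142', 'T')]
print(parse_read_structure("10M5S+T"))  # -> [('10', 'M'), ('5', 'S'), ('+', 'T')]
```

The round-trip check (re-joining the tokens and comparing against the input) rejects strings with stray characters, such as an unknown operator.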

By utilizing the [fgbio toolkit][fgbio-homepage-link] and the [Read Structure][read-structure-link] description, [nf-core/fastquorum][nf-core-fastquorum-link] is able to support the diversity of molecular barcoding schemes.
A few that are commercially available are described below:

| Assay                                                     | Company                     | Strand | Randomness | URL                                                                                                                                                                                 |
| --------------------------------------------------------- | --------------------------- | ------ | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| SureSelect XT HS                                          | Agilent Technologies        | Single | Random     | [link](https://www.agilent.com/en/product/next-generation-sequencing/ngs-library-prep-target-enrichment-reagents/dna-seq-reagents/sureselectxt-hs-reagent-kits-4252208)             |
| SureSelect XT HS2 (MBC)                                   | Agilent Technologies        | Dual   | Random     | [link](https://www.agilent.com/en/product/next-generation-sequencing/ngs-library-prep-target-enrichment-reagents/dna-seq-reagents/sureselect-xt-hs2-dna-reagent-kit-4252207)        |
| TruSight Oncology (TSO)                                   | Illumina                    | Dual   | Nonrandom  | [link](https://www.illumina.com/products/by-type/clinical-research-products/trusight-oncology-umi.html)                                                                             |
| xGen dual index UMI Adapters                              | Integrated DNA Technologies | Single | Random     | [link](https://www.idtdna.com/pages/products/next-generation-sequencing/workflow/xgen-ngs-library-preparation/ngs-adapters-indexing-primers/adapters-indexing-primers-for-illumina) |
| xGen Prism (xGen cfDNA &amp; FFPE DNA Library Prep MC v2 Kit) | Integrated DNA Technologies | Dual   | Nonrandom  | [link](https://www.idtdna.com/pages/products/next-generation-sequencing/workflow/xgen-ngs-library-preparation/dna-library-preparation/cfdna-ffpe-prep-kit)                          |
| NEBNext                                                   | New England Biosciences     | Single | Random     | [link](https://www.neb.com/en-us/products/e7874nebnext-multiplex-oligos-for-illumina-unique-dual-index-umi-adaptors-dna-set-2)                                                      |
| AML MRD                                                   | TwinStrand Biosciences      | Dual   | Random     | [link](https://twinstrandbio.com/aml-assay/)                                                                                                                                        |
| Mutagenesis                                               | TwinStrand Biosciences      | Dual   | Random     | [link](https://twinstrandbio.com/mutagenesis-assay/)                                                                                                                                |
| UMI Adapter System                                        | Twist Biosciences           | Dual   | Random     | [link](https://www.twistbioscience.com/products/ngs/library-preparation/twist-umi-adapter-system)                                                                                   |

We are working with some of the vendors above to verify that their assays are supported by this pipeline by obtaining test/example data, along with appropriate parameters with which to run this pipeline.

# fgbio: the Fulcrum Genomics Bioinformatics toolkit

That brings us to [fgbio][fgbio-github-link].

Similar to popular bioinformatics toolkits such as samtools, GATK, and bedtools, fgbio is a collection of command line tools developed by [Fulcrum Genomics][fulcrum-genomics-link] to analyze primary genomics data.
Since its conception in 2015, `fgbio` has been downloaded from [Bioconda][fgbio-bioconda-link] over three-hundred thousand times, and we use it extensively with our clients at [Fulcrum Genomics][fulcrum-genomics-link].

        &lt;Image class=&quot;d-block m-auto&quot; src={bioconda_download} alt=&quot;Chart of Bioconda downloads placing fgbio with three-hundred-thousand downloads&quot; height=&quot;300&quot; /&gt;

        &lt;Image class=&quot;d-block m-auto&quot; src={big_deal} alt=&quot;Will Ferrell in Anchorman meme with caption &apos;&gt;300K downloads on BioConda, we are kind of a big deal&apos;&quot; height=&quot;300&quot; /&gt;

The [fgbio toolkit has a wide variety of available tools][fgbio-list-of-tools-link], with many tools producing tabular [quality control metrics][fgbio-list-of-metrics-link].
Particularly relevant for the [`nf-core/fastquorum`][nf-core-fastquorum-link] pipeline are the tools for working with read-level data containing these unique molecular indexes.
Click the arrows below to expand specific topic areas for the tools:

{/* prettier-ignore */}
&lt;details&gt;
&lt;summary&gt;Tools for working with Unique Molecular Indexes (UMIs, aka Molecular IDs/Barcodes):&lt;/summary&gt;
-   Annotate/extract UMIs from read-level data: [`FastqToBam`][fgbio-fastqtobam-link], [`AnnotateBamWithUmis`][fgbio-annotatebamwithumis-link], [`ExtractUmisFromBam`][fgbio-extractumisfrombam-link], and [`CopyUmiFromReadName`][fgbio-copyumifromreadname-link].
-   Manipulate read-level data containing UMIs: [`CorrectUmis`][fgbio-correctumis-link], [`GroupReadsByUmi`][fgbio-groupreadsbyumi-link], [`CallMolecularConsensusReads`][fgbio-callmolecularconsensusreads-link], [`CallDuplexConsensusReads`][fgbio-callduplexconsensusreads-link], and [`FilterConsensusReads`][fgbio-filterconsensusreads-link].
-   Collect metrics and review consensus reads: [`CollectDuplexSeqMetrics`][fgbio-collectduplexseqmetrics-link] and [`ReviewConsensusVariants`][fgbio-reviewconsensusvariants-link].
&lt;/details&gt;
{/* prettier-ignore */}
&lt;details&gt;
&lt;summary&gt;Tools to manipulate read-level data:&lt;/summary&gt;
-   FASTQ Manipulation: [`FastqToBam`][fgbio-fastqtobam-link], [`ZipperBams`][fgbio-zipperbams-link], and [`DemuxFastqs`][fgbio-demuxfastqs-link] (see [`fqtk`][fqtk-link], our rust re-implementation for sample demultiplexing).
-   Filter, clip, randomize, sort, and update metadata for read-level data: [`FilterBam`][fgbio-filterbam-link], [`ClipBam`][fgbio-clipbam-link], [`RandomizeBam`][fgbio-randomizebam-link], [`SortBam`][fgbio-sortbam-link], [`SetMateInformation`][fgbio-setmateinformation-link] and [`UpdateReadGroups`][fgbio-updatereadgroups-link].
&lt;/details&gt;
{/* prettier-ignore */}
&lt;details&gt;
&lt;summary&gt;Tools for quality control assessment:&lt;/summary&gt;
-   Detailed substitution error rate evaluation: [`ErrorRateByReadPosition`][fgbio-errorratebyreadposition-link].
-   Sample pooling QC: [`EstimatePoolingFractions`][fgbio-estimatepoolingfractions-link].
-   Splice-aware insert size QC for RNA-seq libraries: [`EstimateRnaSeqInsertSize`][fgbio-estimaternaseqinsertsize-link].
&lt;/details&gt;
{/* prettier-ignore */}
&lt;details&gt;
&lt;summary&gt;Tools for adding or manipulating alternate contig names:&lt;/summary&gt;
-   Extract from a NCBI Assembly Report: [`CollectAlternateContigNames`][fgbio-collectalternatecontignames-link].
-   Update contig names in common file formats: [`UpdateFastaContigNames`][fgbio-updatefastacontignames-link], [`UpdateVcfContigNames`][fgbio-updatevcfcontignames-link], [`UpdateGffContigNames`][fgbio-updategffcontignames-link], [`UpdateIntervalListContigNames`][fgbio-updateintervallistcontignames-link], [`UpdateDelimitedFileContigNames`][fgbio-updatedelimitedfilecontignames-link].
&lt;/details&gt;
{/* prettier-ignore */}
&lt;details&gt;
&lt;summary&gt;Miscellaneous tools:&lt;/summary&gt;
-   Pick molecular indices (ex. sample barcodes, or molecular indexes): [`PickIlluminaIndices`][fgbio-pickilluminaindices-link] and [`PickLongIndices`][fgbio-picklongindices-link].
-   Find technical/synthetic, or switch-back sequences in read-level data: [`FindTechnicalReads`][fgbio-findtechnicalreads-link] and [`FindSwitchbackReads`][fgbio-findswitchbackreads-link].
-   Make synthetic mixture VCFs: [`MakeMixtureVcf`][fgbio-makemixturevcf-link] and [`MakeTwoSampleMixtureVcf`][fgbio-maketwosamplemixturevcf-link].
&lt;/details&gt;

# Pipeline Overview

To support the various molecular barcoding schemes, [`nf-core/fastquorum`][nf-core-fastquorum-link] is organized into two main phases, according to the [fgbio best practices][fgbio-best-practices-link]: grouping and consensus calling. The pipeline steps for each phase can be best explained with a metro map 🚇:

        &lt;Image class=&quot;d-block m-auto&quot; src={fastquorum_diagram} alt=&quot;Subway diagram of the fastquorum pipeline&quot; width=&quot;800&quot; /&gt;

Thank you to [James Fellows Yates][james-fellows-yates-link] for these diagrams!

# Phase 1: Pre-processing and Grouping

The first phase takes FASTQs as input, and performs the following steps:

1. Performs basic Quality Control (with [`FASTQC`][fastqc-link]).
2. Extracts the UMI bases based on the molecular barcoding scheme (with [`fgbio FastqToBam`][fgbio-fastqtobam-link]).
3. Aligns the raw reads to the genome (with [`bwa`][bwa-link], [`samtools`][samtools-link], and [`fgbio ZipperBams`][fgbio-zipperbams-link]).
4. Then groups them by genomic coordinate and UMI (with [`fgbio GroupReadsByUmi`][fgbio-groupreadsbyumi-link]).

This produces the grouped BAM file, where raw reads originating from the same original source molecule are grouped together and tagged.
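The grouping step must tolerate sequencing errors in the UMI itself. As a toy illustration of the idea (not the actual adjacency algorithm used by `fgbio GroupReadsByUmi`, which also takes read counts into account), UMIs at the same genomic position can be merged when they are within a small edit distance of each other:

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def group_umis(umis: list[str], max_edits: int = 1) -> list[set[str]]:
    """Greedy single-linkage grouping of equal-length UMIs (toy version of
    the per-position grouping performed by fgbio GroupReadsByUmi)."""
    groups: list[set[str]] = []
    for umi in umis:
        for group in groups:
            if any(hamming(umi, member) <= max_edits for member in group):
                group.add(umi)
                break
        else:
            groups.append({umi})
    return groups

# AACGA differs from AACGT by one base (a likely UMI sequencing error),
# so they are treated as the same source molecule; TTTTT stands alone.
print(group_umis(["AACGT", "AACGA", "TTTTT"]))
```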

# Phase 2: Consensus Calling

The second phase takes the grouped BAM from phase one, and performs the following steps:

1. For each group of raw reads, calls a consensus sequence, thereby eliminating random errors and significantly improving the accuracy of the resulting data.
    - [`fgbio CallMolecularConsensusReads`][fgbio-callmolecularconsensusreads-link] is used for single-strand UMI schemes, while [`fgbio CallDuplexConsensusReads`][fgbio-callduplexconsensusreads-link] is used for duplex sequencing.
2. The consensus reads are aligned back to the genome (with [`bwa`][bwa-link], [`samtools`][samtools-link], and [`fgbio ZipperBams`][fgbio-zipperbams-link]).
3. The consensus reads are filtered based on various properties, such as minimum per-molecule or per-base coverage (with [`fgbio FilterConsensusReads`][fgbio-filterconsensusreads-link]).

The filtered consensus BAM is ready for downstream analysis, such as variant calling.
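The filtering in step 3 can be sketched roughly as follows (a toy stand-in for `fgbio FilterConsensusReads`, which also applies per-base filters; the thresholds here are illustrative, loosely mirroring its minimum-reads and maximum-error-rate parameters):

```python
from dataclasses import dataclass

@dataclass
class ConsensusRead:
    depth: int         # number of raw reads supporting the consensus
    error_rate: float  # fraction of raw-read bases disagreeing with the consensus

def keep(read: ConsensusRead, min_reads: int = 3, max_error_rate: float = 0.025) -> bool:
    """Keep a consensus read only if it is well supported and low error."""
    return read.depth >= min_reads and read.error_rate <= max_error_rate

reads = [ConsensusRead(5, 0.01), ConsensusRead(2, 0.0), ConsensusRead(6, 0.10)]
print([keep(r) for r in reads])  # -> [True, False, False]
```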

Importantly, this _R&amp;D version_ of the second phase allows users to test various tool-level parameters, to optimize them for their data.

The second phase also has a _High-Throughput_ version, for when performance and throughput take precedence over flexibility.
This version performs consensus calling and filtering in one step, thereby reducing the number of consensus reads that need to be aligned as well as reducing the number of files that are written to disk.

# Release Early, Release Often

... and listen to your nf-core customers.

With the help of some very responsive nf-core maintainers and members, [`nf-core/fastquorum`][nf-core-fastquorum-link] had its [first official release][nf-core-fastquorum-first-release-link] at the [Nextflow Summit in Boston (2024)][nextflow-summit-boston-2024-link].

It supports single-strand and duplex sequencing data, both the R&amp;D and high-throughput fgbio best practices, combining FASTQs across runs and lanes, as well as flexible support for molecular barcoding or UMI schemes.

For more documentation, head over to the [`nf-core/fastquorum`][nf-core-fastquorum-link] page, visit the [`fgbio` toolkit homepage][fgbio-homepage-link] and [`fgbio` wiki][fgbio-wiki-link], and join us on our [Slack Channel][nf-core-fastquorum-slack-link].

# It Takes a Nextflow Village

A number of organizations have sponsored and contributed to the development of the [`fgbio` toolkit][fgbio-github-link] including [Fulcrum Genomics][fulcrum-genomics-link], [TwinStrand Biosciences][twinstrand-biosciences-link], and [Integrated DNA Technologies][idt-link].
Both [Fulcrum Genomics][fulcrum-genomics-link] and its [current and past team members][fulcrum-genomics-about-link], along with the nf-core community of maintainers, core team, and contributors have enabled [`nf-core/fastquorum`][nf-core-fastquorum-link]&apos;s first release.

| Fulcrum Genomics | TwinStrand Biosciences | `nf-core` Community |
| ---------------- | ---------------------- | ------------------- |
| Nils Homer       | Michael Hipp           | Simon Pearce        |
| Tim Fennell      | John McGuigan          | Adam Talbot         |
| Clint Valentine  | Thomas Smith           | Chad Young          |
| Yossi Farjoun    | Robert N. Azad         | Peter Hickey        |
| Jay Carey        |                        | Brad Langhorst      |
| Kari Stromhaug   |                        | Jordi Camps         |
| Nathan Roach     |                        | Brent Pedersen      |

        &lt;a href=&quot;https://www.fulcrumgenomics.com/&quot;&gt;&lt;Image class=&quot;d-block m-auto&quot; src={fulcrumgenomics_logo} alt=&quot;The Fulcrum Genomics logo&quot; width=&quot;800&quot; /&gt;&lt;/a&gt;

If you want to hear this all over again, please check out the recent talk announcing the first release of this pipeline at the Nextflow Summit in Boston (2024):

    &lt;YouTube id=&quot;https://www.youtube.com/watch?v=qW4kgr3x9Mo&quot; poster=&apos;https://i.ytimg.com/vi/qW4kgr3x9Mo/maxresdefault.jpg&apos;/&gt;

[nf-core-fastquorum-link]: https://nf-co.re/fastquorum
[fgbio-best-practices-link]: https://github.com/fulcrumgenomics/fgbio/blob/main/docs/best-practice-consensus-pipeline.md
[fulcrum-genomics-link]: https://www.fulcrumgenomics.com/
[fgbio-link]: https://github.com/fulcrumgenomics/fgbio
[salk-2018-link]: https://doi.org/10.1038/nrg.2017.117
[twinstrand-poster-link]: https://twinstrandbio.com/wp-content/uploads/EMGS-2023-Novel-DNA-Standards-for-Assessing-Technical-Sensitivity-and-Reproducibility-Duplex-Sequencing-Mutagenesis-Assays-1.pdf
[duplex-sequencing-link]: https://en.wikipedia.org/wiki/Duplex_sequencing
[kennedy-2014-link]: https://doi.org/10.1038/nprot.2014.170
[read-structure-link]: https://github.com/fulcrumgenomics/fgbio/wiki/Read-Structures
[read-structure-examples-link]: https://github.com/fulcrumgenomics/fgbio/wiki/Read-Structures#examples
[fgbio-homepage-link]: https://fulcrumgenomics.github.io/fgbio/
[fgbio-github-link]: https://github.com/fulcrumgenomics/fgbio
[fgbio-bioconda-link]: https://bioconda.github.io/recipes/fgbio/README.html
[fgbio-list-of-tools-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/
[fgbio-list-of-metrics-link]: https://fulcrumgenomics.github.io/fgbio/metrics/latest/
[fqtk-link]: https://github.com/fulcrumgenomics/fqtk
[james-fellows-yates-link]: https://www.jafy.eu/
[fastqc-link]: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
[fgbio-fastqtobam-link]: http://fulcrumgenomics.github.io/fgbio/tools/latest/FastqToBam.html
[bwa-link]: https://github.com/lh3/bwa
[samtools-link]: https://www.htslib.org/
[fgbio-zipperbams-link]: http://fulcrumgenomics.github.io/fgbio/tools/latest/ZipperBams.html
[fgbio-groupreadsbyumi-link]: http://fulcrumgenomics.github.io/fgbio/tools/latest/GroupReadsByUmi.html
[fgbio-callmolecularconsensusreads-link]: http://fulcrumgenomics.github.io/fgbio/tools/latest/CallMolecularConsensusReads.html
[fgbio-callduplexconsensusreads-link]: http://fulcrumgenomics.github.io/fgbio/tools/latest/CallDuplexConsensusReads.html
[fgbio-filterconsensusreads-link]: http://fulcrumgenomics.github.io/fgbio/tools/latest/FilterConsensusReads.html
[nf-core-fastquorum-first-release-link]: https://github.com/nf-core/fastquorum/releases/tag/1.0.0
[nextflow-summit-boston-2024-link]: https://summit.nextflow.io/2024/boston/
[fgbio-wiki-link]: https://github.com/fulcrumgenomics/fgbio/wiki
[nf-core-fastquorum-slack-link]: https://nfcore.slack.com/archives/C0453Q2SFCM
[twinstrand-biosciences-link]: https://twinstrandbio.com/
[idt-link]: https://www.idtdna.com/
[fulcrum-genomics-about-link]: https://fulcrumgenomics.com/about
[fgbio-fastqtobam-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/FastqToBam.html
[fgbio-annotatebamwithumis-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/AnnotateBamWithUmis.html
[fgbio-extractumisfrombam-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/ExtractUmisFromBam.html
[fgbio-copyumifromreadname-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/CopyUmiFromReadName.html
[fgbio-correctumis-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/CorrectUmis.html
[fgbio-groupreadsbyumi-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/GroupReadsByUmi.html
[fgbio-callmolecularconsensusreads-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/CallMolecularConsensusReads.html
[fgbio-callduplexconsensusreads-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/CallDuplexConsensusReads.html
[fgbio-filterconsensusreads-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/FilterConsensusReads.html
[fgbio-collectduplexseqmetrics-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/CollectDuplexSeqMetrics.html
[fgbio-reviewconsensusvariants-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/ReviewConsensusVariants.html
[fgbio-zipperbams-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/ZipperBams.html
[fgbio-demuxfastqs-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/DemuxFastqs.html
[fgbio-filterbam-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/FilterBam.html
[fgbio-clipbam-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/ClipBam.html
[fgbio-randomizebam-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/RandomizeBam.html
[fgbio-setmateinformation-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/SetMateInformation.html
[fgbio-updatereadgroups-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/UpdateReadGroups.html
[fgbio-collectalternatecontignames-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/CollectAlternateContigNames.html
[fgbio-updatefastacontignames-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/UpdateFastaContigNames.html
[fgbio-updatevcfcontignames-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/UpdateVcfContigNames.html
[fgbio-updategffcontignames-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/UpdateGffContigNames.html
[fgbio-updateintervallistcontignames-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/UpdateIntervalListContigNames.html
[fgbio-updatedelimitedfilecontignames-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/UpdateDelimitedFileContigNames.html
[fgbio-errorratebyreadposition-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/ErrorRateByReadPosition.html
[fgbio-estimatepoolingfractions-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/EstimatePoolingFractions.html
[fgbio-estimaternaseqinsertsize-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/EstimateRnaSeqInsertSize.html
[fgbio-pickilluminaindices-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/PickIlluminaIndices.html
[fgbio-picklongindices-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/PickLongIndices.html
[fgbio-findtechnicalreads-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/FindTechnicalReads.html
[fgbio-sortbam-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/SortBam.html
[fgbio-makemixturevcf-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/MakeMixtureVcf.html
[fgbio-maketwosamplemixturevcf-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/MakeTwoSampleMixtureVcf.html
[fgbio-findswitchbackreads-link]: https://fulcrumgenomics.github.io/fgbio/tools/latest/FindSwitchbackReads.html
[picard-link]: https://broadinstitute.github.io/picard/
[picard-illuminabasecallstosam-link]: https://broadinstitute.github.io/picard/command-line-overview.html#IlluminaBasecallsToSam
[sgdemux-link]: https://github.com/Singular-Genomics/singular-demux
[nilshomer-linkedin-link]: https://www.linkedin.com/in/nilshomer/</content:encoded></item><item><title>Introducing nf-core office hours</title><link>https://nf-co.re/blog/2024/office_hours/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/office_hours/</guid><description>A new initiative - weekly drop-in sessions to connect and work together.</description><pubDate>Thu, 13 Jun 2024 15:14:00 GMT</pubDate><content:encoded># nf-core office hours

Slack is great, but sometimes you can&apos;t beat sitting down and chatting to someone about your code over a cup of coffee.
We are thrilled to introduce _nf-core office hours_, weekly drop-in sessions designed for nf-core developers seeking an opportunity to connect, chat, and work together.

# What you can expect

nf-core office hours will be held online at regular times, open to anyone to drop in and out as they&apos;d like.
It can be nice to just sit with some other people working on similar stuff (even virtually).
And of course it&apos;s a great opportunity to chat with others about Nextflow and nf-core development.

We expect that nf-core office hours will be most useful for “advanced beginners” and up - people who have grasped the fundamentals of writing Nextflow pipelines, and are now trying to make the leap to designing, coding, and contributing nf-core components and pipelines.
Please note that it&apos;s not designed to be training or mentorship (we have programs for both of those) and there is no guarantee of who will be there and how much help they&apos;ll be able to give you.
However, regardless of what stage you are at, if you think that spending some time talking about developing with nf-core sparks your interest, then the office hours might be just what you&apos;re looking for!

In these sessions, some of the nf-core maintainer team and other seasoned community members will be available to engage in discussions about your code.
They will hang out in Gather during designated hours every week, ready to provide guidance and insights into your nf-core code development.
[Gather](https://gather.town/) is an online collaboration platform, kind of like Pokemon-meets-Zoom.
You can get an invite to the nf-core Gather space by going to the `#gather-town` channel on the nf-core Slack, writing a message and dropping the Gather emoji - you&apos;ll get an automated message with details.
For more information, see the [Gather Town bytesize talk](/events/2022/bytesize-37-gathertown).

Office hour sessions won&apos;t be structured.
Some examples of things that might be discussed include:

- A specific concept doesn&apos;t make sense and you&apos;d like to talk it over
- You&apos;re working on a component or pipeline, you&apos;re stuck, and you&apos;d like to get unstuck
- You&apos;d like to swap code reviews with someone to get your PR merged
- You&apos;re excited about code and your colleagues are tired of hearing about it

# Schedule

We are starting an 8-week trial of _nf-core office hours_ while we evaluate the program.
We will focus on the European and American time zones initially.

- Tuesdays, 2 PM Eastern Time
- Thursdays, 11 AM Central European Summer Time

After the trial we will collect feedback and evaluate if and how we might improve it.

Please join the `#office-hours` channel on the nf-core Slack to get reminders when sessions are about to start,
and discuss the format / provide feedback.</content:encoded></item><item><title>Maintainers Minutes: June 2024</title><link>https://nf-co.re/blog/2024/maintainers-minutes-2024-06-07/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/maintainers-minutes-2024-06-07/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Fri, 07 Jun 2024 15:00:00 GMT</pubDate><content:encoded>import Profile from &quot;@components/GitHubProfilePictureExtended.astro&quot;;
import sad_maintainers_duck from &quot;@assets/images/blog/maintainers-minutes-2024-06-07/sad-maintainers-duck.png&quot;;
import { Image } from &quot;astro:assets&quot;;

In this new initiative, the &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers) by providing brief summaries of the monthly maintainers team meetings.

## Goodbyes and welcomes!

&lt;Image
    src={sad_maintainers_duck}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Sad looking cartoon yellow rubber duck with nf-core logo badge on its body, with a large blue tear coming out of its eye.&quot;
/&gt;

The maintainers team has now been running for over a year and a half, and it is time to expand the team to keep up with the community!

First, we would like to say thank you to the following people, who will be becoming maintainers team alumni, for their service and contributions during their terms:

- Louisa Perelo
- Daniel Lundin
- Matthieu Muffato
- Alex Peltzer

And we are happy to announce that the following people were nominated by current members of the team as being highly active community members, and have accepted to join in keeping the community healthy:

&lt;Profile username=&quot;CarsonJM&quot;&gt;Carson Miller&lt;/Profile&gt;
&lt;Profile username=&quot;FloWuenne&quot;&gt;Florian Wuennemann&lt;/Profile&gt;
&lt;Profile username=&quot;itrujnara&quot;&gt;Igor Trujnara&lt;/Profile&gt;
&lt;Profile username=&quot;Joon-Klaps&quot;&gt;Joon Klaps&lt;/Profile&gt;
&lt;Profile username=&quot;LilyAnderssonLee&quot;&gt;Lili Andersson-Li&lt;/Profile&gt;
&lt;Profile username=&quot;luisas&quot;&gt;Luisa Santus&lt;/Profile&gt;
&lt;Profile username=&quot;mahesh-panchal&quot;&gt;Mahesh Binzer-Panchal&lt;/Profile&gt;
&lt;Profile username=&quot;nictru&quot;&gt;Nico Trummer&lt;/Profile&gt;

## Still naming things...

We once again spent some time talking about the naming of subworkflows, and the tricky balance between findability and conciseness in understanding what a particular subworkflow does.

There was general agreement that actually the _length_ of the subworkflow name isn&apos;t as important as knowing exactly what the subworkflow does.
In some cases it was argued that describing every operation and tool of the subworkflow was unnecessary and made it harder to understand.

One factor brought up is that in complex subworkflows the current `_$tool` suffix can make subworkflow names very long, and doesn&apos;t provide much information compared to active &apos;verbs&apos;.
However it was countered that you may have multiple subworkflows that do similar things (e.g. deduplication), so having the tool name performing the activity can be useful.

We agreed as &apos;homework&apos; that each team member will independently rename the [preprocess_rnaseq](https://github.com/nf-core/modules/blob/master/subworkflows/nf-core/preprocess_rnaseq/main.nf) subworkflow in their preferred manner, to see if we can find a majority consensus on a naming specification.

If you also want to participate, please share your own ideas on the [#subworkflow](https://nfcore.slack.com/archives/C03D38JNFNJ) channel on the nf-core Slack!

## Improving new pipeline proposals experience

A point recently raised by a few core team members has been that the experiences of people submitting new pipeline proposals can be quite variable.
Currently new proposals require two core team members to approve the proposal (and you guessed it, the name!).
However, as the community grows, so does the number of proposals, and with it the diversity of the types of analyses the proposals address.
This increased workload, coupled with the core team&apos;s backgrounds not covering all areas of biology, has led to a slowdown and sometimes &apos;stalling&apos; of proposals while waiting for core team members to make a decision.

To improve the experience for proposers by decreasing turnaround time, and to reduce the load on the core team of evaluating proposals outside their &apos;comfort zone&apos;, the maintainers team has agreed to help monitor the #new-pipelines-proposal channel and evaluate each new submission.
From now on, each proposal needs one core-team member approval and one maintainer approval to be accepted!

## Infrastructure Team Updates

Júlia updated us on a couple of developments happening from the infrastructure team.

Since the recent `2.14.0` release, each pipeline records both the version of nf-core tools used to set it up and the version of the nf-core template currently in use.

There was a discussion on how to reformat the nf-core modules `meta.yml` file, mainly in regard to better displaying metadata associated with input channels containing tuples.

Maxime brought up a long running request to have additional fields for better recording publication information (e.g. bibtex information), to potentially allow auto-generation of methods texts with proper attribution of tools.

## Upcoming discussions

Points for the next agenda will be to confirm how the first office hours went and to do our &apos;preprocess_rnaseq&apos; subworkflow renaming homework.

\- :heart: from your #maintainers team!</content:encoded></item><item><title>Maintainers Minutes: May 2024</title><link>https://nf-co.re/blog/2024/maintainers-minutes-2024-05-03/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/maintainers-minutes-2024-05-03/</guid><description>Keeping you informed of the latest maintainers discussions</description><pubDate>Mon, 13 May 2024 15:00:00 GMT</pubDate><content:encoded>import drake_meme from &quot;@assets/images/blog/maintainers-minutes-2024-05-03/maintainers-meme-2024-05-03.jpg&quot;;
import { Image } from &quot;astro:assets&quot;;

In this new initiative, the &apos;Maintainers Minutes&apos; aims to give further insight into the workings of the [nf-core maintainers team](/governance#maintainers) by providing brief summaries of the monthly maintainers team meetings.

If you&apos;re not familiar, the maintainers team takes an active role in managing nf-core repositories in collaboration with the wider nf-core community.
The team is made up of regular contributors to the initiative, and their expertise is additionally used to help guide design decisions in various areas of nf-core, from code to community development.

## Keeping up with the community

nf-core is a growing community. This is particularly highlighted by the 170 open pull requests on the [nf-core/modules](https://github.com/nf-core/modules) repository.

Keeping on top of such activity can be difficult for volunteers.
We discussed how to better distribute reviewing tasks among the maintainers team, covering the most popular notification systems, the upcoming concept of nf-core &apos;office hours&apos;, and how to encourage contributors to make more atomic pull requests (and thus make reviewing more palatable for all).

## The hardest part of writing code is naming things

&lt;Image
    src={drake_meme}
    class=&quot;d-block m-auto&quot;
    width={400}
    density={[1.5, 2]}
    alt=&quot;Drake meme, with top image showing Drake with his hand in his face in a &apos;don&apos;t bother me gesture&apos; next to the text &apos;Actual programming&apos;, then the bottom image of him pointing like &apos;now that&apos;s more like it&apos; with &apos;Debating for 30 minutes on how to name a subworkflow&apos; text next to it.&quot;
/&gt;

Keeping in line with coding stereotypes, we had a discussion on the current naming scheme of subworkflows.
In many PR reviews, comments often arise regarding the naming of the subworkflow.
Some in the community find the current naming scheme overly restrictive, resulting in excessively long names that are sometimes not even feasible for complex subworkflows with many inputs or steps.

The pros and cons of the current scheme were discussed. On the pro side, it is clearly defined, enforceable, prevents developers from being &apos;lazy&apos; with naming, and keeps the purpose of a subworkflow relatively discoverable from the name alone.
On the con side, it does not _actually_ fit well with complex subworkflows, where in some cases &apos;free text&apos; can convey the purpose more clearly and concisely.

The team also discussed issues with renaming existing subworkflows, deprecations, and the sizes of subworkflows (and how this influences naming).

## Upcoming discussions

Points for the next agenda will be updating of maintainers team roster, confirming the nf-core office hours trial, and the next nf-core &apos;spring cleaning&apos; week.

\- :heart: from your #maintainers team!</content:encoded></item><item><title>nf-core/tools - 2.14.0</title><link>https://nf-co.re/blog/2024/tools-2_14/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/tools-2_14/</guid><description>Spring cleaning :broom:</description><pubDate>Wed, 08 May 2024 11:00:00 GMT</pubDate><content:encoded>import GitHubProfilePictureExtended from &quot;@components/GitHubProfilePictureExtended.astro&quot;;

This release contains some template changes and nf-core/tools updates. For a more detailed list of changes, you can read the [changelog](https://github.com/nf-core/tools/releases/tag/2.14.0).

# Highlights

## New contributors

The hackathon in March brought us some new contributors to the tools repository. Welcome to nf-core/infrastructure!

&lt;div class=&quot;d-flex flex-wrap&quot;&gt;
    &lt;GitHubProfilePictureExtended username=&quot;asp8200&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;swampie&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;vickylaram&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;leomrtns&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;emi80&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;wsosna&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;pmoris&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;Joon-Klaps&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;alexhermida&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;k1sauce&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
    &lt;GitHubProfilePictureExtended username=&quot;mattdoug604&quot; size={40} wrapperClasses=&quot;me-2&quot; imgClasses=&quot;mb-0&quot; /&gt;
&lt;/div&gt;

## nf-core/tools functionalities

- We included a new linting test to ensure that nf-test snapshots contain the `versions.yml` file.
- Components (modules and subworkflows) can now handle more possible git URLs (`ssh://` and `ftp://`).
- The `nf-core download` command has a new argument `--tag`, which allows adding additional tags to select particular revisions in the Seqera Platform interface. For example, `--tag &quot;3.10=validated&quot;{:bash}` would allow you to quickly select the validated version of the pipeline.

## Pipeline template

- We updated the GitHub Action which tests that the pipeline can be downloaded correctly (`download_pipeline.yml`):

    If you had errors with this GitHub test, they are fixed!

    First, the test will try a stub run of the downloaded pipeline. If this fails because some modules don&apos;t have a stub test yet, it runs the pipeline without the `-stub` option.

- We removed `pyproject.toml` from the template.

    This was used to lint Python code.
    The nf-core/pipeline template doesn&apos;t contain Python code anymore, thanks to the new utils subworkflows and the nf-validation (now nf-schema) plugin,
    which replace the Python script which was validating the input sample sheet.
    If you have other Python scripts in your pipeline and you would like to keep this linting, feel free to add this file back to your pipeline.
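
    If you do keep such a file, a minimal `pyproject.toml` for linting your own scripts might look like the following (a sketch; the tool sections shown are illustrative, not necessarily those of the old template):

    ```toml
    # Hypothetical minimal pyproject.toml for linting pipeline Python scripts
    [tool.black]
    line-length = 120

    [tool.isort]
    profile = &quot;black&quot;
    ```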

- Pipeline-specific institutional configs support is now activated for all pipelines by default.

- The `.nf-core.yml` file contains now the version of the pipeline template,
  corresponding to the version of nf-core/tools used for the last template update.
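
  After updating with nf-core/tools `2.14.0`, the file might therefore look something like this (a sketch; the key names are illustrative, and you should keep any other configuration you have added):

  ```yaml
  repository_type: pipeline
  nf_core_version: 2.14.0
  ```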

# How to merge the pipeline template updates

- `editorconfig`:

    We removed redundant configurations from the `.editorconfig` file. Accept the changes made to this file.

- Files inside the `.github/` folder:

    These files are responsible for the Continuous Integration tests. In general, accept all changes made on these files.
    - `.github/workflows/ci.yml`: If you added your own tests to this file, for example, you added nf-test tests to your pipeline,
      keep your changes, but accept the template updates related to action versions, e.g.

        ```diff &quot;v1&quot; &quot;v2&quot;
        - uses: nf-core/setup-nextflow@v1
        + uses: nf-core/setup-nextflow@v2
        ```

    - `.github/workflows/download_pipeline.yml`: This file is responsible for testing if the pipeline can be downloaded correctly. Accept the changes made to this file.

    - `.github/PULL_REQUEST_TEMPLATE.md`: We were a bit too fast with adding nf-test commands to the PR template. Accept the changes made to this file if your pipeline doesn&apos;t use nf-test yet.

- `.pre-commit-config.yaml`:

    We set the version of prettier to 3.2.5. Accept this change.

- `conf/base.config` and `conf/modules.config` directory:

    We removed some remnants of the old custom_dumpsoftwareversions module. Accept the changes made to the files in this directory.

- `.nf-core.yml`:

    The version of nf-core used for the template update is added to the `.nf-core.yml` file.
    Accept the change of version.
    Do NOT accept changes removing any other configurations that you added to this file.

- `README.md`

    We updated the link to the Seqera Platform badge. Accept this change.

- `assets/multiqc_config.yml`:

    Always accept changes made to this file before the line: `disable_version_detection: true`.
    Custom changes should be made after this line.
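
    As a sketch, the intended layout is as follows (the `report_comment` value is illustrative):

    ```yaml
    # Template-managed section: accept updates to everything above this line
    report_comment: >
        This report has been generated by the pipeline.
    disable_version_detection: true
    # Custom section: your own additions go below this line
    ```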

- `docs/usage.md`:

    We added a new profile `wave`. Accept this change.

- `nextflow.config`:

    We fixed a bug with `conda.channels`. Accept the changes made to this file.
    Don&apos;t accept changes removing any of your pipeline custom parameters.

- `nextflow_schema.json`:

    We added a new parameter `pipelines_testdata_base_path`, accept this change.
    Do not accept changes which remove any of your pipeline parameters.

- `pyproject.toml`:

    Python linting is now optional. If you have Python code in your pipeline and want to keep linting it, DON&apos;T accept this change.
    Otherwise, it is safe to remove this file.

- `conf/test_full.config` and `conf/test.config`:

    We are using the parameter `params.pipelines_testdata_base_path` to specify the base path of the repo containing test data.
    You will have to resolve this manually. Accept the change using this parameter in the `input` path; the new parameter replaces `https://raw.githubusercontent.com/nf-core/test-datasets/`.
    But don&apos;t accept the change to the last part of this path, which is specific to your pipeline.
    Don&apos;t accept changes removing other custom configurations you added to your tests.
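
    For example, the resolved `input` line in `conf/test.config` could end up looking something like this (the pipeline name and file path are hypothetical):

    ```groovy
    params {
        // The template-provided base path parameter replaces the hard-coded GitHub URL
        input = params.pipelines_testdata_base_path + &apos;mypipeline/samplesheet/v1.0/samplesheet_test.csv&apos;
    }
    ```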

- Changes on `docs/`:

    Do NOT accept any change that removes custom docs that you added to your pipeline.

- Changes on `CHANGELOG.md`:

    Do NOT accept any change which modified custom points of your `CHANGELOG.md`.

- `modules.json` and template modules and subworkflows:

    Do NOT accept any changes deleting your pipeline modules from `modules.json`.
    Template modules and subworkflows are updated on every template release. You can accept those changes and the changes on `modules.json` related to these.

    A safe way to add these changes is to NOT accept them.
    Then run `nf-core modules update{:bash}` and `nf-core subworkflows update{:bash}`.
    These commands will update all your modules and subworkflows and the `modules.json` file accordingly.

- `subworkflows/local/utils_nfcore_$PIPELINE_pipeline/main.nf`:

    We added a way to handle multiple DOIs in the manifest. Accept this change.
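
    In practice this means the `manifest.doi` field can hold a comma-separated list, along the lines of the following (the DOIs are placeholders):

    ```groovy
    manifest {
        // Multiple DOIs, separated by commas
        doi = &apos;10.5281/zenodo.0000000, 10.5281/zenodo.1111111&apos;
    }
    ```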

- `workflows/$PIPELINE.nf`:

    We shortened very long lines in the main pipeline script to allow easier comparison during code reviews. Accept these changes.</content:encoded></item><item><title>nf-core/sarek paper</title><link>https://nf-co.re/blog/2024/sarek_paper/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/sarek_paper/</guid><description>Harder, Better, Faster, Stronger</description><pubDate>Wed, 24 Apr 2024 23:19:00 GMT</pubDate><content:encoded>import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;

import work_pic1 from &quot;@assets/images/blog/core_retreat_Feb_2024/work_pic1.png&quot;;
import sarek_subway from &quot;@assets/images/blog/sarek/sarek_subway.png&quot;;
import cram_meme from &quot;@assets/images/blog/sarek/cram_meme.jpg&quot;;
import cram_graph from &quot;@assets/images/blog/sarek/storage_preprocessing_reduced_cumulative.png&quot;;
import dataflow from &quot;@assets/images/blog/sarek/paper_dataflow.png&quot;;
import illumina from &quot;@assets/images/blog/sarek/benchmarking_snvs_combination.png&quot;;
import not_illumina from &quot;@assets/images/blog/sarek/benchmarking_snps_combination_nonillumina.png&quot;;
import contributors from &quot;@assets/images/blog/sarek/contributors.png&quot;;

import { Image } from &quot;astro:assets&quot;;

We are extremely happy to see this [paper](https://academic.oup.com/nargab/article/6/2/lqae031/7658070) out describing the changes and
updates to [nf-core/sarek](https://github.com/nf-core/sarek/), the DNA variant calling pipeline, in the last several years.

In 2020, we embarked on the journey of rewriting the whole pipeline in DSL2. One of the major motivations was to bring down
cloud computing costs and generally reduce storage space and computational resources.

# Overview

No nf-core pipeline without a metro map 🚇:

&lt;Image src={sarek_subway} alt=&quot;Metromap of nf-core/sarek&quot; height=&quot;700&quot; densities={[1.5, 2]} /&gt;

# New tools

We added new tools: BwaMem2 and DragMap for alignment, more variant callers (DeepVariant, GATK HaplotypeCaller Joint Calling &amp;
Single Sample variant recalibration, CNVKit, Tiddit), and more annotation possibilities.

Some tools were replaced: Trimming is now done with FastP, CRAM quality control with Mosdepth. For convenience we added more
quality control options: When starting from variant-calling directly, all input files can now run through the alignment QC steps.

![Thor throwing a cup and screaming &apos;Another&apos;](https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExMDMycW55dDFmNHNzODllanVxbndzZzA4eHZlNmM1NWQ0anFhZnFkMyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/ziLadIVnOGCKk/giphy.gif)

# Resource optimization

## Use CRAM files

We ditched the BAM format where possible and switched to CRAM, saving us 4x in work directory storage space.

&lt;Image src={cram_meme} alt=&quot;Toy story meme&quot; height=&quot;300&quot; densities={[1.5, 2]} /&gt;
&lt;Image src={cram_graph} alt=&quot;Bar chart showing reduction in storage space usage&quot; height=&quot;400&quot; densities={[1.5, 2]} /&gt;

## Split files (but not too much)

The `splitFastq()` operator was replaced by a FastP process that splits the FastQ files before
alignment (default 12); FastP also replaces Trim Galore! for read trimming.
We also changed the default grouping of the intervals for BQSR to 21 (instead of 124), reducing storage space another 4x and speeding up processing.

&lt;Image src={dataflow} alt=&quot;Evaluation of FastP usage and different interval group sizes for BQSR&quot; height=&quot;500&quot; densities={[1.5, 2]} /&gt;

# Cost savings

Overall, we reduced computational costs on AWS (last summer, using spot instances)
by 70% to about $20 from FASTQs to annotated VCFs using Strelka, Manta, and VEP.

![Little geco saying: &apos;you could save&apos;](https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExcnUwZnV4eGkycjR5b3R3M21wbjNlN2tkZGdtbXJqam1tbmYxNGZxZCZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/AlRadeCzctWmA8fnNG/giphy.gif)

# Benchmarking: a.k.a is it any good?

We benchmarked the germline track with Illumina, MGI, and BGI GIAB samples and the somatic track with SEQC2 samples.
We recently joined the NCBench effort to continuously validate the pipeline on release.

&lt;Image src={illumina} alt=&quot;&quot; height=&quot;385&quot; densities={[1.5, 2]} /&gt;
&lt;Image src={not_illumina} alt=&quot;&quot; height=&quot;385&quot; densities={[1.5, 2]} /&gt;

# Teamwork makes the dream work

This was a gigantic team effort with [Lasse Folkersen](https://github.com/lassefolkersen), [Anders Sune Pedersen](https://github.com/asp8200), [Francesco Lescai](https://github.com/lescai), [
Susanne Jodoin](https://github.com/SusiJo), [Edmund Miller](https://github.com/Edmundmiller), Matthias Seybold, [Oskar Wacker](https://github.com/WackerO),
[Nick Smith](https://github.com/nickhsmith), [Gisela Gabernet](https://github.com/ggabernet), [Sven Nahnsen](https://uni-tuebingen.de/forschung/forschungsinfrastruktur/zentrum-fuer-quantitative-biologie-qbic/team/prof-dr-sven-nahnsen/),
and many many others from the nf-core community:

&lt;Image src={contributors} alt=&quot;&quot; height=&quot;500&quot; densities={[1.5, 2]} /&gt;

Also shout out to all the amazing people starting sarek way back in 2016:
Szilveszter Juhos, Malin Larsson, Pall I. Olason, Marcel Martin, Jesper Eisfeldt,
Sebastian DiLorenzo, Johanna Sandgren, Teresita Díaz De Ståhl, Phil Ewels,
Valtteri Wirta, Monica Nistér, Max Käller, and Björn Nystedt.

# Join the fun

If you want to join us, visit https://nf-co.re/join/. We&apos;re on the #sarek channel on Slack,
and you’re welcome to join #sarek_dev if you really want to get involved.

# There is more

If you want to know more, here are some recent talks detailing the changes and development journey:

    &lt;YouTube id=&quot;https://www.youtube.com/watch?v=pfxEJJs63y8&quot; poster=&apos;https://i.ytimg.com/vi/pfxEJJs63y8/maxresdefault.jpg&apos;/&gt; &lt;YouTube id=&quot;https://www.youtube.com/watch?v=ivHBfilnUIY&quot; poster=&apos;https://i.ytimg.com/vi/ivHBfilnUIY/maxresdefault.jpg&apos; /&gt;</content:encoded></item><item><title>March 2024 Hackathon</title><link>https://nf-co.re/blog/2024/hackathon-march-2024/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/hackathon-march-2024/</guid><description>The distributed / online global hackathon</description><pubDate>Sun, 07 Apr 2024 23:13:00 GMT</pubDate><content:encoded>import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;
import { Image } from &quot;astro:assets&quot;;

import attendee_locations from &quot;@assets/images/blog/hackathon-march-2024/attendee_locations.png&quot;;
import attendees_per_site from &quot;@assets/images/blog/hackathon-march-2024/attendees_per_site.png&quot;;
import attendees_per_year from &quot;@assets/images/blog/hackathon-march-2024/attendees_per_year.png&quot;;
import local_site_locations from &quot;@assets/images/blog/hackathon-march-2024/local_site_locations.png&quot;;
import number_attendees from &quot;@assets/images/blog/hackathon-march-2024/number_attendees.png&quot;;

# March 2024 nf-core Hackathon

From March 18-20, 2024, we held the
[March 2024 online / distributed nf-core hackathon](/events/2024/hackathon-march-2024/).
People from all over the world joined us both online and in person, at local nodes.
The response from the community was overwhelming, with a phenomenal number of people
registering to join:

&lt;Image src={number_attendees} alt=&quot;Number attendees&quot; /&gt;

The year-on-year growth of hackathon attendance is fantastic to see,
once again in 2024 we smashed previous records with attendance:

&lt;Image src={attendees_per_year} alt=&quot;Attendees per year&quot; /&gt;

In the registration form, we asked people what country they live in.
As you can see, we had people joining us from all over the globe!

&lt;Image src={attendee_locations} alt=&quot;Attendee locations&quot; /&gt;

The numbers in Germany were a real outlier in 2024,
likely due to the large number of local sites organised there by community members.

## Local sites

We first tried the idea of hosting local in-person hackathon nodes
[in March 2023](/events/2023/hackathon-march-2023/).
Volunteers from the community organised venues at their local workplace and invited anyone to join.
The response was extremely positive, so in 2024 we decided to take the same approach.

In the end, we had a phenomenal 23 local sites run by community volunteers.
The size and scale varied enormously, with the [GHGA](https://www.ghga.de/) site in Heidelberg, Germany,
attracting over 50 people!

&lt;figure&gt;
    &lt;Image src={attendees_per_site} alt=&quot;Attendees per site&quot; /&gt;
    &lt;figcaption&gt;Hopefully Sherbrooke had more than one person in reality!&lt;/figcaption&gt;
&lt;/figure&gt;

These sites were also spread all over the world.
It&apos;s great to see more sites popping up in the USA especially:

&lt;Image src={local_site_locations} alt=&quot;Local site locations&quot; /&gt;

## Socials

The hackathon was a lot of fun, with pizza provided by [Seqera](https://seqera.io/),
quizzes and hidden sock hunts 🧦

Our talented nf-core musicians [@TCLamnidis](https://github.com/TCLamnidis)
and [@jfy133](https://github.com/jfy133) enjoyed the hackathon so much that
they even put a song and music video together about it!

&lt;YouTube id=&quot;https://www.youtube.com/watch?v=Riiz_lWVTgE&amp;t=9s&quot; /&gt;

Many thanks to everyone who joined the hackathon and especially to the many
organisers who helped make it happen.
We hope to see you all again next year!</content:encoded></item><item><title>Special Interest Groups</title><link>https://nf-co.re/blog/2024/special_interest_groups/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/special_interest_groups/</guid><description>A new place to collaborate with others working in your field</description><pubDate>Tue, 02 Apr 2024 08:00:00 GMT</pubDate><content:encoded>import crossFunctional from &quot;@assets/images/blog/special_interest_groups/cross-functional-organisation.png&quot;;

import { Image } from &quot;astro:assets&quot;;

Today, we&apos;re very happy to announce a new nf-core initiative:
_Special Interest Groups_.
These will be a new place to meet and discuss topics with others working in your specific
field(s) of research, bringing people together to collaborate in a new way.

# Background

nf-core was created to bring people together to collaborate on building Nextflow pipelines.
However, as the community grows, the way that we structure ourselves around pipelines and
tools is beginning to show a few gaps.
Researchers from different disciplines can use the _same_ set of pipelines in very different ways.
Most nf-core pipelines are very flexible in their usage by design, and so for truly
reproducible research groups need to standardise not just the pipelines that they
use, but also the manner in which they use them.

To meet this need, we want to provide a space for community members working on
similar research interests to gather and collaborate.
We want to foster collaboration not just in the _development_ of scientific
research tooling, but also in the _usage_ of that tooling across different fields.
We hope that Special Interest Groups will provide a place for this to happen.

# How will this work?

The nf-core community is structured around pipelines.
The existing methods that we use to manage pipeline groups appear to work well, so rather than
reinventing the wheel we intend to adopt a similar pattern for Special Interest Groups:

1. Anyone can suggest a new group in the [nf-core/proposals](https://github.com/nf-core/proposals/issues) GitHub repository
    - Just like with [nf-core/proposals](https://github.com/nf-core/proposals/issues) for pipelines, there is a GitHub issue template with a form to fill in,
      with a handful of questions to collect all the required information.
2. The core team (and steering group, as necessary) will discuss the proposal, suggest
   any changes and either accept or reject the proposal.
    - This process will be tracked in a GitHub project, called
      [New interest groups](https://github.com/orgs/nf-core/projects/63).
3. A new Slack channel will be created for the new interest group on the nf-core Slack.
4. A [new web page](/special-interest-groups) will be created for the group on the nf-core website.
5. The lead for the new group will organise a nf-core/bytesize talk to introduce
   the group to the community. A blog post will also be recommended.
6. The new special interest group will be active and continue under their own steam.

## Group web pages

The [new web page](/special-interest-groups) will just be a stub at first, but we hope that the interest group
will develop this over time. These pages can evolve into mini sections with multiple
sub-pages, including things like best-practice documentation, meeting notes and more.

The web pages list the nf-core pipelines typically used by members of the group,
so that the pages can be cross-linked.
That way, anyone viewing a web page for an nf-core pipeline will be able to see
at a glance what it can be used for and hopefully link to best practices for
that pipeline with that type of data.

## Closing after inactivity

Just like pipelines, Special Interest Groups will never be _deleted_.
As part of the annual &quot;spring cleaning&quot; that the nf-core maintainers team does,
we will check on each group&apos;s recent activity and make sure that everything is going ok.
If a group is obviously inactive and does not respond to requests for a status update,
or if desired by the group organisers, the interest group will be archived.

Anyone will be able to revive an archived special interest group at a future date,
in a process much like proposing a new group.

## Who can join?

Anyone can join any interest group.
In the same way that every nf-core pipeline is owned by the community, Special Interest Groups
will be made as inclusive as possible.

In order to achieve this, we will try to avoid naming groups after existing groups or consortia, even though
some may begin from such origins. We will also try not to tie any interest
groups to very specific topics or research groups where possible.
We want anyone working in a relevant field or interest to be able to join
an interest group without feeling like an outsider.

## What will they do?

We fully expect every interest group to be different: both in their scope and
membership, but also in the time that their organisers can commit and the activities
that would be most useful to their members.
As such, we want to be as flexible as possible with our expectations.
If the group is active and people are finding it useful, we will consider it a success.

That said, the types of things that we hope that interest groups could do could include:

- Writing best practices for nf-core pipeline usage, specific to their field.
- Creating pipeline configs specific to the group, to encode these recommendations.
- Creating and extending nf-core pipelines, as necessary.
- Working together to harmonise usage of data formats.
- Listing relevant external and internal resources, videos and documentation.
- Holding regular group meetings.
- Presenting updates in nf-core/bytesize talks and blog posts.
- Potentially even running their own series of talks, if there is interest.

We&apos;re sure that our new groups can and will do many more different and diverse things.
We can&apos;t wait to see where the program goes!

## What about pipeline development?

Discussion within Special Interest Groups can revolve around usage of pipelines,
but when it comes to any missing features or requirements we expect members to move
discussions into the regular pipeline channels.
There should be a distinction between scientific and technical topics.

In this way, we hope that interest groups and pipeline groups work in an
orthogonal manner, each contributing to the other.

&lt;Image src={crossFunctional} alt=&quot;Cross-functional pipelines and interest group structure&quot; /&gt;

# Introducing: Our first special interest group

Today we are delighted to announce our first interest group: [Animal genomics](/special-interest-groups/animal-genomics/).
We&apos;ll leave the details to be elaborated on in a dedicated blog post by them at a later
date, but we&apos;d like to take this opportunity to thank them for taking the lead.

This interest group has formed out of the [BovReg](https://bovreg.eu/) and
[EuroFAANG](https://eurofaang.eu/) research consortia.
BovReg has been working closely with nf-core since its inception and has helped
bring other EuroFAANG consortia members together to use nf-core pipelines,
including [AQUA-FAANG](https://www.aqua-faang.eu/), [EuroFAANG Research Infrastructure](https://eurofaang.eu/), [GENE-SWitCH](https://www.gene-switch.eu/), [GEroNIMO](https://www.geronimo-h2020.eu/), [HoloRuminant](https://holoruminant.eu/) and [Rumigen](https://rumigen.eu/).

The BovReg project is now reaching its conclusion, and members of the group who
had been working closely with nf-core expressed their interest in keeping the
collaboration and discussion going in some form.
It was this request that led to the formation of nf-core Special Interest Groups.
Many thanks to those involved for their enthusiasm and feedback in the process!

Stay tuned for more information about this group, and others.
If you have an idea for a group then let us know in the
[`#new-interest-group`](https://nfcore.slack.com/channels/new-interest-group) channel!</content:encoded></item><item><title>nf-core/tools - 2.13.0</title><link>https://nf-co.re/blog/2024/tools-2_13/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/tools-2_13/</guid><description>Out with the old, in with the new</description><pubDate>Mon, 19 Feb 2024 23:00:00 GMT</pubDate><content:encoded>import { YouTube } from &quot;@astro-community/astro-embed-youtube&quot;;

This release contains some template changes and bug fixes.

# Highlights

We refactored the pipelines template a bit:

- The `lib` directory is removed: :wave: Groovy code
- Instead we now use nf-core subworkflows for pipeline initialisation
    - Some nf-core pipelines adopted this already: [nf-core/fetchngs](https://github.com/nf-core/fetchngs/tree/dev) and [nf-core/rnaseq](https://github.com/nf-core/rnaseq/tree/dev)
- The [`nf-validation` plugin](https://nextflow-io.github.io/nf-validation/1.1/) is now used to create an input channel from a sample sheet file.

:::tip
If you haven&apos;t started merging in the template updates yet, it may be easier to ignore them for now and remove the `lib/` directory on the pipeline `dev` branch directly by:

1. Individually installing the `utils_*` subworkflows with the following commands:

```bash
nf-core subworkflows install utils_nextflow_pipeline
nf-core subworkflows install utils_nfcore_pipeline
nf-core subworkflows install utils_nfvalidation_plugin
```

2. Creating a local `utils_*` subworkflow for the pipeline. You can copy the one in [rnaseq (`dev`)](https://github.com/nf-core/rnaseq/blob/dev/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf) or the [pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/subworkflows/local/utils_nfcore_pipeline_pipeline/main.nf) and customise this to your requirements. Make sure you move any custom functions in `lib/` directory to this file.
3. Include the `utils_*` subworkflows in the main.nf as done in [rnaseq (`dev`)](https://github.com/nf-core/rnaseq/blob/48663bffadb900e1ae4e11fb3391134cbf12ffc7/main.nf#L22-L25).
4. Including the `utils_*` subworkflows in the workflow main.nf as done in [rnaseq (`dev`)](https://github.com/nf-core/rnaseq/blob/48663bffadb900e1ae4e11fb3391134cbf12ffc7/workflows/rnaseq/main.nf#L25-L30).
5. Delete the `lib/` directory after you have confirmed everything has been ported across.
6. Once you have merged this to `dev` the template sync PR will be updated and tell you whether you have missed anything.
7. The `nf-core lint` command might complain about having to recompute the checksum of subworkflow(s).
    - Be sure to check in the `modules.json` file that none of the previously installed subworkflows have disappeared from it.
    - A new `subworkflows` section with the new `utils_*` subworkflows might have been added during the merge. Reinstalling the subworkflows using `nf-core subworkflows install` should fix this; otherwise you can manually edit the `modules.json` file (only recommended for advanced users).

:bulb: We found it sped up development to disable running the main workflow whilst wiring all of this in.

:movie_camera: Last but not least, we have a short bytesize talk showing the necessary steps:

&lt;YouTube id=&quot;https://www.youtube.com/watch?v=_meU3EzKdRI&quot; /&gt;

:::

## Additional bug fixes and improvements

- Have you seen a `no space left on device` error in your CI tests lately? This is now fixed by adding a clean-up step to the GitHub Actions workflows.

- There are [new API docs](https://nf-co.re/tools/docs/latest) on the nf-core website, e.g. to get clarification on linting errors.

- Updating a module won’t remove custom nextflow.config files anymore.

- The commands `nf-core modules test` and `nf-core subworkflows test` now have a new `--profile` parameter to specify the container engine to run nf-test with.

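For context on the disk-space fix: the usual remedy is a workflow step that deletes large pre-installed toolchains from the runner before the tests start. A sketch of such a step (illustrative only; the exact commands in the template workflow may differ):

```yaml
# Illustrative GitHub Actions step to free disk space on the runner
- name: Clean up disk space
  run: |
      sudo rm -rf /usr/share/dotnet /opt/ghc
      df -h # show how much space is now available
```
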
Thanks to all the contributors who made this release possible, especially to [Harshil Patel](https://github.com/drpatelh/) and his team for kicking off the `lib` restructuring and the `utils_*` subworkflows.

You can find the whole changelog [on GitHub](https://github.com/nf-core/tools/releases/tag/2.13).</content:encoded></item><item><title>core team retreat</title><link>https://nf-co.re/blog/2024/core_retreat_feb_2024/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/core_retreat_feb_2024/</guid><description>Why not come to Stockholm in February? 🥶</description><pubDate>Sun, 11 Feb 2024 23:13:00 GMT</pubDate><content:encoded>import work_pic1 from &quot;@assets/images/blog/core_retreat_Feb_2024/work_pic1.png&quot;;
import viking1 from &quot;@assets/images/blog/core_retreat_Feb_2024/viking1.jpg&quot;;
import viking2 from &quot;@assets/images/blog/core_retreat_Feb_2024/viking2.jpg&quot;;
import viking3 from &quot;@assets/images/blog/core_retreat_Feb_2024/viking3.jpg&quot;;
import viking4 from &quot;@assets/images/blog/core_retreat_Feb_2024/viking4.jpg&quot;;
import viking5 from &quot;@assets/images/blog/core_retreat_Feb_2024/viking5.jpg&quot;;
import viking6 from &quot;@assets/images/blog/core_retreat_Feb_2024/viking6.jpg&quot;;
import viking7 from &quot;@assets/images/blog/core_retreat_Feb_2024/viking7.jpg&quot;;
import viking8 from &quot;@assets/images/blog/core_retreat_Feb_2024/viking8.jpg&quot;;
import phin from &quot;@assets/images/blog/core_retreat_Feb_2024/phin.jpg&quot;;

import { Image } from &quot;astro:assets&quot;;

&lt;Image src={work_pic1} alt=&quot;core team working at retreat&quot; /&gt;

Between the 5th and 8th of February, the core team and members of the steering group met in icy Stockholm to discuss upcoming challenges and improvements to the community.
We spent four days discussing key questions and projects for the community and getting stuck into kickstarting key initiatives.
The topics and decisions are outlined below.

# Structure and Governance

### Improve transparency

To keep everyone in the loop about what is happening in the community and what decisions are taken, the core team will introduce a quarterly blog post with updates about infrastructure, pipeline maintenance and the leadership.
We will also try to better document how new members are added to the different [governance teams](/governance/), with a slightly more formalised process.

### Special interest groups

An exciting development is that nf-core wants to establish _&quot;Special interest groups&quot;_ to encourage people within the nf-core community with similar interests to meet and work together.
We spent a disproportionate amount of time arguing about the name for these: SIGs / COSIs / communities / working groups, etc. Naming things is hard!
More information on this new initiative will be presented in the [bytesize talk on February 20th](/events/2024/bytesize_workgroups_intro) and in a dedicated blog post / website section soon.

### Funding

As with many open-source communities, funding is an important topic. The nf-core community has applied for another grant from the Chan Zuckerberg Initiative, but we will not hear about a decision until later in the year. Until then, the community runs on volunteer work by all of you, supported by different universities and institutions.

# Outreach and Community

### Training and Documentation overhaul

One of the main goals of nf-core is to provide easy-to-use and well-documented tools and pipelines. While the pipeline documentation pages are generally in a good state, the docs for tools and training are scattered throughout the website and at times not very accessible. You can look forward to a complete overhaul of the docs pages towards a more modular setup, which will also help put together more targeted training for different levels of knowledge. Be aware that this is a major project and will likely require a good amount of time.

### Publication of an nf-core v2.0 paper

It has been 4 years since the first nf-core paper was published and the community has changed a lot since then.
Not only has it grown tremendously, but we also experienced a lot of technical changes including the update to Nextflow DSL2, the introduction of modules and subworkflows and upgrades to nf-core/tools.
To reflect these changes, the core team would like to publish an updated &quot;state of nf-core&quot; paper.
A first draft is in the works and will also emphasize the work of the community built around Euro-FAANG, which sparked the idea of creating nf-core special interest groups.

### nf-core mentorship program

The three previous nf-core mentorship rounds have been a great success.
However, with further funding uncertain, a major discussion point was how nf-core can continue to provide opportunities for both new and more experienced members of the community.
There are plans to change the mentorship program to a volunteer basis, where prospective mentors can set more targeted aims and community members can apply to be a mentee of (a) specific mentor(s).
More information will be available soon, so keep your eyes on the `#announcements` channel in Slack.

### Open office hours for nf-core developers

As a trial, nf-core wants to implement weekly open office hours to bring people together for pull-request reviews and to discuss issues developers experience while working on nf-core related items.
This _won&apos;t_ be for training or support, but rather a space for collaborative work within nf-core.
Stay tuned for a blog post with more information coming soon.

# Technical Improvements

### Roadmap for nf-core/tools

There are lots of smaller and bigger improvements to nf-core/tools in the works and the core team discussed and prioritised the major tasks for the start of 2024.
Special mention to the upcoming config builder ([#2731](https://github.com/nf-core/tools/issues/2731)) that will help new users create institutional or other config files to run nf-core and other Nextflow pipelines.
We will also be focussing on a more flexible / minimal pipeline template ([#2340](https://github.com/nf-core/tools/issues/2340)) and some major improvements to the Nextflow schema ([#2429](https://github.com/nf-core/tools/issues/2429)).

### Restructuring of nf-core/modules

Newly added features in nf-test will allow us to drop the `tags.yml` files for nf-core modules and subworkflows.
Thinking further ahead, we continued our discussions on the structure of nf-core modules directories for better portability and workflow-level tests.

### Reference genomes

Many nf-core analysis pipelines use AWS-iGenomes for reference indexes. However, it suffers from a number of well-documented problems.
The core team continued discussions about a potential successor to AWS-iGenomes and made tentative first steps towards building something new. Stay tuned for updates.

### Resource optimization

The goal is to improve resource usage at the module level; as a first step, the plan is to make use of optimized configs taken from the Seqera Platform.

### Benchmarking

Wouldn&apos;t it be great to know how an nf-core pipeline compares to other pipelines or to a previous version?
The core team decided to do a proof of concept for `nf-core/sarek` and `nf-core/multiplesequencealign` to investigate the feasibility and time demands.

### Wave and nf-core

[Seqera Wave](https://seqera.io/wave/) is a way to generate software containers, and we discussed whether and how Wave could be used to help developers in nf-core.

# nf-core/vikings

As is becoming tradition for nf-core events in Stockholm, the group visited the Viking-themed restaurant [Aifur](https://aifur.se/).
Much silliness ensued.

&lt;Image src={viking1} alt=&quot;first viking&quot; height=&quot;300&quot; /&gt;
&lt;Image src={viking2} alt=&quot;second viking&quot; height=&quot;300&quot; /&gt;
&lt;Image src={viking3} alt=&quot;third viking&quot; height=&quot;300&quot; /&gt;
&lt;Image src={viking5} alt=&quot;fourth viking&quot; height=&quot;300&quot; /&gt;
&lt;Image src={viking6} alt=&quot;fifth viking&quot; height=&quot;300&quot; /&gt;
&lt;Image src={viking7} alt=&quot;sixth viking&quot; height=&quot;300&quot; /&gt;
&lt;Image src={viking8} alt=&quot;seventh viking&quot; height=&quot;300&quot; /&gt;
&lt;Image src={viking4} alt=&quot;eighth viking&quot; height=&quot;300&quot; /&gt;

# fin

All the excitement was too much for some of the team members.
Thank you to everyone who attended and worked hard to make this week happen!

&lt;Image src={phin} alt=&quot;sleepy viking&quot; height=&quot;300&quot; /&gt;</content:encoded></item><item><title>nf-core/tools - 2.12.0</title><link>https://nf-co.re/blog/2024/tools-2_12/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/tools-2_12/</guid><description>TUI or not TUI? That is the question.</description><pubDate>Sun, 28 Jan 2024 23:00:00 GMT</pubDate><content:encoded>import tui from &quot;@assets/images/blog/tools-2_12/nfcoretui.gif&quot;;
import fix_linting from &quot;@assets/images/blog/tools-2_12/fix-linting.png&quot;;
import { Image } from &quot;astro:assets&quot;;

This release comes with a lot of neat little improvements and bug fixes.

# Highlights

- More responsive nf-core-bot: If you add a `@nf-core-bot fix linting`
  comment in a PR to fix linting errors, there will be reactions added
  to the comment to indicate the status of the fix:
    - 👀 Fixing action has started
    - 👍🏻 Everything looks good, nothing to fix
    - 🎉 Fixed errors and committed the changes
    - 😕 Something went wrong, please check the logs (also a comment with a link
      to the logs is added)
      {&quot; &quot;}
        &lt;Image src={fix_linting} alt=&quot;Screenshot of the nf-core-bot fix linting comment&quot; /&gt;
- The `nf-core tui` subcommand launches a TUI (terminal user interface) to
  intuitively explore the command line flags, built using
  [Trogon](https://github.com/Textualize/trogon) (more TUIs to come!)
  {&quot; &quot;}
    &lt;img src={tui.src} alt=&quot;Demo gif of the TUI.&quot; class=&quot;img-fluid&quot; /&gt;
- If you need an SVG version or a bigger PNG version of an nf-core pipeline logo,
  you can now use the new `nf-core logo-create` subcommand to output one.
- Speaking of logos, the pipeline READMEs now use the [new(-ish) github image syntax](https://github.blog/changelog/2022-08-15-specify-theme-context-for-images-in-markdown-ga/).
- Every pipeline now has a GitHub Actions workflow that tests a
  successful download with `nf-core download`.
- Goodbye `-profile docker, test` errors: we now check that the `-profile` parameter is
  well formatted to avoid this common pitfall.
- Fun changes on the tooling side:
    - The longer CI tests for the tools repo are now run on self-hosted runners on
      AWS (thanks for the sponsorship AWS!).
    - We&apos;ve got a new bot which helps us to keep the changelog up to date.
      Big thanks to [@vladsavelyev](https://github.com/vladsavelyev) for the code!
    - We now use [ruff](https://github.com/astral-sh/ruff) for linting and formatting. Goodbye to Black, isort and pyupgrade,
      and thank you for your service! 🫡

        :::tip
        - We added ruff to the pre-commit config. Use `pre-commit install` to install the git hook scripts.
        - To lint on save with VSCode, add the following settings:

        ```json title=&quot;.vscode/settings.json&quot;
        &quot;[python]&quot;: {
            &quot;editor.formatOnSave&quot;: true,
            &quot;editor.codeActionsOnSave&quot;: {
                &quot;source.fixAll&quot;: true,
                &quot;source.organizeImports&quot;: true
            },
            &quot;editor.defaultFormatter&quot;: &quot;charliermarsh.ruff&quot;
        }
        ```

        - To run ruff manually you can use `ruff check .` and `ruff format .`.

        :::

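As an aside, the `-profile docker, test` mistake mentioned in the highlights above is plain shell word-splitting: the space makes `test` a separate argument, so only `docker,` ever reaches the `-profile` option. A minimal demonstration:

```bash
# Simulate the arguments the shell passes for: -profile docker, test
set -- -profile docker, test
echo "$#" # prints 3: the value was split into "docker," and "test"
echo "$2" # prints "docker," (only this would reach -profile)
```
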
You can find the whole changelog [on GitHub](https://github.com/nf-core/tools/releases/tag/2.12.0).</content:encoded></item><item><title>Hello nf-core world</title><link>https://nf-co.re/blog/2024/hello-nf-core-world/</link><guid isPermaLink="true">https://nf-co.re/blog/2024/hello-nf-core-world/</guid><description>A new blog for nf-core is launched! Here&apos;s what you can expect.</description><pubDate>Fri, 12 Jan 2024 15:14:00 GMT</pubDate><content:encoded># Greetings, nf-core Community

Today marks the launch of the nf-core blog, a new avenue for keeping our community informed and connected.
This blog will serve as a central hub for in-depth updates and discussions.

# Our Vision for the Blog

This platform is created with the intent to:

- Share substantive updates from our community’s activities. For example, descriptions of features included in a pipeline release.
- Provide clarity on decisions made by the core and maintainers&apos; group.
- Offer recaps of nf-core events and hackathons.
- Announce and detail updates to nf-core tools.

# Content Snapshot

Expect concise, informative posts covering:

- **Community News**: Insights into our collective advancements and achievements.
- **Technical Insights**: Explanations of decisions and updates from our technical leadership.
- **Event Highlights**: Overviews of key takeaways from various nf-core gatherings.
- **Tool Updates**: Announcements and insights into the latest nf-core tool releases.
- **Other Topics**: Other topics relevant to the nf-core community.

All contributions need to align with the nf-core [code of conduct](https://nf-co.re/code_of_conduct).

# Contribution Made Easy

Contributing to this blog is quite straightforward. You just need to add a markdown (or MDX) file to the `src/content/blog/&lt;year&gt;` folder.
The file name should be the title of the post separated by dashes. For example, `hello-nf-core-world.mdx`.
Fill in the frontmatter (the stuff between the three dashes on the top of the markdown files) and start writing your post.

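For illustration, a frontmatter sketch; the field names below are made up for this example, so copy the frontmatter of an existing post in `src/content/blog/` to get the exact schema:

```yaml
---
title: "Hello nf-core world"
subtitle: "A new blog for nf-core is launched!"
pubDate: 2024-01-12T15:14:00+01:00
authors:
    - "your-github-handle"
---
```
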
# Follow Along

Like the events page, our blog posts come with an RSS feed. Subscribe to it at [https://nf-co.re/blog/rss.xml](https://nf-co.re/blog/rss.xml) to stay up to date with the latest posts.
Alternatively, you can also receive the latest posts by joining the [`#nf-core-blog` Slack channel](https://nfcore.slack.com/archives/C06DP7N82V9).</content:encoded></item><item><title>Data management</title><link>https://nf-co.re/blog/2020/data_management/</link><guid isPermaLink="true">https://nf-co.re/blog/2020/data_management/</guid><description>How to plan your project, estimate resources, and share your results.</description><pubDate>Tue, 14 Apr 2020 11:00:00 GMT</pubDate><content:encoded>## Data management

Funding agencies are recognizing the importance of research data management, and some now request detailed Data Management Plans (DMPs) as part of the grant application. Research data management concerns the organization, storage, preservation, and sharing of data that is collected or analyzed during a research project. Proper planning and data management facilitate sharing and allow others to validate and reuse the data. Guidance is provided below on creating DMPs, estimating the resources needed by nf-core workflows, and sharing the resulting data.

### Data Management Plan

A Data Management Plan (DMP) is a revisable document explaining how you intend to handle new and existing data, during and following the conclusion of your research project.
It is wise to write a DMP as early as possible, using either a tool provided by your host institution or for example
[DS Wizard](https://ds-wizard.org/) or [DMP Online](https://dmponline.dcc.ac.uk/).
Ethical and legal considerations regarding the data will depend on where the research is conducted; this is especially true for projects including sensitive human data. For more information about the Swedish context, please review this page on [Sensitive personal data](https://scilifelab-data-guidelines.readthedocs.io/en/latest/docs/general/sensitive_data.html).

### Data storage and computational resources

To estimate the computational resources needed for a specific pipeline, please see the pipeline documentation. This lists the different output filetypes you can expect.
In the future we hope to automate a full run of each pipeline after every release. The pipeline docs will then show a full set of results from a real run, along with all file sizes. This can then be used as a guide as to what to expect for your data.

Backing up and archiving your data is essential. The 3-2-1 rule of thumb says you should have 3 copies of the data, on 2 different types of media, with 1 of the copies at a different physical location. Consider uploading the raw data to a repository as soon as you receive it, under an embargo (if that is important to you). This way you always have an off-site backup, with the added benefit of making the data-sharing phase more efficient. Identifying a suitable repository early on will allow you to conform to its standards and metadata requirements from the start.

Archiving is often the responsibility of your host institution, contact them for more details.

### Data compression

Some tools require compressed input files, which have many advantages: they take less space for storage and sharing. The most frequent format is gzip; it is accepted by many tools, which you can check in the tool manuals. To compress a file, you can also use [bzip2](https://sourceware.org/bzip2/), which creates a non-blocked compressed file. If a tool only accepts uncompressed input, you can decompress the file and pass it to the tool via a pipe, without saving the uncompressed version of the input. Here is an example:

```bash
gzip -dc input.gz | TOOL &gt; output
```

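A self-contained sketch of the same decompress-and-pipe pattern, with `wc -l` standing in for the downstream tool:

```bash
printf 'read1\nread2\n' > input.txt
gzip -f input.txt # replaces input.txt with input.txt.gz
gzip -dc input.txt.gz | wc -l # streams the uncompressed lines to the tool; prints 2
```
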
If a tool requires a block-compressed file (BGZF), in which the information is more easily accessible than in a non-blocked compression format, you can use the [htslib/bgzip](http://www.htslib.org/doc/bgzip.html) tool. This is typically needed by samtools during sequence alignment analyses.

Read a more detailed explanation on which format to choose [here](https://www.uppmax.uu.se/support/faq/resources-faq/which-compression-format-should-i-use-for-ngs-related-files/).

## Data sharing

In the era of [FAIR](https://www.nature.com/articles/sdata201618) (Findable, Accessible, Interoperable and Reusable) and [Open science](https://ec.europa.eu/research/openscience/index.cfm), datasets should be made available to the public, for example by submitting your data to a public repository.

### Choosing a repository

It’s recommended to choose a domain-specific repository when possible. It is also important to consider the sustainability of the repository to ensure that the data will remain public. Please see the [EBI archive wizard](https://www.ebi.ac.uk/submission/) or [SciLifeLab&apos;s data guidelines](https://scilifelab-data-guidelines.readthedocs.io/en/latest/docs/index.html) for suggestions depending on data type. You can also refer to the [ELIXIR Deposition Databases](https://elixir-europe.org/services/tag/elixir-deposition-databases), [Scientific Data’s Recommended Data Repositories](https://www.nature.com/sdata/policies/repositories), [FAIRsharing.org](https://fairsharing.org/databases/) and [re3data.org](https://www.re3data.org/) to find suitable repositories.
Also note that funding agencies might have specific requirements regarding data deposition. For example, data generated in projects funded by US federal grants should be deposited into public databases such as [SRA](https://www.ncbi.nlm.nih.gov/sra) for raw sequencing data and [GEO](https://www.ncbi.nlm.nih.gov/geo/) for functional genomics data.

For datasets that do not fit into domain-specific repositories, you can use an institutional repository when available or a general repository such as [Figshare](https://figshare.com/) and [Zenodo](https://zenodo.org/).

### Preparing for submission

#### Describing and organizing your data

Metadata should be provided to help others discover, identify and interpret the data. Researchers are strongly encouraged to use community metadata standards and ontologies where these are in place; consult e.g. [FAIRsharing.org](https://fairsharing.org/standards/). Data repositories may also provide guidance about metadata standards and requirements. Capture any additional documentation needed to enable reuse of the data in README text files and [Data Dictionaries](https://help.osf.io/hc/en-us/articles/360019739054-How-to-Make-a-Data-Dictionary) that describe what all the variable names and values in your data really mean. Identifiers that refer to e.g. ontology terms can be designed for computers or for people; in a FAIR data context it is recommended to supply both a human-readable and a machine-resolvable Persistent Identifier (PID) for each concept used in the data.

#### Data integrity

Some repositories require md5 checksums to be uploaded along with the files. Also consider adding checks that your data files follow the intended file formats and can be opened by standard software for those formats.

If you are using a Linux system, you can generate md5 checksums using the `md5sum` command.

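A minimal sketch of the checksum workflow, with a small text file standing in for a real data file:

```bash
printf 'ACGTACGT\n' > sample1.fastq # stand-in for a real data file
md5sum sample1.fastq > md5sums.txt # checksum manifest to upload alongside the data
md5sum -c md5sums.txt # verifies the file; prints "sample1.fastq: OK"
```
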
#### Choosing a license

To ensure reusability, data should be released with a clear and accessible data usage license. We suggest making your data available under licenses that permit free reuse, e.g. a Creative Commons license such as CC0 or CC-BY. The [EUDAT license selector wizard](https://ufal.github.io/public-license-selector/) can help you select suitable licenses for your data.
Note that sequence data submitted to [ENA](https://www.ebi.ac.uk/ena)/[GenBank](https://www.ncbi.nlm.nih.gov/genbank/)/[DDBJ](https://www.ddbj.nig.ac.jp/index-e.html) are implicitly free for others to reuse, as specified in the [INSDC standards and policies](https://www.ebi.ac.uk/ena/standards-and-policies).