Basic workflow for basecalling ONT pod5 files. Does not yet include demultiplexing. #61

Open
wants to merge 10 commits into
base: simon-ont-dev

Conversation

simonleandergrimm
Collaborator

@simonleandergrimm simonleandergrimm commented Oct 1, 2024

This PR creates the necessary files to basecall Oxford Nanopore pod5 files. Running the basecaller (dorado) requires a g5.xlarge machine with an AMI that is both ECS-optimized and contains GPU drivers (link). The instance template can be found here.

@willbradshaw I had to specify default values for undefined run.nf parameters as I always got warnings that they were not defined (even though I wasn't running run.nf). I will resolve this in future, though let me know if I can suppress these warnings in some other way.

Also, let me know if there is a more efficient way to get the base_dir leaf, which I'd need to define the name of the nanopore_run:

params {
    mode = "basecall"
    debug = true

    // Directories
    base_dir = "s3://nao-mgs-simon/NAO-ONT-20240905-DCS_RNA2_Test"

    // Run parameters
    // TODO: Automate getting run name from base_dir
    nanopore_run = "NAO-ONT-20240905-DCS_RNA2_Test"

@willbradshaw
Contributor

@willbradshaw I had to specify default values for undefined run.nf parameters as I always got warnings that they were not defined (even though I wasn't running run.nf). I will resolve this in future, though let me know if I can suppress these warnings in some other way.

Yeah this is a known issue that someone (possibly me) should look into fixing. If we're going to have multiple workflows running from the same start point this is going to keep coming up.

Also, let me know if there is a more efficient way to get the base_dir leaf, which I'd need to define the name of the nanopore_run

Sorry, I don't understand what you mean by this.

@simonleandergrimm
Collaborator Author

Also, let me know if there is a more efficient way to get the base_dir leaf, which I'd need to define the name of the nanopore_run

Sorry, I don't understand what you mean by this.

When the S3 address is s3://foo/bar, I want the nanopore run identifier to be bar. I tried for about five minutes but couldn't figure out how to do this.

Contributor

@simonleandergrimm can you put this in the appropriate module directory? We're using module binaries now.

// TODO: Add optional parameter for ONT kit

// Files
pod5_dir = "${base_dir}/pod5_small/"
Contributor

Do these need to be in the config file? If they'll always be the same except for the base_dir you can hardcode them in the pipeline and save the user some thinking.
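
A minimal sketch of what hardcoding could look like, assuming the three paths always follow the layout shown in this config. The helper name `basecallPaths` and its parameter names are hypothetical and not part of the PR; only the path suffixes come from the diff.

```groovy
// Hypothetical helper: derive the fixed file locations from base_dir,
// so the user only has to set base_dir (and the run name) in the config.
def basecallPaths(String base, String run) {
    // Strip any trailing slashes so the joined paths never contain "//"
    def root = base.replaceAll('/+$', '')
    return [
        pod5_dir  : "${root}/pod5_small/",
        calls_bam : "${root}/bam/calls.bam",
        fastq_file: "${root}/raw/${run}.fastq.gz",
    ]
}
```

In a workflow script this could be called as `basecallPaths(params.base_dir, params.nanopore_run)`, leaving only `base_dir` and `nanopore_run` for the user to set.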

calls_bam = "${base_dir}/bam/calls.bam"
fastq_file = "${base_dir}/raw/${nanopore_run}.fastq.gz"

// Default values for undefined run.nf parameters. Adding them here because they always get triggered
Contributor

I think this should be fixed now that I've removed all the addParams statements. Try it again after merging in the changes from master?


includeConfig "${projectDir}/configs/containers.config"
includeConfig "${projectDir}/configs/resources.config"
includeConfig "${projectDir}/configs/profiles.config"
Contributor

Some missing standard configs to add once you've merged in master.


// Run parameters
// TODO: Automate getting run name from base_dir
nanopore_run = "NAO-ONT-20240905-DCS_RNA2_Test"
Contributor

If this value is always equivalent to the basename of base_dir, you can do this with Groovy: https://www.nextflow.io/docs/latest/working-with-files.html#getting-file-attributes
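
A minimal sketch of the Groovy route, assuming the run name always equals the last path component of base_dir. It uses plain string handling so that it can sit in the config file; inside a workflow script, `file(params.base_dir).name` (the attribute described at the link above) should give the same result. Whether a command-line override of base_dir propagates to the derived value has not been checked here.

```groovy
params {
    // Directories
    base_dir = "s3://nao-mgs-simon/NAO-ONT-20240905-DCS_RNA2_Test"

    // Run name taken from the last path component of base_dir.
    // tokenize('/') drops empty tokens, so a trailing slash is harmless.
    nanopore_run = base_dir.tokenize('/').last()
}
```

For an address of the form s3://foo/bar, the tokens are `s3:`, `foo` and `bar`, so the run identifier becomes `bar`.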

@@ -63,4 +63,12 @@ process {
    withLabel: fastp {
        container = "staphb/fastp:0.23.4"
    }
    withLabel: dorado {
        container = "ontresearch/dorado:latest"
        // NB: Using the latest version for now; version changes in future updates may break things.
Contributor

I would recommend locking in a version soon
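
One possible way to do this, sketched as a config fragment: keep the tag in a single parameter so that it is always set deliberately. The parameter name `dorado_version` and its placeholder value are hypothetical; substitute a tag that actually exists for the ontresearch/dorado image and has been tested with this workflow.

```groovy
// Hypothetical parameter: replace the placeholder with a real, tested tag
params.dorado_version = "REPLACE_WITH_PINNED_TAG"

process {
    withLabel: dorado {
        // Pinned tag instead of "latest", so image updates cannot silently change results
        container = "ontresearch/dorado:${params.dorado_version}"
    }
}
```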




publish:
Contributor

These need to be compatible with the output structure specified in main.nf (@simonleandergrimm look into this more after merging in master)
