Basic workflow for basecalling ONT pod5 files. Does not yet include demultiplexing. #61
base: simon-ont-dev
…WIP things (e.g., want to automate nanopore_run extraction, rename pod5_dir, and stop defining default run.nf parameters that I'm not using but that always get flagged when I execute the basecall workflow).
Yeah this is a known issue that someone (possibly me) should look into fixing. If we're going to have multiple workflows running from the same start point this is going to keep coming up.
Sorry, I don't understand what you mean by this.
When the s3 address is s3://foo/bar, I want the nanopore run identifier to be bar. After trying for 5 min I couldn't figure out how to do this.
@simonleandergrimm can you put this in the appropriate module directory? We're using module binaries now.
// TODO: Add optional parameter for ONT kit

// Files
pod5_dir = "${base_dir}/pod5_small/"
Do these need to be in the config file? If they'll always be the same except for the base_dir you can hardcode them in the pipeline and save the user some thinking.
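One way this could look, as a rough sketch rather than the pipeline's actual code: the paths are built inside the workflow script from `params.base_dir` and `params.nanopore_run`, so the config only has to carry those two values. The `params.` prefix and the placement inside the workflow are assumptions; the path patterns are copied from the diff.

```groovy
// Sketch only: assumes base_dir and nanopore_run are exposed as params.
// The path patterns below are the ones from the config diff under review.
def pod5_dir   = "${params.base_dir}/pod5_small/"
def calls_bam  = "${params.base_dir}/bam/calls.bam"
def fastq_file = "${params.base_dir}/raw/${params.nanopore_run}.fastq.gz"
```

The trade-off is that users can no longer override an individual path without editing the pipeline, which is only acceptable if the directory layout really is fixed.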
calls_bam = "${base_dir}/bam/calls.bam"
fastq_file = "${base_dir}/raw/${nanopore_run}.fastq.gz"

// Default values for undefined run.nf parameters. Adding them here because they always get triggered
I think this should be fixed now that I've removed all the addParams statements. Try it again after merging in the changes from master?
includeConfig "${projectDir}/configs/containers.config"
includeConfig "${projectDir}/configs/resources.config"
includeConfig "${projectDir}/configs/profiles.config"
Some missing standard configs to add once you've merged in master.
// Run parameters
// TODO: Automate getting run name from base_dir
nanopore_run = "NAO-ONT-20240905-DCS_RNA2_Test"
If this value is always equivalent to the basename of base_dir, you can do this with Groovy: https://www.nextflow.io/docs/latest/working-with-files.html#getting-file-attributes
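A minimal sketch of that idea for the config file, assuming `base_dir` is a plain string defined earlier in the same `params` block. The `s3://foo/bar` value is the example from the discussion above, not a real bucket; `tokenize` and `last` are standard Groovy methods.

```groovy
params {
    // Example value taken from the discussion; replace with the real S3 path.
    base_dir = "s3://foo/bar"

    // Split on "/" and keep the last piece. tokenize() drops empty tokens,
    // so "s3://foo/bar/" (trailing slash) yields the same result: "bar".
    nanopore_run = base_dir.tokenize('/').last()
}
```

Inside a workflow script, the file attributes described in the linked docs suggest `file(params.base_dir).name` as an alternative, though I haven't checked how it treats a trailing slash on S3 paths.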
@@ -63,4 +63,12 @@ process {
    withLabel: fastp {
        container = "staphb/fastp:0.23.4"
    }
    withLabel: dorado {
        container = "ontresearch/dorado:latest"
        // NB: For now going with latest version, maybe the version switching with new updates will break things in the future.
I would recommend locking in a version soon
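For illustration, pinning would mirror the fastp entry just above it. `<PINNED_TAG>` is a placeholder, not a real tag; the actual value would need to come from the image's registry page.

```groovy
process {
    withLabel: dorado {
        // <PINNED_TAG> is a placeholder: substitute a specific release tag
        // so that new image uploads cannot silently change the basecaller.
        container = "ontresearch/dorado:<PINNED_TAG>"
    }
}
```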
publish:
These need to be compatible with the output structure specified in main.nf
(@simonleandergrimm look into this more after merging in master)
This PR creates the necessary files to basecall Oxford Nanopore pod5 files. Running the basecaller (dorado) requires a g5.xlarge machine with an AMI that is both ECS-optimized and contains GPU drivers (link). The instance template can be found here.
@willbradshaw I had to specify default values for undefined run.nf parameters because I always got warnings that they were not defined (even though I wasn't running run.nf). I will resolve this in the future, but let me know if I can suppress these warnings in some other way.
Also, let me know if there is a more efficient way to get the base_dir leaf, which I'd need to define the name of the nanopore_run: