v2.5.2

Changes to default read filtering:
- Relaxed FASTP quality filtering (--cut_mean_quality and --average_qual reduced from 25 to 20).
- Relaxed BBDUK viral filtering (switched from 3 21-mers to 1 24-mer).
Overhauled BLAST validation functionality:
- BLAST now runs on forward and reverse reads independently
- BLAST output filtering no longer assumes specific filename suffixes
- Paired BLAST output includes more information
- RUN_VALIDATION can now directly take in FASTA files instead of a virus read DB
- Fixed issues with publishing BLAST output under new Nextflow version
Implemented nf-test for end-to-end testing of pipeline functionality
- Implemented test suite in tests/main.nf.test
- Reconfigured INDEX workflow to enable generation of miniature index directories for testing
- Added Github Actions workflow in .github/workflows/end-to-end.yml
- Pull requests will now fail if any of INDEX, RUN, or RUN_VALIDATION crashes when run on test data.
- Generated first version of new, curated test dataset for testing RUN workflow. Samplesheet and config file are available in test-data. The previous test dataset in test has been removed.
Implemented S3 auto-cleanup:
- Added tags to published files to facilitate S3 auto-cleanup
- Added S3 lifecycle configuration file to ref, along with a script in bin to add it to an S3 bucket
Minor changes
- Added logic to check if grouping variable in nextflow.config matches the input samplesheet, if it doesn't, the code throws an error.
- Externalized resource specifications to resources.config, removing hardcoded CPU/memory values
- Renamed index-params.json to params-index.json to avoid clash with Github Actions
- Removed redundant subsetting statement from TAXONOMY workflow.
- Added --group_across_illumina_lanes option to generate_samplesheet

v2.5.1

Enabled extraction of BBDuk-subset putatively-host-viral raw reads for downstream chimera detection.
Added back viral read fields accidentally being discarded by COLLAPSE_VIRUS_READS.

v2.5.0

Reintroduced user-specified sample grouping and concatenation (e.g. across sequencing lanes) for deduplication in PROFILE and EXTRACT_VIRAL_READS.
Generalised pipeline to detect viruses infecting arbitrary host taxa (not just human-infecting viruses) as specified by ref/host-taxa.tsv and config parameters.
Configured index workflow to enable hard-exclusion of specific virus taxa (primarily phages) from being marked as infecting ost taxa of interest.
Updated pipeline output code to match changes made in latest Nextflow update (24.10.0).
Created a new script bin/analyze-pipeline.py to analyze pipeline structure and identify unused workflows and modules.
Cleaned up unused workflows and modules made obsolete in this and previous updates.
Moved module scripts from bin to module directories.
Modified trace filepath to be predictable across runs.
Removed addParams calls when importing dependencies (deprecated in latest Nextflow update).
Switched from nt to core_nt for BLAST validation.
Reconfigured QC subworkflow to run FASTQC and MultiQC on each pair of input files separately (fixes bug arising from allowing arbitrary filenames for forward and reverse read files).

v2.4.0

Created a new output directory where we put log files called logging.
Added the trace file from Nextflow to the logging directory which can be used for understanding cpu, memory usage, and other infromation like runtime. After running the pipeline, plot-timeline-script.R can be used to generate a useful summary plot of the runtime for each process in the pipeline.
Removed CONCAT_GZIPPED.
Replaced the sample input format with something more similar to nf-core, called samplesheet.csv. This new input file can be generated using the script generate_samplesheet.sh.
Now run deduplication on paired-ends reads using clumpify in the taxonomic workflow.
Fragment length analysis and deduplication analysis.
- BBtools: Extract the fragment length as well as the number of duplicates from the taxonomic workflow and add them to the hv_hits_putative_collapsed.tsv.gz.
- Bowtie2: Conduct a duplication analysis on the aligned reads, then add the number of duplicates and fragment length to the hv_hits_putative_collapsed.tsv.gz.

v2.3.3

Added validation workflow for post-hoc BLAST validation of putative HV reads.

v2.3.2

Fixed subsetReads to run on all reads when the number of reads per sample is below the set threshold.

v2.3.1

Clarifications to documentation (in README and elsewhere)
Re-added "joined" status marker to reads output by join_fastq.py

v2.3.0

Restructured run workflow to improve computational efficiency, especially on large datasets
- Added preliminary BBDuk masking step to HV identification phase
- Added read subsampling to profiling phase
- Deleted ribodepletion and deduplication from preprocessing phase
- Added riboseparation to profiling phase
- Restructured profiling phase output
- Added addcounts and passes flags to deduplication in HV identification phase
Parallelized key bottlenecks in index workflow
Added custom suffix specification for raw read files
Assorted bug fixes

v2.2.1

Added specific container versions to containers.config
Added version & time tracking to workflows
Added index reference files (params, version) to run output
Minor changes to default config files

v2.2.0

Major refactor
Start of changelog

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CHANGELOG.md

CHANGELOG.md

v2.5.2

v2.5.1

v2.5.0

v2.4.0

v2.3.3

v2.3.2

v2.3.1

v2.3.0

v2.2.1

v2.2.0

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

v2.5.2

v2.5.1

v2.5.0

v2.4.0

v2.3.3

v2.3.2

v2.3.1

v2.3.0

v2.2.1

v2.2.0