Various tools operating over VCF files. Uses cyvcf2 and cython under the hood for speed


LUMC/vtools


vtools

Little toolset operating over VCF files. Uses cyvcf2 and cython under the hood for speed.

Installation

PyPI

vtools is available on PyPI. Because the name 'vtools' is already taken by another package, this vtools is installed as v-tools:

pip install v-tools

After installation, the command-line tools are still called vtools-<tool>, and the package is still imported as vtools:

import vtools

Conda

conda install -c bioconda vtools

Tools

vtools-filter

Filter VCF files on a set of criteria. Writes two VCF files: a filtered VCF file with the variants that pass, and a "trash" VCF file containing all filtered-out variants.

Filter criteria

Name                 Meaning                                          Optional
NON_CANONICAL        Non-canonical chromosome                         Yes
INDEX_UNCALLED       Index sample uncalled or homozygous reference    Yes
TOO_HIGH_GONL_AF     GoNL allele frequency too high                   Yes
TOO_HIGH_GNOMAD_AF   gnomAD allele frequency too high                 Yes
LOW_GQ               GQ of the index sample too low                   Yes
DELETED_ALLELE       The only ALT allele is a deleted allele          No
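The criteria above can be thought of as independent checks on each record. The sketch below illustrates that idea in plain Python; it is not vtools' implementation. The `Record` type, the canonical-contig set and the threshold values are all hypothetical (in vtools the thresholds come from the JSON params file), and it assumes DELETED_ALLELE refers to the VCF `*` (upstream deletion) allele.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical canonical contig set: autosomes, sex chromosomes and mitochondria,
# with and without a "chr" prefix. vtools may define this differently.
_CONTIGS = [str(i) for i in range(1, 23)] + ["X", "Y", "M", "MT"]
CANONICAL = set(_CONTIGS) | {"chr" + c for c in _CONTIGS}


@dataclass
class Record:
    """Hypothetical, simplified view of one VCF record for the index sample."""
    chrom: str
    alts: List[str]
    index_gt: Tuple[Optional[int], ...]  # allele indexes; None means uncalled ('.')
    index_gq: int
    gonl_af: Optional[float] = None
    gnomad_af: Optional[float] = None


def failed_criteria(rec: Record, max_gonl_af: float = 0.05,
                    max_gnomad_af: float = 0.05, min_gq: int = 20) -> List[str]:
    """Return the names of all criteria from the table that this record fails."""
    failed = []
    if rec.chrom not in CANONICAL:
        failed.append("NON_CANONICAL")
    # Uncalled ('./.') or homozygous reference ('0/0') index genotype.
    if all(allele in (None, 0) for allele in rec.index_gt):
        failed.append("INDEX_UNCALLED")
    if rec.gonl_af is not None and rec.gonl_af > max_gonl_af:
        failed.append("TOO_HIGH_GONL_AF")
    if rec.gnomad_af is not None and rec.gnomad_af > max_gnomad_af:
        failed.append("TOO_HIGH_GNOMAD_AF")
    if rec.index_gq < min_gq:
        failed.append("LOW_GQ")
    # Assumption: "deleted allele" means the only ALT is the '*' allele.
    if rec.alts == ["*"]:
        failed.append("DELETED_ALLELE")
    return failed
```

Note that DELETED_ALLELE is the only non-optional criterion in the table, so in a real configuration it would always be checked.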

Configuration

Filters are configured with a small JSON file, passed with --params-file. See here for an example.

Usage

Usage: vtools-filter [OPTIONS]

Options:
  -i, --input PATH                Path to input VCF file  [required]
  -o, --output PATH               Path to output (filtered) VCF file
                                  [required]
  -t, --trash PATH                Path to trash VCF file  [required]
  -p, --params-file PATH          Path to filter params json  [required]
  --index-sample TEXT             Name of index sample  [required]
  --immediate-return / --no-immediate-return
                                  Immediately write filters to file upon
                                  hitting one filter criterium. Default = True
  --help                          Show this message and exit.
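Going only by its help text, one reading of --immediate-return is that evaluation of a record stops at the first criterion it fails, so the trash file records a single reason; with --no-immediate-return every criterion is evaluated and all hits are reported. The sketch below illustrates that reading. The function names and the dict-based records are hypothetical, not part of vtools.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# A check is a (filter name, predicate) pair; the predicate returns True when
# the record should be filtered out. Records are plain dicts for simplicity.
Check = Tuple[str, Callable[[Dict], bool]]


def apply_checks(record: Dict, checks: Sequence[Check],
                 immediate_return: bool = True) -> List[str]:
    """Return names of failed checks, stopping at the first one if immediate_return."""
    hits = []
    for name, fails in checks:
        if fails(record):
            hits.append(name)
            if immediate_return:
                break  # skip the remaining (possibly expensive) checks
    return hits


def partition_records(records: Iterable[Dict], checks: Sequence[Check],
                      immediate_return: bool = True
                      ) -> Tuple[List[Dict], List[Tuple[Dict, List[str]]]]:
    """Split records into kept (output VCF) and trashed (trash VCF, with reasons)."""
    kept, trash = [], []
    for rec in records:
        hits = apply_checks(rec, checks, immediate_return)
        if hits:
            trash.append((rec, hits))
        else:
            kept.append(rec)
    return kept, trash
```

The trade-off under this reading: returning early saves work on records that fail an early check, at the cost of reporting only the first reason a variant was discarded.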

vtools-stats

Collects general statistics about a VCF file and writes them as JSON to stdout.

Usage

Usage: vtools-stats [OPTIONS]

Options:
  -i, --input FILE  Input VCF file  [required]
  --help            Show this message and exit.
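Because the statistics are printed to stdout as JSON, they are easy to consume from a script. The minimal sketch below runs the tool with only the documented -i option and parses its output; the helper names are hypothetical, and since the JSON keys are not documented here, the sketch does not assume any particular key.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, List, Union


def stats_command(vcf_path: Union[str, Path]) -> List[str]:
    """Argument list for vtools-stats, using only the documented -i option."""
    return ["vtools-stats", "-i", str(vcf_path)]


def collect_stats(vcf_path: Union[str, Path]) -> Any:
    """Run vtools-stats (must be on PATH) and parse the JSON it writes to stdout."""
    result = subprocess.run(stats_command(vcf_path), capture_output=True,
                            text=True, check=True)
    return json.loads(result.stdout)
```

Usage: `stats = collect_stats("sample.vcf.gz")`. With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError` instead of handing invalid output to `json.loads`.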

vtools-gcoverage

Collects coverage metrics from a gVCF file for every exon or every transcript in a refFlat file. The input is assumed to resemble a GATK gVCF file. gVCF files are expected to contain a single sample; if the input contains multiple samples, only the first one is used.
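In a GATK-style gVCF, one record can describe a whole block of reference bases (from POS to INFO/END) with a single DP and GQ. Per-exon metrics therefore require mapping those blocks onto exon coordinates, which in refFlat are 0-based and half-open. The sketch below shows only that coordinate step. It is not vtools' code: the `Block` tuple, the function name and the fill value for uncovered bases are assumptions, and it does not parse VCF itself (a library such as cyvcf2 could supply POS, END and the per-sample values).

```python
from typing import Iterable, List, Tuple

# (POS, END, value): a gVCF record's 1-based start, its inclusive INFO/END
# (equal to POS for a single-base record) and a per-sample value such as DP or GQ.
Block = Tuple[int, int, int]


def per_base_values(blocks: Iterable[Block], exon_start: int, exon_end: int,
                    missing: int = 0) -> List[int]:
    """Expand gVCF blocks into one value per base of a refFlat exon.

    exon_start/exon_end are 0-based and half-open, as in refFlat. Bases not
    covered by any block get `missing` (an assumption of this sketch).
    """
    values = [missing] * (exon_end - exon_start)
    for pos, end, value in blocks:
        # 1-based inclusive [pos, end] is 0-based half-open [pos - 1, end).
        lo = max(pos - 1, exon_start)
        hi = min(end, exon_end)
        for i in range(lo, hi):
            values[i - exon_start] = value
    return values
```

For example, blocks `[(1, 5, 10), (6, 6, 3), (7, 20, 25)]` over the refFlat exon `2..8` yield one value per base: `[10, 10, 10, 3, 25, 25]`.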

Output is a simple TSV file with the following columns:

Column                                   Meaning
exon                                     Exon number
gene                                     Gene name / symbol / ID
mean_dp                                  Mean DP value over the exon
mean_gq                                  Mean GQ value over the exon*
median_dp                                Median DP value over the exon
median_gq                                Median GQ value over the exon
perc_at_least_{10, 20, 30, 50, 100}_dp   Percentage of the exon with a DP value of at least the given value
perc_at_least_{10, 29, 30, 50, 90}_gq    Percentage of the exon with a GQ value of at least the given value
transcript                               Transcript name / symbol / ID

*: The mean GQ is computed by converting each GQ value to its error probability (P = 10^(-GQ/10)), averaging these probabilities, and converting the mean back to a Phred-scaled score.
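Given per-base DP and GQ values for an exon, the columns above can be computed as follows. This is a sketch following the table and footnote, not vtools' implementation. The function names are hypothetical, percentages are expressed on a 0-100 scale (an assumption), and the thresholds are passed in explicitly; use the sets listed in the table.

```python
import math
from statistics import mean, median
from typing import Dict, List, Sequence


def phred_mean(quals: Sequence[float]) -> float:
    """Mean of Phred-scaled values taken in probability space (see footnote)."""
    probs = [10 ** (-q / 10) for q in quals]  # Phred score -> error probability
    return -10 * math.log10(sum(probs) / len(probs))  # mean probability -> Phred


def perc_at_least(values: Sequence[float], threshold: float) -> float:
    """Percentage (0-100) of bases whose value is >= threshold."""
    return 100 * sum(v >= threshold for v in values) / len(values)


def exon_row(dp: List[int], gq: List[int], dp_thresholds: Sequence[int],
             gq_thresholds: Sequence[int]) -> Dict[str, float]:
    """Build the numeric TSV columns for one exon from per-base DP and GQ lists."""
    row = {
        "mean_dp": mean(dp),
        "mean_gq": phred_mean(gq),  # not the arithmetic mean
        "median_dp": median(dp),
        "median_gq": median(gq),
    }
    for t in dp_thresholds:
        row[f"perc_at_least_{t}_dp"] = perc_at_least(dp, t)
    for t in gq_thresholds:
        row[f"perc_at_least_{t}_gq"] = perc_at_least(gq, t)
    return row
```

Averaging in probability space makes low-quality bases dominate: for GQ values 10 and 30 the arithmetic mean is 20, while `phred_mean([10, 30])` is about 12.97.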

Usage

Usage: vtools-gcoverage [OPTIONS]

Options:
  -I, --input-gvcf PATH          Path to input VCF file  [required]
  -R, --refflat-file PATH        Path to refFlat file  [required]
  --per-exon / --per-transcript  Collect metrics per exon or per transcript
  --help                         Show this message and exit.
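The refFlat file passed with -R follows UCSC's refFlat layout: a genePred table with the gene name prepended, so each line holds geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts and exonEnds, with 0-based, half-open coordinates. A minimal parser for splitting one line into exons might look like this. The `Exon` type is hypothetical, and numbering exons in file order (ascending coordinates, regardless of strand) is an assumption; vtools may number minus-strand exons differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exon:
    gene: str
    transcript: str
    chrom: str
    number: int  # 1-based, in file (ascending coordinate) order; an assumption
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def parse_refflat_line(line: str) -> List[Exon]:
    """Split one tab-separated refFlat line into its exons."""
    fields = line.rstrip("\n").split("\t")
    gene, transcript, chrom = fields[0], fields[1], fields[2]
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    return [Exon(gene, transcript, chrom, i + 1, s, e)
            for i, (s, e) in enumerate(zip(starts, ends))]
```

Per-transcript metrics would then pool the per-base values of all exons belonging to one transcript.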

vtools-evaluate

Evaluate a VCF file against a baseline VCF file containing true positives. Only variants present in both VCF files are considered, which makes the tool useful when the two files were produced by very different technologies, e.g. when comparing a WES VCF file with a SNP array.

Output is a simple JSON file listing counts of concordant and discordant alleles, along with some other metrics. The discordant records can optionally be written to a gzipped VCF file with --discordant.
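The sketch below illustrates the idea of restricting the comparison to sites present in both files and then counting concordant and discordant alleles. Genotypes are compared as multisets of allele strings, so `A/G` matches `G/A` and the two files may order their ALT alleles differently. Site keys, the genotype representation and the stat names are hypothetical; vtools' actual JSON fields and counting rules may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

Site = Tuple[str, int]          # (chrom, pos); hypothetical site key
Genotype = Tuple[str, ...]      # allele strings, e.g. ("A", "G")


def compare_genotypes(calls: Dict[Site, Genotype],
                      truth: Dict[Site, Genotype]
                      ) -> Tuple[Dict[str, int], List[Site]]:
    """Count allele concordance over sites present in both inputs."""
    stats = {"sites_compared": 0, "alleles_concordant": 0, "alleles_discordant": 0}
    discordant_sites = []
    for site in sorted(calls.keys() & truth.keys()):  # only shared sites
        call, known = Counter(calls[site]), Counter(truth[site])
        shared = sum((call & known).values())  # multiset intersection size
        n_called = sum(call.values())
        stats["sites_compared"] += 1
        stats["alleles_concordant"] += shared
        stats["alleles_discordant"] += n_called - shared
        if shared < n_called:
            discordant_sites.append(site)
    return stats, discordant_sites
```

A call of `C/C` against a known `C/T` counts as one concordant and one discordant allele; sites found in only one input are ignored entirely.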

Multisample VCF files are supported; the samples to evaluate must be specified with --call-samples and --positive-samples.

By default, variants from --call-vcf must have a genotype quality (GQ) of at least 30; specify --min-qual 0 to disable this filter. The optional --min-depth flag sets a minimum read depth.
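The default filtering rule amounts to a simple predicate on each call. In this hedged sketch, the function name is hypothetical, a `min_qual` of 0 or less disables the GQ check, and treating missing GQ or DP values as failing an enabled check is an assumption rather than documented vtools behaviour.

```python
from typing import Optional


def passes_call_filters(gq: Optional[float], dp: Optional[int],
                        min_qual: float = 30.0,
                        min_depth: Optional[int] = None) -> bool:
    """Keep a call only if GQ >= min_qual and, when set, DP >= min_depth.

    min_qual <= 0 disables the GQ check entirely. Missing values (None) fail
    an enabled check; that handling is an assumption of this sketch.
    """
    if min_qual > 0 and (gq is None or gq < min_qual):
        return False
    if min_depth is not None and (dp is None or dp < min_depth):
        return False
    return True
```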

Usage

Usage: vtools-evaluate [OPTIONS]

Options:
  -c, --call-vcf PATH           Path to VCF with calls to be evaluated
                                [required]
  -p, --positive-vcf PATH       Path to VCF with known calls  [required]
  -cs, --call-samples TEXT      Sample(s) in call-vcf to consider. May be
                                called multiple times  [required]
  -ps, --positive-samples TEXT  Sample(s) in positive-vcf to consider. May be
                                called multiple times  [required]
  -s, --stats PATH              Path to output stats json file
  -dc, --discordant PATH        Path to output gzipped discordant vcf file
  -mq, --min-qual FLOAT         Minimum quality of variants to consider
  -md, --min-depth INTEGER      Minimum depth of variants to consider
  --help                        Show this message and exit.
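Because --call-samples and --positive-samples may be given multiple times, scripted runs need to repeat the flag once per sample. The hypothetical helper below builds such an argument list using only the options shown above. The help text does not say how multiple call and positive samples are matched up; listing them in corresponding order is our assumption.

```python
from typing import List, Optional, Sequence


def evaluate_command(call_vcf: str, positive_vcf: str,
                     call_samples: Sequence[str], positive_samples: Sequence[str],
                     stats: Optional[str] = None, discordant: Optional[str] = None,
                     min_qual: Optional[float] = None,
                     min_depth: Optional[int] = None) -> List[str]:
    """Build a vtools-evaluate argument list, repeating the sample flags per sample."""
    cmd = ["vtools-evaluate", "--call-vcf", call_vcf, "--positive-vcf", positive_vcf]
    for sample in call_samples:
        cmd += ["--call-samples", sample]
    for sample in positive_samples:
        cmd += ["--positive-samples", sample]
    if stats is not None:
        cmd += ["--stats", stats]
    if discordant is not None:
        cmd += ["--discordant", discordant]
    if min_qual is not None:
        cmd += ["--min-qual", str(min_qual)]
    if min_depth is not None:
        cmd += ["--min-depth", str(min_depth)]
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`.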

License

MIT
