MetadataTable

SRA Metadata is rich, sometimes too rich. There's no single standardized location for many types of useful information. This is a generalizable framework for pulling SRA metadata and summarizing usefully, with some common use-cases ready to go.

What's the problem?

Even the type of study is heterogeneous for different submitters. The location of for instance tissue and developmental stage specifications within SRA metadata also varies, as do the values used to describe particular tissues.

Why should we solve it?

Third-party efforts to load all metadata into a relational db suggest how widespread the problem is-- a customizable localized implementation helps many.

Workflow

Configuration

Requires python3 and python packages noted in requirements.txt

To get started, git clone to subdirectory MetadataTable, and create a working subdirectory named "testing"

 cd testing 
 source bin/activate 
 pip3 install -r ../MetadataTable/requirements.txt  
 python3 ../MetadataTable/metadatatable.py -h

Please note that this a minimal implementation, quickly written; very few informative warning messages if problems are encountered

Example Invocation

Retrieve all RNA-Seq datasets for taxon 7955, save output as a tab-delimited version :
python3 ../MetadataTable/metadatatable.py -e <your_email> -t "(biomol_transcript[properties] OR study_type_transcriptome_analysis[properties] OR strategy_rna_seq[properties] OR strategy_FL_cDNA[properties]) AND txid7955[Organism]" -ot 7955.tsv

Usage

>python metadatatable.py -h
usage: MetadataTable [-h] [-ox OUTPUT_XML] [-ot OUTPUT_TSV] [-i INPUT_XML]
                     [-e EMAIL] [-t TERM [TERM ...]] [-u] [-f] [-x XPATH]

optional arguments:
  -h, --help            show this help message and exit
  -ox OUTPUT_XML, --output_xml OUTPUT_XML
                        Path for saving xml file
  -ot OUTPUT_TSV, --output_tsv OUTPUT_TSV
                        Path for saving parsed file
  -i INPUT_XML, --input_xml INPUT_XML
                        Path to input file
  -e EMAIL, --email EMAIL
                        Let NCBI know who you are (required)
  -t TERM [TERM ...], --term TERM [TERM ...]
                        Query terms
  -u, --unlimited       Retrieve unlimited records
  -f, --full            Whether to output full table (Only for builtin template)
  -x XPATH, --xpath XPATH
                        Path to a csv file for xpath query

Commands

Input mode

Users can choose which input mode they want. These two modes cannot be used together.

-i input.xml
Use a xml file as input.

-e <your_email> -t searchterm
Query Entrez using the terms specified by -t.

Output mode

Users can save both xml and tsv file, if filenames are given. xml output can be large, it can be convenient to have a local copy while exploring which summary fields are informative.

(blank)
If no output mode specified, metadatatable will print out the parsed
results.

-ox output.xml
Save downloaded records in a xml file.

-ot output.tsv
Save parsed records in a tsv file.

Retrieval

-u
Retrieve unlimited records. If not used, metadatatable will abort if
the results from querying Entrez are more then 100,000

Parsing

-f
For built-in parsing template only. Output the full table. It will
be silently ignored if using user supplied xpath file.

-x xpath.csv
If supplied, metadatatable will use read the xpath query from the
specified csv file, and -c and -f will be silently ignored.

-c case
Use built-in parsing template. Choose from (rnaseq, source). Default
is rnaseq. Note that "source" template is not implemented yet. This
argument currently is ignored.

Contributors

Tara Tufano
Lukas Wagner
Yadi Zhou

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
assets/img		assets/img
LICENSE		LICENSE
README.md		README.md
metadatatable.py		metadatatable.py
metaparse.py		metaparse.py
metatestxml.py		metatestxml.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MetadataTable

What's the problem?

Why should we solve it?

Workflow

Configuration

Example Invocation

Usage

Commands

Contributors

About

Releases 1

Packages

Contributors 4

Languages

License

NCBI-Hackathons/MetadataTable

Folders and files

Latest commit

History

Repository files navigation

MetadataTable

What's the problem?

Why should we solve it?

Workflow

Configuration

Example Invocation

Usage

Commands

Contributors

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 4

Languages

Packages