This is a Qiime2 plugin to analyze metabolomics data that utilizes GNPS.
Mass spectrometry (MS) detects molecules via charged surrogates called ions, which are measured by their mass-to-charge ratio (m/z). MS data contain MS1 spectra (i.e. spectra of all ions: their m/z values and respective abundances) and MS2 spectra (i.e. spectra of the structural fragments of a selected ion). MS2 spectra are generated by imparting excess internal energy into ions, which causes them to dissociate into smaller fragments. Collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) are common methods, applied to ionized non-volatile molecules, that impart this energy through collisions with gas molecules. One can collect both MS1 and MS2 spectra in a single experiment using data-dependent acquisition (DDA), a common approach in untargeted metabolomics: from each MS1 spectrum, the n most abundant m/z values are selected and fragmented one after another, and this cycle repeats throughout the analysis of a sample. Liquid chromatography (or another chemical separation technique) is often combined with mass spectrometry to i) reduce sample complexity, ii) provide orthogonal information (e.g. retention time), and iii) increase fragmentation coverage so that more unique ions are fragmented (e.g. isomers separated by chromatography can each be fragmented individually). Samples can be compared qualitatively and/or quantitatively using MS1 data, MS2 data, or a combination of the two.
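The top-n selection step of a DDA cycle can be illustrated in a few lines of shell. This is a simplified sketch, not how instrument software works: it assumes a hypothetical plain-text MS1 scan with one "m/z intensity" pair per line, and it ignores refinements such as charge-state filtering and dynamic exclusion that real acquisition methods typically apply. The function name `top_n_precursors` is made up for this example.

```shell
# Hypothetical helper (not part of Qiime2 or GNPS): pick the n most intense
# ions from one MS1 scan, as a DDA method would in each cycle.
# Input on stdin: one "mz intensity" pair per line.
top_n_precursors() {
  local n="${1:-5}"
  # Sort numerically by intensity (column 2), highest first, keep n rows,
  # and print only the m/z column.
  sort -k2,2nr | head -n "$n" | awk '{ print $1 }'
}

# Usage: select the 2 most intense ions from a toy MS1 scan.
printf '301.14 1200\n445.12 98000\n512.30 45000\n620.55 300\n' | top_n_precursors 2
```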
If data are collected using DDA or a similar method, multiple MS2 spectra of the same ion (i.e. molecule) can be present in the data. One of the initial processing steps performed in GNPS (via MScluster) is to collapse identical (or nearly identical) spectra into a single molecular feature. The number of fragmentation spectra measured for each molecular feature (unique ion) per sample, i.e. its spectral count, can be used as a semiquantitative estimate of abundance: the more abundant an ion is, the more often it is selected for fragmentation, so a higher spectral count suggests a more abundant molecular feature.
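Once spectra have been clustered, the spectral count of a feature in a sample is the number of MS2 spectra from that sample that were assigned to that feature. The sketch below assumes a hypothetical tab-separated input with one MS2 spectrum per line (sample name, then feature ID); this is not the GNPS or MScluster output format, and `spectral_counts` is an illustrative name only.

```shell
# Hypothetical helper: count MS2 spectra per (sample, feature) pair.
# Input on stdin: "sample<TAB>feature_id", one line per MS2 spectrum.
# Output: "sample<TAB>feature_id<TAB>spectral_count", sorted.
spectral_counts() {
  awk -F'\t' -v OFS='\t' '
    { count[$1 OFS $2]++ }                      # one increment per spectrum
    END { for (key in count) print key, count[key] }
  ' | LC_ALL=C sort                             # deterministic ordering
}

# Usage: three spectra from "milk" (two of feature F1), one from "yogurt".
printf 'milk\tF1\nmilk\tF1\nmilk\tF2\nyogurt\tF1\n' | spectral_counts
```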
Fundamentally, the spectral abundance observed at a given time in a mass spectrum is (imperfectly) related to concentration; however, integrating spectral abundance over time better represents the concentration of a particular compound in the sample. Extracted ion chromatograms (XICs) are generated for all observed m/z values in the MS1 spectra of a sample, and the area under each curve (i.e. the peak area) is determined via integration. An advantage of comparing samples by MS1 peak area is that quantification can be more accurate as well as more sensitive, particularly for low-abundance compounds: not every ion is among the top n most abundant peaks, so some ions never trigger an MS2 fragmentation event and therefore have no spectral count.
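The integration step can be sketched with the trapezoidal rule. This is a minimal illustration under stated assumptions: the input is a hypothetical, already extracted XIC for one m/z, given as "retention_time intensity" pairs sorted by retention time. Feature-finding tools such as MZmine2 typically add steps like chromatogram building, smoothing and peak deconvolution before integrating, and `xic_peak_area` is an illustrative name only.

```shell
# Hypothetical helper: integrate one extracted ion chromatogram (XIC)
# with the trapezoidal rule. Input on stdin: "retention_time intensity"
# pairs, one per line, sorted by retention time.
xic_peak_area() {
  awk '
    # Add the trapezoid between the previous point and the current one.
    NR > 1 { area += ($1 - prev_rt) * ($2 + prev_i) / 2 }
    { prev_rt = $1; prev_i = $2 }
    END { printf "%.4f\n", area + 0 }
  '
}

# Usage: a triangular toy peak, 2 minutes wide with apex intensity 10.
printf '0 0\n1 10\n2 0\n' | xic_peak_area
```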
Install Qiime2 and activate environment by following the steps described here.
Test if the Qiime2 installation was successful by typing the following command:
qiime
If Qiime2 was successfully installed, a list of the available commands and options will appear.
To install the q2_metabolomics plugin, you have two options, though the conda installation is not currently supported:
conda install -c mwang87 q2-metabolomics
or
git clone https://github.com/mwang87/q2_metabolomics
cd q2_metabolomics
pip install -e .
Test if the plugin was installed correctly by repeating the following command:
qiime
If successful, the metabolomics plugin is now listed in the options.
qiime metabolomics
This function will take as input a set of mass spectrometry files (mzXML or mzML) and a manifest file to produce a biom qza file by processing the data via MS2 spectral counts processing at GNPS.
qiime metabolomics import-gnpsnetworkingclustering
This function will take as input an existing GNPS Molecular Networking task and a manifest file to produce a biom qza file.
qiime metabolomics import-gnpsnetworkingclusteringtask
This function will take as input an existing Bucket Table from GNPS Molecular Networking Clustering to produce a biom qza file.
qiime metabolomics import-gnpsnetworkingclusteringbuckettable
This function will take as input a feature quantification file from MZmine2 and a manifest file and produce a biom qza file.
qiime metabolomics import-mzmine2
In this tutorial, we will be utilizing two sets of data to show different analyses we can do.
- Cross-Sectional Data - Discrete cohorts of different sample types
- Longitudinal Data - Time series data of yogurt fermentation
These datasets and how to download them are described below.
In this tutorial, we will download metabolomics data for use with the metabolomics plugin for qiime2. The dataset we will use for this tutorial contains cross sectional data from plant or animal sources.
Here we create a directory to hold the data, download the actual mass spectrometry data/metadata, and reorganize.
mkdir Example_CrossSectional
cd Example_CrossSectional
wget -m ftp://massive.ucsd.edu/MSV000082820/peak/
wget -m ftp://massive.ucsd.edu/MSV000082820/other/
mv massive.ucsd.edu/MSV000082820/ .
rm -rf massive.ucsd.edu/
mv MSV000082820/peak/data/ MSV000082820/other/
rm -rf MSV000082820/peak/
mv MSV000082820/other/data/ MSV000082820/
mv MSV000082820/other/ MSV000082820/data/
mv -v MSV000082820/data/other/* MSV000082820/data/
rm -rf MSV000082820/data/other/
mv -v MSV000082820/data/* MSV000082820/
rm -rf MSV000082820/data/
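The cross-sectional and longitudinal downloads run the same sequence of `mv` and `rm` commands and differ only in the MassIVE accession. The sketch below wraps that reorganization in a shell function. It assumes `wget -m` has already mirrored the peak/ and other/ folders into `massive.ucsd.edu/<accession>/` in the current directory; `reorganize_massive` is a hypothetical name, not part of any tool.

```shell
# Hypothetical helper: repeat the reorganization steps above for any
# MassIVE accession, after `wget -m` has created massive.ucsd.edu/<accession>/.
# The body runs in a subshell with `set -e`, so a failed step stops the
# sequence before any later `rm -rf` can run.
reorganize_massive() (
  set -e
  acc="$1"
  mv "massive.ucsd.edu/${acc}/" .
  rm -rf massive.ucsd.edu/
  mv "${acc}/peak/data/" "${acc}/other/"
  rm -rf "${acc}/peak/"
  mv "${acc}/other/data/" "${acc}/"
  mv "${acc}/other/" "${acc}/data/"
  mv -v "${acc}/data/other/"* "${acc}/data/"
  rm -rf "${acc}/data/other/"
  mv -v "${acc}/data/"* "${acc}/"
  rm -rf "${acc}/data/"
)

# Usage, e.g. inside Example_Longitudinal after both wget -m commands:
#   reorganize_massive MSV000082821
```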
Note: The files contained within the folder “other” were created for this example dataset. If you want to recreate the example analyses below with your own dataset, you will have to create your own manifest.csv (see the manifest file description below) and metadata.txt files.
In this tutorial, we will download metabolomics data for use with the metabolomics plugin for Qiime2. The dataset we will use for this tutorial contains longitudinal data on the fermentation process of milk to yogurt.
Here we create a directory to hold the data, download the actual mass spectrometry data/metadata, and reorganize.
mkdir Example_Longitudinal
cd Example_Longitudinal
wget -m ftp://massive.ucsd.edu/MSV000082821/peak/
wget -m ftp://massive.ucsd.edu/MSV000082821/other/
mv massive.ucsd.edu/MSV000082821/ .
rm -rf massive.ucsd.edu/
mv MSV000082821/peak/data/ MSV000082821/other/
rm -rf MSV000082821/peak/
mv MSV000082821/other/data/ MSV000082821/
mv MSV000082821/other/ MSV000082821/data/
mv -v MSV000082821/data/other/* MSV000082821/data/
rm -rf MSV000082821/data/other/
mv -v MSV000082821/data/* MSV000082821/
rm -rf MSV000082821/data/
Note: The files contained within the folder “other” were created for this example dataset. If you want to recreate the example analyses below with your own dataset, you will have to create your own manifest.csv (see the manifest file description below) and metadata.txt files.
To use MS1 peak areas in Qiime2, you will need MZmine2. A detailed tutorial for feature finding with MZmine2 can be found here.
After finding all features according to the tutorial above, perform the following steps to export the features and their quantifications in a format compatible with this plugin's import-mzmine2 command.
Select Export->CSV File
- Specify .csv file name and location
- Check “Export row ID”, “Export row m/z” and “Export row retention time”
- Check “Peak area”
- Hit OK
- The generated .csv file can now be used directly for further processing in Qiime2 in the Feature Based Quantification Analysis
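Before importing, it can help to list the sample columns in the exported CSV so they can be compared against the file paths in your manifest. This sketch assumes MZmine2 names each quantification column "<file name> Peak area"; check this against your own export, since the header format may differ between MZmine versions. `list_peak_area_columns` is a hypothetical helper.

```shell
# Hypothetical helper: print the sample names that have a "Peak area"
# column in an MZmine2 CSV export (one name per line).
list_peak_area_columns() {
  # Take the header row, drop Windows line endings, split it into one
  # column name per line, keep the peak-area columns, strip the suffix.
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | grep ' Peak area$' | sed 's/ Peak area$//'
}

# Usage:
#   list_peak_area_columns quantification_table_categorical.csv
```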
The manifest file specifies the location of the files that will be processed by the metabolomics plugin. It is a .CSV (comma-separated values) table with two columns. The first column gives the ‘sample_name’ for each file, and the second column gives its file path relative to the directory from which the qiime commands are run. The import-gnpsnetworkingclustering and import-mzmine2 commands both use the same manifest file format.
View of the manifest file (.CSV format). The first column indicates the sample_name for each file, while the second column indicates its corresponding relative file path. The example file can be downloaded here.
sample_name | filepath |
---|---|
sample1 | data/121114_nanoDESI_polar_ISP2_control_DD_MS2.mzXML |
sample2 | data/121119_VM37_FT-IT.mzXML |
sample3 | data/121207_proximicin_B_DD_MS2.mzXML |
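For your own data, a manifest can be generated from the files in a data folder. A minimal sketch, assuming all .mzXML/.mzML files live in data/ and that no file path contains a comma. It uses each file name without its extension as the sample_name (the example above uses sample1, sample2, ... instead); whichever names you choose must match the sample names in your metadata file. `make_manifest` is a hypothetical helper.

```shell
# Hypothetical helper (not part of q2_metabolomics): write a manifest CSV
# listing every .mzXML/.mzML file in a data folder.
# Usage: make_manifest [data_dir] [output_csv]
make_manifest() {
  local datadir="${1:-data}" out="${2:-manifest.csv}" f name
  printf 'sample_name,filepath\n' > "$out"
  for f in "$datadir"/*.mzXML "$datadir"/*.mzML; do
    [ -e "$f" ] || continue            # skip a pattern that matched nothing
    name=$(basename "$f")
    name="${name%.*}"                  # sample name = file name without extension
    printf '%s,%s\n' "$name" "$f" >> "$out"
  done
}

# Usage, from the directory where the qiime commands will be run:
#   make_manifest data manifest.csv
```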
GNPS login credentials will be specified in json format, in the following example:
{
"username": "your username",
"password": "your password"
}
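One way to create the credentials file from the command line is a heredoc. The values below are the placeholders from the example above and must be replaced with your own GNPS login; restricting the file permissions is a suggested precaution, not a requirement of the plugin.

```shell
# Write credentials.json with placeholder values; replace them with your
# GNPS username and password before running the import commands.
cat > credentials.json <<'EOF'
{
  "username": "your username",
  "password": "your password"
}
EOF

# The file stores a plain-text password, so make it readable only by you.
chmod 600 credentials.json
```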
To create a GNPS account, check out the GNPS documentation.
In this tutorial, we will learn how to analyze metabolomics data using the metabolomics plugin for Qiime2. We will leverage Global Natural Products Social Molecular Networking (GNPS) to make metabolomics data accessible within the Qiime2 platform. We will then investigate the data by running some simple descriptive statistical analyses available through Qiime2.
This tutorial covers two analysis approaches, each applied to both example datasets:
- Spectrum Count Qualitative Analysis
- Food Cross Sectional Study
- Longitudinal Study
- Feature Based Quantification Analysis
- Food Cross Sectional Study
- Longitudinal Study
The dataset we will use for this tutorial contains a) cross sectional data from plant or animal sources and b) longitudinal data on the fermentation process of milk to yogurt.
In this tutorial, we will learn how to analyze metabolomics data using spectrum count qualitative analysis. The dataset we will use for this tutorial contains cross sectional data from plant or animal sources.
Before you submit your files to GNPS, navigate to the folder where your raw data and manifest.csv file are located:
cd MSV000082820/
Now activate your qiime2 conda environment by typing:
source activate qiime2-2018.6
Now we are ready to use qiime2 commands with our data. As a first step, we will use the import-gnpsnetworkingclustering command to perform GNPS mass spectral network analysis:
qiime metabolomics import-gnpsnetworkingclustering \
--p-manifest manifest.csv \
--p-credentials credentials.json \
--o-feature-table categorical_ms2
Provide the names of your manifest file and your GNPS credentials file, and an output name. Your job will appear in your GNPS job list once the files have been uploaded to GNPS, and you can track its progress there. Once the GNPS network analysis is finished, you will find the GNPS feature table, categorical_ms2.qza, in your current directory.
To generate visual and tabular summaries of your feature table, use the qiime feature-table summarize function from the same directory:
qiime feature-table summarize \
--i-table categorical_ms2.qza \
--o-visualization table.qzv \
--m-sample-metadata-file metadata.txt
To generate a tabular view of your metadata file, you can use the qiime metadata tabulate function. The output visualization enables interactive filtering, sorting, and exporting to common file formats:
qiime metadata tabulate \
--m-input-file metadata.txt \
--o-visualization tabulated-metadata.qzv
To compute the Shannon diversity index for all samples contained within your mass spectral feature table, use the qiime diversity alpha function:
qiime diversity alpha \
--i-table categorical_ms2.qza \
--p-metric shannon \
--o-alpha-diversity shannon.qza
The output file ‘shannon.qza’ contains the Shannon diversity index for each sample. A .qza file is a zip archive (data plus provenance) rather than a plain-text file, so to read the values, export it with qiime tools export or unzip it.
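The Shannon index of a sample is H = -Σ p_i log(p_i), where p_i is feature i's share of the sample's total abundance (spectral counts or peak areas). The sketch below computes it for one sample's abundances with log base 2, which is scikit-bio's default for `shannon`; the base used by your Qiime2 release may differ, so check before comparing values. `shannon` here is an illustrative shell function, not a Qiime2 command.

```shell
# Illustrative helper: Shannon diversity H = -sum(p_i * log2(p_i)) for one
# sample. Input on stdin: one non-negative feature abundance per line.
shannon() {
  awk '
    $1 > 0 { n[NR] = $1; total += $1 }        # zero abundances contribute 0
    END {
      h = 0
      for (i in n) { p = n[i] / total; h -= p * log(p) / log(2) }
      printf "%.4f\n", h
    }
  '
}

# Usage: four equally abundant features give log2(4) = 2 bits.
printf '5\n5\n5\n5\n' | shannon
```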
To compute all pairwise Canberra distances, use the qiime diversity beta function:
qiime diversity beta \
--i-table categorical_ms2.qza \
--p-metric canberra \
--output-dir canberra_qiime2
The output is a distance matrix containing the Canberra distances between all pairs of samples in the mass spectral feature table. You can choose a different distance metric with the --p-metric option (e.g. braycurtis, jaccard, mahalanobis, euclidean, etc.).
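For two samples with feature abundances x and y, the Canberra distance is d(x, y) = Σ |x_i - y_i| / (|x_i| + |y_i|), where positions in which both values are 0 contribute 0 (the convention SciPy uses). Because each term is at most 1, a low-abundance feature weighs as much as an abundant one. A minimal sketch for two vectors; `canberra` is an illustrative shell function, not the Qiime2 implementation.

```shell
# Illustrative helper: Canberra distance between two equal-length
# abundance vectors given as space-separated strings.
canberra() {
  awk -v a="$1" -v b="$2" '
    function abs(v) { return v < 0 ? -v : v }
    BEGIN {
      n = split(a, x, " "); m = split(b, y, " ")
      if (n != m) { print "vectors must have the same length" > "/dev/stderr"; exit 1 }
      d = 0
      for (i = 1; i <= n; i++) {
        xi = x[i] + 0; yi = y[i] + 0          # force numeric values
        den = abs(xi) + abs(yi)
        if (den > 0) d += abs(xi - yi) / den  # skip 0/0 terms
      }
      printf "%.4f\n", d
    }'
}

# Usage: two samples measured on the same three features.
canberra "1 2 0" "3 2 0"
```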
The resulting distance matrix can be used for principal coordinates analysis (PCoA). To compute the PCoA from the Canberra distance matrix created above, type:
qiime diversity pcoa \
--i-distance-matrix canberra_qiime2/distance_matrix.qza \
--output-dir pcoa_canberra_qiime2
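As background, classical PCoA (metric multidimensional scaling) turns the n × n distance matrix with entries d_ij into sample coordinates as follows. This is the textbook formulation, not a description of the exact code Qiime2 runs. Here 1 is a vector of n ones, U_k and Λ_k hold the k leading eigenvectors and eigenvalues of B, and each row of X_k gives one sample's coordinates. With non-Euclidean distances such as Canberra, some eigenvalues of B can be negative; only axes with positive eigenvalues give real coordinates.

```latex
a_{ij} = -\tfrac{1}{2}\, d_{ij}^{2}, \qquad
\mathbf{J} = \mathbf{I}_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}, \qquad
\mathbf{B} = \mathbf{J}\mathbf{A}\mathbf{J}
\\[4pt]
\mathbf{B} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\top}, \qquad
\mathbf{X}_k = \mathbf{U}_k\,\boldsymbol{\Lambda}_k^{1/2}
```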
To create an interactive ordination plot of the PCoA created above, with integrated sample metadata, use the qiime emperor plot function. Make sure that the sample names in the metadata file match the sample names in the Canberra distance_matrix.qza file:
qiime emperor plot \
--i-pcoa pcoa_canberra_qiime2/pcoa.qza \
--m-metadata-file metadata.txt \
--output-dir emperor_qiime2
To view the PCoA plot, type:
qiime tools view emperor_qiime2/visualization.qzv
Or drag and drop emperor_qiime2/visualization.qzv to https://view.qiime2.org/
In this tutorial, we will learn how to analyze metabolomics data using spectrum count qualitative analysis. The dataset we will use for this tutorial contains longitudinal data on the fermentation process of milk to yogurt.
Before you submit your files to GNPS, navigate to the folder where your raw data and manifest.csv file are located:
cd MSV000082821/
Now activate your qiime2 conda environment by typing:
source activate qiime2-2018.6
Now we are ready to use Qiime2 commands with our data. As a first step, we will use the import-gnpsnetworkingclustering command to perform GNPS mass spectral network analysis:
qiime metabolomics import-gnpsnetworkingclustering \
--p-manifest manifest_longitudinal.csv \
--p-credentials credentials.json \
--o-feature-table longitudinal_ms2
Provide your manifest file, your GNPS credentials file, and an output name of your choice. Once the GNPS network analysis is finished, the GNPS feature table is saved as longitudinal_ms2.qza.
To generate visual and tabular summaries of your feature table, you can use the qiime feature-table summarize function:
qiime feature-table summarize \
--i-table longitudinal_ms2.qza \
--o-visualization tableSummary_spectralCounts_longitudinal.qzv
This will create a Qiime2 visualization, tableSummary_spectralCounts_longitudinal.qzv, which you can open by typing:
qiime tools view tableSummary_spectralCounts_longitudinal.qzv
Or drag and drop to: https://view.qiime2.org/
Generate a tabular view of Metadata
To generate a tabular view of your metadata file, you can use the qiime metadata tabulate function. The output visualization enables interactive filtering, sorting, and exporting to common file formats:
qiime metadata tabulate \
--m-input-file metadata_longitudinal.txt \
--o-visualization tabulated-metadata.qzv
Compute the Shannon diversity index for all samples
To compute the Shannon diversity index for all samples contained within your mass spectral feature table, use the qiime diversity alpha function:
qiime diversity alpha \
--i-table longitudinal_ms2.qza \
--p-metric shannon \
--o-alpha-diversity shannon.qza
The output file ‘shannon.qza’ contains the Shannon diversity index for each sample. A .qza file is a zip archive (data plus provenance) rather than a plain-text file, so to read the values, export it with qiime tools export or unzip it.
Compute pairwise canberra distances and visualization in interactive PCoA space
To compute all pairwise Canberra distances, use the qiime diversity beta function:
qiime diversity beta \
--i-table longitudinal_ms2.qza \
--p-metric canberra \
--output-dir canberra_qiime2
The output is a distance matrix containing the Canberra distances between all pairs of samples in the mass spectral feature table. You can choose a different distance metric with the --p-metric option (e.g. braycurtis, jaccard, mahalanobis, euclidean, etc.).
The resulting distance matrix can be used for principal coordinates analysis (PCoA). To compute the PCoA from the Canberra distance matrix created above, type:
qiime diversity pcoa \
--i-distance-matrix canberra_qiime2/distance_matrix.qza \
--output-dir pcoa_canberra_qiime2
To create an interactive ordination plot of the PCoA created above, with integrated sample metadata, use the qiime emperor plot function. Make sure that the sample names in the metadata file match the sample names in the Canberra distance_matrix.qza file:
qiime emperor plot \
--i-pcoa pcoa_canberra_qiime2/pcoa.qza \
--m-metadata-file metadata_longitudinal.txt \
--output-dir emperor_qiime2
To view the PCoA plot, type:
qiime tools view emperor_qiime2/visualization.qzv
Or drag and drop emperor_qiime2/visualization.qzv to https://view.qiime2.org/
Here is an example file for emperor_qiime2/visualization.qzv
You should be able to create the following visualization:
Here, for example, we can depict chemical differences between milk samples during the fermentation process to yogurt (black: milk; red/blue color gradient: milk with yogurt culture at different stages of fermentation, from 0 to 58 hours; black: yogurt).
Test whether groups of samples are significantly different from one another using a Permutational multivariate analysis of variance (PERMANOVA)
To test whether the chemistry of the milk samples differs significantly during the fermentation process to yogurt, we can apply a Permutational multivariate analysis of variance (PERMANOVA) to our categorical metadata category ‘age’ using qiime2:
To run this test, provide the distance matrix in the canberra_qiime2 directory, the longitudinal metadata file, the metadata column to test (in this case 'age'), an output visualization name, and the --p-pairwise flag, which additionally tests each pair of groups.
qiime diversity beta-group-significance \
--i-distance-matrix canberra_qiime2/distance_matrix.qza \
--m-metadata-file metadata_longitudinal.txt \
--m-metadata-column age \
--o-visualization PERMANOVA_spectralCounts_longitudinal.qzv \
--p-pairwise
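For reference, PERMANOVA (the default method of beta-group-significance) summarizes group separation with a pseudo-F statistic computed directly from the distance matrix. In the standard formulation below, N is the number of samples, a the number of groups, n_g the size of group g, and d_ij the distance between samples i and j. Significance is assessed by comparing the observed F with values recomputed after randomly permuting the group labels.

```latex
SS_T = \frac{1}{N}\sum_{i<j} d_{ij}^{2}, \qquad
SS_W = \sum_{g=1}^{a}\frac{1}{n_g}\sum_{\substack{i<j \\ i,j \in g}} d_{ij}^{2}, \qquad
SS_A = SS_T - SS_W
\\[4pt]
F = \frac{SS_A/(a-1)}{SS_W/(N-a)}
```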
To view the results in PERMANOVA_spectralCounts_longitudinal.qzv, type:
qiime tools view PERMANOVA_spectralCounts_longitudinal.qzv
Sometimes before performing any of the above analyses you will want to filter out samples from your original mass spectral feature table. For example, large datasets may be computationally intensive, so filtering them down to just the data we’re interested in before downstream analysis can be advantageous.
You can do this directly from your mass spectral feature table in the .qza format using the qiime feature-table filter-samples function. To create a feature table containing only milk samples during different stages of the fermentation process to yogurt, without including the yogurt samples (exclude ‘not applicable’ in the metadata category ‘age’), type:
qiime feature-table filter-samples \
--i-table longitudinal_ms2.qza \
--m-metadata-file metadata_longitudinal.txt \
--p-where "age='not applicable'" \
--p-exclude-ids \
--o-filtered-table age-table.qza
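A matching metadata file without the ‘not applicable’ samples, so that ‘age’ holds only numeric values, can be produced with a small awk filter. A minimal sketch, assuming the metadata is tab-separated (as Qiime2 metadata files are) and has a column named age; `drop_metadata_value` is a hypothetical helper, and the output name matches the metadata_longitudinal_age.txt file used for the Emperor custom axis in this section.

```shell
# Hypothetical helper: remove samples whose value in a named column equals
# a given string (default "not applicable") from a tab-separated metadata
# file. The header line and "#"-prefixed directive lines are kept unchanged.
drop_metadata_value() {
  local column="$1" value="${2:-not applicable}"
  awk -F'\t' -v col="$column" -v val="$value" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; print; next }
    /^#/    { print; next }                    # e.g. "#q2:types" rows
    !c      { print "column not found: " col > "/dev/stderr"; exit 1 }
    $c != val                                  # keep rows with other values
  '
}

# Usage:
#   drop_metadata_value age < metadata_longitudinal.txt > metadata_longitudinal_age.txt
```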
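You can now repeat all of the above analyses by substituting longitudinal_ms2.qza with the new filtered feature table, age-table.qza. Alternatively, you can re-run GNPS networking on only the files of interest by providing a manifest that lists just those files (here manifest_longitudinal_age.csv):
qiime metabolomics import-gnpsnetworkingclustering \
--p-manifest manifest_longitudinal_age.csv \
--p-credentials credentials.json \
--o-feature-table age-feature_table
Then compute the Canberra distances and PCoA on the new table (use age-table.qza instead if you created it with filter-samples):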
qiime diversity beta \
--i-table age-feature_table.qza \
--p-metric canberra \
--output-dir canberra_age_qiime2
qiime diversity pcoa \
--i-distance-matrix canberra_age_qiime2/distance_matrix.qza \
--output-dir pcoa_canberra_age_qiime2
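You can also use a numeric sample metadata column as an axis in the Emperor plot. To use the ‘age’ column as a custom axis, provide a metadata file in which ‘age’ contains only numeric values (here metadata_longitudinal_age.txt, with the ‘not applicable’ samples removed) and type: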
qiime emperor plot \
--i-pcoa pcoa_canberra_age_qiime2/pcoa.qza \
--m-metadata-file metadata_longitudinal_age.txt \
--p-custom-axes age \
--output-dir emperor_qiime2_custom_axe_age
qiime tools view emperor_qiime2_custom_axe_age/visualization.qzv
Here is an example file for emperor_qiime2_custom_axe_age/visualization.qzv
You should be able to create the following visualization:
To create the quantification table (the input for --p-quantificationtable), please follow the steps outlined in the tutorial “Qiime2 - MZmine export – Documentation”.
In this tutorial, we will learn how to analyze metabolomics data using feature based quantification analysis. The dataset we will use for this tutorial contains cross sectional data from plant or animal sources.
This step creates a .qza file for further analysis in Qiime2:
qiime metabolomics import-mzmine2 \
--p-manifest manifest.csv \
--p-quantificationtable quantification_table_categorical.csv \
--o-feature-table feature_mzmine2_cat.qza
This step creates a .qzv file for visualization with Qiime2 View:
qiime feature-table summarize \
--i-table feature_mzmine2_cat.qza \
--o-visualization tableSummary_peakAreas_cross-sectional.qzv \
--m-sample-metadata-file metadata.txt
This will create a Qiime2 visualization, tableSummary_peakAreas_cross-sectional.qzv, which you can open by typing:
qiime tools view tableSummary_peakAreas_cross-sectional.qzv
Or drag and drop to: https://view.qiime2.org/
To compute the Shannon diversity index for all samples contained within your MS1 feature table, use the Qiime2 diversity alpha function:
qiime diversity alpha \
--i-table feature_mzmine2_cat.qza \
--p-metric shannon \
--o-alpha-diversity shannon.qza
The output file ‘shannon.qza’ contains the Shannon diversity index for each sample. A .qza file is a zip archive (data plus provenance) rather than a plain-text file, so to read the values, export it with qiime tools export or unzip it.
To compute all pairwise Canberra distances, use the Qiime2 diversity beta function:
qiime diversity beta \
--i-table feature_mzmine2_cat.qza \
--p-metric canberra \
--output-dir canberra_qiime2
The output is a distance matrix containing the Canberra distances between all pairs of samples in the mass spectral feature table. You can choose a different distance metric with the --p-metric option (e.g. braycurtis, jaccard, mahalanobis, euclidean, etc.).
The resulting distance matrix can be used for principal coordinates analysis (PCoA). To compute the PCoA from the Canberra distance matrix created above, type:
qiime diversity pcoa \
--i-distance-matrix canberra_qiime2/distance_matrix.qza \
--output-dir pcoa_canberra_qiime2
To create an interactive ordination plot of the PCoA created above, with integrated sample metadata, use the qiime emperor plot function. Make sure that the sample names in the metadata file match the sample names in the Canberra distance_matrix.qza file:
qiime emperor plot \
--i-pcoa pcoa_canberra_qiime2/pcoa.qza \
--m-metadata-file metadata.txt \
--output-dir emperor_qiime2
Drag and drop emperor_qiime2/visualization.qzv to https://view.qiime2.org/
Here is an example file for emperor_qiime2/visualization.qzv
You should be able to create the following visualization:
In this tutorial, we will learn how to analyze metabolomics data using feature based quantification analysis. The dataset we will use for this tutorial contains longitudinal data on the fermentation process of milk to yogurt.
This step creates a .qza file for further analysis in Qiime2:
qiime metabolomics import-mzmine2 \
--p-manifest manifest_longitudinal.csv \
--p-quantificationtable quantification_table_longitudinal.csv \
--o-feature-table feature_mzmine2_long
To generate visual and tabular summaries of your feature table, you can use the qiime feature-table summarize function:
qiime feature-table summarize \
--i-table feature_mzmine2_long.qza \
--o-visualization table_long.qzv \
--m-sample-metadata-file metadata_longitudinal.txt
This will create a Qiime2 visualization, table_long.qzv, which you can open by typing:
qiime tools view table_long.qzv
Or drag and drop to: https://view.qiime2.org/
To generate a tabular view of your metadata file, you can use the [qiime metadata tabulate](https://docs.qiime2.org/2017.10/plugins/available/metadata/tabulate/) function. The output visualization enables interactive filtering, sorting, and exporting to common file formats:
qiime metadata tabulate \
--m-input-file metadata_longitudinal.txt \
--o-visualization tabulated-metadata.qzv
This will create a Qiime2 visualization, tabulated-metadata.qzv, which you can open by typing:
qiime tools view tabulated-metadata.qzv
Or drag and drop to: https://view.qiime2.org/
Compute the Shannon diversity index for all samples
To compute the Shannon diversity index for all samples contained within your MZmine2 (MS1) feature table, use the qiime diversity alpha function:
qiime diversity alpha \
--i-table feature_mzmine2_long.qza \
--p-metric shannon \
--o-alpha-diversity shannon.qza
The output file ‘shannon.qza’ contains the Shannon diversity index for each sample. A .qza file is a zip archive (data plus provenance) rather than a plain-text file, so to read the values, export it with qiime tools export or unzip it.
To compute all pairwise Canberra distances, use the qiime diversity beta function:
qiime diversity beta \
--i-table feature_mzmine2_long.qza \
--p-metric canberra \
--output-dir canberra_qiime2
The output is a distance matrix containing the Canberra distances between all pairs of samples in the mass spectral feature table. You can choose a different distance metric with the --p-metric option (e.g. braycurtis, jaccard, mahalanobis, euclidean, etc.).
The resulting distance matrix can be used for principal coordinates analysis (PCoA). To compute the PCoA from the Canberra distance matrix created above, type:
qiime diversity pcoa \
--i-distance-matrix canberra_qiime2/distance_matrix.qza \
--output-dir pcoa_canberra_qiime2
To create an interactive ordination plot of the PCoA created above, with integrated sample metadata, use the qiime emperor plot function. Make sure that the sample names in the metadata file match the sample names in the Canberra distance_matrix.qza file:
qiime emperor plot \
--i-pcoa pcoa_canberra_qiime2/pcoa.qza \
--m-metadata-file metadata_longitudinal.txt \
--output-dir emperor_qiime2
To view the PCoA plot, type:
qiime tools view emperor_qiime2/visualization.qzv
Or drag and drop emperor_qiime2/visualization.qzv to https://view.qiime2.org/
Here is an example file for emperor_qiime2/visualization.qzv
You should be able to create the following visualization:
Test whether groups of samples are significantly different from one another using a Permutational multivariate analysis of variance (PERMANOVA)
To test whether the chemistry of the milk samples differs significantly during the fermentation process to yogurt, we can apply a Permutational multivariate analysis of variance (PERMANOVA) to our categorical metadata category ‘age’ using qiime2:
To run this test, provide the distance matrix in the canberra_qiime2 directory, the longitudinal metadata file, the metadata column to test (in this case 'age'), an output visualization name, and the --p-pairwise flag, which additionally tests each pair of groups.
qiime diversity beta-group-significance \
--i-distance-matrix canberra_qiime2/distance_matrix.qza \
--m-metadata-file metadata_longitudinal.txt \
--m-metadata-column age \
--o-visualization PERMANOVA_peakAreas_longitudinal.qzv \
--p-pairwise
To view the results in PERMANOVA_peakAreas_longitudinal.qzv, type:
qiime tools view PERMANOVA_peakAreas_longitudinal.qzv