-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent BAM output from identical inputs #72
Comments
I have observed a similar issue as well. Any advice greatly appreciated. |
Sorry not to reply earlier, my spam filter decided the snap mailing list was something I shouldn't see. This seems very bad, and not something I have seen before. I wonder if it is a Linux only problem, nearly all the dev work is done on widows. It's not even writing as many reads as are in the input file, so maybe somehow the process is exciting before all of the buffers are flushed; it is a very small input file. Anyway, I'm on vacation now, so I probably can't get to it until next week. Maybe Ravi has some time earlier. Sent from my Windows Phone From: Richard Smith-Unnamailto:[email protected] I noticed that on some of the test data we use for transrate, I occasionally see a run where the number of alignments in the SNAP output BAM file is much lower than what it should be. The input dataset can be found herehttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fbintray.com%2fartifact%2fdownload%2fblahah%2fgeneric%2fexample_data.tar.gz&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=xGjmpZVyyYgmibjhyYGB5ZDBB5WAbv3H67p5TjtOHfg%3d, and the two sets of results are linked below:
The version of SNAP used was v1.0dev.96 commands used the commands were identical between runs ./snap-aligner index example_data/one/assembly.fa assembly -s 23 -t8 -bSpace -locationSize 4 It seems as though the alignment process completed very similarly in both runs, because the logged output was identical except for the speed: run 1 (logs) Loading index from directory... 0s. 36562 bases, seed size 23 run 2 (logs) Loading index from directory... 0s. 36562 bases, seed size 23 However, comparing the BAMfiles written out, we can see that run1 failed to report a large proportion of the alignments: run1 (bamtools stats) Total reads: 11800 run2 (bamtools stats) Total reads: 28714 Both commands had an exit status of 0, so SNAP didn't throw any error. I would be very grateful for your help in figuring out what's happening - at the moment this bug means that our software that depends on SNAP produces results that are not reproducible, so it's a priority for me. — |
Even though I’m on vacation, I started looking at this, and the dataset you provide doesn’t have the same fasta as the example you give. It has transcripts.fa rather than one/assembly.fa. My guess is that this doesn’t matter in the least, since it’s probably unrelated to the actual alignment, but in case you think it might be, please let me know. --Bill From: Richard Smith-Unna [mailto:[email protected]] I noticed that on some of the test data we use for transrate, I occasionally see a run where the number of alignments in the SNAP output BAM file is much lower than what it should be. The input dataset can be found herehttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fbintray.com%2fartifact%2fdownload%2fblahah%2fgeneric%2fexample_data.tar.gz&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=xGjmpZVyyYgmibjhyYGB5ZDBB5WAbv3H67p5TjtOHfg%3d, and the two sets of results are linked below:
The version of SNAP used was v1.0dev.96 commands used the commands were identical between runs ./snap-aligner index example_data/one/assembly.fa assembly -s 23 -t8 -bSpace -locationSize 4 ./snap-aligner paired assembly example_data/left.fq example_data/right.fq \ -o left.fq.right.fq.assembly.bam \ -s 0 1000 \ -H 300000 -h 2000 \ -d 30 -t 8 -b -M -D 5 \ -om 5 -omax 10 It seems as though the alignment process completed very similarly in both runs, because the logged output was identical except for the speed: run 1 (logs) Loading index from directory... 0s. 36562 bases, seed size 23 Aligning. Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s) 20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 173,913 0 Welcome to SNAP version 1.0dev.96. run 2 (logs) Loading index from directory... 0s. 36562 bases, seed size 23 Aligning. Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s) 20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 160,000 0 Welcome to SNAP version 1.0dev.96. However, comparing the BAMfiles written out, we can see that run1 failed to report a large proportion of the alignments: run1 (bamtools stats) Total reads: 11800 Mapped reads: 11800 (100%) Forward strand: 5900 (50%) Reverse strand: 5900 (50%) Failed QC: 0 (0%) Duplicates: 0 (0%) Paired-end reads: 11800 (100%) 'Proper-pairs': 11800 (100%) Both pairs mapped: 11800 (100%) Read 1: 5900 Read 2: 5900 Singletons: 0 (0%) run2 (bamtools stats) Total reads: 28714 Mapped reads: 28714 (100%) Forward strand: 14357 (50%) Reverse strand: 14357 (50%) Failed QC: 0 (0%) Duplicates: 0 (0%) Paired-end reads: 28714 (100%) 'Proper-pairs': 28714 (100%) Both pairs mapped: 28714 (100%) Read 1: 14357 Read 2: 14357 Singletons: 0 (0%) Both commands had an exit status of 0, so SNAP didn't throw any error. I would be very grateful for your help in figuring out what's happening - at the moment this bug means that our software that depends on SNAP produces results that are not reproducible, so it's a priority for me. — |
I couldn’t get this to repro either in Windows or Linux. What OS version are you using, and how often do you see the truncated output? Maybe I just haven’t tried it enough times. --Bill From: Richard Smith-Unna [mailto:[email protected]] I noticed that on some of the test data we use for transrate, I occasionally see a run where the number of alignments in the SNAP output BAM file is much lower than what it should be. The input dataset can be found herehttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fbintray.com%2fartifact%2fdownload%2fblahah%2fgeneric%2fexample_data.tar.gz&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=xGjmpZVyyYgmibjhyYGB5ZDBB5WAbv3H67p5TjtOHfg%3d, and the two sets of results are linked below:
The version of SNAP used was v1.0dev.96 commands used the commands were identical between runs ./snap-aligner index example_data/one/assembly.fa assembly -s 23 -t8 -bSpace -locationSize 4 ./snap-aligner paired assembly example_data/left.fq example_data/right.fq \ -o left.fq.right.fq.assembly.bam \ -s 0 1000 \ -H 300000 -h 2000 \ -d 30 -t 8 -b -M -D 5 \ -om 5 -omax 10 It seems as though the alignment process completed very similarly in both runs, because the logged output was identical except for the speed: run 1 (logs) Loading index from directory... 0s. 36562 bases, seed size 23 Aligning. Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s) 20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 173,913 0 Welcome to SNAP version 1.0dev.96. run 2 (logs) Loading index from directory... 0s. 36562 bases, seed size 23 Aligning. Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s) 20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 160,000 0 Welcome to SNAP version 1.0dev.96. However, comparing the BAMfiles written out, we can see that run1 failed to report a large proportion of the alignments: run1 (bamtools stats) Total reads: 11800 Mapped reads: 11800 (100%) Forward strand: 5900 (50%) Reverse strand: 5900 (50%) Failed QC: 0 (0%) Duplicates: 0 (0%) Paired-end reads: 11800 (100%) 'Proper-pairs': 11800 (100%) Both pairs mapped: 11800 (100%) Read 1: 5900 Read 2: 5900 Singletons: 0 (0%) run2 (bamtools stats) Total reads: 28714 Mapped reads: 28714 (100%) Forward strand: 14357 (50%) Reverse strand: 14357 (50%) Failed QC: 0 (0%) Duplicates: 0 (0%) Paired-end reads: 28714 (100%) 'Proper-pairs': 28714 (100%) Both pairs mapped: 28714 (100%) Read 1: 14357 Read 2: 14357 Singletons: 0 (0%) Both commands had an exit status of 0, so SNAP didn't throw any error. I would be very grateful for your help in figuring out what's happening - at the moment this bug means that our software that depends on SNAP produces results that are not reproducible, so it's a priority for me. — |
hi @bolosky sorry that I missed the notifications on this, and thanks very much for taking time out during your holiday to look at it! I've run some tests and I think the problem is specific to OSX - I can't reproduce it on Linux. The following script helps see the issue. It requires #!/bin/bash
for i in `seq 1 20`;
do
rm -rf assembly *bam
snap-aligner index example_data/transcripts.fa assembly \
-s 23 -t8 -bSpace -locationSize 4
snap-aligner paired assembly example_data/left.fq example_data/right.fq \
-o left.fq.right.fq.assembly.bam \
-s 0 1000 -H 300000 -h 2000 -d 30 \
-t 8 -b -M -D 5 -om 5 -omax 10
bamtools stats -in left.fq.right.fq.assembly.bam
done saving it as
on OSX I get:
whereas on Linux I get:
Because most of our users are on Linux this is not as urgent as previously thought, but it would be really nice to figure it out. I'm guessing the none of the SNAP team are on OSX as there haven't been OSX builds for the recent release. Any ideas how I can debug this? |
I noticed that on some of the test data we use for transrate, I occasionally see a run where the number of alignments in the SNAP output BAM file is much lower than what it should be.
The input dataset can be found here, and the two sets of results are linked below:
The version of SNAP used was v1.0dev.96
commands used
the commands were identical between runs
It seems as though the alignment process completed very similarly in both runs, because the logged output was identical except for the speed:
run 1 (logs)
run 2 (logs)
However, comparing the BAMfiles written out, we can see that run1 failed to report a large proportion of the alignments:
run1 (
bamtools stats
)run2 (
bamtools stats
)Both commands had an exit status of
0
, so SNAP didn't throw any error.I would be very grateful for your help in figuring out what's happening - at the moment this bug means that our software that depends on SNAP produces results that are not reproducible, so it's a priority for me.
The text was updated successfully, but these errors were encountered: