diff --git a/docs/index.html b/docs/index.html
index 5300f74..4a65300 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -165,7 +165,29 @@
 
 <div class="quarto-listing quarto-listing-container-default" id="listing-listing">
 <div class="list quarto-listing-default">
-<div class="quarto-post image-right" data-index="0" data-listing-date-sort="1718078400000" data-listing-file-modified-sort="1718983212528" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1903">
+<div class="quarto-post image-right" data-index="0" data-listing-date-sort="1719806400000" data-listing-file-modified-sort="1719839900283" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="3" data-listing-word-count-sort="553">
+<div class="thumbnail">
+<p><a href="./notebooks/2024-07-01_partial-hv.html" class="no-external"></a></p><a href="./notebooks/2024-07-01_partial-hv.html" class="no-external">
+<div class="listing-item-img-placeholder card-img-top" >&nbsp;</div>
+</a><p><a href="./notebooks/2024-07-01_partial-hv.html" class="no-external"></a></p>
+</div>
+<div class="body">
+<h3 class="no-anchor listing-title">
+<a href="./notebooks/2024-07-01_partial-hv.html" class="no-external">Investigating the v2 pipeline’s human-virus assignment behavior</a>
+</h3>
+<div class="listing-subtitle">
+<a href="./notebooks/2024-07-01_partial-hv.html" class="no-external">Checking treatment of partially-human-infecting virus taxa</a>
+</div>
+</div>
+<div class="metadata">
+<a href="./notebooks/2024-07-01_partial-hv.html" class="no-external">
+<div class="listing-date">
+Jul 1, 2024
+</div>
+</a>
+</div>
+</div>
+<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1718078400000" data-listing-file-modified-sort="1718983212528" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1903">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-06-11_batch.html" class="no-external"></a></p><a href="./notebooks/2024-06-11_batch.html" class="no-external">
 <div class="listing-item-img-placeholder card-img-top" >&nbsp;</div>
@@ -187,7 +209,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1715054400000" data-listing-file-modified-sort="1715090192338" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="40" data-listing-word-count-sort="7927">
+<div class="quarto-post image-right" data-index="2" data-listing-date-sort="1715054400000" data-listing-file-modified-sort="1715090192338" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="40" data-listing-word-count-sort="7927">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-05-07_munk.html" class="no-external"></a></p><a href="./notebooks/2024-05-07_munk.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-05-07_munk_files/figure-html/plot-countries-1.png"  class="thumbnail-image card-img"/></p>
@@ -209,7 +231,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="2" data-listing-date-sort="1714536000000" data-listing-file-modified-sort="1714577194354" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="35" data-listing-word-count-sort="6822">
+<div class="quarto-post image-right" data-index="3" data-listing-date-sort="1714536000000" data-listing-file-modified-sort="1714577194354" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="35" data-listing-word-count-sort="6822">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-05-01_ng.html" class="no-external"></a></p><a href="./notebooks/2024-05-01_ng.html" class="no-external">
 <p class="card-img-top"><img src="img/2024-05-01_ng-schematic.png"  class="thumbnail-image card-img"/></p>
@@ -231,7 +253,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="3" data-listing-date-sort="1714536000000" data-listing-file-modified-sort="1714568618257" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="49" data-listing-word-count-sort="9692">
+<div class="quarto-post image-right" data-index="4" data-listing-date-sort="1714536000000" data-listing-file-modified-sort="1714568618257" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="49" data-listing-word-count-sort="9692">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-05-01_bengtsson-palme.html" class="no-external"></a></p><a href="./notebooks/2024-05-01_bengtsson-palme.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-05-01_bengtsson-palme_files/figure-html/plot-basic-stats-1.png"  class="thumbnail-image card-img"/></p>
@@ -253,7 +275,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="4" data-listing-date-sort="1714536000000" data-listing-file-modified-sort="1714577172801" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="40" data-listing-word-count-sort="7835">
+<div class="quarto-post image-right" data-index="5" data-listing-date-sort="1714536000000" data-listing-file-modified-sort="1714577172801" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="40" data-listing-word-count-sort="7835">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-05-01_maritz.html" class="no-external"></a></p><a href="./notebooks/2024-05-01_maritz.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-05-01_maritz_files/figure-html/plot-basic-stats-1.png"  class="thumbnail-image card-img"/></p>
@@ -275,7 +297,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="5" data-listing-date-sort="1714449600000" data-listing-file-modified-sort="1714507774274" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="51" data-listing-word-count-sort="10163">
+<div class="quarto-post image-right" data-index="6" data-listing-date-sort="1714449600000" data-listing-file-modified-sort="1714507774274" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="51" data-listing-word-count-sort="10163">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-04-30_brinch.html" class="no-external"></a></p><a href="./notebooks/2024-04-30_brinch.html" class="no-external">
 <p class="card-img-top"><img src="img/2024-04-30_brinch.png"  class="thumbnail-image card-img"/></p>
@@ -297,7 +319,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="6" data-listing-date-sort="1713499200000" data-listing-file-modified-sort="1713538736315" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="57" data-listing-word-count-sort="11398">
+<div class="quarto-post image-right" data-index="7" data-listing-date-sort="1713499200000" data-listing-file-modified-sort="1713538736315" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="57" data-listing-word-count-sort="11398">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-04-19_leung.html" class="no-external"></a></p><a href="./notebooks/2024-04-19_leung.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-04-19_leung_files/figure-html/plot-basic-stats-1.png"  class="thumbnail-image card-img"/></p>
@@ -319,7 +341,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="7" data-listing-date-sort="1712894400000" data-listing-file-modified-sort="1712954346618" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="46" data-listing-word-count-sort="9074">
+<div class="quarto-post image-right" data-index="8" data-listing-date-sort="1712894400000" data-listing-file-modified-sort="1712954346618" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="46" data-listing-word-count-sort="9074">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-04-12_rosario.html" class="no-external"></a></p><a href="./notebooks/2024-04-12_rosario.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-04-12_rosario_files/figure-html/plot-basic-stats-1.png"  class="thumbnail-image card-img"/></p>
@@ -341,7 +363,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="8" data-listing-date-sort="1712894400000" data-listing-file-modified-sort="1712951171148" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="48" data-listing-word-count-sort="9559">
+<div class="quarto-post image-right" data-index="9" data-listing-date-sort="1712894400000" data-listing-file-modified-sort="1712951171148" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="48" data-listing-word-count-sort="9559">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-04-12_prussin.html" class="no-external"></a></p><a href="./notebooks/2024-04-12_prussin.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-04-12_prussin_files/figure-html/plot-basic-stats-1.png"  class="thumbnail-image card-img"/></p>
@@ -363,7 +385,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="9" data-listing-date-sort="1712548800000" data-listing-file-modified-sort="1712670437987" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="32" data-listing-word-count-sort="6385">
+<div class="quarto-post image-right" data-index="10" data-listing-date-sort="1712548800000" data-listing-file-modified-sort="1712670437987" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="32" data-listing-word-count-sort="6385">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-04-08_brumfield.html" class="no-external"></a></p><a href="./notebooks/2024-04-08_brumfield.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-04-08_brumfield_files/figure-html/plot-basic-stats-1.png"  class="thumbnail-image card-img"/></p>
@@ -385,7 +407,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="10" data-listing-date-sort="1711944000000" data-listing-file-modified-sort="1712010795272" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="48" data-listing-word-count-sort="9483">
+<div class="quarto-post image-right" data-index="11" data-listing-date-sort="1711944000000" data-listing-file-modified-sort="1712010795272" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="48" data-listing-word-count-sort="9483">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-04-01_spurbeck.html" class="no-external"></a></p><a href="./notebooks/2024-04-01_spurbeck.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-04-01_spurbeck_files/figure-html/plot-basic-stats-1.png"  class="thumbnail-image card-img"/></p>
@@ -407,7 +429,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="11" data-listing-date-sort="1710820800000" data-listing-file-modified-sort="1710855113922" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1427">
+<div class="quarto-post image-right" data-index="12" data-listing-date-sort="1710820800000" data-listing-file-modified-sort="1710855113922" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="8" data-listing-word-count-sort="1427">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-03-19_yang-2.html" class="no-external"></a></p><a href="./notebooks/2024-03-19_yang-2.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-03-19_yang-2_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -429,7 +451,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="12" data-listing-date-sort="1710561600000" data-listing-file-modified-sort="1710615491487" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="27" data-listing-word-count-sort="5320">
+<div class="quarto-post image-right" data-index="13" data-listing-date-sort="1710561600000" data-listing-file-modified-sort="1710615491487" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="27" data-listing-word-count-sort="5320">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-03-16_yang.html" class="no-external"></a></p><a href="./notebooks/2024-03-16_yang.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-03-16_yang_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -451,7 +473,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="13" data-listing-date-sort="1709269200000" data-listing-file-modified-sort="1709305045593" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1813">
+<div class="quarto-post image-right" data-index="14" data-listing-date-sort="1709269200000" data-listing-file-modified-sort="1709305045593" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1813">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-03-01_dedup.html" class="no-external"></a></p><a href="./notebooks/2024-03-01_dedup.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-03-01_dedup_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -473,7 +495,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="14" data-listing-date-sort="1709182800000" data-listing-file-modified-sort="1709227197090" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="24" data-listing-word-count-sort="4649">
+<div class="quarto-post image-right" data-index="15" data-listing-date-sort="1709182800000" data-listing-file-modified-sort="1709227197090" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="24" data-listing-word-count-sort="4649">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-02-29_rothman-2.html" class="no-external"></a></p><a href="./notebooks/2024-02-29_rothman-2.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-02-29_rothman-2_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -495,7 +517,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="15" data-listing-date-sort="1709010000000" data-listing-file-modified-sort="1709061727761" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="28" data-listing-word-count-sort="5587">
+<div class="quarto-post image-right" data-index="16" data-listing-date-sort="1709010000000" data-listing-file-modified-sort="1709061727761" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="28" data-listing-word-count-sort="5587">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-02-27_rothman-1.html" class="no-external"></a></p><a href="./notebooks/2024-02-27_rothman-1.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-02-27_rothman-1_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -517,7 +539,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="16" data-listing-date-sort="1707973200000" data-listing-file-modified-sort="1709059724859" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="26" data-listing-word-count-sort="5168">
+<div class="quarto-post image-right" data-index="17" data-listing-date-sort="1707973200000" data-listing-file-modified-sort="1709059724859" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="26" data-listing-word-count-sort="5168">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-02-15_crits-christoph-3.html" class="no-external"></a></p><a href="./notebooks/2024-02-15_crits-christoph-3.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-02-15_crits-christoph-3_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -539,7 +561,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="17" data-listing-date-sort="1707368400000" data-listing-file-modified-sort="1707418839515" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2347">
+<div class="quarto-post image-right" data-index="18" data-listing-date-sort="1707368400000" data-listing-file-modified-sort="1707418839515" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2347">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-02-08_crits-christoph-2.html" class="no-external"></a></p><a href="./notebooks/2024-02-08_crits-christoph-2.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2024-02-08_crits-christoph-2_files/figure-html/unnamed-chunk-3-1.png"  class="thumbnail-image card-img"/></p>
@@ -561,7 +583,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="18" data-listing-date-sort="1707022800000" data-listing-file-modified-sort="1707060932221" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11" data-listing-word-count-sort="2167">
+<div class="quarto-post image-right" data-index="19" data-listing-date-sort="1707022800000" data-listing-file-modified-sort="1707060932221" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11" data-listing-word-count-sort="2167">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-02-04_crits-christoph-1.html" class="no-external"></a></p><a href="./notebooks/2024-02-04_crits-christoph-1.html" class="no-external">
 <p class="card-img-top"><img src="img/2024-01-23_nextflow.png"  class="thumbnail-image card-img"/></p>
@@ -583,7 +605,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="19" data-listing-date-sort="1706590800000" data-listing-file-modified-sort="1706626948426" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2264">
+<div class="quarto-post image-right" data-index="20" data-listing-date-sort="1706590800000" data-listing-file-modified-sort="1706626948426" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="12" data-listing-word-count-sort="2264">
 <div class="thumbnail">
 <p><a href="./notebooks/2024-01-30_blast-validation.html" class="no-external"></a></p><a href="./notebooks/2024-01-30_blast-validation.html" class="no-external">
 <div class="listing-item-img-placeholder card-img-top" >&nbsp;</div>
@@ -605,7 +627,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="20" data-listing-date-sort="1703221200000" data-listing-file-modified-sort="1706626471698" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11" data-listing-word-count-sort="2058">
+<div class="quarto-post image-right" data-index="21" data-listing-date-sort="1703221200000" data-listing-file-modified-sort="1706626471698" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11" data-listing-word-count-sort="2058">
 <div class="thumbnail">
 <p><a href="./notebooks/2023-12-22_bmc-rna-sequel.html" class="no-external"></a></p><a href="./notebooks/2023-12-22_bmc-rna-sequel.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2023-12-22_bmc-rna-sequel_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -627,7 +649,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="21" data-listing-date-sort="1702962000000" data-listing-file-modified-sort="1703082760549" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="27" data-listing-word-count-sort="5304">
+<div class="quarto-post image-right" data-index="22" data-listing-date-sort="1702962000000" data-listing-file-modified-sort="1703082760549" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="27" data-listing-word-count-sort="5304">
 <div class="thumbnail">
 <p><a href="./notebooks/2023-12-19_project-runway-bmc-rna.html" class="no-external"></a></p><a href="./notebooks/2023-12-19_project-runway-bmc-rna.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2023-12-19_project-runway-bmc-rna_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -649,7 +671,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="22" data-listing-date-sort="1699419600000" data-listing-file-modified-sort="1699450659122" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="13" data-listing-word-count-sort="2479">
+<div class="quarto-post image-right" data-index="23" data-listing-date-sort="1699419600000" data-listing-file-modified-sort="1699450659122" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="13" data-listing-word-count-sort="2479">
 <div class="thumbnail">
 <p><a href="./notebooks/2023-11-02_project-runway-dna-deduplication.html" class="no-external"></a></p><a href="./notebooks/2023-11-02_project-runway-dna-deduplication.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2023-11-02_project-runway-dna-deduplication_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -671,7 +693,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="23" data-listing-date-sort="1698897600000" data-listing-file-modified-sort="1698943032000" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2641">
+<div class="quarto-post image-right" data-index="24" data-listing-date-sort="1698897600000" data-listing-file-modified-sort="1698943032000" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="14" data-listing-word-count-sort="2641">
 <div class="thumbnail">
 <p><a href="./notebooks/2023-11-02_project-runway-comparison.html" class="no-external"></a></p><a href="./notebooks/2023-11-02_project-runway-comparison.html" class="no-external">
 <p class="card-img-top"><img src="notebooks/2023-11-02_project-runway-comparison_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -693,10 +715,10 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="24" data-listing-date-sort="1698724800000" data-listing-file-modified-sort="1698941598593" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="17" data-listing-word-count-sort="3340">
+<div class="quarto-post image-right" data-index="25" data-listing-date-sort="1698724800000" data-listing-file-modified-sort="1698941598593" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="17" data-listing-word-count-sort="3340">
 <div class="thumbnail">
 <p><a href="./notebooks/2023-10-31_project-runway-initial.html" class="no-external"></a></p><a href="./notebooks/2023-10-31_project-runway-initial.html" class="no-external">
-<p class="card-img-top"><img src="notebooks/2023-10-31_project-runway-initial_files/figure-html/unnamed-chunk-3-1.png"  class="thumbnail-image card-img"/></p>
+<p class="card-img-top"><img data-src="notebooks/2023-10-31_project-runway-initial_files/figure-html/unnamed-chunk-3-1.png"  class="thumbnail-image card-img"/></p>
 </a><p><a href="./notebooks/2023-10-31_project-runway-initial.html" class="no-external"></a></p>
 </div>
 <div class="body">
@@ -715,7 +737,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="25" data-listing-date-sort="1697688000000" data-listing-file-modified-sort="1697766328595" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11" data-listing-word-count-sort="2178">
+<div class="quarto-post image-right" data-index="26" data-listing-date-sort="1697688000000" data-listing-file-modified-sort="1697766328595" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="11" data-listing-word-count-sort="2178">
 <div class="thumbnail">
 <p><a href="./notebooks/2023-10-19_deduplication.html" class="no-external"></a></p><a href="./notebooks/2023-10-19_deduplication.html" class="no-external">
 <p class="card-img-top"><img data-src="notebooks/2023-10-19_deduplication_files/figure-html/unnamed-chunk-2-1.png"  class="thumbnail-image card-img"/></p>
@@ -737,29 +759,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="26" data-listing-date-sort="1697428800000" data-listing-file-modified-sort="1697493211896" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="15" data-listing-word-count-sort="2863">
-<div class="thumbnail">
-<p><a href="./notebooks/2023-10-13_rrna-removal.html" class="no-external"></a></p><a href="./notebooks/2023-10-13_rrna-removal.html" class="no-external">
-<p class="card-img-top"><img data-src="notebooks/2023-10-13_rrna-removal_files/figure-html/rrna-overlap-venn-johnson-1.png"  class="thumbnail-image card-img"/></p>
-</a><p><a href="./notebooks/2023-10-13_rrna-removal.html" class="no-external"></a></p>
-</div>
-<div class="body">
-<h3 class="no-anchor listing-title">
-<a href="./notebooks/2023-10-13_rrna-removal.html" class="no-external">Comparing Ribodetector and bbduk for rRNA detection</a>
-</h3>
-<div class="listing-subtitle">
-<a href="./notebooks/2023-10-13_rrna-removal.html" class="no-external">In search of quick rRNA filtering.</a>
-</div>
-</div>
-<div class="metadata">
-<a href="./notebooks/2023-10-13_rrna-removal.html" class="no-external">
-<div class="listing-date">
-Oct 16, 2023
-</div>
-</a>
-</div>
-</div>
-<div class="quarto-post image-right" data-index="27" data-listing-date-sort="1697428800000" data-listing-file-modified-sort="1718983223134" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="15" data-listing-word-count-sort="2863">
+<div class="quarto-post image-right" data-index="27" data-listing-date-sort="1697428800000" data-listing-file-modified-sort="1697493211896" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="15" data-listing-word-count-sort="2863">
 <div class="thumbnail">
 <p><a href="./notebooks/2023-10-13_rrna-removal.html" class="no-external"></a></p><a href="./notebooks/2023-10-13_rrna-removal.html" class="no-external">
 <p class="card-img-top"><img data-src="notebooks/2023-10-13_rrna-removal_files/figure-html/rrna-overlap-venn-johnson-1.png"  class="thumbnail-image card-img"/></p>
diff --git a/docs/listings.json b/docs/listings.json
index b6ca44f..c5e3eb1 100644
--- a/docs/listings.json
+++ b/docs/listings.json
@@ -2,6 +2,7 @@
   {
     "listing": "/index.html",
     "items": [
+      "/notebooks/2024-07-01_partial-hv.html",
       "/notebooks/2024-06-11_batch.html",
       "/notebooks/2024-05-07_munk.html",
       "/notebooks/2024-05-01_ng.html",
@@ -29,7 +30,6 @@
       "/notebooks/2023-10-31_project-runway-initial.html",
       "/notebooks/2023-10-19_deduplication.html",
       "/notebooks/2023-10-13_rrna-removal.html",
-      "/notebooks/2023-10-13_rrna-removal.html",
       "/notebooks/2023-10-12_fastp-vs-adapterremoval.html",
       "/notebooks/2023-10-12_how-does-element-sequencing-work.html",
       "/notebooks/2023-09-12_settled-solids-extraction-test.html"
diff --git a/docs/notebooks/2024-07-01_partial-hv.html b/docs/notebooks/2024-07-01_partial-hv.html
new file mode 100644
index 0000000..835582b
--- /dev/null
+++ b/docs/notebooks/2024-07-01_partial-hv.html
@@ -0,0 +1,717 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>
+
+<meta charset="utf-8">
+<meta name="generator" content="quarto-1.4.552">
+
+<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
+
+<meta name="author" content="Will Bradshaw">
+<meta name="dcterms.date" content="2024-07-01">
+
+<title>Will’s Public NAO Notebook - Investigating the v2 pipeline’s human-virus assignment behavior</title>
+<style>
+code{white-space: pre-wrap;}
+span.smallcaps{font-variant: small-caps;}
+div.columns{display: flex; gap: min(4vw, 1.5em);}
+div.column{flex: auto; overflow-x: auto;}
+div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+ul.task-list{list-style: none;}
+ul.task-list li input[type="checkbox"] {
+  width: 0.8em;
+  margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ 
+  vertical-align: middle;
+}
+/* CSS for syntax highlighting */
+pre > code.sourceCode { white-space: pre; position: relative; }
+pre > code.sourceCode > span { line-height: 1.25; }
+pre > code.sourceCode > span:empty { height: 1.2em; }
+.sourceCode { overflow: visible; }
+code.sourceCode > span { color: inherit; text-decoration: inherit; }
+div.sourceCode { margin: 1em 0; }
+pre.sourceCode { margin: 0; }
+@media screen {
+div.sourceCode { overflow: auto; }
+}
+@media print {
+pre > code.sourceCode { white-space: pre-wrap; }
+pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
+}
+pre.numberSource code
+  { counter-reset: source-line 0; }
+pre.numberSource code > span
+  { position: relative; left: -4em; counter-increment: source-line; }
+pre.numberSource code > span > a:first-child::before
+  { content: counter(source-line);
+    position: relative; left: -1em; text-align: right; vertical-align: baseline;
+    border: none; display: inline-block;
+    -webkit-touch-callout: none; -webkit-user-select: none;
+    -khtml-user-select: none; -moz-user-select: none;
+    -ms-user-select: none; user-select: none;
+    padding: 0 4px; width: 4em;
+  }
+pre.numberSource { margin-left: 3em;  padding-left: 4px; }
+div.sourceCode
+  {   }
+@media screen {
+pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
+}
+</style>
+
+
+<script src="../site_libs/quarto-nav/quarto-nav.js"></script>
+<script src="../site_libs/quarto-nav/headroom.min.js"></script>
+<script src="../site_libs/clipboard/clipboard.min.js"></script>
+<script src="../site_libs/quarto-search/autocomplete.umd.js"></script>
+<script src="../site_libs/quarto-search/fuse.min.js"></script>
+<script src="../site_libs/quarto-search/quarto-search.js"></script>
+<meta name="quarto:offset" content="../">
+<script src="../site_libs/quarto-html/quarto.js"></script>
+<script src="../site_libs/quarto-html/popper.min.js"></script>
+<script src="../site_libs/quarto-html/tippy.umd.min.js"></script>
+<script src="../site_libs/quarto-html/anchor.min.js"></script>
+<link href="../site_libs/quarto-html/tippy.css" rel="stylesheet">
+<link href="../site_libs/quarto-html/quarto-syntax-highlighting.css" rel="stylesheet" id="quarto-text-highlighting-styles">
+<script src="../site_libs/bootstrap/bootstrap.min.js"></script>
+<link href="../site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet">
+<link href="../site_libs/bootstrap/bootstrap.min.css" rel="stylesheet" id="quarto-bootstrap" data-mode="light">
+<script id="quarto-search-options" type="application/json">{
+  "location": "navbar",
+  "copy-button": false,
+  "collapse-after": 3,
+  "panel-placement": "end",
+  "type": "overlay",
+  "limit": 50,
+  "keyboard-shortcut": [
+    "f",
+    "/",
+    "s"
+  ],
+  "show-item-context": false,
+  "language": {
+    "search-no-results-text": "No results",
+    "search-matching-documents-text": "matching documents",
+    "search-copy-link-title": "Copy link to search",
+    "search-hide-matches-text": "Hide additional matches",
+    "search-more-match-text": "more match in this document",
+    "search-more-matches-text": "more matches in this document",
+    "search-clear-button-title": "Clear",
+    "search-text-placeholder": "",
+    "search-detached-cancel-button-title": "Cancel",
+    "search-submit-button-title": "Submit",
+    "search-label": "Search"
+  }
+}</script>
+<style>
+
+      .quarto-title-block .quarto-title-banner {
+        background: black;
+      }
+</style>
+
+
+</head>
+
+<body class="nav-fixed fullcontent">
+
+<div id="quarto-search-results"></div>
+  <header id="quarto-header" class="headroom fixed-top quarto-banner">
+    <nav class="navbar navbar-expand-lg " data-bs-theme="dark">
+      <div class="navbar-container container-fluid">
+      <div class="navbar-brand-container mx-auto">
+    <a class="navbar-brand" href="../index.html">
+    <span class="navbar-title">Will’s Public NAO Notebook</span>
+    </a>
+  </div>
+        <div class="quarto-navbar-tools tools-end">
+</div>
+          <div id="quarto-search" class="" title="Search"></div>
+      </div> <!-- /container-fluid -->
+    </nav>
+</header>
+<!-- content -->
+<header id="title-block-header" class="quarto-title-block default page-columns page-full">
+  <div class="quarto-title-banner page-columns page-full">
+    <div class="quarto-title column-body">
+      <div class="quarto-title-block"><div><h1 class="title">Investigating the v2 pipeline’s human-virus assignment behavior</h1><button type="button" class="btn code-tools-button" id="quarto-code-tools-source"><i class="bi"></i> Code</button></div></div>
+            <p class="subtitle lead">Checking treatment of partially-human-infecting virus taxa</p>
+                      </div>
+  </div>
+    
+  
+  <div class="quarto-title-meta">
+
+      <div>
+      <div class="quarto-title-meta-heading">Author</div>
+      <div class="quarto-title-meta-contents">
+               <p>Will Bradshaw </p>
+            </div>
+    </div>
+      
+      <div>
+      <div class="quarto-title-meta-heading">Published</div>
+      <div class="quarto-title-meta-contents">
+        <p class="date">July 1, 2024</p>
+      </div>
+    </div>
+    
+      
+    </div>
+    
+  
+  </header><div id="quarto-content" class="quarto-container page-columns page-rows-contents page-layout-article page-navbar">
+<!-- sidebar -->
+<!-- margin-sidebar -->
+    
+<!-- main -->
+<main class="content quarto-banner-title-block" id="quarto-document-content">
+
+
+
+
+
+<p>One question that came up in a recent team meeting about the <a href="https://github.com/naobservatory/mgs-workflow">refactored v2 pipeline</a> is how higher-level taxa are handled during human-virus identification.</p>
+<p>As a reminder, the process for HV identification currently looks like this:</p>
+<ol type="1">
+<li>Preprocessed read pairs are mapped to a database of HV genomes with Bowtie2, retaining any read that meets a fairly low alignment-score threshold.</li>
+<li>Surviving read pairs are mapped to human, cow, and assorted other contaminant sequences with Bowtie2 and BBMap, discarding any read that maps to any contaminant.</li>
+<li>Surviving read pairs are merged into single sequences with BBMerge, then passed to Kraken2, which performs taxonomic assignment using the Standard database.</li>
+<li>Based on the Kraken2 assignments, reads are classified into those that are (1) assigned to a human-infecting virus taxon with Kraken, (2) assigned to a non-HV taxon with Kraken, or (3) not assigned to any taxon. All reads in category (2) are filtered out.</li>
+<li>Surviving reads are assigned HV status if they (a) are given HV assignments by both Bowtie2 and Kraken2, or (b) are unassigned by Kraken and align to an HV taxon with Bowtie2 with an alignment score above a user-specified threshold (typically 20).</li>
+</ol>
+<p>This all works well for reads that are assigned to a taxon that is entirely composed of human-infecting viruses. The question is, what about reads that Kraken assigns to taxa that are only partially human-infecting? For example, the virus family Coronaviridae (taxid 11118) contains several coronaviruses that infect humans (e.g.&nbsp;SARS-CoV-2) and many that do not (e.g.&nbsp;assorted bat coronaviruses). What would happen to a read that was assigned by Kraken2 to this taxid?</p>
+<p>The key process here is <code>PROCESS_KRAKEN_HV</code> (<a href="https://github.com/naobservatory/mgs-workflow/blob/master/modules/local/processKrakenHV/main.nf"><code>modules/local/processKrakenHV/main.nf</code></a>), which calls <a href="https://github.com/naobservatory/mgs-workflow/blob/master/bin/process_kraken_hv.py"><code>bin/process_kraken_hv.py</code></a> on the output of Kraken2. This script:</p>
+<ol type="1">
+<li>Imports a TSV of human-infecting virus taxa (generated by <code>FINALIZE_HUMAN_VIRUS_DB</code> in the index workflow) as well as the NCBI taxonomy tree structure (generated by <code>EXTRACT_NCBI_TAXONOMY</code> in the index workflow). Importantly, this initial HV TSV does <em>not</em> include higher taxa.</li>
+<li>Iterates line-by-line over the readwise Kraken2 output as follows:
+<ol type="a">
+<li>Extract the name and taxid assignment for the read.</li>
+<li>Check if the taxid is in the list of HV taxids; if so, return <code>True</code>.</li>
+<li>If not, get the taxid’s parent taxid from the tree structure and go to (b).</li>
+<li>Repeat (b) and (c) until an HV taxid is found or the taxid being screened is 0 (unassigned), 1 (root) or 2 (Bacteria); in the latter case, return <code>False</code>.</li>
+</ol></li>
+<li>Save the read’s HV status, along with other information, to an output file.</li>
+</ol>
+<p>As such, the script checks whether the assigned taxid <em>or any of its ancestors</em> is an HV taxon, but not whether any of its descendants are. Reads that are assigned to higher-level taxa that are only partially human-infecting (such as Coronaviridae) will thus be treated as though they were assigned to a non-HV taxon, and filtered out during HV read identification.</p>
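+<p>To make this concrete, here is a minimal sketch of the ancestor walk described above. It is not the code from <code>bin/process_kraken_hv.py</code>: the function name <code>is_hv</code>, the toy parent map, the single-entry HV set and the handling of unknown taxids are all assumptions made for illustration, and the toy tree collapses the intermediate ranks between SARS-CoV-2 and Coronaviridae.</p>

```python
# Toy child-to-parent map (illustrative only; intermediate ranks are collapsed).
PARENTS = {
    2697049: 11118,  # SARS-CoV-2, parent set directly to Coronaviridae
    11118: 10239,    # Coronaviridae, parent set directly to Viruses
    10239: 1,        # Viruses, parent is root
}

# Hypothetical HV list: lower-level taxa only, no higher taxa.
HV_TAXIDS = {2697049}

# Taxids at which the walk stops: unassigned, root, Bacteria.
STOP_TAXIDS = {0, 1, 2}


def is_hv(taxid, hv_taxids, parents):
    """Return True if taxid or any of its ancestors is in hv_taxids."""
    while taxid not in STOP_TAXIDS:
        if taxid in hv_taxids:
            return True
        # Assumption: a taxid missing from the tree is treated as unassigned.
        taxid = parents.get(taxid, 0)
    return False


print(is_hv(2697049, HV_TAXIDS, PARENTS))  # read assigned to SARS-CoV-2
print(is_hv(11118, HV_TAXIDS, PARENTS))    # read assigned to Coronaviridae
```

+<p>Under these assumptions, the first call prints <code>True</code> and the second prints <code>False</code>: the walk from Coronaviridae only moves upward (to Viruses, then root), so it never encounters the HV taxid sitting below it in the tree.</p>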
+<p>This seems suboptimal. That said, it’s not obvious what the correct behavior is here; treating reads assigned to partially-HV taxa as HV comes with its own problems<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. Someone should think more about what the right approach is here.</p>
+
+
+<!-- -->
+
+
+
+<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>
+
+<ol>
+<li id="fn1"><p>In particular, since the Bowtie2 database used for initial screening only contains HV genomes, this approach would likely lead to closely-related non-human-infecting virus reads being classified as HV.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
+</ol>
+</section></div></main> <!-- /main -->
+<script id="quarto-html-after-body" type="application/javascript">
+window.document.addEventListener("DOMContentLoaded", function (event) {
+  const toggleBodyColorMode = (bsSheetEl) => {
+    const mode = bsSheetEl.getAttribute("data-mode");
+    const bodyEl = window.document.querySelector("body");
+    if (mode === "dark") {
+      bodyEl.classList.add("quarto-dark");
+      bodyEl.classList.remove("quarto-light");
+    } else {
+      bodyEl.classList.add("quarto-light");
+      bodyEl.classList.remove("quarto-dark");
+    }
+  }
+  const toggleBodyColorPrimary = () => {
+    const bsSheetEl = window.document.querySelector("link#quarto-bootstrap");
+    if (bsSheetEl) {
+      toggleBodyColorMode(bsSheetEl);
+    }
+  }
+  toggleBodyColorPrimary();  
+  const icon = "";
+  const anchorJS = new window.AnchorJS();
+  anchorJS.options = {
+    placement: 'right',
+    icon: icon
+  };
+  anchorJS.add('.anchored');
+  const isCodeAnnotation = (el) => {
+    for (const clz of el.classList) {
+      if (clz.startsWith('code-annotation-')) {                     
+        return true;
+      }
+    }
+    return false;
+  }
+  const clipboard = new window.ClipboardJS('.code-copy-button', {
+    text: function(trigger) {
+      const codeEl = trigger.previousElementSibling.cloneNode(true);
+      for (const childEl of codeEl.children) {
+        if (isCodeAnnotation(childEl)) {
+          childEl.remove();
+        }
+      }
+      return codeEl.innerText;
+    }
+  });
+  clipboard.on('success', function(e) {
+    // button target
+    const button = e.trigger;
+    // don't keep focus
+    button.blur();
+    // flash "checked"
+    button.classList.add('code-copy-button-checked');
+    var currentTitle = button.getAttribute("title");
+    button.setAttribute("title", "Copied!");
+    let tooltip;
+    if (window.bootstrap) {
+      button.setAttribute("data-bs-toggle", "tooltip");
+      button.setAttribute("data-bs-placement", "left");
+      button.setAttribute("data-bs-title", "Copied!");
+      tooltip = new bootstrap.Tooltip(button, 
+        { trigger: "manual", 
+          customClass: "code-copy-button-tooltip",
+          offset: [0, -8]});
+      tooltip.show();    
+    }
+    setTimeout(function() {
+      if (tooltip) {
+        tooltip.hide();
+        button.removeAttribute("data-bs-title");
+        button.removeAttribute("data-bs-toggle");
+        button.removeAttribute("data-bs-placement");
+      }
+      button.setAttribute("title", currentTitle);
+      button.classList.remove('code-copy-button-checked');
+    }, 1000);
+    // clear code selection
+    e.clearSelection();
+  });
+  const viewSource = window.document.getElementById('quarto-view-source') ||
+                     window.document.getElementById('quarto-code-tools-source');
+  if (viewSource) {
+    const sourceUrl = viewSource.getAttribute("data-quarto-source-url");
+    viewSource.addEventListener("click", function(e) {
+      if (sourceUrl) {
+        // rstudio viewer pane
+        if (/\bcapabilities=\b/.test(window.location)) {
+          window.open(sourceUrl);
+        } else {
+          window.location.href = sourceUrl;
+        }
+      } else {
+        const modal = new bootstrap.Modal(document.getElementById('quarto-embedded-source-code-modal'));
+        modal.show();
+      }
+      return false;
+    });
+  }
+  function toggleCodeHandler(show) {
+    return function(e) {
+      const detailsSrc = window.document.querySelectorAll(".cell > details > .sourceCode");
+      for (let i=0; i<detailsSrc.length; i++) {
+        const details = detailsSrc[i].parentElement;
+        if (show) {
+          details.open = true;
+        } else {
+          details.removeAttribute("open");
+        }
+      }
+      const cellCodeDivs = window.document.querySelectorAll(".cell > .sourceCode");
+      const fromCls = show ? "hidden" : "unhidden";
+      const toCls = show ? "unhidden" : "hidden";
+      for (let i=0; i<cellCodeDivs.length; i++) {
+        const codeDiv = cellCodeDivs[i];
+        if (codeDiv.classList.contains(fromCls)) {
+          codeDiv.classList.remove(fromCls);
+          codeDiv.classList.add(toCls);
+        } 
+      }
+      return false;
+    }
+  }
+  const hideAllCode = window.document.getElementById("quarto-hide-all-code");
+  if (hideAllCode) {
+    hideAllCode.addEventListener("click", toggleCodeHandler(false));
+  }
+  const showAllCode = window.document.getElementById("quarto-show-all-code");
+  if (showAllCode) {
+    showAllCode.addEventListener("click", toggleCodeHandler(true));
+  }
+    var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//);
+    var mailtoRegex = new RegExp(/^mailto:/);
+      var filterRegex = new RegExp('/' + window.location.host + '/');
+    var isInternal = (href) => {
+        return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href);
+    }
+    // Inspect non-navigation links and adorn them if external
+ 	var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool)');
+    for (var i=0; i<links.length; i++) {
+      const link = links[i];
+      if (!isInternal(link.href)) {
+        // undo the damage that might have been done by quarto-nav.js in the case of
+        // links that we want to consider external
+        if (link.dataset.originalHref !== undefined) {
+          link.href = link.dataset.originalHref;
+        }
+      }
+    }
+  function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) {
+    const config = {
+      allowHTML: true,
+      maxWidth: 500,
+      delay: 100,
+      arrow: false,
+      appendTo: function(el) {
+          return el.parentElement;
+      },
+      interactive: true,
+      interactiveBorder: 10,
+      theme: 'quarto',
+      placement: 'bottom-start',
+    };
+    if (contentFn) {
+      config.content = contentFn;
+    }
+    if (onTriggerFn) {
+      config.onTrigger = onTriggerFn;
+    }
+    if (onUntriggerFn) {
+      config.onUntrigger = onUntriggerFn;
+    }
+    window.tippy(el, config); 
+  }
+  const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]');
+  for (var i=0; i<noterefs.length; i++) {
+    const ref = noterefs[i];
+    tippyHover(ref, function() {
+      // use id or data attribute instead here
+      let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href');
+      try { href = new URL(href).hash; } catch {}
+      const id = href.replace(/^#\/?/, "");
+      const note = window.document.getElementById(id);
+      if (note) {
+        return note.innerHTML;
+      } else {
+        return "";
+      }
+    });
+  }
+  const xrefs = window.document.querySelectorAll('a.quarto-xref');
+  const processXRef = (id, note) => {
+    // Strip column container classes
+    const stripColumnClz = (el) => {
+      el.classList.remove("page-full", "page-columns");
+      if (el.children) {
+        for (const child of el.children) {
+          stripColumnClz(child);
+        }
+      }
+    }
+    stripColumnClz(note)
+    if (id === null || id.startsWith('sec-')) {
+      // Special case sections, only their first couple elements
+      const container = document.createElement("div");
+      if (note.children && note.children.length > 2) {
+        container.appendChild(note.children[0].cloneNode(true));
+        for (let i = 1; i < note.children.length; i++) {
+          const child = note.children[i];
+          if (child.tagName === "P" && child.innerText === "") {
+            continue;
+          } else {
+            container.appendChild(child.cloneNode(true));
+            break;
+          }
+        }
+        if (window.Quarto?.typesetMath) {
+          window.Quarto.typesetMath(container);
+        }
+        return container.innerHTML
+      } else {
+        if (window.Quarto?.typesetMath) {
+          window.Quarto.typesetMath(note);
+        }
+        return note.innerHTML;
+      }
+    } else {
+      // Remove any anchor links if they are present
+      const anchorLink = note.querySelector('a.anchorjs-link');
+      if (anchorLink) {
+        anchorLink.remove();
+      }
+      if (window.Quarto?.typesetMath) {
+        window.Quarto.typesetMath(note);
+      }
+      // TODO in 1.5, we should make sure this works without a callout special case
+      if (note.classList.contains("callout")) {
+        return note.outerHTML;
+      } else {
+        return note.innerHTML;
+      }
+    }
+  }
+  for (var i=0; i<xrefs.length; i++) {
+    const xref = xrefs[i];
+    tippyHover(xref, undefined, function(instance) {
+      instance.disable();
+      let url = xref.getAttribute('href');
+      let hash = undefined; 
+      if (url.startsWith('#')) {
+        hash = url;
+      } else {
+        try { hash = new URL(url).hash; } catch {}
+      }
+      if (hash) {
+        const id = hash.replace(/^#\/?/, "");
+        const note = window.document.getElementById(id);
+        if (note !== null) {
+          try {
+            const html = processXRef(id, note.cloneNode(true));
+            instance.setContent(html);
+          } finally {
+            instance.enable();
+            instance.show();
+          }
+        } else {
+          // See if we can fetch this
+          fetch(url.split('#')[0])
+          .then(res => res.text())
+          .then(html => {
+            const parser = new DOMParser();
+            const htmlDoc = parser.parseFromString(html, "text/html");
+            const note = htmlDoc.getElementById(id);
+            if (note !== null) {
+              const html = processXRef(id, note);
+              instance.setContent(html);
+            } 
+          }).finally(() => {
+            instance.enable();
+            instance.show();
+          });
+        }
+      } else {
+        // See if we can fetch a full url (with no hash to target)
+        // This is a special case and we should probably do some content thinning / targeting
+        fetch(url)
+        .then(res => res.text())
+        .then(html => {
+          const parser = new DOMParser();
+          const htmlDoc = parser.parseFromString(html, "text/html");
+          const note = htmlDoc.querySelector('main.content');
+          if (note !== null) {
+            // This should only happen for chapter cross references
+            // (since there is no id in the URL)
+            // remove the first header
+            if (note.children.length > 0 && note.children[0].tagName === "HEADER") {
+              note.children[0].remove();
+            }
+            const html = processXRef(null, note);
+            instance.setContent(html);
+          } 
+        }).finally(() => {
+          instance.enable();
+          instance.show();
+        });
+      }
+    }, function(instance) {
+    });
+  }
+      let selectedAnnoteEl;
+      const selectorForAnnotation = ( cell, annotation) => {
+        let cellAttr = 'data-code-cell="' + cell + '"';
+        let lineAttr = 'data-code-annotation="' +  annotation + '"';
+        const selector = 'span[' + cellAttr + '][' + lineAttr + ']';
+        return selector;
+      }
+      const selectCodeLines = (annoteEl) => {
+        const doc = window.document;
+        const targetCell = annoteEl.getAttribute("data-target-cell");
+        const targetAnnotation = annoteEl.getAttribute("data-target-annotation");
+        const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation));
+        const lines = annoteSpan.getAttribute("data-code-lines").split(",");
+        const lineIds = lines.map((line) => {
+          return targetCell + "-" + line;
+        })
+        let top = null;
+        let height = null;
+        let parent = null;
+        if (lineIds.length > 0) {
+            //compute the position of the single el (top and bottom and make a div)
+            const el = window.document.getElementById(lineIds[0]);
+            top = el.offsetTop;
+            height = el.offsetHeight;
+            parent = el.parentElement.parentElement;
+          if (lineIds.length > 1) {
+            const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]);
+            const bottom = lastEl.offsetTop + lastEl.offsetHeight;
+            height = bottom - top;
+          }
+          if (top !== null && height !== null && parent !== null) {
+            // cook up a div (if necessary) and position it 
+            let div = window.document.getElementById("code-annotation-line-highlight");
+            if (div === null) {
+              div = window.document.createElement("div");
+              div.setAttribute("id", "code-annotation-line-highlight");
+              div.style.position = 'absolute';
+              parent.appendChild(div);
+            }
+            div.style.top = top - 2 + "px";
+            div.style.height = height + 4 + "px";
+            div.style.left = 0;
+            let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter");
+            if (gutterDiv === null) {
+              gutterDiv = window.document.createElement("div");
+              gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter");
+              gutterDiv.style.position = 'absolute';
+              const codeCell = window.document.getElementById(targetCell);
+              const gutter = codeCell.querySelector('.code-annotation-gutter');
+              gutter.appendChild(gutterDiv);
+            }
+            gutterDiv.style.top = top - 2 + "px";
+            gutterDiv.style.height = height + 4 + "px";
+          }
+          selectedAnnoteEl = annoteEl;
+        }
+      };
+      const unselectCodeLines = () => {
+        const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"];
+        elementsIds.forEach((elId) => {
+          const div = window.document.getElementById(elId);
+          if (div) {
+            div.remove();
+          }
+        });
+        selectedAnnoteEl = undefined;
+      };
+        // Handle positioning of the toggle
+    window.addEventListener(
+      "resize",
+      throttle(() => {
+        elRect = undefined;
+        if (selectedAnnoteEl) {
+          selectCodeLines(selectedAnnoteEl);
+        }
+      }, 10)
+    );
+    function throttle(fn, ms) {
+    let throttle = false;
+    let timer;
+      return (...args) => {
+        if(!throttle) { // first call gets through
+            fn.apply(this, args);
+            throttle = true;
+        } else { // all the others get throttled
+            if(timer) clearTimeout(timer); // cancel #2
+            timer = setTimeout(() => {
+              fn.apply(this, args);
+              timer = throttle = false;
+            }, ms);
+        }
+      };
+    }
+      // Attach click handler to the DT
+      const annoteDls = window.document.querySelectorAll('dt[data-target-cell]');
+      for (const annoteDlNode of annoteDls) {
+        annoteDlNode.addEventListener('click', (event) => {
+          const clickedEl = event.target;
+          if (clickedEl !== selectedAnnoteEl) {
+            unselectCodeLines();
+            const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active');
+            if (activeEl) {
+              activeEl.classList.remove('code-annotation-active');
+            }
+            selectCodeLines(clickedEl);
+            clickedEl.classList.add('code-annotation-active');
+          } else {
+            // Unselect the line
+            unselectCodeLines();
+            clickedEl.classList.remove('code-annotation-active');
+          }
+        });
+      }
+  const findCites = (el) => {
+    const parentEl = el.parentElement;
+    if (parentEl) {
+      const cites = parentEl.dataset.cites;
+      if (cites) {
+        return {
+          el,
+          cites: cites.split(' ')
+        };
+      } else {
+        return findCites(el.parentElement)
+      }
+    } else {
+      return undefined;
+    }
+  };
+  var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]');
+  for (var i=0; i<bibliorefs.length; i++) {
+    const ref = bibliorefs[i];
+    const citeInfo = findCites(ref);
+    if (citeInfo) {
+      tippyHover(citeInfo.el, function() {
+        var popup = window.document.createElement('div');
+        citeInfo.cites.forEach(function(cite) {
+          var citeDiv = window.document.createElement('div');
+          citeDiv.classList.add('hanging-indent');
+          citeDiv.classList.add('csl-entry');
+          var biblioDiv = window.document.getElementById('ref-' + cite);
+          if (biblioDiv) {
+            citeDiv.innerHTML = biblioDiv.innerHTML;
+          }
+          popup.appendChild(citeDiv);
+        });
+        return popup.innerHTML;
+      });
+    }
+  }
+});
+</script><div class="modal fade" id="quarto-embedded-source-code-modal" tabindex="-1" aria-labelledby="quarto-embedded-source-code-modal-label" aria-hidden="true"><div class="modal-dialog modal-dialog-scrollable"><div class="modal-content"><div class="modal-header"><h5 class="modal-title" id="quarto-embedded-source-code-modal-label">Source Code</h5><button class="btn-close" data-bs-dismiss="modal"></button></div><div class="modal-body"><div class="">
+<div class="sourceCode" id="cb1" data-shortcodes="false"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
+<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="an">title:</span><span class="co"> "Investigating the v2 pipeline's human-virus assignment behavior"</span></span>
+<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="an">subtitle:</span><span class="co"> "Checking treatment of partially-human-infecting virus taxa"</span></span>
+<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="an">author:</span><span class="co"> "Will Bradshaw"</span></span>
+<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="an">date:</span><span class="co"> 2024-07-01</span></span>
+<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="an">format:</span></span>
+<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="co">  html:</span></span>
+<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="co">    code-fold: true</span></span>
+<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="co">    code-tools: true</span></span>
+<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="co">    code-link: true</span></span>
+<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="co">    df-print: paged</span></span>
+<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="an">editor:</span><span class="co"> visual</span></span>
+<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="an">title-block-banner:</span><span class="co"> black</span></span>
+<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
+<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a>One question that came up in a recent team meeting about the <span class="co">[</span><span class="ot">refactored v2 pipeline</span><span class="co">](https://github.com/naobservatory/mgs-workflow)</span> is how higher-level taxa are handled during human-virus identification.</span>
+<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>As a reminder, the process for HV identification currently looks like this:</span>
+<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a><span class="ss">1.  </span>Preprocessed read pairs are mapped to a database of HV genomes with Bowtie2, retaining any read that meets a fairly low alignment-score threshold.</span>
+<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a><span class="ss">2.  </span>Surviving read pairs are mapped to human, cow, and assorted other contaminant sequences with Bowtie2 and BBMap, discarding any read that maps to any contaminant.</span>
+<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="ss">3.  </span>Surviving read pairs are merged into single sequences with BBMerge, then passed to Kraken2, which performs taxonomic assignment using the Standard database.</span>
+<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a><span class="ss">4.  </span>Based on the Kraken2 assignments, reads are classified into those that are (1) assigned to a human-infecting virus taxon with Kraken, (2) assigned to a non-HV taxon with Kraken, or (3) not assigned to any taxon. All reads in category (2) are filtered out.</span>
+<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a><span class="ss">5.  </span>Surviving reads are assigned HV status if they (a) are given HV assignments by both Bowtie2 and Kraken2, or (b) are unassigned by Kraken and align to an HV taxon with Bowtie2 with an alignment score above a user-specified threshold (typically 20).</span>
+<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a>This all works well for reads that are assigned to a taxon that is entirely composed of human-infecting viruses. The question is, what about reads that Kraken assigns to taxa that are only partially human-infecting? For example, the virus family Coronaviridae (taxid 11118) contains several coronaviruses that infect humans (e.g. SARS-CoV-2) and many that do not (e.g. assorted bat coronaviruses). What would happen to a read that was assigned by Kraken2 to this taxid?</span>
+<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a>The key process here is <span class="in">`PROCESS_KRAKEN_HV`</span> (<span class="co">[</span><span class="ot">`modules/local/processKrakenHV/main.nf`</span><span class="co">](https://github.com/naobservatory/mgs-workflow/blob/master/modules/local/processKrakenHV/main.nf)</span>), which calls <span class="co">[</span><span class="ot">`bin/process_kraken_hv.py`</span><span class="co">](https://github.com/naobservatory/mgs-workflow/blob/master/bin/process_kraken_hv.py)</span> on the output of Kraken2. This script:</span>
+<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a><span class="ss">1.  </span>Imports a TSV of human-infecting virus taxa (generated by <span class="in">`FINALIZE_HUMAN_VIRUS_DB`</span> in the index workflow) as well as the NCBI taxonomy tree structure (generated by <span class="in">`EXTRACT_NCBI_TAXONOMY`</span> in the index workflow). Importantly, this initial HV TSV does *not* include higher taxa.</span>
+<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a><span class="ss">2.  </span>Iterates line-by-line over the readwise Kraken2 output as follows:</span>
+<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a>    a.  Extract the name and taxid assignment for the read.</span>
+<span id="cb1-33"><a href="#cb1-33" aria-hidden="true" tabindex="-1"></a>    b.  Check if the taxid is in the list of HV taxids; if so, return <span class="in">`True`</span>.</span>
+<span id="cb1-34"><a href="#cb1-34" aria-hidden="true" tabindex="-1"></a>    c.  If not, get the taxid's parent taxid from the tree structure and go to (b).</span>
+<span id="cb1-35"><a href="#cb1-35" aria-hidden="true" tabindex="-1"></a>    d.  Repeat (b) and (c) until an HV taxid is found or the taxid being screened is 0 (unassigned), 1 (root) or 2 (Bacteria); in the latter case, return <span class="in">`False`</span>.</span>
+<span id="cb1-36"><a href="#cb1-36" aria-hidden="true" tabindex="-1"></a><span class="ss">3.  </span>Save the read's HV status, along with other information, to an output file.</span>
+<span id="cb1-37"><a href="#cb1-37" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-38"><a href="#cb1-38" aria-hidden="true" tabindex="-1"></a>As such, the script checks whether the assigned taxid *or any of its ancestors* is an HV taxon, but not whether any of its descendants are. Reads that are assigned to higher-level taxa that are only partially human-infecting (such as Coronaviridae) will thus be treated as though they were assigned to a non-HV taxon, and filtered out during HV read identification.</span>
+<span id="cb1-39"><a href="#cb1-39" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-40"><a href="#cb1-40" aria-hidden="true" tabindex="-1"></a>This seems suboptimal. That said, it's not obvious what the correct behavior is here; treating reads assigned to partially-HV taxa as HV comes with its own problems<span class="ot">[^1]</span>. Someone should think more about what the right approach is here.</span>
+<span id="cb1-41"><a href="#cb1-41" aria-hidden="true" tabindex="-1"></a></span>
+<span id="cb1-42"><a href="#cb1-42" aria-hidden="true" tabindex="-1"></a><span class="ot">[^1]: </span>In particular, since the Bowtie2 database used for initial screening only contains HV genomes, this approach would likely lead to closely-related non-human-infecting virus reads being classified as HV.</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+</div></div></div></div></div>
+</div> <!-- /content -->
+
+
+
+
+</body></html>
\ No newline at end of file
diff --git a/docs/search.json b/docs/search.json
index 9dda6e9..cef8b03 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -32,7 +32,7 @@
     "href": "index.html",
     "title": "Will's Public NAO Notebook",
     "section": "",
-    "text": "Setting up AWS Batch to work with the NAO’s MGS workflow\n\n\nA hopefully-simple guide to an unfortunately-complicated service.\n\n\n\n\n\nJun 11, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Munk et al. (2022)\n\n\nA global wastewater study.\n\n\n\n\n\nMay 7, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Ng et al. (2019)\n\n\nWastewater from Singapore.\n\n\n\n\n\nMay 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Bengtsson-Palme et al. (2016)\n\n\nWastewater grab samples from Sweden.\n\n\n\n\n\nMay 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Maritz et al. (2019)\n\n\nWastewater from NYC.\n\n\n\n\n\nMay 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Brinch et al. (2020)\n\n\nWastewater from Copenhagen.\n\n\n\n\n\nApr 30, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Leung et al. (2021)\n\n\nAir sampling from urban public transit systems.\n\n\n\n\n\nApr 19, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Rosario et al. (2018)\n\n\nAir sampling from a student dorm in Colorado.\n\n\n\n\n\nApr 12, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Prussin et al. (2019)\n\n\nAir filters from a daycare in Virginia.\n\n\n\n\n\nApr 12, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Brumfield et al. (2022)\n\n\nWastewater from a manhole in Maryland.\n\n\n\n\n\nApr 8, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Spurbeck et al. (2023)\n\n\nCave carpa.\n\n\n\n\n\nApr 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nFollowup analysis of Yang et al. (2020)\n\n\nDigging into deduplication.\n\n\n\n\n\nMar 19, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Yang et al. (2020)\n\n\nWastewater from Xinjiang.\n\n\n\n\n\nMar 16, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nImproving read deduplication in the MGS workflow\n\n\nRemoving reverse-complement duplicates of human-viral reads.\n\n\n\n\n\nMar 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Rothman et al. 
(2021), part 2\n\n\nPanel-enriched samples.\n\n\n\n\n\nFeb 29, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Rothman et al. (2021), part 1\n\n\nUnenriched samples.\n\n\n\n\n\nFeb 27, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Crits-Christoph et al. (2021), part 3\n\n\nFixing the virus pipeline.\n\n\n\n\n\nFeb 15, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Crits-Christoph et al. (2021), part 2\n\n\nAbundance and composition of human-infecting viruses.\n\n\n\n\n\nFeb 8, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Crits-Christoph et al. (2021), part 1\n\n\nPreprocessing and composition.\n\n\n\n\n\nFeb 4, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nAutomating BLAST validation of human viral read assignment\n\n\nExperiments with BLASTN remote mode\n\n\n\n\n\nJan 30, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nProject Runway RNA-seq testing data: removing livestock reads\n\n\n\n\n\n\n\n\nDec 22, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Project Runway RNA-seq testing data\n\n\nApplying a new workflow to some oldish data.\n\n\n\n\n\nDec 19, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nEstimating the effect of read depth on duplication rate for Project Runway DNA data\n\n\nHow deep can we go?\n\n\n\n\n\nNov 8, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComparing viral read assignments between pipelines on Project Runway data\n\n\n\n\n\n\n\n\nNov 2, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nInitial analysis of Project Runway protocol testing data\n\n\n\n\n\n\n\n\nOct 31, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComparing options for read deduplication\n\n\nClumpify vs fastp\n\n\n\n\n\nOct 19, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComparing Ribodetector and bbduk for rRNA detection\n\n\nIn search of quick rRNA filtering.\n\n\n\n\n\nOct 16, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComparing Ribodetector and bbduk for rRNA detection\n\n\nIn search of quick rRNA filtering.\n\n\n\n\n\nOct 16, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComparing FASTP and AdapterRemoval for MGS pre-processing\n\n\nTwo tools – how do they perform?\n\n\n\n\n\nOct 12, 
2023\n\n\n\n\n\n\n\n\n\n\n\n\nHow does Element AVITI sequencing work?\n\n\nFindings of a shallow investigation\n\n\n\n\n\nOct 11, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nExtraction experiment 2: high-level results & interpretation\n\n\nComparing RNA yields and quality across extraction kits for settled solids\n\n\n\n\n\nSep 21, 2023\n\n\n\n\n\n\nNo matching items"
+    "text": "Investigating the v2 pipeline’s human-virus assignment behavior\n\n\nChecking treatment of partially-human-infecting virus taxa\n\n\n\n\n\nJul 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nSetting up AWS Batch to work with the NAO’s MGS workflow\n\n\nA hopefully-simple guide to an unfortunately-complicated service.\n\n\n\n\n\nJun 11, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Munk et al. (2022)\n\n\nA global wastewater study.\n\n\n\n\n\nMay 7, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Ng et al. (2019)\n\n\nWastewater from Singapore.\n\n\n\n\n\nMay 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Bengtsson-Palme et al. (2016)\n\n\nWastewater grab samples from Sweden.\n\n\n\n\n\nMay 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Maritz et al. (2019)\n\n\nWastewater from NYC.\n\n\n\n\n\nMay 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Brinch et al. (2020)\n\n\nWastewater from Copenhagen.\n\n\n\n\n\nApr 30, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Leung et al. (2021)\n\n\nAir sampling from urban public transit systems.\n\n\n\n\n\nApr 19, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Rosario et al. (2018)\n\n\nAir sampling from a student dorm in Colorado.\n\n\n\n\n\nApr 12, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Prussin et al. (2019)\n\n\nAir filters from a daycare in Virginia.\n\n\n\n\n\nApr 12, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Brumfield et al. (2022)\n\n\nWastewater from a manhole in Maryland.\n\n\n\n\n\nApr 8, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Spurbeck et al. (2023)\n\n\nCave carpa.\n\n\n\n\n\nApr 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nFollowup analysis of Yang et al. (2020)\n\n\nDigging into deduplication.\n\n\n\n\n\nMar 19, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Yang et al. 
(2020)\n\n\nWastewater from Xinjiang.\n\n\n\n\n\nMar 16, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nImproving read deduplication in the MGS workflow\n\n\nRemoving reverse-complement duplicates of human-viral reads.\n\n\n\n\n\nMar 1, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Rothman et al. (2021), part 2\n\n\nPanel-enriched samples.\n\n\n\n\n\nFeb 29, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Rothman et al. (2021), part 1\n\n\nUnenriched samples.\n\n\n\n\n\nFeb 27, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Crits-Christoph et al. (2021), part 3\n\n\nFixing the virus pipeline.\n\n\n\n\n\nFeb 15, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Crits-Christoph et al. (2021), part 2\n\n\nAbundance and composition of human-infecting viruses.\n\n\n\n\n\nFeb 8, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Crits-Christoph et al. (2021), part 1\n\n\nPreprocessing and composition.\n\n\n\n\n\nFeb 4, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nAutomating BLAST validation of human viral read assignment\n\n\nExperiments with BLASTN remote mode\n\n\n\n\n\nJan 30, 2024\n\n\n\n\n\n\n\n\n\n\n\n\nProject Runway RNA-seq testing data: removing livestock reads\n\n\n\n\n\n\n\n\nDec 22, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nWorkflow analysis of Project Runway RNA-seq testing data\n\n\nApplying a new workflow to some oldish data.\n\n\n\n\n\nDec 19, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nEstimating the effect of read depth on duplication rate for Project Runway DNA data\n\n\nHow deep can we go?\n\n\n\n\n\nNov 8, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComparing viral read assignments between pipelines on Project Runway data\n\n\n\n\n\n\n\n\nNov 2, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nInitial analysis of Project Runway protocol testing data\n\n\n\n\n\n\n\n\nOct 31, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComparing options for read deduplication\n\n\nClumpify vs fastp\n\n\n\n\n\nOct 19, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nComparing Ribodetector and bbduk for rRNA detection\n\n\nIn search of quick rRNA filtering.\n\n\n\n\n\nOct 16, 
2023\n\n\n\n\n\n\n\n\n\n\n\n\nComparing FASTP and AdapterRemoval for MGS pre-processing\n\n\nTwo tools – how do they perform?\n\n\n\n\n\nOct 12, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nHow does Element AVITI sequencing work?\n\n\nFindings of a shallow investigation\n\n\n\n\n\nOct 11, 2023\n\n\n\n\n\n\n\n\n\n\n\n\nExtraction experiment 2: high-level results & interpretation\n\n\nComparing RNA yields and quality across extraction kits for settled solids\n\n\n\n\n\nSep 21, 2023\n\n\n\n\n\n\nNo matching items"
   },
   {
     "objectID": "notebooks/2023-10-12_fastp-vs-adapterremoval.html",
@@ -362,5 +362,19 @@
     "title": "Setting up AWS Batch to work with the NAO’s MGS workflow",
     "section": "Footnotes",
     "text": "Footnotes\n\n\nIn more depth, you need the following actions to be enabled for the bucket in question for your IAM user or role: s3:ListBucket, s3:GetBucketLocation, s3:GetObject, s3:GetObjectAcl, s3:PutObject, s3:PutObjectAcl, s3:PutObjectTagging, and s3:DeleteObject. If you’re using a bucket specific to your user, all this is easier if you first have your administrator enable s3:GetBucketPolicy and s3:PutBucketPolicy for your user.↩︎\nIf you want even more IOPS, you can provision an io2 volume instead of gp3. However, that’s beyond the scope of this guide.↩︎\nIn the future, I’ll investigate running Batch with Fargate for Nextflow workflows. For now, using EC2 gives us greater control over configuration than Fargate, at the cost of additional setup complexity and occasional startup delays.↩︎\nOn the latest version of the v2 pipeline, most of these options are already configured appropriately when running the pipeline with -profile batch (which is also the default profile). In this case, you only need to change process.queue to point to the name of your job queue.↩︎"
+  },
+  {
+    "objectID": "notebooks/2024-07-01_partial-hv.html",
+    "href": "notebooks/2024-07-01_partial-hv.html",
+    "title": "Investigating the v2 pipeline’s human-virus assignment behavior",
+    "section": "",
+    "text": "One question that came up in a recent team meeting about the refactored v2 pipeline is how higher-level taxa are handled during human-virus identification.\nAs a reminder, the process for HV identification currently looks like this:\nThis all works well for reads that are assigned to a taxon that is entirely composed of human-infecting viruses. The question is, what about reads that Kraken2 assigns to taxa that are only partially human-infecting? For example, the virus family Coronaviridae (taxid 11118) contains several coronaviruses that infect humans (e.g. SARS-CoV-2) and many that do not (e.g. assorted bat coronaviruses). What would happen to a read that was assigned by Kraken2 to this taxid?\nThe key process here is PROCESS_KRAKEN_HV (modules/local/processKrakenHV/main.nf), which calls bin/process_kraken_hv.py on the output of Kraken2. This script:\nAs such, the script checks whether the assigned taxid or any of its ancestors is an HV taxon, but not whether any of its descendants are. Reads that Kraken2 assigns to a higher-level taxon containing HV taxa (such as Coronaviridae in the example above) will thus be treated as though they were assigned to a non-HV taxon, and filtered out during HV read identification.\nThis seems suboptimal. That said, it’s not obvious what the correct behavior is here; treating reads assigned to partially-HV taxa as HV comes with its own problems1. Someone should think more about what the right approach is here."
+  },
+  {
+    "objectID": "notebooks/2024-07-01_partial-hv.html#footnotes",
+    "href": "notebooks/2024-07-01_partial-hv.html#footnotes",
+    "title": "Investigating the v2 pipeline’s human-virus assignment behavior",
+    "section": "Footnotes",
+    "text": "Footnotes\n\n\nIn particular, since the Bowtie2 database used for initial screening only contains HV genomes, this approach would likely lead to closely-related non-human-infecting virus reads being classified as HV.↩︎"
   }
 ]
\ No newline at end of file
diff --git a/notebooks/2024-07-01_partial-hv.qmd b/notebooks/2024-07-01_partial-hv.qmd
new file mode 100644
index 0000000..2edd683
--- /dev/null
+++ b/notebooks/2024-07-01_partial-hv.qmd
@@ -0,0 +1,42 @@
+---
+title: "Investigating the v2 pipeline's human-virus assignment behavior"
+subtitle: "Checking treatment of partially-human-infecting virus taxa"
+author: "Will Bradshaw"
+date: 2024-07-01
+format:
+  html:
+    code-fold: true
+    code-tools: true
+    code-link: true
+    df-print: paged
+editor: visual
+title-block-banner: black
+---
+
+One question that came up in a recent team meeting about the [refactored v2 pipeline](https://github.com/naobservatory/mgs-workflow) is how higher-level taxa are handled during human-virus identification.
+
+As a reminder, the process for HV identification currently looks like this:
+
+1.  Preprocessed read pairs are mapped to a database of HV genomes with Bowtie2, retaining any read that meets a fairly low alignment-score threshold.
+2.  Surviving read pairs are mapped to human, cow, and assorted other contaminant sequences with Bowtie2 and BBMap, discarding any read that maps to any contaminant.
+3.  Surviving read pairs are merged into single sequences with BBMerge, then passed to Kraken2, which performs taxonomic assignment using the Standard database.
+4.  Based on the Kraken2 assignments, reads are classified into those that are (1) assigned to a human-infecting virus taxon by Kraken2, (2) assigned to a non-HV taxon by Kraken2, or (3) not assigned to any taxon. All reads in category (2) are filtered out.
+5.  Surviving reads are assigned HV status if they (a) are given HV assignments by both Bowtie2 and Kraken2, or (b) are unassigned by Kraken2 and align to an HV taxon with Bowtie2 with an alignment score above a user-specified threshold (typically 20).
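
As a rough illustration of steps 4 and 5, the decision rule can be sketched in Python. This is a minimal sketch rather than the pipeline's actual code: the names `KrakenClass` and `final_hv_status` are hypothetical, the sketch assumes every read reaching this point already has a Bowtie2 alignment to an HV genome (step 1), and it assumes "above" the threshold means strictly greater.

```python
from enum import Enum


class KrakenClass(Enum):
    """Hypothetical labels for the three Kraken2 outcomes in step 4."""
    HV = "hv"                  # assigned to a human-infecting virus taxon
    NON_HV = "non_hv"          # assigned to a non-HV taxon
    UNASSIGNED = "unassigned"  # not assigned to any taxon


def final_hv_status(kraken_class: KrakenClass,
                    bowtie_score: float,
                    score_threshold: float = 20.0) -> bool:
    """Return True if a read would be given HV status under steps 4-5.

    Assumes the read already aligned to an HV genome with Bowtie2,
    so only the Kraken2 class and the alignment score matter here.
    """
    if kraken_class is KrakenClass.NON_HV:
        # Category (2): filtered out regardless of Bowtie2 score.
        return False
    if kraken_class is KrakenClass.HV:
        # Case (a): HV according to both Bowtie2 and Kraken2.
        return True
    # Case (b): unassigned by Kraken2, so fall back on the Bowtie2 score.
    # Strict inequality is an assumption of this sketch.
    return bowtie_score > score_threshold


if __name__ == "__main__":
    for cls, score in [(KrakenClass.HV, 5.0),
                       (KrakenClass.NON_HV, 50.0),
                       (KrakenClass.UNASSIGNED, 25.0)]:
        print(cls.name, score, final_hv_status(cls, score))
```

Written this way, the Kraken2 class dominates: a read in category (2) is dropped no matter how well it aligns to an HV genome, which is why the Kraken2 classification step matters so much for the question below.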
+
+This all works well for reads that are assigned to a taxon that is entirely composed of human-infecting viruses. The question is, what about reads that Kraken2 assigns to taxa that are only partially human-infecting? For example, the virus family Coronaviridae (taxid 11118) contains several coronaviruses that infect humans (e.g. SARS-CoV-2) and many that do not (e.g. assorted bat coronaviruses). What would happen to a read that was assigned by Kraken2 to this taxid?
+
+The key process here is `PROCESS_KRAKEN_HV` ([`modules/local/processKrakenHV/main.nf`](https://github.com/naobservatory/mgs-workflow/blob/master/modules/local/processKrakenHV/main.nf)), which calls [`bin/process_kraken_hv.py`](https://github.com/naobservatory/mgs-workflow/blob/master/bin/process_kraken_hv.py) on the output of Kraken2. This script:
+
+1.  Imports a TSV of human-infecting virus taxa (generated by `FINALIZE_HUMAN_VIRUS_DB` in the index workflow) as well as the NCBI taxonomy tree structure (generated by `EXTRACT_NCBI_TAXONOMY` in the index workflow). Importantly, this initial HV TSV does *not* include higher taxa.
+2.  Iterates line-by-line over the readwise Kraken2 output as follows:
+    a.  Extract the name and taxid assignment for the read.
+    b.  Check if the taxid is in the list of HV taxids; if so, return `True`.
+    c.  If not, get the taxid's parent taxid from the tree structure and go to (b).
+    d.  Repeat (b) and (c) until an HV taxid is found or the taxid being screened is 0 (unassigned), 1 (root) or 2 (Bacteria); if one of these is reached first, return `False`.
+3.  Save the read's HV status, along with other information, to an output file.
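
The ancestor walk in step 2 can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the contents of `bin/process_kraken_hv.py`: the function name `is_hv`, the toy child-to-parent mapping, and the made-up taxid 9999999 are all hypothetical, and the lineage is deliberately simplified (intermediate ranks are skipped). Taxid 11118 is Coronaviridae as above, and 2697049 is used for SARS-CoV-2.

```python
from typing import Dict, Set

# Taxids at which the upward walk stops: unassigned, root, Bacteria.
STOP_TAXIDS = {0, 1, 2}


def is_hv(taxid: int, hv_taxids: Set[int], parents: Dict[int, int]) -> bool:
    """Return True if taxid or any of its ancestors is in hv_taxids."""
    seen = set()  # guards against cycles in a malformed tree (an addition of this sketch)
    while True:
        if taxid in hv_taxids:
            return True
        if taxid in STOP_TAXIDS or taxid in seen:
            return False
        seen.add(taxid)
        # Taxids missing from the tree are treated as unassigned (assumption).
        taxid = parents.get(taxid, 0)


# Toy, simplified child -> parent mapping for illustration only.
PARENTS = {
    9999999: 2697049,  # hypothetical strain-level taxid below SARS-CoV-2
    2697049: 11118,    # SARS-CoV-2 -> Coronaviridae (intermediate ranks skipped)
    11118: 10239,      # Coronaviridae -> Viruses
    10239: 1,          # Viruses -> root
}
HV_TAXIDS = {2697049}  # the HV list does not include higher taxa

if __name__ == "__main__":
    for t in (9999999, 2697049, 11118):
        print(t, is_hv(t, HV_TAXIDS, PARENTS))
```

In this toy tree, a read assigned below or at SARS-CoV-2 is marked HV, while a read assigned to Coronaviridae is not, because the walk only ever moves upward from the assigned taxid.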
+
+As such, the script checks whether the assigned taxid *or any of its ancestors* is an HV taxon, but not whether any of its descendants are. Reads that Kraken2 assigns to a higher-level taxon containing HV taxa (such as Coronaviridae in the example above) will thus be treated as though they were assigned to a non-HV taxon, and filtered out during HV read identification.
+
+This seems suboptimal. That said, it's not obvious what the correct behavior is here; treating reads assigned to partially-HV taxa as HV comes with its own problems[^1]. Someone should think more about what the right approach is here.
+
+[^1]: In particular, since the Bowtie2 database used for initial screening only contains HV genomes, this approach would likely lead to closely-related non-human-infecting virus reads being classified as HV.