title | tags | authors | affiliations | date | bibliography
---|---|---|---|---|---
Results of the ISMRM 2020 joint Reproducible Research & Quantitative MR study groups reproducibility challenge on phantom and human brain T<sub>1</sub> mapping | | | | 5 June 2023 | paper.bib
We present the results of the ISMRM 2020 joint Reproducible Research and Quantitative MR study groups reproducibility challenge on T1 mapping in phantom and human brain. T1 mapping, a widely used quantitative MRI technique, exhibits inconsistent tissue-specific values across protocols, sites, and vendors. The challenge aimed to assess the reliability of an independently implemented image acquisition protocol using inversion recovery in a standardized phantom and healthy human brains. Participants acquired T1 mapping data on MRI scanners from three manufacturers at 3T, resulting in 38 phantom datasets and 56 datasets from healthy human subjects. The robust imaging protocol and fitting algorithm demonstrated good reproducibility in both phantom and human brain T1 measurements. However, variations in implementation led to higher variance in reported values compared to intra-submission variability. This challenge resulted in the creation of a comprehensive open database of T1 mapping acquisitions, accessible at osf.io/ywc9g/, and an interactive dashboard for wider community access and engagement.
T1 mapping is a widely used quantitative MRI technique that provides valuable information about tissue properties. However, the field faces a significant challenge due to the inconsistency of tissue-specific T1 values across different imaging protocols, sites, and vendors. This inconsistency hampers the comparability and reliability of T1 measurements, limiting their utility in both research and clinical applications. To address this critical issue, the ISMRM Reproducible Research study group (RRSG) and Quantitative MR study group (qMRSG) collaborated to launch the T1 mapping reproducibility challenge.
The primary objective of the challenge was to investigate whether independently-implemented image acquisition protocols at multiple centers could reliably measure T1 using inversion recovery in a standardized phantom and in the brains of healthy volunteers. By evaluating the reproducibility of a well-established T1 mapping protocol and fitting algorithm from a reputable publication [@Barral2010-qm], the challenge aimed to identify sources of variability and establish best practices for achieving consistent and accurate T1 measurements.
A diverse group of participants was invited to acquire T1 mapping data on a standard ISMRM/NIST phantom and/or in healthy human brains using MRI scanners from three different manufacturers (Siemens, GE, Philips) operating at 3T, with one submission acquired at 0.35T. To enhance reproducibility and transparency, data submission, pipeline development, and analysis were conducted using open-source platforms. For participants collecting data at multiple sites, both inter-submission and intra-submission comparisons were performed by selecting one dataset per submission.
The results of the challenge were promising, with a total of 18 submissions accepted, consisting of 38 phantom datasets and 56 datasets from healthy human subjects. The mean coefficient of variation (CoV) for inter-submission phantom measurements was 6.1%, roughly twice the intra-submission CoV of 2.9%. A similar trend was observed in the human data, where the inter-submission CoV for the genu was 6.0% compared to the intra-submission CoV of 2.9%, and for the cortical gray matter, the inter-submission CoV was 16.5% while the intra-submission CoV was 6.9%.
The evaluation of the imaging protocol and fitting algorithm based on @Barral2010-qm demonstrated good reproducibility of both phantom and human brain T1 measurements. However, variations in the implementation of the protocol among the submissions led to higher variance in reported values relative to the intra-submission variability. This finding underscores the importance of standardized protocols and consistent implementation to ensure reliable and comparable T1 measurements across different imaging centers.
One of the major outcomes of the challenge was the creation of a large open database of inversion recovery T1 mapping acquisitions, which encompasses data acquired from multiple sites and MRI vendors. This database, accessible at osf.io/ywc9g/, holds significant value for the wider research community, enabling researchers to explore and engage with a comprehensive collection of T1 mapping data. To further facilitate access and utilization of the dataset, an interactive dashboard (\autoref{fig:dashboard}) was developed, accessible at https://rrsg2020.db.neurolibre.org.
Overall, this T1 mapping reproducibility challenge fills a critical need in the field by addressing the inconsistency of T1 values across different protocols, sites, and vendors. The findings and resources generated through this challenge will contribute to the standardization and improvement of T1 mapping techniques, promoting greater reliability and comparability of T1 measurements. Ultimately, these advancements will enhance the accuracy and clinical relevance of T1 mapping in various research and clinical applications, fostering advancements in precision medicine and improving patient care.
The conception of this collaborative reproducibility challenge originated from discussions with experts, including Paul Tofts, Joëlle Barral, and Ilana Leppert, who provided valuable insights. Additionally, Kathryn Keenan, Zydrunas Gimbutas, and Andrew Dienstfrey from NIST provided their code to generate the ROI template for the ISMRM/NIST phantom. Dylan Roskams-Edris and Gabriel Pelletier from the Tanenbaum Open Science Institute (TOSI) offered valuable insights and guidance related to data ethics and data sharing in the context of this international multi-center conference challenge. The 2020 RRSG study group committee members who launched the challenge, Martin Uecker, Florian Knoll, Nikola Stikov, Maria Eugenia Caligiuri, and Daniel Gallichan, as well as the 2020 qMRSG committee members, Kathryn Keenan, Diego Hernando, Xavier Golay, Annie Yuxin Zhang, and Jeff Gunter, also played an essential role in making this challenge possible. Finally, we extend our thanks to all the volunteers and individuals who helped with the scanning at each imaging site.
\awesomebox[red]{2pt}{\faExclamationCircle}{red}{\textbf{NOTE}}
The following section in this document repeats the narrative content exactly as found in the corresponding NeuroLibre Reproducible Preprint (NRP). The content was automatically incorporated into this PDF using the NeuroLibre publication workflow [@Karakuzu2022-nlwf] to credit the referenced resources. The submitting author of the preprint has verified and approved the inclusion of this section through a GitHub pull request made to the source repository from which this document was built. Please note that the figures and tables have been excluded from this (static) document. To interactively explore such outputs and re-generate them, please visit the corresponding NRP. For more information on integrated research objects (e.g., NRPs) that bundle narrative and executable content for reproducible and transparent publications, please refer to @Dupre2022-iro. NeuroLibre is sponsored by the Canadian Open Neuroscience Platform (CONP) [@Harding2023-conp].
Significant challenges exist in the reproducibility of quantitative MRI (qMRI) [@Keenan2019-ni]. Despite its promise of improving the specificity and reproducibility of MRI acquisitions, few qMRI techniques have been integrated into clinical practice. Even the most fundamental MR parameters cannot be measured with sufficient reproducibility and precision across clinical scanners to pass the second of six stages of technical assessment for clinical biomarkers [@Fryback1991-sy; @Schweitzer2016-fl; @Seiberlich2020-xe]. Half a century has passed since quantitative T1 (spin-lattice relaxation time) measurements were first reported as a potential biomarker for tumors [@Damadian1971-sc], followed shortly thereafter by the first in vivo quantitative T1 maps [@Pykett1978-mk] of tumors, but there is still disagreement in reported values for this fundamental parameter across different sites, vendors, and implementations [@stikov2015].
Among fundamental MRI parameters, T1 holds significant importance [@Boudreau2020-jf]. It represents the time it takes for the longitudinal magnetization to recover after being disturbed by an RF pulse. The T1 value varies based on molecular mobility and magnetic field strength [@Bottomley1984-qx; @Dieringer2014-qz; @Wansapura1999-tf], making it a valuable parameter for distinguishing different tissue types. Accurate knowledge of T1 values is essential for optimizing clinical MRI pulse sequences for contrast and time efficiency [@Ernst1966-pp; @Redpath1994-sb; @Tofts1997-ln] and as a calibration parameter for other quantitative MRI techniques [@Sled2001-fz; @Yuan2012-xh]. Among the number of techniques to measure T1, inversion recovery (IR) [@Drain1949-yk; @Hahn1949-wf] is widely held as the gold standard technique, as it is robust against other effects (e.g. B1 inhomogeneity) and potential errors in measurements (e.g. insufficient spoiling) [@stikov2015]. However, because the technique requires a long repetition time (TR > T1), it is very slow and impractical for whole-organ measurements, limiting its clinical use. In practice, it is mostly used as a reference to validate other T1 mapping techniques, such as variable flip angle imaging (VFA) [@Cheng2006-qe; @Deoni2003-qc; @Fram1987-jj], Look-Locker [@Look1970-no; @Messroghli2004-iv; @Piechnik2010-be], and MP2RAGE [@Marques2010-po; @Marques2013-yg].
Efforts have been made to develop quantitative MRI phantoms to assist in standardizing T1 mapping methods [@Keenan2018-px]. A quantitative MRI standard system phantom was created in a joint project between the International Society for Magnetic Resonance in Medicine (ISMRM) and the National Institute of Standards and Technology (NIST) [@Stupic2021-hu], and has since been commercialized (Premium System Phantom, CaliberMRI, Boulder, Colorado). The spherical phantom has a 57-element fiducial array containing spheres with doped liquids that model a wide range of T1, T2, and PD values. The reference values of each sphere were measured using NMR at 3T [@Stupic2021-hu]. The standardized concentration for relaxometry values established as references by NIST are also used by another company for their relaxometry MRI phantoms (Gold Standard Phantoms Ltd., Rochester, England). The cardiac TIMES phantom [@Captur2016-xn] is another commercially available system phantom focusing on T1 and T2 values in blood and heart muscles, pre- and post-contrast. The ISMRM/NIST phantom has been used in several large multicenter studies already, for example in [@Bane2018-wt] where they compared measurements at eight sites on a single ISMRM/NIST phantom using the inversion recovery and VFA T1 mapping protocols recommended by NIST, as well as some site-specific imaging protocols used for dynamic contrast enhanced (DCE) imaging. @Bane2018-wt concluded that the acquisition protocol, field strength, and T1 value of the sample impacted the level of accuracy, repeatability, and interplatform reproducibility that was observed. In another study led by NIST researchers [@Keenan2021-ly], T1 measurements were done at two clinical field strengths (1.5T and 3.0 T) and 27 MRI systems (three vendors) using the recommended NIST protocols. That study, which only investigated phantoms, found no significant relationship between T1 discrepancies of the measurements and the MRI vendors used.
The 2020 ISMRM reproducibility challenge<sup>1</sup> posed the following question:
Will an imaging protocol independently-implemented at multiple centers reliably measure what is considered one of the fundamental MR parameters (T1) using the most robust technique (inversion recovery) in a standardized phantom (ISMRM/NIST system phantom) and in the healthy human brain?
More broadly, this challenge aimed at assessing the reproducibility of a qMRI method presented in a seminal paper, [@Barral2010-qm], by evaluating the variability in measurements observed by different research groups that implemented this imaging protocol. As the focus of this challenge was on reproducibility, the challenge design emphasized the use of reproducible research practices, such as sharing code, pipelines, data, and scripts to reproduce figures.
The phantom portion of the challenge was launched for those with access to the ISMRM/NIST system phantom [@Stupic2021-hu] (Premium System Phantom, CaliberMRI, Boulder, Colorado). Two versions of the phantom have been produced with slightly different T1/T2/PD values in the liquid spheres, and both versions were used in this study. Phantoms with serial numbers 0042 or less are referred to as “Version 1”, and those with 0043 or greater are “Version 2”. The phantom has three plates containing sets of 14 spheres for ranges of proton density (PD), T1 (NiCl2), and T2 (MnCl2) values. Reference T1 values at 20 °C and 3T for the T1 plate are listed in {numref}`table1` for both versions of the phantom. Researchers who participated in the challenge were instructed to record the temperature before and after scanning the phantom using the phantom's internal thermometer. Instructions for positioning and setting up the phantom were devised by NIST after they had designed the phantom (prior to the challenge), and were provided to participants through the NIST website<sup>2</sup>. In brief, the instructions described how to orient the phantom consistently at different sites, and how long the phantom should be in the scanner room before scanning so that thermal equilibrium was reached.
Participants were also instructed to collect T1 maps in the brains of healthy human participants, if possible. To ensure consistency across datasets, single-slice positioning parallel to the anterior commissure - posterior commissure (AC-PC) line was recommended. Before the scanning process, the participants granted their consent<sup>3</sup> to share their de-identified data openly with the challenge organizers and on the website Open Science Framework (OSF.io). As the submitted single-slice inversion recovery images would be along the AC-PC line, they are unlikely to contain sufficient information for facial identification, and therefore participants were not instructed to de-face their data. The researchers who submitted human data for the challenge provided written confirmation to the organizers that their data was acquired in accordance with their institutional ethics committee (or equivalent regulatory body) and that the subjects had consented to sharing their data as described in the challenge.
Participants were instructed to acquire the T1 mapping data using the spin-echo inversion recovery protocol for T1 mapping as reported in [@Barral2010-qm], and detailed in {numref}`table2`. This protocol uses four inversion times optimized for human brain T1 values and uses a relatively short TR (2550 ms). It is important to note that this acquisition protocol is not suitable for T1 fitting models that assume TR > 5T1. Instead, more general models of inversion recovery, such as the @Barral2010-qm fitting model described in Section 2.4.1, can be used to fit this data.
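To illustrate why the long-TR assumption matters here, the sketch below compares an idealized inversion recovery signal with and without the finite-TR recovery term. It assumes a perfect 180° inversion and ignores TE, which is a textbook simplification and not the signal model fitted in the challenge; the inversion time of 400 ms and the T1 values are arbitrary illustrative choices.

```python
import math

TR_MS = 2550.0  # repetition time stated for the challenge protocol


def ir_signal(ti_ms, t1_ms, tr_ms=None, m0=1.0):
    """Idealized IR signal (perfect inversion, TE ignored).

    With tr_ms=None the long-TR approximation is applied, i.e. the
    exp(-TR/T1) recovery term is dropped.
    """
    recovery = 0.0 if tr_ms is None else math.exp(-tr_ms / t1_ms)
    return m0 * (1.0 - 2.0 * math.exp(-ti_ms / t1_ms) + recovery)


# Arbitrary illustrative T1 values (ms) and inversion time (ms).
for t1 in (500.0, 1000.0, 1500.0):
    offset = ir_signal(400.0, t1, TR_MS) - ir_signal(400.0, t1)
    print(f"T1 = {t1:6.0f} ms  TR/T1 = {TR_MS / t1:4.2f}  offset = {offset:.4f} M0")
```

Under these assumptions the neglected term equals exp(-TR/T1), so it grows quickly once TR/T1 drops below 5, which is the regime of the longer T1 values at TR = 2550 ms.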
Researchers who participated in the challenge were advised to adhere to this protocol as closely as possible, and to report any differences in protocol parameters due to technical limitations of their scanners and/or software. It was recommended that participants submit complex data (magnitude and phase, or real and imaginary), but magnitude-only data was also accepted if complex data could not be conveniently exported.
Data submissions for the challenge were managed through a dedicated repository on GitHub, accessible at https://github.com/rrsg2020/data_submission. This allowed transparent and open review of the submissions, as well as standardization of the process. All datasets were converted to the NIfTI file format, and images from different TIs needed to be concatenated into the fourth (or “time”) dimension. Magnitude-only datasets required one NIfTI file, while complex datasets required two files (magnitude and phase, or real and imaginary). Additionally, a YAML (`*.yaml`) configuration file containing submission, dataset, and acquisition details (such as data type, submitter name and email, site details, phantom or volunteer details, and imaging protocol details) was required for each submitted dataset to ensure that the information was standardized and easily found. Each submission was reviewed to confirm that guidelines were followed, and then datasets and configuration files were uploaded to OSF.io (osf.io/ywc9g). A Jupyter Notebook [@Beg2021-ps; @Kluyver2016-nl] pipeline was used to generate T1 maps using qMRLab [@Cabana2015-zg; @Karakuzu2020-ul] and to quality-check the datasets prior to accepting the submissions. In particular, we verified that the NiCl2 array was imaged, that the DICOM images were correctly converted to NIfTI, and that the images for each acquired TI were not renormalized (Philips platforms have different image export options that change how the images are scaled, and in some cases a reconversion was necessary to ensure proper scaling for quantitative imaging; see the submissions GitHub issue #5 for one example<sup>4</sup>). Links to the Jupyter Notebook for reproducing the T1 map were shared using the MyBinder platform in each respective submission GitHub issue, ensuring that computational environments (e.g., software dependencies and packages) could be reproduced to re-run the pipeline in a web browser.
A reduced-dimension non-linear least squares (RD-NLS) approach was used to fit the complex general inversion recovery signal equation:

$$S(TI) = a + b\,e^{-TI/T_1} \qquad [1]$$

where a and b are complex constants. This approach, introduced in @Barral2010-qm, models the general T1 signal equation without the long-TR approximation. The a and b constants inherently factor TR in them, as well as other imaging parameters (e.g., excitation and refocusing flip angles, TE, etc.). @Barral2010-qm shared the implementation of the fitting algorithm used in their paper ^their-paper. Magnitude-only data were fitted to a modified version of [1] (Eq. 15 of @Barral2010-qm) with signal-polarity restoration. To facilitate its use in our pipelines, a wrapper around this code was implemented in the open-source software qMRLab [@Cabana2015-zg; @Karakuzu2020-ul], which provides a command-line interface (CLI) to call the fitting in MATLAB/Octave scripts.
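The idea behind a reduced-dimension fit can be sketched as follows: for any candidate T1, the model a + b·exp(−TI/T1) is linear in the complex constants a and b, so they can be solved in closed form and only T1 needs to be searched. The snippet below is a minimal illustration of that idea using a plain grid search on synthetic, noise-free data. It is not the qMRLab or @Barral2010-qm implementation, and the function name, inversion times, and signal constants are all illustrative assumptions.

```python
import cmath
import math


def fit_t1_grid(ti_ms, signal, t1_grid_ms):
    """Grid search over T1; for each candidate, a and b come from
    ordinary linear least squares on the real regressor exp(-TI/T1)."""
    n = len(ti_ms)
    s_mean = sum(signal) / n
    best = None
    for t1 in t1_grid_ms:
        e = [math.exp(-ti / t1) for ti in ti_ms]
        e_mean = sum(e) / n
        denom = sum((ek - e_mean) ** 2 for ek in e)
        # Closed-form slope and intercept (valid for complex signal values).
        b = sum((ek - e_mean) * (sk - s_mean) for ek, sk in zip(e, signal)) / denom
        a = s_mean - b * e_mean
        rss = sum(abs(sk - (a + b * ek)) ** 2 for ek, sk in zip(e, signal))
        if best is None or rss < best[0]:
            best = (rss, t1, a, b)
    _, t1, a, b = best
    return t1, a, b


# Synthetic example; every value below is an illustrative assumption.
TI_MS = [50.0, 400.0, 1100.0, 2500.0]
T1_TRUE_MS = 900.0
A_TRUE = 1.0 * cmath.exp(0.3j)
B_TRUE = -1.9 * cmath.exp(0.3j)
SIGNAL = [A_TRUE + B_TRUE * math.exp(-ti / T1_TRUE_MS) for ti in TI_MS]

t1_hat, a_hat, b_hat = fit_t1_grid(TI_MS, SIGNAL, range(200, 3001))
print(f"T1 = {t1_hat} ms, a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Searching a single parameter keeps the problem one-dimensional, which is what makes an exhaustive grid practical; a production implementation would typically refine the grid estimate and handle noise and magnitude-only data.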
A Jupyter Notebook data processing pipeline was written using MATLAB/Octave. This pipeline automatically downloads all the datasets from the data-hosting platform osf.io (osf.io/ywc9g), loads each dataset configuration file, fits the T1 data voxel-wise, and exports the resulting T1 map to the NIfTI and PNG formats. This pipeline is available in a GitHub repository (https://github.com/rrsg2020/t1_fitting_pipeline, filename: `RRSG_T1_fitting.ipynb`). Once all submissions were collected and the pipeline was executed, the T1 maps were uploaded to OSF (osf.io/ywc9g).
A schematic of the phantom is shown in Figure 1-a. The T1 plate (NiCl2 array) of the phantom has 14 spheres that were labeled as the regions-of-interest (ROI) using a numerical mask template created in MATLAB, provided by NIST researchers (Figure 1-b). To avoid potential edge effects in the T1 maps, the ROI labels were reduced to 60% of the expected sphere diameter. A registration pipeline in Python using the Advanced Normalization Tools (ANTs) [@Avants2009-cw] was developed and shared in the analysis repository of our GitHub organization (https://github.com/rrsg2020/analysis, filename: `register_t1maps_nist.py`, commit ID: `8d38644`). Briefly, a label-based registration was first applied to obtain a coarse alignment, followed by an affine registration (gradientStep: 0.1, metric: cross correlation, number of steps: 3, iterations: 100/100/100, smoothness: 0/0/0, sub-sampling: 4/2/1) and a BSplineSyN registration (gradientStep: 0.5, meshSizeAtBaseLevel: 3, number of steps: 3, iterations: 50/50/10, smoothness: 0/0/0, sub-sampling: 4/2/1). The ROI labels template was nonlinearly registered to each T1 map uploaded to OSF.
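As a rough illustration of the ROI shrinking step, the following sketch builds a 2D disk mask whose diameter is scaled to 60% of a nominal sphere diameter. The helper name, grid size, center, and 15-voxel diameter are hypothetical, and the actual template was generated with the MATLAB code provided by NIST, not with this snippet.

```python
def disk_mask(shape, center, diameter_vox, scale=0.6):
    """Boolean 2D mask (list of rows) of a disk whose diameter is
    `scale` times the nominal diameter, to stay away from sphere edges."""
    rows, cols = shape
    cy, cx = center
    radius = 0.5 * diameter_vox * scale
    return [
        [(y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2 for x in range(cols)]
        for y in range(rows)
    ]


# Hypothetical sphere: 15 voxels across, centered in a 21x21 patch.
full = disk_mask((21, 21), (10, 10), diameter_vox=15, scale=1.0)
roi = disk_mask((21, 21), (10, 10), diameter_vox=15)

n_full = sum(map(sum, full))
n_roi = sum(map(sum, roi))
print(f"full sphere: {n_full} voxels, 60% ROI: {n_roi} voxels")
```

Because area scales with the square of the diameter, a 60% diameter keeps only about a third of the in-plane voxels, trading sample size for protection against partial-volume effects at the sphere boundary.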
Figure 1 Schematic of the system phantom (a) used in this challenge. Reproduced and cropped from Stupic et al. 2021 (Creative Commons CC BY license). ROI selection for the ISMRM/NIST phantom (b) and the human brain (c). b) The 14 phantom ROIs (shades of blue/green) were automatically generated using a script provided by NIST. The three reference pins in the phantom are shown in yellow; these are not ROIs or spheres. c) Human brain ROIs were manually segmented in four regions: the genu (yellow, 5⨯5 voxels), splenium (green, 5⨯5 voxels), deep gray matter (blue, 5⨯5 voxels), and cortical gray matter (red, three sets of 3⨯3 voxels). Note: due to differences in slice positioning from the single-slice datasets provided by certain sites, for some datasets it was not possible to manually segment an ROI in the genu or deep gray matter. Where the genu was missing, left or right frontal white matter (WM) was selected instead; where deep gray matter (GM) was missing, that ROI was omitted entirely.
Manual ROIs were segmented by a single researcher (M.B., 11+ years of neuroimaging experience) using FSLeyes [@McCarthy2019-qd] in four regions for the human datasets (Figure 1-c): the genu, splenium, deep gray matter, and cortical gray matter. Automatic segmentation was not used because the data were single-slice and there was inconsistent slice positioning between datasets.
Analysis code and scripts were developed and shared in a version-tracked public GitHub repository ^public-repo. T1 fitting and main data analysis were performed for all datasets by one of the challenge organizers (M.B.). Python-based Jupyter Notebooks were used for both the quality assurance and main analysis workflows. The computational environment requirements were containerized in Docker [@Boettiger2015-vd; @Merkel2014-cu], allowing for an executable environment that can reproduce the analysis in a web browser through MyBinder<sup>5</sup> [@Jupyter-2018]. Python scripts handled reference data, database handling, ROI masking, and general analysis tools, while configuration files managed the dataset information, which was downloaded and pooled using a script (`make_pooled_datasets.py`). The databases were created using a reproducible Jupyter Notebook and subsequently saved in the repository.
For the ISMRM/NIST phantom data, mean T1 values for each ROI were compared with temperature-corrected reference values and visualized in three different types of plots (linear axes, log-log axes, and error relative to the reference value). This comparison was repeated for individual measurements at each site and for all measurements grouped together. Temperature correction was carried out via nonlinear interpolation<sup>6</sup> of the set of reference NIST T1 values between 16 °C and 26 °C (2 °C intervals), listed in the phantom technical specifications. For the human datasets, a notebook was created to plot the mean and standard deviations for each tissue ROI from all submissions from all sites. All quality assurance and analysis plot images were saved to the repository for ease of access and as a timestamped, version-controlled record of the state of the analysis figures. The database files of ROI values and acquisition details for all submissions were also saved to the repository.
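A temperature correction of this kind can be sketched as a quadratic fit in log-log space over the tabulated temperatures, evaluated at the measured phantom temperature. The snippet below is a minimal, standard-library illustration and not the challenge code. The reference table is made up (a smooth synthetic curve, not NIST values), the function names are hypothetical, and centering the log-temperatures is a numerical convenience added here.

```python
import math


def polyfit2(x, y):
    """Least-squares coefficients (c0, c1, c2) of y = c0 + c1*x + c2*x**2."""
    # Normal equations A c = r with A[i][j] = sum(x**(i+j)), r[i] = sum(y*x**i).
    a = [[sum(xk ** (i + j) for xk in x) for j in range(3)] for i in range(3)]
    r = [sum(yk * xk ** i for xk, yk in zip(x, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, 3):
            f = a[row][col] / a[col][col]
            a[row] = [arc - f * acc for arc, acc in zip(a[row], a[col])]
            r[row] -= f * r[col]
    # Back-substitution.
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        c[i] = (r[i] - sum(a[i][j] * c[j] for j in range(i + 1, 3))) / a[i][i]
    return c


def t1_at_temperature(temps_c, t1_ref_ms, temp_c):
    """Interpolate a reference T1 at temp_c with a quadratic in log-log space."""
    x0 = sum(math.log(t) for t in temps_c) / len(temps_c)  # centering offset
    x = [math.log(t) - x0 for t in temps_c]
    y = [math.log(t1) for t1 in t1_ref_ms]
    c0, c1, c2 = polyfit2(x, y)
    xq = math.log(temp_c) - x0
    return math.exp(c0 + c1 * xq + c2 * xq ** 2)


# Made-up reference table for one sphere (NOT NIST values).
TEMPS_C = [16.0, 18.0, 20.0, 22.0, 24.0, 26.0]
T1_REF_MS = [1000.0 * (t / 20.0) ** 0.5 for t in TEMPS_C]

print(f"{t1_at_temperature(TEMPS_C, T1_REF_MS, 21.3):.1f} ms at 21.3 °C")
```

In use, the measured phantom temperature recorded by each site would replace the 21.3 °C placeholder, and the interpolated value would serve as the reference against which the fitted T1 of that sphere is compared.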
An interactive dashboard<sup>7</sup> was developed in Dash by Plotly (Plotly Technologies Inc. 2015) and hosted by NeuroLibre [@Karakuzu2022-nlwf] to enable real-time exploration of the data, analysis, and statistics of the challenge results. The dashboard reports descriptive statistics for a variety of alternative looks at phantom and brain data, as well as some statistical comparisons (e.g., the hierarchical shift function<sup>8</sup>). The data was collected from the pre-prepared databases of masked ROI values and incorporated other database information, such as phantom version, temperature, MRI system, and reference values. The interactive dashboard displays these results for all measurements at all sites.
The challenge focused on exploring the reproducibility of the gold standard inversion recovery T1 mapping method reported in a seminal paper [@Barral2010-qm]. Eighteen submissions independently implemented the inversion recovery T1 mapping acquisition protocol as outlined in @Barral2010-qm (which is optimized for the T1 values observed in brain tissue), and reported T1 mapping data in a standard quantitative MRI phantom and/or human brains at 27 MRI sites, using systems from three different vendors (GE, Philips, Siemens). The collaborative effort produced an open-source database of 94 T1 mapping datasets, including 38 ISMRM/NIST phantom and 56 human brain datasets. A standardized T1 processing pipeline was developed for different dataset types, including magnitude-only and complex data. Additionally, Jupyter notebooks that can be executed in containerized environments were developed for quality assurance, visualization, and analyses. An interactive web-based dashboard was also developed to allow for easy exploration of the challenge results in a web browser.
To evaluate the accuracy of the resulting T1 values, the challenge used the standard ISMRM/NIST phantom with fiducial spheres having T1 values in the range of human brain tissue, from 500 to 2000 ms (see Figure 5). As anticipated for this protocol, there was a decrease in the accuracy in measurements for spheres with T1 below 300 ms. Overall, the majority of the independently implemented imaging protocols from various sites are consistent with the temperature-corrected reference values, with only a few exceptions. Using the NIST phantom, we report that sites that independently implemented the imaging protocol resulted in an inter-submission mean CoV (6.1 %) that was twice as high as the intra-submission mean CoV measured at seven sites (2.9 %). A similar trend was observed in vivo. Inter-submission CoV for WM (genu) was 6.0 % and for GM (cortex) was 16.5 % vs the intra-submission CoV that was 2.9 % and 6.9%, with generally higher CoVs relative to the phantom measurements likely due to biological variability [@Piechnik2013-xl; @Stanisz2005-qg].
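The CoV comparison reported above can be reproduced in outline as follows. This is a minimal sketch using made-up T1 values rather than challenge data; it assumes the CoV is the sample standard deviation divided by the mean, computed per ROI and then averaged across ROIs, which may differ in detail from the analysis notebooks.

```python
from statistics import mean, stdev


def cov_percent(values):
    """Coefficient of variation (sample standard deviation / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)


def mean_cov(t1_by_roi):
    """Average the per-ROI CoV across all ROIs."""
    return mean(cov_percent(v) for v in t1_by_roi.values())


# Made-up mean T1 values (ms) per ROI; NOT challenge data.
# Inter-submission: one dataset from each of several independent submissions.
INTER_SUBMISSION = {
    "roi_1": [1950.0, 2080.0, 1890.0, 2010.0],
    "roi_2": [980.0, 1040.0, 930.0, 1010.0],
}
# Intra-submission: several datasets from a single submission.
INTRA_SUBMISSION = {
    "roi_1": [1990.0, 2020.0, 1975.0, 2005.0],
    "roi_2": [995.0, 1012.0, 988.0, 1003.0],
}

print(f"inter-submission mean CoV: {mean_cov(INTER_SUBMISSION):.1f} %")
print(f"intra-submission mean CoV: {mean_cov(INTRA_SUBMISSION):.1f} %")
```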
The work done during this challenge involved a multi-center quantitative T1 mapping study using the NIST phantom across various sites. This work overlaps with two recent studies [@Bane2018-wt; @Keenan2021-ly]. @Bane2018-wt focused on the reproducibility of two standard quantitative T1 techniques (inversion recovery and variable flip angle) and a wide variety of site-specific T1 mapping protocols for DCE, mostly VFA protocols with fewer flip angles, which were implemented at eight imaging centers covering the same 3 MRI vendors featured in this challenge (GE/Philips/Siemens). The inter-platform coefficient of variation for the standard inversion recovery T1 protocol was 5.46% at 3 T in [@Bane2018-wt], which was substantially lower than what they observed for their standard VFA protocol (22.87%). However, Bane et al.’s work differed from the challenge in several ways. First, the standard imaging protocol for inversion recovery used by @Bane2018-wt had more inversion times (14 compared to the challenge’s 4) to cover the entire range of T1 values of the phantom. Secondly, @Bane2018-wt used a single traveling phantom for all sites, whereas the challenge used a total of 8 different phantoms (some were shared amongst people who participated independently). Thirdly, @Bane2018-wt averaged the signals within each ROI of each sphere prior to fitting for the T1 values, whereas the challenge pipeline fits the T1 values on a per-voxel basis and only subsequently calculates the mean/median/std. They also only acquired magnitude data, in contrast to the challenge where participants were encouraged to submit both complex and magnitude-only data. Lastly, in @Bane2018-wt, the implementations of the common inversion recovery protocols were fully standardized (full protocol) across all the platforms (except for two cases where one manufacturer couldn’t achieve the lowest TI) and imposed and coordinated by the principal researchers. 
In contrast, the challenge sought to explore the variations that would occur for a less-restricted protocol (Table 2) that is independently implemented at multiple centers, which more closely emulates the quantitative MR research flow (publication of a technique and protocol → independently implement the pulse sequence and/or protocol → use the new implementation independently in a study → publish). Of note, in the challenge, one participating group coordinated a large multicenter dataset that mirrors the study by @Bane2018-wt by imaging a single phantom across 7 different imaging sites, albeit on systems from a single manufacturer. Using this subset, the mean cross-site CoV was 2.9 % (range: 1.6 - 4.9 %) for the first five spheres, which is in agreement with the range of observations for all spheres by @Bane2018-wt at 3T using their full inversion recovery protocol (CoV = 5.46 %; range: 0.99 - 14.6 %).
Another study, by @Keenan2021-ly, also investigated the accuracy of T1 mapping techniques using a single ISMRM/NIST system phantom at multiple sites and on multiple platforms. Like @Bane2018-wt, they used an inversion recovery imaging protocol optimized for the full range of T1 values represented in the ISMRM/NIST phantom, which consisted of 9 to 10 inversion times and a TR of 4500 ms (TR ~ 5T1 of WM at 3T). They reported no consistent pattern of differences in measured T1 values across MRI vendors for the two T1 mapping techniques they used (inversion recovery and VFA). They observed relative errors between their T1 measurements and the reference values of the phantom to be below 10% for all T1 values, with the larger errors occurring at the lowest and highest T1 values of the phantom.
There are some important things to note about this challenge. Firstly, the submissions for this challenge were due in March 2020, which was impacted by the COVID-19 pandemic lockdowns, thereby reducing repeated experiments due to access limitations. Nevertheless, a substantial number of participants submitted their datasets. Some groups intended on acquiring more data, and others intended on re-scanning volunteers, but could no longer do so due to local pandemic restrictions.
This reproducibility challenge aimed to compare differences between independently-implemented protocols. Crowning a winner was not an aim of this challenge, due to concerns that participants would have changed their protocols to get closer to the reference T1 values, leading to a broader difference in protocol implementations across MRI sites. Instead, we focused on building consensus by creating an open data repository, sharing reproducible workflows, and presenting the results through interactive visualizations. Future work warrants the study of inter-site differences in a vendor-neutral workflow [@Karakuzu2022-venus] by adhering to the latest Brain Imaging Data Structure (BIDS) community data standard on qMRI [@Karakuzu2022-bids].
Footnotes

- <sup>2</sup> The website provided to the participants has since been removed from the NIST website.
- <sup>3</sup> This website was provided as a resource to the participants for best practices to obtain informed consent for data sharing.
- <sup>5</sup> https://mybinder.org/v2/gh/rrsg2020/analysis/master?filepath=analysis
- <sup>6</sup> The T1 values vs temperature tables reported by the phantom manufacturer did not always exhibit a linear relationship. We explored the use of spline fitting on the original data and quadratic fitting on the log-log representation of the data. Both methods yielded good results, and we opted to use the latter in our analyses. The code is found here, and a Jupyter Notebook used in temperature interpolation development is here.
- <sup>7</sup> Interactive dashboard: https://rrsg2020.db.neurolibre.org, code repository: https://github.com/rrsg2020/rrsg2020-dashboard
- <sup>8</sup> The hierarchical shift function compares distributions throughout their range across multiple dependent measurements. More information can be found in this article, [@Wilcox2023-jf], and in this blog post.