Discuss BEP021 derivatives for electrophys #5
On 12 February 2021 we had a Zoom call to discuss the progress on BEP021, for which the draft is at http://bids.neuroimaging.io/bep021 on Google Docs. Some of the people that attended were @arnodelorme, @dorahermes, @guiomar, @jasmainak, @agramfort, but I do not know the GitHub handles of everyone. Please help to alert the others with a GitHub presence by mentioning them here. |
@jasmainak mentioned
|
The pull requests bids-standard/bids-examples#171 and bids-standard/bids-examples#161 refer to “derivatives”, but looking at the dataset at https://github.com/bids-standard/bids-examples/tree/master/eeg_face13 there is nothing that sets it apart from a regular raw BIDS EEG dataset. An example dataset that is a proper derivative is https://github.com/bids-standard/bids-examples/tree/master/ds000001-fmriprep. Note, however, that there are quite a few files in that derived dataset that have to be ignored, as they are not standardized (yet). Just some general pointers that relate to some topics we discussed: on https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#derived-dataset-and-pipeline-description it is specified that a derived dataset is also a BIDS dataset. The derived dataset has some required and some recommended extra elements in the dataset_description. When you prepare an example/draft derived ieeg/eeg/meg dataset, please keep these in mind. Also, https://bids-specification.readthedocs.io/en/stable/05-derivatives/01-introduction.html states "Derivatives are outputs of common processing pipelines, capturing data and meta-data sufficient for a researcher to understand and (critically) reuse those outputs in subsequent processing.” That lines up with what Scott and I were saying, that a (derived) dataset should be (re)usable in its own right.
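For concreteness, a minimal dataset_description.json for such a derived EEG dataset could look like the following sketch (written per BIDS 1.6.0, which renamed PipelineDescription to GeneratedBy; all field values here are illustrative, not from any of the example datasets):

{
    "Name": "FieldTrip-preprocessed EEG",
    "BIDSVersion": "1.6.0",
    "DatasetType": "derivative",
    "GeneratedBy": [
        {
            "Name": "fieldtrip",
            "Version": "20210212",
            "CodeURL": "https://github.com/fieldtrip/fieldtrip"
        }
    ],
    "SourceDatasets": [
        {
            "DOI": "doi:10.18112/openneuro.ds003645.v1.0.0"
        }
    ]
}
|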
ping @adam2392 @hoechenberger |
@guiomar wrote: Hi! I copy here the email too :) Thanks everyone for the meeting today! We thought that for the next meeting some of us could present some practical examples, so we can have a starting point to discuss. Here's the link to the common derivatives specs merged in 1.4.0:
In line with what we have discussed. Also note that there are some metadata fields to point to source data. It's important to distinguish between two types of derivatives: pre-processed data (in essence similar to the source data - the datatype remains unchanged). In this logic I don't know where exactly annotations may fall. But they are also an important step (and I think they can also be interesting for other modalities and extensions). Talk very soon! |
and @sappelhoff wrote that
|
Thank you, Robert, for this summary. We will generate test datasets as discussed and then we should reconvene. |
Hi @sappelhoff @robertoostenveld @arnodelorme, @dorahermes, @jasmainak, @agramfort @hoechenberger @smakeig @nucleuscub @CPernet! I want to revive the effort on ephys derivatives. Having a new look at the document, I see there are 3 main blocks:
I would be inclined to divide the work into independent lines, to avoid getting stuck due to the large amount of work ahead. Would you like to meet and move this part forward? |
Preprocessing sounds good to me; I'm not sure about the point in the parens though ("derivatives that doesn't change datatype"). To me, preprocessing also includes epoching the data, which will create new data types too. But I'm also happy to just limit the next discussion to continuous data only (Maxwell-filtering, frequency filtering, …). I will share some datasets we process using our BIDS pipeline shortly so you all can see how we're currently approaching things. Cheers, cc @agramfort |
Thanks @hoechenberger !! I think this definition comes from the generic derivatives specification, but I'm not able to find it anymore. This sounds awesome Richard! Thanks! |
I have read over the BEP021 derivatives and also added my availability, although I'll be in California at the time, so it could be hard to overlap. I also utilized the "annotations" derivatives framework to create a dataset of event markings found in the iEEG via an automated algorithm. Specifically, these are "high-frequency oscillation" (HFO) onset/duration/channels stored as a tsv file. It works well for my use-case, but the tsv file does "explode" in length. The dataset is at this Dropbox link: https://www.dropbox.com/sh/5ih5ay9fvo3q12s/AADBY5eDc_SmszHGyC3Mn6QJa?dl=0 |
I'm hammered that week. Unlikely to be able to join :( I agree about the "explosion" issue that @adam2392 pointed out. I think I've seen it before. It might help to limit the vocabulary of annotations so it is machine-readable, not just machine-writeable. |
Hi @adam2392! Awesome! We can review the annotations with your example as well :) |
It seems the most preferred day to meet is Wed 15th Dec from 5 to 6pm CET. |
Thank you, @guiomar! |
For those who didn't receive the invitation and still want to join, this is the link to hangouts: bep021 - ephys derivatives |
Thank you all who joined yesterday! I would like to summarize here some of the main points we discussed in the meeting: 1) Annotations:
2) Preprocessing steps:
I have dedicated some time to reorganizing the BEP021 documentation accordingly, since it was becoming very messy. Please add any other point I may have forgotten and you consider important :) We planned to do another meeting in January to show some examples and continue discussing the remaining issues. Talk soon! |
Thanks @smakeig! I can't see any attachment; maybe it's easier if you share the links? |
Hello @sappelhoff @robertoostenveld @arnodelorme, @dorahermes, @jasmainak, @agramfort @hoechenberger @smakeig @nucleuscub @CPernet @adam2392 @tpatpa! We are planning another meeting in the coming days; if you are interested, you can mark your availability here: |
Thanks a lot! Let's make it Tuesday 25th January at 1:00pm CET |
Details for joining: bep021 - ephys derivatives |
Let me repost here the content of an email that I already sent to the invitees of the most recent BEP021 meeting. I'll reformat it slightly; having the discussion here rather than via email better keeps it in the open for everyone to follow and contribute. Following the discussion of 25 Jan, in which we updated the BEP021 google doc to indicate what is in and out of scope, I have worked on some example pipelines and derivatives corresponding to sections 6.2, 6.3, 6.4, 6.5 and 6.6 in the google doc. I started with doi:10.18112/openneuro.ds003645.v1.0.0, downloaded it (partially) and wrote a script to make a selection. I do not consider this part of the pipeline yet (although it could have been), so my starting point is “ds003645_selection” (selection code included). Starting from “ds003645_selection", I ran the following pipelines:
This results in 6 derivatives (indicated in bold, also below). The code for each is in the respective “code” directory. Note that more lines of code are needed for data handling than for the actual pipeline (which I implemented with FieldTrip). The resulting directory tree (only showing directories, not the files) looks like this (see below) and can be browsed on Google Drive (no login needed, view only).
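(A sketch of the nested layout, reconstructed from the derivative paths quoted later in this thread; the actual tree contains more pipelines, and the subject labels and folder names here are placeholders:)

ds003645_selection/
    code/
    sub-<label>/
    derivatives/
        filtering/
            code/
            sub-<label>/
            derivatives/
                downsampling/
                    code/
                    sub-<label>/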
As discussed yesterday, your review of these examples can be used to get concrete ideas about what needs to be done to extend the specification. Note that, AFAIK, I have now created derivative datasets that are compliant with BIDS version 1.6.0. I did not run the validator (as it does not work yet on derivatives). Some known issues at the moment I created the derivatives:
|
@arnodelorme wrote in a reply to my email
|
Looking at the nested derivatives data structure that I created, I realize two aspects. These are more fundamental than the discussion on which specific metadata fields are needed (like the MATLAB and FieldTrip version, to which I agree).
Regarding 1: it makes sense (and is needed) if each file were processed differently. But along the processing pipeline we usually make raw data that might be inconsistent at the individual participant level more and more consistent. If the same pipeline is used on all data files, then documenting the pipeline metadata once should suffice. Note that documenting it along with the data is like

Regarding 2: Assuming that we would only store provenance of the last step, then in my example the metadata of

My item 2 also relates to what Arno mentions w.r.t. pointing in the derivative to its source. He discusses it in relation to DOIs and hence published/persistent datasets, whereas when creating the examples I was not thinking about publishing, and hence more about referencing the local (on my hard disk or my lab's network disk) relation between datasets. This also relates to PRs bids-standard/bids-specification#820 and bids-standard/bids-specification#821. I think we all have some implicit expectation about how people (including ourselves) work and when to start a new derivative, or when to "weave" additional files into an existing derivative. In general my directory organization looks like this
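(A generic layout consistent with this description; the directory names here are placeholders:)

myproject/
    raw/        <- the acquired BIDS dataset
    code/       <- the analysis pipelines, work in progress
    results/    <- intermediate and final files, internally organized similar to BIDS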
and as long as I keep working on the pipelines in the code directory, the intermediate and final files continue to go into the results directory (which has some internal directory structure similar to BIDS). Once the analysis is finalized, I would clean up the code and results (prune dead branches, rename files for improved consistency, possibly rerun it all once more to ensure consistency and reproducibility) and consider publishing the results+code directories together. Those would then comprise the BIDS derivative. An example is this one, with raw and derivatives data, plus a copy of the code on GitHub (a static version of the code is also included in the derivative). The example that I prepared is however at a larger collaborative (and more FAIR) scale, where Daniel and Rik prepared the initial multimodal data |
I have not looked in detail, but browsing the Google Drive I see that the entity |
From the spec (emphasis mine): "The proc label is analogous to rec for MR and denotes a variant of a file that was a result of particular processing performed on the device. This is useful for files produced in particular by Elekta’s MaxFilter (for example, sss, tsss, trans, quat or mc), which some installations impose to be run on raw data because of active shielding software corrections before the MEG data can actually be exploited."

The existing entities

My example pipelines only produced a single result per raw input file; the sequential application served to have us think about what happens if we pass derivatives around between each other or on OpenNeuro. If you have a pipeline that produces multiple results (which also makes sense), then those can be placed next to each other. I imagine that could result in, for example:
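(Hypothetical filenames for illustration, using the desc entity; the entity values are placeholders, not taken from the example datasets:)

sub-001_task-xxxx_desc-filtered_eeg.vhdr
sub-001_task-xxxx_desc-downsampled_eeg.vhdr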
where the description (hence
where I don't think that we will benefit from long file names with specific entities for specific pipelines (such as

An important difference between |
Yesterday we had a BIDS steering group meeting (@guiomar also attended) and it was mentioned there that BEP028 is making good progress. I will study that; you might want to look at it as well. Furthermore, we also briefly touched upon the two tangential motivations for BIDS: making the results of analysis replicable (e.g. being able to recompute) and making raw or derived data reusable (for follow-up analyses). The first requires extensive details; the second can also be achieved with minimal metadata. Also (as I was reminded in the meeting yesterday), the overarching BIDS strategy is to keep things as simple and small as possible, and we consider the 80/20 Pareto principle. |
Hello All, With the help of @CPernet and @jesscall, @jadesjardins and I have prepared an example of the Face13 dataset with annotations stored in .tsv files and described in .json files. This current example is for discussion surrounding how to store continuous time annotations. These files are located within the bids-examples/eeg_face13/derivatives/BIDS-Lossless-EEG/sub-*/eeg folders. The annotations in this example were produced by the EEG-IP-L pipeline. There are several different types of annotations from this pipeline, including channel annotations, component annotations, binary time annotations and non-binary (continuous) time annotations.

The EEG-IP-L pipeline currently produces an annotations.json, annotations.tsv, and annotations.mat file. The .json describes all of the pipeline annotations. The .tsv contains the channel, component and binary time annotations. The .mat file contains the continuous time annotations. Since the .mat file is not a part of the BIDS specification, this current example has added a ‘recording-marks_annotation.tsv.gz’ and an accompanying 'recording-marks_annotation.json' for continuous time annotations. The 'recording-marks_annotation.tsv.gz' and the .json file were created based on the BIDS spec for storing physiological and other continuous recordings.

If we are to store continuous time annotations in a tsv file, one concern we have is the need for two annotations tsv files, because the non-binary time annotations are stored differently than the binary time annotations, component annotations, and channel annotations. As all of these annotation types are important for the EEG-IP-L pipeline, we are looking forward to some suggestions around how they can best be stored in BIDS. Thanks, |
Thanks @SaraStephenson! To help others who want to look at it on their own computer, I just did this to get the changes (which are on a branch that is 160 commits behind and 1 commit ahead of HEAD):
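(A typical way to fetch a pull-request branch from the bids-examples repository; the PR number, taken from PR 171 mentioned earlier in this thread, and the local branch name are illustrative, not necessarily the ones used:)

git clone https://github.com/bids-standard/bids-examples.git
cd bids-examples
git fetch origin pull/171/head:eeg_face13-derivatives
git checkout eeg_face13-derivatives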
|
YES Robert! thx -- I'll also have a look as well |
@SaraStephenson Let me comment on what I encounter while going through the data.

First, in derivatives/BIDS-Lossless-EEG:

dataset_description.json:
- even though it is now nested in a dataset, I would prefer SourceDatasets still to be specified with a DOI.
- I cannot find version 0.1.0 Alpha; better would be to either tag that and/or make a GitHub release, or use a git SHA as the version.

README and README.md are duplicates.

The LICENSE file applies to the code, but seems inappropriate for the data, i.e. it is not a data use agreement. The source eeg_face13 is ODbL, which also applies to derivatives (since it is share-alike). I recommend adding the license not only as a file, but also to dataset_description.json. Perhaps you want to move the LICENSE file to the code directory.

Rather than linking to https://jov.arvojournals.org/article.aspx?articleid=2121634 I recommend linking to https://doi.org/10.1167/13.5.22.

The file eeg_face13/task-faceFO_events.json can be an empty object {} but not an empty list []. Better would be if it were to explain a bit about the events.tsv files.

The electrode files are identical for all subjects; that suggests that they are not measured but a template. It is not recommended to add template data to the individual subjects. If you want to add a single template for all subjects, better to put it at the top level (i.e. following the inheritance principle).

The IntendedFor path is inconsistent between sub-001_task-faceFO_annotations.json and sub-001_task-faceFO_desc-qc_annotations.json (one has ./ in front, the other not).

It is not clear to me what the difference is between sub-001_task-faceFO_annotations.tsv and sub-001_task-faceFO_desc-qc_annotations.tsv.

I don't think that the SamplingFrequency field in sub-002_task-faceFO_annotations.json is needed. The corresponding TSV file is not expressed in samples, but in seconds.

I don't think that EDF is the optimal binary format for processed EEG data. EDF is limited to 16 bits, whereas the data was recorded with 24 bits (since Biosemi) and subsequently processed as single or even double precision. I recommend writing to the BrainVision format, which allows single-precision floats to be represented. Or to EEGLAB .set.

The way you coded two things in the annotations.tsv files appears to me nearly orthogonal, the two not relating to each other: the first few rows (with chan_ and comp_) don't relate to onset and duration, and the latter rows don't relate to channels. Each row has a label, but the chan_xxx and comp_xxx labels appear to be very different from all others. Would it not be better to have those in two TSV files? Or possibly even three: a desc-chan_annotations.tsv, desc-comp_annotations.tsv and desc-task_annotations.tsv file?

I am not sure (cannot check, since zero bytes) what is in the mat files.

There is a sub-001_task-faceFO_recording-marks_annotation.json file with a data dictionary, but no corresponding data. I would expect that to come with a TSV file (even if it were empty, i.e. only with the first header row).

Not related to the derivative, but I noticed a typo in eeg_face13/sub-003/eeg/sub-003_task-faceFO_eeg.json: McMaster Univertisy rather than University.

Moving on to BIDS-Seg-Face13-EEGLAB: again duplicate README files. Again PipelineDescription.Version being unfindable on GitHub. Also here the dataset_description could contain more info. There is no license (should be ODbL, since derivatives from the original data should be share-alike). I cannot review anything else at this level (since only binary files), which is not a problem per se.
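(To illustrate the mixture of row types described above, a hypothetical tab-separated excerpt of such an annotations.tsv; the columns and values are assumptions based on the description, not copied from the dataset:)

onset    duration    label
n/a      n/a         chan_bridge
n/a      n/a         comp_eog
152.0    4.0         manual
|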
I think the discussion is fruitful. There is the issue of annotations, and then there is the issue of derivative data structures.
Let me address the issue of derivatives.
I think it is fine to generate the full derivative tree as long as it can be cleaned up. The alternative to having the full hierarchy is to have pipelines as described in Robert’s email of Feb 2, 2022, so I think this strategy covers both approaches.
Three comments:
1. Hierarchy. It is a detail, but I would prefer a hierarchy where the name of the derivative is appended to the derivative folder: for example derivative-filtering, then as a subfolder, derivative-downsampling. I think it is closer to the current BIDS implementation (we would simply need to allow a wildcard after “derivative”). It is also simpler for user browsing (half the number of sub-folders to dig into). So instead of ds003645_selection/derivative/filtering/derivative/downsampling we would have ds003645_selection/derivative-filtering/derivative-downsampling. I am expecting Robert might have resistance to that (he always has a very good reason to do things the way he does :-). Maybe we can vote?
2. Reproducibility. For the final derivative tree (to be published):
- each branch should have a DOI and can reference the DOI of the parent instead of being embedded in it (so you can share the derivative folder directly without losing tracking)
- We need tools that can regenerate the tree from the raw data and the code in the “code” folders and subfolders, for quality control
- Maybe in the code folder a JSON file with fields: software (e.g. FieldTrip, EEGLAB, MNE), language (Python, MATLAB), and dependencies, which would contain a list as well (with name and version, for example for EEGLAB plugins or other dependencies), and then a field “script” that could contain the name of the script in the same folder to execute on the parent BIDS dataset to obtain the current derivative. For example:
{
"software": {
"name": "EEGLAB",
"version": "2022.0",
"url": "xxxx"
},
"language": {
"name": "MATLAB",
"version": "2021b"
},
"dependencies": [{
"name": "bids-mjatlab-tools",
"version": "6.1"
},
{
"name": "Fieldtrip",
"version": "2022_03_10"
}
],
"script": {
"name": "my_pipeline.m"
}
}
3. New data file types. We need to define new EEG data files which can be reused (for group analysis, etc.) in addition to the processed EEG: for example, the leadfield matrix, ERP/ERSP results, ICA decompositions, custom results, etc.
Arno
|
Thank you for all your comments about the Face13 example @robertoostenveld, I will look into making the appropriate corrections. I want to provide some clarifying information about annotations in the Face13 example dataset so that the discussion about how to best store the different types of annotations (component, channel, binary time and continuous time annotations) in BIDS can continue.

The sub-001_task-faceFO_annotations.tsv file contains the annotations that were produced by the EEG-IP-L pipeline. The sub-001_task-faceFO_desc-qc_annotations.tsv file contains annotations after the manual quality control (QC) procedure has been completed. During the QC procedure, the reviewer can modify some time and component annotations (particularly the ‘manual’ mark) based on visual inspection of the data.

The formatting of our current annotations.tsv files (which contain component, channel, and binary time annotations in one file) is based on a combination of Examples 2 and 3 in Section 5.1: Sidecar TSV Document in the BEP021 google doc.

I have a few concerns about storing the chan, comp, and time annotations in separate files. One concern is that this will result in a large number of annotation files, considering there would also be multiple versions of each file (one for the EEG-IP-L pipeline output and at least one for the QC’ed (desc-qc) data). Another concern is the naming of these annotation files. Currently we use desc-qc to indicate that the annotations are associated with QC’ed data, but would this complicate naming the different types of annotation files with desc-chan, desc-comp and desc-task?

The .mat file contains the continuous time annotations (such as the AMICA log likelihood). Since the .mat file is not a part of the BIDS specification, this current example has added a recording-marks_annotation.tsv.gz and an accompanying recording-marks_annotation.json for continuous time annotations. The recording-marks_annotation.tsv.gz and the .json file were created based on the BIDS spec for storing physiological and other continuous recordings. The recording-marks_annotation.tsv.gz in the Face13 example contains 100 rows for each of the annotations listed in the recording-marks_annotation.json. These new files were created because the continuous time annotations cannot be stored in the same way the component, channel, and binary time annotations are currently stored.

Hopefully this example can help move the discussion on how to store annotations (particularly continuous time annotations) in BIDS forward. Thanks, Sara |
Sara - I wonder if it would be productive to call what you refer to as 'continuous time annotations', rather, 'continuous time data measures' - you give the example of AMICA model likelihoods; other measures could include RMS amplitude, "theta/beta ratio", etc. (any of which might be used in some data quality, cleaning, or evaluation pipeline). In other words, I'd suggest treating the AMICA likelihood index as a derived data channel time-synced with the original data channels - reserving the term 'annotation' for text or numeric markers of facts pertaining either to the whole run (as with basic metadata) or to some portion of it (as with event annotations). Scott
|
Continuous data that is time-synced with other data is already part of the BIDS specification and documented here: https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/06-physiological-and-other-continuous-recordings.html. A very similar approach (again with TSV files) is used for PET blood recording data.
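(For concreteness, a continuous measure such as the AMICA log likelihood could be stored following that convention; the recording label and column name below are hypothetical:)

sub-001_task-faceFO_recording-loglik_physio.tsv.gz
sub-001_task-faceFO_recording-loglik_physio.json

with the JSON sidecar along the lines of:

{
    "SamplingFrequency": 512,
    "StartTime": 0,
    "Columns": ["amica_loglikelihood"]
}
|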
Thanks Robert, do you know of an EEG dataset (joint EEG and eye-tracking, or accelerometer, etc.) that uses this synchronization scheme? Cheers, Arno
|
Hi BEP021 community, I'm looking to move forward on a few points from @SaraStephenson's thread above:
@robertoostenveld What are your thoughts on this? Should this be revisited? We'd like to avoid unnecessary complexities in file naming, and Sara's example follows examples 2 and 3 of BEP021 5.1: Sidecar TSV. ... Second,
@smakeig thank you -- calling them "measures" rather than annotations can address this nicely, and works with @robertoostenveld and @CPernet's prior comments on storing in TSV files as continuous recordings. see spec here in previous comments. @SaraStephenson, I think we'll try moving away from "annotations" - perhaps |
I agree with differentiating continuous time measurements versus event-based annotations with an onset and duration. @tpatpa and I are making some updates to the examples for event-based annotations with an onset and duration that use HED/SCORE tags. |
Hi all! @robertoostenveld @CPernet @arnodelorme Thanks! CC @SaraStephenson @christinerogers PS: @smakeig we're happy to workshop the naming of those TSV files - you're right that |
It is unclear to me how the annotation file differs from the events.tsv file. |
Have you tried @robertoostenveld's proposal for the continuous measurement/annotation? |
@CPernet Can we go ahead and make the necessary adjustments to the BEP021 spec to reflect this suggestion? This paragraph in section 5.1 of the doc still proposes the alternative method of creating a synthetic data channel. Perhaps it can be updated with information on storing continuous recordings as described here? |
I don't think they technically differ (except for the name), but conceptually I see it like this:
- events is for the observable things that happened during acquisition independent of the EEG data (like stimuli, button presses) and that could also have happened and been observed if the EEG were not recorded at all
- annotations is for things that are observed specifically in the EEG data and hence require (re)viewing the data

For many things this works, although I realize that there are some situations where it might not be clear. But as long as it covers 80% of the use-cases, I think it is useful. No need to get 99.99% coverage.
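(A hypothetical minimal pair making the distinction concrete; the column layouts are assumed to follow the existing events.tsv convention and the BEP021 draft, and the values are illustrative:)

events.tsv:
onset    duration    trial_type
12.0     1.0         face_stimulus

annotations.tsv:
onset    duration    label
14.2     0.6         muscle_artifact
|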
Hi all, I made a suggestion in the BEP021 document in order to reflect the discussion in this thread (RE: continuous annotations / data measures). The suggestion is based on feedback from @CPernet and @robertoostenveld. Please have a look and let me know if further amendments are needed. |
Robert and all - In HED-3G we have made a deliberate choice to include 'data feature' events as a primary event type (along with 'sensory presentation', 'agent action', etc.). The SCORE library schema for neurophysiologist annotation of events in clinical data annotates data feature events noted in events.tsv.

As I understand it (correctly?), the EEGNet project's proposed use of 'annotation.tsv' files is to apply flags to *each* EEG frame - therefore outside the event framework - which requires that an 'event' is something that unfolds over a definite time period in the experiment. Events are marked in events.tsv by 'event onset markers' - plus possibly other 'event phase markers' - foremost, 'event offset markers', but also possibly markers of other critical points in the event process, for example 'max volume', 'motion course correction', 'seizure process shift', etc.

I would not support the term 'annotations' for the proposed purpose of flagging, e.g., data frames for rejection, etc. A more specific term should be used - for example 'data_flags.tsv' or other...

Scott
|
Hello community! It's been a while since I have been involved in these discussions, but I am once again contributing to some BIDS EEG efforts on behalf of EEGNet. I've taken some time to read over this thread and want to state here, as part of my post, what I believe to be the current understanding of annotations, so that I can update Face13 (and its various example repositories) as necessary once again. If there's anything wrong with my broad statements of the current efforts, please let me know and hopefully it can restart some discussion here.
I'll be coordinating offline with @SaraStephenson and @jadesjardins as necessary for changes to Face13, so no worries. I think I own a few of the commits too... CCing @jesscall and @christinerogers as well. Thanks! |
we can raise those points at OHBM2022
|
Some updates to move forward - Second, to get unstuck on annotation nomenclature, @sappelhoff could you confirm / advise on best BIDS practice / other specs? (We've looked at cardio but would appreciate more eyes on this, as previously discussed.)
Thanks - and cc Tyler @Andesha Collins, @SaraStephenson @CPernet to your last bullet above - I can shoot you the latest on events/HED creation next week. |
Another important point for discussion here is that it will be necessary to have a recommendation for how to indicate a list in a .tsv file, here for the channels. This is not specified in BIDS now; a similar point is raised in the BEP021 spec. Would there be any issue with a simple comma-separated list? (e.g. |
'Would there be any issue with a simple comma-separated list?' Just that BIDS has stuck with tsv :-; |
Can somebody briefly summarize what the issue is with the two terms "annotations" versus "flags", please? From my perspective, we are talking about data and event annotations. Re: "continuous annotations", I would (as many others already have) advise following the specification on physio and stim data. That entails:
Alternative: Not sure if Scott proposed this above, but we could also specify that continuous annotations must be formatted as a data channel and included in the (derivative) neural data file (BrainVision, EDF, ...). In that case all we might have to do is come up with a good way to specify such "channel types" (e.g., see the table in this section). I find this alternative very attractive, because it makes the "name finding" issue easier (see "problem" / 3 "considerations" above). Re: specifying which particular "channels" a given event pertains to --> there is a discussion thread in the Gdoc draft: https://docs.google.com/document/d/1PmcVs7vg7Th-cGC-UrX8rAhKUHIzOI-uIOh69_mvdlw/edit?disco=AAAAc8E_W-g Incorporating comments in that thread, my suggestion is to:
# two channels are affected
["Cz", "Fpz"]
# no channel is affected
[]
# the notion of channel is not applicable for this data point
"n/a"
# all channels are affected
"all"
# a channel with the name "all" is affected
["all"]
# a channel that contains double quotes in its name is affected
# note the backslash as escape character
["my_weird\"channel_name", "FCz", "C3"]
# channel names MUST NOT contain "tab characters"
# this would break the TSV format / make it uncomfortable

☝️ Depending on the event to be annotated, the above format may be used to indicate channels and/or electrodes in their respective columns. Finally, re: non-continuous annotations --> I am fine with both: including them in the standard |
Thanks for the input @sappelhoff - I'm going to respond in order to your sections... For my lab, "annotations" vs "flags" is mostly just terminology. We used annotations as a term to encompass all post-processed derivative data properties, like bridged channels (computationally discovered), rejected ICs, or ICA breakdown quality. Flags to me implies things that are just boolean. I believe on this point we're simply discussing phrasing. I believe at this point we can lock in the idea of the continuous annotations being in the external files. There appears to be significant agreement. I would however suggest adding a new term, as it will likely age better in the long term and may help inform the process. As above, we are likely past packaging things into the data as either "status" channels or other. I'm fine with the external option personally. I am in favour of the idea of adding this channel/electrode information as an optional column. It would allow our process to encode whatever "annotation" (or your chosen term) fairly simply, as long as there's no notion of a "simplest form" requirement. I'm not sure, but how would you envision this extending to ICs @sappelhoff? Lastly, I agree. Inventing a new datatype/suffix is not strictly needed. Some extra thoughts as I was responding: How does this relate to the provenance (lifecycle?) of a file within BIDS? Does it work naturally or are there things we should be considering from BEP028? Consider the case of a pipeline doing processing, going back and marking up events, adding reaction time info, marking study things, etc... thoughts @christinerogers? Thanks!
It is best not to give a term in widespread/longstanding use ('annotation') a quite more limited meaning in the same application (e.g., BIDS). The continuous measures you introduce in LP are in a broader sense like any other continuous data measure (say, RMS, likelihood of x, etc.). I suggest the term in BIDS should be chosen to apply equally to all of these that users may want to include in the dataset in an external file. Perhaps 'derived data'? Scott
|
This continues the issue started in bids-standard/bids-specification#733. This BEP021 repository is a better place to continue the discussion, as we can also use projects and share files here, which would not fit under bids-specification directly.