Releases: USM-CHU-FGuyon/BlendedICU
v0.5.1
v0.5.0
⚡Speedups:
- Support for time resampling, min-max clipping, and pivoting data to wide format is temporarily dropped. These steps will be moved later in the pipeline, which provided major speedups.
- Polars lazy evaluation is now used to provide fast harmonization and low memory pressure.
- No more processing by patient chunk and individual patient files.
The full OMOP-ization pipeline can be run for all 5 databases in a single day :)
🐛 Bugfix:
- ICU stays with missing length-of-stay data were previously dropped from the database. All patients are now preserved.
- Drug exposures that could not be omop-ized are now kept, with drug_concept_id = 0. Previously they were dropped.
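The handling of unmapped drugs can be illustrated with a minimal sketch. It assumes exposures are plain dictionaries keyed by drug_source_value; the helper name, the mapping, and its concept ids are hypothetical placeholders, not BlendedICU's actual code or real OMOP concepts. Only the use of 0 for unmapped drugs comes from the note above.

```python
UNMAPPED_CONCEPT_ID = 0  # value given to drugs without an OMOP mapping

# Hypothetical mapping; the ids are placeholders, not real OMOP concepts.
DRUG_CONCEPT_IDS = {"drug_a": 1001, "drug_b": 1002}


def add_drug_concept_id(exposures, concept_ids):
    """Return copies of the exposures with a drug_concept_id field added.

    Unmapped drugs are kept and receive UNMAPPED_CONCEPT_ID instead of
    being dropped.
    """
    return [
        {
            **row,
            "drug_concept_id": concept_ids.get(
                row["drug_source_value"], UNMAPPED_CONCEPT_ID
            ),
        }
        for row in exposures
    ]


exposures = [
    {"person_id": 1, "drug_source_value": "drug_a"},
    {"person_id": 2, "drug_source_value": "unknown_drug"},
]
mapped = add_drug_concept_id(exposures, DRUG_CONCEPT_IDS)
```

Both rows survive the mapping; the second one simply carries drug_concept_id = 0, so it can still be counted or revisited once a mapping becomes available.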
🏃 Getting further:
- Drug dosages were partially omop-ized: dosages and routes were extracted. Some units were omop-ized; routes are not harmonized yet.
- Observation period table was added.
- The drug strength table is still a work in progress. Contributions are welcome, especially for eICU.
v0.4.2
Changes :
⚡ Speedup : Converting MIMIC-III, MIMIC-IV and Amsterdam's csv.gz files to parquet in step 1. This conversion is done only once and speeds up the following step.
v0.4.1
Changes :
- 🐛 Bugfix : visit_occurrence_id is no longer missing from condition_occurrence table.
- ⚡ Speedup : Converting eICU's csv.gz files as parquet in step 1. This makes re-running 1_extract_eicu.py 3 times faster.
v0.4.0
Started to speed up some operations using polars.
v0.3.2
Corrections on variables and dtypes in final OMOP tables.
Bugfix:
- Removed visit_start_date from measurement table, and string values in care_site table's place_of_service_concept_id
- Save all OMOP tables to parquet + corrected wrong dtypes on some tables.
- Rounding times to the second. This avoids an error due to high precision in time when writing some records to parquet OMOP tables.
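As a rough illustration of the rounding fix, here is a minimal sketch using only Python's standard library. The helper name round_to_second is hypothetical, and the project's own code most likely rounds whole dataframe columns rather than single values; halves are rounded up here, which may differ from a dataframe library's rounding rule.

```python
from datetime import datetime, timedelta


def round_to_second(ts: datetime) -> datetime:
    """Round a datetime to the nearest whole second (halves round up)."""
    # Drop the sub-second part, then add one second if it was >= 0.5 s.
    truncated = ts.replace(microsecond=0)
    if ts.microsecond >= 500_000:
        truncated += timedelta(seconds=1)
    return truncated


rounded_up = round_to_second(datetime(2020, 1, 1, 12, 0, 0, 750_000))
rounded_down = round_to_second(datetime(2020, 1, 1, 12, 0, 0, 250_000))
```

After rounding, no value carries sub-second precision, which is the property the fix relies on when writing records to parquet.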
Minor changes:
- Refactored timeseriespreprocessing to timeseriesprocessor
- Option to skip reset_dir() when starting 2_{dataset}.py
v0.3.1
Major changes:
- Generated a numeric patient id for OMOP-standardization. (Issue #15)
- Added some insight into the running times of each script. (as suggested in Issue #24)
- Simplified the structure of paths.json
- Fixed inconsistency in datetimes of OMOP tables: some datetime columns contained the date, others contained the time of day. Now they all contain the full datetime. (Issue #26)
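The datetime fix amounts to storing a date and a time of day together in one value. A minimal sketch with the standard library follows; the helper name to_full_datetime and the sample values are illustrative assumptions, not taken from the BlendedICU code.

```python
from datetime import date, datetime, time


def to_full_datetime(day: date, time_of_day: time) -> datetime:
    """Combine a calendar date and a time of day into one datetime."""
    return datetime.combine(day, time_of_day)


# Illustrative values only.
full = to_full_datetime(date(2023, 5, 17), time(14, 30, 5))
```

A column built this way can be sorted and compared across days, which is not possible when only the date or only the time of day is stored.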
Minor changes:
- Added unit_concept_id to auxillary_files/user_input/timeseries_variables.csv
- Fixed harmless SettingWithCopy warning happening in database_processing/dataprocessor.py
Thanks to @mostafaalishahi and @xinyuejohn for their contributions to the project.