Recently added support for LLM based natural language queries of OMOP CDM databases using llama-index. Please install the llm extras as follows. Please be cognizant of the privacy issues with publically hosted LLMs. Any feedback will be highly appreciated. See usage!
pip install pyomop[llm]
The OHSDI OMOP Common Data Model allows for the systematic analysis of healthcare observational databases. This is a python library to use the CDM v6 compliant databases using SQLAlchemy as the ORM. pyomop also supports converting query results to a pandas dataframe (see below) for use in machine learning pipelines. See some useful SQL Queries here.
pip install pyomop
- git clone this repository and:
pip install -e .
from pyomop import CdmEngineFactory, CdmVocabulary, CdmVector, Cohort, Vocabulary, metadata
from sqlalchemy.future import select
import datetime
import asyncio
async def main():
cdm = CdmEngineFactory() # Creates SQLite database by default
# Postgres example (db='mysql' also supported)
# cdm = CdmEngineFactory(db='pgsql', host='', port=5432,
# user='', pw='',
# name='', schema='cdm6')
engine = cdm.engine
# Create Tables if required
await cdm.init_models(metadata)
# Create vocabulary if required
vocab = CdmVocabulary(cdm)
# vocab.create_vocab('/path/to/csv/files') # Uncomment to load vocabulary csv files
# Add a cohort
async with cdm.session() as session:
async with session.begin():
session.add(Cohort(cohort_definition_id=2, subject_id=100,
cohort_end_date=datetime.datetime.now(),
cohort_start_date=datetime.datetime.now()))
await session.commit()
# Query the cohort
stmt = select(Cohort).where(Cohort.subject_id == 100)
result = await session.execute(stmt)
for row in result.scalars():
print(row)
assert row.subject_id == 100
# Query the cohort pattern 2
cohort = await session.get(Cohort, 1)
print(cohort)
assert cohort.subject_id == 100
# Convert result to a pandas dataframe
vec = CdmVector()
vec.result = result
print(vec.df.dtypes)
result = await vec.sql_df(cdm, 'TEST') # TEST is defined in sqldict.py
for row in result:
print(row)
result = await vec.sql_df(cdm, query='SELECT * from cohort')
for row in result:
print(row)
# Close session
await session.close()
await engine.dispose()
# Run the main function
asyncio.run(main())
from pyomop import CdmEngineFactory, CdmVocabulary, CdmVector, Cohort, Vocabulary, metadata
from sqlalchemy.sql import select
import datetime
cdm = CdmEngineFactory() # Creates SQLite database by default
# Postgres example (db='mysql' also supported)
# cdm = CdmEngineFactory(db='pgsql', host='', port=5432,
# user='', pw='',
# name='', schema='cdm6')
engine = cdm.engine
# Create Tables if required
metadata.create_all(engine)
# Create vocabulary if required
vocab = CdmVocabulary(cdm)
# vocab.create_vocab('/path/to/csv/files') # Uncomment to load vocabulary csv files
# Create a Cohort (SQLAlchemy as ORM)
session = cdm.session
session.add(Cohort(cohort_definition_id=2, subject_id=100,
cohort_end_date=datetime.datetime.now(),
cohort_start_date=datetime.datetime.now()))
session.commit()
result = session.query(Cohort).all()
for row in result:
print(row)
# Convert result to a pandas dataframe
vec = CdmVector()
vec.result = result
print(vec.df.dtypes)
# Execute a query and convert it to dataframe
vec.sql_df(cdm, 'TEST') # TEST is defined in sqldict.py
print(vec.df.dtypes) # vec.df is a pandas dataframe
# OR
vec.sql_df(cdm, query='SELECT * from cohort')
print(vec.df.dtypes) # vec.df is a pandas dataframe
pyomop -help
Want to convert FHIR to pandas data frame? Try fhiry
Use the same functions in .NET and Golang!
- Postgres
- MySQL
- SqLite
- More to follow..
If you find this project useful, give us a star. It helps others discover the project.
- Bell Eapen |
- PRs welcome. See CONTRIBUTING.md