pydian is a pure Python library for readable and repeatable data mappings. pydian reduces boilerplate for data manipulation and provides a framework for expressive data wrangling.
Using pydian, developers can collaboratively and incrementally write data mappings that are expressive, safe, and reusable. Similar to how libraries like React were able to streamline UI components for frontend development, pydian aims to streamline data transformations for backend development.
pydian currently offers an ergonomic API for:
- Working with `dict` data
- Working with `DataFrame` data
- Validating `dict` data using the `Rule` and `RuleGroup` framework
All the core API functions are pure transforms, so you can trust that code will be consistent between runs and when sharing! For any exceptions, the function README will have an explicit NOTE.
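As a rough illustration of what "pure transform" means here, the sketch below uses a hypothetical helper, `rename_key`, which is not part of pydian. It returns a new dict instead of mutating its input, so repeated calls on the same source give the same result.

```python
from copy import deepcopy


def rename_key(record: dict, old: str, new: str) -> dict:
    """Hypothetical pure transform: builds a new dict, leaves `record` untouched."""
    return {(new if key == old else key): value for key, value in record.items()}


source = {"first_name": "Ada", "id": 1}
snapshot = deepcopy(source)  # kept only to show the input is unchanged

result_a = rename_key(source, "first_name", "given_name")
result_b = rename_key(source, "first_name", "given_name")

# Same input, same output, and no side effects on the source
assert result_a == result_b == {"given_name": "Ada", "id": 1}
assert source == snapshot
```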
The key idea behind pydian is the following: `get` data from an object and, if that succeeded, do stuff to it.
from pydian import get

# Some arbitrary source dict
payload = {
    'some': {
        'deeply': {
            'nested': [{
                'value': 'here!'
            }]
        }
    },
    'list_of_objects': [
        {'val': 1},
        {'val': 2},
        {'val': 3}
    ]
}

# Conveniently get values and chain operations
assert get(payload, 'some.deeply.nested[0].value', apply=str.upper) == 'HERE!'

# Unwrap list structures with [*]
assert get(payload, 'list_of_objects[*].val') == [1, 2, 3]

# Safely specify your logic with built-in null checking (handle `None` instead of a stack trace!)
assert get(payload, 'some.deeply.nested[100].value', apply=str.upper) is None
Additional constructs are added for more complex data operations (`Mapper`, `select` on DataFrames, `validate`, etc.).
See more examples in the comprehensive tests (definitely feel empowered to add more as well)!
The API is designed to be developer-friendly, prioritizing ergonomics and reliability (speed is decent, though be sure to benchmark your use-case).
- `pydian` -- working with `dict[str, Any]`
  - `get` -- grab data using JMESPath syntax
  - `Mapper` -- cleans up empty values and allows complex logic with `DROP` and `KEEP`
  - `pydian.partials` -- module with a bunch of common one-line functions. Good for codebase consistency
    - Suggested use: `import pydian.partials as p`
- `pydian.dataframes` -- working with `polars.DataFrame` (convert from pandas using the polars API)
  - NOTE: install this with `pip install "pydian[dataframes]"`
  - `select` -- grab dict data using simple SQL-like string syntax: `,`-delimited columns, `*` to get all, `:` for row filtering, `-> [ ... ]` for dict-unnesting, `-> { 'new_name': ... }` for renaming, `+>` for using `->` while also keeping the original column
  - `join`, `union`, and `group_by` functions with SQL-like string syntax
- `pydian.validation` -- validating `dict[str, Any]` data via composition (interops with pydantic)
  - `Rule` and `RuleGroup` classes to group functions logically and wrap results in `Ok` or `Err`
  - `validate` dict data using expressive, composable syntax
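The validation idea described above (grouping check functions and wrapping the outcome in `Ok` or `Err`) can be sketched with the standard library alone. Everything below is a hypothetical stand-in written for illustration: these `Ok` and `Err` dataclasses and the `run_checks` helper are assumptions, not pydian's `Rule`, `RuleGroup`, or `validate` API.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Union


@dataclass
class Ok:
    value: Any  # the data that passed every check


@dataclass
class Err:
    failed: List[str]  # names of the checks that did not pass


Check = Callable[[dict], bool]


def run_checks(data: dict, checks: List[Check]) -> Union[Ok, Err]:
    """Hypothetical helper: run every check and collect the names of the failures."""
    failed = [check.__name__ for check in checks if not check(data)]
    return Err(failed) if failed else Ok(data)


def has_int_id(d: dict) -> bool:
    return isinstance(d.get("id"), int)


def has_name(d: dict) -> bool:
    return bool(d.get("name"))


person_checks = [has_int_id, has_name]  # a plain list plays the role of a group

assert run_checks({"id": 1, "name": "Ada"}, person_checks) == Ok({"id": 1, "name": "Ada"})
assert run_checks({"id": "1"}, person_checks) == Err(["has_int_id", "has_name"])
```

Returning a value instead of raising keeps validation composable: callers can branch on `Ok` versus `Err` and see every failed check at once.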
The following are in-progress:
- `pydian.pipes` -- Module for running pipeline of operations
- `pydian.standards` -- Module for defining data standards and sharing mappings
- `pydian.io` -- Module for disk/database IO
- `pydian.ml` -- Module for ML training and inference
- Implement core library in lower-level language (e.g. Rust)
- Port to JS
Please submit a GitHub Issue for any bugs + feature requests 🙌 🙏