ODSC West - CFP Closes 05/20/24 #29
Comments
update 1 based on comments
Title: IbisML: Efficiently Streamlining and Unifying ML Data Processing from Development to Production
Description
Machine learning projects require transforming raw data into prepared samples using a combination of feature engineering pipelines and online last-mile processing, integrated with model training workflows. Data scientists and engineers collaborate to prototype, develop, scale, and deploy both batch and streaming jobs. These processes present several challenges:
To address these challenges, IbisML harnesses the power of Ibis, offering a library designed to streamline and unify data preprocessing and feature engineering workflows across diverse environments and data scales. Its unified codebase eliminates the need for rewriting logic during transitions from local development to large-scale distributed production and from batch to streaming with the following key features:
The talk will explore the gaps in existing projects, emphasizing how IbisML effectively tackles these challenges and enables seamless transitions between development and deployment, both in offline and online deployment scenarios.
Notes
@jitingxu1 Some quick notes:
I don't 100% know whether we need to talk about streaming in this talk, but I feel like there may be differing views. At a higher level (probably more important): what will the talk cover? I think this is a description of the project and what it can do, but not of the talk. I think there must be a clearer discussion of what the gap with current projects is.
For example, using Spark for batch features, Flink for streaming features, and pandas, scikit-learn, or PyTorch for last-mile preprocessing.
Highlighting the strength of IbisML: it has streaming support. This capability might distinguish it from other options.
Great write-up! I wonder if it makes sense to talk about the benefits of moving away from sampling into using more holistic data and it just works with IbisML. I've heard a few cases where large organizations desire to train models using the full data instead of relying on sampling.
Here is the submitted version. Thanks @ncclementi @deepyaman @chip for the review.
Title: Building ML pipelines that run anywhere with IbisML
Abstract
From inception to production, the ML lifecycle requires a lengthy process involving multiple people, programming languages, and computational frameworks. In a traditional workflow, data scientists develop models and experiment with different features locally, using tools like pandas and scikit-learn on a small, often subsampled, dataset. However, as the need arises to scale up to larger datasets and production environments, engineers face the challenge of rewriting and testing these processes in distributed computing systems like Apache Spark or Dask. While frameworks like these have their own ML libraries (of various flavors and maturities) and technically allow the user to run on single machines and clusters, scaling ML pipelines is costly, resource-intensive, and inefficient.
IbisML is an open-source Python library designed for building and running scalable ML pipelines from experiment to production. It’s built on top of Ibis, an open-source library that provides a familiar dataframe API to build up expressions that can be executed on a wide array of backends. Users can work with tools like DuckDB and Polars for efficient local computation, then scale to distributed engines such as Spark, BigQuery, and Snowflake. With IbisML, users can preprocess data at scale across development and deployment, compose transformations with other scikit-learn estimators, and seamlessly integrate with scikit-learn, XGBoost, and PyTorch models without rewriting code.
In this talk, we will introduce IbisML and the utilities it provides to streamline ML pipeline development. We will demonstrate its functionalities on a simple, real-world problem, including the ability to train and fit estimators on different backends. Finally, we will showcase how you can efficiently hand off to the modeling framework of your choice.
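To make the "fit once, run on different backends" idea in the abstract concrete, here is a minimal, stdlib-only sketch. It does not use the Ibis or IbisML API: the `StandardScaleStep` class, its `to_python`/`to_sql` methods, and the `run_on_sqlite` helper are hypothetical names invented for illustration, with plain Python standing in for a local engine and SQLite standing in for a database backend.

```python
import math
import sqlite3


class StandardScaleStep:
    """Hypothetical preprocessing step: fit statistics once, apply on any backend."""

    def __init__(self, column):
        self.column = column
        self.mean = None
        self.std = None

    def fit(self, values):
        # Learn the population mean and standard deviation from training values.
        n = len(values)
        self.mean = sum(values) / n
        self.std = math.sqrt(sum((v - self.mean) ** 2 for v in values) / n)
        return self

    def to_python(self, values):
        # "Local" backend: apply the fitted transformation in plain Python.
        return [(v - self.mean) / self.std for v in values]

    def to_sql(self, table):
        # "Database" backend: emit a query; fitted statistics are bound as parameters.
        query = (
            f'SELECT ("{self.column}" - ?) / ? AS scaled '
            f'FROM "{table}" ORDER BY rowid'
        )
        return query, (self.mean, self.std)


def run_on_sqlite(step, values):
    # Load the values into an in-memory SQLite table and run the generated query.
    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE samples ("{step.column}" REAL)')
    conn.executemany("INSERT INTO samples VALUES (?)", [(v,) for v in values])
    query, params = step.to_sql("samples")
    rows = conn.execute(query, params).fetchall()
    conn.close()
    return [row[0] for row in rows]


train = [1.0, 2.0, 3.0, 4.0]
step = StandardScaleStep("x").fit(train)

local_result = step.to_python(train)        # executed in Python
sqlite_result = run_on_sqlite(step, train)  # executed by the SQL engine
```

The same fitted step produces matching results on both engines, which is the property the abstract attributes to IbisML at a much larger scale; the real library expresses this through Ibis expressions rather than hand-written SQL.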
No response from the conference
Idk how the conference does this, but according to their website they should have sent first-round notifications on July 12th. Maybe you can email them and ask; if not, wait until the second round of notifications.
https://odsc.com/california/call-for-speakers-west/
Talk Session Formats
Proposals will be considered for the following types of presentations:
Format for Technical Sessions
Format for Business Sessions