
ODSC West - CFP Closes 05/20/24 #29

Open
ncclementi opened this issue Mar 29, 2024 · 7 comments

@ncclementi (Contributor)

https://odsc.com/california/call-for-speakers-west/

Talk Session Formats

Proposals will be considered for the following types of presentations:

Format for Technical Sessions

- Talk (30 minutes)
- Hands-on Workshop (2 hrs)
- Tutorial (60 min, hands-off)
- Lightning Talks (10 min)

Format for Business Sessions

- Talk (30 minutes)
- Case Studies (30 minutes)
- Hands-on workshop (60 minutes)
- Startup Talk (30 minutes)
@jitingxu1 commented May 20, 2024

Update 1, based on comments:

  • renamed ibis-ML to IbisML
  • removed Theseus
  • added talk outline

Title: IbisML: Efficiently Streamlining and Unifying ML Data Processing from Development to Production

Description

Machine learning projects require transforming raw data into prepared samples using a combination of feature engineering pipelines and online last-mile processing, integrated with model training workflows. Data scientists and engineers collaborate to prototype, develop, scale, and deploy both batch and streaming jobs. These processes present several challenges:

  • Development to Production: Ensuring smooth transitions and consistency between development and production environments.
  • Data Scale: Managing the shift from small, local datasets to large, distributed data environments.
  • Batch and Streaming: Handling the complexities of both batch and streaming data processing.
  • Multilingual Frameworks: Coordinating multiple languages and frameworks, which can slow down the process and reduce interactivity.

To address these challenges, IbisML harnesses the power of Ibis, offering a library designed to streamline and unify data preprocessing and feature engineering workflows across diverse environments and data scales. Its unified codebase eliminates the need to rewrite logic when transitioning from local development to large-scale distributed production, and from batch to streaming, through the following key features:

  • Versatile Backends: Seamlessly integrating with over 20 backends, including DuckDB, Dask, Polars, Pandas, BigQuery, PySpark, and Flink, ensuring swift, effective, and adaptable machine learning data processing.
  • Large Dataset Processing: Optimizing large-dataset processing for speed and efficiency, enabling rapid ML data preparation across different backends.
  • Seamless Integration: Supporting pipeline-like steps that integrate with scikit-learn pipelines, and delivering rich output formats such as pandas DataFrames, PyTorch datasets, and XGBoost DMatrix, aligning seamlessly with scikit-learn, XGBoost, and PyTorch models.

The talk will explore the gaps in existing projects, emphasizing how IbisML tackles these challenges and enables seamless transitions between development and deployment, in both offline and online scenarios.
At the end of this talk, we'll use IbisML to build machine learning models end to end: from data engineering, through last-mile preprocessing with IbisML Recipes across various backends, to feeding the resulting data into downstream training libraries and frameworks such as scikit-learn, XGBoost, and PyTorch.
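The fit-on-training, transform-anywhere pattern behind these preprocessing recipes can be illustrated with a toy sketch. Note that `Recipe` and `ScaleStandard` below are hypothetical stand-ins written for illustration only, not the real IbisML API:

```python
# Toy sketch of the fit/transform "recipe" pattern: statistics are learned
# once from training data, then reapplied identically at inference time.
# NOT the real IbisML API; class names here are hypothetical.

class ScaleStandard:
    """Standardize a numeric column: (x - mean) / std, fit on training data."""
    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        values = [r[self.column] for r in rows]
        self.mean = sum(values) / len(values)
        var = sum((v - self.mean) ** 2 for v in values) / len(values)
        self.std = var ** 0.5 or 1.0  # guard against zero variance
        return self

    def transform(self, rows):
        return [{**r, self.column: (r[self.column] - self.mean) / self.std}
                for r in rows]

class Recipe:
    """Chain preprocessing steps; fit once on training data, reuse everywhere."""
    def __init__(self, *steps):
        self.steps = steps

    def fit(self, rows):
        for step in self.steps:
            step.fit(rows)
            rows = step.transform(rows)
        return self

    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows

train = [{"x": 1.0}, {"x": 3.0}]
recipe = Recipe(ScaleStandard("x")).fit(train)
print(recipe.transform([{"x": 2.0}]))  # mean=2.0, std=1.0 -> [{'x': 0.0}]
```

The same fitted recipe object can then be handed to a serving path, which is the consistency-between-environments property the proposal emphasizes.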

Notes

@jitingxu1 jitingxu1 moved this from CFP to Preparing for submission in Ibis talks and tutorials May 20, 2024
@jitingxu1 jitingxu1 self-assigned this May 20, 2024
@deepyaman

@jitingxu1 Some quick notes:

  • Can you change Ibis-ML to IbisML in all places?
  • I don't think Theseus is necessary here. Let's focus on the open source project for ODSC (and in most submissions).
  • What does multilingual frameworks mean? This point isn't clear to me personally.

I don't 100% know whether we need to talk about streaming in this talk, but I feel like there may be differing views.

At a higher level (probably more important): what will the talk cover? I think this is a description of the project and what it can do, but not of the talk. There needs to be a clearer discussion of the gap with current projects.

@jitingxu1 commented May 20, 2024

What does multilingual frameworks mean? This point isn't clear to me personally.

For example, using Spark for batch features, Flink for streaming features, and pandas, scikit-learn, or PyTorch for last-mile preprocessing.

I don't 100% know whether we need to talk about streaming in this talk, but I feel like there may be differing views.

Streaming support is a strength of IbisML worth highlighting; this capability might distinguish it from other options.

@zhenzhongxu

Great write-up! I wonder if it makes sense to talk about the benefits of moving away from sampling toward training on the full dataset, which just works with IbisML. I've heard of a few cases where large organizations want to train models on the full data instead of relying on sampling.

@jitingxu1 jitingxu1 moved this from Preparing for submission to Submitted in Ibis talks and tutorials May 21, 2024
@jitingxu1

Here is the submitted version. Thanks @ncclementi, @deepyaman, and @chip for the review.

Title: Building ML pipelines that run anywhere with IbisML

Abstract

From inception to production, the ML lifecycle is a lengthy process involving multiple people, programming languages, and computational frameworks. In a traditional workflow, data scientists develop models and experiment with different features locally, using tools like pandas and scikit-learn on a small, often subsampled, dataset. However, as the need arises to scale up to larger datasets and production environments, engineers face the challenge of rewriting and testing these processes in distributed computing systems like Apache Spark or Dask. While such frameworks have their own ML libraries (of varying flavors and maturities) and technically allow users to run on both single machines and clusters, scaling ML pipelines this way is costly, resource-intensive, and inefficient.

IbisML is an open-source Python library designed for building and running scalable ML pipelines from experiment to production. It's built on top of Ibis, an open-source library that provides a familiar dataframe API for building up expressions that can be executed on a wide array of backends. Users can rely on tools like DuckDB and Polars for efficient local computation, then scale to distributed engines such as Spark, BigQuery, and Snowflake. With IbisML, users can preprocess data at scale across development and deployment, compose transformations with other scikit-learn estimators, and integrate seamlessly with scikit-learn, XGBoost, and PyTorch models without rewriting code.
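The deferred-expression idea described above, i.e. build an expression once, then execute it on interchangeable backends, can be sketched with a toy example. The names `Filter`, `PythonBackend`, and `SQLBackend` below are hypothetical illustrations of the pattern, not the real Ibis API:

```python
# Toy sketch of deferred expressions: the expression is a plain data object,
# and each "backend" decides how to run it. NOT the real Ibis API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Filter:
    column: str
    threshold: float

class PythonBackend:
    """Executes the expression with a plain Python loop (local development)."""
    def execute(self, expr, rows):
        return [r for r in rows if r[expr.column] > expr.threshold]

class SQLBackend:
    """Compiles the same expression to SQL (what a warehouse backend would run)."""
    def compile(self, expr, table):
        return f"SELECT * FROM {table} WHERE {expr.column} > {expr.threshold}"

expr = Filter(column="amount", threshold=100.0)          # defined once
local = PythonBackend().execute(expr, [{"amount": 50.0}, {"amount": 150.0}])
sql = SQLBackend().compile(expr, "orders")               # reused, unchanged
print(local)  # [{'amount': 150.0}]
print(sql)    # SELECT * FROM orders WHERE amount > 100.0
```

The key design point is that the expression carries no execution logic of its own, which is what lets the same pipeline definition move from a local engine to a distributed one without rewrites.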

In this talk, we will introduce IbisML and the utilities it provides to streamline ML pipeline development. We will demonstrate its functionalities on a simple, real-world problem, including the ability to train and fit estimators on different backends. Finally, we will showcase how you can efficiently hand off to the modeling framework of your choice.

@cpcloud cpcloud moved this from backlog to cooking in Ibis planning and roadmap May 29, 2024
@jitingxu1 jitingxu1 moved this from cooking to review in Ibis planning and roadmap Jun 3, 2024
@jitingxu1

No response from the conference

@ncclementi (Contributor, Author)

I don't know how the conference handles this, but according to their website they should have notified speakers in the first round on July 12th. Maybe you can email them and ask; if not, wait for the second round of notifications.
