PyData NYC 2024 -- CFP Closes 2024/09/03 #40
Proposal title: Ibis: Don't Let the Engine Dictate the Interface

Abstract: Tabular data is ubiquitous, and pandas has been the de facto tool in Python for [...]

Description: Ibis is an open-source, pure Python library that lets you write Python to build up expressions [...] Modern analytical databases (like DuckDB) are able to analyze tabular data [...] pandas and other Python libraries can interact with databases, but they were [...] Treating a remote database as a data store isn’t wrong, but it provides an [...] Because they are very, very fast. 50 years of database research hasn't gone to [...] In a cruel twist of fate, though, almost all of them require you to write SQL in [...] SQL is only a language – it’s an interface. The execution engine is a separate [...]

Maybe you would like to use the DuckDB execution engine, but you don’t like the interface (SQL)? Or you would like to use the Spark execution engine, but you don’t like the interface (PySpark API)? The interface shouldn’t be a hurdle for a user to clear in order to make use of [...]
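To make the interface/engine split concrete, here is a minimal sketch using only the Python standard library. It is not Ibis and does not use the Ibis API: the `Query` class, its `where_greater` and `group_sum` methods, and the `trips` table are all hypothetical names invented for illustration, and SQLite stands in for an engine such as DuckDB. The point is only that a Python object can describe a query lazily, compile it to SQL, and hand it to whichever engine is connected.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical toy expression: describes a query, runs nothing by itself."""

    table: str
    min_filter: Optional[Tuple[str, float]] = None  # (column, lower bound)
    group_col: Optional[str] = None
    sum_col: Optional[str] = None

    def where_greater(self, column: str, bound: float) -> "Query":
        # Returns a new expression; the original is left unchanged.
        return replace(self, min_filter=(column, bound))

    def group_sum(self, by: str, column: str) -> "Query":
        return replace(self, group_col=by, sum_col=column)

    def to_sql(self) -> Tuple[str, tuple]:
        # Compile the expression to a SQL string plus bound parameters.
        if self.group_col is not None:
            select = f'"{self.group_col}", SUM("{self.sum_col}") AS total'
        else:
            select = "*"
        sql = f'SELECT {select} FROM "{self.table}"'
        params: tuple = ()
        if self.min_filter is not None:
            column, bound = self.min_filter
            sql += f' WHERE "{column}" > ?'
            params = (bound,)
        if self.group_col is not None:
            sql += f' GROUP BY "{self.group_col}" ORDER BY "{self.group_col}"'
        return sql, params

    def execute(self, con: sqlite3.Connection) -> list:
        # The engine (here SQLite) does the actual work.
        sql, params = self.to_sql()
        return con.execute(sql, params).fetchall()


# Set up a small in-memory table to act as the "engine" side.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trips (city TEXT, fare REAL)")
con.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("NYC", 12.5), ("NYC", 3.0), ("SF", 20.0), ("SF", 8.0)],
)

# The "interface" side: plain Python method chaining, no hand-written SQL.
expr = Query("trips").where_greater("fare", 5.0).group_sum("city", "fare")
sql, params = expr.to_sql()
result = expr.execute(con)
print(sql)
print(result)
```

Swapping the engine would mean changing only `to_sql` and `execute`; the method-chaining code that a user writes would stay the same, which is the separation the proposal argues for.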
+1-ing this, since I've seen the talk and it's amazing.
I submitted a talk too; I adapted the geospatial one a bit.

Title: Ibis, DuckDB, and GeoParquet: Making Geospatial Analytics Fast, Simple, and Pythonic

Abstract: Geospatial data is becoming increasingly integral to data workflows, and Python offers a wide array of tools to handle it. A powerful new option has recently emerged: DuckDB, which now supports geospatial analytics with its new extension. DuckDB has taken the data world by storm (~23k stars on GitHub) and is making waves in geospatial data too. Plus, with the growing development and adoption of GeoParquet, storing and exchanging geospatial data has never been easier. But what if you prefer writing Python code over SQL? That’s where Ibis comes in. Ibis is a Python library that provides a dataframe-like interface, allowing you to write Python code to construct SQL expressions that can be executed on various backends, including DuckDB. In this talk, I’ll demonstrate how to leverage the power of DuckDB’s spatial capabilities while staying within the Python ecosystem. Yes, there will be a live demo! (Pssst... I’ll show you how to work with GeoParquet data from Overture Maps, create nice plots that won’t kill your laptop, and avoid SQL.) This is an introductory talk; everyone is welcome, and no prior experience with spatial databases or geospatial workflows is needed.

Description: Ibis is an open-source Python library that provides a dataframe-like API, enabling you to write Python code to build expressions that can be executed across multiple backends such as DuckDB, PostgreSQL, BigQuery, and more. Some of these backends offer support for geospatial operations that can be executed via Ibis without the need to write any SQL. In this talk, we aim to showcase our default backend: DuckDB. Over the past year, DuckDB has introduced support for over 100 geospatial operations, many of which are now accessible via Ibis. This allows you to experiment with these operations while remaining in Python land.

If you have experience working with spatial databases, you are likely familiar with PostGIS, a library that extends PostgreSQL's capabilities to handle geospatial data. The DuckDB spatial extension provides a healthy subset of PostGIS-like options, but getting started is much simpler: it requires no server-side setup, user configuration, or client configuration. DuckDB seamlessly integrates into existing GIS workflows, regardless of data formats or projections. Recently, DuckDB has also added support for GeoParquet. GeoParquet extends the powerful Apache Parquet columnar data format to the geospatial domain, making it easier to work with geospatial data in a high-performance, columnar format. With Ibis, performing your first spatial operations becomes even easier and, most importantly, it’s Python! During this talk, we will introduce Ibis and demonstrate its geospatial functionality through an example, with DuckDB as the backend and a GeoParquet data source. We will also explore compatibility with other Python libraries such as GeoPandas and lonboard for plotting purposes. By the end of the talk, you’ll know how to get started with Ibis and work with spatial data using DuckDB as the backend engine.
Proposal title: Building machine learning pipelines that scale: a case study using Ibis and IbisML

Session type: Tutorial (90 minutes)

Abstract: Libraries like Ibis have been gaining traction recently by unifying the way we work with data across multiple data platforms, from dataframe APIs to databases, from dev to prod. What if we could extend the abstraction to machine learning workflows (broadly, sequences of steps that implement [...]

Description: As Python has become the lingua franca of data science, pandas and scikit-learn have cemented their roles in the standard machine learning toolkit. However, when data volumes rise, this stack becomes unwieldy (requiring proportionately larger compute, subsampling to reduce data size, or both) or altogether untenable. Luckily, modern analytical databases (like DuckDB) and dataframe libraries (such as Polars) can crunch this same tabular data but perform orders of magnitude faster than pandas, all while using less memory. Ibis already provides a unified dataframe API that lets users leverage a plethora of popular databases and analytics tools (BigQuery, Snowflake, Spark, DuckDB, etc.) without rewriting their data engineering code. However, at scale, the performance bottleneck is pushed to the ML pipeline. IbisML extends the intrinsic benefits of using Ibis to the ML workflow. It lets you bring your ML to the database (or other Ibis-supported backend), and supports efficient integration with modeling frameworks like XGBoost, PyTorch, and scikit-learn. On top of that, IbisML steps can be used as estimators within the familiar context of scikit-learn pipelines. In this tutorial, we'll cover: [...]

This is a hands-on tutorial, and you will train a simple (not great!) live win probability model on a provided dataset. You'll also see how the result can be run at scale on a distributed backend. Participants should ideally have some experience using Python dataframe libraries; familiarity with scikit-learn or another modeling framework is helpful but not required.
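As a rough illustration of what "bringing your ML to the database" can mean for a preprocessing step, here is a minimal standard-library sketch. It is not the IbisML API: the `StandardScaleStep` class, its `fit` and `transform` methods, and the `games` table with its `score_diff` column are hypothetical names made up for this example, and SQLite stands in for a real analytical backend. The idea shown is that fitting computes summary statistics inside the database, so only two numbers leave it rather than every row.

```python
import sqlite3


class StandardScaleStep:
    """Hypothetical toy step (not the IbisML API) that scales one column."""

    def __init__(self, column: str):
        self.column = column
        self.mean_ = None
        self.std_ = None

    def fit(self, con: sqlite3.Connection, table: str) -> "StandardScaleStep":
        # The aggregates are computed by the database engine; only the mean
        # and the mean of squares are returned to Python.
        col = f'"{self.column}"'
        mean, mean_sq = con.execute(
            f'SELECT AVG({col}), AVG({col} * {col}) FROM "{table}"'
        ).fetchone()
        self.mean_ = mean
        # Population standard deviation: sqrt(E[x^2] - E[x]^2).
        self.std_ = (mean_sq - mean * mean) ** 0.5
        return self

    def transform(self, con: sqlite3.Connection, table: str) -> list:
        # The scaling itself is also expressed as SQL and run by the engine.
        col = f'"{self.column}"'
        rows = con.execute(
            f'SELECT ({col} - ?) / ? FROM "{table}" ORDER BY rowid',
            (self.mean_, self.std_),
        ).fetchall()
        return [row[0] for row in rows]


# A small in-memory table standing in for a much larger remote one.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE games (score_diff REAL)")
con.executemany(
    "INSERT INTO games VALUES (?)",
    [(2.0,), (4.0,), (4.0,), (4.0,), (5.0,), (5.0,), (7.0,), (9.0,)],
)

step = StandardScaleStep("score_diff").fit(con, "games")
scaled = step.transform(con, "games")
print(step.mean_, step.std_)
print(scaled)
```

The `fit`/`transform` split mirrors the scikit-learn convention mentioned above, which is what lets steps of this shape be composed into a pipeline; how IbisML actually implements its steps is not covered by this sketch.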
My talk, "Ibis, DuckDB, and GeoParquet: Making Geospatial Analytics Fast, Simple, and Pythonic", got accepted. Leaving this here for tracking purposes.
https://pydata.org/nyc2024/call-for-proposals