adding starburst trino nyctaxi example for review #28
base: main
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
Thanks for putting this in @dhuynh!
I've left some comments inline (hopefully respecting notebook JSON 🤞 ).
There are a few points where I think what Ibis is doing can be clarified. A lot of the rest are style points -- we've been following certain conventions but we haven't documented them anywhere (very bad form from us). We will definitely put together a "style guide" for examples soon.
"outputs": [], | ||
"source": [ | ||
"import ibis\n", | ||
"import pandas as pd\n", |
"import pandas as pd\n", |
"id": "ad9927bc", | ||
"metadata": {}, | ||
"source": [ | ||
"IMPORTANT!!!! Change your user, host, port, database, schema and roles to be relevant to your Starburst Galaxy setup. If you are using OAuth2, uncomment the keyword lines roles, and auth. Then comment PASSWORD to proceed. " |
I think here it would be useful to link to our detailed instructions on figuring out which values to use: https://ibis-project.org/backends/trino#connecting-to-starburst-managed-trino-instances
Also, I would remove the choice about oauth2 or not -- it shouldn't be the focus of this example to handle multiple authentication flows, and the above link also details how to connect.
"id": "e9bd2195-381a-4b20-aa12-f1ee07dafd6c", | ||
"metadata": {}, | ||
"source": [ | ||
"Ibis tables in trino can be stored through the use of con.table. We're going to create two ibis tables from our tables below:" |
This is, I think, inadvertently a touch misleading. Using con.table isn't storing anything: we're grabbing a reference to a table that already exists in Trino. Nothing is being added to the database, nor is the underlying data being changed in any way.
Agreed.
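To make the distinction concrete, here is a small sketch that uses the standard-library sqlite3 module in place of Trino (an assumption, made only so the snippet runs anywhere). The TableRef class and the trips table are hypothetical stand-ins, not Ibis or Trino code. Creating the reference writes nothing, and a query runs only when a result is requested:

```python
import sqlite3


class TableRef:
    """Hypothetical lazy handle to an existing table; holds a name, not data."""

    def __init__(self, con, name):
        self.con = con
        self.name = name

    def count(self):
        # Work happens only when a result is requested.
        # Interpolating the name is acceptable here only because it is hard-coded below.
        return self.con.execute(f"SELECT COUNT(*) FROM {self.name}").fetchone()[0]


# Set up a table that "already exists" before we take a reference to it.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trips (trip_distance REAL)")
con.executemany("INSERT INTO trips VALUES (?)", [(0.0,), (1.5,), (3.2,)])
con.commit()

changes_before = con.total_changes
trips = TableRef(con, "trips")          # analogous to grabbing a table reference
changes_after_ref = con.total_changes   # unchanged: nothing was written

row_count = trips.count()               # a read query runs only now
tables = [r[0] for r in con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(changes_after_ref - changes_before, row_count, tables)
```

The same framing could work for the notebook text: the variable is a handle to a table that is already in the database, not a new copy of it.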
"nycjanduration_new = (\n", | ||
" nycjanduration.filter(nycjanduration.trip_distance != 0.0)\n", | ||
")\n", | ||
"nycjanduration_new\n", |
If the nycjanduration expression isn't going to be used after this, I would suggest just using the same variable name. That way you avoid all the _new suffixes in the names, e.g.
nycjanduration = nycjanduration.filter(_.trip_distance > 0)
"P.S This ibis code below at the time of writing uses a pre-release function \"bucket\". This is included with future version of IBIS. When bucket is released, you can uncomment the lines below for a temporal grouping. If you recieve an error \n", | ||
"the function does not exist, you can uncomment the line below to try to upgrade to a pre-release version of ibis:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "1f8f9893-a369-4337-8c86-38045e75a461", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#python -m pip install -U 'ibis-framework[trino]' --pre" | ||
] | ||
}, |
Sorry for the delay in getting 7.1 released -- I think we should hold off on publishing this until we don't require a pre-release feature.
adding suggestions Co-authored-by: Gil Forsyth <[email protected]>
@gforsyth tried to double check some of your changes and I got this error with regards to your NOT suggestion:
ValueError Traceback (most recent call last)
File ~/opt/anaconda3/lib/python3.9/site-packages/ibis/expr/types/core.py:132, in Expr.__bool__(self)
ValueError: The truth value of an Ibis expression is not defined
Oops, sorry about that. It should probably be:
nyc_filtered = nycjantrips.filter((nycjantrips.passenger_count != 0) | ~(nycjantrips.passenger_count.isnan()))
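For context on the ValueError above: Python's `not` operator calls `__bool__` on its operand, and a lazy expression has no truth value until it is executed, so element-wise negation is spelled `~` instead. A minimal pure-Python sketch of that mechanism follows. LazyExpr is a hypothetical stand-in written for illustration; it is not the Ibis implementation.

```python
class LazyExpr:
    """Hypothetical stand-in for a lazy expression; not the Ibis implementation."""

    def __init__(self, text):
        self.text = text

    def __bool__(self):
        # `not expr`, `expr and other`, and `if expr:` all end up here.
        raise ValueError("The truth value of a lazy expression is not defined")

    def __invert__(self):
        # `~expr` builds a new expression instead of evaluating anything.
        return LazyExpr(f"NOT ({self.text})")

    def __or__(self, other):
        # `a | b` also builds a new expression.
        return LazyExpr(f"({self.text}) OR ({other.text})")


is_nan = LazyExpr("passenger_count IS NAN")
nonzero = LazyExpr("passenger_count <> 0")

# Operator form: composes without ever asking for a truth value.
combined = nonzero | ~is_nan
print(combined.text)

# Keyword form: Python asks for a truth value immediately and fails.
try:
    not is_nan
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The same reasoning applies to `and` / `or` versus `&` / `|`, which is why the suggestion above wraps each comparison in parentheses and combines them with operators.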
Adding starburst trino ibis notebook example for review with NYC taxi dataset