Skip to content
This repository has been archived by the owner on Mar 29, 2023. It is now read-only.

Ibis 2.0: Can't connect to table - _parse_project_and_dataset called without database #107

Open
saschahofmann opened this issue Jan 5, 2022 · 2 comments

Comments

@saschahofmann
Copy link
Contributor

saschahofmann commented Jan 5, 2022

I am trying to update our codebase to ibis 2.0 and started getting the ValueError("Unable to determine BigQuery dataset.") error when trying to fetch a table from BigQuery like this ibis_client.table("table", database="dataset").

I followed the traceback like this, starting at /ibis_bigquery/init.py, line 177

def table(self, name, database=None) -> ir.TableExpr:
    t = super().table(name, database=database)
    ...

This calls the standard ibis table function /ibis/backends/base/sql/init.py line 41

def table(self, name, database=None):
    qualified_name = self._fully_qualified_name(name, database)
    schema = self.get_schema(qualified_name)
    ...

which in turn calls the get_schema method in bigquery /ibis_bigquery/init.py, line 329 BUT only with a name (this will be important in a sec)

def get_schema(self, name, database=None):
    table_id = self._fully_qualified_name(name, database)
    ...

Now we are trying to create the fully_qualified (/ibis_bigquery/init.py", line 183) name again but this time database is None!

def _fully_qualified_name(self, name, database):
    default_project, default_dataset = self._parse_project_and_dataset(database)
    ...

Which leads us to _parse_project_and_dataset in /ibis_bigquery/init.py, line 161

 def _parse_project_and_dataset(self, dataset) -> Tuple[str, str]:
    if not dataset and not self.dataset:
        raise ValueError("Unable to determine BigQuery dataset.")
    ...

As mentioned earlier dataset is now None and nowhere in the stacktrace above did we set self.dataset, I probably can set the dataset before calling the table method but it feels to me like this is a bug?

This code also means that the bigquery table method makes two requests to bigquery one in the table method itself and one in the get_schema method that is called via the super method. Maybe we could make bq_table a cached property to avoid that?

@saschahofmann
Copy link
Contributor Author

Right now, I am monkeypatching the class like this

from ibis.backends.base.sql import BaseSQLBackend
from ibis_bigquery import Backend

original_table = BaseSQLBackend.table

def table(self, name, database=None):
    self.set_database(database)
    return original_table(self, name, database)


Backend.table = table

which makes me wonder why the Bigquery Backend class has it's own table method?

@jayceslesar
Copy link

jayceslesar commented Jan 14, 2022

@saschahofmann I'm curious is there a difference between the syntax at the top vs

import ibis
import ibis_bigquery


conn = ibis_bigquery.connect(
    project_id='some_project',
    dataset_id='some_dataset'
)

table = conn.table('some_table')
# can compose queries on table now

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants