bug: cannot convert y to numpy on kaggle notebook in sklearn pipeline #149

Open
jitingxu1 opened this issue Sep 4, 2024 · 0 comments

jitingxu1 commented Sep 4, 2024

In this competition, the `y` column cannot be converted to a NumPy array.

The same code runs on my local machine but fails in a Kaggle notebook.

~~**I could reproduce this on my local.**~~

local env

Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 10:07:17) [Clang 14.0.6 ]
scikit-learn version: 1.5.1
skorch version: 1.0.0
torch version: 2.4.0
ibis-framework version: 9.3.0

kaggle env

Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
scikit-learn version: 1.2.2
skorch version: 1.0.0
torch version: 2.4.0+cpu
ibis-framework version: 9.3.0
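
The two listings above can be gathered the same way in both environments with a short script. This is a minimal sketch using only the standard library; the helper name `collect_versions` and the `PACKAGES` list are illustrative and not part of the original report.

```python
import sys
from importlib import metadata

# Distribution names as published on PyPI (illustrative list).
PACKAGES = ["scikit-learn", "skorch", "torch", "ibis-framework"]


def collect_versions(packages=PACKAGES):
    """Return {distribution name: version string, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    print(f"Python version: {sys.version}")
    for name, version in collect_versions().items():
        print(f"{name} version: {version}")
```

Running it in each environment gives output that can be compared line by line. In the listings above, scikit-learn (1.5.1 vs 1.2.2), the Python version, and the torch build differ.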

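# Imports used by this snippet; MyModel, recipe, X_train and y_train come from elsewhere in the notebook (not shown)
from sklearn.pipeline import Pipeline
from skorch import NeuralNetClassifier
from skorch.callbacks import EarlyStopping, LRScheduler
from torch import nn, optim

# Wrap the PyTorch model with skorch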
net = NeuralNetClassifier(
    MyModel,
    module__input_dim=635,  # Specify the input dimension
    max_epochs=1,
    lr=0.001,
    batch_size=32,
    optimizer=optim.Adam,
    criterion=nn.BCELoss,
    iterator_train__shuffle=True,
    callbacks=[
        EarlyStopping(monitor='valid_loss', patience=25, load_best=True),  # Early stopping
        LRScheduler(policy='ReduceLROnPlateau', monitor='valid_loss', factor=0.1, patience=25, min_lr=1e-6)
    ],
    verbose=1
)

# Define the sklearn pipeline with preprocessing and PyTorch model
pipeline = Pipeline([
    ('ibisml-prep', recipe),  # Preprocessing step in IbisML
    ('model', net)  # The PyTorch model wrapped as NeuralNetClassifier via skorch
])

pipeline.fit(X_train, y_train)

log

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 pipeline.fit(X_train, y_train)

File /opt/conda/lib/python3.10/site-packages/sklearn/pipeline.py:405, in Pipeline.fit(self, X, y, **fit_params)
    403     if self._final_estimator != "passthrough":
    404         fit_params_last_step = fit_params_steps[self.steps[-1][0]]
--> 405         self._final_estimator.fit(Xt, y, **fit_params_last_step)
    407 return self

File /opt/conda/lib/python3.10/site-packages/skorch/classifier.py:165, in NeuralNetClassifier.fit(self, X, y, **fit_params)
    154 """See ``NeuralNet.fit``.
    155 
    156 In contrast to ``NeuralNet.fit``, ``y`` is non-optional to
   (...)
    160 
    161 """
    162 # pylint: disable=useless-super-delegation
    163 # this is actually a pylint bug:
    164 # https://github.com/PyCQA/pylint/issues/1085
--> 165 return super(NeuralNetClassifier, self).fit(X, y, **fit_params)

File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1319, in NeuralNet.fit(self, X, y, **fit_params)
   1316 if not self.warm_start or not self.initialized_:
   1317     self.initialize()
-> 1319 self.partial_fit(X, y, **fit_params)
   1320 return self

File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1278, in NeuralNet.partial_fit(self, X, y, classes, **fit_params)
   1276 self.notify('on_train_begin', X=X, y=y)
   1277 try:
-> 1278     self.fit_loop(X, y, **fit_params)
   1279 except KeyboardInterrupt:
   1280     pass

File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1172, in NeuralNet.fit_loop(self, X, y, epochs, **fit_params)
   1136 def fit_loop(self, X, y=None, epochs=None, **fit_params):
   1137     """The proper fit loop.
   1138 
   1139     Contains the logic of what actually happens during the fit
   (...)
   1170 
   1171     """
-> 1172     self.check_data(X, y)
   1173     self.check_training_readiness()
   1174     epochs = epochs if epochs is not None else self.max_epochs

File /opt/conda/lib/python3.10/site-packages/skorch/classifier.py:141, in NeuralNetClassifier.check_data(self, X, y)
    137         pass
    139 if y is not None:
    140     # pylint: disable=attribute-defined-outside-init
--> 141     self.classes_inferred_ = np.unique(to_numpy(y))

File /opt/conda/lib/python3.10/site-packages/skorch/utils.py:152, in to_numpy(X)
    149     return np.asarray(X)
    151 if not is_torch_data_type(X):
--> 152     raise TypeError("Cannot convert this data type to a numpy array.")
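
The traceback ends in `skorch.utils.to_numpy`, which raises when `y` is not one of the types it checks for. The pipeline hands `y` straight to the final estimator (`self._final_estimator.fit(Xt, y, ...)`), so skorch sees whatever type `y_train` has. The sketch below is one way to inspect that type in both environments and to try a conversion before calling `fit`. It assumes that `y_train` exposes `to_numpy()` or `to_pandas()`, which the report does not confirm. The helper names `describe_target` and `to_plain_target` are hypothetical.

```python
def describe_target(y):
    """Summarize the type of ``y`` and which conversion hooks it exposes."""
    cls = type(y)
    hooks = ("to_numpy", "to_pandas", "__array__")
    return {
        "type": f"{cls.__module__}.{cls.__qualname__}",
        "hooks": [name for name in hooks if hasattr(y, name)],
    }


def to_plain_target(y):
    """Best-effort conversion of ``y``; returns it unchanged if no hook fits."""
    if hasattr(y, "to_numpy"):
        # e.g. a pandas Series or DataFrame
        return y.to_numpy()
    if hasattr(y, "to_pandas"):
        # Assumption: the object materializes to pandas first
        return y.to_pandas().to_numpy()
    return y
```

Possible usage: `print(describe_target(y_train))` in both the local and the Kaggle notebook, then `pipeline.fit(X_train, to_plain_target(y_train))`. Whether this resolves the Kaggle failure is untested. A one-column table converted through pandas comes back two-dimensional and may need flattening.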

jitingxu1 changed the title bug: bug: cannot convert y to numpy on kaggle notebook in sklearn pipeline Sep 4, 2024