fix(steps): add unit test and fix unique col scaling #158

jitingxu1 · 2024-09-13T06:53:19Z

changes:

fix unique col standardization
add unit test

resolve #119

codecov-commenter · 2024-09-13T07:03:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.73%. Comparing base (6dce35e) to head (b49543b).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #158      +/-   ##
==========================================
+ Coverage   88.70%   89.73%   +1.03%     
==========================================
  Files          27       28       +1     
  Lines        2009     2045      +36     
==========================================
+ Hits         1782     1835      +53     
+ Misses        227      210      -17

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

deepyaman · 2024-09-13T19:10:46Z

This doesn't quite fix it; it throws an error. I'm not too keen on it.

Why? Preprocessing can be like a recipe you apply, and you may always want to scale your numeric columns. I don't want to throw an error just because one of those columns is not unique; should figure out how to handle it instead.

See how scikit-learn handles it, for example: https://github.com/betatim/scikit-learn/blob/main/sklearn/preprocessing/_data.py#L82 (huh, not sure how I got to somebody's fork, but whatever... get the point)

If you want to refactor this PR to just add the unit tests, and create an issue to handle scaling without throwing an error, we can do that (but doesn't need to go in today).

(Or you can push off adding the unit tests until address the scaling in a future PR)

jitingxu1 · 2024-09-17T18:04:40Z

This doesn't quite fix it; it throws an error. I'm not too keen on it.

Why? Preprocessing can be like a recipe you apply, and you may always want to scale your numeric columns. I don't want to throw an error just because one of those columns is not unique; should figure out how to handle it instead.

See how scikit-learn handles it, for example: https://github.com/betatim/scikit-learn/blob/main/sklearn/preprocessing/_data.py#L82 (huh, not sure how I got to somebody's fork, but whatever... get the point)

If you want to refactor this PR to just add the unit tests, and create an issue to handle scaling without throwing an error, we can do that (but doesn't need to go in today).

(Or you can push off adding the unit tests until address the scaling in a future PR)

make sense, I will update this PR soon.

jitingxu1 · 2024-09-17T21:50:30Z

ibis_ml/steps/_standardize.py

@@ -11,6 +11,8 @@
    from collections.abc import Iterable

 _DOCS_PAGE_NAME = "standardization"
+# a small epsilon value to handle near-constant columns during normalization
+_APPROX_EPS = 10e-7


numpy could get the machine precision for different float precision. I took the float16 precision as all the approximate precision for all float types to bypass the use of numpy.

add unit test and fix unique col scaling

d6446c7

jitingxu1 force-pushed the fix-all-unique-value-when-scale branch from cd44247 to d6446c7 Compare September 13, 2024 07:00

Merge branch 'ibis-project:main' into fix-all-unique-value-when-scale

72eb5b3

jitingxu1 commented Sep 17, 2024

View reviewed changes

jitingxu1 requested a review from deepyaman September 17, 2024 21:51

handle near constant column

b49543b

jitingxu1 force-pushed the fix-all-unique-value-when-scale branch from ac7e48d to b49543b Compare September 17, 2024 21:52

deepyaman merged commit c32d604 into ibis-project:main Sep 25, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(steps): add unit test and fix unique col scaling #158

fix(steps): add unit test and fix unique col scaling #158

jitingxu1 commented Sep 13, 2024

codecov-commenter commented Sep 13, 2024 •

edited

Loading

deepyaman commented Sep 13, 2024

jitingxu1 commented Sep 17, 2024

jitingxu1 Sep 17, 2024

fix(steps): add unit test and fix unique col scaling #158

fix(steps): add unit test and fix unique col scaling #158

Conversation

jitingxu1 commented Sep 13, 2024

codecov-commenter commented Sep 13, 2024 • edited Loading

Codecov Report

deepyaman commented Sep 13, 2024

jitingxu1 commented Sep 17, 2024

jitingxu1 Sep 17, 2024

Choose a reason for hiding this comment

codecov-commenter commented Sep 13, 2024 •

edited

Loading