fix(steps): add unit test and fix unique col scaling #158
Conversation
Force-pushed from cd44247 to d6446c7 (Compare)
Codecov Report: all modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##             main     #158      +/-   ##
==========================================
+ Coverage   88.70%   89.73%   +1.03%
==========================================
  Files          27       28       +1
  Lines        2009     2045      +36
==========================================
+ Hits         1782     1835      +53
+ Misses        227      210      -17
==========================================

View full report in Codecov by Sentry.
This doesn't quite fix it; it throws an error, and I'm not too keen on that. Why? Preprocessing can be like a recipe you apply, and you may always want to scale your numeric columns. I don't want to throw an error just because one of those columns is not unique; we should figure out how to handle it instead. See how scikit-learn handles it, for example: https://github.com/betatim/scikit-learn/blob/main/sklearn/preprocessing/_data.py#L82 (huh, not sure how I got to somebody's fork, but whatever... you get the point). If you want to refactor this PR to just add the unit tests, and create an issue to handle scaling without throwing an error, we can do that (but it doesn't need to go in today). (Or you can push off adding the unit tests until you address the scaling in a future PR.)
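For reference, the pattern at the linked scikit-learn line is to replace (near-)zero scale factors with 1 before dividing, so constant columns pass through unchanged instead of raising or producing NaN. A minimal sketch of that idea (the function name and epsilon here are illustrative, not the actual API of this repo):

```python
import numpy as np


def handle_zeros_in_scale(scale: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Replace (near-)zero scale factors with 1.0 so that dividing by the
    scale leaves constant columns unchanged rather than yielding NaN/inf."""
    scale = scale.copy()
    scale[np.abs(scale) < eps] = 1.0
    return scale


X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])          # second column is constant
std = X.std(axis=0)                 # second entry is 0.0
X_scaled = (X - X.mean(axis=0)) / handle_zeros_in_scale(std)
# the constant column becomes all zeros instead of NaN, and no error is raised
```

With this approach a scaling "recipe" can be applied blindly to all numeric columns; a constant column simply comes out centered at zero.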
Makes sense, I will update this PR soon.
@@ -11,6 +11,8 @@
 from collections.abc import Iterable

 _DOCS_PAGE_NAME = "standardization"
+# a small epsilon value to handle near-constant columns during normalization
+_APPROX_EPS = 10e-7
NumPy can report the machine precision for each float precision (e.g. via `numpy.finfo`). I took the float16 precision as the approximate precision for all float types, to bypass the use of NumPy.
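For context on that trade-off, NumPy exposes the machine epsilon (the gap between 1.0 and the next representable value) per float type through `numpy.finfo`; a single hard-coded constant sidesteps the import at the cost of precision-awareness. A quick illustration:

```python
import numpy as np

# Machine epsilon for each IEEE-754 floating-point precision,
# as reported by numpy.finfo:
for dtype in (np.float16, np.float32, np.float64):
    print(dtype.__name__, np.finfo(dtype).eps)

# float16 ~ 9.77e-04  (2**-10)
# float32 ~ 1.19e-07  (2**-23)
# float64 ~ 2.22e-16  (2**-52)
```

A dtype-aware threshold would use `np.finfo(X.dtype).eps` instead of one shared constant.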
Force-pushed from ac7e48d to b49543b (Compare)
changes:
resolve #119