[WIP - do not merge!] Move sparkdl utilities for conversion between numpy arrays and image schema to ImageSchema #90
base: master
Conversation
Force-pushed from a959b18 to b533e69
This looks good to me, just a few minor comments.
     buffer=image.data,
-    strides=(width * nChannels, nChannels, 1))
+    strides=(width * nChannels * itemSz, nChannels * itemSz, itemSz))
Will numpy figure out the right strides if we don't pass it explicitly?
Hmm, yeah, I would think so. The original code from the MS folks was like this and I did not want to make more changes than necessary.
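For what it's worth, numpy does derive C-contiguous strides itself when none are passed, so the explicit `strides` argument is redundant for a contiguous buffer. A quick sketch with hypothetical dimensions (`height`, `width`, `nChannels` are example values, and `data` stands in for `image.data`):

```python
import numpy as np

height, width, nChannels = 4, 3, 3
data = bytes(height * width * nChannels)  # stand-in for image.data

# No strides argument: numpy assumes C-contiguous layout for the buffer.
arr = np.ndarray(shape=(height, width, nChannels), dtype=np.uint8, buffer=data)

itemSz = arr.itemsize  # 1 for uint8
# These are exactly the strides the explicit version would have passed:
assert arr.strides == (width * nChannels * itemSz, nChannels * itemSz, itemSz)
```

So dropping the argument would be safe as long as the buffer is laid out row-major, which is what the explicit stride formula assumes anyway.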
@@ -152,29 +192,29 @@ def toImage(self, array, origin=""):
         "array argument should be numpy.ndarray; however, it got [%s]." % type(array))

     if array.ndim != 3:
-        raise ValueError("Invalid array shape")
+        raise ValueError("Invalid array shape %s" % str(array.shape))
Do we want to reshape 2d arrays to be shape + (1,)?
I agree with their approach. I think it's better to make the caller pass the arguments in the expected format rather than trying to auto-convert, unless the conversion is completely unambiguous.
So in this case, we say images are always 3-dimensional arrays and it's up to the user to make sure they conform to that. Otherwise they might be passing something other than what they think they are passing, and we would mask their bug until later.
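The strict policy argued for here can be sketched as follows (a hypothetical helper illustrating the discussion, not the PR's actual code):

```python
import numpy as np

def check_image_array(array):
    # Images must be 3-dimensional (height, width, nChannels).
    # 2-d input is rejected rather than silently reshaped, so a caller
    # passing the wrong thing finds out immediately instead of later.
    if not isinstance(array, np.ndarray):
        raise TypeError(
            "array argument should be numpy.ndarray; however, it got [%s]." % type(array))
    if array.ndim != 3:
        raise ValueError("Invalid array shape %s" % str(array.shape))
    return array

# A caller with grayscale data must opt in to the extra axis explicitly:
gray = np.zeros((4, 3), dtype=np.uint8)
img = check_image_array(gray[:, :, np.newaxis])  # shape becomes (4, 3, 1)
```

The one-line `np.newaxis` conversion at the call site keeps the intent visible where the data originates.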
python/sparkdl/image/image.py
    "Unexpected/unsupported array data type '%s', currently only supported formats are %s" %
    (str(
        array.dtype), str(
            self._numpyToOcvMap.keys())))
Can we get this on fewer lines or use some variables? It looks odd.
Yeah it does, I think it's the autopep8 being weird here. I'll reformat that.
@@ -128,4 +128,6 @@ class DeepImageFeaturizerSuite extends FunSuite with TestSparkContext with Defau
       .setOutputCol("myOutput")
     testDefaultReadWrite(featurizer)
   }
+
+
Extra white space.
Force-pushed from b533e69 to 52c6041
python/sparkdl/image/image.py
             dataType=x.dataType(),
             nptype=self._ocvToNumpyMap[x.dataType()])
         for x in ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()]
     return [x for x in self._ocvTypes]
What does this do? Isn't self._ocvTypes already a list?
The purpose was to return a copy of the list so that the private member cannot be modified.
oic, I usually see myList[:] or list(myList) to make a (shallow) copy.
Thanks, that's nicer :)
The members of the list are tuples, so a shallow copy suffices here.
Force-pushed from 52c6041 to f616462
Codecov Report
@@ Coverage Diff @@
## master #90 +/- ##
==========================================
+ Coverage 82.49% 83.92% +1.43%
==========================================
Files 33 33
Lines 1879 1866 -13
Branches 35 39 +4
==========================================
+ Hits 1550 1566 +16
+ Misses 329 300 -29
Continue to review full report at Codecov.
Force-pushed from f616462 to c8c90e0
Are we planning to merge this? It'll be difficult to maintain and resolve the differences in the copied files between DL Pipelines and Spark. It'd be much easier cognitively to merge these changes into Spark and then remove the corresponding files in sparkdl. If we need to keep these files in sparkdl until Spark 2.4 is out, it'd be safer to first get the changes merged into Spark and then copy the exact changes here; if we merge this first, it could easily get out of sync with whatever revisions get made in Spark.
@sueann Yes, I agree; I would merge the Spark version first and merge this one only after Spark 2.4 is released. I made the PR here mostly because this is where we need the changes, so it can be reviewed in context, and also to run the tests. I'll mark it WIP.
ah ok got it. thanks!
[WIP] Preparation for moving this code to Spark.
Moved the utilities for image schema <=> numpy array conversion into the ImageSchema code (copy-pasted from Spark 2.3).