Usage of "private" Spark APIs #300
Comments
Thanks @ceedubs, I have always worried a bit about this. Databricks and other forks' runtimes will be an issue, and that's at least something we have to document. I have not extensively analyzed whether we absolutely need to do this, but I am afraid that some of the encoding work we have requires APIs that core Spark does not expose as public.
I had no idea about these mima exclusions, it's really unfortunate... Anything actionable on our side?
@OlivierBlanvillain If there were ways to reduce dependencies on sections of code under these exclusions, that would be great. Short of that, the actionable item might be to just warn about this in the README. I'd be willing to contribute this documentation, but it might be a little while before I get to it.
Another anecdote: Technically speaking, I believe EMR Spark is also a fork as they've backported changes before (but much less often), so it's entirely possible that some bincompat issue can happen there as well.
…se rc1, so1 not a default repo it seems
Frameless depends on portions of Spark for which there is no binary compatibility commitment. For example, Frameless uses `StaticInvoke`, which is part of the `org.apache.spark.sql.catalyst.expressions.objects` package. If you look at the (bountiful) mima exclusions in Spark, the entire `org.apache.spark.sql.catalyst` package is not checked for binary compatibility.
I don't really consider this a bug with frameless, but I wanted to at least raise it as a concern as it recently bit us at work.
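To make the concern concrete, here is a minimal sketch of a startup check that prints the constructor signatures of `StaticInvoke` as they exist on the runtime classpath, so they can be compared by eye against the Spark version Frameless was compiled with. It uses only JVM reflection, so it compiles without a Spark dependency. The object name `CatalystSignatureCheck` and its method are hypothetical names made up for this illustration; they are not part of Frameless or Spark.

```scala
import scala.util.Try

// Hypothetical helper, not part of Frameless or Spark.
object CatalystSignatureCheck {

  /** Public constructor signatures of `className` as loaded at runtime,
    * or None when the class is not on the classpath
    * (ClassNotFoundException is caught by Try; linkage errors are not). */
  def constructorSignatures(className: String): Option[Seq[String]] =
    Try(Class.forName(className)).toOption.map { cls =>
      cls.getConstructors.toSeq.map { ctor =>
        ctor.getParameterTypes.map(_.getName).mkString("(", ", ", ")")
      }
    }

  def main(args: Array[String]): Unit = {
    val name = "org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke"
    constructorSignatures(name) match {
      case Some(signatures) => signatures.foreach(s => println(s"$name$s"))
      case None             => println(s"$name is not on the classpath")
    }
  }
}
```

Running this once on a cluster and once against the stock Spark artifacts should show whether the two disagree. It only covers constructors of one class, so it is a diagnostic aid and not a compatibility guarantee.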
backstory for those who care
At work we use Databricks runtime 3.5. Databricks claims that this runtime uses Spark 2.2. However, we ran into a bewildering issue with a binary incompatibility between Frameless and the runtime Spark version (related to `org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke`). After quite a bit of investigation, we realized that the Databricks runtime doesn't actually include Spark 2.2 proper, but a private fork of it that has some incompatible changes. It has a backported change from Spark 2.3 that is incompatible with Spark 2.2 (and the version of Frameless that is built against Spark 2.2).
We can work around this particular issue by moving to Spark 2.3 and the Databricks 4.0 runtime, but it's tough to know what other incompatibilities could be lurking in the private forks, and I could envision other people running into similar issues (especially if they can't move to Spark 2.3).
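For anyone debugging a similar mismatch, a rough sketch of how one might find out which jar a class was actually loaded from, along with the implementation version recorded in that jar's manifest (if any). As the backstory shows, a reported version does not prove compatibility, so treat the output as a hint only. `ClassOrigin` and `Origin` are hypothetical names for this illustration, and the code relies only on standard JVM reflection.

```scala
import scala.util.Try

// Hypothetical helper, not part of Frameless or Spark.
object ClassOrigin {

  /** Where a class came from: jar/directory URL and manifest version, when available. */
  final case class Origin(location: Option[String], implementationVersion: Option[String])

  /** Describes the origin of `className`, or None when it is not on the classpath. */
  def describe(className: String): Option[Origin] =
    Try(Class.forName(className)).toOption.map { cls =>
      // Each step can be null (for example for JDK bootstrap classes), hence Option.
      val location = for {
        domain <- Option(cls.getProtectionDomain)
        source <- Option(domain.getCodeSource)
        url    <- Option(source.getLocation)
      } yield url.toString
      val version = Option(cls.getPackage).flatMap(p => Option(p.getImplementationVersion))
      Origin(location, version)
    }

  def main(args: Array[String]): Unit =
    println(describe("org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke"))
}
```

On a vendor runtime the location would typically point at the vendor's own jar, which is at least a signal that the classes may differ from the upstream release of the same nominal version.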