Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move ReferencedClassFinder to base #1987

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

johnynek
Copy link
Collaborator

Ideally anything that does not increase the dependencies should be moved to base. By removing a function it looks like no one was using (and is just a trivial redirection to Config) we can do that here (although the tests do reference things in code).

cc @navinvishy @daniel-sudz I think we could use this code in the beam and spark backends in each stage of writes to try to create tokens for all the classes we can see. In kryo, this is registering classes so they have names. Note, I would sort the classes first so we assign the classes in a reproducible way that never depends on any iteration order of a hash set.

@codecov-commenter
Copy link

codecov-commenter commented Apr 11, 2022

Codecov Report

Merging #1987 (fe647f8) into develop (6434348) will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@              Coverage Diff              @@
##             develop    #1987      +/-   ##
=============================================
- Coverage      37.82%   37.78%   -0.05%     
- Complexity      1116     1335     +219     
=============================================
  Files            316      315       -1     
  Lines          21117    21061      -56     
  Branches        2924     2874      -50     
=============================================
- Hits            7988     7957      -31     
+ Misses         12141    12123      -18     
+ Partials         988      981       -7     
Impacted Files Coverage Δ
...a/com/twitter/scalding/ReferencedClassFinder.scala

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6434348...fe647f8. Read the comment docs.

@navinvishy
Copy link
Contributor

In the beam backend, I see that we're getting a kryo instantiator here. We're using KryoHadoop by default which gets Cascading serialization tokens and registers those classes with Kryo. Are you suggesting removing the dependency on KryoHadoop?

Copy link
Contributor

@daniel-sudz daniel-sudz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change seems fine to me

I don't quite understand this part though:

I think we could use this code in the beam and spark backends in each stage of writes to try to create tokens for all the classes we can see. In kryo, this is registering classes so they have names. Note, I would sort the classes first so we assign the classes in a reproducible way that never depends on any iteration order of a hash set.

what is the use case for this? Having writers be aware of all exiting classes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants