Use soft values in the EntityCache; approximate entity size #490
Conversation
+ value.getEntity().getProperties().length()
+ value.getEntity().getInternalProperties().length();
The assumption here is wrong. There is no guarantee that one character is only one byte, even with "compact strings"; the estimate is likely off by a factor of 2.
It's an approximate size. Even if it's off by a factor of 10, that should be okay. A 1GB cache is greatly preferable to an unbounded cache.
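For concreteness, here is a minimal sketch of what such an approximate weigher might look like with Caffeine's Weigher interface, assuming a hypothetical EntityCacheEntry shaped like the one in the diff; the factor-of-two multiplier is one way to address the bytes-per-character concern raised above:

```java
import com.github.benmanes.caffeine.cache.Weigher;

/** Sketch of an approximate entity weigher; EntityCacheEntry's shape is assumed, not verbatim. */
public class EntityWeigher implements Weigher<Long, EntityCacheEntry> {

  public static final long WEIGHT_PER_MB = 1024 * 1024;

  /* Represents the approximate size of an entity beyond the properties */
  private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;

  @Override
  public int weigh(Long key, EntityCacheEntry value) {
    int propertyChars =
        value.getEntity().getProperties().length()
            + value.getEntity().getInternalProperties().length();
    // Java strings may occupy two bytes per character (UTF-16), so doubling the
    // character count gives a less optimistic byte estimate.
    return APPROXIMATE_ENTITY_OVERHEAD + 2 * propertyChars;
  }
}
```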
public static final long WEIGHT_PER_MB = 1024 * 1024;

/* Represents the approximate size of an entity beyond the properties */
private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;
Where does the value 1000 come from?
See this old comment -- the apparent assumption was that entities are generally ~1KB.
This means that, in the worst case, this change will not admit a larger number of entries than before; the count should be strictly smaller.
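As a rough sanity check of that claim, using the constants from the diff and treating the weight unit as roughly one byte:

```java
// Back-of-the-envelope: a ~100MB weight budget divided by the minimum
// per-entity weight (the bare 1000 overhead) bounds the entry count.
long budget = 100L * 1024 * 1024;              // ~100MB goal
long minWeightPerEntity = 1000;                // APPROXIMATE_ENTITY_OVERHEAD alone
long maxEntries = budget / minWeightPerEntity; // ≈ 104,857 entries

// That is in the same ballpark as the old 100k cap; any property payload
// only pushes the achievable entry count lower.
```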
@@ -828,7 +828,7 @@ void dropEntity(List<PolarisEntityCore> catalogPath, PolarisEntityCore entityToD
  }

  /** Grant a privilege to a catalog role */
- void grantPrivilege(
+ public void grantPrivilege(
The changes here look unrelated?
They are not; the EntityCacheTest was moved into a new package, so these methods would no longer have been callable from it had they remained package-private.
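For illustration, the Java visibility rule at play (package and class names here are hypothetical, not the actual Polaris layout):

```java
// File 1 -- the production class:
package example.persistence;

public class MetaStoreManager {
  void grantPrivilege() {}    // package-private: callable only within example.persistence
  public void dropEntity() {} // public: callable from any package
}

// File 2 -- the relocated test, now in a different package:
package example.persistence.cache;

import example.persistence.MetaStoreManager;

class EntityCacheTest {
  void example(MetaStoreManager manager) {
    manager.dropEntity();        // compiles
    // manager.grantPrivilege(); // would not compile: not visible outside its package
  }
}
```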
.expireAfterAccess(1, TimeUnit.HOURS) // Expire entries after 1 hour of no access
.removalListener(removalListener) // Set the removal listener
.softValues() // Account for memory pressure
Either there's a weigher or there are soft references; the latter is IMHO a bad choice, because it can likely ruin the efficiency of the eviction.
The docs seem to say you can do both, WDYM?
As for soft values being a bad choice: in my experience this is quite a common practice, and any degradation we're likely to see is far less than what we'd see if heap space were exhausted without soft values.
Assuming there is no significant memory pressure, my expectation is that the GC should more or less ignore the SoftReference objects.
If there's a specific performance concern that can be borne out in a benchmark, we should address it.
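To make that expectation concrete, a self-contained sketch of SoftReference behavior (run with a small heap such as -Xmx64m; exact reclamation timing is JVM-dependent, e.g. via -XX:SoftRefLRUPolicyMSPerMB):

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

public class SoftValueDemo {
  public static void main(String[] args) {
    SoftReference<byte[]> soft = new SoftReference<>(new byte[1024 * 1024]);

    System.gc(); // an ordinary GC does not clear softly reachable objects
    System.out.println("after gc: " + (soft.get() != null)); // typically true

    // Apply memory pressure: the JVM guarantees softly reachable objects
    // are cleared before it throws OutOfMemoryError.
    List<byte[]> pressure = new ArrayList<>();
    try {
      while (true) {
        pressure.add(new byte[1024 * 1024]);
      }
    } catch (OutOfMemoryError e) {
      pressure.clear();
    }
    System.out.println("after pressure: " + (soft.get() != null)); // expected: false
  }
}
```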
this.byId =
    Caffeine.newBuilder()
-       .maximumSize(100_000) // Set maximum size to 100,000 elements
+       .maximumWeight(100 * EntityWeigher.WEIGHT_PER_MB) // Goal is ~100MB
While we're here: why is this always hard-coded rather than configurable, or at least derived from the actual heap size (which is not as easy as it sounds)?
This is a good callout. My concern with making it configurable is that the limit is close to arbitrary.
As you noted in another comment, we have a "goal" of 100MB but that could really be 200MB or 1MB depending on how eviction happens with soft values and the weights. So while raising the limit definitely gets you a bigger cache, it's quite opaque as to what value you would actually want to set.
Also, the EntityCache is ideally very transparent and not something a typical end user would want to concern themselves with.
In light of the above, I kept it hard-coded for now, but if you feel strongly that we should make it a featureConfiguration, I am comfortable doing so.
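If heap-derived sizing were ever wanted, a rough sketch of one approach (the 10% fraction and the clamps are invented for illustration, not a proposal for specific values):

```java
// Hypothetical: derive the weight budget from the configured max heap.
// Runtime.maxMemory() returns Long.MAX_VALUE when the JVM reports no limit,
// so fall back to a fixed default and clamp the result to a sane range.
long maxHeap = Runtime.getRuntime().maxMemory();
long budget =
    maxHeap == Long.MAX_VALUE
        ? 100L * 1024 * 1024                          // fallback: ~100MB
        : Math.min(maxHeap / 10, 512L * 1024 * 1024); // 10% of heap, capped at 512MB
```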
Let's defer this change until the bigger CDI refactoring and potential follow-ups have been done.
To me, this seems quite unrelated to the CDI refactoring. Clearly, we cannot pause all development on the project due to a potential conflict with another existing PR. If there's an especially nasty conflict you're worried about here, can you highlight it? Maybe we can try to minimize the potential problem.
Sure, nobody says we shall pause everything. But how the cache is used is heavily influenced by the CDI work. I still have concerns about the heap and CPU pressure in this PR and #433, and about how it's persisted. Also, having a weigher meant to limit heap pressure while its calculation is largely off is IMO not good. It's also that the design has hard limits, meaning the cache cannot be as effective as it could be - and at the same time there can be an unbounded number of …
Description
The EntityCache currently uses a hard-coded limit of 100k entities, which is apparently intended to keep the cache around 100MB based on the assumption that entities are ~1KB.
If that assumption is not true, or if there is not even 100MB of free memory, we should not let the cache use up too much memory.
This PR adjusts the cache to use soft values to allow GC to clean up entries when memory pressure becomes too great.
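In outline, the adjusted construction looks roughly like the following, assembled from the diff fragments in this thread (a sketch, not the verbatim final code):

```java
this.byId =
    Caffeine.newBuilder()
        .maximumWeight(100 * EntityWeigher.WEIGHT_PER_MB) // goal is ~100MB
        .weigher(new EntityWeigher())                     // approximate per-entity size
        .expireAfterAccess(1, TimeUnit.HOURS)             // expire after 1 hour of no access
        .removalListener(removalListener)                 // set the removal listener
        .softValues()                                     // let GC reclaim under memory pressure
        .build();
```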
How Has This Been Tested?
Added a new test in EntityCacheTest.
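For flavor, the kind of assertion such a test might make (the helper is hypothetical; JUnit 5 and AssertJ assumed):

```java
import static org.assertj.core.api.Assertions.assertThat;
import org.junit.jupiter.api.Test;

class EntityWeigherSketchTest {
  @Test
  void weigherAccountsForPropertySize() {
    // Hypothetical helper that builds a cache entry with the given properties.
    EntityCacheEntry small = cacheEntryWithProperties("");
    EntityCacheEntry large = cacheEntryWithProperties("x".repeat(10_000));

    EntityWeigher weigher = new EntityWeigher();
    // A larger property payload must yield a larger weight.
    assertThat(weigher.weigh(1L, small)).isLessThan(weigher.weigh(2L, large));
  }
}
```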