Support persisting TableMetadata in the metastore #433
base: main
Conversation
Actually, I think loading/saving table metadata should be done transparently, using a delegating code architecture. That way the calling code does not need to care about caching at all, and testing would be quite isolated. Using such an interface with various implementations/delegates can also ease testing quite a bit.
Yes @snazy, I completely agree here. It's part of why I'm advocating for the loadTableV2 proposal in Iceberg. Even for this workstream, I noted at the very beginning that this is a non-goal for this PR:
I think it will be important to add in the future, but I think this PR is valuable as-is without this optimization.
I think this idea sounds good. Ideally, if loadTableV2 goes in, we could couple our interface with that API.
I think we can already experiment and reason about good approaches for how to "chunk the huge table/view/UDF metadata blobs". My concerns about increased heap pressure, especially when metadata becomes big/bigger/huge, still stand. We should really not rush things that have the potential for excessive overuse of resources.
However, it would help a lot if the different functionalities (think: persistence and cache) were abstracted and separated. So some REST resource implementation calls "loadMetadataOfThatTable()" on some interface - the first implementation is maybe a tracing facade, the next delegate is a caching layer, the next persistence metrics, and the next the actual persistence layer.
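Sketched out, the delegating layering described here might look roughly like this (all names below are illustrative, not from the Polaris codebase):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One narrow interface; each concern is a delegate that wraps the next.
interface TableMetadataStore {
  String loadMetadata(String tableId);
}

// Caching layer: consults its cache, falls through to the delegate on a miss.
class CachingStore implements TableMetadataStore {
  private final TableMetadataStore delegate;
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  CachingStore(TableMetadataStore delegate) {
    this.delegate = delegate;
  }

  @Override
  public String loadMetadata(String tableId) {
    return cache.computeIfAbsent(tableId, delegate::loadMetadata);
  }
}

// Wiring, following the order in the comment above:
// TableMetadataStore store =
//     new TracingStore(new CachingStore(new MetricsStore(new PersistenceStore())));
```

Each layer can then be tested against a stub delegate in isolation, which is the testing benefit mentioned earlier in the thread.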
Unless I am missing something, this is "safe" when the feature is disabled. In the absence of testing showing that there is measurable performance degradation due to churning large strings in the heap when the feature is enabled, I don't see this as a blocker. If a benchmark attached to the PR would make you feel better about the approach, I would be happy to add something like that -- but if you think the approach is totally dead in the water we should have that discussion.
Still, your approach re-serializes especially the big ones over and over again - and in fact, that is completely unbounded. To be clear: that opens the door wider to a potential denial-of-service due to out-of-memory situations.
Doesn't the catalog always have to do this regardless? We have to load the metadata.json from disk no matter what, so the risk from unbounded JSON is always present regardless of whether the catalog also writes and reads a copy for itself.
The answer is that technically a catalog does not really have to materialize it in full in memory. It could stream it directly from the object store - or in chunks from another source. @RussellSpitzer The bigger problem than materializing the big thing once is that this change RE-serializes it again - and that is the real issue.
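To make that distinction concrete, a sketch of the streaming alternative (FileIO is Iceberg's storage abstraction; the method and wiring here are illustrative, not Polaris code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.iceberg.io.FileIO;

// Copy metadata.json bytes straight from storage to the response without
// ever building the whole document as a String in memory.
void streamMetadata(FileIO io, String metadataLocation, OutputStream response)
    throws IOException {
  try (InputStream in = io.newInputFile(metadataLocation).newStream()) {
    in.transferTo(response); // fixed-size internal buffer, O(1) heap
  }
}
```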
This is true, but we are not talking about what an arbitrary catalog technically has to do, but what Polaris actually does. In the future I think we should optimize this with a structured schema in the metastore (perhaps one that more closely matches the schema of the Java TableMetadata object) -- and when we do, we should optimize the cache as well. The question is whether this future optimization is a blocker for adding caching on top of the current behavior.
I have not observed much overhead associated with this. If this is really the key point here, let's add a benchmark?
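If it helps, a minimal JMH sketch of the kind of benchmark being proposed (the fixture method is hypothetical; only TableMetadataParser.toJson is the real Iceberg API):

```java
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableMetadataParser;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class MetadataSerdeBenchmark {
  private TableMetadata metadata;

  @Setup
  public void setup() {
    metadata = buildLargeMetadata();
  }

  @Benchmark
  public String reserialize() {
    // the cost in question: one extra full JSON serialization per load
    return TableMetadataParser.toJson(metadata);
  }

  private TableMetadata buildLargeMetadata() {
    // hypothetical fixture: assemble a TableMetadata with many snapshots
    // and columns to stress serialization (construction elided here)
    throw new UnsupportedOperationException("fixture elided in this sketch");
  }
}
```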
```java
case PolarisConfiguration.METADATA_CACHE_MAX_BYTES_INFINITE_CACHING -> true;
case PolarisConfiguration.METADATA_CACHE_MAX_BYTES_NO_CACHING -> false;
default -> {
  metadataJson = TableMetadataParser.toJson(metadata);
```
@snazy in this way, the re-serialization only happens when the feature is enabled
I don't think this is actually a big deal, especially if users can decide whether or not to use it. Users can potentially reduce their latency now at a cost of more memory usage and possible CPU cost. I honestly think most folks would want this now regardless of the current cost, since I've seen several private implementations of REST catalogs that do this. I also probably would have it on by default.
@snazy sorry for all the pings. I took another look at your comments and the PR, and I identified one idea that I wanted to get your POV on. I added this commit to bound the extra serde. I am not in love with this exact implementation, but what do you think of an approach like this? Does it address your concerns around unbounded serde in the event that someone has a metadata.json which exceeds the configured limit?
```java
this.key = key;
this.description = description;
this.defaultValue = defaultValue;
this.catalogConfigImpl = catalogConfig;
this.typ = (Class<T>) defaultValue.getClass();
this.validation = validation;

validate(cast(defaultValue));
```
Is this just a safety check? Feels like this shouldn't be able to break, right?
Also, isn't this redundant now? (cast calls validate.) So wouldn't this effectively expand to:

```java
validate(
    cast(defaultValue)  // cast already calls validate(defaultValue) internally
)
```
```java
T cast(Object value) {
```
I kinda want to call this apply() now instead of cast ... :)
Just because it does dual duty now
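For context, the "dual duty" shape being discussed is roughly this (the body is inferred from the thread, not copied from the diff):

```java
T cast(Object value) {
  T typed = typ.cast(value); // duty 1: narrow to the declared type
  validate(typed);           // duty 2: run the configured validation
  return typed;
}
```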
```java
TableMetadataParser.toJson(metadata, generator);
generator.flush();
String result = boundedWriter.toString();
if (boundedWriter.isLimitExceeded()) {
```
A previous implementation had BoundedStringWriter throw an exception when the limit is hit, so the JsonGenerator actually stops instead of just sending strings into the void. I am okay going back to this, but I don't like using an exception as flow control. WDYT @snazy?
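For reference, a sketch of the non-throwing variant under discussion: once the cap is hit, further writes are dropped and a flag records the overflow. (The class body is assumed, not copied from the PR; a real version must also override write(int), write(String), and write(String, int, int), since StringWriter does not funnel those through this overload.)

```java
import java.io.StringWriter;

class BoundedStringWriter extends StringWriter {
  private final int maxChars;
  private boolean limitExceeded;

  BoundedStringWriter(int maxChars) {
    this.maxChars = maxChars;
  }

  @Override
  public void write(char[] cbuf, int off, int len) {
    int remaining = maxChars - getBuffer().length();
    if (len > remaining) {
      limitExceeded = true;         // remember that output was truncated
      len = Math.max(0, remaining); // keep what fits, drop the rest
    }
    if (len > 0) {
      super.write(cbuf, off, len);
    }
  }

  boolean isLimitExceeded() {
    return limitExceeded;
  }
}
```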
Not sure - but the direction is correct. However, I think there's a much bigger problem introduced by this PR: the permanent heap usage. Even if you bound each metadata blob to 1MB, that can accumulate to 100,000 * 1MB (~= 95GB) of permanent heap usage. With 10MB that's nearly 1TB of heap. Plus all the other attributes, plus heap pressure during runtime. (100,000 entries is the hard-coded limit in EntityCache.)
Hi @snazy; I agree that the EntityCache is potentially a problem. I think it can be optimized pretty easily with a Weigher. Although the metastore caching is disabled by default, do you view this as a blocker by itself? One very simple thing we could do in this PR is just adjust the cache to use a weigher.

Indeed, originally in older versions of this PR I had the metadata skip the EntityCache, but @dennishuo convinced me that in many cases it's quite useful to have the in-memory caching.
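For illustration, a weigher-based bound could look roughly like this (a sketch assuming a Caffeine-backed cache; the key/value types and the 1 GB budget are placeholders):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

Cache<String, String> metadataCache =
    Caffeine.newBuilder()
        // bound total cached bytes rather than entry count
        .maximumWeight(1_000_000_000L) // placeholder: ~1 GB budget
        .weigher((String metadataLocation, String metadataJson) ->
            metadataJson.length()) // approximate each entry by its JSON length
        .build();
```

With a weight-based bound, one huge metadata.json simply evicts more small entries instead of each entry counting equally toward the 100,000-entry limit.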
Yes, the heap usage leading to OOMs is a blocker.
Description
This adds a new flag METADATA_CACHE_MAX_BYTES which allows the catalog to store table metadata in the metastore and vend it from there when loadTable is called. Entries are cached based on the metadata location. Currently, the entire metadata.json content is cached.
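Roughly, the flag's semantics as implied by the diff fragment quoted earlier in the thread (the sentinel constants' concrete values are assumptions here):

```java
// Hypothetical sketch: two sentinel values, otherwise a per-entry size cap.
int maxBytes = configuration.get(PolarisConfiguration.METADATA_CACHE_MAX_BYTES);
boolean shouldCache = switch (maxBytes) {
  case PolarisConfiguration.METADATA_CACHE_MAX_BYTES_INFINITE_CACHING -> true;  // e.g. -1: cache everything
  case PolarisConfiguration.METADATA_CACHE_MAX_BYTES_NO_CACHING -> false;       // e.g. 0: disable caching
  default -> TableMetadataParser.toJson(metadata).length() <= maxBytes;         // cache only if it fits
};
```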
Features not included in this PR:
There is partial support for (1) here and I want to extend it, but the goal is to structure things in a way that will allow us to implement (2) and (3) in the future as well.
How Has This Been Tested?
Existing tests vend table metadata correctly when caching is enabled.
Added a small test in BasePolarisCatalogTest to cover the basic semantics of caching.

Manual testing with EclipseLink: I observed the entities getting created in Postgres and saw large metadata being cached.

With MySQL, small metadata is persisted. However, large metadata may cause internalproperties to exceed the size limit, in which case nothing will be cached. Calls still return safely.