Add invalidation for the schema-list cache #215
Comments
Hi @voropaevp, I'm trying to think through how urgent this problem is. A typical cache TTL might be 600 seconds. Realistically, on a production pipeline there will be more than 600 seconds between a schema being published to Iglu and production tracking starting to use the new schema. I understand that there is a 600-second window where data is either wrongly (but safely) loaded or becomes a failed event. But I don't feel it's a critical problem. Please tell me, though, if there is a more dangerous edge case that I haven't thought of.

These ideas around cache invalidation might become more important for development setups, e.g. if we ever put the RDB transformer into Snowplow Mini, or its replacement. In dev setups, it is probably more important to allow rapid evolution of schema versions.

This problem reminded me of another issue, which is similar but probably more important: this one. Although it's for Redshift, which you haven't started looking at yet.
While this is possible, I think it's unlikely to happen on a real-life production setup. With a TTL in the cache, we are sure that we eventually end up with fresh state everywhere, so not getting stuck with old state is the crucial bit. This is a somewhat similar case to patching schemas, like

But as we're in the process of making the transformer/loader more robust and resilient, and @voropaevp, you have a neat idea for how to solve it (e.g. the second option you mentioned)... why not? ;)
I also like your second possible solution, out of the two. |
Add invalidation for the schema-list cache 2.2.0 branch (close #215)
We had a problem with `listSchemasLike` when using the `mustIncludeKey` option, which was introduced in #215. For a particular schema:

- Iglu Server 1 hosted versions `1-0-0` and `1-0-1`
- Iglu Server 2 hosted versions `1-0-0`, `1-0-1`, `1-0-2` and `1-0-3`

and `mustIncludeKey` was set to `1-0-3`. But `listSchemasLike` returned only `1-0-0` and `1-0-1`, because Iglu Server 1 had higher priority. After this change, `listSchemasLike` will return the list from Iglu Server 2 under these circumstances: even though it has lower priority, it is the server that contains the `mustIncludeKey`.
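The selection rule described in this commit message can be sketched in a few lines of Scala. This is a hypothetical model, not the iglu-scala-client implementation: `Repo`, `ListSchemasLikeSketch`, the convention that `repos` is already ordered from highest to lowest priority, and the fallback to the highest-priority non-empty list when no repository hosts `mustIncludeKey` are all assumptions made for illustration.

```scala
// Hypothetical, simplified model of a repository and the versions it hosts;
// not the real iglu-scala-client types.
final case class Repo(name: String, versions: List[String])

object ListSchemasLikeSketch {
  // `repos` is ASSUMED to be ordered from highest to lowest priority.
  def listSchemasLike(repos: List[Repo], mustIncludeKey: String): Option[List[String]] = {
    val nonEmpty = repos.filter(_.versions.nonEmpty)
    nonEmpty
      .find(_.versions.contains(mustIncludeKey)) // first repo that hosts the key wins
      .orElse(nonEmpty.headOption)               // assumed fallback: highest priority
      .map(_.versions)
  }
}
```

With the example above, `listSchemasLike(List(server1, server2), "1-0-3")` picks Iglu Server 2's list even though Iglu Server 1 comes first, while a `mustIncludeKey` of `1-0-1` still resolves to Iglu Server 1.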
Problem
Currently, schema lists are cached based on the `SchemaListKey`, which does not include the `revision` and `addition` parts of the full `SchemaKey`. This leads to a breaking scenario, particularly affecting the transformer.
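To make the keying problem concrete, here is a minimal Scala sketch. `SchemaVer`, `SchemaKey`, `SchemaListKey`, `listKey`, the in-memory `registry` and the TTL-less `cache` are simplified, hypothetical stand-ins, not the real iglu-core / iglu-scala-client types. Because the list key keeps only vendor, name and model, `1-0-0` and `1-0-1` share one cache entry, so a list cached before `1-0-1` was published keeps being served.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins; NOT the real iglu-core / iglu-scala-client types.
final case class SchemaVer(model: Int, revision: Int, addition: Int)
final case class SchemaKey(vendor: String, name: String, version: SchemaVer)

// The list key keeps only vendor, name and model: revision and addition are dropped.
final case class SchemaListKey(vendor: String, name: String, model: Int)

object StaleListDemo {
  def listKey(k: SchemaKey): SchemaListKey =
    SchemaListKey(k.vendor, k.name, k.version.model)

  // Versions currently hosted by a simulated registry.
  val registry: mutable.Map[SchemaListKey, List[SchemaVer]] = mutable.Map.empty
  // Schema lists cached by the client (TTL expiry omitted for brevity).
  val cache: mutable.Map[SchemaListKey, List[SchemaVer]] = mutable.Map.empty

  // Serve from the cache; only ask the registry on a miss.
  def listSchemas(k: SchemaListKey): List[SchemaVer] =
    cache.getOrElseUpdate(k, registry.getOrElse(k, Nil))

  val v100 = SchemaKey("com.acme", "event", SchemaVer(1, 0, 0))
  val v101 = SchemaKey("com.acme", "event", SchemaVer(1, 0, 1))

  registry(listKey(v100)) = List(v100.version)
  val before: List[SchemaVer] = listSchemas(listKey(v100)) // caches [1-0-0]

  registry(listKey(v101)) = List(v100.version, v101.version) // 1-0-1 published
  val after: List[SchemaVer] = listSchemas(listKey(v101))    // cache hit: still [1-0-0]
}
```

`listKey(v100) == listKey(v101)` holds, so the second lookup is a cache hit and `1-0-1` stays invisible until the entry expires.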
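1. `1-0-1` is added to the server.
2. The schema list for `1-*-*` remains cached as `[1-0-0]`.
3. The transformer still considers `1-0-0` to be the highest available schema for the model.
4. Data for `1-0-1` would get downcast to `1-0-0`, oftentimes producing a bad row.

Possible solutions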
Introduce `listSimilarSchemas(k: SchemaKey)`, which could detect whether `k` is in the cached list and, if it is not, automatically invalidate/refresh the cache.

I think the second solution is the cleaner one.
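A minimal Scala sketch of the second solution, under stated assumptions: `SchemaListCache`, the `fetch` function standing in for a registry request, the `fetches` counter and the simplified key types are all hypothetical. The cached list is reused while it contains the requested version; if the requested key is missing, the entry is treated as stale and re-fetched.

```scala
import scala.collection.mutable

// Hypothetical, simplified types; not the real iglu-core / iglu-scala-client API.
final case class SchemaVer(model: Int, revision: Int, addition: Int)
final case class SchemaKey(vendor: String, name: String, version: SchemaVer)
final case class SchemaListKey(vendor: String, name: String, model: Int)

// `fetch` stands in for a registry request; its shape is an assumption.
final class SchemaListCache(fetch: SchemaListKey => List[SchemaVer]) {
  private val cache = mutable.Map.empty[SchemaListKey, List[SchemaVer]]
  var fetches: Int = 0 // counts registry calls, for illustration only

  private def refresh(lk: SchemaListKey): List[SchemaVer] = {
    fetches += 1
    val fresh = fetch(lk)
    cache(lk) = fresh // replace (invalidate) the old entry
    fresh
  }

  // Returns the list for k's model. A cached list that does not contain
  // k.version is considered stale and is re-fetched.
  def listSimilarSchemas(k: SchemaKey): List[SchemaVer] = {
    val lk = SchemaListKey(k.vendor, k.name, k.version.model)
    cache.get(lk) match {
      case Some(list) if list.contains(k.version) => list
      case _                                      => refresh(lk)
    }
  }
}
```

In the scenario above, a lookup for `1-0-1` against a cached `[1-0-0]` misses the `contains` check and triggers one refresh, so the new version is picked up without waiting for the TTL. One trade-off: a key the registry genuinely lacks would trigger a fetch on every lookup, so a real implementation would probably still want the TTL (or some rate limit) for that case.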