dekaf: Get UI working with new materialization types #1270

Open
3 tasks
travjenkins opened this issue Sep 25, 2024 · 2 comments
Assignees
Labels
connectors Related to connectors enhancement New feature or request

Comments

@travjenkins
Member

travjenkins commented Sep 25, 2024

UI portion of work for: estuary/flow#1622

Notes:
Need to handle empty materialization endpoint configs (this should work as is)
We'll have a hard coded (I think) row in connector tags to represent this
The endpoint will contain a nested property dekaf in endpointconfig that will be empty

Requirements:

  • A new connector tag row will have the configs for dekaf.
  • Need to handle the variant tag properly.
  • endpoint:connector:image is normal but needs to support endpoint:dekaf:something
@travjenkins travjenkins added enhancement New feature or request connectors Related to connectors labels Sep 25, 2024
@travjenkins travjenkins changed the title Get UI working with dekaf materialization dekaf: Get UI working with new materialization types Sep 25, 2024
@kiahna-tucker
Member

kiahna-tucker commented Nov 4, 2024

From my vantage point, Dekaf connectors should supply data to the same columns used for their standard counterparts. If the connector_tags table provided a reliable means to identify a Dekaf connector (e.g., a boolean flag in the table itself, an image_tag with a set prefix), supporting this connector type would be relatively simple.

Questions

  • Are there connectors and/or connector_tags columns used for standard connectors that cannot be attributed a value for Dekaf connectors? If so, what are they and why?

  • Do Dekaf variants follow a specific string pattern (e.g., 'some-hyphen-pattern', 'some/slash/pattern', etc.)? If so, what is the pattern and is the backend fully responsible for its validation?

  • Are Dekaf variants suffixed by an image tag akin to the analogous image path (i.e., endpoint.connector.image)?

  • What are the core distinctions between Dekaf variants and standard connector image paths (i.e., endpoint.connector.image)?

  • To confirm, the endpoint config of a Dekaf connector (i.e., endpoint.dekaf.config) is still expected to be an empty object, correct?

  • Are there any common pitfalls or errors users may face that are specific to Dekaf connectors? Relatedly, how helpful are the errors returned by Dekaf connectors in your opinion?

  • What are the steps required to use or mock a Dekaf connector when testing locally? The primary objective here is to ensure the client can be sufficiently tested, at the very least.

  • How are customers interacting with Dekaf today?

Reference

The client utilizes the following, non-standard connectors columns:

  • image_name - this string value is referenced directly and truncated to suffix the entity name (e.g., ghcr.io/estuary/source-connector becomes source-connector). The unedited value is primarily used to construct the image path (i.e., endpoint.connector.image) but it is broadly referenced.

  • logo_url

  • title

The client utilizes the following, non-standard connector_tags columns:

  • connector_id

  • documentation_url - this string value is referenced directly. It is primarily used to dictate whether the connector configuration documentation can be opened in a side panel in workflows or linked to on the details page of a given task.

  • endpoint_spec_schema

  • image_tag - this string value is referenced directly. It is primarily used to evaluate the connector version and construct the image path (i.e., endpoint.connector.image), but it is broadly referenced. The client expects a colon to be the first character of the string.

  • protocol

  • resource_spec_schema

NOTE: The use of the data provided by many of these columns is expected and self-explanatory. Standard columns defined within the internal Supabase table model and optional columns are not mentioned above.
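As a rough illustration of how image_name and image_tag are combined as described above, here is a minimal TypeScript sketch. The function names and the `:v1` tag in the usage line are hypothetical and are not taken from the client code.

```typescript
// Hypothetical helpers; the names are illustrative and not taken from the client code.

// Keep only the last path segment of an image name,
// e.g. "ghcr.io/estuary/source-connector" -> "source-connector".
function shortImageName(imageName: string): string {
  return imageName.slice(imageName.lastIndexOf("/") + 1);
}

// Join image_name and image_tag into the value stored at endpoint.connector.image.
// The tag is expected to begin with a colon, so anything else is rejected.
function buildImagePath(imageName: string, imageTag: string): string {
  if (!imageTag.startsWith(":")) {
    throw new Error(`image_tag must start with ":" but got "${imageTag}"`);
  }
  return `${imageName}${imageTag}`;
}
```

For example, `buildImagePath("ghcr.io/estuary/source-connector", ":v1")` yields `ghcr.io/estuary/source-connector:v1`. A Dekaf row would need to either follow the same colon convention or be handled on a separate code path.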

@jshearer
Contributor

jshearer commented Nov 7, 2024

Alright, now I think we have a pretty good idea of how this will look.

For some background, Dekaf is a new service we're exposing that will let users read their collections' data as if they were served by Kafka. This lets us integrate with a whole bunch of services for the cost of one integration, as opposed to having to write materialization connectors targeting each one.

Because Kafka is designed around a client/server architecture, and we're emulating the server side, we can't easily package Dekaf as a regular materialization connector. Instead, what we've chosen to do is expose it as a standalone service running in each data-plane. Users of Dekaf will connect to it like they would a regular Kafka broker, and will be presented with an environment that looks like regular Kafka, with their existing collections showing up as Kafka topics.

Rather than just exposing every one of your collections as a separate topic, we decided that it would make more sense to model Dekaf usage around the same concept of materializations that we already use to model all of our other materialization connectors. This is useful for a few reasons:

  • Configurability
    • Endpoint config
    • Projections etc
  • Stats/logs
  • Monitoring/billing

So, in order to make this work, we added a new "kind" of materialization. Currently, we have local- and connector- type materializations, so it was pretty straightforward to add dekaf as a new kind. Practically, a dekaf materialization looks something like this:

```yaml
test://example/catalog.yaml:
  materializations:
    materialization/dekaf/inline:
      endpoint:
        dekaf:
          variant: foo
          config:
            strict_topic_names: false
      bindings:
        - source: some/source/collection
          resource:
            topic_name: foo
```

These will be represented in the database just like existing materializations, but with a few important differences. Unlike connector-type materializations, Dekaf doesn't use Docker, so the current behavior of looking up the connectors and connector_tags row based on the image config value won't work. But we do want to support multiple variants of Dekaf in the UI, in order to present the specific other systems we integrate with. So, in order to represent this, we've come up with the following.

Rows in connectors should be considered Dekaf connectors if their image_name starts with ghcr.io/estuary/dekaf-.
And,
Dekaf connectors with variant foo will map to the connectors row with image_name equal to ghcr.io/estuary/dekaf-foo.
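The two rules above could be sketched in TypeScript roughly as follows. The function and constant names are hypothetical; only the `ghcr.io/estuary/dekaf-` prefix and the variant-to-image_name mapping come from the rules themselves.

```typescript
// Illustrative helpers; the function and constant names are hypothetical.

const DEKAF_IMAGE_PREFIX = "ghcr.io/estuary/dekaf-";

// A connectors row counts as a Dekaf connector when its image_name has the prefix.
function isDekafImageName(imageName: string): boolean {
  return imageName.startsWith(DEKAF_IMAGE_PREFIX);
}

// Variant "foo" maps to image_name "ghcr.io/estuary/dekaf-foo".
function dekafImageName(variant: string): string {
  return `${DEKAF_IMAGE_PREFIX}${variant}`;
}

// Inverse mapping: the variant is whatever follows the prefix, or null for non-Dekaf rows.
function dekafVariant(imageName: string): string | null {
  return isDekafImageName(imageName)
    ? imageName.slice(DEKAF_IMAGE_PREFIX.length)
    : null;
}
```

Keeping the prefix in a single constant means the detection rule and the mapping rule cannot drift apart if the naming convention changes.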

This has two consequences:

  • When going to create a new materialization, if you select a Dekaf-type connector (i.e., its image_name starts with ghcr.io/estuary/dekaf-), then you should initialize it with endpoint: {dekaf: {variant: "...", config: {..}}} where the variant is whatever comes after dekaf- in its image_name.
  • When selecting the connectors row for a particular materialization (i.e., to render its logo, title, or look up its schemas), if it is a Dekaf-type connector (i.e., it has endpoint: {dekaf: {...}} instead of endpoint: {connector: {..}}), you should use the connectors row with image_name = ghcr.io/estuary/dekaf-${spec.endpoint.dekaf.variant}.
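A minimal TypeScript sketch of these two consequences might look like the following. The type and function names are hypothetical, and it assumes the config starts out as an empty object and that a standard image path is image_name followed by a colon-prefixed tag.

```typescript
// Sketch only: the type and function names are hypothetical, not the client's real ones.

const DEKAF_PREFIX = "ghcr.io/estuary/dekaf-";

type ConnectorEndpoint = { connector: { image: string; config: Record<string, unknown> } };
type DekafEndpoint = { dekaf: { variant: string; config: Record<string, unknown> } };
type Endpoint = ConnectorEndpoint | DekafEndpoint;

// Consequence 1: build the initial endpoint for a newly selected connector.
// Assumption: config starts empty and is filled in as the user edits the form.
function initialEndpoint(imageName: string, imageTag: string): Endpoint {
  if (imageName.startsWith(DEKAF_PREFIX)) {
    return { dekaf: { variant: imageName.slice(DEKAF_PREFIX.length), config: {} } };
  }
  return { connector: { image: `${imageName}${imageTag}`, config: {} } };
}

// Consequence 2: find the image_name of the connectors row for an existing spec.
function connectorImageName(endpoint: Endpoint): string {
  if ("dekaf" in endpoint) {
    return `${DEKAF_PREFIX}${endpoint.dekaf.variant}`;
  }
  // Assumption: a tag, if present, follows the last ":" after the final "/".
  const image = endpoint.connector.image;
  const tagStart = image.lastIndexOf(":");
  return tagStart > image.lastIndexOf("/") ? image.slice(0, tagStart) : image;
}
```

Modeling the endpoint as a union keyed on `connector` versus `dekaf` lets the compiler flag any code path that still assumes `endpoint.connector.image` is always present.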

As far as I know, everything else should be the same. Support for the new materialization endpoint type has been added and there is test coverage of support for this new syntax, but other than that Dekaf itself doesn't actually support it yet, so you're on the bleeding edge. Given that, it would be super nice if we could get support for these new materialization types out behind a feature flag, as that would make testing the whole process end to end much easier once I finish the work on Dekaf to support being configured and authenticated by a materialization.

One really nice nice-to-have here would be the addition of some annotation like x-generate-token: true (or even x-generate-with: "random"? 😁) to allow for automatically generating a nice long, random string for the dekaf bearer token, which will serve to authorize access to a particular materialization through Dekaf. See here for the endpoint config shape, which will get serialized as a JSON schema and be provided in a corresponding connector_tags row for each dekaf connectors row, just like any other connector.
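To make the annotation idea concrete, here is a hedged TypeScript sketch of how a client might fill annotated fields. Nothing here exists yet: the `x-generate-token` handling, the function names, and the `token` property name in the usage note are all assumptions. The source of random bytes is passed in so the logic stays independent of any particular crypto API.

```typescript
// Hypothetical sketch: neither the x-generate-token handling nor these names exist yet.

type JsonSchema = {
  properties?: Record<string, JsonSchema>;
  // Assumed annotation, as floated in the discussion above.
  "x-generate-token"?: boolean;
};

type ByteSource = (length: number) => Uint8Array;

// Hex-encode `byteLength` random bytes into a token string.
function generateToken(randomBytes: ByteSource, byteLength = 32): string {
  return Array.from(randomBytes(byteLength), (b) =>
    b.toString(16).padStart(2, "0")
  ).join("");
}

// Return a copy of `config` in which every top-level property annotated with
// x-generate-token, and not already set, holds a freshly generated token.
function fillGeneratedTokens(
  schema: JsonSchema,
  config: Record<string, unknown>,
  randomBytes: ByteSource
): Record<string, unknown> {
  const result: Record<string, unknown> = { ...config };
  for (const [key, propSchema] of Object.entries(schema.properties ?? {})) {
    if (propSchema["x-generate-token"] === true && result[key] === undefined) {
      result[key] = generateToken(randomBytes);
    }
  }
  return result;
}

// In a browser, the byte source would typically be:
//   (n) => crypto.getRandomValues(new Uint8Array(n))
```

With a schema such as `{ properties: { token: { "x-generate-token": true } } }` and an empty config, the result would contain a 64-character hex string under `token`, while a value the user already supplied is left untouched.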

> To confirm, the endpoint config of a Dekaf connector (i.e., endpoint.dekaf.config) is still expected to be an empty object, correct?

Hmm, I suspect the answer is yes based on how I've seen the UI fill in defaults only once a particular field is edited, but to be clear I would expect this to work the same way all the other connectors do wrt support for default, required etc. Maybe it starts out as {}, and then gets filled in with defaults at publication time if you don't specify any values yourself?

> What are the steps required to use or mock a Dekaf connector when testing locally? The primary objective here is to ensure the client can be sufficiently tested, at the very least.

The story here isn't great right now. I'm currently working on support for e2e Dekaf testing, and part of that will be adding Dekaf to the Tiltfile. Once that's done, the only thing left to test Dekaf in a local stack will be to create a Dekaf-type materialization, and some kcat commands to attempt to read from one of its bindings via the running Dekaf service. I'll send along instructions for this once it's working.

> How are customers interacting with Dekaf today?

Today, they are pointing their Kafka consumers at dekaf.estuary-data.com:9092 and using a refresh token they generate in the UI as a password, and then selecting one of their tenant's collections to read via Dekaf.


3 participants