Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add multi-version design document #1780

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
361 changes: 361 additions & 0 deletions docs/design/proposals/multi-version/multi-version.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,361 @@
# ACK Multi-Version CRD support

## Background

### ACK and upgrade issues

AWS Controllers for Kubernetes (ACK) is a set of open-source controllers for
managing AWS resources in Kubernetes. It's designed to simplify the management
of AWS resources in Kubernetes by taking advantage of the native kubernetes
management system known as the Control loop or Reconcilliating systems.

Currently, ACK uses a code-genrator to generate everything that is need to
build and run a controller: API definitions, CRDs, AWSResourceManager, helm
charts etc... 98% of the controllers code is generated by the ACK community
members. When a new service team introduces a new field, resource or
operation all we do on the ACK side is bumping the aws-sdk-go version and
regenerate the controller. :magic:

However, when service teams/ack developers have to rename a field or make
a field a kubernetes secret reference, they make significant changes to
the CRD that are backward incompatible. In kubernetes world, CRD schemas
should not be modified but rather introduced in a new CRD version.

### Kubernetes and multi-version CRDs

In a multi-version CRDs, each version of the resource is identified by a
distinct version string (v1alpha1, v1beta2, v1 etc...). Kubernetes uses the
version string to route API requests to the appropriate version of the
resource.

Even when dealing with multiple versions, we still need one controller handling
the main "Hub" version, and use some "mutating" mechanisms to transform the
non-main "Spoke" version to the hub version. This mechanism is known as the
mutating webhooks.

Mutating webhooks are a type of admission controller that allows you to modify
incoming requests to Kubernetes API server before they are persisted. They can
be used to modify or validate the content of a Kubernetes object before it's
created or updated. There are three known type of admission controllers:
- Mutating Webhooks
- Defaulting Webhooks
- Validating Webhooks

Mutating webhooks work by intercepting incoming requests to the Kubernetes
API server and forwarding them to a webhook endpoint. The webhook endpoint
can then modify the request payload, add, remove or change fields or validate
the request payload before it's persisted in etcd storage. Once the webhook
endpoint modifies the request payload, the admission controller sends the
updated payload to the API server for storage.

In the context of multi-version CRDs, mutating webhooks can be used to
automatically update older versions of custom resources to the latest version.
The mutating webhook can modify the payload of an older version of the
resource to match the new version of the resource. This can help you maintain
backwards compatibility when you need to change the schema of a custom resource.

[TODO Add schema explaining the admission webhooks + api-server + controller
interactions]

## Motifivation

The purpose of this design document is to outline a plan for adding support
for multi-versioning and generating mutating webhooks for ACK Controllers.
Multi-versioning and mutating webhooks are important features that can help
improve the reliability and maintainability of custom resources in Kubernetes.

This document will provide an overview of the current state of ACK code-generator
and existing controllers, and the need for multi-versioning and generating
mutating webhooks.
It will also propose a solution for adding these features to ACK and outline
the technical challenges involved in implementing them. The document will then
provide a high-level plan for how to implement multi-versioning and mutating
webhooks in ACK and discuss best practices for testing and validation.

## Goals

- Provide a clear understanding of the need for multi-versioning and
generating mutating webhooks for ACK Controllers.
- Propose a solution for adding these features to ACK
- Outline the technical challenges involved in implementing multi-versioning
and generating mutating webhooks in ACK.
- Provide a high-level plan for how to implement multi-versioning and
generating mutating webhooks in ACK
- Discuss best practices for testing and validation of the new features


## Non-Goals

- Provide a comprehensive overview of all features of ACK
- Provide detailed instructions for using ACK
- Discuss implementation details outside of the scope of multi-versioning
and generating mutating webhooks in ACK

## Prior work

Prior to this design, several components were built to support
multi-versioning in Amazon Controllers for Kubernetes (ACK). The focus
of these efforts was mainly on modifying the ACK/code-generator tool
to generate API definitions for different API versions and enabling
the webhook server through controller flags. A key contribution was the
generalization of webhook generation beyond just mutating webhooks,
as described in the webhook PR.

Github [issue][mv-gh-issue] and
code-generator PRs:
- https://github.com/aws-controllers-k8s/code-generator/pull/108
- https://github.com/aws-controllers-k8s/code-generator/pull/111
- https://github.com/aws-controllers-k8s/code-generator/pull/121
- https://github.com/aws-controllers-k8s/code-generator/pull/122

### Issue with prior work

However, one issue that prevented completion of the webhook PR was the
difference between how ACK views CRD versions and how they are handled
in Kubernetes. In ACK, all resources are versioned with the same APIVersion,
while in Kubernetes, resources can have multiple versions and can deprecate
at various stages and levels. This makes it challenging to implement
webhooks that can handle multiple versions and deprecations effectively.

## Overview of the design

The proposed design aims to address the challenges of multi-versioning
and generating mutating webhooks in Amazon Controllers for Kubernetes (ACK).
It involves modifications to the ACK/code-generator tool and the implementation
of a webhook server to handle multiple versions and deprecations of Custom
Resource Definitions (CRDs). The solution uses a combination of Kubernetes API
machinery and controller-runtime libraries, and the document outlines a
high-level plan for implementation, including updating the code-generator
tool and modifying ACK's controllers. Additionally, the document discusses
the testing strategies that will be employed to ensure backward compatibility.

### Rethinking the ACK versioning

Before jumping to any special topic, we need to answer two important questions:
- Do we still want to bump all the CRD versions all at once? meaning that
if we rename a field in CRD A, CRD B C D and E will also need a version bump
(and a mutating webhook handler) even if they are not modified?
- How do we deprecate/remove versions in a programatical way?

#### CRDs Mono Version vs Group Version

The answer to the questions above will impact the design of [`metadata.yaml`]
[iam-metadata.yaml]. And how we think about ACK CRD versioning and how
version deprecation happens. Do we deprecate a set of CRD versions? or only
one specific CRD version? unfortunetaly picking one of those solutions is kinda
a hard one way door solution to say the least.

CRD Group versioning:

- Pros:
1. Current ACK code-generator works this way
2. Easier code generation (import paths, packaging etc...)
3. Easier CRD Version difference calculation
4. Doesn't enormously impact the rest of the features (references etc...)
5. No need to change `metadata.yaml`

- Cons:
1. Confusing for users
2. Little CRD Changes means we will bump all the CRDs versions
3. Not very kubernetes style.

CRD Mono versioning:

- Pros:
1. Similar to k8s style
2. Easier deployments (risk reduced when upgrading CRDs)
3. Less error zone as we will target resources one by one

- Cons:
1. Very tricky in code generation
2. Needs a rethinking of generator.yaml + metadata.yaml
3. We don't know the impact on existing features (references etc...)

#### CRDs version deprecation

Curently we only have `metadata.yaml` as a tool to deprecate versions, see
APIInfoStatuses contants in [code-generator/pkg/metadata][code-gen/pkg/md]. Which
works very well with `pkg/model/multiversion` package. In case we want to go with
CRD Mono Version way, we will have to change this model to be Resource specific
rather than version specific.

### Webhook server

Controller runtime already offers webhook server setup as a builtin mechanisms, and
it was ported to ACK runtime/code-generator earlier in [this PR][ack-webhook-pr].
The missing part here is a certificate issuer like Cert Manager - behind the scenes
the webhook server needs to authenticate the controllers in order to allow them to
mutate/validate CRs.
The Cert Manager part will be discuss bellow.

### Code Generation

```go
// APIVersion represents an API version of the generated CRD.
type APIVersion struct {
// Name of the API version, e.g. v1beta1
Name string `json:"name"`
// Served whether this version is enabled or not
Served *bool `json:"served,omitempty"`
// Storage whether this version is the storage version.
// One and only one version can be set as the storage version.
Storage *bool `json:"storage,omitempty"`
}
```

https://github.com/aws-controllers-k8s/code-generator/pull/122
[TODO AMINE EXPAND THIS SECTION]

### Cert manager

Cert-manager plays a critical role in securing and authenticating communication between
different components of the Cluster. It is essential to provide valid certificates for the
webhook server to ensure secure communication between the webhook server and the Controllers
API server. Without valid certificates, the webhook server cannot be authenticated, which
can result in a failure to register the webhooks. Therefore, it is imperative to use
cert-manager to ensure that the webhook server has a valid certificate before registering
any webhooks.

Cert manager will probably be added as an optional install to the helm charts, and will
be installed by default when running multi-version e2e tests.

# Implementation details

Goal is whatever generated code it should keep things backward compatible

## How to use multi-version with code-gen

Scenarios:
Renaming a field
```yaml
resources:
FunctionURLConfig:
apiVersions:
# introduce a new version for FunctionURLConfigs
- name: v1alpha2
served: true # api is still served by the controller
storage: true # api is storage version of the CRD (usually it should be the last one)
# Renaming FunctionName to Function
renames:
operations:
CreateFunctionURLConfig:
input_fields:
FunctionName: Function
...
```

How it should work.
1. Pick field from old name

Generated code
```go
```


Removing a field
```yaml
```
generated code

Adding fields:
1. Going-up: Look up the field values from annotations
2. Going-down: Store new value in annotation

Making a field a secret
```yaml
```

spec.Password

1. Going-up: Look up the field values from annotations
2. Going down: Store secret name/namespace in annotations

lambda.services.k8s.aws/field-name:{
"v1alpha2-v1alpha1": {
"name": "***"
"namespace": "***"
"key": "***"
},
"v1alpha2-v1alpha1": {
"name": "***"
"namespace": "***"
"key": "***"
}
}

lambda.services.k8s.aws/multi-version:{
"field-name": {
"v1alpha2-v1alpha1": {
"name": "***"
"namespace": "***"
"key": "***"
},
},
{
"field-2":
"v1alpha2-v1alpha1": {
"name": "***"
"namespace": "***"
"key": "***"
}
}
}

backward compatibility

Making a field a reference
```yaml
```


## Testing

### Unit tests

To ensure the correctness of the code generated by `ack-generate`, unit tests will be
written to validate the generated codes behavior. These tests will ensure that the
code generated for each version of the API is correct and functional. It should not target
all the CRDs of one controller, but try to cover all the possible struct/shape combinations,
like ApiGateway, DynamoDB, RDS, etc...

These tests will be run as part of the ack codegen prow job.

### E2E tests

The E2E tests will ensure that the generated webhooks are registered correctly and
function as expected. The e2e tests should be able to test multiple stages from the
controller startup to the resources being handled in multiple versions.

To test the multi-version feature, we will create multiple CRDs with different versions
and test the controller's behavior when handling them. The E2E tests will also test the
webhook behavior when a new version of the CRD is introduced, ensuring that the webhook
correctly converts the old version to the new version.

To test the generated mutating webhooks, we will create test cases where the webhook is
expected to mutate the incoming request. These tests will ensure that the generated
webhook correctly mutates the request and that the resulting object is valid and conforms
to the new version's schema.

#### Requirements

The process of spining a controller, upgrading it and it's CRDs require a good rework of
our existing prow e2e testing framework, part of that work is described in this [design proposal]
[test-infra-proposal].

## Alternatives

Discuss why cuelang can be of help in the future.

---

[aws-sdk-go]:
[aws-sdk-go-version]:
[kubernetes-docs]:
[kubernetes-versioning-docs]:
[ack-webhook-pr]: https://github.com/aws-controllers-k8s/code-generator/pull/111
[ack-webhook-pr-rt]: https://github.com/aws-controllers-k8s/runtime/pull/29
[mv-gh-issue]: https://github.com/aws-controllers-k8s/community/issues/835
[webhook-pr]: https://github.com/aws-controllers-k8s/code-generator/pull/122
[iam-metadata.yaml]: https://github.com/aws-controllers-k8s/iam-controller/blob/main/metadata.yaml#L6-L8
[code-gen/pkg/md]: https://github.com/aws-controllers-k8s/code-generator/blob/main/pkg/metadata/api_info.go#L18-L22
[test-infra-proposal]: https://github.com/aws-controllers-k8s/test-infra/pull/310