-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement KatibConfig API #2176
Implement KatibConfig API #2176
Conversation
cb33ce4
to
b7cb65a
Compare
Uhm... I can not reproduce CI error on my local...
https://github.com/kubeflow/katib/actions/runs/5613737623/job/15210405916?pr=2176#step:4:9781 |
ef2a9ae
to
2cec0b2
Compare
This was resolved. |
529b005
to
6595c5b
Compare
@andreyvelich @johnugeorge @gaocegege This PR is ready for review. Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this great feature @tenzen-y!
This should give users much better experience.
I left a few first comments.
return nil | ||
} | ||
|
||
func SetDefaults_KatibConfig(cfg *KatibConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, in the future, we should migrate our defaults for Katib Experiment to follow similar model which is more Kubernetes-native.
Currently, we set defaults via mutation webhook: https://github.com/tenzen-y/katib/blob/067c11933792f8060c3f9cd5349ef1b508a5b17c/pkg/webhook/v1beta1/experiment/mutate_webhook.go#L60.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. If it's possible, it would be good.
By doing that, we can reduce the kube-apiserver load and then improve cluster performance :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the kustomize can not merge (patch) against the structured .data
in the ConfigMap.
So, we need to have a separate KatibConfig for the leaderElection.
NOTE: In the future, we might be able to merge the structured .data
in the ConfigMap using the kustomize:
kubernetes-sigs/kustomize#4517
# This KatibConfig is mostly same as https://github.com/kubeflow/katib/manifests/v1beta1/components/controller/katib-config.yaml. | ||
# Only `.init.controller.enableLeaderElection` field is different. | ||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, Kustomize doesn't allow us to patch the ConfigMap data that is why we need to keep the whole config for each overlay 😞
Which means we need to update the image tags for each release in the future in these overlays.
I think, we need to do something with this script:
katib/scripts/v1beta1/update-images.sh
Lines 78 to 81 in 199aa8b
echo -e "Update Katib Metrics Collectors, Suggestions and EarlyStopping images\n" | |
update_yaml_files "${CONFIG_PATH}" "${OLD_PREFIX}" "${NEW_PREFIX}" | |
update_yaml_files "${CONFIG_PATH}" ":[^[:space:]].*\"" ":${TAG}\"" | |
I think, at this stage it maybe worth to discuss how we want to maintain our suggestion image versions.
Currently, tags for suggestions we maintain under /manifests/v1beta1/components/controller/katib-config.yaml
, tags for controller we maintain under /manifests/v1beta1/installs/katib-controller/kustomization.yaml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think @kubeflow/wg-automl-leads @tenzen-y ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which means we need to update the image tags for each release in the future in these overlays.
I think, we need to do something with this script:
It's a good point.
I think, at this stage it maybe worth to discuss how we want to maintain our suggestion image versions.
Currently, tags for suggestions we maintain under /manifests/v1beta1/components/controller/katib-config.yaml, tags for controller we maintain under /manifests/v1beta1/installs/katib-controller/kustomization.yaml.
I think generating KatibConfig for each installs
by script would be good. This means we generate manifests/v1beta1/installs/katib-leader-election/katib-config.yaml
from manifests/v1beta1/components/katib-config.yaml
by a scripts.
If so, we could maintain the suggestion service image versions the same way as in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think generating KatibConfig for each installs by script would be good.
What do you mean by that ?
I was thinking that since Kustomize doesn't allow us to patch the ConfgMap data, we can just replace it where it is required.
Similar what you did for Katib Leader Election install.
So for Katib Standalone install and Katib with Kubeflow install we need to do the same to replace Katib Config with ConfigMap with appropriate image tags.
E.g. for master
branch the image tags will be :latest
for release-0.x
branch the image tags will be :v0.x.0
.
WDYT @tenzen-y @johnugeorge ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think generating KatibConfig for each installs by script would be good.
What do you mean by that ?
@andreyvelich NVM. My suggestion wouldn't work well.
So for Katib Standalone install and Katib with Kubeflow install we need to do the same to replace Katib Config with ConfigMap with appropriate image tags.
Does that mean that we use kustomize imageTagTransformer like
katib/manifests/v1beta1/installs/katib-standalone/kustomization.yaml
Lines 22 to 34 in 199aa8b
images: | |
- name: docker.io/kubeflowkatib/katib-controller | |
newName: docker.io/kubeflowkatib/katib-controller | |
newTag: latest | |
- name: docker.io/kubeflowkatib/katib-db-manager | |
newName: docker.io/kubeflowkatib/katib-db-manager | |
newTag: latest | |
- name: docker.io/kubeflowkatib/katib-ui | |
newName: docker.io/kubeflowkatib/katib-ui | |
newTag: latest | |
- name: docker.io/kubeflowkatib/cert-generator | |
newName: docker.io/kubeflowkatib/cert-generator | |
newTag: latest |
Kustomize recognizes .data
field as string data, not structured data such as YAML. So we can not replace image tags on the kind: KatibConfig
embedded on ConfigMap using imageTagTransformer.
So, we need to replace image tags using vars
like
vars: | |
- fieldref: | |
fieldPath: metadata.namespace | |
name: KATIB_UI_NAMESPACE | |
objref: | |
apiVersion: apps/v1 | |
kind: Deployment | |
name: katib-ui | |
configurations: | |
- params.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, that for every install we are going to replace the whole Katib Config, similar to Katib Leader Election. E.g. for Katib Standalone we just have the same config as in /components/katib-config, but it will have different image tags (for master branch latest tag, for release- branch v0.x tag. Does it make sense?
@andreyvelich Oh, I see. Thanks for the clarification.
I think, the idea of installs/overlays was to have 1 place where we set settings, images, or any other changes for our Katib Control Plane.
Other solution could be to just remove Katib Config from components/ manifests and add it only in installs. We can modify our docs to explain it.
It sounds reasonable. I will do it ASAP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnugeorge @gaocegege I am happy to discuss any other suggestions on how to configure our Katib manifests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich I've done it: 42fe278
Is this your expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's right @tenzen-y, that what was I proposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the check :)
199aa8b
to
20a1723
Compare
Thanks for the review. I created an issue: #2186 |
@tenzen-y Let's also rebase this PR, so we can test it in K8s 1.26. |
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
…beta1 Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
e684420
to
4e5ea07
Compare
Yea, I rebased this PR. |
/lgtm |
/hold cancel |
What this PR does / why we need it:
I implemented the new KatibConfig API to set all parameters for the katib.
Also, the new KatibConfig supports JSON and YAML, similar to the kubernetes manifest.
NOTE: Since the KatibConfig is NOT CustomResource, we need to create a ConfigMap embedded KatibConfig as in the past.
Given KatibConfig is used for the following:
.init.controller
: This holds parameters for the katib-controller. Here is evaluated only when launching the controller. So we need to restart the controller's pod to set new parameters. Also, we need to mount the comfigMap embedded KatibConfig to the controller, and set the mounted path to controller's option,--katib-config
like:.runtime
: This holds parameters for the metrics-collectors, suggestion-services, and earlystoppings. Here is reevaluated for every query. So we don't need to restart the controler's pods to set new parameters.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #2150
Checklist: