Skip to content
This repository has been archived by the owner on Mar 25, 2019. It is now read-only.

[WIP] RFC about container image update with 4.0 #11

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

flavio
Copy link
Member

@flavio flavio commented Dec 21, 2018

This is a WIP RFC that describes how, starting from 4.0, images should be tagged and how we should deal with their update.

This PR is going to replace #10

Note well: PR created right before leaving for vacations. I didn't proofread the text, but I didn't want to leave everything idle on my computer. Be kind with me :)

This is a WIP RFC that describes how, starting from 4.0, images
should be tagged and how we should deal with their update.
* Once the admin node is rebooted, Velum allows the user to press the
"update nodes" button

**TODO:** answer these questions:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ereslibre: you have a better understanding of the upgrade orchestration. Can you help here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will provide detailed feedback tomorrow.

@hwoarang hwoarang self-requested a review December 24, 2018 09:50
our images are being tagged.

All our images are built inside of the build service and have at least two tags
associated with them:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two tags looks like they are the short and long ones. Maybe we should mention that.

associated with them:

* **short:** `<app-version>` this is something like: `2.6.1`
* **intermediate:** `<app-version>-<image-version>` this is something like:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see any intermediate tag in my v3 deployment. Does this really exist right now?

* **intermediate:** `<app-version>-<image-version>` this is something like:
`2.6.1-1`. This value is hard-coded into the KIWI image definition; image
owners are going to bump it manually.
* **long:** `<app-version>-<image-version>-<buildID>`. The `buildID` is a
Copy link

@hwoarang hwoarang Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct template is

Suggested change
* **long:** `<app-version>-<image-version>-<buildID>`. The `buildID` is a
* **long:** `<app-version>-<image-version>.<buildID>`. The `buildID` is a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seem just missing a CR/NL before "The 'buildID ..."

(eg: a `libopenssl` CVE issue). Only the long tag would change.
4. Maintainer of the *"core"* application releases a regular SLE update of the
application. The container image is automatically rebuilt. This case should
be handled carefully: the image tags won't match the core application
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the image tags won't match the core application version mean here? If there is a bugfix release for the core application, then the version (ie short tag) remains the same. So the image tag matches the application version.

* Case #4 - update of core application via SLE maintainer: we think the best
solution to this scenario is to have the maintenance QA block this automatic
rebuild of the image from being published. They should notify the development
team about this core application upgrade so that everything can fall back to
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is about an application maintenance update (ie no version changes) so it's not similar to the 1st case. As such, on regular SLE app updates, we should be able to fully allow automated image updates since the short and intermediate tags are not changing.

would be `2.6.2-rev1`.
* **long:** intermediate tag + build numbers generated by the Open Build
Service. The long tag for the first iteration of docker-registry image
would be something like `2.6.2-rev1+5.1`, where `5.1` are numbers that
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different compared to the <app-version>-<image-version>.<buildID> template described before. Are we allowed to change the template ourselves?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we have to keep in mind that certain characters like +, : or ~ are not accepted as part of the tag. What about using something like <app-version>-rev<image-version>-b<buildID> (2.6.2-rev1-b5.1). Anyway we have to make sure they will be comparable.

### Automatic image rebuild

This happens when the image is rebuilt because one of its dependencies got
updated (eg: a `libopenssl` issue).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or when we have a new maintenance update of the core application itself?

* The pod starts immediately using the old image.
* If the admin node is rebooted: same outcome of the scenario above.

When `transactional-update` is run:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: better title this as What happens *after*... similar to the previous section

> The impact of this issue can be reduced by configuring all the nodes of the
> cluster to use a local mirror of the SUSE registry.

When `transactional-update` is run:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same as before :)


This happens when the image is rebuilt because one of its dependencies got
updated (eg: a `libopenssl` issue).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before. this can happen when the core application itself receives a maintenance update.

The contents of the file are not actually useful, the purpose of this RPM is
to allow us to keep using the current upgrade mechanism of CaaSP v3 that relies
on the assumption that container image upgrades are delivered via RPMs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't be cleaner (and equally simple perhaps?) to poll the registry and set a grain when there is a new image available? this way we don't have to reboot a node to get the updated image.

Copy link

@ereslibre ereslibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM; I have to double check and confirm some of the questions.

Just some thoughts that we may want to explicitly state:

  • The GM versions of the container images (at least) need to be present on the registry for the whole lifetime of the product. This will ensure that once that we have a GM ISO it's possible to always install it during the whole product lifetime.

  • More implementation specific: we need to drop some specific kubelet patches that we had to avoid the image GC from deleting local images that were previously loaded by container-feeder. We should probably avoid deleting our own images unless it's really necessary, but that patch should be gone for this changes.

without introducing technical debt, without making the whole upgrade story worse
than the one of v3 and without having to write that much code.

V3 already has some limits, we are fine keeping them around for 4.0:
Copy link

@ereslibre ereslibre Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
V3 already has some limits, we are fine keeping them around for 4.0:
V3 already has some limitations, we are fine keeping them around for 4.0:

* All the images are going to be referenced by using their intermediate tag.
* All the manifests will enforce kubelet to use the "always pull" policy.
* The service that previously rendered the template files and wrote them
under `/etc/` is no longer needed: it must be dropped from the package.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the functionality, I don't think we can get rid of it completely, as it performs some other tasks, however we could verify what's still required for V4 on that script. Regarding to this section, this part of the code could be removed: https://github.com/kubic-project/caasp-container-manifests/blob/800ef2dd896b80eca06214cf16cab190de7626a8/admin-node-setup.sh#L12-L47

* The service that previously rendered the template files and wrote them
under `/etc/` is no longer needed: it must be dropped from the package.

### Changes for the `kuberentes-salt` package

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Changes for the `kuberentes-salt` package
### Changes for the `kubernetes-salt` package

* Reference all the images using their intermediate tag
* Use the "always pull" policy

### Changes for the base operative system

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Changes for the base operative system
### Changes for the base operating system

### Changes for the base operative system

This section describes the changes that have to be done to the underlying
operative system. We basically have to alter it's package selection.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
operative system. We basically have to alter it's package selection.
operating system. We basically have to alter its package selection.

This section describes the changes that have to be done to the underlying
operative system. We basically have to alter it's package selection.
That will require changes to our KIWI templates (the ones used to create
ISOs, qcows,...) and to our package patters.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ISOs, qcows,...) and to our package patters.
ISOs, qcows,...) and to our package patterns.

and always predictable).
* [QA] Test the fresh image. The stability and predictability of the
intermediate tags makes their life easier and reduces the chances of
mistakes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: how will QA happen if on this stage the image is not yet published? (as it will be published once QA itself passes)


* [developers] Nothing is done.
* [QA] Test the fresh image.
* Image is published to the SUSE registry.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same question as before, regarding this point (Image is published to the SUSE registry) and the previous (Test the fresh image), can QA test an image that hasn't been published to the registry? How are we going to plug these two things together?


Considerations about the images:

* The old image can still be pulled via its long tag.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should specify for how long this should be true, or if it's out of the scope for this RFC.

* Once the admin node is rebooted, Velum allows the user to press the
"update nodes" button

**TODO:** answer these questions:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will provide detailed feedback tomorrow.


> Why do we want this change and what problem are we trying to address?

Starting with v4 the need to reboot a node to get newer container images is no

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting with v4 it is no longer required to reboot a node to get newer container images:

@sysrich
Copy link
Member

sysrich commented Jan 17, 2019

Has @Vogtinator been consulted about these ideas?

Obviously CaaSP can do whatever they'd like, but given his work on base containers and official openSUSE containers, he might have opinions to help keep everything tied together nicely

per image: `<container-image-name>-metadata`. This RPM file will have a simple
text file inside of it with the metadata of the image (like all its tags).

The contents of the file are not actually useful, the purpose of this RPM is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mechanism won't work at all or not well - releasing containers and RPM packages is completely independent.
Even if the container release mechanism was able to handle RPMs as well, it would not be synchronized with the containers currently. That won't be easy to do.


Images are going to have 3 tags:

* **short:** this is the core application version. For example, given `2.6.2`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that if there is a minor version update (e.g. 2.6.3 contains security fixes) the referenced tag has to change as well?

new container images.

With v4 we are still going to ship the kubernetes manifest files via two regular RPMs.
The only difference compared to v3 is that we are going to hard-code the tags of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which tags, the short, long or intermediate ones?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants