[vmware] Cache images as VM templates #206
base: stable/wallaby-m3
Conversation
Needs sapcc/oslo.vmware#46

Why not use cinder's built-in cache mechanism?

I agree that the reasoning for this should be in the commit message.

I added the reason to the commit message: "We're not using the cinder built-in cache functionality because we …"
How do we account for the consumed space of the cached volume in the specific datastore, so that the scheduler knows how much we are using on that datastore?

The driver reports the …

The scheduler still needs to account for the capacity allocated against the datastore, though. These cached images will be hidden from what is actually allocated against the datastore.
Force-pushed da700b1 to c763bce
OK, thanks for the hint. Now the "image cache" capacity is being added to the pool's … The volume backend reports a new … Could you please check the new code @hemna?
If we disable `enable_image_cache`, we might still have images around. Would it make sense to run some cleanup, or at least count the existing cached images as usage, or do we have to make sure that we clean up manually?
        return list(itertools.chain(
            *[self._get_cached_images_in_folder(folder_ref)
              for folder_ref in folder_refs]))
This looks like you could use `itertools.chain.from_iterable()` instead of `itertools.chain(*[…])`.
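For illustration, a minimal standalone sketch of the suggested form (the folder refs and the helper here are stand-ins, not the driver's actual objects):

```
import itertools

# Hypothetical stand-in for the driver's _get_cached_images_in_folder()
def get_cached_images_in_folder(folder_ref):
    return ["image-a", "image-b"] if folder_ref == "folder-1" else ["image-c"]

folder_refs = ["folder-1", "folder-2"]

# chain.from_iterable() flattens lazily, without first unpacking all
# per-folder lists as chain(*[...]) does.
cached_images = list(itertools.chain.from_iterable(
    get_cached_images_in_folder(ref) for ref in folder_refs))

print(cached_images)  # ['image-a', 'image-b', 'image-c']
```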
cinder/volume/drivers/vmware/vmdk.py (outdated)
"created_at": date}] | ||
|
||
Where | ||
- name: the name of the template VM (set to the image_id) |
Should we maybe add a prefix or postfix or something that makes it possible to distinguish image-cache template-vms from shadow-vms? Especially when things are orphaned and only left as directories on the datastore, it's helpful to distinguish them by name and know what can definitely be deleted.
cinder/volume/drivers/vmware/vmdk.py (outdated)
        max_objects = self.configuration.vmware_max_objects_retrieval
        options.maxObjects = max_objects
        try:
            result = self.session.vim.RetrievePropertiesEx(
Why can't we use `WithRetrieval` as in `_get_image_cache_folder_ref()`? This looks like a lot of code and I would have expected it to already exist.
Do you mean using oslo.vmware's `get_objects()`? We couldn't use it because that one looks in the `rootFolder` and we only want to look into the cache folder here.
`with vutil.WithRetrieval(self._session.vim, retr_res) as retr_objects` is what I mean. We only get `result` here, and I would assume all the exception handling and such would already be handled in `WithRetrieval`. We later only use `result.objects`.

How do we do it in Nova? Did we also copy the contents of `get_objects()` there? (That's what I understood from you that we basically have to do here.)
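A rough sketch of the suggested pattern, assuming a retrieve result already obtained from `RetrievePropertiesEx` (the surrounding function and session names are illustrative, not the driver's actual code):

```
from oslo_vmware import vim_util as vutil

def retrieve_all_objects(session, retrieve_result):
    # Illustrative only: WithRetrieval pages through the result via
    # ContinueRetrievePropertiesEx and cancels the retrieval on exit,
    # so the caller does not need its own paging or cleanup code.
    with vutil.WithRetrieval(session.vim, retrieve_result) as objects:
        return list(objects)
```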
Nova has its own `get_objects()`-like helper called `get_inner_objects()`. Additionally, `WithRetrieval` doesn't handle the NOT_AUTHENTICATED exception that's thrown when there are no objects in the folder (see Nova's `_get_image_template_vms`).
Ah, ok. Thank you. What's the use of `WithRetrieval` then? I thought we added it everywhere in Nova because the code sometimes missed handling such cases.
cinder/volume/drivers/vmware/vmdk.py (outdated)
        cached_images = []
        for obj in result.objects:
            props = vim_util.propset_dict(obj.propSet)
Could it be that `propSet` doesn't exist? We sometimes have exceptions like that in Nova.
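Purely as an illustration, a defensive variant of the quoted loop could guard the attribute access (the function wrapper is hypothetical):

```
from oslo_vmware import vim_util

def collect_propsets(retrieve_result):
    # A RetrievePropertiesEx object may come back without a propSet
    # attribute, so guard the access instead of assuming it exists.
    cached_images = []
    for obj in getattr(retrieve_result, 'objects', None) or []:
        prop_set = getattr(obj, 'propSet', None)
        if not prop_set:
            continue
        cached_images.append(vim_util.propset_dict(prop_set))
    return cached_images
```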
cinder/volume/drivers/vmware/vmdk.py (outdated)
        img_volume = copy.deepcopy(volume)
        img_volume['id'] = image_id
        img_volume['project_id'] = self._cache_project_name()
        img_volume['size'] = image_size_in_bytes / units.Gi
Why do we keep the other data and what happens with it? Do we really need whatever else is in the `volume` dict? Would it maybe make sense to be explicit about what we expect to be used going forward? Can there be private information (metadata) that somehow gets copied to another project's cloned root-disk metadata?
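One way to be explicit, as suggested above, would be to build the cache volume dict from a whitelist of fields instead of deep-copying everything. A sketch; the selected fields are illustrative guesses, not the driver's final choice:

```
from oslo_utils import units

def build_cache_volume(volume, image_id, cache_project, image_size_in_bytes):
    # Copy only the fields the image-cache backing actually needs,
    # so no tenant metadata leaks into the cached template.
    return {
        'id': image_id,
        'project_id': cache_project,
        'size': image_size_in_bytes / units.Gi,
        'volume_type_id': volume.get('volume_type_id'),
        'availability_zone': volume.get('availability_zone'),
    }
```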
cinder/volume/drivers/vmware/vmdk.py (outdated)
@@ -1998,6 +2052,50 @@ def copy_image_to_volume(self, context, volume, image_service, image_id):
                                  VMwareVcVmdkDriver._get_disk_type(volume))
        # TODO(vbala): handle volume_size < disk_size case.

    def _can_use_image_cache(self, volume, image_size):
        requested = (volume['metadata']
                     .get(CREATE_PARAM_USE_IMAGE_CACHE) == "true")
Do we maybe want to add a `.lower()`, or does that come in automatically?
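For illustration, a case-insensitive variant could lean on `oslo_utils.strutils` instead of a plain string comparison (a sketch; the constant value follows the `use_image_cache` metadata key from the commit message):

```
from oslo_utils import strutils

CREATE_PARAM_USE_IMAGE_CACHE = 'use_image_cache'  # assumed constant value

def image_cache_requested(volume):
    # bool_from_string() accepts 'true', 'True', '1', 'yes', etc.,
    # so no manual .lower() is needed.
    value = volume['metadata'].get(CREATE_PARAM_USE_IMAGE_CACHE, 'false')
    return strutils.bool_from_string(value, default=False)
```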
            LOG.debug("The requested image cannot be cached because it's "
                      "too big (%(image_size)s > %(max_size)s)",
                      {'image_size': image_size,
                       'max_size': max_size})
Should we error out here? The customer requested to use the image cache and thus expects speedy creations, but with this image they can't get that. Same question for requested but not enabled, I guess.
From the user-experience point of view, I'd expect such an error to be returned in the API response. Otherwise they will ask why their volume is in `error` state.
Fair point. We're too late in the process, because scheduling needs to have happened already. On the other hand, nobody will ever know ...
Any conclusion for this?
AFAIK, there's a way to add error messages and retrieve them with `cinder message list`. I think we can go with "ignore what the customer requested" for now, but we should investigate this as a follow-up and maybe bring it up in a bigger round for a decision.
cinder/volume/drivers/vmware/vmdk.py (outdated)
        img_backing = None
        if self._can_use_image_cache(volume, metadata['size']):
            img_backing = self._get_cached_image_backing(
I would rename `_get_cached_image_backing()` to `_get_or_create_cached_image_backing()`, mainly because I was wondering where we would create the cached image backing if we only call a get.
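A minimal sketch of the get-or-create shape the rename hints at (the two inner helpers are hypothetical, not existing driver methods):

```
def _get_or_create_cached_image_backing(self, context, volume, image_service,
                                        image_id):
    # Look for an existing cached template first; fall back to creating
    # and caching it when the image has not been seen yet.
    backing = self._find_cached_image_backing(image_id)
    if backing is None:
        backing = self._create_cached_image_backing(
            context, volume, image_service, image_id)
    return backing
```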
                LOG.exception("Failed to delete the expired image %s",
                              cached_image['name'])

    def _get_cached_images(self):
We run this function every minute with the stats generation (I think). How long does it take to run in a real-world environment? Do we have to optimize somewhere?
Force-pushed 3f6d754 to 8333237
Force-pushed 2b59588 to 17e2b86
Upon user request, the driver can cache the image as a VM template and reuse it to create the volume(s). This feature is useful when creating many volumes in parallel from the same image.

We're not using the cinder built-in cache functionality because we need a few extra features:
- the built-in cache doesn't account for shards; the cache entry would be placed on any backend/shard and could trigger a lot of slower cross-VC migrations when creating volumes from it
- the built-in cache doesn't have a periodic task for deleting expired cache entries
- we want to cache images only when the customer requests it

Users can request the image cache feature when creating the volume by passing use_image_cache='true' as a property (metadata).

The feature must be enabled per backend, for example:

```
[vmware]
enable_image_cache = true
```

This enables the image cache feature for the vmware backend. The image templates are then stored in a folder similar to the volumes folder, OpenStack/Project (vmware_image_cache)/Volumes, where {backend}_image_cache is used as the project name.

The driver periodically deletes cached images that have expired. The expiry time can be controlled via the `image_cache_age_seconds` property set on the backend configuration. Only images smaller than the configured `image_cache_max_size_gb` will be cached.

Change-Id: I6f5e481f6997a180a455b47abe525b93bcf9aa4e
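As a usage example, requesting the cache from the client side could look roughly like this (standard python-openstackclient options; the image name, size, and volume name are placeholders):

```
openstack volume create \
    --image my-image \
    --size 20 \
    --property use_image_cache=true \
    my-volume
```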