velero restore create stuck in status "InProgress" when restore pv and pvc only #488
Comments
Please attach logs, see instructions at https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/troubleshooting.md#logs |
|
What is the "velero backup" command you used? After backup is complete, did you check the status of "Snapshot" and "Upload"? |
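For readers following along, the status check asked about above could look roughly like this. It is a minimal sketch, not something posted in the thread: it assumes the plugin exposes its Snapshot and Upload custom resources as `snapshots.backupdriver.cnsdp.vmware.com` and `uploads.datamover.cnsdp.vmware.com`, that Velero runs in the `velero` namespace, and that the PVC lives in a hypothetical `my-app` namespace. Commands are only printed by default; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assumed CRD names for the vSphere plugin's Snapshot and Upload resources.
SNAPSHOT_CRD="snapshots.backupdriver.cnsdp.vmware.com"
UPLOAD_CRD="uploads.datamover.cnsdp.vmware.com"

# check_backup_crs PVC_NAMESPACE [VELERO_NAMESPACE]
check_backup_crs() {
  pvc_ns="$1"
  velero_ns="${2:-velero}"
  # Snapshot CRs are assumed to live in the PVC's namespace.
  run kubectl -n "$pvc_ns" get "$SNAPSHOT_CRD"
  # Upload CRs are assumed to live in the Velero namespace.
  run kubectl -n "$velero_ns" get "$UPLOAD_CRD"
}

# "my-app" is a hypothetical namespace used for illustration only.
check_backup_crs my-app velero
```

A healthy backup would be expected to show the phases reported later in this thread ("uploaded" for the snapshot, "completed" for the upload).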
I used the command "$ velero backup create whole-wc" to back up the whole workload cluster.
The snapshot CR phase is "uploaded" as above.
The upload CR phase is "completed" as above. |
Continuing from the backup "whole-wc" above, I forcibly deleted the pvc and pv and tried to restore them from the backup:
|
velero debug --backup whole-wc: The log bundle is all attached, thanks. |
This is most likely a vddk issue, please ensure you retrieve the vddk logs: https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/troubleshooting.md#vddk |
I tried to follow the doc you provided, but:
It is not easy for me to collect the logs. Could you please suggest a better way, or could you please reproduce the issue? Thanks |
Can you show more details from velero backup so we can see what resources get backed up? We do not recommend backing up the entire cluster as we can't guarantee custom resources can be restored. We recommend backup by namespaces or by label selector. Regarding the following restore command:
Can you delete the resources and try to do a restore without the include "pv, pvc" and "update" policy as follows? I don't think we've qualified with those options.
At restore time, PVC will be restored as long as it was backed up and PV will be re-created. If you specify to include PV resource, Velero may try to restore it on top of the PV that is already created. That could also cause the problem. |
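The suggestion above could be sketched as follows. This is only an illustration under stated assumptions: it uses the Velero CLI flags `--include-namespaces`, `--selector`, and `--from-backup`, while the namespace `my-app`, the label `app=my-app`, and the backup names are hypothetical. Commands are printed by default; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# backup_namespace BACKUP_NAME NAMESPACE
# Back up a single namespace instead of the whole cluster.
backup_namespace() {
  run velero backup create "$1" --include-namespaces "$2"
}

# backup_by_label BACKUP_NAME SELECTOR
# Alternative: back up only resources carrying a given label.
backup_by_label() {
  run velero backup create "$1" --selector "$2"
}

# restore_plain BACKUP_NAME
# Restore without --include-resources and without an existing-resource policy.
restore_plain() {
  run velero restore create --from-backup "$1"
}

# All names below are hypothetical.
backup_namespace my-app-backup my-app
backup_by_label my-app-labeled app=my-app
restore_plain my-app-backup
```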
Finally I got the vddk logfile: This time I created a backup with only "pv,pvc":
and then deleted the pv and pvc and tried to restore them:
|
Can you try including only "pvc" but not "pv" when you do the backup? This is because the pv will be freshly created at restore time. If Velero tries to restore it on top of the newly created pv, it could cause problems. velero backup create backup-pv-pvc2 --include-resources pvc |
yes, here is the result:
Here is the new vddk log: velero debug --backup backup-pv-pvc2: velero debug --restore backup-pv-pvc2-20221104151007: |
I see the following in the logs: vixDiskLib logs:
DataMover logs:
There are frequent "Enable VMotion" and "Disable VMotion" messages in the DataMover logs. What other operation was going on that triggered this? I think that is why we saw "Open virtual disk file failed" errors. |
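To get a sense of how often these messages occur, the DataMover logs can be filtered for them. The helper below only reads log text from standard input. The commented usage line assumes the data manager runs as a daemonset named `datamgr-for-vsphere-plugin` in the `velero` namespace, which may differ per installation.

```shell
#!/bin/sh
# Count log lines that mention enabling or disabling VMotion.
# Reads log text on stdin and prints the number of matching lines.
count_vmotion_toggles() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the function's exit status at 0 in that case.
  grep -c -E '(Enable|Disable) VMotion' || true
}

# Example usage on a live cluster (assumed daemonset name and namespace;
# kubectl reads the logs of one pod selected from the daemonset):
# kubectl -n velero logs daemonset/datamgr-for-vsphere-plugin | count_vmotion_toggles
```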
Hi @jhuisss , |
Hi, thanks for your reply. It is not easy for me to get the requested log bundle; I'll find out how to collect it. In the meantime, could you please try to reproduce the issue? |
Hi, any updates? If you can not reproduce the issue, please let me know, thanks~~ |
I opened an internal bug against VDDK to understand the error code; hopefully they get back soon. In the meantime, could you please answer Xing's query about the "Enable VMotion" and "Disable VMotion" messages in the DataMover logs? This is something we haven't seen in the past. |
Got a reply from the vddk team:
|
Describe the bug
The "velero restore create" command does not work with the flag “--include-resources pv,pvc”; it gets stuck in status “InProgress”.
To Reproduce
First I successfully created a backup of the whole workload cluster with pv, see:
Then I manually deleted the pv and pvc to test velero restore with the flag “--include-resources pv,pvc”, but it is stuck in "InProgress":
Expected behavior
velero restore create succeeds and the pv and pvc are restored.
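One way to check whether a restore has reached that state is sketched below. It assumes Velero runs in the `velero` namespace; the restore name `my-restore` is hypothetical. Commands are printed by default; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# inspect_restore RESTORE_NAME [VELERO_NAMESPACE]
inspect_restore() {
  name="$1"
  ns="${2:-velero}"
  # Phase of the Restore custom resource, e.g. InProgress or Completed.
  run kubectl -n "$ns" get restores.velero.io "$name" -o 'jsonpath={.status.phase}'
  # Detailed view of the restore.
  run velero restore describe "$name" --details
  # Confirm the volumes and claims are back (PVs are cluster-scoped).
  run kubectl get pv
  run kubectl get pvc --all-namespaces
}

# "my-restore" is a hypothetical restore name.
inspect_restore my-restore
```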
Troubleshooting Information
Velero server version: 1.9.2
AWS plugin version: v1.5.1
vSphere plugin version: v1.4.0
Kubernetes: Vanilla (tkgm workload cluster)
Kubernetes version: