data manager will not start in vSphere 8.0 with Tanzu #521

Open
ChrisJLittle opened this issue Mar 22, 2023 · 9 comments
ChrisJLittle commented Mar 22, 2023

Describe the bug

After deploying and configuring the Velero data manager (OVA, vSphere 8 with Tanzu), the velero-datamgr.service fails to start because it cannot find a velero-token secret in the velero namespace.

To Reproduce

Starting with a vSphere 8.0b with Tanzu (NSX-T 4.1) environment with Workload Management enabled.
Installed Velero Operator 1.3.0 as a Supervisor Service

This by itself was problematic: the velero-vsphere-operator and velero-vsphere-operator-webhook deployments were set to tolerate "master" nodes, but vSphere 8 with Tanzu supervisor control plane nodes are tainted with "control-plane". I was able to work around this by editing the toleration on both deployments (see the sketch below), and all pods came up.
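A rough sketch of that kind of edit (the operator namespace variable and the exact toleration spec here are assumptions, not taken from the shipped manifests):

# OPERATOR_NS is a placeholder for the namespace the Supervisor Service
# created for the operator; the toleration uses the standard Kubernetes
# control-plane node-role taint key.
OPERATOR_NS="velero-vsphere-operator-namespace"   # placeholder
for deploy in velero-vsphere-operator velero-vsphere-operator-webhook; do
  kubectl -n "$OPERATOR_NS" patch deployment "$deploy" --type=json -p='[
    {"op": "replace", "path": "/spec/template/spec/tolerations", "value": [
      {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists", "effect": "NoSchedule"}
    ]}
  ]'
done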

Created velero supervisor namespace and assigned permissions and storage.
Created velero-vsphere-plugin-config configmap in velero namespace:

apiVersion: v1
kind: ConfigMap
metadata:
  name: velero-vsphere-plugin-config
data:
  cluster_flavor: SUPERVISOR
  vsphere_secret_name: velero-vsphere-config-secret
  vsphere_secret_namespace: velero
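For completeness, a configmap like this gets applied as shown below; the file name is hypothetical, and the namespace comes from the -n flag since the manifest itself does not set one:

# Hypothetical file name; creates the configmap in the velero namespace.
kubectl -n velero apply -f velero-vsphere-plugin-config.yaml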

Installed velero-vsphere 1.4.2 binary
Ran velero-vsphere install

velero-vsphere install  --namespace velero --version v1.9.2 --image velero/velero:v1.9.2 --provider aws --plugins velero/velero-plugin-for-aws:v1.6.1,vsphereveleroplugin/velero-plugin-for-vsphere:v1.4.2 --bucket velero --secret-file /home/ubuntu/Velero/s3-credentials --snapshot-location-config region=minio --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.110.60:9000

The backup-driver deployment had the same issue as the two previously noted deployments and had to be edited so that its toleration was for "control-plane" rather than "master".

Everything seemed to be up and running as expected at this point.
Deployed the 1.4.2 data manager OVA to vSphere.
Configured the advanced parameters as needed for my environment.

guestinfo.cnsdp.vcUser, guestinfo.cnsdp.vcAddress, guestinfo.cnsdp.vcPasswd, guestinfo.cnsdp.wcpControlPlaneIP
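These can be set in the VM's advanced configuration from the vSphere UI or, as a rough sketch, with govc (the VM name and all values below are placeholders):

# Placeholder VM name and values; each -e sets one guestinfo/extraConfig key.
govc vm.change -vm velero-datamgr \
  -e "guestinfo.cnsdp.vcUser=administrator@vsphere.local" \
  -e "guestinfo.cnsdp.vcAddress=vcsa.example.com" \
  -e "guestinfo.cnsdp.vcPasswd=REPLACE_ME" \
  -e "guestinfo.cnsdp.wcpControlPlaneIP=192.168.110.10"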

Powered on the data manager VM

I tested a backup that included a PVC and noticed that the upload never left the "New" state.

I logged in to the data manager VM and saw that the velero-datamgr.service service had crashed with the following output:

Mar 21 23:10:33 photon-cnsdp velero-vsphere-plugin-datamgr.sh[504]: [1B blob data]
Mar 21 23:10:33 photon-cnsdp velero-vsphere-plugin-datamgr.sh[504]: If the context you wish to use is not in this list, you may need to try
Mar 21 23:10:33 photon-cnsdp velero-vsphere-plugin-datamgr.sh[504]: logging in again later, or contact your cluster administrator.
Mar 21 23:10:33 photon-cnsdp velero-vsphere-plugin-datamgr.sh[504]: [1B blob data]
Mar 21 23:10:33 photon-cnsdp velero-vsphere-plugin-datamgr.sh[504]: To change context, use `kubectl config use-context <workload name>`
Mar 21 23:10:33 photon-cnsdp velero-vsphere-plugin-datamgr.sh[504]: [1B blob data]
Mar 21 23:10:33 photon-cnsdp velero-vsphere-plugin-datamgr.sh[504]: Switched to context "vi-user".
Mar 21 23:10:35 photon-cnsdp velero-vsphere-plugin-datamgr.sh[504]: Failed to get single valid velero service account
Mar 21 23:10:35 photon-cnsdp systemd[1]: velero-datamgr.service: Main process exited, code=exited, status=1/FAILURE
Mar 21 23:10:35 photon-cnsdp systemd[1]: velero-datamgr.service: Failed with result 'exit-code'.
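(The output above is just the systemd journal for the unit; on the data manager VM it can be pulled with, for example:)

journalctl -u velero-datamgr.service --no-pager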

An examination of the /bin/velero-vsphere-plugin-datamgr.sh script showed that a secret with a name containing "velero-token" was expected in the velero namespace. The following are the secrets present in the velero namespace:

NAME                                          TYPE                             DATA   AGE
cloud-credentials                             Opaque                           1      120m
velero-default-image-pull-secret              kubernetes.io/dockerconfigjson   1      123m
velero-default-image-push-secret              kubernetes.io/dockerconfigjson   1      123m
velero-restic-credentials                     Opaque                           1      118m
velero-vsphere-operator-object-store-secret   Opaque                           1      120m
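(Listed with the usual kubectl command:)

kubectl -n velero get secrets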

Expected behavior

I'm not sure whether the issue is with the data manager or with velero-vsphere (and/or the operator), but either a velero-token secret should be present or the data manager should be looking for something else.

Troubleshooting Information

I checked the same process on vSphere 7.0 U3 with Tanzu and it works as expected there. The Velero operator version is 1.1, the velero-vsphere version is 1.1, the data manager OVA version is 1.1, the velero version is 1.5.1, and the velero-plugin-for-aws version is 1.1.

The following were the secrets present in the velero namespace in 7.0U3:

NAME                                          TYPE                                  DATA   AGE
cloud-credentials                             Opaque                                1      509d
default-token-r6d2n                           kubernetes.io/service-account-token   3      509d
velero-restic-credentials                     Opaque                                1      509d
velero-token-25rvr                            kubernetes.io/service-account-token   3      509d
velero-vsphere-operator-object-store-secret   Opaque                                1      509d
xing-yang (Contributor) commented:

Can you take a look at step 3 here: https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/velero-vsphere-operator-user-manual.md#installing-velero-on-supervisor-cluster. Did you create a configmap?

ChrisJLittle commented Mar 23, 2023

Yes:

apiVersion: v1
kind: ConfigMap
metadata:
  name: velero-vsphere-plugin-config
data:
  cluster_flavor: SUPERVISOR
  vsphere_secret_name: velero-vsphere-config-secret
  vsphere_secret_namespace: velero

It was created in the velero namespace just prior to running velero-vsphere install.

Looking at what I have, I see that I have two extra parameters, vsphere_secret_name and vsphere_secret_namespace, and I'm not sure where I got them from (maybe an older sample file?). Are these causing the problem?

deepakkinni (Collaborator) commented:

They shouldn't be there. Try without the extra params.


ChrisJLittle commented Mar 23, 2023

I did a velero-vsphere uninstall, deleted and recreated the velero namespace, updated the configmap file and re-created it, and redid the velero-vsphere install; there is still no velero-token secret in the velero namespace (roughly the sequence sketched below). Is there anything that would need to be done with the operator?
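A sketch of that sequence; the command forms are from memory and the configmap file name is a placeholder:

# 1. Remove the existing install (-n names the Velero namespace).
velero-vsphere uninstall -n velero
# 2. Delete and recreate the velero supervisor namespace (done from the
#    vCenter UI on a Supervisor cluster) and reassign permissions/storage.
# 3. Re-create the updated configmap (hypothetical file name).
kubectl -n velero apply -f velero-vsphere-plugin-config.yaml
# 4. Re-run the same velero-vsphere install command shown earlier.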

deepakkinni (Collaborator) commented:

Can you share the logs? https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/troubleshooting.md#project-pacific

looking for:

  1. velero logs
  2. operator logs
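A sketch of how these can be collected, assuming the deployment names mentioned above; the operator namespace is a placeholder for whatever namespace the Supervisor Service created:

# Velero server and backup-driver logs from the velero namespace.
kubectl -n velero logs deployment/velero > velero.log
kubectl -n velero logs deployment/backup-driver > backup-driver.log
# Operator logs from its own namespace (placeholder name).
OPERATOR_NS="velero-vsphere-operator-namespace"   # placeholder
kubectl -n "$OPERATOR_NS" logs deployment/velero-vsphere-operator > velero-vsphere-operator.log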


ChrisJLittle commented Mar 24, 2023

I redid the configuration on a system that had not previously been configured with the suspect configmap and got the same results. I'm attaching the velero and operator logs.

backup-driver.log
velero.log
velero-vsphere-operator.log

ChrisJLittle commented:

And if it would help, I could grant you direct access to the environment.


ChrisJLittle commented Apr 10, 2023

As a very quick workaround, I was able to create the missing token/secret and the container in the data manager VM came up.

velero-token.yaml:

apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: velero-token
  annotations:
    kubernetes.io/service-account.name: "velero"

kubectl -n velero apply -f velero-token.yaml

kubectl -n velero describe secret velero-token

Name:         velero-token
Namespace:    velero
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: velero
              kubernetes.io/service-account.uid: a868e1c1-f13b-40af-aecc-ca16e493388b

Type:  kubernetes.io/service-account-token

Data
====
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6Ii05RDc2OFdwLXM2QVlfM2hIdnQ5b2NoYVlSZE4tZ2RIVnlSV0pVS0FDd2sifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJ2ZWxlcm8iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoidmVsZXJvLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InZlbGVybyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE4NjhlMWMxLWYxM2ItNDBhZi1hZWNjLWNhMTZlNDkzMzg4YiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDp2ZWxlcm86dmVsZXJvIn0.IrHDLxNM_DIyx1By0nzRPBBJqv6HHgxdCpJqFKH3e9kv3pO9CUf2hlvpCXpRVlo8u33i24Z209N0P0nb1tiNgquxBbsJkJ3d4r31_6w38HHtLYEPjJc9Ct1DyR6i2gRWwT-RXfGPzffhIxTnrwdyCNhPhQQeZUp5ufwjJFuoa69M_IYKWm4LB6_HjN8TjkzHXldHsjow8ztYDV9I_izgxAgt-SLpiuo79Pk3PLNjXtp8P-DRyfIsoJ7yC5ZhPmjWwJpbWoHE5YnoCjZjJv0f81na-V1HMYeSLgDN0CscxPe0EepW_WyDd2vkepEDTGwSJWJ4IqMzPvxMWwik0aHnRA
ca.crt:     1099 bytes
namespace:  6 bytes
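With the secret in place, restarting the failed unit (standard systemd; same unit name as in the failure log above) should let it pick up the token:

systemctl restart velero-datamgr.service
systemctl status velero-datamgr.service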

On the data manager VM:

docker ps

CONTAINER ID        IMAGE                                                COMMAND                  CREATED             STATUS              PORTS               NAMES
fde6ec1adb2a        vsphereveleroplugin/data-manager-for-plugin:v1.4.1   "/datamgr server --u…"   4 minutes ago       Up 4 minutes                            velero-datamgr

I'll test a stateful backup later, but this is obviously much farther along than I was getting previously.


ChrisJLittle commented Apr 10, 2023

Stateful backup of a vSphere pod/pvc was successful.
