Newly created volume cannot be mounted to pod after upgrade to 4.2.1 #142

sieczak-a · 2024-03-07T08:28:16Z

My stack

Kubernetes version - 1.25
Huawei CSI driver upgraded from 3.2.1 to 4.2.1
Device Model: 5500 V5
Version: V500R007C10

Problem

PVC provisioning and resizing work fine, but when I try to mount a volume into a pod, the pod stays in Pending state with the following reason:

"MountVolume.MountDevice failed for volume "pvc-716caca1-d7ef-48d6-b29a-6892280a0dc7" : rpc error: code = Internal desc = publishInfo doesn't exist, PublishContext:map[]"

Solution from docs doesn't work

I've read the documentation, and the solution given in 11.13, "Fail over the workload to another node", doesn't work. The pod fails on all nodes. I thought the problem was related to PVCs created with version 3.2.1, but newly created PVCs have it too.
I also tried deleting and recreating the entire workload, but the problem persists.

Configuration

Here's my configuration (values.yaml):

images:
  # Images provided by Huawei
  huaweiCSIService: myregistry/huawei-csi:4.2.1
  storageBackendSidecar: myregistry/storage-backend-sidecar:4.2.1
  storageBackendController: myregistry/storage-backend-controller:4.2.1

  # CSI-related sidecar images provided by the Kubernetes community.
  # These must match the appropriate Kubernetes version.
  sidecar:
    attacher: k8s.gcr.io/sig-storage/csi-attacher:v3.4.0
    provisioner: k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
    resizer: k8s.gcr.io/sig-storage/csi-resizer:v1.4.0
    registrar: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0
    livenessProbe: k8s.gcr.io/sig-storage/livenessprobe:v2.5.0
    snapshotter: k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.1
    snapshotController: k8s.gcr.io/sig-storage/snapshot-controller:v4.2.1

# Default image pull policy for sidecar container images, support [IfNotPresent, Always, Never]
sidecarImagePullPolicy: "IfNotPresent"

# Default image pull policy for Huawei plugin container images, support [IfNotPresent, Always, Never]
huaweiImagePullPolicy: "IfNotPresent"

# Namespace for installing huawei-csi-nodes and huawei-csi-controllers
kubernetes:
  # the default value huawei-csi is recommended.
  namespace: huawei-csi

# Specify kubelet config dir path.
# kubernetes and openshift is usually /var/lib/kubelet
# Tanzu is usually /var/vcap/data/kubelet
# CCE is usually /mnt/paas/kubernetes/kubelet
kubeletConfigDir: /var/lib/kubelet

CSIDriverObject:
  # isCreate: create CSIDriver Object
  # If the Kubernetes version is lower than 1.18, set this parameter to false.
  # Allowed values:
  #   true: will create CSIDriver object during installation.
  #   false: will not create CSIDriver object during installation.
  # Default value: false
  isCreate: true
  # If the Kubernetes version is lower than 1.20, set this parameter to null.
  # fsGroupPolicy: Defines if the underlying volume supports changing ownership and permission of the volume before being mounted.
  # 'fsGroupPolicy' is only valid when 'isCreate' is true
  # Allowed values:
  #   ReadWriteOnceWithFSType: supports volume ownership and permissions change only if the fsType is defined
  #   and the volume's accessModes contains ReadWriteOnce.
  #   File: kubernetes may use fsGroup to change permissions and ownership of the volume
  #   to match user requested fsGroup in the pod's security policy regardless of fstype or access mode.
  #   None: volumes will be mounted with no modifications.
  # Default value: null
  fsGroupPolicy: ReadWriteOnceWithFSType
  # If the Kubernetes version is lower than 1.18, set this parameter to true.
  # attachRequired: Whether to skip any attach operation altogether.
  # When 'isCreate' is true and 'attachRequired' is false, csi-attacher sidecar will not be deployed
  # Allowed values:
  #   true: attach will be called.
  #   false: attach will be skipped.
  # Default value: true
  attachRequired: false

controller:
  # controllerCount: Define the number of huawei-csi controller
  # Allowed values: n, where n > 0
  # Default value: 1
  # Recommended value: 2
  controllerCount: 1
  
  # volumeNamePrefix: Define a prefix that is prepended to volumes.
  # THIS MUST BE ALL LOWER CASE.
  # Default value: pvc
  # Examples: "volumes", "vol"
  volumeNamePrefix: pvc

  # Port used by the webhook service. The default port is 4433.
  # You can change the port to another port that is not occupied.
  webhookPort: 4433

  snapshot:
    # enabled: Enable/Disable volume snapshot feature
    # If the Kubernetes version is lower than 1.17, set this parameter to false.
    # Allowed values:
    #   true: enable volume snapshot feature(install snapshotter sidecar)
    #   false: disable volume snapshot feature(do not install snapshotter sidecar)
    # Default value: None
    enabled: true

  resizer:
    # enabled: Enable/Disable volume expansion feature
    # Allowed values:
    #   true: enable volume expansion feature(install resizer sidecar)
    #   false: disable volume expansion feature(do not install resizer sidecar)
    # Default value: None
    enabled: true

  # nodeSelector: Define node selection constraints for controller pods.
  # For the pod to be eligible to run on a node, the node must have each
  # of the indicated key-value pairs as labels.
  # Leave as blank to consider all nodes
  # Allowed values: map of key-value pairs
  # Default value: None
  nodeSelector:
  # Uncomment if nodes you wish to use have the node-role.kubernetes.io/master taint
  #  node-role.kubernetes.io/master: ""
  # Uncomment if nodes you wish to use have the node-role.kubernetes.io/control-plane taint
  #  node-role.kubernetes.io/control-plane: ""

  # tolerations: Define tolerations that would be applied to controller deployment
  # Leave as blank to install controller on worker nodes
  # Allowed values: map of key-value pairs
  # Default value: None
  tolerations:
  # Uncomment if nodes you wish to use have the node-role.kubernetes.io/master taint
  #  - key: "node-role.kubernetes.io/master"
  # Uncomment if nodes you wish to use have the node-role.kubernetes.io/control-plane taint
  #  - key: "node-role.kubernetes.io/control-plane"
  #    operator: "Exists"
  #    effect: "NoSchedule"

node:
  # maxVolumesPerNode: Defines the maximum number of volumes that can be used by a node.
  # Examples: 100
  # Uncomment if you want to limit the number of volumes that can be used in a Node.
  # maxVolumesPerNode: 100

  # nodeSelector: Define node selection constraints for node pods.
  # For the pod to be eligible to run on a node, the node must have each
  # of the indicated key-value pairs as labels.
  # Leave as blank to consider all nodes
  # Allowed values: map of key-value pairs
  # Default value: None
  nodeSelector:
  # Uncomment if nodes you wish to use have the node-role.kubernetes.io/master taint
  #  node-role.kubernetes.io/master: ""
  # Uncomment if nodes you wish to use have the node-role.kubernetes.io/control-plane taint
  #  node-role.kubernetes.io/control-plane: ""

  # tolerations: Define tolerations that would be applied to node daemonset
  # Add/Remove tolerations as per requirement
  # Leave as blank if you wish to not apply any tolerations
  # Allowed values: map of key-value pairs
  # Default value: None
  tolerations:
    - key: "node.kubernetes.io/memory-pressure"
      operator: "Exists"
      effect: "NoExecute"
    - key: "node.kubernetes.io/disk-pressure"
      operator: "Exists"
      effect: "NoExecute"
    - key: "node.kubernetes.io/network-unavailable"
      operator: "Exists"
      effect: "NoExecute"
#    - key: "node-role.kubernetes.io/control-plane"
#      operator: "Exists"
#      effect: "NoSchedule"
#    - key: "node-role.kubernetes.io/master"
#      operator: "Exists"
#      effect: "NoSchedule"


# The CSI driver parameter configuration
csiDriver:
  # Driver name, it is strongly recommended not to modify this parameter
  # The CCE platform needs to modify this parameter, e.g. csi.oceanstor.com
  driverName: csi.huawei.com
  # Endpoint, it is strongly recommended not to modify this parameter
  endpoint: /csi/csi.sock
  # DR Endpoint, it is strongly recommended not to modify this parameter
  drEndpoint: /csi/dr-csi.sock
  # Maximum number of concurrent disk scans or detaches, support 1~10
  connectorThreads: 4
  # Flag to enable or disable volume multipath access, support [true, false]
  volumeUseMultipath: true
  # Multipath software used by fc/iscsi. support [DM-multipath, HW-UltraPath, HW-UltraPath-NVMe]
  scsiMultipathType: DM-multipath
  # Multipath software used by roce/fc-nvme. only support [HW-UltraPath-NVMe]
  nvmeMultipathType: HW-UltraPath-NVMe
  # Timeout interval for waiting for multipath aggregation when DM-multipath is used on the host. support 1~600
  scanVolumeTimeout: 3
  # Timeout interval for running command on the host. support 1~600
  execCommandTimeout: 30
  # check the number of paths for multipath aggregation
  # Allowed values:
  #   true: the number of paths aggregated by DM-multipath is equal to the number of online paths
  #   false: the number of paths aggregated by DM-multipath is not checked.
  # Default value: false
  allPathOnline: false
  # Interval for updating backend capabilities. support 60~600
  backendUpdateInterval: 60
  # label enable
  enableLabel: false
  # Huawei-csi-controller log configuration
  controllerLogging:
    # Log record type, support [file, console]
    module: file
    # Log Level, support [debug, info, warning, error, fatal]
    level: info
    # Directory for storing logs
    fileDir: /var/log/huawei
    # Size of a single log file
    fileSize: 20M
    # Maximum number of log files that can be backed up.
    maxBackups: 9
  # Huawei-csi-node log configuration
  nodeLogging:
    # Log record type, support [file, console]
    module: file
    # Log Level, support [debug, info, warning, error, fatal]
    level: info
    # Directory for storing logs
    fileDir: /var/log/huawei
    # Size of a single log file
    fileSize: 20M
    # Maximum number of log files that can be backed up.
    maxBackups: 9

# leaderElection configuration
leaderElection:
  leaseDuration: 8s
  renewDeadline: 6s
  retryPeriod: 2s

A related problem is reported in #137, but this issue has more detailed info.

sieczak-a · 2024-03-12T09:40:48Z

Never mind. Changing the configuration from fsGroupPolicy: ReadWriteOnceWithFSType to fsGroupPolicy: null resolved the problem with mounting volumes.
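
For anyone hitting the same error, the change amounts to roughly the following fragment of values.yaml. This is a sketch based on the configuration posted above: only the fsGroupPolicy line differs, and the other two keys are shown unchanged for context. Whether the CSIDriver object has to be recreated or the chart re-applied for the change to take effect is an assumption to check against your own deployment.

```yaml
CSIDriverObject:
  # Unchanged from the original configuration.
  isCreate: true
  # Was: ReadWriteOnceWithFSType. Setting it to null (the chart's documented
  # default) is the change that resolved the mount failure in this issue.
  fsGroupPolicy: null
  # Unchanged from the original configuration.
  attachRequired: false
```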

sieczak-a closed this as completed Mar 12, 2024
