Race condition: two PVCs get the same project quota #155

andreasreinhardt opened this issue Mar 31, 2023 · 1 comment
Describe the bug: We use localpv with ext4 hard quotas. They work quite well, but from time to time the quota is reported as exceeded even though the folder contains less than the defined quota (10GiB). Today I was able to track the problem down to 2 PVCs that obviously had the same project quota ID set:

/nvme/disk# ls
lost+found  pvc-2fabebc9-8143-4b60-beef-563180845e64  pvc-6d3a015a-c547-4292-9ed6-95b35a7aea41

/nvme/disk/pvc-6d3a015a-c547-4292-9ed6-95b35a7aea41# du -h --max-depth=1
4.2G	./workspace
33M	./remoting
8.0K	./caches
4.3G	.

/nvme/disk# du -h --max-depth=1
6.1G	./pvc-2fabebc9-8143-4b60-beef-563180845e64
16K	./lost+found
4.3G	./pvc-6d3a015a-c547-4292-9ed6-95b35a7aea41
11G	.

/nvme/disk# repquota -avugP
*** Report for project quotas on device /dev/md0
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
Project         used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
#0        --      20       0       0              2     0     0       
#1        --       0 10737419 10737419              0     0     0       
#2        --       0 10737419 10737419              0     0     0       
#3        --       0 10737419 10737419              0     0     0       
#4        -- 10737416 10737419 10737419           6122     0     0       
#5        --       0 10737419 10737419              0     0     0       
#6        --       0 10737419 10737419              0     0     0       

I think the problem occurs because of a race condition when determining the project id:
https://github.com/openebs/dynamic-localpv-provisioner/blob/e797585cb1e2c3578b914102bfe0e8768b04d950/cmd/provisioner-localpv/app/helper_hostpath.go#L294+L295
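
To make the race concrete, here is a minimal, self-contained Go sketch (illustrative only, not the provisioner's actual code): if two quota jobs each compute the next project ID as "current maximum + 1" before either of them has applied its quota, both end up with the same ID.

package main

import (
    "fmt"
    "sync"
)

// nextProjectID mimics a "highest existing project ID + 1" scheme.
func nextProjectID(existing map[int]bool) int {
    max := 0
    for id := range existing {
        if id > max {
            max = id
        }
    }
    return max + 1 // not atomic: a concurrent job can compute the same value
}

func main() {
    // Project IDs already in use on the node (as a project quota report would show them).
    existing := map[int]bool{1: true, 2: true, 3: true}

    var wg sync.WaitGroup
    ids := make([]int, 2)
    for i := range ids {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            // Both jobs read the same snapshot before either applies its quota.
            ids[i] = nextProjectID(existing)
        }(i)
    }
    wg.Wait()
    fmt.Println(ids) // prints [4 4]: two PVCs would share one project quota
}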

I see two possible workarounds: either make sure that only one create-quota pod can run at a time on a single node, or apply a random project number instead of trying to increment them.
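
As a rough illustration of the first workaround (serializing quota assignment per node), the ID selection and quota application could be wrapped in an advisory file lock. This is only a sketch under assumed names (withNodeLock, the lock-file location on the hostpath), not the provisioner's API:

package main

import (
    "os"

    "golang.org/x/sys/unix"
)

// withNodeLock runs fn while holding an exclusive advisory lock on lockPath,
// so only one quota-assignment step can run at a time on this filesystem.
func withNodeLock(lockPath string, fn func() error) error {
    f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0o600)
    if err != nil {
        return err
    }
    defer f.Close()

    if err := unix.Flock(int(f.Fd()), unix.LOCK_EX); err != nil {
        return err
    }
    defer unix.Flock(int(f.Fd()), unix.LOCK_UN)

    return fn()
}

func main() {
    _ = withNodeLock("/nvme/disk/.quota.lock", func() error {
        // determine the next project ID and apply the quota here
        return nil
    })
}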

Expected behaviour: Each PVC has the quota it is configured with.

Steps to reproduce the bug:
Unfortunately, it is really hard to reproduce the bug, as it only happens now and then. During tests I scaled a deployment with a PVC up and down very quickly to check the create and cleanup paths and had no problem. Maybe you can reproduce it with more than one deployment scaled up in parallel.

The output of the following commands will help us better understand what's going on:

  • kubectl get pods -n <openebs_namespace> --show-labels
    nvme-provisioner-localpv-provisioner-68f8494cf7-84hdv 1/1 Running 80 (12h ago) 32d app=localpv-provisioner,chart=localpv-provisioner-3.3.0,component=localpv-provisioner,heritage=Helm,name=openebs-localpv-provisioner,openebs.io/component-name=openebs-localpv-provisioner,openebs.io/version=3.3.0,pod-template-hash=68f8494cf7,release=nvme-provisioner

Anything else we need to know?:
The provisioner pod has lots of restarts and we don't know why; there is no error in the pod log, but it does not seem to be related.

Environment details:

  • OpenEBS version (use kubectl get po -n openebs --show-labels): 3.3.0
  • Kubernetes version (use kubectl version): 1.23.15
  • Cloud provider or hardware configuration: AWS
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • kernel (e.g: uname -a): 5.4.228-131.415.amzn2.x86_64
niladrih added the bug label on Jun 26, 2024
@niladrih (Member) commented:

The provisioning jobs are asynchronous, so the issue makes sense to me. I understand it'd be difficult to reproduce, so I'm not going to try it; it is clearly apparent that the race exists. Thank you for reporting this!
