Improve the documentation about how to delete over-resource-used pods in "Multi Hierarchy Elastic Quota Management" #138

Open
jiasheng55 opened this issue Jul 11, 2023 · 2 comments

Comments

@jiasheng55
Contributor

https://github.com/koordinator-sh/website/blob/main/docs/user-manuals/multi-hierarchy-elastic-quota-management.md#used--runtime-revoke

Suggestions:

  1. To delete pods whose quota uses more resources than its runtime allows, we need to set enableCheckParentQuota: true in koord-scheduler-config; otherwise no pods are revoked.
  2. To let the scheduler evict pods, we need to add the following rule to the koord-scheduler-role ClusterRole; otherwise we get error messages like User "system:serviceaccount:koordinator-system:koord-scheduler" cannot create resource "pods/eviction" in API group "" in the namespace "xxx"
- apiGroups:
  - ""
  resources:
  - pods/eviction
  verbs:
  - create
@xulinfei1996

  1. To delete over-resource-used pods, we need to set enableCheckParentQuota: true in koord-scheduler-config, otherwise there would be no pods revocation.
    If enableCheckParentQuota is true, PreFilter also checks the parent quota: once the pod's request fits within the child quota, it checks whether the parent quota's RuntimeQuota and UsedQuota can satisfy the pod's request as well. Its purpose is to keep the parent quota from being over-used.
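
A minimal sketch of that check, to make the order of the two comparisons concrete. This is not the Koordinator implementation: the `Quota` class, `fits`, and `prefilter_admits` are hypothetical names, values are plain integers in Gi, and only the direct parent is checked because that is all the description above states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quota:
    """Hypothetical stand-in for an ElasticQuota group; values are in Gi."""
    name: str
    runtime: int
    used: int
    parent: Optional["Quota"] = None


def fits(quota: Quota, request: int) -> bool:
    # The request fits if used + request stays within the runtime quota.
    return quota.used + request <= quota.runtime


def prefilter_admits(quota: Quota, request: int,
                     enable_check_parent_quota: bool) -> bool:
    # Step 1: the child quota must always have room for the request.
    if not fits(quota, request):
        return False
    # Step 2 (optional): the parent quota must have room as well.
    if enable_check_parent_quota and quota.parent is not None:
        return fits(quota.parent, request)
    return True


# A parent that is already fully used, and a child with free runtime.
parent = Quota("parent", runtime=10, used=10)
child_b = Quota("child-b", runtime=5, used=0, parent=parent)

print(prefilter_admits(child_b, 1, enable_check_parent_quota=False))
print(prefilter_admits(child_b, 1, enable_check_parent_quota=True))
```

With the parent check off, the 1Gi pod is admitted because the child has room; with it on, the same pod is rejected because the parent is already full.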

A parent quota can become over-used in a case like this:
Parent max is 10Gi, child A max is 10Gi, and child B max is 10Gi.
At time t0, only child A submits pods, and its runtime and used are both 10Gi.
At time t1, child B submits 10 pods, each requesting 1Gi, so the runtime of child A and of child B becomes 5Gi each. 5 of child B's pods pass the ElasticQuota check, so child B's used is 5Gi while child A's used is still 10Gi. The parent's used is therefore 15Gi, which is larger than its runtime.
If enableCheckParentQuota is true, all 10 of child B's pods stay Pending because of the quota limit, but child A's used is still larger than its runtime. If over-use-revoke is enabled, child B's pods can run after child A's pods are evicted.
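
The numbers in this case can be replayed with a small sketch. It is a simplified model, not the scheduler's logic: `admit_pods` is a hypothetical helper, the 5Gi/5Gi runtime split at t1 is taken from the description above rather than computed, and eviction is assumed to bring child A's used down exactly to its runtime.

```python
PARENT_RUNTIME = 10           # Gi, parent max and runtime
CHILD_RUNTIME_T1 = 5          # Gi, runtime of child A and child B at t1
CHILD_B_REQUESTS = [1] * 10   # child B submits 10 pods of 1Gi each


def admit_pods(requests, child_runtime, sibling_used, check_parent):
    """Return (admitted_count, child_used) after trying each pod in order."""
    used = 0
    admitted = 0
    for request in requests:
        if used + request > child_runtime:
            continue  # rejected by the child quota
        if check_parent and sibling_used + used + request > PARENT_RUNTIME:
            continue  # rejected by the parent quota
        used += request
        admitted += 1
    return admitted, used


child_a_used = 10  # child A filled the parent at t0

# Parent check disabled: 5 pods pass, and the parent ends up over-used.
admitted, b_used = admit_pods(CHILD_B_REQUESTS, CHILD_RUNTIME_T1, child_a_used, False)
print(admitted, child_a_used + b_used)   # 5 pods, parent used 15Gi

# Parent check enabled: every pod of child B stays Pending.
admitted, b_used = admit_pods(CHILD_B_REQUESTS, CHILD_RUNTIME_T1, child_a_used, True)
print(admitted, child_a_used + b_used)   # 0 pods, parent used 10Gi

# Revoke: child A is 5Gi over its runtime; assume eviction removes that excess.
over_use = child_a_used - CHILD_RUNTIME_T1
child_a_used -= over_use
admitted, b_used = admit_pods(CHILD_B_REQUESTS, CHILD_RUNTIME_T1, child_a_used, True)
print(over_use, admitted, child_a_used + b_used)   # 5Gi revoked, 5 pods, parent used 10Gi
```

In this model the parent check alone only stops the over-use from growing; child B can make progress once revocation has reduced child A's used to its runtime.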

@jiasheng55
Contributor Author

@xulinfei1996 Sorry, I don't quite understand your comment. Are you saying that we should not set enableCheckParentQuota: true to enable "used > runtime revoke"? Or that we should add more documentation about the consequences of setting it?

jiasheng55 pushed a commit to jiasheng55/koordinator-website that referenced this issue Aug 2, 2023
koordinator-bot bot pushed a commit that referenced this issue Aug 2, 2023
… in elastic quota management (#138) (#139)

Signed-off-by: Victor Wang <[email protected]>
Co-authored-by: wangjiasheng <[email protected]>