- Deny List in Operator
- Reconcile Time in Operator
- Finalizer in Druid CR
- Deletion of Orphan PVCs
- Rolling Deploy
- Force Delete of Sts Pods
- Horizontal Scaling of Druid Pods
- Volume Expansion of Druid Pods Running As StatefulSets
- Add Additional Containers to Druid Pods
- Default Yet Configurable Probes
There may be use cases where you want the operator to watch all namespaces except a few
(for security, testing flexibility, or other reasons).
The Druid operator supports such cases - in the chart, set env.DENY_LIST
to a comma-separated list of namespaces.
For example: "default,kube-system"
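A minimal Helm values sketch, assuming the chart exposes these variables under an env map (only the key itself is taken from the text above; the surrounding layout is illustrative):

```yaml
# values.yaml (sketch): the operator ignores namespaces listed in DENY_LIST
env:
  DENY_LIST: "default,kube-system"
```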
As per the operator pattern, the Druid operator reconciles every 10s (the default reconciliation interval) to make sure
the desired state (in this case, the Druid CR's spec) is in sync with the current state.
The reconciliation interval can be adjusted - in the chart, set env.RECONCILE_WAIT
to a duration
in seconds.
Examples: "10s", "30s", "120s"
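A similar values sketch, again assuming the chart passes env entries straight through to the operator:

```yaml
# values.yaml (sketch): reconcile every 30 seconds instead of the 10s default
env:
  RECONCILE_WAIT: "30s"
```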
The Druid operator supports provisioning of StatefulSets and Deployments. When a StatefulSet is created,
PVCs are created along with it. When the Druid CR is deleted, the StatefulSet controller does not delete the PVCs
associated with it.
If the PVC data is important and you wish to retain it, you can set disablePVCDeletionFinalizer: true
in the Druid CR.
The default behavior is to trigger finalizers and pre-delete hooks. These first clean up the
StatefulSet and then the PVCs referenced by it. That means that after
deletion of a Druid CR, any PVCs provisioned by a StatefulSet will be deleted.
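A Druid CR spec fragment (sketch) that keeps PVCs after the CR is deleted; the field is assumed to follow the lower-camelCase convention used by the other flags in this document:

```yaml
spec:
  # skip the PVC cleanup finalizer so data volumes survive CR deletion
  disablePVCDeletionFinalizer: true
```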
There are some use cases (the most popular is horizontal auto-scaling) where a StatefulSet scales down. In that case,
the StatefulSet terminates its owned pods but not their attached PVCs, which are left orphaned and unused.
The operator supports auto-deleting these PVCs. This can be enabled by setting deleteOrphanPvc: true.
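A spec fragment (sketch) enabling orphan PVC cleanup:

```yaml
spec:
  # remove PVCs left behind when a StatefulSet scales down
  deleteOrphanPvc: true
```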
The operator supports Apache Druid's recommended rolling updates. It performs incremental updates in the order
specified in Druid's documentation.
If any node goes into a pending/crashing state during an update, the operator halts the update and does
not continue - this requires manual intervention.
By default, updates are done in parallel. Since cluster creation does not require a rolling update, it is always done
in parallel. To enable this feature, set rollingDeploy: true
in the Druid CR.
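A spec fragment (sketch) enabling rolling deploys:

```yaml
spec:
  # upgrade node groups incrementally, in Druid's recommended order, instead of in parallel
  rollingDeploy: true
```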
During upgrades, if the StatefulSet is set to OrderedReady,
the StatefulSet controller will not recover from a
crash-loop state. The issue is referenced here.
Documentation reference: doc
The operator solves this with the forceDeleteStsPodOnError
key: when it is enabled, the operator deletes an sts pod that is in a
crash-loop state.
Example scenario: during an upgrade, the user rolls out a faulty configuration, causing the historical pods to go into a crashing
state. The user then rolls out a valid configuration - the new configuration will not be applied unless the user manually
deletes the pods. To solve this scenario, the operator deletes the pods automatically without user intervention.
NOTE: Users must be aware of this feature; a crash loop might also be caused by a probe failure, a faulty
image, etc., in which case the operator will keep deleting the pod on each reconcile loop. The default behavior is true.
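A spec fragment (sketch) enabling automatic deletion of crash-looping StatefulSet pods:

```yaml
spec:
  # delete sts pods stuck in a crash loop so a corrected configuration can roll out
  forceDeleteStsPodOnError: true
```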
The operator supports the HPA autoscaling/v2
specification in the nodeSpec
for Druid nodes. When an HPA is deployed,
the HPA controller maintains the replica count/state for the referenced workload.
Refer to examples.md
for HPA configuration.
NOTE: It is currently preferred to scale only Brokers using HPA; it is not recommended to scale MiddleManagers
with HPA. Refer to these discussions, which address the issues in detail:
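A sketch of HPA settings attached to a broker node group; the hpAutoscaler key and the nodes map key (brokers) are assumptions here - see examples.md for the exact field names - while the nested structure follows the standard autoscaling/v2 HorizontalPodAutoscalerSpec:

```yaml
spec:
  nodes:
    brokers:
      nodeType: broker
      # assumed field accepting an autoscaling/v2 HorizontalPodAutoscalerSpec
      hpAutoscaler:
        minReplicas: 2
        maxReplicas: 6
        scaleTargetRef:
          apiVersion: apps/v1
          kind: StatefulSet
          name: druid-cluster-brokers   # hypothetical generated StatefulSet name
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 60
```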
NOTE: This feature has been tested only on cloud environments and storage classes that support volume expansion.
This feature uses the cascade=orphan strategy to make sure that only the StatefulSet is deleted and recreated and the pods
are not deleted.
Druid nodes (specifically historical nodes) run as StatefulSets. Each StatefulSet replica has a PVC attached. The
NodeSpec
in the Druid CR has the key volumeClaimTemplates,
where users can define the PVC's storage class as well
as its size. Currently, in Kubernetes, when a user wants to increase the volume size for a node, the StatefulSet cannot
be updated directly. The Druid operator can perform a seamless update of the StatefulSet and patch the
PVCs with the desired size defined in the Druid CR. Behind the scenes, the operator performs a cascade deletion of the
StatefulSet and patches the PVCs. The cascade deletion has no effect on the running pods (queries are served and no
downtime is experienced).
When this feature is enabled, the operator checks whether volume expansion is supported by the storage class mentioned
in the Druid CR; only then will it perform the expansion.
This feature is disabled by default. To enable it, set scalePvcSts: true
in the Druid CR.
IMPORTANT: Shrinking of PVCs is not supported - desiredSize cannot be less than currentSize, and the same applies to counts.
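A sketch of a historical node group with a volume claim template and the expansion flag enabled; the nodes map key, storage class, and sizes are illustrative:

```yaml
spec:
  # allow the operator to patch PVC sizes to match the template below
  scalePvcSts: true
  nodes:
    historicals:
      nodeType: historical
      volumeClaimTemplates:
        - metadata:
            name: historical-volume
          spec:
            accessModes:
              - ReadWriteOnce
            storageClassName: gp3        # must support volume expansion
            resources:
              requests:
                storage: 200Gi           # raise this value to trigger expansion
```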
The operator supports adding additional containers to run along with the Druid pods. This helps support co-located,
co-managed helper processes for the primary Druid application. This can be used for init containers, sidecars,
proxies, etc.
To enable this feature, users just need to add new containers to the AdditionalContainers
in the Druid spec API.
There are two scopes at which you can add additional containers (see the sketch after this list):
- Cluster scope: under spec.additionalContainer,
  which means that the additional containers will be common to all the nodes.
- Node scope: under spec.nodes[NODE_TYPE].additionalContainer,
  which means that the additional containers will be common to all the pods within a specific node group.
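A sketch of sidecars added at both scopes; the container field names below (containerName, image, command) are assumptions based on common container definitions - consult the Druid spec API for the exact schema:

```yaml
spec:
  # Cluster scope: injected into every Druid pod
  additionalContainer:
    - containerName: log-shipper                # hypothetical sidecar
      image: fluent/fluent-bit:2.2              # hypothetical image
      command: ["/fluent-bit/bin/fluent-bit"]
  nodes:
    historicals:
      nodeType: historical
      # Node scope: injected only into pods of this node group
      additionalContainer:
        - containerName: cache-warmer           # hypothetical helper
          image: busybox:1.36
          command: ["sh", "-c", "sleep 3600"]
```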
The operator creates the Deployments and StatefulSets with a default set of probes for each Druid component.
These probes can be overridden by adding one of the probes in the DruidSpec
(global) or under the
NodeSpec
(component scope).
This feature is enabled by default.
Set defaultProbes: false
if you have the kubernetes-overlord-extensions
enabled (also known as MiddleManager-less Druid in Kubernetes);
more details are described here: #97 (comment)
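A sketch that disables the default probes globally and overrides a single probe on one node group; the probe keys are assumed to mirror the Kubernetes probe names shown in the defaults below, and the port is illustrative:

```yaml
spec:
  defaultProbes: false          # e.g. when kubernetes-overlord-extensions is enabled
  nodes:
    brokers:
      nodeType: broker
      # component-scope override replacing the operator's default readiness probe
      readinessProbe:
        httpGet:
          path: /druid/broker/v1/readiness
          port: 8088            # hypothetical broker port
        failureThreshold: 30
        periodSeconds: 10
```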
All the probe definitions are documented below:
Coordinator, Overlord, MiddleManager, Router and Indexer probes
```yaml
livenessProbe:
  failureThreshold: 10
  httpGet:
    path: /status/health
    port: $druid.port
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
readinessProbe:
  failureThreshold: 10
  httpGet:
    path: /status/health
    port: $druid.port
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
startupProbe:
  failureThreshold: 10
  httpGet:
    path: /status/health
    port: $druid.port
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
```
Broker probes
```yaml
livenessProbe:
  failureThreshold: 10
  httpGet:
    path: /status/health
    port: $druid.port
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
readinessProbe:
  failureThreshold: 20
  httpGet:
    path: /druid/broker/v1/readiness
    port: $druid.port
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
startupProbe:
  failureThreshold: 20
  httpGet:
    path: /druid/broker/v1/readiness
    port: $druid.port
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
```
Historical probes
```yaml
livenessProbe:
  failureThreshold: 10
  httpGet:
    path: /status/health
    port: $druid.port
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
readinessProbe:
  failureThreshold: 20
  httpGet:
    path: /druid/historical/v1/readiness
    port: $druid.port
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
startupProbe:
  failureThreshold: 20
  httpGet:
    path: /druid/historical/v1/readiness
    port: $druid.port
  initialDelaySeconds: 180
  periodSeconds: 30
  successThreshold: 1
  timeoutSeconds: 10
```