NETOBSERV-1692: Add FLP-based deduper options
FLP-based dedup greatly reduces Loki CPU, memory, and storage usage (~50%) at the cost of a minimal loss in data accuracy (e.g. losing the interfaces involved in egress traffic).
jotak committed Oct 15, 2024
1 parent 63c1b7e commit 82f5360
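
The three deduper modes documented in the `FLPDeduper` type can be sketched as a per-flow decision. This is only an illustration of the semantics described in the field comments, not the flow processor's actual implementation: `keepFlow` and `combinedSampling` are hypothetical helpers, and treating a `sampling` value of 0 or 1 as "keep everything" is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// FLPDeduperMode mirrors the enum added by this commit.
type FLPDeduperMode string

const (
	FLPDeduperDisabled FLPDeduperMode = "Disabled"
	FLPDeduperDrop     FLPDeduperMode = "Drop"
	FLPDeduperSample   FLPDeduperMode = "Sample"
)

// keepFlow is a hypothetical helper: it reports whether a flow should be
// kept, given the deduper mode and whether the flow is flagged as a duplicate.
func keepFlow(mode FLPDeduperMode, sampling int32, isDuplicate bool, rng *rand.Rand) bool {
	if !isDuplicate {
		return true // only duplicates are ever dropped or sampled
	}
	switch mode {
	case FLPDeduperDrop:
		return false // drop every duplicate
	case FLPDeduperSample:
		if sampling <= 1 {
			return true // assumption: 0 or 1 disables sampling
		}
		return rng.Int31n(sampling) == 0 // keep roughly 1 out of `sampling`
	default:
		return true // Disabled (or unset): keep duplicates
	}
}

// combinedSampling is a hypothetical helper: it returns N for the overall
// 1:N ratio when Agent and Processor sampling both apply to a flow.
func combinedSampling(agent, processor int32) int64 {
	return int64(agent) * int64(processor)
}

func main() {
	rng := rand.New(rand.NewSource(1))
	kept := 0
	for i := 0; i < 100000; i++ {
		if keepFlow(FLPDeduperSample, 50, true, rng) {
			kept++
		}
	}
	fmt.Printf("kept %d of 100000 duplicates\n", kept)
	fmt.Printf("combined sampling 1:%d\n", combinedSampling(50, 50))
}
```

The multiplication in `combinedSampling` is the arithmetic behind the 1:2500 figure quoted in the `mode` description: 50 on the Agent times 50 on the Processor.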
Showing 11 changed files with 385 additions and 5 deletions.
28 changes: 28 additions & 0 deletions apis/flowcollector/v1beta1/flowcollector_types.go
@@ -604,13 +604,41 @@ type FlowCollectorFLP struct {
// When a subnet matches the source or destination IP of a flow, a corresponding field is added: `SrcSubnetLabel` or `DstSubnetLabel`.
SubnetLabels SubnetLabels `json:"subnetLabels,omitempty"`

//+optional
// `deduper` allows you to sample or drop flows identified as duplicates, in order to save on resource usage.
Deduper *FLPDeduper `json:"deduper,omitempty"`

// `debug` allows setting some aspects of the internal configuration of the flow processor.
// This section is aimed exclusively for debugging and fine-grained performance optimizations,
// such as `GOGC` and `GOMAXPROCS` env vars. Set these values at your own risk.
// +optional
Debug DebugConfig `json:"debug,omitempty"`
}

type FLPDeduperMode string

const (
FLPDeduperDisabled FLPDeduperMode = "Disabled"
FLPDeduperDrop FLPDeduperMode = "Drop"
FLPDeduperSample FLPDeduperMode = "Sample"
)

// `FLPDeduper` defines the desired configuration for FLP-based deduper
type FLPDeduper struct {
// Set the Processor deduper mode (de-duplication). It comes in addition to the Agent deduper because the Agent cannot de-duplicate identical flows reported from different nodes.<br>
// - Use `Drop` to drop every flow considered a duplicate. This saves more on resource usage, but can lose some information, such as the network interfaces used by the peer.<br>
// - Use `Sample` to randomly keep only 1 flow out of 50 (by default) among those considered duplicates. This is a compromise between dropping all duplicates and keeping all of them. This sampling comes in addition to the Agent-based sampling: if both Agent and Processor sampling are 50, the combined sampling is 1:2500.<br>
// - Use `Disabled` to turn off Processor-based de-duplication.<br>
// +kubebuilder:validation:Enum:="Disabled";"Drop";"Sample"
// +kubebuilder:default:=Disabled
Mode FLPDeduperMode `json:"mode,omitempty"`

// `sampling` is the sampling rate when deduper `mode` is `Sample`.
//+kubebuilder:validation:Minimum=0
//+kubebuilder:default:=50
Sampling int32 `json:"sampling,omitempty"`
}

const (
HPAStatusDisabled = "DISABLED"
HPAStatusEnabled = "ENABLED"
34 changes: 34 additions & 0 deletions apis/flowcollector/v1beta1/zz_generated.conversion.go

Some generated files are not rendered by default.

20 changes: 20 additions & 0 deletions apis/flowcollector/v1beta1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

28 changes: 28 additions & 0 deletions apis/flowcollector/v1beta2/flowcollector_types.go
@@ -657,13 +657,41 @@ type FlowCollectorFLP struct {
// When a subnet matches the source or destination IP of a flow, a corresponding field is added: `SrcSubnetLabel` or `DstSubnetLabel`.
SubnetLabels SubnetLabels `json:"subnetLabels,omitempty"`

//+optional
// `deduper` allows you to sample or drop flows identified as duplicates, in order to save on resource usage.
Deduper *FLPDeduper `json:"deduper,omitempty"`

// `advanced` allows setting some aspects of the internal configuration of the flow processor.
// This section is aimed mostly for debugging and fine-grained performance optimizations,
// such as `GOGC` and `GOMAXPROCS` env vars. Set these values at your own risk.
// +optional
Advanced *AdvancedProcessorConfig `json:"advanced,omitempty"`
}

type FLPDeduperMode string

const (
FLPDeduperDisabled FLPDeduperMode = "Disabled"
FLPDeduperDrop FLPDeduperMode = "Drop"
FLPDeduperSample FLPDeduperMode = "Sample"
)

// `FLPDeduper` defines the desired configuration for FLP-based deduper
type FLPDeduper struct {
// Set the Processor deduper mode (de-duplication). It comes in addition to the Agent deduper because the Agent cannot de-duplicate identical flows reported from different nodes.<br>
// - Use `Drop` to drop every flow considered a duplicate. This saves more on resource usage, but can lose some information, such as the network interfaces used by the peer.<br>
// - Use `Sample` to randomly keep only 1 flow out of 50 (by default) among those considered duplicates. This is a compromise between dropping all duplicates and keeping all of them. This sampling comes in addition to the Agent-based sampling: if both Agent and Processor sampling are 50, the combined sampling is 1:2500.<br>
// - Use `Disabled` to turn off Processor-based de-duplication.<br>
// +kubebuilder:validation:Enum:="Disabled";"Drop";"Sample"
// +kubebuilder:default:=Disabled
Mode FLPDeduperMode `json:"mode,omitempty"`

// `sampling` is the sampling rate when deduper `mode` is `Sample`.
//+kubebuilder:validation:Minimum=0
//+kubebuilder:default:=50
Sampling int32 `json:"sampling,omitempty"`
}
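
As a rough illustration of how the `json` tags above shape the serialized `processor.deduper` section, the following sketch marshals a copy of the struct with Go's standard `encoding/json`. The `encode` helper is hypothetical and exists only for this example. Because both fields use `omitempty`, a zero `Sampling` is left out of the JSON entirely; presumably the CRD default of 50 would then apply on the API server side, but that is an assumption about defaulting, not something this commit states.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the types from this diff, so the example is self-contained.
type FLPDeduperMode string

const FLPDeduperSample FLPDeduperMode = "Sample"

type FLPDeduper struct {
	Mode     FLPDeduperMode `json:"mode,omitempty"`
	Sampling int32          `json:"sampling,omitempty"`
}

// encode is a hypothetical helper that returns the JSON form of a deduper config.
func encode(d FLPDeduper) string {
	b, err := json.Marshal(d)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Explicit sampling: both fields are serialized.
	fmt.Println(encode(FLPDeduper{Mode: FLPDeduperSample, Sampling: 50}))
	// Zero sampling: the field is omitted because of `omitempty`.
	fmt.Println(encode(FLPDeduper{Mode: FLPDeduperSample}))
}
```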

type HPAStatus string

const (
20 changes: 20 additions & 0 deletions apis/flowcollector/v1beta2/zz_generated.deepcopy.go

Some generated files are not rendered by default.

48 changes: 48 additions & 0 deletions bundle/manifests/flows.netobserv.io_flowcollectors.yaml
@@ -1739,6 +1739,30 @@ spec:
in edge debug or support scenarios.
type: object
type: object
deduper:
description: '`deduper` allows you to sample or drop flows identified
as duplicates, in order to save on resource usage.'
properties:
mode:
default: Disabled
description: |-
Set the Processor deduper mode (de-duplication). It comes in addition to the Agent deduper because the Agent cannot de-duplicate identical flows reported from different nodes.<br>
- Use `Drop` to drop every flow considered a duplicate. This saves more on resource usage, but can lose some information, such as the network interfaces used by the peer.<br>
- Use `Sample` to randomly keep only 1 flow out of 50 (by default) among those considered duplicates. This is a compromise between dropping all duplicates and keeping all of them. This sampling comes in addition to the Agent-based sampling: if both Agent and Processor sampling are 50, the combined sampling is 1:2500.<br>
- Use `Disabled` to turn off Processor-based de-duplication.<br>
enum:
- Disabled
- Drop
- Sample
type: string
sampling:
default: 50
description: '`sampling` is the sampling rate when deduper
`mode` is `Sample`.'
format: int32
minimum: 0
type: integer
type: object
dropUnusedFields:
default: true
description: '`dropUnusedFields` [deprecated (*)] this setting
@@ -7819,6 +7843,30 @@ spec:
in the flows data. This is useful in a multi-cluster context.
When using OpenShift, leave empty to make it automatically determined.'
type: string
deduper:
description: '`deduper` allows you to sample or drop flows identified
as duplicates, in order to save on resource usage.'
properties:
mode:
default: Disabled
description: |-
Set the Processor deduper mode (de-duplication). It comes in addition to the Agent deduper because the Agent cannot de-duplicate identical flows reported from different nodes.<br>
- Use `Drop` to drop every flow considered a duplicate. This saves more on resource usage, but can lose some information, such as the network interfaces used by the peer.<br>
- Use `Sample` to randomly keep only 1 flow out of 50 (by default) among those considered duplicates. This is a compromise between dropping all duplicates and keeping all of them. This sampling comes in addition to the Agent-based sampling: if both Agent and Processor sampling are 50, the combined sampling is 1:2500.<br>
- Use `Disabled` to turn off Processor-based de-duplication.<br>
enum:
- Disabled
- Drop
- Sample
type: string
sampling:
default: 50
description: '`sampling` is the sampling rate when deduper
`mode` is `Sample`.'
format: int32
minimum: 0
type: integer
type: object
imagePullPolicy:
default: IfNotPresent
description: '`imagePullPolicy` is the Kubernetes pull policy
@@ -790,6 +790,12 @@ spec:
path: networkPolicy.additionalNamespaces
- displayName: Enable
path: networkPolicy.enable
- displayName: Deduper
path: processor.deduper
- displayName: Mode
path: processor.deduper.mode
- displayName: Sampling
path: processor.deduper.sampling
- displayName: Log types
path: processor.logTypes
- displayName: Disable alerts
44 changes: 44 additions & 0 deletions config/crd/bases/flows.netobserv.io_flowcollectors.yaml
@@ -1574,6 +1574,28 @@ spec:
in edge debug or support scenarios.
type: object
type: object
deduper:
description: '`deduper` allows you to sample or drop flows identified as duplicates, in order to save on resource usage.'
properties:
mode:
default: Disabled
description: |-
Set the Processor deduper mode (de-duplication). It comes in addition to the Agent deduper because the Agent cannot de-duplicate identical flows reported from different nodes.<br>
- Use `Drop` to drop every flow considered a duplicate. This saves more on resource usage, but can lose some information, such as the network interfaces used by the peer.<br>
- Use `Sample` to randomly keep only 1 flow out of 50 (by default) among those considered duplicates. This is a compromise between dropping all duplicates and keeping all of them. This sampling comes in addition to the Agent-based sampling: if both Agent and Processor sampling are 50, the combined sampling is 1:2500.<br>
- Use `Disabled` to turn off Processor-based de-duplication.<br>
enum:
- Disabled
- Drop
- Sample
type: string
sampling:
default: 50
description: '`sampling` is the sampling rate when deduper `mode` is `Sample`.'
format: int32
minimum: 0
type: integer
type: object
dropUnusedFields:
default: true
description: '`dropUnusedFields` [deprecated (*)] this setting is not used anymore.'
@@ -7209,6 +7231,28 @@ spec:
default: ""
description: '`clusterName` is the name of the cluster to appear in the flows data. This is useful in a multi-cluster context. When using OpenShift, leave empty to make it automatically determined.'
type: string
deduper:
description: '`deduper` allows you to sample or drop flows identified as duplicates, in order to save on resource usage.'
properties:
mode:
default: Disabled
description: |-
Set the Processor deduper mode (de-duplication). It comes in addition to the Agent deduper because the Agent cannot de-duplicate identical flows reported from different nodes.<br>
- Use `Drop` to drop every flow considered a duplicate. This saves more on resource usage, but can lose some information, such as the network interfaces used by the peer.<br>
- Use `Sample` to randomly keep only 1 flow out of 50 (by default) among those considered duplicates. This is a compromise between dropping all duplicates and keeping all of them. This sampling comes in addition to the Agent-based sampling: if both Agent and Processor sampling are 50, the combined sampling is 1:2500.<br>
- Use `Disabled` to turn off Processor-based de-duplication.<br>
enum:
- Disabled
- Drop
- Sample
type: string
sampling:
default: 50
description: '`sampling` is the sampling rate when deduper `mode` is `Sample`.'
format: int32
minimum: 0
type: integer
type: object
imagePullPolicy:
default: IfNotPresent
description: '`imagePullPolicy` is the Kubernetes pull policy for the image defined above'