Performance with one vs multiple output configurations #7273
-
I have the following use case:
There are two ways I can configure the output(s):

1. Configure a single output plugin that writes logs coming out of any of these 4 namespaces to the destination.
   As I see it, the Match_Regex will be huge for 100 namespaces.
2. Configure an output plugin per namespace, each writing logs to the same destination in parallel, and so on.
   For 100 namespaces I'll now have 100 outputs.

My questions are:
cc: @patrick-stephens, in case you could help answer these questions or know anyone who can.
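For reference, a rough sketch of what I mean by the two options, assuming the default `kube.*` tagging from a tail input over `/var/log/containers` (the `es` output, the host, and the `ns-a` ... `ns-d` namespace names are just placeholders):

```
# Option 1: a single output whose Match_Regex covers every namespace
[OUTPUT]
    Name          es
    Match_Regex   kube\.var\.log\.containers\..*_(ns-a|ns-b|ns-c|ns-d)_.*
    Host          my-es-host

# Option 2: one output per namespace, all pointing at the same destination
[OUTPUT]
    Name          es
    Match_Regex   kube\.var\.log\.containers\..*_ns-a_.*
    Host          my-es-host

[OUTPUT]
    Name          es
    Match_Regex   kube\.var\.log\.containers\..*_ns-b_.*
    Host          my-es-host

# ...and so on, up to 100 outputs for 100 namespaces
```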
-
Why not just use a grep filter to drop the namespaces you do not care about, then a single output matching anything that is left? input --> kubernetes filter (to get metadata) --> grep on the kubernetes namespace key --> output
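A minimal sketch of that pipeline, assuming a standard tail input over `/var/log/containers` and placeholder namespace names (`ns-a`, `ns-b`):

```
[INPUT]
    Name     tail
    Path     /var/log/containers/*.log
    Tag      kube.*

[FILTER]
    # Enrich records with Kubernetes metadata (namespace, pod, labels, ...)
    Name     kubernetes
    Match    kube.*

[FILTER]
    # Keep only the namespaces we care about; everything else is dropped here
    Name     grep
    Match    kube.*
    Regex    $kubernetes['namespace_name'] ^(ns-a|ns-b)$

[OUTPUT]
    # A single output then matches whatever survives the grep
    Name     stdout
    Match    kube.*
```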
-
You could also modify the tail input to only tail the namespaces you care about. The namespace is part of the filename, I believe. You could either do this in one tail input or, maybe better, have multiple tail inputs, which stops a noisy namespace's logs from starving the other namespaces, i.e. a noisy-neighbour problem.
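If it helps, a sketch of the multiple-tail-input variant, relying on the namespace appearing in the container log filename (`/var/log/containers/<pod>_<namespace>_<container>-<id>.log`); `ns-a`, `ns-b` and the `stdout` output are placeholders:

```
# One tail input per namespace, so a noisy namespace fills its own buffers
# rather than starving the others
[INPUT]
    Name    tail
    Path    /var/log/containers/*_ns-a_*.log
    Tag     ns-a.*

[INPUT]
    Name    tail
    Path    /var/log/containers/*_ns-b_*.log
    Tag     ns-b.*

[OUTPUT]
    # A single output can still match everything that was tailed
    Name    stdout
    Match   *
```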
-
All these questions are pretty subjective, based on your actual data and infrastructure configuration, I would say. Theoretical answers can probably be provided, but honestly it would be easy to verify performance directly on your cluster and confirm actual results with your specific log files, data rates and pod configuration.
No worries, and I'll try to answer without a generic "it depends" (but it does!) :)
Multiple outputs should not cause a problem in the simple, happy-path scenario where all outputs are reachable, i.e. no backpressure.
If there is backpressure then you need to decide what to do: should we block the input to allow things to catch up (and what if it never does?), how many times should we retry, how much buffering do you want, and should it be persistent or in-memory, etc.
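To make that concrete, a sketch of the knobs involved (the values are illustrative, not recommendations):

```
[SERVICE]
    # Filesystem buffering so chunks can spill to disk and survive restarts
    storage.path               /var/log/flb-storage/
    storage.sync               normal
    storage.backlog.mem_limit  50M

[INPUT]
    Name           tail
    Path           /var/log/containers/*.log
    Tag            kube.*
    # Persistent buffering for this input instead of memory-only
    storage.type   filesystem
    # Cap on in-memory buffering before the input is paused (backpressure)
    Mem_Buf_Limit  50M

[OUTPUT]
    Name                      stdout
    Match                     *
    # How many times to retry a failed flush before the chunk is discarded
    Retry_Limit               5
    # Cap on how much this output may keep queued in the filesystem buffer
    storage.total_limit_size  500M
```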
And obviously it still depends on data rates, but this is no different from a single pipeline having to handle lots of data, i.e. can you actually process the required data rate with the CPU you have available? Chunking it up into multipl…