
Fixing timestamp manually seems to work in VRL but throw errors when applied to Vector config #21812

Open
michellabbe opened this issue Nov 15, 2024 · 2 comments
Labels
type: bug A code related bug.

Comments

@michellabbe

michellabbe commented Nov 15, 2024


Problem

I'm exporting Traefik access logs in a JSON file, and using Vector (timberio/vector:latest-alpine docker image) to forward the logs to a Graylog server.

The configuration below works fine, except that the timestamp in the access log isn't recognized.

Traefik saves the timestamp in ISO 8601 format in a field named time, and the final timestamp in Graylog differs from the time field:

message = [...],"time":"2024-11-15T08:53:48-05:00"}
time      = 2024-11-15 08:53:48.000  (field type = date)
timestamp = 2024-11-15 08:53:58.191  (field type = date)

While I could customize the Traefik log format to rename time to timestamp, that would force me to maintain the custom format through every change. It would be much easier to teach Vector to use the default time field, and at first that seemed very easy to do.

Copying time to timestamp in VRL seems to work:

/ # vector vrl --input /var/log/traefik/access.log
[...]

$ .timestamp = .time
"2024-11-07T22:15:13-05:00"

$ .time
"2024-11-07T22:15:13-05:00"

$ .timestamp
"2024-11-07T22:15:13-05:00"

$ .
[...], "time": "2024-11-07T22:15:13-05:00", "timestamp": "2024-11-07T22:15:13-05:00" }

When I apply this in the Vector config file (by uncommenting the line in the Configuration section below), the Vector container log shows this error on the next access log entry:

2024-11-15T13:58:42.676883Z ERROR sink{component_kind="sink" component_id=graylog component_type=socket}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Failed serializing frame." internal_log_rate_limit=true
2024-11-15T13:58:42.676835Z ERROR sink{component_kind="sink" component_id=graylog component_type=socket}: vector::internal_events::codecs: Failed serializing frame. error=LogEvent contains a value with an invalid type. field = "timestamp" type = "null" expected type = "timestamp or integer" error_code="encoder_serialize" error_type="encoder_failed" stage="sending" internal_log_rate_limit=true
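The second log line says the GELF encoder received the timestamp field as null, whereas it expects a timestamp or an integer. (The GELF 1.1 specification defines timestamp as seconds since the UNIX epoch, with optional decimal places.) The following Python sketch is a hypothetical, simplified model of that type check, not Vector's implementation; the function name to_gelf_timestamp is made up for illustration. Under this model, a null is rejected, and so would be a plain string:

```python
from datetime import datetime, timezone


def to_gelf_timestamp(value):
    """Hypothetical model of the GELF encoder's type check: return seconds
    since the UNIX epoch, or raise for types other than timestamp/integer."""
    if isinstance(value, datetime):
        return value.timestamp()
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    type_name = "null" if value is None else type(value).__name__
    raise TypeError(
        f'field = "timestamp" type = "{type_name}" '
        'expected type = "timestamp or integer"'
    )


parsed = datetime(2024, 11, 8, 3, 15, 13, tzinfo=timezone.utc)
print(to_gelf_timestamp(parsed))  # 1731035713.0

for bad in (None, "2024-11-07T22:15:13-05:00"):
    try:
        to_gelf_timestamp(bad)
    except TypeError as exc:
        print(exc)  # rejected with type "null" and type "str"
```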

Edit: The same behavior occurs when renaming the field with .timestamp = del(.time) instead of copying it.

The time field is defined exactly the same way and works fine, so this definitely looks like a bug in Vector.

There doesn't seem to be a way to specify type = timestamp, so let's try parse_timestamp instead.
Again, it seems to work in VRL:

$ .timestamp = parse_timestamp!(.time, format: "%+")
t'2024-11-08T03:15:13Z'

$ .time
"2024-11-07T22:15:13-05:00"

$ .timestamp
t'2024-11-08T03:15:13Z'

$ .
[...], "time": "2024-11-07T22:15:13-05:00", "timestamp": t'2024-11-08T03:15:13Z' }

That's a long shot, but let's try anyway.

Also note that the parsed timestamp was converted to UTC, which doesn't seem to be reversible before sinking it (ref: #3333).
While I prefer to keep times in the local timezone where possible for readability, this shouldn't be an issue, since the timezone is specified in the value.
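The conversion to UTC changes only the representation, not the instant. You can check this outside Vector with a minimal Python sketch. It uses only the standard library and is not how VRL's parse_timestamp is implemented. It parses the Traefik value, converts it to UTC, and shows that both forms compare equal and map to the same epoch seconds:

```python
from datetime import datetime, timezone

# Value of the "time" field from the Traefik sample line.
raw = "2024-11-07T22:15:13-05:00"

# fromisoformat() keeps the -05:00 offset as tzinfo (Python 3.7+).
local = datetime.fromisoformat(raw)
utc = local.astimezone(timezone.utc)

print(local.isoformat())        # 2024-11-07T22:15:13-05:00
print(utc.isoformat())          # 2024-11-08T03:15:13+00:00
print(local == utc)             # True: aware datetimes compare by instant
print(int(local.timestamp()))   # 1731035713 seconds since the epoch
```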

When applied in the Vector config file, the Vector container log shows this error on the next access log entry:

2024-11-15T14:21:57.525859Z ERROR transform{component_kind="transform" component_id=cf_traefik component_type=remap}: vector::internal_events::remap: Mapping failed with event. error="function call error for \"parse_timestamp\" at (83:120): unable to convert value to timestamp" error_type="conversion_failed" stage="processing" internal_log_rate_limit=true
2024-11-15T14:21:57.525898Z ERROR transform{component_kind="transform" component_id=cf_traefik component_type=remap}: vector::internal_events::remap: Internal log [Mapping failed with event.] is being suppressed to avoid flooding.

I thought the timestamp field might be treated specially, so I tried the same functions with other field names, with the same results.

Configuration

# Set global options
data_dir: "/var/lib/vector"
#timezone: "local"
timezone: "America/New_York"

sources:
  traefik_logs:
    type: "file"
    include:
      - "/var/log/traefik/access.log"
    ignore_older_secs: 3600      # 1 hour

transforms:
  custom_fields:
    type: "remap"
    inputs:
      - cf_*
    source: |
      .host_name = "${HOST_NAME}"
      .agent_type = "vector"

  cf_traefik:
    type: "remap"
    inputs:
      - traefik_logs
    source: |
      .event_kind = "traefik"
      .service_type = "traefik"
      #.timestamp = .time
      #.timestamp = parse_timestamp!(.time, format: "%+")
      #.timestamp = parse_timestamp!(.time, format: "%Y-%m-%dT%H:%M:%S%.f%:z")
      #.timestamp_msg = .time
      #.timestamp_msg = parse_timestamp!(.time, format: "%+")
      #.timestamp_msg = parse_timestamp!(.time, format: "%Y-%m-%dT%H:%M:%S%.f%:z")

sinks:
  graylog:
    type: "socket"
    inputs:
      - custom_fields
    address: "graylog.homelab.lan:12201"
    mode: "udp"
    encoding:
      codec: "gelf"
      timestamp_format: "rfc3339"

Version

0.42.0

Example Data

{ "ClientAddr": "192.168.0.33:54978", "ClientHost": "192.168.0.33", "ClientPort": "54978", "ClientUsername": "-", "DownstreamContentSize": 569679, "DownstreamStatus": 200, "Duration": 699781673, "OriginContentSize": 569679, "OriginDuration": 699595289, "OriginStatus": 200, "Overhead": 186384, "RequestAddr": "cadvisor.docker2.mlabbe.lan:443", "RequestContentSize": 0, "RequestCount": 1, "RequestHost": "cadvisor.docker2.mlabbe.lan", "RequestMethod": "GET", "RequestPath": "/metrics", "RequestPort": "443", "RequestProtocol": "HTTP/2.0", "RequestScheme": "https", "RetryAttempts": 0, "RouterName": "cadvisor@docker", "ServiceAddr": "192.168.80.6:8080", "ServiceName": "cadvisor-service@docker", "ServiceURL": "http://192.168.80.6:8080", "SpanId": "0000000000000000", "StartLocal": "2024-11-07T22:15:13.010239743-05:00", "TLSCipher": "TLS_AES_128_GCM_SHA256", "TLSVersion": "1.3", "TraceId": "00000000000000000000000000000000", "downstream_Content-Encoding": "gzip", "downstream_Content-Type": "text/plain; version=0.0.4; charset=utf-8", "downstream_Date": "Fri, 08 Nov 2024 03:15:13 GMT", "entryPointName": "websecure", "level": "info", "msg": "", "origin_Content-Encoding": "gzip", "origin_Content-Type": "text/plain; version=0.0.4; charset=utf-8", "origin_Date": "Fri, 08 Nov 2024 03:15:13 GMT", "request_Accept": "application/openmetrics-text;version=1.0.0;q=0.5,application/openmetrics-text;version=0.0.1;q=0.4,text/plain;version=0.0.4;q=0.3,/;q=0.2", "request_Accept-Encoding": "gzip", "request_User-Agent": "Prometheus/2.55.0", "request_X-Forwarded-Host": "cadvisor.docker2.mlabbe.lan:443", "request_X-Forwarded-Port": "443", "request_X-Forwarded-Proto": "https", "request_X-Forwarded-Server": "traefik", "request_X-Prometheus-Scrape-Timeout-Seconds": "45", "request_X-Real-Ip": "192.168.0.33", "time": "2024-11-07T22:15:13-05:00" }

michellabbe added the type: bug label on Nov 15, 2024
@pront
Member

pront commented Nov 15, 2024

Hi @michellabbe, thank you for providing all the details. However, I set this up locally and cannot reproduce the issue.

# Set global options
data_dir: /path/to/data_dir_0
#timezone: "local"
timezone: "America/New_York"

sources:
  traefik_logs:
    type: "file"
    include:
      - path/to/traefik-sample.txt
#    ignore_older_secs: 3600      # 1 hour
    ignore_checkpoints: true
    offset_key: offset
    read_from: beginning

transforms:
  custom_fields:
    type: "remap"
    inputs:
      - cf_*
    source: |
      .host_name = "foo"
      .agent_type = "vector"

  cf_traefik:
    type: "remap"
    inputs:
      - traefik_logs
    source: |
      .event_kind = "traefik"
      .service_type = "traefik"

      .time = to_unix_timestamp(parse_timestamp!(.time, format: "%+"))
      #.timestamp = to_unix_timestamp(parse_timestamp!(.time, format: "%+"))

sinks:
  graylog:
    type: "socket"
    inputs:
      - custom_fields
    address: 127.0.0.1:12201 # dummy python script
    mode: "udp"
    encoding:
      codec: "gelf"
      timestamp_format: "rfc3339"

  console:
    inputs:
      - custom_fields
    target: stdout
    type: console
    encoding:
      codec: "gelf"
    buffer:
      type: memory
      max_events: 500
      when_full: block
cargo run --color=always -- --config path/to/config.yml
/usr/bin/python3 pront/scripts/socket_sink.py 
Listening on 127.0.0.1:12201
Received message from ('127.0.0.1', 61171): {"_agent_type":"vector","_host_name":"foo","_offset":0,"_source_type":"file","file":"/Users/pavlos.rontidis/CLionProjects/vector/pront/data/traefik-sample.txt","host":"COMP-LPF0JYPP2Q","short_message":"{ \"ClientAddr\": \"192.168.0.33:54978\", \"ClientHost\": \"192.168.0.33\", \"ClientPort\": \"54978\", \"ClientUsername\": \"-\", \"DownstreamContentSize\": 569679, \"DownstreamStatus\": 200, \"Duration\": 699781673, \"OriginContentSize\": 569679, \"OriginDuration\": 699595289, \"OriginStatus\": 200, \"Overhead\": 186384, \"RequestAddr\": \"cadvisor.docker2.mlabbe.lan:443\", \"RequestContentSize\": 0, \"RequestCount\": 1, \"RequestHost\": \"cadvisor.docker2.mlabbe.lan\", \"RequestMethod\": \"GET\", \"RequestPath\": \"/metrics\", \"RequestPort\": \"443\", \"RequestProtocol\": \"HTTP/2.0\", \"RequestScheme\": \"https\", \"RetryAttempts\": 0, \"RouterName\": \"cadvisor@docker\", \"ServiceAddr\": \"192.168.80.6:8080\", \"ServiceName\": \"cadvisor-service@docker\", \"ServiceURL\": \"http://192.168.80.6:8080\", \"SpanId\": \"0000000000000000\", \"StartLocal\": \"2024-11-07T22:15:13.010239743-05:00\", \"TLSCipher\": \"TLS_AES_128_GCM_SHA256\", \"TLSVersion\": \"1.3\", \"TraceId\": \"00000000000000000000000000000000\", \"downstream_Content-Encoding\": \"gzip\", \"downstream_Content-Type\": \"text/plain; version=0.0.4; charset=utf-8\", \"downstream_Date\": \"Fri, 08 Nov 2024 03:15:13 GMT\", \"entryPointName\": \"websecure\", \"level\": \"info\", \"msg\": \"\", \"origin_Content-Encoding\": \"gzip\", \"origin_Content-Type\": \"text/plain; version=0.0.4; charset=utf-8\", \"origin_Date\": \"Fri, 08 Nov 2024 03:15:13 GMT\", \"request_Accept\": \"application/openmetrics-text;version=1.0.0;q=0.5,application/openmetrics-text;version=0.0.1;q=0.4,text/plain;version=0.0.4;q=0.3,/;q=0.2\", \"request_Accept-Encoding\": \"gzip\", \"request_User-Agent\": \"Prometheus/2.55.0\", \"request_X-Forwarded-Host\": \"cadvisor.docker2.mlabbe.lan:443\", \"request_X-Forwarded-Port\": \"443\", \"request_X-Forwarded-Proto\": \"https\", \"request_X-Forwarded-Server\": \"traefik\", \"request_X-Prometheus-Scrape-Timeout-Seconds\": \"45\", \"request_X-Real-Ip\": \"192.168.0.33\", \"time\": \"2024-11-07T22:15:13-05:00\" }","timestamp":1731693265.424,"version":"1.1"}
Received message from ('127.0.0.1', 55630): {"_agent_type":"vector","_event_kind":"traefik","_host_name":"foo","_offset":0,"_service_type":"traefik","_source_type":"file","file":"/Users/pavlos.rontidis/CLionProjects/vector/pront/data/traefik-sample.txt","host":"COMP-LPF0JYPP2Q","short_message":"{ \"ClientAddr\": \"192.168.0.33:54978\", \"ClientHost\": \"192.168.0.33\", \"ClientPort\": \"54978\", \"ClientUsername\": \"-\", \"DownstreamContentSize\": 569679, \"DownstreamStatus\": 200, \"Duration\": 699781673, \"OriginContentSize\": 569679, \"OriginDuration\": 699595289, \"OriginStatus\": 200, \"Overhead\": 186384, \"RequestAddr\": \"cadvisor.docker2.mlabbe.lan:443\", \"RequestContentSize\": 0, \"RequestCount\": 1, \"RequestHost\": \"cadvisor.docker2.mlabbe.lan\", \"RequestMethod\": \"GET\", \"RequestPath\": \"/metrics\", \"RequestPort\": \"443\", \"RequestProtocol\": \"HTTP/2.0\", \"RequestScheme\": \"https\", \"RetryAttempts\": 0, \"RouterName\": \"cadvisor@docker\", \"ServiceAddr\": \"192.168.80.6:8080\", \"ServiceName\": \"cadvisor-service@docker\", \"ServiceURL\": \"http://192.168.80.6:8080\", \"SpanId\": \"0000000000000000\", \"StartLocal\": \"2024-11-07T22:15:13.010239743-05:00\", \"TLSCipher\": \"TLS_AES_128_GCM_SHA256\", \"TLSVersion\": \"1.3\", \"TraceId\": \"00000000000000000000000000000000\", \"downstream_Content-Encoding\": \"gzip\", \"downstream_Content-Type\": \"text/plain; version=0.0.4; charset=utf-8\", \"downstream_Date\": \"Fri, 08 Nov 2024 03:15:13 GMT\", \"entryPointName\": \"websecure\", \"level\": \"info\", \"msg\": \"\", \"origin_Content-Encoding\": \"gzip\", \"origin_Content-Type\": \"text/plain; version=0.0.4; charset=utf-8\", \"origin_Date\": \"Fri, 08 Nov 2024 03:15:13 GMT\", \"request_Accept\": \"application/openmetrics-text;version=1.0.0;q=0.5,application/openmetrics-text;version=0.0.1;q=0.4,text/plain;version=0.0.4;q=0.3,/;q=0.2\", \"request_Accept-Encoding\": \"gzip\", \"request_User-Agent\": \"Prometheus/2.55.0\", \"request_X-Forwarded-Host\": \"cadvisor.docker2.mlabbe.lan:443\", \"request_X-Forwarded-Port\": \"443\", \"request_X-Forwarded-Proto\": \"https\", \"request_X-Forwarded-Server\": \"traefik\", \"request_X-Prometheus-Scrape-Timeout-Seconds\": \"45\", \"request_X-Real-Ip\": \"192.168.0.33\", \"time\": \"2024-11-07T22:15:13-05:00\" }","timestamp":1731693356.305,"version":"1.1"}

A few things to note here:

  • If the timestamp field is absent:
    • Vector will generate a timestamp, e.g. "timestamp":1731693356.305
    • time is a string and will be sent as-is
  • You can use the VRL playground for quick tests, example
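One thing worth checking is offered here as a hypothesis, not a confirmed diagnosis. In both messages above, the whole Traefik line arrives inside short_message, and no separate time field appears. That suggests the file source hands each line to the remap transform as a raw string in .message. If so, .time would be null in the pipeline, even though vector vrl --input exposes the parsed fields (as the REPL session in the original post shows). This would also fit both reported errors: a null timestamp field, and parse_timestamp being unable to convert the value. If that is the cause, the line would need to be parsed first (for example with VRL's parse_json function) before .time can be used. The Python sketch below only models the difference; the event dictionaries and the parse_message helper are simplified, hypothetical stand-ins, not Vector's internal event format:

```python
import json

# Shortened stand-in for one Traefik access log line.
raw_line = '{ "level": "info", "time": "2024-11-07T22:15:13-05:00" }'

# Hypothetical shape of a file-source event: the line is an opaque string.
pipeline_event = {"message": raw_line, "source_type": "file"}

# What the REPL session showed: the line already parsed into fields.
repl_event = json.loads(raw_line)

print(pipeline_event.get("time"))  # None, analogous to VRL's null
print(repl_event.get("time"))      # 2024-11-07T22:15:13-05:00


def parse_message(event):
    """Hypothetical helper: merge the JSON object stored in
    event["message"] into the event's top-level fields."""
    parsed = json.loads(event["message"])
    return {**event, **parsed}


fixed = parse_message(pipeline_event)
print(fixed["time"])  # 2024-11-07T22:15:13-05:00
```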

@michellabbe
Author

Sorry for the long delay.

I'm not sure if I'm missing something from your tests.

Your test shows that a timestamp field is indeed sent, but it still doesn't match the event timestamp in the time field.

\"time\": \"2024-11-07T22:15:13-05:00\" }","timestamp":1731693356.305
1731693356.305 = 2024-11-15T12:55:56-05:00

So that message would appear in Graylog offset by several days.
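The arithmetic in this reply can be reproduced with Python's standard library. This sketch assumes a fixed -05:00 offset (EST, which applies to both dates in November 2024) instead of a full America/New_York zone lookup. It also drops the .305 fraction so the printed values are exact:

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))  # fixed offset, assumed for both dates

# Timestamp Vector generated (from the GELF output) vs. the Traefik "time" field.
generated = datetime.fromtimestamp(1731693356, tz=EST)
event_time = datetime.fromisoformat("2024-11-07T22:15:13-05:00")

print(generated.isoformat())   # 2024-11-15T12:55:56-05:00
print(generated - event_time)  # 7 days, 14:40:43
```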

While the timestamp could be fixed with a processing pipeline in Graylog, it's best when log shippers send data that is already processed, so we don't end up wasting a lot of resources on the Graylog server.

My test works in the VRL playground:
[screenshot]
and also with vector vrl inside the container (as explained in my original post).

However, when I apply the same syntax in vector.yaml, it throws an error, regardless of whether I'm fixing timestamp directly or just creating a new field (string).
