Events: Stats sometimes too slow. #8
On IntelMQ-CB-Mailgen installations with large databases, we see that the stats calls are slow. A typical query first selects a time frame, which gives a very large result set, and then applies more criteria, for which no index is used anymore, because the indexes used so far do not carry information that would allow a second index to be applied to the result set of the first. Because the other search criteria also match often, it usually does not make sense to apply the other criteria first either: the intermediate result set would also be large, leading to the same problem. The idea is to change the structure of the database to allow indexes to be applied one after the other. At least the advantage of the time range should be used.
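One possible way to let a second criterion profit from the same index scan as the time range is a multicolumn index. A minimal sketch, assuming an events table with IntelMQ-style column names (the actual schema and the useful column combination are assumptions):

-- Hypothetical multicolumn index: the equality criterion comes first,
-- the time range second, so a query for one classification type within
-- a time frame can be answered by a single narrow index scan.
CREATE INDEX events_type_time_idx
    ON events ("classification.type", "time.source");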
If apache2 is used as the HTTPS server, we can add the time it takes to serve a request to the LogFormat: e431153
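A sketch of how this can look with Apache's %D directive, which logs the time taken to serve the request in microseconds; the exact format string and nickname used in e431153 may differ:

# hypothetical variant of the combined format with the duration appended
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %D" combined_duration

Appending %D at the end means the serving time is the last field of each log line, which is what the sed command further below expects.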
If apache is used for logging, the relevant log files are:

Here are some commands that can help to count certain requests, for example requests with parameters in the URL:

or only requests not made by the client python-requests:
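A sketch of what such counting commands could look like; the file name follows the pipeline further below, and the exact patterns are assumptions:

# count requests that carry parameters in the URL (a literal "?"):
zgrep --count '[?]' fody-backend-access.log.2.gz

# the same, but only requests not made by python-requests:
zgrep '[?]' fody-backend-access.log.2.gz | grep --invert-match --count python-requests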
Analysis

One problem seems to be that many queries count the matching rows. This is likely to have an impact, as a real count in PostgreSQL has to visit the rows.

Estimate of table size

As a real count can take long on a big table, PostgreSQL's own estimate of the table size can be queried instead:

SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE oid = to_regclass('public.events');

Understand where time is spent

Long queries can be understood better by issuing a statement like the following:

EXPLAIN (ANALYZE, BUFFERS) statement
Fine tuning

Measurements with EXPLAIN and atop showed that there was room for fine-tuning of PostgreSQL 9.5. One source of optimisation ideas is https://pgtune.leopard.in.ua .

Looking at logs

The following commands, in the spirit of https://en.wikipedia.org/wiki/The_Unix_Programming_Environment , list the slow requests from the apache log:

zgrep --invert-match python-requests fody-backend-access.log.2.gz | \
  sed 's/\(^.*\) \([0-9]*$\)/\2 \1/' | \
  awk '{ if ($1 > 1000) print $0 }' | \
  sort -n

Short explanation: the zgrep drops all requests made by python-requests; the sed moves the last field of each line, the serving time added by the LogFormat above, to the front; the awk keeps only lines where that time exceeds 1000; and sort -n orders the result by serving time.
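A variant of the same pipeline that may be handy: show the slowest requests first, limited to the top ten:

zgrep --invert-match python-requests fody-backend-access.log.2.gz | \
  sed 's/\(^.*\) \([0-9]*$\)/\2 \1/' | \
  sort --reverse --numeric-sort | \
  head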
When running a query over a full month, it may take several minutes (up to 10) on a test system we are running, which has a couple of million events in that time frame (e.g. 25M events).
One reason is that the SQL query sometimes uses the index and sometimes does not.
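One way to see why the planner flips between plans is to compare the plan it picks by default with the plan it produces when sequential scans are discouraged; a sketch, reusing the assumed query from above:

EXPLAIN SELECT count(*) FROM events
 WHERE "time.source" >= '2017-06-01' AND "time.source" < '2017-07-01';

SET enable_seqscan = off;  -- for diagnosis in this session only

EXPLAIN SELECT count(*) FROM events
 WHERE "time.source" >= '2017-06-01' AND "time.source" < '2017-07-01';

RESET enable_seqscan;

If the cost estimates of the two plans are close, small changes in the table statistics can make the planner switch between them, which would explain the varying run times.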