Events: Stats sometimes too slow. #8

Open
bernhardreiter opened this issue May 17, 2017 · 5 comments

Comments

@bernhardreiter
Member

Running a query over a full month may take several minutes (up to 10) on a test system we are running, which has many millions of events in that timeframe (e.g. 25M events).

One reason is that the SQL query sometimes uses the index and sometimes it does not.

@bernhardreiter
Member Author

On IntelMQ-CB-Mailgen installations with large databases, we see that the stats calls are slow. Initial runs of PostgreSQL's EXPLAIN ANALYZE have shown that indices are not used in many cases.

A typical query first has a time frame, which gives a very large result set, and then more criteria, for which no index is used anymore, because the indexes used so far do not contain information that could be used to apply a second index to the result set of the first.

Because the other search criteria also match often, it also does not make sense for many queries to apply the other criteria first, because the intermediate result set would also be large, leading to the same problem.

The idea is to change the structure of the database so that indexes can be applied one after another. At least the advantage of the time range should be used.
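
A minimal sketch of one possible restructuring (an assumption for illustration, not a decision from this issue): range-partitioning the events table by the time field would let the time frame prune whole partitions first, so that an index on a second criterion only has to work inside the remaining partitions. This needs PostgreSQL 11 or later for indexes on partitioned tables; the column names follow the IntelMQ eventdb naming and are assumptions here.

-- hypothetical restructuring sketch, not the schema this issue settled on
CREATE TABLE events_partitioned (
    id BIGSERIAL,
    "time.observation" TIMESTAMP WITH TIME ZONE NOT NULL,
    "classification.type" TEXT
    -- ... remaining event columns ...
) PARTITION BY RANGE ("time.observation");

CREATE TABLE events_2017_05 PARTITION OF events_partitioned
    FOR VALUES FROM ('2017-05-01') TO ('2017-06-01');

-- index on a frequently used second criterion, created for each partition
CREATE INDEX ON events_partitioned ("classification.type");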

@bernhardreiter
Member Author

If apache2 is used as HTTPS server, we can add the time it takes to serve a request to the LogFormat: e431153
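
For reference, a minimal sketch of such a LogFormat (the exact format string is an assumption; the real change is in the commit referenced above). %{ms}T appends the time taken to serve the request in milliseconds, which matches the log analysis further below.

# hypothetical Apache 2.4.13+ configuration fragment
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{ms}T" combined_with_time
CustomLog /var/log/apache2/fody-backend-access.log combined_with_time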

@bernhardreiter
Member Author

bernhardreiter commented May 5, 2020

If Apache is used for logging, the relevant log files are

  • /var/log/apache2/fody-backend-access.log
  • /var/log/apache2/error.log, which contains the SQL queries if the loglevel is at least INFO. If some request parameters are transferred in the body, this file has to be used to see the details.

Here are some commands that can help to count certain requests, in this case with parameters in the URL:

zgrep --only-matching "GET /api/events/search?[^ ]*" fody-backend-access.log*

or to count only requests not made by the client python-requests:

zgrep --invert-match --count  python-requests fody-backend-access.lo* | sort

@bernhardreiter
Member Author

bernhardreiter commented May 6, 2020

Analysis

One problem seems to be that many queries use the time.observation field as a limiting parameter. As it is possible that freshly incoming events have a time.observation well in the past, the database has to consider all entries. It would be interesting to know how big the typical gap is in a large database, and whether users could define a limit so that it no longer matters to them if a freshly incoming event has a time.observation that is more than x days in the past.

This is likely to have an impact, as PostgreSQL's CLUSTER (https://www.postgresql.org/docs/current/sql-cluster.html) can physically reorder a table to match an index. There are also partial indexes.
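
A sketch of both ideas (index name, column name and cut-off date are assumptions for illustration):

-- physically reorder the table along an existing index on the time field
CLUSTER events USING idx_events_time_observation;

-- a partial index that only covers events above a user-defined cut-off;
-- it would have to be recreated or maintained as the cut-off moves
CREATE INDEX idx_events_recent
    ON events ("time.observation")
    WHERE "time.observation" >= '2020-01-01';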

estimate of table size

As a real select count(*) from events; can be very slow, we can estimate the size with:

SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = to_regclass('public.events');

understand where time is spent

Long queries can be understood better by issuing a statement like the following:

EXPLAIN (ANALYZE, BUFFERS) statement
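
For example (the concrete query and column names are assumptions modelled on the search described above):

EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*)
    FROM events
    WHERE "time.observation" >= '2020-04-01'
      AND "time.observation" <  '2020-05-01'
      AND "classification.type" = 'botnet drone';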

@bernhardreiter
Member Author

Fine tuning

Measurements with EXPLAIN and atop showed that there was room for fine-tuning PostgreSQL 9.5;
this helps to adapt the configuration to the memory size and the type of storage (e.g. a fast SSD or something else).

One source of optimisation ideas is https://pgtune.leopard.in.ua .
Another is the PostgreSQL documentation, e.g. https://www.postgresql.org/docs/9.5/runtime-config-query.html#GUC-RANDOM-PAGE-COST and https://www.postgresql.org/docs/9.5/runtime-config-resource.html#GUC-EFFECTIVE-IO-CONCURRENCY.
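
A sketch of the kind of settings this is about; the values are assumptions for a machine with plenty of RAM and fast SSD storage and have to be derived from the actual hardware (e.g. with pgtune):

# postgresql.conf fragment, values are assumptions
random_page_cost = 1.1           # default 4.0 is tuned for rotating disks
effective_io_concurrency = 200   # SSDs handle many concurrent random reads
shared_buffers = 8GB             # roughly 1/4 of the machine's RAM
effective_cache_size = 24GB      # roughly 3/4 of the machine's RAM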

Looking at logs

The following commands in the spirit of The Unix Programming Environment (https://en.wikipedia.org/wiki/The_Unix_Programming_Environment)
may help to look at the fody log files to see how many requests are taking how long:

zgrep --invert-match  python-requests fody-backend-access.log.2.gz |  \
 sed 's/\(^.*\) \([0-9]*$\)/\2 \1/' | \
 awk '{ if ($1 > 1000) print $0 }' | \
 sort -n

Short explanation: the sed moves the milliseconds value to the front of the row, and the awk only prints lines that took more than a second (1000 ms).
