Don't rely on .sqlite metadata #97

Open
dralley opened this issue Feb 1, 2023 · 7 comments

@dralley
Contributor

dralley commented Feb 1, 2023

Currently mdapi is the only consumer of .sqlite metadata. Previously it was used by "yum" on EL7, but yum is perfectly capable of working without it anyway. DNF does not use it. The "repoview" tool used it, but it has been defunct for years, and AFAIK hasn't been packaged in Fedora since Fedora 27.

This blocks the discontinuation of .sqlite metadata https://pagure.io/releng/issue/10745.

The possible approaches:

  • Process the XML directly. This might not be as performant or as light on memory as querying the sqlite databases, so it may not be a good option.
  • Create the sqlite metadata from the XML metadata using sqliterepo_c if it is not present.
  • Always create the sqlite metadata from the XML metadata using sqliterepo_c. This would make the code simpler over the long term, since repos that provide sqlite metadata can be expected to gradually become a minority.
  • Process the XML ourselves and manually build a single combined sqlite database from it. This might simplify some things (only one file to track) and make new APIs possible, but it would probably be the most work. The only reason not to do this is that sqliterepo_c already exists.
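Options 2 and 3 could look roughly like the sketch below. It assumes the update step has a local copy of each repository's `repodata/` directory, including the XML files that sqliterepo_c reads; the helper names (`has_sqlite_metadata`, `ensure_sqlite_metadata`) are hypothetical, and calling plain `sqliterepo_c <repo_directory>` with no extra options is an assumption worth checking against `sqliterepo_c --help`.

```python
from __future__ import annotations

import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Sequence

# XML namespace used by repomd.xml in createrepo-style repositories.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"
SQLITE_TYPES = ("primary_db", "filelists_db", "other_db")


def advertised_types(repomd_xml: str) -> set[str]:
    """Return the metadata types (<data type="...">) listed in repomd.xml."""
    root = ET.fromstring(repomd_xml)
    return {data.get("type") for data in root.iter(f"{REPO_NS}data")}


def has_sqlite_metadata(repomd_xml: str) -> bool:
    """True if all three sqlite database types are advertised."""
    return set(SQLITE_TYPES) <= advertised_types(repomd_xml)


def ensure_sqlite_metadata(
    repo_dir: Path,
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> bool:
    """Generate sqlite metadata with sqliterepo_c only if the repo lacks it.

    Returns True if sqliterepo_c was invoked. `run` is injectable so the
    decision logic can be exercised without the binary installed.
    """
    repomd = (repo_dir / "repodata" / "repomd.xml").read_text()
    if has_sqlite_metadata(repomd):
        return False
    # Assumption: no options beyond the repository directory are needed.
    run(["sqliterepo_c", str(repo_dir)])
    return True
```

Removing the `has_sqlite_metadata` check turns this into option 3: the sqlite files are always regenerated locally, so the code path no longer depends on what the mirror publishes.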
@dralley
Contributor Author

dralley commented Feb 3, 2023

@t0xic0der I'd give implementing this a shot, but I'm curious whether you have a preference for which approach to take.

@dralley
Contributor Author

dralley commented Feb 15, 2023

@t0xic0der I would like your input please ^^

@gridhead
Member

@dralley,

Hi, my sincere apologies for getting back to you late.

We really want to make sure that the API remains as lightweight and fast as possible, so we would not want to take any approach that could potentially slow it down.

I am assigning this issue to you, to begin with. You can find me on the internal Slack for a more synchronous conversation, and we can take the discussion of the details forward there.

@dralley
Contributor Author

dralley commented Feb 28, 2023

Cool, I can do that. That rules out option 1, but the latter 3 options should be equivalent or very nearly equivalent to what exists currently.

@gridhead
Member

Perfect, I will let you pick whichever of the latter three approaches best suits those conditions.

Thank you for taking this up.

@dralley
Contributor Author

dralley commented Feb 28, 2023

I will probably go with option 3, in that case. Thanks!

Is updating the databases a bottleneck at all? Generating the sqlite metadata locally will add runtime to that step, but since that step is separate from the actual queries, it may not matter.

@gridhead
Member

Fetching those databases is not a bottleneck, or rather, we make sure it does not become one. The fetching service runs as a periodic job, and we replace an existing database with a new one only after verifying its hash to confirm that the new file was downloaded successfully.

If a download fails, the incompletely downloaded file is discarded, and the backend keeps reading from the last successfully fetched copy. Because we download many databases across a number of branches, the only time this becomes a bottleneck is the first run.
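The fetch, verify, and replace cycle described above can be sketched as follows. This is an illustration, not mdapi's actual code: `refresh_database` and its `fetch` callable are hypothetical, and SHA-256 is assumed as the checksum type (repomd.xml records the checksum type alongside each file's checksum).

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable


def refresh_database(
    target: Path,
    fetch: Callable[[], bytes],
    expected_sha256: str,
) -> bool:
    """Replace `target` with freshly fetched data only if its hash matches.

    On a hash mismatch or an OSError during the download (urllib's URLError
    is a subclass), the partial file is removed and the previous database
    stays in place. Returns True if the database was replaced.
    """
    # Write next to the target so os.replace() stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    tmp = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(fetch())
        digest = hashlib.sha256(tmp.read_bytes()).hexdigest()
        if digest != expected_sha256:
            return False  # corrupted or truncated download: keep the old copy
        os.replace(tmp, target)  # atomic rename on POSIX
        return True
    except OSError:
        return False
    finally:
        # Remove any leftover partial file; a no-op after a successful replace.
        tmp.unlink(missing_ok=True)
```

Because the swap is a single rename, a reader opening the database sees either the old file or the new one, never a half-written download.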
