Direct memory leaks when reading parquet files containing interleaving plain/dictionary pages #11533

Open
3 tasks
CodingJun opened this issue Nov 13, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@CodingJun

Apache Iceberg version

1.5.2

Query engine

Spark

Please describe the bug 🐞

While using Spark to continuously rewrite data files, I observed a direct memory leak. The cause is that when a Parquet file uses plain and dictionary encoding interleaved across different row groups, the old vector is not released properly.

I found a pull request that mentioned this issue, but it has not been merged.
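
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use Iceberg, Arrow, or Parquet classes: `TrackingAllocator`, `Buffer`, `ColumnReader`, and the `releaseOnSwitch` flag are hypothetical stand-ins, and the assumption that the reader allocates a fresh vector whenever the encoding changes between row groups is only a simplified model of the behavior reported here, not the actual reader code.

```java
// Hypothetical model of the reported leak; not Iceberg or Arrow code.
class VectorLeakSketch {
    enum Encoding { PLAIN, DICTIONARY }

    /** Stand-in for an off-heap allocator that tracks outstanding bytes. */
    static final class TrackingAllocator {
        private long outstanding;

        Buffer allocate(long bytes) {
            outstanding += bytes;
            return new Buffer(this, bytes);
        }

        long outstandingBytes() {
            return outstanding;
        }
    }

    /** Stand-in for a vector backed by direct memory. */
    static final class Buffer implements AutoCloseable {
        private final TrackingAllocator owner;
        private final long bytes;
        private boolean closed;

        Buffer(TrackingAllocator owner, long bytes) {
            this.owner = owner;
            this.bytes = bytes;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                owner.outstanding -= bytes;
            }
        }
    }

    /** Assumed behavior: a new vector is allocated when the encoding changes. */
    static final class ColumnReader implements AutoCloseable {
        private final TrackingAllocator allocator;
        private final boolean releaseOnSwitch;
        private Buffer vector;
        private Encoding current;

        ColumnReader(TrackingAllocator allocator, boolean releaseOnSwitch) {
            this.allocator = allocator;
            this.releaseOnSwitch = releaseOnSwitch;
        }

        void readRowGroup(Encoding encoding) {
            if (vector == null || encoding != current) {
                // The leaky variant drops the old reference without closing it.
                if (vector != null && releaseOnSwitch) {
                    vector.close();
                }
                vector = allocator.allocate(1024);
                current = encoding;
            }
        }

        @Override
        public void close() {
            // Only the vector that is currently referenced can be released here.
            if (vector != null) {
                vector.close();
                vector = null;
            }
        }
    }

    /** Returns the bytes still outstanding after the reader has been closed. */
    static long leakedBytes(boolean releaseOnSwitch, Encoding... rowGroups) {
        TrackingAllocator allocator = new TrackingAllocator();
        try (ColumnReader reader = new ColumnReader(allocator, releaseOnSwitch)) {
            for (Encoding encoding : rowGroups) {
                reader.readRowGroup(encoding);
            }
        }
        return allocator.outstandingBytes();
    }

    public static void main(String[] args) {
        Encoding[] interleaved = {
            Encoding.PLAIN, Encoding.DICTIONARY, Encoding.PLAIN, Encoding.DICTIONARY
        };
        System.out.println("leaky: " + leakedBytes(false, interleaved)); // 3072
        System.out.println("fixed: " + leakedBytes(true, interleaved));  // 0
    }
}
```

In this model, a file whose row groups all use the same encoding leaks nothing in either variant, which would be consistent with the leak only showing up on files with interleaved encodings.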

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@CodingJun added the bug label Nov 13, 2024
@Fokko
Contributor

Fokko commented Nov 13, 2024

@CodingJun Are you able to check if the problem persists in 1.7.0?

@CodingJun
Author

> @CodingJun Are you able to check if the problem persists in 1.7.0?

Yes, I checked, and the problem still occurs in 1.7.0.
