Parameters to specify number of columns and also each column's area #110

Open
premadh opened this issue Sep 17, 2019 · 1 comment

premadh commented Sep 17, 2019

This is a suggested code or documentation change, improvement to the code, or feature request

The package is great and works in most conditions (many thanks for this), but it has also made me lazy: I don't want to wrangle misread PDF pages by hand. Hence the request below.

Provide a parameter or method to specify the number of columns, along with the start and end coordinates of each column, so that the table is extracted properly. For some PDFs, I have found that columns are misaligned.
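
To make the request concrete, here is a minimal sketch of the behaviour being asked for. It is not the package's API: the function `split_into_columns`, the `(x, y, text)` word format, and the `row_tolerance` parameter are all hypothetical names chosen for illustration. It assumes the text fragments on a page are already available with their coordinates, and that the user supplies one `(x_start, x_end)` pair per column.

```python
from collections import defaultdict


def split_into_columns(words, column_bounds, row_tolerance=2.0):
    """Assign text fragments to user-specified columns.

    words:         list of (x, y, text) tuples (hypothetical input format)
    column_bounds: list of (x_start, x_end) pairs, one per column
    row_tolerance: fragments whose y values fall in the same bucket of this
                   size are treated as one row (assumed heuristic)
    """
    # Each row starts with one empty cell per requested column, so a column
    # with no text in that row still appears as a blank cell.
    rows = defaultdict(lambda: [[] for _ in column_bounds])

    for x, y, text in words:
        row_key = round(y / row_tolerance)
        for index, (start, end) in enumerate(column_bounds):
            if start <= x < end:
                rows[row_key][index].append((x, text))
                break  # a fragment belongs to at most one column

    table = []
    for row_key in sorted(rows):
        # Within a cell, order fragments left to right before joining.
        table.append([" ".join(t for _, t in sorted(cell)) for cell in rows[row_key]])
    return table


words = [
    (72, 100, "Widget"), (310, 100, "4"), (450, 100, "9.50"),
    (72, 120, "Gadget"), (105, 120.4, "Pro"), (450, 120, "3.25"),
]
column_bounds = [(50, 300), (300, 400), (400, 560)]
table = split_into_columns(words, column_bounds)
```

Because the number of columns is fixed by `column_bounds` rather than guessed from the page, the second row keeps an empty middle cell instead of shifting "3.25" to the left.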

pm321 commented Dec 19, 2019

I would also support this improvement. Using the 'columns' argument in extract_tables does not always work well where some columns are populated with blank values for the initial rows.
One approach would be to use the area function but applied to each column on a PDF page.
Hope this enhancement can be incorporated into what is a really useful and effective package.
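
A rough sketch of the per-column area idea, written as standalone Python rather than against the package itself. Everything here is an assumption for illustration: `extract_area` and `combine_column_areas` are hypothetical helpers, words are `(x, y, text)` tuples, and each area is taken to be `(top, left, bottom, right)`. Each column is extracted from its own area, then the columns are lined up by vertical position so that blank leading cells are kept.

```python
def extract_area(words, area):
    """Return (y, text) pairs for words inside one rectangular area.

    area is (top, left, bottom, right); this ordering is an assumption.
    """
    top, left, bottom, right = area
    return [
        (y, text)
        for x, y, text in words
        if left <= x < right and top <= y < bottom
    ]


def combine_column_areas(words, areas, row_tolerance=2.0):
    """Extract each column from its own area, then align columns by row."""
    per_column = [extract_area(words, area) for area in areas]

    # Collect every row position seen in any column, so a column that is
    # blank in the first rows still gets an (empty) cell for those rows.
    row_keys = sorted(
        {round(y / row_tolerance) for column in per_column for y, _ in column}
    )

    table = []
    for key in row_keys:
        row = []
        for column in per_column:
            texts = [t for y, t in column if round(y / row_tolerance) == key]
            row.append(" ".join(texts))
        table.append(row)
    return table


page_words = [
    (72, 100, "Name"),
    (72, 120, "Alice"),
    (72, 140, "Bob"), (300, 140, "note"),
]
column_areas = [(90, 50, 200, 250), (90, 250, 200, 500)]
combined = combine_column_areas(page_words, column_areas)
```

In this example the second column is empty for the first two rows, which is the situation described above, and it still comes out as its own column.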
