Support for Python > 3.8 #39

Open
harshkhandeparkar opened this issue Sep 10, 2023 · 5 comments

Comments

@harshkhandeparkar
Member

The PDF library used to read timetables, camelot-py, only supports Python versions 3.6, 3.7, and 3.8. Support for Python 3.10+ will become mandatory within a year, since 3.8 will stop receiving security updates in October 2024.
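
To make the constraint concrete, here is a minimal sketch of a startup check, assuming the supported range is exactly the 3.6 to 3.8 window stated above (taken from this issue, not read from camelot-py's package metadata). The names `CAMELOT_MIN`, `CAMELOT_MAX`, and `camelot_supported` are hypothetical and not part of this project or of camelot-py.

```python
import sys

# Supported range as stated in this issue (assumption: inclusive bounds,
# copied from the description above rather than from camelot-py's metadata).
CAMELOT_MIN = (3, 6)
CAMELOT_MAX = (3, 8)


def camelot_supported(version=None):
    """Return True if the (major, minor) version is within the stated range."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return CAMELOT_MIN <= major_minor <= CAMELOT_MAX


if __name__ == "__main__":
    if not camelot_supported():
        print(
            "Python %d.%d is outside the 3.6-3.8 range listed for camelot-py"
            % sys.version_info[:2]
        )
```

On Python 3.10 or later this check would report the interpreter as out of range, which is the situation either of the solutions below would need to resolve.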

Possible solutions:

  • Use an alternate PDF library
  • Fork and update the current library

@anuraganand92

I'm working on using tabula-py as the alternate library to fix this, with pandas to export to Excel.

@harshkhandeparkar
Member Author

harshkhandeparkar commented Sep 30, 2023

@shikharish

@shikharish
Member

@anuraganand92
Go ahead!
And do share your progress. How is tabula working on the PDF?

@anuraganand92

I discarded tabula as it wasn't good or fast enough for parsing. I then tried pdfplumber, which was similar to camelot.
I tried it on test.pdf, but I am not sure whether the parsed format in test.xlsx is the correct one, because some cells have multiple entries or a different arrangement of entries.
test.xlsx
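
For reference, a rough sketch of the pdfplumber plus pandas approach described above. It is not the code that produced test.xlsx. It assumes that multiple entries inside one cell are separated by newlines and that empty cells come back as `None`, and it needs `pdfplumber`, `pandas`, and an Excel engine such as `openpyxl` installed. The helper names `split_cell_entries`, `normalize_table`, and `export_pdf_tables` are hypothetical.

```python
def split_cell_entries(cell):
    """Split one extracted cell into its individual entries.

    Assumption: multiple entries in a cell are separated by newlines,
    and empty cells arrive as None or "".
    """
    if not cell:
        return []
    return [part.strip() for part in str(cell).split("\n") if part.strip()]


def normalize_table(rows):
    """Apply split_cell_entries to every cell of a table (a list of rows)."""
    return [[split_cell_entries(cell) for cell in row] for row in rows]


def export_pdf_tables(pdf_path, xlsx_path):
    """Extract every table with pdfplumber and write one sheet per table."""
    # Imported lazily so the helpers above work without these packages.
    import pandas as pd
    import pdfplumber

    # Note: if the PDF yields no tables, no sheet is written and the
    # Excel engine may raise when the writer is closed.
    with pdfplumber.open(pdf_path) as pdf, pd.ExcelWriter(xlsx_path) as writer:
        sheet = 0
        for page in pdf.pages:
            for table in page.extract_tables():
                # Join the split entries back with "; " so each one stays visible.
                rows = [
                    ["; ".join(entries) for entries in row]
                    for row in normalize_table(table)
                ]
                pd.DataFrame(rows).to_excel(
                    writer, sheet_name=f"table_{sheet}", index=False, header=False
                )
                sheet += 1
```

Something like `export_pdf_tables("test.pdf", "test.xlsx")` would then write one sheet per detected table. Whether joining entries with `"; "` is the right layout depends on the format the rest of the tool expects, which is the open question here.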

@shikharish
Member

shikharish commented Oct 1, 2023

Yes, I myself tried tabula, pdfplumber and a few others. None of them were as good as camelot.
If we can't find an alternative, forking and updating camelot seems like the only option.
