text extraction hangs on MacOS 10.14 #14
Comments
I have the same issue. Did you find a solution to this?
@adarsa Not really no. I ended up abandoning pdfbox altogether, and used tesseract to extract text instead.
Does this occur with all PDFs, or only with some? If the latter, can you attach it to this issue?
@lebedov I haven't had the chance to try it on other PDFs, but as for the file I am using in the screenshot, it is this one.
I can't reproduce the hanging problem with the input PDF file you mentioned on Ubuntu Linux 18.0.4 with Python 3.7.3 and OpenJDK 11.0.4. I suspect some sort of platform-specific jpype weirdness, but I unfortunately don't have a MacOS box to debug this. I'll leave the issue open for the time being in case anyone who can investigate further has further input.
I had this issue with all the PDFs I tried.
+1
+1
I finally obtained access to a MacOS box. I can't reproduce the problem with Python 3.8.5, OpenJDK 14.0.2, and python-pdfbox 0.1.8 on MacOS 10.15.6; processing the indicated file succeeds without any error.
I had the same issue, also on macOS Mojave, with this Java JDK version:
I installed OpenJDK 15 from here, and that fixed the issue.
@peterHeuz Given that more than one person has encountered the issue on MacOS, I added a note to the package README.
I am trying to use pdfbox, with this vanilla snippet:
But it becomes stuck. I debugged the stack tree, and it hangs at this line:
I confirmed that a Java process is spawned:
But it is just stuck there.
Running the jar cached by python-pdfbox in the terminal works:
So I am no longer sure what's going on. Thoughts?
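Since running the jar directly reportedly works, one possible workaround is to bypass the in-process JVM and shell out to the jar with a timeout, so that a hang surfaces as an exception instead of blocking forever. This is only a sketch under assumptions: the jar is a PDFBox 2.x app jar (which provides the ExtractText command and its -encoding option), `java` is on PATH, and the helper names and all paths are hypothetical, not part of python-pdfbox.

```python
import subprocess
from pathlib import Path


def build_extract_command(jar_path, pdf_path, txt_path):
    """Build the argv list for the PDFBox 2.x ExtractText command-line tool."""
    return [
        "java", "-jar", str(jar_path),
        "ExtractText", "-encoding", "UTF-8",
        str(pdf_path), str(txt_path),
    ]


def extract_text(jar_path, pdf_path, txt_path, timeout=60):
    """Run ExtractText in a separate JVM and return the extracted text.

    Raises subprocess.TimeoutExpired if the JVM runs longer than `timeout`
    seconds, and subprocess.CalledProcessError on a non-zero exit status.
    """
    cmd = build_extract_command(jar_path, pdf_path, txt_path)
    subprocess.run(cmd, check=True, timeout=timeout)
    return Path(txt_path).read_text(encoding="utf-8")
```

Calling `extract_text("/path/to/pdfbox-app.jar", "input.pdf", "output.txt")` (placeholder paths) would then either return the text or fail loudly after a minute, which at least separates a JVM problem from a jpype problem.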
Environment
Python
python-pdfbox = "==0.1.7"
python_version = "3.7"
Java
openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-20190711112007.graal.jdk8u-src-tar-gz-b08)
OpenJDK 64-Bit GraalVM CE 19.2.0 (build 25.222-b08-jvmci-19.2-b02, mixed mode)
OS
macOS Mojave 10.14.4
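Because the reports in this thread differ mainly in the JDK (the hang was seen with the GraalVM 1.8.0_222 build above, while OpenJDK 14 and 15 reportedly worked), it may help to check which Java the process actually picks up before calling pdfbox. A minimal sketch, assuming `java` is on PATH; the function names are hypothetical, and this does not prove that the Java version is the cause.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_222" means Java 8; newer scheme: "14.0.2" means 14.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Ask the `java` found on PATH for its version (printed to stderr)."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


print(parse_java_major('openjdk version "1.8.0_222"'))  # 8
print(parse_java_major('openjdk version "14.0.2" 2020-07-14'))  # 14
```

If `installed_java_major()` returns 8 on an affected machine, trying a newer JDK, as one commenter did, seems like a reasonable first experiment.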