Export to image support #55

Open
werenall opened this issue Apr 2, 2020 · 3 comments

Comments

werenall commented Apr 2, 2020

First of all - kudos for this library! It proves to be very useful to our project in Magnet.
However, we need the export-to-image functionality that Apache's PDFBox provides. We thought it would be nice if your library had it as well.

We'd be happy to make a PR with this.

werenall pushed a commit to biotz/pdfboxing that referenced this issue Apr 2, 2020
[Re dotemacs#55]
This commit also slightly changes the prerequisites for the split function.
Previously it only allowed strings as inputs. IMHO it should also
accept files.

dotemacs (Owner) commented Apr 2, 2020

> First of all - kudos for this library! It proves to be very useful to our project in Magnet.

Thank you, I'm glad that you're finding it useful.

> However, we need the export-to-image functionality that Apache's PDFBox provides.

OK. Is this functionality already present in any of the Java examples here:
https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/

I'm asking because I'm trying to understand what exactly you are trying to do: extract images out of a PDF or ...?

> We thought it would be nice if your library had it as well.
> We'd be happy to make a PR with this.

OK, but let me understand what you're trying to do first. Then, if you're willing to do the work, that would be great.

werenall (Author) commented Apr 2, 2020

We have PDFs (possibly multi-page) that we need thumbnails for. In our case, each page gets converted into an image. It is similar to Google Drive: the preview does not display the PDF itself, just a thumbnail image of it.
image
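
As a rough illustration of the thumbnail step described above, here is a minimal Java sketch that scales an already rendered page image down to a fixed width while keeping its aspect ratio. It uses only the Java standard library. The class name `ThumbnailSketch` and the method `scaleToWidth` are hypothetical names made up for this example and are not part of pdfboxing or PDFBox. The sketch assumes the page image itself would come from PDFBox (for instance via `PDFRenderer.renderImageWithDPI`), which is left out here because it needs the PDFBox dependency.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration only; not part of pdfboxing or PDFBox.
final class ThumbnailSketch {

    private ThumbnailSketch() {
    }

    /**
     * Scales a rendered page image to the given width, preserving the aspect ratio.
     * Assumption: `page` was produced elsewhere, e.g. by PDFBox's
     * PDFRenderer.renderImageWithDPI(pageIndex, dpi).
     */
    static BufferedImage scaleToWidth(BufferedImage page, int targetWidth) {
        if (targetWidth <= 0) {
            throw new IllegalArgumentException("targetWidth must be positive");
        }
        // Derive the height from the source aspect ratio; never go below 1 pixel.
        int targetHeight = Math.max(
                1, Math.round((float) page.getHeight() * targetWidth / page.getWidth()));

        BufferedImage thumb =
                new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        try {
            // Bilinear interpolation gives smoother results than the default when downscaling.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(page, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return thumb;
    }
}
```

With a 612x792 page image (US Letter at 72 DPI), `ThumbnailSketch.scaleToWidth(page, 153)` would yield a 153x198 thumbnail. For a multi-page PDF the same call would be repeated once per rendered page.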

werenall pushed a commit to biotz/pdfboxing that referenced this issue Nov 12, 2020
[Re dotemacs#55]
This commit also slightly changes the prerequisites for the split function.
Previously it only allowed strings as inputs. IMHO it should also
accept files.

avocade commented Mar 11, 2024

We have a use case where we want to extract all images from the entire document so we can then do ML on each image. Extracting the text is done separately. PDFBox looks like the right tool for it:

https://docs.aspose.com/pdf/java/extract-images-from-pdf-file/

There is a similar use case with the Node.js library pdf-lib (see the extract-images.zip example, which seems to work well):
Hopding/pdf-lib#83 (comment)
