How to remove headers and footers permanently? #52
I'm hesitant to suggest a way to permanently remove margins, because people who want to use it for redaction may end up being surprised. You mentioned
Points are the standard unit of PDF files: 1 point = 1/72 inch. The percentage values take a percentage of the existing margins. For example, if the existing margin is 100 points, then 50% would reduce it to 50 points.
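The unit arithmetic above can be checked in a few lines of Python. This is a sketch with made-up helper names. It assumes, as in the 100 pt to 50 pt example, that the percentage is the share of each existing margin that is kept.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def inches_to_points(inches: float) -> float:
    return inches * POINTS_PER_INCH


def points_to_inches(points: float) -> float:
    return points / POINTS_PER_INCH


def retained_margin(existing_margin_pt: float, percent: float) -> float:
    """Margin (in points) left after keeping `percent` of an existing margin."""
    return existing_margin_pt * percent / 100.0


print(retained_margin(100, 50))                      # 50.0
print(inches_to_points(8.5), inches_to_points(11))   # 612.0 792.0 (US Letter)
```

So a US Letter page is 612 x 792 pt, and a 1-inch margin is 72 pt.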
Thanks. I'll keep looking for a way to permanently remove what I need removed, either by changing the MediaBox or with redaction annotations.
Has a solution been implemented for this feature? It is badly needed. My current workaround is to save the PDFs as image-only files, then run OCR and save them again with ABBYY. Could any of these be used to auto-detect headers and footers and serve as a reference for cropping?
- Excluding the Header and Footer Contents of a page of a PDF file while extracting text?
- [D] Data cleaning techniques for PDF documents with semantically meaningful parts
Perhaps these are also worth a look for ideas:
- Convert PDFs to Audiobooks with Machine Learning
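One way to auto-detect headers and footers without OCR or machine learning is a simple repetition heuristic: lines that recur at the top or bottom of many pages are likely running heads or page numbers. This sketch is an illustration of that idea, not something pdfCropMargins provides. It assumes each page's text is already available as a list of lines (in PyMuPDF, for example, via `page.get_text().splitlines()`). The function name and thresholds are hypothetical.

```python
import re
from collections import Counter


def repeated_edge_lines(pages, edge_lines=2, min_fraction=0.5):
    """Return normalized lines that recur near the top or bottom of many pages.

    pages: list of pages, each a list of text lines in reading order.
    edge_lines: how many lines at each edge of a page to inspect.
    min_fraction: share of pages a line must appear on to be reported.
    """
    counts = Counter()
    for lines in pages:
        edges = lines[:edge_lines] + lines[-edge_lines:]
        # Replace digit runs so "Page 3" and "Page 4" count as the same line;
        # a set ensures each line is counted at most once per page.
        counts.update({re.sub(r"\d+", "#", ln.strip()) for ln in edges if ln.strip()})
    threshold = min_fraction * len(pages)
    return {line for line, n in counts.items() if n >= threshold}


pages = [
    ["My Book", "text a", "more", "Page 1"],
    ["My Book", "text b", "other", "Page 2"],
    ["Chapter 2", "text c", "x", "Page 3"],
]
print(repeated_edge_lines(pages))  # {'My Book', 'Page #'} (set order may vary)
```

The detected lines only identify candidate header or footer text. Their positions would still have to be looked up on each page before anything can be cropped or redacted.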
The workaround I found is to 1) find the coordinates with SumatraPDF (press the "m" key to see them), and 2) run a Python script to add and delete redaction annotations.
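A script along those lines might look like the following minimal sketch using PyMuPDF's redaction calls (`Page.add_redact_annot`, `Page.apply_redactions`). The function names, band sizes and file names are hypothetical. The sketch assumes unrotated pages and PyMuPDF's page coordinates (origin at the top-left, y growing downward, units in points). Coordinates read from a viewer such as SumatraPDF may use a different origin, so check them before use. `apply_redactions()` removes the text under each area; how images and vector graphics underneath are treated depends on the PyMuPDF version and its options.

```python
def band_rects(width, height, header_pt, footer_pt):
    """Rectangles (x0, y0, x1, y1) covering the top and bottom bands of a page.

    Assumes PyMuPDF's page coordinates: origin at the top-left, y grows downward.
    """
    rects = []
    if header_pt > 0:
        rects.append((0.0, 0.0, width, header_pt))
    if footer_pt > 0:
        rects.append((0.0, height - footer_pt, width, height))
    return rects


def redact_header_footer(in_path, out_path, header_pt, footer_pt, skip_pages=()):
    """Redact header/footer bands on every page not in skip_pages (0-based)."""
    import fitz  # PyMuPDF; imported here so band_rects works without it installed

    doc = fitz.open(in_path)
    for index, page in enumerate(doc):
        if index in skip_pages:
            continue  # e.g. leave chapter-opening pages untouched
        for band in band_rects(page.rect.width, page.rect.height, header_pt, footer_pt):
            page.add_redact_annot(fitz.Rect(*band))
        # Applying removes the redaction annotations and the text beneath them.
        page.apply_redactions()
    # garbage=3 also drops objects that are no longer referenced.
    doc.save(out_path, garbage=3, deflate=True)
    doc.close()
```

For example, `redact_header_footer("book.pdf", "book-clean.pdf", header_pt=54, footer_pt=36, skip_pages={0})` would strip a 0.75 in header band and a 0.5 in footer band from every page except the first.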
All the current processing of PDF files is done with the PyMuPDF library. If there is a way to do this with PyMuPDF, I would consider adding an option. I'm not entirely clear on your exact use case. Do you want to remove the actual PDF content that is rendered outside a selected box, without turning the document into a rendered-image or scanned-style document? Does this need to be secure data destruction, such as for legal documents?
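Removing everything rendered outside a selected box can be framed as redaction of the complement of that box. This minimal sketch covers only the geometry. It assumes rectangles are `(x0, y0, x1, y1)` tuples in PyMuPDF's top-left-origin page coordinates and that the keep box overlaps the page. Each resulting rectangle could then be passed to PyMuPDF's `page.add_redact_annot()` before calling `page.apply_redactions()`. The function name is hypothetical.

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1), requires Python 3.9+


def outside_rects(page: Box, keep: Box) -> list[Box]:
    """Split the part of `page` lying outside `keep` into up to four rectangles."""
    px0, py0, px1, py1 = page
    # Clamp the keep box to the page so the strips never extend past its edges.
    kx0, ky0 = max(px0, keep[0]), max(py0, keep[1])
    kx1, ky1 = min(px1, keep[2]), min(py1, keep[3])
    strips = [
        (px0, py0, px1, ky0),  # full-width strip above the keep box
        (px0, ky1, px1, py1),  # full-width strip below the keep box
        (px0, ky0, kx0, ky1),  # strip to the left of the keep box
        (kx1, ky0, px1, ky1),  # strip to the right of the keep box
    ]
    # Drop empty strips, which occur when the keep box touches a page edge.
    return [r for r in strips if r[2] > r[0] and r[3] > r[1]]


print(outside_rects((0, 0, 612, 792), (50, 60, 562, 742)))
# [(0, 0, 612, 60), (0, 742, 612, 792), (0, 60, 50, 742), (562, 60, 612, 742)]
```

Using full-width strips above and below keeps the four pieces non-overlapping, so together they cover exactly the area outside the box.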
Thanks for the reply, and sorry for the late response. The use case is to process many different books, articles, plays, etc., ideally as a batch process like this example:
The end goal is to then convert the text to speech, i.e. into an audio format. My problem with doing it in ABBYY is:
Thanks again for your suggestions!
Hello,
I don't know much about PDF, and I'm confused by the *box entries (MediaBox, CropBox, etc.) and by the units used in those boxes and in pdfCropMargins (pt vs. %).
What would be the right way to permanently remove the headers and footers on most pages of a PDF, while leaving some pages untouched (e.g. the first page of each chapter)? By "permanently" I mean not just for viewing: the data must no longer be in the output file.
Thank you.
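To leave chosen pages untouched, a script needs the set of pages to skip. This minimal sketch parses a hypothetical 1-based page spec such as `"1,12,30-31"` into 0-based indices, the numbering PyMuPDF uses when iterating over a document. The spec syntax and the function name are assumptions for illustration only.

```python
def parse_page_spec(spec):
    """Turn a 1-based spec like "1,12,30-31" into a set of 0-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty entries such as in "1,,5"
        if "-" in part:
            start, end = (int(n) for n in part.split("-", 1))
            indices.update(range(start - 1, end))  # inclusive 1-based range
        else:
            indices.add(int(part) - 1)
    return indices


print(sorted(parse_page_spec("1,12,30-31")))  # [0, 11, 29, 30]
```

The pages to process are then every index in `range(page_count)` that is not in this set. A reversed range such as `"5-3"` yields no pages rather than an error.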