
meaning of input_binary #95

Open
bertsky opened this issue Feb 16, 2023 · 7 comments
Labels: documentation (Improvements or additions to documentation)
Comments

bertsky (Contributor) commented Feb 16, 2023

The only documentation for this kwarg is in the standalone CLI:

> in general, eynollah uses RGB as input but if the input document is strongly dark, bright or for any other reason you can turn binarized input on. This option does not mean that you have to provide a binary image, otherwise this means that the tool itself will binarized the RGB input document

I find that second sentence very confusing (esp. around "otherwise").

So this means that binarization is attempted internally (when activated)? What steps of the pipeline are affected?

(Also, implementation-wise, it looks like binarization is repeated multiple times, without re-using the previous result...)

Can anything be said about how pretrained models would fare when passed (externally) binarized images?
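
For reference, a minimal sketch of switching the option on via the standalone CLI; the flag spellings below are assumptions based on the kwarg name and should be checked against `eynollah --help`:

```python
# Sketch only: invoke the standalone CLI with binarized input turned on.
# The short flags (-i, -o, -m) and the spelling --input_binary are assumed
# from the kwarg name, not verified against the current eynollah CLI.
import subprocess

subprocess.run([
    "eynollah",
    "-i", "page.png",          # input image (assumed flag)
    "-o", "out",               # output directory (assumed flag)
    "-m", "models_eynollah",   # model directory (assumed flag)
    "--input_binary",          # the option this issue is about
], check=True)
```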

@cneud added the documentation label on Mar 31, 2023
cneud (Member) commented Apr 12, 2023

As far as I understand (and please @vahidrezanezhad correct me), Eynollah will almost always produce a better result from a grayscale or color image than from a binarized image.

However, if the input image is "strongly dark or bright" (and this needs a bit more explanation), the user may try to get a better result by setting "input_binary" to true. In this case, Eynollah itself will binarize the image, so the user does not have to worry about binarizing it with another tool first. (Note: I would like to fully integrate sbb_binarization for this.)
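
To illustrate the intended behaviour (made-up names only, not the actual eynollah internals): with the flag set, the tool binarizes the RGB input itself before the rest of the pipeline runs:

```python
import numpy as np

def internal_binarizer(rgb_image):
    # Stand-in for eynollah's model-based binarization (hypothetical);
    # shown here as a trivial global threshold just to keep the sketch runnable.
    gray = rgb_image.mean(axis=2)
    return (gray > gray.mean()).astype(np.uint8) * 255

def preprocess(rgb_image, input_binary=False):
    # When input_binary is set, the tool binarizes the RGB input itself;
    # the user never supplies a pre-binarized image.
    if input_binary:
        return internal_binarizer(rgb_image)
    return rgb_image
```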

> I find that second sentence very confusing (esp. around "otherwise").

Agreed, we will try to reformulate this for better clarity.

> What steps of the pipeline are affected?

@vahidrezanezhad should be able to answer this.

> it looks like binarization is repeated multiple times, without re-using the previous result

We will also check this with respect to performance.

> Can anything be said about how pretrained models would fare when passed (externally) binarized images?

The only thing I can say is that it would be an interesting experiment to evaluate this :) But I am afraid it will require a lot of effort to do properly (per step, with different binarization methods/models and good metrics for OCR and layout) and will only be relevant for a few images of bad quality.

bertsky (Contributor, Author) commented Apr 12, 2023

Ok, then (besides reformulation of the description) I highly recommend renaming that option, e.g. apply_binarization: after all, it's not the input that must/can be binary, but the internal step that is performed.
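
If the option does get renamed, the old spelling could be kept as an alias so existing calls keep working; a sketch assuming the CLI continues to use click (option names and help text here are illustrative, not the actual eynollah code):

```python
import click

@click.command()
@click.option(
    "--apply_binarization", "--input_binary", "apply_binarization",
    is_flag=True,
    help="Binarize the RGB input internally before layout analysis.")
def main(apply_binarization):
    # Both spellings set the same parameter, so existing calls with
    # --input_binary would keep working after the rename.
    click.echo(f"apply_binarization={apply_binarization}")

if __name__ == "__main__":
    main()
```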

bertsky (Contributor, Author) commented Apr 12, 2023

Integrating sbb_binarization / experimenting with external tools: the OCR-D way would be to just use whatever derived images with binarized in their @comments can be found, i.e. whatever binarization has been run in the workflow. So whether it is sbb_binarization or any other tool – it would be up to the user to decide and experiment. (But if the internal binarizer here is different from sbb_binarize and perhaps better, then it gets more complicated...)
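
For illustration, the usual OCR-D processor pattern for picking up such a derived image via the workspace API (the processor class and file handling here are just a sketch, not eynollah's actual OCR-D wrapper):

```python
from ocrd import Processor
from ocrd_modelfactory import page_from_file

class DemoLayoutProcessor(Processor):  # hypothetical processor, for illustration only
    def process(self):
        for input_file in self.input_files:
            pcgts = page_from_file(self.workspace.download_file(input_file))
            page = pcgts.get_Page()
            # Select whichever derived image already carries "binarized" among
            # its @comments features, regardless of which binarizer produced it:
            page_image, page_coords, image_info = self.workspace.image_from_page(
                page, input_file.pageId,
                feature_selector='binarized')
            # ... run layout analysis on page_image ...
```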

cneud (Member) commented Apr 12, 2023

Let me first confirm the above and then we can rename the option, ideally also consistently for scaling, enhancing and resizing.

vahidrezanezhad (Member) commented:

> As far as I understand (and please @vahidrezanezhad correct me), Eynollah will almost always produce a better result from a grayscale or color image than from a binarized image.

This is exactly the case. Our best performance is achieved with a grayscale or color image.

vahidrezanezhad (Member) commented:

> (Also, implementation-wise, it looks like binarization is repeated multiple times, without re-using the previous result...)

I will check it. By the way, it should not be run multiple times.
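
For illustration, a sketch of the kind of re-use that would avoid repeating the work (made-up names and a trivial threshold, not the actual eynollah code):

```python
import numpy as np

class LayoutPipeline:
    """Illustrative only: binarize once and cache the result for later steps."""

    def __init__(self, rgb_image, input_binary=False):
        self.rgb_image = rgb_image
        self.input_binary = input_binary
        self._binarized = None

    def binarized(self):
        # Run the (expensive) binarization only on first request;
        # later pipeline steps re-use the cached result.
        if self._binarized is None:
            gray = self.rgb_image.mean(axis=2)
            self._binarized = (gray > gray.mean()).astype(np.uint8) * 255
        return self._binarized
```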

vahidrezanezhad (Member) commented:

> Integrating sbb_binarization / experimenting with external tools: the OCR-D way would be to just use whatever derived images with binarized in their @comments can be found, i.e. whatever binarization has been run in the workflow. So whether it is sbb_binarization or any other tool – it would be up to the user to decide and experiment. (But if the internal binarizer here is different from sbb_binarize and perhaps better, then it gets more complicated...)

The internal binarizer uses the same models as sbb_binarization.
