Description
Text that sits on a single line in a PDF list is split across lines after extracting it as text: the list marker ends up separated from the item's text.
For example:
6. Miu to kekkon shitai desu.
becomes
6.
Miu to kekkon shitai desu.
Expected Behavior
Each list item should be extracted on a single line. For the first list, the result should be:
6. Miu to kekkon shitai desu.
7. TENDŌ RAIZŌ: Ore.... Miu .... (Fun)
8. ŌZOYA HARUYA: A... sumimasen. Boku wa Miu-san to kekkon shitai desu.
9. O, o-jō-san o kudasai.
10. TENDŌ RAIZŌ: Kimi wa... Nani ga hoshii? Kane ka? Ie ka? A?
11. Soretomo uchi no kaisha ga hoshii no ka?
12. TENŌ MIU: Papa!
For the second list, the result should be:
ENGLISH
1. ŌZORA HARUYA: (Tendō Family)Ah, nice to meet you. I am Ōzora Haruya.
2. TENZO RAIZO: Huh?
3. ŌZORAHARUYA: ah, um...sir, please give me your daughter.
4. TENDOU RAIZO: Huh?
5. ŌZORA HARUYA: I...I want to be with Miu forever.
6. I want to marry Miu.
7. TENDO RAIZO: I? Miu? (harrumph)
8. ŌZORA HARUYA: I...I'm sorry. I want to marry Miu.
Actual Behavior
List items are split across lines, and in the second list most of the dialogue is detached from its speaker label.
For example, the first list is extracted as:
6.
Miu to kekkon shitai desu.
7. TENDŌ RAIZŌ: Ore.... Miu .... (Fun)
8. ŌZOYA HARUYA: A... sumimasen. Boku wa Miu-san to kekkon shitai desu.
9.
10.
11.
12.
O, o-jō-san o kudasai.
TENDŌ RAIZŌ: Kimi wa... Nani ga hoshii? Kane ka? Ie ka? A?
Soretomo uchi no kaisha ga hoshii no ka?
TENŌ MIU: Papa!
The second list is extracted as:
ENGLISH
1. ŌZORA HARUYA:
2. TENZO RAIZO:
3. ŌZORAHARUYA:
4. TENDOU RAIZO:
5. ŌZORA HARUYA:
6. I want to marry Miu.
7. TENDO RAIZO:
8. ŌZORA HARUYA: (Tendō Family)Ah, nice to meet you. I am Ōzora Haruya.
Huh?
ah, um...sir, please give me your daughter.
Huh?
I...I want to be with Miu forever.
I? Miu? (harrumph)
I...I'm sorry. I want to marry Miu.
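Until the extractor itself is fixed, the first example's symptom follows a regular pattern (a run of k bare markers such as "9." to "12." followed by k detached lines), so it can be undone on the plain text. The Go sketch below is hypothetical: `rejoinOrphanMarkers` and `canPair` are not part of unipdf, and the sketch assumes each detached line is exactly one item's text. It cannot repair the second example, where item 1's dialogue was attached to item 8, because plain text no longer carries the positions needed to re-pair labels with their text.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// bareMarker matches a (trimmed) line holding only a list marker, e.g. "6." or "12.".
var bareMarker = regexp.MustCompile(`^\d+\.$`)

// numbered matches a (trimmed) line that starts with its own list marker, e.g. "7. TENDŌ ...".
var numbered = regexp.MustCompile(`^\d+\.\s`)

// rejoinOrphanMarkers is a hypothetical workaround: a run of k consecutive bare
// markers is paired, in order, with the next k unnumbered lines. Runs that cannot
// be paired, and all other lines, are passed through unchanged.
func rejoinOrphanMarkers(lines []string) []string {
	out := make([]string, 0, len(lines))
	i := 0
	for i < len(lines) {
		if !bareMarker.MatchString(strings.TrimSpace(lines[i])) {
			out = append(out, lines[i])
			i++
			continue
		}
		// Find the end of the run of bare markers starting at i.
		j := i
		for j < len(lines) && bareMarker.MatchString(strings.TrimSpace(lines[j])) {
			j++
		}
		k := j - i
		if !canPair(lines, j, k) {
			// Not enough unnumbered lines follow: keep the run as extracted.
			out = append(out, lines[i:j]...)
			i = j
			continue
		}
		for m := 0; m < k; m++ {
			out = append(out, strings.TrimSpace(lines[i+m])+" "+strings.TrimSpace(lines[j+m]))
		}
		i = j + k
	}
	return out
}

// canPair reports whether lines[start:start+k] exist and none of them
// already begins with its own list marker.
func canPair(lines []string, start, k int) bool {
	if start+k > len(lines) {
		return false
	}
	for _, l := range lines[start : start+k] {
		if numbered.MatchString(strings.TrimSpace(l)) {
			return false
		}
	}
	return true
}

func main() {
	// The first list as extracted in the report.
	extracted := `6.
Miu to kekkon shitai desu.
7. TENDŌ RAIZŌ: Ore.... Miu .... (Fun)
8. ŌZOYA HARUYA: A... sumimasen. Boku wa Miu-san to kekkon shitai desu.
9.
10.
11.
12.
O, o-jō-san o kudasai.
TENDŌ RAIZŌ: Kimi wa... Nani ga hoshii? Kane ka? Ie ka? A?
Soretomo uchi no kaisha ga hoshii no ka?
TENŌ MIU: Papa!`
	// Prints items 6 to 12, each on a single line.
	for _, line := range rejoinOrphanMarkers(strings.Split(extracted, "\n")) {
		fmt.Println(line)
	}
}
```

On the first list this reproduces the expected output; an item whose own text wraps over several lines would be mispaired, so treat it as a stopgap rather than a fix.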
Attachments
Include a self-contained reproducible code snippet and PDF file that demonstrates the issue.
B_S4L4_p4_github.pdf
Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/