
[feature request] add regex pattern filter #33

Open
sify21 opened this issue Feb 1, 2024 · 0 comments


sify21 commented Feb 1, 2024

I have this PDF file: https://docs.ton.org/ton.pdf
I used the following recipe to create a TOC:

```toml
[[heading]]
# TON Blockchain
level = 1
greedy = true
font.name = "F102"
font.size = 17.21540069580078
# font.size_tolerance = 1e-5
# font.color = 0x000000
# font.superscript = false
# font.italic = false
# font.serif = false
# font.monospace = false
# font.bold = false
# bbox.left = 138.70851135253906
# bbox.top = 127.66803741455078
# bbox.right = 274.1837158203125
# bbox.bottom = 144.88343811035156
# bbox.tolerance = 1e-5
[[heading]]
# TON Blockchain as a Collection of 2-Blockchains
level = 2
greedy = true
font.name = "F108"
font.size = 14.346199989318848
# font.size_tolerance = 1e-5
# font.color = 0x000000
# font.superscript = false
# font.italic = false
# font.serif = false
# font.monospace = false
# font.bold = false
# bbox.left = 146.76255798339844
# bbox.top = 291.47509765625
# bbox.right = 486.075927734375
# bbox.bottom = 305.8212890625
# bbox.tolerance = 1e-5
[[heading]]
# 2.1.1. List of blockchain types.
level = 3
greedy = false
font.name = "F104"
font.size = 11.9552001953125
# font.size_tolerance = 1e-5
# font.color = 0x000000
# font.superscript = false
# font.italic = false
# font.serif = false
# font.monospace = false
# font.bold = false
# bbox.left = 110.85400390625
# bbox.top = 395.5226745605469
# bbox.right = 289.56573486328125
# bbox.bottom = 407.52569580078125
# bbox.tolerance = 1e-5
```

The problem is that level 3 picks up many spurious entries, for example:

```
"1 Brief Description of TON Components" 3
        "2 2.1.17 2.4.20" 3
        "3" 3
        "4.1.7" 3
        "4.1.10 3.1.6" 3
        "3.2 3.2.10 3.2.14 3.2.12" 3
        "4 4.3.14 4.3.17 3.2.12 4.1.6" 4
        "4.3.1" 4
        "5" 4
        "4.3.23" 4
        "2.9.13 4.1" 4
"2 TON Blockchain" 5
    "2.1 TON Blockchain as a Collection of 2-Blockchains" 5
        "2.1.17" 5
        "2.1.1. List of blockchain types." 5
        "2.8.8 2.9.7 2.9.8" 5
        "2.8.12 2.8.8" 6
        "2.1.17" 6
        "2.1.2. Innite Sharding Paradigm." 6
        "2.1.3. Messages. Instant Hypercube Routing. 2.4.2 2.4.20" 7
        "2.1.4. Quantity of masterchains, workchains and shardchains." 7
```

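The correct ones all share the same pattern, `"\d+\.\d+\.\d+\.`, i.e. the title starts with a three-part section number followed by a period, such as `2.1.1.`. Currently I can delete the wrong level-3 lines in Vim with this command: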

```vim
:'<,'>g!/"\d\+\.\d\+\.\d\+\./d
```
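For doing the same clean-up outside Vim, here is a minimal Python sketch of an equivalent post-processing step. It assumes the output format shown above (one `"title" page` entry per line, indented by 4 spaces per level); `filter_level`, `SECTION_RE` and `ENTRY_RE` are illustrative names, not part of any existing tool.

```python
import re

# Assumed level-3 title shape: a section number such as "2.1.1." at the start.
SECTION_RE = re.compile(r"\d+\.\d+\.\d+\.")

# Assumed entry format: indentation, a quoted title, a space, then the page number.
ENTRY_RE = re.compile(r'^(?P<indent> *)"(?P<title>.*)" (?P<page>\d+)$')


def filter_level(toc_text: str, level: int, pattern: re.Pattern, indent_width: int = 4) -> str:
    """Drop entries at `level` whose title does not start with `pattern`."""
    kept = []
    for line in toc_text.splitlines():
        m = ENTRY_RE.match(line)
        if m is None:
            kept.append(line)  # leave unrecognised lines untouched
            continue
        # Level 1 has no indent, level 2 has one indent step, and so on.
        entry_level = len(m["indent"]) // indent_width + 1
        if entry_level == level and not pattern.match(m["title"]):
            continue  # same effect as the Vim :g! command, restricted to one level
        kept.append(line)
    return "\n".join(kept)


toc = '''"2 TON Blockchain" 5
    "2.1 TON Blockchain as a Collection of 2-Blockchains" 5
        "2.1.17" 5
        "2.1.1. List of blockchain types." 5
        "2.8.8 2.9.7 2.9.8" 5'''

print(filter_level(toc, 3, SECTION_RE))
```

Unlike the `:'<,'>g!` command, this only removes non-matching lines at the chosen level, so level-1 and level-2 entries survive without having to select a range first. On the sample input, only `"2.1.1. List of blockchain types." 5` remains at level 3.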

But it would be better to have a regex filter built into the recipe. The filter should be able to:

  • exclude any entry whose title doesn't match a regex
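As a rough illustration of the requested behaviour, the sketch below models a heading filter that first checks the font (as the recipe's `font.name`, `font.size` and `font.size_tolerance` keys suggest) and then, if a regex is set, rejects titles that don't match it. The `HeadingFilter` class and its `pattern` field are hypothetical; they show how an option such as `pattern = '^\d+\.\d+\.\d+\.'` in a `[[heading]]` section might behave, not how the tool is actually implemented.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeadingFilter:
    """Hypothetical heading filter: font match plus an optional title regex."""
    level: int
    font_name: str
    font_size: float
    size_tolerance: float = 1e-5  # mirrors the commented-out font.size_tolerance
    pattern: Optional[str] = None  # hypothetical recipe key, e.g. pattern = '^\d+\.\d+\.\d+\.'

    def admits(self, title: str, font_name: str, font_size: float) -> bool:
        # Assumed existing behaviour: font name and size (within tolerance) must match.
        if font_name != self.font_name:
            return False
        if abs(font_size - self.font_size) > self.size_tolerance:
            return False
        # Proposed behaviour: additionally exclude titles that don't match the regex.
        if self.pattern is not None and re.search(self.pattern, title) is None:
            return False
        return True


level3 = HeadingFilter(level=3, font_name="F104", font_size=11.9552001953125,
                       pattern=r"^\d+\.\d+\.\d+\.")

print(level3.admits("2.1.1. List of blockchain types.", "F104", 11.9552001953125))
print(level3.admits("2.8.8 2.9.7 2.9.8", "F104", 11.9552001953125))
```

Applying the regex after the font check keeps the existing matching unchanged when no `pattern` is given, so current recipes would behave exactly as before.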