the output is not escaped on error #572

Open
yegor256 opened this issue Nov 28, 2024 · 19 comments

Comments

@yegor256
Member

yegor256 commented Nov 28, 2024

I see this in the log:

$ eo-phi-normalizer rewrite --rules 0.yml Foo.phi --single -o Bar.phi
eo-phi-normalizer: syntax error at line 1, column 1 due to lexer error
on input
?.org.eolang.bytes ( ?0 ? ? ? ? 00-00-00-00-00-00-1E-61 ? )

Here, I can't tell whether the problem is with the encoding or whether the input was indeed formatted as ?0 instead of α0. I suggest that you "escape" non-ASCII symbols in the output: instead of printing UTF-8 as is, convert each symbol to something like \u045e.

Maybe you can say on input (non-ASCII symbols escaped) instead of just on input.
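
A minimal sketch of the suggested escaping, written in Python rather than taken from the normalizer's code. The helper name `escape_non_ascii` is hypothetical. It keeps ASCII as is, rewrites everything else in the `\uXXXX` style mentioned above, and also shows how a lossy ASCII encoding produces the ambiguous `?` seen in the log:

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a \\uXXXX escape (hypothetical helper)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)                  # plain ASCII stays readable
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04x}")    # Basic Multilingual Plane
        else:
            out.append(f"\\U{code:08x}")    # characters beyond the BMP
    return "".join(out)


# A lossy encoding collapses every non-ASCII character into "?",
# so the reader cannot tell which symbol was in the input.
lossy = "α0".encode("ascii", errors="replace").decode("ascii")

# Escaping keeps the information: α is U+03B1.
escaped = escape_non_ascii("α0")
```

With escaping, the reader can still see that the first symbol was U+03B1, which the `?` form hides.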

@yegor256
Member Author

@deemp please, help

@deemp
Member

deemp commented Nov 28, 2024

@yegor256, run export LC_ALL=C.UTF-8 before running this command.

@yegor256
Member Author

@deemp yes, we know the workaround, but please make the output escaped :)

@deemp
Member

deemp commented Nov 28, 2024

@yegor256,

  1. Does normalizer render Unicode correctly in error messages with export ...?
  2. Does normalizer render Unicode correctly in normal output without export ...? If it doesn't, then export ... is not a workaround but a necessity. We can document it explicitly on the command pages of the docs site.

@yegor256
Member Author

@deemp yes, it works with the export, but I kindly ask you to implement this escaping feature because it will help users debug much faster

@deemp
Member

deemp commented Nov 29, 2024

@yegor256, can you suggest how to distinguish when to print Unicode and when to escape?

I thought about:

  • Checking the LANG environment variable, but the locale may be set in other ways on different platforms.
  • Adding an option like --use-unicode-code-points that would always output unicode.

@yegor256
Member Author

@deemp just always escape when you print this error message. Why not escape? It's an error message: it won't be parsed by any software, it will always be read by humans. Replace all symbols at 0x7f and above with their mnemonics, that's it.

@deemp
Member

deemp commented Nov 29, 2024

@yegor256, it's inconvenient to read numbers when you can read Unicode characters. If the locale is set correctly, users may prefer to see Unicode.

@yegor256
Member Author

@deemp I'm the primary user of this app :) I'm telling you, as a user, that error messages must be as non-ambiguous as possible. Unicode is more ambiguous than ASCII.

@deemp
Member

deemp commented Nov 29, 2024

I'm the primary user of this app

@yegor256, OK, I'll keep that in mind :) Let's escape.

@deemp
Member

deemp commented Nov 29, 2024

@yegor256, here are representations of errors.

  1. With escaping:

    syntax error at line 1, column 1 before `\961'
    on input
    \961 \8614 \10214 t \8614 \958.\961.k.\961.t
    
  2. With correctly set locale and without escaping:

    syntax error at line 1, column 1 before `ρ'
    on input
    ρ ↦ ⟦ t ↦ ξ.ρ.k.ρ.t
    

Do you really prefer the option with escaping?
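
For reference, the escaped form above writes each non-ASCII character as a backslash followed by its decimal code point. A minimal Python sketch that reproduces it (not the normalizer's implementation; `escape_decimal` is a hypothetical name, and the 0x7f threshold follows the earlier comment):

```python
def escape_decimal(text: str) -> str:
    """Replace every character at 0x7f or above with a backslash and its decimal code point."""
    return "".join(ch if ord(ch) < 0x7F else f"\\{ord(ch)}" for ch in text)


source = "ρ ↦ ⟦ t ↦ ξ.ρ.k.ρ.t"
escaped = escape_decimal(source)
# escaped == r"\961 \8614 \10214 t \8614 \958.\961.k.\961.t"
```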

deemp linked a pull request Nov 29, 2024 that will close this issue

@yegor256
Member Author

@deemp can you do both? show the original one and then print the escaped one?

@deemp
Member

deemp commented Nov 29, 2024

the original one

@yegor256, which one do you mean?

@yegor256
Member Author

@deemp how many do you have? :) print them both

@deemp
Member

deemp commented Nov 29, 2024

@yegor256, see #572 (comment)

@yegor256
Member Author

@deemp please, print both outputs in case of error: 1) not escaped, and 2) escaped

@deemp
Member

deemp commented Dec 2, 2024

@yegor256

Platform: Linux

Input program: ξ.a.b(c ↦ ⟦ Δ ⤍ 3F-FC ⟧)

  1. Not escaped, export LANG=en_US.UTF-8:

    eo-phi-normalizer: An error occurred when parsing the input program:
    syntax error at line 1, column 1 before `'
    on the input:
    .a.b(c     3F-FC )
    
  2. Not escaped, export LANG=C.UTF-8:

    eo-phi-normalizer: An error occurred when parsing the input program:
    syntax error at line 1, column 1 before `ξ'
    on the input:
    ξ.a.b(c ↦ ⟦ Δ ⤍ 3F-FC ⟧)
    
  3. Non-ASCII escaped (Unicode characters replaced with their numbers), LANG doesn't matter:

    eo-phi-normalizer: An error occurred when parsing the input program:
    syntax error at line 1, column 1 before `\961'
    on the input:
    \961 \8614 \10214 t \8614 \958.\961.k.\961.t
    

@yegor256
Member Author

yegor256 commented Dec 3, 2024

@deemp the first option is not "escaping" but "removing" :) please, use option two and option three together
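
A sketch of what printing options two and three together could look like, in Python rather than the normalizer's own code. The function names and the exact wording of the second header are assumptions; the header borrows the "(non-ASCII symbols escaped)" phrasing proposed at the top of this issue:

```python
def escape_decimal(text: str) -> str:
    """Replace every character at 0x7f or above with a backslash and its decimal code point."""
    return "".join(ch if ord(ch) < 0x7F else f"\\{ord(ch)}" for ch in text)


def format_parse_error(detail: str, source: str) -> str:
    """Build an error report that shows the input as is and, if it differs, escaped."""
    lines = [
        "An error occurred when parsing the input program:",
        detail,
        "on the input:",
        source,
    ]
    escaped = escape_decimal(source)
    # Only add the escaped copy when it carries extra information.
    if escaped != source:
        lines += ["on the input (non-ASCII symbols escaped):", escaped]
    return "\n".join(lines)


report = format_parse_error(
    "syntax error at line 1, column 1",
    "ξ.a.b(c ↦ ⟦ Δ ⤍ 3F-FC ⟧)",
)
```

If the terminal drops or mangles the unescaped line, the escaped line still identifies every symbol.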

deemp removed a link to a pull request Dec 3, 2024

@deemp
Member

deemp commented Dec 3, 2024

@yegor256, in #590 I've implemented a way to always use option 2 regardless of the locale. eo-phi-normalizer will set the locale on its own, as well as the code page on Windows.

We'll soon make a release where this functionality is supported.
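
The general idea of a program fixing its own output encoding, instead of relying on the user's locale, can be illustrated in Python. This is only an analogy and not the change made in #590; `force_utf8` is a hypothetical name, and the Windows code page is not covered here:

```python
import io


def force_utf8(stream: io.TextIOWrapper) -> io.TextIOWrapper:
    """Switch a text stream to UTF-8 regardless of the encoding it was opened with."""
    stream.reconfigure(encoding="utf-8")  # available on TextIOWrapper since Python 3.7
    return stream


# Simulate an output stream that was opened under an ASCII-only locale.
raw = io.BytesIO()
stream = force_utf8(io.TextIOWrapper(raw, encoding="ascii"))

# Without the reconfigure call this write would raise UnicodeEncodeError.
stream.write("ξ.a.b")
stream.flush()
written = raw.getvalue()
```

In a real command-line program the same call would be applied to `sys.stdout` and `sys.stderr` at startup.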
