Error when parsing tag attributes starting with @ #177

Open
jvkerckh opened this issue Sep 7, 2022 · 1 comment

jvkerckh commented Sep 7, 2022

Greetings,

For my current project I need to parse attributes whose names can start with @. However, parsehtml emits a warning and drops the attribute entirely, while parsexml throws an error outright.

Example:

julia> htmlsnip = "<p @foo=\"bar\">content</p>"
"<p @foo=\"bar\">content</p>"

Using parsehtml:

julia> htmlsnip |> parsehtml
┌ Warning: XMLError: error parsing attribute name from HTML parser (code: 68, line: 1)
└ @ EzXML ~/.julia/packages/EzXML/ZNwhK/src/error.jl:95
EzXML.Document(EzXML.Node(<HTML_DOCUMENT_NODE@0x0000000005cb8ad0>))

Printing the result shows the attribute is not parsed:

julia> htmlsnip |> parsehtml |> prettyprint
┌ Warning: XMLError: error parsing attribute name from HTML parser (code: 68, line: 1)
└ @ EzXML ~/.julia/packages/EzXML/ZNwhK/src/error.jl:95
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <body>
    <p>content</p>
  </body>
</html>

Using parsexml:

julia> htmlsnip |> parsexml
┌ Warning: caught 4 errors; showing the first one
└ @ EzXML ~/.julia/packages/EzXML/ZNwhK/src/error.jl:79
ERROR: XMLError: error parsing attribute name from XML parser (code: 68, line: 1)
Stacktrace:
 [1] throw_xml_error()
   @ EzXML ~/.julia/packages/EzXML/ZNwhK/src/error.jl:87
 [2] macro expansion
   @ ~/.julia/packages/EzXML/ZNwhK/src/error.jl:52 [inlined]
 [3] parsexml(xmlstring::String)
   @ EzXML ~/.julia/packages/EzXML/ZNwhK/src/document.jl:80
 [4] |>(x::String, f::typeof(parsexml))
   @ Base ./operators.jl:911
 [5] top-level scope
   @ REPL[77]:1

I'm using Julia v1.8.0 and EzXML v1.1.0, with no other packages in the environment.

@hhaensel

I ran into the same problem today. Since I always traverse the whole document anyway, I could mask the '@' character before parsing and restore it afterwards. Depending on your goal, you could do something similar.

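using EzXML, Logging

function parse_vue_html(html)
  # mask every '@' so that the parser accepts the attribute names
  doc_string = replace(html, "@" => "__vue-on__")
  # clear errors left over from earlier parser calls
  empty!(EzXML.XML_GLOBAL_ERROR_STACK)
  # only log messages at Error level, which hides the parser warnings
  doc = Logging.with_logger(Logging.SimpleLogger(stdout, Logging.Error)) do
    EzXML.parsehtml(doc_string).root
  end
  # remove the html -> body levels
  # parse_elem is not part of EzXML; it is defined elsewhere (not shown)
  replace(parse_elem(first(eachelement(first(eachelement(doc))))), "__vue-on__" => "@")
end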

Note that the parser parse_elem() replaces the instances of __vue-on__ that occur as attribute names.
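
One caveat with the masking approach above: replace(html, "@" => ...) touches every '@' in the input, including text content such as e-mail addresses. If that matters, a narrower variant could mask only an '@' that starts an attribute name inside a tag. The following is a rough sketch, not a tested recipe for all inputs: the names mask_at, unmask_at and AT_MASK are made up for illustration, and it assumes that attribute values contain no '<' or '>' and that the placeholder does not already occur in the document.

```julia
# Hypothetical placeholder (an assumption): any string that is a valid
# attribute name and does not occur in the document would do.
const AT_MASK = "__at__"

# Mask '@' only where it starts an attribute name, i.e. inside `<...>`,
# directly after whitespace and directly before a letter or underscore.
# Assumes attribute values contain neither '<' nor '>'.
function mask_at(html::AbstractString; mask::AbstractString = AT_MASK)
    mask_tag(tag) = replace(tag, r"(?<=\s)@(?=[A-Za-z_])" => mask)
    return replace(html, r"<[^<>]*>" => mask_tag)
end

# Restore the original '@' after parsing or serializing.
unmask_at(s::AbstractString; mask::AbstractString = AT_MASK) = replace(s, mask => "@")
```

The masked string would then be passed to parsehtml in place of the original, and unmask_at applied to the attribute names (or to the serialized output) afterwards. An '@' that follows whitespace inside a quoted attribute value is masked as well, but the round trip restores it.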
