I want to help #1

Open
azu opened this issue Nov 20, 2017 · 9 comments

@azu

azu commented Nov 20, 2017

Hi, I'm interested in an AsciiDoc parser for textlint.
I am the owner of textlint, and I've written a book in Asciidoctor.
However, I don't have domain knowledge about AsciiDoc/Asciidoctor.
Previously, I tried to create textlint-plugin-asciidoc-loose, but it failed.

Is there anything that I can help with?

@mojavelinux
Member

I'd love your help! I plan on making this a fully-compliant AsciiDoc parser that is geared exclusively for validation. After studying validation for AsciiDoc, I've come to realize that it doesn't really make sense to have the same parser for conversion and validation because they have very different goals and needs. Therefore, it makes sense to develop a full parser dedicated for validation. textlint is a perfect fit.

What I need the most is assistance with the mapping of the model. As I began to work on this plugin, I realized I needed a more complete model than what textlint was providing by default (or providing for Markdown). If we could define a more complete model, then I can map the parser to that model directly instead of creating one just for this plugin.

I don't have a complete list on hand, but here are some of the nodes I know I'll need:

  • TitleNode
  • SectionNode
  • DelimitedBlockNode
  • ParagraphNode
  • AttributeEntryNode
  • DocumentNode
  • HeaderNode
  • LineNode

There may be others.

Some of these already exist in textlint. But I found that they weren't quite flexible enough to hold the data for AsciiDoc. Perhaps we can resolve that. Let's think about the model that we want. I can handle the parser part as I've already written a parser for AsciiDoc in Asciidoctor.

@mojavelinux
Member

...and the reason this really matters is that if I don't use the model in textlint, then existing plugins won't work with parsed AsciiDoc documents. I'd really like to be able to tap into the existing plugin ecosystem.

@azu
Author

azu commented Nov 24, 2017

Thanks for the reply!

it makes sense to develop a full parser dedicated for validation

I agree.
textlint's built-in Markdown plugin uses markdown-to-ast, which is a subset of remark.

textlint requires a superset or subset of the textlint AST.
If textlint-plugin-asciidoc returns a superset of the textlint AST, textlint's rules will just ignore unknown nodes.
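
To illustrate why a superset is safe, here is a rough sketch of type-based rule dispatch. It is a simplified model, not textlint's actual traversal: `runRule`, `noTodoRule`, and the `AttributeEntry` node shape are all hypothetical names chosen for the example. The only point is that a handler map keyed by node type never fires for a type it does not list.

```javascript
// Simplified, hypothetical model of rule dispatch (not textlint's real internals).
// A rule is a map from node type to handler; types without a handler are skipped.
function runRule(handlers, node, reports = []) {
  const handler = handlers[node.type];
  if (handler) {
    handler(node, reports);
  }
  // Descend into children so nested nodes still reach their handlers.
  for (const child of node.children || []) {
    runRule(handlers, child, reports);
  }
  return reports;
}

// Hypothetical rule that only knows about Str nodes.
const noTodoRule = {
  Str(node, reports) {
    if (node.value.includes("TODO")) {
      reports.push(`Found TODO in "${node.value}"`);
    }
  },
};

const doc = {
  type: "Document",
  children: [
    // Hypothetical AsciiDoc-only node: the rule has no handler for it.
    { type: "AttributeEntry", name: "author", value: "TODO" },
    { type: "Paragraph", children: [{ type: "Str", value: "TODO: write intro" }] },
  ],
};

const reports = runRule(noTodoRule, doc);
console.log(reports); // [ 'Found TODO in "TODO: write intro"' ]
```

Under this model the `AttributeEntry` node produces no report even though its value contains "TODO", because the rule never registered a handler for that type.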

In my experience, we should probably take minimal steps.

  1. Parse any AsciiDoc/Asciidoctor document without error
    • For example, the Markdown plugin has a stress test using fixtures.
    • For example, the current implementation throws an error for macros.
  2. Add missing nodes
    • TitleNode, SectionNode ...
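
The first step could be checked with a small fixture harness along these lines. This is only a sketch: `collectParseFailures` is a hypothetical helper, `toyParse` is a stand-in that rejects block macros so there is a failure to report, and the fixture names and contents are made up for illustration. The real plugin's parser would be passed in instead of `toyParse`.

```javascript
// Hypothetical harness: `parse` stands in for whatever parser the plugin exposes.
// It runs every fixture and records which ones throw instead of stopping at the first.
function collectParseFailures(parse, fixtures) {
  const failures = [];
  for (const [name, source] of Object.entries(fixtures)) {
    try {
      parse(source);
    } catch (error) {
      failures.push({ name, message: error.message });
    }
  }
  return failures;
}

// Toy parser used only to exercise the harness: it rejects block macros
// such as `image::file.png[Alt]` and accepts everything else.
function toyParse(source) {
  if (/^\w+::\S*\[.*\]$/m.test(source)) {
    throw new Error("macros are not supported");
  }
  return { type: "Document", raw: source, children: [] };
}

// Made-up in-memory fixtures; a real test would read .adoc files from disk.
const fixtures = {
  "title.adoc": "= Document Title",
  "image-macro.adoc": "image::diagram.png[Diagram]",
};

console.log(collectParseFailures(toyParse, fixtures));
// [ { name: 'image-macro.adoc', message: 'macros are not supported' } ]
```

Collecting failures rather than throwing on the first one gives a list of unsupported syntax to work through.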

But I found that they weren't quite flexible enough to hold the data for AsciiDoc. Perhaps we can resolve that.

OK

@ntgussoni

Hello @mojavelinux @azu, I wonder if there has been any further development on this subject.
I'm working really closely with a TW team and have been thinking about linting for Asciidoctor.

How can I help?

@ntgussoni

I somehow always end up coming back to this "issue". @mojavelinux, what do you think is missing from the example parser you were working on? I'm more than happy to continue working on it. I believe a full-blown AST could lead to great tools.

@ggrossetie
Contributor

ggrossetie commented Jan 1, 2021

In my experience, we should probably take minimal steps.

  1. Parse any AsciiDoc/Asciidoctor document without error
    • For example, the Markdown plugin has a stress test using fixtures.
    • For example, the current implementation throws an error for macros.
  2. Add missing nodes
    • TitleNode, SectionNode ...

@azu Sounds good!

textlint requires a superset or subset of the textlint AST.
If textlint-plugin-asciidoc returns a superset of the textlint AST, textlint's rules will just ignore unknown nodes.

I noticed that Strong, Emphasis and Monospaced (Inline code) don't have a value field. I think it would be necessary if we want to support both Markdown and AsciiDoc.

Let's take a concrete example:

Markdown
# This *Is a* Title
node {
  type: 'Header',
  depth: 1,
  children: [
    {
      type: 'Str',
      value: 'This ',
      loc: [Object],
      range: [Array],
      raw: 'This '
    },
    {
      type: 'Emphasis',
      children: [Array],
      loc: [Object],
      range: [Array],
      raw: '*Is a*'
    },
    {
      type: 'Str',
      value: ' Title',
      loc: [Object],
      range: [Array],
      raw: ' Title'
    }
  ],
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 19 } },
  range: [ 0, 19 ],
  raw: '# This *Is a* Title'
}
AsciiDoc

And here's the same document in AsciiDoc

= This _Is a_ Title
node {
  type: 'Header',
  depth: 1,
  children: [
    {
      type: 'Str',
      value: 'This ',
      loc: [Object],
      range: [Array],
      raw: 'This '
    },
    {
      type: 'Emphasis',
      children: [Array],
      loc: [Object],
      range: [Array],
      raw: '_Is a_'
    },
    {
      type: 'Str',
      value: ' Title',
      loc: [Object],
      range: [Array],
      raw: ' Title'
    }
  ],
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 19 } },
  range: [ 0, 19 ],
  raw: '= This _Is a_ Title'
}

As you can see, it would be beneficial to have a value field on the Emphasis; otherwise the AST is not markup-agnostic (i.e., we cannot extract the text from the markup without knowing the markup language).

@azu Should I open an issue at https://github.com/textlint/textlint?

@azu
Author

azu commented Jan 2, 2021

I always use textlint-util-to-string to extract text content from a TxtParentNode.
It picks each child node's value and joins them.
(It aims to pick the values that are displayed in the rendered result.)
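
A minimal sketch of that idea follows. It is not the actual textlint-util-to-string implementation: `toPlainText` is a hypothetical helper, and the example assumes the Emphasis node holds a Str child carrying the text.

```javascript
// Hypothetical helper, not the real textlint-util-to-string code.
// It walks the tree and joins the `value` of every leaf node.
function toPlainText(node) {
  // Leaf nodes such as Str carry their rendered text in `value`.
  if (typeof node.value === "string") {
    return node.value;
  }
  // Parent nodes (Header, Emphasis, ...) contribute the text of their children.
  if (Array.isArray(node.children)) {
    return node.children.map(toPlainText).join("");
  }
  return "";
}

// Header node shaped like the dumps in this thread (loc, range, raw omitted).
// Assumption: Emphasis wraps a Str child that holds the text.
const header = {
  type: "Header",
  depth: 1,
  children: [
    { type: "Str", value: "This " },
    { type: "Emphasis", children: [{ type: "Str", value: "Is a" }] },
    { type: "Str", value: " Title" },
  ],
};

console.log(toPlainText(header)); // "This Is a Title"
```

Because the walk only reads `value` and `children`, the same call would give the same text for the Markdown and the AsciiDoc header, whatever the `raw` markup looks like.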

it would be beneficial to have a value field on the Emphasis

The basic TxtAST is based on remark.
The Emphasis node of remark's AST (mdast) has no value property.
But I do not know the reason…

**__1__** may be the reason.

I agree that a universal AST like TxtAST has ambiguities.

@ggrossetie
Contributor

ggrossetie commented Jan 2, 2021

The Emphasis node of remark's AST (mdast) has no value property.

My bad, I didn't see that the Markdown to TxtAST plugin is using a Str child (as described in: https://github.com/syntax-tree/mdast#emphasis).
So the value is effectively available on the Str child:

node {
  type: 'Emphasis',
  loc: [Object],
  range: [Array],
  raw: '_Is a_',
  children: [{
    type: 'Str',
    value: 'Is a',
    loc: [Object],
    range: [Array],
    raw: 'Is a'
  }],
}

I will update the AST produced by the AsciiDoc plugin.

@mojavelinux
Member

The reason this project is stalled is because we don't yet have a clear definition of the formal grammar for AsciiDoc. That is something that the AsciiDoc Language project is working on. Once we have those rules nailed down, we can implement them in a lint project like this one. As I have said elsewhere, I don't think an official parser for AsciiDoc is going to be able to do all the things a linter will need to do (since the parser is focused on parsing a valid AsciiDoc document). However, the two tools will still need to be working off the same playbook, so to speak. That's what the formal grammar part of the specification will provide (and it's no small task).

Building off of work started by Guillaume, I have developed a prototype of an AsciiDoc parser for the formal grammar we are developing as part of the AsciiDoc Language. It's not yet complete, but can handle a good bulk of the syntax already. You can find it here: https://github.com/opendevise/asciidoc-parsing-lab


4 participants