I want to help #1

azu · 2017-11-20T11:04:40Z

Hi, I'm interested in an AsciiDoc parser and textlint,
because I am the owner of textlint and I've written a book in Asciidoctor.
However, I don't have domain knowledge of AsciiDoc/Asciidoctor.
Previously, I tried to create textlint-plugin-asciidoc-loose, but it failed.

Is there anything that I can help with?

mojavelinux · 2017-11-23T04:07:46Z

I'd love your help! I plan on making this a fully-compliant AsciiDoc parser that is geared exclusively for validation. After studying validation for AsciiDoc, I've come to realize that it doesn't really make sense to have the same parser for conversion and validation because they have very different goals and needs. Therefore, it makes sense to develop a full parser dedicated for validation. textlint is a perfect fit.

What I need the most is assistance with the mapping of the model. As I began to work on this plugin, I realized I needed a more complete model than what textlint was providing by default (or providing for Markdown). If we could define a more complete model, then I can map the parser to that model directly instead of creating one just for this plugin.

I don't have a complete list on hand, but here are some of the nodes I know I'll need:

TitleNode
SectionNode
DelimitedBlockNode
ParagraphNode
AttributeEntryNode
DocumentNode
HeaderNode
LineNode

There may be others.

Some of these already exist in textlint. But I found that they weren't quite flexible enough to hold the data for AsciiDoc. Perhaps we can resolve that. Let's think about the model that we want. I can handle the parser part as I've already written a parser for AsciiDoc in Asciidoctor.

mojavelinux · 2017-11-23T04:10:41Z

...and the reason this really matters is that if I don't use the model in textlint, then existing plugins won't work with parsed AsciiDoc documents. I'd really like to be able to tap into the existing plugin ecosystem.

azu · 2017-11-24T09:02:48Z

Thanks for the reply!

it makes sense to develop a full parser dedicated for validation

I agree.
textlint's built-in markdown plugin uses markdown-to-ast, which is a subset of remark.

textlint requires a superset or subset of the textlint AST.
If textlint-plugin-asciidoc returns a superset of the textlint AST, textlint's rules will just ignore unknown nodes.
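
To illustrate why a superset is harmless, here is a minimal sketch of type-based dispatch. It is not textlint's actual traversal code: `SketchNode`, `SketchRule`, `walk`, and the `AttributeEntry` node are hypothetical names used only for illustration. The point is that a rule registers handlers by node type, so a node type with no handler is simply skipped.

```typescript
// Minimal node shape for this sketch; real TxtAST nodes also carry loc, range and raw.
interface SketchNode {
  type: string;
  value?: string;
  children?: SketchNode[];
}

// A rule maps node types to handlers (hypothetical shape, not textlint's rule API).
type SketchRule = Map<string, (node: SketchNode) => void>;

// Walk the tree depth-first and call the handler registered for each node type.
// Node types without a handler are skipped.
function walk(node: SketchNode, rule: SketchRule): void {
  const handler = rule.get(node.type);
  if (handler) {
    handler(node);
  }
  for (const child of node.children ?? []) {
    walk(child, rule);
  }
}

const doc: SketchNode = {
  type: "Document",
  children: [
    // Hypothetical AsciiDoc-specific node that no existing rule knows about.
    { type: "AttributeEntry", value: ":author: azu" },
    { type: "Paragraph", children: [{ type: "Str", value: "Hello" }] },
  ],
};

// A rule that only cares about Str nodes.
const seen: string[] = [];
const strRule: SketchRule = new Map([
  ["Str", (node: SketchNode) => { seen.push(node.value ?? ""); }],
]);
walk(doc, strRule);
```

In this sketch only the `Str` handler fires, and the `AttributeEntry` node passes through without causing an error.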

In my experience, we should probably take minimal steps.

Parse any asciidoc/asciidoctor document without error

For example, the markdown plugin has a stress test using fixtures.
For example, the current implementation throws an error for macros.

Add missing nodes
- TitleNode, SectionNode ...
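
The first step could be checked with something like the sketch below. It assumes a parser that throws on syntax it cannot handle. `Parse`, `findFailingFixtures`, and `toyParse` are hypothetical names, and `toyParse` is a stand-in that rejects block macros so the helper has something to report. It is not the plugin's real parser.

```typescript
// Hypothetical parser signature: takes source text, returns some AST, throws on unsupported syntax.
type Parse = (source: string) => unknown;

// Run the parser over every fixture and collect the names of those that throw.
function findFailingFixtures(fixtures: Record<string, string>, parse: Parse): string[] {
  const failing: string[] = [];
  for (const [name, source] of Object.entries(fixtures)) {
    try {
      parse(source);
    } catch {
      failing.push(name);
    }
  }
  return failing;
}

// Stand-in parser used only to exercise the helper: it rejects lines that look like block macros.
const toyParse: Parse = (source) => {
  if (/^\w+::/m.test(source)) {
    throw new Error("macro not supported");
  }
  return { type: "Document", raw: source };
};

const failing = findFailingFixtures(
  {
    "title.adoc": "= Title",
    "image.adoc": "image::sunset.jpg[]",
  },
  toyParse,
);
```

A stress test would pass when the returned list is empty. With the stand-in parser above, the macro fixture is the one reported.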

But I found that they weren't quite flexible enough to hold the data for AsciiDoc. Perhaps we can resolve that.

OK

ntgussoni · 2020-03-17T16:19:36Z

Hello @mojavelinux @azu, I wonder if there has been any further development on this subject.
I'm working really closely with a TW team and have been thinking about linting for Asciidoctor.

How can I help?

ntgussoni · 2020-08-03T14:39:47Z

I somehow always end up coming back to this "issue". @mojavelinux, what do you think is missing from the example parser you were working on? I'm more than happy to continue working on it. I believe a full-blown AST could lead to great tools.

ggrossetie · 2021-01-01T22:18:18Z

In my experience, we should probably take minimal steps.
Parse any asciidoc/asciidoctor document without error
For example, the markdown plugin has a stress test using fixtures.
For example, the current implementation throws an error for macros.
Add missing nodes
TitleNode, SectionNode ...

@azu Sounds good!

textlint requires a superset or subset of the textlint AST.
If textlint-plugin-asciidoc returns a superset of the textlint AST, textlint's rules will just ignore unknown nodes.

I noticed that Strong, Emphasis and Monospaced (Inline code) don't have a value field. I think it would be necessary if we want to support both Markdown and AsciiDoc.

Let's take a concrete example:

Markdown

# This *Is a* Title

node {
  type: 'Header',
  depth: 1,
  children: [
    {
      type: 'Str',
      value: 'This ',
      loc: [Object],
      range: [Array],
      raw: 'This '
    },
    {
      type: 'Emphasis',
      children: [Array],
      loc: [Object],
      range: [Array],
      raw: '*Is a*'
    },
    {
      type: 'Str',
      value: ' Title',
      loc: [Object],
      range: [Array],
      raw: ' Title'
    }
  ],
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 19 } },
  range: [ 0, 19 ],
  raw: '# This *Is a* Title'
}

AsciiDoc

And here's the same document in AsciiDoc

= This _Is a_ Title

node {
  type: 'Header',
  depth: 1,
  children: [
    {
      type: 'Str',
      value: 'This ',
      loc: [Object],
      range: [Array],
      raw: 'This '
    },
    {
      type: 'Emphasis',
      children: [Array],
      loc: [Object],
      range: [Array],
      raw: '_Is a_'
    },
    {
      type: 'Str',
      value: ' Title',
      loc: [Object],
      range: [Array],
      raw: ' Title'
    }
  ],
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 19 } },
  range: [ 0, 19 ],
  raw: '= This _Is a_ Title'
}

As you can see, it would be beneficial to have a value field on the Emphasis node. Otherwise, the AST is not markup-agnostic (i.e., we cannot extract the text from the markup without knowing the markup language).

@azu Should I open an issue at https://github.com/textlint/textlint?

azu · 2021-01-02T02:42:43Z

I always use textlint-util-to-string for extracting text content from a TxtParentNode.
It picks each child node's value and joins them.
(It aims to pick the values that are displayed in the rendered result.)
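
The idea can be sketched as follows. This is not the actual implementation or API of textlint-util-to-string: `SketchNode` and `toText` are hypothetical names, and the node shape is trimmed to the fields needed here. It recursively joins the `value` of leaf nodes, so the same text comes out of the Markdown and AsciiDoc trees shown earlier in this thread.

```typescript
// Trimmed node shape for this sketch; real TxtAST nodes also carry loc and range.
interface SketchNode {
  type: string;
  value?: string;
  raw?: string;
  children?: SketchNode[];
}

// Concatenate the values of all leaf nodes, depth-first.
// Parent nodes (Header, Emphasis, ...) contribute only through their children.
function toText(node: SketchNode): string {
  if (node.children) {
    return node.children.map((child) => toText(child)).join("");
  }
  return node.value ?? "";
}

// Build the Header node from the examples above, parameterized by the emphasis marker.
function header(marker: string, prefix: string): SketchNode {
  return {
    type: "Header",
    raw: `${prefix} This ${marker}Is a${marker} Title`,
    children: [
      { type: "Str", value: "This ", raw: "This " },
      {
        type: "Emphasis",
        raw: `${marker}Is a${marker}`,
        children: [{ type: "Str", value: "Is a", raw: "Is a" }],
      },
      { type: "Str", value: " Title", raw: " Title" },
    ],
  };
}

const markdownText = toText(header("*", "#"));
const asciidocText = toText(header("_", "="));
```

Both calls yield the same plain text even though the `raw` fields differ, which is what lets a rule stay markup-agnostic.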

it would be beneficial to have a value field on the Emphasis

The basic TxtAST is based on remark.
The Emphasis node of remark's AST (mdast) does not have a value property.
But I do not know the reason…

**__1__** may be the reason.

I agree that a universal AST like TxtAST has ambiguity.

ggrossetie · 2021-01-02T21:24:10Z

The Emphasis node of remark's AST (mdast) does not have a value property.

My bad, I didn't see that the Markdown to TxtAST plugin is using a Str child (as described in: https://github.com/syntax-tree/mdast#emphasis).
So the value is effectively available on the Str child:

node {
  type: 'Emphasis',
  loc: [Object],
  range: [Array],
  raw: '_Is a_',
  children: [{
    type: 'Str',
    value: 'Is a',
    loc: [Object],
    range: [Array],
    raw: 'Is a'
  }],
}

I will update the AST produced by the AsciiDoc plugin.

mojavelinux · 2023-10-28T22:52:42Z

The reason this project is stalled is because we don't yet have a clear definition of the formal grammar for AsciiDoc. That is something that the AsciiDoc Language project is working on. Once we have those rules nailed down, we can implement them in a lint project like this one. As I have said elsewhere, I don't think an official parser for AsciiDoc is going to be able to do all the things a linter will need to do (since the parser is focused on parsing a valid AsciiDoc document). However, the two tools will still need to be working off the same playbook, so to speak. That's what the formal grammar part of the specification will provide (and it's no small task).

Building off of work started by Guillaume, I have developed a prototype of an AsciiDoc parser for the formal grammar we are developing as part of the AsciiDoc Language. It's not yet complete, but can handle a good bulk of the syntax already. You can find it here: https://github.com/opendevise/asciidoc-parsing-lab

ggrossetie mentioned this issue Jan 4, 2021

Get the header text/value from the children sapegin/textlint-rule-title-case#1

Closed
