Markdown based notebooks #103
- small wording changes and typos
- shortened and reduced the wordiness of one of the use cases
```{jupyter.code-cell metadata={json object}}
:tags: [hide-output, show-input]

print(Hello!")
```
I am not familiar with this space, but from a simplistic perspective this is identical to the previous block except the json object indicator. I expected the tag or its contents to look like JSON. Also, looks like the print line is missing a double quote.
Also, is there a reasoning behind where snake, kebab and space cases are used? I had anticipated json-object.
Not to be picky, I love this work!
Hi @KathleenDollard, thanks for the comment. Yes, there are multiple ways of expressing the same thing here. The motivation for that came from authoring in plain text rather than from concerns about lossless serialisation. There are already a few different ways for people to author lightweight notebooks (in jupytext and in myst-notebooks, for example), and the variations reflect support for a few of the existing styles that people use. @nthiery can maybe comment more on this?
Typo fixed in avli#2; thanks!
Indeed, the reasoning is to leave freedom to the author, depending on the
use case, on which syntax to use for cell metadata and cell parameters:
- one-line json object when the priority is on lossless serialization (without polluting the text too much)
- yaml when the priority is on human read/write ability
Thanks @KathleenDollard for the feedback!
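To illustrate the first option, here is a minimal Python sketch of a one-line JSON encoding for cell metadata. The metadata values and the helper names `to_one_line` and `from_one_line` are hypothetical, chosen only for illustration; the JEP does not prescribe them.

```python
import json

# Hypothetical cell metadata, used only for this illustration.
metadata = {"tags": ["hide-output", "show-input"], "collapsed": False}


def to_one_line(meta):
    """Serialize metadata as compact JSON that fits on a single line."""
    # Compact separators and sorted keys keep the text short and stable.
    return json.dumps(meta, separators=(",", ":"), sort_keys=True)


def from_one_line(text):
    """Parse the one-line JSON form back into a dictionary."""
    return json.loads(text)


line = to_one_line(metadata)
assert "\n" not in line                  # stays on one line
assert from_one_line(line) == metadata   # nothing is lost on the way back
print(line)
```

A YAML form of the same dictionary would spread over several lines, which is easier to edit by hand but takes more room in the document.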
Hi @KathleenDollard! Thank you for the feedback!
Answering your first comment, as @stevejpurves already mentioned, initially, it was decided to set a minimum amount of restrictions on how to write metadata. Theoretically, it allows users to select the suitable tradeoff between readability/writability. I'm unsure if such flexibility is a good idea, so more feedback is welcome to decide if the syntax should be more opinionated.
About the cases: do you suggest that the identifiers obey the JSON syntax (in other words, kebab-case shouldn't be used)?
Note that the syntax
is incompatible with pandoc's markdown. Ideally, it would be nice if the proposed format could be read and processed by pandoc (and thus doesn't require a custom parser). Why not use an attribute that is compatible? E.g.
or
or even just
There is currently no official attribute syntax for commonmark, but if this comes it is likely to be very similar to the pandoc attribute syntax. See https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/attributes.md Similar remarks for other uses of |
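The concrete variants suggested above are not preserved in this capture. For readers unfamiliar with it, pandoc's attribute syntax for fenced code blocks has the general shape `{#identifier .class key=value}`. The following Python sketch parses such a string; the function name `parse_attributes` and the sample attribute string are hypothetical, and the regex covers only the simple cases shown, not pandoc's full grammar.

```python
import re

# One token is "#id", ".class", or key=value (value optionally double-quoted).
_TOKEN = re.compile(
    r'''
      \#(?P<id>[\w-]+)
    | \.(?P<cls>[\w.-]+)
    | (?P<key>[\w-]+)=(?:"(?P<q>[^"]*)"|(?P<v>\S+))
    ''',
    re.VERBOSE,
)


def parse_attributes(text):
    """Split a '{#id .class key=value}' string into its three parts."""
    inner = text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("attribute string must be wrapped in braces")
    result = {"id": None, "classes": [], "keyvals": {}}
    for match in _TOKEN.finditer(inner[1:-1]):
        if match.group("id"):
            result["id"] = match.group("id")
        elif match.group("cls"):
            result["classes"].append(match.group("cls"))
        else:
            quoted = match.group("q")
            value = quoted if quoted is not None else match.group("v")
            result["keyvals"][match.group("key")] = value
    return result


# Hypothetical attribute string for a code cell.
attrs = parse_attributes('{#setup .python .code-cell tags="hide-output"}')
print(attrs)
```

The point of the sketch is only that cell type and metadata can be carried by classes and key-value pairs, which is the kind of structure a filter could read without a custom parser.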
thanks for the comments @jgm yes, the syntax
is aimed at providing concrete "directives" in the document that can be used to specify the various notebook blocks, which go beyond code blocks and also specify output and attachments and other complex/rich types. So the JEP isn't favoring any existing parser/library and while it isn't currently compatible out of the box with A custom parser / serializer or modifications to existing parsers are probably going to be needed anyway in order to support the serialisation requirements around output and attachment blocks? |
Yes, I understand the intent. But that intent can be met without departing from standard attribute syntax. If you used one of the variants I suggested, or e.g. With your current syntax suggestion, that wouldn't be possible; you'd be giving up easy-interoperability for no good reason that I can discern. |
This could all be handled with filters with the existing pandoc markdown or extended commonmark parser; none of it requires changes to the parser. |
Thanks @jgm for the feedback! The motivation for having Using I am not sure about Presumably a good guideline to follow is what would be customary in the |
Using Markdown for notebooks that display nicely as READMEs (similar to mwouts/jupytext#220) has been explored for Polyglot Notebooks / Try .NET. One detail from that design that might be of interest here is that we also put cell metadata after the code fence, but always prefixed with the language name in order to leverage existing syntax highlighting features. Here's an example:
This renders with language-specific highlighting without displaying the metadata:
x = 1
if x == 1:
    # indented four spaces
    print("x is 1.")
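The example from that design is not preserved in this capture. As a rough sketch of the general idea, the snippet below splits a fence info string into a leading language name and trailing metadata. Treating the trailing part as a JSON object is an assumption made here for illustration; it is not necessarily the Polyglot Notebooks format, and `split_info_string` is a hypothetical helper.

```python
import json


def split_info_string(info):
    """Return (language, metadata) from a code-fence info string.

    The first word is taken as the language, which is what syntax
    highlighters typically look at. Anything after it is parsed as JSON
    (an assumption for this sketch).
    """
    parts = info.strip().split(None, 1)
    language = parts[0] if parts else ""
    metadata = json.loads(parts[1]) if len(parts) > 1 else {}
    return language, metadata


# Hypothetical info string: language first, metadata after it.
language, metadata = split_info_string('python {"tags": ["hide-output"]}')
print(language, metadata)
```

Because the language name comes first, a renderer that ignores the rest of the info string still highlights the block correctly.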
Some implementations may take
Another alternative would be to use a key-value pair: |
Thanks for the feedback that brings perspective to one of the open points. I personally lean toward making this a recommended feature: parsers should support it; |
On the class attribute syntax: I don't like the idea of syntax that overloads class attribute; Speaking from a jupyter point of view, I think we want strong semantics around what a jupyter code-cell (or output, or attachment) is (with or without the {}), and what information should be on them in terms of parameters, attributes, metadata, etc... these are not
On interoperability: A block syntax of {code-cell} is already compatible with jupytext and MyST notebooks. With the introduction of new block types and the jupyter namespace {jupyter.code-cell} it is still well aligned with the block / directive syntax used by jupytext, myst and I think quarto, so extension should be straightforward there. |
My point is that you should care about wider interoperability.
Quarto is based on pandoc (it uses pandoc's parsers with a bunch of filters on top to process the AST), so you need to be interoperable with pandoc for that. |
I think we do? And I think we're considering and discussing that here. I guess what I'm not clear on is, as there are multiple possible (probably conflicting) tools to be interoperable with, how to weight them. E.g. I'm not clear on the extent to which pandoc is actively used alongside jupyter in the same way that jupytext is (i.e. in a tight loop over notebook development and execution), as opposed to, say, getting notebooks out to other formats for distribution of that material outside of jupyter. Another big point on interoperability, which hasn't been mentioned yet, is GFM! Maybe what we are missing in the JEP so far are some clearer requirement-like statements that can be discussed and agreed on, e.g.
Currently the "design goals" section is the closest to something like that, but it is still very loose, e.g. "The serialized notebook should be a valid Markdown file.", whatever that means. Clearer requirements could better set the scene for then zeroing in on the syntax.
Ah ok, I thought it was pandoc flavored markdown + additional extensions -- are you saying that pandoc already supports the quarto code block syntax, which doesn't use class attributes and is close to the syntax already outlined in the JEP? or is this special handling of a language attribute by pandoc? e.g. shown here |
I suspect that's a documentation bug.
or
I believe the same is true of Quarto, because they don't use a customized pandoc, just filters on top. All I'm saying is that if there's any room for a choice between
etc., it would be desirable (in this planning stage) to pick one that pandoc can already handle. This increases interoperability at little cost. (This would have been a good design goal for MyST, too.) |
I would love to see a new section addressing the topic of trust and signatures (Jupyter Notebook security model). In particular: would signature for notebook be computed and stored in the markdown file?
Please also see #95 (comment). |
@krassowski, thank you for raising this question! As far as I understand from the documentation, the signature is produced from the outputs. Can we apply the same procedure to the outputs inside the Markdown file? Most likely, I oversimplify things, and you probably see some rough edges. If so, could you share your thoughts? |
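To make the question more concrete, here is a minimal sketch of a keyed digest computed over a notebook's text. It is not nbformat's signing code, and it does not settle what should be signed or where the signature should be stored; the secret, the sample text and the function names are all hypothetical.

```python
import hashlib
import hmac


def sign_text(text, secret):
    """Return an HMAC-SHA256 hex digest of the text under a secret key."""
    return hmac.new(secret, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_text(text, secret, signature):
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign_text(text, secret), signature)


# Hypothetical secret and notebook text, for illustration only.
secret = b"not-a-real-secret"
notebook_text = "# Title\n\nsome cell source and outputs\n"

signature = sign_text(notebook_text, secret)
assert verify_text(notebook_text, secret, signature)
assert not verify_text(notebook_text + "edited", secret, signature)
```

With a text format, any edit made in a plain text editor changes the digest, so the open question is how such edits should interact with trust.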
cell outputs and attachment are mentioned at several places, but it is not clear to me if there is an option to have a companion file to markdown to persist those cell outputs and attachments. |
Thanks for your feedback. Externalising cell outputs and attachments
(e.g. in companion files) is indeed a natural feature. During our
discussions, various use cases and approaches emerged. For an
incremental approach, and also because the feature could be relevant
as well for traditional ipynb notebooks, we decided to propose to
treat that feature in a followup JEP. See line 580 of:
https://github.com/jupyter/enhancement-proposals/pull/103/files#diff-932448845fb9d55aef27789043a371eb872aa644507bf72e049f5ab536428238R580
With the current JEP, cell outputs and attachments can be stored
inline only, or not at all.
|
Well, I would feel more comfortable that this important topic be handled in this JEP to make sure all bits make sense. It can make sense to discuss them in separate forums, but giving my +1 on a partial solution which excludes difficult aspects is not appealing to me.
oh yes, it was indeed excluded. |
Thanks for giving us the opportunity to detail and clarify our
reasoning.
In the use cases we had in mind, the feature did not look difficult,
at least when it comes to the notebook format itself: one simple
solution is to enable metadata for cell outputs and for attachments
specifying that the data is not provided inline, but to be fetched
from a given url.
The feature is relevant for both Markdown and ipynb notebooks, and the
above implementation does not depend on the format.
Of course, that's not all there is to it to externalizing data -- like
how you make sure, e.g., that companion files remain available or urls
remain valid when the notebook is moved around -- but these
difficulties are about tools and workflows, not the file format of the
notebook.
Does that sound adequate in the use cases you have in mind?
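As a rough sketch of the idea described above, an output could carry a metadata entry pointing at external data, and a reader could resolve it on demand. The key name `external_url`, the example URL and the helper names are hypothetical; nothing here is defined by the JEP or by nbformat.

```python
def resolve_output(output, fetch):
    """Return the output data, fetching it if it is stored externally.

    `fetch` is any callable that maps a URL to the data found there.
    """
    url = output.get("metadata", {}).get("external_url")  # hypothetical key
    if url is None:
        return output["data"]  # data is stored inline
    return fetch(url)


# Hypothetical outputs: one inline, one externalised.
inline_output = {"data": {"text/plain": "42"}, "metadata": {}}
external_output = {
    "metadata": {"external_url": "https://example.org/outputs/cell-1.json"}
}


def fake_fetch(url):
    """Stand-in for a real download, so the sketch runs offline."""
    return {"text/plain": f"fetched from {url}"}


print(resolve_output(inline_output, fake_fetch))
print(resolve_output(external_output, fake_fetch))
```

The same dictionaries could be serialised to Markdown or to ipynb, which is the sense in which the approach does not depend on the format.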
|
Keeping the companion file with its host is one aspect which is indeed not directly relevant to the file format. My attention point was more about the |
I have mixed feelings on the format proposed for a few reasons:
|
We have started looking at this at the SSC meetings. We have decided to give at least another 2 weeks of discussion before moving forward. |
I think having a markdown-based alternative format for Jupyter notebooks is a great idea. But supporting and slightly expanding on the interoperability issues @jgm raised: Just for simplicity's sake I would also suggest using or adapting an existing format as far as possible, instead of introducing yet another variation. Since a Quarto In any case, I think it would be good to actively involve representatives of related projects in this process, e.g. Quarto's @cderv. |
There has been mention of https://github.com/executablebooks/mystmd here, and I remember having seen public discussions between MyST and Quarto if I am not mistaken. What about targeting interoperability between Around ipynb interoperability, a general question for me is "How related/different would it be to https://github.com/mwouts/jupytext?" |
@stevejpurves @jgm Just chiming in to add some precision about this. The syntax of In Quarto, computations are handled before Pandoc conversion through engines, among them the Jupyter engine. The computation stage produces an intermediary .md file with source code blocks and their results in Pandoc's Markdown syntax, to be processed with Pandoc. Hope it helps clarify. Happy to show more if needed. |
Just to clarify a little bit more on the Quarto side: we switched to a custom Reader since (I believe) Pandoc 3. So we're no longer strictly "just filters on top", so that we wouldn't break backwards compatibility for the very common syntax
As @jgm pointed out, that is indeed not valid syntax for codeblock nodes in pure pandoc:
But in quarto, you get this instead:
If we request markdown output we don't get precisely the same codeblock, but it's close enough that it roundtrips correctly:
|
I do in general think it would be better for everyone if we were to officially adopt (and potentially extend) an existing format, since there are at least three of these now, rather than define another new format for more text-friendly notebook serialization. I think a pretty strong case has to be made that none of these formats can be built on successfully before defining a new format, and I don't feel like that's been done. I'd start from what do myst/quarto/jupytext not do that we need, and how can we fill those gaps (if any) by building on those tools (or not). |
Sorry, I claimed that I'm not involved in Quarto development, but I have taken part in discussions on Quarto, and from that I know that there are mid-term plans to implement the initial extraction of code also via Pandoc, which needs a custom reader. @cscheid, I'm not sure whether that custom reader would be identical to the one you mentioned as already being used now? Would that mean that through that custom reader Pandoc would take over the complete work of initial |
I'm sorry - I'm not sure what you're referring to here. |
A combination of MyST-Markdown (Jupyter-book (Sphinx)) and QMD (Quarto, nbdev) would be a great thing. jupyter/nbformat does not and should not specify docutils or pandoc. Additional criteria:
JupyterLab extensions:
Challenges / Opportunities:
|
I mean the discussion in quarto-dev/quarto-cli#3330: |
I apologize for further polluting this thread here, but I want to clarify a few points before further confusion sets in.
Just to clarify for everyone: the user baptiste is not a quarto developer, and neither is allefeld, for other readers in here. Baptiste is offering a suggestion, and not one we're currently planning on implementing. My full reply was:
My "remove code cells" comment is not about "extracting code cells" or the Pandoc syntax for code blocks. It is about the ability to identify executable code blocks for processing ahead of the execution engine. cderv later says:
In here, the context is that knitr eventually needs a parser in order to be able to detect and handle nested code cells, ultimately reducing the need for hacks like the multiple curly bracket treatment of code cells inside comments. I appreciate the enthusiasm and energy to participate, but I'd just like to ask folks to try and refrain from stating or implying positions from quarto devs about the quarto project when they lack the appropriate context. If you need more clarification about the goals of the quarto project, please ask us quarto devs directly: that's me, cderv, dragonstyle, jjallaire, and rich-iannone. Thank you! |
Dear all, |
From the standpoint of jupyter-lsp (which does not have an SSC representation), a format which enables encoding:
would be amazing to enable jupyter-lsp/jupyterlab-lsp#467, quoting:
Now, I am not advocating for any specific format, but it would be amazing if a future "go-to" Markdown format supported this kind of metadata in some way. Note: for the most part such metadata should not be presented to the user, but it would still be valuable to have a way to achieve a full round trip from |
This PR is an outcome of the Jupyter Notebook workshop. The JEP proposes an alternative Markdown-based serialization syntax for Jupyter notebooks that allows lossless serialization from/to `.ipynb`, is reasonably human readable, interoperable with standard text tools, and is more VCS-friendly.
Creating a GitHub issue to decide if it's a JEP in this repository was skipped after discussing it with @fcollonval during the workshop.
Resolve #102
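The lossless round trip named in the summary can be sketched for the simplest case, a list of code cells with metadata. The fence header follows the `{jupyter.code-cell metadata={json object}}` form quoted in the review thread; everything else (function names, the one-line JSON encoding, the assumption that cell source never contains a line of three backticks) is a simplification made here and not the JEP's specification.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the sketch readable


def cell_to_markdown(cell):
    """Serialize one code cell as a fenced block with one-line JSON metadata."""
    meta = json.dumps(cell["metadata"], separators=(",", ":"), sort_keys=True)
    header = f"{FENCE}{{jupyter.code-cell metadata={meta}}}"
    return f"{header}\n{cell['source']}\n{FENCE}\n"


# Header line, then the source (non-greedy), then a closing fence line.
_CELL = re.compile(
    r"^`{3}\{jupyter\.code-cell metadata=(?P<meta>[^\n]*)\}\n"
    r"(?P<source>.*?)\n`{3}$",
    re.DOTALL | re.MULTILINE,
)


def markdown_to_cells(text):
    """Parse the fenced blocks back into code-cell dictionaries."""
    return [
        {
            "cell_type": "code",
            "metadata": json.loads(match["meta"]),
            "source": match["source"],
        }
        for match in _CELL.finditer(text)
    ]


# Hypothetical notebook content with two code cells.
cells = [
    {"cell_type": "code", "metadata": {"tags": ["hide-output"]},
     "source": 'print("Hello!")'},
    {"cell_type": "code", "metadata": {}, "source": "x = 1\ny = x + 1"},
]

text = "".join(cell_to_markdown(cell) for cell in cells)
assert markdown_to_cells(text) == cells  # the round trip loses nothing
```

Outputs, attachments and Markdown cells are left out on purpose; they are where most of the open design questions in this thread sit.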