pre-proposal: add `extraSchemas` to notebook format #96

agoose77 · 2023-03-07T13:44:35Z

Background

During the Jupyter Notebook workshop, we established three JEP drafts that would prepare the notebook format for additional cell types and address the problem of untyped metadata. On the latter issue, notebook users currently have no way to indicate to a notebook consumer that metadata should conform to a particular schema. This prevents third parties from validating the metadata, and prevents frontends from displaying rich editing interfaces for it.¹

Proposal

A separate JEP will move to deprecate the nbformat and nbformat_minor top-level properties, in favour of a direct $schema property. This must contain a URI to an nbformat schema.

This JEP will extend the previous schema to include an extraSchemas property. This optional property may contain an array of URIs that refer to additional schemas. These schemas must not conflict with one another, and every schema in extraSchemas must validate the document alongside the root $schema in order for a notebook to be considered valid. To begin with, any schema in extraSchemas must conform to a restrictive metaschema that permits the addition of properties only to the notebook and cell metadata. In future, this may be relaxed.

Examples

Example of valid notebook under this proposal:

{
    "$schema": "https://jupyter.org/schema/notebook/4.6/notebook-4.6.schema.json",
    "extraSchemas": [
        "my-extension-schema-uri"
    ],
    "metadata": {
        "my-extension": {
        }
    },
    "cells": []
}

Example of schema referenced in extraSchemas ("my-extension-schema-uri"):

{
    "$schema": "https://jupyter.org/schema/notebook/4.6/notebook-4.6.schema.json",
    "metadata": {
        "type": "object",
        "required": ["my-extension"]
    }
}
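
To make the validation rule concrete (the root $schema and every entry in extraSchemas must all accept the document), here is a minimal Python sketch. It is a toy under stated assumptions, not a proposed implementation: `SCHEMAS`, `check`, and `validate_notebook` are hypothetical names, the in-memory registry stands in for resolving URIs, the checker understands only `type: object`, `required`, and `properties`, and the extension schema nests its constraint under `properties`, which is one possible reading of the example above. A real tool would delegate to a full JSON Schema validator.

```python
# Toy registry standing in for "resolve the schema at this URI" (hypothetical).
ROOT_URI = "https://jupyter.org/schema/notebook/4.6/notebook-4.6.schema.json"
SCHEMAS = {
    # Drastically simplified stand-in for the real notebook schema.
    ROOT_URI: {"type": "object", "required": ["metadata", "cells"]},
    # Assumed shape of the extension schema: constrain only notebook metadata.
    "my-extension-schema-uri": {
        "properties": {
            "metadata": {"type": "object", "required": ["my-extension"]}
        }
    },
}


def check(instance, schema):
    """Return error strings for the tiny keyword subset this sketch supports."""
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return ["expected an object"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(f"{key}: {e}" for e in check(instance[key], subschema))
    return errors


def validate_notebook(nb):
    """Map each schema URI (root first, then extras) to its list of errors."""
    uris = [nb["$schema"], *nb.get("extraSchemas", [])]
    return {uri: check(nb, SCHEMAS[uri]) for uri in uris}


notebook = {
    "$schema": ROOT_URI,
    "extraSchemas": ["my-extension-schema-uri"],
    "metadata": {"my-extension": {}},
    "cells": [],
}
report = validate_notebook(notebook)
# The notebook is valid only if no schema reported an error.
is_valid = not any(report.values())
```

Returning a per-URI report rather than a single boolean keeps it visible which schema rejected the document, which matters once several extensions are listed.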

Further Information

As this is a complex area of discussion (multi-stakeholder, significant long-term impact, niche tooling), we are holding regular, open discussions under the general topic of "extra cell types". The meeting notes from the first of these meetings can be found here. Those wishing to attend can find more information there.

¹ e.g. with tools like react-jsonschema-form


bollwyvl · 2023-03-07T14:10:07Z

As discussed in the workshop, we might need to do some more research into which existing standards cover saying that a document must conform to multiple schemas: my cursory check of the JSON Schema spec didn't dig up anything ($schema must always be a single URI), but there may be other specs of interest.

The first one that came to mind was the widely used (but still maligned for some nits like author order) Dublin Core Metadata, which includes a conformsTo description, but doesn't make many other claims, e.g. "the syntax conforms to," or "the underlying content conforms to."

If something authoritative (and already implemented) can't be found, we might also consider just making this a "well-known" #/metadata/extraSchemas value rather than adding a new top-level key. These schemas would be considered "non-normative": a client or tool would be able to happily disregard a schema if it can't find it, and would not be under any obligation to actually download the schema (which isn't even guaranteed to be available).

Indeed, one of the discussed points was reusing the schema terminology directly, e.g.

{
  "$schema": ...,
  "metadata": {
    "extraSchema": {
      "allOf": [
        {"$ref": "https://some/other/schema"}
      ]
    }
  },
  "cells": ...
}

But, again, this puts us back in the situation of an important member living inside a list, which has the addressability concerns brought up in other places.

Another aspect (which didn't come up as much directly in the workshop, as the focus was mostly on the data model) is how various clients would report schema violations: the schema could constrain any part of the document (even parts a client does not render), so this would probably need to be fleshed out.
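
Going back to the "well-known", non-normative option: a rough Python sketch of how a tolerant client might treat #/metadata/extraSchemas. Everything here is hypothetical (`KNOWN_SCHEMAS` stands in for whatever schemas a tool happens to ship with, `applicable_schemas` is an illustrative name, and `urn:example:unknown` is a made-up URI). The only point is that unknown URIs are skipped rather than downloaded or treated as errors.

```python
# Hypothetical local cache of schemas this tool already knows about.
KNOWN_SCHEMAS = {
    "https://some/other/schema": {"type": "object"},
}


def applicable_schemas(notebook):
    """Split the well-known metadata.extraSchemas list into known and skipped URIs.

    Nothing is fetched: a URI missing from the local cache is simply ignored.
    """
    uris = notebook.get("metadata", {}).get("extraSchemas", [])
    known = {uri: KNOWN_SCHEMAS[uri] for uri in uris if uri in KNOWN_SCHEMAS}
    skipped = [uri for uri in uris if uri not in KNOWN_SCHEMAS]
    return known, skipped


nb = {
    "metadata": {
        "extraSchemas": ["https://some/other/schema", "urn:example:unknown"]
    },
    "cells": [],
}
known, skipped = applicable_schemas(nb)
```

A client could surface `skipped` as a hint to the user, which is one small piece of the violation-reporting question above.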

agoose77 · 2023-03-07T15:51:59Z

@bollwyvl both good points. I think you're recorded as planning to attend the meeting in 10 minutes, so let's discuss it there, and report back the findings!

willingc · 2023-03-10T06:28:21Z

FYI @MSeal @rgbkrk

tonyfast · 2023-03-21T03:57:35Z

i spent a little time thinking about a few different kinds of extra schema we could define. these are just some use cases for reference or discussion later on.

the schema are written in toml for density. they get weird when we are deep in the schema.

specific source patterns

require that a document can't be saved with a blank cell. ideally, we'd want to have a nice $comment to inform the user.

description = "require all cells are non-empty"
[properties.cells.items.properties.source.if]
type = "string"

[properties.cells.items.properties.source.then]
"$anchor" = "non-empty-string"
minLength = 1
# literal (single-quoted) string so the regex backslashes are not toml escapes
pattern = '^\s*\S'

[properties.cells.items.properties.source.else]
type = "array"
# arrays are sized with minItems; minLength only applies to strings
minItems = 1
contains = {"$ref" = "#non-empty-string"}
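
for anyone who prefers reading the rule as code, here is a small python sketch of what the if/then/else is meant to express. it is not a json schema validator, just the same logic written by hand, and `source_is_non_empty` is a made-up name.

```python
import re

# same regex as the "non-empty-string" anchor: optional whitespace, then one
# non-whitespace character. json schema patterns are unanchored searches,
# hence re.search rather than re.fullmatch.
NON_EMPTY = re.compile(r"^\s*\S")


def source_is_non_empty(source):
    """Mirror the if/then/else for a cell's `source`.

    A string must contain a non-whitespace character; a list of lines must
    contain at least one line that does.
    """
    if isinstance(source, str):
        return bool(NON_EMPTY.search(source))
    return any(isinstance(line, str) and NON_EMPTY.search(line) for line in source)
```

an empty list fails here for the same reason it fails the schema: there is no item for `contains` to match.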

notebook metadata extensions

as @agoose77 described above, we might want to extend the notebook level metadata. in this example, we imagine kernelspec extracted to its own schema.

[properties.metadata]
required = ["kernelspec"]

[properties.metadata.properties.kernelspec]
"$ref" = "https://github.com/jupyter/nbformat/blob/main/nbformat/v4/nbformat.kernelspec.v4.5.schema.json"

cell metadata extensions

we might want to constrain the cell metadata schema. currently, there are quite a few cell schema that might be useful to extract into more composable representations later on. in this example, slide types are constrained.

description = "the cell metadata slide type schema"
[properties.cells.items.properties.metadata]
required = ["slide_type"]

[properties.cells.items.properties.metadata.properties.slide_type]
enum = ["slide", "sub-slide"]

display data data extension for a json schema

we might want to constrain our new display data types. this example requires json schema mimetypes to abide by json schema.

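# "if" needs a real schema: match outputs whose output_type is display_data
[properties.cells.items.properties.outputs.items.if.properties.output_type]
const = "display_data"

# the mimetype is a property of the data bundle, so it sits under data.properties
[properties.cells.items.properties.outputs.items.then.properties.data.properties."application/schema+json"]
"$ref" = "https://json-schema.org/draft/2020-12/schema"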

display data data metadata extension

a vendor might want to constrain their output metadata. below we constrain the my_extension metadata.

# "if" needs a real schema: match outputs whose output_type is display_data
[properties.cells.items.properties.outputs.items.if.properties.output_type]
const = "display_data"

[properties.cells.items.properties.outputs.items.then.properties.metadata.properties.my_extension.properties]
foo = {type = "string"}
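
a hand-written python sketch of what this conditional is meant to do, for comparison. `check_outputs` and the sample cell are made up for illustration. the thing to notice is that only display_data outputs are constrained, and every other output type passes untouched.

```python
def check_outputs(cell):
    """Return error strings for display_data outputs whose my_extension.foo is not a string."""
    errors = []
    for index, output in enumerate(cell.get("outputs", [])):
        # "if": only display_data outputs are constrained.
        if output.get("output_type") != "display_data":
            continue
        # "then": metadata.my_extension.foo, when present, must be a string.
        metadata = output.get("metadata", {})
        extension = metadata.get("my_extension") if isinstance(metadata, dict) else None
        if isinstance(extension, dict) and "foo" in extension:
            if not isinstance(extension["foo"], str):
                errors.append(
                    f"outputs[{index}].metadata.my_extension.foo must be a string"
                )
    return errors


cell = {
    "outputs": [
        {"output_type": "execute_result", "metadata": {"my_extension": {"foo": 1}}},
        {"output_type": "display_data", "metadata": {"my_extension": {"foo": 1}}},
        {"output_type": "display_data", "metadata": {"my_extension": {"foo": "ok"}}},
    ]
}
errors = check_outputs(cell)
```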

tonyfast mentioned this issue Mar 7, 2023

Rethinking the notebook cells weekly meeting jupyterlab/frontends-team-compass#182

Open

This was referenced Mar 21, 2023

Pre-proposal: Specify the Markdown cell's markdown flavor #98

Open

Add JEP for adding $schema to notebook format #97

Merged
