Add optional schema_type for data format hinting (ala CVE) #134

Closed
wants to merge 3 commits into from

Conversation

joshbuker
Contributor

@joshbuker joshbuker commented Mar 29, 2023

Revisiting #51

Fixes #30

@joshbuker
Contributor Author

This is particularly useful for distinguishing things that look like OSV but are actually custom, e.g. some of the entries in ruby-advisory-db.

@joshbuker
Contributor Author

This change is fully backward compatible: schema_type has not been used historically, is not required, and can be ignored.

@oliverchang
Contributor

Hmm, I'm not sure if we arrived at any conclusions in #51. I recall that the main point against this is that there is no standardised way to label JSON fields with their type, so for a parser to detect what format a JSON entry is, it'd need specialised knowledge of the format anyway. If you get an OSV from somewhere (e.g. https://osv.dev, or a database export explicitly formatted as OSV), you already typically know that it's formatted as OSV. I can't think of a real use case where this would not be the case. Generally, communicating about types is also more the domain of the protocol serving these (e.g. MIME types, file extensions).

For databases like GSD, which deal with many different types of data, labelling the data can be done more consistently through the namespacing approach instead, wrapping each of the various types with the type it actually is.

I realise that this seems like a small change to be adding, but one of the guiding principles of OSV is to be minimal, and have each field serve intentional use cases that we've encountered.

@joshbuker
Contributor Author

This is primarily so that the data is explicit even without the context of a file server, i.e. the JSON/YAML can be parsed standalone without any doubt about its format.

For example, if GSD extends the OSV schema to require fields such as summary, details, and schema_version, we would want to use a schema_type of OSV-GSD so that data could clearly be distinguished between OSV and the slightly expanded format.

This can definitely be solved with wrappers and server hinting. However, in the scenario where those file servers shut down and someone wants to do some archaeology on the archived JSON/YAML, this type of format hinting could be invaluable. It also reduces the institutional knowledge required, by including that hinting in the data directly rather than in service documentation. It also simplifies tooling, as you can dynamically scan data and deterministically validate it against the related schema (dependent, of course, on the schema providing declarative hinting, as CVE and CSAF, the only other common vulnerability ID formats, currently do).
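To make the "scan and dispatch" idea concrete, here is a rough sketch of how a tool might guess a record's format from fields inside the data itself. It assumes that CVE JSON 5 records carry a top-level `dataType` of `CVE_RECORD` and that CSAF documents carry `document.csaf_version`; the `schema_type` check reflects only what this PR proposes and is not part of OSV. The function name `detect_format` and the sample IDs are made up for illustration.

```python
def detect_format(record: dict) -> str:
    """Best-effort guess at a record's format from its self-describing fields."""
    # Hypothetical: the optional field proposed in this PR (not part of OSV).
    if "schema_type" in record:
        return str(record["schema_type"])
    # Assumption: CVE JSON 5 records carry a top-level dataType marker.
    if record.get("dataType") == "CVE_RECORD":
        return "CVE"
    # Assumption: CSAF documents carry their version under document.csaf_version.
    document = record.get("document")
    if isinstance(document, dict) and "csaf_version" in document:
        return "CSAF"
    # No declarative hint found; the caller needs outside context.
    return "unknown"


# Made-up sample records, for illustration only.
samples = [
    {"schema_type": "OSV-GSD", "id": "EXAMPLE-0001", "modified": "2023-03-29T00:00:00Z"},
    {"dataType": "CVE_RECORD", "dataVersion": "5.0"},
    {"document": {"csaf_version": "2.0"}},
    {"id": "EXAMPLE-0002", "modified": "2023-03-29T00:00:00Z"},
]

for sample in samples:
    print(detect_format(sample))
```

Note that the dispatcher still has to know each format's hint field in advance, which is the objection raised earlier in this thread.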

@kurtseifried
Contributor

To confirm, has this been rejected?

@joshbuker
Contributor Author

@chrisbloom7 @oliverchang Checking in on status. Is this officially rejected?

@oliverchang
Contributor

> @chrisbloom7 @oliverchang Checking in on status. Is this officially rejected?

I don't see any of the arguments against this in the original #51 addressed, so this is a no from me.

@joshbuker
Contributor Author

> > @chrisbloom7 @oliverchang Checking in on status. Is this officially rejected?
>
> I don't see any of the arguments against this in the original #51 addressed, so this is a no from me.

@oliverchang What are the current arguments against, just for clarity?

I feel my earlier comment addresses the pushback on doing this exclusively with a wrapper/server hinting, and I have yet to get feedback on why OSV can't match what the other two formats (CVE & CSAF) are currently doing when it comes to format hinting.

@oliverchang
Contributor

oliverchang commented Apr 26, 2023

The main point against is that this doesn't really offer any reliable way for an automated system to determine what format this is, because to make use of it you'd need to have pre-existing knowledge of the OSV format to begin with. This would ideally be addressed at a protocol level of some form.

For your use case of identifying random files lying around without any other metadata about what they are: the same can also likely be achieved just by running the JSON validator using the latest version of the schema and also making sure that all fields exist in the schema. We can likely also improve the JSON schema / validator to make this easier (e.g. setting additionalProperties to false among other things).
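As a rough illustration of that idea, the sketch below approximates what a validator with `additionalProperties` set to false would do at the top level: it accepts a record only if the required fields are present and no key falls outside the schema. The field lists are an abridged assumption about the OSV schema and should be checked against the published schema; the helper names and sample records are made up for illustration, and a real check would run a full JSON Schema validator instead.

```python
# Assumption: abridged list of OSV top-level fields; verify against the published schema.
OSV_TOP_LEVEL_FIELDS = {
    "schema_version", "id", "modified", "published", "withdrawn",
    "aliases", "related", "summary", "details", "severity",
    "affected", "references", "credits", "database_specific",
}
# Assumption: the fields the OSV schema marks as required.
OSV_REQUIRED_FIELDS = {"id", "modified"}


def unknown_fields(record: dict) -> list:
    """Return top-level keys that are not in the assumed OSV field list."""
    return sorted(set(record) - OSV_TOP_LEVEL_FIELDS)


def looks_like_plain_osv(record: dict) -> bool:
    """True if required fields are present and no unknown top-level keys exist."""
    return OSV_REQUIRED_FIELDS <= set(record) and not unknown_fields(record)


# Made-up records, for illustration only.
plain = {"id": "EXAMPLE-0001", "modified": "2023-04-26T00:00:00Z", "summary": "demo"}
extended = dict(plain, custom_extension={"note": "not an OSV field"})

print(looks_like_plain_osv(plain))
print(looks_like_plain_osv(extended), unknown_fields(extended))
```

A record that extends OSV with extra top-level fields fails this check, while one that only tightens which optional fields are required would still pass, so this approach separates the two formats only in the first case.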

We've also not seen any other requests for this, across all of our existing producers and consumers of OSV data.

Adding fields to the OSV schema is very expensive, and we need to make sure every field added (even an optional one) serves an intentional purpose.

@joshbuker joshbuker deleted the schema/schema-type branch April 27, 2023 22:29

Successfully merging this pull request may close these issues.

Please add "data_type" tag to make parsing easier