Add optional schema_type for data format hinting (ala CVE) #134
This is particularly useful for distinguishing data that looks like OSV but is actually a custom format, e.g. some of the entries in ruby-advisory-db.
This change is fully backward compatible: schema_type has not been used historically, it is not required, and consumers can ignore it.
Signed-off-by: Josh Buker <[email protected]>
Hmm, I'm not sure we arrived at any conclusions in #51. I recall that the main point against this is that there is no standardised way to label JSON fields with their type, so for a parser to detect what format a JSON entry is in, it would need specialised knowledge of the format anyway. If you get an OSV entry from somewhere (e.g. https://osv.dev, or a database export explicitly formatted as OSV), you typically already know that it's formatted as OSV. I can't think of a real use case where this would not be the case. Communicating about types is also typically more the domain of the protocol serving the data (e.g. MIME types, file extensions). For databases like GSD, which deal with many different types of data, labelling can be done more consistently through the namespacing approach instead, wrapping each entry with the type it actually is. I realise that this seems like a small change to be adding, but one of the guiding principles of OSV is to be minimal, and to have each field serve intentional use cases that we've encountered.
This is primarily so that the data is explicit even without the context of a file server, i.e. the JSON/YAML can be parsed standalone without any doubt about its format. For example, if GSD extends the OSV schema to require fields such as summary, details, and schema_version, we would want to use a schema_type of OSV-GSD so that plain OSV data could be clearly distinguished from the slightly expanded format. This can definitely be solved with wrappers and server hinting. However, in the scenario where those file servers shut down and someone wants to do some archaeology on the archived JSON/YAML, this type of format hinting could be invaluable. It also reduces the institutional knowledge required, because the hint lives in the data itself rather than in service documentation. Finally, it simplifies tooling: you can dynamically scan data and deterministically validate it against the related schema (dependent, of course, on the schema providing declarative hinting, as CVE and CSAF, the only other common vuln ID formats, currently do).
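To illustrate the tooling argument above, here is a minimal sketch of hint-based format detection. It assumes the proposed schema_type field sits at the top level of the record. The CVE and CSAF markers used (a top-level dataType of CVE_RECORD, and a document.csaf_version key) are my reading of those formats and should be checked against their specifications. The function name detect_format and the sample records are hypothetical.

```python
from typing import Any, Dict, Optional


def detect_format(record: Dict[str, Any]) -> Optional[str]:
    """Return a format label based only on declarative hints in the data."""
    # Proposed in this PR: an optional top-level schema_type, e.g. "OSV" or "OSV-GSD".
    schema_type = record.get("schema_type")
    if isinstance(schema_type, str):
        return schema_type

    # Assumption: CVE JSON 5 records declare themselves with dataType "CVE_RECORD".
    if record.get("dataType") == "CVE_RECORD":
        return "CVE"

    # Assumption: CSAF documents carry their version under document.csaf_version.
    document = record.get("document")
    if isinstance(document, dict) and "csaf_version" in document:
        return "CSAF"

    # No hint present: the caller has to fall back to context or validation.
    return None


if __name__ == "__main__":
    print(detect_format({"schema_type": "OSV-GSD", "id": "GSD-0000-0000"}))
    print(detect_format({"id": "EXAMPLE-1", "modified": "2021-01-01T00:00:00Z"}))
```

A record without any hint returns None, which is the case the wrapper or protocol-level approaches would have to cover.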
To confirm, has this been rejected?
@chrisbloom7 @oliverchang Checking in on status. Is this officially rejected?
I don't see any of the arguments against this from the original #51 addressed, so this is a no from me.
@oliverchang What are the current arguments against, just for clarity? I feel my earlier comment addresses the pushback on doing this exclusively with a wrapper or server hinting, and I have yet to get feedback on why OSV can't match what the other two formats (CVE & CSAF) are currently doing when it comes to format hinting.
The main point against is that this doesn't really offer any reliable way for an automated system to determine what format this is, because to make use of it you'd need pre-existing knowledge of the OSV format to begin with. This would ideally be addressed at a protocol level of some form. For your use case of identifying random files lying around without any other metadata about what they are: the same can likely be achieved by running the JSON validator against the latest version of the schema and also making sure that every field in the file exists in the schema. We can likely also improve the JSON schema / validator to make this easier (e.g. setting ...). We've also not seen any other requests for this across all of our existing producers and consumers of OSV data. Adding fields to the OSV schema is very expensive, and we need to make sure every field added (even an optional one) serves an intentional purpose.
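The validation-based alternative described above could look roughly like the following sketch. It does not use the real OSV JSON schema or validator: OSV_LIKE_SCHEMA is a hand-trimmed, illustrative stand-in (the required and known field lists are assumptions to be checked against the actual schema), and osv_conformance_problems is a hypothetical helper that only inspects top-level field names.

```python
from typing import Any, Dict, List

# Illustrative stand-in for the OSV schema; NOT the real schema file.
OSV_LIKE_SCHEMA: Dict[str, List[str]] = {
    "required": ["id", "modified"],
    "properties": [
        "schema_version", "id", "modified", "published", "withdrawn",
        "aliases", "related", "summary", "details", "severity",
        "affected", "references", "credits", "database_specific",
    ],
}


def osv_conformance_problems(
    record: Dict[str, Any],
    schema: Dict[str, List[str]] = OSV_LIKE_SCHEMA,
) -> List[str]:
    """List reasons why a record does not look like plain OSV (empty if none)."""
    problems: List[str] = []
    # Every required field must be present.
    for name in schema["required"]:
        if name not in record:
            problems.append(f"missing required field: {name}")
    # Every field present must be one the schema knows about.
    for name in record:
        if name not in schema["properties"]:
            problems.append(f"unknown field: {name}")
    return problems


if __name__ == "__main__":
    # "custom_field" is a made-up extension field used only for illustration.
    print(osv_conformance_problems({"id": "EXAMPLE-1", "custom_field": 1}))
```

An empty list means the file is at least shaped like OSV. A non-empty list flags either a different format or an extended variant, which is the distinction the discussion is about.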
Revisiting #51
Fixes #30