Add relationships field to enhance metadata on ID relationships #133

joshbuker · 2023-03-29T20:53:01Z

Fixes #53

Signed-off-by: Josh Buker <[email protected]>

oliverchang · 2023-03-29T22:11:32Z

Hey Josh, we can't break the schema, and would need a very good reason to create a new major version. I'll ask some more clarifying questions in #53.

joshbuker · 2023-03-29T22:33:38Z

@oliverchang This would be backwards compatible, as you could still have an array of strings. It's just that you can now also do an array of objects instead if you want. (anyOf)

joshbuker · 2023-03-30T13:00:49Z

That said, the types I provided as examples definitely need to be refined and perhaps slimmed down to purely known use cases. For now, duplicate (and perhaps alias, to consolidate that?) are the only ones I know of from the GSD perspective.

@oliverchang, @darakian, are there any that would be useful from the Google/GitHub perspective? I could see GHSAs including parent/child relationships for a root-cause vulnerability and the affected OSS packages, for example.

EDIT: Oh, and the SIMILAR_TO is something we've seen with particular CVEs that say "this is NOT CVE-x CVE-y CVE-z" - @kurtseifried might know the example IDs off the top of his head. ~~Also covers the generic "related" type.~~ Well, not quite the generic case. There is some nuance there, but worth discussing anyway.

EDIT 2: There would need to be a RELATED or similar type for explicit backwards compatibility with the untyped string array. (i.e. if the string array is used, all entries are assumed to be of RELATED type). This could be the catchall type, similar to WEB for references.
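
A minimal sketch of how a consumer might read the mixed array under this proposal, assuming untyped strings default to a `RELATED` catch-all type. The function name and the placeholder IDs are hypothetical, not part of the schema or any existing tooling:

```python
from typing import Union

# Assumed catch-all type for legacy string entries, as proposed above.
DEFAULT_TYPE = "RELATED"


def normalize_related(entries: list[Union[str, dict]]) -> list[dict]:
    """Return every entry as a {"type", "id"} object."""
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # Legacy untyped string: treat it as the catch-all type.
            normalized.append({"type": DEFAULT_TYPE, "id": entry})
        else:
            # Already a typed object: keep only the two expected keys.
            normalized.append({"type": entry["type"], "id": entry["id"]})
    return normalized


# Placeholder IDs, for illustration only.
mixed = ["EXAMPLE-0001", {"type": "DUPLICATE_OF", "id": "EXAMPLE-0002"}]
print(normalize_related(mixed))
```

With this reading, tooling that only understands strings still sees valid data in string-only records, while newer tooling gets a uniform list of objects.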

joshbuker · 2023-03-30T13:42:05Z

Ideally, if a v2 schema was possible:

- related would be removed entirely and replaced with a new relationships field
- aliases would be removed entirely and provided as a relationship type
- relationships would be an array of objects { type, id }, similar to references being { type, url }
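
As a rough illustration of that hypothetical v2 shape, the sketch below folds `aliases` and `related` into a single `relationships` array. The function name is made up for this example, and the `ALIAS` and `RELATED` type names are assumptions taken from the discussion in this thread, not from a published schema:

```python
def to_v2_relationships(record: dict) -> dict:
    """Fold aliases and related into one relationships array (hypothetical v2)."""
    # Copy every field except the two that the v2 idea would remove.
    converted = {k: v for k, v in record.items() if k not in ("aliases", "related")}
    relationships = [{"type": "ALIAS", "id": i} for i in record.get("aliases", [])]
    relationships += [{"type": "RELATED", "id": i} for i in record.get("related", [])]
    converted["relationships"] = relationships
    return converted


# Placeholder record, for illustration only.
v1 = {"id": "EXAMPLE-0001", "aliases": ["EXAMPLE-A"], "related": ["EXAMPLE-R"]}
print(to_v2_relationships(v1))
```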

joshbuker · 2023-03-30T16:36:17Z

Alternatively, we could add relationships as a new field and leave aliases and related unchanged. This allows for more specific relationship information while remaining completely backwards compatible and simple.

kurtseifried · 2023-03-30T18:01:46Z

One comment: I think we should leave aliases and related as is. 1) backwards compatibility (assuming anyone uses it) and 2) it also nicely solves the case for which we have related data but don't know what the relationship is.

In my mind, it would be nice to classify the relationship so that every end user doesn't have to read the links and figure it out, e.g. with answers like "downstream vendor that ships this" or "vuln2 was found because of an incomplete fix for this vuln" and so on. A possible solution: URLs start in related and then graduate to relationships once we know what the relationship is. We would also leave the URL in related so that older tooling that can't read relationships still gets the data.
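
One way to picture the "graduate" flow, as a sketch only: an entry stays in `related` and gains a typed entry in `relationships` once its nature is known. The helper name and the record below are hypothetical:

```python
def graduate(record: dict, related_id: str, relationship_type: str) -> dict:
    """Add a typed relationship for an entry that stays listed in related."""
    if related_id not in record.get("related", []):
        raise ValueError(f"{related_id} is not listed in related")
    updated = dict(record)  # shallow copy; related is left untouched
    relationships = list(record.get("relationships", []))
    entry = {"type": relationship_type, "id": related_id}
    if entry not in relationships:
        relationships.append(entry)
    updated["relationships"] = relationships
    return updated


# Placeholder record, for illustration only.
record = {"id": "EXAMPLE-0001", "related": ["EXAMPLE-0002"]}
print(graduate(record, "EXAMPLE-0002", "CAUSED_BY"))
```

Older tooling keeps reading `related` unchanged, and newer tooling can prefer the typed entry when one exists.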

darakian · 2023-03-30T20:20:22Z

> parent/child relationships for a root-cause vulnerability and the affected OSS packages

What does it mean to be in a parent/child relationship, though? Is that package A discovers a vuln and package B consumes A as a dependency? Is it that the vuln in A causes a distinct vuln to occur in B, something like a parser issue in A leading to incorrect behavior in B? If code is copied by a ctrl+c/ctrl+v from A to B, is that a relation or an alias?

Beyond that how do we collect and validate this data? How do we use this data? I find it very difficult to know at advisory review time what relations may exist and I'm not sure what behavior change I would expect from someone receiving an advisory if it had relationship data.

joshbuker · 2023-04-01T19:02:04Z

@darakian

> Is that package A discovers a vuln and package B consumes A as a dependency?

This is the primary use-case I was imagining. For example, with Log4Shell you could have a root vuln for Log4j itself, with child IDs for each library or service that consumes it (Minecraft, Ubiquity, every Java application on the planet that logs things, etc). This helps keep the root vulnerability minimal and relevant while still providing the valuable meta for each package affected.

> Beyond that how do we collect and validate this data? How do we use this data? I find it very difficult to know at advisory review time what relations may exist and I'm not sure what behavior change I would expect from someone receiving an advisory if it had relationship data.

That's a good question. With something like GSD, for example, this would be folks updating the ID over time as things are discovered, rather than something frontloaded with the original/parent vuln ID.

The only behavior change I would see is some additional metadata for humans investigating the root cause. Scanners wouldn't care because it doesn't matter where the affected version comes from (one huge ID with everything everywhere affected, or from a child ID specific to the package scanned).

darakian · 2023-04-04T17:04:53Z

> This is the primary use-case I was imagining. For example, with Log4Shell you could have a root vuln for Log4j itself, with child IDs for each library or service that consumes it (Minecraft, Ubiquity, every Java application on the planet that logs things, etc). This helps keep the root vulnerability minimal and relevant while still providing the valuable meta for each package affected.

In theory, this is how CVEs are supposed to work. Rule 7.2.4 in the CVE rule set states that a CNA:

> MUST NOT assign more than one CVE ID if the products are affected, because they share the vulnerable code. The assigned CVE ID will be shared by the affected products.

In practice, it can be hard to know whether two advisories share the same underlying vulnerable code, which is where this gets painful.

> That's a good question. With something like GSD, for example, this would be folks updating the ID over time as things are discovered, rather than something frontloaded with the original/parent vuln ID.
>
> The only behavior change I would see is some additional metadata for humans investigating the root cause. Scanners wouldn't care because it doesn't matter where the affected version comes from (one huge ID with everything everywhere affected, or from a child ID specific to the package scanned).

I guess let me ask; how is a relation different from an alias? An alias is already a relation with a claim that two advisory IDs are the same (for some definition of sameness). I think the example you're laying out where users would tag an ID over time as new instances of a vuln are discovered could be captured in the alias field.

joshbuker · 2023-04-04T17:42:03Z

@darakian

> I guess let me ask; how is a relation different from an alias? An alias is already a relation with a claim that two advisory IDs are the same (for some definition of sameness). I think the example you're laying out where users would tag an ID over time as new instances of a vuln are discovered could be captured in the alias field.

An alias is a type of relationship; not all relationships are aliases. #53 has more examples of the other kinds of relationships and why they would be helpful.

darakian · 2023-04-04T18:04:21Z

I do get wanting more fidelity out of relations, but looking at that issue it seems like an impossibility to maintain those with any sort of consistency.

joshbuker · 2023-04-04T18:35:09Z

docs/schema.md

+### relationships[].type field
+
+Specifies the type of relationship this OSV has to the other identifier. Must
+include one of the following types:
+
+- `ALIAS`: An alias, or identifier that is referring to the exact same
+  vulnerability. This is for connecting identifiers from different databases,
+  and not for marking duplicate IDs within the same database, which should use
+  `DUPLICATED_BY` or `DUPLICATE_OF` respectively.
+- `CAUSES`: Causes a related vulnerability, for example Log4Shell causing
+  binaries that embed the vulnerable version of Log4j to be vulnerable to RCE.
+- `CAUSED_BY`: Caused by a related vulnerability, most often an embedded
+  dependency.
+- `COMMON_NAME`: A name used to colloquially refer to a specific, usually high
+  impact, vulnerability. For example, "Log4Shell" would be a common name for
+  CVE-2021-44228.
+- `DUPLICATED_BY`: Other identifiers within the same database that are marked as
+  a duplicate of this ID.
+- `DUPLICATE_OF`: Points to the canonical identifier for a vulnerability within
+  a given database.
+- `INCOMPLETE_FIX_FOR`: When the remediation for a vulnerability is incomplete,
+  and causes a related vulnerability. For example, Log4Shell (CVE-2021-44228)
+  would be an incomplete fix for CVE-2021-45046.
+- `INSUFFICIENT_FIX_OF`: Fixes a vulnerability caused by a previous remediation
+  being incomplete. For example, CVE-2021-45046 would be an insufficient fix of
+  Log4Shell (CVE-2021-44228).
+- `RELATED`: An identifier that is related in an unspecified way.
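
For illustration, a small check of `relationships` entries against the type list in the proposed docs above. The type names are copied from that diff; the function itself is a hypothetical sketch and not part of any OSV tooling:

```python
# Type names as listed in the proposed docs/schema.md text.
RELATIONSHIP_TYPES = {
    "ALIAS",
    "CAUSES",
    "CAUSED_BY",
    "COMMON_NAME",
    "DUPLICATED_BY",
    "DUPLICATE_OF",
    "INCOMPLETE_FIX_FOR",
    "INSUFFICIENT_FIX_OF",
    "RELATED",
}


def invalid_relationships(relationships: list[dict]) -> list[dict]:
    """Return entries with an unknown type or a missing/empty id."""
    return [
        r for r in relationships
        if r.get("type") not in RELATIONSHIP_TYPES or not r.get("id")
    ]


entries = [
    {"type": "COMMON_NAME", "id": "Log4Shell"},
    {"type": "SIMILAR_TO", "id": "EXAMPLE-0003"},  # not in the proposed list
]
print(invalid_relationships(entries))
```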


@darakian Updated this with some descriptions for each relationship type.

Could you expand a bit on the concerns around consistency?

Existing tooling could continue using aliases/related or switch to relationships using just the ALIAS/RELATED types, so this would primarily expand what other databases would be able to do as far as data enrichment goes (one of the main use-cases for GSD).
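
A sketch of what that could look like for a consumer, assuming a record may carry aliases in the existing `aliases` field, in `relationships` entries of type `ALIAS`, or in both. The function name and record are hypothetical:

```python
def all_aliases(record: dict) -> list[str]:
    """Collect alias IDs from both places, without duplicates."""
    ids = list(record.get("aliases", []))
    ids += [
        r["id"]
        for r in record.get("relationships", [])
        if r.get("type") == "ALIAS"
    ]
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(ids))


# Placeholder record, for illustration only.
record = {
    "aliases": ["EXAMPLE-A"],
    "relationships": [
        {"type": "ALIAS", "id": "EXAMPLE-A"},
        {"type": "ALIAS", "id": "EXAMPLE-B"},
        {"type": "CAUSED_BY", "id": "EXAMPLE-C"},
    ],
}
print(all_aliases(record))
```

Note that a consumer has to check both locations to get the full set, which is a cost of having two ways to express the same information.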

oliverchang · 2023-04-26T22:28:37Z

Closing. I share @darakian's concerns here that this is not feasible to maintain consistently in practice. This also complicates the schema a fair bit and fragments it (i.e. there are now two ways to specify aliases).

Update related to support array of strings OR objects with type and ID

ff63d3f

Signed-off-by: Josh Buker <[email protected]>

joshbuker mentioned this pull request Mar 29, 2023

OSV-GSD Extended Schema CloudSecurityAlliance/gsd-tools#197

Closed

oliverchang mentioned this pull request Mar 29, 2023

Expand vuln id relationships #53

Open

joshbuker marked this pull request as draft March 30, 2023 14:37

joshbuker added 2 commits April 4, 2023 10:48

Merge branch 'main' into schema/optional-related-object-array

015b00a

Signed-off-by: Josh Buker <[email protected]>

Separate relationship into new optional field

0dc9ad9

Signed-off-by: Josh Buker <[email protected]>

joshbuker added 2 commits April 4, 2023 11:28

Add documentation for relationships field

3ea7442

Signed-off-by: Josh Buker <[email protected]>

Use tabs instead of spaces to match surrounding docs

6432f8a

Signed-off-by: Josh Buker <[email protected]>

joshbuker force-pushed the schema/optional-related-object-array branch from 30e01b0 to 6432f8a Compare April 4, 2023 18:31

joshbuker added 2 commits April 4, 2023 11:36

Add missing type indicator to relationships docs

875d413

Signed-off-by: Josh Buker <[email protected]>

Add comma for grammatical clarity

5f2d4aa

Signed-off-by: Josh Buker <[email protected]>

joshbuker marked this pull request as ready for review April 4, 2023 23:38

joshbuker changed the title ~~Update related to support array of strings OR objects with type and ID~~ Add relationships field to enhance metadata on ID relationships Apr 19, 2023

oliverchang closed this Apr 26, 2023

joshbuker deleted the schema/optional-related-object-array branch April 27, 2023 22:29
