Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add relationships field to enhance metadata on ID relationships #133

Closed

Conversation

joshbuker
Copy link
Contributor

@joshbuker joshbuker commented Mar 29, 2023

Fixes #53

@oliverchang
Copy link
Contributor

Hey Josh, we can't break the schema, and would need a very good reason to create a new major version. I'll ask some more clarifying questions in #53.

@joshbuker
Copy link
Contributor Author

@oliverchang This would be backwards compatible, as you could still have an array of strings. It's just that you can now also do an array of objects instead if you want. (anyOf)

@joshbuker
Copy link
Contributor Author

joshbuker commented Mar 30, 2023

That said, the types I provided as examples definitely need refined and perhaps slimmed down to purely known use-cases. For now, duplicate (and perhaps alias to consolidate that?) are the only ones I know of from the GSD perspective.

@oliverchang, @darakian, are there any that would be useful from the Google/GitHub perspective? I could see GHSA's including parent/child relationships for a root-cause vulnerability and the affected OSS packages, for example.

EDIT: Oh, and the SIMILAR_TO is something we've seen with particular CVEs that say "this is NOT CVE-x CVE-y CVE-z" - @kurtseifried might know the example IDs off the top of his head. Also covers the generic "related" type. Well, not quite the generic case. There is some nuance there, but worth discussing anyway.

EDIT 2: There would need to be a RELATED or similar type for explicit backwards compatibility with the untyped string array. (i.e. if the string array is used, all entries are assumed to be of RELATED type). This could be the catchall type, similar to WEB for references.

@joshbuker
Copy link
Contributor Author

Ideally, if a v2 schema was possible:

  • related would be removed entirely and replaced with a new relationships field
  • aliases would be removed entirely and provided as a relationship type
  • relationships would be an array of objects { type, id }, similar to references being { type, url }

@joshbuker joshbuker marked this pull request as draft March 30, 2023 14:37
@joshbuker
Copy link
Contributor Author

Alternatively, can also add relationships as a new field, and leave aliases and related unchanged. Allows for more specific relationship information, but remains completely backwards compatible and simple.

@kurtseifried
Copy link
Contributor

One comment: I think we should leave aliases and related as is. 1) backwards compatibility (assuming anyone uses it) and 2) it also nicely solves the case for which we have related data but don't know what the relationship is.

In my mind it would be nice to classify the relationship so that every end user doesn't have to read the links and figure it out, e.g. answers like "downstream vendor that ships this" or "vuln2 was found because of an incomplete fix for this vuln" and so on. Thus a possible solution where URLs start in related and then graduate to relationships once we know what the relationship is, also we leave the URL in related so that older tooling that can't read relationships still get the data.

@darakian
Copy link
Contributor

parent/child relationships for a root-cause vulnerability and the affected OSS packages

What does it mean to be in a parent child relationship though? Is that package A discovers a vuln and package B consumes A as a dependency? Is that the vuln in A causes a distinct vuln to occur in B? Something like parser issue in A leads to incorrect behavior in B. If code is copied by a ctrl+c/ctrl+v from A to B is that a relation or an alias?

Beyond that how do we collect and validate this data? How do we use this data? I find it very difficult to know at advisory review time what relations may exist and I'm not sure what behavior change I would expect from someone receiving an advisory if it had relationship data.

@joshbuker
Copy link
Contributor Author

@darakian

Is that package A discovers a vuln and package B consumes A as a dependency?

This is the primary use-case I was imagining. For example, with Log4Shell you could have a root vuln for Log4j itself, with child IDs for each library or service that consumes it (Minecraft, Ubiquity, every Java application on the planet that logs things, etc). This helps keep the root vulnerability minimal and relevant while still providing the valuable meta for each package affected.

Beyond that how do we collect and validate this data? How do we use this data? I find it very difficult to know at advisory review time what relations may exist and I'm not sure what behavior change I would expect from someone receiving an advisory if it had relationship data.

That's a good question. With something like GSD, for example, this would be folks updating the ID over time as things are discovered, rather than something frontloaded with the original/parent vuln ID.

The only behavior change I would see is some additional metadata for humans investigating the root cause. Scanners wouldn't care because it doesn't matter where the affected version comes from (one huge ID with everything everywhere affected, or from a child ID specific to the package scanned).

@darakian
Copy link
Contributor

darakian commented Apr 4, 2023

This is the primary use-case I was imagining. For example, with Log4Shell you could have a root vuln for Log4j itself, with child IDs for each library or service that consumes it (Minecraft, Ubiquity, every Java application on the planet that logs things, etc). This helps keep the root vulnerability minimal and relevant while still providing the valuable meta for each package affected.

In theory this is how CVEs are supposed to work. Rule 7.2.4 in the cve rule set states that a CNA

MUST NOT assign more than one CVE ID if the products are affected, because they share the vulnerable code. The assigned CVE ID will be shared by the affected products.

In practice it can be hard to know if two advisories share the same underlying vulnerable code or not which is where this gets painful.

That's a good question. With something like GSD, for example, this would be folks updating the ID over time as things are discovered, rather than something frontloaded with the original/parent vuln ID.

The only behavior change I would see is some additional metadata for humans investigating the root cause. Scanners wouldn't care because it doesn't matter where the affected version comes from (one huge ID with everything everywhere affected, or from a child ID specific to the package scanned).

I guess let me ask; how is a relation different from an alias? An alias is already a relation with a claim that two advisory IDs are the same (for some definition of sameness). I think the example you're laying out where users would tag an ID over time as new instances of a vuln are discovered could be captured in the alias field.

@joshbuker
Copy link
Contributor Author

@darakian

I guess let me ask; how is a relation different from an alias? An alias is already a relation with a claim that two advisory IDs are the same (for some definition of sameness). I think the example you're laying out where users would tag an ID over time as new instances of a vuln are discovered could be captured in the alias field.

An alias is a type of relationship; not all relationships are aliases. #53 has more examples of the other kinds of relationships and why they would be helpful.

@darakian
Copy link
Contributor

darakian commented Apr 4, 2023

I do get wanting more fidelity out of relations, but looking at that issue it seems like an impossibility to maintain those with any sort of consistency.

@joshbuker joshbuker force-pushed the schema/optional-related-object-array branch from 30e01b0 to 6432f8a Compare April 4, 2023 18:31
docs/schema.md Outdated
Comment on lines 281 to 307
### relationships[].type field

Specifies the type of relationship this OSV has to the other identifier. Must
include one of the following types:

- `ALIAS`: An alias, or identifier that is referring to the exact same
vulnerability. This is for connecting identifiers from different databases,
and not for marking duplicate IDs within the same database, which should use
`DUPLICATED_BY` or `DUPLICATE_OF` respectively.
- `CAUSES`: Causes a related vulnerability, for example Log4Shell causing
binaries that embed the vulnerable version of Log4j to be vulnerable to RCE.
- `CAUSED_BY`: Caused by a related vulnerability, most often an embedded
dependency.
- `COMMON_NAME`: A name used to colloquially refer to a specific, usually high
impact, vulnerability. For example, "Log4Shell" would be a common name for
CVE-2021-44228.
- `DUPLICATED_BY`: Other identifiers within the same database that are marked as
a duplicate of this ID.
- `DUPLICATE_OF`: Points to the canonical identifier for a vulnerability within
a given database.
- `INCOMPLETE_FIX_FOR`: When the remediation for a vulnerability is incomplete,
and causes a related vulnerability. For example, Log4Shell (CVE-2021-44228)
would be an incomplete fix for CVE-2021-45046.
- `INSUFFICIENT_FIX_OF`: Fixes a vulnerability caused by a previous remediation
being incomplete. For example, CVE-2021-45046 would be an insufficient fix of
Log4Shell (CVE-2021-44228).
- `RELATED`: An identifier that is related in an unspecified way.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@darakian Updated this with some descriptions for each relationship type.

Could you expand a bit on the concerns around consistency?

Existing tooling could continue using aliases/related or switch to relationships using just the ALIAS/RELATED types, so this would primarily expand what other databases would be able to do as far as data enrichment goes (one of the main use-cases for GSD).

@joshbuker joshbuker marked this pull request as ready for review April 4, 2023 23:38
@joshbuker joshbuker changed the title Update related to support array of strings OR objects with type and ID Add relationships field to enhance metadata on ID relationships Apr 19, 2023
@oliverchang
Copy link
Contributor

Closing. I share the same concerns with @darakian here around this being not feasible to maintain with consistency in a practical way. This also complicates the schema a fair bit, and fragments it (i.e. there are now two ways to specify aliases).

@joshbuker joshbuker deleted the schema/optional-related-object-array branch April 27, 2023 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expand vuln id relationships
4 participants