Handle category-filtered fields #61

kiendang · 2021-05-03T15:06:08Z

This was originally mentioned in #59 and briefly discussed during the jupyter-server meeting jupyter-server/team-compass#4 (comment).

In summary, there are 2 decisions to make:

1. How exactly do we "filter" the fields? The 2 options being discussed are:

   - Set the property to null, equivalent to removing the data and keeping the key
   - Delete the property, i.e. removing both the key and the data

2. Do we inform users of the fields being filtered?

How to filter properties

Currently, fields that are filtered out because not all of their categories are allowed are set to null instead of being deleted. The rationale was to preserve the "shape" of the data.

During the meeting, some proposed simply deleting the fields instead. The "shape" of the data is not fully preserved anyway in the case of nested fields whose parent properties are filtered. For example:

user:
  email: [email protected]
  name: someone

If the parent property user is filtered, the whole property becomes

user: null

where the user.email and user.name keys are not preserved.
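
To make the two options concrete, here is a minimal sketch in plain Python. The helper names mask_with_null and mask_by_delete are hypothetical and not part of jupyter/telemetry, the email address is a placeholder, and a property is addressed by a tuple path such as ("user",) or ("user", "email").

```python
import copy


def mask_with_null(event, path):
    """Option 1: keep the key, replace its value with None (null)."""
    out = copy.deepcopy(event)  # never mutate the caller's event
    node = out
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = None
    return out


def mask_by_delete(event, path):
    """Option 2: remove both the key and its value."""
    out = copy.deepcopy(event)
    node = out
    for key in path[:-1]:
        node = node[key]
    del node[path[-1]]
    return out


# Placeholder data modelled on the example above.
event = {
    "user": {"email": "someone@example.com", "name": "someone"},
    "action": "login",
}

print(mask_with_null(event, ("user",)))  # {'user': None, 'action': 'login'}
print(mask_by_delete(event, ("user",)))  # {'action': 'login'}
```

In both cases the nested user.email and user.name keys are gone once the parent is filtered, which is the point made above: setting the parent to null does not preserve the nested shape either.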

Listing filtered fields

It was suggested in #59 that we have another metadata field in the event to list out all the properties being filtered, e.g.

"__masked__": [
  ["user", "email"],
  ["user", "id"]
]

This would be useful if we decide to set the filtered properties to null. This helps differentiate whether a property is null because its value is actually null or because it is hidden due to categories.

This would also be useful if we go with the second approach of deleting the field altogether. Since some JSON schemas allow additional properties, the list would indicate whether a missing property was filtered out by categories or was simply not included in the original data.
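
A rough sketch of how deleting fields and listing them could work together. The function names drop_and_record and was_masked are hypothetical. Only the "__masked__" key and its list-of-paths layout come from the example above. One assumption made here: a path is recorded only if the property was actually present in the data, so the list never claims something was hidden when it was never supplied.

```python
import copy

MASKED_KEY = "__masked__"  # key name taken from the example above


def drop_and_record(event, filtered_paths):
    """Delete every filtered property and list the dropped paths."""
    out = copy.deepcopy(event)
    masked = []
    for path in filtered_paths:
        node = out
        # Walk down to the parent of the property to drop.
        for key in path[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
        # Assumption: record only properties that really existed.
        if isinstance(node, dict) and path[-1] in node:
            del node[path[-1]]
            masked.append(list(path))
    out[MASKED_KEY] = masked
    return out


def was_masked(emitted, path):
    """True if `path` was removed by the filter rather than never supplied."""
    return list(path) in emitted.get(MASKED_KEY, [])


event = {"user": {"email": "someone@example.com", "id": 42, "name": "someone"}}
emitted = drop_and_record(
    event, [("user", "email"), ("user", "id"), ("user", "phone")]
)

print(emitted[MASKED_KEY])                    # [['user', 'email'], ['user', 'id']]
print(was_masked(emitted, ("user", "email")))  # True
print(was_masked(emitted, ("user", "phone")))  # False
```

A consumer can then tell user.email (filtered) apart from user.phone (never supplied), even though both are absent from the emitted event.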


Zsailer · 2021-06-01T20:02:46Z

> 1. How exactly do we "filter" the fields? The 2 options being discussed are:
>    - Set the property to null, equivalent to removing the data and keeping the key
>    - Delete the property, i.e. removing both the key and the data
> 2. Do we inform users of the fields being filtered?

1. I don’t think it matters either way, because as you mentioned, preserving the data shape isn’t really a valid reason to keep the fields around. My preference, then, would be to drop the fields and add them to a masked field.
2. Adding a masked field seems like the right way to go here, no matter what we decide in (1). We should be explicit about what fields were masked, even in the case where we drop fields. This makes it easier for consumers of the emitted data to know exactly what happened to the data (and build logic around that).

This brings up another question. What about non-required properties? Do we need a missing field in the event capsule in the spirit of being explicit? Right now, it’s implied that missing, optional properties were not supplied by the event source, rather than being dropped by the filter. This can be inferred and verified by checking the masked field, of course, so missing would be redundant but explicit.

kiendang · 2021-06-02T02:37:13Z

> 1. I don’t think it matters either way, because as you mentioned, preserving the data shape isn’t really a valid reason to keep the fields around. My preference, then, would be to drop the fields and add them to a masked field.
> 2. Adding a masked field seems like the right way to go here, no matter what we decide in (1). We should be explicit about what fields were masked, even in the case where we drop fields. This makes it easier for consumers of the emitted data to know exactly what happened to the data (and build logic around that).

This is what I think too.

> This brings up another question. What about non-required properties? Do we need a missing field in the event capsule in the spirit of being explicit? Right now, it’s implied that missing, optional properties were not supplied by the event source, rather than being dropped by the filter. This can be inferred and verified by checking the masked field, of course, so missing would be redundant but explicit.

I think it's better to do this post hoc so we don't add more computation when recording events. For now we can provide a code example in the docs showing how to extract the fields that are missing (as opposed to filtered) from the emitted events.

I would prefer the recording event logic to remain lean and only include things we have to do and can only do during event recording. To me this falls under operations/processing you can do on the emitted events after receiving them. There might be dedicated tooling for this in the future, by us and third parties, that arises from usage of telemetry.
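
A minimal sketch of what such a post-hoc example might look like. It is an assumption-laden illustration, not an API of jupyter/telemetry: the function name missing_not_filtered is hypothetical, the schema is a simplified JSON schema that only uses nested "properties", and the emitted event is assumed to carry a "__masked__" list of paths as proposed earlier in this issue.

```python
def missing_not_filtered(schema, event, masked, prefix=()):
    """Return paths declared in `schema` that are absent from `event`
    and are not listed in `masked` (i.e. never supplied by the source)."""
    found = []
    for name, subschema in schema.get("properties", {}).items():
        path = prefix + (name,)
        if list(path) in masked:
            continue  # dropped by the category filter, so not "missing"
        if name not in event:
            found.append(path)
        elif isinstance(event[name], dict):
            # Descend into nested objects that are still present.
            found.extend(
                missing_not_filtered(subschema, event[name], masked, path)
            )
    return found


# Hypothetical schema and emitted event, for illustration only.
schema = {
    "properties": {
        "user": {
            "properties": {"email": {}, "id": {}, "name": {}, "phone": {}}
        },
        "action": {},
    }
}
emitted = {
    "user": {"name": "someone"},
    "__masked__": [["user", "email"], ["user", "id"]],
}

print(missing_not_filtered(schema, emitted, emitted.get("__masked__", [])))
# [('user', 'phone'), ('action',)]
```

This runs entirely on the consumer side, so the recording path does no extra work.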

Zsailer · 2021-06-03T18:41:46Z

Sounds good. We can always offer tooling (i.e. some helpful functions) inside jupyter/telemetry (in a later PR) to help process emitted events.

kiendang mentioned this issue May 3, 2021

[Telemetry] Add basic telemetry to Jupyter server and begin emitting server events jupyter-server/jupyter_server#364

Closed

kiendang added this to the 0.2 milestone May 27, 2021

Zsailer mentioned this issue Jun 1, 2021

Categories filtering for nested properties #59

Merged

kiendang mentioned this issue Aug 19, 2021

Evaluate jschon as json schema validator #66

Open

kiendang mentioned this issue Sep 8, 2021

Use fastjsonschema for json schema validation #64

Draft

kiendang mentioned this issue Jul 22, 2022

Add redactionPolicies field to Jupyter Event schemas jupyter/jupyter_events#2

Merged
