Evaluation should consider a dataset's privacy data

Splitting up issues from #88

The current code implementation does not look at the dataset_references field in privacy declarations during evaluations; it only looks at the data categories, data use, data subjects, and data qualifier.
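For context, here is roughly the shape of that check. This is a hedged sketch with made-up names, not the actual fidesctl models or code:

# Illustrative only -- hypothetical models, not the real implementation.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PrivacyDeclaration:
    data_categories: List[str]
    data_use: str
    data_subjects: List[str]
    data_qualifier: str
    dataset_references: List[str] = field(default_factory=list)

@dataclass
class PolicyRule:
    data_categories: List[str]
    data_uses: List[str]
    data_subjects: List[str]
    data_qualifier: str

def rule_matches(rule: PolicyRule, decl: PrivacyDeclaration) -> bool:
    # Only these four fields participate in the check today; the
    # dataset_references field (and the datasets behind it) never does.
    return (
        any(c in rule.data_categories for c in decl.data_categories)
        and decl.data_use in rule.data_uses
        and any(s in rule.data_subjects for s in decl.data_subjects)
        and decl.data_qualifier == rule.data_qualifier
    )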

The dataset resource has an interesting hierarchical format, so we want to make sure that we define the evaluation behavior well.

dataset:
  - fides_key: demo_users_dataset
    name: Demo Users Dataset
    data_categories: ["user.provided.identifiable"]
    data_qualifiers: [ "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified"]
    description: Data collected about users for our analytics system.
    collections:
      - name: users
        description: User information
        data_qualifier: "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified"
        data_categories:
          - user.provided.identifiable
        fields:
          - name: first_name
            description: User's first name
            data_qualifier: "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified"
            data_categories:
              - user.provided.identifiable.name
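
Modeled as types, the hierarchy above looks roughly like this (a hypothetical sketch, not the actual fidesctl models). Note the asymmetry: the dataset level takes plural data_qualifiers, while collection and field each take a single data_qualifier:

# Hypothetical models mirroring the YAML above -- illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DatasetField:
    name: str
    data_categories: List[str] = field(default_factory=list)
    data_qualifier: Optional[str] = None

@dataclass
class DatasetCollection:
    name: str
    fields: List[DatasetField] = field(default_factory=list)
    data_categories: List[str] = field(default_factory=list)
    data_qualifier: Optional[str] = None

@dataclass
class Dataset:
    fides_key: str
    collections: List[DatasetCollection] = field(default_factory=list)
    data_categories: List[str] = field(default_factory=list)
    data_qualifiers: List[str] = field(default_factory=list)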

The dataset, dataset_collection, and dataset_collection_field levels each carry possible data qualifier(s) and data categories, which makes the evaluation a little tricky. Here are the things we need to be clear on:

  1. What specific resource does the user want to evaluate? We discussed this in #88, and with the current hierarchy it does not feel clear which resource exactly should be evaluated. The fields at each level could yield different evaluation results, so I think we should evaluate each level, with each following some sort of hierarchy.

  2. How does inheritance work? It makes sense that each resource should inherit from its closest parent when a field is not defined. What I’m not 100% sure of is whether it should inherit qualifiers or categories from the privacy declaration. Basically, we need to define whether the other fields in the privacy declaration should have any impact on evaluations of the dataset (one possible scheme is sketched after this list).

  3. Are implicit defaults problematic in evaluations? If the evaluation model follows some sort of inheritance, then implicit defaults that are not obvious to a user could be problematic. In our code we default qualifiers to aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified, but what if you wanted to define a qualifier at the collection level which should apply to all fields?
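
To make questions 2 and 3 concrete, here is one possible resolution scheme, sketched as closest-parent inheritance with the implicit default applied only as a last resort. The names are hypothetical, not a proposed implementation:

# Sketch of closest-parent inheritance for data qualifiers -- hypothetical,
# just to illustrate questions 2 and 3 above.
from typing import Optional

DEFAULT_QUALIFIER = (
    "aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified"
)

def effective_qualifier(
    field_qualifier: Optional[str],
    collection_qualifier: Optional[str],
    dataset_qualifier: Optional[str],
) -> str:
    # Walk up the hierarchy: field -> collection -> dataset.
    for qualifier in (field_qualifier, collection_qualifier, dataset_qualifier):
        if qualifier is not None:
            return qualifier
    # Question 3: if this default were applied per-field *before* the walk,
    # a collection-level qualifier could never reach its fields.
    return DEFAULT_QUALIFIER

# A field with no qualifier of its own picks up the collection's:
# effective_qualifier(None, "aggregated.anonymized", None) -> "aggregated.anonymized"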

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

1 reaction
NevilleS commented, Oct 14, 2021

FWIW, in the nuance of your comment you’re also suggesting we make data_qualifier always singular, which I’m generally pretty OK with too. This would make it the same for System, Dataset, DatasetCollection, and DatasetField.

I suppose there’s value in allowing this declaration:

collection:
  - name: "foo"
    data_categories: ["circle", "square"]
    data_qualifiers: ["red", "blue"]

But it’s less clear here, right? Is that saying the collection has red circles and blue squares? Or is it saying that the collection has circles and squares that may be red or blue? Both are reasonable interpretations and we have to pick a winner to decide if the policy says you aren’t allowed to have blue squares!
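
To spell out the two readings (hypothetical code, purely to illustrate the ambiguity):

from itertools import product

data_categories = ["circle", "square"]
data_qualifiers = ["red", "blue"]

# Reading 1: pairwise -- red circles and blue squares.
pairwise = list(zip(data_qualifiers, data_categories))
# [('red', 'circle'), ('blue', 'square')]

# Reading 2: cross product -- every category may carry every qualifier.
combined = list(product(data_qualifiers, data_categories))
# [('red', 'circle'), ('red', 'square'), ('blue', 'circle'), ('blue', 'square')]

# A policy forbidding blue circles matches reading 2 but not reading 1,
# so the evaluator has to pick one interpretation and stick with it.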

You avoid this issue by forcing the qualifier to be singular like this:

collection:
  - name: "foo"
    data_categories: ["circle", "square"]
    data_qualifier: "red"

This disallows that annotation (which, in fairness, seems fair to allow) but makes it much clearer what we’re doing. That edge case feels like something you could support in a different way, and then the singular qualifier works for 90% of cases and avoids the potential footgun ambiguity.

0 reactions
ThomasLaPiana commented, Oct 15, 2021

I like it!
