Prototype loading privacy declarations directly from source code
The separate system declarations are a potential burden for users. A good middle ground between code analysis and what we have now is to co-locate the declarations and the code.
The two implementation methods I can think of for the POC are as follows:
- A very Python-specific implementation where we ingest the Python code, extract the docstrings, and then extract the system declarations from there
  - A major issue here is that this is not generalizable to other languages
- We go for a more general approach and treat each source code file as a txt file. We then use regex to look for matching cases and attempt to load each match into a system declaration
  - Because we would still expect it to be YAML-like, this would only work in languages with multi-line comments
Option 1: Declaration inside of the docstring
def some_func(some_parameter: str) -> None:
    """
    Do something important with user data.
    system:
      - fides_key: demo_analytics_system
        name: Demo Analytics System
        description: A system used for analyzing customer behaviour.
        system_type: Service
        privacy_declarations:
          - name: Analyze customer behaviour for improvements.
            data_categories:
              - user.provided.identifiable.contact
              - user.derived.identifiable.device.cookie_id
            data_use: improve.system
            data_subjects:
              - customer
            data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
            dataset_references:
              - demo_users_dataset
    """
    user_data = get_user_data(some_parameter)
    advertise_to(user_data)
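As a rough illustration of the Python-specific approach, the sketch below uses the standard library `ast` module to walk a source file, read each function's docstring, and keep everything from the `system:` line onward. The function name `extract_system_blocks` and the `marker` parameter are hypothetical, not part of any existing tool, and the sketch stops at returning the YAML-like text; loading it into a system declaration (for example with a YAML parser) is left out.

```python
import ast
import textwrap
from typing import Dict


def extract_system_blocks(source: str, marker: str = "system:") -> Dict[str, str]:
    """Map each function name to the YAML-like text found in its docstring.

    Hypothetical helper: it assumes the declaration starts at a line that
    contains only `marker` and runs to the end of the docstring.
    """
    blocks: Dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # get_docstring strips the common indentation of the docstring body.
        docstring = ast.get_docstring(node)
        if not docstring:
            continue
        lines = docstring.splitlines()
        for index, line in enumerate(lines):
            if line.strip() == marker:
                blocks[node.name] = textwrap.dedent("\n".join(lines[index:]))
                break
    return blocks


EXAMPLE = '''
def some_func(some_parameter: str) -> None:
    """
    Do something important with user data.
    system:
      - fides_key: demo_analytics_system
        name: Demo Analytics System
    """
'''

blocks = extract_system_blocks(EXAMPLE)
print(blocks["some_func"])
```

Because the parsing relies on Python's own AST, this shares the limitation noted above: it does not carry over to other languages.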
Option 2: Declaration as a multi-line comment:
"""
system:
  - fides_key: demo_analytics_system
    name: Demo Analytics System
    description: A system used for analyzing customer behaviour.
    system_type: Service
    privacy_declarations:
      - name: Analyze customer behaviour for improvements.
        data_categories:
          - user.provided.identifiable.contact
          - user.derived.identifiable.device.cookie_id
        data_use: improve.system
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
        dataset_references:
          - demo_users_dataset
"""
def some_func(some_parameter: str) -> None:
    """
    Do something important with user data.
    """
    user_data = get_user_data(some_parameter)
    advertise_to(user_data)
An additional caveat here is that it would be extremely difficult if not impossible for a plugin to help with these annotations, as they’re embedded in other source code.
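The more general approach, treating every source file as plain text and searching it with a regex, could be sketched roughly as follows. The pattern, the function name `find_declaration_blocks`, and the rule that a declaration block must begin with `system:` are all assumptions made for illustration. The sketch recognizes Python triple-double-quoted blocks and C-style `/* ... */` comments only, and it returns the raw YAML-like text rather than a loaded system declaration.

```python
import re
import textwrap
from typing import List

# Matches Python triple-double-quoted blocks and C-style /* ... */ comments.
BLOCK_PATTERN = re.compile(r'"""(.*?)"""|/\*(.*?)\*/', re.DOTALL)


def find_declaration_blocks(text: str, marker: str = "system:") -> List[str]:
    """Return the multi-line comment bodies that start with `marker`."""
    found: List[str] = []
    for match in BLOCK_PATTERN.finditer(text):
        # Exactly one of the two groups is set, depending on the comment style.
        body = match.group(1) if match.group(1) is not None else match.group(2)
        candidate = textwrap.dedent(body).strip()
        if candidate.startswith(marker):
            found.append(candidate)
    return found


PY_SOURCE = '''"""
system:
  - fides_key: demo_analytics_system
"""
def some_func():
    """
    Do something important with user data.
    """
'''

JS_SOURCE = "/*\nsystem:\n  - fides_key: demo_analytics_system\n*/\nfunction someFunc() {}\n"

py_blocks = find_declaration_blocks(PY_SOURCE)
js_blocks = find_declaration_blocks(JS_SOURCE)
print(py_blocks[0])
```

Ordinary docstrings are skipped because they do not begin with the marker. Comment styles that prefix every line (for example ` * ` in block comments) would need extra handling before the text is YAML-like.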
Additional questions to think about:
- Do we have the user define a system in a system.yaml file, and then attribute all of the nearby code declarations to that?
  - Do they need to define a system per declaration? That seems odd, so the option above seems better
- How should this be handled during evaluations? Should it be done at apply/evaluate time, or should there be a separate command that generates a full system.yaml file from the source code declarations?
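One way to picture the first question is a lookup that attributes a source file to the closest system.yaml in its own directory or any parent directory. This is only a sketch of one possible rule; the function name `find_owning_system_file` and the "nearest ancestor wins" behaviour are assumptions, not something decided in this issue.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_owning_system_file(source_file: Path, filename: str = "system.yaml") -> Optional[Path]:
    """Return the closest `filename` in the file's directory or any ancestor."""
    for directory in source_file.resolve().parents:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


# Small demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "pkg" / "sub").mkdir(parents=True)
    (root / "system.yaml").write_text("system: []\n")
    module = root / "pkg" / "sub" / "module.py"
    module.write_text("")

    # Only the root-level file exists, so it owns the module.
    owner_is_root = find_owning_system_file(module) == root / "system.yaml"

    # A closer system.yaml takes precedence once it exists.
    (root / "pkg" / "system.yaml").write_text("system: []\n")
    owner_is_pkg = find_owning_system_file(module) == root / "pkg" / "system.yaml"
```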
Issue Analytics
- State:
- Created 2 years ago
- Comments:20 (19 by maintainers)
Top GitHub Comments
@edthedev With the feature I proposed, the coverage report would show how many of your system declarations had associated code files. The new system declaration would look like this
This method has the benefit of being language-agnostic, and we can then throw errors when the code_paths section is empty. We could also move it down into the declarations section.
Most specifically relevant to item (2) above, opened issues for additional documentation: