
Scala Common Enrich: support "pii" annotations in schemas for PII Enrichment

See original GitHub issue

PII = Personally Identifiable Information

The basic idea:

  • Any JSON Schema (unstructured event or context) can be annotated with "pii": true on a per-property basis
  • If the PII Scrubber is turned on, we encrypt every PII-annotated property in any JSON using AES, so you end up with a unique but non-PII value, e.g. “Fred Blundun” always -> “1de6e53cb23”
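
A minimal sketch of the idea, not how Scala Common Enrich implements it. The schema, property names, key and the `scrub` / `pseudonymize` helpers are all hypothetical; only the "pii": true annotation comes from the proposal. HMAC-SHA256 from the Python standard library stands in for the AES step, since the standard library has no AES; it keeps the property that matters here, which is that the same input always maps to the same token.

```python
import hashlib
import hmac

# Hypothetical schema: only the "pii": true annotation comes from the proposal.
SCHEMA = {
    "type": "object",
    "properties": {
        "email_recipient": {"type": "string", "pii": True},
        "subject": {"type": "string"},
        "sender": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "pii": True},
                "team": {"type": "string"},
            },
        },
    },
}


def pseudonymize(value, key):
    """Map a value to a stable hex token; same input and key give the same token."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def scrub(instance, schema, key):
    """Return a copy of instance with every "pii": true property replaced."""
    properties = schema.get("properties", {})
    scrubbed = {}
    for name, value in instance.items():
        subschema = properties.get(name, {})
        if subschema.get("pii") is True:
            # Annotated property: replace the raw value with its token.
            scrubbed[name] = pseudonymize(value, key)
        elif isinstance(value, dict):
            # Nested object: recurse with the matching sub-schema.
            scrubbed[name] = scrub(value, subschema, key)
        else:
            scrubbed[name] = value
    return scrubbed


event = {
    "email_recipient": "fred@example.com",
    "subject": "Hello",
    "sender": {"name": "Fred Blundun", "team": "data"},
}
scrubbed_event = scrub(event, SCHEMA, key=b"example-secret")
```

Because the token is deterministic for a given key, analysts can still count and join on the scrubbed property without seeing the original value. Unlike AES, an HMAC cannot be reversed by whoever holds the key, so this stand-in only fits if recovering the original value is not required.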

This would be of potential interest to users in healthcare or finance, where the ability for analysts to drill down to individual users could be a privacy concern.

/cc @yalisassoon @fblundun

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Comments: 15 (15 by maintainers)

Top GitHub Comments

2 reactions
alexanderdean commented, Feb 21, 2018

One of the nice things about this idea is that the pii: true hint would be enough for Iglu, when generating tables for Redshift etc., to make sure these columns are wide enough to take the hashed value.

It also means that the work of identifying that, e.g., com.acme.email/send_email’s email_recipient property is PII is done in one place (at the time of schema authorship), rather than every user having to configure their own PII Enrichment.
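
To illustrate the column-width point, here is a rough sketch and not Iglu's actual DDL generator. The `pii_column_width` and `varchar_ddl` helpers, the default length of 255 and the choice of a hex-encoded SHA-256 digest are all assumptions made for the example.

```python
import hashlib


def pii_column_width(declared_max_length, digest_name="sha256"):
    """Width in characters that fits both the raw value and its hex digest."""
    hex_length = hashlib.new(digest_name).digest_size * 2
    return max(declared_max_length, hex_length)


def varchar_ddl(column, subschema, default_length=255):
    """Build a VARCHAR column definition, widened when the property is PII."""
    declared = subschema.get("maxLength", default_length)
    if subschema.get("pii") is True:
        width = pii_column_width(declared)
    else:
        width = declared
    return f"{column} VARCHAR({width})"


pii_column = varchar_ddl(
    "email_recipient", {"type": "string", "maxLength": 32, "pii": True}
)
plain_column = varchar_ddl("subject", {"type": "string", "maxLength": 32})
```

Here the annotated column is widened to 64 characters, the length of a hex-encoded SHA-256 digest, while the unannotated column keeps its declared 32.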

0 reactions
chuwy commented, Jun 19, 2020
