Scala Common Enrich: encrypt original values in PII Enrichment
The motivation for this ticket is to help users of piinguin and piinguin relay better secure the original data stored in piinguin, without having to rely on locking down access to piinguin itself within an organisation.
To achieve that, the enrichment is configured with one or more public keys, and each original value is encrypted with the key named by its field's encryptionKeyName. A field without an encryptionKeyName is left unencrypted. The new configuration will look like this:
{
"schema": "iglu:com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/3-0-0",
"data": {
"vendor": "com.snowplowanalytics.snowplow.enrichments",
"name": "pii_enrichment_config",
"emitEvent": true,
"enabled": true,
"parameters": {
"pii": [
{
"pojo": {
"field": "user_id",
"encryptionKeyName": "other-key"
}
},
{
"pojo": {
"field": "user_fingerprint"
}
},
{
"json": {
"field": "unstruct_event",
"schemaCriterion": "iglu:com.mailchimp/subscribe/jsonschema/1-*-*",
"jsonPath": "$.data.['email', 'ip_opt']",
"encryptionKeyName": "email-key"
}
}
],
"strategy": {
"pseudonymize": {
"hashFunction": "SHA-1",
"salt": "pepper123"
}
},
"encryption": [
{
"keyName": "email-key",
"key": "some-rsa-publickey"
},
{
"keyName": "other-key",
"key": "some-rsa-publickey-2"
}
]
}
}
}
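One way the enrichment might resolve the "encryption" entries above is sketched below in Java, using only JDK classes (which are equally callable from Scala). It assumes each "key" value is a base64-encoded X.509 (DER) RSA public key; the issue does not specify the key format. `EncryptionKeyRegistry`, `register`, and `keyFor` are hypothetical names for illustration, not part of Scala Common Enrich.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps each "keyName" from the "encryption" array to a parsed key.
class EncryptionKeyRegistry {
    private final Map<String, PublicKey> keysByName = new HashMap<>();

    // Assumption: "key" holds a base64-encoded X.509 (DER) RSA public key.
    void register(String keyName, String base64DerKey) {
        try {
            byte[] der = Base64.getDecoder().decode(base64DerKey);
            PublicKey key = KeyFactory.getInstance("RSA")
                    .generatePublic(new X509EncodedKeySpec(der));
            keysByName.put(keyName, key);
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            throw new IllegalArgumentException("Invalid public key for " + keyName, e);
        }
    }

    // A field with no "encryptionKeyName" (null here) is left unencrypted.
    // A name with no matching "encryption" entry is treated as a configuration error.
    Optional<PublicKey> keyFor(String encryptionKeyName) {
        if (encryptionKeyName == null) {
            return Optional.empty();
        }
        PublicKey key = keysByName.get(encryptionKeyName);
        if (key == null) {
            throw new IllegalArgumentException("Unknown encryption key: " + encryptionKeyName);
        }
        return Optional.of(key);
    }
}
```

Rejecting an unknown key name, rather than silently skipping encryption, is a design choice of this sketch: it keeps a typo in the configuration from leaving original values in plain text.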
The emitted event will also change: each original value is encrypted and then base64-encoded, and the name of the key used is recorded in encryptionKeyName (the actual implementation still needs to be finalised):
{
"schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
"data": {
"schema": "iglu:com.snowplowanalytics.snowplow/pii_transformation/jsonschema/2-0-0",
"data": {
"pii": {
"pojo": [
{
"fieldName": "user_fingerprint",
"originalValue": "its_you_again!",
"modifiedValue": "27abac60dff12792c6088b8d00ce7f25c86b396b8c3740480cd18e21068ecff4"
},
{
"fieldName": "user_ipaddress",
"originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
"modifiedValue": "dd9720903c89ae891ed5c74bb7a9f2f90f6487927ac99afe73b096ad0287f3f5",
"encryptionKeyName": "other-key"
},
{
"fieldName": "user_id",
"originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
"modifiedValue": "7d8a4beae5bc9d314600667d2f410918f9af265017a6ade99f60a9c8f3aac6e9",
"encryptionKeyName": "other-key"
}
],
"json": [
{
"fieldName": "unstruct_event",
"originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
"modifiedValue": "269c433d0cc00395e3bc5fe7f06c5ad822096a38bec2d8a005367b52c0dfb428",
"jsonPath": "$.ip",
"schema": "iglu:com.mailgun/message_clicked/jsonschema/1-0-0",
"encryptionKeyName": "email-key"
},
{
"fieldName": "contexts",
"originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
"modifiedValue": "1c6660411341411d5431669699149283d10e070224be4339d52bbc4b007e78c5",
"jsonPath": "$.data.emailAddress2",
"schema": "iglu:com.acme/email_sent/jsonschema/1-1-0",
"encryptionKeyName": "email-key"
},
{
"fieldName": "contexts",
"originalValue": "eZDx1Y1SMIcP0vIzkNsx3xMZ4twdyqqU5bqNPkLNYElDNcUhD/8NH0Xb8vYPLvy5NZmm5XuMzInQ7xRHr4kB9q4kvRwtCwUGSS4OSR/QlPQWMz6NzMAep7oQ10crpdxQcXH5LxvMTMROndxOnV5Aglepd4zuSMRj+q3u9uH6zZmiMjS/1xcxC4dRdD3NtrR9IpNjaqkx9BrQ2S1ClsVntU/UGLZEAle5H+Uy+qvXYczbQsmVVwYLdgv4S4Om0QPW+T48pu2VGXVwNnJUwdAFqL+snAFrOfyGa1oDcwoTGcbhR3YJO2Gv7NzvMyDtPaNLaYgrzDJcDV1qLt1W12h2Bg==",
"modifiedValue": "72f323d5359eabefc69836369e4cabc6257c43ab6419b05dfb2211d0e44284c6",
"jsonPath": "$.emailAddress",
"schema": "iglu:com.acme/email_sent/jsonschema/1-0-0",
"encryptionKeyName": "email-key"
}
]
},
"strategy": {
"pseudonymize": {
"hashFunction": "SHA-256"
}
}
}
}
}
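The "encrypt, then base64-encode" step for originalValue could look roughly like the following Java sketch, which uses only JDK classes. The issue leaves the actual scheme open, so the RSA-OAEP transformation and the 2048-bit key size are assumptions, and `OriginalValueEncryption` and its methods are hypothetical names. The `decrypt` method shows what the holder of the matching private key would do to recover the value.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Base64;
import javax.crypto.Cipher;

class OriginalValueEncryption {
    // Assumption: RSA with OAEP padding; the issue does not fix a scheme.
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Encrypts the original value with the public key and base64-encodes the ciphertext.
    static String encrypt(String originalValue, PublicKey publicKey) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            byte[] cipherText = cipher.doFinal(originalValue.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(cipherText);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not encrypt original value", e);
        }
    }

    // Reverses encrypt(); only the holder of the private key can do this.
    static String decrypt(String encodedValue, PrivateKey privateKey) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            byte[] plain = cipher.doFinal(Base64.getDecoder().decode(encodedValue));
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not decrypt original value", e);
        }
    }

    // Generates a throwaway 2048-bit key pair for trying the sketch out.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA is not available", e);
        }
    }
}
```

For example, `OriginalValueEncryption.encrypt("its_you_again!", pair.getPublic())` with a pair from `newKeyPair()` returns a 344-character base64 string, because a 2048-bit RSA ciphertext is always 256 bytes. With these parameters the plaintext is limited to 190 bytes, so longer values (such as large JSON fragments) would need a different approach, for instance encrypting the value with a symmetric key and encrypting only that key with RSA.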
An incidental benefit is that the original values in the Kinesis PII stream are also encrypted.
Issue Analytics
- Created 5 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
That is a possible alternative. One possible motivation for doing it in the enrichment is to also help secure the PII stream; however, this is not a strong reason. It may be better done in piinguin.
Migrated to https://github.com/snowplow/enrich/issues/33