Scala Common Enrich: add API Request Enrichment
The API Request Enrichment lets you perform dimension widening on a Snowplow event via your own proprietary HTTP(S) API.
The configuration JSON for this enrichment contains four sub-objects:
- inputs specifies the datapoint(s) from the Snowplow event to use as keys when performing your API lookup
- api defines how the enrichment can access your API
- outputs lets you tune how you convert the returned JSON into one or more self-describing JSONs ready to be attached to your Snowplow event
- cache improves the enrichment’s performance by storing values retrieved from the API
Here is an example configuration:
{
  "enabled": true,
  "parameters": {
    "inputs": [
      {
        "key": "user",
        "pojo": {
          "field": "user_id"
        }
      },
      {
        "key": "user",
        "json": {
          "field": "contexts",
          "schemaCriterion": "iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-*-*",
          "jsonPath": "$.userId"
        }
      },
      {
        "key": "client",
        "pojo": {
          "field": "app_id"
        }
      }
    ],
    "api": {
      "http": {
        "method": "GET",
        "uri": "http://api.acme.com/users/{{client}}/{{user}}?format=json",
        "timeout": 5000,
        "authentication": {
          "httpBasic": {
            "username": "xxx",
            "password": "yyy"
          }
        }
      }
    },
    "outputs": [ {
      "json": {
        "jsonPath": "$.record",
        "schema": "iglu:com.acme/user/jsonschema/1-0-0"
      }
    } ],
    "cache": {
      "size": 3000,
      "ttl": 60
    }
  }
}
To go through each of these sections in more detail:
inputs
Specify an array of inputs to use as keys when performing your API lookup. Each input consists of a key and a source: either pojo if the datapoint comes from the Snowplow enriched event POJO, or json if the datapoint comes from a self-describing JSON inside one of the three JSON fields. The key can be referred to later in the api.http.uri property.
For pojo, the field name must be specified. A field name which is not recognized as part of the POJO will be ignored by the enrichment.
For json, you must specify the field name as either unstruct_event, contexts or derived_contexts. You must then provide two additional fields:
- schemaCriterion lets you specify the self-describing JSON you are looking for in the given JSON field. You can specify only the SchemaVer MODEL (e.g. 1-*-*), MODEL plus REVISION (e.g. 1-1-*) or a full MODEL-REVISION-ADDITION version (e.g. 1-1-1)
- jsonPath lets you provide the JSON Path statement to navigate to the field inside the JSON that you want to use as the input
The lookup algorithm is short-circuiting: the first match for a given key will be used.
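The input rules above can be sketched roughly as follows. This is an illustrative Python model, not the enrichment's actual Scala code. It assumes the event is a plain dict whose JSON fields are already parsed into self-describing JSONs, that "first match" means first in the order of the inputs array, and that only simple dotted JSON Paths are used. The names matches_criterion, simple_json_path and resolve_inputs are hypothetical.

```python
def matches_criterion(schema_uri, criterion):
    """Compare an Iglu URI with a criterion such as .../jsonschema/1-*-*."""
    uri_prefix, _, uri_version = schema_uri.rpartition("/")
    crit_prefix, _, crit_version = criterion.rpartition("/")
    if uri_prefix != crit_prefix:
        return False
    uri_parts = uri_version.split("-")
    crit_parts = crit_version.split("-")
    # A "*" in the criterion accepts any value in that SchemaVer position.
    return len(uri_parts) == len(crit_parts) and all(
        c == "*" or c == u for c, u in zip(crit_parts, uri_parts)
    )


def simple_json_path(doc, path):
    """Assumption: handles only dotted paths like $.userId, not full JSON Path."""
    current = doc
    for part in path.split(".")[1:]:
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def resolve_inputs(inputs, event):
    """Return {key: value}, keeping the first input that yields a value per key."""
    resolved = {}
    for inp in inputs:
        key = inp["key"]
        if key in resolved:
            continue  # short-circuit: an earlier input already matched this key
        value = None
        if "pojo" in inp:
            # Unrecognized field names simply yield no value.
            value = event.get(inp["pojo"]["field"])
        elif "json" in inp:
            spec = inp["json"]
            field = event.get(spec["field"]) or []
            candidates = field if isinstance(field, list) else [field]
            for sdj in candidates:
                if matches_criterion(sdj["schema"], spec["schemaCriterion"]):
                    value = simple_json_path(sdj["data"], spec["jsonPath"])
                    if value is not None:
                        break
        if value is not None:
            resolved[key] = value
    return resolved
```

With the example configuration, an event that has no user_id but carries a client_session context would take its user key from that context's userId.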
api
The api section lets you configure how the enrichment should access your API. At the moment only http is supported, with this option covering both HTTP and HTTPS - the protocol on the uri field will determine which to use. Currently only GET is supported as the HTTP method for the lookup.
For the uri field, specify the full URI including the protocol. You can attach a querystring to the end of the URI. You can also embed the keys from your inputs section in the URI, by wrapping the key in {{}} brackets thus:
"uri": "http://api.acme.com/users/{{client}}/{{user}}?format=json"
If a key required in the uri was not found in any of the inputs, then the lookup will not proceed, but this will not be flagged as a failure.
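A minimal sketch of this substitution, written in Python for illustration only: build_uri is a hypothetical helper, and percent-encoding the substituted values is an assumption, not something the text above specifies.

```python
import re
from urllib.parse import quote

# Matches {{key}} placeholders in the uri template.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def build_uri(template, values):
    """Return the populated URI, or None if any required key has no value."""
    needed = PLACEHOLDER.findall(template)
    if any(name not in values for name in needed):
        return None  # lookup does not proceed; not treated as a failure
    # Assumption: values are percent-encoded before being placed in the URI.
    return PLACEHOLDER.sub(
        lambda m: quote(str(values[m.group(1)]), safe=""), template
    )
```

A caller would skip the HTTP request whenever build_uri returns None and leave the event unchanged.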
Currently the only supported authentication option is HTTP basic authentication, configured via the httpBasic property: provide a username and/or a password for the enrichment to use to connect to your API using basic access authentication. Some APIs use only the username or password field to contain an API key; in this case, set the other property to the empty string "".
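As an illustration of how such a configuration maps to a request header, here is a short Python sketch of standard basic access authentication. The function name basic_auth_header is hypothetical, and treating a missing username or password as the empty string is an assumption.

```python
import base64


def basic_auth_header(authentication):
    """Build request headers from the authentication sub-object."""
    creds = authentication.get("httpBasic")
    if not creds:
        return {}  # no credentials configured: send no Authorization header
    # Assumption: a missing property is treated like the empty string.
    username = creds.get("username", "")
    password = creds.get("password", "")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}
```

For an API key carried in the username alone, the encoded credential is simply the key followed by a colon.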
If your API is unsecured (because for example it is only accessible from inside your private subnet, or using IP address whitelisting), then configure the authentication section like so:
"authentication": { }
outputs
This enrichment assumes that your API returns JSON containing one or more entities that you want to add to your event as derived contexts. Within the outputs array, each entry is a json sub-object that contains a jsonPath configuration field that lets you specify which part of the returned JSON you want to add to your enriched event. Use $ if you want to attach the returned JSON as is.
If the JSON Path specified cannot be found within the API’s returned JSON, then the lookup (and thus the overall event) will be flagged as a failure.
The enrichment adds the returned JSON into the derived_contexts field within a Snowplow enriched event. Because all JSONs in the derived_contexts field must be self-describing JSONs, use the schema field to specify the Iglu schema URI that you want to attach to the event.
Example:
GET http://api.acme.com/users/northwind-traders/123?format=json
{
  "metadata": {
    "whenCreated": 1448371243,
    "whenUpdated": 1448373431
  },
  "record": {
    "name": "Bob Thorpe",
    "id": "123"
  }
}
With this configuration:
"outputs": [ {
  "json": {
    "jsonPath": "$.record",
    "schema": "iglu:com.acme/user/jsonschema/1-0-0"
  }
} ]
This would be added to the derived_contexts array:
{
  "schema": "iglu:com.acme/user/jsonschema/1-0-0",
  "data": {
    "name": "Bob Thorpe",
    "id": "123"
  }
}
The outputs array must have at least one entry in it.
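The output handling described in this section could be modelled along these lines. This is a hedged Python sketch, not the enrichment's implementation: build_derived_contexts and extract are hypothetical names, only dotted JSON Paths are handled, and the failure case is represented by a LookupError purely for illustration.

```python
_MISSING = object()  # sentinel so that a legitimate JSON null is not a miss


def extract(doc, path):
    """Assumption: supports only $ and dotted paths such as $.record."""
    current = doc
    for part in path.split(".")[1:]:
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def build_derived_contexts(response, outputs):
    """Wrap each selected part of the API response as a self-describing JSON."""
    if not outputs:
        raise ValueError("outputs must contain at least one entry")
    contexts = []
    for output in outputs:
        spec = output["json"]
        data = extract(response, spec["jsonPath"])
        if data is _MISSING:
            # The text above says this flags the lookup as a failure.
            raise LookupError("JSON Path not found: " + spec["jsonPath"])
        contexts.append({"schema": spec["schema"], "data": data})
    return contexts
```

Each returned dict has the schema and data shape that is appended to derived_contexts.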
cache
A Snowplow enrichment can run many millions of times per hour, effectively launching a DoS attack on a data source if we are not careful. The cache configuration attempts to minimize the number of lookups performed.
The cache is an LRU (least recently used) cache, where the least recently accessed values are evicted to make space for new values. The uri with all keys populated is used as the key in the cache. Configure the cache as follows:
- size is the maximum number of entries to hold in the cache at any one time
- ttl is the number of seconds that an entry can stay in the cache before it is forcibly evicted. This is useful to prevent stale values from being retrieved in the case that your API can return different values for the same key over time
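To make the interaction of size and ttl concrete, here is a small Python sketch of an LRU cache with per-entry expiry, keyed by the populated uri. It is one possible model under stated assumptions, not the enrichment's actual cache: the class name TtlLruCache and the injectable clock parameter are hypothetical, and expiry is checked lazily on read.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    def __init__(self, size, ttl, clock=time.monotonic):
        self.size = size    # maximum number of entries
        self.ttl = ttl      # seconds an entry may live
        self.clock = clock  # injectable for testing (assumption)
        self._entries = OrderedDict()  # uri -> (stored_at, value)

    def get(self, uri):
        entry = self._entries.get(uri)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[uri]  # expired: evict and report a miss
            return None
        self._entries.move_to_end(uri)  # mark as most recently used
        return value

    def put(self, uri, value):
        self._entries[uri] = (self.clock(), value)
        self._entries.move_to_end(uri)
        while len(self._entries) > self.size:
            self._entries.popitem(last=False)  # drop least recently used
```

A lookup would call get with the populated uri first and only hit the API, followed by put, on a miss.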
Sneak peek: https://github.com/snowplow/snowplow-mini
Yes please @chuwy! Let’s implement it. I need to do a new Iglu Central release for the Clearbit tutorial anyway…