Scala Common Enrich: add API Request Enrichment
The API Request Enrichment lets you perform dimension widening on a Snowplow event via your own proprietary http(s) API.
The configuration JSON for this enrichment contains four sub-objects:
- inputs specifies the datapoint(s) from the Snowplow event to use as keys when performing your API lookup
- api defines how the enrichment can access your API
- outputs lets you tune how you convert the returned JSON into one or more self-describing JSONs ready to be attached to your Snowplow event
- cache improves the enrichment's performance by storing values retrieved from the API
Here is an example configuration:
{
  "enabled": true,
  "parameters": {
    "inputs": [
      {
        "key": "user",
        "pojo": {
          "field": "user_id"
        }
      },
      {
        "key": "user",
        "json": {
          "field": "contexts",
          "schemaCriterion": "iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-*-*",
          "jsonPath": "$.userId"
        }
      },
      {
        "key": "client",
        "pojo": {
          "field": "app_id"
        }
      }
    ],
    "api": {
      "http": {
        "method": "GET",
        "uri": "http://api.acme.com/users/{{client}}/{{user}}?format=json",
        "timeout": 5000,
        "authentication": {
          "httpBasic": {
            "username": "xxx",
            "password": "yyy"
          }
        }
      }
    },
    "outputs": [ {
      "json": {
        "jsonPath": "$.record",
        "schema": "iglu:com.acme/user/jsonschema/1-0-0"
      }
    } ],
    "cache": {
      "size": 3000,
      "ttl": 60
    }
  }
}
To go through each of these sections in more detail:
inputs
Specify an array of inputs to use as keys when performing your API lookup. Each input consists of a key and a source: either pojo if the datapoint comes from the Snowplow enriched event POJO, or json if the datapoint comes from a self-describing JSON inside one of the three JSON fields. The key can be referred to later in the api.http.uri property.
For pojo, the field name must be specified. A field name which is not recognized as part of the POJO will be ignored by the enrichment.
For json, you must specify the field name as either unstruct_event, contexts or derived_contexts. You must then provide two additional fields:
- schemaCriterion lets you specify the self-describing JSON you are looking for in the given JSON field. You can specify only the SchemaVer MODEL (e.g. 1-*-*), MODEL plus REVISION (e.g. 1-1-*) or a full MODEL-REVISION-ADDITION version (e.g. 1-1-1)
- jsonPath lets you provide the JSON Path statement to navigate to the field inside the JSON that you want to use as the input
The lookup algorithm is short-circuiting: the first match for a given key will be used.
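The short-circuiting lookup described above can be sketched as follows. This is a minimal Python illustration, not the enrichment's actual Scala code. It assumes the event is a plain dict whose JSON fields each hold a list of self-describing JSONs. The helpers resolve_inputs, matches_criterion and simple_path are hypothetical names, and simple_path only handles dotted paths such as $.userId rather than full JSON Path.

```python
def matches_criterion(schema_uri, criterion):
    """Compare an Iglu schema URI with a criterion whose version may use '*' wildcards."""
    *uri_head, uri_version = schema_uri.split("/")
    *crit_head, crit_version = criterion.split("/")
    uri_parts, crit_parts = uri_version.split("-"), crit_version.split("-")
    if uri_head != crit_head or len(uri_parts) != len(crit_parts):
        return False
    return all(c in ("*", u) for u, c in zip(uri_parts, crit_parts))


def simple_path(data, path):
    """Follow a dotted path like '$.userId'; return None if any step is missing."""
    current = data
    for part in path.split(".")[1:]:
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def resolve_inputs(event, inputs):
    """Return {key: value}, keeping the first input that yields a value for each key."""
    resolved = {}
    for spec in inputs:
        key = spec["key"]
        if key in resolved:
            continue  # short-circuit: an earlier input already matched this key
        if "pojo" in spec:
            value = event.get(spec["pojo"]["field"])
        else:
            json_spec = spec["json"]
            value = None
            # Assumption: each JSON field is modelled as a list of self-describing JSONs
            for entity in event.get(json_spec["field"]) or []:
                if matches_criterion(entity["schema"], json_spec["schemaCriterion"]):
                    value = simple_path(entity["data"], json_spec["jsonPath"])
                    if value is not None:
                        break
        if value is not None:
            resolved[key] = value
    return resolved
```

With the example configuration, an event whose user_id is empty falls through to the client_session context for the user key, while an event with user_id set never consults the context.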
api
The api section lets you configure how the enrichment should access your API. At the moment only http is supported, with this option covering both HTTP and HTTPS: the protocol on the uri field will determine which to use. Currently only GET is supported as the HTTP method for the lookup.
For the uri field, specify the full URI including the protocol. You can attach a querystring to the end of the URI. You can also embed the keys from your inputs section in the URI, by wrapping the key in {{}} brackets thus:
"uri": "http://api.acme.com/users/{{client}}/{{user}}?format=json"
If a key required in the uri was not found in any of the inputs, then the lookup will not proceed, but this will not be flagged as a failure.
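The placeholder substitution and the "skip when a key is missing" rule can be sketched like this. It is a minimal Python illustration under stated assumptions: build_uri is a hypothetical helper, and percent-encoding the substituted values is an assumption of this sketch rather than documented behaviour.

```python
import re
from urllib.parse import quote

# Matches {{key}} placeholders in the uri template
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def build_uri(template, keys):
    """Fill {{key}} placeholders; return None if any placeholder has no value."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in keys]
    if missing:
        return None  # the lookup is skipped, not treated as a failure
    # Assumption: values are percent-encoded before being placed in the URI
    return PLACEHOLDER.sub(lambda m: quote(str(keys[m.group(1)]), safe=""), template)
```

Returning None rather than raising mirrors the rule that a missing key stops the lookup without flagging the event as failed.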
Currently the only supported authentication option is httpBasic: provide a username and/or a password for the enrichment to use to connect to your API using basic access authentication. Some APIs use only the username or password field to contain an API key; in this case, set the other property to the empty string "".
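For reference, basic access authentication sends the Base64 encoding of username:password in an Authorization header. The sketch below shows how an authentication object could be turned into that header. basic_auth_header is a hypothetical helper, and treating an authentication object without httpBasic as "send no header" is an assumption of this sketch.

```python
import base64


def basic_auth_header(authentication):
    """Build the Authorization header for an httpBasic config, or None if absent."""
    creds = authentication.get("httpBasic")
    if creds is None:
        return None  # assumption: no httpBasic entry means no auth header is sent
    # A missing username or password is treated like the empty string ""
    username = creds.get("username", "")
    password = creds.get("password", "")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

An API key carried only in the username field would be configured with "password": "", which encodes as the key followed by a colon.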
If your API is unsecured (because for example it is only accessible from inside your private subnet, or using IP address whitelisting), then configure the authentication section like so:
"authentication": { }
outputs
This enrichment assumes that your API returns JSON, which will contain one or more entities that you want to add to your event as derived contexts. Within the outputs array, each entry is a json sub-object that contains a jsonPath configuration field that lets you specify which part of the returned JSON you want to add to your enriched event. Use $ if you want to attach the returned JSON as is.
If the JSON Path specified cannot be found within the API's returned JSON, then the lookup (and thus the overall event) will be flagged as a failure.
The enrichment adds the returned JSON into the derived_contexts field within a Snowplow enriched event. Because all JSONs in the derived_contexts field must be self-describing JSONs, use the schema field to specify the Iglu schema URI that you want to attach to the event.
Example:
GET http://api.acme.com/users/northwind-traders/123?format=json
{
  "metadata": {
    "whenCreated": 1448371243,
    "whenUpdated": 1448373431
  },
  "record": {
    "name": "Bob Thorpe",
    "id": "123"
  }
}
With this configuration:
"outputs": [ {
  "json": {
    "jsonPath": "$.record",
    "schema": "iglu:com.acme/user/jsonschema/1-0-0"
  }
} ]
This would be added to the derived_contexts array:
{
  "schema": "iglu:com.acme/user/jsonschema/1-0-0",
  "data": {
    "name": "Bob Thorpe",
    "id": "123"
  }
}
The outputs array must have at least one entry in it.
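The mapping from an API response to derived contexts can be sketched as below. This is a minimal Python illustration rather than the enrichment's implementation: derive_contexts and simple_path are hypothetical helpers, simple_path supports only $ and dotted paths such as $.record, and raising ValueError stands in for flagging the event as a failure.

```python
_MISSING = object()  # sentinel so a JSON null value is not mistaken for "not found"


def simple_path(data, path):
    """Follow '$' or a dotted path like '$.record'; return _MISSING if absent."""
    current = data
    for part in path.split(".")[1:]:
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def derive_contexts(response, outputs):
    """Wrap each configured part of the response as a self-describing JSON."""
    if not outputs:
        raise ValueError("outputs must have at least one entry")
    contexts = []
    for output in outputs:
        spec = output["json"]
        data = simple_path(response, spec["jsonPath"])
        if data is _MISSING:
            # Stand-in for the lookup (and the event) being flagged as a failure
            raise ValueError(f"JSON Path {spec['jsonPath']} not found in API response")
        contexts.append({"schema": spec["schema"], "data": data})
    return contexts
```

Applied to the example response with a jsonPath of $.record, this yields a single context carrying Bob Thorpe's record under the com.acme/user schema.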
cache
A Snowplow enrichment can run many millions of times per hour, effectively launching a DoS attack on a data source if we are not careful. The cache configuration attempts to minimize the number of lookups performed.
The cache is an LRU (least recently used) cache, where the least recently accessed values are evicted to make space for new values. The uri with all keys populated is used as the key in the cache. Configure the cache as follows:
- size is the maximum number of entries to hold in the cache at any one time
- ttl is the number of seconds that an entry can stay in the cache before it is forcibly evicted. This is useful to prevent stale values from being retrieved in the case that your API can return different values for the same key over time
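How size and ttl interact can be sketched with a small LRU cache keyed by the populated uri. This is an illustrative Python sketch, not the enrichment's cache: TtlLruCache is a hypothetical class, and the injectable clock parameter exists only to make expiry easy to demonstrate.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """LRU cache holding at most `size` entries, each valid for `ttl` seconds."""

    def __init__(self, size, ttl, clock=time.monotonic):
        self.size = size
        self.ttl = ttl
        self.clock = clock
        self._entries = OrderedDict()  # populated uri -> (stored_at, value)

    def get(self, uri):
        entry = self._entries.get(uri)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[uri]  # forcibly evict the stale entry
            return None
        self._entries.move_to_end(uri)  # mark as most recently used
        return value

    def put(self, uri, value):
        self._entries[uri] = (self.clock(), value)
        self._entries.move_to_end(uri)
        while len(self._entries) > self.size:
            self._entries.popitem(last=False)  # drop the least recently used entry
```

A lookup would first call get with the populated uri and only hit the API (then put the result) when it returns None.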
Top GitHub Comments
Sneak peek: https://github.com/snowplow/snowplow-mini
Yes please @chuwy! Let’s implement it. I need to do a new Iglu Central release for the Clearbit tutorial anyway…