question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[SIP-78] Proposal for Advanced Types

See original GitHub issue

[SIP-78] Proposal for Advanced Types

Motivation

Our users are leveraging Superset to explore and visualize network security logs. These logs often contain IP, CIDR range, port, hostname, and/or protocol fields, among others. While such fields might be stored as simple INTEGERS or VARCHARS in the backend database, these fields also have a “advanced” type which, if leveraged, can provide additional context and improve the user experience.

User-Friendly Display

A field’s advanced type can influence how that field is interacted with by the users. Consider, for example, an IP field. For performance reasons, we store IP values as INTEGERs in our DW e.g. the IP “127.0.0.1” would be stored as its integer value 2130706433. However, the string representation is substantially more human-readable and, as such, we wish to display IP fields in dot-decimal notation in visualizations and filters. This numeric format is fairly standard (link)and having this display functionality would be valuable in superset in case the underlying DB does not support this conversion functionality out of the box. As another example, a Port field stored as an integer (e.g., 443) could be displayed as a protocol name where such mapping is known (e.g., HTTPS).

Applicable Operators for filters

Certain operators (e.g., LIKE, =, <, IS NULL, IN, etc.) may be valid for a database type, but may not be applicable to a advanced type. For example, an organization might decide that the LIKE operator is not relevant to Port fields. Ideally, we want to be able to configure a list of applicable operators that will appear for a field of a certain advanced type.

Input Validation

Advanced type can be associated with (configurable) input validation and type conversion rules. For example, an input validator for a Port field could validate that the filter string is a parsable number between 0 and 65536, or that it contains known port names such as HTTPS, SSH, FTP.

Human Readable input values

As with displaying the human readable format of a value to the user (i.e the dot notation of an IP) users should be able to enter the more readable form of a value and have it be applied properly to a query. This would mean a user could create a filter such as {col: "ip_col", op: "IN", val: [“1.1.1.1”, “1.1.1.2”]} and have the query be executed as ip_col IN [16842752, 16842753]. The functionality for a user to simply enter {col: "ip_col", op: "IN", val: [16842752, 16842753]} should also be preserved.

Behavior specified in the config

The code to specify the behavior of each advanced type should be defined outside of the superset codebase (i.e the config). This way those who wish to implement a advanced type may do so freely without having to make any code changes to superset.

Proof of concept

Figure 1: a video displaying a proof of concept for this proposal. a version of the modified filters popover is shown. The user inputs both the human readable and numeric values for ports and the conversion is shown. Also it is shown that the user cannot save an invalid value, abc in this case. The user then saves and submits a valid filter and runs the query. the query is then displayed.

https://user-images.githubusercontent.com/71385290/147113657-f8d72243-2852-43c9-8f64-604aff734fda.mp4

Proposed Changes

We propose that the concept of an “advanced type” be introduced to dataset columns in Superset and that this advanced type be leveraged to allow display formats, applicable operators, and validation rules to be added to fields of a particular advanced type. Advanced types would be a purely optional attribute which could be specified on columns; if a advanced type is not specified for a particular column, there would be no changes to how a column is displayed, what operators are available, or what validation is performed on filter inputs.

Adding the advanced type to dataset

Introduce the concept of a advanced type on dataset columns

  • Add a advanced_type attribute to the BaseColumn model. This column only needs to be a nullable VARCHAR.
  • Add a “Advanced Type” field to the “Edit Datasource” modal (Figure 1) where dataset creators/editors can set the advanced type for a column. This should be restricted to applicable advanced types

Figure 2: Edit Datasource modal with Advanced Type field

unnamed

Allow applicable operator sets to be defined for each advanced type

The Explorer UI has an ad-hoc filter which lets the user enter a condition on a column, a value and an operator (>, =, LIKE, IS NULL, etc).

Currently all operators are available to the user. We propose returning a list of applicable operators in the response payload. The list of applicable operators should be configurable for each custom advanced type. For maximum flexibility, the applicable operators should be tied to the advanced type.

Figure 3: Current list of operators (left) vs customized list of operators (right)

Before After
all_ops unnamed

Allow input validation and value conversion on advanced type

Currently any input string can be entered into an ad-hoc filter. We propose exposing a server REST endpoint to validate input strings and also return the “normalized” representation. The normalized representation should be accessible to the user so they can verify the conversion. This could be displayed via a tooltip in order to minimize UI changes to the control.

Example inputs

  • Time range: “Last day” -> “2021-03-16T21:57:33.000Z - 2021-03-17T21:57:33.000Z”
  • Numeric IP: “192.168.0.1” -> 3232235521
  • Numeric IP: 3232235521 -> 3232235521
  • Port: “HTTPS” -> “443”
  • Port: “8080” -> “8080”
  • Numeric CIDR: “1.1.1.1/16” -> {16842752 - 16908287}

An input validator for a PORT could validate that the string is a parsable number between 0 and 65536 or that it contains known port names such as HTTPS, SSH, FTP. An input validator for a DOMAIN could validate that the string is parsable as a domain and contains only alphanumeric characters, periods (.), and dashes (-).

It should be possible to add additional input validators via a configuration like config.py.

Figure 3: Error message for a filter value that fails validation for a field of type Port

unnamed

Allow custom behavior for each operator on a advanced type

Ad-hoc filters currently work as would be expected on the backend: a filter is translated directly into an SQL statement that matches the filter, so the filter {col: "ip_col", op: "IN", val: "1.1.1.1/16"} is translated to the statement ip_col IN '1.1.1.1/16'. Based on the idea of advanced types, we want the value '1.1.1.1/16' to first be converted to a decimal range (16842752 - 16908287) as covered above, but just doing the conversion is not enough for this range type as we would get something like this in the best case: ip_col IN [16842752, 16842753, ..., 16908287]. We would more likely want this: ip_col >= 16842752 AND ip_col <= 16908287. In order to achieve this, we propose making it possible to specify custom behavior for each defined operator associated with an advanced type. This would allow a lot of flexibility for each type.

Allow for custom display values

Columns with advanced types should (optionally) be displayed in their human readable format (i.e dot decimal notation for IPs even though they are stored as integers). In order to accomplish this, we should generate a list of user-friendly values based on the target column and append it to the result set or replace the original column with these new values. This could be accomplished in the filter in superset/views/core.py. We should optimally clean up some technical debt in the backend code by porting it to the v1 API, and while doing so we can introduce the additional functionality needed for this feature (two birds with one stone).

New or Changed Public Interfaces

Changes to the config

In order to implement advanced logic for advanced types we propose adding a collection of AdvancedType objects. These will define the available advanced types and their functionalities. The proposed class will be as follows are as follows:

description:

  • a description of the usage of the advanced type

verbose_name:

  • name used for displaying purposes, e.g. “IP and CIDR (v4)”

Applicatable_types

  • a list of GenericDataType types that this advanced type applies to. Using the earlier example of IPs stored as ints, that advanced type would only function on integer value columns

type_translation:

  • a function that accepts unparsed values and return: a status indicating whether the provided value was valid, the parsed value(s) based on advanced type, valid operators, and formatted value(s)

filter_translation:

  • a function that accepts a SqlaColumn, FilterOperator, and an unparsed value (or values). The function calls the advanced_type_translations function (for appropriate advanced type), and returns a valid [SqlaExpression].
  • The purpose of these functions is to take in a valid filter operator and a column and return a valid sqla expression that can be used to form the query.

Model changes

BaseColumn model

  • Add advanced type field (Nullable VARCHAR)

Change SqlaTable’s get_sql_query function inside the filter loop so that if a advanced type is present on a column, the operator logic is overridden by an advanced type’s operator logic (via callback defined in advanced_type_filter_translations). This callback’s response will then be appended to the where_and list to produce the query.

Addition of the v1/advanced_types/ endpoints

In order to use any of this logic in the front end we will need to define a set of API endpoints for advanced types. Under this section we would define the following endpoints:

[GET] v1/advanced_type/convert

  • Calls the appropriate advanced_type_translation function as defined in the config for the given advanced type.
  • Parameters:
  • advanced type
  • a set of value(s) to be validated/converted
  • Returns:
  • The output of the function retrieved from advanced_type_translation

[GET] v1/advanced_type/types

  • Get the list of implemented advanced types and their descriptions. This list would be defined by the intersection of the set of keys for the advanced_type_translations and advanced_type_filter_translations dictionaries. This would be used in order to populate lists to select advanced types for a given column.
  • Parameters:
  • None
  • Returns:
  • A list of implemented advanced types and their descriptions

UI changes

The AdhocFilterEditPopoverSimpleTabContent will need to be modified. If an advanced type is present on a selected column, we use the v1/advanced_type/convert endpoint to retrieve: an input’s valid/invalid status, the list of operators to display, and the currently selected comparator’s converted value. The valid_operators list will need to override the current operators list defined by the isOperatorRevelant function. A tooltip component will also need to be added to the component to display the returned converted value. No behavioral changes should be present if a column without a advanced type is selected.

Migration Plan and Compatibility

A database migration is required. This will only effect the BaseColumn model. We will need to add a varchar field to the columns table.

in order to maintain compatibility we will insure the advanced type column in nullable as well as hiding changes behind the ENABLE_ADVANCED_TYPES feature flag.

Rejected Alternatives

no alternatives have been strongly considered at this time

Note

Please note that the API endpoints are subject to change during the development process. This was the first logical attempt

Issue Analytics

  • State:open
  • Created 2 years ago
  • Reactions:14
  • Comments:7 (7 by maintainers)

github_iconTop GitHub Comments

3reactions
cccs-RyanScommented, Dec 22, 2021

Thanks for the feedback @betodealmeida @rusackas I really appreciate the interest!

So first addressing the idea of having a plugin system, we are 100% on board and had that in mind as well. We were just unsure if introducing this idea might be a separate SIP, extra feature or something long those lines. This could also address the idea of having some of these validators / types / etc be pre-canned and but also extensible as you both have mentioned, I think this is certainly a good idea.

in in addition v1/business_type/types will be used to populate a select control to populate a drop-down to select the business type. just so users don’t get into weird situations selecting types that don’t exist and such.

I was unaware about the changes to BaseColumn so implementing this on the new Column model does make sense to me.

at this time I do not have a ton of insight into how this fits into known/proposed/speculated plans for Superset’s semantic layer but am willing to take on any feedback in regards to this.

Thanks again for the feedback!

2reactions
betodealmeidacommented, Dec 22, 2021

This is great! A couple comments:

  1. BaseColumn is going away (https://github.com/apache/superset/issues/14909 and https://github.com/apache/superset/pull/17543) in the next few weeks, so I wonder if we should wait a little and implement this directly in the new Column model.
  2. Instead of having this in the config, we could leverage Python entry points to have a plugin system, like @rusackas mentioned. This way we can have a builtin set of business types that comes with Superset, and admins can develop, distribute, and install custom business types that would be automatically discovered by Superset. Best of both worlds! 😃
Read more comments on GitHub >

github_iconTop Results From Across the Web

[SIP-68] A better model for Datasets · Issue #14909 - GitHub
In this proposal datasets are virtual-first, and physical datasets are a particular case of ... [SIP-78] Proposal for Advanced Types #17852.
Read more >
dev - The Mail Archive
[DISCUSS] SIP-83 Proposal for copying virtual datasets Collins, Reese R. RE: [VOTE][SIP-78] Proposal for Advanced Types Rouette, Melanie M.
Read more >
Federal Register/Vol. 85, No. 14/Wednesday, January 22 ...
Agency (EPA) is proposing to take action pursuant to section 110 of the. Clean Air Act (CAA or Act) on State. Implementation Plan...
Read more >
Approval and Promulgation of Implementation Plans ...
A. Comments Regarding Necessary Types and Quantities of Contingency ... The SJV PM2.5 SIP is California's plan for attaining the 1997 PM2.5 ...
Read more >
https://www.ietf.org/tools/idnits?url=https://www....
No 'Intended status' indicated for this document; assuming Proposed Standard == The page ... Overlap signalling in SIP 78 SIP uses en-bloc signalling....
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found