[SIP-92] Proposal for restructuring the Python code base
Motivation
Superset has evolved somewhat organically over time, which is reflected, somewhat apparently, in how the Python code (which resides solely in the top-level superset folder) is organized. Initially Superset used a Model-View-Controller (MVC) pattern combined with the notion of database connectors, whereas we have now adopted the Data Access Object (DAO) pattern (SIP-35), which, when coupled with commands and the API, helps to decouple the business layer from the persistence layer.
Due to partial refactors and years of creep the code organization is fragmented. This has negatively impacted both the code quality and developer experience.
Proposed Change
The TL;DR is that the Superset application is primarily composed of a few major functional components:
- APIs: v1 RESTful API (current) and API view endpoints (legacy)
- CLI: Suite of command line tools
- Commands: Used by both the APIs and the CLI
- DAOs: Used by the commands to interface with the SQLAlchemy models
- Models: Thin layer reflecting SQLAlchemy’s declarative mapping and mixins
- SQL Engine: Comprising the engine specs, templating, pre-/post-processing, execution, etc.
- Tasks: Asynchronous tasks/schedules
- Views: Non-API endpoints, i.e., rendering HTML templates
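To make the layering in the list above concrete, here is a minimal, self-contained sketch of how an API handler, a command, a DAO, and a model might relate. Everything in it (the `Dataset` dataclass, the in-memory `DatasetDAO`, `CreateDatasetCommand`, `DatasetInvalidError`, and `post_dataset`) is a simplified stand-in invented for illustration, not Superset's actual code; in particular, the real models reflect SQLAlchemy's declarative mapping rather than a dataclass and a dict.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Dataset:
    """Model: stand-in for a SQLAlchemy declarative model (assumption)."""

    id: int
    table_name: str


class DatasetDAO:
    """DAO: the only layer that touches persistence (here, just a dict)."""

    _store: Dict[int, Dataset] = {}

    @classmethod
    def find_by_id(cls, dataset_id: int) -> Optional[Dataset]:
        return cls._store.get(dataset_id)

    @classmethod
    def create(cls, table_name: str) -> Dataset:
        dataset = Dataset(id=len(cls._store) + 1, table_name=table_name)
        cls._store[dataset.id] = dataset
        return dataset


class DatasetInvalidError(Exception):
    """Raised by the command when business validation fails."""


class CreateDatasetCommand:
    """Command: business rules, callable from both the API and the CLI."""

    def __init__(self, table_name: str) -> None:
        self._table_name = table_name.strip()

    def validate(self) -> None:
        if not self._table_name:
            raise DatasetInvalidError("table_name must not be empty")

    def run(self) -> Dataset:
        self.validate()
        return DatasetDAO.create(self._table_name)


def post_dataset(payload: dict) -> Tuple[dict, int]:
    """API: maps a request payload to a command and a (body, status) pair."""
    try:
        dataset = CreateDatasetCommand(payload.get("table_name", "")).run()
    except DatasetInvalidError as ex:
        return {"message": str(ex)}, 422
    return {"id": dataset.id, "table_name": dataset.table_name}, 201
```

The point of the sketch is the direction of the dependencies: the API knows only about the command, the command knows only about the DAO, and only the DAO knows how the model is stored.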
The proposed change is to refactor the code into functional rather than business top-level folders, since the current top-level layout has become somewhat bloated. Below is a before/after enumeration of the current (as of 10/28/2022) top-level folders and files, where “N/A” denotes that the folder or file will no longer exist in its current form. The sub-sections that follow outline more specifics.
| Current | Proposed | Notes |
|---|---|---|
| advanced_data_type | N/A | See APIs, Commands, DAOs, and Models |
| annotation_layers | N/A | See APIs, Commands, DAOs, and Models |
| async_events | N/A | See APIs, Commands, DAOs, and Models |
| available_domains | N/A | See APIs, Commands, DAOs, and Models |
| cachekeys | N/A | See APIs, Commands, DAOs, and Models |
| charts | N/A | See APIs, Commands, DAOs, and Models |
| cli | cli | Unchanged |
| columns | N/A | See APIs, Commands, DAOs, and Models |
| commands | commands | See APIs, Commands, DAOs, and Models |
| common | N/A | See SQL Engine |
| connectors | N/A | See APIs, Commands, DAOs, and Models |
| css_templates | N/A | See APIs, Commands, DAOs, and Models |
| dao | daos | See APIs, Commands, DAOs, and Models |
| dashboards | N/A | See APIs, Commands, DAOs, and Models |
| databases | N/A | See APIs, Commands, DAOs, and Models |
| datasets | N/A | See APIs, Commands, DAOs, and Models |
| datasource | N/A | Unclear why we have both datasets and datasource DAOs |
| db_engine_specs | engine/specs | See SQL Engine |
| db_engines | N/A | Unused. See https://github.com/apache/superset/pull/20631 |
| embedded | N/A | See APIs, Commands, DAOs, and Models |
| embedded_dashboard | N/A | See APIs, Commands, DAOs, and Models |
| examples | examples | Unchanged |
| explore | N/A | See APIs, Commands, DAOs, and Models |
| extensions | extensions | Unchanged |
| importexport | N/A | See APIs, Commands, DAOs, and Models |
| initialization | blueprints | See APIs, Commands, DAOs, and Models |
| key_value | N/A | See APIs, Commands, DAOs, and Models |
| migrations | migrations | Unchanged |
| models | N/A | See APIs, Commands, DAOs, and Models |
| queries | N/A | See APIs, Commands, DAOs, and Models |
| reports | N/A | See APIs, Commands, DAOs, and Models |
| security | security | See APIs, Commands, DAOs, and Models |
| sql_validators | engine/query/validators | See SQL Engine |
| sqllab | N/A | See APIs, Commands, DAOs, and Models |
| tables | N/A | See APIs, Commands, DAOs, and Models |
| tags | N/A | See APIs, Commands, DAOs, and Models |
| tasks | tasks | Unchanged |
| templates | blueprints/templates | See APIs, Commands, DAOs, and Models |
| temporary_cache | N/A | See APIs, Commands, DAOs, and Models |
| translations | translations | Unchanged |
| utils | ? | Unclear |
| views | N/A | See APIs, Commands, DAOs, and Models |
| __init__.py | __init__.py | Existing logic migrated elsewhere |
| app.py | app.py | Unchanged |
| config.py | config.py | Unchanged |
| constants.py | N/A | See APIs, Commands, DAOs, and Models and SQL Engine |
| dataframe.py | N/A | See SQL Engine |
| errors.py | N/A | See APIs, Commands, DAOs, and Models |
| exceptions.py | N/A | See APIs, Commands, DAOs, and Models |
| jinja_context.py | N/A | Split into components |
| result_set.py | N/A | See SQL Engine |
| schemas.py | blueprints/api/base.py | See APIs, Commands, DAOs, and Models |
| sql_lab.py | N/A | See SQL Engine |
| sql_parse.py | ? | See SQL Engine |
| stats_logger.py | ? | |
| superset_typing.py | N/A | Split into components |
| viz.py | ? | Legacy visualization types |
APIs, Commands, DAOs, and Models
Historically, APIs were mostly defined in an ad-hoc, non-RESTful manner as views, most of which reside within the superset/views/ folder. These “legacy” APIs now coexist alongside RESTful APIs that leverage the DAO model and reside in the component-specific folder, e.g., superset/datasets/. Furthermore, commands are defined either within the superset/commands/ folder or within the component-specific folder, e.g., superset/datasets/commands.
As a developer it isn’t overly apparent where an API endpoint resides. The proposed solution is to move to a directory structure which more clearly illustrates that the APIs, DAOs, and commands are decoupled (as illustrated below for the datasets component). The API, which leverages blueprints, comprises both the v1 RESTful API (current) and the legacy API. This demarcation also helps developers identify which API endpoints need to be migrated to v1.
superset/
├─ blueprints/
│ ├─ api/
│ │ ├─ legacy/ # Previously defined in superset/connectors/*/views.py, superset/views/*, etc.
│ │ ├─ v1/
│ │ │ ├─ datasets.py # Previously superset/datasets/api.py
│ │ │ ├─ ...
│ ├─ views/ # Previously defined in superset/connectors/*/views.py, superset/views/*, etc.
├─ commands/
│ ├─ datasets/ # Previously superset/datasets/commands
│ │ ├─ ...
│ ├─ base.py
│ ├─ ...
├─ daos/
│ ├─ datasets/ # Previously superset/datasets/
│ │ ├─ ...
│ ├─ base.py # Previously superset/dao/base.py
│ ├─ exceptions.py # Previously superset/dao/exceptions.py
│ ├─ ...
│ ├─ models/ # Previously superset/connectors/*/models.py, superset/datasets/models.py, etc.
│ │ ├─ base.py
│ ├─ ...
Note that views are currently a combination of legacy API endpoints and non-API endpoints. The concept of views will remain, but views should only contain non-API endpoints, i.e., those rendering HTML templates.
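The "Previously ..." comments in the schematic above amount to a set of path-rewriting rules. The sketch below encodes three of them as regular expressions so that a current module path can be mapped to its proposed location. Note the assumptions: the schematic only shows the datasets component, so generalizing the rules to any component name is a guess, and `RULES`, `proposed_path`, and the file name `create.py` are hypothetical names used for illustration.

```python
import re
from typing import Optional

# (pattern, replacement) pairs derived from the "Previously ..." comments in
# the schematic. Applying them to components other than datasets is an
# assumption, not something the proposal spells out.
RULES = [
    # superset/<component>/api.py -> superset/blueprints/api/v1/<component>.py
    (
        re.compile(r"^superset/(?P<component>\w+)/api\.py$"),
        r"superset/blueprints/api/v1/\g<component>.py",
    ),
    # superset/<component>/commands/... -> superset/commands/<component>/...
    (
        re.compile(r"^superset/(?P<component>\w+)/commands/(?P<rest>.+)$"),
        r"superset/commands/\g<component>/\g<rest>",
    ),
    # superset/dao/... -> superset/daos/...
    (
        re.compile(r"^superset/dao/(?P<rest>.+)$"),
        r"superset/daos/\g<rest>",
    ),
]


def proposed_path(current: str) -> Optional[str]:
    """Return the proposed location for a path, or None if no rule applies."""
    for pattern, replacement in RULES:
        if pattern.match(current):
            return pattern.sub(replacement, current)
    return None


if __name__ == "__main__":
    for path in (
        "superset/datasets/api.py",
        "superset/datasets/commands/create.py",  # illustrative file name
        "superset/dao/base.py",
    ):
        print(path, "->", proposed_path(path))
```

Paths that have no rule, such as superset/config.py, return `None`, which mirrors the "Unchanged" and "?" rows in the table.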
SQL Engine
Though not as well fleshed out as APIs, commands, and DAOs, the actual SQL engine (responsible for preparing queries, executing them, and fetching result sets) would be colocated within the broad superset/engine/ subfolder. This would comprise the engine specifications, templating, SQL parsing, query objects, etc.
superset/
├─ engine/
│ ├─ specs/
│ │ ├─ base.py # Previously superset/db_engine_specs/base.py
│ │ ├─ ...
│ ├─ query/
│ │ ├─ validators/
│ │ │ ├─ base.py # Previously sql_validators/base.py
│ │ │ ├─ ...
│ │ ├─ context.py # Previously common/query_context.py
│ │ ├─ executor.py # Previously sql_lab.py et al.
│ │ ├─ results.py # Previously result_set.py et al.
│ │ ├─ ...
│ ├─ ...
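As a rough illustration of the responsibilities grouped under the engine (preparing, executing, and fetching result sets, plus post-processing), the following sketch chains those stages against an in-memory SQLite database. It is not Superset's engine: the function names `prepare`, `execute`, `fetch`, `post_process`, and `run_query` are hypothetical, and `string.Template` is only a stand-in for the real templating layer.

```python
import sqlite3
from string import Template
from typing import Any, Dict, List, Tuple


def prepare(sql_template: str, params: Dict[str, str]) -> str:
    """Pre-processing: render the template into executable SQL.

    string.Template is a toy stand-in for real templating and must not be
    used with untrusted input.
    """
    return Template(sql_template).substitute(params)


def execute(conn: sqlite3.Connection, sql: str) -> sqlite3.Cursor:
    """Execution: run the statement and hand back the cursor."""
    return conn.execute(sql)


def fetch(cursor: sqlite3.Cursor) -> Tuple[List[str], List[tuple]]:
    """Fetching: pull the column names and all rows from the cursor."""
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def post_process(columns: List[str], rows: List[tuple]) -> List[Dict[str, Any]]:
    """Post-processing: reshape raw rows into a list of records."""
    return [dict(zip(columns, row)) for row in rows]


def run_query(
    conn: sqlite3.Connection, sql_template: str, params: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Chain the four stages in order."""
    sql = prepare(sql_template, params)
    columns, rows = fetch(execute(conn, sql))
    return post_process(columns, rows)
```

Each stage here corresponds loosely to a file in the schematic (for example, execution to executor.py and result handling to results.py), though that correspondence is an interpretation rather than something the proposal specifies.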
New or Changed Public Interfaces
N/A.
New dependencies
N/A.
Migration Plan and Compatibility
The code restructure can be piecemeal. The general steps are:
- Flesh out the base APIs, commands, DAOs, and models.
- Migrate, piece by piece, each of the functional components into the appropriate sub-folders.
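One way a piecemeal move could keep old import paths working while modules are relocated is a forwarding shim built on module-level `__getattr__` (PEP 562). This is an assumption about how the migration might be carried out, not part of the proposal; `deprecated_module_getattr` is a hypothetical helper, and the demo forwards a stand-in module to the standard library's `json` module so that it runs without Superset installed.

```python
import importlib
import types
import warnings
from typing import Any, Callable


def deprecated_module_getattr(old_name: str, new_name: str) -> Callable[[str], Any]:
    """Build a module-level __getattr__ (PEP 562) forwarding to new_name."""

    def __getattr__(attr: str) -> Any:
        warnings.warn(
            f"{old_name}.{attr} has moved to {new_name}.{attr}",
            DeprecationWarning,
            stacklevel=2,
        )
        # Import lazily so the shim itself has no import-time dependencies.
        return getattr(importlib.import_module(new_name), attr)

    return __getattr__


# In a relocated module (say, the old superset/dao/base.py) the remaining
# file content could be a single line (hypothetical usage):
#   __getattr__ = deprecated_module_getattr(__name__, "superset.daos.base")

# Runnable demo with stand-ins: "old_location" forwards to the stdlib json.
old = types.ModuleType("old_location")
old.__getattr__ = deprecated_module_getattr("old_location", "json")
```

With such a shim in place, attribute access on the old module resolves to the new one and emits a `DeprecationWarning`, so call sites can be updated gradually rather than in one change.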
Rejected Alternatives
Regarding the engine directory structure I’m unsure whether the current proposal makes the most sense. An alternative could be based more on the flow/path of a query from pre-processing (including construction and SQL parsing), to execution, then fetching, and finally post-processing of the result set.
I presume that calling out the engine as a first-class entity, rather than treating it as a command, makes sense, though I think this is open for debate/discussion.
Comments
@ktmud I spoke with @hughhhh and @rusackas briefly about this and they were in agreement with you about keeping the concept of views, and thus I’ve updated the directory schematic. As it currently stands, views contain both API endpoints and non-API endpoints, i.e., those which render HTML templates, and thus the non-API endpoints would be housed under the superset/blueprints/views subfolder.

Thanks for the SIP @john-bodley. I went through a similar process when writing SIP-61 - Improve the organization of front-end folders. When researching best practices for organizing projects, the concept of a feature-based organization appeared in many articles that described how large codebases were organized. You’ll find much of the reasoning behind this model in SIP-61 and its references. The basic idea is that files related to a feature should belong together in a structured way. This allows you to easily switch between feature implementations, facilitates the use of feature flags, and also promotes better-defined dependencies. Maybe we can apply some of the concepts here too 😉