[SIP-92] Proposal for restructuring the Python code base

Motivation

Superset has evolved somewhat organically over time, which is reflected, somewhat apparently, in how the Python code (which resides solely in the top-level superset folder) is organized. Initially Superset used a Model View Controller (MVC) pattern combined with the notion of database connectors, whereas now we’ve adopted the Data Access Object (DAO) pattern (SIP-35), which, when coupled with commands and the API, helps to decouple the business layer from the persistence layer.

Due to partial refactors and years of creep the code organization is fragmented. This has negatively impacted both the code quality and developer experience.

Proposed Change

The TL;DR is that the Superset application is primarily composed of a few major functional components:

  • APIs: v1 RESTful API (current) and API view endpoints (legacy)
  • CLI: Suite of command line tools
  • Commands: Used by both the APIs and the CLI
  • DAOs: Used by the commands to interface with the SQLAlchemy models
  • Models: Thin layer reflecting SQLAlchemy’s declarative mapping and mixins
  • SQL Engine: Comprising the engine specs, templating, pre-/post-processing, execution, etc.
  • Tasks: Asynchronous tasks/schedules
  • Views: Non-API endpoints, i.e., rendering HTML templates
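To make the layering above concrete, here is a minimal sketch of how an API endpoint and a CLI tool might both go through one command, which in turn talks to a DAO that owns all access to the model. Every name in it (Dataset, DatasetDAO, CreateDatasetCommand, api_create_dataset, cli_create_dataset) is hypothetical and chosen for illustration; none is taken from Superset, and an in-memory dict stands in for SQLAlchemy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Dataset:
    """Model: stands in for a SQLAlchemy declarative model (hypothetical)."""

    id: int
    table_name: str


class DatasetDAO:
    """DAO: the only layer that touches persistence (an in-memory dict here)."""

    _rows: Dict[int, Dataset] = {}

    @classmethod
    def find_by_id(cls, dataset_id: int) -> Optional[Dataset]:
        return cls._rows.get(dataset_id)

    @classmethod
    def create(cls, table_name: str) -> Dataset:
        dataset = Dataset(id=len(cls._rows) + 1, table_name=table_name)
        cls._rows[dataset.id] = dataset
        return dataset


class CreateDatasetCommand:
    """Command: holds the business rules and is shared by the API and the CLI."""

    def __init__(self, table_name: str) -> None:
        self._table_name = table_name

    def validate(self) -> None:
        if not self._table_name.strip():
            raise ValueError("table_name must not be empty")

    def run(self) -> Dataset:
        self.validate()
        return DatasetDAO.create(self._table_name.strip())


def api_create_dataset(payload: dict) -> dict:
    """API layer: translates a request payload into a command call."""
    dataset = CreateDatasetCommand(payload.get("table_name", "")).run()
    return {"id": dataset.id, "table_name": dataset.table_name}


def cli_create_dataset(table_name: str) -> str:
    """CLI layer: reuses the same command instead of duplicating the rules."""
    dataset = CreateDatasetCommand(table_name).run()
    return f"created dataset {dataset.id} ({dataset.table_name})"
```

Because validation lives in the command, neither entry point needs to know how datasets are stored, and swapping the dict for a real session would only touch the DAO.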

The proposed change is to refactor the code, whose top level has become somewhat bloated, into functional rather than business-oriented top-level folders. Below is the before/after enumeration of the current (as of 10/28/2022) top-level folders and files, where “N/A” denotes that the folder or file will no longer exist in its current form. The subsections that follow outline more specifics.

| Current | Proposed | Notes |
|---|---|---|
| advanced_data_type | N/A | See APIs, Commands, DAOs, and Models |
| annotation_layers | N/A | See APIs, Commands, DAOs, and Models |
| async_events | N/A | See APIs, Commands, DAOs, and Models |
| available_domains | N/A | See APIs, Commands, DAOs, and Models |
| cachekeys | N/A | See APIs, Commands, DAOs, and Models |
| charts | N/A | See APIs, Commands, DAOs, and Models |
| cli | cli | Unchanged |
| columns | N/A | See APIs, Commands, DAOs, and Models |
| commands | commands | See APIs, Commands, DAOs, and Models |
| common | N/A | See SQL Engine |
| connectors | N/A | See APIs, Commands, DAOs, and Models |
| css_templates | N/A | See APIs, Commands, DAOs, and Models |
| dao | daos | See APIs, Commands, DAOs, and Models |
| dashboards | N/A | See APIs, Commands, DAOs, and Models |
| databases | N/A | See APIs, Commands, DAOs, and Models |
| datasets | N/A | See APIs, Commands, DAOs, and Models |
| datasource | N/A | Unclear why we have both datasets and datasource DAOs |
| db_engine_specs | engine/specs | See SQL Engine |
| db_engines | N/A | Unused. See https://github.com/apache/superset/pull/20631 |
| embedded | N/A | See APIs, Commands, DAOs, and Models |
| embedded_dashboard | N/A | See APIs, Commands, DAOs, and Models |
| examples | examples | Unchanged |
| explore | N/A | See APIs, Commands, DAOs, and Models |
| extensions | extensions | Unchanged |
| importexport | N/A | See APIs, Commands, DAOs, and Models |
| initialization | blueprints | See APIs, Commands, DAOs, and Models |
| key_value | N/A | See APIs, Commands, DAOs, and Models |
| migrations | migrations | Unchanged |
| models | N/A | See APIs, Commands, DAOs, and Models |
| queries | N/A | See APIs, Commands, DAOs, and Models |
| reports | N/A | See APIs, Commands, DAOs, and Models |
| security | security | See APIs, Commands, DAOs, and Models |
| sql_validators | engine/query/validators | See SQL Engine |
| sqllab | N/A | See APIs, Commands, DAOs, and Models |
| tables | N/A | See APIs, Commands, DAOs, and Models |
| tags | N/A | See APIs, Commands, DAOs, and Models |
| tasks | tasks | Unchanged |
| templates | blueprints/templates | See APIs, Commands, DAOs, and Models |
| temporary_cache | N/A | See APIs, Commands, DAOs, and Models |
| translations | translations | Unchanged |
| utils | ? | Unclear |
| views | N/A | See APIs, Commands, DAOs, and Models |
| __init__.py | __init__.py | Existing logic migrated elsewhere |
| app.py | app.py | Unchanged |
| config.py | config.py | Unchanged |
| constants.py | N/A | See APIs, Commands, DAOs, and Models, and SQL Engine |
| dataframe.py | N/A | See SQL Engine |
| errors.py | N/A | See APIs, Commands, DAOs, and Models |
| exceptions.py | N/A | See APIs, Commands, DAOs, and Models |
| jinja_context.py | N/A | Split into components |
| result_set.py | N/A | See SQL Engine |
| schemas.py | blueprints/api/base.py | See APIs, Commands, DAOs, and Models |
| sql_lab.py | N/A | See SQL Engine |
| sql_parse.py | ? | See SQL Engine |
| stats_logger.py | ? | |
| superset_typing.py | N/A | Split into components |
| viz.py | ? | Legacy visualization types |

APIs, Commands, DAOs, and Models

Historically, APIs were mostly defined in an ad hoc, non-RESTful manner as views, which mostly reside within the superset/views/ folder. These “legacy” APIs now coexist alongside RESTful APIs which leverage the DAO model and reside in component-specific folders, e.g., superset/datasets/. Furthermore, commands are defined either within the superset/commands/ folder or within a component-specific folder, e.g., superset/datasets/commands.

As a developer it isn’t overly apparent where an API endpoint resides. The proposed solution is to move to a directory structure which more clearly illustrates that the APIs, DAOs, and commands are decoupled (as illustrated below for the datasets component). The API, which leverages blueprints, comprises both the v1 RESTful API (current) and the legacy API. This demarcation also helps developers identify which API endpoints need to be migrated to v1.

superset/
├─ blueprints/
│  ├─ api/
│  │  ├─ legacy/         # Previously defined in superset/connectors/*/views.py, superset/views/*, etc.
│  │  ├─ v1/
│  │  │  ├─ datasets.py  # Previously superset/datasets/api.py
│  │  │  ├─ ...
│  ├─ views/             # Previously defined in superset/connectors/*/views.py, superset/views/*, etc.
├─ commands/
│  ├─ datasets/          # Previously superset/datasets/commands
│  │  ├─ ...
│  ├─ base.py
│  ├─ ...
├─ daos/
│  ├─ datasets/          # Previously superset/datasets/
│  │  ├─ ...
│  ├─ base.py            # Previously superset/dao/base.py
│  ├─ exceptions.py      # Previously superset/dao/exceptions.py
│  ├─ ...
│  ├─ models/            # Previously superset/connectors/*/models.py, superset/datasets/models.py, etc.
│  │  ├─ base.py
│  ├─ ...

Note that views are currently a combination of legacy API endpoints and non-API endpoints. The concept of views will remain, but views should only contain non-API endpoints, i.e., those rendering HTML templates.

SQL Engine

Though not as well fleshed out as the APIs, commands, and DAOs, the actual SQL engine (responsible for preparing queries, executing them, and fetching result sets) would be colocated within the broad superset/engine/ subfolder. This would comprise the engine specifications, templating, SQL parsing, query objects, etc.

superset/
├─ engine/
│  ├─ specs/
│  │  ├─ base.py         # Previously superset/db_engine_specs/base.py
│  │  ├─ ...
│  ├─ query/
│  │  ├─ validators/
│  │  │  ├─ base.py      # Previously sql_validators/base.py
│  │  │  ├─ ...
│  │  ├─ context.py      # Previously common/query_context.py
│  │  ├─ executor.py     # Previously sql_lab.py et al.
│  │  ├─ results.py      # Previously result_set.py et al.
│  │  ├─ ...
│  ├─ ...

New or Changed Public Interfaces

N/A.

New dependencies

N/A.

Migration Plan and Compatibility

The code restructure can be piecemeal. The general steps are:

  1. Flesh out the base APIs, commands, DAOs, and models.
  2. Migrate, piece by piece, each of the functional components into the appropriate sub-folders.
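One way a piecemeal move could stay backward compatible is to leave a thin alias at the old import path that forwards to the new location and emits a DeprecationWarning. The sketch below is only an illustration of that idea, not something the SIP prescribes: install_deprecated_alias and the module names used with it are hypothetical, and it relies on module-level __getattr__ (PEP 562, Python 3.7+). For simplicity it handles only non-dotted module names.

```python
import importlib
import sys
import types
import warnings


def install_deprecated_alias(old_name: str, new_name: str) -> None:
    """Register `old_name` as a lazy alias forwarding attribute access to `new_name`.

    Hypothetical helper for illustration; only top-level (non-dotted) names
    are handled.
    """
    shim = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # Let the import machinery probe dunder attributes (e.g. __path__)
        # without triggering a warning or a forwarded lookup.
        if attr.startswith("__") and attr.endswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            f"{old_name} is deprecated; import from {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(new_name), attr)

    # Storing the function in the module namespace is what PEP 562 looks up.
    shim.__getattr__ = __getattr__
    sys.modules[old_name] = shim


# Demo with throwaway modules: "demo_daos_base" plays the new location,
# "demo_dao_base" the old one that existing imports still reference.
_new_module = types.ModuleType("demo_daos_base")


class BaseDAO:
    """Placeholder for a class that has been moved."""


_new_module.BaseDAO = BaseDAO
sys.modules["demo_daos_base"] = _new_module
install_deprecated_alias("demo_dao_base", "demo_daos_base")
```

With this in place, `from demo_dao_base import BaseDAO` keeps working and warns, so callers can be updated component by component before the alias is removed.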

Rejected Alternatives

Regarding the engine directory structure I’m unsure whether the current proposal makes the most sense. An alternative could be based more on the flow/path of a query from pre-processing (including construction and SQL parsing), to execution, then fetching, and finally post-processing of the result set.
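As a rough illustration of that flow-based alternative, the sketch below arranges a query's path as four explicit stages. The function names (preprocess, execute, fetch, postprocess, run_query) are hypothetical, the pre-processing step is a trivial stand-in for templating and SQL parsing, and sqlite3 from the standard library is used only so that the example is runnable.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


def preprocess(sql: str) -> str:
    """Pre-processing: normalize the statement (stand-in for templating/parsing)."""
    return sql.strip().rstrip(";")


def execute(conn: sqlite3.Connection, sql: str) -> sqlite3.Cursor:
    """Execution: hand the prepared statement to the database."""
    return conn.execute(sql)


def fetch(cursor: sqlite3.Cursor) -> Tuple[List[str], List[tuple]]:
    """Fetching: pull column names and raw rows off the cursor."""
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def postprocess(columns: List[str], rows: List[tuple]) -> List[Dict[str, Any]]:
    """Post-processing: shape the raw rows into records keyed by column name."""
    return [dict(zip(columns, row)) for row in rows]


def run_query(conn: sqlite3.Connection, sql: str) -> List[Dict[str, Any]]:
    """Run one statement through all four stages in order."""
    statement = preprocess(sql)
    cursor = execute(conn, statement)
    columns, rows = fetch(cursor)
    return postprocess(columns, rows)
```

Under this arrangement each stage could map to its own module or sub-folder, which is the trade-off against grouping by specs, query, and validators.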

I presume that calling out the engine as a first-class entity, rather than treating it as a command, likely makes sense, though I think this is open for debate/discussion.

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 2
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

john-bodley commented, Dec 1, 2022 (2 reactions)

@ktmud I spoke with @hughhhh and @rusackas briefly about this and they were in agreement with you about keeping the concept of views and thus I’ve updated the directory schematic. As it currently stands views contain both API endpoints as well as non-API endpoints, i.e., those which render HTML templates, and thus the non-API endpoints would be housed under the superset/blueprints/views subfolder.

michael-s-molina commented, Dec 13, 2022 (1 reaction)

Thanks for the SIP @john-bodley. I went through a similar process when writing SIP-61 - Improve the organization of front-end folders. When researching best practices for organizing projects, the concept of a feature-based organization appeared in many articles that described how large codebases were organized. You’ll find much of the reasoning behind this model in SIP-61 and its references. The basic idea is that files related to a feature should belong together in a structured way. This allows you to easily switch between feature implementations, facilitates the use of feature flags, and also promotes better-defined dependencies. Maybe we can apply some of the concepts here too 😉
