run results reporting is a mess
Describe the feature
Right now, run results are an ugly overloaded mess:
- `dbt run` returns a string
  - it's the result of the `get_status` method on SQL adapters, which can basically be anything the db/underlying db api want
  - on bigquery it's a carefully crafted string based on what we think the query did
- `dbt test` returns an integer for the number of failed rows
- other things just return 'OK', I think? Honestly, who even knows though.
  - except on exceptions, when the status is `'ERROR'` - not to be confused with test errors, where `fail` is true but `status` is still the number of failed rows

Runs set `error` and `status` (but they could set `fail` and `warn` if they wanted; those attributes exist!)
Tests set `error`, `fail`, `warn`, and `status`.
`status` is defined as `Union[None, bool, int, str]`.
There's no way we need all of these for everyone! The problem is complicated a bit by the RPC server, which has its own set of responses, etc., but I believe this is solvable anyway.
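To make that overload concrete, here's a rough sketch of the shape a consumer effectively deals with today; the class and field names are illustrative, not the actual internal classes:

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class CurrentRunResultSketch:
    # One field, several possible meanings, depending on which task produced it:
    # - dbt run:    whatever string get_status() / the adapter produced
    # - dbt test:   the integer count of failed rows
    # - exceptions: the string 'ERROR'
    # - otherwise:  'OK', or None
    status: Union[None, bool, int, str]
    error: Optional[str] = None   # set when an exception was raised
    fail: Optional[bool] = None   # set by tests (error-severity failure)
    warn: Optional[bool] = None   # set by tests (warn-severity failure)
```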
I propose picking one unified key that supports a union of objects. If it's hard to give tests their own status type from a type-system perspective, we can still make `fail`/`warn`/`rows_failed` available in run output, and have them be `None` in that case.
Here's a revised proposal, with some sample type specifications in Python and sample outputs:
```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Generic, List, Optional, TypeVar

# StrEnum, JsonSchemaMixin, TimingInfo, ParsedNode, ParsedSourceDefinition,
# and LogRecord are existing dbt types, not redefined here.

class RunStatus(StrEnum):
    Success = "success"
    Error = "error"
    Skipped = "skipped"


class TestStatus(StrEnum):
    Success = "success"
    Error = "error"
    Fail = "fail"
    Warn = "warn"


class FreshnessStatus(StrEnum):
    Pass = "pass"
    Warning = "warn"
    Error = "error"
    Runtime = "runtime error"


@dataclass
class BaseResult(JsonSchemaMixin):
    status: str
    cmd: str  # the name of the task (`parsed_args.which`)
    timing: List[TimingInfo]
    thread_id: str
    execution_time: float
    message: Optional[str]


@dataclass
class NodeResult(BaseResult):
    status: RunStatus
    node: ParsedNode  # the node this result is from


@dataclass
class TestResult(NodeResult):
    status: TestStatus


# these two end up in the source-freshness API result and sources.json
@dataclass
class ErrorFreshnessResult(BaseResult):
    status: FreshnessStatus = field(metadata={'restrict': [FreshnessStatus.Runtime]})


@dataclass
class FreshnessResult(BaseResult):
    status: FreshnessStatus = field(
        metadata={'restrict': [FreshnessStatus.Pass, FreshnessStatus.Warning, FreshnessStatus.Error]}
    )
    node: ParsedSourceDefinition
    snapshotted_at: datetime
    max_loaded_at: datetime
    age: float


ResultType = TypeVar('ResultType')


class ResultsCollection(Generic[ResultType]):
    results: List[ResultType]
    generated_at: datetime
    elapsed_time: float  # time spent in the task


# and the RPC result is pretty similar, with an extra field
class RPCResult(Generic[ResultType], ResultsCollection[ResultType]):
    logs: List[LogRecord]
    start: datetime
    end: datetime
    elapsed: float  # time spent in the whole API call
```
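As a quick illustration of the payoff (assuming the classes above), a caller can branch on the typed status instead of sniffing strings, booleans, and row counts:

```python
# Illustrative only: interpret a result using the proposed status enums.
def is_problem(result: NodeResult) -> bool:
    if isinstance(result, TestResult):
        # a warning is surfaced but does not count as a failure here
        return result.status in (TestStatus.Error, TestStatus.Fail)
    return result.status == RunStatus.Error
```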
JSON examples
```
# a single entry in "results" for a successful table materialization during "dbt run" (run_results.json, API)
{
    "status": "success",
    "cmd": "run",
    "thread_id": "Thread-1",
    "execution_time": 0.24340200424194336,
    "timing": [...],
    "message": "SELECT 1",
    "node": ...,  # omitted
}

# a single entry in "results" for a node that failed during "dbt run" due to an error (run_results.json, API)
{
    "status": "error",
    "cmd": "run",
    "timing": [],
    "thread_id": "Thread-1",
    "execution_time": 0.0777738094329834,
    "message": "Compilation Error in model debug (models/debug.sql)\n blah\n \n > in model debug (models/debug.sql)\n > called by model debug (models/debug.sql)",
    "node": ...,  # omitted
}

# a single entry in "results" for a snapshot-freshness that resulted in a warning (API)
{
    "status": "warn",
    "cmd": "snapshot-freshness",
    "thread_id": "Thread-1",
    "execution_time": 0.019963979721069336,
    "timing": [...],
    "message": null,
    "node": ...,  # omitted
    "max_loaded_at": "2019-01-02T01:02:03+00:00",
    "snapshotted_at": "2020-07-13T20:09:14.857209+00:00",
    "age": 48280031.857209,
}

# a source freshness result (API)
{
    "results": [...],  # see above
    "generated_at": "2020-07-13T20:09:14.857209+00:00",
    "elapsed_time": 1.3687610626220703,
    "elapsed": 2.45623,
    "start": "2020-07-13T20:09:12.542073Z",
    "end": "2020-07-13T20:09:14.998303Z",
}

# a source freshness result (sources.json)
{
    "meta": {
        "generated_at": "2020-07-13T20:09:14.857209+00:00",
        "elapsed_time": 1.3687610626220703,
        "sources": {
            "source.<project_name>.<source_name>.<table_name>": {
                "max_loaded_at": "2019-01-02T01:02:03+00:00",
                "snapshotted_at": "2020-07-13T20:09:14.857209+00:00",
                "max_loaded_at_time_ago_in_s": 48280031.857209,
                "status": "warn",
                "criteria": {
                    "warn_after": {"count": 100, "period": "day"},
                    "error_after": {"count": 1000, "period": "day"},
                },
            }
        }
    }
}
```
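For JSON consumers, the single `status` key makes aggregation simple. A minimal sketch, assuming a run_results.json file whose "results" entries look like the examples above:

```python
import json
from collections import Counter

# Count results by the proposed unified "status" key.
with open("run_results.json") as fh:
    run_results = json.load(fh)

status_counts = Counter(entry["status"] for entry in run_results["results"])
print(dict(status_counts))  # e.g. {"success": 12, "error": 1, "skipped": 2}
```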
It might be nice to include a node unique ID explicitly alongside the `node` field everywhere this spec accepts one, with an eye to deprecating the `node` field. We can do that separately, as it'll require us either to write out the manifest a second time at run end (my preference) or to move the manifest-writing to the end of the run.
This will require some coordination with cloud; I assume it uses this value for displaying run results.
I don’t think this issue is imminently urgent, but I would consider it a blocker for 1.0.
Describe alternatives you’ve considered
We can always not do this; it's just a cosmetic/display kind of issue… the information is there.
Who will this benefit?
Developers and consumers of the output JSON.
@cmcarthur This is a breaking change. We already have a schema for the results output; it's just not a useful one, and consumers are confused by it.
The problem here, which I don't believe the above structure solves, is that dbt does a poor job of differentiating between "There was an error in execution" (an exception raised) and "There was an error-severity test failure". Maybe we could add `fail` to the enum you suggested, and get rid of the `fail`/`warn` booleans?

Additional results dict to support adapter-specific info, like in https://github.com/fishtown-analytics/dbt/issues/2747
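A rough sketch of how those two suggestions might layer onto the proposal above; the names here are illustrative only (the adapter-specific dict is the idea from #2747, not a settled design), and the types build on the classes sketched earlier:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Illustrative only: modified versions of RunStatus and NodeResult from the
# proposal above (StrEnum, BaseResult, and ParsedNode come from that sketch).

class RunStatus(StrEnum):
    Success = "success"
    Error = "error"    # an exception was raised during execution
    Fail = "fail"      # an error-severity test failure, distinct from a runtime error
    Skipped = "skipped"


@dataclass
class NodeResult(BaseResult):
    status: RunStatus
    node: ParsedNode
    # free-form, adapter-specific details (rows affected, bytes processed, ...);
    # the key name "adapter_response" is a placeholder, not an agreed-on name
    adapter_response: Dict[str, Any] = field(default_factory=dict)
```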