[CEP] System Forms Metadata Standardization
Abstract
A number of different parts of HQ create forms (pretty much all are case updates) for various purposes. As contrasted with forms submitted by CommCare mobile or webapps, these are generally thought of as “system forms”. A recurring challenge is tagging these with useful information, and various conflicting mechanisms are currently in use.
References: System forms in Reports improve report filtering related to system forms #31654
Motivation
An ideal system would allow us to:
- Programmatically identify all system forms
- Identify system forms from a particular feature
- Further identify which sub-feature submitted the form. This is usually dynamic, such as a specific auto update rule ID or DHIS2 integration
- See what user triggered the action, where applicable
- Store arbitrary unstructured information as needed
Specification
At the moment, four fields are commonly in use: `xmlns`, `device_id`, `user_id`, and `username`. The `submit_case_blocks` docstring has some current guidance around how these fields are used. Here are some features that submit system forms, and how they use each of these fields:
System feature | User ID | Username | XMLNS | Device ID |
---|---|---|---|---|
Case Importer | Web user’s ID | Web user username | http://commcarehq.org/case | corehq.apps.case_importer.do_import.do_import |
Case Claim | Mobile worker ID | Mobile worker username | http://commcarehq.org/case | corehq.apps.ota.views.claim |
Case Cleaning | Web user ID | Web user username | http://commcarehq.org/case/edit | corehq.apps.reports.views.edit_case |
Update Rule | system | system | http://commcarehq.org/hq_case_update_rule | “” |
COVID custom update rules | system | system | http://commcarehq.org/hq_case_update_rule | custom.covid.rules.custom_actions.{rule_name} |
Deduplication | system | system | http://commcarehq.org/hq_case_deduplication_rule__{name_slug}-{rule.case_type} | CaseDeduplicationActionDefinition-update-cases |
FHIR | “” | system | http://commcarehq.org/x/fhir/engine-read | FHIRImportConfig-{importer.pk} (but it varies) |
DHIS2 | “” | system | http://commcarehq.org/dhis2-integration | “” |
OpenMRS | “” | system | http://commcarehq.org/openmrs-integration | openmrs-atomfeed-{repeater.get_id} (and sometimes blank) |
usercase | “” | system | http://commcarehq.org/case | corehq.apps.callcenter.sync_usercase._UserCaseHelper. |
commtrack sms | User ID | “” | Possibly http://openrosa.org/jr/xforms, unsure | sms:{phone_number} or “” |
Aside: I do see that anything that uses `submit_case_blocks` (which I’m pretty sure includes everything save commtrack and smsforms) uses the same template, which stores the form instance in a `<system />` node, rather than `<data />` as is normally used. This is a pretty reliable signal, and it is already queryable in elasticsearch. Still, I don’t think it’s necessary to build off of this in our desired end state.
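To make the root-node signal concrete, here is a minimal sketch in plain Python. The two sample documents are hypothetical, stripped-down stand-ins (real forms carry a meta block and case blocks), and `looks_like_system_form` is an illustrative helper, not an existing HQ function.

```python
import xml.etree.ElementTree as ET


def root_local_name(form_xml):
    """Return the root element's tag with any '{namespace}' prefix removed."""
    tag = ET.fromstring(form_xml).tag
    return tag.rsplit("}", 1)[-1]


def looks_like_system_form(form_xml):
    """True when the form instance is stored in a <system> root node."""
    return root_local_name(form_xml) == "system"


# Hypothetical minimal documents, for illustration only.
SYSTEM_XML = '<system xmlns="http://commcarehq.org/case"></system>'
MOBILE_XML = '<data xmlns="http://example.com/my-form"></data>'

print(looks_like_system_form(SYSTEM_XML))
print(looks_like_system_form(MOBILE_XML))
```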
The proposal is that we adopt the following conventions:
- user_id - reserved for user where applicable, otherwise “system”
- username - reserved for user where applicable, otherwise “system”
- XMLNS - Identifies the feature. This must not be dynamic, and it must be registered with a human-readable name in `SYSTEM_FORM_XMLNS_MAP`
- deviceID - Can be dynamic, used for storing further granularity about what portion of the feature it came from. This is also queryable
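These conventions could be enforced with a small registry plus a check at submission time. The following is a minimal sketch, not HQ's actual code: the contents of the map, the display names, and the `check_system_form_meta` helper are hypothetical; only the two XMLNS values are taken from the table above.

```python
# Hypothetical registry: static XMLNS -> human-readable feature name.
# The display names are illustrative.
SYSTEM_FORM_XMLNS_MAP = {
    "http://commcarehq.org/hq_case_update_rule": "Automatic case update rule",
    "http://commcarehq.org/dhis2-integration": "DHIS2 integration",
}


def check_system_form_meta(xmlns, user_id, username):
    """Return a list of convention violations (empty if the metadata conforms).

    device_id is deliberately not checked: under the proposal it is free-form
    and may be dynamic.
    """
    problems = []
    if xmlns not in SYSTEM_FORM_XMLNS_MAP:
        problems.append(f"unregistered xmlns: {xmlns!r}")
    if not user_id:
        problems.append('user_id must be a real user ID or "system"')
    if not username:
        problems.append('username must be a real username or "system"')
    return problems


# A DHIS2-style submission with a blank user_id is flagged.
print(check_system_form_meta("http://commcarehq.org/dhis2-integration", "", "system"))
```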
In this world, system forms could be identified conclusively with something like `form.xmlns in SYSTEM_FORM_XMLNS_MAP.keys()` in elasticsearch, postgres, or python. We would also be able to associate a human-readable name of the feature with the form based on its XMLNS (useful for reporting). Further granularity can be achieved by feature-specific filtering with `deviceID` as well. For example:
    (FormES()
     .domain(domain)
     .xmlns(CASE_UPDATE_RULE_XMLNS)
     .device_id(rule.pk))
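The same lookup and filtering can be sketched in plain Python over form records. This is only an illustration of the idea under stated assumptions: forms are represented as dicts, the registry contents and display names are hypothetical, and the `formdesigner` XMLNS and device IDs are made-up sample values.

```python
CASE_UPDATE_RULE_XMLNS = "http://commcarehq.org/hq_case_update_rule"

# Hypothetical registry contents; display names are illustrative.
SYSTEM_FORM_XMLNS_MAP = {
    CASE_UPDATE_RULE_XMLNS: "Automatic case update rule",
    "http://commcarehq.org/dhis2-integration": "DHIS2 integration",
}


def system_feature_name(form):
    """Return the feature's display name, or None for a non-system form."""
    return SYSTEM_FORM_XMLNS_MAP.get(form["xmlns"])


def forms_for_rule(forms, rule_id):
    """Keep forms submitted by one update rule, identified via device_id."""
    return [
        form for form in forms
        if form["xmlns"] == CASE_UPDATE_RULE_XMLNS
        and form["device_id"] == str(rule_id)
    ]


# Made-up sample records: two rule submissions and one ordinary app form.
forms = [
    {"xmlns": CASE_UPDATE_RULE_XMLNS, "device_id": "17"},
    {"xmlns": CASE_UPDATE_RULE_XMLNS, "device_id": "42"},
    {"xmlns": "http://openrosa.org/formdesigner/ABC", "device_id": "phone-1"},
]

print([system_feature_name(form) for form in forms])
print(forms_for_rule(forms, 42))
```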
An option for further extension, if needed, is expanding beyond these four fields. CommCare uses `appVersion`, `app_build_version`, and `commcare_version`. I wouldn’t want to abuse these beyond their stated meanings, though. We can also store essentially anything we want in the meta block; it just wouldn’t be easily searchable in elasticsearch without switching to a subdocument model. Possible things we might want to store include:
- Feature version (could use `appVersion` for this)
- Arbitrary logging/debugging information
Perhaps the most straightforward approach to storing arbitrary info is as normal form data, as is done here, which we could trivially enable in `submit_case_blocks`. An alternative would be adding an optional `info` arg to `submit_case_blocks` which accepts a dict and turns that into an `info` node inside the meta block. I’m interested in opinions there.
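Either option comes down to turning a flat dict into XML safely. Here is a minimal sketch using only the standard library; `build_info_node` is a hypothetical helper, not part of `submit_case_blocks`, and the name check is a deliberately conservative ASCII-only subset of what XML permits.

```python
import re
import xml.etree.ElementTree as ET

# Conservative ASCII-only subset of XML element names; real XML allows more.
_XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def build_info_node(info):
    """Turn a flat dict into an <info> element with one child per key."""
    node = ET.Element("info")
    for key, value in info.items():
        # Names starting with "xml" (any case) are reserved by the XML spec.
        if not _XML_NAME.fullmatch(key) or key.lower().startswith("xml"):
            raise ValueError(f"not a usable XML element name: {key!r}")
        child = ET.SubElement(node, key)
        child.text = str(value)  # ElementTree escapes &, <, > when serializing
    return node


xml_text = ET.tostring(
    build_info_node({"rule_id": 42, "note": "a < b & c"}),
    encoding="unicode",
)
print(xml_text)  # <info><rule_id>42</rule_id><note>a &lt; b &amp; c</note></info>
```

Letting the XML library do the escaping avoids hand-rolled string substitution, so the only validation left to write is the key check.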
I imagine there’s room for improving tooling to support this paradigm, plus certainly a docs change. One model that might be a good hook for standardization is the `SystemFormMeta` dataclass.
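As one possible shape for such a hook, here is a minimal sketch of a metadata dataclass that defaults to the proposed conventions. It is not a copy of the existing `SystemFormMeta`: the class name `SystemFormMetaSketch`, its fields, its validation, and the `rule-42` device ID format are all assumptions made for illustration.

```python
from dataclasses import asdict, dataclass

SYSTEM = "system"


@dataclass(frozen=True)
class SystemFormMetaSketch:
    """Hypothetical container for the four fields discussed in this proposal."""
    xmlns: str                # static, identifies the feature
    user_id: str = SYSTEM     # real user where applicable, otherwise "system"
    username: str = SYSTEM    # real user where applicable, otherwise "system"
    device_id: str = ""       # may be dynamic, e.g. a rule or integration ID

    def __post_init__(self):
        # Reject the blank values that some features currently submit.
        if not self.user_id or not self.username:
            raise ValueError(
                'user_id and username must be set; use "system" when no user applies'
            )


meta = SystemFormMetaSketch(
    xmlns="http://commcarehq.org/hq_case_update_rule",
    device_id="rule-42",
)
print(asdict(meta))
```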
Impact on users
I don’t propose a large scale migration at this time, just alignment on a desired future state to guide feature development.
Impact on hosting
No change.
Backwards compatibility
None at this time, but backwards incompatibility is likely as features are migrated to the new standard.
Release Timeline
N/A
Open questions and issues
I’m particularly interested in hearing thoughts about how we can make this easier to stick to going forward.
Top GitHub Comments
Thanks for sharing those examples, Daniel - I didn’t know that about the couch-to-sql migration forms. The more I think about it, the more I like using the form body for mostly unstructured logging, given that that’s literally what xforms are built for. Thinking about how we can facilitate that, I guess a little validation is maybe all that’s needed. We can add a `form_data` dict arg to `submit_case_blocks` and validate that the keys are valid XML identifiers and then escape the values like you did in that `diff` code.

Sure thing, thanks for the clarifying questions, @dannyroberts. First off, motivations:
I do think your breakdown is a logical one. I’m actually hoping to avoid advocating here for any general breaking changes as you describe in point 2, as I think that’s more of a feature-by-feature question. Most of the USH features that submit system forms would be fine with a break in continuity, but it’d be disruptive to make such a change for GA features. Everything except deduplication uses a static XMLNS, even if it doesn’t uniquely identify those features. I’d love to see us move, say, case claim in this direction, but figuring that out would be a whole project of its own.
I think the specific action items I’d like to see come out of this are the following non-breaking changes:
`SYSTEM_FORM_XMLNS_MAP` (or a successor enum type)
And then perhaps the following breaking change:
On that point, the XMLNS is already feature-specific, and the device ID is unused, so this wouldn’t be too scary a change. However, any new features built around that device ID would only support future data. That still seems like a net win, but I think that’s a separate discussion - we’d have to get Product’s 👍 before making such things GA.