question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[CEP] System Forms Metadata Standardization

See original GitHub issue

Abstract A number of different parts of HQ create forms (pretty much all are case updates) for various purposes. As contrasted with forms submitted by CommCare mobile or webapps, these are generally thought of as “system forms”. A recurring challenge is tagging these with useful information, and various conflicting mechanisms are currently in use.

References: System forms in Reports improve report filtering related to system forms #31654

Motivation An ideal system would allow us to:

  • Programmatically identify all system forms
  • Identify system forms from a particular feature
  • Further identify which sub-feature submitted the form. This is usually dynamic, such as a specific auto update rule ID or DHIS2 integration
  • See what user triggered the action, where applicable
  • Store arbitrary unstructured information as needed

Specification At the moment, 4 fields are commonly in use: xmlns, device_id, user_id, and username The submit_case_blocks docstring has some current guidance around how these fields are used. Here are some features that submit system forms, and how they use each of these fields:

System feature User ID Username XMLNS Device ID
Case Importer Web user’s ID Web user username http://commcarehq.org/case corehq.apps.case_importer.do_import.do_import
Case Claim Mobile worker ID Mobile worker username http://commcarehq.org/case corehq.apps.ota.views.claim
Case Cleaning Web user ID Web user username http://commcarehq.org/case/edit corehq.apps.reports.views.edit_case
Update Rule system system http://commcarehq.org/hq_case_update_rule “”
COVID custom update rules system system http://commcarehq.org/hq_case_update_rule custom.covid.rules.custom_actions.{rule_name}
Deduplication system system http://commcarehq.org/hq_case_deduplication_rule__{name_slug}-{rule.case_type} CaseDeduplicationActionDefinition-update-cases
FHIR “” system http://commcarehq.org/x/fhir/engine-read FHIRImportConfig-{importer.pk} (but it varies)
DHIS2 “” system http://commcarehq.org/dhis2-integration “”
OpenMRS “” system http://commcarehq.org/openmrs-integration openmrs-atomfeed-{repeater.get_id} (and sometimes blank)
usercase “” system http://commcarehq.org/case corehq.apps.callcenter.sync_usercase._UserCaseHelper.
commtrack sms User ID “” Possibly http://openrosa.org/jr/xforms, unsure sms:{phone_number} or “”

Aside: I do see that anything that uses “submit_case_blocks” (which I’m pretty sure includes everything save commtrack and smsforms) uses the same template, which stores the form instance in a <system /> node, rather than <data /> as is normally used. This is a pretty reliable signal, and it is already queryable in elasticsearch. Still, I don’t think it’s necessary to build off of this in our desired end state.

The proposal is that we adopt the following conventions:

  • user_id - reserved for user where applicable, otherwise “system”
  • username - reserved for user where applicable, otherwise “system”
  • XMLNS - Identifies the feature. This must not be dynamic, and it must be registered with a human-readable name in SYSTEM_FORM_XMLNS_MAP
  • deviceID - Can be dynamic, used for storing further granularity about what portion of the feature it came from. This is also queryable

In this world, system forms could be identified conclusively with something like form.xmlns in SYSTEM_FORM_XMLNS_MAP.keys() in elasticsearch, postgres, or python. We would also be able to associate a human-readable name of the feature with the form based on its XMLNS (useful for reporting). Further granularity can be achieved by feature-specific filtering with deviceID as well. For example:

(FormES()
 .domain(domain)
 .xmlns(CASE_UPDATE_RULE_XMLNS)
 .device_id(rule.pk))

An option for further extension if needed is expanding beyond these four fields. CommCare uses appVersion, app_build_version, and commcare_version. I wouldn’t want to abuse these beyond their stated meanings, though. We can also store essentially anything we want in the meta block, it just wouldn’t be easily searchable in elasticsearch without switching to a subdocument model, though. Possible things we might want to store include:

  • Feature version (could use appVersion for this)
  • arbitrary logging/debugging information

Perhaps the most straightforward approach to storing arbitrary info is as normal form data, as is done here, which we could trivially enable in submit_case_blocks. An alternative would be adding an optional info arg to submit_case_blocks which accepts a dict and turns that into an info node inside the meta block. I’m interested in opinions there.

I imagine there’s room for improving tooling to support this paradigm, plus certainly a docs change. One model that might be a good hook for standardization is the SystemFormMeta dataclass

Impact on users I don’t propose a large scale migration at this time, just alignment on a desired future state to guide feature development

Impact on hosting no change

Backwards compatibility None at this time, but backwards incompatibility is likely as features are migrated to the new standard.

Release Timeline N/A

Open questions and issues I’m particularly interested in hearing thoughts about how we can make this easier to stick to going forward

Issue Analytics

  • State:open
  • Created a year ago
  • Comments:6 (6 by maintainers)

github_iconTop GitHub Comments

1reaction
esoergelcommented, Jul 8, 2022

Thanks for sharing those examples, Daniel - I didn’t know that about the couch-to-sql migration forms. The more I think about it, the more I like using the form body for mostly unstructured logging, given that that’s literally what xforms are built for. Thinking about how we can facilitate that, I guess a little validation is maybe all that’s needed. We can add a form_data dict arg to submit_case_blocks and validate that the keys are valid XML identifiers and then escape the values like you did in that diff code.

1reaction
esoergelcommented, Jul 7, 2022

Sure thing, thanks for the clarifying questions, @dannyroberts First off, motivations:

  • I’ve been in a number of conversations around how to do this properly, so the easiest outcome from this CEP is a documented decision on what the correct approach is, so we needn’t deliberate this next time
  • I did some of the research for this doc a little while back in fielding a request to show system forms in standard reports. That’s where I discovered how disjointed this is, and how hard it is to conclusively identify “system forms”. Simon reworked our filters a bit to support that, though there’s room to go further, such as by presenting known system forms as filter options (we don’t have any plans to implement this at this time, though).
  • We’re also deliberating ways to improve the UX of auto update rules, and these changes could support pre-filtering the submit history and manage forms reports to forms submitted by a particular rule. That’d be an easy way to get better user-facing logging, and possibly an “undo” functionality by bulk archival of those forms.
  • The most recent impetus is this ticket, where USH would like to be able to see which case update rule caused a particular case change. The proposal there is to use dynamic XMLNSs, but I think a static “case update rule” xmlns and a dynamic device ID would be better. I could also see us adding further context to the form body if we adopt that as a “normal” thing to do.

I do think your breakdown is a logical one. I’m actually hoping to avoid advocating here for any general breaking changes as you describe in point 2, as I think that’s more of a feature-by-feature question. Most of the USH features that submit system forms would be fine with a break in continuity, but it’d be disruptive to make such a change for GA features. Everything except deduplication uses a static XMLNS, even if it doesn’t uniquely identify those features. I’d love to see us move say, case claim in this direction, but that’d be a whole thing trying to figure that out.

I think the specific action items I’d like to see come out of this are the following non-breaking changes:

  • Document what we determine to be The Right Way
  • Organize XMLNSs into SYSTEM_FORM_XMLNS_MAP (or a successor enum type)
  • Use the readable feature names in some key places in reports

And then perhaps the following breaking change:

  • Add case update rule IDs to those forms as device ID.

On that point, the XMLNS is already feature-specific, and the device ID is unused, so this wouldn’t be too scary a change. However, any new features built around that device ID would only support future data. That still seems like a net win, but I think that’s a separate discussion - we’d have to get Product’s 👍 before making such things GA.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Metadata Systems for the U.S. Statistical Agencies, in Plain ...
Abstract. This paper explains the benefits and challenges of metadata systems for U.S. statistical agencies. Metadata systems facilitate the ...
Read more >
TABLE OF CONTENTS - Squarespace
CEP DELIBERATIVE DOCUMENT – NOT FOR PUBLIC RELEASE accessing, as the most basic form of metadata. Stakeholders described metadata as a critical.
Read more >
PREMIS: Preservation Metadata Maintenance Activity (Library ...
The PREMIS maintenance activity is responsible for maintaining, supporting, and coordinating future revisions to the PREMIS data dictionary.
Read more >
Chapter 6. Complex event processing (CEP)
The CEP system prioritizes patterns of related events and the relationships between them. Events usually need to be composed and aggregated. Given these...
Read more >
Forms & Instructions - State of Michigan
CEP Household Information Report - Letter to Households · Income Eligibility Guidelines for Use in Schools. For translated versions of the Letter to ......
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found