[FR] Increase model signature flexibility to allow for occasionally missing fields while retaining datatype enforcement

Willingness to contribute

No. I cannot contribute this feature at this time.

Proposal Summary

When creating a model signature, it would be helpful to have the option to allow certain fields to be missing from incoming data while retaining the datatype check when they are present. Currently, every field in the model signature must be passed to the model or an error is thrown. Extra fields do not cause an error, but they are not passed through to the model, so simply excluding these fields from the signature in order to use them optionally in scoring is not possible.
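
The requested behavior can be sketched in plain Python. This is only an illustration of the idea, not MLflow's API: `FieldSpec`, its `required` flag, and `enforce` are hypothetical names invented for this example. Optional fields may be absent or null, but are still type-checked when present, and fields outside the schema are dropped, mirroring the behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical schema entry; not an MLflow class."""
    name: str
    dtype: type
    required: bool = True


def enforce(record, specs):
    """Return the declared fields of `record`, type-checking each one present.

    Missing or null optional fields are skipped; missing required fields
    and wrongly typed values raise. Undeclared fields are dropped.
    """
    out = {}
    for spec in specs:
        if record.get(spec.name) is None:
            if spec.required:
                raise ValueError(f"missing required field: {spec.name}")
            continue  # optional and absent: nothing to check
        value = record[spec.name]
        # bool is a subclass of int in Python, so reject it unless declared.
        is_stray_bool = isinstance(value, bool) and spec.dtype is not bool
        if is_stray_bool or not isinstance(value, spec.dtype):
            raise TypeError(
                f"field {spec.name!r}: expected {spec.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
        out[spec.name] = value
    return out


specs = [FieldSpec("age", int), FieldSpec("income", float, required=False)]
complete = enforce({"age": 30, "income": 1.5}, specs)
partial = enforce({"age": 30, "zip": "12345"}, specs)  # income absent, zip undeclared
```

Under these assumptions, a record without `income` is accepted, while a record with `income` set to a string is rejected with a `TypeError`.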

Motivation

What is the use case for this feature?

This would be helpful in cases when incoming scoring data might not always be complete but one wishes to enforce data typing on specified fields when they are present so as not to lose information.

Why is this use case valuable to support for MLflow users in general?

It provides greater flexibility in using the model signature. The only current workaround is to exclude the signature completely.

Why is this use case valuable to support for your project(s) or organization?

Our raw scoring data is passed as JSON that occasionally has some missing fields. Data formatting can also vary, so it is important for us to retain the datatype enforcement functionality.

Why is it currently difficult to achieve this use case?

The only way to do this is to exclude the model signature, which means losing datatype enforcement and, with it, potentially losing information.

Details

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 2
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

2 reactions
nfarley-soaren commented, Sep 19, 2022

I appreciate the opportunity to participate in this and would be happy to throw in my thoughts. I hope more people weigh in on this because I’m definitely focused on my current use-case and may be a bit myopic regarding the needs of other users. I’ll be meeting with my team tomorrow to get their opinions/feedback and will have a more detailed response for you guys after that.

2 reactions
BenWilson2 commented, Sep 16, 2022

@nfarley-soaren @skylarbpayne After some deliberation among the maintainers, we really like this idea. That being said, we’d like to make sure that we capture as many potential end-use scenarios as the feature could address. Would it be possible for the two of you (and anyone else who is interested in weighing in) to describe the scenario involved in detail?

Specifically: what does a partial check actually look like in the following scenarios, and how should it behave?

  • Fields in the payload that were not part of signature validation
  • Null values in a payload (missing data)
  • Different numeric non-lossy types passed (int -> long, float->double)
  • How would you want to see the definition of a ‘partial check’ from an API standpoint?
  • What level of interaction would you want for a partial validation? The ability to generate a logged warning? An addition to the return payload that indicates that schema validation failed? Silence?
  • Would you want the ability to mutate the strictness (strict -> partial; partial -> strict)?

If it’s not too much trouble to flesh out this request, we’d really appreciate it, and we’d like to continue the design discussion here.
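
To make the questions above concrete, here is one possible shape for a partial check, written as a plain-Python sketch. Everything in it is an assumption made for illustration: `check_schema`, `Report`, the `mode` and `on_violation` parameters, and the `SAFE_WIDENING` table are hypothetical and are not part of MLflow. The check compares an observed column-to-type mapping against a declared one.

```python
import warnings
from dataclasses import dataclass, field

# Hypothetical table of (observed, declared) pairs treated as non-lossy.
SAFE_WIDENING = {("integer", "long"), ("float", "double")}


@dataclass
class Report:
    missing: list = field(default_factory=list)
    extra: list = field(default_factory=list)
    mismatched: list = field(default_factory=list)

    @property
    def ok(self):
        # Extra fields are reported but do not fail the check.
        return not (self.missing or self.mismatched)


def check_schema(observed, declared, mode="strict", on_violation="raise"):
    """Compare column-name -> type-name mappings.

    A column that is absent, or whose observed type is None (all nulls),
    counts as missing in "strict" mode and is tolerated in "partial" mode.
    """
    report = Report()
    for name, want in declared.items():
        got = observed.get(name)
        if got is None:
            if mode == "strict":
                report.missing.append(name)
        elif got != want and (got, want) not in SAFE_WIDENING:
            report.mismatched.append((name, got, want))
    report.extra = [name for name in observed if name not in declared]

    if not report.ok:
        message = (
            f"schema check failed: missing={report.missing}, "
            f"mismatched={report.mismatched}"
        )
        if on_violation == "raise":
            raise ValueError(message)
        if on_violation == "warn":
            warnings.warn(message)
        # any other value (e.g. "silent"): just return the report
    return report


declared = {"age": "long", "score": "double"}
lenient = check_schema({"age": "integer", "note": "string"}, declared, mode="partial")
quiet = check_schema(
    {"age": "string", "score": "double"}, declared,
    mode="partial", on_violation="silent",
)
```

In this sketch the strictness is a per-call argument, so switching between strict and partial needs no change to the declared schema, and returning a `Report` lets the caller decide whether a violation becomes a warning, an addition to the response payload, or nothing at all.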
