
[BUG] Require validation for `MlflowClient.log_batch` when updating params


Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

1.27.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Big Sur (11.4)
  • Python version: Python 3.8.13
  • yarn version, if running the dev UI:

Describe the problem

I am using mlflow.tracking.MlflowClient for tracking purposes. If I am not wrong, run params are immutable in MLflow tracking in order to ensure the reproducibility of an experiment run.

Calling MlflowClient.log_batch with a param key that was already logged under a different value does not produce a proper error message.

It should produce

MlflowException: INVALID_PARAMETER_VALUE: Changing param values is not allowed. Param with key='test_param' was already logged with value='101' for run ID='4b4bc4bdf2ab4c0992fd1879e8580d29'. Attempted logging new value '100'.

Instead, it produces

MlflowException: API request to http://localhost:5000/api/2.0/mlflow/runs/log-batch failed with exception HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/log-batch (Caused by ResponseError('too many 500 error responses'))
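What plausibly happens here (a simplified sketch, not MLflow's actual implementation): the tracking server reports the validation failure as an HTTP 500 rather than a 4xx, and the client's retry policy treats 500 as a transient error. The retries are exhausted and only the generic "too many 500 error responses" message survives, while the informative validation message is discarded:

```python
# Sketch of the masking behavior. The endpoint stub, function names, and the
# retry-code set are illustrative assumptions, not MLflow internals.
RETRYABLE_STATUS_CODES = {500, 502, 503}  # assumption: 500 is treated as retryable


def fake_log_batch_endpoint():
    """Stands in for the tracking server: the param-validation failure
    is reported as a 500 instead of a client-error status."""
    return 500, "INVALID_PARAMETER_VALUE: Changing param values is not allowed."


def request_with_retries(send, max_retries=5):
    """Retry the request while the status code looks transient."""
    for _ in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS_CODES:
            return status, body
    # The informative body of the last response is dropped here, which is
    # exactly why the user never sees the validation message.
    raise RuntimeError(f"Max retries exceeded (too many {status} error responses)")


try:
    request_with_retries(fake_log_batch_endpoint)
except RuntimeError as e:
    print(e)  # Max retries exceeded (too many 500 error responses)
```

If the server answered with a 4xx instead, the client would not retry and the `INVALID_PARAMETER_VALUE` message would surface immediately.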

Tracking information

mlflow server --backend-store-uri expstore --default-artifact-root expstore --host localhost

Code to reproduce issue

from mlflow.tracking import MlflowClient
from mlflow.entities import Param

# create mlflow client
client = MlflowClient(tracking_uri="http://localhost:5000")

# create mlflow experiment; create_experiment returns the new experiment's ID
exp = client.create_experiment("test_exp")

# create a run in the newly created experiment
run = client.create_run(experiment_id=exp)

# first attempt to log params
params = {
    "item_1": 55,
    "test_param": 101
}

params_arr = [Param(key, str(value)) for key, value in params.items()]

client.log_batch(
    run_id=run.info.run_id,
    params=params_arr
)


# second attempt to log params with modified value for test_param
params = {
    "item_1": 55,
    "test_param": 100,
    "new": 5
}

params_arr = [Param(key, str(value)) for key, value in params.items()]

client.log_batch(
    run_id=run.info.run_id,
    params=params_arr
)
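Until proper validation lands, a caller can pre-check the batch against the run's already-logged params and fail fast with the message the server should have returned. A minimal sketch; the helper `check_param_batch` and the namedtuple stand-in for `mlflow.entities.Param` are mine, not MLflow APIs:

```python
from collections import namedtuple

# stand-in for mlflow.entities.Param (exposes .key and .value)
Param = namedtuple("Param", ["key", "value"])


def check_param_batch(existing_params, new_params):
    """Raise before calling log_batch if any param would change value.

    existing_params: dict of already-logged params (e.g. run.data.params)
    new_params: sequence of Param objects about to be logged
    """
    for p in new_params:
        old = existing_params.get(p.key)
        if old is not None and old != p.value:
            raise ValueError(
                f"Changing param values is not allowed. Param with key='{p.key}' "
                f"was already logged with value='{old}'; "
                f"attempted logging new value '{p.value}'."
            )


logged = {"item_1": "55", "test_param": "101"}
batch = [Param("test_param", "100"), Param("new", "5")]
try:
    check_param_batch(logged, batch)
except ValueError as e:
    print(e)  # Changing param values is not allowed. Param with key='test_param' ...
```

Re-logging an identical value passes the check, matching the behavior the expected `MlflowException` describes: only a *changed* value is rejected.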

Other info / logs

---------------------------------------------------------------------------
MaxRetryError                             Traceback (most recent call last)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/requests/adapters.py:489, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    488 if not chunked:
--> 489     resp = conn.urlopen(
    490         method=request.method,
    491         url=url,
    492         body=request.body,
    493         headers=request.headers,
    494         redirect=False,
    495         assert_same_host=False,
    496         preload_content=False,
    497         decode_content=False,
    498         retries=self.max_retries,
    499         timeout=timeout,
    500     )
    502 # Send the request.
    503 else:

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/connectionpool.py:876, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    875     log.debug("Retry: %s", url)
--> 876     return self.urlopen(
    877         method,
    878         url,
    879         body,
    880         headers,
    881         retries=retries,
    882         redirect=redirect,
    883         assert_same_host=assert_same_host,
    884         timeout=timeout,
    885         pool_timeout=pool_timeout,
    886         release_conn=release_conn,
    887         chunked=chunked,
    888         body_pos=body_pos,
    889         **response_kw
    890     )
    892 return response

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/connectionpool.py:876, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    875     log.debug("Retry: %s", url)
--> 876     return self.urlopen(
    877         method,
    878         url,
    879         body,
    880         headers,
    881         retries=retries,
    882         redirect=redirect,
    883         assert_same_host=assert_same_host,
    884         timeout=timeout,
    885         pool_timeout=pool_timeout,
    886         release_conn=release_conn,
    887         chunked=chunked,
    888         body_pos=body_pos,
    889         **response_kw
    890     )
    892 return response

    [... skipping similar frames: HTTPConnectionPool.urlopen at line 876 (2 times)]

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/connectionpool.py:876, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    875     log.debug("Retry: %s", url)
--> 876     return self.urlopen(
    877         method,
    878         url,
    879         body,
    880         headers,
    881         retries=retries,
    882         redirect=redirect,
    883         assert_same_host=assert_same_host,
    884         timeout=timeout,
    885         pool_timeout=pool_timeout,
    886         release_conn=release_conn,
    887         chunked=chunked,
    888         body_pos=body_pos,
    889         **response_kw
    890     )
    892 return response

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/connectionpool.py:866, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    865 try:
--> 866     retries = retries.increment(method, url, response=response, _pool=self)
    867 except MaxRetryError:

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/util/retry.py:592, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
    591 if new_retry.is_exhausted():
--> 592     raise MaxRetryError(_pool, url, error or ResponseError(cause))
    594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/log-batch (Caused by ResponseError('too many 500 error responses'))

During handling of the above exception, another exception occurred:

RetryError                                Traceback (most recent call last)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/utils/rest_utils.py:151, in http_request(host_creds, endpoint, method, max_retries, backoff_factor, retry_codes, timeout, **kwargs)
    150 try:
--> 151     return _get_http_response_with_retries(
    152         method,
    153         url,
    154         max_retries,
    155         backoff_factor,
    156         retry_codes,
    157         headers=headers,
    158         verify=verify,
    159         timeout=timeout,
    160         **kwargs,
    161     )
    162 except Exception as e:

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/utils/rest_utils.py:91, in _get_http_response_with_retries(method, url, max_retries, backoff_factor, retry_codes, **kwargs)
     90 session = _get_request_session(max_retries, backoff_factor, retry_codes)
---> 91 return session.request(method, url, **kwargs)

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/requests/sessions.py:587, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
    589 return resp

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/requests/sessions.py:701, in Session.send(self, request, **kwargs)
    700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
    703 # Total elapsed time of the request (approximately)

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/requests/adapters.py:556, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    555 if isinstance(e.reason, ResponseError):
--> 556     raise RetryError(e, request=request)
    558 if isinstance(e.reason, _ProxyError):

RetryError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/log-batch (Caused by ResponseError('too many 500 error responses'))

During handling of the above exception, another exception occurred:

MlflowException                           Traceback (most recent call last)
Input In [13], in <cell line: 9>()
      1 params = {
      2     "new_value": 55,
      3     "test_param": 100,
      4     "new": 5
      5 }
      7 params_arr = [Param(key, str(value)) for key, value in params.items()]
----> 9 client.log_batch(
     10     run_id=run.info.run_id,
     11     params=params_arr
     12 )

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/tracking/client.py:918, in MlflowClient.log_batch(self, run_id, metrics, params, tags)
    861 def log_batch(
    862     self,
    863     run_id: str,
   (...)
    866     tags: Sequence[RunTag] = (),
    867 ) -> None:
    868     """
    869     Log multiple metrics, params, and/or tags.
    870 
   (...)
    916         status: FINISHED
    917     """
--> 918     self._tracking_client.log_batch(run_id, metrics, params, tags)

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py:315, in TrackingServiceClient.log_batch(self, run_id, metrics, params, tags)
    312     metrics_batch = metrics[:metrics_batch_size]
    313     metrics = metrics[metrics_batch_size:]
--> 315     self.store.log_batch(
    316         run_id=run_id, metrics=metrics_batch, params=params_batch, tags=tags_batch
    317     )
    319 for metrics_batch in chunk_list(metrics, chunk_size=MAX_METRICS_PER_BATCH):
    320     self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py:309, in RestStore.log_batch(self, run_id, metrics, params, tags)
    305 tag_protos = [tag.to_proto() for tag in tags]
    306 req_body = message_to_json(
    307     LogBatch(metrics=metric_protos, params=param_protos, tags=tag_protos, run_id=run_id)
    308 )
--> 309 self._call_endpoint(LogBatch, req_body)

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py:56, in RestStore._call_endpoint(self, api, json_body)
     54 endpoint, method = _METHOD_TO_INFO[api]
     55 response_proto = api.Response()
---> 56 return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/utils/rest_utils.py:253, in call_endpoint(host_creds, endpoint, method, json_body, response_proto)
    249     response = http_request(
    250         host_creds=host_creds, endpoint=endpoint, method=method, params=json_body
    251     )
    252 else:
--> 253     response = http_request(
    254         host_creds=host_creds, endpoint=endpoint, method=method, json=json_body
    255     )
    256 response = verify_rest_response(response, endpoint)
    257 js_dict = json.loads(response.text)

File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/utils/rest_utils.py:163, in http_request(host_creds, endpoint, method, max_retries, backoff_factor, retry_codes, timeout, **kwargs)
    151     return _get_http_response_with_retries(
    152         method,
    153         url,
   (...)
    160         **kwargs,
    161     )
    162 except Exception as e:
--> 163     raise MlflowException("API request to %s failed with exception %s" % (url, e))

MlflowException: API request to http://localhost:5000/api/2.0/mlflow/runs/log-batch failed with exception HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/log-batch (Caused by ResponseError('too many 500 error responses'))


What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

1 reaction
harupy commented, Jul 1, 2022

@Mathanraj-Sharma You cannot push to mlflow/mlflow. Can you create a fork, push commits there, and create a PR?

0 reactions
maximilianreimer commented, Dec 9, 2022

This also seems to apply to other validation problems, e.g. when a parameter value is longer than 250 characters. But this only becomes apparent if you switch to a local tracking backend:

mlflow.exceptions.MlflowException: Param value [MASKED] had length 301, which exceeded length limit of 250
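The same pre-check idea extends to the length limit this comment hits. A sketch, with the 250-character limit hard-coded as an assumption taken from the error message above (later MLflow versions raised this limit), and the helper name being mine rather than an MLflow API:

```python
MAX_PARAM_VAL_LENGTH = 250  # assumption: the limit in the MLflow version discussed here


def validate_param_value_length(key, value):
    """Raise before calling log_batch if a param value exceeds the server's limit."""
    if len(value) > MAX_PARAM_VAL_LENGTH:
        raise ValueError(
            f"Param value for key='{key}' had length {len(value)}, "
            f"which exceeded length limit of {MAX_PARAM_VAL_LENGTH}"
        )


validate_param_value_length("ok", "x" * 250)    # exactly at the limit: passes
# validate_param_value_length("too_long", "x" * 301)  # would raise ValueError
```

Running this check client-side produces the same diagnostic a local backend reports, without waiting for the REST client to exhaust its retries.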