[BUG] Require validation for `MlflowClient.log_batch` when updating params
Willingness to contribute
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
MLflow version
1.27.0
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Big Sur (11.4)
- Python version: Python 3.8.13
- yarn version, if running the dev UI:
Describe the problem
I am using mlflow.tracking.MlflowClient for tracking purposes. If I am not mistaken, run params are immutable in MLflow tracking in order to ensure the reproducibility of an experiment run. Calling MlflowClient.log_batch to update params (with duplicate keys) does not produce a proper error message.
It should produce:
MlflowException: INVALID_PARAMETER_VALUE: Changing param values is not allowed. Param with key='test_param' was already logged with value='101' for run ID='4b4bc4bdf2ab4c0992fd1879e8580d29'. Attempted logging new value '100'.
Instead, it produces:
MlflowException: API request to http://localhost:5000/api/2.0/mlflow/runs/log-batch failed with exception HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/log-batch (Caused by ResponseError('too many 500 error responses'))
Tracking information
mlflow server --backend-store-uri expstore --default-artifact-root expstore --host localhost
Code to reproduce issue
import mlflow
from mlflow.tracking import MlflowClient
from mlflow.entities import Param

# create mlflow client
client = MlflowClient(tracking_uri="http://localhost:5000")

# create mlflow experiment
exp = client.create_experiment("test_exp")

# create run in the experiment created above
run = client.create_run(experiment_id=exp)

# first attempt to log params
params = {
    "item_1": 55,
    "test_param": 101
}
params_arr = [Param(key, str(value)) for key, value in params.items()]
client.log_batch(
    run_id=run.info.run_id,
    params=params_arr
)

# second attempt to log params with a modified value for test_param
params = {
    "item_1": 55,
    "test_param": 100,
    "new": 5
}
params_arr = [Param(key, str(value)) for key, value in params.items()]
client.log_batch(
    run_id=run.info.run_id,
    params=params_arr
)
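As a temporary client-side guard (an illustrative sketch, not an MLflow API; the helper name log_params_safely is made up here), the params already logged for the run can be compared against the new ones before calling log_batch, so a conflicting value fails fast with a clear message instead of repeated 500 responses:

from mlflow.entities import Param

def log_params_safely(client, run_id, params):
    # params already logged for this run, as a dict of key -> string value
    existing = client.get_run(run_id).data.params
    for key, value in params.items():
        if key in existing and existing[key] != str(value):
            raise ValueError(
                f"Param '{key}' was already logged with value '{existing[key]}'; "
                f"refusing to overwrite it with '{value}'."
            )
    client.log_batch(
        run_id=run_id,
        params=[Param(key, str(value)) for key, value in params.items()]
    )

# the second attempt above would then fail locally with a clear ValueError:
# log_params_safely(client, run.info.run_id, {"item_1": 55, "test_param": 100, "new": 5})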
Other info / logs
---------------------------------------------------------------------------
MaxRetryError Traceback (most recent call last)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/requests/adapters.py:489, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
488 if not chunked:
--> 489 resp = conn.urlopen(
490 method=request.method,
491 url=url,
492 body=request.body,
493 headers=request.headers,
494 redirect=False,
495 assert_same_host=False,
496 preload_content=False,
497 decode_content=False,
498 retries=self.max_retries,
499 timeout=timeout,
500 )
502 # Send the request.
503 else:
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/connectionpool.py:876, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
875 log.debug("Retry: %s", url)
--> 876 return self.urlopen(
877 method,
878 url,
879 body,
880 headers,
881 retries=retries,
882 redirect=redirect,
883 assert_same_host=assert_same_host,
884 timeout=timeout,
885 pool_timeout=pool_timeout,
886 release_conn=release_conn,
887 chunked=chunked,
888 body_pos=body_pos,
889 **response_kw
890 )
892 return response
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/connectionpool.py:876, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
875 log.debug("Retry: %s", url)
--> 876 return self.urlopen(
877 method,
878 url,
879 body,
880 headers,
881 retries=retries,
882 redirect=redirect,
883 assert_same_host=assert_same_host,
884 timeout=timeout,
885 pool_timeout=pool_timeout,
886 release_conn=release_conn,
887 chunked=chunked,
888 body_pos=body_pos,
889 **response_kw
890 )
892 return response
[... skipping similar frames: HTTPConnectionPool.urlopen at line 876 (2 times)]
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/connectionpool.py:876, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
875 log.debug("Retry: %s", url)
--> 876 return self.urlopen(
877 method,
878 url,
879 body,
880 headers,
881 retries=retries,
882 redirect=redirect,
883 assert_same_host=assert_same_host,
884 timeout=timeout,
885 pool_timeout=pool_timeout,
886 release_conn=release_conn,
887 chunked=chunked,
888 body_pos=body_pos,
889 **response_kw
890 )
892 return response
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/connectionpool.py:866, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
865 try:
--> 866 retries = retries.increment(method, url, response=response, _pool=self)
867 except MaxRetryError:
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/urllib3/util/retry.py:592, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
591 if new_retry.is_exhausted():
--> 592 raise MaxRetryError(_pool, url, error or ResponseError(cause))
594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)
MaxRetryError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/log-batch (Caused by ResponseError('too many 500 error responses'))
During handling of the above exception, another exception occurred:
RetryError Traceback (most recent call last)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/utils/rest_utils.py:151, in http_request(host_creds, endpoint, method, max_retries, backoff_factor, retry_codes, timeout, **kwargs)
150 try:
--> 151 return _get_http_response_with_retries(
152 method,
153 url,
154 max_retries,
155 backoff_factor,
156 retry_codes,
157 headers=headers,
158 verify=verify,
159 timeout=timeout,
160 **kwargs,
161 )
162 except Exception as e:
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/utils/rest_utils.py:91, in _get_http_response_with_retries(method, url, max_retries, backoff_factor, retry_codes, **kwargs)
90 session = _get_request_session(max_retries, backoff_factor, retry_codes)
---> 91 return session.request(method, url, **kwargs)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/requests/sessions.py:587, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
589 return resp
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/requests/sessions.py:701, in Session.send(self, request, **kwargs)
700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
703 # Total elapsed time of the request (approximately)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/requests/adapters.py:556, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
555 if isinstance(e.reason, ResponseError):
--> 556 raise RetryError(e, request=request)
558 if isinstance(e.reason, _ProxyError):
RetryError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/log-batch (Caused by ResponseError('too many 500 error responses'))
During handling of the above exception, another exception occurred:
MlflowException Traceback (most recent call last)
Input In [13], in <cell line: 9>()
1 params = {
2 "new_value": 55,
3 "test_param": 100,
4 "new": 5
5 }
7 params_arr = [Param(key, str(value)) for key, value in params.items()]
----> 9 client.log_batch(
10 run_id=run.info.run_id,
11 params=params_arr
12 )
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/tracking/client.py:918, in MlflowClient.log_batch(self, run_id, metrics, params, tags)
861 def log_batch(
862 self,
863 run_id: str,
(...)
866 tags: Sequence[RunTag] = (),
867 ) -> None:
868 """
869 Log multiple metrics, params, and/or tags.
870
(...)
916 status: FINISHED
917 """
--> 918 self._tracking_client.log_batch(run_id, metrics, params, tags)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py:315, in TrackingServiceClient.log_batch(self, run_id, metrics, params, tags)
312 metrics_batch = metrics[:metrics_batch_size]
313 metrics = metrics[metrics_batch_size:]
--> 315 self.store.log_batch(
316 run_id=run_id, metrics=metrics_batch, params=params_batch, tags=tags_batch
317 )
319 for metrics_batch in chunk_list(metrics, chunk_size=MAX_METRICS_PER_BATCH):
320 self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py:309, in RestStore.log_batch(self, run_id, metrics, params, tags)
305 tag_protos = [tag.to_proto() for tag in tags]
306 req_body = message_to_json(
307 LogBatch(metrics=metric_protos, params=param_protos, tags=tag_protos, run_id=run_id)
308 )
--> 309 self._call_endpoint(LogBatch, req_body)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/store/tracking/rest_store.py:56, in RestStore._call_endpoint(self, api, json_body)
54 endpoint, method = _METHOD_TO_INFO[api]
55 response_proto = api.Response()
---> 56 return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/utils/rest_utils.py:253, in call_endpoint(host_creds, endpoint, method, json_body, response_proto)
249 response = http_request(
250 host_creds=host_creds, endpoint=endpoint, method=method, params=json_body
251 )
252 else:
--> 253 response = http_request(
254 host_creds=host_creds, endpoint=endpoint, method=method, json=json_body
255 )
256 response = verify_rest_response(response, endpoint)
257 js_dict = json.loads(response.text)
File ~/miniconda3-intel/envs/mlflow/lib/python3.8/site-packages/mlflow/utils/rest_utils.py:163, in http_request(host_creds, endpoint, method, max_retries, backoff_factor, retry_codes, timeout, **kwargs)
151 return _get_http_response_with_retries(
152 method,
153 url,
(...)
160 **kwargs,
161 )
162 except Exception as e:
--> 163 raise MlflowException("API request to %s failed with exception %s" % (url, e))
MlflowException: API request to http://localhost:5000/api/2.0/mlflow/runs/log-batch failed with exception HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/2.0/mlflow/runs/log-batch (Caused by ResponseError('too many 500 error responses'))
What component(s) does this bug affect?
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
What language(s) does this bug affect?
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
@Mathanraj-Sharma You cannot push to mlflow/mlflow. Can you create a fork, push commits there, and create a PR?

This also seems to apply to other validation problems, e.g. if the length of a parameter value is longer than 250 characters. But this only becomes apparent if you change the server to local:
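For illustration, here is a minimal sketch (assuming a local file-based backend store such as ./mlruns; the exact exception text may differ) of how the length validation surfaces directly when the client talks to a local store instead of the REST server:

from mlflow.tracking import MlflowClient
from mlflow.entities import Param

# local file-based store: validation errors are raised directly instead of
# being hidden behind retried 500 responses from the tracking server
local_client = MlflowClient(tracking_uri="./mlruns")
exp_id = local_client.create_experiment("test_exp_local")
run = local_client.create_run(experiment_id=exp_id)

long_value = "x" * 300  # exceeds the 250-character limit for param values
local_client.log_batch(
    run_id=run.info.run_id,
    params=[Param("long_param", long_value)]
)
# expected to raise MlflowException (INVALID_PARAMETER_VALUE) mentioning the length limit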