
Custom handler file with multiple classes: not selecting the first one

See original GitHub issue

According to the docs, if two classes are present in a handler file, the first one will be used. That is not what happens, however: I get an error while serving. Additionally, specifying the handler class with --handler handler.py:HandlerClass yields an error during archival.

I have two classes in handler.py: ClassifierFeatureExtractor and ScaleIntensityTransform.
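For concreteness, a minimal sketch of a handler file with that shape (class names are from the issue; the bodies are hypothetical placeholders, since the actual implementations were not shared):

```python
# handler.py -- illustrative sketch only; the real class bodies are not
# shown in the issue report.

class ClassifierFeatureExtractor:
    """Custom TorchServe handler: exposes initialize() and handle()."""

    def initialize(self, context):
        # context carries model_dir, gpu_id, etc. (TorchServe Context object)
        self.initialized = True

    def handle(self, data, context):
        # return one result per request in the batch
        return ["ok" for _ in data]


class ScaleIntensityTransform:
    """Second class in the same module -- this is what trips the loader."""

    def __call__(self, x):
        return x  # placeholder transform
```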

Attempt 1: archiving with --handler handler.py

  • archival works without errors
  • the server stumbles:

```
ValueError: Expected only one class in custom service code or a function entry point [<class 'baidu_handler.ClassifierFeatureExtractor'>, <class 'baidu_handler.ScaleIntensityTransform'>]
```

Attempt 2: archiving with --handler handler.py:ClassifierFeatureExtractor

  • archival fails:

```
FileNotFoundError: [Errno 2] No such file or directory: 'handler.py:ClassifierFeatureExtractor'
```
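The FileNotFoundError suggests the whole --handler value is being treated as a literal file path rather than parsed as a module:class pair. A tiny hypothetical illustration of that failure mode (not actual torch-model-archiver code):

```python
import os

# Hypothetical: if the tool checks the raw argument as a path, the
# ":ClassifierFeatureExtractor" suffix guarantees the lookup misses.
handler_arg = "handler.py:ClassifierFeatureExtractor"
print(os.path.isfile(handler_arg))  # False -> archival aborts with FileNotFoundError
```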

Context

```
pip freeze | grep torch
torch==1.7.1
torch-model-archiver==0.3.0
torchserve==0.3.0
torchvision==0.2.0
```

  • java version:

```
java --version
openjdk 11.0.9.1 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
```

  • Operating System and version: Ubuntu 20.04

Your Environment

  • Installed using source? [yes/no]: no (installed via pip)
  • Are you planning to deploy it using docker container? [yes/no]: NA
  • Is it a CPU or GPU environment?: GPU
  • Using a default/custom handler? [If possible upload/share custom handler/model]: see above
  • What kind of model is it e.g. vision, text, audio?: vision

Failure Logs [if any]

```
2021-02-04 10:39:11,362 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.3.0
TS Home: ~/.conda/envs/ncrf/lib/python3.6/site-packages
Current directory: ~/repos/NCRF/scripts
Temp directory: /tmp
Number of GPUs: 1
Number of CPUs: 8
Max heap size: 3986 M
Python executable: ~/.conda/envs/ncrf/bin/python
Config file: logs/config/20210203222847767-shutdown.cfg
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: ~/repos/NCRF/model_store
Initial Models: resnet18-baseline=resnet18-baseline.mar
Log dir: ~/repos/NCRF/scripts/logs
Metrics dir: ~/repos/NCRF/scripts/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true

2021-02-04 10:39:11,373 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot {
  "name": "20210203222847767-shutdown.cfg",
  "modelCount": 1,
  "created": 1612420127768,
  "models": {
    "resnet18-baseline": {
      "1.0": {
        "defaultVersion": true,
        "marName": "resnet18-baseline.mar",
        "minWorkers": 1,
        "maxWorkers": 1,
        "batchSize": 1,
        "maxBatchDelay": 100,
        "responseTimeout": 120
      }
    }
  }
}
2021-02-04 10:39:11,385 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot 20210203222847767-shutdown.cfg
2021-02-04 10:39:11,386 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot 20210203222847767-shutdown.cfg validated successfully
2021-02-04 10:39:11,875 [INFO ] main org.pytorch.serve.archive.ModelArchive - eTag 83d761631a2943f08c5764350f3ebd76
2021-02-04 10:39:11,883 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model resnet18-baseline
2021-02-04 10:39:11,884 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model resnet18-baseline
2021-02-04 10:39:11,884 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model resnet18-baseline loaded.
2021-02-04 10:39:11,884 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: resnet18-baseline, count: 1
2021-02-04 10:39:11,904 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2021-02-04 10:39:11,954 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2021-02-04 10:39:11,955 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2021-02-04 10:39:11,955 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2021-02-04 10:39:11,955 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2021-02-04 10:39:11,956 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
2021-02-04 10:39:11,986 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: /tmp/.ts.sock.9000
2021-02-04 10:39:11,986 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]2460
2021-02-04 10:39:11,986 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2021-02-04 10:39:11,987 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.6.12
2021-02-04 10:39:11,987 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet18-baseline_1.0 State change null -> WORKER_STARTED
2021-02-04 10:39:11,997 [INFO ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2021-02-04 10:39:12,005 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: /tmp/.ts.sock.9000.
2021-02-04 10:39:13,935 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_service_worker.py", line 182, in <module>
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     worker.run_server()
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_service_worker.py", line 154, in run_server
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.handle_connection(cl_socket)
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_service_worker.py", line 116, in handle_connection
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service, result, code = self.load_model(msg)
2021-02-04 10:39:13,936 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_service_worker.py", line 89, in load_model
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service = model_loader.load(model_name, model_dir, handler, gpu, batch_size, envelope)
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_loader.py", line 96, in load
2021-02-04 10:39:13,937 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     entry_point, initialize_fn = self._get_class_entry_point(module)
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_loader.py", line 138, in _get_class_entry_point
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     model_class_definitions))
2021-02-04 10:39:13,937 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
	at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:188)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - ValueError: Expected only one class in custom service code or a function entry point [<class 'baidu_handler.BaiduClassifierFeatureExtractor'>, <class 'baidu_handler.ScaleIntensityBaidu'>]
2021-02-04 10:39:13,939 [WARN ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: resnet18-baseline, error: Worker died.
2021-02-04 10:39:13,939 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet18-baseline_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2021-02-04 10:39:13,939 [WARN ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-resnet18-baseline_1.0-stderr
2021-02-04 10:39:13,939 [WARN ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-resnet18-baseline_1.0-stdout
2021-02-04 10:39:13,940 [INFO ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2021-02-04 10:39:13,962 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-resnet18-baseline_1.0-stdout
2021-02-04 10:39:13,962 [INFO ] W-9000-resnet18-baseline_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-resnet18-baseline_1.0-stderr
2021-02-04 10:39:15,013 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: /tmp/.ts.sock.9000
2021-02-04 10:39:15,013 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]2490
2021-02-04 10:39:15,013 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2021-02-04 10:39:15,013 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.6.12
2021-02-04 10:39:15,013 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet18-baseline_1.0 State change WORKER_STOPPED -> WORKER_STARTED
```
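The traceback bottoms out in _get_class_entry_point (ts/model_loader.py, line 138). A minimal sketch of that style of check, assuming the loader scans the module for locally defined classes (illustrative, not the verbatim TorchServe source), shows why a second class aborts loading instead of falling back to "the first one":

```python
import inspect

def get_class_entry_point(module):
    # Gather classes defined in the handler module itself; classes that are
    # merely imported carry a different __module__ and are skipped.
    model_class_definitions = [
        cls for _, cls in inspect.getmembers(module, inspect.isclass)
        if cls.__module__ == module.__name__
    ]
    # Exactly one candidate is required -- there is no "first class wins"
    # fallback at this point, contrary to what the docs imply.
    if len(model_class_definitions) != 1:
        raise ValueError(
            "Expected only one class in custom service code or a function "
            "entry point {}".format(model_class_definitions)
        )
    return model_class_definitions[0]
```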


Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 13

Top GitHub Comments

2 reactions
DSLituiev commented, Feb 6, 2021

I was able to resolve it by moving the transform class into a separate file and passing that file via --extra-files.
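A hedged sketch of that arrangement (file names, class bodies, and the archiver invocation below are illustrative reconstructions, not copied from the issue):

```python
# scale_intensity.py -- the transform now lives in its own module
class ScaleIntensityTransform:
    def __call__(self, x):
        return x  # placeholder; the real transform was not shared


# handler.py -- now defines exactly one class, which the loader accepts.
# The import does not count against the one-class rule, since the imported
# class's __module__ differs from handler.py's own module name.
from scale_intensity import ScaleIntensityTransform

class ClassifierFeatureExtractor:
    def initialize(self, context):
        self.transform = ScaleIntensityTransform()
        self.initialized = True

    def handle(self, data, context):
        return [self.transform(d) for d in data]


# Archiving then ships the extra module alongside the handler, e.g.:
#   torch-model-archiver --model-name resnet18-baseline --version 1.0 \
#       --serialized-file model.pt \
#       --handler handler.py --extra-files scale_intensity.py
# (the --serialized-file value here is a guess; the rest mirrors the issue)
```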

1 reaction
DSLituiev commented, Mar 1, 2021

@lxning I believe this still needs to be documented properly.

Read more comments on GitHub >

Top Results From Across the Web

How to select multiple class class with same name in JS
You are only selecting the first .likarton instance - this is fixed by using querySelectorAll(). Since you are using addEventListener ...
Read more >
6. Custom Service — PyTorch/Serve master documentation
You can create custom handler by having class with any name, but it must have an initialize and a handle method. NOTE -...
Read more >
Azure Functions custom handlers | Microsoft Learn
The primary purpose of the custom handlers feature is to enable languages and runtimes that do not currently have first-class support on Azure ......
Read more >
Deploy Java Lambda functions with .zip or JAR file archives
Lambda loads JAR files in Unicode alphabetical order. If multiple JAR files in the lib directory contain the same class, the first one...
Read more >
Using files from web applications - Web APIs - MDN Web Docs
The multiple attribute on the input element allows the user to select multiple files. Accessing the first selected file using a classical ...
Read more >
