Custom handler file with multiple classes: not selecting the first one
According to the docs, if two classes are present in a handler file, the first one will be used. That is not what happens: serving fails with an error. Additionally, specifying the handler class with `--handler handler.py:HandlerClass` makes archiving itself fail.
I have two classes in handler.py: ClassifierFeatureExtractor and ScaleIntensityTransform.
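For reference, a minimal sketch of the layout described above — the class bodies here are hypothetical placeholders, not the actual handler code:

```python
# handler.py — two top-level classes in one handler file (hypothetical sketch)

class ScaleIntensityTransform:
    """Preprocessing transform used by the handler (placeholder body)."""
    def __call__(self, tensor):
        return tensor  # intensity scaling would go here


class ClassifierFeatureExtractor:
    """The intended TorchServe handler class (placeholder body)."""
    def initialize(self, context):
        self.initialized = True

    def handle(self, data, context):
        return data  # inference logic would go here
```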
Attempt 1: archiving with `--handler handler.py`
- archiving works without errors
- serving fails at worker startup:

```
ValueError: Expected only one class in custom service code or a function entry point
[<class 'baidu_handler.ClassifierFeatureExtractor'>, <class 'baidu_handler.ScaleIntensityTransform'>]
```
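For context, the archiving call for this attempt looked roughly like the following; the weights file name is a hypothetical placeholder (the model name matches the logs below):

```bash
# Placeholder weights file; --handler points at the file that
# contains both classes.
torch-model-archiver \
    --model-name resnet18-baseline \
    --version 1.0 \
    --serialized-file resnet18.pth \
    --handler handler.py
```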
Attempt 2: archiving with `--handler handler.py:ClassifierFeatureExtractor`
- archiving fails; the archiver apparently treats the whole `file:class` string as a literal file path:

```
FileNotFoundError: [Errno 2] No such file or directory: 'handler.py:ClassifierFeatureExtractor'
```
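The failing invocation, with the same hypothetical placeholders as above:

```bash
# Only --handler differs from attempt 1. The archiver looks for a file
# literally named 'handler.py:ClassifierFeatureExtractor' and fails.
torch-model-archiver \
    --model-name resnet18-baseline \
    --version 1.0 \
    --serialized-file resnet18.pth \
    --handler handler.py:ClassifierFeatureExtractor
```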
Context

```
$ pip freeze | grep torch
torch==1.7.1
torch-model-archiver==0.3.0
torchserve==0.3.0
torchvision==0.2.0
```
- Java version:

```
$ java --version
openjdk 11.0.9.1 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
```

- Operating System and version: Ubuntu 20.04
Your Environment
- Installed using source? [yes/no]: no (installed via pip)
- Are you planning to deploy it using docker container? [yes/no]: NA
- Is it a CPU or GPU environment?: GPU
- Using a default/custom handler? [If possible upload/share custom handler/model]: see above
- What kind of model is it e.g. vision, text, audio?: vision
…
Failure Logs [if any]
```
2021-02-04 10:39:11,373 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Started restoring models from snapshot { "name": "20210203222847767-shutdown.cfg", "modelCount": 1, "created": 1612420127768, "models": { "resnet18-baseline": { "1.0": { "defaultVersion": true, "marName": "resnet18-baseline.mar", "minWorkers": 1, "maxWorkers": 1, "batchSize": 1, "maxBatchDelay": 100, "responseTimeout": 120 } } } }
2021-02-04 10:39:11,385 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Validating snapshot 20210203222847767-shutdown.cfg
2021-02-04 10:39:11,386 [INFO ] main org.pytorch.serve.snapshot.SnapshotManager - Snapshot 20210203222847767-shutdown.cfg validated successfully
2021-02-04 10:39:11,875 [INFO ] main org.pytorch.serve.archive.ModelArchive - eTag 83d761631a2943f08c5764350f3ebd76
2021-02-04 10:39:11,883 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model resnet18-baseline
2021-02-04 10:39:11,884 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model resnet18-baseline
2021-02-04 10:39:11,884 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model resnet18-baseline
2021-02-04 10:39:11,884 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model resnet18-baseline loaded.
2021-02-04 10:39:11,884 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: resnet18-baseline, count: 1
2021-02-04 10:39:11,904 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2021-02-04 10:39:11,954 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2021-02-04 10:39:11,955 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2021-02-04 10:39:11,955 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2021-02-04 10:39:11,955 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2021-02-04 10:39:11,956 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
2021-02-04 10:39:11,986 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: /tmp/.ts.sock.9000
2021-02-04 10:39:11,986 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]2460
2021-02-04 10:39:11,986 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2021-02-04 10:39:11,987 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.6.12
2021-02-04 10:39:11,987 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet18-baseline_1.0 State change null -> WORKER_STARTED
2021-02-04 10:39:11,997 [INFO ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2021-02-04 10:39:12,005 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: /tmp/.ts.sock.9000.
2021-02-04 10:39:13,935 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_service_worker.py", line 182, in <module>
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - worker.run_server()
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_service_worker.py", line 154, in run_server
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self.handle_connection(cl_socket)
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_service_worker.py", line 116, in handle_connection
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service, result, code = self.load_model(msg)
2021-02-04 10:39:13,936 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2021-02-04 10:39:13,936 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_service_worker.py", line 89, in load_model
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service = model_loader.load(model_name, model_dir, handler, gpu, batch_size, envelope)
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_loader.py", line 96, in load
2021-02-04 10:39:13,937 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - entry_point, initialize_fn = self._get_class_entry_point(module)
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "~/.conda/envs/ncrf/lib/python3.6/site-packages/ts/model_loader.py", line 138, in _get_class_entry_point
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - model_class_definitions))
2021-02-04 10:39:13,937 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
    at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
    at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:188)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
2021-02-04 10:39:13,937 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - ValueError: Expected only one class in custom service code or a function entry point [<class 'baidu_handler.BaiduClassifierFeatureExtractor'>, <class 'baidu_handler.ScaleIntensityBaidu'>]
2021-02-04 10:39:13,939 [WARN ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: resnet18-baseline, error: Worker died.
2021-02-04 10:39:13,939 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet18-baseline_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2021-02-04 10:39:13,939 [WARN ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-resnet18-baseline_1.0-stderr
2021-02-04 10:39:13,939 [WARN ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-resnet18-baseline_1.0-stdout
2021-02-04 10:39:13,940 [INFO ] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2021-02-04 10:39:13,962 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-resnet18-baseline_1.0-stdout
2021-02-04 10:39:13,962 [INFO ] W-9000-resnet18-baseline_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-resnet18-baseline_1.0-stderr
2021-02-04 10:39:15,013 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: /tmp/.ts.sock.9000
2021-02-04 10:39:15,013 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]2490
2021-02-04 10:39:15,013 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2021-02-04 10:39:15,013 [INFO ] W-9000-resnet18-baseline_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.6.12
2021-02-04 10:39:15,013 [DEBUG] W-9000-resnet18-baseline_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet18-baseline_1.0 State change WORKER_STOPPED -> WORKER_STARTED
```
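The traceback points at `_get_class_entry_point` in ts/model_loader.py. In TorchServe 0.3.0 the loader collects every class defined in the handler module and refuses to guess when it finds more than one. A rough sketch of that check, reconstructed from the traceback rather than quoted from the source:

```python
import inspect

def _get_class_entry_point(module):
    # Collect all classes defined in the handler module itself
    # (roughly what TorchServe's class-listing helper does).
    model_class_definitions = [
        cls for _, cls in inspect.getmembers(module, inspect.isclass)
        if cls.__module__ == module.__name__
    ]
    # With two classes in handler.py this raises, matching the
    # "Expected only one class ..." error in the logs above.
    if len(model_class_definitions) != 1:
        raise ValueError(
            "Expected only one class in custom service code or a "
            "function entry point {}".format(model_class_definitions)
        )
    ...
```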

Resolution (from the comments)
I was able to resolve it by moving the transform class into a separate file and passing that file via --extra-files. @lxning I believe this still needs to be documented properly.
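A sketch of that fix; the file name transforms.py and the class bodies are hypothetical placeholders:

```python
# transforms.py — the transform moved out of the handler file
class ScaleIntensityTransform:
    def __call__(self, tensor):
        return tensor  # intensity scaling placeholder
```

The handler file now defines exactly one class, so the loader's single-class check passes:

```python
# handler.py — imports the transform instead of defining it
from transforms import ScaleIntensityTransform

class ClassifierFeatureExtractor:
    def initialize(self, context):
        self.transform = ScaleIntensityTransform()
        self.initialized = True

    def handle(self, data, context):
        return data  # inference logic placeholder
```

The extra file is packed into the .mar alongside the handler:

```bash
torch-model-archiver \
    --model-name resnet18-baseline \
    --version 1.0 \
    --serialized-file resnet18.pth \
    --handler handler.py \
    --extra-files transforms.py
```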