Possible memory leak in Ape-X
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 16.04
- Ray installed from (source or binary): binary
- Ray version: 0.6.0
- Python version: 2.7
- Exact command to reproduce: rllib train -f crash.yaml
You can run this on any 64-core CPU machine:
crash.yaml:
apex:
  env:
    grid_search:
      - BreakoutNoFrameskip-v4
      - BeamRiderNoFrameskip-v4
      - QbertNoFrameskip-v4
      - SpaceInvadersNoFrameskip-v4
  run: APEX
  config:
    double_q: false
    dueling: false
    num_atoms: 1
    noisy: false
    n_step: 3
    lr: .0001
    adam_epsilon: .00015
    hiddens: [512]
    buffer_size: 1000000
    schedule_max_timesteps: 2000000
    exploration_final_eps: 0.01
    exploration_fraction: .1
    prioritized_replay_alpha: 0.5
    beta_annealing_fraction: 1.0
    final_prioritized_replay_beta: 1.0
    num_gpus: 0
    # APEX
    num_workers: 8
    num_envs_per_worker: 8
    sample_batch_size: 20
    train_batch_size: 1
    target_network_update_freq: 50000
    timesteps_per_iteration: 25000
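For scale, the buffer_size above already implies a substantial replay footprint on its own. A rough upper bound, assuming standard Atari preprocessing and uncompressed observations (RLlib's LZ4 observation compression and replay sharding, where enabled, would reduce this):

# Back-of-envelope estimate of replay memory implied by the config above.
# Assumes 84x84 pixels, 4 stacked frames, uint8 observations, and that each
# stored transition keeps obs and new_obs uncompressed.
obs_bytes = 84 * 84 * 4          # one stacked observation
per_transition = 2 * obs_bytes   # obs + new_obs (action/reward negligible)
buffer_size = 1000000
print("%.1f GB" % (per_transition * buffer_size / 1e9))  # ~56.4 GB upper bound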
Describe the problem
Source code / logs
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python2.7/site-packages/ray/workers/default_worker.py", line 99, in <module>
ray.worker.global_worker.main_loop()
File "/home/ubuntu/.local/lib/python2.7/site-packages/ray/worker.py", line 1010, in main_loop
self._wait_for_and_process_task(task)
File "/home/ubuntu/.local/lib/python2.7/site-packages/ray/worker.py", line 967, in _wait_for_and_process_task
self._process_task(task, execution_info)
File "/home/ubuntu/.local/lib/python2.7/site-packages/ray/worker.py", line 865, in _process_task
traceback_str)
File "/home/ubuntu/.local/lib/python2.7/site-packages/ray/worker.py", line 889, in _handle_process_task_failure
self._store_outputs_in_object_store(return_object_ids, failure_objects)
File "/home/ubuntu/.local/lib/python2.7/site-packages/ray/worker.py", line 798, in _store_outputs_in_object_store
self.put_object(object_ids[i], outputs[i])
File "/home/ubuntu/.local/lib/python2.7/site-packages/ray/worker.py", line 411, in put_object
self.store_and_register(object_id, value)
File "/home/ubuntu/.local/lib/python2.7/site-packages/ray/worker.py", line 346, in store_and_register
self.task_driver_id))
File "/home/ubuntu/.local/lib/python2.7/site-packages/ray/utils.py", line 404, in _wrapper
return orig_attr(*args, **kwargs)
File "pyarrow/_plasma.pyx", line 534, in pyarrow._plasma.PlasmaClient.put
buffer = self.create(target_id, serialized.total_bytes)
File "pyarrow/_plasma.pyx", line 344, in pyarrow._plasma.PlasmaClient.create
check_status(self.client.get().Create(object_id.data, data_size,
File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
raise ArrowIOError(message)
ArrowIOError: Broken pipe
This error is unexpected and should not have happened. Somehow a worker
crashed in an unanticipated way causing the main_loop to throw an exception,
which is being caught in "python/ray/workers/default_worker.py".
The rest of the experiment keeps running, but the particular trial fails.
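To confirm whether memory actually grows until the plasma store or a worker dies, the combined RSS of the Ray processes and the /dev/shm usage of the object store can be watched while the trial runs. A minimal sketch, assuming psutil is installed; the name match below is a heuristic, not an exact list of Ray process names:

# Periodically report total RSS of Ray-related processes and /dev/shm usage
# (where the plasma object store keeps objects on Linux).
import time
import psutil

def snapshot():
    rss = 0
    for p in psutil.process_iter(attrs=["name", "cmdline"]):
        cmd = " ".join(p.info.get("cmdline") or [p.info.get("name") or ""])
        if "ray" in cmd or "plasma" in cmd or "default_worker" in cmd:
            try:
                rss += p.memory_info().rss
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass
    shm = psutil.disk_usage("/dev/shm")
    return rss, shm.used

while True:
    rss, shm_used = snapshot()
    print("ray rss: %.1f GB, /dev/shm used: %.1f GB" % (rss / 1e9, shm_used / 1e9))
    time.sleep(60)

If both numbers climb steadily across iterations rather than plateauing once the replay buffers fill, that points at a leak rather than expected replay growth.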
Top GitHub Comments
@ericl and I determined that the error messages like
The output of an actor task is required, but the actor may still be alive. If the output has been evicted, the job may hang.
are expected, but we should fix the backend so that the job doesn’t hang. I’m currently working on a PR to treat the task as failed if the object really has been evicted.

Oh just kidding, this is single node.
On Sat, Dec 22, 2018 at 8:55 PM Richard Liaw rich.liaw@gmail.com wrote:
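For the single-node case, one mitigation worth trying is to give the plasma object store an explicit cap so that eviction kicks in before system memory is exhausted. A minimal sketch, assuming the installed Ray version's ray.init accepts object_store_memory (in bytes); whether 0.6.0 already supports this argument is an assumption:

# Possible single-node mitigation (a sketch, not a confirmed fix): cap the
# plasma object store so objects are evicted before the machine runs out of
# memory. The 20 GB value is illustrative only.
import ray

ray.init(object_store_memory=20 * 1024 ** 3)

When launching through the rllib CLI rather than from a Python entry point, the equivalent cap would have to be applied when the cluster is started instead.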