Hi,

I ran dqn_workflow with 7.9G of training data, but the process was OOM-killed. My environment and the OOM logs are below.

workflow: dqn_workflow.py
training_data: 8 features, 20,249,257 rows, 7.9G
training_eval_data: 8 features, 2,028,916 rows, 0.8G
RAM: 80G

INFO:ml.rl.evaluation.evaluation_data_page:EvaluationDataPage minibatch size: 2028912
WARNING:ml.rl.evaluation.doubly_robust_estimator:Can't normalize DR-CPE because of small or negative logged_policy_score
Killed
[Tue May  7 22:05:38 2019] python invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[Tue May  7 22:05:38 2019] python cpuset=42ee6ef8b84594988960735ef211ac05221059efc2d524f2afc1e2b49eb46d0c mems_allowed=0-1
[Tue May  7 22:05:38 2019] CPU: 1 PID: 51997 Comm: python Tainted: P           O      4.20.13-1.el7.elrepo.x86_64 #1
[Tue May  7 22:05:38 2019] Hardware name: Dell Inc. PowerEdge C4140/013M88, BIOS 1.6.11 11/21/2018
[Tue May  7 22:05:38 2019] Call Trace:
[Tue May  7 22:05:38 2019]  dump_stack+0x63/0x88
[Tue May  7 22:05:38 2019]  dump_header+0x78/0x2a4
[Tue May  7 22:05:38 2019]  ? mem_cgroup_scan_tasks+0x9c/0xf0
[Tue May  7 22:05:38 2019]  oom_kill_process+0x26b/0x290
[Tue May  7 22:05:38 2019]  out_of_memory+0x140/0x4b0
[Tue May  7 22:05:38 2019]  mem_cgroup_out_of_memory+0x4b/0x80
[Tue May  7 22:05:38 2019]  try_charge+0x6e2/0x750
[Tue May  7 22:05:38 2019]  mem_cgroup_try_charge+0x8c/0x1e0
[Tue May  7 22:05:38 2019]  __add_to_page_cache_locked+0x1a0/0x300
[Tue May  7 22:05:38 2019]  ? scan_shadow_nodes+0x30/0x30
[Tue May  7 22:05:38 2019]  add_to_page_cache_lru+0x4e/0xd0
[Tue May  7 22:05:38 2019]  filemap_fault+0x428/0x7c0
[Tue May  7 22:05:38 2019]  ? xas_find+0x138/0x1a0
[Tue May  7 22:05:38 2019]  ? filemap_map_pages+0x153/0x3c0
[Tue May  7 22:05:38 2019]  __do_fault+0x3e/0xc0
[Tue May  7 22:05:38 2019]  __handle_mm_fault+0xbd6/0xe80
[Tue May  7 22:05:38 2019]  handle_mm_fault+0x102/0x220
[Tue May  7 22:05:38 2019]  __do_page_fault+0x21c/0x4c0
[Tue May  7 22:05:38 2019]  do_page_fault+0x37/0x140
[Tue May  7 22:05:38 2019]  ? page_fault+0x8/0x30
[Tue May  7 22:05:38 2019]  page_fault+0x1e/0x30
...
[Tue May  7 22:05:38 2019] Memory cgroup out of memory: Kill process 51997 (python) score 997 or sacrifice child
[Tue May  7 22:05:38 2019] Killed process 51997 (python) total-vm:102757536kB, anon-rss:83335008kB, file-rss:132692kB, shmem-rss:8192kB
[Tue May  7 22:05:42 2019] oom_reaper: reaped process 51997 (python), now anon-rss:0kB, file-rss:127188kB, shmem-rss:8192kB
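
To put the numbers in the kill line into perspective, here is a minimal sketch (not part of the original report) that converts the kernel's kB figures to GiB and, where available, reads the memory limit of the current cgroup. The helper names `parse_kill_line` and `read_cgroup_limit_gib` are hypothetical, and the two cgroup paths are the usual cgroup v1 and v2 locations; a given container may mount them elsewhere.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Kill line copied from the log above.
LOG_LINE = (
    "[Tue May  7 22:05:38 2019] Killed process 51997 (python) "
    "total-vm:102757536kB, anon-rss:83335008kB, file-rss:132692kB, shmem-rss:8192kB"
)

KILL_LINE = re.compile(r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) (?P<fields>.*)")
FIELD = re.compile(r"([\w-]+):(\d+)kB")

# Usual locations of the memory limit (assumption: default mount points).
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
)


def parse_kill_line(line: str) -> Optional[Dict[str, float]]:
    """Return the memory counters of an oom-killer 'Killed process' line in GiB."""
    match = KILL_LINE.search(line)
    if match is None:
        return None
    # The kernel reports these counters in units of 1024 bytes.
    return {name: int(kb) / 1024**2 for name, kb in FIELD.findall(match.group("fields"))}


def read_cgroup_limit_gib() -> Optional[float]:
    """Return the cgroup memory limit in GiB, or None if unlimited or unreadable."""
    for path in CGROUP_LIMIT_FILES:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file missing or not readable on this system
        if raw == "max":  # cgroup v2 spelling of "no limit"
            return None
        return int(raw) / 1024**3
    return None


usage = parse_kill_line(LOG_LINE)
limit = read_cgroup_limit_gib()

if usage is not None:
    for name, gib in usage.items():
        print(f"{name}: {gib:.2f} GiB")
print("cgroup limit:", "unknown or unlimited" if limit is None else f"{limit:.2f} GiB")
```

For the line above, anon-rss works out to about 79.5 GiB, which is essentially all of the 80G of RAM listed in the environment. The "Memory cgroup out of memory" message indicates the kill was triggered by a cgroup limit, so checking that limit inside the container is a reasonable first step.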

[Image: resource usage graph. Green: CPU, yellow: RAM]

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

1 reaction
czxttkl commented, May 20, 2019

Great. Let’s close this issue.

1 reaction
czxttkl commented, May 13, 2019

@pjy953

Np. Let us know how it goes.


