JVM crashes when starting the job_worker process

Alluxio Version: master branch, 2.3.0, 2.4.0

Describe the bug: I compiled Alluxio locally on CentOS 6 without errors. But when I try to start the job_worker process, I get the following error and the JVM crashes.

2020-11-12 15:56:04,999 INFO network.NettyUtils (NettyUtils.java:checkNettyEpollAvailable) - EPOLL_MODE is available
2020-11-12 15:56:05,517 INFO metrics.MetricsSystem (MetricsSystem.java:startSinksFromConfig) - Starting sinks with config: {}.
2020-11-12 15:56:05,519 INFO metrics.MetricsHeartbeatContext (MetricsHeartbeatContext.java:addHeartbeat) - Created metrics heartbeat with ID app-8127555058117044977. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-11-12 15:56:05,547 INFO network.TieredIdentityFactory (TieredIdentityFactory.java:localIdentity) - Initialized tiered identity TieredIdentity(node=100.76.19.7, rack=presto-ss-qe-presto-test)
2020-11-12 15:56:05,596 INFO util.log (Log.java:initialized) - Logging initialized @1076ms to org.eclipse.jetty.util.log.Slf4jLog
2020-11-12 15:56:05,725 INFO alluxio.ProcessUtils (ProcessUtils.java:run) - Starting Alluxio job worker.
2020-11-12 15:56:05,725 INFO alluxio.ProcessUtils (ProcessUtils.java:run) - Running under Java 1.8.0_252
2020-11-12 15:56:05,726 INFO web.WebServer (WebServer.java:start) - Alluxio Job Manager Worker Web service starting @ /0.0.0.0:30003
2020-11-12 15:56:05,727 INFO metrics.MetricsHeartbeatContext (MetricsHeartbeatContext.java:addHeartbeat) - Created metrics heartbeat with ID app-4950460193034851762. This ID will be used for identifying info from the client. It can be set manually through the alluxio.user.app.id property
2020-11-12 15:56:05,730 INFO server.Server (Server.java:doStart) - jetty-9.4.31.v20200723; built: 2020-07-23T17:57:36.812Z; git: 450ba27947e13e66baa8cd1ce7e85a4461cacc1d; jvm 1.8.0_252-b4
2020-11-12 15:56:05,756 INFO handler.ContextHandler (ContextHandler.java:doStart) - Started o.e.j.s.ServletContextHandler@7cbd9d24{/metrics/json,null,AVAILABLE}
2020-11-12 15:56:05,757 WARN security.SecurityHandler (ConstraintSecurityHandler.java:checkPathsWithUncoveredHttpMethods) - ServletContext@o.e.j.s.ServletContextHandler@50dfbc58{/,null,STARTING} has uncovered http methods for path: /
2020-11-12 15:56:09,586 INFO handler.ContextHandler (ContextHandler.java:doStart) - Started o.e.j.s.ServletContextHandler@50dfbc58{/,null,AVAILABLE}
2020-11-12 15:56:09,594 INFO server.AbstractConnector (AbstractConnector.java:doStart) - Started ServerConnector@4470fbd6{HTTP/1.1, (http/1.1)}{0.0.0.0:30003}
2020-11-12 15:56:09,595 INFO server.Server (Server.java:doStart) - Started @5075ms
2020-11-12 15:56:09,595 INFO web.WebServer (WebServer.java:start) - Alluxio Job Manager Worker Web service started @ /0.0.0.0:30003
2020-11-12 15:56:09,653 INFO worker.AlluxioJobWorkerProcess (AlluxioJobWorkerProcess.java:start) - Started Alluxio job worker with id 1605167752223
2020-11-12 15:56:09,653 INFO worker.AlluxioJobWorkerProcess (AlluxioJobWorkerProcess.java:start) - Alluxio job worker version 2.5.0-SNAPSHOT started. bindHost=/0.0.0.0:30001, connectHost=tdw-100-76-19-7:30001, rpcPort=30001, webPort=30003
2020-11-12 15:56:09,653 INFO worker.AlluxioJobWorkerProcess (AlluxioJobWorkerProcess.java:startServingRPCServer) - Starting gRPC server on address tdw-100-76-19-7:30001
2020-11-12 15:56:09,689 INFO worker.AlluxioJobWorkerProcess (AlluxioJobWorkerProcess.java:startServingRPCServer) - Started gRPC server on address tdw-100-76-19-7:30001
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ff6822993b8, pid=100388, tid=0x00007ff6001c1700
#
# JRE version: OpenJDK Runtime Environment (8.0_252-b04) (build 1.8.0_252-b4)
# Java VM: OpenJDK 64-Bit Server VM (25.252-b4 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [ld-linux-x86-64.so.2+0xb3b8] _dl_relocate_object+0x98
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /data/tdwadmin/tdwenv/panyliu/alluxio-2.5-tq-0.1.0-SNAPSHOT/bin/hs_err_pid100388.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

Here is the stack information from the detailed crash report file:

Stack: [0x00007f23423e8000,0x00007f23424e9000], sp=0x00007f23424e4de0, free space=1011k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [ld-linux-x86-64.so.2+0xab12] _dl_relocate_object+0xa2
C [ld-linux-x86-64.so.2+0x1315f] dl_open_worker+0x38f
C [ld-linux-x86-64.so.2+0xe7b6] _dl_catch_error+0x66
C [libdl.so.2+0xf76] dlopen_doit+0x66

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j java.lang.ClassLoader$NativeLibrary.load(Ljava/lang/String;Z)V+0
j java.lang.ClassLoader.loadLibrary0(Ljava/lang/Class;Ljava/io/File;)Z+328
j java.lang.ClassLoader.loadLibrary(Ljava/lang/Class;Ljava/lang/String;Z)V+48
j java.lang.Runtime.load0(Ljava/lang/Class;Ljava/lang/String;)V+57
j java.lang.System.load(Ljava/lang/String;)V+7
j com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath()V+110
j com.sun.jna.Native.loadNativeDispatchLibrary()V+420
j com.sun.jna.Native.<clinit>()V+108
v ~StubRoutines::call_stub
j oshi.jna.platform.linux.LinuxLibc.<clinit>()V+4
v ~StubRoutines::call_stub
j oshi.hardware.platform.linux.LinuxCentralProcessor.getSystemLoadAverage(I)[D+24
j alluxio.worker.job.command.JobWorkerHealthReporter.compute()V+18
j alluxio.worker.job.command.CommandHandlingExecutor.heartbeat()V+4
j alluxio.heartbeat.HeartbeatThread.run()V+78
j java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;+4
J 2294 C1 java.util.concurrent.FutureTask.run()V (126 bytes) @ 0x00007f244966ea64 [0x00007f244966e800+0x264]
j java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
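When reading a crash report like this one, it can help to strip out the JDK frames and look only at the library and application frames that led to the native load. The following is a minimal sketch of that idea, not part of Alluxio, OSHI, or JNA; the class name HsErrFrames and the method nonJdkFrames are hypothetical, and the parsing assumes the "j"/"J" frame line layout shown in the report above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from Alluxio): filters hs_err "Java frames" lines.
class HsErrFrames {

    // Returns "package.Class.method" for every Java frame that is not a JDK frame,
    // in the same top-to-bottom order as the crash report.
    static List<String> nonJdkFrames(List<String> frameLines) {
        List<String> result = new ArrayList<>();
        for (String raw : frameLines) {
            String line = raw.trim();
            // "j" = interpreted frame, "J" = compiled frame; skip VM stubs and native frames.
            if (!(line.startsWith("j ") || line.startsWith("J "))) {
                continue;
            }
            for (String token : line.split("\\s+")) {
                int paren = token.indexOf('(');
                if (paren > 0) { // first token that looks like name(signature)
                    String name = token.substring(0, paren);
                    if (!isJdk(name)) {
                        result.add(name);
                    }
                    break;
                }
            }
        }
        return result;
    }

    // Assumption: these prefixes are enough to recognize JDK frames in this trace.
    static boolean isJdk(String name) {
        return name.startsWith("java.") || name.startsWith("javax.")
                || name.startsWith("sun.") || name.startsWith("jdk.");
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "j java.lang.System.load(Ljava/lang/String;)V+7",
                "j com.sun.jna.Native.<clinit>()V+108",
                "v ~StubRoutines::call_stub",
                "j oshi.jna.platform.linux.LinuxLibc.<clinit>()V+4",
                "j alluxio.worker.job.command.JobWorkerHealthReporter.compute()V+18");
        for (String frame : nonJdkFrames(sample)) {
            System.out.println(frame);
        }
    }
}
```

Applied to the frames above, the filtered list starts with the com.sun.jna.Native frames, followed by the oshi frames and then the alluxio heartbeat frames, which narrows the crash to the JNA native library load triggered through OSHI.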

Here is the gdb debug output (I am not familiar with gdb):

(gdb) info shared
From               To                 Syms Read  Shared Object Library
                                      No         linux-vdso.so.1
0x00007f7ad75fb060 0x00007f7ad75fc4f8 Yes        /lib64/libonion.so
0x00007f7ad70c6950 0x00007f7ad70d30f8 Yes        /lib64/libpthread.so.0
0x00007f7ad6eac410 0x00007f7ad6eb9778 Yes        /data/tdwenv/TencentKona-8.0.3-262/bin/…/lib/amd64/jli/libjli.so
0x00007f7ad6ca6e10 0x00007f7ad6ca78e8 Yes        /lib64/libdl.so.2
0x00007f7ad6918580 0x00007f7ad6a49594 Yes        /lib64/libc.so.6
0x00007f7ad72dfae0 0x00007f7ad72f8950 Yes        /lib64/ld-linux-x86-64.so.2
0x00007f7ad5aec870 0x00007f7ad63e9058 Yes (*)    /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/server/libjvm.so
0x00007f7ad55d4790 0x00007f7ad5641748 Yes        /lib64/libm.so.6
0x00007f7ad53c92a0 0x00007f7ad53cc2d8 Yes        /lib64/librt.so.1
0x00007f7ad51bb340 0x00007f7ad51c22b8 Yes (*)    /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/libverify.so
0x00007f7ad4f9a5c0 0x00007f7ad4fadf78 Yes (*)    /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/libjava.so
0x00007f7ad4d738a0 0x00007f7ad4d84898 Yes (*)    /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/libzip.so
0x00007f7aa433fe30 0x00007f7aa43470e8 Yes (*)    /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/libnio.so
0x00007f7aa4124bf0 0x00007f7aa4134098 Yes (*)    /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/libnet.so
0x00007f7a98289a70 0x00007f7a9828c498 Yes (*)    /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/libmanagement.so
                                      No         /tmp/libnetty_transport_native_epoll_x86_648487691960033771233.so
0x00007f7a69de8790 0x00007f7a69de8b98 Yes (*)    /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/libjaas_unix.so
0x00007f7a68594840 0x00007f7a685b27b8 Yes (*)    /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/libsunec.so
0x00007f7a68377910 0x00007f7a68387f18 Yes        /lib64/libgcc_s-4.4.6-20110824.so.1
                                      No         /home/panyliu/.cache/JNA/temp/jna3956775499404260402.tmp
(*): Shared library is missing debugging information.
(gdb) bt
#0 0x00007f7ad692bb15 in raise (sig=6) at …/nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f7ad692cf25 in abort () at abort.c:89
#2 0x00007f7ad6211735 in os::abort(bool) () from /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/server/libjvm.so
#3 0x00007f7ad63b8ee3 in VMError::report_and_die() () from /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/server/libjvm.so
#4 0x00007f7ad6218242 in JVM_handle_linux_signal () from /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/server/libjvm.so
#5 0x00007f7ad620d4d3 in signalHandler(int, siginfo*, void*) () from /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/server/libjvm.so
#6 <signal handler called>
#7 _dl_relocate_object (scope=0x7f79b40231d8, reloc_mode=<value optimized out>, consider_profiling=0) at dl-reloc.c:238
#8 0x00007f7ad72f215f in dl_open_worker (a=<value optimized out>) at dl-open.c:416
#9 0x00007f7ad72ed7b6 in _dl_catch_error (objname=0x7f7a68cd4fd0, errstring=0x7f7a68cd4fc8, mallocedp=0x7f7a68cd4fdf, operate=0x7f7ad72f1dd0 <dl_open_worker>, args=0x7f7a68cd4f80) at dl-error.c:177
#10 0x00007f7ad72f191a in _dl_open (file=0x7f79b4022950 "/home/panyliu/.cache/JNA/temp/jna3956775499404260402.tmp", mode=-2147483647, caller_dlopen=0x7f7ad6214d1d, nsid=-2, argc=19, argv=<value optimized out>, env=0x7fffb87486e8) at dl-open.c:650
#11 0x00007f7ad6ca6f76 in dlopen_doit (a=0x7f7a68cd51a0) at dlopen.c:66
#12 0x00007f7ad72ed7b6 in _dl_catch_error (objname=0x7f79b40011d0, errstring=0x7f79b40011d8, mallocedp=0x7f79b40011c8, operate=0x7f7ad6ca6f10 <dlopen_doit>, args=0x7f7a68cd51a0) at dl-error.c:177
#13 0x00007f7ad6ca72ec in _dlerror_run (operate=0x7f7ad6ca6f10 <dlopen_doit>, args=0x7f7a68cd51a0) at dlerror.c:163
#14 0x00007f7ad6ca6ef1 in __dlopen (file=<value optimized out>, mode=<value optimized out>) at dlopen.c:87
#15 0x00007f7ad6214d1d in os::dll_load(char const*, char*, int) () from /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/server/libjvm.so
#16 0x00007f7ad600a173 in JVM_LoadLibrary () from /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/server/libjvm.so
#17 0x00007f7ad4f9b7b8 in Java_java_lang_ClassLoader_00024NativeLibrary_load () from /data/tdwenv/TencentKona-8.0.3-262/jre/lib/amd64/libjava.so
#18 0x00007f7ac1018507 in ?? ()
#19 0x00000007099e9038 in ?? ()
#20 0x00007f7ac10080a1 in ?? ()
#21 0x00007f7a68cd5da8 in ?? ()
#22 0x00007f7ac10080a1 in ?? ()
#23 0x00007f7a68cd5d50 in ?? ()
#24 0x0000000000000000 in ?? ()
(gdb)

It seems related to loading a native library. The JVM either cannot find the .so file or hits some other problem during the load, receives a signal from the Linux kernel, and shuts down. I know little about how Alluxio uses JNA, so I have not figured out the cause of the crash yet. Any suggestions are appreciated. By the way, this problem does not happen with the community version, so it seems to be related to the compile environment, but I am not sure.
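One way to test the native-load theory outside Alluxio is to trigger the same class initialization in a tiny standalone program. The sketch below is an assumption-laden probe, not an Alluxio or JNA utility: the class name NativeLoadProbe is hypothetical, and the only name taken from the report is com.sun.jna.Native, whose static initializer appears in the Java frames. Class.forName initializes the class, so with the job worker's JNA jar on the classpath this should exercise the same load path.

```java
// Hypothetical probe (not from Alluxio): initializes a class by name and reports the outcome.
class NativeLoadProbe {

    enum Result { LOADED, NOT_ON_CLASSPATH, LINK_FAILED }

    static Result probe(String className) {
        try {
            // The one-argument forName runs the static initializer, which for
            // com.sun.jna.Native is where the native dispatch library is loaded.
            Class.forName(className);
            return Result.LOADED;
        } catch (ClassNotFoundException e) {
            return Result.NOT_ON_CLASSPATH;
        } catch (LinkageError e) {
            // Covers UnsatisfiedLinkError and ExceptionInInitializerError.
            return Result.LINK_FAILED;
        }
    }

    public static void main(String[] args) {
        // Default class name comes from the stack trace in this report.
        String name = args.length > 0 ? args[0] : "com.sun.jna.Native";
        System.out.println(name + ": " + probe(name));
    }
}
```

If the dynamic linker segfaults as it does in the report, no Java exception is thrown; the JVM aborts and writes another hs_err file, which would confirm that the crash is independent of the job worker itself.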

To Reproduce: Compile Alluxio locally on CentOS 6, then run the following command: ./bin/alluxio-start.sh local

Expected behavior: The job_worker process starts successfully.

Urgency: HIGH

Additional context: Updating OSHI to a version above 5.3.1 solved the problem.
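To confirm which OSHI version a given build actually puts on the classpath, a small check like the following may help. It is only a sketch under stated assumptions: the class OshiVersionCheck and its methods are hypothetical, the 5.3.1 threshold is taken from the note above, and reading the version through Package.getImplementationVersion() assumes the OSHI jar's manifest carries an Implementation-Version entry, which may not hold (the method then returns null). The class oshi.SystemInfo is looked up without being initialized, so the check itself should not trigger the native load.

```java
// Hypothetical check (not from Alluxio or OSHI): compares the OSHI version on the classpath to 5.3.1.
class OshiVersionCheck {

    // Compares dotted versions numerically; a missing component counts as 0.
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? leadingInt(pa[i]) : 0;
            int y = i < pb.length ? leadingInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Parses the leading digits of a component, so "0-SNAPSHOT" yields 0.
    static int leadingInt(String part) {
        int end = 0;
        while (end < part.length() && Character.isDigit(part.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(part.substring(0, end));
    }

    // Assumption: the jar manifest exposes Implementation-Version; returns null otherwise.
    static String detectedVersion() {
        try {
            // initialize=false: load the class without running static initializers.
            Class<?> c = Class.forName("oshi.SystemInfo", false,
                    OshiVersionCheck.class.getClassLoader());
            Package p = c.getPackage();
            return p == null ? null : p.getImplementationVersion();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String version = detectedVersion();
        if (version == null) {
            System.out.println("OSHI version could not be determined");
        } else {
            System.out.println("OSHI " + version + " is above 5.3.1: "
                    + (compareVersions(version, "5.3.1") > 0));
        }
    }
}
```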

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
maobaolong commented, Nov 14, 2020

@apc999 We use jdk8

1 reaction
bradyoo commented, Nov 13, 2020

It appears that @liupan664021 has figured out the issue and made a fix. The commit is good so I just merged it.
