Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Version: 6.2.3
Description
We got signal 11 (SIGSEGV) in ExeMgr on the READING nodes, and the cluster got stuck.
This is the trace:
Date/time: 2022-04-06 08:42:26
Signal: 11
/usr/bin/ExeMgr(+0x282ac)[0x562c12d5e2ac]
/lib64/libpthread.so.0(+0xf630)[0x7fee7b4dc630]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition15writeByteStreamEiRN11messageqcpp10ByteStreamE+0xe5)[0x7fee7f1bf245]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition18processLargeBufferERN8rowgroup6RGDataE+0x200)[0x7fee7f1bfd30]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition18processLargeBufferEv+0x16)[0x7fee7f1bff06]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition22doneInsertingLargeDataEv+0xcd)[0x7fee7f1c01bd]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition22doneInsertingLargeDataEv+0x55)[0x7fee7f1c0145]
/lib64/libjoblist.so(_ZN7joblist12DiskJoinStep11largeReaderEv+0x117)[0x7fee7fe59487]
/lib64/libjoblist.so(_ZN7joblist12DiskJoinStep10mainRunnerEv+0x5a)[0x7fee7fe5966a]
/lib64/libthreadpool.so(_ZN10threadpool10ThreadPool11beginThreadEv+0x5a8)[0x7fee7a80d488]
/lib64/libboost_thread-mt.so.1.53.0(+0xd25a)[0x7fee7c37e25a]
/lib64/libpthread.so.0(+0x7ea5)[0x7fee7b4d4ea5]
/lib64/libc.so.6(clone+0x6d)[0x7fee794cdb0d]
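For triage, it can help to split each frame of a trace like the one above into its module, symbol, and offset, and to find the first frame that carries a symbol name. The following is a minimal sketch, assuming the frames follow the glibc `backtrace_symbols` layout `module(symbol+0xoffset)[0xaddress]` seen in this trace. The helper names `parse_frame` and `first_named_frame` are hypothetical and are not part of ColumnStore.

```python
import re

# Assumed frame layout (glibc backtrace_symbols style):
#   module(symbol+0xoffset)[0xaddress]
# The symbol part is empty when the binary exposes no symbol for the frame.
FRAME_RE = re.compile(
    r"^(?P<module>[^(\s]+)"
    r"\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict describing one frame, or None if the line is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "module": match.group("module"),
        "symbol": match.group("symbol") or None,  # None for unnamed frames
        "offset": int(match.group("offset"), 16),
        "address": int(match.group("address"), 16),
    }


def first_named_frame(lines):
    """Return the first frame that has a symbol name, or None."""
    for line in lines:
        frame = parse_frame(line)
        if frame is not None and frame["symbol"]:
            return frame
    return None


# The first three frames, copied from the trace in this report.
SAMPLE = [
    "/usr/bin/ExeMgr(+0x282ac)[0x562c12d5e2ac]",
    "/lib64/libpthread.so.0(+0xf630)[0x7fee7b4dc630]",
    "/lib64/libjoiner.so(_ZN6joiner13JoinPartition15writeByteStreamEiRN11messageqcpp10ByteStreamE+0xe5)[0x7fee7f1bf245]",
]

top = first_named_frame(SAMPLE)
print(top["module"], top["symbol"], hex(top["offset"]))
```

In this trace the first two frames are most likely the signal handler in ExeMgr and the libpthread signal trampoline, so the first named frame is where the fault occurred. Passing its mangled symbol through a demangler such as `c++filt` gives `joiner::JoinPartition::writeByteStream(int, messageqcpp::ByteStream&)`.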
And this is the log:
Apr 6 08:42:27 pixid-csx1 systemd: mcs-exemgr.service: main process exited, code=killed, status=11/SEGV
Apr 6 08:42:27 pixid-csx1 systemd: Unit mcs-exemgr.service entered failed state.
Apr 6 08:42:27 pixid-csx1 systemd: mcs-exemgr.service failed.
Apr 6 08:42:27 pixid-csx1 systemd: mcs-exemgr.service holdoff time over, scheduling restart.
Apr 6 08:42:27 pixid-csx1 systemd: Cannot add dependency job for unit systemd-tmpfiles-clean.timer, ignoring: Unit is masked.
Apr 6 08:42:27 pixid-csx1 systemd: Stopping WriteEngineServer...
Apr 6 08:42:27 pixid-csx1 systemd: Stopped WriteEngineServer.
Apr 6 08:42:27 pixid-csx1 systemd: Stopped mcs-exemgr.
Apr 6 08:42:27 pixid-csx1 systemd: Starting mcs-exemgr...