[MCOL-5042] ExeMgr crashes running disk-based JOIN Created: 2022-04-06  Updated: 2022-06-07  Resolved: 2022-04-13

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 6.2.3
Fix Version/s: 6.4.1

Type: Bug Priority: Major
Reporter: Massimo Assignee: Roman
Resolution: Fixed Votes: 4
Labels: triage


 Description   

We got signal 11 (SIGSEGV) on the READING nodes and the cluster got stuck.
This is the trace:

Date/time: 2022-04-06 08:42:26
Signal: 11

/usr/bin/ExeMgr(+0x282ac)[0x562c12d5e2ac]
/lib64/libpthread.so.0(+0xf630)[0x7fee7b4dc630]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition15writeByteStreamEiRN11messageqcpp10ByteStreamE+0xe5)[0x7fee7f1bf245]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition18processLargeBufferERN8rowgroup6RGDataE+0x200)[0x7fee7f1bfd30]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition18processLargeBufferEv+0x16)[0x7fee7f1bff06]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition22doneInsertingLargeDataEv+0xcd)[0x7fee7f1c01bd]
/lib64/libjoiner.so(_ZN6joiner13JoinPartition22doneInsertingLargeDataEv+0x55)[0x7fee7f1c0145]
/lib64/libjoblist.so(_ZN7joblist12DiskJoinStep11largeReaderEv+0x117)[0x7fee7fe59487]
/lib64/libjoblist.so(_ZN7joblist12DiskJoinStep10mainRunnerEv+0x5a)[0x7fee7fe5966a]
/lib64/libthreadpool.so(_ZN10threadpool10ThreadPool11beginThreadEv+0x5a8)[0x7fee7a80d488]
/lib64/libboost_thread-mt.so.1.53.0(+0xd25a)[0x7fee7c37e25a]
/lib64/libpthread.so.0(+0x7ea5)[0x7fee7b4d4ea5]
/lib64/libc.so.6(clone+0x6d)[0x7fee794cdb0d]
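For triage, the frames above can be split into module, mangled symbol, offset, and address with a short script. The following is a minimal sketch, not part of ColumnStore: the helper names `parse_frame` and `qualified_name` are hypothetical, and the decoder only handles simple nested names of the form `_ZN<len><id>...E` (no templates or substitutions, and it ignores parameter types). A full demangler such as `c++filt` is the better tool when one is available.

```python
import re

# One glibc-style backtrace frame: module(symbol+0xoffset)[0xaddress].
# The symbol part is empty when the frame has no exported symbol.
FRAME_RE = re.compile(
    r"^(?P<module>[^()\s]+)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict with module, symbol, offset, address, or None if no match."""
    match = FRAME_RE.match(line.strip())
    return match.groupdict() if match else None


def qualified_name(symbol):
    """Decode only the nested name of an Itanium-ABI symbol (_ZN<len><id>...E).

    Returns None for anything this simplified decoder does not understand.
    """
    if not symbol.startswith("_ZN"):
        return None
    pos, parts = 3, []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    # The nested name must be terminated by 'E'.
    if pos >= len(symbol) or symbol[pos] != "E" or not parts:
        return None
    return "::".join(parts)


frame = parse_frame(
    "/lib64/libjoblist.so(_ZN7joblist12DiskJoinStep11largeReaderEv+0x117)[0x7fee7fe59487]"
)
print(frame["module"], qualified_name(frame["symbol"]))
# prints: /lib64/libjoblist.so joblist::DiskJoinStep::largeReader
```

Applied to the topmost library frame, this yields `joiner::JoinPartition::writeByteStream`, the function in which the fault occurred.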

This is the system log from the same time:

Apr 6 08:42:27 pixid-csx1 systemd: mcs-exemgr.service: main process exited, code=killed, status=11/SEGV
Apr 6 08:42:27 pixid-csx1 systemd: Unit mcs-exemgr.service entered failed state.
Apr 6 08:42:27 pixid-csx1 systemd: mcs-exemgr.service failed.
Apr 6 08:42:27 pixid-csx1 systemd: mcs-exemgr.service holdoff time over, scheduling restart.
Apr 6 08:42:27 pixid-csx1 systemd: Cannot add dependency job for unit systemd-tmpfiles-clean.timer, ignoring: Unit is masked.
Apr 6 08:42:27 pixid-csx1 systemd: Stopping WriteEngineServer...
Apr 6 08:42:27 pixid-csx1 systemd: Stopped WriteEngineServer.
Apr 6 08:42:27 pixid-csx1 systemd: Stopped mcs-exemgr.
Apr 6 08:42:27 pixid-csx1 systemd: Starting mcs-exemgr...



 Comments   
Comment by Roman [ 2022-04-13 ]

ERROR 1815 (HY000): Internal error: InetStreamSocket::readToMagic: Remote is closed
This line indicates that ExeMgr (EM) crashed and the plugin lost its connection to it.

Generated at Thu Feb 08 02:54:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.