[MCOL-5265] TupleHashJoin::segregateJoiners double throws causing SIGABRT Created: 2022-10-13  Updated: 2023-11-17  Resolved: 2022-11-15

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 6.3.1, 6.4.5-dompe
Fix Version/s: 22.08.8

Type: Bug Priority: Major
Reporter: Roman Assignee: Denis Khalikov
Resolution: Fixed Votes: 0
Labels: triage

Attachments: CSX 202210120851.PNG, Pixid_CSX1_logs.tar.gz, Pixid_CSX2_logs.tar.gz, Pixid_CSX3_logs.tar.gz, libjoblist.so, stuck_queries_on_Pixid.txt
Issue Links:
Blocks
is blocked by MCOL-5308 debug build : line numbers , "-O0",... Closed

 Description   

Here is another crash trace from a client.

Date/time: 2022-10-05 04:05:39
Signal: 6
 
/usr/bin/ExeMgr(+0x2840c)[0x555c5010c40c]
/lib64/libpthread.so.0(+0xf630)[0x7f9a185eb630]
/lib64/libc.so.6(gsignal+0x37)[0x7f9a16513387]
/lib64/libc.so.6(abort+0x148)[0x7f9a16514a78]
/lib64/libc.so.6(+0x2f1a6)[0x7f9a1650c1a6]
/lib64/libc.so.6(+0x2f252)[0x7f9a1650c252]
/lib64/libjoblist.so(+0x22e273)[0x7f9a1d075273]
/lib64/libjoblist.so(_ZN7joblist17TupleHashJoinStep16segregateJoinersEv+0x6b9)[0x7f9a1d078609]
/lib64/libjoblist.so(_ZN7joblist17TupleHashJoinStep8hjRunnerEv+0x77c)[0x7f9a1d081d2c]
/lib64/libthreadpool.so(_ZN10threadpool10ThreadPool11beginThreadEv+0x5a8)[0x7f9a1791b7c8]
/lib64/libboost_thread-mt.so.1.53.0(+0xd25a)[0x7f9a1948d25a]
/lib64/libpthread.so.0(+0x7ea5)[0x7f9a185e3ea5]
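
The mangled frame names in the trace above can be decoded programmatically. The following is a minimal sketch, assuming a GCC or Clang toolchain (Itanium C++ ABI); `demangle` is a hypothetical helper written for illustration and is not part of ColumnStore.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled)
{
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with malloc.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr)
    return mangled;

  std::string result(out);
  std::free(out);
  return result;
}
```

Calling `demangle("_ZN7joblist17TupleHashJoinStep16segregateJoinersEv")` yields `joblist::TupleHashJoinStep::segregateJoiners()`, which is the frame discussed in the comments.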

Presumably ExeMgr (EM) throws an exception while it is already processing another exception, which would end in abort() and the signal 6 shown above.
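
As a generic illustration of that hypothesis (not the actual ColumnStore code and not the fix that was applied), the sketch below shows the hazard in isolation. In C++, if a second exception escapes while the stack is being unwound for a first one, the runtime calls std::terminate(), whose default handler calls abort() and raises SIGABRT. The names `Cleanup` and `runStep` are hypothetical, and C++17 is assumed for `std::uncaught_exceptions()`.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical cleanup object, used only to illustrate the hazard.
class Cleanup
{
 public:
  explicit Cleanup(std::string& log) : log_(log), pending_(std::uncaught_exceptions())
  {
  }

  // Destructors are implicitly noexcept; noexcept(false) is spelled out so that
  // the throwing path is legal when no unwinding is in progress.
  ~Cleanup() noexcept(false)
  {
    if (std::uncaught_exceptions() > pending_)
    {
      // Unwinding is in progress: throwing here would call std::terminate(),
      // so the secondary error is recorded instead of thrown.
      log_ += "secondary error suppressed during unwinding; ";
      return;
    }
    throw std::runtime_error("cleanup failed");
  }

 private:
  std::string& log_;
  int pending_;  // number of exceptions already in flight at construction
};

// Runs a step guarded by Cleanup and reports what was observed.
std::string runStep(bool bodyThrows)
{
  std::string log;
  try
  {
    Cleanup guard(log);
    if (bodyThrows)
      throw std::runtime_error("primary failure");
  }
  catch (const std::exception& e)
  {
    log += std::string("caught: ") + e.what();
  }
  return log;
}
```

With `runStep(true)` the primary exception is caught and the cleanup error is only logged. Removing the `std::uncaught_exceptions()` check would make the same call terminate the process, which matches the kind of abort suspected in this ticket.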



 Comments   
Comment by Roman [ 2022-11-01 ]

Thanks to allen.herrera for the research on this crash trace.
This line

/lib64/libjoblist.so(_ZN7joblist17TupleHashJoinStep16segregateJoinersEv+0x6b9)[0x7f9a1d078609]

points to /usr/include/boost/thread/lock_types.hpp:327
The line above the one mentioned in the stack points to this line in our code:

  for (i = 0; i < feIndexes.size() && joiners.size() > 0; i++)
    joiners[feIndexes[i]]->setFcnExpFilter(fe[i]);
 
  /* segregate the Joiners into ones for TBPS and ones for DJS */
  segregateJoiners();
 
  /* Need to clean this stuff up.  If the query was cancelled before this, and this would have had
     a disk join, it's still necessary to construct the DJS objects to finish the abort.
     Update: Is this more complicated than scanning joiners for either ondisk() or (not isFinished())
     and draining the corresponding inputs & telling downstream EOF?  todo, think about it */
  if (!djsJoiners.empty())  // <-- Line 720
  {
    joinIsTooBig = false;
 
    if (!cancelled())
      fLogger->logMessage(logging::LOG_TYPE_INFO, logging::INFO_SWITCHING_TO_DJS);
 
    uint32_t smallSideCount = djsJoiners.size();

Comment by alexey vorovich (Inactive) [ 2022-11-15 ]

denis0x0D MTR failed for this on some distro: https://ci.columnstore.mariadb.net/mariadb-corporation/mariadb-columnstore-engine/6039/7/10

Most likely unrelated. Please confirm.

drrtuy the question is whether we want to deliver this to Pixid as a private build when we are not sure that this is the cause.

Generated at Thu Feb 08 02:56:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.