  MariaDB ColumnStore / MCOL-5265

TupleHashJoin::segregateJoiners double throws causing SIGABRT

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.3.1, 6.4.5-dompe
    • Fix Version/s: 22.08.8
    • Component/s: ExeMgr

    Description

      Here is another crash trace from a client.

      Date/time: 2022-10-05 04:05:39
      Signal: 6
       
      /usr/bin/ExeMgr(+0x2840c)[0x555c5010c40c]
      /lib64/libpthread.so.0(+0xf630)[0x7f9a185eb630]
      /lib64/libc.so.6(gsignal+0x37)[0x7f9a16513387]
      /lib64/libc.so.6(abort+0x148)[0x7f9a16514a78]
      /lib64/libc.so.6(+0x2f1a6)[0x7f9a1650c1a6]
      /lib64/libc.so.6(+0x2f252)[0x7f9a1650c252]
      /lib64/libjoblist.so(+0x22e273)[0x7f9a1d075273]
      /lib64/libjoblist.so(_ZN7joblist17TupleHashJoinStep16segregateJoinersEv+0x6b9)[0x7f9a1d078609]
      /lib64/libjoblist.so(_ZN7joblist17TupleHashJoinStep8hjRunnerEv+0x77c)[0x7f9a1d081d2c]
      /lib64/libthreadpool.so(_ZN10threadpool10ThreadPool11beginThreadEv+0x5a8)[0x7f9a1791b7c8]
      /lib64/libboost_thread-mt.so.1.53.0(+0xd25a)[0x7f9a1948d25a]
      /lib64/libpthread.so.0(+0x7ea5)[0x7f9a185e3ea5]
      

      Presumably ExeMgr (EM) throws an exception while it is already processing another exception.

      Attachments

        1. CSX 202210120851.PNG
          25 kB
        2. libjoblist.so
          3.01 MB
        3. Pixid_CSX1_logs.tar.gz
          452 kB
        4. Pixid_CSX2_logs.tar.gz
          294 kB
        5. Pixid_CSX3_logs.tar.gz
          577 kB
        6. stuck_queries_on_Pixid.txt
          0.9 kB

          Activity

            drrtuy Roman added a comment -

            Thanks to allen.herrera for researching this crash trace.
            This line

            /lib64/libjoblist.so(_ZN7joblist17TupleHashJoinStep16segregateJoinersEv+0x6b9)[0x7f9a1d078609]
            

            points to /usr/include/boost/thread/lock_types.hpp:327
            The calling frame, TupleHashJoinStep::hjRunner, points to this line in our code:

            for (i = 0; i < feIndexes.size() && joiners.size() > 0; i++)
                joiners[feIndexes[i]]->setFcnExpFilter(fe[i]);

            /* segregate the Joiners into ones for TBPS and ones for DJS */
            segregateJoiners();

            /* Need to clean this stuff up.  If the query was cancelled before this, and this would have had
               a disk join, it's still necessary to construct the DJS objects to finish the abort.
               Update: Is this more complicated than scanning joiners for either ondisk() or (not isFinished())
               and draining the corresponding inputs & telling downstream EOF?  todo, think about it */
            if (!djsJoiners.empty())  // <-- line 720
            {
                joinIsTooBig = false;

                if (!cancelled())
                    fLogger->logMessage(logging::LOG_TYPE_INFO, logging::INFO_SWITCHING_TO_DJS);

                uint32_t smallSideCount = djsJoiners.size();
                // ...
            


            alexey.vorovich alexey vorovich (Inactive) added a comment -

            denis0x0D MTR failed on this https://ci.columnstore.mariadb.net/mariadb-corporation/mariadb-columnstore-engine/6039/7/10 on some distro; most likely unrelated. Please confirm.

            drrtuy, the question is whether we want to deliver this to Pixid as a private build when we are not sure this is the cause.

            People

              denis0x0D Denis Khalikov (Inactive)
              drrtuy Roman
              Votes: 0
              Watchers: 7

