[MCOL-5559] Shmem segment remap causes SEGV in ExtentMapIndexImpl::find Created: 2023-08-21  Updated: 2023-11-15  Resolved: 2023-11-01

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 23.02.3
Fix Version/s: 23.10.1, 23.10.0

Type: Bug Priority: Critical
Reporter: Allen Herrera Assignee: Roman
Resolution: Fixed Votes: 0
Labels: None
Environment:

HA 3 node redhat 8.7
Each node: 48 core, 750GB ram
NFS shared storage


Issue Links:
Duplicate
is duplicated by MCOL-5488 Shmem RWLock is not strict enough for... Closed
Relates
relates to MCOL-5488 Shmem RWLock is not strict enough for... Closed
relates to MCOL-5565 Queries stuck in MDB waiting for an a... In Progress
relates to MCOL-5487 A race in BRM causes SEGV working wit... Closed
Sprint: 2023-8, 2023-10, 2023-11
Assigned for Review: Gagan Goel (Inactive)
Assigned for Testing: Allen M Herrera

 Description   

See Developer Comments for the types of queries running and logs.
Core dump enabled; will share one once available (8/24/2023: the segfault still hasn't reoccurred).

This is currently happening every night: queries hang, requiring the cluster to be restarted to continue with the nightly ETL (daily aggregation). The only red flag found so far was the following crash trace, seen once.

We need to:
1) prevent queries from hanging
2) prevent the seg fault
3) have the cluster recover when one PrimProc restarts

Crash Trace 1

Date/time: 2023-08-20 20:39:11
Signal: 11
 
/usr/bin/PrimProc(+0xb70f6)[0x556a0ed230f6]
/lib64/libpthread.so.0(+0x12cf0)[0x7fcaa8c5dcf0]
/lib64/libbrm.so(_ZN5boost9unordered13unordered_mapIiNS1_IjNS_9container6vectorIlNS_12interprocess9allocatorIlNS4_15segment_managerIcNS4_15rbtree_best_fitINS4_12mutex_familyENS4_10offset_ptrIvlmLm0EEELm0EEENS4_10iset_indexEEEEEvEENS_4hashIjEESt8equal_toIjENS5_ISt4pairIKjSF_ESD_EEEENSG_IiEESI_IiENS5_ISK_IKiSO_ESD_EEE4findERSR_+0x15b)[0x7fcaa943040b]
/lib64/libbrm.so(_ZN3BRM18ExtentMapIndexImpl14search2ndLayerERN5boost9unordered13unordered_mapIiNS3_IjNS1_9container6vectorIlNS1_12interprocess9allocatorIlNS6_15segment_managerIcNS6_15rbtree_best_fitINS6_12mutex_familyENS6_10offset_ptrIvlmLm0EEELm0EEENS6_10iset_indexEEEEEvEENS1_4hashIjEESt8equal_toIjENS7_ISt4pairIKjSH_ESF_EEEENSI_IiEESK_IiENS7_ISM_IKiSQ_ESF_EEEEi+0x49)[0x7fcaa9416ce9]
/lib64/libbrm.so(_ZN3BRM18ExtentMapIndexImpl4findEti+0x77)[0x7fcaa9417b57]
/lib64/libbrm.so(_ZN3BRM9ExtentMap10getExtentsEiRSt6vectorINS_7EMEntryESaIS2_EEbbb+0xe0)[0x7fcaa9423e10]
/lib64/libbrm.so(_ZN3BRM4DBRM10getExtentsEiRSt6vectorINS_7EMEntryESaIS2_EEbbb+0x23)[0x7fcaa9404883]
/lib64/libjoblist.so(_ZN7joblist15pDictionaryScanC1EiiRKN8execplan20CalpontSystemCatalog7ColTypeERKNS_7JobInfoE+0x32a)[0x7fcaaa17737a]
/lib64/libjoblist.so(+0x1652a7)[0x7fcaaa0db2a7]
/lib64/libjoblist.so(_ZN7joblist21JLF_ExecPlanToJobList8walkTreeEPN8execplan9ParseTreeERNS_7JobInfoE+0x212)[0x7fcaaa0dea42]
/lib64/libjoblist.so(_ZN7joblist21JLF_ExecPlanToJobList8walkTreeEPN8execplan9ParseTreeERNS_7JobInfoE+0x606)[0x7fcaaa0dee36]
/lib64/libjoblist.so(+0x1cac4f)[0x7fcaaa140c4f]
/lib64/libjoblist.so(_ZN7joblist12makeJobStepsEPN8execplan26CalpontSelectExecutionPlanERNS_7JobInfoERSt6vectorIN5boost10shared_ptrINS_7JobStepEEESaIS9_EESC_RSt3mapIiS9_St4lessIiESaISt4pairIKiS9_EEE+0x249)[0x7fcaaa143b89]
/lib64/libjoblist.so(+0x1cfcfd)[0x7fcaaa145cfd]
/lib64/libjoblist.so(_ZN7joblist14JobListFactory11makeJobListEPN8execplan20CalpontExecutionPlanEPNS_15ResourceManagerERK26PrimitiveServerThreadPoolsbb+0x62)[0x7fcaaa146142]
/usr/bin/PrimProc(+0xb2163)[0x556a0ed1e163]
/lib64/libthreadpool.so(_ZN10threadpool10ThreadPool11beginThreadEv+0x615)[0x7fcaa89f2ad5]
/usr/bin/PrimProc(+0xb8bd7)[0x556a0ed24bd7]
/lib64/libpthread.so.0(+0x81ca)[0x7fcaa8c531ca]
/lib64/libc.so.6(clone+0x43)[0x7fcaa7668e73]

Debugging Trace 1

LBID_tFindResult ExtentMapIndexImpl::find(const DBRootT dbroot, const OID_t oid)
{
  ExtentMapIndex& emIndex = *get();
  if (dbroot >= emIndex.size())
    return {};
  return search2ndLayer(emIndex[dbroot], oid);
}
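The crash pattern above is consistent with a stale base pointer: `get()` returns the address of the `ExtentMapIndex` inside a managed shared-memory segment, and if another process grows the segment it can be remapped at a different address, leaving any cached reference dangling. A minimal sketch of the hazard and the re-fetch-under-lock pattern, with a heap buffer standing in for the shared segment (all names are illustrative, not the BRM implementation):

```cpp
#include <cstdlib>
#include <cstring>
#include <mutex>

// Hypothetical stand-in for a managed shared-memory segment that may be
// remapped (modelled here with realloc) when it grows.
struct Segment
{
    char* base = nullptr;
    std::size_t size = 0;
    std::mutex lock;  // stands in for the shmem RWLock

    void grow(std::size_t newSize)
    {
        // realloc may move the block, invalidating every raw pointer
        // previously derived from `base` -- the analogue of a shmem remap.
        base = static_cast<char*>(std::realloc(base, newSize));
        size = newSize;
    }
    ~Segment() { std::free(base); }
};

// Safe pattern: re-fetch the base under the lock on every access instead of
// caching a reference across an operation that can remap the segment.
int readFirstInt(Segment& seg)
{
    std::lock_guard<std::mutex> guard(seg.lock);
    int value;
    std::memcpy(&value, seg.base, sizeof value);  // fresh base, never stale
    return value;
}

// Demonstrates that a value written before a remap is still reachable when
// the base is re-fetched, whereas a pointer cached before grow() dangles.
int demoRefetchAfterRemap()
{
    Segment seg;
    seg.grow(sizeof(int));
    const int v = 42;
    std::memcpy(seg.base, &v, sizeof v);

    char* stale = seg.base;  // cached raw pointer...
    seg.grow(1 << 20);       // ...may dangle after this "remap"
    (void)stale;             // dereferencing `stale` now would be UB

    return readFirstInt(seg);  // re-fetched base: still reads 42
}
```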

Messages File 1

 
# On Node 1
Aug 20 20:39:44 atx-mdb101pl messagequeue[516171]: 44.604637 |0|0|0| W 31 CAL0000: Client read close socket for InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer        %%10%%
Aug 20 20:39:44 atx-mdb101pl env[516171]: DEC Caught EXCEPTION: InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
 
# On Node 2
Aug 20 20:39:44 atx-mdb102pl messagequeue[3758352]: 44.591925 |0|0|0| W 31 CAL0000: MessageQueueClient::write: error writing 16 bytes to IOSocket: sd: 24 inet: 10.224.140.32 port: 8620. Socket error was InetStreamSocket::write error: Broken pipe -- write from InetStreamSocket: sd: 24 inet: 10.224.140.32 port: 8620#012         %%10%%
 
# On Node 3
Aug 20 20:39:11 atx-mdb103pl systemd[1]: Started Process Core Dump (PID 1287119/UID 0).
Aug 20 20:39:11 atx-mdb103pl systemd-coredump[1287121]: Resource limits disable core dumping for process 792020 (PrimProc).
Aug 20 20:39:11 atx-mdb103pl systemd-coredump[1287121]: Process 792020 (PrimProc) of user 993 dumped core.

Workload

The customer ingests 3TB of raw data a day (over 2.6 billion records); they batch the data to be imported hourly into each table. At night they aggregate all the hourly data into a daily table to summarize and reduce the data footprint. This topic/query is split into 48 parts, both to optimize for extent elimination based on date and lat/long search criteria and to divide the data so the GROUP BY calculation does not OOM, since there are so many distinct values in the grouping criteria.
Parts 0 to 13 are loaded on the first node, parts 14 to 26 on the second node, and parts 27 to 47 on the third node. The parts are not equally distributed because they represent geographical areas of a map (imagine a lat/long grid).
 
high level example:
sudo -u mysql ${MCSMYSQL} -qs ${DATABASE} -N -e"${DAILY_SQL}" > ${TOPIC}_DAILY.tbl
sudo -u mysql ${CPIMPORT} -m 3 -j1337
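The part-to-node layout described above (parts 0 to 13 on node 1, 14 to 26 on node 2, 27 to 47 on node 3) can be expressed as a small lookup, which is handy when tracing which node's PrimProc served a given part. This is a hypothetical helper, not customer code:

```cpp
// Maps a daily-aggregation part number to the node that stores it,
// following the layout described in the workload notes:
//   parts 0-13 -> node 1, parts 14-26 -> node 2, parts 27-47 -> node 3.
int nodeForPart(int part)
{
    if (part < 0 || part > 47)
        return -1;  // outside the 48-part range
    if (part <= 13)
        return 1;
    if (part <= 26)
        return 2;
    return 3;
}
```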

Crash Trace 2:
While this occurred, one query was running on node 2 and a couple of cpimport jobs were running across all nodes.

Date/time: 2023-08-29 08:34:04
Signal: 11
 
/usr/bin/PrimProc(+0xb70f6)[0x561b2fe610f6]
/lib64/libpthread.so.0(+0x12cf0)[0x7f0304045cf0]
/lib64/libbrm.so(_ZN3BRM9ExtentMap10findByLBIDEl+0x185)[0x7f0304801045]
/lib64/libbrm.so(_ZN3BRM9ExtentMap18getEmIdentsByLbidsERKN5boost9container6vectorIlvvEE+0x1e4)[0x7f0304801974]
/lib64/libbrm.so(_ZN3BRM9ExtentMap10getExtentsEiRSt6vectorINS_7EMEntryESaIS2_EEbbb+0x11c)[0x7f030480be4c]
/lib64/libbrm.so(_ZN3BRM4DBRM10getExtentsEiRSt6vectorINS_7EMEntryESaIS2_EEbbb+0x23)[0x7f03047ec883]
/lib64/libjoblist.so(_ZN7joblist8pColStepC2EiiRKN8execplan20CalpontSystemCatalog7ColTypeERKNS_7JobInfoE+0x41e)[0x7f03055583ae]
/lib64/libjoblist.so(+0x163f30)[0x7f03054c1f30]
/lib64/libjoblist.so(_ZN7joblist21JLF_ExecPlanToJobList8walkTreeEPN8execplan9ParseTreeERNS_7JobInfoE+0x212)[0x7f03054c6a42]
/lib64/libjoblist.so(+0x1cac4f)[0x7f0305528c4f]
/lib64/libjoblist.so(_ZN7joblist12makeJobStepsEPN8execplan26CalpontSelectExecutionPlanERNS_7JobInfoERSt6vectorIN5boost10shared_ptrINS_7JobStepEEESaIS9_EESC_RSt3mapIiS9_St4lessIiESaISt4pairIKiS9_EEE+0x249)[0x7f030552bb89]
/lib64/libjoblist.so(+0x1cfcfd)[0x7f030552dcfd]
/lib64/libjoblist.so(_ZN7joblist14JobListFactory11makeJobListEPN8execplan20CalpontExecutionPlanEPNS_15ResourceManagerERK26PrimitiveServerThreadPoolsbb+0x62)[0x7f030552e142]
/lib64/libexecplan.so(_ZN8execplan20CalpontSystemCatalog13getSysData_ECERNS_26CalpontSelectExecutionPlanERNS0_14NJLSysDataListERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x98)[0x7f0304f15258]
/lib64/libexecplan.so(_ZN8execplan20CalpontSystemCatalog10getSysDataERNS_26CalpontSelectExecutionPlanERNS0_14NJLSysDataListERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x8b1)[0x7f0304f161b1]
/lib64/libexecplan.so(_ZN8execplan20CalpontSystemCatalog9getTablesENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEi+0xb6d)[0x7f0304f1cb5d]
/lib64/libexecplan.so(_ZN8execplan20CalpontSystemCatalog13getSchemaInfoERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEi+0x28d)[0x7f0304f2ae9d]
/usr/bin/PrimProc(+0xb05e1)[0x561b2fe5a5e1]
/usr/bin/PrimProc(+0xb0653)[0x561b2fe5a653]
/usr/bin/PrimProc(+0xb0653)[0x561b2fe5a653]
/usr/bin/PrimProc(+0xb30f2)[0x561b2fe5d0f2]
/lib64/libthreadpool.so(_ZN10threadpool10ThreadPool11beginThreadEv+0x615)[0x7f0303ddaad5]
/usr/bin/PrimProc(+0xb8bd7)[0x561b2fe62bd7]
/lib64/libpthread.so.0(+0x81ca)[0x7f030403b1ca]
/lib64/libc.so.6(clone+0x43)[0x7f0302a50e73]

Debugging Trace 2:
/usr/src/debug/MariaDB-/src_0/storage/columnstore/columnstore/.boost/boost-lib/include/boost/intrusive/bstree_algorithms.hpp:2034

   template<class KeyType, class KeyNodePtrCompare>
   static node_ptr lower_bound_loop
      (node_ptr x, node_ptr y, const KeyType &key, KeyNodePtrCompare comp)
   {
      while(x){
         if(comp(x, key)){                 <----------- line 2034
            x = NodeTraits::get_right(x);
         }
         else{
            y = x;
            x = NodeTraits::get_left(x);
         }
      }
      return y;
   }
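The tree walk above chases offset-based links inside the shared segment; the offsets themselves survive a remap, but any raw pointer the search started from (such as a cached segment base) does not. A minimal sketch of the difference, using a buffer copied to a new address to model the remap (names are illustrative and not the `boost::interprocess::offset_ptr` implementation):

```cpp
#include <cstddef>
#include <vector>

// Illustrative offset-based "pointer": stores a distance from the segment
// base instead of an absolute address, in the spirit of offset_ptr.
struct OffsetRef
{
    std::ptrdiff_t offset;
    int* resolve(char* base) const
    {
        return reinterpret_cast<int*>(base + offset);
    }
};

// Writes a value into a segment, "remaps" the segment by copying it to a
// different address, then reads the value back through the offset ref.
int demoReadAfterRemap()
{
    std::vector<char> segment(64, 0);
    int* slot = reinterpret_cast<int*>(segment.data() + 16);
    *slot = 7;

    OffsetRef ref{16};    // position-independent link
    int* staleRaw = slot; // address-dependent link
    (void)staleRaw;       // using this against the remapped copy is wrong

    std::vector<char> remapped(segment);  // new address, same contents

    return *ref.resolve(remapped.data()); // the offset still resolves
}
```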



 Comments   
Comment by Roman [ 2023-10-20 ]

The patch has been merged into both the develop-23.02 and develop branches. It was released with 23.10.0 but has not been released for the 23.02 branch.
If you have a way to reproduce this, allen.herrera, you can try the patch in 23.10.0.
If not, I will proceed and close this as resolved.

Comment by Allen Herrera [ 2023-11-15 ]

drrtuy please confirm:
This ticket (5559) was closed because the crash trace for the described seg fault was never explicitly reproduced, but the attached crash trace file and backtrace were clear enough to produce a fix from a theoretical code-flow view.

The hanging queries require other changes, separate from this issue.
We do have a reproduction for that, and all the work for the hanging queries and PrimProc restarts will be tracked in MCOL-5565.

Generated at Thu Feb 08 02:58:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.