MariaDB ColumnStore / MCOL-5368

PrimProc eventually fails on a slave node (Docker)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: cmapi
    • Labels: None

    Description

      Steps to reproduce:
      1. Build and start a 3-node cluster, with or without MaxScale (MXS). The build and verification pass (green).
      2. Open a shell on slave node 2 with docker exec -it mcs2 bash (in my runs the failing node is mcs2 every time).
      3. Check the process list (ps aux or mcs cluster status); all MCS processes should be present.
      4. Wait 1-2 minutes without doing anything to the cluster.
      5. Check the process list again; the PrimProc process is now gone.
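Steps 3-5 can be scripted so the check is repeatable. This is a minimal Python sketch, assuming the container is named mcs2 (as in the report), the docker CLI is on PATH, and ps is available inside the container (the report already uses ps aux there). The helpers primproc_running and reproduce are hypothetical names for illustration; the command runner and the sleep function are injectable so the logic can be exercised without Docker.

```python
import subprocess
import time
from typing import Callable, Tuple


def primproc_running(container: str = "mcs2",
                     run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True if a PrimProc process shows up in `ps aux` inside the container."""
    # Same check as step 3, executed from the host via docker exec.
    result = run(["docker", "exec", container, "ps", "aux"],
                 capture_output=True, text=True, check=True)
    return any("PrimProc" in line for line in result.stdout.splitlines())


def reproduce(container: str = "mcs2",
              wait_seconds: float = 120,
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
              sleep: Callable[[float], None] = time.sleep) -> Tuple[bool, bool]:
    """Steps 3-5: check, leave the cluster idle, check again.

    Returns (running_before_wait, running_after_wait).
    """
    before = primproc_running(container, run)
    sleep(wait_seconds)  # step 4: do nothing with the cluster for ~2 minutes
    after = primproc_running(container, run)
    return before, after


if __name__ == "__main__":
    before, after = reproduce()
    print(f"PrimProc before wait: {before}, after wait: {after}")
```

A result of (True, False) corresponds to the failure described above.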

      After PrimProc disappeared, I found the following additional information
      in /var/log/mariadb/columnstore/trace/PrimProc****:

      Date/time: 2022-12-13 16:11:25
      Signal: 11
       
      /usr/bin/PrimProc(+0xbe6c6)[0x55dacb8066c6]
      /lib64/libpthread.so.0(+0x12cf0)[0x7f29c217dcf0]
      /lib64/libjoblist.so(_ZN7joblist21DistributedEngineComm5SetupEv+0x1384)[0x7f29c3634b14]
      /lib64/libjoblist.so(_ZN7joblist21DistributedEngineComm6ListenEN5boost10shared_ptrIN11messageqcpp18MessageQueueClientEEEj+0x522)[0x7f29c3635e02]
      /lib64/libjoblist.so(+0x13b046)[0x7f29c3636046]
      /usr/bin/PrimProc(+0xc01a7)[0x55dacb8081a7]
      /lib64/libpthread.so.0(+0x81cf)[0x7f29c21731cf]
      /lib64/libc.so.6(clone+0x43)[0x7f29c0b87e73]
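Each frame in the trace has the form module(symbol+offset)[address], and the symbol part is empty when only a module-relative offset is known. A minimal Python sketch that splits such lines into fields, so that frames with a symbol name (such as the DistributedEngineComm::Setup frame right after the signal handler) can be picked out for nm/addr2line. The regex and the Frame/parse_frames names are illustrative assumptions based only on the lines shown here, not on a documented ColumnStore format.

```python
import re
from typing import List, NamedTuple, Optional

# module(symbol+0xoffset)[0xaddress]; symbol may be empty, as in "PrimProc(+0xbe6c6)".
FRAME_RE = re.compile(
    r"^(?P<module>[^(]+)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


class Frame(NamedTuple):
    module: str
    symbol: Optional[str]  # None when the frame has no symbol name
    offset: int            # offset relative to the symbol (or module if no symbol)
    address: int           # absolute runtime address


def parse_frames(trace: str) -> List[Frame]:
    """Parse backtrace lines, skipping headers such as 'Date/time:' and 'Signal:'."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m:
            continue
        frames.append(Frame(m["module"], m["symbol"] or None,
                            int(m["offset"], 16), int(m["address"], 16)))
    return frames
```

With the trace above, the first frame that carries a symbol is the libjoblist.so Setup frame at offset 0x1384, which is the one resolved below.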
      

      Using the MariaDB-columnstore-engine-debuginfo package, I resolved the faulting frame:

      nm /usr/lib/debug/usr/lib64/libjoblist.so-10.6.11_6_22.08.4-1.el8.x86_64.debug | grep _ZN7joblist21DistributedEngineComm5SetupEv
      0000000000138790 T _ZN7joblist21DistributedEngineComm5SetupEv
      00000000000acd94 t _ZN7joblist21DistributedEngineComm5SetupEv.cold
       
      0x1384 + 0x138790 = 0x139B14
       
      addr2line -f -e /lib64/libjoblist.so 0x139b14
      _ZN7joblist21DistributedEngineComm5SetupEv
      /usr/src/debug/MariaDB-/src_0/storage/columnstore/columnstore/.boost/boost-lib/include/boost/smart_ptr/shared_ptr.hpp:786
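The address passed to addr2line is the symbol's address from nm plus the offset from the backtrace (0x138790 + 0x1384 = 0x139b14). A minimal Python sketch of that arithmetic, assuming nm's default "address type name" output and a shared library whose nm addresses are load-relative (as for libjoblist.so here). The function names are hypothetical. Matching the name exactly matters: the grep above also returns the separate Setup.cold fragment at 0xacd94, which must not be used as the base.

```python
from typing import List


def symbol_file_offset(nm_output: str, symbol: str, offset: int) -> int:
    """Return the nm address of `symbol` plus the backtrace offset."""
    for line in nm_output.splitlines():
        parts = line.split()
        # nm default format: <address> <type> <name>; exact name match skips ".cold".
        if len(parts) == 3 and parts[2] == symbol:
            return int(parts[0], 16) + offset
    raise KeyError(symbol)


def addr2line_command(library: str, address: int) -> List[str]:
    """Build the addr2line invocation used in the report (-f prints the function name)."""
    return ["addr2line", "-f", "-e", library, hex(address)]
```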
      

      At the same time, I observed the following messages in debug.log.
      They seem unrelated, but I am including them anyway.

      Dec 13 16:55:49 mcs2 joblist[564]: 49.623685 |0|0|0| W 05 CAL0000: /mdb/verylongdirnameforverystrangecpackbehavior/storage/columnstore/columnstore/dbcon/joblist/distributedenginecomm.cpp @ 308 Could not connect to PMS0: Connection refused from PMS0      %%10%%
      Dec 13 16:55:49 mcs2 joblist[564]: 49.624413 |0|0|0| W 05 CAL0000: /mdb/verylongdirnameforverystrangecpackbehavior/storage/columnstore/columnstore/dbcon/joblist/distributedenginecomm.cpp @ 308 Could not connect to PMS0: Connection refused from PMS0      %%10%%
      Dec 13 16:55:49 mcs2 joblist[564]: 49.624919 |0|0|0| W 05 CAL0000: /mdb/verylongdirnameforverystrangecpackbehavior/storage/columnstore/columnstore/dbcon/joblist/distributedenginecomm.cpp @ 308 Could not connect to PMS0: Connection refused from PMS0      %%10%%
      Dec 13 16:55:49 mcs2 joblist[564]: 49.625456 |0|0|0| W 05 CAL0000: /mdb/verylongdirnameforverystrangecpackbehavior/storage/columnstore/columnstore/dbcon/joblist/distributedenginecomm.cpp @ 308 Could not connect to PMS0: Connection refused from PMS0      %%10%%
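To check whether these warnings line up with the PrimProc disappearance, it can help to count the refused connections per host and target. A minimal Python sketch, assuming the syslog-style layout of the lines above ("Mon DD HH:MM:SS host program[pid]: ..."); count_refused is a hypothetical helper, and the regex only covers the "Could not connect to X: Connection refused" message shown here.

```python
import re
from collections import Counter

# "<Mon> <DD> <HH:MM:SS> <host> ... Could not connect to <target>: Connection refused"
REFUSED_RE = re.compile(
    r"^\s*\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+(?P<host>\S+)\s"
    r".*?Could not connect to (?P<target>\S+): Connection refused",
    re.MULTILINE,
)


def count_refused(log_text: str) -> Counter:
    """Count 'Connection refused' warnings keyed by (host, target)."""
    return Counter((m["host"], m["target"]) for m in REFUSED_RE.finditer(log_text))
```

For the four lines above this yields 4 refusals from mcs2 to PMS0.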
      

          People

            Assignee: Alan Mologorsky
            Reporter: Alan Mologorsky