MariaDB ColumnStore / MCOL-5487

A race in BRM causes SEGV when working with the managed shared memory segment


Details

    • 2023-6

    Description

      The customer observes SEGV crashes in cpimport, PrimProc (PP), and DMLProc.
      According to the crash traces, all three processes hit a read/write conflict when accessing the managed shared memory segment that holds the Extent Map (EM). The cpimport traces below show the two conflicting threads: one (72888) remaps the managed shmem segment while the other (72889) reads from the virtual address range that belonged to the previous mapping of the EM shmem segment.

      Stack trace of thread 72889:
      #0  0x00007f9b5037112f _ZN3BRM9ExtentMap10findByLBIDEl (libbrm.so + 0xae12f)
      #1  0x00007f9b5037177a _ZN3BRM9ExtentMap18getEmIdentsByLbidsERKN5boost9container6vectorIlvvEE (libbrm.so + 0xae77a)
      #2  0x00007f9b50376f03 _ZN3BRM9ExtentMap11lookupLocalEijtjRl (libbrm.so + 0xb3f03)
      #3  0x00007f9b5035cf35 _ZN3BRM4DBRM11lookupLocalEijtjRl (libbrm.so + 0x99f35)
      #4  0x00007f9b512ebfc9 _ZN11WriteEngine10BRMWrapper10getBrmInfoEjjtiRl (libwriteengine.so + 0x94fc9)
      #5  0x00007f9b513304eb _ZN11WriteEngine6Dctnry10openDctnryERKjtjtb (libwriteengine.so + 0xd94eb)
      #6  0x000055c090ddb483 _ZN11WriteEngine10ColumnInfo15openDctnryStoreEb (cpimport.bin + 0x50483)
      #7  0x000055c090df5ee9 _ZN11WriteEngine14BulkLoadBuffer9parseDictERNS_10ColumnInfoE (cpimport.bin + 0x6aee9)
      #8  0x000055c090de5ad2 _ZN11WriteEngine9TableInfo11parseColumnERKiS2_Rd (cpimport.bin + 0x5aad2)
      #9  0x000055c090def3b1 _ZN11WriteEngine8BulkLoad5parseEi (cpimport.bin + 0x643b1)
      #10 0x00007f9b50127032 thread_proxy (libboost_thread.so.1.75.0 + 0x14032)
      #11 0x00007f9b4f96f812 start_thread (libc.so.6 + 0x9f812)
      #12 0x00007f9b4f90f450 __clone3 (libc.so.6 + 0x3f450)
       
      Stack trace of thread 72888:
      #0  0x00007f9b4fa17d97 __mmap (libc.so.6 + 0x147d97)
      #1  0x00007f9b503507e3 _ZN5boost12interprocess13mapped_regionC2INS0_20shared_memory_objectEEERKT_NS0_6mode_tElmPKvi (libbrm.so + 0x8d7e3)
      #2  0x00007f9b50357647 _ZN5boost12interprocess9ipcdetail27managed_open_or_create_implINS0_20shared_memory_objectELm16ELb1ELb0EE19priv_open_or_createINS1_16create_open_funcINS1_25basic_managed_memory_implIcNS0_15rbtree_best_fitINS0_12mutex_familyENS0_10offset_ptrIvlmLm0EEELm0EEENS0_10iset_indexELm16EEEEEEEvNS1_13create_enum_tERKPKcmNS0_6mode_tEPKvRKNS0_11permissionsET_ (libbrm.so + 0x94647)
      #3  0x00007f9b5034f5c6 _ZN3BRM23BRMManagedShmImplRBTree12reMapSegmentEv (libbrm.so + 0x8c5c6)
      #4  0x00007f9b5036edc0 _ZN3BRM19ExtentMapRBTreeImpl23makeExtentMapRBTreeImplEjlb (libbrm.so + 0xabdc0)
      #5  0x00007f9b50375293 _ZN3BRM9ExtentMap16grabEMEntryTableENS0_3OPSE (libbrm.so + 0xb2293)
      #6  0x00007f9b50376e80 _ZN3BRM9ExtentMap11lookupLocalEijtjRl (libbrm.so + 0xb3e80)
      #7  0x00007f9b5035cf35 _ZN3BRM4DBRM11lookupLocalEijtjRl (libbrm.so + 0x99f35)
      #8  0x00007f9b512ebfc9 _ZN11WriteEngine10BRMWrapper10getBrmInfoEjjtiRl (libwriteengine.so + 0x94fc9)
      #9  0x00007f9b513304eb _ZN11WriteEngine6Dctnry10openDctnryERKjtjtb (libwriteengine.so + 0xd94eb)
      #10 0x000055c090ddb483 _ZN11WriteEngine10ColumnInfo15openDctnryStoreEb (cpimport.bin + 0x50483)
      #11 0x000055c090df5ee9 _ZN11WriteEngine14BulkLoadBuffer9parseDictERNS_10ColumnInfoE (cpimport.bin + 0x6aee9)
      #12 0x000055c090de5ad2 _ZN11WriteEngine9TableInfo11parseColumnERKiS2_Rd (cpimport.bin + 0x5aad2)
      #13 0x000055c090def3b1 _ZN11WriteEngine8BulkLoad5parseEi (cpimport.bin + 0x643b1)
      #14 0x00007f9b50127032 thread_proxy (libboost_thread.so.1.75.0 + 0x14032)
      #15 0x00007f9b4f96f812 start_thread (libc.so.6 + 0x9f812)
      #16 0x00007f9b4f90f450 __clone3 (libc.so.6 + 0x3f450)
      

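      There is an edge case in which a thread can remap the segment while holding only a read lock, which allows this conflict (in the second trace, ExtentMap::lookupLocal() reaches BRMManagedShmImplRBTree::reMapSegment() through grabEMEntryTable() and makeExtentMapRBTreeImpl()). The suggested fix is to upgrade the shmem RWLock from read to write in this edge case before remapping.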


            People

              Roman (drrtuy)
              Gagan Goel (Inactive)