[MCOL-5487] A race in BRM causes SEGV working with managed shared mem segment Created: 2023-04-27  Updated: 2023-08-22  Resolved: 2023-05-01

Status: Closed
Project: MariaDB ColumnStore
Component/s: cpimport, PrimProc
Affects Version/s: 23.02.2
Fix Version/s: 23.02.3

Type: Bug Priority: Major
Reporter: Roman Assignee: Roman
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Issue split
split to MCOL-5488 Shmem RWLock is not strict enough for... Closed
Relates
relates to MCOL-5559 Shmem segment remap causes SEGV in Ex... Closed
Sprint: 2023-6
Assigned for Review: Gagan Goel (Inactive)

 Description   

The client observes SEGV crashes of cpimport, PP (PrimProc) and DMLProc.
According to the crash traces, all three hit a read/write conflict while accessing the managed shared memory segment that holds the Extent Map (EM). The two conflicting threads from the cpimport crash are shown below. One of them remaps the managed shmem segment while the other reads from the virtual address range that belonged to the previous mapping of the EM shmem segment.

Stack trace of thread 72889:
#0  0x00007f9b5037112f _ZN3BRM9ExtentMap10findByLBIDEl (libbrm.so + 0xae12f)
#1  0x00007f9b5037177a _ZN3BRM9ExtentMap18getEmIdentsByLbidsERKN5boost9container6vectorIlvvEE (libbrm.so + 0xae77a)
#2  0x00007f9b50376f03 _ZN3BRM9ExtentMap11lookupLocalEijtjRl (libbrm.so + 0xb3f03)
#3  0x00007f9b5035cf35 _ZN3BRM4DBRM11lookupLocalEijtjRl (libbrm.so + 0x99f35)
#4  0x00007f9b512ebfc9 _ZN11WriteEngine10BRMWrapper10getBrmInfoEjjtiRl (libwriteengine.so + 0x94fc9)
#5  0x00007f9b513304eb _ZN11WriteEngine6Dctnry10openDctnryERKjtjtb (libwriteengine.so + 0xd94eb)
#6  0x000055c090ddb483 _ZN11WriteEngine10ColumnInfo15openDctnryStoreEb (cpimport.bin + 0x50483)
#7  0x000055c090df5ee9 _ZN11WriteEngine14BulkLoadBuffer9parseDictERNS_10ColumnInfoE (cpimport.bin + 0x6aee9)
#8  0x000055c090de5ad2 _ZN11WriteEngine9TableInfo11parseColumnERKiS2_Rd (cpimport.bin + 0x5aad2)
#9  0x000055c090def3b1 _ZN11WriteEngine8BulkLoad5parseEi (cpimport.bin + 0x643b1)
#10 0x00007f9b50127032 thread_proxy (libboost_thread.so.1.75.0 + 0x14032)
#11 0x00007f9b4f96f812 start_thread (libc.so.6 + 0x9f812)
#12 0x00007f9b4f90f450 __clone3 (libc.so.6 + 0x3f450)
 
Stack trace of thread 72888:
#0  0x00007f9b4fa17d97 __mmap (libc.so.6 + 0x147d97)
#1  0x00007f9b503507e3 _ZN5boost12interprocess13mapped_regionC2INS0_20shared_memory_objectEEERKT_NS0_6mode_tElmPKvi (libbrm.so + 0x8d7e3)
#2  0x00007f9b50357647 _ZN5boost12interprocess9ipcdetail27managed_open_or_create_implINS0_20shared_memory_objectELm16ELb1ELb0EE19priv_open_or_createINS1_16create_open_funcINS1_25basic_managed_memory_implIcNS0_15rbtree_best_fitINS0_12mutex_familyENS0_10offset_ptrIvlmLm0EEELm0EEENS0_10iset_indexELm16EEEEEEEvNS1_13create_enum_tERKPKcmNS0_6mode_tEPKvRKNS0_11permissionsET_ (libbrm.so + 0x94647)
#3  0x00007f9b5034f5c6 _ZN3BRM23BRMManagedShmImplRBTree12reMapSegmentEv (libbrm.so + 0x8c5c6)
#4  0x00007f9b5036edc0 _ZN3BRM19ExtentMapRBTreeImpl23makeExtentMapRBTreeImplEjlb (libbrm.so + 0xabdc0)
#5  0x00007f9b50375293 _ZN3BRM9ExtentMap16grabEMEntryTableENS0_3OPSE (libbrm.so + 0xb2293)
#6  0x00007f9b50376e80 _ZN3BRM9ExtentMap11lookupLocalEijtjRl (libbrm.so + 0xb3e80)
#7  0x00007f9b5035cf35 _ZN3BRM4DBRM11lookupLocalEijtjRl (libbrm.so + 0x99f35)
#8  0x00007f9b512ebfc9 _ZN11WriteEngine10BRMWrapper10getBrmInfoEjjtiRl (libwriteengine.so + 0x94fc9)
#9  0x00007f9b513304eb _ZN11WriteEngine6Dctnry10openDctnryERKjtjtb (libwriteengine.so + 0xd94eb)
#10 0x000055c090ddb483 _ZN11WriteEngine10ColumnInfo15openDctnryStoreEb (cpimport.bin + 0x50483)
#11 0x000055c090df5ee9 _ZN11WriteEngine14BulkLoadBuffer9parseDictERNS_10ColumnInfoE (cpimport.bin + 0x6aee9)
#12 0x000055c090de5ad2 _ZN11WriteEngine9TableInfo11parseColumnERKiS2_Rd (cpimport.bin + 0x5aad2)
#13 0x000055c090def3b1 _ZN11WriteEngine8BulkLoad5parseEi (cpimport.bin + 0x643b1)
#14 0x00007f9b50127032 thread_proxy (libboost_thread.so.1.75.0 + 0x14032)
#15 0x00007f9b4f96f812 start_thread (libc.so.6 + 0x9f812)
#16 0x00007f9b4f90f450 __clone3 (libc.so.6 + 0x3f450)

There is an edge case in which a thread can remap the segment while holding only the read lock, which allows this conflict. The suggested solution is to upgrade the shmem RWLock from read to write for this edge case.
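
The lock upgrade idea can be illustrated with a minimal, self-contained sketch. This is not the ColumnStore code: the class Segment, its methods, and the use of std::shared_mutex with a std::vector standing in for the mapped region are all assumptions made for illustration. std::shared_mutex has no atomic upgrade, so the sketch releases the read lock, takes the write lock, re-checks the condition, remaps, and then goes back to a read lock before touching the data.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for a remappable shared memory segment.
// A std::vector plays the role of the mapped region: resizing it may
// move the storage, just as a remap may change the mapped address.
class Segment
{
 public:
  explicit Segment(std::size_t size) : data_(size, 0) {}

  // Reader path. If the segment is smaller than requiredSize, the
  // remap is done under an exclusive lock, never under a shared one.
  int read(std::size_t index, std::size_t requiredSize)
  {
    assert(index < requiredSize);
    std::shared_lock<std::shared_mutex> readLock(mutex_);
    if (data_.size() < requiredSize)
    {
      // "Upgrade": drop the read lock, then take the write lock.
      readLock.unlock();
      {
        std::unique_lock<std::shared_mutex> writeLock(mutex_);
        // Re-check: another thread may have remapped in between.
        if (data_.size() < requiredSize)
          remap(requiredSize);
      }
      // Back to a read lock. The segment only grows in this sketch,
      // so the size condition still holds here.
      readLock.lock();
    }
    return data_.at(index);
  }

  std::size_t size() const
  {
    std::shared_lock<std::shared_mutex> readLock(mutex_);
    return data_.size();
  }

  std::size_t remapCount() const
  {
    std::shared_lock<std::shared_mutex> readLock(mutex_);
    return remaps_;
  }

 private:
  // Must be called with the exclusive lock held.
  void remap(std::size_t newSize)
  {
    data_.resize(newSize, 0);
    ++remaps_;
  }

  mutable std::shared_mutex mutex_;
  std::vector<int> data_;
  std::size_t remaps_ = 0;
};
```

With this shape, no reader can be dereferencing addresses from the old mapping while the remap runs, because the remap waits until every shared lock has been released.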

