[MCOL-5341] PrimProc crashes running string to int128 conversion Created: 2022-12-09  Updated: 2023-11-17  Resolved: 2023-07-25

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 22.08.6
Fix Version/s: 23.02.4

Type: Bug Priority: Critical
Reporter: Rick Pizzi Assignee: Roman
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File allt.txt    
Sprint: 2022-22, 2022-23, 2023-4, 2023-5, 2023-6, 2023-7

 Description   

PrimProc crashed with the following stack trace:

Date/time: 2022-12-09 10:39:08
Signal: 11
/usr/bin/PrimProc(+0xc012a)[0x556a2a2a012a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f7ef6912420]
/usr/lib/x86_64-linux-gnu/libdataconvert.so(_ZN11dataconvert10strtoll128EPKcRbPPc+0x1c)[0x7f7ef6ee206c]
/usr/lib/x86_64-linux-gnu/libdataconvert.so(_ZN11dataconvert16number_int_valueInEEvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN9datatypes13SystemCatalog11ColDataTypeERKNSA_17TypeAttributesStdERbbRT_Pb+0xc1c)[0x7f7ef6ee80cc]
/usr/lib/x86_64-linux-gnu/libbrm.so(_ZNK9datatypes13SystemCatalog17TypeAttributesStd20decimal128FromStringERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPb+0x37)[0x7f7ef6d57c77]
/usr/lib/x86_64-linux-gnu/librowgroup.so(_ZN8rowgroup16RowAggregationUM26doNotNullConstantAggregateERKNS_15ConstantAggDataEm+0xe25)[0x7f7ef7237c45]
/usr/lib/x86_64-linux-gnu/librowgroup.so(_ZN8rowgroup16RowAggregationUM20fixConstantAggregateEv+0x17e)[0x7f7ef722b0de]
/usr/lib/x86_64-linux-gnu/librowgroup.so(_ZN8rowgroup16RowAggregationUM8finalizeEv+0x8d)[0x7f7ef723583d]
/usr/lib/x86_64-linux-gnu/libjoblist.so(_ZN7joblist18TupleAggregateStep19doThreadedAggregateERN11messageqcpp10ByteStreamEPNS_4FIFOIN8rowgroup6RGDataEEE+0x467)[0x7f7ef7acdc37]
/usr/lib/x86_64-linux-gnu/libjoblist.so(_ZN7joblist18TupleAggregateStep11doAggregateEv+0x77)[0x7f7ef7ace4b7]
/usr/lib/x86_64-linux-gnu/libthreadpool.so(_ZN10threadpool10ThreadPool11beginThreadEv+0x5df)[0x7f7ef68931cf]
/usr/bin/PrimProc(+0xc142b)[0x556a2a2a142b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f7ef6906609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f7ef63f1133]
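
The frames above carry Itanium C++ ABI mangled names, which are hard to read as they stand. The following is a minimal sketch, not part of ColumnStore, of how they could be demangled with `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang). The helper name `demangle` is hypothetical and introduced only for illustration.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of a symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled)
{
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so the caller must release it with free.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr)
        return mangled;
    std::string result(out);
    std::free(out);
    return result;
}
```

Applied to the faulting frame, `demangle("_ZN11dataconvert10strtoll128EPKcRbPPc")` yields `dataconvert::strtoll128(char const*, bool&, char**)`, so the signal was raised inside the string-to-128-bit-integer parser, reached from `RowAggregationUM::doNotNullConstantAggregate` through `decimal128FromString`. Without debug symbols the trace does not show which argument was invalid.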



 Comments   
Comment by Rick Pizzi [ 2022-12-09 ]

PrimProc was running when I checked, because systemd restarted it right away.

Comment by Roman [ 2022-12-09 ]

There are two potentially separate issues here.
First and most pressing is the PrimProc crash. I found that the Ubuntu packages do not include debug symbols in the .so libraries, so even with the crash trace I cannot tell exactly why it crashes.
Here are the next steps for this part:

  • The RDBA team enables core dump collection for PP. The customer re-enables the query. We collect a core. Once I have the core, I can determine what is wrong and fix the issue.
  • I research why we don't ship debug symbols with the deb packages.

The second part is about all those threads that are stuck in the MDB runtime. I suspect those stuck queries are not connected with the PP crash. To check this, the RDBA team has to connect to a running MDB and capture the output of the 'thread apply all bt' command.
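
On the core-collection step in the list above: the ticket does not say how core dumps are enabled for PrimProc. One general prerequisite on Linux is that the process's `RLIMIT_CORE` soft limit is non-zero. The sketch below uses the POSIX `getrlimit`/`setrlimit` calls to check and raise that limit. The function names are hypothetical, and this is an illustration of the mechanism, not ColumnStore code.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE (POSIX)

// Hypothetical helper: true if the calling process may write a core file,
// i.e. its soft RLIMIT_CORE is non-zero.
bool coreDumpsEnabled()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return false;
    return lim.rlim_cur != 0;
}

// Hypothetical helper: raises the soft core limit to the hard limit.
// An unprivileged process may always do this; it cannot exceed the hard limit.
bool raiseCoreLimit()
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return false;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

If the hard limit itself is zero, the limit has to be raised by whatever starts the process (for a service restarted by systemd, as here, that is the service manager). Where the core file ends up is a separate system setting.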
Comment by Rick Pizzi [ 2022-12-10 ]

drrtuy, the second point above is exactly what I have already done: the file attached to this ticket is that output.
