[MCOL-5544] StatisticsManager crashes on PP startup when unable to read the file. Created: 2023-07-28  Updated: 2024-01-09

Status: In Progress
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 23.02.3
Fix Version/s: 23.10

Type: Bug Priority: Critical
Reporter: Roman Assignee: Denis Khalikov
Resolution: Unresolved Votes: 1
Labels: None

Attachments: HTML File statistics_backup    
Sprint: 2023-8, 2023-10, 2023-11, 2023-12

 Description   

PrimProc (PP) was crashing repeatedly on startup. The crash trace:

{format}
Date/time: 2023-07-28 14:13:27
Signal: 11
/usr/bin/PrimProc(+0xb8116)[0x55b7b95d5116]
/lib64/libpthread.so.0(+0xf630)[0x7f7e1de5d630]
/lib64/libcommon.so(_ZN10statistics17StatisticsManager26convertStatsFromDataStreamESt10unique_ptrIA_cSt14default_deleteIS2_EE+0x14e)[0x7f7e1e833dce]
/lib64/libcommon.so(_ZN10statistics17StatisticsManager12loadFromFileEv+0x244)[0x7f7e1e834204]
/usr/bin/PrimProc(+0xabb4d)[0x55b7b95c8b4d]
/usr/bin/PrimProc(+0x4f1c5)[0x55b7b956c1c5]
/usr/bin/PrimProc(+0x1b1a80)[0x55b7b96cea80]
/lib64/libpthread.so.0(+0x7ea5)[0x7f7e1de55ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f7e1ca01b0d]
{format}

Presumably the /var/lib/columnstore/local/statistics file is corrupted. I am attaching the file.



 Comments   
Comment by Roman [ 2023-11-03 ]

Right, denis0x0D, but before the control flow loads the data, it allocates a buffer using the data size read from the statistics storage file. If that data size is absurdly large, allocating the buffer causes a SEGV. We need failure detection here, e.g. save a hash of the data size counter, and if hash(dataSize) != saved_hash, StatisticsManager should clean the statistics storage file and proceed.
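
A minimal sketch of the check proposed above, not the actual StatisticsManager code. The header layout (StatsHeader), the size cap (kMaxStatsBytes), and the helper names (hashSize, makeHeader, headerLooksValid) are all assumptions made for illustration; the real file format may differ. FNV-1a is used only because it is simple and stable across builds, and any persistent checksum would do.

```cpp
#include <cstdint>

// Hypothetical on-disk header; the real statistics file layout may differ.
struct StatsHeader
{
  uint64_t dataSize;      // payload size in bytes
  uint64_t dataSizeHash;  // checksum of dataSize, stored next to it
};

// Hypothetical sanity cap on the payload size (1 GiB).
constexpr uint64_t kMaxStatsBytes = 1ULL << 30;

// FNV-1a over the 8 bytes of the value. std::hash is avoided because its
// result is not guaranteed to be stable across library implementations.
uint64_t hashSize(uint64_t value)
{
  uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (int i = 0; i < 8; ++i)
  {
    h ^= (value >> (8 * i)) & 0xFF;
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Builds the header that would be written when statistics are saved.
StatsHeader makeHeader(uint64_t dataSize)
{
  return StatsHeader{dataSize, hashSize(dataSize)};
}

// Returns true only if it looks safe to allocate header.dataSize bytes.
// fileSize is the total size of the statistics file on disk.
bool headerLooksValid(const StatsHeader& header, uint64_t fileSize)
{
  if (header.dataSizeHash != hashSize(header.dataSize))
    return false;  // size counter does not match its saved hash
  if (header.dataSize > kMaxStatsBytes)
    return false;  // implausibly large even if the hash matches
  if (fileSize < sizeof(StatsHeader))
    return false;  // file is too short to hold a header at all
  return header.dataSize <= fileSize - sizeof(StatsHeader);
}
```

If headerLooksValid returns false, the loader would skip the allocation, remove or truncate the statistics file, and continue startup with empty statistics. Comparing dataSize against the real file size catches a truncated file even when the hash happens to match.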

Comment by Roman [ 2023-11-03 ]

We have the actual file this time.

Comment by JiraAutomate [ 2023-12-17 ]

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

Comment by Massimo [ 2023-12-18 ]

Hi,
Why has this Jira been closed? We have seen this issue at more than one customer. What feedback is needed, leonid.fedorov?

Comment by Roman [ 2023-12-18 ]

We decided to re-open the issue.

Generated at Thu Feb 08 02:58:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.