[MCOL-749] DMLProc Segfault after killing update and restarting update. Created: 2017-06-06  Updated: 2017-08-09  Resolved: 2017-08-09

Status: Closed
Project: MariaDB ColumnStore
Component/s: DMLProc
Affects Version/s: 1.0.9
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Nivesh Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Red Hat Enterprise Linux 7.2 (Maipo)
Linux s4-mariadbcs-um-01 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux


Attachments: Text File All_logs.txt     Text File root_messages.txt    

 Description   

I have an update running on a 500M-row table, updating only 50-100 rows in the table.
I then kill the update using the MariaDB KILL command.
Repeating the process, I rerun the update and kill it again before completion.

On the third update-and-kill, the DMLProc process segfaults with the error below and then restarts.
This is a root installation, but I also tried the non-root install and got the same error.

DMLProc[18462]: segfault at 30 ip 0000000000424d30 sp 00007f236f7fd410 error 4 in DMLProc[400000+48000]

Attached are the logs.

MariaDB ColumnStore config:
2 UM
4 PM
each comprising:
132 GB memory
1 dbroot
32 Intel Xeon cores



 Comments   
Comment by David Thompson (Inactive) [ 2017-06-06 ]

Can you check the logs on the PMs as well? Alternatively, you can run the columnstoreSupport utility, which will bundle up everything useful.

I'd suspect you are blowing the limit for our DML transaction buffer. If you need to do a very large update like this, I'd suggest either increasing the version buffer size (see the last section here: https://mariadb.com/kb/en/mariadb/columnstore-batch-insert-mode) or batching your updates by date. If your data is loaded in date order (or approximately so), this will also be efficient due to partition elimination.
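As a rough illustration of the date-batching suggestion, each batch could constrain the date column to a narrow range so that no single transaction exceeds the version buffer. This is a sketch only; the table and column names are taken from the query quoted later in this issue, and the monthly boundaries are assumptions to be adapted to how the data was actually loaded:

```sql
-- Sketch: run one UPDATE per month so each transaction touches fewer
-- extents and stays within the version buffer.
update schema_1.test_data_fct a, schema_1.test_data_dim_user_prf b
set a.co_cde = b.co_cde
where a.user_ref = b.user_ref
  and a.bhdb = b.bhdb
  and a.co_cde <> b.co_cde
  and b.co_cde > 0
  and a.event_date >= '2017-04-01'
  and a.event_date <  '2017-05-01';  -- next batch: '2017-05-01' to '2017-06-01', etc.
```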

Comment by Nivesh [ 2017-06-07 ]

Hi David.

The actual update is only 85 rows on the multi-million-row table.
There are no errors reported on the PMs during these errors.
Unfortunately I cannot upload the logs, as the servers were reinstalled.
Here are the primary OAM log files that I could get.

Comment by David Thompson (Inactive) [ 2017-06-07 ]

One thing I see in the logs is the use of a variety of WHERE clause filters. For partition elimination to work, you should currently avoid using functions:
update schema_1.test_data_fct a, schema_1.test_data_dim_user_prf b
set a.co_cde = b.co_cde
where a.user_ref = b.user_ref
  and a.bhdb = b.bhdb
  and a.co_cde <> b.co_cde
  and b.co_cde > 0
  and a.event_date >= (date_format('2017-04-01','%Y%m%d'))

would be much faster (assuming data is loaded by event_date) as:
update schema_1.test_data_fct a, schema_1.test_data_dim_user_prf b
set a.co_cde = b.co_cde
where a.user_ref = b.user_ref
  and a.bhdb = b.bhdb
  and a.co_cde <> b.co_cde
  and b.co_cde > 0
  and a.event_date >= '2017-04-01'

Otherwise the system must scan the entire set of rows.

YYYY-MM-DD is the default date format for MariaDB.
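To see why the function form hurts, note that date_format('2017-04-01','%Y%m%d') yields the string '20170401', and (per the comment above) ColumnStore in this version cannot apply partition elimination through a function expression in the filter, so it falls back to scanning every row. A bare YYYY-MM-DD literal avoids that:

```sql
select date_format('2017-04-01', '%Y%m%d');  -- returns the string '20170401'

-- a.event_date >= '2017-04-01' compares against the literal directly,
-- letting ColumnStore eliminate partitions on event_date
```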

Comment by Nivesh [ 2017-06-20 ]

Thanks for the info.
I will be rebuilding this environment for testing in the next 2 days (hoping to have servers by then) and will try to replicate this bug.

Comment by David Thompson (Inactive) [ 2017-08-09 ]

Please re-open if this reproduces with the proposed changes.

Generated at Thu Feb 08 02:23:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.