[MDEV-10831] MariaDB 10.0.24 crash with signal 11 Created: 2016-09-19  Updated: 2018-07-17  Resolved: 2018-07-16

Status: Closed
Project: MariaDB Server
Component/s: Galera, Storage Engine - InnoDB
Affects Version/s: 10.0.24-galera
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: TYNDALE BANZA Assignee: Jan Lindström (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: galera
Environment:

AWS
CentOS release 6.7


Attachments: Text File mariadb log.txt    

 Description   

The node is part of a 6-node Galera Cluster (wsrep_25.13.raf7f02e). The node crashed with signal 11.

The log file is attached.



 Comments   
Comment by Arjen Lentz [ 2017-10-19 ]

Crashing bug, open for over a year. I find this worrying.
Thoughts, serg?

Comment by Sergei Golubchik [ 2017-10-22 ]

We have slightly more than 20 open galera-related bugs, and this one is not even the oldest. Generally bugs are fixed in the ORDER BY priority, created to make sure old bugs aren't stuck forever. I'll increase the priority of this bug to have it fixed earlier.

Btw, we plan to have more developers to work on these bugs, that should help too.

Comment by Arjen Lentz [ 2017-10-23 ]

Thanks serg - I'm not specifically concerned about this bug, I think we worked around it for the particular client it affected.
I was just concerned that a (any) crashing bug stays open for that long.
I understand the priority system, which is generally perfectly sensible.
If there's that much of a backlog, indeed the issue appears to be dev-resources so having more sounds great.
thanks

Comment by Jan Lindström (Inactive) [ 2018-07-16 ]

Elkin, the stack trace indicates a problem unpacking a blob from a replication event. Can you have a look?

Comment by Andrei Elkin [ 2018-07-16 ]

Jan, salute.

I have looked through the stack

/usr/sbin/mysqld(Field_blob::unpack(unsigned char*, unsigned char const*, unsigned char const*, unsigned int)+0xf0)[0x73d290]
/usr/sbin/mysqld(unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*)+0x425)[0x820545]
/usr/sbin/mysqld(Rows_log_event::find_row(rpl_group_info*)+0x45f)[0x81ce9f]
/usr/sbin/mysqld(Delete_rows_log_event::do_exec_row(rpl_group_info*)+0x6f)[0x81d34f]
/usr/sbin/mysqld(Rows_log_event::do_apply_event(rpl_group_info*)+0x251)[0x810411]
/usr/sbin/mysqld(wsrep_apply_cb(void*, void const*, unsigned long, unsigned int, wsrep_trx_meta const*)+0x50e)[0x7044ae]
/usr/lib64/galera/libgalera_smm.so(galera::TrxHandle::apply(void*, wsrep_cb_status (*)(void*, void const*, unsigned long, unsigned int, wsrep_trx_meta const*), wsrep_trx_meta const&) const+0xd3)[0x7f2f1f110063]

and tried to locate the line at which the segfault happened. We have the offset '0xf0' (240 decimal), but which source line it corresponds
to depends on the build properties, which the report does not state.
I guessed {{ -DCMAKE_BUILD_TYPE=RelWithDebInfo }}, but in that build the last disassembled instruction of
Field_blob::unpack is at decimal offset 134, so offset 240 falls outside the function:

   ...
   0x0000555555a48f86 <+134>:   retq   
   0x0000555555a48f87:  nop
   0x0000555555a48f88:  nopl   0x0(%rax,%rax,1)
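The offset comparison above can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: `parse_frame`, `offset_fits`, and `LAST_INSN_OFFSET` are hypothetical names, the frame string is copied from the trace, and 134 is the offset of the final `retq` in the guessed RelWithDebInfo build.

```python
import re

# Top frame, copied verbatim from the stack trace in this report.
FRAME = (
    "/usr/sbin/mysqld(Field_blob::unpack(unsigned char*, unsigned char const*, "
    "unsigned char const*, unsigned int)+0xf0)[0x73d290]"
)

# Offset of the last instruction (retq) in the disassembly of the guessed build.
LAST_INSN_OFFSET = 134

# Shape of a backtrace frame: binary(symbol+0xOFFSET)[0xADDRESS]
FRAME_RE = re.compile(
    r"^(?P<binary>[^(]+)\((?P<symbol>.*)\+0x(?P<offset>[0-9a-fA-F]+)\)"
    r"\[0x(?P<addr>[0-9a-fA-F]+)\]$"
)


def parse_frame(frame: str) -> dict:
    """Split one backtrace frame into binary, symbol, offset and address."""
    match = FRAME_RE.match(frame.strip())
    if match is None:
        raise ValueError(f"unrecognised frame: {frame!r}")
    offset = int(match.group("offset"), 16)
    addr = int(match.group("addr"), 16)
    return {
        "binary": match.group("binary"),
        "symbol": match.group("symbol"),
        "offset": offset,
        "addr": addr,
        # Start of the function in the crashed binary: address minus offset.
        "symbol_start": addr - offset,
    }


def offset_fits(offset: int, last_insn_offset: int) -> bool:
    """True if the crash offset lies within the disassembled function body."""
    return 0 <= offset <= last_insn_offset


if __name__ == "__main__":
    info = parse_frame(FRAME)
    print(f"crash offset: {info['offset']} (0x{info['offset']:x})")
    print(f"function start in crashed binary: 0x{info['symbol_start']:x}")
    if not offset_fits(info["offset"], LAST_INSN_OFFSET):
        print("offset is beyond the function end: the guessed build does not match")
```

With the values from this report the offset is 240, well past 134, which is why the guessed build cannot be the one that crashed.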

As there is no way to reproduce this, I fear we can't make any progress on this case at all.

I am returning it to you to review my decision to close it as no-reproduce.

Andrei

Comment by Jan Lindström (Inactive) [ 2018-07-16 ]

Please upgrade to a more recent release, and if you can still repeat the crash, please send the full error log (preferably with wsrep-debug=ON), the configuration, and instructions on how to repeat it.

Comment by Arjen Lentz [ 2018-07-17 ]

Elkin, jplindst: the environment was MariaDB 10.0.24-galera, installed from the MariaDB repo for CentOS 6.7.
Doesn't that info provide an exact marker for the build properties?

I think it would be unfortunate to let this one slip with no-reproduce; each incident like this in the world is not just an inconvenience to a user, but also an opportunity for developers to track down and fix a problem. Problems may only trigger intermittently, so leaving them be is very unfortunate. Segfaults tend to not magically disappear in newer versions unless there's refactoring involved, the issues may just hide away a bit further and cause problems for other users later.
Let's chase this one to its maximum extent. thanks.
