[MDEV-9153] Crash of MariaDB 10.1.8 when updating from 10.0 and catching up on replication Created: 2015-11-18  Updated: 2015-12-25  Resolved: 2015-12-25

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Kenny Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

debian 7.9


Attachments: Text File mariadb_101_crash_fullerrorlog.txt     Text File mariadb_101_crash_variables.txt    

 Description   

After upgrading the MariaDB packages from 10.0.22 to 10.1.8 and starting replication, the system crashes (no other workload).

By looking at the traces (the debian package was rebuild without dh_strip), it seems that it's related to the new encryption plugins? However, they do not seem to be enabled for anything (see attached variables)

I get 2 types of crashes:

151118  9:44:37 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.8-MariaDB-2~wheezy-log
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=9
max_threads=5052
thread_count=6
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 15639278 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f1820c2b008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f17af2fe550 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f8864fedfbb]
/usr/sbin/mysqld(handle_fatal_signal+0x475)[0x7f8864b4fa55]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7f886414b0a0]
/usr/sbin/mysqld(_Z8do_cryptPKhjPhPjP20st_encryption_schemejjjyi+0x53)[0x7f8864b03663]
/usr/sbin/mysqld(encryption_scheme_decrypt+0x2b)[0x7f8864b038db]
/usr/sbin/mysqld(+0x9f37b4)[0x7f8864f6d7b4]
/usr/sbin/mysqld(+0x9f392e)[0x7f8864f6d92e]
/usr/sbin/mysqld(+0x991df3)[0x7f8864f0bdf3]
/usr/sbin/mysqld(+0x9921b0)[0x7f8864f0c1b0]
/usr/sbin/mysqld(+0x9aa578)[0x7f8864f24578]
/usr/sbin/mysqld(+0x9ab08f)[0x7f8864f2508f]
/usr/sbin/mysqld(+0x98a52f)[0x7f8864f0452f]
/usr/sbin/mysqld(+0x964e5e)[0x7f8864edee5e]
/usr/sbin/mysqld(+0x966557)[0x7f8864ee0557]
/usr/sbin/mysqld(+0x8640f2)[0x7f8864dde0f2]
/usr/sbin/mysqld(+0x8688ff)[0x7f8864de28ff]
/usr/sbin/mysqld(+0x966407)[0x7f8864ee0407]
/usr/sbin/mysqld(+0x8ded6c)[0x7f8864e58d6c]
/usr/sbin/mysqld(+0x8e1734)[0x7f8864e5b734]
/usr/sbin/mysqld(+0x8e1af8)[0x7f8864e5baf8]
/usr/sbin/mysqld(+0x8ee9bd)[0x7f8864e689bd]
/usr/sbin/mysqld(+0x84612d)[0x7f8864dc012d]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPh+0xf4)[0x7f8864b59204]
/usr/sbin/mysqld(_ZN14Rows_log_event9write_rowEP14rpl_group_infob+0x1b9)[0x7f8864c1cfb9]
/usr/sbin/mysqld(_ZN20Write_rows_log_event11do_exec_rowEP14rpl_group_info+0x7d)[0x7f8864c1d3ad]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x323)[0x7f8864c10143]
/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14rpl_group_info+0x71)[0x7f8864976af1]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14rpl_group_infoP19rpl_parallel_thread+0x17b)[0x7f886496caeb]
/usr/sbin/mysqld(handle_slave_sql+0x24ca)[0x7f886497001a]
/usr/sbin/mysqld(+0x8213cd)[0x7f8864d9b3cd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7f8864142b50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f886260d95d]

151118 10:12:19 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.8-MariaDB-2~wheezy-log
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=0
max_threads=5052
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 15639278 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
(my_addr_resolve failure: fork)
/usr/sbin/mysqld(my_print_stacktrace+0x2b) [0x7f54c2564fbb]
/usr/sbin/mysqld(handle_fatal_signal+0x475) [0x7f54c20c6a55]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7f54c16c20a0]
/usr/sbin/mysqld(do_crypt(unsigned char const*, unsigned int, unsigned char*, unsigned int*, st_encryption_scheme*, unsigned int, unsigned int, unsigned int, unsigned long long, int)+0x53) [0x7f54c207a663]
/usr/sbin/mysqld(encryption_scheme_decrypt+0x2b) [0x7f54c207a8db]
/usr/sbin/mysqld(+0x9f37b4) [0x7f54c24e47b4]
/usr/sbin/mysqld(+0x9f392e) [0x7f54c24e492e]
/usr/sbin/mysqld(+0x991df3) [0x7f54c2482df3]
/usr/sbin/mysqld(+0x9921b0) [0x7f54c24831b0]
/usr/sbin/mysqld(+0x9aa578) [0x7f54c249b578]
/usr/sbin/mysqld(+0x9ab08f) [0x7f54c249c08f]
/usr/sbin/mysqld(+0x98a52f) [0x7f54c247b52f]
/usr/sbin/mysqld(+0x964c37) [0x7f54c2455c37]
/usr/sbin/mysqld(+0x966557) [0x7f54c2457557]
/usr/sbin/mysqld(+0x8640f2) [0x7f54c23550f2]
/usr/sbin/mysqld(+0x868bc5) [0x7f54c2359bc5]
/usr/sbin/mysqld(+0x966762) [0x7f54c2457762]
/usr/sbin/mysqld(+0x90220d) [0x7f54c23f320d]
/usr/sbin/mysqld(+0x8ff005) [0x7f54c23f0005]
/usr/sbin/mysqld(+0x9005f5) [0x7f54c23f15f5]
/usr/sbin/mysqld(+0x8c66cf) [0x7f54c23b76cf]
/usr/sbin/mysqld(+0x92e0f1) [0x7f54c241f0f1]
/usr/sbin/mysqld(+0x91bcc5) [0x7f54c240ccc5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f54c16b9b50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f54bfb8495d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
151118 10:12:24 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
151118 10:35:30 mysqld_safe Starting mysqld daemon with databases from /data/db/mysql



 Comments   
Comment by Elena Stepanova [ 2015-11-19 ]

gryp,

Could you please provide the whole error log from server startup and till the crash, not just the stack trace?
Among other things, it should show which binlog position the replication starts from – it should be where it crashed before. If possible, please retrieve several events around this position and attach them as well, or upload them to ftp.askmonty.org/private if they contain confidential information.

But still, please attach the full error log, maybe there is something else in there.

Comment by Kenny [ 2015-11-19 ]

Hi Elena,

There was nothing on it. I would have sent it to you if there would.
I have uploaded the file, and as you see, we only see replication being started.

Comment by Kenny [ 2015-11-19 ]

Any data is confidential and I cannot share it, but what happens in replication are regular INSERT and DELETE statements.

It also does not recover at all anymore, it crashes upon recovery:

InnoDB: Doing recovery: scanned up to log sequence number 60668828903936
InnoDB: Doing recovery: scanned up to log sequence number 60668834146816
InnoDB: Doing recovery: scanned up to log sequence number 60668839389696
InnoDB: Doing recovery: scanned up to log sequence number 60668844632576
InnoDB: Doing recovery: scanned up to log sequence number 60668849875456
InnoDB: Doing recovery: scanned up to log sequence number 60668855118336
InnoDB: Doing recovery: scanned up to log sequence number 60668856298898
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 246 row operations to undo
InnoDB: Trx id counter is 39124097280
2015-11-19  8:22:08 139896102704992 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 151119  8:22:37 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.8-MariaDB-2~wheezy-log
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=0
max_threads=5052
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 15639278 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
70 71 72 /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f3c1a2a3fbb]
/usr/sbin/mysqld(handle_fatal_signal+0x475)[0x7f3c19e05a55]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7f3c194010a0]
/usr/sbin/mysqld(_Z8do_cryptPKhjPhPjP20st_encryption_schemejjjyi+0x53)[0x7f3c19db9663]
/usr/sbin/mysqld(encryption_scheme_decrypt+0x2b)[0x7f3c19db98db]
/usr/sbin/mysqld(+0x9f37b4)[0x7f3c1a2237b4]
/usr/sbin/mysqld(+0x9f392e)[0x7f3c1a22392e]
/usr/sbin/mysqld(+0x991df3)[0x7f3c1a1c1df3]
/usr/sbin/mysqld(+0x9921b0)[0x7f3c1a1c21b0]
/usr/sbin/mysqld(+0x9aa578)[0x7f3c1a1da578]
/usr/sbin/mysqld(+0x9ab08f)[0x7f3c1a1db08f]
/usr/sbin/mysqld(+0x98a52f)[0x7f3c1a1ba52f]
/usr/sbin/mysqld(+0x964c37)[0x7f3c1a194c37]
/usr/sbin/mysqld(+0x966557)[0x7f3c1a196557]
/usr/sbin/mysqld(+0x9745c6)[0x7f3c1a1a45c6]
/usr/sbin/mysqld(+0x86131c)[0x7f3c1a09131c]
/usr/sbin/mysqld(+0x869495)[0x7f3c1a099495]
/usr/sbin/mysqld(+0x992372)[0x7f3c1a1c2372]
/usr/sbin/mysqld(+0x9eccf8)[0x7f3c1a21ccf8]
/usr/sbin/mysqld(+0x91e320)[0x7f3c1a14e320]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7f3c193f8b50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f3c178c395d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
151119 08:22:42 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

Comment by Elena Stepanova [ 2015-11-21 ]

gryp,

There was nothing on it. I would have sent it to you if there would.
I have uploaded the file, and as you see, we only see replication being started.

"Replication being started" is what we are looking for. The point is, when replication starts after the crash, it starts at the same position where it crashed. So, if there was also a crash before the part of the log that you attached, then the position shown here

Slave SQL thread initialized, starting replication in log 'bin.054507' at position 15415745

is where it must have happened. Or, if the attached log contains the very first crash, then a similar line after the server restart will show where it happened (it's not in the attached log, but you can find it in yours).
So, not relying on replication being totally crash-save, we should look at the event at this position, and the events immediately before and after the one. One of them should be what caused the problem. If you can't provide the binary logs on the privacy reasons, could you please check those events yourself, and paste them if they are innocent, or at least describe them, along with the involved tables?
Thanks.

Comment by Elena Stepanova [ 2015-12-25 ]

Please comment to re-open if you have additional information as described above.

Generated at Thu Feb 08 07:32:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.