[MDEV-9040] 10.1.8 fails after upgrade from 10.0.21 - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.1.8
Fix Version/s: 10.1.9
Component/s: Platform RedHat
Labels:
None
Environment:
Centos 6.7 64bit

Sprint:
10.1.9-2

Description

After the upgrade via yum the server gets into a restart loop...

2015-10-28 13:25:06 140165019297824 [Note] Recovering after a crash using tc.log
2015-10-28 13:25:06 140165019297824 [Note] Starting crash recovery...
2015-10-28 13:25:06 140165019297824 [Note] Crash recovery finished.
2015-10-28 13:25:06 140165019297824 [Note] Event Scheduler: Loaded 0 events
2015-10-28 13:25:06 7f7a109a5700 InnoDB: Error: Column last_update in table "mysql"."innodb_table_stats" is INT UNSIGNED NOT NULL but should be BINARY(4) NOT NULL (type mismatch).
2015-10-28 13:25:06 7f7a109a5700 InnoDB: Error: Fetch of persistent statistics requested for table "mysql"."gtid_slave_pos" but the required system tables mysql.innodb_table_stats and mysql.innodb_index_stats are not present or have unexpected structure. Using transient stats instead.
2015-10-28 13:25:06 140165019297824 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.8-MariaDB-log' socket: '/var/lib/mysql/mysql.sock' port: 0 MariaDB Server
151028 13:25:06 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.8-MariaDB-log
key_buffer_size=16777216
read_buffer_size=4194304
max_used_connections=1
max_threads=130
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 817738 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0x7f7ab6d636cb]
sql/signal_handler.cc:160(handle_fatal_signal)[0x7f7ab68c50d5]
/lib64/libpthread.so.0(+0xf790)[0x7f7ab5ede790]
include/dict0dict.ic:1244(dict_index_get_nth_col)[0x7f7ab6b1557d]
row/row0purge.cc:850(row_purge_record_func)[0x7f7ab6ae45d5]
que/que0que.cc:1089(que_thr_step)[0x7f7ab6aaa84f]
trx/trx0purge.cc:1254(trx_purge(unsigned long, unsigned long, bool))[0x7f7ab6b121e1]
srv/srv0srv.cc:3432(srv_do_purge)[0x7f7ab6affdc5]
/lib64/libpthread.so.0(+0x7a51)[0x7f7ab5ed6a51]
/lib64/libc.so.6(clone+0x6d)[0x7f7ab43bc93d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

mysqld_log_2015-10-28-14-07-01.save
108 kB
2015-10-30 01:53

Issue Links

relates to

MDEV-19016 Table in InnoDB data dictionary contains invalid flags upon undo-upgrade from 10.1.8

Closed

Activity

Ascending order - Click to sort in descending order

View 12 older comments

Elena Stepanova added a comment - 2015-11-03 20:57

jplindst:
9d399c9f35ca5a85152adddc1c88a304f87f660c is where it started crashing. Stack trace from that commit:

Thread 1 (Thread 0x7f4249ffb700 (LWP 22324)):

#0  __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:63

#1  0x00000000006f5ab2 in handle_fatal_signal (sig=11) at sql/signal_handler.cc:262

#2  <signal handler called>

#3  mach_read_from_1 (b=0x7f4249400000 <Address 0x7f4249400000 out of bounds>) at storage/xtradb/include/mach0data.ic:56

#4  mach_read_compressed (b=0x7f4249400000 <Address 0x7f4249400000 out of bounds>) at storage/xtradb/include/mach0data.ic:266

#5  trx_undo_rec_get_col_val (ptr=0x7f4249400000 <Address 0x7f4249400000 out of bounds>, field=field@entry=0x7f4249ffab68, len=len@entry=0x7f4249ffab70, orig_len=orig_len@entry=0x7f4249ffab78) at storage/xtradb/trx/trx0rec.cc:331

#6  0x00000000008cf589 in trx_undo_rec_get_partial_row (ptr=<optimized out>, index=0x7f424c7743e8, row=0x7f425b7b66e0, ignore_prefix=0, heap=<optimized out>) at storage/xtradb/trx/trx0rec.cc:1100

#7  0x00000000008a9a7d in row_purge_parse_undo_rec (thr=0x7f4267010eb0, updated_extern=0x7f4249ffac0f, undo_rec=0x7f4249021c80 "\004\264\214", node=0x7f425b7b6668) at storage/xtradb/row/row0purge.cc:776

#8  row_purge (thr=0x7f4267010eb0, undo_rec=0x7f4249021c80 "\004\264\214", node=0x7f425b7b6668) at storage/xtradb/row/row0purge.cc:859

#9  row_purge_step (thr=0x7f4267010eb0) at storage/xtradb/row/row0purge.cc:942

#10 0x000000000087d3bb in que_thr_step (thr=0x7f4267010eb0) at storage/xtradb/que/que0que.cc:1115

#11 que_run_threads_low (thr=0x7f4267010eb0) at storage/xtradb/que/que0que.cc:1177

#12 que_run_threads (thr=0x7f4267010eb0) at storage/xtradb/que/que0que.cc:1218

#13 0x00000000008cc827 in trx_purge (n_purge_threads=1, batch_size=300, truncate=false) at storage/xtradb/trx/trx0purge.cc:1251

#14 0x00000000008c0c8b in srv_do_purge (n_total_purged=<synthetic pointer>, n_threads=1) at storage/xtradb/srv/srv0srv.cc:3215

#15 srv_purge_coordinator_thread (arg=<optimized out>) at storage/xtradb/srv/srv0srv.cc:3397

#16 0x00007f4269861b50 in start_thread (arg=<optimized out>) at pthread_create.c:304

#17 0x00007f4267b1795d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

#18 0x0000000000000000 in ?? ()

The direct parent is 0eb84da14712a9ca820533dbc1d911b3aead1658, it starts all right.

Elena Stepanova added a comment - 2015-11-03 20:57 jplindst : 9d399c9f35ca5a85152adddc1c88a304f87f660c is where it started crashing. Stack trace from that commit: Thread 1 (Thread 0x7f4249ffb700 (LWP 22324)): #0 __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:63 #1 0x00000000006f5ab2 in handle_fatal_signal (sig=11) at sql/signal_handler.cc:262 #2 <signal handler called> #3 mach_read_from_1 (b=0x7f4249400000 <Address 0x7f4249400000 out of bounds>) at storage/xtradb/include/mach0data.ic:56 #4 mach_read_compressed (b=0x7f4249400000 <Address 0x7f4249400000 out of bounds>) at storage/xtradb/include/mach0data.ic:266 #5 trx_undo_rec_get_col_val (ptr=0x7f4249400000 <Address 0x7f4249400000 out of bounds>, field=field@entry=0x7f4249ffab68, len=len@entry=0x7f4249ffab70, orig_len=orig_len@entry=0x7f4249ffab78) at storage/xtradb/trx/trx0rec.cc:331 #6 0x00000000008cf589 in trx_undo_rec_get_partial_row (ptr=<optimized out>, index=0x7f424c7743e8, row=0x7f425b7b66e0, ignore_prefix=0, heap=<optimized out>) at storage/xtradb/trx/trx0rec.cc:1100 #7 0x00000000008a9a7d in row_purge_parse_undo_rec (thr=0x7f4267010eb0, updated_extern=0x7f4249ffac0f, undo_rec=0x7f4249021c80 "\004\264\214", node=0x7f425b7b6668) at storage/xtradb/row/row0purge.cc:776 #8 row_purge (thr=0x7f4267010eb0, undo_rec=0x7f4249021c80 "\004\264\214", node=0x7f425b7b6668) at storage/xtradb/row/row0purge.cc:859 #9 row_purge_step (thr=0x7f4267010eb0) at storage/xtradb/row/row0purge.cc:942 #10 0x000000000087d3bb in que_thr_step (thr=0x7f4267010eb0) at storage/xtradb/que/que0que.cc:1115 #11 que_run_threads_low (thr=0x7f4267010eb0) at storage/xtradb/que/que0que.cc:1177 #12 que_run_threads (thr=0x7f4267010eb0) at storage/xtradb/que/que0que.cc:1218 #13 0x00000000008cc827 in trx_purge (n_purge_threads=1, batch_size=300, truncate=false) at storage/xtradb/trx/trx0purge.cc:1251 #14 0x00000000008c0c8b in srv_do_purge (n_total_purged=<synthetic pointer>, n_threads=1) at storage/xtradb/srv/srv0srv.cc:3215 #15 srv_purge_coordinator_thread (arg=<optimized out>) at storage/xtradb/srv/srv0srv.cc:3397 #16 0x00007f4269861b50 in start_thread (arg=<optimized out>) at pthread_create.c:304 #17 0x00007f4267b1795d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #18 0x0000000000000000 in ?? () The direct parent is 0eb84da14712a9ca820533dbc1d911b3aead1658, it starts all right.

Kym Farnik added a comment - 2015-11-04 00:49

Good news ... it looks like you can reproduce the issue.
I'll stay on 10.0.x for now as it's a production system and I don't have much time window to try again.

I can probably try again next week, but methinks I need to wait for 10.1.9

Kym Farnik added a comment - 2015-11-04 00:49 Good news ... it looks like you can reproduce the issue. I'll stay on 10.0.x for now as it's a production system and I don't have much time window to try again. I can probably try again next week, but methinks I need to wait for 10.1.9

Jan Lindström (Inactive) added a comment - 2015-11-04 08:07

Thanks Elena, confirmed your observation. Now I need to figure out what is wrong with that change. Actual change is very small but it seems to have some hidden effect on purge.

Jan Lindström (Inactive) added a comment - 2015-11-04 08:07 Thanks Elena, confirmed your observation. Now I need to figure out what is wrong with that change. Actual change is very small but it seems to have some hidden effect on purge.

Jan Lindström (Inactive) added a comment - 2015-11-05 10:33

commit 25f8738112b05f33cfa45eabfebf6edfc80e6d8a
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Thu Nov 5 09:42:23 2015 +0200

~~MDEV-9040~~: 10.1.8 fails after upgrade from 10.0.21

Analysis: Lengths which are not UNIV_SQL_NULL, but bigger than the following
number indicate that a field contains a reference to an externally
stored part of the field in the tablespace. The length field then
contains the sum of the following flag and the locally stored len.

This was incorrectly set to

define UNIV_EXTERN_STORAGE_FIELD (UNIV_SQL_NULL - UNIV_PAGE_SIZE_MAX)

When it should be

define UNIV_EXTERN_STORAGE_FIELD (UNIV_SQL_NULL - UNIV_PAGE_SIZE_DEF)

Additionally, we need to disable support for > 16K page size for
row compressed tables because a compressed page directory entry
reserves 14 bits for the start offset and 2 bits for flags.
This limits the uncompressed page size to 16k. To support
larger pages page directory entry needs to be larger.

Jan Lindström (Inactive) added a comment - 2015-11-05 10:33 commit 25f8738112b05f33cfa45eabfebf6edfc80e6d8a Author: Jan Lindström <jan.lindstrom@mariadb.com> Date: Thu Nov 5 09:42:23 2015 +0200 MDEV-9040 : 10.1.8 fails after upgrade from 10.0.21 Analysis: Lengths which are not UNIV_SQL_NULL, but bigger than the following number indicate that a field contains a reference to an externally stored part of the field in the tablespace. The length field then contains the sum of the following flag and the locally stored len. This was incorrectly set to define UNIV_EXTERN_STORAGE_FIELD (UNIV_SQL_NULL - UNIV_PAGE_SIZE_MAX) When it should be define UNIV_EXTERN_STORAGE_FIELD (UNIV_SQL_NULL - UNIV_PAGE_SIZE_DEF) Additionally, we need to disable support for > 16K page size for row compressed tables because a compressed page directory entry reserves 14 bits for the start offset and 2 bits for flags. This limits the uncompressed page size to 16k. To support larger pages page directory entry needs to be larger.

Kym Farnik added a comment - 2015-11-05 12:27

Great work, thanks for the prompt action. This is my first bug report to MariaDB.

Kym Farnik added a comment - 2015-11-05 12:27 Great work, thanks for the prompt action. This is my first bug report to MariaDB.

MariaDB Server

10.1.8 fails after upgrade from 10.0.21

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration