[MDEV-9843] InnoDB hangs on startup between "InnoDB: Apply batch completed" and "rollback segment(s) are active", various tests fail sporadically in buildbot on p8-rhel6-bintar-debug Created: 2016-03-31  Updated: 2019-09-30  Resolved: 2019-09-30

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.0, 10.1, 10.2
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-7069 Fix buildbot failures in main server ... Stalled
blocks MDEV-14481 Execute InnoDB crash recovery in the ... Closed
Relates
relates to MDEV-11499 mysqltest, Windows : improve diagnost... Closed
relates to MDEV-13485 MTR tests fail massively with --innod... Closed
relates to MDEV-15333 MariaDB (still) slow start Closed
relates to MDEV-13869 MariaDB slow start Closed

 Description   

There are multiple failures in different tests in buildbot which seem to have the common cause. A test runs with InnoDB or XtraDB; at some point as a part of the test flow it sigkills the server; the server gets restarted; InnoDB recovery starts; it goes like this:

Version: '10.1.13-MariaDB-debug'  socket: '/home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/mysql-test/var/tmp/4/mysqld.1.sock'  port: 16020  Source distribution
SIGKILL myself
2016-03-30 11:51:13 17590032101456 [Note] /home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/sql/mysqld (mysqld 10.1.13-MariaDB-debug) starting as process 20770 ...
2016-03-30 11:51:13 17590032101456 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
2016-03-30 11:51:13 17590032101456 [Note] Plugin 'SEQUENCE' is disabled.
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-03-30 11:51:14 17590032101456 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2016-03-30 11:51:14 17590032101456 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: The InnoDB memory heap is disabled
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Compressed tables use zlib 1.2.6
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Using Linux native AIO
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Using generic crc32 instructions
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Initializing buffer pool, size = 32.0M
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Completed initialization of buffer pool
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Highest supported file format is Barracuda.
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Log scan progressed past the checkpoint lsn 2963393
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Database was not shutdown normally!
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Starting crash recovery.
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Reading tablespace information from the .ibd files...
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Restoring possible half-written data pages 
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 2977463
2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed

and nothing further. The test waits until timeout is exceeded and reports

...
mysqltest: In included file "./include/wait_until_connected_again.inc": 
included from ./include/rpl_reconnect.inc at line 93:
included from ./include/rpl_start_server.inc at line 57:
included from /home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/mysql-test/suite/rpl/t/rpl_domain_id_filter_master_crash.test at line 70:
At line 26: Server failed to restart

Here is the cross-reference report which selects such failures (some of them might be of a different nature, but most of them are related):
cross-reference report
(be patient, it will take long to load)

Recent example:
http://buildbot.askmonty.org/buildbot/builders/p8-rhel6-bintar-debug/builds/1436/steps/test/logs/mysqld.1.err.4
(search for 2016-03-30 11:51:13 17590032101456, that's where the failed server startup begins)
In this case it is parts.partition_debug_innodb test running on 10.1 of March 30. If you scroll the log back from the failing point, you'll see that the test does recovery many times, and only one of them failed.

According to the cross-reference report, the test failures are currently only observed on p8-rhel6-bintar-debug.

Here is an example of a similar problem, but without recovery:
http://buildbot.askmonty.org/buildbot/builders/p8-rhel6-bintar-debug/builds/1445/steps/test/logs/mysqld.2.err.1
Test rpl.rpl_unsafe_statements

CURRENT_TEST: rpl.rpl_unsafe_statements
2016-04-01 18:05:42 17590418370640 [Note] /home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/sql/mysqld (mysqld 10.2.0-MariaDB-debug) starting as process 17484 ...
2016-04-01 18:05:42 17590418370640 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
2016-04-01 18:05:42 17590418370640 [Note] Plugin 'partition' is disabled.
2016-04-01 18:05:42 17590418370640 [Note] Plugin 'SEQUENCE' is disabled.
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-04-01 18:05:42 17590418370640 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2016-04-01 18:05:42 17590418370640 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: The InnoDB memory heap is disabled
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Compressed tables use zlib 1.2.6
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Using Linux native AIO
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Using generic crc32 instructions
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Initializing buffer pool, size = 8.0M
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Completed initialization of buffer pool
2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Highest supported file format is Barracuda.
CURRENT_TEST: rpl.rpl_loaddata_s
2016-04-01 18:08:45 17590534631504 [Note] /home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/sql/mysqld (mysqld 10.2.0-MariaDB-debug) starting as process 23021 ...



 Comments   
Comment by Jan Lindström (Inactive) [ 2016-03-31 ]

As this is a debug build, can someone attach a debugger and find out what it is waiting and where ?

Comment by Marko Mäkelä [ 2016-12-29 ]

Would it be possible to modify mysql-test-run so that it issues kill -ABRT when the server appears to be hung on startup or shutdown? Then the core dump from mysqld would provide some useful information. If we cannot afford to keep core dumps even for a short time, it would already help to save the output of
(gdb) thread apply all backtrace
on the core dump, like we do for crashes that occur while SQL is being executed.

Comment by Marko Mäkelä [ 2017-08-10 ]

Any update on this? The redo log recovery progress reporting was changed in MDEV-11027.
I guess it is infeasible to run once with -mysqld=-debug=d,ib_log to get more details?

Comment by Marko Mäkelä [ 2018-01-16 ]

As elenst noted in the description, normally the long delay occurs after crash recovery is needed. This is kind-of working as designed. The obvious reason for the slowdown is the call to dict_check_tablespaces_and_store_max_id() which will be invoked with a parameter that causes every .ibd file to be loaded.

That said, in MariaDB 10.2 this would make less sense, because the MLOG_FILE_NAME records in the redo log would make the redo log apply process more robust. That is, if there were any logged changes for any .ibd files that had to be applied, we must find those files as part of the crash recovery, or the redo log apply would be aborted. (To ignore missing files, the option innodb_force_recovery must be set to 1 or more.) In earlier MariaDB versions, we would scan the data directory for all .ibd files and then apply the redo log records to those files that have a matching tablespace ID. There it would make very much sense to check the consistency of the data dictionary, because we would silently ignore redo logs for files that we were unable to find.

When it comes to MariaDB 10.2.2‥10.2.13, the time-consuming checks are actually performed on every single startup, even when no crash recovery is involved; MDEV-13869 was fixed in MariaDB 10.2.13 and MDEV-15333 in MariaDB 10.2.14.

Comment by Elena Stepanova [ 2018-09-24 ]

The failure of the 2nd kind (without recovery, on clean startup) still happens, or started happening again, on 10.1 and 10.2:
http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-quantal-amd64/builds/11900

10.1

CURRENT_TEST: rpl.rpl_mixed_ddl_dml
2018-09-11  9:29:22 139928986154880 [Note] /usr/local/mariadb-10.1.37-linux-x86_64/bin/mysqld (mysqld 10.1.37-MariaDB) starting as process 27694 ...
2018-09-11  9:29:22 139928986154880 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4182)
2018-09-11  9:29:22 139928986154880 [Note] Plugin 'SEQUENCE' is disabled.
2018-09-11  9:29:22 139928986154880 [Note] Plugin 'partition' is disabled.
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB.
 
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: The InnoDB memory heap is disabled
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Using Linux native AIO
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Using generic crc32 instructions
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Initializing buffer pool, size = 8.0M
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Completed initialization of buffer pool
2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Highest supported file format is Barracuda.

http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/1839

10.2

CURRENT_TEST: parts.partition_mgm_lc1_innodb
2018-09-13 21:02:12 281473154672272 [Note] /usr/sbin/mysqld (mysqld 10.2.18-MariaDB-log) starting as process 15072 ...
2018-09-13 21:02:12 281473154672272 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4182)
2018-09-13 21:02:12 281473154672272 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
2018-09-13 21:02:12 281473154672272 [Note] Plugin 'SEQUENCE' is disabled.
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Uses event mutexes
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Compressed tables use zlib 1.2.7
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Using Linux native AIO
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Number of pools: 1
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Using generic crc32 instructions
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Completed initialization of buffer pool
2018-09-13 21:02:12 281472829223072 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Highest supported file format is Barracuda.

Comment by Marko Mäkelä [ 2018-09-25 ]

Here is an extract from the mysql-test-run output log from the 10.2 failure

10.2 38665893087e20c3ad65d5b0e227a75185a4865e non-debug

worker[1] > Restart [mysqld.1 - pid: 11871, winpid: 11871] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=8K' != '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=16K'
innodb.101_compatibility '8k,innodb'     w1 [ pass ]   4355
innodb.alter_missing_tablespace '8k,innodb' w1 [ pass ]   2451
innodb.innodb-alter-nullable '8k,innodb' w1 [ pass ]     35
innodb.innodb_bulk_create_index_small '8k,innodb' w1 [ pass ]    201
innodb.restart '8k,innodb'               w1 [ pass ]   4976
innodb.table_flags '8k,innodb'           w1 [ pass ]   9753
innodb_gis.rtree_purge '8k,innodb'       w1 [ pass ]   3898
innodb_zip.bug36169 '8k,innodb'          w1 [ pass ]     65
innodb_zip.bug36172 '8k,innodb'          w1 [ pass ]     24
innodb_zip.bug52745 '8k,innodb'          w1 [ pass ]      6
innodb_zip.bug53591 '8k,innodb'          w1 [ pass ]      6
innodb_zip.bug56680 '8k,innodb'          w1 [ pass ]    305
innodb_zip.create_options '8k,innodb'    w1 [ pass ]    312
innodb_zip.innochecksum_3 '8k,innodb'    w1 [ pass ]  10421
innodb_zip.innodb-zip '8k,innodb'        w1 [ pass ]    251
innodb_zip.wl5522_zip '8k,innodb'        w1 [ pass ]   4893
innodb_zip.wl6344_compress_level '8k,innodb' w1 [ pass ]     49
worker[1] > Restart [mysqld.1 - pid: 14966, winpid: 14966] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --loose-enable-partition --lower_case_table_names=1' != '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=8K'
worker[1] mysql-test-run: WARNING: Waited 60 seconds for /dev/shm/var/1/run/mysqld.1.pid to be created, still waiting for 120 seconds...
worker[1] mysql-test-run: WARNING: Waited 120 seconds for /dev/shm/var/1/run/mysqld.1.pid to be created, still waiting for 60 seconds...
worker[1] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.1 - pid: 15071, winpid: 15071] to create a pid file.
parts.partition_mgm_lc1_innodb 'innodb'  w1 [ fail ]
        Test ended at 2018-09-13 21:05:13
 
CURRENT_TEST: parts.partition_mgm_lc1_innodb
 
 
Failed to start mysqld.1
mysqltest failed but provided no output

Here is an extract from the 10.1 failure:

10.1 21829bd743f487280dba918857259b9971a8afbf non-debug

worker[1] > Restart [mysqld.1 - pid: 25894, winpid: 25894] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin'
worker[1] > Restart [mysqld.2 - pid: 25896, winpid: 25896] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin'
rpl.rpl_cant_read_event_incident 'mix'   w1 [ pass ]   2664
rpl.rpl_connection 'mix'                 w1 [ pass ]    285
worker[1] > Restart [mysqld.1 - pid: 26273, winpid: 26273] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin'
worker[1] > Restart [mysqld.2 - pid: 26185, winpid: 26185] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin'
rpl.rpl_create_tmp_table_if_not_exists 'mix' w1 [ pass ]    242
worker[1] > Restart [mysqld.1 - pid: 26707, winpid: 26707] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin'
worker[1] > Restart [mysqld.2 - pid: 26709, winpid: 26709] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin'
rpl.rpl_empty_master_host 'mix'          w1 [ pass ]    290
worker[1] > Restart [mysqld.1 - pid: 26924, winpid: 26924] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin'
worker[1] > Restart [mysqld.2 - pid: 26926, winpid: 26926] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin'
rpl.rpl_hrtime 'mix'                     w1 [ pass ]    218
rpl.rpl_loadfile 'mix'                   w1 [ pass ]    651
worker[1] > Restart [mysqld.1 - pid: 27181, winpid: 27181] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin'
worker[1] > Restart [mysqld.2 - pid: 27192, winpid: 27192] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin'
rpl.rpl_mix_found_rows 'mix'             w1 [ pass ]    199
worker[1] > Restart [mysqld.1 - pid: 27475, winpid: 27475] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin'
worker[1] > Restart [mysqld.2 - pid: 27477, winpid: 27477] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin'
worker[1] mysql-test-run: WARNING: Waited 60 seconds for /usr/local/mariadb-10.1.37-linux-x86_64/mysql-test/var/1/run/mysqld.2.pid to be created, still waiting for 120 seconds...
worker[1] mysql-test-run: WARNING: Waited 120 seconds for /usr/local/mariadb-10.1.37-linux-x86_64/mysql-test/var/1/run/mysqld.2.pid to be created, still waiting for 60 seconds...
worker[1] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.2 - pid: 27693, winpid: 27693] to create a pid file.
rpl.rpl_mixed_ddl_dml 'mix'              w1 [ fail ]
        Test ended at 2018-09-11 09:32:24
 
CURRENT_TEST: rpl.rpl_mixed_ddl_dml
 
 
Failed to start mysqld.2
mysqltest failed but provided no output

In both cases, it is a non-debug build (some tests are skipped due to that according to the (omitted) logs). In the 10.1 case, the replication tests are restarting the server frequently, thus increasing the probability of failure.

I would hope that manually running ./mysql-test-run --suite=rpl on non-debug binaries should lead us to the right track. We need stack traces of all threads in order to diagnose and fix the problem.

Comment by Marko Mäkelä [ 2019-01-04 ]

Here is one more failure on 10.2.21 (should be virtually identical to the 10.2.21 release):
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/3128/

10.2 23e4446adc0de8b6f763f746fd20f0e546a40129

mariabackup.xb_partition                 w3 [ fail ]
        Test ended at 2019-01-03 21:58:40
 
CURRENT_TEST: mariabackup.xb_partition
/usr/bin/mariabackup based on MariaDB server 10.2.22-MariaDB Linux (aarch64) 
190103 21:50:16 completed OK!
mysqltest: In included file "./include/wait_until_connected_again.inc": 
included from ./include/start_mysqld.inc at line 18:
included from /usr/share/mysql-test/suite/mariabackup/include/restart_and_restore.inc at line 10:
included from /usr/share/mysql-test/suite/mariabackup/xb_partition.test at line 77:
At line 23: Server failed to restart

http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/3128/steps/mtr/logs/mysqld.1.err.3 shows that the server hung after noticing that the redo log is clean (and after subsequently creating new redo log files):

10.2 23e4446adc0de8b6f763f746fd20f0e546a40129

Version: '10.2.22-MariaDB-log'  socket: '/dev/shm/var/tmp/3/mysqld.1.sock'  port: 16040  MariaDB Server
CURRENT_TEST: mariabackup.xb_partition
2019-01-03 21:50:14 281472921075872 [Note] /usr/sbin/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
2019-01-03 21:50:14 281472921075872 [Note] Event Scheduler: Purging the queue. 0 events
2019-01-03 21:50:14 281473265430688 [Note] InnoDB: FTS optimize thread exiting.
2019-01-03 21:50:14 281472921075872 [Note] InnoDB: Starting shutdown...
2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/var/3/mysqld.1/data/ib_buffer_pool
2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Instance 0, restricted to 128 pages due to innodb_buf_pool_dump_pct=25
2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Buffer pool(s) dump completed at 190103 21:50:14
2019-01-03 21:50:15 281472921075872 [Note] InnoDB: Shutdown completed; log sequence number 1701136
2019-01-03 21:50:15 281472921075872 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2019-01-03 21:50:15 281472921075872 [Note] /usr/sbin/mysqld: Shutdown complete
 
2019-01-03 21:50:16 281473425204880 [Note] /usr/sbin/mysqld (mysqld 10.2.22-MariaDB-log) starting as process 27264 ...
2019-01-03 21:50:16 281473425204880 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4186)
2019-01-03 21:50:16 281473425204880 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Uses event mutexes
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Compressed tables use zlib 1.2.7
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Using Linux native AIO
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Number of pools: 1
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Using generic crc32 instructions
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Completed initialization of buffer pool
2019-01-03 21:50:16 281473097658528 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Setting log file ./ib_logfile101 size to 5242880 bytes
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Setting log file ./ib_logfile1 size to 5242880 bytes
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: New log files created, LSN=1701136
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Highest supported file format is Barracuda.
CURRENT_TEST: mariabackup.xb_partition
2019-01-03 21:58:40 281472998172304 [Note] /usr/sbin/mysqld (mysqld 10.2.22-MariaDB-log) starting as process 14433 ...

As you can see above, the server was previously shut down cleanly. The server was hung for more than 8 minutes before it was killed and the test was retried (the second CURRENT_TEST line was written).

Comment by Marko Mäkelä [ 2019-09-30 ]

For the main trees, there have not been any failures of mariabackup.xb_partition since January 2019. Maybe this one has been finally fixed in April 2019 by MDEV-12699, which fixed a hang if a corrupted page was read during startup? There also was a ‘slow startup after crash recovery’ fix MDEV-18733 in April 2019.

Generated at Thu Feb 08 07:37:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.