|
As this is a debug build, can someone attach a debugger and find out what it is waiting and where ?
|
|
Would it be possible to modify mysql-test-run so that it issues kill -ABRT when the server appears to be hung on startup or shutdown? Then the core dump from mysqld would provide some useful information. If we cannot afford to keep core dumps even for a short time, it would already help to save the output of
(gdb) thread apply all backtrace
on the core dump, like we do for crashes that occur while SQL is being executed.
|
|
Any update on this? The redo log recovery progress reporting was changed in MDEV-11027.
I guess it is infeasible to run once with -mysqld=-debug=d,ib_log to get more details?
|
|
As elenst noted in the description, normally the long delay occurs after crash recovery is needed. This is kind-of working as designed. The obvious reason for the slowdown is the call to dict_check_tablespaces_and_store_max_id() which will be invoked with a parameter that causes every .ibd file to be loaded.
That said, in MariaDB 10.2 this would make less sense, because the MLOG_FILE_NAME records in the redo log would make the redo log apply process more robust. That is, if there were any logged changes for any .ibd files that had to be applied, we must find those files as part of the crash recovery, or the redo log apply would be aborted. (To ignore missing files, the option innodb_force_recovery must be set to 1 or more.) In earlier MariaDB versions, we would scan the data directory for all .ibd files and then apply the redo log records to those files that have a matching tablespace ID. There it would make very much sense to check the consistency of the data dictionary, because we would silently ignore redo logs for files that we were unable to find.
When it comes to MariaDB 10.2.2‥10.2.13, the time-consuming checks are actually performed on every single startup, even when no crash recovery is involved; MDEV-13869 was fixed in MariaDB 10.2.13 and MDEV-15333 in MariaDB 10.2.14.
|
|
The failure of the 2nd kind (without recovery, on clean startup) still happens, or started happening again, on 10.1 and 10.2:
http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-quantal-amd64/builds/11900
|
10.1
|
CURRENT_TEST: rpl.rpl_mixed_ddl_dml
|
2018-09-11 9:29:22 139928986154880 [Note] /usr/local/mariadb-10.1.37-linux-x86_64/bin/mysqld (mysqld 10.1.37-MariaDB) starting as process 27694 ...
|
2018-09-11 9:29:22 139928986154880 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4182)
|
2018-09-11 9:29:22 139928986154880 [Note] Plugin 'SEQUENCE' is disabled.
|
2018-09-11 9:29:22 139928986154880 [Note] Plugin 'partition' is disabled.
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB.
|
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Using mutexes to ref count buffer pool pages
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: The InnoDB memory heap is disabled
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Compressed tables use zlib 1.2.3
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Using Linux native AIO
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Using generic crc32 instructions
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Initializing buffer pool, size = 8.0M
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Completed initialization of buffer pool
|
2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Highest supported file format is Barracuda.
|
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/1839
|
10.2
|
CURRENT_TEST: parts.partition_mgm_lc1_innodb
|
2018-09-13 21:02:12 281473154672272 [Note] /usr/sbin/mysqld (mysqld 10.2.18-MariaDB-log) starting as process 15072 ...
|
2018-09-13 21:02:12 281473154672272 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4182)
|
2018-09-13 21:02:12 281473154672272 [Warning] Changed limits: max_open_files: 1024 max_connections: 151 (was 151) table_cache: 421 (was 2000)
|
2018-09-13 21:02:12 281473154672272 [Note] Plugin 'SEQUENCE' is disabled.
|
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Uses event mutexes
|
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Compressed tables use zlib 1.2.7
|
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Using Linux native AIO
|
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Number of pools: 1
|
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Using generic crc32 instructions
|
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M
|
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Completed initialization of buffer pool
|
2018-09-13 21:02:12 281472829223072 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
|
2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Highest supported file format is Barracuda.
|
|
|
Here is an extract from the mysql-test-run output log from the 10.2 failure
|
10.2 38665893087e20c3ad65d5b0e227a75185a4865e non-debug
|
worker[1] > Restart [mysqld.1 - pid: 11871, winpid: 11871] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=8K' != '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=16K'
|
innodb.101_compatibility '8k,innodb' w1 [ pass ] 4355
|
innodb.alter_missing_tablespace '8k,innodb' w1 [ pass ] 2451
|
innodb.innodb-alter-nullable '8k,innodb' w1 [ pass ] 35
|
innodb.innodb_bulk_create_index_small '8k,innodb' w1 [ pass ] 201
|
innodb.restart '8k,innodb' w1 [ pass ] 4976
|
innodb.table_flags '8k,innodb' w1 [ pass ] 9753
|
innodb_gis.rtree_purge '8k,innodb' w1 [ pass ] 3898
|
innodb_zip.bug36169 '8k,innodb' w1 [ pass ] 65
|
innodb_zip.bug36172 '8k,innodb' w1 [ pass ] 24
|
innodb_zip.bug52745 '8k,innodb' w1 [ pass ] 6
|
innodb_zip.bug53591 '8k,innodb' w1 [ pass ] 6
|
innodb_zip.bug56680 '8k,innodb' w1 [ pass ] 305
|
innodb_zip.create_options '8k,innodb' w1 [ pass ] 312
|
innodb_zip.innochecksum_3 '8k,innodb' w1 [ pass ] 10421
|
innodb_zip.innodb-zip '8k,innodb' w1 [ pass ] 251
|
innodb_zip.wl5522_zip '8k,innodb' w1 [ pass ] 4893
|
innodb_zip.wl6344_compress_level '8k,innodb' w1 [ pass ] 49
|
worker[1] > Restart [mysqld.1 - pid: 14966, winpid: 14966] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --loose-enable-partition --lower_case_table_names=1' != '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=8K'
|
worker[1] mysql-test-run: WARNING: Waited 60 seconds for /dev/shm/var/1/run/mysqld.1.pid to be created, still waiting for 120 seconds...
|
worker[1] mysql-test-run: WARNING: Waited 120 seconds for /dev/shm/var/1/run/mysqld.1.pid to be created, still waiting for 60 seconds...
|
worker[1] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.1 - pid: 15071, winpid: 15071] to create a pid file.
|
parts.partition_mgm_lc1_innodb 'innodb' w1 [ fail ]
|
Test ended at 2018-09-13 21:05:13
|
|
CURRENT_TEST: parts.partition_mgm_lc1_innodb
|
|
|
Failed to start mysqld.1
|
mysqltest failed but provided no output
|
Here is an extract from the 10.1 failure:
|
10.1 21829bd743f487280dba918857259b9971a8afbf non-debug
|
worker[1] > Restart [mysqld.1 - pid: 25894, winpid: 25894] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin'
|
worker[1] > Restart [mysqld.2 - pid: 25896, winpid: 25896] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin'
|
rpl.rpl_cant_read_event_incident 'mix' w1 [ pass ] 2664
|
rpl.rpl_connection 'mix' w1 [ pass ] 285
|
worker[1] > Restart [mysqld.1 - pid: 26273, winpid: 26273] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin'
|
worker[1] > Restart [mysqld.2 - pid: 26185, winpid: 26185] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin'
|
rpl.rpl_create_tmp_table_if_not_exists 'mix' w1 [ pass ] 242
|
worker[1] > Restart [mysqld.1 - pid: 26707, winpid: 26707] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin'
|
worker[1] > Restart [mysqld.2 - pid: 26709, winpid: 26709] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin'
|
rpl.rpl_empty_master_host 'mix' w1 [ pass ] 290
|
worker[1] > Restart [mysqld.1 - pid: 26924, winpid: 26924] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin'
|
worker[1] > Restart [mysqld.2 - pid: 26926, winpid: 26926] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin'
|
rpl.rpl_hrtime 'mix' w1 [ pass ] 218
|
rpl.rpl_loadfile 'mix' w1 [ pass ] 651
|
worker[1] > Restart [mysqld.1 - pid: 27181, winpid: 27181] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin'
|
worker[1] > Restart [mysqld.2 - pid: 27192, winpid: 27192] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin'
|
rpl.rpl_mix_found_rows 'mix' w1 [ pass ] 199
|
worker[1] > Restart [mysqld.1 - pid: 27475, winpid: 27475] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin'
|
worker[1] > Restart [mysqld.2 - pid: 27477, winpid: 27477] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin'
|
worker[1] mysql-test-run: WARNING: Waited 60 seconds for /usr/local/mariadb-10.1.37-linux-x86_64/mysql-test/var/1/run/mysqld.2.pid to be created, still waiting for 120 seconds...
|
worker[1] mysql-test-run: WARNING: Waited 120 seconds for /usr/local/mariadb-10.1.37-linux-x86_64/mysql-test/var/1/run/mysqld.2.pid to be created, still waiting for 60 seconds...
|
worker[1] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.2 - pid: 27693, winpid: 27693] to create a pid file.
|
rpl.rpl_mixed_ddl_dml 'mix' w1 [ fail ]
|
Test ended at 2018-09-11 09:32:24
|
|
CURRENT_TEST: rpl.rpl_mixed_ddl_dml
|
|
|
Failed to start mysqld.2
|
mysqltest failed but provided no output
|
In both cases, it is a non-debug build (some tests are skipped due to that according to the (omitted) logs). In the 10.1 case, the replication tests are restarting the server frequently, thus increasing the probability of failure.
I would hope that manually running ./mysql-test-run --suite=rpl on non-debug binaries should lead us to the right track. We need stack traces of all threads in order to diagnose and fix the problem.
|
|
Here is one more failure on 10.2.21 (should be virtually identical to the 10.2.21 release):
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/3128/
|
10.2 23e4446adc0de8b6f763f746fd20f0e546a40129
|
mariabackup.xb_partition w3 [ fail ]
|
Test ended at 2019-01-03 21:58:40
|
|
CURRENT_TEST: mariabackup.xb_partition
|
/usr/bin/mariabackup based on MariaDB server 10.2.22-MariaDB Linux (aarch64)
|
…
|
190103 21:50:16 completed OK!
|
mysqltest: In included file "./include/wait_until_connected_again.inc":
|
included from ./include/start_mysqld.inc at line 18:
|
included from /usr/share/mysql-test/suite/mariabackup/include/restart_and_restore.inc at line 10:
|
included from /usr/share/mysql-test/suite/mariabackup/xb_partition.test at line 77:
|
At line 23: Server failed to restart
|
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/3128/steps/mtr/logs/mysqld.1.err.3 shows that the server hung after noticing that the redo log is clean (and after subsequently creating new redo log files):
|
10.2 23e4446adc0de8b6f763f746fd20f0e546a40129
|
Version: '10.2.22-MariaDB-log' socket: '/dev/shm/var/tmp/3/mysqld.1.sock' port: 16040 MariaDB Server
|
CURRENT_TEST: mariabackup.xb_partition
|
2019-01-03 21:50:14 281472921075872 [Note] /usr/sbin/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
|
2019-01-03 21:50:14 281472921075872 [Note] Event Scheduler: Purging the queue. 0 events
|
2019-01-03 21:50:14 281473265430688 [Note] InnoDB: FTS optimize thread exiting.
|
2019-01-03 21:50:14 281472921075872 [Note] InnoDB: Starting shutdown...
|
2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/var/3/mysqld.1/data/ib_buffer_pool
|
2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Instance 0, restricted to 128 pages due to innodb_buf_pool_dump_pct=25
|
2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Buffer pool(s) dump completed at 190103 21:50:14
|
2019-01-03 21:50:15 281472921075872 [Note] InnoDB: Shutdown completed; log sequence number 1701136
|
2019-01-03 21:50:15 281472921075872 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
|
2019-01-03 21:50:15 281472921075872 [Note] /usr/sbin/mysqld: Shutdown complete
|
|
2019-01-03 21:50:16 281473425204880 [Note] /usr/sbin/mysqld (mysqld 10.2.22-MariaDB-log) starting as process 27264 ...
|
2019-01-03 21:50:16 281473425204880 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4186)
|
2019-01-03 21:50:16 281473425204880 [Warning] Changed limits: max_open_files: 1024 max_connections: 151 (was 151) table_cache: 421 (was 2000)
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Uses event mutexes
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Compressed tables use zlib 1.2.7
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Using Linux native AIO
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Number of pools: 1
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Using generic crc32 instructions
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Completed initialization of buffer pool
|
2019-01-03 21:50:16 281473097658528 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Setting log file ./ib_logfile101 size to 5242880 bytes
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Setting log file ./ib_logfile1 size to 5242880 bytes
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: New log files created, LSN=1701136
|
2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Highest supported file format is Barracuda.
|
CURRENT_TEST: mariabackup.xb_partition
|
2019-01-03 21:58:40 281472998172304 [Note] /usr/sbin/mysqld (mysqld 10.2.22-MariaDB-log) starting as process 14433 ...
|
As you can see above, the server was previously shut down cleanly. The server was hung for more than 8 minutes before it was killed and the test was retried (the second CURRENT_TEST line was written).
|
|
For the main trees, there have not been any failures of mariabackup.xb_partition since January 2019. Maybe this one has been finally fixed in April 2019 by MDEV-12699, which fixed a hang if a corrupted page was read during startup? There also was a ‘slow startup after crash recovery’ fix MDEV-18733 in April 2019.
|