Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9843

InnoDB hangs on startup between "InnoDB: Apply batch completed" and "rollback segment(s) are active", various tests fail sporadically in buildbot on p8-rhel6-bintar-debug

Details

    Description

      There are multiple failures in different tests in buildbot which seem to have the common cause. A test runs with InnoDB or XtraDB; at some point as a part of the test flow it sigkills the server; the server gets restarted; InnoDB recovery starts; it goes like this:

      Version: '10.1.13-MariaDB-debug'  socket: '/home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/mysql-test/var/tmp/4/mysqld.1.sock'  port: 16020  Source distribution
      SIGKILL myself
      2016-03-30 11:51:13 17590032101456 [Note] /home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/sql/mysqld (mysqld 10.1.13-MariaDB-debug) starting as process 20770 ...
      2016-03-30 11:51:13 17590032101456 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
      2016-03-30 11:51:13 17590032101456 [Note] Plugin 'SEQUENCE' is disabled.
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: The InnoDB memory heap is disabled
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Compressed tables use zlib 1.2.6
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Using Linux native AIO
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Using generic crc32 instructions
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Initializing buffer pool, size = 32.0M
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Completed initialization of buffer pool
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Highest supported file format is Barracuda.
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Log scan progressed past the checkpoint lsn 2963393
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Database was not shutdown normally!
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Starting crash recovery.
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Reading tablespace information from the .ibd files...
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Restoring possible half-written data pages 
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: from the doublewrite buffer...
      InnoDB: Doing recovery: scanned up to log sequence number 2977463
      2016-03-30 11:51:14 17590032101456 [Note] InnoDB: Starting an apply batch of log records to the database...
      InnoDB: Progress in percent: 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
      InnoDB: Apply batch completed
      

      and nothing further. The test waits until timeout is exceeded and reports

      ...
      mysqltest: In included file "./include/wait_until_connected_again.inc": 
      included from ./include/rpl_reconnect.inc at line 93:
      included from ./include/rpl_start_server.inc at line 57:
      included from /home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/mysql-test/suite/rpl/t/rpl_domain_id_filter_master_crash.test at line 70:
      At line 26: Server failed to restart
      

      Here is the cross-reference report which selects such failures (some of them might be of a different nature, but most of them are related):
      cross-reference report
      (be patient, it will take long to load)

      Recent example:
      http://buildbot.askmonty.org/buildbot/builders/p8-rhel6-bintar-debug/builds/1436/steps/test/logs/mysqld.1.err.4
      (search for 2016-03-30 11:51:13 17590032101456, that's where the failed server startup begins)
      In this case it is parts.partition_debug_innodb test running on 10.1 of March 30. If you scroll the log back from the failing point, you'll see that the test does recovery many times, and only one of them failed.

      According to the cross-reference report, the test failures are currently only observed on p8-rhel6-bintar-debug.

      Here is an example of a similar problem, but without recovery:
      http://buildbot.askmonty.org/buildbot/builders/p8-rhel6-bintar-debug/builds/1445/steps/test/logs/mysqld.2.err.1
      Test rpl.rpl_unsafe_statements

      CURRENT_TEST: rpl.rpl_unsafe_statements
      2016-04-01 18:05:42 17590418370640 [Note] /home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/sql/mysqld (mysqld 10.2.0-MariaDB-debug) starting as process 17484 ...
      2016-04-01 18:05:42 17590418370640 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
      2016-04-01 18:05:42 17590418370640 [Note] Plugin 'partition' is disabled.
      2016-04-01 18:05:42 17590418370640 [Note] Plugin 'SEQUENCE' is disabled.
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: The InnoDB memory heap is disabled
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Compressed tables use zlib 1.2.6
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Using Linux native AIO
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Using generic crc32 instructions
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Initializing buffer pool, size = 8.0M
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Completed initialization of buffer pool
      2016-04-01 18:05:42 17590418370640 [Note] InnoDB: Highest supported file format is Barracuda.
      CURRENT_TEST: rpl.rpl_loaddata_s
      2016-04-01 18:08:45 17590534631504 [Note] /home/buildbot/maria-slave/power8-vlp01-bintar-debug/build/sql/mysqld (mysqld 10.2.0-MariaDB-debug) starting as process 23021 ...
      

      Attachments

        Issue Links

          Activity

            As this is a debug build, can someone attach a debugger and find out what it is waiting and where ?

            jplindst Jan Lindström (Inactive) added a comment - As this is a debug build, can someone attach a debugger and find out what it is waiting and where ?

            Would it be possible to modify mysql-test-run so that it issues kill -ABRT when the server appears to be hung on startup or shutdown? Then the core dump from mysqld would provide some useful information. If we cannot afford to keep core dumps even for a short time, it would already help to save the output of
            (gdb) thread apply all backtrace
            on the core dump, like we do for crashes that occur while SQL is being executed.

            marko Marko Mäkelä added a comment - Would it be possible to modify mysql-test-run so that it issues kill -ABRT when the server appears to be hung on startup or shutdown? Then the core dump from mysqld would provide some useful information. If we cannot afford to keep core dumps even for a short time, it would already help to save the output of (gdb) thread apply all backtrace on the core dump, like we do for crashes that occur while SQL is being executed.

            Any update on this? The redo log recovery progress reporting was changed in MDEV-11027.
            I guess it is infeasible to run once with -mysqld=-debug=d,ib_log to get more details?

            marko Marko Mäkelä added a comment - Any update on this? The redo log recovery progress reporting was changed in MDEV-11027 . I guess it is infeasible to run once with - mysqld= -debug=d,ib_log to get more details?
            marko Marko Mäkelä added a comment - - edited

            As elenst noted in the description, normally the long delay occurs after crash recovery is needed. This is kind-of working as designed. The obvious reason for the slowdown is the call to dict_check_tablespaces_and_store_max_id() which will be invoked with a parameter that causes every .ibd file to be loaded.

            That said, in MariaDB 10.2 this would make less sense, because the MLOG_FILE_NAME records in the redo log would make the redo log apply process more robust. That is, if there were any logged changes for any .ibd files that had to be applied, we must find those files as part of the crash recovery, or the redo log apply would be aborted. (To ignore missing files, the option innodb_force_recovery must be set to 1 or more.) In earlier MariaDB versions, we would scan the data directory for all .ibd files and then apply the redo log records to those files that have a matching tablespace ID. There it would make very much sense to check the consistency of the data dictionary, because we would silently ignore redo logs for files that we were unable to find.

            When it comes to MariaDB 10.2.2‥10.2.13, the time-consuming checks are actually performed on every single startup, even when no crash recovery is involved; MDEV-13869 was fixed in MariaDB 10.2.13 and MDEV-15333 in MariaDB 10.2.14.

            marko Marko Mäkelä added a comment - - edited As elenst noted in the description, normally the long delay occurs after crash recovery is needed. This is kind-of working as designed. The obvious reason for the slowdown is the call to dict_check_tablespaces_and_store_max_id() which will be invoked with a parameter that causes every .ibd file to be loaded. That said, in MariaDB 10.2 this would make less sense, because the MLOG_FILE_NAME records in the redo log would make the redo log apply process more robust. That is, if there were any logged changes for any .ibd files that had to be applied, we must find those files as part of the crash recovery, or the redo log apply would be aborted. (To ignore missing files, the option innodb_force_recovery must be set to 1 or more.) In earlier MariaDB versions, we would scan the data directory for all .ibd files and then apply the redo log records to those files that have a matching tablespace ID. There it would make very much sense to check the consistency of the data dictionary, because we would silently ignore redo logs for files that we were unable to find. When it comes to MariaDB 10.2.2‥10.2.13, the time-consuming checks are actually performed on every single startup, even when no crash recovery is involved; MDEV-13869 was fixed in MariaDB 10.2.13 and MDEV-15333 in MariaDB 10.2.14.
            elenst Elena Stepanova added a comment - - edited

            The failure of the 2nd kind (without recovery, on clean startup) still happens, or started happening again, on 10.1 and 10.2:
            http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-quantal-amd64/builds/11900

            10.1

            CURRENT_TEST: rpl.rpl_mixed_ddl_dml
            2018-09-11  9:29:22 139928986154880 [Note] /usr/local/mariadb-10.1.37-linux-x86_64/bin/mysqld (mysqld 10.1.37-MariaDB) starting as process 27694 ...
            2018-09-11  9:29:22 139928986154880 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4182)
            2018-09-11  9:29:22 139928986154880 [Note] Plugin 'SEQUENCE' is disabled.
            2018-09-11  9:29:22 139928986154880 [Note] Plugin 'partition' is disabled.
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB.
             
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Using mutexes to ref count buffer pool pages
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: The InnoDB memory heap is disabled
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Compressed tables use zlib 1.2.3
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Using Linux native AIO
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Using generic crc32 instructions
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Initializing buffer pool, size = 8.0M
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Completed initialization of buffer pool
            2018-09-11  9:29:22 139928986154880 [Note] InnoDB: Highest supported file format is Barracuda.
            

            http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/1839

            10.2

            CURRENT_TEST: parts.partition_mgm_lc1_innodb
            2018-09-13 21:02:12 281473154672272 [Note] /usr/sbin/mysqld (mysqld 10.2.18-MariaDB-log) starting as process 15072 ...
            2018-09-13 21:02:12 281473154672272 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4182)
            2018-09-13 21:02:12 281473154672272 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
            2018-09-13 21:02:12 281473154672272 [Note] Plugin 'SEQUENCE' is disabled.
            2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
            2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Uses event mutexes
            2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Compressed tables use zlib 1.2.7
            2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Using Linux native AIO
            2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Number of pools: 1
            2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Using generic crc32 instructions
            2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M
            2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Completed initialization of buffer pool
            2018-09-13 21:02:12 281472829223072 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
            2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Highest supported file format is Barracuda.
            

            elenst Elena Stepanova added a comment - - edited The failure of the 2nd kind (without recovery, on clean startup) still happens, or started happening again, on 10.1 and 10.2: http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-quantal-amd64/builds/11900 10.1 CURRENT_TEST: rpl.rpl_mixed_ddl_dml 2018-09-11 9:29:22 139928986154880 [Note] /usr/local/mariadb-10.1.37-linux-x86_64/bin/mysqld (mysqld 10.1.37-MariaDB) starting as process 27694 ... 2018-09-11 9:29:22 139928986154880 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4182) 2018-09-11 9:29:22 139928986154880 [Note] Plugin 'SEQUENCE' is disabled. 2018-09-11 9:29:22 139928986154880 [Note] Plugin 'partition' is disabled. 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB.   2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: The InnoDB memory heap is disabled 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Compressed tables use zlib 1.2.3 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Using Linux native AIO 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Using generic crc32 instructions 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Initializing buffer pool, size = 8.0M 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Completed initialization of buffer pool 2018-09-11 9:29:22 139928986154880 [Note] InnoDB: Highest supported file format is Barracuda. http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/1839 10.2 CURRENT_TEST: parts.partition_mgm_lc1_innodb 2018-09-13 21:02:12 281473154672272 [Note] /usr/sbin/mysqld (mysqld 10.2.18-MariaDB-log) starting as process 15072 ... 2018-09-13 21:02:12 281473154672272 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4182) 2018-09-13 21:02:12 281473154672272 [Warning] Changed limits: max_open_files: 1024 max_connections: 151 (was 151) table_cache: 421 (was 2000) 2018-09-13 21:02:12 281473154672272 [Note] Plugin 'SEQUENCE' is disabled. 2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Uses event mutexes 2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Compressed tables use zlib 1.2.7 2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Using Linux native AIO 2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Number of pools: 1 2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Using generic crc32 instructions 2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M 2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Completed initialization of buffer pool 2018-09-13 21:02:12 281472829223072 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-09-13 21:02:12 281473154672272 [Note] InnoDB: Highest supported file format is Barracuda.

            Here is an extract from the mysql-test-run output log from the 10.2 failure

            10.2 38665893087e20c3ad65d5b0e227a75185a4865e non-debug

            worker[1] > Restart [mysqld.1 - pid: 11871, winpid: 11871] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=8K' != '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=16K'
            innodb.101_compatibility '8k,innodb'     w1 [ pass ]   4355
            innodb.alter_missing_tablespace '8k,innodb' w1 [ pass ]   2451
            innodb.innodb-alter-nullable '8k,innodb' w1 [ pass ]     35
            innodb.innodb_bulk_create_index_small '8k,innodb' w1 [ pass ]    201
            innodb.restart '8k,innodb'               w1 [ pass ]   4976
            innodb.table_flags '8k,innodb'           w1 [ pass ]   9753
            innodb_gis.rtree_purge '8k,innodb'       w1 [ pass ]   3898
            innodb_zip.bug36169 '8k,innodb'          w1 [ pass ]     65
            innodb_zip.bug36172 '8k,innodb'          w1 [ pass ]     24
            innodb_zip.bug52745 '8k,innodb'          w1 [ pass ]      6
            innodb_zip.bug53591 '8k,innodb'          w1 [ pass ]      6
            innodb_zip.bug56680 '8k,innodb'          w1 [ pass ]    305
            innodb_zip.create_options '8k,innodb'    w1 [ pass ]    312
            innodb_zip.innochecksum_3 '8k,innodb'    w1 [ pass ]  10421
            innodb_zip.innodb-zip '8k,innodb'        w1 [ pass ]    251
            innodb_zip.wl5522_zip '8k,innodb'        w1 [ pass ]   4893
            innodb_zip.wl6344_compress_level '8k,innodb' w1 [ pass ]     49
            worker[1] > Restart [mysqld.1 - pid: 14966, winpid: 14966] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --loose-enable-partition --lower_case_table_names=1' != '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=8K'
            worker[1] mysql-test-run: WARNING: Waited 60 seconds for /dev/shm/var/1/run/mysqld.1.pid to be created, still waiting for 120 seconds...
            worker[1] mysql-test-run: WARNING: Waited 120 seconds for /dev/shm/var/1/run/mysqld.1.pid to be created, still waiting for 60 seconds...
            worker[1] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.1 - pid: 15071, winpid: 15071] to create a pid file.
            parts.partition_mgm_lc1_innodb 'innodb'  w1 [ fail ]
                    Test ended at 2018-09-13 21:05:13
             
            CURRENT_TEST: parts.partition_mgm_lc1_innodb
             
             
            Failed to start mysqld.1
            mysqltest failed but provided no output
            

            Here is an extract from the 10.1 failure:

            10.1 21829bd743f487280dba918857259b9971a8afbf non-debug

            worker[1] > Restart [mysqld.1 - pid: 25894, winpid: 25894] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin'
            worker[1] > Restart [mysqld.2 - pid: 25896, winpid: 25896] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin'
            rpl.rpl_cant_read_event_incident 'mix'   w1 [ pass ]   2664
            rpl.rpl_connection 'mix'                 w1 [ pass ]    285
            worker[1] > Restart [mysqld.1 - pid: 26273, winpid: 26273] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin'
            worker[1] > Restart [mysqld.2 - pid: 26185, winpid: 26185] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin'
            rpl.rpl_create_tmp_table_if_not_exists 'mix' w1 [ pass ]    242
            worker[1] > Restart [mysqld.1 - pid: 26707, winpid: 26707] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin'
            worker[1] > Restart [mysqld.2 - pid: 26709, winpid: 26709] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin'
            rpl.rpl_empty_master_host 'mix'          w1 [ pass ]    290
            worker[1] > Restart [mysqld.1 - pid: 26924, winpid: 26924] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin'
            worker[1] > Restart [mysqld.2 - pid: 26926, winpid: 26926] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin'
            rpl.rpl_hrtime 'mix'                     w1 [ pass ]    218
            rpl.rpl_loadfile 'mix'                   w1 [ pass ]    651
            worker[1] > Restart [mysqld.1 - pid: 27181, winpid: 27181] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin'
            worker[1] > Restart [mysqld.2 - pid: 27192, winpid: 27192] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin'
            rpl.rpl_mix_found_rows 'mix'             w1 [ pass ]    199
            worker[1] > Restart [mysqld.1 - pid: 27475, winpid: 27475] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin'
            worker[1] > Restart [mysqld.2 - pid: 27477, winpid: 27477] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin'
            worker[1] mysql-test-run: WARNING: Waited 60 seconds for /usr/local/mariadb-10.1.37-linux-x86_64/mysql-test/var/1/run/mysqld.2.pid to be created, still waiting for 120 seconds...
            worker[1] mysql-test-run: WARNING: Waited 120 seconds for /usr/local/mariadb-10.1.37-linux-x86_64/mysql-test/var/1/run/mysqld.2.pid to be created, still waiting for 60 seconds...
            worker[1] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.2 - pid: 27693, winpid: 27693] to create a pid file.
            rpl.rpl_mixed_ddl_dml 'mix'              w1 [ fail ]
                    Test ended at 2018-09-11 09:32:24
             
            CURRENT_TEST: rpl.rpl_mixed_ddl_dml
             
             
            Failed to start mysqld.2
            mysqltest failed but provided no output
            

            In both cases, it is a non-debug build (some tests are skipped due to that according to the (omitted) logs). In the 10.1 case, the replication tests are restarting the server frequently, thus increasing the probability of failure.

            I would hope that manually running ./mysql-test-run --suite=rpl on non-debug binaries should lead us to the right track. We need stack traces of all threads in order to diagnose and fix the problem.

            marko Marko Mäkelä added a comment - Here is an extract from the mysql-test-run output log from the 10.2 failure 10.2 38665893087e20c3ad65d5b0e227a75185a4865e non-debug worker[1] > Restart [mysqld.1 - pid: 11871, winpid: 11871] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=8K' != '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=16K' innodb.101_compatibility '8k,innodb' w1 [ pass ] 4355 innodb.alter_missing_tablespace '8k,innodb' w1 [ pass ] 2451 innodb.innodb-alter-nullable '8k,innodb' w1 [ pass ] 35 innodb.innodb_bulk_create_index_small '8k,innodb' w1 [ pass ] 201 innodb.restart '8k,innodb' w1 [ pass ] 4976 innodb.table_flags '8k,innodb' w1 [ pass ] 9753 innodb_gis.rtree_purge '8k,innodb' w1 [ pass ] 3898 innodb_zip.bug36169 '8k,innodb' w1 [ pass ] 65 innodb_zip.bug36172 '8k,innodb' w1 [ pass ] 24 innodb_zip.bug52745 '8k,innodb' w1 [ pass ] 6 innodb_zip.bug53591 '8k,innodb' w1 [ pass ] 6 innodb_zip.bug56680 '8k,innodb' w1 [ pass ] 305 innodb_zip.create_options '8k,innodb' w1 [ pass ] 312 innodb_zip.innochecksum_3 '8k,innodb' w1 [ pass ] 10421 innodb_zip.innodb-zip '8k,innodb' w1 [ pass ] 251 innodb_zip.wl5522_zip '8k,innodb' w1 [ pass ] 4893 innodb_zip.wl6344_compress_level '8k,innodb' w1 [ pass ] 49 worker[1] > Restart [mysqld.1 - pid: 14966, winpid: 14966] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --loose-enable-partition --lower_case_table_names=1' != '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=8K' worker[1] mysql-test-run: WARNING: Waited 60 seconds for /dev/shm/var/1/run/mysqld.1.pid to be created, still waiting for 120 seconds... worker[1] mysql-test-run: WARNING: Waited 120 seconds for /dev/shm/var/1/run/mysqld.1.pid to be created, still waiting for 60 seconds... worker[1] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.1 - pid: 15071, winpid: 15071] to create a pid file. parts.partition_mgm_lc1_innodb 'innodb' w1 [ fail ] Test ended at 2018-09-13 21:05:13   CURRENT_TEST: parts.partition_mgm_lc1_innodb     Failed to start mysqld.1 mysqltest failed but provided no output Here is an extract from the 10.1 failure : 10.1 21829bd743f487280dba918857259b9971a8afbf non-debug worker[1] > Restart [mysqld.1 - pid: 25894, winpid: 25894] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' worker[1] > Restart [mysqld.2 - pid: 25896, winpid: 25896] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' rpl.rpl_cant_read_event_incident 'mix' w1 [ pass ] 2664 rpl.rpl_connection 'mix' w1 [ pass ] 285 worker[1] > Restart [mysqld.1 - pid: 26273, winpid: 26273] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' worker[1] > Restart [mysqld.2 - pid: 26185, winpid: 26185] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' rpl.rpl_create_tmp_table_if_not_exists 'mix' w1 [ pass ] 242 worker[1] > Restart [mysqld.1 - pid: 26707, winpid: 26707] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' worker[1] > Restart [mysqld.2 - pid: 26709, winpid: 26709] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' rpl.rpl_empty_master_host 'mix' w1 [ pass ] 290 worker[1] > Restart [mysqld.1 - pid: 26924, winpid: 26924] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' worker[1] > Restart [mysqld.2 - pid: 26926, winpid: 26926] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' rpl.rpl_hrtime 'mix' w1 [ pass ] 218 rpl.rpl_loadfile 'mix' w1 [ pass ] 651 worker[1] > Restart [mysqld.1 - pid: 27181, winpid: 27181] - running with different options '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' != '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' worker[1] > Restart [mysqld.2 - pid: 27192, winpid: 27192] - running with different options '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' != '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' rpl.rpl_mix_found_rows 'mix' w1 [ pass ] 199 worker[1] > Restart [mysqld.1 - pid: 27475, winpid: 27475] - running with different options '--binlog-format=mixed --log-bin=master-bin --log-bin=master-bin' != '--log-bin=master-bin --binlog-format=mixed --log-bin=master-bin' worker[1] > Restart [mysqld.2 - pid: 27477, winpid: 27477] - running with different options '--binlog-format=mixed --log-bin=slave-bin --log-bin=slave-bin' != '--log-bin=slave-bin --binlog-format=mixed --log-bin=slave-bin' worker[1] mysql-test-run: WARNING: Waited 60 seconds for /usr/local/mariadb-10.1.37-linux-x86_64/mysql-test/var/1/run/mysqld.2.pid to be created, still waiting for 120 seconds... worker[1] mysql-test-run: WARNING: Waited 120 seconds for /usr/local/mariadb-10.1.37-linux-x86_64/mysql-test/var/1/run/mysqld.2.pid to be created, still waiting for 60 seconds... worker[1] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.2 - pid: 27693, winpid: 27693] to create a pid file. rpl.rpl_mixed_ddl_dml 'mix' w1 [ fail ] Test ended at 2018-09-11 09:32:24   CURRENT_TEST: rpl.rpl_mixed_ddl_dml     Failed to start mysqld.2 mysqltest failed but provided no output In both cases, it is a non-debug build (some tests are skipped due to that according to the (omitted) logs). In the 10.1 case, the replication tests are restarting the server frequently, thus increasing the probability of failure. I would hope that manually running ./mysql-test-run --suite=rpl on non-debug binaries should lead us to the right track. We need stack traces of all threads in order to diagnose and fix the problem.

            Here is one more failure on 10.2.21 (should be virtually identical to the 10.2.21 release):
            http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/3128/

            10.2 23e4446adc0de8b6f763f746fd20f0e546a40129

            mariabackup.xb_partition                 w3 [ fail ]
                    Test ended at 2019-01-03 21:58:40
             
            CURRENT_TEST: mariabackup.xb_partition
            /usr/bin/mariabackup based on MariaDB server 10.2.22-MariaDB Linux (aarch64) 
            …
            190103 21:50:16 completed OK!
            mysqltest: In included file "./include/wait_until_connected_again.inc": 
            included from ./include/start_mysqld.inc at line 18:
            included from /usr/share/mysql-test/suite/mariabackup/include/restart_and_restore.inc at line 10:
            included from /usr/share/mysql-test/suite/mariabackup/xb_partition.test at line 77:
            At line 23: Server failed to restart
            

            http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/3128/steps/mtr/logs/mysqld.1.err.3 shows that the server hung after noticing that the redo log is clean (and after subsequently creating new redo log files):

            10.2 23e4446adc0de8b6f763f746fd20f0e546a40129

            Version: '10.2.22-MariaDB-log'  socket: '/dev/shm/var/tmp/3/mysqld.1.sock'  port: 16040  MariaDB Server
            CURRENT_TEST: mariabackup.xb_partition
            2019-01-03 21:50:14 281472921075872 [Note] /usr/sbin/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
            2019-01-03 21:50:14 281472921075872 [Note] Event Scheduler: Purging the queue. 0 events
            2019-01-03 21:50:14 281473265430688 [Note] InnoDB: FTS optimize thread exiting.
            2019-01-03 21:50:14 281472921075872 [Note] InnoDB: Starting shutdown...
            2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/var/3/mysqld.1/data/ib_buffer_pool
            2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Instance 0, restricted to 128 pages due to innodb_buf_pool_dump_pct=25
            2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Buffer pool(s) dump completed at 190103 21:50:14
            2019-01-03 21:50:15 281472921075872 [Note] InnoDB: Shutdown completed; log sequence number 1701136
            2019-01-03 21:50:15 281472921075872 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
            2019-01-03 21:50:15 281472921075872 [Note] /usr/sbin/mysqld: Shutdown complete
             
            2019-01-03 21:50:16 281473425204880 [Note] /usr/sbin/mysqld (mysqld 10.2.22-MariaDB-log) starting as process 27264 ...
            2019-01-03 21:50:16 281473425204880 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4186)
            2019-01-03 21:50:16 281473425204880 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Uses event mutexes
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Compressed tables use zlib 1.2.7
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Using Linux native AIO
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Number of pools: 1
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Using generic crc32 instructions
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Completed initialization of buffer pool
            2019-01-03 21:50:16 281473097658528 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Setting log file ./ib_logfile101 size to 5242880 bytes
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Setting log file ./ib_logfile1 size to 5242880 bytes
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: New log files created, LSN=1701136
            2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Highest supported file format is Barracuda.
            CURRENT_TEST: mariabackup.xb_partition
            2019-01-03 21:58:40 281472998172304 [Note] /usr/sbin/mysqld (mysqld 10.2.22-MariaDB-log) starting as process 14433 ...
            

            As you can see above, the server was previously shut down cleanly. The server was hung for more than 8 minutes before it was killed and the test was retried (the second CURRENT_TEST line was written).

            marko Marko Mäkelä added a comment - Here is one more failure on 10.2.21 (should be virtually identical to the 10.2.21 release): http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/3128/ 10.2 23e4446adc0de8b6f763f746fd20f0e546a40129 mariabackup.xb_partition w3 [ fail ] Test ended at 2019-01-03 21:58:40   CURRENT_TEST: mariabackup.xb_partition /usr/bin/mariabackup based on MariaDB server 10.2.22-MariaDB Linux (aarch64) … 190103 21:50:16 completed OK! mysqltest: In included file "./include/wait_until_connected_again.inc": included from ./include/start_mysqld.inc at line 18: included from /usr/share/mysql-test/suite/mariabackup/include/restart_and_restore.inc at line 10: included from /usr/share/mysql-test/suite/mariabackup/xb_partition.test at line 77: At line 23: Server failed to restart http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-aarch64/builds/3128/steps/mtr/logs/mysqld.1.err.3 shows that the server hung after noticing that the redo log is clean (and after subsequently creating new redo log files): 10.2 23e4446adc0de8b6f763f746fd20f0e546a40129 Version: '10.2.22-MariaDB-log' socket: '/dev/shm/var/tmp/3/mysqld.1.sock' port: 16040 MariaDB Server CURRENT_TEST: mariabackup.xb_partition 2019-01-03 21:50:14 281472921075872 [Note] /usr/sbin/mysqld (initiated by: root[root] @ localhost []): Normal shutdown 2019-01-03 21:50:14 281472921075872 [Note] Event Scheduler: Purging the queue. 0 events 2019-01-03 21:50:14 281473265430688 [Note] InnoDB: FTS optimize thread exiting. 2019-01-03 21:50:14 281472921075872 [Note] InnoDB: Starting shutdown... 2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/var/3/mysqld.1/data/ib_buffer_pool 2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Instance 0, restricted to 128 pages due to innodb_buf_pool_dump_pct=25 2019-01-03 21:50:14 281472955052192 [Note] InnoDB: Buffer pool(s) dump completed at 190103 21:50:14 2019-01-03 21:50:15 281472921075872 [Note] InnoDB: Shutdown completed; log sequence number 1701136 2019-01-03 21:50:15 281472921075872 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2019-01-03 21:50:15 281472921075872 [Note] /usr/sbin/mysqld: Shutdown complete   2019-01-03 21:50:16 281473425204880 [Note] /usr/sbin/mysqld (mysqld 10.2.22-MariaDB-log) starting as process 27264 ... 2019-01-03 21:50:16 281473425204880 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4186) 2019-01-03 21:50:16 281473425204880 [Warning] Changed limits: max_open_files: 1024 max_connections: 151 (was 151) table_cache: 421 (was 2000) 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Uses event mutexes 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Compressed tables use zlib 1.2.7 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Using Linux native AIO 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Number of pools: 1 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Using generic crc32 instructions 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Initializing buffer pool, total size = 8M, instances = 1, chunk size = 8M 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Completed initialization of buffer pool 2019-01-03 21:50:16 281473097658528 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Setting log file ./ib_logfile101 size to 5242880 bytes 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Setting log file ./ib_logfile1 size to 5242880 bytes 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: New log files created, LSN=1701136 2019-01-03 21:50:16 281473425204880 [Note] InnoDB: Highest supported file format is Barracuda. CURRENT_TEST: mariabackup.xb_partition 2019-01-03 21:58:40 281472998172304 [Note] /usr/sbin/mysqld (mysqld 10.2.22-MariaDB-log) starting as process 14433 ... As you can see above, the server was previously shut down cleanly. The server was hung for more than 8 minutes before it was killed and the test was retried (the second CURRENT_TEST  line was written).

            For the main trees, there have not been any failures of mariabackup.xb_partition since January 2019. Maybe this one has been finally fixed in April 2019 by MDEV-12699, which fixed a hang if a corrupted page was read during startup? There also was a ‘slow startup after crash recovery’ fix MDEV-18733 in April 2019.

            marko Marko Mäkelä added a comment - For the main trees, there have not been any failures of mariabackup.xb_partition since January 2019. Maybe this one has been finally fixed in April 2019 by MDEV-12699 , which fixed a hang if a corrupted page was read during startup? There also was a ‘slow startup after crash recovery’ fix MDEV-18733 in April 2019.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.