Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20481

innodb.blob-crash crashes during recovery

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Duplicate
    • 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
    • 10.3.28, 10.4.18, 10.5.9, 10.6.0
    • None
    • The description is from a crash with bb-10.3-monty, a merge of 10.2 to 10.3
      The reported crash is on Windows 64bits

    Description

      innodb.blob-crash '16k,debug,innodb' w3 [ fail ]
      Test ended at 2019-09-03 15:35:37

      CURRENT_TEST: innodb.blob-crash
      ...
      2019-09-03 15:35:30 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=9301498
      Assertion failed: recv_sys->mlog_checkpoint_lsn <= recv_sys->recovered_lsn, file D:\winx64-debug\build\src\storage\innobase\log\log0recv.cc, line 2838
      190903 15:35:30 [ERROR] mysqld got exception 0x80000003 ;

      mysqld.exe!my_sigabrt_handler()[my_thr_init.c:485]
      mysqld.exe!raise()[signal.cpp:516]
      mysqld.exe!abort()[abort.cpp:71]
      mysqld.exe!common_assert_to_stderr<wchar_t>()[assert.cpp:149]
      mysqld.exe!_wassert()[assert.cpp:404]
      mysqld.exe!recv_parse_log_recs()[log0recv.cc:2837]
      mysqld.exe!recv_scan_log_recs()[log0recv.cc:3309]
      mysqld.exe!recv_group_scan_log_recs()[log0recv.cc:3403]
      mysqld.exe!recv_recovery_from_checkpoint_start()[log0recv.cc:3705]
      mysqld.exe!srv_start()[srv0start.cc:1973]
      mysqld.exe!innodb_init()[ha_innodb.cc:4262]
      mysqld.exe!ha_initialize_handlerton()[handler.cc:549]
      mysqld.exe!plugin_initialize()[sql_plugin.cc:1433]

      There are 5 failures in buildbot cross reference about other failures with blob.crash in 10.4 and 10.5. The other crashes comes from this line
      mysqld: /home/buildbot/buildbot/build/mariadb-10.4.8/storage/innobase/fil/fil0fil.cc:4567: static ulint Check::validate(const fil_space_t*): Assertion `space->id != TRX_SYS_SPACE || space == fil_system.sys_space' failed.

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment -

            http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/21312/steps/test/logs/stdio

            innodb.blob-crash '4k,debug,innodb'      w3 [ fail ]
                    Test ended at 2020-10-09 10:15:45
             
            201009 10:15:43 [ERROR] mysqld got exception 0x80000003 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
             
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
             
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed, 
            something is definitely wrong and this may fail.
             
            Server version: 10.5.7-MariaDB-debug-log
            key_buffer_size=1048576
            read_buffer_size=131072
            max_used_connections=0
            max_threads=65537
            thread_count=0
            It is possible that mysqld could use up to 
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4635 K  bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
             
            Thread pointer: 0x0
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            server.dll!my_sigabrt_handler()[my_thr_init.c:465]
            ucrtbase.dll!raise()
            ucrtbase.dll!abort()
            server.dll!ib::fatal::~fatal()[ut0ut.cc:580]
            server.dll!buf_pool_t::assert_all_freed()[buf0buf.cc:4398]
            server.dll!buf_pool_invalidate()[buf0buf.cc:4438]
            server.dll!recv_sys_t::apply()[log0recv.cc:2782]
            server.dll!recv_group_scan_log_recs()[log0recv.cc:3140]
            server.dll!recv_recovery_from_checkpoint_start()[log0recv.cc:3590]
            server.dll!srv_start()[srv0start.cc:1531]
            server.dll!innodb_init()[ha_innodb.cc:3899]
            server.dll!ha_initialize_handlerton()[handler.cc:645]
            server.dll!plugin_initialize()[sql_plugin.cc:1459]
            server.dll!plugin_init()[sql_plugin.cc:1751]
            server.dll!init_server_components()[mysqld.cc:4913]
            server.dll!win_main()[mysqld.cc:5496]
            server.dll!mysql_service()[mysqld.cc:5744]
            server.dll!mysqld_main()[mysqld.cc:5962]
            mariadbd.exe!main()[main.cc:26]
            mariadbd.exe!__scrt_common_main_seh()[exe_common.inl:288]
            KERNEL32.DLL!BaseThreadInitThunk()
            ntdll.dll!RtlUserThreadStart()
            The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
            information that should help you find out what is causing the crash.
            Writing a core file at D:\winx64-debug\build\mysql-test\var\3\mysqld.1\data\
            

            alice Alice Sherepa added a comment - http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/21312/steps/test/logs/stdio innodb.blob-crash '4k,debug,innodb' w3 [ fail ] Test ended at 2020-10-09 10:15:45   201009 10:15:43 [ERROR] mysqld got exception 0x80000003 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.   To report this bug, see https://mariadb.com/kb/en/reporting-bugs   We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.   Server version: 10.5.7-MariaDB-debug-log key_buffer_size=1048576 read_buffer_size=131072 max_used_connections=0 max_threads=65537 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4635 K bytes of memory Hope that's ok; if not, decrease some variables in the equation.   Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... server.dll!my_sigabrt_handler()[my_thr_init.c:465] ucrtbase.dll!raise() ucrtbase.dll!abort() server.dll!ib::fatal::~fatal()[ut0ut.cc:580] server.dll!buf_pool_t::assert_all_freed()[buf0buf.cc:4398] server.dll!buf_pool_invalidate()[buf0buf.cc:4438] server.dll!recv_sys_t::apply()[log0recv.cc:2782] server.dll!recv_group_scan_log_recs()[log0recv.cc:3140] server.dll!recv_recovery_from_checkpoint_start()[log0recv.cc:3590] server.dll!srv_start()[srv0start.cc:1531] server.dll!innodb_init()[ha_innodb.cc:3899] server.dll!ha_initialize_handlerton()[handler.cc:645] server.dll!plugin_initialize()[sql_plugin.cc:1459] server.dll!plugin_init()[sql_plugin.cc:1751] server.dll!init_server_components()[mysqld.cc:4913] server.dll!win_main()[mysqld.cc:5496] server.dll!mysql_service()[mysqld.cc:5744] server.dll!mysqld_main()[mysqld.cc:5962] mariadbd.exe!main()[main.cc:26] mariadbd.exe!__scrt_common_main_seh()[exe_common.inl:288] KERNEL32.DLL!BaseThreadInitThunk() ntdll.dll!RtlUserThreadStart() The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. Writing a core file at D:\winx64-debug\build\mysql-test\var\3\mysqld.1\data\

            The last failure of innodb.blob-crash for 10.5 with "still fixed or dirty" occured in 2020. That seems to be exactly the failure that alice posted above:

            10.5 a891fe6aa5ede1bf1f4551cfa75cf856

            innodb.blob-crash '4k,debug,innodb'      w3 [ fail ]
            …
            2020-10-09 10:15:42 0 [Note] InnoDB: Starting a batch to recover 338 pages from redo log.
            2020-10-09 10:15:42 0 [Note] InnoDB: Starting a batch to recover 317 pages from redo log.
            2020-10-09 10:15:43 0 [Note] InnoDB: Starting a batch to recover 317 pages from redo log.
            2020-10-09 10:15:43 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=2] still fixed or dirty
            

            MDEV-23399 and MDEV-23855 were pushed to the 10.5 branch soon after this. They could have fixed this trouble for this particular test.

            I remember seeing some possibly related failure in 10.5, even recently. I think it was this one:

            void
            rw_lock_free_func(
            /*==============*/
            	rw_lock_t*	lock)	/*!< in/out: rw-lock */
            {
            	ut_ad(rw_lock_validate(lock));
            	ut_a(lock->lock_word == X_LOCK_DECR);
            

            In MDEV-24142 (10.6), that code was replaced with debug assertions in srw_lock::destroy() and sux_lock::free(), but I do not remember seeing those assertions failing.

            marko Marko Mäkelä added a comment - The last failure of innodb.blob-crash for 10.5 with "still fixed or dirty" occured in 2020. That seems to be exactly the failure that alice posted above: 10.5 a891fe6aa5ede1bf1f4551cfa75cf856 innodb.blob-crash '4k,debug,innodb' w3 [ fail ] … 2020-10-09 10:15:42 0 [Note] InnoDB: Starting a batch to recover 338 pages from redo log. 2020-10-09 10:15:42 0 [Note] InnoDB: Starting a batch to recover 317 pages from redo log. 2020-10-09 10:15:43 0 [Note] InnoDB: Starting a batch to recover 317 pages from redo log. 2020-10-09 10:15:43 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=2] still fixed or dirty MDEV-23399 and MDEV-23855 were pushed to the 10.5 branch soon after this. They could have fixed this trouble for this particular test. I remember seeing some possibly related failure in 10.5, even recently. I think it was this one: void rw_lock_free_func( /*==============*/ rw_lock_t* lock) /*!< in/out: rw-lock */ { ut_ad(rw_lock_validate(lock)); ut_a(lock->lock_word == X_LOCK_DECR); In MDEV-24142 (10.6), that code was replaced with debug assertions in srw_lock::destroy() and sux_lock::free() , but I do not remember seeing those assertions failing.

            The related failure that I am referring to is MDEV-26733, seemingly specific to 10.5 and the test innodb.undo_truncate.

            marko Marko Mäkelä added a comment - The related failure that I am referring to is MDEV-26733 , seemingly specific to 10.5 and the test innodb.undo_truncate .

            Because I had some trouble with the Buildbot cross-reference, elenst provided to me a list of some failures for the failing still fixed or dirty check on shutdown. I accidentally asked her to search anything after 2019-10-09, instead of 2020-10-09, which was the last failure returned also by her search.

            Failures are rather rare: none in 2020-01, 2020-02, 2020-11, 2020-12, and otherwise it was failing about once per month in some main branch. Here are some of other latest failures:

            10.3 fbef428645e01113521824c2eb0a42fe

            CURRENT_TEST: innodb_gis.types
            …
            2020-04-03  8:18:10 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=31648455
            2020-04-03  8:18:11 0 [Note] InnoDB: Starting a batch to recover 307 pages from redo log.
            2020-04-03  8:18:11 0 [Note] InnoDB: Starting a batch to recover 293 pages from redo log.
            2020-04-03  8:18:11 0 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=511] still fixed or dirty
            

            10.5 9ae015878f11be3e3033fd1b35357ea5

            CURRENT_TEST: innodb.blob-crash
            …
            2020-03-10 15:37:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47877
            2020-03-10 15:37:09 0 [Note] InnoDB: Starting a batch to recover 110 pages from redo log.
            2020-03-10 15:37:09 0 [Note] InnoDB: Starting a batch to recover 89 pages from redo log.
            2020-03-10 15:37:09 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=2] still fixed or dirty
            

            10.5 90b7ac28a917fde7bceed573956aa4c6

            CURRENT_TEST: innodb.blob-crash
            …
            2020-03-17 16:43:28 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=7956119
            2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 168 pages from redo log.
            2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 170 pages from redo log.
            2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 170 pages from redo log.
            2020-03-17 16:43:29 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=0] still fixed or dirty
            

            MDEV-24449 was fixed in 10.2 on December 28, and in 10.5 on December 21. The test innodb.blob-crash does define a secondary index on the table (t3), so the change buffer might be used for it. Likewise, the test innodb_gis.types, which often runs out of memory on recovery (MDEV-22512), defines a secondary index on the table t_wl6455. I think that it is plausible that this bug was fixed by MDEV-24449.

            marko Marko Mäkelä added a comment - Because I had some trouble with the Buildbot cross-reference, elenst provided to me a list of some failures for the failing still fixed or dirty check on shutdown. I accidentally asked her to search anything after 2019-10-09, instead of 2020-10-09, which was the last failure returned also by her search. Failures are rather rare: none in 2020-01, 2020-02, 2020-11, 2020-12, and otherwise it was failing about once per month in some main branch. Here are some of other latest failures: 10.3 fbef428645e01113521824c2eb0a42fe CURRENT_TEST: innodb_gis.types … 2020-04-03 8:18:10 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=31648455 2020-04-03 8:18:11 0 [Note] InnoDB: Starting a batch to recover 307 pages from redo log. 2020-04-03 8:18:11 0 [Note] InnoDB: Starting a batch to recover 293 pages from redo log. 2020-04-03 8:18:11 0 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=511] still fixed or dirty 10.5 9ae015878f11be3e3033fd1b35357ea5 CURRENT_TEST: innodb.blob-crash … 2020-03-10 15:37:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47877 2020-03-10 15:37:09 0 [Note] InnoDB: Starting a batch to recover 110 pages from redo log. 2020-03-10 15:37:09 0 [Note] InnoDB: Starting a batch to recover 89 pages from redo log. 2020-03-10 15:37:09 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=2] still fixed or dirty 10.5 90b7ac28a917fde7bceed573956aa4c6 CURRENT_TEST: innodb.blob-crash … 2020-03-17 16:43:28 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=7956119 2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 168 pages from redo log. 2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 170 pages from redo log. 2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 170 pages from redo log. 2020-03-17 16:43:29 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=0] still fixed or dirty MDEV-24449 was fixed in 10.2 on December 28, and in 10.5 on December 21. The test innodb.blob-crash does define a secondary index on the table ( t3 ), so the change buffer might be used for it. Likewise, the test innodb_gis.types , which often runs out of memory on recovery ( MDEV-22512 ), defines a secondary index on the table t_wl6455 . I think that it is plausible that this bug was fixed by MDEV-24449 .

            I believe that this was fixed by MDEV-24449 for 10.3 and 10.4, and by MDEV-24448 for 10.5 and 10.6.

            marko Marko Mäkelä added a comment - I believe that this was fixed by MDEV-24449 for 10.3 and 10.4, and by MDEV-24448 for 10.5 and 10.6.

            People

              marko Marko Mäkelä
              monty Michael Widenius
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.