MariaDB Server / MDEV-21674

purge_sys.stop() no longer waits for purge workers to complete

Details

    Description

      As noted at the start of 10.5-Test-Remove-innodb_log_optimize_ddl-and-FlushObserve.patch (a minimal port of something that we want to do in MDEV-12353), purge_sys.stop() was broken in MDEV-16264: it no longer waits for the purge worker tasks to finish processing the current event.

      There is likely a race condition, but it does not become prominent until the FlushObserver is removed, as the patch does. For the record, I also ported the change to 10.3 and did not observe any crash in the two tests that exercise FLUSH TABLES…FOR EXPORT.
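A minimal sketch of the contract that stop() is expected to honor, assuming a count of in-flight worker tasks guarded by a mutex and a condition variable: stop() first refuses new tasks and then blocks until every task that had already started has finished. PurgeCoordinator, try_begin_task(), end_task() and demo_stop_waits_for_worker() are hypothetical names for illustration only, not InnoDB's purge_sys_t API, and the real fix may well use a different mechanism.

```cpp
// Hypothetical sketch: a purge "stop" that waits for in-flight worker tasks.
// PurgeCoordinator and its members are illustrative names, not InnoDB's API.
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class PurgeCoordinator {
public:
  // A worker calls this before processing a batch. Returns false once purge
  // has been stopped, in which case the worker must not start the batch.
  bool try_begin_task() {
    std::lock_guard<std::mutex> lk(m_);
    if (stopped_) return false;
    ++active_;
    return true;
  }

  // A worker calls this after finishing a batch it was allowed to start.
  void end_task() {
    std::lock_guard<std::mutex> lk(m_);
    assert(active_ > 0);
    if (--active_ == 0) idle_.notify_all();
  }

  // Refuse new batches, then block until every started batch has ended.
  void stop() {
    std::unique_lock<std::mutex> lk(m_);
    stopped_ = true;
    idle_.wait(lk, [this] { return active_ == 0; });
  }

  void resume() {
    std::lock_guard<std::mutex> lk(m_);
    stopped_ = false;
  }

  bool running() const {
    std::lock_guard<std::mutex> lk(m_);
    return !stopped_;
  }

  int active_tasks() const {
    std::lock_guard<std::mutex> lk(m_);
    return active_;
  }

private:
  mutable std::mutex m_;          // guards stopped_ and active_
  std::condition_variable idle_;  // signalled when active_ drops to 0
  bool stopped_ = false;
  int active_ = 0;
};

// Usage: stop() is called while one batch is in flight; it must not return
// before that batch has finished.
bool demo_stop_waits_for_worker() {
  PurgeCoordinator purge;
  std::atomic<bool> started{false};
  std::atomic<bool> finished{false};

  std::thread worker([&] {
    if (!purge.try_begin_task()) return;
    started = true;
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // simulated work
    finished = true;
    purge.end_task();
  });

  while (!started) std::this_thread::yield();  // wait until the batch is in flight
  purge.stop();                                // blocks until end_task() has run
  bool ok = finished.load() && purge.active_tasks() == 0 && !purge.running();
  worker.join();
  return ok;
}
```

The ordering inside stop() is the point of the sketch: if stop() merely cleared a flag and returned, a worker could still be modifying pages after the caller proceeds, which matches the symptom described above.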

      The crash usually manifests as the following assertion failure:

      10.5ish

      2020-02-06 10:01:33 4 [Note] InnoDB: Sync to disk of `test`.`t1` started.
      2020-02-06 10:01:33 4 [Note] InnoDB: Stopping purge
      mysqld: /mariadb/10.5-MDEV-12353bis/storage/innobase/buf/buf0lru.cc:657: void buf_flush_dirty_pages(buf_pool_t *, ulint, bool, ulint): Assertion `first || buf_pool_get_dirty_pages_count(buf_pool, id) == 0' failed.
      #7  0x000055bbae5a916a in buf_flush_dirty_pages (buf_pool=0x55bbb1c269d0, id=5, flush=true, first=0) at /mariadb/10.5-MDEV-12353bis/storage/innobase/buf/buf0lru.cc:656
      #8  buf_LRU_flush_or_remove_pages (id=5, flush=true, first=0) at /mariadb/10.5-MDEV-12353bis/storage/innobase/buf/buf0lru.cc:670
      #9  0x000055bbae491d14 in row_quiesce_table_start (table=<optimized out>, trx=0x7fd726c62138) at /mariadb/10.5-MDEV-12353bis/storage/innobase/row/row0quiesce.cc:538
      
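Judging by the log and the stack trace, the assertion acts as an invariant for the caller: row_quiesce_table_start() stops purge, flushes the dirty pages of the tablespace, and expects none to remain. The following deterministic, single-threaded model replays the two possible orderings of one purge batch that is already in flight when purge is stopped. ToyBufferPool and quiesce_holds() are hypothetical stand-ins, not InnoDB code; the tablespace id 5 is borrowed from the trace purely for illustration.

```cpp
// Hypothetical single-threaded model of the failing sequence; not InnoDB code.
#include <cassert>
#include <functional>
#include <map>

// Toy stand-in for the buffer pool: counts dirty pages per tablespace id.
struct ToyBufferPool {
  std::map<unsigned, int> dirty;
  void mark_dirty(unsigned id) { ++dirty[id]; }
  void flush(unsigned id) { dirty[id] = 0; }  // write back every dirty page
  int dirty_count(unsigned id) const {
    auto it = dirty.find(id);
    return it == dirty.end() ? 0 : it->second;
  }
};

// One purge batch is already in flight when purge is stopped. If stop() waits
// for workers, the batch finishes before the flush; if stop() returns at once,
// the batch can finish after the flush. Returns true if no dirty pages remain
// for the tablespace at the end, i.e. the quiesce invariant holds.
bool quiesce_holds(bool stop_waits_for_workers) {
  const unsigned space_id = 5;  // id=5 as in the stack trace, for illustration
  ToyBufferPool pool;
  std::function<void()> in_flight_batch = [&pool, space_id] {
    pool.mark_dirty(space_id);  // e.g. purge modifies a page of the table
  };

  if (stop_waits_for_workers) in_flight_batch();   // correct stop(): batch done first
  pool.flush(space_id);                            // flush step of the quiesce
  assert(pool.dirty_count(space_id) == 0);         // clean right after the flush
  if (!stop_waits_for_workers) in_flight_batch();  // broken stop(): batch runs late
  return pool.dirty_count(space_id) == 0;
}
```

With stop_waits_for_workers == false the function returns false: the table looked clean right after the flush but was dirtied again afterwards, which is the kind of state that the assertion in buf_flush_dirty_pages() rejects.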


          Activity

            The following patch might be helpful if you cannot reproduce the problem by applying 10.5-Test-Remove-innodb_log_optimize_ddl-and-FlushObserve.patch to the problematic commit and running:

            ./mtr --parallel=auto --repeat=10 encryption.innodb-discard-import{,,,,}  innodb.innodb-wl5522{,,,}
            

            diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc
            index c4e20c973a0..1e00e1f3fbd 100644
            --- a/storage/innobase/srv/srv0srv.cc
            +++ b/storage/innobase/srv/srv0srv.cc
            @@ -2190,7 +2190,7 @@ void purge_worker_callback(void*)
             	ut_ad(srv_force_recovery < SRV_FORCE_NO_BACKGROUND);
             	void* ctx;
             	THD* thd = acquire_thd(&ctx);
            -	while (srv_task_execute()){}
            +	while (srv_task_execute()) { ut_ad(purge_sys.running()); }
             	release_thd(thd,ctx);
             }
             
            

            Note: I am not yet sure whether that assertion is valid. If it is, then I think it should be part of the fix.
            Also, while fixing this, I would suggest declaring and defining

            bool purge_sys_t::running() const

            (with the const qualifier).
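A sketch of the suggested const qualifier, together with a worker loop that checks the flag after each task in the spirit of the proposed ut_ad(purge_sys.running()). PurgeSysSketch and run_purge_tasks() are hypothetical; std::atomic<bool> stands in for whatever state the real purge_sys_t keeps, and assert() (active only when NDEBUG is not defined) plays the role of ut_ad. Whether such an assertion actually holds depends on how stop() orders clearing the flag against in-flight tasks, which the comment leaves open.

```cpp
// Hypothetical sketch, not InnoDB's purge_sys_t.
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

class PurgeSysSketch {
public:
  // const: asking whether purge is running does not modify the object.
  bool running() const { return running_.load(std::memory_order_acquire); }
  void stop() { running_.store(false, std::memory_order_release); }
  void resume() { running_.store(true, std::memory_order_release); }

private:
  std::atomic<bool> running_{true};
};

// Worker loop: execute queued tasks one by one, checking after each task that
// purge is still running (cf. ut_ad(purge_sys.running()) in the diff above).
// Taking the object by const reference only compiles because running() is const.
int run_purge_tasks(const PurgeSysSketch& purge,
                    std::deque<std::function<void()>>& queue) {
  int executed = 0;
  while (!queue.empty()) {
    std::function<void()> task = std::move(queue.front());
    queue.pop_front();
    task();
    ++executed;
    assert(purge.running());  // debug-only check, like ut_ad
  }
  (void)purge;  // avoid an unused-parameter warning when NDEBUG is defined
  return executed;
}
```

Making running() const lets callers that only hold a const reference query the state, and it documents that the query has no side effects.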

            Comment by Marko Mäkelä (marko).

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes: 0
              Watchers: 1

