MDEV-24026 (MariaDB Server)

InnoDB: Failing assertion: os_total_large_mem_allocated >= size upon incremental backup

Details

    Description

      10.2 784473b9866

      2020-10-26 11:37:57 0x7fcaed1ba700  InnoDB: Assertion failure in file /home/mariadb/MDEV-24026/10.2/storage/innobase/os/os0proc.cc line 159
      InnoDB: Failing assertion: os_total_large_mem_allocated >= size
       
      #1  0x00007fcaee8c98b1 in __GI_abort () at abort.c:79
      #2  0x000056125da95b76 in ut_dbg_assertion_failed (expr=0x56125dfa5ec8 "os_total_large_mem_allocated >= size", file=0x56125dfa5d70 "/home/mariadb/MDEV-24026/10.2/storage/innobase/os/os0proc.cc", line=159)
          at /home/mariadb/MDEV-24026/10.2/storage/innobase/ut/ut0dbg.cc:60
      #3  0x000056125d94ea04 in os_mem_free_large (ptr=0x7fcae4000000, size=67108864) at /home/mariadb/MDEV-24026/10.2/storage/innobase/os/os0proc.cc:159
      #4  0x000056125d2269fd in wf_incremental_deinit (ctxt=0x7fcaed1b9740) at /home/mariadb/MDEV-24026/10.2/extra/mariabackup/write_filt.cc:189
      #5  0x000056125d207bbe in xtrabackup_copy_datafile (node=0x56125f3b8db0, thread_n=1, dest_name=0x0, write_filter=...) at /home/mariadb/MDEV-24026/10.2/extra/mariabackup/xtrabackup.cc:2679
      #6  0x000056125d2086c5 in data_copy_thread_func (arg=0x56125f385cb0) at /home/mariadb/MDEV-24026/10.2/extra/mariabackup/xtrabackup.cc:2977
      #7  0x00007fcaf008d6db in start_thread (arg=0x7fcaed1ba700) at pthread_create.c:463
      #8  0x00007fcaee9aaa3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Failure happens upon incremental backup. There are no obvious errors in the backup log prior to the failure.
      Non-debug builds are affected the same way.

      rr profile is available.

      The problem apparently appeared in 10.2 after this commit:

      commit 985ede92034696d544d484a29b45828d56a031a5
      Author: Vlad Lesin
      Date:   Tue Oct 20 13:05:58 2020 +0300
       
          MDEV-20755 InnoDB: Database page corruption on disk or a failed file read of tablespace upon prepare of mariabackup incremental backup
      

      The test I use to reproduce the failure (happens frequently enough on the recent 10.2, every other run or so):

      git clone https://github.com/MariaDB/randgen --branch mdev24026 rqg-mdev24026
      cd rqg-mdev24026
      perl ./runall-trials.pl --duration=350 --threads=4 --seed=1603674122 --reporters=Backtrace,ErrorLog,Deadlock --skip-gendata --gendata-advanced --engine=InnoDB --grammar=conf/mariadb/generic-dml.yy --redefine=conf/mariadb/bulk_insert.yy --filter=conf/mariadb/10.4-combo-filter.ff --mysqld=--log_output=FILE --mysqld=--max-statement-time=20 --mysqld=--lock-wait-timeout=10 --mysqld=--loose-innodb-lock-wait-timeout=5 --scenario=MariaBackupIncremental --redefine=conf/mariadb/alter_table.yy --redefine=conf/mariadb/modules/admin.yy --basedir1=/data/src/10.2-bug --vardir1=/dev/shm/var_mbackup --trials=5
      

          Activity

            The cause of the bug is as follows:

            static my_bool xtrabackup_copy_datafile(fil_node_t *node, uint thread_n,
                                                    const char *dest_name,
                                                    const xb_write_filt_t &write_filter)
            {
            ...
              xb_write_filt_ctxt_t   write_filt_ctxt;  /* declared, not yet initialized */
            ...
              was_dropped = (ddl_tracker.drops.find(node->space->id) != ddl_tracker.drops.end());
              pthread_mutex_unlock(&backup_mutex);
              if (was_dropped) {
                fil_space_close(node->space->name);
                goto skip;  /* jumps over the memset() below */
              }
            ...
              memset(&write_filt_ctxt, 0, sizeof(xb_write_filt_ctxt_t));
            ...
            skip:
            ...
              if (write_filter.deinit) {
                write_filter.deinit(&write_filt_ctxt);  /* may see uninitialized data */
              }
            ...
            }
            

            That is, if a table was dropped during the backup and the corresponding DDL redo log record was read and parsed, execution jumps to the skip label before memset() runs. write_filt_ctxt.u.wf_incremental_ctxt is therefore still uninitialized when write_filter.deinit() deallocates it. This is not a regression, because previous commits did not change the order in which local variables are initialized in xtrabackup_copy_datafile().
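
            A minimal sketch of the failure pattern and one possible way to avoid it; this is not the actual patch. The names filt_ctxt_t, filt_init, filt_deinit, copy_datafile and total_allocated are hypothetical stand-ins for xb_write_filt_ctxt_t, the write filter callbacks, xtrabackup_copy_datafile() and os_total_large_mem_allocated. The sketch assumes that the deinit callback treats a null buffer pointer as "nothing to free". Under that assumption, zeroing the context before the first goto makes the early-exit path safe:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the write filter context (not the real definition).
struct filt_ctxt_t {
  void*       buf;       // buffer owned by the filter, released in deinit
  std::size_t buf_size;  // size that was accounted for at allocation time
};

// Plays the role of the global allocation counter checked by the assertion.
static std::size_t total_allocated = 0;

static bool filt_init(filt_ctxt_t* ctxt, std::size_t size) {
  ctxt->buf = std::malloc(size);
  if (ctxt->buf == nullptr) return false;
  ctxt->buf_size = size;
  total_allocated += size;
  return true;
}

static void filt_deinit(filt_ctxt_t* ctxt) {
  if (ctxt->buf == nullptr) return;  // assumption: null means nothing to free
  // With garbage in ctxt (the reported bug), this check is the one that fires.
  assert(total_allocated >= ctxt->buf_size);
  total_allocated -= ctxt->buf_size;
  std::free(ctxt->buf);
  ctxt->buf = nullptr;
  ctxt->buf_size = 0;
}

static bool copy_datafile(bool was_dropped) {
  filt_ctxt_t ctxt;
  // Zero the context BEFORE any goto, so every path into "skip" sees a
  // well-defined state. In the excerpt above, the memset came after the goto.
  std::memset(&ctxt, 0, sizeof ctxt);
  bool ok = true;

  if (was_dropped) goto skip;  // early exit: filter was never initialized

  if (!filt_init(&ctxt, 1 << 16)) {
    ok = false;
    goto skip;
  }
  // ... the actual page copying would happen here ...

skip:
  filt_deinit(&ctxt);  // safe on both paths because ctxt is always zeroed
  return ok;
}
```

            Calling copy_datafile(true) exercises the dropped-table path: filt_deinit() sees a null buffer and returns without touching the counter. Moving the memset() back below the goto would reproduce the undefined behavior described above.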

            vlad.lesin Vladislav Lesin added a comment

            If you are backporting MDEV-20755, you also need the fix for MDEV-24026.

            EM_Samurai Igor Yagolnitser added a comment

            People

              vlad.lesin Vladislav Lesin
              elenst Elena Stepanova