Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25031

Not applying INSERT_REUSE_REDUNDANT due to corruption on page

Details

    Description

      I get this fault below, from the journal. I am not in a position to conclude if this is a bug, but it happens repeatedly and if there is a small chance it is a bug I would like to submit the fault here to have someone exclude it is a bug. Sorry to bother your with this. I am thankful for any effort taken and also for any tip what to look into specifically.

      I did check the filesystem, but there was no fault there, as far as I can see.

      Mär 02 12:26:40 social systemd[1]: mariadb.service: Main process exited, code=killed, status=9/KILL
      Mär 02 12:26:40 social systemd[1]: mariadb.service: Failed with result 'signal'.
      Mär 02 12:26:45 social systemd[1]: mariadb.service: Scheduled restart job, restart counter is at 1.
      Mär 02 12:26:45 social systemd[1]: Stopped MariaDB 10.5.9 database server.
      Mär 02 12:26:45 social systemd[1]: Starting MariaDB 10.5.9 database server...
      Mär 02 12:26:46 social mariadbd[319571]: 2021-03-02 12:26:46 0 [Note] /usr/bin/mariadbd (mysqld 10.5.9-MariaDB) starting as process 319571 ...
      Mär 02 12:26:46 social mariadbd[319571]: 2021-03-02 12:26:46 0 [Warning] Could not increase number of max_open_files to more than 16384 (request: 32425)
      Mär 02 12:26:46 social mariadbd[319571]: 2021-03-02 12:26:46 0 [Note] InnoDB: Uses event mutexes
      Mär 02 12:26:46 social mariadbd[319571]: 2021-03-02 12:26:46 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      Mär 02 12:26:46 social mariadbd[319571]: 2021-03-02 12:26:46 0 [Note] InnoDB: Number of pools: 1
      Mär 02 12:26:46 social mariadbd[319571]: 2021-03-02 12:26:46 0 [Note] InnoDB: Using generic crc32 instructions
      Mär 02 12:26:46 social mariadbd[319571]: 2021-03-02 12:26:46 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
      Mär 02 12:26:46 social mariadbd[319571]: 2021-03-02 12:26:46 0 [Note] InnoDB: Using Linux native AIO
      Mär 02 12:26:46 social mariadbd[319571]: 2021-03-02 12:26:46 0 [Note] InnoDB: Initializing buffer pool, total size = 17179869184, chunk size = 134217728
      Mär 02 12:26:47 social mariadbd[319571]: 2021-03-02 12:26:47 0 [Note] InnoDB: Completed initialization of buffer pool
      Mär 02 12:26:47 social mariadbd[319571]: 2021-03-02 12:26:47 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
      Mär 02 12:26:47 social mariadbd[319571]: 2021-03-02 12:26:47 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1950138044187,1950909387225
      Mär 02 12:27:02 social mariadbd[319571]: 2021-03-02 12:27:02 0 [Note] InnoDB: Read redo log up to LSN=1951723032064
      Mär 02 12:27:02 social mariadbd[319571]: 2021-03-02 12:27:02 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77357.
      Mär 02 12:27:02 social mariadbd[319571]: 2021-03-02 12:27:02 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77356.
      Mär 02 12:27:06 social mariadbd[319571]: 2021-03-02 12:27:06 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77362.
      Mär 02 12:27:06 social mariadbd[319571]: 2021-03-02 12:27:06 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77363.
      Mär 02 12:27:07 social mariadbd[319571]: 2021-03-02 12:27:07 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77362.
      Mär 02 12:27:07 social mariadbd[319571]: 2021-03-02 12:27:07 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77364.
      Mär 02 12:27:07 social mariadbd[319571]: 2021-03-02 12:27:07 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77363.
      Mär 02 12:27:07 social mariadbd[319571]: 2021-03-02 12:27:07 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77365.
      Mär 02 12:27:08 social mariadbd[319571]: 2021-03-02 12:27:08 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77364.
      Mär 02 12:27:08 social mariadbd[319571]: 2021-03-02 12:27:08 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77366.
      Mär 02 12:27:08 social mariadbd[319571]: 2021-03-02 12:27:08 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77365.
      Mär 02 12:27:08 social mariadbd[319571]: 2021-03-02 12:27:08 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77367.
      Mär 02 12:27:08 social mariadbd[319571]: 2021-03-02 12:27:08 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77366.
      Mär 02 12:27:08 social mariadbd[319571]: 2021-03-02 12:27:08 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77368.
      Mär 02 12:27:08 social mariadbd[319571]: 2021-03-02 12:27:08 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77367.
      Mär 02 12:27:08 social mariadbd[319571]: 2021-03-02 12:27:08 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77369.
      Mär 02 12:27:09 social mariadbd[319571]: 2021-03-02 12:27:09 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77368.
      Mär 02 12:27:09 social mariadbd[319571]: 2021-03-02 12:27:09 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77370.
      Mär 02 12:27:09 social mariadbd[319571]: 2021-03-02 12:27:09 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77369.
      Mär 02 12:27:09 social mariadbd[319571]: 2021-03-02 12:27:09 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77371.
      Mär 02 12:27:10 social mariadbd[319571]: 2021-03-02 12:27:10 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77370.
      Mär 02 12:27:10 social mariadbd[319571]: 2021-03-02 12:27:10 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77372.
      Mär 02 12:27:10 social mariadbd[319571]: 2021-03-02 12:27:10 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77371.
      Mär 02 12:27:10 social mariadbd[319571]: 2021-03-02 12:27:10 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77373.
      Mär 02 12:27:11 social mariadbd[319571]: 2021-03-02 12:27:11 0 [Note] InnoDB: Ignoring data file './ylms_friendica/workerqueue.ibd' with space ID 77374, since the redo log references ./ylms_friendica/workerqueue.ibd with space ID 77372.
      Mär 02 12:27:11 social mariadbd[319571]: 2021-03-02 12:27:11 0 [Note] InnoDB: Ignoring data file './ylms_friendica/process.ibd' with space ID 77375, since the redo log references ./ylms_friendica/process.ibd with space ID 77373.
      Mär 02 12:27:12 social mariadbd[319571]: 2021-03-02 12:27:12 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=4]
      Mär 02 12:27:12 social mariadbd[319571]: 2021-03-02 12:27:12 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
      Mär 02 12:27:12 social mariadbd[319571]: 2021-03-02 12:27:12 0 [ERROR] [FATAL] InnoDB: Trying to read 16384 bytes at 13844040843264 outside the bounds of the file: ./ibdata1
      Mär 02 12:27:12 social mariadbd[319571]: 210302 12:27:12 [ERROR] mysqld got signal 6 ;
      Mär 02 12:27:12 social mariadbd[319571]: This could be because you hit a bug. It is also possible that this binary
      Mär 02 12:27:12 social mariadbd[319571]: or one of the libraries it was linked against is corrupt, improperly built,
      Mär 02 12:27:12 social mariadbd[319571]: or misconfigured. This error can also be caused by malfunctioning hardware.
      Mär 02 12:27:12 social mariadbd[319571]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Mär 02 12:27:12 social mariadbd[319571]: We will try our best to scrape up some info that will hopefully help
      Mär 02 12:27:12 social mariadbd[319571]: diagnose the problem, but since we have already crashed,
      Mär 02 12:27:12 social mariadbd[319571]: something is definitely wrong and this may fail.
      Mär 02 12:27:12 social mariadbd[319571]: Server version: 10.5.9-MariaDB
      Mär 02 12:27:12 social mariadbd[319571]: key_buffer_size=134217728
      Mär 02 12:27:12 social mariadbd[319571]: read_buffer_size=131072
      Mär 02 12:27:12 social mariadbd[319571]: max_used_connections=0
      Mär 02 12:27:12 social mariadbd[319571]: max_threads=388
      Mär 02 12:27:12 social mariadbd[319571]: thread_count=0
      Mär 02 12:27:12 social mariadbd[319571]: It is possible that mysqld could use up to
      Mär 02 12:27:12 social mariadbd[319571]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 985198 K  bytes of memory
      Mär 02 12:27:12 social mariadbd[319571]: Hope that's ok; if not, decrease some variables in the equation.
      Mär 02 12:27:12 social mariadbd[319571]: Thread pointer: 0x0
      Mär 02 12:27:12 social mariadbd[319571]: Attempting backtrace. You can use the following information to find out
      Mär 02 12:27:12 social mariadbd[319571]: where mysqld died. If you see no messages after this, something went
      Mär 02 12:27:12 social mariadbd[319571]: terribly wrong...
      Mär 02 12:27:12 social mariadbd[319571]: stack_bottom = 0x0 thread_stack 0x49000
      Mär 02 12:27:12 social mariadbd[319571]: ??:0(my_print_stacktrace)[0x55ba2dc07ebf]
      Mär 02 12:27:12 social mariadbd[319571]: ??:0(handle_fatal_signal)[0x55ba2d6eb650]
      Mär 02 12:27:12 social mariadbd[319571]: sigaction.c:0(__restore_rt)[0x7f9a2aae4960]
      Mär 02 12:27:13 social mariadbd[319571]: :0(__GI_raise)[0x7f9a2a5fbef5]
      Mär 02 12:27:13 social mariadbd[319571]: :0(__GI_abort)[0x7f9a2a5e5862]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55ba2d3ca9da]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55ba2d3de0c5]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55ba2d3de10c]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(std::unique_lock<std::mutex>::unlock())[0x55ba2db27131]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(std::unique_lock<std::mutex>::unlock())[0x55ba2db28385]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(std::unique_lock<std::mutex>::unlock())[0x55ba2db17401]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(std::unique_lock<std::mutex>::unlock())[0x55ba2dadedd1]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(std::unique_lock<std::mutex>::unlock())[0x55ba2dad51cc]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(std::unique_lock<std::mutex>::unlock())[0x55ba2dadb6d0]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55ba2d3c3e1d]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55ba2d9f192a]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(ha_initialize_handlerton(st_plugin_int*))[0x55ba2d6ee193]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(sys_var_pluginvar::sys_var_pluginvar(sys_var_chain*, char const*, st_plugin_int*, st_mysql_sys_var*))[0x55ba2d5018dd]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(plugin_init(int*, char**, int))[0x55ba2d502a04]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(unireg_abort)[0x55ba2d42c89a]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(mysqld_main(int, char**))[0x55ba2d432725]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(__libc_start_main)[0x7f9a2a5e6b25]
      Mär 02 12:27:13 social mariadbd[319571]: ??:0(_start)[0x55ba2d4266fe]
      Mär 02 12:27:13 social mariadbd[319571]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      Mär 02 12:27:13 social mariadbd[319571]: information that should help you find out what is causing the crash.
      Mär 02 12:27:13 social mariadbd[319571]: Writing a core file...
      Mär 02 12:27:13 social mariadbd[319571]: Working directory at /var/lib/mysql
      Mär 02 12:27:13 social mariadbd[319571]: Resource Limits:
      Mär 02 12:27:13 social mariadbd[319571]: Limit                     Soft Limit           Hard Limit           Units
      Mär 02 12:27:13 social mariadbd[319571]: Max cpu time              unlimited            unlimited            seconds
      Mär 02 12:27:13 social mariadbd[319571]: Max file size             unlimited            unlimited            bytes
      Mär 02 12:27:13 social mariadbd[319571]: Max data size             unlimited            unlimited            bytes
      Mär 02 12:27:13 social mariadbd[319571]: Max stack size            8388608              unlimited            bytes
      Mär 02 12:27:13 social mariadbd[319571]: Max core file size        unlimited            unlimited            bytes
      Mär 02 12:27:13 social mariadbd[319571]: Max resident set          unlimited            unlimited            bytes
      Mär 02 12:27:13 social mariadbd[319571]: Max processes             79997                79997                processes
      Mär 02 12:27:13 social mariadbd[319571]: Max open files            16384                16384                files
      Mär 02 12:27:13 social mariadbd[319571]: Max locked memory         65536                65536                bytes
      Mär 02 12:27:13 social mariadbd[319571]: Max address space         unlimited            unlimited            bytes
      Mär 02 12:27:13 social mariadbd[319571]: Max file locks            unlimited            unlimited            locks
      Mär 02 12:27:13 social mariadbd[319571]: Max pending signals       79997                79997                signals
      Mär 02 12:27:13 social mariadbd[319571]: Max msgqueue size         819200               819200               bytes
      Mär 02 12:27:13 social mariadbd[319571]: Max nice priority         0                    0
      Mär 02 12:27:13 social mariadbd[319571]: Max realtime priority     0                    0
      Mär 02 12:27:13 social mariadbd[319571]: Max realtime timeout      unlimited            unlimited            us
      Mär 02 12:27:13 social mariadbd[319571]: Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
      Mär 02 12:27:15 social systemd[1]: mariadb.service: Main process exited, code=dumped, status=6/ABRT
      Mär 02 12:27:15 social systemd[1]: mariadb.service: Failed with result 'core-dump'.
      Mär 02 12:27:15 social systemd[1]: Failed to start MariaDB 10.5.9 database server.
      

      Attachments

        Issue Links

          Activity

            mleich Matthias Leich added a comment - - edited

            I was able to achieve during RQG testing
             
            ...  many times [Note] InnoDB: Ignoring data file ...
            2021-03-05  7:09:13 0 [Note] InnoDB: Ignoring data file 'test/t5.ibd' with space ID 27, since the redo log references test/t5.ibd with space ID 23.
             2021-03-05  7:09:16 0 [Note] InnoDB: Starting final batch to recover 763 pages from redo log.
             2021-03-05  7:09:19 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=26, page number=49]
             2021-03-05  7:09:19 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
             2021-03-05  7:09:19 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_REDUNDANT due to corruption on [page id: space=26, page number=49]
             2021-03-05  7:09:19 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
             2021-03-05  7:09:20 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1473] with error Data structure corruption
             [00] FATAL ERROR: 2021-03-05 07:09:20 mariabackup: innodb_init() returned 39 (Data structure corruption).
             
            sdp:/home/mleich/RQG_O/storage/1614954603/MDEV-25031/dev/shm/vardir/1614954603/76/1_clone/rr
            _RR_TRACE_DIR="." rr replay --mark-stdio                                  (the first prepare)
            _RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-0   (the backup before it)
             
            Archive:
            sdp:/home/mleich/RQG_O/storage/1614954603/000934.tgz
             
            Work flow:
            1. Start the server and load some initial data
            2. Two sessions run a DDL/DML mix
            3. During 2. is ongoing
                 mariabackup --backup
                Example: bin/mariabackup --host=127.0.0.1 --user=root --password=''  --innodb-use-native-aio=0  --port=26100 --backup --datadir=/dev/shm/vardir/1614954603/76/1/data --target-dir=/dev/shm/vardir/1614954603/76/1_clone/data
                 Make a backup/copy of the data written by mariabackup --backup
                 in /home/mleich/RQG_O/storage/1614954603/MDEV-25031/dev/shm/vardir/1614954603/76/1/data_backup
                 mariabackup --prepare ...    <----  I got the messages above at this step.
                 Example: mariabackup --host=127.0.0.1 --user=root --password=''  --innodb-use-native-aio=0  --port=26104 --prepare --target-dir=/dev/shm/vardir/1614954603/76/1_clone/data
                 mariabackup --prepare ... again in that directory   (users do not need to run a second prepare)
                 Start a server on the data there + check consistency
            In most of the RQG tests all goes well.
             
            origin/10.6 33cf577ad86f3a7dd6ff86fee4ecd9c535161f02 2021-03-05T14:26:14+02:00
            The server claims to be 10.6.0.
             
            RQG
            ====
            git clone https://github.com/mleich1/rqg --branch experimental RQG
             
            perl rqg.pl \              
            --grammar=conf/mariadb/table_stress_innodb_nocopy1.yy \
            --gendata=conf_redundant/mariadb/table_stress.zz \
            --gendata_sql=conf/mariadb/table_stress.sql \
            --reporters=Mariabackup_linux \
            --mysqld=--innodb_use_native_aio=1 \
            --mysqld=--innodb_lock_schedule_algorithm=fcfs \
            --mysqld=--loose-idle_write_transaction_timeout=0 \
            --mysqld=--loose-idle_transaction_timeout=0 \
            --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            --mysqld=--connect_timeout=60 \
            --mysqld=--interactive_timeout=28800 \
            --mysqld=--slave_net_timeout=60 \
            --mysqld=--net_read_timeout=30 \
            --mysqld=--net_write_timeout=60 \
            --mysqld=--loose-table_lock_wait_timeout=50 \
            --mysqld=--wait_timeout=28800 \
            --mysqld=--lock-wait-timeout=86400 \
            --mysqld=--innodb-lock-wait-timeout=50 \
            --no-mask \
            --queries=10000000 \
            --seed=random \
            --reporters=Backtrace \
            --reporters=ErrorLog \
            --reporters=Deadlock1 \
            --validators=None \
            --mysqld=--log_output=none \
            --mysqld=--log-bin \
            --mysqld=--log_bin_trust_function_creators=1 \
            --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            --engine=InnoDB \
            --restart_timeout=240 \
            --mysqld=--plugin-load-add=file_key_management.so \
            --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf_redundant/mariadb/encryption_keys.txt \
            --duration=300 \
            --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            --mysqld=--loose-innodb-sync-debug \
            --mysqld=--innodb_stats_persistent=on \
            --mysqld=--innodb_adaptive_hash_index=off \
            --mysqld=--loose-max-statement-time=30 \
            --threads=2 \
            --rr=Extended \
            --mysqld=--innodb_page_size=4K \
            --mysqld=--innodb-buffer-pool-size=5M \
            --duration=300 \
            --workdir=<local settings> \
            --vardir=<local settings> \
            --mtr-build-thread=<local settings> \
            --basedir1=<local settings> \
            --script_debug=_nix_
             
            conf_redundant/mariadb/table_stress.zz
            $tables = {
                pk      => [ 'int' ],
                engines => [ 'InnoDB' ],
                rows => [ 0, 1, 1000 ],
                row_formats => [ 'REDUNDANT' ]
            };
             
            $fields = {
                types   => [ 'int' ],
                indexes => [ undef ]
            };
             
            $data = {
                numbers => [ 'digit' ]
            };
            I will try to simplify the test/increase the likelihood to replay the problem.
            

            mleich Matthias Leich added a comment - - edited I was able to achieve during RQG testing   ... many times [Note] InnoDB: Ignoring data file ... 2021-03-05 7:09:13 0 [Note] InnoDB: Ignoring data file 'test/t5.ibd' with space ID 27, since the redo log references test/t5.ibd with space ID 23. 2021-03-05 7:09:16 0 [Note] InnoDB: Starting final batch to recover 763 pages from redo log. 2021-03-05 7:09:19 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=26, page number=49] 2021-03-05 7:09:19 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. 2021-03-05 7:09:19 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_REDUNDANT due to corruption on [page id: space=26, page number=49] 2021-03-05 7:09:19 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. 2021-03-05 7:09:20 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1473] with error Data structure corruption [00] FATAL ERROR: 2021-03-05 07:09:20 mariabackup: innodb_init() returned 39 (Data structure corruption).   sdp:/home/mleich/RQG_O/storage/1614954603/MDEV-25031/dev/shm/vardir/1614954603/76/1_clone/rr _RR_TRACE_DIR="." rr replay --mark-stdio (the first prepare) _RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-0 (the backup before it)   Archive: sdp:/home/mleich/RQG_O/storage/1614954603/000934.tgz   Work flow: 1. Start the server and load some initial data 2. Two sessions run a DDL/DML mix 3. During 2. is ongoing mariabackup --backup Example: bin/mariabackup --host=127.0.0.1 --user=root --password='' --innodb-use-native-aio=0 --port=26100 --backup --datadir=/dev/shm/vardir/1614954603/76/1/data --target-dir=/dev/shm/vardir/1614954603/76/1_clone/data Make a backup/copy of the data written by mariabackup --backup in /home/mleich/RQG_O/storage/1614954603/MDEV-25031/dev/shm/vardir/1614954603/76/1/data_backup mariabackup --prepare ... <---- I got the messages above at this step. Example: mariabackup --host=127.0.0.1 --user=root --password='' --innodb-use-native-aio=0 --port=26104 --prepare --target-dir=/dev/shm/vardir/1614954603/76/1_clone/data mariabackup --prepare ... again in that directory (users do not need to run a second prepare) Start a server on the data there + check consistency In most of the RQG tests all goes well.   origin/10.6 33cf577ad86f3a7dd6ff86fee4ecd9c535161f02 2021-03-05T14:26:14+02:00 The server claims to be 10.6.0.   RQG ==== git clone https://github.com/mleich1/rqg --branch experimental RQG   perl rqg.pl \ --grammar=conf/mariadb/table_stress_innodb_nocopy1.yy \ --gendata=conf_redundant/mariadb/table_stress.zz \ --gendata_sql=conf/mariadb/table_stress.sql \ --reporters=Mariabackup_linux \ --mysqld=--innodb_use_native_aio=1 \ --mysqld=--innodb_lock_schedule_algorithm=fcfs \ --mysqld=--loose-idle_write_transaction_timeout=0 \ --mysqld=--loose-idle_transaction_timeout=0 \ --mysqld=--loose-idle_readonly_transaction_timeout=0 \ --mysqld=--connect_timeout=60 \ --mysqld=--interactive_timeout=28800 \ --mysqld=--slave_net_timeout=60 \ --mysqld=--net_read_timeout=30 \ --mysqld=--net_write_timeout=60 \ --mysqld=--loose-table_lock_wait_timeout=50 \ --mysqld=--wait_timeout=28800 \ --mysqld=--lock-wait-timeout=86400 \ --mysqld=--innodb-lock-wait-timeout=50 \ --no-mask \ --queries=10000000 \ --seed=random \ --reporters=Backtrace \ --reporters=ErrorLog \ --reporters=Deadlock1 \ --validators=None \ --mysqld=--log_output=none \ --mysqld=--log-bin \ --mysqld=--log_bin_trust_function_creators=1 \ --mysqld=--loose-debug_assert_on_not_freed_memory=0 \ --engine=InnoDB \ --restart_timeout=240 \ --mysqld=--plugin-load-add=file_key_management.so \ --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf_redundant/mariadb/encryption_keys.txt \ --duration=300 \ --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \ --mysqld=--loose-innodb-sync-debug \ --mysqld=--innodb_stats_persistent=on \ --mysqld=--innodb_adaptive_hash_index=off \ --mysqld=--loose-max-statement-time=30 \ --threads=2 \ --rr=Extended \ --mysqld=--innodb_page_size=4K \ --mysqld=--innodb-buffer-pool-size=5M \ --duration=300 \ --workdir=<local settings> \ --vardir=<local settings> \ --mtr-build-thread=<local settings> \ --basedir1=<local settings> \ --script_debug=_nix_   conf_redundant/mariadb/table_stress.zz $tables = { pk => [ 'int' ], engines => [ 'InnoDB' ], rows => [ 0, 1, 1000 ], row_formats => [ 'REDUNDANT' ] };   $fields = { types => [ 'int' ], indexes => [ undef ] };   $data = { numbers => [ 'digit' ] }; I will try to simplify the test/increase the likelihood to replay the problem.

            I realized that page 4 in the system tablespace is the root page of the change buffer B-tree (FSP_IBUF_TREE_ROOT_PAGE_NO). That is in the original InnoDB format. So, even if no tables are in ROW_FORMAT=REDUNDANT, this error is possible, when it affects the change buffer pages in the system tablespace.

            marko Marko Mäkelä added a comment - I realized that page 4 in the system tablespace is the root page of the change buffer B-tree ( FSP_IBUF_TREE_ROOT_PAGE_NO ). That is in the original InnoDB format. So, even if no tables are in ROW_FORMAT=REDUNDANT , this error is possible, when it affects the change buffer pages in the system tablespace.

            The rr replay trace that mleich posted shows an off-by-1 problem in page_cur_insert_rec_low():

            (rr) p/x cur->rec[-extra_size]@extra_size
            $14 = {0x80, 0x0, 0x0, 0x40, 0x43, 0x40, 0x80, 0x2c, 0x0, 0x2c, 0x0, 0x19, 
              0x0, 0x15, 0x0, 0x11, 0x0, 0xa, 0x0, 0x4, 0x0, 0x0, 0x18, 0x10, 0x7, 0x55}
            (rr) p/x *insert_buf@extra_size
            $15 = {0x3, 0x44, 0x83, 0x40, 0x43, 0x40, 0x80, 0x2c, 0x0, 0x2c, 0x0, 0x19, 
              0x0, 0x15, 0x0, 0x11, 0x0, 0xa, 0x0, 0x4, 0x0, 0x0, 0x20, 0x14, 0x0, 0x74}
            

            The extra_size is actually referring to the record that we are about to insert, and not to the predecessor record cur->rec from which we are letting recovery copy a number of identical bytes. It turns out that the correct extra_size of cur->rec is 4 bytes shorter, and the first 4 bytes above actually belong to the preceding record. It happens so that the last of the 4 bytes matches insert_buf, and we will write an INSERT_HEAP_REDUNDANT record where the condition on pextra_size will be violated in page_apply_insert_redundant():

              if (UNIV_UNLIKELY(hdr_c > extra_size || hdr_c > pextra_size))
                goto corrupted;
            

            I think that we must simply make the recovery more tolerant and allow us to reuse data that resides before the preceding record. We are already doing the same in page_apply_insert_dynamic(). We must relax the above condition, but ensure that hdr_c is not referring to anything that is before the start of the payload area (which is something that we checked in page_cur_insert_rec_low() before writing the record):

                const byte * const c_start= cur->rec - extra_size;
                if (c_start >=
                    &block->frame[PAGE_OLD_SUPREMUM_END + REC_N_OLD_EXTRA_BYTES])
            

            marko Marko Mäkelä added a comment - The rr replay trace that mleich posted shows an off-by-1 problem in page_cur_insert_rec_low() : (rr) p/x cur->rec[-extra_size]@extra_size $14 = {0x80, 0x0, 0x0, 0x40, 0x43, 0x40, 0x80, 0x2c, 0x0, 0x2c, 0x0, 0x19, 0x0, 0x15, 0x0, 0x11, 0x0, 0xa, 0x0, 0x4, 0x0, 0x0, 0x18, 0x10, 0x7, 0x55} (rr) p/x *insert_buf@extra_size $15 = {0x3, 0x44, 0x83, 0x40, 0x43, 0x40, 0x80, 0x2c, 0x0, 0x2c, 0x0, 0x19, 0x0, 0x15, 0x0, 0x11, 0x0, 0xa, 0x0, 0x4, 0x0, 0x0, 0x20, 0x14, 0x0, 0x74} The extra_size is actually referring to the record that we are about to insert, and not to the predecessor record cur->rec from which we are letting recovery copy a number of identical bytes. It turns out that the correct extra_size of cur->rec is 4 bytes shorter, and the first 4 bytes above actually belong to the preceding record. It happens so that the last of the 4 bytes matches insert_buf , and we will write an INSERT_HEAP_REDUNDANT record where the condition on pextra_size will be violated in page_apply_insert_redundant() : if (UNIV_UNLIKELY(hdr_c > extra_size || hdr_c > pextra_size)) goto corrupted; I think that we must simply make the recovery more tolerant and allow us to reuse data that resides before the preceding record. We are already doing the same in page_apply_insert_dynamic() . We must relax the above condition, but ensure that hdr_c is not referring to anything that is before the start of the payload area (which is something that we checked in page_cur_insert_rec_low() before writing the record): const byte * const c_start= cur->rec - extra_size; if (c_start >= &block->frame[PAGE_OLD_SUPREMUM_END + REC_N_OLD_EXTRA_BYTES])

            While testing this, we found another crash recovery regression MDEV-25110, introduced in 10.5.7. That one could mostly affect Mariabackup and not normal crash recovery.

            marko Marko Mäkelä added a comment - While testing this, we found another crash recovery regression MDEV-25110 , introduced in 10.5.7. That one could mostly affect Mariabackup and not normal crash recovery.
            haidong Haidong Ji added a comment -

            I have come across a crash with the same error message:

            2022-05-23 10:12:14 0 [ERROR] InnoDB: Not applying INSERT_REUSE_DYNAMIC due to corruption on [page id: space=745895, page number=647870]
            

            But unlike this bug that is trigger from the "page_apply_insert_redundant" function, the bug I observed follows the path of "page_apply_insert_dynamic" before the crash. For that reason, I will file a new JIRA for it.

            haidong Haidong Ji added a comment - I have come across a crash with the same error message: 2022-05-23 10:12:14 0 [ERROR] InnoDB: Not applying INSERT_REUSE_DYNAMIC due to corruption on [page id: space=745895, page number=647870] But unlike this bug that is trigger from the "page_apply_insert_redundant" function, the bug I observed follows the path of "page_apply_insert_dynamic" before the crash. For that reason, I will file a new JIRA for it.

            People

              marko Marko Mäkelä
              utzer Ben Utzer
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.