Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34446

SIGSEGV on SET GLOBAL innodb_log_file_size with memory-mapped log file

Details

    Description

      mleich provided an rr replay trace where InnoDB would crash soon after a SET GLOBAL innodb_log_file_size=4194304 was completed. The reason appears to be that in log_t::resize_write() we fail to wrap around the log_sys.resize_lsn when the start offset (disregarding the length of the mini-transaction) exceeds the log_sys.resize_target (the requested size of the file). We would only perform the wrap-around when the start offset + the length would wrap around.

      In the trace that I analyzed, we would catch a SIGSEGV on an attempt to write to log_sys.buf at an offset somewhere between 4 MiB and 8 MiB while the size of the memory mapped region is 4 MiB.

      Attachments

        Issue Links

          Activity

            mleich Matthias Leich added a comment - - edited

            # git clone https://github.com/mleich1/rqg --branch RQG
            #
            # GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 7f277ec35033c80ce03c0de33696992bd7c149a8 2024-06-18T16:13:41+02:00
            # rqg.pl  : Version 4.5.0 (2023-12)
            #
            # $RQG_HOME/rqg.pl \
            # --gendata=conf/mariadb/table_stress.zz \
            # --gendata_sql=conf/mariadb/table_stress.sql \
            # --grammar=conf/mariadb/table_stress_innodb_dml.yy \
            # --reporters=Mariabackup_linux \
            # --mysqld=--loose-innodb-log-file-size=200M \
            # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
            # --mysqld=--loose-idle_write_transaction_timeout=0 \
            # --mysqld=--loose-idle_transaction_timeout=0 \
            # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            # --mysqld=--connect_timeout=60 \
            # --mysqld=--interactive_timeout=28800 \
            # --mysqld=--slave_net_timeout=60 \
            # --mysqld=--net_read_timeout=30 \
            # --mysqld=--net_write_timeout=60 \
            # --mysqld=--loose-table_lock_wait_timeout=50 \
            # --mysqld=--wait_timeout=28800 \
            # --no_mask \
            # --queries=10000000 \
            # --seed=random \
            # --reporters=None \
            # --reporters=ErrorLog \
            # --reporters=Deadlock \
            # --validators=None \
            # --mysqld=--log_output=none \
            # --mysqld=--log_bin_trust_function_creators=1 \
            # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            # --engine=InnoDB \
            # --restart_timeout=240 \
            # --mysqld=--plugin-load-add=file_key_management.so \
            # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
            # --mysqld=--plugin-load-add=provider_lzo.so \
            # --mysqld=--plugin-load-add=provider_bzip2.so \
            # --mysqld=--plugin-load-add=provider_lzma.so \
            # --mysqld=--plugin-load-add=provider_snappy.so \
            # --mysqld=--plugin-load-add=provider_lz4.so \
            # --mysqld=--loose-innodb_compression_level=1 \
            # --duration=300 \
            # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            # --redefine=conf/mariadb/redefine_innodb_log_write_ahead_size.yy \
            # --redefine=conf/mariadb/redefine_innodb_log_size_dynamic.yy \
            # --mysqld=--lock-wait-timeout=86400 \
            # --mysqld=--innodb-lock-wait-timeout=50 \
            # --mysqld=--sql_mode=STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION \
            # --mysqld=--loose-innodb_read_only_compressed=OFF \
            # --mysqld=--innodb_stats_persistent=off \
            # --mysqld=--innodb_adaptive_hash_index=on \
            # --mysqld=--innodb_random_read_ahead=OFF \
            # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
            # --threads=33 \
            # --rr=Extended \
            # --rr_options=--wait \
            # --mysqld=--innodb_undo_log_truncate=OFF \
            # --mysqld=--loose_innodb_change_buffering=all \
            # --mysqld=--innodb_rollback_on_timeout=OFF \
            # --mysqld=--innodb_page_size=16K \
            # --mysqld=--innodb-buffer-pool-size=10M \
            # --vardir_type=fast \
            # --no_mask \
            # <local settings
             
            Error pattern:
            [ 'TBR-2145', '\[ERROR\] mysqld got signal 11.+#.{1,50} mtr_t::finish_writer.+#.{1,50} mtr_buf_t::for_each_block.+#.{1,50} mtr_t::finish_writer.+#.{1,50} in mtr_t::finish_write.+#.{1,50} mtr_t::commit_files.+#.{1,50} in fil_names_clear.+#.{1,50} in log_checkpoint_low' ],
            

            mleich Matthias Leich added a comment - - edited # git clone https://github.com/mleich1/rqg --branch RQG # # GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 7f277ec35033c80ce03c0de33696992bd7c149a8 2024-06-18T16:13:41+02:00 # rqg.pl : Version 4.5.0 (2023-12) # # $RQG_HOME/rqg.pl \ # --gendata=conf/mariadb/table_stress.zz \ # --gendata_sql=conf/mariadb/table_stress.sql \ # --grammar=conf/mariadb/table_stress_innodb_dml.yy \ # --reporters=Mariabackup_linux \ # --mysqld=--loose-innodb-log-file-size=200M \ # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \ # --mysqld=--loose-idle_write_transaction_timeout=0 \ # --mysqld=--loose-idle_transaction_timeout=0 \ # --mysqld=--loose-idle_readonly_transaction_timeout=0 \ # --mysqld=--connect_timeout=60 \ # --mysqld=--interactive_timeout=28800 \ # --mysqld=--slave_net_timeout=60 \ # --mysqld=--net_read_timeout=30 \ # --mysqld=--net_write_timeout=60 \ # --mysqld=--loose-table_lock_wait_timeout=50 \ # --mysqld=--wait_timeout=28800 \ # --no_mask \ # --queries=10000000 \ # --seed=random \ # --reporters=None \ # --reporters=ErrorLog \ # --reporters=Deadlock \ # --validators=None \ # --mysqld=--log_output=none \ # --mysqld=--log_bin_trust_function_creators=1 \ # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \ # --engine=InnoDB \ # --restart_timeout=240 \ # --mysqld=--plugin-load-add=file_key_management.so \ # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \ # --mysqld=--plugin-load-add=provider_lzo.so \ # --mysqld=--plugin-load-add=provider_bzip2.so \ # --mysqld=--plugin-load-add=provider_lzma.so \ # --mysqld=--plugin-load-add=provider_snappy.so \ # --mysqld=--plugin-load-add=provider_lz4.so \ # --mysqld=--loose-innodb_compression_level=1 \ # --duration=300 \ # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \ # --redefine=conf/mariadb/redefine_innodb_log_write_ahead_size.yy \ # --redefine=conf/mariadb/redefine_innodb_log_size_dynamic.yy \ # --mysqld=--lock-wait-timeout=86400 \ # --mysqld=--innodb-lock-wait-timeout=50 \ # --mysqld=--sql_mode=STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION \ # --mysqld=--loose-innodb_read_only_compressed=OFF \ # --mysqld=--innodb_stats_persistent=off \ # --mysqld=--innodb_adaptive_hash_index=on \ # --mysqld=--innodb_random_read_ahead=OFF \ # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \ # --threads=33 \ # --rr=Extended \ # --rr_options=--wait \ # --mysqld=--innodb_undo_log_truncate=OFF \ # --mysqld=--loose_innodb_change_buffering=all \ # --mysqld=--innodb_rollback_on_timeout=OFF \ # --mysqld=--innodb_page_size=16K \ # --mysqld=--innodb-buffer-pool-size=10M \ # --vardir_type=fast \ # --no_mask \ # <local settings   Error pattern: [ 'TBR-2145', '\[ERROR\] mysqld got signal 11.+#.{1,50} mtr_t::finish_writer.+#.{1,50} mtr_buf_t::for_each_block.+#.{1,50} mtr_t::finish_writer.+#.{1,50} in mtr_t::finish_write.+#.{1,50} mtr_t::commit_files.+#.{1,50} in fil_names_clear.+#.{1,50} in log_checkpoint_low' ],
            mleich Matthias Leich added a comment - - edited

            origin/10.11-MDEV-34446 99e1836bba37027d3e717f91a0bf61f5cdf48d40 2024-06-24T17:33:37+03:00 behaved well in RQG testing.
            I have to admit that I don't have a test that replays the problem with a sufficiently high probability.
            

            mleich Matthias Leich added a comment - - edited origin/10.11-MDEV-34446 99e1836bba37027d3e717f91a0bf61f5cdf48d40 2024-06-24T17:33:37+03:00 behaved well in RQG testing. I have to admit that I don't have a test that replays the problem with a sufficiently high probability.

            Looks like the issue was introduced when innodb_log_file_size was made dynamic: 10.9.0

                MDEV-27812 Allow SET GLOBAL innodb_log_file_size
             
                We support online log resizing by replicating the current ib_logfile0
                to a new file ib_logfile101, which will eventually replace the
                ib_logfile0 on the first applicable log checkpoint.
            

            marko Can you please update the crash stack trace in description ? I think it should help us mapping any future bug to it. Also, what is the impact of this issue ? Is it a recoverable crash or corruption. Since redo log is involved I am suspecting possible corruption but good to mention it.

            Thanks for the patch. I am done reviewing 3355

            debarun Debarun Banerjee added a comment - Looks like the issue was introduced when innodb_log_file_size was made dynamic: 10.9.0 MDEV-27812 Allow SET GLOBAL innodb_log_file_size   We support online log resizing by replicating the current ib_logfile0 to a new file ib_logfile101, which will eventually replace the ib_logfile0 on the first applicable log checkpoint. marko Can you please update the crash stack trace in description ? I think it should help us mapping any future bug to it. Also, what is the impact of this issue ? Is it a recoverable crash or corruption. Since redo log is involved I am suspecting possible corruption but good to mention it. Thanks for the patch. I am done reviewing 3355

            Thank you for the review. I think that I must revise this a bit more, because I have some doubt that there is a possible race condition around the log_sys.resize_lsn because multiple writes can occur concurrently.

            marko Marko Mäkelä added a comment - Thank you for the review. I think that I must revise this a bit more, because I have some doubt that there is a possible race condition around the log_sys.resize_lsn because multiple writes can occur concurrently.

            marko Thanks for the patch. I have reviewed the code and updated my comments in 3355.

            debarun Debarun Banerjee added a comment - marko Thanks for the patch. I have reviewed the code and updated my comments in 3355 .

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.