Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31875

ROW_FORMAT=COMPRESSED table: InnoDB: 2048 bytes should have been read. Only 0 bytes read.

Details

    Description

      origin/bb-10.4-release 0ede90dd316ee7c3e22b6c8cad5e39b5913e9eed 2023-08-08T13:24:23+05:30
       
      Two sessions run concurrent some DDL/DML mix and the server crashes with
      ...
      # 2023-08-08T03:09:37 [1270466] | [rr 1280990 111944]2023-08-08  3:08:32 0 [Warning] InnoDB: 2048 bytes should have been read. Only 0 bytes read. Retrying for the remaining bytes.
      # 2023-08-08T03:09:37 [1270466] | [rr 1280990 111948]2023-08-08  3:08:32 0 [Warning] InnoDB: 2048 bytes should have been read. Only 0 bytes read. Retrying for the remaining bytes.
      # 2023-08-08T03:09:37 [1270466] | [rr 1280990 111952]2023-08-08  3:08:32 0 [Warning] InnoDB: 2048 bytes should have been read. Only 0 bytes read. Retrying for the remaining bytes.
      # 2023-08-08T03:09:37 [1270466] | [rr 1280990 111956]2023-08-08  3:08:32 0 [Warning] InnoDB: Retry attempts for reading partial data failed.
      # 2023-08-08T03:09:37 [1270466] | [rr 1280990 111960]2023-08-08  3:08:32 0 [ERROR] InnoDB: Tried to read 2048 bytes at offset 40960, but was only able to read 0
      # 2023-08-08T03:09:37 [1270466] | [rr 1280990 111964]2023-08-08  3:08:32 0 [ERROR] InnoDB: File (unknown): 'read' returned OS error 0. Cannot continue operation
      # 2023-08-08T03:09:37 [1270466] | [rr 1280990 111978]230808  3:08:32 [rr 1280990 111981][ERROR] mysqld got signal 6 ;
       
      # 2023-08-08T03:11:54 [1270466] | Thread 4 (Thread 1280990.1282409 (mysqld)):
      # 2023-08-08T03:11:54 [1270466] | #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=125271409890880) at ./nptl/pthread_kill.c:44
      # 2023-08-08T03:11:54 [1270466] | #1  __pthread_kill_internal (signo=6, threadid=125271409890880) at ./nptl/pthread_kill.c:78
      # 2023-08-08T03:11:54 [1270466] | #2  __GI___pthread_kill (threadid=125271409890880, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
      # 2023-08-08T03:11:54 [1270466] | #3  0x00002ac10cfb3476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
      # 2023-08-08T03:11:54 [1270466] | #4  0x00002ac10cf997f3 in __GI_abort () at ./stdlib/abort.c:79
      # 2023-08-08T03:11:54 [1270466] | #5  0x000055e21084fefe in os_file_handle_error_cond_exit (name=<optimized out>, name@entry=0x0, operation=<optimized out>, operation@entry=0x55e210dd8674 "read", should_abort=should_abort@entry=true, on_error_silent=on_error_silent@entry=false) at /data/Server/bb-10.4-release/storage/innobase/os/os0file.cc:4777
      # 2023-08-08T03:11:54 [1270466] | #6  0x000055e210852088 in os_file_read_page (type=..., file=84, buf=0x4d3878c36000, offset=<optimized out>, n=<optimized out>, o=o@entry=0x0, exit_on_err=true) at /data/Server/bb-10.4-release/storage/innobase/os/os0file.cc:4681
      # 2023-08-08T03:11:54 [1270466] | #7  0x000055e210852142 in os_file_read_func (type=..., file=<optimized out>, buf=<optimized out>, offset=<optimized out>, n=<optimized out>) at /data/Server/bb-10.4-release/storage/innobase/os/os0file.cc:5074
      # 2023-08-08T03:11:54 [1270466] | #8  0x000055e2108599fe in SimulatedAIOHandler::read (slot=<optimized out>, this=0x71ef05e309d0) at /data/Server/bb-10.4-release/storage/innobase/os/os0file.cc:6592
      # 2023-08-08T03:11:54 [1270466] | #9  SimulatedAIOHandler::io (this=0x71ef05e309d0) at /data/Server/bb-10.4-release/storage/innobase/os/os0file.cc:6556
      # 2023-08-08T03:11:54 [1270466] | #10 os_aio_simulated_handler (global_segment=global_segment@entry=2, m1=m1@entry=0x71ef05e30b18, m2=m2@entry=0x71ef05e30b20, type=type@entry=0x71ef05e30b40) at /data/Server/bb-10.4-release/storage/innobase/os/os0file.cc:6901
      # 2023-08-08T03:11:54 [1270466] | #11 0x000055e210859b4c in os_aio_handler (segment=segment@entry=2, m1=m1@entry=0x71ef05e30b18, m2=m2@entry=0x71ef05e30b20, request=request@entry=0x71ef05e30b40) at /data/Server/bb-10.4-release/storage/innobase/os/os0file.cc:5308
      # 2023-08-08T03:11:54 [1270466] | #12 0x000055e210b254fb in fil_aio_wait (segment=segment@entry=2) at /data/Server/bb-10.4-release/storage/innobase/fil/fil0fil.cc:4330
      # 2023-08-08T03:11:54 [1270466] | #13 0x000055e21098609c in io_handler_thread (arg=<optimized out>) at /data/Server/bb-10.4-release/storage/innobase/srv/srv0start.cc:324
      # 2023-08-08T03:11:54 [1270466] | #14 0x00002ac10d005b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      # 2023-08-08T03:11:54 [1270466] | #15 0x00002ac10d096bb4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
       
      sdp:/data1/results/1691486849/TBR-1988$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio
       
      I hit that problem several times. The RQG runs were in all cases with --mysqld=--innodb-buffer-pool-size=5M.
       
      RQG
      ===
      # git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
      #
      # GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 499e4ae7d70fcc3937bbc302bd22e8e95826ab57 2023-08-08T10:55:43+02:00
      # rqg.pl  : Version 4.3.0 (2023-06)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/mariadb/table_stress_innodb_nocopy1.yy \
      # --gendata=conf/mariadb/table_stress.zz \
      # --gendata_sql=conf/mariadb/table_stress.sql \
      # --mysqld=--transaction-isolation=REPEATABLE-READ \
      # --validator=SelectStability \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --no_mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=None \
      # --reporters=ErrorLog \
      # --reporters=Deadlock \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma.so \
      # --mysqld=--plugin-load-add=provider_snappy.so \
      # --mysqld=--plugin-load-add=provider_lz4.so \
      # --mysqld=--loose-innodb_compression_level=1 \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --mysqld=--sql_mode=STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION \
      # --mysqld=--innodb_file_per_table=1 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--innodb_stats_persistent=on \
      # --mysqld=--innodb_adaptive_hash_index=off \
      # --mysqld=--innodb_random_read_ahead=ON \
      # --mysqld=--log-bin \
      # --mysqld=--sync-binlog=1 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=2 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --mysqld=--loose-gdb \
      # --mysqld=--loose-debug-gdb \
      # --rr=Extended \
      # --rr_options=--chaos --wait \
      # --mysqld=--innodb_undo_log_truncate=OFF \
      # --mysqld=--loose_innodb_change_buffering=all \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --mysqld=--innodb_page_size=4K \
      # --mysqld=--innodb-buffer-pool-size=5M \
      # <local settings>
      

      Attachments

        1. TBR-1988.cfg
          45 kB
          Matthias Leich
        2. TBR-1988.yy
          0.3 kB
          Matthias Leich

        Issue Links

          Activity

            I conducted some initial analysis. This is a ROW_FORMAT=COMPRESSED table using 2048-byte page size. The file was recently extended to 21 pages. Due to innodb_random_read_ahead=ON we are initiating a read-ahead for page 20 (2048 bytes at offset 40960).

            It is yet unclear to me why the read would fail. Maybe the file was not extended properly?

            marko Marko Mäkelä added a comment - I conducted some initial analysis. This is a ROW_FORMAT=COMPRESSED table using 2048-byte page size. The file was recently extended to 21 pages. Due to innodb_random_read_ahead=ON we are initiating a read-ahead for page 20 (2048 bytes at offset 40960). It is yet unclear to me why the read would fail. Maybe the file was not extended properly?

            This seems to be a conflict between read-ahead and a design decision that was made in fil_space_extend_must_retry() when fixing MDEV-26537:

            	/* Datafile::read_first_page() expects innodb_page_size bytes.
            	fil_node_t::read_page0() expects at least 4 * innodb_page_size bytes.
            	os_file_set_size() expects multiples of 4096 bytes.
            	For ROW_FORMAT=COMPRESSED tables using 1024-byte or 2048-byte
            	pages, we will preallocate up to an integer multiple of 4096 bytes,
            	and let normal writes append 1024, 2048, or 3072 bytes to the file. */
            	os_offset_t new_size = std::max(
            		(os_offset_t(size - file_start_page_no) * page_size)
            		& ~os_offset_t(4095),
            		os_offset_t(FIL_IBD_FILE_INITIAL_SIZE << srv_page_size_shift));
             
            	*success = os_file_set_size(node->name, node->handle, new_size,
            				    space->is_compressed());
            

            When the file is being extended from 20 to 21 pages, nothing will actually happen, because os_file_set_size() will be invoked with new_size=40960.

            We must make the linear or random read-ahead code aware of this, and avoid issuing any warning or error if the file is a little shorter than expected.

            This bug should only affect ROW_FORMAT=COMPRESSED tables that use the compressed page size of 1024 or 2048 bytes (KEY_BLOCK_SIZE=1 or KEY_BLOCK_SIZE=2; the latter is the default ROW_FORMAT=COMPRESSED page size when using innodb_page_size=4k).

            I did not check yet if this bug is reproducible with 10.5 or later major versions.

            marko Marko Mäkelä added a comment - This seems to be a conflict between read-ahead and a design decision that was made in fil_space_extend_must_retry() when fixing MDEV-26537 : /* Datafile::read_first_page() expects innodb_page_size bytes. fil_node_t::read_page0() expects at least 4 * innodb_page_size bytes. os_file_set_size() expects multiples of 4096 bytes. For ROW_FORMAT=COMPRESSED tables using 1024-byte or 2048-byte pages, we will preallocate up to an integer multiple of 4096 bytes, and let normal writes append 1024, 2048, or 3072 bytes to the file. */ os_offset_t new_size = std::max( (os_offset_t(size - file_start_page_no) * page_size) & ~os_offset_t(4095), os_offset_t(FIL_IBD_FILE_INITIAL_SIZE << srv_page_size_shift));   *success = os_file_set_size(node->name, node->handle, new_size, space->is_compressed()); When the file is being extended from 20 to 21 pages, nothing will actually happen, because os_file_set_size() will be invoked with new_size=40960 . We must make the linear or random read-ahead code aware of this, and avoid issuing any warning or error if the file is a little shorter than expected. This bug should only affect ROW_FORMAT=COMPRESSED tables that use the compressed page size of 1024 or 2048 bytes ( KEY_BLOCK_SIZE=1 or KEY_BLOCK_SIZE=2 ; the latter is the default ROW_FORMAT=COMPRESSED page size when using innodb_page_size=4k ). I did not check yet if this bug is reproducible with 10.5 or later major versions.

            Results of test simplification
            =======================
             
            Simplified RQG grammar
            create_table:
                CREATE TABLE t2 ( col1 INT PRIMARY KEY ) ENGINE = InnoDB ROW_FORMAT = Compressed ;
            thread1:
                INSERT INTO t2 (col1) VALUES ( { $my_int= $prng->int(513, 4096) } ) ;
            thread1_connect:
                ;
            thread1_init:
                create_table ;
            query:
                { exit 0 } ;
             
            pluto:/data/results/1691583293/TBR-1988-MDEV-31875$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio
             
            The attempt to transform the SQL trace of such a RQG run into some MTR based test failed.
            It looks as if a high speed of SQL stream processing is required. RQG is faster than MTR.
             
            No replay (> 600 attempts) of the problem + no other problem instead on
            - origin/bb-10.5-MDEV-31254 c9eff1a144ba44846373660a30d342d3f0dc91a5 2023-05-12T15:04:50+03:00 10.5.21 build with debug+asan
            - origin/bb-10.6-MDEV-30100 1a1daf30f6366090ab339c6acc3fff57abc99c99 2023-08-04T16:55:26+03:00   10.6.15 build with debug+asan
            - origin/10.9 34a8e78581c1e34f27b4bc414478b30e05abd43c 2023-08-04T08:01:06+02:00 10.9.8  build with RelWithDebInfo
            
            

            mleich Matthias Leich added a comment - Results of test simplification =======================   Simplified RQG grammar create_table: CREATE TABLE t2 ( col1 INT PRIMARY KEY ) ENGINE = InnoDB ROW_FORMAT = Compressed ; thread1: INSERT INTO t2 (col1) VALUES ( { $my_int= $prng->int(513, 4096) } ) ; thread1_connect: ; thread1_init: create_table ; query: { exit 0 } ;   pluto:/data/results/1691583293/TBR-1988-MDEV-31875$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio   The attempt to transform the SQL trace of such a RQG run into some MTR based test failed. It looks as if a high speed of SQL stream processing is required. RQG is faster than MTR.   No replay (> 600 attempts) of the problem + no other problem instead on - origin/bb-10.5-MDEV-31254 c9eff1a144ba44846373660a30d342d3f0dc91a5 2023-05-12T15:04:50+03:00 10.5.21 build with debug+asan - origin/bb-10.6-MDEV-30100 1a1daf30f6366090ab339c6acc3fff57abc99c99 2023-08-04T16:55:26+03:00 10.6.15 build with debug+asan - origin/10.9 34a8e78581c1e34f27b4bc414478b30e05abd43c 2023-08-04T08:01:06+02:00 10.9.8 build with RelWithDebInfo

            It seems to me that the following code in 10.5 buf_page_read_low() could prevent a crash in this case:

            	auto fio = space->io(IORequest(sync
            				       ? IORequest::READ_SYNC
            				       : IORequest::READ_ASYNC),
            			     page_id.page_no() * len, len, dst, bpage);
            	*err= fio.err;
             
            	if (UNIV_UNLIKELY(fio.err != DB_SUCCESS)) {
            		if (!sync || fio.err == DB_TABLESPACE_DELETED
            		    || fio.err == DB_IO_ERROR) {
            			buf_pool.corrupted_evict(bpage);
            			return false;
            		}
             
            		ut_error;
            	}
            

            This will invoke fil_space_t::io(), which in turn will invoke os_aio() and os_file_read_func(), which is where 10.4 would intentionally crash.

            I will try to generate a slightly corrupted test file for triggering this.

            marko Marko Mäkelä added a comment - It seems to me that the following code in 10.5 buf_page_read_low() could prevent a crash in this case: auto fio = space->io(IORequest(sync ? IORequest::READ_SYNC : IORequest::READ_ASYNC), page_id.page_no() * len, len, dst, bpage); *err= fio.err;   if (UNIV_UNLIKELY(fio.err != DB_SUCCESS)) { if (!sync || fio.err == DB_TABLESPACE_DELETED || fio.err == DB_IO_ERROR) { buf_pool.corrupted_evict(bpage); return false ; }   ut_error; } This will invoke fil_space_t::io() , which in turn will invoke os_aio() and os_file_read_func() , which is where 10.4 would intentionally crash. I will try to generate a slightly corrupted test file for triggering this.

            Please test my patch. In addition to adjusting buf_read_ahead_random() where the bug was caught, I adjusted also buf_read_ahead_linear(). That is, in MariaDB Server 10.4 we would never attempt a read-ahead of the last page(s) of ROW_FORMAT=COMPRESSED tablespaces if the file size is not an integer multiple of 4096 bytes.

            In MariaDB Server 10.5 and later, we keep attempting read-ahead up to the last page. An error on asynchronous read completion would be gracefully ignored.

            marko Marko Mäkelä added a comment - Please test my patch. In addition to adjusting buf_read_ahead_random() where the bug was caught, I adjusted also buf_read_ahead_linear() . That is, in MariaDB Server 10.4 we would never attempt a read-ahead of the last page(s) of ROW_FORMAT=COMPRESSED tablespaces if the file size is not an integer multiple of 4096 bytes. In MariaDB Server 10.5 and later, we keep attempting read-ahead up to the last page. An error on asynchronous read completion would be gracefully ignored.

            origin/bb-10.4-MDEV-31875 b4616412e0bf717c6627af70d274db021c0d57d1 2023-08-16T10:08:32+03:00
            performed well in RQG testing
            - no unknown bad effects
            - no replay of MDEV-31875 with the simplified test and ~ 1600 attempts
            

            mleich Matthias Leich added a comment - origin/bb-10.4-MDEV-31875 b4616412e0bf717c6627af70d274db021c0d57d1 2023-08-16T10:08:32+03:00 performed well in RQG testing - no unknown bad effects - no replay of MDEV-31875 with the simplified test and ~ 1600 attempts

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.