Details

    Description

      When a disk filled up, InnoDB terminated with an assertion failure.

      As in MDEV-25760, a fatal error that reports the error code would be better.

      2022-01-24  8:21:51 0 [Note] Plugin 'FEEDBACK' is disabled.
      2022-01-24  8:21:51 0 [Note] InnoDB: Loading buffer pool(s) from /mnt/dd/ib_buffer_pool
      2022-01-24  8:21:51 0 [Note] InnoDB: Buffer pool(s) load completed at 220124  8:21:51
      2022-01-24  8:21:51 0 [Note] sql/mysqld: ready for connections.
      Version: '10.8.0-MariaDB'  socket: '/tmp/mysql.sock'  port: 0  Source distribution
      2022-01-24 12:04:01 4 [Warning] Aborted connection 4 to db: 'test' user: 'dan' host: 'localhost' (Got an error writing communication packets)
      2022-01-24 12:09:33 0x7f7a8dffb640  InnoDB: Assertion failure in file /home/dan/repos/mariadb-server-10.8/storage/innobase/os/os0file.cc line 3596
      InnoDB: Failing assertion: cb->m_err == DB_SUCCESS
      InnoDB: We intentionally generate a memory trap.
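      A fatal diagnostic in the spirit of MDEV-25760 might report the failed operation and decode the OS error instead of trapping. A minimal sketch of that idea (hypothetical helper, not the actual patch):

      #include <cstdio>    // fprintf
      #include <cstdlib>   // abort
      #include <cstring>   // strerror

      /* Hypothetical replacement for ut_a(cb->m_err == DB_SUCCESS):
         report which operation failed on which file descriptor and why,
         then terminate with a readable fatal error rather than a
         memory trap. */
      static void fatal_io_error(int fd, bool is_write, int err)
      {
        fprintf(stderr,
                "[FATAL] InnoDB: %s on file descriptor %d failed with OS error %d: %s\n",
                is_write ? "write" : "read", fd, err, strerror(err));
        abort();
      }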
      

      backtrace

      $ coredumpctl debug
                 PID: 16931 (mysqld)
                 UID: 1000 (dan)
                 GID: 1000 (dan)
              Signal: 6 (ABRT)
           Timestamp: Mon 2022-01-24 12:09:33 AEDT (4min 3s ago)
        Command Line: sql/mysqld --no-defaults --skip-networking --datadir=/mnt/dd --verbose
          Executable: /home/dan/repos/build-mariadb-server-10.8/sql/mariadbd
       
      Core was generated by `sql/mysqld --no-defaults --skip-networking --datadir=/mnt/dd --verbose'.
      Program terminated with signal SIGABRT, Aborted.
       
      (gdb) bt full
      #0  0x00007f7ad93df84c in __pthread_kill_implementation () from /lib64/libc.so.6
      No symbol table info available.
      #1  0x00007f7ad93926a6 in raise () from /lib64/libc.so.6
      No symbol table info available.
      #2  0x00007f7ad937c865 in abort () from /lib64/libc.so.6
      No symbol table info available.
      #3  0x0000000000c9ecfb in ut_dbg_assertion_failed (expr=0x1152a4f "cb->m_err == DB_SUCCESS", file=<optimized out>, line=<optimized out>, line@entry=3596) at /home/dan/repos/mariadb-server-10.8/storage/innobase/ut/ut0dbg.cc:60
      No locals.
      #4  0x0000000000c037d0 in io_callback (cb=<optimized out>) at /home/dan/repos/mariadb-server-10.8/storage/innobase/os/os0file.cc:3596
              request = <optimized out>
      #5  0x0000000000d6588d in tpool::task_group::execute (this=0x2348930, t=0x23613d0) at /home/dan/repos/mariadb-server-10.8/tpool/task_group.cc:55
              lk = {_M_device = 0x2348960, _M_owns = false}
      #6  0x0000000000d63976 in tpool::thread_pool_generic::worker_main (this=0x22dd700, thread_var=0x22ed2a0) at /home/dan/repos/mariadb-server-10.8/tpool/tpool_generic.cc:549
              task = 0x4223
      #7  0x00007f7ad964e5c4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
      No symbol table info available.
      #8  0x00007f7ad93dda87 in start_thread () from /lib64/libc.so.6
      No symbol table info available.
      #9  0x00007f7ad9462640 in clone3 () from /lib64/libc.so.6
      No symbol table info available.
      (gdb) up
      #1  0x00007f7ad93926a6 in raise () from /lib64/libc.so.6
      (gdb) up
      #2  0x00007f7ad937c865 in abort () from /lib64/libc.so.6
      (gdb) up
      #3  0x0000000000c9ecfb in ut_dbg_assertion_failed (expr=0x1152a4f "cb->m_err == DB_SUCCESS", file=<optimized out>, line=<optimized out>, line@entry=3596) at /home/dan/repos/mariadb-server-10.8/storage/innobase/ut/ut0dbg.cc:60
      60		abort();
      (gdb) up
      #4  0x0000000000c037d0 in io_callback (cb=<optimized out>) at /home/dan/repos/mariadb-server-10.8/storage/innobase/os/os0file.cc:3596
      3596	  ut_a(cb->m_err == DB_SUCCESS);
      (gdb) p cb
      $1 = <optimized out>
      (gdb) p cb->m_err
      value has been optimized out
      (gdb) list
      3591	
      3592	extern void fil_aio_callback(const IORequest &request);
      3593	
      3594	static void io_callback(tpool::aiocb *cb)
      3595	{
      3596	  ut_a(cb->m_err == DB_SUCCESS);
      3597	  const IORequest &request= *static_cast<const IORequest*>
      3598	    (static_cast<const void*>(cb->m_userdata));
      3599	
      3600	  /* Return cb back to cache*/
      (gdb) up
      #5  0x0000000000d6588d in tpool::task_group::execute (this=0x2348930, t=0x23613d0) at /home/dan/repos/mariadb-server-10.8/tpool/task_group.cc:55
      55	        t->m_func(t->m_arg);
      (gdb) p *t
      $2 = (tpool::task) {_vptr$task = 0x1174848 <vtable for tpool::task+16>, m_func = 0xc03740 <io_callback(tpool::aiocb*)>, m_arg = 0x2361378, m_group = 0x2348930}
      (gdb) p (tpool::aiocb *) t->m_arg
      $3 = (tpool::aiocb *) 0x2361378
      (gdb) p *((tpool::aiocb *) t->m_arg)
      $4 = {<iovec> = {iov_base = 0x7f7ab47f8000, iov_len = 2097152}, m_fh = 12, m_opcode = tpool::aio_opcode::AIO_PWRITE, m_offset = 1048576, m_buffer = 0x7f7ab47f8000, m_len = 2097152, m_callback = 0xc03740 <io_callback(tpool::aiocb*)>, m_group = 0x2348930, m_ret_len = 0, m_err = 28, m_internal = 0x0, m_internal_task = {_vptr$task = 0x1174848 <vtable for tpool::task+16>, m_func = 0xc03740 <io_callback(tpool::aiocb*)>, m_arg = 0x2361378, m_group = 0x2348930}, m_userdata = '\000' <repeats 16 times>, "\310\065V\002\000\000\000\000\031\000\000\000\000\000\000"}
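
      Note m_err = 28 in the aiocb dump above: on Linux, errno 28 is ENOSPC ("No space left on device"), which matches the full-disk scenario in the description. A quick standalone check of that mapping:

      #include <cerrno>
      #include <cstdio>
      #include <cstring>

      int main()
      {
        // errno 28 on Linux is ENOSPC, matching aiocb::m_err in the core dump.
        printf("errno %d (ENOSPC == %d): %s\n", 28, ENOSPC, strerror(28));
        return 0;
      }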
      

          Activity

            danblack Daniel Black added a comment -

            PR 1995.

            Mapping the file descriptor back to a file name is a bit of work, as wlad mentioned in a former PR, so I'm hoping a number is sufficient for now.


            marko Marko Mäkelä added a comment -

            I think that we should distinguish page read and write errors.

            On a page write error, we do not have much choice other than to eventually terminate the process. I think that the minimum necessary handling is to simply leave the page dirty in the buffer pool, which will correctly prevent a log checkpoint from occurring.

            Page read errors should be handled in the same way as a checksum mismatch: evict the page frame from the buffer pool, and possibly propagate the error in some way to the caller (as noted in MDEV-13542).

            I think that at least the write side can be improved. There is no need to immediately crash the server on a page write failure.
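
            A self-contained model of that write-side handling (illustration only, not InnoDB code):

            #include <cstdio>

            struct page { bool dirty; };

            // On success the page may become clean and leave the flush list;
            // on failure it simply stays dirty, so the log checkpoint cannot
            // advance past its oldest modification and the page cleaner will
            // retry the write on a later iteration.
            void on_write_complete(page &p, int os_err)
            {
              if (os_err == 0)
              {
                p.dirty = false;
                return;
              }
              fprintf(stderr, "page write failed with error %d; will retry\n",
                      os_err);
            }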

            danblack Daniel Black added a comment -

            We retry a corrupted page read 100 times and then give up. For the system tablespace we will intentionally crash the server. That is not optimal, and maybe the 100 rereads are not optimal either, now that HDDs are getting as rare as hen’s teeth.
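
            The shape of that bounded re-read, as a generic sketch (hypothetical stand-ins, not the actual InnoDB code):

            // Hypothetical stand-ins for the real read and checksum checks.
            static bool read_page(int) { return true; }
            static bool page_is_valid(int) { return true; }

            static bool read_page_with_retries(int page_id)
            {
              const int max_retries = 100;   // the retry count mentioned above
              for (int i = 0; i < max_retries; i++)
                if (read_page(page_id) && page_is_valid(page_id))
                  return true;               // got an uncorrupted copy
              return false;                  // give up; for the system
                                             // tablespace this means a crash
            }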


            marko Marko Mäkelä added a comment -

            I can repeat crashes with io_uring, but not with libaio, when running the following (the brace expansion repeats the test name 12 times, so 12 instances run in parallel):

            ./mtr --parallel=auto --repeat=100 encryption.innodb-checksum-algorithm{,,,}{,,}
            

            On NVMe instead of /dev/shm, this took a bit longer to crash. The cb->m_err was always 195 (not 28). I repeated it on 5.17-rc3 and Debian’s 5.16-0-1-amd64 (5.16.7).

            danblack Daniel Black added a comment -

            NFS note from an LWN article: EIO is returned when accessing a file descriptor whose NFS lock has been lost.


            suresh.ramagiri@mariadb.com suresh ramagiri added a comment -

            One of our customers, running MariaDB 10.5.10, hit the same assertion reported here. The disk was not full, but I/O errors are visible in dmesg.

            error logs:
            ======
            2023-06-23 09:22:50 0x7fc2e96a4700 InnoDB: Assertion failure in file /home/jenkins/workspace/DEV-Branch/RPM-Builds/label/rhel-7/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX_ON_ES_BACKUP_DEBUGSOURCE/storage/innobase/os/os0file.cc line 3890
            InnoDB: Failing assertion: cb->m_err == DB_SUCCESS
            InnoDB: We intentionally generate a memory trap.

            dmesg:
            =======
            [Fri Jun 23 08:55:20 2023] blk_update_request: I/O error, dev fd0, sector 0
            [Fri Jun 23 09:25:16 2023] blk_update_request: I/O error, dev fd0, sector 0
            [Fri Jun 23 09:25:19 2023] blk_update_request: I/O error, dev fd0, sector 0
            [Fri Jun 23 09:25:22 2023] blk_update_request: I/O error, dev fd0, sector 0
            [Fri Jun 23 09:55:14 2023] blk_update_request: I/O error, dev fd0, sector 0
            [Fri Jun 23 09:55:17 2023] blk_update_request: I/O error, dev fd0, sector 0

            It looks to me like the I/O error is related to the floppy drive, which is not actually present but is still being accessed, producing the I/O error.
            However, we should understand how best to handle and avoid the assertion on our side, and fix it. Looking forward to your comments on this.


            marko Marko Mäkelä added a comment -

            While 10.5 only uses libaio and not liburing, fixing MDEV-29610 could be helpful.


            marko Marko Mäkelä added a comment -

            This is conceptually related to MDEV-13542.

            When a page write fails, we should do everything like we do for a successful page write, except mark the page as clean (or remove it from buf_pool.flush_list or buf_pool.LRU). A subsequent iteration of buf_flush_page_cleaner() should retry the write later. If all subsequent writes fail, then buf_flush_page_cleaner() will fail to make any progress, and all writer threads will eventually be blocked in log_free_check() because the log checkpoint cannot be advanced to make room in the circular write-ahead log ib_logfile0.

            When an asynchronous page read fails, we can simply discard the corrupted page from the buffer pool. If the page is requested via buf_page_get_low(), then a synchronous read will be invoked via buf_page_read(). In MDEV-13542, that code path was already fixed so that I/O errors and checksum mismatch are being treated in a similar way.
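
            Put together, the completion callback would branch on the request type instead of asserting; a minimal model of that control flow (hypothetical names, not a patch):

            #include <cstdio>

            enum class op { READ, WRITE };
            struct completion { op opcode; int page_id; int os_err; };

            void io_complete(const completion &c)
            {
              if (c.os_err == 0)
                return;                      // normal completion path
              if (c.opcode == op::WRITE)
                // keep the page dirty: buf_flush_page_cleaner() retries it,
                // and the checkpoint cannot advance past its modification
                fprintf(stderr, "write of page %d failed (%d); will retry\n",
                        c.page_id, c.os_err);
              else
                // treat like a checksum mismatch: evict the frame; a later
                // buf_page_get_low() falls back to synchronous buf_page_read()
                fprintf(stderr, "read of page %d failed (%d); evicting\n",
                        c.page_id, c.os_err);
            }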


            People

              marko Marko Mäkelä
              danblack Daniel Black