Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14795

InnoDB system tablespace cannot be shrunk

Details

    Description

      hello
      i used to have all tables on ibdata1 before, then i switched to one file per table, but the ibdata1 file keep its size.
      I recently switched to mariadb 10.1, and hoped that the defragmenting function would help to shrink the file, but it didn't worked.
      I optimised all tables, but the ibdata1 file did not changed at all, it is still 197gb

      MariaDB [(none)]> show status like '%defragment%';
      +----------------------------------------+--------+
      | Variable_name                          | Value  |
      +----------------------------------------+--------+
      | Innodb_defragment_compression_failures | 0      |
      | Innodb_defragment_failures             | 32872  |
      | Innodb_defragment_count                | 113744 |
      +----------------------------------------+--------+
      3 rows in set (0.01 sec)
      

      MariaDB [(none)]> show variables like '%defragment%';
      +--------------------------------------+----------+
      | Variable_name                        | Value    |
      +--------------------------------------+----------+
      | innodb_defragment                    | ON       |
      | innodb_defragment_fill_factor        | 0.900000 |
      | innodb_defragment_fill_factor_n_recs | 20       |
      | innodb_defragment_frequency          | 40       |
      | innodb_defragment_n_pages            | 7        |
      | innodb_defragment_stats_accuracy     | 0        |
      +--------------------------------------+----------+
      

      Attachments

        Issue Links

          Activity

            The code coverage looks acceptable to me now. I made some minor suggestions on improving the out-of-memory detection. This should be OK to push after addressing the last round of my comments.

            marko Marko Mäkelä added a comment - The code coverage looks acceptable to me now. I made some minor suggestions on improving the out-of-memory detection. This should be OK to push after addressing the last round of my comments.

            origin/bb-11.2-MDEV-14795 f9003c73a15d892423845a2df96c8bd73bbe96ca 2023-08-01T19:43:04+05:30
            performed well in RQG testing. The failures observed occur on other trees too, are in JIRA or are not caused by MDEV-14795.
            

            mleich Matthias Leich added a comment - origin/bb-11.2-MDEV-14795 f9003c73a15d892423845a2df96c8bd73bbe96ca 2023-08-01T19:43:04+05:30 performed well in RQG testing. The failures observed occur on other trees too, are in JIRA or are not caused by MDEV-14795.

            Together with MDEV-19229, this allows the storage space consumed by the InnoDB system tablespace to be reclaimed.

            The defragmentation that is mentioned in the Description never shrunk any files; it only tried to move some B-tree pages within a data file in an attempt to improve performance on HDDs. That code was removed in MDEV-30545.

            marko Marko Mäkelä added a comment - Together with MDEV-19229 , this allows the storage space consumed by the InnoDB system tablespace to be reclaimed. The defragmentation that is mentioned in the Description never shrunk any files; it only tried to move some B-tree pages within a data file in an attempt to improve performance on HDDs. That code was removed in MDEV-30545 .
            psumner Phil Sumner added a comment -

            Does this change mean that there will be a delay in startup for particularly large ibdata1 files?

            i.e. I've got a Galera cluster where ibdata1 has over time grown to > 300GB, and it definitely doesn't need to be this big with `file_per_table` having been set since database creation.

            psumner Phil Sumner added a comment - Does this change mean that there will be a delay in startup for particularly large ibdata1 files? i.e. I've got a Galera cluster where ibdata1 has over time grown to > 300GB, and it definitely doesn't need to be this big with `file_per_table` having been set since database creation.

            psumner, shrinking the system tablespace involves:

            1. traversing every allocation bitmap page (pages 0, innodb_page_size, 2*innodb_page_size…) within the old size,
            2. adjusting the bitmap pages within the shrunk size,
            3. durably writing log for the change, and
            4. shrinking the last system tablespace file by ftruncate()

            When using the default innodb_page_size=16k, there will be an allocation bitmap page every 16384*16384 bytes (256 megabytes, or quarter gigabyte). Shrinking a 300GiB file would involve traversing 1,200 bitmap pages. In an extreme case, the file could be shrunk to 256 megabytes or less, that is, we would write log covering just the first page of the file. In such an extreme case, the time should be dominated by the file system operation. Truncating or deleting a heavily fragmented file can take considerable time. How much, depends on the type of storage and the file system, and the operating system kernel version.

            By popular demand, I filed MDEV-32452 so that this cleanup can be executed on shutdown rather than startup.

            marko Marko Mäkelä added a comment - psumner , shrinking the system tablespace involves: traversing every allocation bitmap page (pages 0, innodb_page_size , 2* innodb_page_size …) within the old size, adjusting the bitmap pages within the shrunk size, durably writing log for the change, and shrinking the last system tablespace file by ftruncate() When using the default innodb_page_size=16k , there will be an allocation bitmap page every 16384*16384 bytes (256 megabytes, or quarter gigabyte). Shrinking a 300GiB file would involve traversing 1,200 bitmap pages. In an extreme case, the file could be shrunk to 256 megabytes or less, that is, we would write log covering just the first page of the file. In such an extreme case, the time should be dominated by the file system operation. Truncating or deleting a heavily fragmented file can take considerable time. How much, depends on the type of storage and the file system, and the operating system kernel version. By popular demand, I filed MDEV-32452 so that this cleanup can be executed on shutdown rather than startup.

            People

              thiru Thirunarayanan Balathandayuthapani
              martina342 martina342
              Votes:
              2 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.