Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32027

Opening all .ibd files on InnoDB startup can be slow

Details

    Description

      2023-08-14 9:13:29 0 [Note] InnoDB: Uses event mutexes
      2023-08-14 9:13:29 0 [Note] InnoDB: Compressed tables use zlib 1.2.12
      2023-08-14 9:13:29 0 [Note] InnoDB: Number of pools: 1
      2023-08-14 9:13:29 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2023-08-14 9:13:29 0 [Note] InnoDB: Initializing buffer pool, total size = 5368709120, chunk size = 134217728
      2023-08-14 9:13:29 0 [Note] InnoDB: Completed initialization of buffer pool
      2023-08-14 9:13:31 0 [Note] InnoDB: 128 rollback segments are active.
      2023-08-14 9:17:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2023-08-14 9:17:50 0 [Note] InnoDB: Setting file '.\ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2023-08-14 9:17:50 0 [Note] InnoDB: File '.\ibtmp1' size is now 12 MB.
      2023-08-14 9:17:50 0 [Note] InnoDB: 10.5.22 started; log sequence number 197013855628; transaction id 455278971

      2023-08-14 9:13:31 0 [Note] InnoDB: 128 rollback segments are active.
      2023-08-14 9:17:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables.

      MySQL starts slowly, rollback segments and Creating shared tablespace with an interval of over 4 minutes.

      Attachments

        1. 1.jpg
          1.jpg
          414 kB
        2. 2.jpg
          2.jpg
          489 kB
        3. 3.jpg
          3.jpg
          156 kB
        4. Capture.PNG
          Capture.PNG
          317 kB
        5. dump50.sql.gz
          1.13 MB
        6. my.ini
          2 kB

        Issue Links

          Activity

            kennyliao kennyliao created issue -

            Can you please run a profiler during the startup, to find out where the time is being spent?

            sudo perf record -p $(pgrep mariadbd)
            

            Would the server start up faster if you make innodb_buffer_pool_chunk_size equal to innodb_buffer_pool_size?

            In MDEV-25340, I recently tested the initialization of a 96GiB buffer pool. It was significantly slower on 10.5 than on 10.6. Would the start-up time with 10.6.15 be faster?

            marko Marko Mäkelä added a comment - Can you please run a profiler during the startup, to find out where the time is being spent? sudo perf record -p $(pgrep mariadbd) Would the server start up faster if you make innodb_buffer_pool_chunk_size equal to innodb_buffer_pool_size ? In MDEV-25340 , I recently tested the initialization of a 96GiB buffer pool. It was significantly slower on 10.5 than on 10.6. Would the start-up time with 10.6.15 be faster?
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            Assignee Marko Mäkelä [ marko ]
            Status Open [ 1 ] Needs Feedback [ 10501 ]

            my.ini includes innodb_buffer_pool_size=600M, while the quoted output corresponds to innodb_buffer_pool_size=5G.

            It is not yet clear if the slowness is related to the buffer pool initialization. Some profiling data would be needed to understand what is causing the slowdown.

            marko Marko Mäkelä added a comment - my.ini includes innodb_buffer_pool_size=600M , while the quoted output corresponds to innodb_buffer_pool_size=5G . It is not yet clear if the slowness is related to the buffer pool initialization. Some profiling data would be needed to understand what is causing the slowdown.
            kennyliao kennyliao made changes -
            Comment [ i set the initialization of a 3GiB buffer pool ,innodb_buffer_pool_size = 3072MiB ]
            kennyliao kennyliao added a comment -

            2023-08-14 9:13:29 0 [Note] InnoDB: Initializing buffer pool, total size = 5368709120, chunk size = 134217728
            2023-08-14 9:13:29 0 [Note] InnoDB: Completed initialization of buffer pool

            Aren't these two prompts indicating that the initialization of the buffer pool was completed within 1 second?
            Have I misunderstood?

            kennyliao kennyliao added a comment - 2023-08-14 9:13:29 0 [Note] InnoDB: Initializing buffer pool, total size = 5368709120, chunk size = 134217728 2023-08-14 9:13:29 0 [Note] InnoDB: Completed initialization of buffer pool Aren't these two prompts indicating that the initialization of the buffer pool was completed within 1 second? Have I misunderstood?

            Yes, the buffer pool initialization does seem to finish that quickly. I was thinking that there could be some longer-term impact on L2 or L3 cache.

            Anyway, without having any sort of profiling data, it is very hard to say what could cause the slow startup.

            marko Marko Mäkelä added a comment - Yes, the buffer pool initialization does seem to finish that quickly. I was thinking that there could be some longer-term impact on L2 or L3 cache. Anyway, without having any sort of profiling data, it is very hard to say what could cause the slow startup.
            wlad Vladislav Vaintroub added a comment - - edited

            I tried a quick test with procmon (the system in question is Windows).

            What happens between " rollback segments are active." and "Creating shared tablespace for temporary table", is that all tablespaces are opened and closed, with this stack backtrace

              fil_ibd_open
              dict_check_sys_tables
              dict_check_tablespaces_and_store_max_id
              srv_start
            

            On Windows, this results in 5 system calls per table.
            1. failed CreateFile() for the non-existing .isl file
            2. successful CreateFile() for the existing .ibd file
            3. QueryVolumeInformationFile(), from GetFileInformationByHandle() called by Datafile::init_file_info()
            4. QueryAllInformationFile(), from the same GetFileInformationByHandle()
            5. CloseFile()

            Speculating a bit - depending on the number of tables, this might be expensive. I remember seeing some systems, where QueryVolumeInformationFile() was not too fast. Anyway, as the information returned by GetFileInformationByHandle() is not going to be used, perhaps init_file_info can also be skipped.
            Actually, the only thing that fil_ibd_open() is used for, is to produce a warning "Ignoring tablespace for <tablename> because it could not be opened".
            Dunno, maybe one can live without that warning, rather than opening/closing potentially a huge number of files.

            The above is about 10.5, I did not look into higher versions.

            wlad Vladislav Vaintroub added a comment - - edited I tried a quick test with procmon (the system in question is Windows). What happens between " rollback segments are active." and "Creating shared tablespace for temporary table", is that all tablespaces are opened and closed, with this stack backtrace fil_ibd_open dict_check_sys_tables dict_check_tablespaces_and_store_max_id srv_start On Windows, this results in 5 system calls per table. 1. failed CreateFile() for the non-existing .isl file 2. successful CreateFile() for the existing .ibd file 3. QueryVolumeInformationFile(), from GetFileInformationByHandle() called by Datafile::init_file_info() 4. QueryAllInformationFile(), from the same GetFileInformationByHandle() 5. CloseFile() Speculating a bit - depending on the number of tables, this might be expensive. I remember seeing some systems, where QueryVolumeInformationFile() was not too fast. Anyway, as the information returned by GetFileInformationByHandle() is not going to be used, perhaps init_file_info can also be skipped. Actually, the only thing that fil_ibd_open() is used for, is to produce a warning "Ignoring tablespace for <tablename> because it could not be opened". Dunno, maybe one can live without that warning, rather than opening/closing potentially a huge number of files. The above is about 10.5, I did not look into higher versions.

            kennyliao, I think it makes sense to record mysqld.exe's (mariadbd.exe's) filesystem activity with procmon during server's startup in your environment, too, to see if the idea above proves correct.
            there is some information about it https://mariadb.com/kb/en/how-to-use-procmon-to-trace-mysqldexe-filesystem-access/

            wlad Vladislav Vaintroub added a comment - kennyliao, I think it makes sense to record mysqld.exe's (mariadbd.exe's) filesystem activity with procmon during server's startup in your environment, too, to see if the idea above proves correct. there is some information about it https://mariadb.com/kb/en/how-to-use-procmon-to-trace-mysqldexe-filesystem-access/
            kennyliao kennyliao added a comment - - edited

            @Vladislav Vaintroub,The operating system environment is Windows,not linux. Is there any way to accelerate the start of msyqld?

            kennyliao kennyliao added a comment - - edited @Vladislav Vaintroub,The operating system environment is Windows,not linux. Is there any way to accelerate the start of msyqld?

            wlad, thank you for chiming in. I have the impression that opening files is relatively fast in Linux. What is slow is reading from the files (fil_node_t::read_page0()). While I may have noticed that we open and close lots of files during startup, that is why I did not spend much effort in removing those calls.

            I believe that the logic could be similar in later versions of MariaDB Server. In MDEV-13542, the function dict_check_tablespaces_and_store_max_id() was refactored somewhat, and in MDEV-22343 the system tables SYS_TABLESPACES and SYS_DATAFILES were removed. Still, for each persistent InnoDB table that we find in SYS_TABLES, we will try to open the data file if it was not opened yet. The design is such that the tablespace metadata must remain in memory for all InnoDB tablespaces. File handles may be closed and table metadata may be evicted and reloaded, but the tablespace metadata will remain.

            It should be possible to revise this logic in MariaDB Server 10.6 or later. Thanks to MDEV-13542, we should have the infrastructure in place to handle corrupted or missing data files. Also, we should find the maximum used tablespace ID at DICT_HDR + DICT_HDR_MAX_SPACE_ID in the dictionary header page, ever since I added the field in 2010 (MySQL 5.1.48). We might skip the scan of SYS_TABLES if the header field is nonzero. That should save some additional time.

            kennyliao, how many .ibd files are there in your data directory?

            marko Marko Mäkelä added a comment - wlad , thank you for chiming in. I have the impression that opening files is relatively fast in Linux. What is slow is reading from the files ( fil_node_t::read_page0() ). While I may have noticed that we open and close lots of files during startup, that is why I did not spend much effort in removing those calls. I believe that the logic could be similar in later versions of MariaDB Server. In MDEV-13542 , the function dict_check_tablespaces_and_store_max_id() was refactored somewhat, and in MDEV-22343 the system tables SYS_TABLESPACES and SYS_DATAFILES were removed. Still, for each persistent InnoDB table that we find in SYS_TABLES , we will try to open the data file if it was not opened yet. The design is such that the tablespace metadata must remain in memory for all InnoDB tablespaces. File handles may be closed and table metadata may be evicted and reloaded, but the tablespace metadata will remain. It should be possible to revise this logic in MariaDB Server 10.6 or later. Thanks to MDEV-13542 , we should have the infrastructure in place to handle corrupted or missing data files. Also, we should find the maximum used tablespace ID at DICT_HDR + DICT_HDR_MAX_SPACE_ID in the dictionary header page, ever since I added the field in 2010 (MySQL 5.1.48). We might skip the scan of SYS_TABLES if the header field is nonzero. That should save some additional time. kennyliao , how many .ibd files are there in your data directory?
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Environment Microsoft Windows
            Labels performance
            Priority Blocker [ 1 ] Critical [ 2 ]
            kennyliao kennyliao added a comment - - edited

            Marko Mäkelä,there are 9511 .ibd files in the data directory, and the size of the database is 27.2GiB. Can upgrading to mariadb10.6 or later solve this performance issue?

            kennyliao kennyliao added a comment - - edited Marko Mäkelä,there are 9511 .ibd files in the data directory, and the size of the database is 27.2GiB. Can upgrading to mariadb10.6 or later solve this performance issue?

            kennyliao, I think that you should encounter the same problem in MariaDB Server 10.6 or later. I would only fix this bug starting with 10.6.

            marko Marko Mäkelä added a comment - kennyliao , I think that you should encounter the same problem in MariaDB Server 10.6 or later. I would only fix this bug starting with 10.6.
            marko Marko Mäkelä made changes -
            Status Needs Feedback [ 10501 ] Open [ 1 ]
            wlad Vladislav Vaintroub added a comment - - edited

            kennyliao, would it be possible for you to use procmon.exe, to record file system activity of mysqld.exe during database startup, and share the procmon output here? https://download.sysinternals.com/files/ProcessMonitor.zip is the download. it is a legit tool from Microsoft, it works on Windows

            The cause of slow startup on your system is not yet properly analyzed, and this would help a lot, because slow startup happens for you, but it does not for others. Another helpful information would be when you noticed the slowdown, e.g after upgrading from 5.5.

            I did a little experiment, with 85 000 empty innodb tables. This added 6-15 seconds startup time for me, on Windows 11, on local SSD, on NTFS. It is not as drastic as your 4 minutes, but I can imagine that there are other factors that add up time, type of storage, file fragmentation etc.

            wlad Vladislav Vaintroub added a comment - - edited kennyliao, would it be possible for you to use procmon.exe, to record file system activity of mysqld.exe during database startup, and share the procmon output here? https://download.sysinternals.com/files/ProcessMonitor.zip is the download. it is a legit tool from Microsoft, it works on Windows The cause of slow startup on your system is not yet properly analyzed, and this would help a lot, because slow startup happens for you, but it does not for others. Another helpful information would be when you noticed the slowdown, e.g after upgrading from 5.5. I did a little experiment, with 85 000 empty innodb tables. This added 6-15 seconds startup time for me, on Windows 11, on local SSD, on NTFS. It is not as drastic as your 4 minutes, but I can imagine that there are other factors that add up time, type of storage, file fragmentation etc.
            kennyliao kennyliao made changes -
            Comment [  !1.png|thumbnail! ]
            kennyliao kennyliao added a comment -

            I tested using procmon.exe on mariadb 10.6, but there were still cases of file opening and closing.
            [^1.docx]

            kennyliao kennyliao added a comment - I tested using procmon.exe on mariadb 10.6, but there were still cases of file opening and closing. [^1.docx]

            procmon allows filtering by executable name (among other things). there is some information about it https://mariadb.com/kb/en/how-to-use-procmon-to-trace-mysqldexe-filesystem-access/ .

            wlad Vladislav Vaintroub added a comment - procmon allows filtering by executable name (among other things). there is some information about it https://mariadb.com/kb/en/how-to-use-procmon-to-trace-mysqldexe-filesystem-access/ .
            kennyliao kennyliao made changes -
            Attachment 微信图片_20230829140120.jpg [ 71732 ]
            kennyliao kennyliao made changes -
            Attachment 1.jpg [ 71733 ]
            kennyliao kennyliao made changes -
            Attachment 2.jpg [ 71734 ]
            kennyliao kennyliao made changes -
            Attachment 3.jpg [ 71735 ]
            kennyliao kennyliao added a comment - - edited




            I have verified on multiple Windows servers and generally encountered slow startup issues.

            kennyliao kennyliao added a comment - - edited I have verified on multiple Windows servers and generally encountered slow startup issues.
            kennyliao kennyliao made changes -
            Attachment 微信图片_20230829140120.jpg [ 71732 ]
            wlad Vladislav Vaintroub added a comment - - edited

            maybe you can just save procmon output (File/Save), and attach the PML file?

            wlad Vladislav Vaintroub added a comment - - edited maybe you can just save procmon output (File/Save), and attach the PML file?
            kennyliao kennyliao made changes -
            Affects Version/s 10.5 [ 23123 ]
            Affects Version/s 10.5.22 [ 29011 ]

            I can verify same kind of slowness with many table files, in my case 50 databases with the same two tables having ~2000 partitions each, summing up to a total of 200K .ibd partition tablespace files, on current 10.6

            procmon seems to indeed identify QueryVolumeInformationFile() as the worst offender indeed.

            Attaching an anonymized mysqldump, raw procmon output, and a representative screenshot of a small portion of same

            hholzgra Hartmut Holzgraefe added a comment - I can verify same kind of slowness with many table files, in my case 50 databases with the same two tables having ~2000 partitions each, summing up to a total of 200K .ibd partition tablespace files, on current 10.6 procmon seems to indeed identify QueryVolumeInformationFile() as the worst offender indeed. Attaching an anonymized mysqldump, raw procmon output, and a representative screenshot of a small portion of same
            hholzgra Hartmut Holzgraefe made changes -
            Attachment Capture.PNG [ 71949 ]
            hholzgra Hartmut Holzgraefe made changes -
            Attachment dump50.sql.gz [ 71950 ]
            wlad Vladislav Vaintroub added a comment - - edited

            Hartmut, you can just attach the PML (this is File/Save menu).There is no need to take screenshots of the procmon

            wlad Vladislav Vaintroub added a comment - - edited Hartmut, you can just attach the PML (this is File/Save menu).There is no need to take screenshots of the procmon

            Now made the PML url public

            hholzgra Hartmut Holzgraefe added a comment - Now made the PML url public
            wlad Vladislav Vaintroub added a comment - - edited

            Ok , full PML file is actually helpful. I was wondering about this stacktrace

              dict_check_tablespaces_and_store_max_id
              srv_start
              innodb_init 
              ha_initialize_handlerton
             

            The first occurrence in this PML file is at 9/12/2023 12:21:50.9988071 PM, the last occurrence at 9/12/2023 12:35:42.7459648 PM.
            Everything between first and last occurrence, is just dict_check_tablespaces_and_store_max_id(), that's about 14 minutes.
            And it has, per file

            • CreateFile(.isl)
            • CreateFile(.ibd)
            • QueryEAFile(.ibd), which I did not notice before
            • QueryInformationVolume(.ibd)
            • QueryAllInformationFile(.ibd)
            • CloseFile(.ibd)

            QueryInformationVolume(.ibd) seems to be access to open file metadata, QueryAllInformationFile too. They do not seem to take longer than others.

            The startup message is written at 9/12/2023 12:35:43.3938245 PM

            Afterwards, There is some GetVolumePathName() that's Innodb is loading buffer pool dump ib_buffer_pool and opening corresponding tablespaces.
            This results in a series of DeviceIoControl and Create/CloseFile for all directories in hierarchy.
            This comes from is_file_on_ssd(), once per opened file, and this certainly can also be optimized further (the function apparently does a much more than once syscall). But that is not the culprit.

            The culprit is opening and closing ibd and isl files, overall. dict_check_tablespaces_and_store_max_id(), as I initially suspected. It takes ~14 minutes.

            wlad Vladislav Vaintroub added a comment - - edited Ok , full PML file is actually helpful. I was wondering about this stacktrace dict_check_tablespaces_and_store_max_id srv_start innodb_init ha_initialize_handlerton The first occurrence in this PML file is at 9/12/2023 12:21:50.9988071 PM, the last occurrence at 9/12/2023 12:35:42.7459648 PM. Everything between first and last occurrence, is just dict_check_tablespaces_and_store_max_id(), that's about 14 minutes. And it has, per file CreateFile(.isl) CreateFile(.ibd) QueryEAFile(.ibd), which I did not notice before QueryInformationVolume(.ibd) QueryAllInformationFile(.ibd) CloseFile(.ibd) QueryInformationVolume(.ibd) seems to be access to open file metadata, QueryAllInformationFile too. They do not seem to take longer than others. The startup message is written at 9/12/2023 12:35:43.3938245 PM Afterwards, There is some GetVolumePathName() that's Innodb is loading buffer pool dump ib_buffer_pool and opening corresponding tablespaces. This results in a series of DeviceIoControl and Create/CloseFile for all directories in hierarchy. This comes from is_file_on_ssd(), once per opened file, and this certainly can also be optimized further (the function apparently does a much more than once syscall). But that is not the culprit. The culprit is opening and closing ibd and isl files, overall. dict_check_tablespaces_and_store_max_id(), as I initially suspected. It takes ~14 minutes.
            danblack Daniel Black made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]

            hholzgra and wlad, thank you a lot. It looks like opening a file (without reading anything from it) is a much more expensive operation on Microsoft Windows than it is on Linux. I will see if we can remove that code altogether, and just rely on fil_node_t::read_page0() to do the right thing when the files are being opened for the first actual access.

            marko Marko Mäkelä added a comment - hholzgra and wlad , thank you a lot. It looks like opening a file (without reading anything from it) is a much more expensive operation on Microsoft Windows than it is on Linux. I will see if we can remove that code altogether, and just rely on fil_node_t::read_page0() to do the right thing when the files are being opened for the first actual access.
            wlad Vladislav Vaintroub made changes -
            marko Marko Mäkelä made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]

            I started fixing this by trimming down dict_check_tablespaces_and_store_max_id() so that it would basically only do SELECT MAX(space) FROM information_schema.innodb_sys_tables at READ UNCOMMITTED isolation level instead of the previous READ COMMITTED, and not bother with any tablespace lookups or opening any files. (Instead of performing even that table scan, we might just rely on the DICT_HDR_MAX_SPACE_ID, but I wanted to retain an option to remove that sequence later.)

            Some further adjustment is needed to ensure that the data files that were not opened during redo log processing will be opened when the table is first accessed:

            10.6 04477bd9364e7a753918c730b89e7012b1926e6a

            CURRENT_TEST: innodb.table_flags
            mysqltest: At line 160: query 'SHOW CREATE TABLE td' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
            

            marko Marko Mäkelä added a comment - I started fixing this by trimming down dict_check_tablespaces_and_store_max_id() so that it would basically only do SELECT MAX(space) FROM information_schema.innodb_sys_tables at READ UNCOMMITTED isolation level instead of the previous READ COMMITTED , and not bother with any tablespace lookups or opening any files. (Instead of performing even that table scan, we might just rely on the DICT_HDR_MAX_SPACE_ID , but I wanted to retain an option to remove that sequence later.) Some further adjustment is needed to ensure that the data files that were not opened during redo log processing will be opened when the table is first accessed: 10.6 04477bd9364e7a753918c730b89e7012b1926e6a CURRENT_TEST: innodb.table_flags mysqltest: At line 160: query 'SHOW CREATE TABLE td' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB

            It turns out that the file test/td.ibd actually contains FSP_SPACE_FLAGS that disagree with the intentionally corrupted table flags, but the fil_ibd_open() call in dict_check_tablespaces_and_store_max_id() never enabled the strict validation, which would be part of dict_load_tablespace().

            Furthermore, the following call could confuse users:

            diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
            index 02203ba3d7b..1e0b7a9c8c5 100644
            --- a/storage/innobase/fil/fil0fil.cc
            +++ b/storage/innobase/fil/fil0fil.cc
            @@ -2221,8 +2221,6 @@ fil_ibd_open(
             			goto corrupted;
             		}
             
            -		os_file_get_last_error(operation_not_for_export,
            -				       !operation_not_for_export);
             		if (!operation_not_for_export) {
             			goto corrupted;
             		}
            

            On my system, it would report an EINVAL that is not actually directly related to opening the file, but to an expected error during the execution of my_realpath():

            10.6 04477bd9364e7a753918c730b89e7012b1926e6a

            #0  0x00007f842cb17279 in __readlink (path=path@entry=0x7f8420186b10 "/dev/shm/10.6/mysql-test/var/mysqld.1/data/test", buf=buf@entry=0x7f8420187330 "", len=len@entry=1023)
                at ../sysdeps/unix/sysv/linux/readlink.c:30
            #1  0x00007f842ca5bc42 in realpath_stk (bufs=0x7f8420186b00, resolved=0x7f8420187780 "\020\212\030 \204\177", name=<optimized out>) at ./stdlib/canonicalize.c:306
            #2  __GI___realpath (name=name@entry=0x7f8420188be0 "./test/", resolved=resolved@entry=0x7f8420187780 "\020\212\030 \204\177") at ./stdlib/canonicalize.c:432
            #3  0x00007f842cb6da1a in __old_realpath (name=name@entry=0x7f8420188be0 "./test/", resolved=resolved@entry=0x7f8420187780 "\020\212\030 \204\177") at ./stdlib/canonicalize.c:454
            #4  0x000055ae4c6b2530 in my_realpath (to=0x7f84201889e0 "\020\364\035\"\204\177", filename=0x7f8420188be0 "./test/", MyFlags=0) at /mariadb/10.6/mysys/my_symlink.c:151
            #5  0x000055ae4c34b5bd in find_or_create_file (thread=thread@entry=0x7f842c0fbec0, klass=klass@entry=0x55ae4da7a1c0, filename=filename@entry=0x55ae4e257448 "./test/td.ibd", len=13, create=create@entry=true)
                at /mariadb/10.6/storage/perfschema/pfs_instr.cc:829
            #6  0x000055ae4c339f9d in pfs_end_file_open_wait_v1 (locker=0x7f8420188e70, result=<optimized out>) at /mariadb/10.6/storage/perfschema/pfs.cc:4458
            #7  0x000055ae4c420296 in pfs_os_file_create_simple_no_error_handling_func (key=<optimized out>, name=0x55ae4e257448 "./test/td.ibd", create_mode=create_mode@entry=51, access_type=access_type@entry=333, 
                read_only=read_only@entry=true, success=success@entry=0x7f8420188f0f, src_file=0x55ae4b71b998 "/mariadb/10.6/storage/innobase/fsp/fsp0file.cc", src_line=84)
                at /mariadb/10.6/storage/innobase/include/os0file.inl:118
            #8  0x000055ae4c420ae3 in Datafile::open_read_only (this=this@entry=0x7f8420189120, strict=strict@entry=true) at /mariadb/10.6/storage/innobase/fsp/fsp0file.cc:84
            #9  0x000055ae4c42bd5e in fil_ibd_open (validate=validate@entry=2, purpose=purpose@entry=FIL_TYPE_TABLESPACE, id=3, flags=1879113953, name={data_ = 0x55ae4e2514c8 "test/td", size_ = 7}, 
                path_in=path_in@entry=0x0, err=0x0) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:2182
            #10 0x000055ae4c4b0e82 in dict_load_tablespace (table=0x55ae4e2bd618, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY) at /mariadb/10.6/storage/innobase/dict/dict0load.cc:2211
            #11 0x000055ae4c4b23ae in dict_load_table_one (name=@0x7f8420189e20: {data_ = 0x7f842018a3b0 "test/td", size_ = 7}, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY, fk_tables=std::deque with 0 elements)
                at /mariadb/10.6/storage/innobase/dict/dict0load.cc:2320
            #12 0x000055ae4c4b2c27 in dict_sys_t::load_table (this=this@entry=0x55ae4c8dd180 <dict_sys>, name=@0x7f8420189e20: {data_ = 0x7f842018a3b0 "test/td", size_ = 7}, ignore=ignore@entry=DICT_ERR_IGNORE_FK_NOKEY)
                at /mariadb/10.6/storage/innobase/dict/dict0load.cc:2482
            #13 0x000055ae4c3f5983 in dict_table_open_on_name (table_name=table_name@entry=0x7f842018a3b0 "test/td", dict_locked=dict_locked@entry=false, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY)
                at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:1055
            #14 0x000055ae4c3bc54c in ha_innobase::open_dict_table (norm_name=norm_name@entry=0x7f842018a3b0 "test/td", is_partition=is_partition@entry=false, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY)
                at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:6244
            #15 0x000055ae4c3c921b in ha_innobase::open (this=0x55ae4e2bcd80, name=0x55ae4e2cd490 "./test/td") at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:5963
            …
            #26 0x000055ae4c0b4cf8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x55ae4e2981b8, packet=packet@entry=0x55ae4e29f099 "SHOW CREATE TABLE td", packet_length=packet_length@entry=20, 
                blocking=blocking@entry=true) at /mariadb/10.6/sql/sql_parse.cc:1896
            

            marko Marko Mäkelä added a comment - It turns out that the file test/td.ibd actually contains FSP_SPACE_FLAGS that disagree with the intentionally corrupted table flags, but the fil_ibd_open() call in dict_check_tablespaces_and_store_max_id() never enabled the strict validation, which would be part of dict_load_tablespace() . Furthermore, the following call could confuse users: diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc index 02203ba3d7b..1e0b7a9c8c5 100644 --- a/storage/innobase/fil/fil0fil.cc +++ b/storage/innobase/fil/fil0fil.cc @@ -2221,8 +2221,6 @@ fil_ibd_open( goto corrupted; } - os_file_get_last_error(operation_not_for_export, - !operation_not_for_export); if (!operation_not_for_export) { goto corrupted; } On my system, it would report an EINVAL that is not actually directly related to opening the file, but to an expected error during the execution of my_realpath() : 10.6 04477bd9364e7a753918c730b89e7012b1926e6a #0 0x00007f842cb17279 in __readlink (path=path@entry=0x7f8420186b10 "/dev/shm/10.6/mysql-test/var/mysqld.1/data/test", buf=buf@entry=0x7f8420187330 "", len=len@entry=1023) at ../sysdeps/unix/sysv/linux/readlink.c:30 #1 0x00007f842ca5bc42 in realpath_stk (bufs=0x7f8420186b00, resolved=0x7f8420187780 "\020\212\030 \204\177", name=<optimized out>) at ./stdlib/canonicalize.c:306 #2 __GI___realpath (name=name@entry=0x7f8420188be0 "./test/", resolved=resolved@entry=0x7f8420187780 "\020\212\030 \204\177") at ./stdlib/canonicalize.c:432 #3 0x00007f842cb6da1a in __old_realpath (name=name@entry=0x7f8420188be0 "./test/", resolved=resolved@entry=0x7f8420187780 "\020\212\030 \204\177") at ./stdlib/canonicalize.c:454 #4 0x000055ae4c6b2530 in my_realpath (to=0x7f84201889e0 "\020\364\035\"\204\177", filename=0x7f8420188be0 "./test/", MyFlags=0) at /mariadb/10.6/mysys/my_symlink.c:151 #5 0x000055ae4c34b5bd in find_or_create_file (thread=thread@entry=0x7f842c0fbec0, klass=klass@entry=0x55ae4da7a1c0, filename=filename@entry=0x55ae4e257448 "./test/td.ibd", len=13, create=create@entry=true) at /mariadb/10.6/storage/perfschema/pfs_instr.cc:829 #6 0x000055ae4c339f9d in pfs_end_file_open_wait_v1 (locker=0x7f8420188e70, result=<optimized out>) at /mariadb/10.6/storage/perfschema/pfs.cc:4458 #7 0x000055ae4c420296 in pfs_os_file_create_simple_no_error_handling_func (key=<optimized out>, name=0x55ae4e257448 "./test/td.ibd", create_mode=create_mode@entry=51, access_type=access_type@entry=333, read_only=read_only@entry=true, success=success@entry=0x7f8420188f0f, src_file=0x55ae4b71b998 "/mariadb/10.6/storage/innobase/fsp/fsp0file.cc", src_line=84) at /mariadb/10.6/storage/innobase/include/os0file.inl:118 #8 0x000055ae4c420ae3 in Datafile::open_read_only (this=this@entry=0x7f8420189120, strict=strict@entry=true) at /mariadb/10.6/storage/innobase/fsp/fsp0file.cc:84 #9 0x000055ae4c42bd5e in fil_ibd_open (validate=validate@entry=2, purpose=purpose@entry=FIL_TYPE_TABLESPACE, id=3, flags=1879113953, name={data_ = 0x55ae4e2514c8 "test/td", size_ = 7}, path_in=path_in@entry=0x0, err=0x0) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:2182 #10 0x000055ae4c4b0e82 in dict_load_tablespace (table=0x55ae4e2bd618, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY) at /mariadb/10.6/storage/innobase/dict/dict0load.cc:2211 #11 0x000055ae4c4b23ae in dict_load_table_one (name=@0x7f8420189e20: {data_ = 0x7f842018a3b0 "test/td", size_ = 7}, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY, fk_tables=std::deque with 0 elements) at /mariadb/10.6/storage/innobase/dict/dict0load.cc:2320 #12 0x000055ae4c4b2c27 in dict_sys_t::load_table (this=this@entry=0x55ae4c8dd180 <dict_sys>, name=@0x7f8420189e20: {data_ = 0x7f842018a3b0 "test/td", size_ = 7}, ignore=ignore@entry=DICT_ERR_IGNORE_FK_NOKEY) at /mariadb/10.6/storage/innobase/dict/dict0load.cc:2482 #13 0x000055ae4c3f5983 in dict_table_open_on_name (table_name=table_name@entry=0x7f842018a3b0 "test/td", dict_locked=dict_locked@entry=false, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:1055 #14 0x000055ae4c3bc54c in ha_innobase::open_dict_table (norm_name=norm_name@entry=0x7f842018a3b0 "test/td", is_partition=is_partition@entry=false, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:6244 #15 0x000055ae4c3c921b in ha_innobase::open (this=0x55ae4e2bcd80, name=0x55ae4e2cd490 "./test/td") at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:5963 … #26 0x000055ae4c0b4cf8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x55ae4e2981b8, packet=packet@entry=0x55ae4e29f099 "SHOW CREATE TABLE td", packet_length=packet_length@entry=20, blocking=blocking@entry=true) at /mariadb/10.6/sql/sql_parse.cc:1896

            Some tests for encryption functionality depend on opening all InnoDB data files at startup. For example, the test encryption.innodb-spatial-index would not list the tablespace for the system table mysql.transaction_registry in INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION, because it would no longer be opened as part of the server execution. (The test does not make use of any tables WITH SYSTEM VERSIONING.)
            The full list of affected enryption tests is as follows:

            encryption.debug_key_management
            encryption.encrypt_and_grep
            encryption.file_creation
            encryption.innodb-remove-encryption
            encryption.innodb-spatial-index
            encryption.innodb_encrypt_freed
            encryption.innodb_encrypt_key_rotation_age
            encryption.innodb_encryption
            

            An option might be to keep opening all data files of InnoDB tables if an encryption plugin is loaded. We must keep in mind that most encryption parameters, such as innodb_encryption_threads, are settable at runtime.

            marko Marko Mäkelä added a comment - Some tests for encryption functionality depend on opening all InnoDB data files at startup. For example, the test encryption.innodb-spatial-index would not list the tablespace for the system table mysql.transaction_registry in INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION , because it would no longer be opened as part of the server execution. (The test does not make use of any tables WITH SYSTEM VERSIONING .) The full list of affected enryption tests is as follows: encryption.debug_key_management encryption.encrypt_and_grep encryption.file_creation encryption.innodb-remove-encryption encryption.innodb-spatial-index encryption.innodb_encrypt_freed encryption.innodb_encrypt_key_rotation_age encryption.innodb_encryption An option might be to keep opening all data files of InnoDB tables if an encryption plugin is loaded. We must keep in mind that most encryption parameters, such as innodb_encryption_threads , are settable at runtime.

            Some of encryption functionality, like key rotation, is obscure, and is only supported by very few plugins, therefore it is a good question if
            opening all files in anticipation of this functionality by default with encryption is reasonable. Those encryption threads perhaps maybe changed to be not settable at runtime anymore.

            wlad Vladislav Vaintroub added a comment - Some of encryption functionality, like key rotation, is obscure, and is only supported by very few plugins, therefore it is a good question if opening all files in anticipation of this functionality by default with encryption is reasonable. Those encryption threads perhaps maybe changed to be not settable at runtime anymore.

            If I make dict_check_tablespaces_and_store_max_id() open all files when encryption_key_id_exists(FIL_DEFAULT_ENCRYPTION_KEY) (or !ibuf.empty), then all encryption tests except encryption.innodb-remove-encryption will pass. That still leaves some non-encryption test failures. The trickiest is the following:

            10.6 04477bd9364e7a753918c730b89e7012b1926e6a with patch

            CURRENT_TEST: sys_vars.innodb_buffer_pool_dump_abort_loads
            mysqltest: At line 143: Can't initialize replace from 'replace_result $fulldump Previously_dumped'
            

            This would be fixed by making buf_load() invoke another SYS_TABLES scan and open all data files for those missing tablespaces that are mentioned in the file identified by innodb_buffer_pool_filename.

            marko Marko Mäkelä added a comment - If I make dict_check_tablespaces_and_store_max_id() open all files when encryption_key_id_exists(FIL_DEFAULT_ENCRYPTION_KEY) (or !ibuf.empty ), then all encryption tests except encryption.innodb-remove-encryption will pass. That still leaves some non-encryption test failures. The trickiest is the following: 10.6 04477bd9364e7a753918c730b89e7012b1926e6a with patch CURRENT_TEST: sys_vars.innodb_buffer_pool_dump_abort_loads mysqltest: At line 143: Can't initialize replace from 'replace_result $fulldump Previously_dumped' This would be fixed by making buf_load() invoke another SYS_TABLES scan and open all data files for those missing tablespaces that are mentioned in the file identified by innodb_buffer_pool_filename .

            Before I implemented the tweak to buf_load(), I had tweaked the following tests:

            encryption.innodb-remove-encryption
            innodb.row_format_redundant
            innodb.table_flags
            innodb_zip.restart
            

            It turns out that my changes to these tests can be omitted, because the ib_buffer_pool file would ensure that the tablespaces of interest be loaded.

            If the ib_buffer_pool file were removed, these tests would fail. To make these failures deterministic, I would have created all tables with STATS_PERSISTENT=0 and run a slow shutdown (innodb_fast_shutdown=0) so that no background tasks would cause the tables to be loaded and data files to be opened.

            marko Marko Mäkelä added a comment - Before I implemented the tweak to buf_load() , I had tweaked the following tests: encryption.innodb-remove-encryption innodb.row_format_redundant innodb.table_flags innodb_zip.restart It turns out that my changes to these tests can be omitted, because the ib_buffer_pool file would ensure that the tablespaces of interest be loaded. If the ib_buffer_pool file were removed, these tests would fail. To make these failures deterministic, I would have created all tables with STATS_PERSISTENT=0 and run a slow shutdown ( innodb_fast_shutdown=0 ) so that no background tasks would cause the tables to be loaded and data files to be opened.
            marko Marko Mäkelä made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Component/s Server [ 13907 ]
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 10.10 [ 27530 ]
            Affects Version/s 10.6 [ 24028 ]
            Summary mariadb10.5.22 startup is slow Opening all .ibd files on InnoDB startup is slow on Microsoft Windows
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            I can imagine that opening the files can be slow also in other environments than Microsoft Windows, such as when the data directory resides on network attached storage or NFS.

            marko Marko Mäkelä added a comment - I can imagine that opening the files can be slow also in other environments than Microsoft Windows, such as when the data directory resides on network attached storage or NFS.
            marko Marko Mäkelä made changes -
            Summary Opening all .ibd files on InnoDB startup is slow on Microsoft Windows Opening all .ibd files on InnoDB startup can be slow

            origin/10.6-MDEV-32027 bddd1da9e939835dbbf61ea3c90638789db86733 2023-11-13T16:27:37+02:00
            behaved well in RQG testing.

            mleich Matthias Leich added a comment - origin/10.6- MDEV-32027 bddd1da9e939835dbbf61ea3c90638789db86733 2023-11-13T16:27:37+02:00 behaved well in RQG testing.
            wlad Vladislav Vaintroub added a comment - - edited

            Even on Windows it is not slow. I could not get it as slow, as the original reporter, but 85 000 files add 6-15 seconds startup (Windows seems to caches some metadata, and it is faster second time around). However, delay scales linearly with number of files, so startup with 850000 files can take 2.5 minutes on my machine. The reporter has something that is factors of magnitude slower, and for him it is really much too much. We do not know what kind of storage reporter had, but it would be interesting to know . kennyliao, can you comment on that?

            wlad Vladislav Vaintroub added a comment - - edited Even on Windows it is not slow. I could not get it as slow, as the original reporter, but 85 000 files add 6-15 seconds startup (Windows seems to caches some metadata, and it is faster second time around). However, delay scales linearly with number of files, so startup with 850000 files can take 2.5 minutes on my machine. The reporter has something that is factors of magnitude slower, and for him it is really much too much. We do not know what kind of storage reporter had, but it would be interesting to know . kennyliao , can you comment on that?
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2023-11-17 14:54:39.0 2023-11-17 14:54:38.689
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.17 [ 29518 ]
            Fix Version/s 10.11.7 [ 29519 ]
            Fix Version/s 11.0.5 [ 29520 ]
            Fix Version/s 11.1.4 [ 29024 ]
            Fix Version/s 11.2.3 [ 29521 ]
            Fix Version/s 11.3.2 [ 29522 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 180868

            People

              marko Marko Mäkelä
              kennyliao kennyliao
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.