  MariaDB Server / MDEV-30775

Performance regression in fil_space_t::try_to_close() introduced in MDEV-23855

Details

    Description

      Several operations became much slower starting with 10.5.8, apparently because of file opening/closing in handler.cc once the innodb_open_files limit is reached.

      The more tables the instance has, the more visible the slowdown; with some tens of thousands of tables it becomes a real problem.

      SHOW TABLE STATUS FROM db becomes hundreds of times slower.

      mysqldump of 600 schemas x 223 empty tables (133,800 tables in total) goes from 2:45 minutes to 27 minutes.

      Increasing innodb_open_files (together with the limits that cap it: open_files_limit, OS per-user limits, and system-wide limits) helps, but in some cases it means keeping hundreds of thousands of files open.

      The change was introduced in CS 10.5.8.

      To reproduce, create many thousands of tables (more than innodb_open_files) and run mysqldump or SHOW TABLE STATUS FROM until the limit is reached. Testing with and without an increased innodb_open_files shows the difference.

      I found that this if condition in handler.cc gets slower, even though its body is not executed:

        if (unlikely((error=open(name,mode,test_if_locked))))
        {
          /* timing instrumentation added for this investigation, not upstream code */
          ms = duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
          fprintf(stderr,"handler.cc:ha_open 3317 %" PRIu64 "\n", ms);
          if ((error == EACCES || error == EROFS) && mode == O_RDWR &&
              (table->db_stat & HA_TRY_READ_ONLY))
          {
            table->db_stat|=HA_READ_ONLY;
            error=open(name,O_RDONLY,test_if_locked);
          }
        }
      

      That open() call ends up in the MEMORY (HEAP) engine's implementation in ha_heap.cc:

      int ha_heap::open(const char *name, int mode, uint test_if_locked) ...
      

      Attachments

        1. annotated-perf.txt
          14 kB
          Barry
        2. show-global-variables.txt
          533 kB
          Barry
        3. space_list.gdb
          0.3 kB
          Vladislav Lesin
        4. test-MDEV-30775-10517.log
          2 kB
          Claudio Nanni
        5. test-MDEV-30775-10520.log
          3 kB
          Claudio Nanni


          Activity

            vlad.lesin Vladislav Lesin added a comment - - edited

            Barry, could you please confirm that you are testing commit 7c0af5c56994f37c45d40e45fa4e15743acaeb06? By the way, the changes were pushed to 10.6, so you can test with the latest 10.6 from the repository.

            Barry Barry added a comment -

            Sorry for the delay and confusion: yes, the tests were done on the correct version, but since it didn't resolve the issue, I switched back to 10.6.12 before providing the SHOW GLOBAL VARIABLES output. I can test again using the 10.6 branch.

            vlad.lesin Vladislav Lesin added a comment - - edited

            Barry, could you please build the latest 10.6 (it must contain commit 0cca8166f3111901019dcd33747a1a1dfd9e66d1) with the -DCMAKE_BUILD_TYPE=RelWithDebInfo option, test it, and, if performance is still poor:

            1. attach to the process with gdb, e.g. "gdb -p mariadbd_pid";

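            2. set the gdb log file, turn logging on, and turn off gdb pagination (set the file name before turning logging on; otherwise gdb keeps logging to the default gdb.txt), e.g.

            set logging file 10.6-MDEV-30775-gdb.log
            set logging on
            set pagination off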
            

            3. load the attached space_list.gdb script:

            source space_list.gdb
            

            4. run the show_space_list command defined in that file:

            show_space_list
            

            5. quit gdb (Ctrl+D) and share 10.6-MDEV-30775-gdb.log with us.

            Barry Barry added a comment -

            Sorry for the delayed reply. I can confirm that the latest 10.6 (fe89df42686fd41e986dc775e12ad6f3594d5bca) works fine for me. Thanks for all of the help resolving this issue!


            marko Marko Mäkelä added a comment -

            Barry, thank you for bringing up the issue well in time for the fix to be included in the quarterly releases.

            People

              vlad.lesin Vladislav Lesin
              claudio.nanni Claudio Nanni
              Votes: 1
              Watchers: 11

