Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30357

Performance regression in locking reads from secondary indexes

Details

    Description

      Since updating from MariaDB 10.3.35 to 10.6.11 one of our query's which used to take around 1min 30sec, is now still in the Sending data stage after more then an hour. I've kept it running for hours (3 or 4), but it wouldn't finish.

      I've tested every version from 10.3 till 10.6, all versions work fine (same timing as 10.3) right up until version 10.6.7. The issue starts from version 10.6.8.

      I have also tried the latest 10.10.2, which also does not work, which means this does appear to be a bug that's yet to be fixed.

      Row counts for the tables are:
      Afkomst: ~145k
      Automatch: ~67m
      Werkzoeker: ~300k

      The query itself isn't particularly difficult or complex. I've even put the two tables the WHERE is on inside a temporary table so it just becomes one join, which did not help.

      I did try limiting the row count for Werkzoeker to a much lower (6k instead of 300k) sub-set, and that did work. So it appears it either is a performance issue (one that's pretty bad), or one that chokes on a bigger row count.

      SET @LeeftijdHulpVan = 16;
      SET @LeeftijdHulpTot = 99;
      SET @Afkomst = 'Alle';
       
      CREATE TEMPORARY TABLE `AantalAutomatches` (PRIMARY KEY(WerkzoekerId)) AS (
          SELECT WerkzoekerId, COUNT(*) AS AantalAutomatch FROM `Automatch`
          JOIN Werkzoeker ON Automatch.WerkzoekerId = Werkzoeker.Id
          LEFT JOIN Afkomst ON Afkomst.UserId = Werkzoeker.Id AND Afkomst.UserType = 'Werk'
          WHERE TIMESTAMPDIFF(YEAR, Werkzoeker.Geboortedatum, CURDATE()) BETWEEN @LeeftijdHulpVan AND @LeeftijdHulpTot AND
              IF(@Afkomst != 'Alle', Afkomst.Source = @Afkomst, 1)
          GROUP BY WerkzoekerId
      );
      

      Show create table for the 3 tables:

      CREATE TABLE `Afkomst` (
        `Id` int(11) NOT NULL AUTO_INCREMENT,
        `Source` varchar(255) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Name` varchar(255) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Medium` varchar(255) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Content` varchar(255) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Term` varchar(255) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `UserType` enum('Zorg','Werk') COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `UserId` int(11) DEFAULT NULL,
        PRIMARY KEY (`Id`),
        UNIQUE KEY `UserAfkomst` (`UserId`,`UserType`)
      ) ENGINE=InnoDB AUTO_INCREMENT=145656 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;
       
      CREATE TABLE `Automatch` (
        `Id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
        `ZorgaanvraagId` int(8) NOT NULL,
        `ZorgaanvraagAutomatchLoop` int(5) NOT NULL,
        `WerkzoekerId` int(8) NOT NULL,
        `TijdstipToegevoegd` datetime DEFAULT NULL,
        `TijdstipAppNotificatieVerstuurd` datetime DEFAULT NULL,
        `TijdstipWhatsAppNotificatieVerstuurd` datetime DEFAULT NULL,
        `TijdstipSmsVerstuurd` datetime DEFAULT NULL,
        `TijdstipMailVerstuurd` datetime DEFAULT NULL,
        `TijdstipHerinneringMailVerstuurd` datetime DEFAULT NULL,
        `ReactieWerkzoeker` enum('onbekend','geaccepteerd','afgewezen','bekeken') COLLATE utf8mb4_unicode_520_ci NOT NULL,
        PRIMARY KEY (`Id`),
        KEY `ZorgaanvraagWerkzoeker` (`ZorgaanvraagId`,`ZorgaanvraagAutomatchLoop`,`WerkzoekerId`),
        KEY `Zorgaanvraag` (`ZorgaanvraagId`,`ZorgaanvraagAutomatchLoop`),
        KEY `ZorgaanvraagId` (`ZorgaanvraagId`,`WerkzoekerId`),
        KEY `Werkzoeker` (`WerkzoekerId`)
      ) ENGINE=InnoDB AUTO_INCREMENT=67983079 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci ROW_FORMAT=COMPACT
       
      CREATE TABLE `Werkzoeker` (
        `Id` int(8) unsigned NOT NULL AUTO_INCREMENT,
        `User` varchar(250) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Wachtwoord` varchar(512) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Voorletters` varchar(8) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Voornaam` varchar(50) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Tussenvoegsel` varchar(10) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Achternaam` varchar(50) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Straat` varchar(50) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Huisnummer` varchar(120) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Toevoeging` varchar(15) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Postcode` varchar(6) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Woonplaats` varchar(50) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Geboortedatum` date DEFAULT NULL,
        `Nationaliteit` varchar(255) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Geslacht` char(1) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Rekeningnummer` varchar(20) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Iban` varchar(255) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `RekeningnummerUpdate` datetime DEFAULT NULL,
        `SepaAmendment` tinyint(1) NOT NULL DEFAULT 0,
        `Mandaat` smallint(6) DEFAULT NULL,
        `TenNameVan` varchar(255) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `BankPlaats` varchar(50) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `TelefoonnummerDag` varchar(20) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `TelefoonnummerAvond` varchar(20) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Emailadres` varchar(50) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Kilometers` tinyint(4) NOT NULL,
        `UrenBeschikbaar` tinyint(2) NOT NULL,
        `Opmerkingen` mediumtext COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Actief` tinyint(1) NOT NULL DEFAULT 0,
        `Activatiesleutel` varchar(32) COLLATE utf8mb4_unicode_520_ci NOT NULL,
        `Aanmelddatum` timestamp NOT NULL DEFAULT current_timestamp(),
        `Activatiedatum` datetime DEFAULT NULL,
        `LogboekOud` longtext COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Betalingsregeling` longtext COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `HeeftBetalingsregeling` tinyint(1) NOT NULL DEFAULT 0,
        `AantalTermijnen` int(5) DEFAULT NULL,
        `Termijn` enum('week','maand') COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `EinddatumBetalingsregeling` date DEFAULT NULL,
        `MagPuntenInwisselen` enum('Ja','Nee','Systeem') COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT 'Systeem',
        `HerinneringsMail` int(2) NOT NULL DEFAULT 0,
        `PersoonlijkeInfo` mediumtext COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Ervaringtekst` mediumtext COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `SiteId` int(11) NOT NULL,
        `Lat` decimal(11,8) NOT NULL,
        `Lng` decimal(11,8) NOT NULL,
        `Beschikbaarheid` varchar(20) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `AlternatiefMailadres` varchar(255) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `SmsCredits` int(11) NOT NULL DEFAULT 0,
        `SmsAan` tinyint(1) NOT NULL DEFAULT 1,
        `Ip` varchar(20) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Werkvergunning` varchar(3) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `Machtiging` enum('n','y') COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT 'n',
        `MachtigingVerstuurd` tinyint(1) NOT NULL DEFAULT 0,
        `MachtigingIngevoerd` datetime DEFAULT NULL,
        `OntvangAanvragen` enum('n','y') COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT 'y',
        `Referentie` mediumtext COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `UitLakslijst` int(11) NOT NULL DEFAULT 0,
        `VakantieVan` date DEFAULT NULL,
        `VakantieTot` date DEFAULT NULL,
        `VakantieType` enum('Vakantie','Zwangerschapsverlof') COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `GegevensDelen` tinyint(1) NOT NULL DEFAULT 1,
        `GooglePlaces` tinyint(1) NOT NULL DEFAULT 0,
        `Uitgeschreven` enum('n','y') COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT 'n',
        `ReferralType` enum('AffiliateHiH','Werkzoeker') COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `ReferralGebruikerId` int(8) unsigned DEFAULT NULL,
        `AffiliateGebruikerId` int(11) DEFAULT NULL,
        `HeeftAfkoopSom` tinyint(1) NOT NULL DEFAULT 0,
        `Mailing` int(11) NOT NULL DEFAULT 0,
        `Foto` varchar(255) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `FotoGecontroleerd` tinyint(1) NOT NULL DEFAULT 0,
        `SepaStatus` enum('FRST','RCUR') COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT 'FRST',
        `AanbiedingenGehad` int(11) NOT NULL DEFAULT 0,
        `MailingOptOut` tinyint(1) NOT NULL DEFAULT 0,
        `Zorgverzekeraar` int(11) NOT NULL DEFAULT 19,
        `Polisnummer` varchar(30) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `ZorgverzekeringBeschikbaar` enum('ja','nee','onbekend') COLLATE utf8mb4_unicode_520_ci NOT NULL DEFAULT 'onbekend',
        `ZorgAkkoord` tinyint(1) NOT NULL DEFAULT 0,
        `LaatsteLoginDatum` datetime DEFAULT NULL,
        `MailingOntvangen` smallint(5) unsigned NOT NULL DEFAULT 0,
        `AutomatchLaatsteRun` datetime DEFAULT NULL,
        `KwaliteitScore` decimal(5,2) DEFAULT NULL,
        `JaarWerkervaring` tinyint(3) unsigned DEFAULT NULL,
        `BesteContactUur` int(2) unsigned DEFAULT NULL,
        `HeeftVoorwaardenGeaccepteerd` tinyint(1) unsigned NOT NULL DEFAULT 0,
        `DatumVoorwaardenGeaccepteerd` datetime DEFAULT NULL,
        `DubbelGecontroleerd` tinyint(1) unsigned NOT NULL DEFAULT 0,
        `TranslateLang` varchar(10) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
        `NummerNietInGebruik` tinyint(4) NOT NULL DEFAULT 0,
        PRIMARY KEY (`Id`),
        UNIQUE KEY `user` (`User`),
        UNIQUE KEY `activatiesleutel` (`Activatiesleutel`),
        KEY `emailadres` (`Emailadres`),
        KEY `postcode` (`Postcode`),
        KEY `achternaam` (`Achternaam`),
        KEY `Lat` (`Lat`),
        KEY `Lng` (`Lng`),
        KEY `woonplaats` (`Woonplaats`),
        KEY `straat` (`Straat`),
        KEY `woonplaats&straat` (`Woonplaats`,`Straat`),
        KEY `gebdatum` (`Geboortedatum`),
        KEY `KwaliteitScore` (`KwaliteitScore`),
        KEY `LngLatLogin` (`Lng`,`Lat`,`LaatsteLoginDatum`) USING BTREE,
        KEY `BesteContactUur` (`BesteContactUur`),
        FULLTEXT KEY `woonplaats_TEXT` (`Woonplaats`)
      ) ENGINE=InnoDB AUTO_INCREMENT=292240 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;
      

      Explain extended:

      +------+-------------+------------+--------+---------------+-------------+---------+----------------------------+--------+----------+-----------------------------------------------------------+
      | id   | select_type | table      | type   | possible_keys | key         | key_len | ref                        | rows   | filtered | Extra                                                     |
      +------+-------------+------------+--------+---------------+-------------+---------+----------------------------+--------+----------+-----------------------------------------------------------+
      |    1 | SIMPLE      | Werkzoeker | index  | PRIMARY       | gebdatum    | 4       | NULL                       | 298809 |   100.00 | Using where; Using index; Using temporary; Using filesort |
      |    1 | SIMPLE      | Afkomst    | eq_ref | UserAfkomst   | UserAfkomst | 7       | pgb_db.Werkzoeker.Id,const | 1      |   100.00 | Using where                                               |
      |    1 | SIMPLE      | Automatch  | ref    | Werkzoeker    | Werkzoeker  | 4       | pgb_db.Werkzoeker.Id       | 124    |   100.00 | Using where; Using index                                  |
      +------+-------------+------------+--------+---------------+-------------+---------+----------------------------+--------+----------+-----------------------------------------------------------+
      3 rows in set, 1 warning (0.003 sec)
      

      I am able to provide database dumps for these tables if needed (Which is around 7.5GB).

      Please let me know if you need anything else from me, happy to help to get this issue resolved.

      Attachments

        1. flame-10.5.svg
          148 kB
        2. flame-10.6.svg
          233 kB
        3. MDEV-30357.test
          11 kB

        Issue Links

          Activity

            I tweaked alice’s MDEV-30357.test by adding a restart before the interesting section of the test:

            @@ -180,6 +180,7 @@
             
              #analyze table Werkzoeker,Automatch, Afkomst;
             
            +--source include/restart_mysqld.inc
             SET @LeeftijdHulpVan = 10;
             SET @LeeftijdHulpTot = 99;
             SET @Afkomst = 'Alle';
            

            This allowed me to run perf record on the interesting part of the test execution, using ./mtr --manual-gdb MDEV-30357.

            There is a prominent difference inside the function lock_sec_rec_read_check_and_lock(). In the fast version, this function is invoking lf_hash_iterate(), presumably inside trx_sys.get_min_trx_id(). In the slow version, the time taken by that is replaced by a call to lock_rec_convert_impl_to_expl(), which spends the bulk of its time in row_vers_impl_x_locked(), looking up a clustered index record version that would match the secondary index record. This could be avoided altogether if the PAGE_MAX_TRX_ID on the secondary index page is older than trx_sys.get_min_trx_id().

            To reduce the number of calls to trx_sys.get_min_trx_id() (which was the motivation of the patch) I think that we can use another trick: cache the previous value of the call, because we know that the next call to that function cannot return a smaller value.

            marko Marko Mäkelä added a comment - I tweaked alice ’s MDEV-30357.test by adding a restart before the interesting section of the test: @@ -180,6 +180,7 @@ #analyze table Werkzoeker,Automatch, Afkomst; +--source include/restart_mysqld.inc SET @LeeftijdHulpVan = 10; SET @LeeftijdHulpTot = 99; SET @Afkomst = 'Alle'; This allowed me to run perf record on the interesting part of the test execution, using ./mtr --manual-gdb MDEV-30357 . There is a prominent difference inside the function lock_sec_rec_read_check_and_lock() . In the fast version, this function is invoking lf_hash_iterate() , presumably inside trx_sys.get_min_trx_id() . In the slow version, the time taken by that is replaced by a call to lock_rec_convert_impl_to_expl() , which spends the bulk of its time in row_vers_impl_x_locked() , looking up a clustered index record version that would match the secondary index record. This could be avoided altogether if the PAGE_MAX_TRX_ID on the secondary index page is older than trx_sys.get_min_trx_id() . To reduce the number of calls to trx_sys.get_min_trx_id() (which was the motivation of the patch) I think that we can use another trick: cache the previous value of the call, because we know that the next call to that function cannot return a smaller value.

            The trx_sys.find_same_or_older() that replaced trx_sys.get_min_trx_id() in MDEV-28445 is an optimized iteration that terminates as early as possible.

            In lock_sec_rec_read_check_and_lock(), the purpose of the trx_sys.get_min_trx_id() check was to determine if any older transaction may be holding locks on record in the secondary index page. If yes, only then it would make sense to invoke lock_rec_convert_impl_to_expl() to convert any implicit lock that older transaction may be holding on the record to an explicit exclusive lock, so that the current transaction can start waiting for that explicit lock to be released. Because this check had been removed, we would unnecessarily attempt to convert a non-existing implicit lock to an explicit one. This would involve costly lookups of clustered index records.

            If I add an equivalent call to trx_sys.find_same_or_older(), the CREATE TEMPORARY TABLE…SELECT statement will consume some 0.4 seconds of CPU time, instead of 0.2 seconds that it would consume if we skipped both the lookup and the call to lock_rec_convert_impl_to_expl().

            I think that it would be even better if trx_sys.find_same_or_older() returned the minimum transaction identifier that it encountered during the search. In that way, we could cache the value and avoid repeated traversals in lock_sec_rec_read_check_and_lock().

            marko Marko Mäkelä added a comment - The trx_sys.find_same_or_older() that replaced trx_sys.get_min_trx_id() in MDEV-28445 is an optimized iteration that terminates as early as possible. In lock_sec_rec_read_check_and_lock() , the purpose of the trx_sys.get_min_trx_id() check was to determine if any older transaction may be holding locks on record in the secondary index page. If yes, only then it would make sense to invoke lock_rec_convert_impl_to_expl() to convert any implicit lock that older transaction may be holding on the record to an explicit exclusive lock, so that the current transaction can start waiting for that explicit lock to be released. Because this check had been removed, we would unnecessarily attempt to convert a non-existing implicit lock to an explicit one. This would involve costly lookups of clustered index records. If I add an equivalent call to trx_sys.find_same_or_older() , the CREATE TEMPORARY TABLE…SELECT statement will consume some 0.4 seconds of CPU time, instead of 0.2 seconds that it would consume if we skipped both the lookup and the call to lock_rec_convert_impl_to_expl() . I think that it would be even better if trx_sys.find_same_or_older() returned the minimum transaction identifier that it encountered during the search. In that way, we could cache the value and avoid repeated traversals in lock_sec_rec_read_check_and_lock() .

            My proposed fix includes some speedup of trx_sys_t::find_same_or_older().

            marko Marko Mäkelä added a comment - My proposed fix includes some speedup of trx_sys_t::find_same_or_older() .

            I realized that there is a simpler fix. When I ported the Oracle Bug #33059387 fix, I had added a condition to lock_sec_rec_some_has_impl() that I thought would improve things:

              const trx_id_t max_trx_id= page_get_max_trx_id(page_align(rec));
             
              if ((caller_trx->id > max_trx_id &&
                   !trx_sys.find_same_or_older(caller_trx, max_trx_id)) ||
                  !lock_check_trx_id_sanity(max_trx_id, rec, index, offsets))
                return nullptr;
            

            I did not realize that it is possible to have trx->id=0 in a locking read if the transaction is not modifying any persistent tables.

            marko Marko Mäkelä added a comment - I realized that there is a simpler fix. When I ported the Oracle Bug #33059387 fix , I had added a condition to lock_sec_rec_some_has_impl() that I thought would improve things: const trx_id_t max_trx_id= page_get_max_trx_id(page_align(rec));   if ((caller_trx->id > max_trx_id && !trx_sys.find_same_or_older(caller_trx, max_trx_id)) || !lock_check_trx_id_sanity(max_trx_id, rec, index, offsets)) return nullptr; I did not realize that it is possible to have trx->id=0 in a locking read if the transaction is not modifying any persistent tables.

            Looks good to me.

            vlad.lesin Vladislav Lesin added a comment - Looks good to me.

            People

              marko Marko Mäkelä
              Thijn Thijn Geurts
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.