Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26232

Performance Regression in queries against mixed BIGINT values

    XMLWordPrintable

    Details

      Description

      We are experiencing what appears to be a regression in the query planner in the process of testing an upgrade from MariaDB 10.2.39 to 10.4.20.

      The table definition we are running the query against is as follows:

      Create Table: CREATE TABLE `token_data` (
        `uid` int(10) unsigned NOT NULL,
        `token` bigint(20) unsigned NOT NULL,
        `value1` bigint(20) unsigned NOT NULL,
        `value2` bigint(20) unsigned NOT NULL,
        `last_update` date NOT NULL,
        UNIQUE KEY `id_token_data_01` (`uid`,`token`),
        KEY `last_update` (`last_update`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1
      

      The query that I am running is as follows:

      EXPLAIN EXTENDED SELECT token,value1,value2 FROM token_data WHERE uid=12001 AND token IN (14454172572971265821,6745279665170062986,5976548953849996655,8277955528833064934,17940379197342480268,14869208320558067485,17315553045409243299,17053444543038773962,7801999317187194854,9774973577495866952,6139029933980675674,5975793561981298031,15780788265856007991,6710543574905126912,8056624067400366231,5372330405299365231,1731820755350078667,10337750861957591287,18179962722013403225,4032846873084559360,15507949777456787353,14864294813815369501,16190638042948088801,9743340660943945179,4310889774226046499,8654011433740205778,8701519266898721160,1770783118357761330,15057675132202217245,277070117901065582,540659882387707353,2205831429349578201,3457869911869787254,14980716260267224861,3754261088098005618,14052870096204695077,10943173851576400023,8654011433585790674,12993243060260250460,14979329550618816285,16850241364708797108,8583000031085752646,12264907527857408362,5618407151415731567,4703440132903012219,9920933646715251360,10410078769612255479,9859952648301078679,8583000031664820550,6007773530648094063,15819724772544679200,14387761156831755502,5620458545859667311,16245102814004570404,15079799121045188381,1121062113675701369,6715486898116100096,6463300224464549874,5499253815180141935,15718037849488072027,17592402927639996588,3569614194670960609,11469729274248341082,15053603698089750301,10489091348942158279,3961032953135919073,8947867702334725879,9774973577950588488,1282350908471679870,6820046171247287274,15543245596512385203);
      

      The query returns 49 rows in 0.00 sec on MariaDB 10.2, but consistently takes over 8-9 seconds returning the same 49 rows on MariaDB 10.4. We have been running similar queries for many years on MySQL 5.5 and MariaDB 10.2 against even much longer sets of unsigned BIGINT values, and the results have always been nearly instant.

      Here is the EXPLAIN EXTENDED output on the query on MariaDB 10.2:

      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: token_data
               type: range
      possible_keys: id_token_data_01
                key: id_token_data_01
            key_len: 12
                ref: NULL
               rows: 71
           filtered: 76.06
              Extra: Using where
      1 row in set, 1 warning (0.00 sec)
      

      Here is the EXPLAIN EXTENDED output on the query on MariaDB 10.4:

      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: token_data
               type: ref
      possible_keys: id_token_data_01
                key: id_token_data_01
            key_len: 4
                ref: const
               rows: 9950548
           filtered: 100.00
              Extra: Using where
      1 row in set, 1 warning (0.002 sec)
      

      As you see in the schema above, the value for 'token' is an unsigned BIGINT.

      In the query we are running, 24 of the values are below the max size of a signed BIGINT (9223372036854775807), 25 of the values are above that size.

      Here is where it gets interesting...

      When we run the same query on MariaDB 10.4 but only include the 24 values below the signed BIGINT max size, the results return in 0.002 seconds, with the following EXPLAIN EXTENDED output:

      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: token_data
               type: range
      possible_keys: id_token_data_01
                key: id_token_data_01
            key_len: 12
                ref: NULL
               rows: 24
           filtered: 100.00
              Extra: Using where
      1 row in set, 1 warning (0.002 sec)
      

      Running the same query again, but this time against the 25 values that are above the signed BIGINT max size, the results return in 0.001 seconds, with the following EXPLAIN EXTENDED output:

      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: token_data
               type: range
      possible_keys: id_token_data_01
                key: id_token_data_01
            key_len: 12
                ref: NULL
               rows: 25
           filtered: 100.00
              Extra: Using where
      1 row in set, 1 warning (0.002 sec)
      

      Finally, if when running the same query against 24 values that are both above and below the unsigned BIGINT limit, the query takes 3.305 seconds and the EXPLAIN EXTENDED reports:

      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: token_data
               type: ref
      possible_keys: id_token_data_01
                key: id_token_data_01
            key_len: 4
                ref: const
               rows: 9950548
           filtered: 100.00
              Extra: Using where
      1 row in set, 1 warning (0.002 sec)
      

      We have tried similar tests against table structures that use a simple primary key on 'token' instead of the composite key above, but the performance characteristics are similar: fast when values do not straddle the max signed BIGINT limit, very slow when they do.

      The only workaround we have found is by using OR rather than IN, but that is not a solution which will work for us in all use cases.

        Attachments

          Activity

            People

            Assignee:
            sanja Oleksandr Byelkin
            Reporter:
            jbello Josh Bello
            Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated:

                Git Integration