[MDEV-19296] Run Join Order Benchmark Created: 2019-04-20  Updated: 2022-12-13  Resolved: 2022-12-13

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Done Votes: 0
Labels: benchmarking

Attachments: PNG File mariadb-vs-mysql8.png     PNG File mariadb-vs-pg.png     PNG File screenshot-1.png    

 Description   

The "How Good Are Query Optimizers, Really?" paper examined query optimizer behavior for a number of queries.

The queries are available here https://github.com/gregrahn/join-order-benchmark
(TiDB variant: https://github.com/winkyao/join-order-benchmark )

Some facts about the workload:

  • The queries are N-way inner joins (no outer joins, or subqueries, or ORDER BY LIMIT)
  • The indexes are single-component keys.
  • The available indexes are "join indexes" for FK/PK relationships.
  • The predicates are "simple ranges", possibly OR-ed

A script that [almost] automates the benchmark run:
https://github.com/spetrunia/join-order-benchmark-tool



 Comments   
Comment by Sergei Petrunia [ 2020-01-01 ]

Some notes.

There's a package for imdbpy2sql:

sudo apt-get install python-imdbpy
sudo apt-get install python-genshi python-sqlalchemy python-sqlobject  python3-mysqldb

This produces

/usr/share/doc/python-imdbpy/examples/imdbpy2sql.py.gz

Running it on the current imdb contents

mkdir imdb_data;
cd imdb_data;
wget ftp://ftp.fu-berlin.de/misc/movies/database/frozendata/*gz
cd ..
./imdbpy2sql.py -d ./imdb_data/ -umysql://root@127.0.0.1:3340/imdbload  -c `pwd`/imdb_data_csv

will eventually fail with (does this have something to do with charsets?)

```
ERROR caught exception creating a foreign key: Cannot add or update a child row: a foreign key constraint fails (`imdbload`.`#sql-6bec_18`, CONSTRAINT `aka_title_movie_id_exists` FOREIGN KEY (`movie_id`) REFERENCES `title` (`id`))
```

Comment by Sergei Petrunia [ 2020-01-01 ]

Loading the original CSV data is hard, too. It has empty strings instead of NULLs. (MySQL/MariaDB's LOAD DATA INFILE need \N).

Loading into PostgreSQL as they specified doesn't work, but this works:

\copy aka_title from './aka_title.csv' with  (format csv, escape '\') ;
\copy aka_name from './aka_name.csv' with (format csv, escape '\');
\copy cast_info from './cast_info.csv' with (format csv, escape '\');
\copy char_name from './char_name.csv' with (format csv, escape '\');
\copy comp_cast_type from './comp_cast_type.csv' with (format csv, escape '\');
\copy company_name from './company_name.csv' with (format csv, escape '\');
\copy company_type from './company_type.csv' with (format csv, escape '\');
\copy complete_cast from './complete_cast.csv' with (format csv, escape '\');
\copy info_type from './info_type.csv' with (format csv, escape '\');
\copy keyword from './keyword.csv' with (format csv, escape '\');
\copy kind_type from './kind_type.csv' with (format csv, escape '\');
\copy link_type from './link_type.csv' with (format csv, escape '\');
\copy movie_companies from './movie_companies.csv' with (format csv, escape '\');
\copy movie_info from './movie_info.csv' with (format csv, escape '\');
\copy movie_info_idx from './movie_info_idx.csv' with (format csv, escape '\');
\copy movie_keyword from './movie_keyword.csv' with (format csv, escape '\');
\copy movie_link from './movie_link.csv' with (format csv, escape '\');
\copy name from './name.csv' with (format csv, escape '\');
\copy person_info from './person_info.csv' with (format csv, escape '\');
\copy role_type from './role_type.csv' with (format csv, escape '\');
\copy title from './title.csv' with (format csv, escape '\');

Comment by Sergei Petrunia [ 2020-01-01 ]

Then, can use this to make CSV files readable by MariaDB:

\copy aka_title to './mysql/aka_title.csv' with  (format csv, null '\N', escape '\') ;
\copy aka_name to './mysql/aka_name.csv' with (format csv, null '\N', escape '\');
\copy cast_info to './mysql/cast_info.csv' with (format csv, null '\N', escape '\');
\copy char_name to './mysql/char_name.csv' with (format csv, null '\N', escape '\');
\copy comp_cast_type to './mysql/comp_cast_type.csv' with (format csv, null '\N', escape '\');
\copy company_name to './mysql/company_name.csv' with (format csv, null '\N', escape '\');
\copy company_type to './mysql/company_type.csv' with (format csv, null '\N', escape '\');
\copy complete_cast to './mysql/complete_cast.csv' with (format csv, null '\N', escape '\');
\copy info_type to './mysql/info_type.csv' with (format csv, null '\N', escape '\');
\copy keyword to './mysql/keyword.csv' with (format csv, null '\N', escape '\');
\copy kind_type to './mysql/kind_type.csv' with (format csv, null '\N', escape '\');
\copy link_type to './mysql/link_type.csv' with (format csv, null '\N', escape '\');
\copy movie_companies to './mysql/movie_companies.csv' with (format csv, null '\N', escape '\');
\copy movie_info to './mysql/movie_info.csv' with (format csv, null '\N', escape '\');
\copy movie_info_idx to './mysql/movie_info_idx.csv' with (format csv, null '\N', escape '\');
\copy movie_keyword to './mysql/movie_keyword.csv' with (format csv, null '\N', escape '\');
\copy movie_link to './mysql/movie_link.csv' with (format csv, null '\N', escape '\');
\copy name to './mysql/name.csv' with (format csv, null '\N', escape '\');
\copy person_info to './mysql/person_info.csv' with (format csv, null '\N', escape '\');
\copy role_type to './mysql/role_type.csv' with (format csv, null '\N', escape '\');
\copy title to './mysql/title.csv' with (format csv, null '\N', escape '\');

... but there are errors

ERROR 1366 (22007) at line 21 in file: '../load-data.sql': Incorrect integer value: 'F624' for column `imdbload`.`title`.`imdb_id` at row 2514451

the line causing the error:

2522636,\Frag'ile\,\N,1,2010,\N,F624,\N,\N,\N,\N,c0b2e279bce6d3b1717e750a2591bb6d

changed it to:

2522636,\\Frag'ile\\,\N,1,2010,\N,F624,\N,\N,\N,\N,c0b2e279bce6d3b1717e750a2591bb6d

.
Another

LOAD DATA INFILE '/home/psergey/dev-git2/try-job/imdb_data/mysql/person_info.csv' INTO TABLE person_info FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'; 
ERROR 1261 (01000): Row 2671666 doesn't contain data for all columns

this is

2671660,2604773,17,Daughter of Irish actor and raconteur 'Niall Toibin' (qv); \,\N

Comment by Sergei Petrunia [ 2020-01-05 ]

The first disaster was Query 16b.

Run time, seconds
10.4: 3492.499
10.4-eits 2714.008
MySQL-8-cur 44.630 
PostgreSQL-12 12.391

SELECT 
  MIN(an.name) AS cool_actor_pseudonym, 
  MIN(t.title) AS series_named_after_char 
FROM 
  aka_name AS an, 
  cast_info AS ci, 
  company_name AS cn, 
  keyword AS k, 
  movie_companies AS mc, 
  movie_keyword AS mk, 
  name AS n, 
  title AS t
WHERE 
  cn.country_code ='[us]' AND 
  k.keyword ='character-name-in-title' AND 
  an.person_id = n.id AND 
  n.id = ci.person_id AND 
  ci.movie_id = t.id AND 
  t.id = mk.movie_id AND 
  mk.keyword_id = k.id AND 
  t.id = mc.movie_id AND 
  mc.company_id = cn.id AND 
  an.person_id = ci.person_id AND 
  ci.movie_id = mc.movie_id AND
  ci.movie_id = mk.movie_id AND
  mc.movie_id = mk.movie_id;

MariaDB:

+------+-------------+-------+--------+-----------------------------------------------------+--------------------------+---------+------------------------+--------+----------+-------------+
| id   | select_type | table | type   | possible_keys                                       | key                      | key_len | ref                    | rows   | filtered | Extra       |
+------+-------------+-------+--------+-----------------------------------------------------+--------------------------+---------+------------------------+--------+----------+-------------+
|    1 | SIMPLE      | an    | ALL    | person_id_aka_name                                  | NULL                     | NULL    | NULL                   | 901343 |   100.00 |             |
|    1 | SIMPLE      | n     | eq_ref | PRIMARY                                             | PRIMARY                  | 4       | imdbload.an.person_id  | 1      |   100.00 | Using index |
|    1 | SIMPLE      | ci    | ref    | movie_id_cast_info,person_id_cast_info              | person_id_cast_info      | 4       | imdbload.an.person_id  | 8      |   100.00 |             |
|    1 | SIMPLE      | t     | eq_ref | PRIMARY                                             | PRIMARY                  | 4       | imdbload.ci.movie_id   | 1      |   100.00 |             |
|    1 | SIMPLE      | mc    | ref    | company_id_movie_companies,movie_id_movie_companies | movie_id_movie_companies | 4       | imdbload.ci.movie_id   | 2      |   100.00 |             |
|    1 | SIMPLE      | cn    | eq_ref | PRIMARY                                             | PRIMARY                  | 4       | imdbload.mc.company_id | 1      |    36.49 | Using where |
|    1 | SIMPLE      | mk    | ref    | keyword_id_movie_keyword,movie_id_movie_keyword     | movie_id_movie_keyword   | 4       | imdbload.ci.movie_id   | 9      |   100.00 |             |
|    1 | SIMPLE      | k     | eq_ref | PRIMARY                                             | PRIMARY                  | 4       | imdbload.mk.keyword_id | 1      |   100.00 | Using where |
+------+-------------+-------+--------+-----------------------------------------------------+--------------------------+---------+------------------------+--------+----------+-------------+

MySQL:

+----+-------------+-------+------------+--------+-----------------------------------------------------+--------------------------+---------+------------------------+--------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys                                       | key                      | key_len | ref                    | rows   | filtered | Extra       |
+----+-------------+-------+------------+--------+-----------------------------------------------------+--------------------------+---------+------------------------+--------+----------+-------------+
|  1 | SIMPLE      | k     | NULL       | ALL    | PRIMARY                                             | NULL                     | NULL    | NULL                   | 127338 |    10.00 | Using where |
|  1 | SIMPLE      | mk    | NULL       | ref    | keyword_id_movie_keyword,movie_id_movie_keyword     | keyword_id_movie_keyword | 4       | imdbload.k.id          |     40 |   100.00 | NULL        |
|  1 | SIMPLE      | t     | NULL       | eq_ref | PRIMARY                                             | PRIMARY                  | 4       | imdbload.mk.movie_id   |      1 |   100.00 | NULL        |
|  1 | SIMPLE      | mc    | NULL       | ref    | company_id_movie_companies,movie_id_movie_companies | movie_id_movie_companies | 4       | imdbload.mk.movie_id   |      2 |   100.00 | NULL        |
|  1 | SIMPLE      | cn    | NULL       | eq_ref | PRIMARY                                             | PRIMARY                  | 4       | imdbload.mc.company_id |      1 |    10.00 | Using where |
|  1 | SIMPLE      | ci    | NULL       | ref    | movie_id_cast_info,person_id_cast_info              | movie_id_cast_info       | 4       | imdbload.mk.movie_id   |     18 |   100.00 | NULL        |
|  1 | SIMPLE      | n     | NULL       | eq_ref | PRIMARY                                             | PRIMARY                  | 4       | imdbload.ci.person_id  |      1 |   100.00 | Using index |
|  1 | SIMPLE      | an    | NULL       | ref    | person_id_aka_name                                  | person_id_aka_name       | 4       | imdbload.ci.person_id  |      1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+-----------------------------------------------------+--------------------------+---------+------------------------+--------+----------+-------------+

PostgreSQL:

                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5048.47..5048.48 rows=1 width=64)
   ->  Nested Loop  (cost=9.50..5034.80 rows=2733 width=33)
         Join Filter: (n.id = an.person_id)
         ->  Nested Loop  (cost=9.08..4498.65 rows=1102 width=25)
               ->  Nested Loop  (cost=8.65..3996.92 rows=1102 width=21)
                     Join Filter: (t.id = ci.movie_id)
                     ->  Nested Loop  (cost=8.08..3873.63 rows=64 width=29)
                           ->  Nested Loop  (cost=7.66..3794.69 rows=177 width=33)
                                 Join Filter: (t.id = mc.movie_id)
                                 ->  Nested Loop  (cost=7.23..3774.19 rows=34 width=25)
                                       ->  Nested Loop  (cost=6.80..3757.63 rows=34 width=4)
                                             ->  Seq Scan on keyword k  (cost=0.00..2626.12 rows=1 width=4)
                                                   Filter: (keyword = 'character-name-in-title'::text)
                                             ->  Bitmap Heap Scan on movie_keyword mk  (cost=6.80..1128.44 rows=306 width=8)
                                                   Recheck Cond: (keyword_id = k.id)
                                                   ->  Bitmap Index Scan on keyword_id_movie_keyword  (cost=0.00..6.73 rows=306 width=0)
                                                         Index Cond: (keyword_id = k.id)
                                       ->  Index Scan using title_pkey on title t  (cost=0.43..0.49 rows=1 width=21)
                                             Index Cond: (id = mk.movie_id)
                                 ->  Index Scan using movie_id_movie_companies on movie_companies mc  (cost=0.43..0.54 rows=5 width=8)
                                       Index Cond: (movie_id = mk.movie_id)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..0.45 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)
                     ->  Index Scan using movie_id_cast_info on cast_info ci  (cost=0.56..1.49 rows=35 width=8)
                           Index Cond: (movie_id = mk.movie_id)
               ->  Index Only Scan using name_pkey on name n  (cost=0.43..0.46 rows=1 width=4)
                     Index Cond: (id = ci.person_id)
         ->  Index Scan using person_id_aka_name on aka_name an  (cost=0.42..0.46 rows=2 width=20)
               Index Cond: (person_id = ci.person_id)

.

Comment by Sergei Petrunia [ 2020-01-05 ]

k.keyword ='character-name-in-title' selectivity is 1/134K 
cn.country_code ='[us]'  selectivity is 84843/234997=0.36

MariaDB has correctly figured the selectivity of "cn" . It didn't figure selectivity of k, because the stock DDL defines its datatype as "text" (TODO: there's also a variant with VARCHAR(1000) instead of text. Try that?)

MySQL uses guesstimates of 10%. This looks dumb but has worked for them, table "k" is the first.

PostgreSQL was able to correctly determine K's selectivity and has put it as the first element.

Comment by Sergei Petrunia [ 2020-01-06 ]

The "next big thing" is Q19d

query_name	PostgreSQL	10.4-varchar-eits	MySQL-8-eits
q19d	4016	161991	40924

Comment by Sergei Petrunia [ 2020-01-06 ]

SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS jap_engl_voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND n.gender ='f'
  AND rt.role ='actress'
  AND t.production_year > 2000
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

MariaDB:

+------+-------------+-------+--------+-----------------------------------------------------------------------------------+--------------------------+---------+----------------------------+--------+------------+----------+------------+-------------------------------------------------+
| id   | select_type | table | type   | possible_keys                                                                     | key                      | key_len | ref                        | rows   | r_rows     | filtered | r_filtered | Extra                                           |
+------+-------------+-------+--------+-----------------------------------------------------------------------------------+--------------------------+---------+----------------------------+--------+------------+----------+------------+-------------------------------------------------+
|    1 | SIMPLE      | rt    | ALL    | PRIMARY                                                                           | NULL                     | NULL    | NULL                       | 12     | 12.00      |     8.33 |       8.33 | Using where                                     |
|    1 | SIMPLE      | it    | ALL    | PRIMARY                                                                           | NULL                     | NULL    | NULL                       | 113    | 113.00     |     0.88 |       0.88 | Using where; Using join buffer (flat, BNL join) |
|    1 | SIMPLE      | mi    | ref    | info_type_id_movie_info,movie_id_movie_info                                       | info_type_id_movie_info  | 4       | imdbload.it.id             | 208953 | 3036719.00 |   100.00 |     100.00 |                                                 |
|    1 | SIMPLE      | t     | eq_ref | PRIMARY                                                                           | PRIMARY                  | 4       | imdbload.mi.movie_id       | 1      | 1.00       |    56.16 |      59.22 | Using where                                     |
|    1 | SIMPLE      | mc    | ref    | company_id_movie_companies,movie_id_movie_companies                               | movie_id_movie_companies | 4       | imdbload.mi.movie_id       | 2      | 3.68       |   100.00 |     100.00 |                                                 |
|    1 | SIMPLE      | cn    | eq_ref | PRIMARY                                                                           | PRIMARY                  | 4       | imdbload.mc.company_id     | 1      | 1.00       |    36.49 |      26.27 | Using where                                     |
|    1 | SIMPLE      | ci    | ref    | movie_id_cast_info,person_id_cast_info,person_role_id_cast_info,role_id_cast_info | movie_id_cast_info       | 4       | imdbload.mi.movie_id       | 15     | 94.91      |   100.00 |       0.48 | Using where                                     |
|    1 | SIMPLE      | n     | eq_ref | PRIMARY                                                                           | PRIMARY                  | 4       | imdbload.ci.person_id      | 1      | 1.00       |    22.78 |      99.89 | Using where                                     |
|    1 | SIMPLE      | chn   | eq_ref | PRIMARY                                                                           | PRIMARY                  | 4       | imdbload.ci.person_role_id | 1      | 1.00       |   100.00 |     100.00 | Using index                                     |
|    1 | SIMPLE      | an    | ref    | person_id_aka_name                                                                | person_id_aka_name       | 4       | imdbload.ci.person_id      | 1      | 2.24       |   100.00 |     100.00 | Using index                                     |
+------+-------------+-------+--------+-----------------------------------------------------------------------------------+--------------------------+---------+----------------------------+--------+------------+----------+------------+-------------------------------------------------+

PostgreSQL

 Finalize Aggregate  (cost=279430.42..279430.43 rows=1 width=64)
   ->  Gather  (cost=279430.20..279430.41 rows=2 width=64)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=278430.20..278430.21 rows=1 width=64)
               ->  Nested Loop  (cost=135386.98..278424.64 rows=1112 width=32)
                     ->  Nested Loop  (cost=135386.55..277688.22 rows=448 width=40)
                           ->  Nested Loop  (cost=135386.13..276244.21 rows=1234 width=44)
                                 Join Filter: (t.id = mc.movie_id)
                                 ->  Hash Join  (cost=135385.70..275610.63 rows=237 width=52)
                                       Hash Cond: (mi.info_type_id = it.id)
                                       ->  Nested Loop  (cost=135383.27..275535.35 rows=26750 width=56)
                                             ->  Nested Loop  (cost=135382.84..226825.94 rows=1857 width=48)
                                                   ->  Nested Loop  (cost=135382.41..221224.95 rows=3828 width=52)
                                                         ->  Hash Join  (cost=135381.98..195376.43 rows=16324 width=33)
                                                               Hash Cond: (t.id = ci.movie_id)
                                                               ->  Parallel Seq Scan on title t  (cost=0.00..49166.69 rows=573843 width=21)
                                                                     Filter: (production_year > 2000)
                                                               ->  Hash  (cost=134130.92..134130.92 rows=71925 width=12)
                                                                     ->  Nested Loop  (cost=0.56..134130.92 rows=71925 width=12)
                                                                           ->  Seq Scan on role_type rt  (cost=0.00..1.15 rows=1 width=4)
                                                                                 Filter: ((role)::text = 'actress'::text)
                                                                           ->  Index Scan using role_id_cast_info on cast_info ci  (cost=0.56..133345.14 rows=78463 width=16)
                                                                                 Index Cond: (role_id = rt.id)
                                                                                 Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voice) (uncredited)","(voice: English version)"}'::text[]))
                                                         ->  Index Scan using name_pkey on name n  (cost=0.43..1.58 rows=1 width=19)
                                                               Index Cond: (id = ci.person_id)
                                                               Filter: ((gender)::text = 'f'::text)
                                                   ->  Index Only Scan using char_name_pkey on char_name chn  (cost=0.43..1.46 rows=1 width=4)
                                                         Index Cond: (id = ci.person_role_id)
                                             ->  Index Scan using movie_id_movie_info on movie_info mi  (cost=0.43..25.83 rows=40 width=8)
                                                   Index Cond: (movie_id = t.id)
                                       ->  Hash  (cost=2.41..2.41 rows=1 width=4)
                                             ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                                                   Filter: ((info)::text = 'release dates'::text)
                                 ->  Index Scan using movie_id_movie_companies on movie_companies mc  (cost=0.43..2.61 rows=5 width=8)
                                       Index Cond: (movie_id = mi.movie_id)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.17 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)
                     ->  Index Only Scan using person_id_aka_name on aka_name an  (cost=0.42..1.62 rows=2 width=4)
                           Index Cond: (person_id = n.id)

MySQL

+----+-------------+-------+------------+--------+-----------------------------------------------------------------------------------+--------------------------+---------+----------------------------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type   | possible_keys                                                                     | key                      | key_len | ref                        | rows | filtered | Extra                                              |
+----+-------------+-------+------------+--------+-----------------------------------------------------------------------------------+--------------------------+---------+----------------------------+------+----------+----------------------------------------------------+
|  1 | SIMPLE      | it    | NULL       | ALL    | PRIMARY                                                                           | NULL                     | NULL    | NULL                       |  113 |     0.88 | Using where                                        |
|  1 | SIMPLE      | rt    | NULL       | ALL    | PRIMARY                                                                           | NULL                     | NULL    | NULL                       |   12 |     8.33 | Using where; Using join buffer (Block Nested Loop) |
|  1 | SIMPLE      | ci    | NULL       | ref    | movie_id_cast_info,person_id_cast_info,person_role_id_cast_info,role_id_cast_info | role_id_cast_info        | 4       | imdbload.rt.id             | 4905 |     1.99 | Using where                                        |
|  1 | SIMPLE      | t     | NULL       | eq_ref | PRIMARY                                                                           | PRIMARY                  | 4       | imdbload.ci.movie_id       |    1 |    55.15 | Using where                                        |
|  1 | SIMPLE      | chn   | NULL       | eq_ref | PRIMARY                                                                           | PRIMARY                  | 4       | imdbload.ci.person_role_id |    1 |   100.00 | Using index                                        |
|  1 | SIMPLE      | n     | NULL       | eq_ref | PRIMARY                                                                           | PRIMARY                  | 4       | imdbload.ci.person_id      |    1 |    23.43 | Using where                                        |
|  1 | SIMPLE      | an    | NULL       | ref    | person_id_aka_name                                                                | person_id_aka_name       | 4       | imdbload.ci.person_id      |    1 |   100.00 | Using index                                        |
|  1 | SIMPLE      | mc    | NULL       | ref    | company_id_movie_companies,movie_id_movie_companies                               | movie_id_movie_companies | 4       | imdbload.ci.movie_id       |    2 |   100.00 | NULL                                               |
|  1 | SIMPLE      | cn    | NULL       | eq_ref | PRIMARY                                                                           | PRIMARY                  | 4       | imdbload.mc.company_id     |    1 |     5.00 | Using where                                        |
|  1 | SIMPLE      | mi    | NULL       | ref    | info_type_id_movie_info,movie_id_movie_info                                       | movie_id_movie_info      | 4       | imdbload.ci.movie_id       |    7 |     1.43 | Using where                                        |
+----+-------------+-------+------------+--------+-----------------------------------------------------------------------------------+--------------------------+---------+----------------------------+------+----------+----------------------------------------------------+

Comment by Sergei Petrunia [ 2020-01-06 ]

The issue is selectivity of ci.note. It is defined as "TEXT", so MariaDB is not aware that condition on the table ci is very selective.

MySQL-8 can compute the selectivity (1.93% vs 40% that I get if I drop the histogram).

Comment by Sergei Petrunia [ 2020-01-06 ]

Changing ci.note to VARCHAR(1000) fixes the above.

Comment by Sergei Petrunia [ 2020-01-07 ]

Ok, when

  • all databases have histograms of their default size
  • "text" columns with conditions are changed to VARCHAR(1000) // helping MariaDB' eits

test results vary across runs, but the total benchmark times are in these
ranges:

- PostgreSQL 12: 1.4 - 1.9 min
- MariaDB: 9 - 11 min
- MySQL-8: 17 - 18 min

Total benchmark time is a metric which may skew the result towards the heaviest queries. To get an idea about the whole population, here are plots of relative query times:

Generated at Thu Feb 08 08:50:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.