Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20609

Full table scan in INFORMATION_SCHEMA.PARAMETERS/ROUTINES

Details

    Description

      It seems there is an issue when querying on I_S.PARAMETERS table (used in java metadata) :
      when doing this kind of query (one result) :
      SELECT * FROM INFORMATION_SCHEMA.PARAMETERS WHERE SPECIFIC_SCHEMA = 'testj' AND SPECIFIC_NAME = 'p2_512'
      It takes 15ms if 400 parameters in I_S.PARAMETERS table
      It takes 65ms if 4K parameters
      It takes 350ms if 20K parameters

      It seems to execute a full table scan in INFORMATION_SCHEMA.PARAMETERS

      The same query takes 6ms on mysql 8 with 20K same parameters

      Attachments

        1. mdev20609.diff
          8 kB
        2. screenshot-1.png
          screenshot-1.png
          5 kB
        3. screenshot-2.png
          screenshot-2.png
          18 kB

        Issue Links

          Activity

            diego dupin Diego Dupin created issue -
            diego dupin Diego Dupin made changes -
            Field Original Value New Value
            Description It seems there is an issue when querying on I_S.PARAMETERS table (used in java metadata) :
            when doing this kind of query (one result) :
            SELECT * FROM INFORMATION_SCHEMA.PARAMETERS WHERE SPECIFIC_SCHEMA = 'testj' AND SPECIFIC_NAME = 'p2_512'
            It takes 15ms if 400 parameters in I_S.PARAMETERS table
            It takes 65ms if 4K parameters
            It takes 350ms if 20K parameters

            The same query takes 6ms on mysql 8 with 20K same parameters
            It seems there is an issue when querying on I_S.PARAMETERS table (used in java metadata) :
            when doing this kind of query (one result) :
            SELECT * FROM INFORMATION_SCHEMA.PARAMETERS WHERE SPECIFIC_SCHEMA = 'testj' AND SPECIFIC_NAME = 'p2_512'
            It takes 15ms if 400 parameters in I_S.PARAMETERS table
            It takes 65ms if 4K parameters
            It takes 350ms if 20K parameters

            It seems to execute a full table scan in INFORMATION_SCHEMA.PARAMETERS

            The same query takes 6ms on mysql 8 with 20K same parameters
            diego dupin Diego Dupin made changes -

            I_S code has support for up to two indexed columns. INFORMATION_SCHEMA.PARAMETERS should make use of that.

            serg Sergei Golubchik added a comment - I_S code has support for up to two indexed columns. INFORMATION_SCHEMA.PARAMETERS should make use of that.
            elenst Elena Stepanova made changes -
            Component/s Server [ 13907 ]
            Fix Version/s 10.1 [ 16100 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Assignee Oleksandr Byelkin [ sanja ]
            diego dupin Diego Dupin made changes -
            Attachment screenshot-1.png [ 53698 ]
            diego dupin Diego Dupin made changes -
            Attachment screenshot-2.png [ 53699 ]
            diego dupin Diego Dupin added a comment -

            mysql 8 : DESCRIBE SELECT * FROM INFORMATION_SCHEMA.PARAMETERS WHERE SPECIFIC_SCHEMA = 'testj' AND SPECIFIC_NAME = 'p2_512'

            mariadb 10.5:

            diego dupin Diego Dupin added a comment - mysql 8 : DESCRIBE SELECT * FROM INFORMATION_SCHEMA.PARAMETERS WHERE SPECIFIC_SCHEMA = 'testj' AND SPECIFIC_NAME = 'p2_512' mariadb 10.5:
            diego dupin Diego Dupin made changes -
            Fix Version/s 10.5 [ 23123 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.1 [ 16100 ]
            diego dupin Diego Dupin made changes -
            Labels CONNECTOR_RELATED

            Ok I_S.PARAMETERS table provides information about parameters for stored routines.

            MariaDB code has optimization for e.g. I_S.COLUMNS where it can infer the database and/or table name from the WHERE clause.

            The optimization is invoked in optimize_schema_tables_reads(). See the optimize_for_get_all_tables() call.

            However the optimization is not used for I_S.PARAMETERS table. The execution takes branch:

                  /* skip I_S optimizations specific to get_all_tables */
                  if (table_list->schema_table->fill_table != get_all_tables)
                    continue;
            

            because

            (gdb) p table_list->schema_table->fill_table
              $14 = (int (*)(THD *, TABLE_LIST *, COND *)) 0x555555f9af34 <fill_schema_proc(THD*, TABLE_LIST*, Item*)>
            

            psergei Sergei Petrunia added a comment - Ok I_S.PARAMETERS table provides information about parameters for stored routines. MariaDB code has optimization for e.g. I_S.COLUMNS where it can infer the database and/or table name from the WHERE clause. The optimization is invoked in optimize_schema_tables_reads(). See the optimize_for_get_all_tables() call. However the optimization is not used for I_S.PARAMETERS table. The execution takes branch: /* skip I_S optimizations specific to get_all_tables */ if (table_list->schema_table->fill_table != get_all_tables) continue ; because (gdb) p table_list->schema_table->fill_table $14 = (int (*)(THD *, TABLE_LIST *, COND *)) 0x555555f9af34 <fill_schema_proc(THD*, TABLE_LIST*, Item*)>

            I'ts not only the optimization code that's missing.
            The execution part in fill_schema_proc currently uses a full index scan over mysql.proc table (I am not sure why it is a full index scan and not a full table scan).

            As far as I understand, it parses each SP unconditionally (no matter whether its columns are needed or not).

            psergei Sergei Petrunia added a comment - I'ts not only the optimization code that's missing. The execution part in fill_schema_proc currently uses a full index scan over mysql.proc table (I am not sure why it is a full index scan and not a full table scan). As far as I understand, it parses each SP unconditionally (no matter whether its columns are needed or not).

            Possible optimizations would be (in the order of decreasing importance):
            1. Try to infer a lookup value from the WHERE condition. This has huge practical importance as most queries request information about one specific stored procedure.
            2. Before loading SP, check if the part of the WHERE that we can compute without loading is satisfied.

            3. Don't load the SP if we don't need any columns.

            psergei Sergei Petrunia added a comment - Possible optimizations would be (in the order of decreasing importance): 1. Try to infer a lookup value from the WHERE condition. This has huge practical importance as most queries request information about one specific stored procedure. 2. Before loading SP, check if the part of the WHERE that we can compute without loading is satisfied. 3. Don't load the SP if we don't need any columns.

            On first glance, it seems calc_lookup_values_from_cond can be used to infer the lookup value.

            psergei Sergei Petrunia added a comment - On first glance, it seems calc_lookup_values_from_cond can be used to infer the lookup value.
            psergei Sergei Petrunia made changes -
            Comment [ On first glance, it seems {{calc_lookup_values_from_cond}} can be used to infer the lookup value. ]
            psergei Sergei Petrunia made changes -
            Comment [ On first glance, it seems {{calc_lookup_values_from_cond}} can be used to infer the lookup value. ]
            psergei Sergei Petrunia made changes -
            Comment [ On first glance, it seems {{calc_lookup_values_from_cond}} can be used to infer the lookup value. ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 99707 ] MariaDB v4 [ 141542 ]
            juan.vera Juan added a comment - - edited

            Hi sanja,

            We have a customer that has identified this problem as causing a plugin they use to execute stored procedures to consume 99% of the query time in most cases.

            Is there any workaround to help with these missing optimizations from MySQL other than just reading I_S.PARAMETERS into a temporary table and indexing that properly for their plugin to use? (I don't know if this is possible - just trying to see what could be done short of implementing the missing optimizations)

            -Juan

            juan.vera Juan added a comment - - edited Hi sanja , We have a customer that has identified this problem as causing a plugin they use to execute stored procedures to consume 99% of the query time in most cases. Is there any workaround to help with these missing optimizations from MySQL other than just reading I_S.PARAMETERS into a temporary table and indexing that properly for their plugin to use? (I don't know if this is possible - just trying to see what could be done short of implementing the missing optimizations) -Juan

            I got an idea to create "shadow" table by triggers on mysql.proc as a workaround, but mariadb does not support triggers on system tables. So the only posibility is creating "shadow" table by schedule (I am not sure if it is acceptable).

            sanja Oleksandr Byelkin added a comment - I got an idea to create "shadow" table by triggers on mysql.proc as a workaround, but mariadb does not support triggers on system tables. So the only posibility is creating "shadow" table by schedule (I am not sure if it is acceptable).
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            julien.fritsch Julien Fritsch made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            ralf.gebhardt Ralf Gebhardt made changes -

            What shadow table? The fix is, as psergei wrote, use calc_lookup_values_from_cond and then index_read() to read the correct row for the requested routine

            serg Sergei Golubchik added a comment - What shadow table? The fix is, as psergei wrote, use calc_lookup_values_from_cond and then index_read() to read the correct row for the requested routine
            ccalender Chris Calender (Inactive) made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            sanja Oleksandr Byelkin made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]

            calc_lookup_values_from_cond can be applied only to schema, using it for routine name looks more like a feature

            sanja Oleksandr Byelkin added a comment - calc_lookup_values_from_cond can be applied only to schema, using it for routine name looks more like a feature
            sanja Oleksandr Byelkin made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]

            we discussed it with psergei

            calc_lookup_values_from_cond can not be used directly to make optimization possible (it require different fields and applied to database instead of a file system)

            There are 2 way to get functionality (both are not a bugfixes bug features):

            1. make something like calc_lookup_values_from_cond but for routines with 3 parameters (schema, routine name, routine type) and then in case of present of this parameters make index range scan instead index scan used now.

            2. make optimizer think that there are indexes (in both cases routines and tables) and let optimizer process it normally (it is more work but potentially more gain especially in joins).

            sanja Oleksandr Byelkin added a comment - we discussed it with psergei calc_lookup_values_from_cond can not be used directly to make optimization possible (it require different fields and applied to database instead of a file system) There are 2 way to get functionality (both are not a bugfixes bug features): 1. make something like calc_lookup_values_from_cond but for routines with 3 parameters (schema, routine name, routine type) and then in case of present of this parameters make index range scan instead index scan used now. 2. make optimizer think that there are indexes (in both cases routines and tables) and let optimizer process it normally (it is more work but potentially more gain especially in joins).
            sanja Oleksandr Byelkin made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            sanja Oleksandr Byelkin made changes -
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            Issue Type Bug [ 1 ] Task [ 3 ]
            sanja Oleksandr Byelkin made changes -
            Assignee Oleksandr Byelkin [ sanja ] Sergei Petrunia [ psergey ]
            psergei Sergei Petrunia made changes -

            See also MDEV-29104 which was filed about the same (or similar) issue.

            psergei Sergei Petrunia added a comment - See also MDEV-29104 which was filed about the same (or similar) issue.

            calc_lookup_values_from_cond() can perfectly be used directly. I've attached a proof-of-concept patch that does just that.

            serg Sergei Golubchik added a comment - calc_lookup_values_from_cond() can perfectly be used directly. I've attached a proof-of-concept patch that does just that.
            serg Sergei Golubchik made changes -
            Attachment mdev20609.diff [ 64903 ]
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Oleg Smirnov [ JIRAUSER50405 ]
            oleg.smirnov Oleg Smirnov made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            julien.fritsch Julien Fritsch made changes -
            Summary Full table scan in INFORMATION_SCHEMA.PARAMETERS Full table scan in INFORMATION_SCHEMA.PARAMETERS
            oleg.smirnov Oleg Smirnov made changes -
            Summary Full table scan in INFORMATION_SCHEMA.PARAMETERS Full table scan in INFORMATION_SCHEMA.PARAMETERS/ROUTINES
            oleg.smirnov Oleg Smirnov added a comment - - edited

            The solution also targets queries from I_S.ROUTINES:

            SELECT * FROM information_schema.ROUTINES WHERE routine_schema='test' AND routine_name='proc1';
            SELECT * FROM information_schema.ROUTINES WHERE routine_schema='test';
            

            The mysql.proc table primary key will be used if routine_schema and/or routine_name are specified explicitly in the query.

            oleg.smirnov Oleg Smirnov added a comment - - edited The solution also targets queries from I_S.ROUTINES: SELECT * FROM information_schema.ROUTINES WHERE routine_schema='test' AND routine_name='proc1'; SELECT * FROM information_schema.ROUTINES WHERE routine_schema='test'; The mysql.proc table primary key will be used if routine_schema and/or routine_name are specified explicitly in the query.
            oleg.smirnov Oleg Smirnov made changes -
            Assignee Oleg Smirnov [ JIRAUSER50405 ] Sergei Petrunia [ psergey ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            psergei Sergei Petrunia made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            psergei Sergei Petrunia made changes -
            Status Stalled [ 10000 ] In Testing [ 10301 ]
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Elena Stepanova [ elenst ]
            elenst Elena Stepanova made changes -

            The tree isn't ready for testing, it failed in buildbot.

            elenst Elena Stepanova added a comment - The tree isn't ready for testing, it failed in buildbot.
            elenst Elena Stepanova made changes -
            Assignee Elena Stepanova [ elenst ] Oleg Smirnov [ JIRAUSER50405 ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            oleg.smirnov Oleg Smirnov made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            oleg.smirnov Oleg Smirnov made changes -
            Status In Progress [ 3 ] In Testing [ 10301 ]
            oleg.smirnov Oleg Smirnov made changes -
            Assignee Oleg Smirnov [ JIRAUSER50405 ] Elena Stepanova [ elenst ]
            oleg.smirnov Oleg Smirnov added a comment -

            The fix was pushed, the testing may continue.

            oleg.smirnov Oleg Smirnov added a comment - The fix was pushed, the testing may continue.
            danblack Daniel Black added a comment -

            related, pushed for increased usage of information_schema.PARAMETERS in mybatis-3 so much appreciate the increased performance.

            danblack Daniel Black added a comment - related, pushed for increased usage of information_schema.PARAMETERS in mybatis-3 so much appreciate the increased performance.
            serg Sergei Golubchik made changes -
            elenst Elena Stepanova made changes -
            Assignee Elena Stepanova [ elenst ] Ramesh Sivaraman [ JIRAUSER48189 ]
            diego dupin Diego Dupin added a comment - - edited

            Just for confirmation: correction has been benchmarked using java connector and that shows that performance doesn't decrease using 400/4K/20K parameters.

            diego dupin Diego Dupin added a comment - - edited Just for confirmation: correction has been benchmarked using java connector and that shows that performance doesn't decrease using 400/4K/20K parameters.

            Performance looks good when compared to 10.6 version.

            10.11.0 9f7d2824779785506d1bb316540428990e815245 (Optimized)

            for i in 1 5 20 100; do /test/mtest/MD280922-mariadb-10.11.0-linux-x86_64-opt/bin/mysqlslap -uroot -S/test/mtest/MD280922-mariadb-10.11.0-linux-x86_64-opt/socket.sock -q "select * from information_schema.parameters where SPECIFIC_NAME = 'aaa_sp50001'; " -i 10 -c $i; done
            Benchmark
            	Average number of seconds to run all queries: 0.021 seconds
            	Minimum number of seconds to run all queries: 0.010 seconds
            	Maximum number of seconds to run all queries: 0.039 seconds
            	Number of clients running queries: 1
            	Average number of queries per client: 1
             
            Benchmark
            	Average number of seconds to run all queries: 0.080 seconds
            	Minimum number of seconds to run all queries: 0.058 seconds
            	Maximum number of seconds to run all queries: 0.095 seconds
            	Number of clients running queries: 5
            	Average number of queries per client: 1
             
            Benchmark
            	Average number of seconds to run all queries: 0.434 seconds
            	Minimum number of seconds to run all queries: 0.411 seconds
            	Maximum number of seconds to run all queries: 0.450 seconds
            	Number of clients running queries: 20
            	Average number of queries per client: 1
             
            Benchmark
            	Average number of seconds to run all queries: 2.793 seconds
            	Minimum number of seconds to run all queries: 2.754 seconds
            	Maximum number of seconds to run all queries: 2.859 seconds
            	Number of clients running queries: 100
            	Average number of queries per client: 1
            

            ramesh Ramesh Sivaraman added a comment - Performance looks good when compared to 10.6 version. 10.11.0 9f7d2824779785506d1bb316540428990e815245 (Optimized) for i in 1 5 20 100; do /test/mtest/MD280922-mariadb-10.11.0-linux-x86_64-opt/bin/mysqlslap -uroot -S/test/mtest/MD280922-mariadb-10.11.0-linux-x86_64-opt/socket.sock -q "select * from information_schema.parameters where SPECIFIC_NAME = 'aaa_sp50001'; " -i 10 -c $i; done Benchmark Average number of seconds to run all queries: 0.021 seconds Minimum number of seconds to run all queries: 0.010 seconds Maximum number of seconds to run all queries: 0.039 seconds Number of clients running queries: 1 Average number of queries per client: 1   Benchmark Average number of seconds to run all queries: 0.080 seconds Minimum number of seconds to run all queries: 0.058 seconds Maximum number of seconds to run all queries: 0.095 seconds Number of clients running queries: 5 Average number of queries per client: 1   Benchmark Average number of seconds to run all queries: 0.434 seconds Minimum number of seconds to run all queries: 0.411 seconds Maximum number of seconds to run all queries: 0.450 seconds Number of clients running queries: 20 Average number of queries per client: 1   Benchmark Average number of seconds to run all queries: 2.793 seconds Minimum number of seconds to run all queries: 2.754 seconds Maximum number of seconds to run all queries: 2.859 seconds Number of clients running queries: 100 Average number of queries per client: 1

            Performance comparison using Perl and Python mysql connector

            perl mysql connector
            10.6.10
            -----------
            Queries: 1 | Execution time(seconds) : 0.237706 
            Queries: 5 | Execution time(seconds) : 1.392153 
            Queries: 20 | Execution time(seconds) : 5.468022 
            Queries: 100 | Execution time(seconds) : 27.572478 
             
            10.7.6
            ----------
            Queries: 1 | Execution time(seconds) : 0.206260 
            Queries: 5 | Execution time(seconds) : 0.917974 
            Queries: 20 | Execution time(seconds) : 3.775994 
            Queries: 100 | Execution time(seconds) : 18.507874 
             
            10.11 Bug fix branch
            -------------------------------
            Queries: 1 | Execution time(seconds) : 0.010314 
            Queries: 5 | Execution time(seconds) : 0.049723 
            Queries: 20 | Execution time(seconds) : 0.195309 
            Queries: 100 | Execution time(seconds) : 0.975044 
             
            python mysql connector
            10.6.10
            ----------
            Queries: 5  Execution time(seconds) :  1.1055042743682861
            Queries: 20  Execution time(seconds) :  5.25916600227356
            Queries: 100  Execution time(seconds) :  27.54954743385315
             
            10.7.6
            ---------
            Queries: 5  Execution time(seconds) :  0.7787322998046875
            Queries: 20  Execution time(seconds) :  3.5663089752197266
            Queries: 100  Execution time(seconds) :  18.440176963806152
             
            10.11 Bug fix branch
            -------------------------------
            Queries: 5  Execution time(seconds) :  0.04043316841125488
            Queries: 20  Execution time(seconds) :  0.18743038177490234
            Queries: 100  Execution time(seconds) :  0.9795618057250977
            

            ramesh Ramesh Sivaraman added a comment - Performance comparison using Perl and Python mysql connector perl mysql connector 10.6.10 ----------- Queries: 1 | Execution time(seconds) : 0.237706 Queries: 5 | Execution time(seconds) : 1.392153 Queries: 20 | Execution time(seconds) : 5.468022 Queries: 100 | Execution time(seconds) : 27.572478   10.7.6 ---------- Queries: 1 | Execution time(seconds) : 0.206260 Queries: 5 | Execution time(seconds) : 0.917974 Queries: 20 | Execution time(seconds) : 3.775994 Queries: 100 | Execution time(seconds) : 18.507874   10.11 Bug fix branch ------------------------------- Queries: 1 | Execution time(seconds) : 0.010314 Queries: 5 | Execution time(seconds) : 0.049723 Queries: 20 | Execution time(seconds) : 0.195309 Queries: 100 | Execution time(seconds) : 0.975044   python mysql connector 10.6.10 ---------- Queries: 5 Execution time(seconds) : 1.1055042743682861 Queries: 20 Execution time(seconds) : 5.25916600227356 Queries: 100 Execution time(seconds) : 27.54954743385315   10.7.6 --------- Queries: 5 Execution time(seconds) : 0.7787322998046875 Queries: 20 Execution time(seconds) : 3.5663089752197266 Queries: 100 Execution time(seconds) : 18.440176963806152   10.11 Bug fix branch ------------------------------- Queries: 5 Execution time(seconds) : 0.04043316841125488 Queries: 20 Execution time(seconds) : 0.18743038177490234 Queries: 100 Execution time(seconds) : 0.9795618057250977
            ramesh Ramesh Sivaraman made changes -
            Assignee Ramesh Sivaraman [ JIRAUSER48189 ] Oleg Smirnov [ JIRAUSER50405 ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]

            okay to push.

            ramesh Ramesh Sivaraman added a comment - okay to push.
            oleg.smirnov Oleg Smirnov added a comment -

            Pushed to 10.11.

            oleg.smirnov Oleg Smirnov added a comment - Pushed to 10.11.
            oleg.smirnov Oleg Smirnov made changes -
            Fix Version/s 10.11.0 [ 28411 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            elenst Elena Stepanova made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            Labels CONNECTOR_RELATED CONNECTOR_RELATED Preview_10.11
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 164352 150968 150559

            People

              oleg.smirnov Oleg Smirnov
              diego dupin Diego Dupin
              Votes:
              2 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.