MDEV-14275 (MariaDB Server)

Improving memory utilization for information schema

Details

    Description

      This task is about improving memory utilization and performance for
      the information schema.

      Some work has recently been done in bb-10.2-ext to free memory early for
      tables and views used by performance schema. The next step is to create
      more efficient temporary tables that don't store information that we don't
      need.

      MariaDB [test]> select MEMORY_USED,MAX_MEMORY_USED from information_schema.processlist where db="test";
      +-------------+-----------------+
      | MEMORY_USED | MAX_MEMORY_USED |
      +-------------+-----------------+
      |       86120 |          245768 |
      +-------------+-----------------+
      1 row in set (0.00 sec)
       
      MariaDB [test]> select table_name from information_schema.tables where table_schema="mysql";
      ....
      MariaDB [test]> select MEMORY_USED,MAX_MEMORY_USED from information_schema.processlist where db="test";
      +-------------+-----------------+
      | MEMORY_USED | MAX_MEMORY_USED |
      +-------------+-----------------+
      |       86120 |          696880 |
      +-------------+-----------------+
      

      Here we used over 600K of memory at peak for a simple query.

      MariaDB [test]> select count(*) from information_schema.tables where table_schema="mysql";
      MariaDB [test]> select table_name from information_schema.tables;
      ...
      MariaDB [test]> select MEMORY_USED,MAX_MEMORY_USED from information_schema.processlist where db="test";
      +-------------+-----------------+
      | MEMORY_USED | MAX_MEMORY_USED |
      +-------------+-----------------+
      |       86120 |         5293216 |
      +-------------+-----------------+
      

      Here we used 5M of memory for a simple query over 341 tables.

      The excessive memory use comes from the temporary table being created
      with a very wide record:

      While running:

      select table_name from information_schema.tables;
      

      in gdb:

      (gdb) break handler::ha_write_tmp_row
      (gdb) p table->s->reclength
      $2 = 14829
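
      As a rough cross-check of the figures above, the sketch below multiplies
      the record length seen in gdb by the number of rows. The function name is
      hypothetical and not part of the server; the calculation assumes every
      row stores the full fixed-width record and ignores any per-row overhead
      of the temporary table engine.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of the MariaDB source: estimates the bytes a
// fixed-width temporary table needs when every row stores the full record.
inline std::uint64_t estimated_tmp_table_bytes(std::uint64_t reclength,
                                               std::uint64_t rows)
{
  // Guard against overflow of the multiplication below.
  assert(rows == 0 ||
         reclength <= std::numeric_limits<std::uint64_t>::max() / rows);
  return reclength * rows;
}
```

      With the values from this report, estimated_tmp_table_bytes(14829, 341)
      gives 5056689 bytes, which accounts for most of the 5293216 bytes
      reported in MAX_MEMORY_USED.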
      

      Two possible ways to fix this:

      1) Extend heap tables to store VARCHAR and BLOB efficiently
      2) In sql_show, change all fields that are not used to be CHAR(1)

      1) is a major task and we can't get that done in time for 10.3.
      2) will help even if we do 1), as we will have less to store.

      This task is to do 2)

      This should not be that hard as information_schema already knows which
      fields are accessed in the query. This is already used to decide if we
      can solve the information_schema access without opening the table.
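
      To illustrate what 2) buys us, here is a minimal sketch that computes the
      record width once every column not referenced by the query is narrowed to
      one byte. ColumnDef and narrowed_record_length are hypothetical names
      used only for illustration (the server describes these columns with
      ST_FIELD_INFO), and the sketch ignores character set multipliers and
      null bits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a schema column definition.
struct ColumnDef
{
  std::string name;
  std::size_t length;  // declared width in bytes
};

// Returns the record width after replacing every column whose flag in
// `used` is false with a 1-byte placeholder, as option 2) proposes.
inline std::size_t narrowed_record_length(const std::vector<ColumnDef> &columns,
                                          const std::vector<bool> &used)
{
  assert(columns.size() == used.size());
  std::size_t total= 0;
  for (std::size_t i= 0; i < columns.size(); i++)
    total+= used[i] ? columns[i].length : 1;
  return total;
}
```

      For three made-up columns of width 100, 200 and 3000 where only the
      second one is referenced, the record shrinks from 3300 bytes to
      1 + 200 + 1 = 202 bytes.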

      This should be done against the bb-10.2-ext tree, which has the new
      MAX_MEMORY_USED column in information_schema.processlist.

      Attachments

        Activity

          musazhang musazhang added a comment -

          Hi, TXSQL from Tencent has worked on this issue and reduced the memory used by queries on information_schema. The idea is as follows:

          1) Fields used by the select list and the where condition are collected during the query.
          2) When create_schema_table runs, before instantiate_tmp_table, we reduce fields_info->field_length to 1 for the fields that were not collected, to reduce the memory used by the statement.
          3) The method can only be used in non-nested queries.

          The attachment is based on the bb-10.2-ext tree. Would you please have a look at it and give some suggestions?

          musazhang musazhang added a comment -

          By the way, we didn't handle "order by" or "group by", so queries that use them may return wrong results. Please give me some suggestions and I will modify the patch accordingly.

          musazhang musazhang added a comment -

          I created a branch named bb-10.2-ext (https://github.com/musazhang/bb-10.2-ext) and committed the change on top of it. You can find the change at https://github.com/musazhang/bb-10.2-ext/commit/f1273a78900466e641995ecc01a86a6bda9176e3. Thank you.


          monty Michael Widenius added a comment -

          Working on review

          monty Michael Widenius added a comment -

          diff --git a/sql/sql_show.cc b/sql/sql_show.cc
          index ae18e1cac04..99cf4b84ad6 100644
          --- a/sql/sql_show.cc
          +++ b/sql/sql_show.cc
          @@ -7759,6 +7759,88 @@ ST_SCHEMA_TABLE *get_schema_table(enum enum_schema_tables schema_table_idx)
             return &schema_tables[schema_table_idx];
           }

          +bool evaluate_schema_field_recursive(Item* item, const char* field_name)
          +{
          +  switch(item->type())
          +  {
          +  case Item::FIELD_ITEM:
          +    {
          +      Item_field* field= (Item_field *)item;
          +      if (!strcasecmp(field->field_name.str, field_name))
          +        return true;
          +      else if (!strcasecmp(field->field_name.str, "*"))
          +        return true;
          +      else
          +        return false;
          +    }
          +
          +  case Item::FUNC_ITEM:
          +    {
          +      bool show_field= false;
          +      Item_func* func= (Item_func *)item;
          +      for (uint i= 0; i < func->argument_count(); i++)
          +      {
          +        show_field= show_field ||
          +          evaluate_schema_field_recursive(func->arguments()[i],
          +                                          field_name);
          +        if (show_field)
          +          return true;
          +      }
          +      return false;
          +    }
          +
          +  case Item::COND_ITEM:
          +    {
          +      Item *tmp;
          +      bool show_field= false;
          +      Item_cond* cond= (Item_cond *)item;
          +      List_iterator<Item> it(*(cond->argument_list()));
          +      while ((tmp= it++))
          +      {
          +        show_field= show_field ||
          +          evaluate_schema_field_recursive(tmp, field_name);
          +        if (show_field)
          +          return true;
          +      }
          +      return false;
          +    }
          +
          +  default:
          +    return false;
          +  }
          +}

          The above function doesn't check all possible item types; for example,
          SUM_FUNC_ITEM also needs to be tested.

          +
          +bool field_can_be_used_in_query(THD* thd, ST_FIELD_INFO *field_info)
          +{
          +  if (thd->lex->select_lex.sj_nests.elements > 0 ||
          +      thd->lex->select_lex.sj_subselects.elements > 0 ||
          +      thd->lex->select_lex.nest_level > 0)
          +    return true;
          +
          +  reg2 Item *item;
          +  List_iterator<Item> it(thd->lex->select_lex.item_list);
          +
          +  /* select fields list check */
          +  while ((item= it++))
          +  {
          +    if (evaluate_schema_field_recursive(item, field_info->field_name))
          +      return true;
          +  }
          +
          +  /* select fields where cond check */
          +  if (thd->lex->select_lex.where &&
          +      evaluate_schema_field_recursive(thd->lex->select_lex.where,
          +                                      field_info->field_name))
          +    return true;
          +
          +  /* select fields having cond check */
          +  if (thd->lex->select_lex.having &&
          +      evaluate_schema_field_recursive(thd->lex->select_lex.having,
          +                                      field_info->field_name))
          +    return true;
          +
          +  return false;
          +}

          The above was quite ok, but it missed a couple of things:

          • Doesn't handle subqueries
          • Doesn't handle multiple tables in a query
          • Doesn't handle ON conditions for multiple tables
          • Doesn't handle SET @innodb_rows_deleted_orig = (SELECT
          • The code goes through all parts of the select for every column,
            which will take some resources if there are many columns used.

           /**
             Create information_schema table using schema_table data.
          @@ -7783,6 +7865,7 @@ ST_SCHEMA_TABLE *get_schema_table(enum enum_schema_tables schema_table_idx)

           TABLE *create_schema_table(THD *thd, TABLE_LIST *table_list)
           {
          +  bool show_field= false;
             int field_count= 0;
             Item *item;
             TABLE *table;
          @@ -7869,19 +7952,35 @@ TABLE *create_schema_table(THD *thd, TABLE_LIST *table_list)
               case MYSQL_TYPE_MEDIUM_BLOB:
               case MYSQL_TYPE_LONG_BLOB:
               case MYSQL_TYPE_BLOB:
          -      if (!(item= new (mem_root)
          -            Item_blob(thd, fields_info->field_name,
          -                      fields_info->field_length)))
          +
          +      show_field= field_can_be_used_in_query(thd, fields_info);
          +      if (show_field)
                 {
          -        DBUG_RETURN(0);
          +        if (!(item= new (mem_root)
          +              Item_blob(thd, fields_info->field_name,
          +                        fields_info->field_length)))
          +        {
          +          DBUG_RETURN(0);
          +        }
          +      }
          +      else
          +      {
          +        if (!(item= new (mem_root)
          +              Item_empty_string(thd, "", 1, cs)))
          +        {
          +          DBUG_RETURN(0);
          +        }
          +        item->set_name(thd, fields_info->field_name,
          +                       field_name_length, cs);
                 }
                 break;
               default:
                 /* Don't let unimplemented types pass through. Could be a grave error. */
                 DBUG_ASSERT(fields_info->field_type == MYSQL_TYPE_STRING);

          +      show_field= field_can_be_used_in_query(thd, fields_info);
                 if (!(item= new (mem_root)
          -            Item_empty_string(thd, "", fields_info->field_length, cs)))
          +            Item_empty_string(thd, "", show_field ? fields_info->field_length : 1, cs)))
                 {
                   DBUG_RETURN(0);
                 }

          The above code was ok.

          What was missing in the code:

          • The test suite had not been run (a lot of tests were failing
            because of warnings generated when the old code tried to write too
            long strings into the shortened fields)

          To solve the issue with the queries that were not handled, I decided to
          use a slightly different approach:

          • Create a bitmap for all fields in the information_schema table
          • Use a field processor to mark which fields were used in the query
          • Use the bitmap to decide if a column should be replaced with a short
            string column or not.
          • Ensure that we don't generate warnings when trying to write to shortened
            columns.
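
          The bitmap idea can be sketched as follows. This is a simplified
          illustration and not the attached patch: Expr and mark_used_columns
          are hypothetical names, the toy tree stands in for the server's Item
          hierarchy, and column names are compared exactly rather than case
          insensitively. Each node is visited once and all columns are marked
          in the same pass, instead of walking the whole select once per column.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy expression node. `column` is non-empty only for column
// references; `args` holds the children of functions, conditions, aggregates.
struct Expr
{
  std::string column;
  std::vector<Expr> args;
};

// Walks the tree once and sets the bit of every referenced schema column.
// Every node type is traversed the same way, so no item type is skipped.
inline void mark_used_columns(const Expr &expr,
                              const std::vector<std::string> &schema_columns,
                              std::vector<bool> &used)
{
  assert(schema_columns.size() == used.size());
  if (!expr.column.empty())
  {
    for (std::size_t i= 0; i < schema_columns.size(); i++)
      if (schema_columns[i] == expr.column)
        used[i]= true;
  }
  for (const Expr &arg : expr.args)
    mark_used_columns(arg, schema_columns, used);
}
```

          Calling mark_used_columns once for the select list, the WHERE, HAVING
          and ON conditions and any subqueries fills the bitmap; a column whose
          bit is still unset can then be created as a short string column.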

          The final patch is attached to this issue.

          Note that even if I decided to use a different approach, having your code
          as a base made my work much faster, so thanks a lot for doing this!


          People

            monty Michael Widenius