Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35958

Cost estimates for materialized derived tables are poor

Details

    Description

      (Filing based on discussion with monty )

      When analyzing the trace MDEV-30877, noted this:

      The derived table has

                      "table_scan": {
                        "rows": 13304563,
                        "cost": 1.33e7
                      }
      

      Compare to the non-derived table in the same query:

                      "table": "es",
                      "table_scan": {
                        "rows": 12840562,
                        "cost": 103969
                      }
      

      The numbers of rows are similar but there's 128x difference in costs.

      The number comes from here:

      double JOIN_TAB::scan_time()
      {
        if (table->is_created())
           ...
        else  
        {   
          found_records= records=table->stat_records();
          read_time= found_records ? (double)found_records: 10.0;// TODO:fix this stub
          res= read_time;
        }   
      

      Note the read_time= found_records assignment...

      Attachments

        Activity

          However, the code was rewritten in 11.0's cost model change:
          scan_time is now called JOIN_TAB::estimate_scan_time(), There one can see:

            if (table->is_created())
            ...
            else
            {
              /*
                The following is same as calling
                TABLE_SHARE::update_optimizer_costs, but without locks
              */
              if (table->s->db_type() == heap_hton)
                memcpy(&table->s->optimizer_costs, &heap_optimizer_costs,
                       sizeof(heap_optimizer_costs));
              else
                memcpy(&table->s->optimizer_costs, &tmp_table_optimizer_costs,
                       sizeof(tmp_table_optimizer_costs));
              file->set_optimizer_costs(thd);
              table->s->optimizer_costs_inited=1;
           
              records= table->stat_records();
              DBUG_ASSERT(table->opt_range_condition_rows == records);
              cost->row_cost= table->file->ha_scan_time(MY_MAX(records, 1000));
              read_time= file->cost(cost->row_cost);
              row_copy_cost= table->s->optimizer_costs.row_copy_cost;
            }
          

          psergei Sergei Petrunia added a comment - However, the code was rewritten in 11.0's cost model change: scan_time is now called JOIN_TAB::estimate_scan_time() , There one can see: if (table->is_created()) ... else { /* The following is same as calling TABLE_SHARE::update_optimizer_costs, but without locks */ if (table->s->db_type() == heap_hton) memcpy (&table->s->optimizer_costs, &heap_optimizer_costs, sizeof (heap_optimizer_costs)); else memcpy (&table->s->optimizer_costs, &tmp_table_optimizer_costs, sizeof (tmp_table_optimizer_costs)); file->set_optimizer_costs(thd); table->s->optimizer_costs_inited=1;   records= table->stat_records(); DBUG_ASSERT(table->opt_range_condition_rows == records); cost->row_cost= table->file->ha_scan_time(MY_MAX(records, 1000)); read_time= file->cost(cost->row_cost); row_copy_cost= table->s->optimizer_costs.row_copy_cost; }

          Should this be closed as wont-fix, because it's fixed in 11.0 ?

          psergei Sergei Petrunia added a comment - Should this be closed as wont-fix, because it's fixed in 11.0 ?

          What is needed in 11.X is to also examine if the table fits in heap or not.
          If it does not fit in heap, we should use the Aria costs.

          monty Michael Widenius added a comment - What is needed in 11.X is to also examine if the table fits in heap or not. If it does not fit in heap, we should use the Aria costs.

          Coding done, will be pushed to 11.8 shortly

          monty Michael Widenius added a comment - Coding done, will be pushed to 11.8 shortly

          Did a separate fix for 11.8 compared to the one we added to 10.11

          monty Michael Widenius added a comment - Did a separate fix for 11.8 compared to the one we added to 10.11

          Pushed this into 10.11:

          commit 43c5d1303f5c7c726db276815c459436110f342f (HEAD -> 10.11, origin/bb-10.11-MDEV-35958-backport, origin/HEAD, origin/10.11, bb-10.11-MDEV-35958-backport)
          Author: Sergei Petrunia <sergey@mariadb.com>
          Date:   Mon Feb 10 14:36:56 2025 +0200
           
              MDEV-35958 Cost estimates for materialized derived tables are poor
              
              Backport of commit 74f70c394456767c1c1b0bee6bf34faff5b72dce to 10.11.
              The new logic is disabled by default, to enable, use
              optimizer_adjust_secondary_key_costs=fix_derived_table_read_cost.
              
              == Original commit comment ==
              Fixed costs in JOIN_TAB::estimate_scan_time() and HEAP
          

          psergei Sergei Petrunia added a comment - Pushed this into 10.11: commit 43c5d1303f5c7c726db276815c459436110f342f (HEAD -> 10.11, origin/bb-10.11-MDEV-35958-backport, origin/HEAD, origin/10.11, bb-10.11-MDEV-35958-backport) Author: Sergei Petrunia <sergey@mariadb.com> Date: Mon Feb 10 14:36:56 2025 +0200   MDEV-35958 Cost estimates for materialized derived tables are poor Backport of commit 74f70c394456767c1c1b0bee6bf34faff5b72dce to 10.11. The new logic is disabled by default, to enable, use optimizer_adjust_secondary_key_costs=fix_derived_table_read_cost. == Original commit comment == Fixed costs in JOIN_TAB::estimate_scan_time() and HEAP

          And this into 11.4:

          commit cd03bf5c532e7276c22827ead69dd5bcfd1242a0 (origin/bb-11.4-MDEV-35958)
          Author: Monty <monty@mariadb.org>
          Date:   Tue Feb 4 21:35:55 2025 +0200
           
              Fixed costs in JOIN_TAB::estimate_scan_time() and HEAP
              
              MDEV-35958 Cost estimates for materialized derived tables are poor
              
              (Backport 11.8->11.4, the same patch)
              
              Estimate_scan_time() calculates the cost of scanning a derivied table.
              The old code did not take into account that the temporary table heap table
              may be converted to Aria.
          

          psergei Sergei Petrunia added a comment - And this into 11.4: commit cd03bf5c532e7276c22827ead69dd5bcfd1242a0 (origin/bb-11.4-MDEV-35958) Author: Monty <monty@mariadb.org> Date: Tue Feb 4 21:35:55 2025 +0200   Fixed costs in JOIN_TAB::estimate_scan_time() and HEAP MDEV-35958 Cost estimates for materialized derived tables are poor (Backport 11.8->11.4, the same patch) Estimate_scan_time() calculates the cost of scanning a derivied table. The old code did not take into account that the temporary table heap table may be converted to Aria.

          People

            monty Michael Widenius
            psergei Sergei Petrunia
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.