[MDEV-26728] Histogram may return selectivity > 1.0 Created: 2021-09-30  Updated: 2022-08-04

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.3

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-26718 Query with: Cross join, Non-indexed c... Closed

 Description   

I've hit this example when investigating MDEV-26718.

Consider this testcase:

--source include/have_sequence.inc
CREATE OR REPLACE TABLE t (
  f2 int
) ENGINE=MyISAM;
 
INSERT INTO t SELECT
  seq%8+1
FROM seq_1_to_128;
 
ANALYZE TABLE t PERSISTENT FOR ALL;

set optimizer_trace=1;
analyze select * from t where f2 != 5;
 
select * from information_schema.optimizer_trace;

The trace will show:

"rows_estimation": [
              {
                "selectivity_for_indexes": [],
                "selectivity_for_columns": [
                  {
                    "column_name": "f2",
                    "ranges": ["NULL < f2 < 5", "5 < f2"],
                    "selectivity_from_histogram": 1.1171875
                  }
                ],
                "cond_selectivity": 1

Note this part:

                    "selectivity_from_histogram": 1.1171875

this is because this call:

          rows= records_in_column_ranges(&param, idx, key);

returns 143 rows, while the table has 128 rows (and the table statistics says so, too).

We can see below that the incorrect selectivity doesn't spread:

                ],
                "cond_selectivity": 1

.

This is thanks to these lines:

          rows= records_in_column_ranges(&param, idx, key);
          if (rows != DBL_MAX)
          {
            key->field->cond_selectivity_= rows/table_records;
            selectivity_for_column.add("selectivity_from_histogram",
                                       key->field->cond_selectivity);
          }
        }
      }
    }
 
    for (Field **field_ptr= table->field; *field_ptr; field_ptr++)
    {
      Field *table_field= *field_ptr;   
      if (bitmap_is_set(used_fields, table_field->field_index) &&
          table_field->cond_selectivity < 1.0)
      {
        if (!bitmap_is_set(&handled_columns, table_field->field_index))
          table->cond_selectivity*= table_field->cond_selectivity;

Note the table_field->cond_selectivity < 1.0 condition.

However, field->cond_selectivity still has the value that's greater than 1, which looks dangerous.



 Comments   
Comment by Michael Widenius [ 2021-09-30 ]

This is already fixed in my 10.7 tree where I am correcting all things related with selectivity calcluations to ensure that selectivity is always between 0 and 1

Generated at Thu Feb 08 09:47:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.