[MDEV-26892] JSON histograms become invalid with a specific (corrupt) value in table Created: 2021-10-24  Updated: 2022-01-19  Resolved: 2021-10-24

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: N/A
Fix Version/s: 10.7.1

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-26519 JSON Histograms: improve histogram co... Closed

 Description   

create or replace table t (a varchar(32)) DEFAULT CHARSET=cp1257;
set histogram_type= JSON_HB, histogram_size= 1;
insert into t values ('foo'),(unhex('9C'));
analyze table t persistent for all;
 
select * from mysql.column_stats where table_name = 't';
 
select * from t;
 
# Cleanup
drop table t;

preview-10.7-MDEV-26519-json-histograms 508f5f3f1

MariaDB [test]> select * from mysql.column_stats where table_name = 't';
+---------+------------+-------------+-----------+-----------+-------------+------------+---------------+-----------+-----------+---------------------------------------------------------------+
| db_name | table_name | column_name | min_value | max_value | nulls_ratio | avg_length | avg_frequency | hist_size | hist_type | histogram                                                     |
+---------+------------+-------------+-----------+-----------+-------------+------------+---------------+-----------+-----------+---------------------------------------------------------------+
| test    | t          | a           | foo       | �          |      0.0000 |     2.0000 |        1.0000 |         0 | JSON_HB   | {
  "histogram_hb_v2": [
    {
      "start": "foo"
    ]
  } |
+---------+------------+-------------+-----------+-----------+-------------+------------+---------------+-----------+-----------+---------------------------------------------------------------+
1 row in set (0.000 sec)
 
MariaDB [test]> 
MariaDB [test]> select * from t;
ERROR 4183 (HY000): Failed to parse histogram: Root JSON element must be a JSON object at offset 0.

The original test didn't insert UNHEX-ed values. I didn't investigate how exactly the symbol ended up in the table, most likely through value truncation in non-strict mode, as we have seen before.

Between the invalid value in the column and the meaningless histogram size, I consider it as much a corner case as it gets. It is still worth investigation though, at least to make sure it is really only limited to histogram size 1.



 Comments   
Comment by Sergei Petrunia [ 2021-10-24 ]

Debugging, I see the problem: Histogram collection code neglects to handle the case where the last value in the table (the end of the last bucket) cannot be converted from its own charset into utf8mb4.

Generated at Thu Feb 08 09:48:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.