Details
-
New Feature
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
MDEV-40032: VARCHAR->BLOB Promotion for HEAP Internal Temporary Tables
Depends on MDEV-38975 (HEAP engine BLOB/TEXT/JSON/GEOMETRY support).
Motivation
HEAP uses fixed-width rows where every VARCHAR field occupies its full declared octet_length per row regardless of actual data length. For wide VARCHAR columns (common in INFORMATION_SCHEMA views, JSON-heavy schemas, and many user tables), this wastes significant memory. To address this, VARCHAR fields whose octet_length > HEAP_CONVERT_IF_BIGGER_TO_BLOB (32 bytes) are automatically promoted to BLOB when the temporary table uses the HEAP engine, storing only the actual data in continuation chains (from MDEV-38975) instead of reserving the full declared width in every row. This is the same promotion mechanism that type_handler_for_tmp_table() applies at CONVERT_IF_BIGGER_TO_BLOB (512 characters), just at a lower byte-based threshold for HEAP.
Changes – SQL Layer
start() / pick_engine() (VARCHAR->BLOB promotion):
- Extracted pick_engine() from choose_engine() and calls it early in start() to set m_heap_expected. When the temp table is predicted to use HEAP and a VARCHAR field's octet_length > HEAP_CONVERT_IF_BIGGER_TO_BLOB (32 bytes), the field is promoted to BLOB. The promotion is gated by Tmp_field_param::is_heap_engine() (a bool member on Tmp_field_param), and the threshold HEAP_CONVERT_IF_BIGGER_TO_BLOB is used directly in the comparison – not passed as a parameter. The promotion logic lives in varstring_type_handler() and blob_type_handler() (both in sql/sql_type.cc). For I_S tables, the same promotion is applied directly in add_schema_fields() using the field definition's octet length.
derived_with_keys ref access for BLOB columns on HEAP:
The promotion initially broke ref access on materialized derived tables because multiple layers rejected BLOB key columns. The fix bypasses the SQL-layer key buffer for BLOB key parts entirely:
- sql/sql_select.h: New heap_store_key_blob_ref store_key subclass – writes lookup values directly into record[0]'s Field_blob via item->save_in_field(), bypassing the key buffer. hp_make_key(record[0]) then builds the HEAP key with BLOB data already in place.
- sql/sql_select.cc: add_key_field() accepts all BLOB columns on materialized derived tables (no size restriction). get_store_key() / create_ref_for_key() create heap_store_key_blob_ref when key_part->length == 0 && HA_BLOB_PART && heap_hton. Early-const optimization skipped for HEAP BLOB parts (record[0] overwritten during materialization). Ref cache disabled (key buffer lacks BLOB data for memcmp).
- sql/table.cc: add_tmp_key() leaves BLOB key_part->length = 0 and store_length as metadata-only (2-3 bytes).
This gives all BLOB sizes including LONGBLOB/JSON (field_length > UINT16_MAX) ref access on HEAP derived tables. The uint16 KEY_PART_INFO::length limit no longer applies since the key buffer is bypassed.
create_ref_for_key() – memory leak fix:
- When a constant key value is copied into the ref buffer during JOIN optimization, a temporary store_key_item is created on the stack. If the key field is a promoted Field_blob, store_field() allocates into the field's member String value via value.copy(). The default destructor does not call cleanup(), so the buffer is never freed. Fix: explicit tmp.cleanup() after tmp.copy(thd) to release the Field_blob::value allocation.
create_tmp_field_from_item_field() – default value and flag propagation:
- The HEAP blob path creates Field_blob_key via make_and_init_table_field() which does not copy default_value or NO_DEFAULT_VALUE_FLAG from the source field. Fix: copy both from the source field after creation.
- Item_type_holder::create_tmp_field_ex(): set NO_DEFAULT_VALUE_FLAG for NOT NULL blob columns created via the HEAP/unique-key blob path.
Changes – Test Changes
New Tests
- main/sj_mat_blob.test: Semi-join materialization with promoted blob columns – verifies that group_concat and substring subquery results are correctly matched via materialization hash lookup when the temp table's key columns are promoted from VARCHAR to BLOB.
- heap/heap_blob_derived_keys.test (additions): LONGTEXT, JSON, mixed LONGTEXT+INT, and const BLOB ref access test cases for derived_with_keys via heap_store_key_blob_ref.
- hp_test_key_setup-t.cc (addition): derived_blob_key_longblob – LONGBLOB (packlength=4) with metadata-only store_length, verifying heap_prepare_hp_create_info handles zero-length BLOB key parts correctly.
- main/heap_blob_default.test: Four TEXT default/null combinations (NOT NULL no-default, NOT NULL with-default, nullable explicit, nullable implicit) + expression-promoted VARCHAR + I_S column metadata check. Verifies NO_DEFAULT_VALUE_FLAG propagation and default_value copying through the HEAP blob promotion path.
Modified Tests
- main/derived_view.result: Updated for heap_store_key_blob_ref – key_len and ref column changes for BLOB key parts on derived tables (metadata-only key_len, blob_ref in ref column). Promoted VARCHAR columns lose ref access (BLOB key format differs), reverting to BNL join in some cases.
- sysschema/v_schema_redundant_indexes.result: Additional changes from promotion --sql_drop_index changes from varchar(223) to longtext; table_schema, table_name, redundant_index_name, dominant_index_name change from varchar(64) to longtext.
- main/information_schema.test: Added ORDER BY inside group_concat() for deterministic output – promoted columns change row ordering within groups.
- main/information_schema.result, main/information_schema_parameters.result, main/information_schema_routines.result: Additional updates for VARCHAR->BLOB promotion: I_S fields with octet_length > 32 bytes (e.g. CHARACTER_SET_NAME varchar(32) in utf8mb3 = 96 bytes) now appear as longtext.
- main/show_check.result: Re-recorded for VARCHAR->BLOB promotion in I_S STATISTICS columns – field type codes change from varchar (253) to blob (252) and octet lengths adjust accordingly.
- main/create.result: Re-recorded for VARCHAR->BLOB promotion (I_S COMMAND varchar(16)
> longtext, etc.). Reverted varchar(171)>text change – non-HEAP CTAS no longer promotes (promotion is HEAP-only). - funcs_1/is_columns.result, funcs_1/is_routines.result, funcs_1/is_triggers.result, funcs_1/is_views.result: Additional updates for VARCHAR->BLOB promotion of I_S fields with octet_length > 32 bytes.
- funcs_1/processlist_priv_ps.result, funcs_1/processlist_val_ps.result: Additional updates for VARCHAR->BLOB promotion.
- funcs_1/processlist_priv_no_prot.result, funcs_1/processlist_val_no_prot.result: Additional updates for VARCHAR->BLOB promotion.
- spider/bugfix/mdev_33538.result: Updated for VARCHAR->BLOB promotion – Spider's SPIDER_ALLOC_MEM I_S table's FUNC_NAME and FILE_NAME columns (varchar(64) in utf8mb3, octet_length = 192 > 32) now appear as longtext.
- main/userstat.result: Updated for VARCHAR->BLOB promotion in userstat I_S tables (CLIENT, USER, TABLE_SCHEMA, TABLE_NAME, INDEX_NAME columns).
- main/mysqldump.result: Updated for VARCHAR->BLOB promotion in INFORMATION_SCHEMA.TABLES (TABLE_CATALOG, TABLE_SCHEMA, etc.).
- main/gis.result: Updated for VARCHAR->BLOB promotion in GEOMETRY_COLUMNS and SPATIAL_REF_SYS I_S tables.
- main/sp-anchor-type.result: Updated for VARCHAR->BLOB promotion --tables_table_name from I_S TABLES (varchar(64) in utf8mb3) now appears as longtext in CTAS from I_S.
- main/subselect_mat.result: Updated query plan – subquery materialization strategy changed when I_S columns are promoted to BLOB (DuplicateWeedout instead of materialization).
- main/ctype_utf8_uca.result, main/ctype_utf8mb4_uca.result: Updated wire protocol metadata for COLLATION_NAME from I_S – type 253->252, length reflects LONGBLOB max.
- main/partition_error.result: Updated for partition comment column promotion – full 1024-char comments now displayed instead of truncated at 80 chars (comment column in I_S.PARTITIONS promoted to BLOB).
- main/thread_pool_info.result: Updated for VARCHAR->BLOB promotion in thread pool I_S tables.
- main/subselect_sj_mat.result: Updated for VARCHAR->BLOB promotion affecting semi-join materialization plans.
- compat/oracle/sp.result: Updated for I_S column promotion.
- rpl/rpl_empty_string_is_null.result: Updated for I_S column promotion.
- tests/mysql_client_test.c: Updated test_explain_bug() --SHOW COLUMNS Field and Extra columns now report as MYSQL_TYPE_BLOB (promoted from VARCHAR in I_S temp table).
- funcs_1/is_character_sets.result, funcs_1/is_coll_char_set_appl.result, funcs_1/is_collations.result, funcs_1/is_column_privileges.result, funcs_1/is_columns_is.result, funcs_1/is_engines.result, funcs_1/is_key_column_usage.result, funcs_1/is_schema_privileges.result, funcs_1/is_schemata.result, funcs_1/is_statistics.result, funcs_1/is_table_constraints.result, funcs_1/is_table_privileges.result, funcs_1/is_tables.result, funcs_1/is_user_privileges.result: Updated for VARCHAR->BLOB promotion in I_S columns.
- innodb/innodb-mdev-7408.result, innodb/innodb-mdev-7408.test, innodb/innodb_i_s_innodb_trx.result, innodb/innodb_information_schema.result: Updated for I_S column promotion.
- innodb_i_s/.result* (buffer_page, buffer_page_lru, cmp_per_index, cmp_per_index_reset, ft_config, ft_default_stopword, ft_index_cache, ft_index_table, lock_waits, locks, metrics, sys_columns, sys_fields, sys_foreign, sys_foreign_cols, sys_indexes, sys_tables, sys_tablespaces, sys_tablestats, tablespaces_encryption, trx): Updated for I_S column promotion.
- sysschema/.result* (v_host_summary, v_host_summary_by_file_io, v_host_summary_by_file_io_type, v_host_summary_by_stages, v_host_summary_by_statement_latency, v_host_summary_by_statement_type, v_innodb_buffer_stats_by_schema, v_innodb_buffer_stats_by_table, v_innodb_lock_waits, v_io_by_thread_by_latency, v_io_global_by_wait_by_bytes, v_io_global_by_wait_by_latency, v_latest_file_io, v_memory_by_host_by_current_bytes, v_memory_by_thread_by_current_bytes, v_memory_by_user_by_current_bytes, v_metrics, v_processlist, v_schema_auto_increment_columns, v_schema_object_overview, v_schema_table_lock_waits, v_session, v_user_summary, v_user_summary_by_file_io, v_user_summary_by_file_io_type, v_user_summary_by_stages, v_user_summary_by_statement_latency, v_user_summary_by_statement_type, v_wait_classes_global_by_avg_latency, v_wait_classes_global_by_latency, v_waits_by_host_by_latency, v_waits_by_user_by_latency): Updated for I_S column promotion.
- sysschema/v_schema_object_overview.result: Additional change beyond I_S column promotion – NOT NULL db column's Default changed from blank to NULL in DESCRIBE because NO_DEFAULT_VALUE_FLAG is now set for NOT NULL blob columns in UNION type resolution (Item_type_holder::create_tmp_field_ex()).
- plugin/disks/disks.result, plugin/query_response_time/basic.result, plugin/user_variables/basic.result: Updated for I_S column promotion.
- rocksdb/i_s.result, rocksdb/i_s_deadlock.result: Updated for VARCHAR->BLOB promotion in RocksDB I_S tables --varchar(193), varchar(513), varchar(800) columns promoted to longtext (all have octet_length > 32 in utf8mb3).
- main/ctype_utf8.result, main/ctype_ucs.result, main/ctype_latin1.result, main/ctype_binary.result, main/ctype_cp1251.result: Updated for view column type promotion --concat(a) in views shows longtext/longblob instead of varchar(N)/varbinary(N) because SHOW COLUMNS FROM <view> resolves column types via the I_S temp table (HEAP), where the promotion applies to expressions.
- main/derived.result: Updated for query plan improvement – BNL join -> ref access on derived table. Blob columns in derived tables now get KEY_OPTIMIZE_EQ enabling derived_with_keys ref access.
- main/information_schema.result: Additional re-record for SHOW CREATE TABLE INFORMATION_SCHEMA.* showing longtext for promoted I_S columns (e.g. CHARACTER_SET_NAME varchar(32) -> longtext).
- sys_vars/tmp_disk_table_size_basic.test, sys_vars/tmp_disk_table_size_basic.result: Increased test data from 1 view to 500 views to trigger ER_RECORD_FILE_FULL – blob promotion makes HEAP rows much smaller so more data is needed to overflow the 16KB limit.
Adaptations for main (13.1)
Originally developed on 10.11, then forward-ported to main (13.1) on top of the base MDEV-38975 forward-port. Cherry-picked as a single commit with conflict resolution and main-specific adaptations.
API Adaptations
- pick_engine() signature: main removed the reclength parameter from pick_engine() (done in the base forward-port). Call site pick_engine(thd, 0) updated to pick_engine(thd).
- KEY non-trivial type: main made struct st_key a C++ class with constructor/destructor. Three new unit test functions using memset(&sql_key, 0, sizeof(sql_key)) changed to KEY sql_key= {} value-initialization.
- Item_type_holder::create_tmp_field_ex(): main uses virtual dispatch via type_handler->type_handler_for_tmp_table(this, param) combined with a dynamic_cast<Type_handler_blob_common*> guard (for xmltype identity preservation and varchar-promotion safety). The is_heap_engine() flag feeds through the type_handler_for_tmp_table() virtual (modified to return blob_key_type_handler() when is_heap_engine()), so the existing dual guard works correctly without additional changes.
Bug Fixes (main-specific)
- HA_UNIQUE_HASH assertion in ha_index_init() (sql/handler.cc): main added DBUG_ASSERT(!(key_flags & HA_UNIQUE_HASH)) to prevent reads on unique hash keys. This is correct for Aria/MyISAM (whose unique hash is a B-tree on a 4-byte checksum, not a readable index), but HEAP's hash index natively supports blob key lookups. The promotion causes more DISTINCT/sj-materialize keys to have blob parts (because VARCHAR is promoted to blob), triggering HA_UNIQUE_HASH unconditionally via m_blobs_count[distinct]. On 10.11 the assertion does not exist. Fix: relaxed the assertion for HEAP (table->s->db_type() == heap_hton).
- I_S collation mismatch in add_schema_fields() (sql/sql_select.cc): the blob promotion path used system_charset_info (utf8mb3_general1400_as_ci on main) instead of def.charset() ? def.charset() : system_charset_info_for_i_s (utf8mb3_general_ci). On 10.11, system_charset_info is utf8mb3_general_ci (same as I_S), so the bug does not manifest. On main, system_charset_info changed to the UCA-1400 collation, causing "Illegal mix of collations" errors when joining user tables (legacy collation) against I_S columns (promoted blob with wrong collation). Fix: use def.charset() ? def.charset() : system_charset_info_for_i_s, matching the non-promoted path in Type_handler_varchar::make_schema_field().
Test Adaptations (main-specific)
- heap/heap_blob_derived_keys.test: Rewrote Test 3 to use self-contained tables instead of joining user tables against INFORMATION_SCHEMA.TABLES – avoids a pre-existing utf8mb4_uca1400_ai_ci vs utf8mb3_general1400_as_ci collation mismatch in I_S joins on main.
- main/subselect_mat.result, main/subselect_sj_mat.result, main/information_schema.result: Re-recorded for promotion changes on main.
- main/optimizer_costs.result: ENGINE varchar(192) -> longtext in SHOW CREATE TABLE information_schema.optimizer_costs (I_S column promotion).
- main/ctype_utf8mb4_0900.result: COLLATION_NAME wire protocol type 253 (varchar) -> 252 (blob), length 256 -> 4294967295 (I_S column promotion).
- main/derived.result: Row estimate changed in derived table ref access EXPLAIN (optimizer cardinality adjustment from blob promotion).
- funcs_1/is_columns_is.result, funcs_1/is_statistics.result, funcs_1/is_triggers.result: Re-recorded – result files were recorded before the I_S collation fix (system_charset_info_for_i_s), so promoted blob fields had utf8mb3_general1400_as_ci instead of utf8mb3_general_ci.
- funcs_1/is_columns_is_embedded.result: Rebuilt from base with promotions – previous re-recording lost 3 trailing columns (IS_SYSTEM_TIME_PERIOD_START, IS_SYSTEM_TIME_PERIOD_END, CREATE_OPTIONS) due to being recorded on a non-embedded build.
- funcs_1/is_triggers_embedded.result: Rebuilt from base with promotions – EVENT_MANIPULATION varchar(20) (60 bytes > 32) promoted to longtext; DATABASE_COLLATION values updated to utf8mb4_uca1400_ai_ci (main default).
- funcs_1/is_collations.result: COMMENT varchar(80) (240 bytes > 32) promoted to longtext.
- funcs_1/is_columns.result: CREATE_OPTIONS varchar(2048) (6144 bytes > 32) promoted to longtext.
- compat/oracle/sp-cursor-decl.result: Cursor-materialized varchar(11) column (33 bytes in utf8mb3 > 32) promoted to longtext in CTAS from cursor.
- heap/blob_update_overflow.result: Created_tmp_disk_tables 2->1, Created_tmp_tables 4->3 – I_S session_status query's temp table now stays in HEAP (wide varchars promoted to blob instead of overflowing to Aria).
- binlog/create_like.result: PROCESSLIST I_S columns (USER varchar(128), HOST varchar(64), etc.) promoted to longtext.
- rpl/rpl_empty_string_is_null.result: I_S collation fix (utf8mb3_general1400_as_ci -> utf8mb3_general_ci).
- sysschema/v_privileges_by_table_by_level.result: TABLE_SCHEMA varchar(64), TABLE_NAME varchar(64), GRANTEE varchar(385), PRIVILEGE varchar(64) promoted to longtext.
- 13 additional result files re-recorded for 10.11-vs-main differences in optimizer plans, I_S column types, and sysschema output.
Attachments
Issue Links
- is blocked by
-
MDEV-38975 BLOBs in MEMORY (HEAP)
-
- In Testing
-
- relates to
-
MDEV-40033 HEAP: free-list contiguous block coalescing
-
- Open
-
- split from
-
MDEV-38975 BLOBs in MEMORY (HEAP)
-
- In Testing
-