[MDEV-28710] Replication broken after upgrading to 10.7.4 Created: 2022-05-31  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB, Virtual Columns
Affects Version/s: 10.7.4, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Denis Chernyaev Assignee: Nikita Malyavin
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Master-master replication between Windows Server 2019 Standard and CentOS 8


Issue Links:
Problem/Incident
is caused by MDEV-371 Unique indexes for blobs Closed

 Description   

First we updated the MariaDB instance on Windows Server to 10.7.4. For about a week it ran smoothly. Then we upgraded the CentOS instance to 10.7.4, and shortly after, the Windows instance crashed. Every `START SLAVE` on the Windows instance now leads to a crash. We tried to reset the Windows instance with mariadb-install-db and restore the whole database from a fresh dump, but as soon as replication starts, the instance crashes with the following error:

2022-05-31 05:23:12 0x6930  InnoDB: Assertion failure in file D:\winx64-packages\build\src\storage\innobase\row\row0ins.cc line 220
InnoDB: Failing assertion: !cursor->index->is_committed()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mariadbd startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
220531  5:23:12 [ERROR] mysqld got exception 0x80000003 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.7.4-MariaDB-log
key_buffer_size=1073741824
read_buffer_size=131072
max_used_connections=1
max_threads=2050
thread_count=5
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5554373 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x28c6d39d238
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
server.dll!my_parameter_handler()[my_init.c:289]
ucrtbase.dll!raise()
ucrtbase.dll!abort()
server.dll!ut_dbg_assertion_failed()[ut0dbg.cc:60]
server.dll!row_ins_sec_index_entry_low()[row0ins.cc:3021]
server.dll!row_ins_sec_index_entry()[row0ins.cc:3232]
server.dll!row_upd_sec_index_entry()[row0upd.cc:2070]
server.dll!row_upd()[row0upd.cc:2820]
server.dll!row_upd_step()[row0upd.cc:2938]
server.dll!row_update_for_mysql()[row0mysql.cc:1702]
server.dll!ha_innobase::update_row()[ha_innodb.cc:8636]
server.dll!handler::ha_update_row()[handler.cc:7599]
server.dll!Update_rows_log_event::do_exec_row()[log_event_server.cc:8438]
server.dll!Rows_log_event::do_apply_event()[log_event_server.cc:5728]
server.dll!apply_event_and_update_pos_apply()[slave.cc:3881]
server.dll!exec_relay_log_event()[slave.cc:4370]
server.dll!handle_slave_sql()[slave.cc:5551]
server.dll!pfs_spawn_thread()[pfs.cc:2204]
server.dll!pthread_start()[my_winthread.c:62]
ucrtbase.dll!_o_exp()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): 
Connection ID (thread ID): 7
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.



 Comments   
Comment by Denis Chernyaev [ 2022-05-31 ]

Actually, the replication process gets a little further each time, but it keeps crashing.

Comment by Daniel Black [ 2022-06-01 ]

Is it always crashing on the same table? If so, can you share its SHOW CREATE TABLE output?

I assume the restore was an SQL-based restore (rather than a physical mariabackup).

What replication settings are you using? Are the binary log events coming from CentOS 8 on 10.7.4 or from your previous version?

Can you see what type of updates are occurring in the binary log?

What was your previous version?

Comment by Denis Chernyaev [ 2022-06-01 ]

Thank you for your reply.

Update: after numerous crashes and restarts, the Windows instance managed to catch up with CentOS and has been running smoothly for roughly an hour.

There was one particular table that failed to restore from mysqldump due to a duplicate value in a UNIQUE index caused by trailing spaces. That table was fixed on CentOS and a fresh mysqldump was taken.

>> Can you see what type of updates are occurring in the binary log?

If I use SHOW BINLOG EVENTS with the parameters taken from `Slave SQL thread initialized, starting replication in log...`, I can see it always crashes on this query, which affects that same table:

Log_name	Pos	Event_type	Server_id	End_log_pos	Info
dallas1-bin.008379	443387055	Gtid	4	443387097	BEGIN GTID 0-4-8221117647
dallas1-bin.008379	443387097	Annotate_rows	4	443387193	UPDATE users_referrers SET `counter` = `counter` + ( 1 ) WHERE `id` = '4'
dallas1-bin.008379	443387193	Table_map	4	443387256	table_id: 1189 (test.users_referrers)
dallas1-bin.008379	443387256	Update_rows_v1	4	443387374	table_id: 1189 flags: STMT_END_F

The `id` value in the WHERE clause may vary. The `counter` value is equal in both databases.
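For reference, the inspection above can be reproduced with something like the following (the log file name and position are the ones from this report; substitute the values printed in your own `Slave SQL thread initialized, starting replication in log ...` message):

```sql
-- Illustrative invocation using the file/position from this report;
-- LIMIT keeps the output to the events around the failing transaction.
SHOW BINLOG EVENTS
  IN 'dallas1-bin.008379'
  FROM 443387055
  LIMIT 4;
```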

SHOW CREATE TABLE users_referrers 
 
CREATE TABLE `users_referrers` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `content` varchar(8096) NOT NULL,
  `counter` int(11) DEFAULT 0,
  PRIMARY KEY (`id`),
  UNIQUE KEY `content` (`content`) USING HASH
) ENGINE=InnoDB AUTO_INCREMENT=20157 DEFAULT CHARSET=cp1251

Yes, it was an SQL-based restore from a `mysqldump --single-transaction --master-data=2 --opt --all-databases` dump, taken from CentOS after the upgrade.

The previous version was 10.7.3.

Comment by Denis Chernyaev [ 2022-06-01 ]

OK, so the issue seems to be worked around by reducing the `content` length to 3072 and recreating the corresponding index using the BTREE algorithm. The instance has stopped crashing.
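The workaround described above could look roughly like this (an illustrative sketch, not the reporter's exact statements; with the table's single-byte cp1251 charset, a 3072-character column stays within InnoDB's 3072-byte index key limit, so an ordinary BTREE unique index becomes possible):

```sql
-- Sketch of the workaround: shrink the column and replace the
-- HASH-based unique index with an ordinary BTREE one.
ALTER TABLE users_referrers
  MODIFY `content` varchar(3072) NOT NULL,
  DROP INDEX `content`,
  ADD UNIQUE KEY `content` (`content`) USING BTREE;
```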

Comment by Daniel Black [ 2022-06-01 ]

Thanks. Glad you have a workaround.

Comment by Marko Mäkelä [ 2022-06-03 ]

I believe that this bug was caused by MDEV-371. If you check the contents of the view INFORMATION_SCHEMA.INNODB_SYS_VIRTUAL, you should find a hidden virtual column that was created in order to implement the unique index.
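A query along these lines (an illustrative sketch; the INNODB_SYS_* views and their columns are per the InnoDB INFORMATION_SCHEMA documentation) should reveal the hidden virtual column backing the `UNIQUE ... USING HASH` index:

```sql
-- List the virtual columns InnoDB tracks for this table, joining the
-- SYS_VIRTUAL rows back to the table and column names.
SELECT t.NAME AS table_name, c.NAME AS column_name, v.POS, v.BASE_POS
FROM INFORMATION_SCHEMA.INNODB_SYS_VIRTUAL v
JOIN INFORMATION_SCHEMA.INNODB_SYS_TABLES  t ON t.TABLE_ID = v.TABLE_ID
JOIN INFORMATION_SCHEMA.INNODB_SYS_COLUMNS c
  ON c.TABLE_ID = v.TABLE_ID AND c.POS = v.POS
WHERE t.NAME = 'test/users_referrers';
```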

Unfortunately, there exist numerous bugs in the implementation of virtual columns in MariaDB. I am reassigning this to the current maintainer of that subsystem.

Generated at Thu Feb 08 10:02:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.