[MDEV-21466] BF-BF X lock conflict, replication fails, cluster unusable - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.4.10
Fix Version/s: N/A
Component/s: Galera, Replication, Storage Engine - InnoDB
Labels:
- need_feedback
Environment:
Linux 4.19.92-gentoo x86_64 AMD EPYC 7451, galera-26.4.3, mariadb-10.4.10

Description

2020-01-13 15:51:03 32 [C Note] InnoDB: BF-BF X lock conflict,mode: 2563 supremum: 0conflicts states: my executing locked committing

2020-01-13 15:51:03 32 [C Note] InnoDB:  SQL1: INSERT INTO `trigger_event` (`contactlist_id`, `date`, `id`) VALUES (102, '2020-01-13 15:50:51', 32) SQL2: NULL

2020-01-13 15:51:47 79 [C ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452

2020-01-13 15:51:47 79 [C Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319

2020-01-13 15:51:47 96 [B ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452

2020-01-13 15:51:47 96 [B Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319

2020-01-13 15:51:58 0 [A Note] WSREP: Flow-control interval: [475, 679]

2020-01-13 15:51:58 0 [A Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2

2020-01-13 15:51:58 0 [A Note] WSREP: Node 28075c64 state prim

2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B)

2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A)

2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef

2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: Waiting for state UUID.

2020-01-13 15:51:58 0 [A Note] WSREP: view(view_id(PRIM,28075c64,17) memb

2020-01-13 15:51:58 0 [B Note] WSREP: Flow-control interval: [475, 679]

2020-01-13 15:51:58 0 [B Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2

2020-01-13 15:51:58 0 [B Note] WSREP: Node 28075c64 state prim

2020-01-13 15:51:58 0 [B Note] WSREP: Quorum results:

2020-01-13 15:51:58 0 [B Note] WSREP: save pc into disk

2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B)

2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A)

2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef

2020-01-13 15:51:58 0 [B Note] WSREP: STATE_EXCHANGE: sent state UUID: 3fc0...51ef

2020-01-13 15:51:58 0 [B Note] WSREP: view(view_id(PRIM,28075c64,17) memb

2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [0, 0]

2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [336, 480]

2020-01-13 15:51:58 0 [C Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1

2020-01-13 15:51:58 0 [C Note] WSREP: New SELF-LEAVE.

2020-01-13 15:51:58 0 [C Note] WSREP: Received NON-PRIMARY.

2020-01-13 15:51:58 0 [C Note] WSREP: Received SELF-LEAVE. Closing connection.

For convenience, node number inserted into message type, thus [C Note] was originally '[Note]' logged on node C.

Main problem is that operation with DB31 that was performed on server A made all the other servers unusable and in fact lead to cluster manual restart and full data resynchronization.
Also, real error message is printed with priority 'Note' (not even 'Warning') that makes it unnoticeable while reading logs. Also, nothing was logged on the node that performed the operation (referred as A)

Expected behavior:

As similar behavior was also mentioned in (for example) ~~MDEV-15227~~ and ~~MDEV-15611~~, it would be reasonable to
a) log warning on the node that is performing operation about altering table without PK or issue with Foreign key while working in cluster.
b) reject such an operation on the node that requested it and keep cluster up.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

DB Variables.pdf
84 kB
2020-03-04 10:11
errorlog_vcenturio40.zip
307 kB
2020-02-10 08:55
errorlog_vcenturio41.zip
113 kB
2020-02-10 08:55
insert.sql
44 kB
2020-02-07 10:42
LogFiles.zip
10 kB
2020-02-07 15:20
reset.sql
0.1 kB
2020-02-07 10:42
tablecreate.sql
0.5 kB
2020-02-07 10:42

Issue Links

relates to

MDEV-15227 Inserted data violates foreign key constraint and breaks (Galera and regular async) replication

Closed

MDEV-18317 Mariadb Crashed due to BF-BF X lock conflict

Closed

Activity

Ascending order - Click to sort in descending order

Eugene created issue - 2020-01-13 18:07

Eugene made changes - 2020-01-13 18:07

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-18317~~ [ ~~MDEV-18317~~ ]

Eugene made changes - 2020-01-13 18:07

Link

This issue relates to ~~MDEV-15227~~ [ ~~MDEV-15227~~ ]

Eugene made changes - 2020-01-13 18:13

Description

The issue looks as follows: on Galera setup with 3 nodes (let's call them A, B, C) all having wsrep_slave_threads = 96, operation on the table without Primary Key (referred here as 'DB31.trigger_event') crashed cluster with logging following lines:

{code:java}
2020-01-13 15:51:03 32 [C Note] InnoDB: BF-BF X lock conflict,mode: 2563 supremum: 0conflicts states: my executing locked committing
2020-01-13 15:51:03 32 [C Note] InnoDB: SQL1: INSERT INTO `trigger_event` (`contactlist_id`, `date`, `id`) VALUES (102, '2020-01-13 15:50:51', 32) SQL2: NULL
2020-01-13 15:51:47 79 [C ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452
2020-01-13 15:51:47 79 [C Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319
2020-01-13 15:51:47 96 [B ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452
2020-01-13 15:51:47 96 [B Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319
2020-01-13 15:51:58 0 [A Note] WSREP: Flow-control interval: [475, 679]
2020-01-13 15:51:58 0 [A Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2020-01-13 15:51:58 0 [A Note] WSREP: Node 28075c64 state prim
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B)
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A)
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2020-01-13 15:51:58 0 [A Note] WSREP: view(view_id(PRIM,28075c64,17) memb
2020-01-13 15:51:58 0 [B Note] WSREP: Flow-control interval: [475, 679]
2020-01-13 15:51:58 0 [B Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2020-01-13 15:51:58 0 [B Note] WSREP: Node 28075c64 state prim
2020-01-13 15:51:58 0 [B Note] WSREP: Quorum results:
2020-01-13 15:51:58 0 [B Note] WSREP: save pc into disk
2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B)
2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A)
2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef
2020-01-13 15:51:58 0 [B Note] WSREP: STATE_EXCHANGE: sent state UUID: 3fc0...51ef
2020-01-13 15:51:58 0 [B Note] WSREP: view(view_id(PRIM,28075c64,17) memb
2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [0, 0]
2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [336, 480]
2020-01-13 15:51:58 0 [C Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-01-13 15:51:58 0 [C Note] WSREP: New SELF-LEAVE.
2020-01-13 15:51:58 0 [C Note] WSREP: Received NON-PRIMARY.
2020-01-13 15:51:58 0 [C Note] WSREP: Received SELF-LEAVE. Closing connection.
{code}

For convenience, node number inserted into message type, thus [C Note] was originally '[Note]' logged on node C.

Main problem is that operation with DB31 that was performed on server A made all the other servers unusable and in fact lead to cluster manual restart and full data resynchronization.
Also, real error message is printed with priority 'Note' (not even 'Warning') that makes it unnoticeable while reading logs. Also, nothing was logged on the node that performed the operation (referred as A)

*Expected behavior:*

As similar behavior was also mentioned in (for example) ~~MDEV-15227~~ and ~~MDEV-15611~~, it would be reasonable to
a) log warning on the node that is performing operation about altering table without PK while working in cluster
b) reject such an operation on the node that requested it and keep cluster up.

The issue looks as follows: on Galera setup with 3 nodes (let's call them A, B, C) all having wsrep_slave_threads = 96, operation on the table without Primary Key (referred here as 'DB31.trigger_event') crashed cluster with logging following lines:

{code:java}
2020-01-13 15:51:03 32 [C Note] InnoDB: BF-BF X lock conflict,mode: 2563 supremum: 0conflicts states: my executing locked committing
2020-01-13 15:51:03 32 [C Note] InnoDB: SQL1: INSERT INTO `trigger_event` (`contactlist_id`, `date`, `id`) VALUES (102, '2020-01-13 15:50:51', 32) SQL2: NULL
2020-01-13 15:51:47 79 [C ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452
2020-01-13 15:51:47 79 [C Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319
2020-01-13 15:51:47 96 [B ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452
2020-01-13 15:51:47 96 [B Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319
2020-01-13 15:51:58 0 [A Note] WSREP: Flow-control interval: [475, 679]
2020-01-13 15:51:58 0 [A Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2020-01-13 15:51:58 0 [A Note] WSREP: Node 28075c64 state prim
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B)
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A)
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2020-01-13 15:51:58 0 [A Note] WSREP: view(view_id(PRIM,28075c64,17) memb
2020-01-13 15:51:58 0 [B Note] WSREP: Flow-control interval: [475, 679]
2020-01-13 15:51:58 0 [B Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2020-01-13 15:51:58 0 [B Note] WSREP: Node 28075c64 state prim
2020-01-13 15:51:58 0 [B Note] WSREP: Quorum results:
2020-01-13 15:51:58 0 [B Note] WSREP: save pc into disk
2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B)
2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A)
2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef
2020-01-13 15:51:58 0 [B Note] WSREP: STATE_EXCHANGE: sent state UUID: 3fc0...51ef
2020-01-13 15:51:58 0 [B Note] WSREP: view(view_id(PRIM,28075c64,17) memb
2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [0, 0]
2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [336, 480]
2020-01-13 15:51:58 0 [C Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-01-13 15:51:58 0 [C Note] WSREP: New SELF-LEAVE.
2020-01-13 15:51:58 0 [C Note] WSREP: Received NON-PRIMARY.
2020-01-13 15:51:58 0 [C Note] WSREP: Received SELF-LEAVE. Closing connection.
{code}

For convenience, node number inserted into message type, thus [C Note] was originally '[Note]' logged on node C.

Main problem is that operation with DB31 that was performed on server A made all the other servers unusable and in fact lead to cluster manual restart and full data resynchronization.
Also, real error message is printed with priority 'Note' (not even 'Warning') that makes it unnoticeable while reading logs. Also, nothing was logged on the node that performed the operation (referred as A)

*Expected behavior:*

As similar behavior was also mentioned in (for example) ~~MDEV-15227~~ and ~~MDEV-15611~~, it would be reasonable to
a) log warning on the node that is performing operation about altering table without PK or issue with Foreign key while working in cluster.
b) reject such an operation on the node that requested it and keep cluster up.

Elena Stepanova made changes - 2020-01-13 18:34

Fix Version/s		10.4 [ 22408 ]
Assignee		Jan Lindström [ jplindst ]

Philipp Jenni added a comment - 2020-02-07 08:51

I have the same issue with the same log entries.

Environment:

mariadb-10.4.10
Galera Setup with 3 Nodes (Multimaster)
Ubuntu 18.04

I can force the problem when i run an sql with many inserts (more than 100) like this:

insert into masterdata (value) values ('XXX');

insert into referencedata (value, masterid) values ('XXX1',1);

insert into referencedata (value, masterid) values ('XXX2',1);

insert into referencedata (value, masterid) values ('XXX3',1);

insert into referencedata (value, masterid) values ('XXX4',1);

insert into masterdata (value) values ('XXX');

insert into referencedata (value, masterid) values ('XXX1',4);

insert into referencedata (value, masterid) values ('XXX2',4);

insert into referencedata (value, masterid) values ('XXX3',4);

insert into referencedata (value, masterid) values ('XXX4',4);

insert into masterdata (value) values ('XXX');

insert into referencedata (value, masterid) values ('XXX1',7);

insert into referencedata (value, masterid) values ('XXX2',7);

insert into referencedata (value, masterid) values ('XXX3',7);

insert into referencedata (value, masterid) values ('XXX4',7);

This is the table definitions for my insert

CREATE TABLE `MasterData` (

	`id` INT(11) NOT NULL AUTO_INCREMENT,

	`value` VARCHAR(50) NOT NULL DEFAULT '0',

	PRIMARY KEY (`id`)

);

CREATE TABLE `ReferenceData` (

	`id` INT(11) NOT NULL AUTO_INCREMENT,

	`value` VARCHAR(50) NULL DEFAULT NULL,

	`masterId` INT(11) NOT NULL,

	PRIMARY KEY (`id`),

	INDEX `FK_ReferenceData_MASTER` (`masterId`),

	CONSTRAINT `FK_ReferenceData_MASTER` FOREIGN KEY (`masterId`) REFERENCES `MasterData` (`id`) ON DELETE CASCADE

);

Philipp Jenni added a comment - 2020-02-07 08:51 I have the same issue with the same log entries. Environment: mariadb-10.4.10 Galera Setup with 3 Nodes (Multimaster) Ubuntu 18.04 I can force the problem when i run an sql with many inserts (more than 100) like this: insert into masterdata (value) values ( 'XXX' ); insert into referencedata (value, masterid) values ( 'XXX1' ,1); insert into referencedata (value, masterid) values ( 'XXX2' ,1); insert into referencedata (value, masterid) values ( 'XXX3' ,1); insert into referencedata (value, masterid) values ( 'XXX4' ,1); insert into masterdata (value) values ( 'XXX' ); insert into referencedata (value, masterid) values ( 'XXX1' ,4); insert into referencedata (value, masterid) values ( 'XXX2' ,4); insert into referencedata (value, masterid) values ( 'XXX3' ,4); insert into referencedata (value, masterid) values ( 'XXX4' ,4); insert into masterdata (value) values ( 'XXX' ); insert into referencedata (value, masterid) values ( 'XXX1' ,7); insert into referencedata (value, masterid) values ( 'XXX2' ,7); insert into referencedata (value, masterid) values ( 'XXX3' ,7); insert into referencedata (value, masterid) values ( 'XXX4' ,7); This is the table definitions for my insert CREATE TABLE `MasterData` ( `id` INT (11) NOT NULL AUTO_INCREMENT, `value` VARCHAR (50) NOT NULL DEFAULT '0' , PRIMARY KEY (`id`) ); CREATE TABLE `ReferenceData` ( `id` INT (11) NOT NULL AUTO_INCREMENT, `value` VARCHAR (50) NULL DEFAULT NULL , `masterId` INT (11) NOT NULL , PRIMARY KEY (`id`), INDEX `FK_ReferenceData_MASTER` (`masterId`), CONSTRAINT `FK_ReferenceData_MASTER` FOREIGN KEY (`masterId`) REFERENCES `MasterData` (`id`) ON DELETE CASCADE );

Jan Lindström (Inactive) added a comment - 2020-02-07 09:08 - edited

Do you have GRA_* files? Can you provide them? Can you explain more detailed way how to repeat?

Do you run those inserts to all nodes ?
If you have some scripts you use can you provide them or step by step instructions?
Can you provide cluster configuration ?

Jan Lindström (Inactive) added a comment - 2020-02-07 09:08 - edited Do you have GRA_* files? Can you provide them? Can you explain more detailed way how to repeat? Do you run those inserts to all nodes ? If you have some scripts you use can you provide them or step by step instructions? Can you provide cluster configuration ?

Philipp Jenni made changes - 2020-02-07 10:42

Attachment		insert.sql [ 50435 ]
Attachment		reset.sql [ 50436 ]
Attachment		tablecreate.sql [ 50437 ]

Philipp Jenni added a comment - 2020-02-07 10:45 - edited

Where i can find gra_* files? How i can generate them? If i have anyone, i can provide this.

The insert is running only at 1 node
I have attached my scripts insert.sql reset.sql tablecreate.sql
Here my Config. It is identical on each of the 3 nodes:

galera.cnf

[mysqld]

binlog_format=ROW

default-storage-engine=innodb

innodb_autoinc_lock_mode=2

bind-address=0.0.0.0

# Galera Provider Configuration

wsrep_on=ON

wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Galera Cluster Configuration

wsrep_cluster_name="exanicMariaDb"

wsrep_cluster_address="gcomm://192.168.210.60,192.168.210.61,192.168.210.62"

# Galera Synchronization Configuration

wsrep_sst_method=rsync

# Galera Node Configuration

wsrep_node_address="192.168.210.60"

wsrep_node_name="vcenturio40"

# Performance Testing

wsrep_slave_threads=4

wsrep_provider_options="gcache.size=1G;gcs.fc_limit=256;gcs.fc_factor=0.99"

my.cnf

# MariaDB database server configuration file.

# You can copy this file to one of:

# - "/etc/mysql/my.cnf" to set global options,

# - "~/.my.cnf" to set user-specific options.

# One can use all long options that the program supports.

# Run program with --help to get a list of available options and with

# --print-defaults to see which it would actually understand and use.

# For explanations see

# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all mysql clients

# It has been reported that passwords should be enclosed with ticks/quotes

# escpecially if they contain "#" chars...

# Remember to edit /etc/mysql/debian.cnf when changing the socket location.

[client]

port		= 3306

socket		= /var/run/mysqld/mysqld.sock

# Here is entries for some specific programs

# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.

[mysqld_safe]

socket		= /var/run/mysqld/mysqld.sock

nice		= 0

[mysqld]

# * Basic Settings

user		= mysql

pid-file	= /var/run/mysqld/mysqld.pid

socket		= /var/run/mysqld/mysqld.sock

port		= 3306

basedir		= /usr

datadir		= /srv/mariadb-data/data

tmpdir		= /tmp

lc_messages_dir	= /usr/share/mysql

lc_messages	= en_US

skip-external-locking

# Performance Test

# skip-name-resolve

# Instead of skip-networking the default is now to listen only on

# localhost which is more compatible and is not less secure.

bind-address		= 127.0.0.1

# * Fine Tuning

# max_connections = 200

max_connections		= 1000

connect_timeout		= 5

# wait_timeout = 600

wait_timeout		= 60

max_allowed_packet	= 16M

thread_cache_size       = 128

sort_buffer_size	= 4M

bulk_insert_buffer_size	= 16M

tmp_table_size		= 32M

max_heap_table_size	= 32M

# * MyISAM

# This replaces the startup script and checks MyISAM tables if needed

# the first time they are touched. On error, make copy and try a repair.

myisam_recover_options = BACKUP

key_buffer_size		= 128M

# open-files-limit = 2000

table_open_cache	= 400

myisam_sort_buffer_size	= 512M

concurrent_insert	= 2

read_buffer_size	= 2M

read_rnd_buffer_size	= 1M

# * Query Cache Configuration

# Cache only tiny result sets, so we can fit more in the query cache.

query_cache_limit		= 128K

query_cache_size		= 64M

# for more write intensive setups, set to DEMAND or OFF

# query_cache_type = DEMAND

# * Logging and Replication

# Both location gets rotated by the cronjob.

# Be aware that this log type is a performance killer.

# As of 5.1 you can enable the log at runtime!

# general_log_file = /var/log/mysql/mysql.log

# general_log = 1

# Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf.

# we do want to know about network errors and such

log_warnings		= 2

# Enable the slow query log to see queries with especially long duration

#slow_query_log[={0|1}]

slow_query_log_file	= /var/log/mysql/mariadb-slow.log

long_query_time = 10

# log_slow_rate_limit = 1000

log_slow_verbosity	= query_plan

log_error = /var/log/mysql/mariadb.err

#log-queries-not-using-indexes

#log_slow_admin_statements

# The following can be used as easy to replay backup logs or for replication.

# note: if you are setting up a replication slave, see README.Debian about

#       other settings you may need to change.

# server-id = 1

# report_host = master1

# auto_increment_increment = 2

# auto_increment_offset = 1

log_bin			= /var/log/mysql/mariadb-bin

log_bin_index		= /var/log/mysql/mariadb-bin.index

# not fab for performance, but safer

# sync_binlog = 1

expire_logs_days	= 10

max_binlog_size         = 100M

# slaves

# relay_log = /var/log/mysql/relay-bin

# relay_log_index = /var/log/mysql/relay-bin.index

# relay_log_info_file = /var/log/mysql/relay-bin.info

#log_slave_updates

#read_only

# If applications support it, this stricter sql_mode prevents some

# mistakes like inserting invalid dates etc.

# sql_mode = NO_ENGINE_SUBSTITUTION,TRADITIONAL

# * InnoDB

# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.

# Read the manual for more InnoDB related options. There are many!

default_storage_engine	= InnoDB

# you can't just change log file size, requires special procedure

# innodb_log_file_size = 50M

innodb_buffer_pool_size	= 256M

innodb_log_buffer_size	= 8M

innodb_file_per_table	= 1

innodb_open_files	= 400

innodb_io_capacity	= 400

innodb_flush_method	= O_DIRECT

# * Security Features

# Read the manual, too, if you want chroot!

# chroot = /var/lib/mysql/

# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".

# ssl-ca = /etc/mysql/cacert.pem

# ssl-cert = /etc/mysql/server-cert.pem

# ssl-key = /etc/mysql/server-key.pem

# * Galera-related settings

[galera]

# Mandatory settings

# wsrep_on = ON

#wsrep_provider=

#wsrep_cluster_address=

# binlog_format = row

# default_storage_engine = InnoDB

# innodb_autoinc_lock_mode = 2

# Allow server to accept connections on all interfaces.

# bind-address = 0.0.0.0

# Optional setting

# wsrep_slave_threads = 1

# innodb_flush_log_at_trx_commit = 0

[mysqldump]

quick

quote-names

max_allowed_packet	= 16M

[mysql]

#no-auto-rehash	# faster start of mysql but no tab completion

[isamchk]

key_buffer		= 16M

# * IMPORTANT: Additional settings that can override those from this file!

#   The files must end with '.cnf', otherwise they'll be ignored.

!include /etc/mysql/mariadb.cnf

!includedir /etc/mysql/conf.d/

Philipp Jenni added a comment - 2020-02-07 10:45 - edited Where i can find gra_* files? How i can generate them? If i have anyone, i can provide this. The insert is running only at 1 node I have attached my scripts insert.sql reset.sql tablecreate.sql Here my Config. It is identical on each of the 3 nodes: galera.cnf [mysqld] binlog_format=ROW default-storage-engine=innodb innodb_autoinc_lock_mode=2 bind-address=0.0.0.0 # Galera Provider Configuration wsrep_on=ON wsrep_provider=/usr/lib/galera/libgalera_smm.so # Galera Cluster Configuration wsrep_cluster_name="exanicMariaDb" wsrep_cluster_address="gcomm://192.168.210.60,192.168.210.61,192.168.210.62" # Galera Synchronization Configuration wsrep_sst_method=rsync # Galera Node Configuration wsrep_node_address="192.168.210.60" wsrep_node_name="vcenturio40" # Performance Testing wsrep_slave_threads=4 wsrep_provider_options="gcache.size=1G;gcs.fc_limit=256;gcs.fc_factor=0.99" my.cnf # MariaDB database server configuration file. # # You can copy this file to one of: # - "/etc/mysql/my.cnf" to set global options, # - "~/.my.cnf" to set user-specific options. # # One can use all long options that the program supports. # Run program with --help to get a list of available options and with # --print-defaults to see which it would actually understand and use. # # For explanations see # http://dev.mysql.com/doc/mysql/en/server-system-variables.html # This will be passed to all mysql clients # It has been reported that passwords should be enclosed with ticks/quotes # escpecially if they contain "#" chars... # Remember to edit /etc/mysql/debian.cnf when changing the socket location. [client] port = 3306 socket = /var/run/mysqld/mysqld.sock # Here is entries for some specific programs # The following values assume you have at least 32M ram # This was formally known as [safe_mysqld]. Both versions are currently parsed. [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] # # * Basic Settings # user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /srv/mariadb-data/data tmpdir = /tmp lc_messages_dir = /usr/share/mysql lc_messages = en_US skip-external-locking # Performance Test # skip-name-resolve # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. bind-address = 127.0.0.1 # # * Fine Tuning # # max_connections = 200 max_connections = 1000 connect_timeout = 5 # wait_timeout = 600 wait_timeout = 60 max_allowed_packet = 16M thread_cache_size = 128 sort_buffer_size = 4M bulk_insert_buffer_size = 16M tmp_table_size = 32M max_heap_table_size = 32M # # * MyISAM # # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched. On error, make copy and try a repair. myisam_recover_options = BACKUP key_buffer_size = 128M # open-files-limit = 2000 table_open_cache = 400 myisam_sort_buffer_size = 512M concurrent_insert = 2 read_buffer_size = 2M read_rnd_buffer_size = 1M # # * Query Cache Configuration # # Cache only tiny result sets, so we can fit more in the query cache. query_cache_limit = 128K query_cache_size = 64M # for more write intensive setups, set to DEMAND or OFF # query_cache_type = DEMAND # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. # As of 5.1 you can enable the log at runtime! # general_log_file = /var/log/mysql/mysql.log # general_log = 1 # # Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf. # # we do want to know about network errors and such log_warnings = 2 # # Enable the slow query log to see queries with especially long duration #slow_query_log[={0|1}] slow_query_log_file = /var/log/mysql/mariadb-slow.log long_query_time = 10 # log_slow_rate_limit = 1000 log_slow_verbosity = query_plan log_error = /var/log/mysql/mariadb.err #log-queries-not-using-indexes #log_slow_admin_statements # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. # server-id = 1 # report_host = master1 # auto_increment_increment = 2 # auto_increment_offset = 1 log_bin = /var/log/mysql/mariadb-bin log_bin_index = /var/log/mysql/mariadb-bin.index # not fab for performance, but safer # sync_binlog = 1 expire_logs_days = 10 max_binlog_size = 100M # slaves # relay_log = /var/log/mysql/relay-bin # relay_log_index = /var/log/mysql/relay-bin.index # relay_log_info_file = /var/log/mysql/relay-bin.info #log_slave_updates #read_only # # If applications support it, this stricter sql_mode prevents some # mistakes like inserting invalid dates etc. # sql_mode = NO_ENGINE_SUBSTITUTION,TRADITIONAL # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! default_storage_engine = InnoDB # you can't just change log file size, requires special procedure # innodb_log_file_size = 50M innodb_buffer_pool_size = 256M innodb_log_buffer_size = 8M innodb_file_per_table = 1 innodb_open_files = 400 innodb_io_capacity = 400 innodb_flush_method = O_DIRECT # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca = /etc/mysql/cacert.pem # ssl-cert = /etc/mysql/server-cert.pem # ssl-key = /etc/mysql/server-key.pem # # * Galera-related settings # [galera] # Mandatory settings # wsrep_on = ON #wsrep_provider= #wsrep_cluster_address= # binlog_format = row # default_storage_engine = InnoDB # innodb_autoinc_lock_mode = 2 # # Allow server to accept connections on all interfaces. # # bind-address = 0.0.0.0 # # Optional setting # wsrep_slave_threads = 1 # innodb_flush_log_at_trx_commit = 0 [mysqldump] quick quote-names max_allowed_packet = 16M [mysql] #no-auto-rehash # faster start of mysql but no tab completion [isamchk] key_buffer = 16M # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !include /etc/mysql/mariadb.cnf !includedir /etc/mysql/conf.d/

Philipp Jenni added a comment - 2020-02-07 10:48

@Jan Lindström: If you would, we can do an remote debugging session. My system is currently not really productive.

Philipp Jenni added a comment - 2020-02-07 10:48 @Jan Lindström: If you would, we can do an remote debugging session. My system is currently not really productive.

Jan Lindström (Inactive) made changes - 2020-02-07 11:04

Assignee

Jan Lindström [ jplindst ]

Seppo Jaakola [ seppo ]

Jan Lindström (Inactive) added a comment - 2020-02-07 14:37

GRA files should appear on datadir.

Jan Lindström (Inactive) added a comment - 2020-02-07 14:37 GRA files should appear on datadir.

Jan Lindström (Inactive) made changes - 2020-02-07 14:39

Labels

need_feedback

Philipp Jenni added a comment - 2020-02-07 15:20

OK... I found it ...

VCENTURIO40 is the server where the sql is executed.
VCENTURIO41 is the server which is going out of sync

Philipp Jenni added a comment - 2020-02-07 15:20 OK... I found it ... VCENTURIO40 is the server where the sql is executed. VCENTURIO41 is the server which is going out of sync

Philipp Jenni made changes - 2020-02-07 15:20

Attachment

LogFiles.zip [ 50441 ]

Jan Lindström (Inactive) added a comment - 2020-02-10 06:18

There is DDL on those GRA files i.e. CREATE/DROP USER, TRUNCATE etc, can you provide error logs from VCENTURIO40 and VCENTURIO41 ?

Jan Lindström (Inactive) added a comment - 2020-02-10 06:18 There is DDL on those GRA files i.e. CREATE/DROP USER, TRUNCATE etc, can you provide error logs from VCENTURIO40 and VCENTURIO41 ?

Philipp Jenni made changes - 2020-02-10 08:55

Attachment		errorlog_vcenturio40.zip [ 50451 ]
Attachment		errorlog_vcenturio41.zip [ 50452 ]

Philipp Jenni added a comment - 2020-02-10 08:56

The error logs are attached to the ticket

errorlog_vcenturio41.zip
errorlog_vcenturio40.zip

Philipp Jenni added a comment - 2020-02-10 08:56 The error logs are attached to the ticket errorlog_vcenturio41.zip errorlog_vcenturio40.zip

Philipp Jenni added a comment - 2020-02-17 09:10

Any progress at this ticket. A remote debugging session to analyze the problem is possible at any time.

Philipp Jenni added a comment - 2020-02-17 09:10 Any progress at this ticket. A remote debugging session to analyze the problem is possible at any time.

Jan Lindström (Inactive) added a comment - 2020-02-21 10:33

I do not find any BF errors on both of those error logs.

Jan Lindström (Inactive) added a comment - 2020-02-21 10:33 I do not find any BF errors on both of those error logs.

Philipp Jenni added a comment - 2020-02-21 11:07 - edited

@Jan Lindström:

What should be the next steps for my problem? I don’t know if these a BF Error, but the effect is the same, that the cluster is unusable.

Philipp Jenni added a comment - 2020-02-21 11:07 - edited @Jan Lindström: What should be the next steps for my problem? I don’t know if these a BF Error, but the effect is the same, that the cluster is unusable.

Philipp Jenni added a comment - 2020-02-26 13:06 - edited

I have to some test and it's an BF error. I have enabled WSREP_DEBUG an see the follow entries in my logs:

2020-02-26 13:58:45 2 [Note] InnoDB: BF-BF lock conflict, locking: 1

RECORD LOCKS space id 166475 page no 3 n bits 72 index PRIMARY of table `cluster-test`.`MasterData` trx id 16390951 l                        ock_mode X locks rec but not gap

Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0

 0: len 4; hex 8000000a; asc     ;;

 1: len 6; hex 000000fa1b27; asc      ';;

 2: len 7; hex f2000004670c00; asc     g  ;;

 3: len 3; hex 585858; asc XXX;;

2020-02-26 13:58:45 2 [Note] InnoDB:  SQL1: insert into ReferenceData (value, masterid) values ('XXX1',10)▒kV^ SQL2:                         NULL

2020-02-26 13:58:45 2 [Note] InnoDB: BF conflict, modes: 1026:1059 idx: PRIMARY table: cluster@002dtest/MasterData n_                        uniq: 1 n_user: 1 SQL1: insert into ReferenceData (value, masterid) values ('XXX1',10)▒kV^ SQL2: NULL

Philipp Jenni added a comment - 2020-02-26 13:06 - edited I have to some test and it's an BF error. I have enabled WSREP_DEBUG an see the follow entries in my logs: 2020-02-26 13:58:45 2 [Note] InnoDB: BF-BF lock conflict, locking: 1 RECORD LOCKS space id 166475 page no 3 n bits 72 index PRIMARY of table `cluster-test`.`MasterData` trx id 16390951 l ock_mode X locks rec but not gap Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 8000000a; asc ;; 1: len 6; hex 000000fa1b27; asc ';; 2: len 7; hex f2000004670c00; asc g ;; 3: len 3; hex 585858; asc XXX;; 2020-02-26 13:58:45 2 [Note] InnoDB: SQL1: insert into ReferenceData (value, masterid) values ('XXX1',10)▒kV^ SQL2: NULL 2020-02-26 13:58:45 2 [Note] InnoDB: BF conflict, modes: 1026:1059 idx: PRIMARY table: cluster@002dtest/MasterData n_ uniq: 1 n_user: 1 SQL1: insert into ReferenceData (value, masterid) values ('XXX1',10)▒kV^ SQL2: NULL

Philipp Jenni added a comment - 2020-02-26 14:49

I can solve my Problem when i change the property "wsrep_slave_threads" from 4 to 1. With "1" is the error gone and i can reproduce the problem.

Philipp Jenni added a comment - 2020-02-26 14:49 I can solve my Problem when i change the property "wsrep_slave_threads" from 4 to 1. With "1" is the error gone and i can reproduce the problem.

Julien Fritsch made changes - 2020-03-03 13:38

Priority

Major [ 3 ]

Critical [ 2 ]

Jan Lindström (Inactive) made changes - 2020-03-03 16:00

Labels

need_feedback

Mario Karuza (Inactive) made changes - 2020-03-03 17:00

Assignee

Seppo Jaakola [ seppo ]

Mario Karuza [ mkaruza ]

Mario Karuza (Inactive) added a comment - 2020-03-04 09:49

Hi Philipp,

Can you attach complete system variables values before running your test case, and can you give us steps how to reproduce this issue?

Mario Karuza (Inactive) added a comment - 2020-03-04 09:49 Hi Philipp, Can you attach complete system variables values before running your test case, and can you give us steps how to reproduce this issue?

Philipp Jenni made changes - 2020-03-04 10:11

Attachment

DB Variables.pdf [ 50656 ]

Philipp Jenni added a comment - 2020-03-04 10:16

@Mario Karuza: See DB Variables.pdf

Important: Currently i ran the application with "wsrep_slave_threads=1". With this config the problem does'nt exists. When i change this value to 4 and restart all cluster nodes, than i can reproduce the problem.

The steps are:

1. create an empty db
2. run tablecreate.sql
3. run insert.sql
4. run reset.sql
5. run insert.sql

If the exception is not raised, run step 4 and 5 multiple times

Philipp Jenni added a comment - 2020-03-04 10:16 @Mario Karuza: See DB Variables.pdf Important: Currently i ran the application with "wsrep_slave_threads=1". With this config the problem does'nt exists. When i change this value to 4 and restart all cluster nodes, than i can reproduce the problem. The steps are: 1. create an empty db 2. run tablecreate.sql 3. run insert.sql 4. run reset.sql 5. run insert.sql If the exception is not raised, run step 4 and 5 multiple times

Mario Karuza (Inactive) added a comment - 2020-03-04 10:21

Thank you for info,

Do you run `sql` files directly from connection or by providing files to client ?

Mario Karuza (Inactive) added a comment - 2020-03-04 10:21 Thank you for info, Do you run `sql` files directly from connection or by providing files to client ?

Philipp Jenni added a comment - 2020-03-04 10:23

I run the script from heidisql client.

Philipp Jenni added a comment - 2020-03-04 10:23 I run the script from heidisql client.

Mario Karuza (Inactive) added a comment - 2020-03-05 09:55

Hi Philipp,

I could not reproduce this problem with provided scripts locally.
If you can run this again, please set wsrep_debug=3 so we could have more logging information. If you hit this again attach logs from all 3 nodes.

Mario Karuza (Inactive) added a comment - 2020-03-05 09:55 Hi Philipp, I could not reproduce this problem with provided scripts locally. If you can run this again, please set wsrep_debug=3 so we could have more logging information. If you hit this again attach logs from all 3 nodes.

Julien Fritsch made changes - 2020-03-13 09:08

Labels

need_feedback

Sergei Golubchik made changes - 2020-04-13 11:35

Fix Version/s		N/A [ 14700 ]
Fix Version/s	10.4 [ 22408 ]
Resolution		Incomplete [ 4 ]
Status	Open [ 1 ]	Closed [ 6 ]

Nikita Karpin added a comment - 2020-06-02 14:46

@Mario Karuza, I have and issue very similar to this one, could you please take a look - https://jira.mariadb.org/browse/MDEV-22766. I've attached all logs with wsrep_debug=3 and reproduce script

Nikita Karpin added a comment - 2020-06-02 14:46 @Mario Karuza, I have and issue very similar to this one, could you please take a look - https://jira.mariadb.org/browse/MDEV-22766 . I've attached all logs with wsrep_debug=3 and reproduce script

Sergei Golubchik made changes - 2021-12-06 21:50

Workflow

MariaDB v3 [ 102973 ]

MariaDB v4 [ 157186 ]

Jira Automation (IT) made changes - 2024-07-04 03:18

Zendesk Related Tickets

191079

People

Assignee:: Mario Karuza (Inactive)

Reporter:: Eugene

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 2020-01-13 18:07

Updated:: 2024-07-07 22:14

Resolved:: 2020-04-13 11:35

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration