[MDEV-13906] Crash during WSREP recovery - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 10.1.25, 10.1.26
Fix Version/s: N/A
Component/s: wsrep
Labels: None
Environment: Ubuntu 16.04.2, MariaDB 10.1.25 and 10.1.26

Description

We have a 3 node Galera Cluster and this weekend we tried rebooting each node one at a time for an EC2 instance upgrade. When the servers came back online they each crashed with signal 11 while trying to rejoin the cluster.

The crash occurs during WSREP recovery. If I set wsrep_on=OFF, MySQL will start up without crashing, but it crashes again when I dynamically set wsrep_on=ON.

Nothing shows up in the other two nodes' logs while the joining node is starting up, before it crashes. All ports are open between the Galera nodes. Each node is running MariaDB 10.1.25, but I did upgrade one node to 10.1.26 to see if the problem was fixed there, and it exhibited the same behavior. The only way I was able to get the nodes to rejoin the cluster was to force an SST sync. However, the data directory is 1.8TB, so that is far from ideal for each node restart.

I've attached the wsrep_recovery log and apport crash file, but it doesn't contain the core dump for some reason. I've also uploaded the my.cnf and a mariadb.cnf config file containing the Galera Cluster related config options.

Attachments

- _usr_sbin_mysqld.0.crash (17 kB, 2017-09-25 19:50)
- mariadb.cnf (0.9 kB, 2017-09-25 19:50)
- my.cnf (2 kB, 2017-09-25 19:50)
- wsrep_recovery.KbkcqG (5 kB, 2017-09-25 19:50)

Issue Links

relates to: MDEV-5408 Crash in mariadb-wsrep during plugin load at startup (Closed)

Activity

(4 older comments not shown)

Bryan Traywick added a comment - 2017-09-26 13:19

I do have the dump available and it is InnoDB in the dump:

--
-- Table structure for table `servers`
--
DROP TABLE IF EXISTS `servers`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `servers` (
  `Server_name` char(64) NOT NULL DEFAULT '',
  `Host` char(64) NOT NULL DEFAULT '',
  `Db` char(64) NOT NULL DEFAULT '',
  `Username` char(80) NOT NULL DEFAULT '',
  `Password` char(64) NOT NULL DEFAULT '',
  `Port` int(4) NOT NULL DEFAULT '0',
  `Socket` char(64) NOT NULL DEFAULT '',
  `Wrapper` char(64) NOT NULL DEFAULT '',
  `Owner` char(64) NOT NULL DEFAULT '',
  PRIMARY KEY (`Server_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='MySQL Foreign Servers table';

The enforce_storage_variables variable is blank in the new MariaDB 10.1.25 cluster and doesn't appear to be present at all on the older 10.1.14 cluster (just gives me an empty set when I try to show it).

Andrii Nikitin (Inactive) added a comment - 2017-09-26 13:54

Thank you for the confirmation. It should be safe to just convert it back to MyISAM to avoid the problem, but it may be a good idea to do that during downtime.

Bryan Traywick added a comment - 2017-09-26 14:15 - edited

Thank you so much, Andrii. I was able to recreate the issue in our staging cluster by converting mysql.servers to InnoDB on one of the nodes and restarting that node. I was then able to start up that node with wsrep_on=OFF, convert the table back to MyISAM, and then restart MySQL with wsrep_on=ON, and it was able to rejoin the cluster without an SST sync.

We will be converting the table back to MyISAM in our production cluster tonight and will restart a node to ensure it doesn't need a full resync. I will report back with confirmation once that has gone successfully but this appears to be the fix we are looking for.
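
The workaround described in this comment can be sketched in SQL roughly as follows. This is a minimal sketch, not taken from the ticket: it assumes the node has already been started with wsrep_on=OFF, and that mysql.servers is the only affected table. The information_schema query is just one way to confirm the current storage engine before and after the change.

```sql
-- Assumption: the node is running with wsrep_on=OFF, as described above.

-- Check which storage engine mysql.servers currently uses.
SELECT TABLE_NAME, ENGINE
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'mysql'
   AND TABLE_NAME = 'servers';

-- Convert the table back to MyISAM.
ALTER TABLE mysql.servers ENGINE = MyISAM;
```

After the conversion, the node would be restarted with wsrep_on=ON so that it can attempt to rejoin the cluster.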

Bryan Traywick added a comment - 2017-09-27 15:43

Thanks again Andrii. We converted the table back to MyISAM and were able to restart MySQL with only an IST sync required. As a final test I also tried restarting MySQL on one of the nodes in the older Galera Cluster running MariaDB 10.1.14 and we didn't run into this crash despite the mysql.servers table being InnoDB there as well. So it's likely a change introduced between 10.1.14 and 10.1.25. The older cluster is also running Ubuntu 14.04 and the 10.1.25 cluster is running 16.04 so it could be something to do with the systemd init scripts.

Jan Lindström added a comment - 2023-04-11 07:13

10.1 is EOL.

People

Assignee: Jan Lindström (Inactive)
Reporter: Bryan Traywick
Votes: 0
Watchers: 5

Dates

Created: 2017-09-25 19:50
Updated: 2023-04-12 12:07
Resolved: 2023-04-11 07:13