[MDEV-4404] Galera Node throws "Could not read field" error and drops out of cluster Created: 2013-04-17 Updated: 2021-07-12 Resolved: 2014-11-18 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 5.5.29-galera |
| Fix Version/s: | 5.5.41-galera |
| Type: | Bug | Priority: | Major |
| Reporter: | Matthew Wheeler | Assignee: | Nirbhay Choubey (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 3 |
| Labels: | galera | ||
| Environment: |
CentOS release 6.3 |
||
| Description |
|
After a couple of days of running, one node in a 2-node cluster (with arbitrator) errors out with "Could not read field" (Error 1610) and then "Could not execute Update_rows event" (Error 1030). The other node continues running. The field exists in the table. Nodes were initialized using the xtrabackup method.
From our settings:
MariaDB-galera was installed from repos. |
| Comments |
| Comment by Matthew Wheeler [ 2013-04-17 ] | ||||||||||||||
|
BTW - this has happened twice in the last week. Each time we reinitialized the affected node. Different node each time. Different table and field each time. | ||||||||||||||
| Comment by Elena Stepanova [ 2013-04-22 ] | ||||||||||||||
|
Hi Matthew, Did GRA_* log files with this event appear in the datadir when the failure happened? Could you please provide them, along with the complete error log? Thanks. | ||||||||||||||
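For readers gathering the same evidence, here is a minimal sketch of how the GRA_* files could be collected from a data directory. It assumes the files match `GRA_*.log`, as named later in this thread. The function name `find_gra_logs` and the directory passed to it are illustrative and not part of MariaDB or Galera.

```python
from pathlib import Path


def find_gra_logs(datadir):
    """Return GRA_*.log files found directly in datadir, oldest first.

    Hypothetical helper: the GRA_*.log pattern is an assumption taken
    from the file names mentioned in this thread.
    """
    files = [p for p in Path(datadir).glob("GRA_*.log") if p.is_file()]
    # Sort by modification time so the files closest to the failure come last.
    return sorted(files, key=lambda p: p.stat().st_mtime)
```

Calling `find_gra_logs` on the server's datadir (the actual path depends on the installation) yields the files to attach alongside the error log.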
| Comment by Matthew Wheeler [ 2013-04-22 ] | ||||||||||||||
|
Yep. I have the logs available for both times this has happened. We are running on a ZFS file system and I took snapshots of the data and log directories after each instance. I'll send the .err, the GRA_, and last binary log file (mysql-bin.000.) If there is anything else you need, let me know. Where should I upload them to on the ftp site? In the err, you'll see that we were trying to do some bulk loads. For some reason this is not working at all for us, even with data sets as small as 10 records. They load into the node that the load is run on, but the replication does not happen. You'll see the error there. I'll submit this as a separate issue; I just wanted to explain what was going on before the crash. We were testing that some time before, and it was on separate tables, so I don't think that affected it. Thanks. | ||||||||||||||
| Comment by Elena Stepanova [ 2013-04-22 ] | ||||||||||||||
|
>> Where should i upload them to on the ftp site? | ||||||||||||||
| Comment by Matthew Wheeler [ 2013-04-22 ] | ||||||||||||||
|
ok. Files uploaded: Thanks for your help. | ||||||||||||||
| Comment by Ryo Tagami [ 2013-04-23 ] | ||||||||||||||
|
I had a similar problem with a 2-node MariaDB Galera Cluster where one node dies with the following error (the other node continues to work). 130423 1:08:01 [ERROR] Slave SQL: Could not read field 'inventory_link' of table 'zabbix.items', Error_code: 1610 I've uploaded a tarball containing GRA_*.log, my.cnf, binary log and error log as | ||||||||||||||
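Since each occurrence reported in this thread involves a different table and field, it can help to pull those details out of the error log mechanically. The sketch below parses a line of the form quoted above. The regular expression is inferred only from that single quoted line, and `parse_field_error` is a hypothetical helper name, so other server versions may word the message differently.

```python
import re

# Pattern inferred from the one error line quoted in this thread (an assumption).
FIELD_ERROR = re.compile(
    r"Could not read field '(?P<field>[^']+)' of table '(?P<table>[^']+)', "
    r"Error_code: (?P<code>\d+)"
)


def parse_field_error(line):
    """Return field, table and error code from an error log line, or None."""
    match = FIELD_ERROR.search(line)
    if match is None:
        return None
    return {
        "field": match["field"],
        "table": match["table"],
        "code": int(match["code"]),
    }
```

Applied to the line quoted above, this returns the field `inventory_link`, the table `zabbix.items` and the code 1610.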
| Comment by Elena Stepanova [ 2013-04-23 ] | ||||||||||||||
|
Matthew, Ryo, thank you. Hi Seppo, Could it be another manifestation of | ||||||||||||||
| Comment by Matthew Wheeler [ 2013-05-13 ] | ||||||||||||||
|
We just had this same error happen again on an otherwise stable cluster. We have been functionally using the cluster as master-slave - only doing reads from one node and reads and writes to the other. Same error on the "slave" node. I have snapshots of the system after the crash if you would like another example of this. | ||||||||||||||
| Comment by Aleksey Sanin (Inactive) [ 2013-05-21 ] | ||||||||||||||
|
Same here: [root@devdb01 ~]# cat /var/log/mysqld/error.log | ||||||||||||||
| Comment by Matthew Wheeler [ 2013-06-12 ] | ||||||||||||||
|
Happened again today. Took a snapshot and can post any/all of the logs needed to help diagnose. It has been almost a month since the last crash. We are running with one box as a read only "slave" which seems to delay times between crashes. Thanks. | ||||||||||||||
| Comment by Seppo Jaakola [ 2013-08-20 ] | ||||||||||||||
|
The error is thrown when unpacking a replication ROW event in rpl_record.cc:unpack_row():
The check for pack_ptr and the throwing of the error are MariaDB-specific code | ||||||||||||||
| Comment by Seppo Jaakola [ 2013-08-20 ] | ||||||||||||||
|
Analyzed Ryo's logs from zabbix database. | ||||||||||||||
| Comment by Seppo Jaakola [ 2013-08-20 ] | ||||||||||||||
|
Did an experiment with an MGC 5.5.32 node, which was modified to read replication events from the zabbix GRA_* file. It turns out that MGC can process the full file (all 318 events). Updates from the GRA file do not succeed, as my database is empty, but there is no problem in parsing the events, nevertheless. So, all in all, it looks like this issue is concurrency related: something interferes with the parsing of replication events. | ||||||||||||||
| Comment by Seppo Jaakola [ 2013-08-20 ] | ||||||||||||||
|
There is no clear explanation for this symptom: the node crashed on a corrupted replication event, yet the same replication events, as analyzed now, seem to be valid. I added a log message to show more information about the replication event after the corruption has been detected. This log message will be in the next 5.5.32 release; the actual fix for this bug must wait for the release after 5.5.32. Log message pushed in revision: http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3413 | ||||||||||||||
| Comment by Matthew Wheeler [ 2013-09-25 ] | ||||||||||||||
|
I am happy to report this happened again, after we updated to 5.5.32 with the extra logging. I have a snapshot of both nodes in the cluster. Please let me know what files you wish me to send you. You have never seen a bunch of programmers and admins so happy over a crash!! | ||||||||||||||
| Comment by Seppo Jaakola [ 2013-09-27 ] | ||||||||||||||
|
Matthew, this is interesting indeed! Please upload your mysql error logs from both nodes. If you have a GRA__.log file related to the crash, it will be needed as well. If the information is sensitive (the GRA file contains the transaction data in plain text), you can email it directly to me as well (seppo.jaakola@codership.com). We recently tracked a problem related to binlog event annotation processing. If you have binlog_annotate_row_events enabled, it may affect the issue you are facing. | ||||||||||||||
| Comment by Matthew Wheeler [ 2013-09-27 ] | ||||||||||||||
|
No sensitive info, but I emailed the files to you anyway. Let me know if you didn't get them. I included our cnf files. We do not have binlog_annotate_row_events set and I don't think it is enabled by default. Let me know if you need anything else. Thanks again. | ||||||||||||||
| Comment by Matthew Wheeler [ 2014-10-17 ] | ||||||||||||||
|
After the last couple of updates we have not seen this issue again. This could be closed. | ||||||||||||||
| Comment by Elena Stepanova [ 2014-10-17 ] | ||||||||||||||
|
Could you please confirm it should be closed, and close if it should be? | ||||||||||||||
| Comment by Eugene Pankov [ 2021-07-12 ] | ||||||||||||||
|
I've just seen this with MariaDB 10.2.30 today, without Xtrabackup and with `binlog_annotate_row_events`. All nodes in the cluster have shut themselves down the moment one ran out of /tmp space. The log mentions an existing VARCHAR field. |