[MDEV-15770] Three-node Galera cluster with MariaDB: the bootstrapped primary node is running, but the other two nodes cannot recover data from it and crash partway through recovery (e.g. after about 15 GB) Created: 2018-04-04 Updated: 2018-08-25 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.1.31 |
| Fix Version/s: | 10.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Shrikant Anjankar | Assignee: | Seppo Jaakola |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Environment: |
CentOS 7.4 |
||
| Description |
|
We have a three-node Galera cluster with MariaDB. The bootstrapped primary node is running, but the other two nodes are unable to recover the data from the first node and crash partway through recovery, for example after about 15 GB of data has been recovered.
Service Status:
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local sh[25879]: 2018-04-04 6:09:05 139810169485568 [Note] Loaded 'file_key_management.so' with offset 0x7f28081fb000
Log errors:
------------------------------------------------------------------------
2018-04-04 6:01:38 140400691869952 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
Server version: 10.1.31-MariaDB
Thread pointer: 0x0
stack_bottom = 0x0 thread_stack 0x48400 |
| Comments |
| Comment by Sachin Setiya (Inactive) [ 2018-04-04 ] | |
|
Can you upload the core dump and the cnf file? | |
| Comment by Shrikant Anjankar [ 2018-04-04 ] | |
|
I have uploaded the server.conf file. One node, 10.134.18.4, is running with bootstrap and is accessible. The other two nodes are not able to recover and sync data: after about 6 GB of data has been collected, the MySQL service aborts. New error . Please let us know if any option needs to be enabled during recovery. History: all three nodes were running fine before Saturday night. [4/4/2018 4:52 PM] Kalmady, Sachin: | |
| Comment by Marko Mäkelä [ 2018-04-04 ] | |
|
This looks awfully similar to fraggeln’s report in | |
| Comment by Shrikant Anjankar [ 2018-04-05 ] | |
|
Is there any specific workaround? The production server is running on only a single Galera MariaDB node. The other two nodes are not able to resync with the primary one; the resync fails. | |
| Comment by Shrikant Anjankar [ 2018-04-05 ] | |
|
We have tested MariaDB 10.2.14 on another Galera cluster with Galera version 25.3.22, but rsync still gets stuck after 9 GB. Please provide a solution. | |
| Comment by Seppo Jaakola [ 2018-04-05 ] | |
|
The error log shows that rsync failed earlier; given that, it is easy to understand that InnoDB startup will also fail. Therefore I doubt this would relate to
For troubleshooting the rsync transfer issue, please attach the error log from the donor node as well. | |
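When comparing the donor and joiner error logs requested above, it can help to pull out only the lines that mention the state transfer. The following is a minimal sketch, not an official diagnostic: the function name `sst_log_lines` is hypothetical, and the search patterns are an assumption rather than a complete list of Galera messages.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a MariaDB error log that mention
# the rsync state transfer. The patterns below are an assumption and are
# not an exhaustive list of relevant messages.
sst_log_lines() {
    log_file="$1"
    # -n prefixes each match with its line number, -i ignores case,
    # -E enables the alternation between the three patterns.
    grep -n -i -E 'rsync|wsrep_sst|state transfer' "$log_file"
}
```

Running it once against the donor's error log and once against the joiner's gives two short lists whose timestamps can be compared side by side.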
| Comment by Shrikant Anjankar [ 2018-04-10 ] | |
|
azabnl-id05 is the joiner/receiver node. Please check the logs and let us know if anything needs to be changed or a new patch applied.
I just want to share one success story: when we were stuck with the resync method, we used the custom command below to sync data between two nodes and it succeeded, but this did not work in the other case.
#mysqld --basedir=/usr --datadir=/mnt/data --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --log-error=/mnt/data/AZABNL-ID04.XXXXXX.err --pid-file=/mnt/data/AZABNL-ID04.XXXXX.pid --wsrep_start_position=53540047-107d-11e6-8b2a-9a31eea4d5df:0 | |
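The `--wsrep_start_position` value in the command above has the form `<uuid>:<seqno>`. Galera normally records the same two values in `grastate.dat` in the data directory, so a small helper can assemble the argument from that file. This is a sketch under assumptions: the function name is hypothetical, and the default path is only inferred from the `--datadir=/mnt/data` shown in the comment.

```shell
#!/bin/sh
# Hypothetical helper: print "<uuid>:<seqno>" read from a Galera grastate.dat.
# The default path is an assumption based on --datadir=/mnt/data above.
start_position_from_grastate() {
    grastate="${1:-/mnt/data/grastate.dat}"
    # grastate.dat contains "uuid:" and "seqno:" lines; take the value
    # that follows the label on each of them.
    uuid=$(awk '$1 == "uuid:" { print $2 }' "$grastate")
    seqno=$(awk '$1 == "seqno:" { print $2 }' "$grastate")
    if [ -z "$uuid" ] || [ -z "$seqno" ]; then
        echo "could not read uuid/seqno from $grastate" >&2
        return 1
    fi
    printf '%s:%s\n' "$uuid" "$seqno"
}

# Possible use (not executed here):
#   mysqld ... --wsrep_start_position="$(start_position_from_grastate)"
```

A `seqno` of `-1` in the file means the position was not saved at shutdown, so the printed value would not be a usable start position in that case.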
| Comment by Shrikant Anjankar [ 2018-06-06 ] | |
|
Does anyone have an update regarding the three-node Galera syncing issue? | |
| Comment by Jacques Amar [ 2018-08-25 ] | |
|
@shrikant, if you can afford the downtime, here's how I resorted to fixing it: basically, stop MariaDB on a running server. I had to do this. With 15 GB of data, downtime was in the minutes. If you only have one server running, you might have to use 'galera_new_cluster' to start the first server. My Case:
Hope this helps. |