[MCOL-3945] load_brm will hang on dbroot1 failover Created: 2020-04-14 Updated: 2023-10-26 Resolved: 2020-06-22 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ? |
| Affects Version/s: | None |
| Fix Version/s: | 1.2.6, 1.4.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ben Thompson (Inactive) | Assignee: | Ben Thompson (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
saveBRM on failover runs before the dbroot is exchanged. On an OAM parent failure, this can mean saveBRM runs before the brm_saves_journal file exists on the new primary module, which can cause load_brm to hang. To reproduce: set up a multi-node glusterfs installation and perform a large table import. After the import completes, kill PM1 and wait for PM2 to take over the primary role; the logs will show the save_brm command run first, then dbroot1 moved to PM2, and then load_brm called. The fix is to move dbroot1 first and then run saveBRM, which should allow load_brm to run successfully. |
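The ordering problem can be illustrated with a minimal sketch. This is not the actual ProcMgr/failover code; `Node`, `move_dbroot`, `save_brm`, and `load_brm` are hypothetical stand-ins that only model the dependency described above (save_brm producing a usable state only once dbroot1's BRM files are present on the new primary):

```python
class Node:
    """A toy performance module; tracks which BRM files it can see."""
    def __init__(self, name):
        self.name = name
        self.files = set()
        self.save_ok = False

def move_dbroot(dbroot_id, node):
    # Attaching dbroot1 brings its BRM save files (including the
    # journal) onto the new primary module.
    node.files.update({"brm_saves_journal", "BRM_saves_em"})

def save_brm(node):
    # In this model, save_brm only produces a complete state if the
    # journal file already exists on the node at the time it runs.
    node.save_ok = "brm_saves_journal" in node.files

def load_brm(node):
    # Model the hang as an exception: load_brm cannot complete if
    # save_brm ran against an incomplete set of BRM files.
    if not node.save_ok:
        raise RuntimeError("load_brm would hang: journal missing at save time")

def failover_buggy(node):
    # Old order (the bug): save_brm before the dbroot exchange.
    save_brm(node)
    move_dbroot(1, node)
    load_brm(node)

def failover_fixed(node):
    # Fixed order: move dbroot1 first, then saveBRM, then load_brm.
    move_dbroot(1, node)
    save_brm(node)
    load_brm(node)
```

With this model, `failover_fixed(Node("PM2"))` completes, while `failover_buggy(Node("PM2"))` raises, mirroring the hang seen when PM1 is killed after a large import.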
| Comments |
| Comment by Patrick LeBlanc (Inactive) [ 2020-04-14 ] |
|
Looks ok. This will need to get into develop, and also develop-1.{2,4}. |
| Comment by Ben Thompson (Inactive) [ 2020-05-27 ] |
|
Part of this fix was reverted with other failover changes in |
| Comment by Ben Thompson (Inactive) [ 2020-05-27 ] |
|
This was all merged in 1.2.6 with |
| Comment by Daniel Lee (Inactive) [ 2020-06-01 ] |
|
Build tested: 1.4.4-1 (Jenkins 20200601). Failover (PM1 to PM2) after a 10g lineitem import worked fine. According to the debug.log on PM2, however, save_brm is still being executed first, then the dbroot moved. |