[MCOL-3842] 1.4.2 centos 7 with gluster setup - pm failover and movePmDbrootConfig fail Created: 2020-02-27 Updated: 2020-08-25 Resolved: 2020-05-13 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | 1.4.2 |
| Fix Version/s: | 1.4.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | David Hill (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
centos 7 3pm with gluster |
||
| Sprint: | 2020-4, 2020-5, 2020-6, 2020-7 | ||||||||||||||||
| Description |
|
Reported by a customer and reproduced by support (dh).

Successfully installed a 3pm combo system on centos 7 with gluster, using a 3-copy configuration. I reproduced the 2 problems the customer reported:

1. PM3 failover testing - when PM3 recovered, the dbroot assignment failed to move dbroot 3 back to pm3
2. tried the manual move and it also failed

First took pm3 down to see if that works:

Component Status Last Status Change
Module pm1 ACTIVE Thu Feb 27 22:47:34 2020
Active Parent OAM Performance Module is 'pm1'

MariaDB ColumnStore Process statuses
Process Module Status Last Status Change Process ID
ProcessMonitor pm2 ACTIVE Thu Feb 27 22:40:28 2020 16217
ProcessMonitor pm3 AUTO_OFFLINE Thu Feb 27 22:44:46 2020

Active Alarm Counts: Critical = 7, Major = 1, Minor = 0, Warning = 0, Info = 0

mcsadmin> getst
System Storage Configuration
Performance Module (DBRoot) Storage Type = DataRedundancy
Data Redundant Configuration
Copies Per DBroot = 3

Brought pm3 back up, and the system failed to bring PM3 back in:

mcsadmin> getsystemi
System columnstore-1
System and Module statuses
Component Status Last Status Change
Module pm1 ACTIVE Thu Feb 27 22:54:32 2020
Active Parent OAM Performance Module is 'pm1'

MariaDB ColumnStore Process statuses
Process Module Status Last Status Change Process ID
ProcessMonitor pm2 ACTIVE Thu Feb 27 22:40:28 2020 16217
ProcessMonitor pm3 ACTIVE Thu Feb 27 22:51:38 2020 4492

Active Alarm Counts: Critical = 7, Major = 0, Minor = 0, Warning = 0, Info = 0

System Storage Configuration
Performance Module (DBRoot) Storage Type = DataRedundancy
Data Redundant Configuration
Copies Per DBroot = 3

Then I tried the enable and move and got the same error you did:

mcsadmin> stopsystem
This command stops the processing of applications on all Modules within the MariaDB ColumnStore System
Checking for active transactions
System being stopped now...
NOTE: These module(s) are DISABLED: pm3

mcsadmin> altersystem-enablemodule pm3
This command starts the processing of applications on a Module within the MariaDB ColumnStore System
Enabling Modules
Performance Module(s) Enabled, run movePmDbrootConfig or assignDbrootPmConfig to assign dbroots, if needed

mcsadmin> movePmDbrootConfig pm1 1 pm3
mcsadmin> movePmDbrootConfig pm1 3 pm3
DBRoot IDs currently assigned to 'pm1' = 1, 3
DBroot IDs being moved, please wait...
|
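When repeating this kind of failover test, it can help to check programmatically which dbroots a module still holds. The following is a minimal sketch, assuming only the `DBRoot IDs currently assigned to ...` line format quoted in the description; the function name and regular expression are hypothetical helpers, not part of mcsadmin.

```python
import re

# Hypothetical pattern, modeled only on the mcsadmin line quoted in this ticket.
ASSIGNMENT = re.compile(
    r"DBRoot IDs currently assigned to '(?P<module>\w+)' = (?P<ids>[\d,\s]+)"
)


def parse_dbroot_assignment(text):
    """Return (module, [dbroot ids]) from captured mcsadmin output, or None."""
    match = ASSIGNMENT.search(text)
    if match is None:
        return None
    ids = [int(part) for part in match.group("ids").split(",") if part.strip()]
    return match.group("module"), ids


captured = (
    "DBRoot IDs currently assigned to 'pm1' = 1, 3 "
    "DBroot IDs being moved, please wait..."
)
module, dbroots = parse_dbroot_assignment(captured)
# In the scenario described above, dbroot 3 is still held by pm1 instead of pm3.
print(module, dbroots)
```

A check like `3 in dbroots` on the pm1 result would flag the state this ticket describes, where dbroot 3 did not move back to pm3.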
| Comments |
| Comment by David Hill (Inactive) [ 2020-02-27 ] |
|
Error logs on pm1 from my testing:
Feb 27 22:38:59 ip-172-30-0-15 ProcessMonitor[16519]: 59.919220 |0|0|0| E 18 CAL0000: glusterAssign mount failure: dbroot: 1 error: 1 |
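To find every occurrence of this failure across the logs of all three PMs, a small scan can pull out the dbroot and error code. This is a sketch under the assumption that other occurrences follow the same format as the single line quoted above; the pattern and function name are illustrative only.

```python
import re

# Hypothetical pattern, based only on the one log line quoted in this ticket.
GLUSTER_FAILURE = re.compile(
    r"glusterAssign mount failure: dbroot: (?P<dbroot>\d+) error: (?P<error>\d+)"
)


def find_gluster_mount_failures(lines):
    """Return a list of (dbroot, error) integer pairs, one per matching line."""
    failures = []
    for line in lines:
        match = GLUSTER_FAILURE.search(line)
        if match:
            failures.append((int(match.group("dbroot")), int(match.group("error"))))
    return failures


sample = [
    "Feb 27 22:38:59 ip-172-30-0-15 ProcessMonitor[16519]: 59.919220 |0|0|0| "
    "E 18 CAL0000: glusterAssign mount failure: dbroot: 1 error: 1",
]
print(find_gluster_mount_failures(sample))
```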
| Comment by Daniel Lee (Inactive) [ 2020-05-01 ] |
|
1.4.4-1
/root/ColumnStore/buildColumnstoreFromGithubSource/server
/root/ColumnStore/buildColumnstoreFromGithubSource/server/engine

The 3pm glusterfs installation failed. mysqld on PM1 was not running, although getprocessstatus showed mysqld ACTIVE with a PID. localhost.localdomain.err on PM1 showed the following errors:

The same 3pm.glusterfs installation test worked on 1.4.3-4. Log files from all three PMs have been attached to this ticket. Tests for 1.2.6-1 and 1.5.0-1 are pending packages. |
| Comment by Patrick LeBlanc (Inactive) [ 2020-05-05 ] |
|
Daniel, it looks like you installed the gssapi packages, which aren't configured (or are missing other pieces, such as a separate authentication server; I'm still not totally clear), and which are causing the server to shut itself down. Not a columnstore problem. Please retest without installing the gssapi packages. |
| Comment by Daniel Lee (Inactive) [ 2020-05-05 ] |
|
Skipped installing gssapi and still have the same issue.
2020-05-05 20:21:13 0 [Note] /usr/sbin/mysqld: ready for connections. |
| Comment by Ben Thompson (Inactive) [ 2020-05-06 ] |
|
Built locally with the same commit ID and could not reproduce. gssapi does not appear to be an issue; however, while reviewing the logs I noticed this entry:
2020-05-01 15:38:58 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
This leads me to believe the server was built with flags that do not match the engine's. I would also like to know what systemd is logging for mariadb.service. The rpms I tested with have been uploaded to the shared location. |
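A quick way to repeat this check on other builds is to scan the mysqld error log for the debug marker. The sketch below assumes the marker text appears exactly as in the log line quoted above; the function name is hypothetical, and the presence of the marker only suggests a debug-flag build, it does not prove a flag mismatch with the engine.

```python
# Marker text copied from the InnoDB log line quoted in this ticket.
DEBUG_MARKER = "UNIV_DEBUG switched on"


def has_debug_build_marker(log_lines):
    """Return True if any line contains the InnoDB debug-build marker."""
    return any(DEBUG_MARKER in line for line in log_lines)


sample_log = [
    "2020-05-01 15:38:58 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!",
    "2020-05-05 20:21:13 0 [Note] /usr/sbin/mysqld: ready for connections.",
]
print(has_debug_build_marker(sample_log))
```

To run it against a real file, the lines could be read with `open(path, errors="replace")` and passed in directly, where `path` is wherever the error log lives on the host being checked.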
| Comment by Daniel Lee (Inactive) [ 2020-05-07 ] |
|
Build tested: 1.4.4-1 RC from Jenkins - 2020-05-06

3PM combo installation, without glusterfs. Installation finished and OAM shows the stack in a good state, but mysqld is not running.

cat /var/lib/mysql/localhost.localdomain.err
[centos7:root~]# cat localhost.localdomain.err
200507 16:31:35 Columnstore: Started; Version: 1.4.4-1 |
| Comment by David Hall (Inactive) [ 2020-05-08 ] |
|
The symptom where mysqld shuts down unexpectedly occurs only when systemd is not used. |
| Comment by Daniel Lee (Inactive) [ 2020-05-13 ] |
|
Build verified: 1.4.4-1 (Jenkins 20200508)

Testing has been performed for 1.4.4-1. Releases 1.2.6 and 1.5 will be tracked on a separate ticket. Tested on CentOS 7 and Ubuntu 18.04. Tickets have been opened for newly identified issues. Please see related tickets. |