[MCOL-1797] resumedatabasewrites causes both DDL/DML to go active on um1/um2 Created: 2018-10-12  Updated: 2023-10-26  Resolved: 2018-12-21

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.1.6
Fix Version/s: 1.1.7, 1.2.3

Type: Bug
Priority: Critical
Reporter: David Hill (Inactive)
Assignee: Daniel Lee (Inactive)
Resolution: Fixed
Votes: 1
Labels: None
Environment:

2um/2pm system



 Description   

Running suspenddatabasewrites followed by resumedatabasewrites caused the DDLProc/DMLProc processes on both um1 and um2 to go into an ACTIVE state. Only one set of DDLProc/DMLProc should be ACTIVE on a system at any time.

Oct 12 14:21:20 ip-172-31-43-56 ProcessManager[12832]: 20.208442 |0|0|0| I 17 CAL0000: MSG RECEIVED: suspend database writes
Oct 12 14:21:25 ip-172-31-43-56 ProcessManager[12832]: 25.234401 |0|0|0| I 17 CAL0000: SUSPENDWRITES: ACK back to sender0
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.885556 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process um1/DDLProc State = ACTIVE
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.885639 |0|0|0| D 18 CAL0000: statusControl: Set Process um1/DDLProc State = ACTIVE PID = 1
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.885885 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process um1/DMLProc State = ACTIVE
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.885928 |0|0|0| D 18 CAL0000: statusControl: Set Process um1/DMLProc State = ACTIVE PID = 1
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886347 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process um2/DDLProc State = ACTIVE
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886387 |0|0|0| D 18 CAL0000: statusControl: Set Process um2/DDLProc State = ACTIVE PID = 1
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886560 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process um2/DMLProc State = ACTIVE
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886618 |0|0|0| D 18 CAL0000: statusControl: Set Process um2/DMLProc State = ACTIVE PID = 1
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886889 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/WriteEngineServer State = ACTIVE
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886934 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/WriteEngineServer State = ACTIVE PID = 1
Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.887194 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = ACTIVE
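
The symptom in the log above can be checked mechanically. The following is a minimal sketch, not part of ColumnStore: the function names and the regular expression are hypothetical and assume the statusControl message format shown in these log lines. It reports any of DDLProc/DMLProc that were set ACTIVE on more than one module.

```python
import re
from collections import defaultdict

# Assumed format, taken from the log lines in this ticket:
#   "statusControl: Set Process <module>/<process> State = <state>"
# The "REQUEST RECEIVED" lines do not match, so each change is counted once.
SET_STATE = re.compile(r"statusControl: Set Process (\w+)/(\w+) State = (\w+)")

SAMPLE_LOG = [
    "Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.885639 |0|0|0| D 18 CAL0000: statusControl: Set Process um1/DDLProc State = ACTIVE PID = 1",
    "Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.885928 |0|0|0| D 18 CAL0000: statusControl: Set Process um1/DMLProc State = ACTIVE PID = 1",
    "Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886347 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process um2/DDLProc State = ACTIVE",
    "Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886387 |0|0|0| D 18 CAL0000: statusControl: Set Process um2/DDLProc State = ACTIVE PID = 1",
    "Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886618 |0|0|0| D 18 CAL0000: statusControl: Set Process um2/DMLProc State = ACTIVE PID = 1",
    "Oct 12 14:21:40 ip-172-31-43-56 ProcessMonitor[12777]: 40.886934 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/WriteEngineServer State = ACTIVE PID = 1",
]


def active_modules(log_lines):
    """Return {process: set of modules set to ACTIVE} from ProcessMonitor log lines."""
    active = defaultdict(set)
    for line in log_lines:
        match = SET_STATE.search(line)
        if match:
            module, process, state = match.groups()
            if state == "ACTIVE":
                active[process].add(module)
    return active


def doubly_active(log_lines, processes=("DDLProc", "DMLProc")):
    """Return the listed processes that were set ACTIVE on more than one module."""
    active = active_modules(log_lines)
    return {p: sorted(active[p]) for p in processes if len(active[p]) > 1}


if __name__ == "__main__":
    print(doubly_active(SAMPLE_LOG))
```

An empty result means no listed process went ACTIVE on more than one module in the lines examined.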



 Comments   
Comment by David Hill (Inactive) [ 2018-10-12 ]

suspend puts WriteEngineServer (wes), DDLProc, and DMLProc into these states:

mcsadmin> suspend
suspenddatabasewrites Fri Oct 12 15:11:19 2018

This command suspends the DDL/DML writes to the MariaDB ColumnStore Database
Do you want to proceed: (y or n) [n]: y

Suspend Calpont Database Writes Request successfully completed
mcsadmin>
mcsadmin>
mcsadmin> getsystemi
getsysteminfo Fri Oct 12 15:11:33 2018

System columnstore-1

System and Module statuses

Component Status Last Status Change
------------ -------------------------- ------------------------
System ACTIVE WRITE SUSPENDED Fri Oct 12 15:10:21 2018

Module um1 ACTIVE Fri Oct 12 15:10:11 2018
Module um2 ACTIVE Fri Oct 12 15:09:58 2018
Module pm1 ACTIVE Fri Oct 12 15:09:48 2018

Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'um1'
MariaDB ColumnStore Replication Feature is enabled

MariaDB ColumnStore Process statuses

Process Module Status Last Status Change Process ID
------------------ ------ --------------- ------------------------ ----------
ProcessMonitor um1 ACTIVE Fri Oct 12 15:09:31 2018 13686
ServerMonitor um1 ACTIVE Fri Oct 12 15:09:45 2018 14032
DBRMWorkerNode um1 ACTIVE Fri Oct 12 15:09:49 2018 14086
ExeMgr um1 ACTIVE Fri Oct 12 15:10:02 2018 15250
DDLProc um1 WRITE_SUSPEND Fri Oct 12 15:10:09 2018 16028
DMLProc um1 WRITE_SUSPEND Fri Oct 12 15:10:21 2018 16635
mysqld um1 ACTIVE Fri Oct 12 15:10:14 2018 13987

ProcessMonitor um2 ACTIVE Fri Oct 12 15:09:34 2018 8952
ServerMonitor um2 ACTIVE Fri Oct 12 15:09:49 2018 9316
DBRMWorkerNode um2 ACTIVE Fri Oct 12 15:10:11 2018 9352
ExeMgr um2 ACTIVE Fri Oct 12 15:10:01 2018 9754
DDLProc um2 COLD_STANDBY Fri Oct 12 15:09:58 2018
DMLProc um2 COLD_STANDBY Fri Oct 12 15:09:58 2018
mysqld um2 ACTIVE Fri Oct 12 15:09:57 2018 9253

ProcessMonitor pm1 ACTIVE Fri Oct 12 15:08:53 2018 24667
ProcessManager pm1 ACTIVE Fri Oct 12 15:08:59 2018 24725
DBRMControllerNode pm1 ACTIVE Fri Oct 12 15:09:42 2018 25510
ServerMonitor pm1 ACTIVE Fri Oct 12 15:09:42 2018 25547
DBRMWorkerNode pm1 ACTIVE Fri Oct 12 15:09:45 2018 25583
DecomSvr pm1 ACTIVE Fri Oct 12 15:09:46 2018 25686
PrimProc pm1 ACTIVE Fri Oct 12 15:09:54 2018 25754
WriteEngineServer pm1 WRITE_SUSPEND Fri Oct 12 15:09:55 2018 25797

Active Alarm Counts: Critical = 0, Major = 0, Minor = 0, Warning = 0, Info = 0
mcsadmin>

resume then sets DDLProc and DMLProc to ACTIVE on both um1 and um2:

mcsadmin> resume
resumedatabasewrites Fri Oct 12 15:12:30 2018

This command resumes the DDL/DML writes to the MariaDB ColumnStore Database
Do you want to proceed: (y or n) [n]: y

Resume MariaDB ColumnStore Database Writes Request successfully completed
mcsadmin> getsystemi
getsysteminfo Fri Oct 12 15:12:43 2018

System columnstore-1

System and Module statuses

Component Status Last Status Change
------------ -------------------------- ------------------------
System ACTIVE Fri Oct 12 15:12:32 2018

Module um1 ACTIVE Fri Oct 12 15:10:11 2018
Module um2 ACTIVE Fri Oct 12 15:09:58 2018
Module pm1 ACTIVE Fri Oct 12 15:09:48 2018

Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'um1'
MariaDB ColumnStore Replication Feature is enabled

MariaDB ColumnStore Process statuses

Process Module Status Last Status Change Process ID
------------------ ------ --------------- ------------------------ ----------
ProcessMonitor um1 ACTIVE Fri Oct 12 15:09:31 2018 13686
ServerMonitor um1 ACTIVE Fri Oct 12 15:09:45 2018 14032
DBRMWorkerNode um1 ACTIVE Fri Oct 12 15:09:49 2018 14086
ExeMgr um1 ACTIVE Fri Oct 12 15:10:02 2018 15250
DDLProc um1 ACTIVE Fri Oct 12 15:12:32 2018 16028
DMLProc um1 ACTIVE Fri Oct 12 15:12:32 2018 16635
mysqld um1 ACTIVE Fri Oct 12 15:10:14 2018 13987

ProcessMonitor um2 ACTIVE Fri Oct 12 15:09:34 2018 8952
ServerMonitor um2 ACTIVE Fri Oct 12 15:09:49 2018 9316
DBRMWorkerNode um2 ACTIVE Fri Oct 12 15:10:11 2018 9352
ExeMgr um2 ACTIVE Fri Oct 12 15:10:01 2018 9754
DDLProc um2 ACTIVE Fri Oct 12 15:12:32 2018
DMLProc um2 ACTIVE Fri Oct 12 15:12:32 2018
mysqld um2 ACTIVE Fri Oct 12 15:09:57 2018 9253

ProcessMonitor pm1 ACTIVE Fri Oct 12 15:08:53 2018 24667
ProcessManager pm1 ACTIVE Fri Oct 12 15:08:59 2018 24725
DBRMControllerNode pm1 ACTIVE Fri Oct 12 15:09:42 2018 25510
ServerMonitor pm1 ACTIVE Fri Oct 12 15:09:42 2018 25547
DBRMWorkerNode pm1 ACTIVE Fri Oct 12 15:09:45 2018 25583
DecomSvr pm1 ACTIVE Fri Oct 12 15:09:46 2018 25686
PrimProc pm1 ACTIVE Fri Oct 12 15:09:54 2018 25754
WriteEngineServer pm1 ACTIVE Fri Oct 12 15:12:32 2018 25797

Active Alarm Counts: Critical = 0, Major = 0, Minor = 0, Warning = 0, Info = 0
mcsadmin>

Comment by David Hill (Inactive) [ 2018-10-12 ]

Workaround: after the resumedatabasewrites command, run

mcsadmin restartsystem y

Comment by David Hill (Inactive) [ 2018-11-27 ]

https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/647

Comment by David Hill (Inactive) [ 2018-11-27 ]

run suspend and resume

before fix
DDLProc um2 ACTIVE Fri Oct 12 15:12:32 2018
DMLProc um2 ACTIVE Fri Oct 12 15:12:32 2018

after fix
DDLProc um2 COLD_STANDBY Fri Oct 12 15:12:32 2018
DMLProc um2 COLD_STANDBY Fri Oct 12 15:12:32 2018
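
The before/after comparison above amounts to one invariant: DDLProc and DMLProc are each ACTIVE on at most one module. Below is a minimal sketch of that check against getsysteminfo process rows. The helper names are hypothetical, and the parsing assumes each row begins with the process name, module, and status separated by whitespace, as in the output pasted in this ticket.

```python
BEFORE_FIX = """\
DDLProc um1 ACTIVE Fri Oct 12 15:12:32 2018 16028
DMLProc um1 ACTIVE Fri Oct 12 15:12:32 2018 16635
DDLProc um2 ACTIVE Fri Oct 12 15:12:32 2018
DMLProc um2 ACTIVE Fri Oct 12 15:12:32 2018
"""

AFTER_FIX = """\
DDLProc um1 ACTIVE Tue Nov 27 19:13:54 2018 5967
DMLProc um1 ACTIVE Tue Nov 27 19:13:58 2018 6000
DDLProc um2 COLD_STANDBY Tue Nov 27 19:13:52 2018
DMLProc um2 COLD_STANDBY Tue Nov 27 19:13:52 2018
"""


def process_states(getsysteminfo_text, names=("DDLProc", "DMLProc")):
    """Collect {process: {module: status}} from getsysteminfo process rows."""
    states = {name: {} for name in names}
    for line in getsysteminfo_text.splitlines():
        fields = line.split()
        # Assumed row layout: <process> <module> <status> <timestamp...> [pid]
        if len(fields) >= 3 and fields[0] in states:
            states[fields[0]][fields[1]] = fields[2]
    return states


def single_active(getsysteminfo_text, names=("DDLProc", "DMLProc")):
    """True when each named process is ACTIVE on at most one module."""
    states = process_states(getsysteminfo_text, names)
    return all(
        sum(status == "ACTIVE" for status in by_module.values()) <= 1
        for by_module in states.values()
    )


if __name__ == "__main__":
    print("before fix:", single_active(BEFORE_FIX))
    print("after fix:", single_active(AFTER_FIX))
```

The check uses "at most one" rather than "exactly one" because no DDLProc/DMLProc is ACTIVE while writes are suspended (um1 shows WRITE_SUSPEND and um2 COLD_STANDBY).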

Comment by David Hill (Inactive) [ 2018-11-27 ]

showing full run with fix

mcsadmin> getsystemi
getsysteminfo Tue Nov 27 19:14:40 2018

System columnstore-1

System and Module statuses

Component Status Last Status Change
------------ -------------------------- ------------------------
System ACTIVE Tue Nov 27 19:14:00 2018

Module um1 ACTIVE Tue Nov 27 19:13:57 2018
Module um2 ACTIVE Tue Nov 27 19:13:52 2018
Module pm1 ACTIVE Tue Nov 27 19:13:36 2018
Module pm2 ACTIVE Tue Nov 27 19:13:48 2018

Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'um1'
MariaDB ColumnStore Replication Feature is enabled

MariaDB ColumnStore Process statuses

Process Module Status Last Status Change Process ID
------------------ ------ --------------- ------------------------ ----------
ProcessMonitor um1 ACTIVE Tue Nov 27 19:13:19 2018 3240
ServerMonitor um1 ACTIVE Tue Nov 27 19:13:33 2018 3586
DBRMWorkerNode um1 ACTIVE Tue Nov 27 19:13:34 2018 3640
ExeMgr um1 ACTIVE Tue Nov 27 19:13:50 2018 5930
DDLProc um1 ACTIVE Tue Nov 27 19:13:54 2018 5967
DMLProc um1 ACTIVE Tue Nov 27 19:13:58 2018 6000
mysqld um1 ACTIVE Tue Nov 27 19:13:51 2018 3541

ProcessMonitor um2 ACTIVE Tue Nov 27 19:13:21 2018 3367
ServerMonitor um2 ACTIVE Tue Nov 27 19:13:37 2018 3733
DBRMWorkerNode um2 ACTIVE Tue Nov 27 19:13:38 2018 3769
ExeMgr um2 ACTIVE Tue Nov 27 19:13:50 2018 5319
DDLProc um2 COLD_STANDBY Tue Nov 27 19:13:52 2018
DMLProc um2 COLD_STANDBY Tue Nov 27 19:13:52 2018
mysqld um2 ACTIVE Tue Nov 27 19:13:48 2018 3668

ProcessMonitor pm1 ACTIVE Tue Nov 27 19:12:38 2018 4021
ProcessManager pm1 ACTIVE Tue Nov 27 19:12:45 2018 4091
DBRMControllerNode pm1 ACTIVE Tue Nov 27 19:13:28 2018 5173
ServerMonitor pm1 ACTIVE Tue Nov 27 19:13:30 2018 5194
DBRMWorkerNode pm1 ACTIVE Tue Nov 27 19:13:30 2018 5265
DecomSvr pm1 ACTIVE Tue Nov 27 19:13:34 2018 5436
PrimProc pm1 ACTIVE Tue Nov 27 19:13:36 2018 5533
WriteEngineServer pm1 ACTIVE Tue Nov 27 19:13:37 2018 5601

ProcessMonitor pm2 ACTIVE Tue Nov 27 19:13:21 2018 2638
ProcessManager pm2 HOT_STANDBY Tue Nov 27 19:13:22 2018 2683
DBRMControllerNode pm2 COLD_STANDBY Tue Nov 27 19:13:38 2018
ServerMonitor pm2 ACTIVE Tue Nov 27 19:13:41 2018 2717
DBRMWorkerNode pm2 ACTIVE Tue Nov 27 19:13:42 2018 2755
DecomSvr pm2 ACTIVE Tue Nov 27 19:13:46 2018 2771
PrimProc pm2 ACTIVE Tue Nov 27 19:13:48 2018 2784
WriteEngineServer pm2 ACTIVE Tue Nov 27 19:13:49 2018 2795

Active Alarm Counts: Critical = 0, Major = 0, Minor = 0, Warning = 0, Info = 0
mcsadmin> suspendd
suspenddatabasewrites Tue Nov 27 19:15:13 2018

This command suspends the DDL/DML writes to the MariaDB ColumnStore Database
Do you want to proceed: (y or n) [n]: y

Suspend MariaDB Columnstore Database Writes Request successfully completed
mcsadmin> getsystemi
getsysteminfo Tue Nov 27 19:15:21 2018

System columnstore-1

System and Module statuses

Component Status Last Status Change
------------ -------------------------- ------------------------
System ACTIVE WRITE SUSPENDED Tue Nov 27 19:14:00 2018

Module um1 ACTIVE Tue Nov 27 19:13:57 2018
Module um2 ACTIVE Tue Nov 27 19:13:52 2018
Module pm1 ACTIVE Tue Nov 27 19:13:36 2018
Module pm2 ACTIVE Tue Nov 27 19:13:48 2018

Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'um1'
MariaDB ColumnStore Replication Feature is enabled

MariaDB ColumnStore Process statuses

Process Module Status Last Status Change Process ID
------------------ ------ --------------- ------------------------ ----------
ProcessMonitor um1 ACTIVE Tue Nov 27 19:13:19 2018 3240
ServerMonitor um1 ACTIVE Tue Nov 27 19:13:33 2018 3586
DBRMWorkerNode um1 ACTIVE Tue Nov 27 19:13:34 2018 3640
ExeMgr um1 ACTIVE Tue Nov 27 19:13:50 2018 5930
DDLProc um1 WRITE_SUSPEND Tue Nov 27 19:13:54 2018 5967
DMLProc um1 WRITE_SUSPEND Tue Nov 27 19:13:58 2018 6000
mysqld um1 ACTIVE Tue Nov 27 19:13:51 2018 3541

ProcessMonitor um2 ACTIVE Tue Nov 27 19:13:21 2018 3367
ServerMonitor um2 ACTIVE Tue Nov 27 19:13:37 2018 3733
DBRMWorkerNode um2 ACTIVE Tue Nov 27 19:13:38 2018 3769
ExeMgr um2 ACTIVE Tue Nov 27 19:13:50 2018 5319
DDLProc um2 COLD_STANDBY Tue Nov 27 19:13:52 2018
DMLProc um2 COLD_STANDBY Tue Nov 27 19:13:52 2018
mysqld um2 ACTIVE Tue Nov 27 19:13:48 2018 3668

ProcessMonitor pm1 ACTIVE Tue Nov 27 19:12:38 2018 4021
ProcessManager pm1 ACTIVE Tue Nov 27 19:12:45 2018 4091
DBRMControllerNode pm1 ACTIVE Tue Nov 27 19:13:28 2018 5173
ServerMonitor pm1 ACTIVE Tue Nov 27 19:13:30 2018 5194
DBRMWorkerNode pm1 ACTIVE Tue Nov 27 19:13:30 2018 5265
DecomSvr pm1 ACTIVE Tue Nov 27 19:13:34 2018 5436
PrimProc pm1 ACTIVE Tue Nov 27 19:13:36 2018 5533
WriteEngineServer pm1 WRITE_SUSPEND Tue Nov 27 19:13:37 2018 5601

ProcessMonitor pm2 ACTIVE Tue Nov 27 19:13:21 2018 2638
ProcessManager pm2 HOT_STANDBY Tue Nov 27 19:13:22 2018 2683
DBRMControllerNode pm2 COLD_STANDBY Tue Nov 27 19:13:38 2018
ServerMonitor pm2 ACTIVE Tue Nov 27 19:13:41 2018 2717
DBRMWorkerNode pm2 ACTIVE Tue Nov 27 19:13:42 2018 2755
DecomSvr pm2 ACTIVE Tue Nov 27 19:13:46 2018 2771
PrimProc pm2 ACTIVE Tue Nov 27 19:13:48 2018 2784
WriteEngineServer pm2 WRITE_SUSPEND Tue Nov 27 19:13:49 2018 2795

Active Alarm Counts: Critical = 0, Major = 0, Minor = 0, Warning = 0, Info = 0
mcsadmin> resumed
resumedatabasewrites Tue Nov 27 19:15:29 2018

This command resumes the DDL/DML writes to the MariaDB ColumnStore Database
Do you want to proceed: (y or n) [n]: y

Resume MariaDB ColumnStore Database Writes Request successfully completed
mcsadmin> getsystemi
getsysteminfo Tue Nov 27 19:15:34 2018

System columnstore-1

System and Module statuses

Component Status Last Status Change
------------ -------------------------- ------------------------
System ACTIVE Tue Nov 27 19:15:30 2018

Module um1 ACTIVE Tue Nov 27 19:13:57 2018
Module um2 ACTIVE Tue Nov 27 19:13:52 2018
Module pm1 ACTIVE Tue Nov 27 19:13:36 2018
Module pm2 ACTIVE Tue Nov 27 19:13:48 2018

Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'um1'
MariaDB ColumnStore Replication Feature is enabled

MariaDB ColumnStore Process statuses

Process Module Status Last Status Change Process ID
------------------ ------ --------------- ------------------------ ----------
ProcessMonitor um1 ACTIVE Tue Nov 27 19:13:19 2018 3240
ServerMonitor um1 ACTIVE Tue Nov 27 19:13:33 2018 3586
DBRMWorkerNode um1 ACTIVE Tue Nov 27 19:13:34 2018 3640
ExeMgr um1 ACTIVE Tue Nov 27 19:13:50 2018 5930
DDLProc um1 ACTIVE Tue Nov 27 19:13:54 2018 5967
DMLProc um1 ACTIVE Tue Nov 27 19:13:58 2018 6000
mysqld um1 ACTIVE Tue Nov 27 19:13:51 2018 3541

ProcessMonitor um2 ACTIVE Tue Nov 27 19:13:21 2018 3367
ServerMonitor um2 ACTIVE Tue Nov 27 19:13:37 2018 3733
DBRMWorkerNode um2 ACTIVE Tue Nov 27 19:13:38 2018 3769
ExeMgr um2 ACTIVE Tue Nov 27 19:13:50 2018 5319
DDLProc um2 COLD_STANDBY Tue Nov 27 19:13:52 2018
DMLProc um2 COLD_STANDBY Tue Nov 27 19:13:52 2018
mysqld um2 ACTIVE Tue Nov 27 19:13:48 2018 3668

ProcessMonitor pm1 ACTIVE Tue Nov 27 19:12:38 2018 4021
ProcessManager pm1 ACTIVE Tue Nov 27 19:12:45 2018 4091
DBRMControllerNode pm1 ACTIVE Tue Nov 27 19:13:28 2018 5173
ServerMonitor pm1 ACTIVE Tue Nov 27 19:13:30 2018 5194
DBRMWorkerNode pm1 ACTIVE Tue Nov 27 19:13:30 2018 5265
DecomSvr pm1 ACTIVE Tue Nov 27 19:13:34 2018 5436
PrimProc pm1 ACTIVE Tue Nov 27 19:13:36 2018 5533
WriteEngineServer pm1 ACTIVE Tue Nov 27 19:13:37 2018 5601

ProcessMonitor pm2 ACTIVE Tue Nov 27 19:13:21 2018 2638
ProcessManager pm2 HOT_STANDBY Tue Nov 27 19:13:22 2018 2683
DBRMControllerNode pm2 COLD_STANDBY Tue Nov 27 19:13:38 2018
ServerMonitor pm2 ACTIVE Tue Nov 27 19:13:41 2018 2717
DBRMWorkerNode pm2 ACTIVE Tue Nov 27 19:13:42 2018 2755
DecomSvr pm2 ACTIVE Tue Nov 27 19:13:46 2018 2771
PrimProc pm2 ACTIVE Tue Nov 27 19:13:48 2018 2784
WriteEngineServer pm2 ACTIVE Tue Nov 27 19:13:49 2018 2795

Active Alarm Counts: Critical = 0, Major = 0, Minor = 0, Warning = 0, Info = 0
mcsadmin>

Comment by Daniel Lee (Inactive) [ 2018-12-20 ]

Build verified: 1.1.7-1 nightly

For 1.2.3-1, the fix is not yet in the 1.2.2-1 nightly build, nor in the "develop" branch. Will retest after code merge later.

Comment by Daniel Lee (Inactive) [ 2018-12-21 ]

Build verified: Github source, develop branch

[root@localhost ~]# cat gitInfo.log
/root/columnstore/mariadb-columnstore-server
commit 797eb854d840043f415610e707a80c08f16c43b2
Merge: c84cbab cdd5705
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Thu Dec 20 22:58:10 2018 +0000

Merge pull request #146 from mariadb-corporation/MCOL-2007

MCOL-2007: add gitversionServer file to builds/packages

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 5082b01078177755f7ffacb8d539655a10abcfb8
Merge: 453850e e4ee109
Author: benthompson15 <ben.thompson@mariadb.com>
Date: Thu Dec 20 12:55:34 2018 -0800

Merge pull request #663 from mariadb-corporation/1.1-merge-up-2018-12-20

Merge develop-1.1 into develop
