[MCOL-3660] DBAAS: ColumnStore single-node system is out of service (not read/write capable), but its mcs system status remains ACTIVE and pod recovery is not initiated
Created: 2019-12-10  Updated: 2023-03-06  Resolved: 2023-03-06

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr, PrimProc
Affects Version/s: 1.4.1
Fix Version/s: Icebox

Type: Bug Priority: Critical
Reporter: Zdravelina Sokolovska (Inactive) Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: SkySQL
Environment:

GKE
columnstore.image=mariadb/enterprise-columnstore:1.4.1-1


Attachments: columnstoreSupportReport.columnstore-1.tar.gz

 Description   

DBAAS: ColumnStore single-node system is out of service (not read/write capable), but its mcs system status remains ACTIVE and pod recovery is not initiated.

How to repeat:
1. Spin up a Kubernetes single-node ColumnStore topology.
2. Repeatedly kill PrimProc from inside the PM pod's container.
ColumnStore goes out of service, but the MCS system status remains ACTIVE and recovery of the PM pod is not triggered.
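The ticket does not say how PrimProc was killed. A minimal sketch of step 2, assuming `pkill` is available in the PM container and that SIGKILL is an acceptable stand-in for "killing"; the helper name `kill_repeatedly`, the round count and the interval are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical reproduction helper for step 2 (not the reporter's actual script).
# Sends SIGKILL to every process whose name is exactly $1, $2 times, pausing
# $3 seconds between rounds, and prints how many rounds found a target.
# Assumption: pkill (procps or busybox) is available inside the PM container.
kill_repeatedly() {
  local name="$1" rounds="${2:-10}" interval="${3:-5}"
  local i killed=0
  for ((i = 0; i < rounds; i++)); do
    if pkill -KILL -x "$name"; then
      killed=$((killed + 1))
    fi
    sleep "$interval"
  done
  echo "$killed"
}
```

Inside the pod, `kill_repeatedly PrimProc 20 5` followed by `mcsadmin getsysteminfo` should show whether PrimProc and ExeMgr are left offline; the 20 rounds and 5-second pause are arbitrary choices.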

MariaDB [(none)]> select count(*) from  tpcds_100.web_site ;
ERROR 1815 (HY000): Internal error: IDB-2004: Cannot connect to ExeMgr.
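The IDB-2004 error shows the read path is down even though `mcsadmin` still reports the system as ACTIVE. A hedged sketch of a query-based probe that would catch this, assuming the `mysql` command-line client with credentials taken from its default option files; `probe_columnstore_read` and the `MYSQL_CLIENT` override are hypothetical names, and the table to query (for example `tpcds_100.web_site`) is the operator's choice:

```shell
#!/usr/bin/env bash
# Hypothetical read-path probe: runs a COUNT(*) against a ColumnStore table and
# fails if the client reports an error such as IDB-2004 (ExeMgr unreachable).
# Assumptions: the client is `mysql` unless MYSQL_CLIENT overrides it, and the
# client exits non-zero when the query fails.
probe_columnstore_read() {
  local table="$1" out
  # -N suppresses the column header, -e runs a single statement and exits.
  if ! out=$("${MYSQL_CLIENT:-mysql}" -N -e "SELECT COUNT(*) FROM ${table}" 2>&1); then
    echo "read probe failed: ${out}" >&2
    return 1
  fi
  echo "read probe ok: ${out}"
}
```

For example, `probe_columnstore_read tpcds_100.web_site` would be expected to fail in the state described here, while `mcsadmin getsysteminfo` still reports the System row as ACTIVE.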

1. Spin up a Kubernetes single-node ColumnStore topology and check that mcs is operational:

[root@expmcsrcc001-mdb-cs-single-0 /]# mcsadmin getsystemi
getsysteminfo   Tue Dec 10 15:42:38 2019
 
System columnstore-1
 
System and Module statuses
 
Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Tue Dec 10 14:20:35 2019
 
Module pm1    ACTIVE                       Tue Dec 10 14:20:33 2019
 
 
MariaDB ColumnStore Process statuses
 
Process             Module    Status            Last Status Change        Process ID
------------------  ------    ---------------   ------------------------  ----------
ProcessMonitor      pm1       ACTIVE            Tue Dec 10 14:19:41 2019          94
ProcessManager      pm1       ACTIVE            Tue Dec 10 14:19:48 2019         214
StorageManager      pm1       ACTIVE            Tue Dec 10 14:19:54 2019         720
DBRMControllerNode  pm1       ACTIVE            Tue Dec 10 14:20:13 2019         850
ServerMonitor       pm1       ACTIVE            Tue Dec 10 14:20:14 2019         870
DBRMWorkerNode      pm1       ACTIVE            Tue Dec 10 14:20:15 2019         890
PrimProc            pm1       ACTIVE            Tue Dec 10 14:20:19 2019         974
ExeMgr              pm1       ACTIVE            Tue Dec 10 14:20:23 2019        1093
WriteEngineServer   pm1       ACTIVE            Tue Dec 10 14:20:27 2019        1182
DDLProc             pm1       ACTIVE            Tue Dec 10 14:20:31 2019        1230
DMLProc             pm1       ACTIVE            Tue Dec 10 14:20:35 2019        1285
mysqld              pm1       ACTIVE            Tue Dec 10 14:19:53 2019         523
 
Active Alarm Counts: Critical = 0, Major = 0, Minor = 0, Warning = 0, Info = 0

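2. After PrimProc has been killed repeatedly, PrimProc is AUTO_OFFLINE and ExeMgr is MAN_OFFLINE, yet the System and Module pm1 rows still report ACTIVE:

mcsadmin getsystemi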
getsysteminfo   Tue Dec 10 15:46:41 2019
 
System columnstore-1
 
System and Module statuses
 
Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Tue Dec 10 15:46:30 2019
 
Module pm1    ACTIVE                       Tue Dec 10 15:46:07 2019
 
 
MariaDB ColumnStore Process statuses
 
Process             Module    Status            Last Status Change        Process ID
------------------  ------    ---------------   ------------------------  ----------
ProcessMonitor      pm1       ACTIVE            Tue Dec 10 14:19:41 2019          94
ProcessManager      pm1       ACTIVE            Tue Dec 10 14:19:48 2019         214
StorageManager      pm1       ACTIVE            Tue Dec 10 14:19:54 2019         720
DBRMControllerNode  pm1       ACTIVE            Tue Dec 10 14:20:13 2019         850
ServerMonitor       pm1       ACTIVE            Tue Dec 10 14:20:14 2019         870
DBRMWorkerNode      pm1       ACTIVE            Tue Dec 10 14:20:15 2019         890
PrimProc            pm1       AUTO_OFFLINE      Tue Dec 10 15:46:04 2019
ExeMgr              pm1       MAN_OFFLINE       Tue Dec 10 15:46:27 2019
WriteEngineServer   pm1       ACTIVE            Tue Dec 10 14:20:27 2019        1182
DDLProc             pm1       ACTIVE            Tue Dec 10 14:20:31 2019        1230
DMLProc             pm1       ACTIVE            Tue Dec 10 14:20:35 2019        1285
mysqld              pm1       ACTIVE            Tue Dec 10 14:19:53 2019         523

The ColumnStore system is out of service:

MariaDB [(none)]> select count(*) from  tpcds_100.web_site ;
ERROR 1815 (HY000): Internal error: IDB-2004: Cannot connect to ExeMgr.
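In the second listing the top-level System row is still ACTIVE while PrimProc is AUTO_OFFLINE and ExeMgr is MAN_OFFLINE, so a check that only looks at the System row never fires. A minimal sketch that inspects every row of the process table instead, assuming the 1.4.1 column layout shown above (name, module, status); `check_processes` is a hypothetical helper, not part of ColumnStore:

```shell
#!/usr/bin/env bash
# Hypothetical health check: reads `mcsadmin getsysteminfo` output on stdin and
# exits 1 if any row of the process table is not ACTIVE, even when the "System"
# row above it still says ACTIVE. Assumes the 1.4.1 layout: name, module, status.
check_processes() {
  awk '
    /^[ \t]*Process[ \t]+Module/ { in_table = 1; next }   # process table header
    in_table && /^[ \t]*-+/      { next }                 # dashed separator line
    in_table && NF == 0          { in_table = 0; next }   # blank line ends the table
    in_table && NF >= 3 {
      if ($3 != "ACTIVE") { print "NOT ACTIVE: " $1 " (" $3 ")"; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Piped as `mcsadmin getsysteminfo | check_processes`, this would exit 0 for the first listing and 1 for the second. Whether such a script is wired into the pod's liveness probe is up to the deployment, which the ticket does not describe.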



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2019-12-10 ]

Can you please attach a ColumnStore support report for this?

Comment by Zdravelina Sokolovska (Inactive) [ 2019-12-10 ]

Attached columnstoreSupportReport.columnstore-1.tar.gz. The log output below was produced while generating the ColumnStore support report:

Get software report data for pm1
Get config report data for pm1
 
Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.
 
      If you want to list systemd services use 'systemctl list-unit-files'.
      To see services enabled on particular target use
      'systemctl list-dependencies [target]'.
 
 
Get log report data for pm1
Get log config data for pm1
Get bulklog report data for pm1
Get hardware report data for pm1
Get resource report data for pm1
Get dbms report data for pm1
ERROR 1815 (HY000) at line 4: Internal error: IDB-2004: Cannot connect to ExeMgr.
ERROR 1815 (HY000) at line 1: Internal error: IDB-2004: Cannot connect to ExeMgr.
 
Columnstore Support Script Successfully completed, files located in columnstoreSupportReport.columnstore-1.tar.gz

Comment by Todd Stoffel (Inactive) [ 2023-03-06 ]

This ticket was created prior to convergence with the server and may be obsolete. If you find this issue still exists in a modern version, please open a new ticket.

Generated at Thu Feb 08 02:44:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.