[MCOL-5306] Broken connections in mariadb while primary node down. failover related. Created: 2022-11-11  Updated: 2022-12-28  Resolved: 2022-12-13

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 22.08.1, 6.4.1
Fix Version/s: 22.08.7

Type: Bug Priority: Blocker
Reporter: Alan Mologorsky Assignee: Gagan Goel (Inactive)
Resolution: Fixed Votes: 0
Labels: cluster, stability

Attachments: File a0001_failover.result    
Issue Links:
Blocks
blocks MCOL-5293 Replication not working after failove... Closed
Assigned for Review: Roman Roman
Assigned for Testing: Daniel Lee (Inactive)

 Description   

Environment: 3-node cluster, Docker Compose.
Steps to reproduce:

  • Stop the primary node.
  • Wait for failover to complete.
  • Select previously existing data on the new primary node. The query fails, and debug.log shows errors like the following:

    tail -f /var/log/mariadb/columnstore/debug.log
    Nov 11 12:35:58 mcs2 controllernode[409]: 58.561877 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::getSystemState() failed (network)       %%10%%
    Nov 11 12:35:59 mcs2 messagequeue[409]: 59.588469 |0|0|0| E 31 CAL0000: messageqcpp::hostnameResolver Name or service not known         %%10%%
    Nov 11 12:36:01 mcs2 messagequeue[409]: 01.642554 |0|0|0| E 31 CAL0000: messageqcpp::hostnameResolver Name or service not known         %%10%%
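
To spot this failure quickly in a long debug.log, the two error signatures above can be matched mechanically. The following is a minimal Python sketch, not part of ColumnStore: the line layout in LINE_RE is inferred only from the sample entries in this ticket, and the names SIGNATURES, classify, and summarize are hypothetical.

```python
import re

# Layout inferred from the sample entries in this ticket (assumption, not an
# official format): syslog-style prefix, sub-second stamp, |n|n|n| ids,
# severity letter, numeric subsystem id, CAL code, message, optional %%10%%.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+"
    r"(?P<proc>[\w-]+)\[(?P<pid>\d+)\]:\s+\S+\s+\|\d+\|\d+\|\d+\|\s+"
    r"(?P<level>[A-Z])\s+\d+\s+(?P<code>CAL\d+):\s+"
    r"(?P<msg>.*?)\s*(?:%%10%%)?\s*$"
)

# Substrings taken verbatim from the log excerpt in the description.
SIGNATURES = {
    "dbrm_network": "SessionManager::getSystemState() failed (network)",
    "name_resolution": "hostnameResolver Name or service not known",
}


def classify(line):
    """Return (signature, host, process) for a known error line, else None."""
    match = LINE_RE.match(line.strip())
    if not match or match.group("level") != "E":
        return None
    for name, needle in SIGNATURES.items():
        if needle in match.group("msg"):
            return name, match.group("host"), match.group("proc")
    return None


def summarize(lines):
    """Count how often each known signature occurs in an iterable of lines."""
    counts = {}
    for line in lines:
        hit = classify(line)
        if hit:
            counts[hit[0]] = counts.get(hit[0], 0) + 1
    return counts
```

Passing an open file object for /var/log/mariadb/columnstore/debug.log to summarize() gives a per-signature count; lines that do not match the assumed layout are ignored rather than reported.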
    

After manually restarting mariadbd, there are no errors and SELECT/INSERT work as expected.
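
"Name or service not known" is the message glibc reports when getaddrinfo fails with EAI_NONAME, so the hostnameResolver errors indicate that some configured hostname could not be resolved from the new primary. This ticket does not say which hostname was being looked up. As a diagnostic aid only, here is a minimal Python sketch that reports which node names fail to resolve from inside a container. The function name unresolved_hosts and its resolver parameter are hypothetical; mcs1 and mcs2 appear in this ticket, while mcs3 is an assumed name for the third node.

```python
import socket


def unresolved_hosts(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames for which name resolution fails.

    `resolver` defaults to socket.getaddrinfo and is injectable so the
    check can be exercised without touching real DNS.
    """
    failed = []
    for name in hostnames:
        try:
            resolver(name, None)
        except socket.gaierror:
            # Raised for resolution failures such as EAI_NONAME.
            failed.append(name)
    return failed


if __name__ == "__main__":
    # mcs3 is an assumed node name; adjust to the actual cluster.
    print(unresolved_hosts(["mcs1", "mcs2", "mcs3"]))
```

Running this on the new primary before and after the mariadbd restart would show whether the set of resolvable names changes, which is only one possible line of investigation.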



 Comments   
Comment by Daniel Lee (Inactive) [ 2022-12-13 ]

Build verified: 22.08.7
engine: e243a5332b8613ce0e370a503461990fefc24fce
server: d3049350bb5c61340f5a7518b155d3c9dacdcb33
buildNo: 6202
Executed the test case advance.a000_failover.test in mustest.
Steps performed:
echo Checking MaxScale status......
echo Checking ColumnStore status on mcs1......
echo Running sanity test on mcs1......
echo Checking ColumnStore status on mcs1......
echo Stopping node mcs1......
echo Checking MaxScale status......
echo Checking ColumnStore status on mcs2......
echo Starting node mcs1......
echo Checking MaxScale status......
echo Checking ColumnStore status on mcs1......
echo Create a 1g DBT2 database on mcs2......
echo Check row counts on mcs1 for replication......
echo Drop test database......
echo Ending of test.
The test output has been attached.
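
When re-checking a run, one quick sanity check is that every step listed above appears in the output in the same order. The following is a minimal Python sketch under one assumption: that the echo messages appear verbatim in the result file, which has not been confirmed against the attachment. EXPECTED_STEPS copies the messages from this comment; missing_steps is a hypothetical helper, not part of the test suite.

```python
# Step messages copied from the echo lines in this comment.
EXPECTED_STEPS = [
    "Checking MaxScale status......",
    "Checking ColumnStore status on mcs1......",
    "Running sanity test on mcs1......",
    "Checking ColumnStore status on mcs1......",
    "Stopping node mcs1......",
    "Checking MaxScale status......",
    "Checking ColumnStore status on mcs2......",
    "Starting node mcs1......",
    "Checking MaxScale status......",
    "Checking ColumnStore status on mcs1......",
    "Create a 1g DBT2 database on mcs2......",
    "Check row counts on mcs1 for replication......",
    "Drop test database......",
    "Ending of test.",
]


def missing_steps(output_text, expected=EXPECTED_STEPS):
    """Return the expected messages that do not occur, in order, in the output.

    The search position only advances past messages that were found, so
    repeated messages (for example the MaxScale checks) are matched in turn.
    """
    position = 0
    missing = []
    for step in expected:
        found = output_text.find(step, position)
        if found == -1:
            missing.append(step)
        else:
            position = found + len(step)
    return missing
```

An empty list means all steps were reached in order; a non-empty list starts at the first step the run did not get to.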

Generated at Thu Feb 08 02:56:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.