[MCOL-1498] Installation failed in Replication Distribution command Created: 2018-06-25  Updated: 2023-10-26  Resolved: 2018-08-14

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.1.5
Fix Version/s: 1.1.6

Type: Bug Priority: Major
Reporter: David Hill (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

1um 1pm local query yum package non-distribution install, configured with ssh keys


Sprint: 2018-15, 2018-16

 Description   

ColumnStore postConfigure install fails when setting up MariaDB replication.
It did not happen on a binary or rpm distributed install, but it did happen on a yum non-distributed install.
The system was configured with ssh keys between the pm and um nodes, in both directions.

----- Starting MariaDB ColumnStore on local server -----

MariaDB ColumnStore successfully started

MariaDB ColumnStore Database Platform Starting, please wait ............ DONE

System Catalog Successfully Created

Run MariaDB ColumnStore Replication Setup..
ERROR: Error return in running the MariaDB ColumnStore Master DB Distribute, check /tmp/master-dist*.logs on um1

MariaDB ColumnStore Replication Setup Failed, check logs

IMPORTANT: Once issue has been resolved, rerun postConfigure

Here is the bug: the password argument (ssh) is missing from this command, which is written to the debug log:

cmd = /usr/local/mariadb/columnstore/bin/rsync.sh 10.128.0.3 /usr/local/mariadb/columnstore 1 > /tmp/master-dist_pm1.log

Here is the same command when a binary or rpm distributed install works:

cmd = /usr/local/mariadb/columnstore/bin/rsync.sh 10.128.0.3 ssh /usr/local/mariadb/columnstore 1 > /tmp/master-dist_pm1.log
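
To illustrate why the missing word matters, here is a minimal sketch. It assumes the script reads its arguments positionally in the order the working command suggests (host, password, install directory, trailing flag). The function describe_args is a hypothetical stand-in, not the real rsync.sh, and the meaning of the trailing 1 is not stated in this ticket.

```shell
#!/bin/sh
# Hypothetical stand-in for a script that reads four positional arguments
# in the order the working command suggests: host, password, install dir, flag.
# This is NOT the real rsync.sh; it only prints how each slot would be filled.
describe_args() {
    echo "host=${1:-} password=${2:-} installdir=${3:-} flag=${4:-}"
}

# Working form: the literal word "ssh" fills the password slot.
describe_args 10.128.0.3 ssh /usr/local/mariadb/columnstore 1

# Failing form: the password slot is missing, so every later argument
# shifts one position to the left and the last slot is left empty.
describe_args 10.128.0.3 /usr/local/mariadb/columnstore 1
```

Under that assumption, the failing form would treat the install directory as the password and 1 as the install directory, which is consistent with the distribute step returning an error.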

This error was reported by a customer and diagnosed by development on the customer's system. It has been reproduced in house.

At this time, it is not known whether this is a yum issue or a non-distributed install issue. Further testing is required.



 Comments   
Comment by David Hill (Inactive) [ 2018-06-25 ]

An rpm tar package non-distributed install worked:

cmd = /usr/local/mariadb/columnstore/bin/rsync.sh 10.128.0.3 ssh /usr/local/mariadb/columnstore 1 > /tmp/master-dist_pm1.log

Comment by David Hill (Inactive) [ 2018-08-02 ]

https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/532

Comment by Daniel Lee (Inactive) [ 2018-08-10 ]

Build tested: 1.1.6-1

/root/columnstore/mariadb-columnstore-server
commit 513775738f72ec990d055a5d47e2511e3c0e34dd
Merge: 3c37210 9236098
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Wed Jul 18 09:37:17 2018 +0100

Merge pull request #123 from drrtuy/MCOL-970

MCOL-970 Slow query log now contains original query even in vtable mode

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 150171b714c16bd91ef620ea483f6200ad775038
Merge: 1068679 8a42949
Author: benthompson15 <ben.thompson@mariadb.com>
Date: Thu Aug 9 17:49:01 2018 -0500

Merge pull request #535 from mariadb-corporation/MCOL-1605

MCOL-1605 - changed error to debug, alarms trying to get issued befor…

Unable to reproduce the issue in 1.1.5-1

Tried many test configurations with replication and local query enabled, using ssh keys.
Tested rpm, bin, and repo download installations and did not see the issue.

Will wait for a reproducible test case and then try again.

Comment by Daniel Lee (Inactive) [ 2018-08-14 ]

Build tested: 1.1.6-1 source

/root/columnstore/mariadb-columnstore-server
commit 513775738f72ec990d055a5d47e2511e3c0e34dd
Merge: 3c37210 9236098
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Wed Jul 18 09:37:17 2018 +0100

Merge pull request #123 from drrtuy/MCOL-970

MCOL-970 Slow query log now contains original query even in vtable mode

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 150171b714c16bd91ef620ea483f6200ad775038
Merge: 1068679 8a42949
Author: benthompson15 <ben.thompson@mariadb.com>
Date: Thu Aug 9 17:49:01 2018 -0500

Merge pull request #535 from mariadb-corporation/MCOL-1605

MCOL-1605 - changed error to debug, alarms trying to get issued befor…

Finally reproduced the case where replication failed in 1.1.5-1

On a 1um2pm stack, manually did a local query (replication enabled) installation with ssh keys set up. postConfigure did not prompt for a password, and therefore some remote installation commands would fail. It seems this affected non-distributed installations only, since distributed installations, with or without local query, worked.

I verified the same tests worked in 1.1.6-1.

I could not reproduce the issue previously because the -p password option was used for postConfigure.
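
As a rough illustration of the failure mode (no password collected, so the remote command is built without one), the sketch below shows one way a caller could guard the password slot. The helper build_sync_args and the fallback to the word ssh are assumptions made for illustration only. They are not the change made in the pull request, which this ticket describes only as adding a password prompt.

```shell
#!/bin/sh
# Hypothetical helper: builds the argument list for a remote-sync script.
# An empty password is replaced by the literal word "ssh", the value the
# ticket's working command carries when key-based login is configured.
build_sync_args() {
    host=$1
    password=$2
    installdir=$3
    # ${password:-ssh} expands to "ssh" when password is unset or empty,
    # so the password slot is never dropped from the argument list.
    echo "$host ${password:-ssh} $installdir 1"
}

# No password collected (for example, no -p option and no prompt shown).
build_sync_args 10.128.0.3 "" /usr/local/mariadb/columnstore

# Explicit password supplied ("secret" is a placeholder value).
build_sync_args 10.128.0.3 secret /usr/local/mariadb/columnstore
```

With the guard in place, the first call yields the same argument order as the working command quoted in the description.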

In the ticket, all we have are the symptoms of the issue the customer encountered. The pull request has a comment about adding a prompt for the password. Investigation of the issue should have narrowed down what the issue really was. Additional information on findings, root cause, and how the issue was fixed would have helped testing tremendously.

Generated at Thu Feb 08 02:29:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.