[MCOL-3494] S3 When postConfigure failed to access online storage, better error msg is needed Created: 2019-09-10  Updated: 2020-09-21  Resolved: 2020-08-31

Status: Closed
Project: MariaDB ColumnStore
Component/s: installation
Affects Version/s: 1.4.0
Fix Version/s: 1.4.5, 5.4.1

Type: Bug Priority: Major
Reporter: Daniel Lee (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-3638 StorageManager logging inconsistency Closed
Sprint: 2020-1, 2020-2, 2020-3, 2020-4, 2020-5, 2020-6, 2020-7

 Description   

Build tested: 1.4.0-1

server commit:
67452bc
engine commit:
64ceb86

I configured storagemanager.cnf to use AWS S3 but set the region incorrectly, using us-west-2b instead of us-west-2. postConfigure returned the messages below.

We should output better messages indicating that ColumnStore failed to connect to online S3 storage.

postConfigure terminal output:

s1pm1: DBRM::send_recv caught: InetStreamSocket::connect: connect() error: Connection refused to: InetStreamSocket: sd: 10 inet: 127.0.0.1 port: 8616
s1pm1: DBRM::send_recv caught: InetStreamSocket::connect: connect() error: Connection refused to: InetStreamSocket: sd: 10 inet: 127.0.0.1 port: 8616
s1pm1: DBRM::send_recv caught: InetStreamSocket::connect: connect() error: Connection refused to: InetStreamSocket: sd: 10 inet: 127.0.0.1 port: 8616

crit.log file:

[root@localhost columnstore]# cat crit.log
Sep 10 14:32:34 localhost workernode[16006]: 34.479403 |0|0|0| C 30 CAL0000: An error occured: Could not open the BRM journal for writing!
Sep 10 14:32:36 localhost ProcessMonitor[15107]: 36.251417 |0|0|0| C 18 CAL0000: *****MariaDB ColumnStore Process Restarting: DBRMWorkerNode, old PID = 16006
Sep 10 14:32:47 localhost ProcessManager[15386]: 47.773078 |0|0|0| C 17 CAL0000: startMgrProcessThread Exit with a failure, error returned from startSystemThread
Sep 10 14:32:48 localhost workernode[16469]: 48.713846 |0|0|0| C 30 CAL0000: An error occured: Could not open the BRM journal for writing!
Sep 10 14:32:48 localhost ProcessMonitor[15107]: 48.732895 |0|0|0| C 18 CAL0000: DBRMControllerNode/15935 failed to init in 20 seconds, force killing it so it can restart

err.log file:

[root@localhost columnstore]# cat err.log
Sep 10 14:32:01 localhost configcpp[15107]: 01.128292 |0|0|0| E 12 SocketPool::getSocket() failed to connect; got 'Connection refused'
Sep 10 14:32:02 localhost configcpp[15107]: 02.128475 |0|0|0| E 12 SocketPool::getSocket() failed to connect; got 'Connection refused'
Sep 10 14:32:03 localhost configcpp[15107]: 03.129094 |0|0|0| E 12 SocketPool::getSocket() failed to connect; got 'Connection refused'
Sep 10 14:32:04 localhost configcpp[15107]: 04.130295 |0|0|0| E 12 SocketPool::getSocket() failed to connect; got 'Connection refused'
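The value used above, us-west-2b, is the name of an availability zone inside the us-west-2 region, not a region. A pre-flight check could flag such a value before any S3 request is made. The following is a minimal, hypothetical sketch: `region_warning` is not part of ColumnStore, and the regular expression is an assumed heuristic for AWS region names (two-letter area, one or more words, then a number). Because S3-compatible services other than AWS may accept arbitrary region strings, it only produces a warning rather than rejecting the value.

```python
import re
from typing import Optional

# Assumed (heuristic) shape of an AWS region name, e.g. "us-west-2" or
# "us-gov-west-1". This is not an official grammar.
_REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-[0-9]+$")
# Availability-zone names append one letter to the region, e.g. "us-west-2b".
_AZ_RE = re.compile(r"^([a-z]{2}(-[a-z]+)+-[0-9]+)[a-z]$")


def region_warning(region: str) -> Optional[str]:
    """Return a user-facing warning for a suspicious region, or None if it looks valid."""
    if _REGION_RE.match(region):
        return None
    az = _AZ_RE.match(region)
    if az:
        return (f"'{region}' looks like an availability zone; "
                f"did you mean region '{az.group(1)}'?")
    return f"'{region}' does not look like an AWS region name (e.g. 'us-west-2')."
```

For example, `region_warning("us-west-2b")` suggests `us-west-2`, and `region_warning("hello")` returns the generic warning.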



 Comments   
Comment by Ben Thompson (Inactive) [ 2019-10-03 ]

There should be a message in err.log that reads:
S3Storage: failed to [S3 operation], check log files for specific error
Possibly it was logged on a different node, but it should refer to a failed S3 connectivity test.
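A tool or administrator could look for that summary line directly instead of reading the socket errors. The following is a minimal sketch, assuming the summary wording is exactly as quoted in this comment (it matches the line in the 2020-08-31 verification log). The function name `failed_s3_operations` is hypothetical, and it takes log lines rather than assuming a log path, since the message may be on another node's err.log.

```python
import re
from typing import Iterable, List

# Summary-line wording as quoted in this issue:
# "S3Storage: failed to <S3 operation>, check log files for specific error"
_SUMMARY_RE = re.compile(r"S3Storage: failed to ([^,]+), check log files for specific error")


def failed_s3_operations(lines: Iterable[str]) -> List[str]:
    """Return the S3 operation named in each S3Storage summary line, in log order."""
    return [m.group(1) for m in map(_SUMMARY_RE.search, lines) if m]
```

The more detailed `S3Storage::putObject(): ...` form does not match, because it has `::` after `S3Storage` rather than `: `.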

Comment by Daniel Lee (Inactive) [ 2019-10-11 ]

postConfigure is an interactive process. Instead of printing internal InetStreamSocket messages, which mean nothing to the user, to the terminal, we should output errors intended for the user so that they know what the issue is.

Comment by Ben Thompson (Inactive) [ 2019-12-03 ]

Some of the log messages were going to the warning log file instead of the error log file. This should be fixed in MCOL-3638.

Comment by Patrick LeBlanc (Inactive) [ 2020-03-17 ]

I thought of an easy way to do this. It would be hard for postConfigure to see whether SM (StorageManager) is complaining while ProcMgr/ProcMon are trying to start the system, but it would be easy for postConfigure to run an access checker directly and get an error code.

So what I'd do is add a command-line option to SM, or write a new program for this purpose (only a few lines, I think), that runs the existing access check (and nothing else) and returns 0 on success and 1 on failure. In postConfigure, run it when the user selects the SM storage option.
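The proposal amounts to running a checker process and translating its exit code into a message for the user. The following is a minimal sketch of the caller's side, assuming a checker that exits 0 on success and 1 on failure, as described above. The checker command itself and the name `check_storage_access` are hypothetical. The timeout is an added assumption, since an S3 check that never returns would otherwise block the interactive prompt.

```python
import subprocess
from typing import List, Optional


def check_storage_access(cmd: List[str], timeout_s: float = 60.0) -> Optional[str]:
    """Run an access-check command; return None on success or a user-facing error.

    `cmd` is the (hypothetical) checker program; it is expected to exit with
    0 on success and non-zero on failure.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError:
        return f"Storage access checker not found: {cmd[0]}"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return (f"Timed out after {timeout_s:g}s trying to reach S3 storage; "
                "check the endpoint, region and network settings in storagemanager.cnf.")
    if result.returncode == 0:
        return None
    # Prefer the checker's own explanation if it printed one.
    detail = result.stderr.strip() or result.stdout.strip()
    msg = "ColumnStore failed to access S3 storage; check the settings in storagemanager.cnf."
    return f"{msg} ({detail})" if detail else msg
```

postConfigure would call this right after the user selects the S3 storage option and print the returned message, if any, instead of the DBRM socket errors.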

Comment by Daniel Lee (Inactive) [ 2020-05-06 ]

Build tested: 1.4.4-1, 1.5.0-1 source

1.4.4-1

/root/ColumnStore/buildColumnstoreFromGithubSource/server
commit d8ff39957275afc7a870487631cd3b3be5eb8818
Author: Rasmus Johansson <razze@iki.fi>
Date: Mon May 4 10:10:07 2020 +0000

MDEV-22273 jUnit patch: xml test result differs from MTR output in case if retry

/root/ColumnStore/buildColumnstoreFromGithubSource/server/engine
commit 4c275557633edb60905faa500990d55a6834951d
Merge: aa054a9 47f2933
Author: David.Hall <david.hall@mariadb.com>
Date: Tue May 5 12:34:03 2020 -0500

Merge pull request #1179 from pleblanc1976/update-libs3-ref

Updated the s3 lib.

1.5.0-1

/root/ColumnStore/buildColumnstoreFromGithubSource/server
commit 43b7e5d29a5480214cee8317a4625b749ccffbaf
Author: Rasmus Johansson <razze@iki.fi>
Date: Mon May 4 10:10:07 2020 +0000

MDEV-22273 jUnit patch: xml test result differs from MTR output in case if retry

/root/ColumnStore/buildColumnstoreFromGithubSource/server/engine
commit 368c4fac059d5cf4f1596e56ea7b1e729e29ec49
Merge: 15a9efa 4bc408c
Author: David.Hall <david.hall@mariadb.com>
Date: Tue May 5 12:35:14 2020 -0500

Merge pull request #1178 from pleblanc1976/update-libs3-ref-1.5

Updated s3 lib ref

I purposely entered invalid S3-related parameters for postConfigure. It hung for over 10 minutes after I selected S3 storage.

I also did another test with only the region invalid. Same behavior.

I also did another test entering all correct parameter values; postConfigure advanced to the next prompt immediately.

[centos7:root~]# ./postConfigure -sm-bucket dleeqadbroot1 -sm-region hello -sm-id ok -sm-secret secret

This is the MariaDB ColumnStore System Configuration and Installation tool.
It will Configure the MariaDB ColumnStore System and will perform a Package
Installation of all of the Servers within the System that is being configured.

IMPORTANT: This tool requires to run on the Performance Module #1

Prompting instructions:

Press 'enter' to accept a value in (), if available or
Enter one of the options within [], if available, or
Enter a new value

===== Setup System Server Type Configuration =====

There are 2 options when configuring the System Server Type: single and multi

'single' - Single-Server install is used when there will only be 1 server configured
on the system. It can also be used for production systems, if the plan is
to stay single-server.

'multi' - Multi-Server install is used when you want to configure multiple servers now or
in the future. With Multi-Server install, you can still configure just 1 server
now and add on addition servers/modules in the future.

Select the type of System Server install [1=single, 2=multi] (2) > 1

Performing the Single Server Install.

Enter System Name (columnstore-1) >

===== Setup Storage Configuration =====

----- Setup Performance Module DBRoot Data Storage Mount Configuration -----

Columnstore supports the following storage options...
1 - internal. This uses the linux VFS to access files and does
not manage the filesystem.
2 - external *. If you have other mountable filesystems you would
like ColumnStore to use & manage, select this option.
3 - GlusterFS * Note: glusterd service must be running and enabled on
all PMs.
4 - S3-compatible cloud storage *. Note: that should be configured
before running postConfigure (see storagemanager.cnf)

* - This option enables data replication and server failover in a
multi-node configuration.

These options are available on this system: [1, 2, 4]
Select the type of data storage (1) > 4

Comment by Daniel Lee (Inactive) [ 2020-08-25 ]

Reopening per my last test result.

Comment by Ben Thompson (Inactive) [ 2020-08-26 ]

This change to SM was not merged from 1.4 into 1.5, and the columnstore-post-install script needed to be updated to actually test the environment variables set by the user at install time.

Comment by Daniel Lee (Inactive) [ 2020-08-31 ]

Builds verified: 1.4.5-1 (drone #483), 1.5.4-1 (drone #496)

[centos7:root~]# cat err.log
Aug 31 17:29:34 localhost StorageManager[12979]: S3Storage::putObject(): failed to PUT, got 'Authentication failed'. bucket = dleeqadbroot1, key = 0f1eda14-e4d3-4abb-a295-040b6dda6599connectivity_test.
Aug 31 17:29:34 localhost StorageManager[12979]: S3Storage: failed to PUT, check log files for specific error

Generated at Thu Feb 08 02:43:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.