[MCOL-3702] postConfigure Replication Error
Created: 2019-12-28  Updated: 2023-10-26  Resolved: 2020-01-04

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.4.2
Fix Version/s: 1.4.2

Type: Bug Priority: Blocker
Reporter: Todd Stoffel (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates MCOL-3705 Combo installation with schema-sync d... Closed
Problem/Incident
is caused by MCOL-3624 Move jemalloc to LD_PRELOAD Closed
Epic Link: ColumnStore integration in ES in December 2019

 Description   

postConfigure throws the following schema-sync/replication error when run with the multi-node option:

/bin/postConfigure -qm -pm-ip-addrs 10.10.10.10,10.10.10.11 -sn columnstore-system1

ERROR: Error return in running the MariaDB ColumnStore Master replication, check /tmp/columnstore_tmp_filesmaster-rep*.logs on pm1

cat /var/log/mariadb/columnstore/debug.log

...
disable-rep-columnstore.sh: Error return, check log /tmp/columnstore_tmp_files/disable-rep-columnstore.log
...

cat /tmp/columnstore_tmp_files/disable-rep-columnstore.log

ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: check log file:/tmp/columnstore_tmp_files/disable-rep-status.log
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.

cat /tmp/columnstore_tmp_files/disable-rep-status.log

Run stop slave command
stop slave;
ERROR: ld.so: object 'libjemalloc.so' from LD_PRELOAD cannot be preloaded: ignored.

It also fails afterwards when using mcsadmin:

mcsadmin enableMySQLReplication

enablemysqlreplication   Sat Dec 28 02:16:24 2019
 
Enter the 'User' Password or 'ssh' if configured with ssh-keys
           Please enter: ssh
 
**** enableRep Failed :  API Failure return in enableMySQLRep API



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2019-12-28 ]

The cause is:

1. The unversioned libjemalloc.so symlink is only shipped in the devel package, on both CentOS and Ubuntu. In my tests the symlink was installed, but in production it wouldn't be
2. If jemalloc can't be loaded, ld.so prints an error whenever ColumnStore processes execute shell commands
3. If there are any such errors in the logs, the replication commands conclude that they have failed

To fix this:

1. We need to detect at compile time whether libjemalloc.so.1 or libjemalloc.so.2 is used (perhaps by resolving the symlink location?)
2. Test whether the library exists / is installed before LD_PRELOAD is set in columnstore_run.sh
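The two steps above amount to "pick an installed jemalloc variant, and only then set LD_PRELOAD". The actual change was made in the shell run script (columnstore_run.sh) and is not reproduced here; the following is only a minimal Python sketch of the same idea as a run-time check. The function names and the search directories are hypothetical, and a real ld.so resolves bare library names through its own cache and default paths rather than this fixed directory list.

```python
import os
from typing import Callable, Iterable, Mapping, Optional

# Candidate names, in the order listed in the comments on this ticket.
JEMALLOC_CANDIDATES = ("libjemalloc.so", "libjemalloc.so.1", "libjemalloc.so.2")

# Hypothetical search directories; real systems may keep the library elsewhere.
DEFAULT_LIB_DIRS = ("/usr/lib64", "/usr/lib/x86_64-linux-gnu", "/usr/lib")


def find_jemalloc(
    lib_dirs: Iterable[str] = DEFAULT_LIB_DIRS,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the path of the first jemalloc variant that exists, else None.

    os.path.isfile follows symlinks, so a dangling libjemalloc.so symlink
    does not count as installed.
    """
    dirs = list(lib_dirs)
    for name in JEMALLOC_CANDIDATES:
        for directory in dirs:
            path = os.path.join(directory, name)
            if exists(path):
                return path
    return None


def child_env(
    base: Mapping[str, str],
    lib_dirs: Iterable[str] = DEFAULT_LIB_DIRS,
    exists: Callable[[str], bool] = os.path.isfile,
) -> dict:
    """Copy `base` and add LD_PRELOAD only when a jemalloc library was found.

    When nothing is found, no LD_PRELOAD entry is added, so this function
    never asks ld.so to preload a library that is not installed.
    """
    env = dict(base)
    lib = find_jemalloc(lib_dirs, exists)
    if lib is not None:
        env["LD_PRELOAD"] = lib
    return env
```

A caller could then start child processes with something like `subprocess.run(cmd, env=child_env(os.environ))`. The `exists` parameter is there only so the lookup can be exercised without touching the real filesystem.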

On to the next problems:

1. server_id and log_bin. I don't yet know how to handle these automatically, since they can now be set in any .cnf file. We may need to write a full .cnf parser, or maybe recursively find the .cnf files and regex each one?
2. The error message is missing a '/' (it reads columnstore_tmp_filesmaster-rep*.logs).
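The second idea in item 1 (recursively find the .cnf files and regex each one) might look roughly like this Python sketch. The function names and the regex are hypothetical. It assumes options are written as `name` or `name = value`, with `-` or `_` in the option name; it ignores `[section]` groups, `!include`/`!includedir` directives and inline comments, so it is a heuristic rather than the full .cnf parser mentioned above.

```python
import os
import re
from typing import Dict, List, Tuple

# Matches lines such as "server_id = 1", "server-id=1" or a bare "log_bin".
# Commented-out lines ("# server_id=1") and longer names such as
# "log_bin_index" do not match.
OPTION_RE = re.compile(r"^\s*(server[-_]id|log[-_]bin)\s*(?:=\s*(.*?))?\s*$")


def scan_cnf_text(text: str) -> List[Tuple[str, str]]:
    """Return (normalized option name, value) pairs found in one file's text."""
    hits = []
    for line in text.splitlines():
        m = OPTION_RE.match(line)
        if m:
            name = m.group(1).replace("-", "_")
            hits.append((name, m.group(2) or ""))  # bare option -> empty value
    return hits


def scan_cnf_tree(root: str) -> Dict[str, List[Tuple[str, str]]]:
    """Recursively scan every *.cnf file under `root`; map file path -> hits."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in sorted(filenames):
            if fname.endswith(".cnf"):
                path = os.path.join(dirpath, fname)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    hits = scan_cnf_text(fh.read())
                if hits:
                    found[path] = hits
    return found
```

Running `scan_cnf_tree("/etc/my.cnf.d")` would list every file that sets either option, which is enough to spot a server_id that is set in more than one place.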

Comment by Andrew Hutchings (Inactive) [ 2019-12-28 ]

Workaround:

1. Install jemalloc-devel / libjemalloc-dev package
2. Add log_bin to /etc/my.cnf.d/columnstore.cnf

At that point this function should work as long as server_id isn't set in any other .cnf file.

Comment by Patrick LeBlanc (Inactive) [ 2020-01-03 ]

Made the run script determine which of libjemalloc.so, libjemalloc.so.1, and libjemalloc.so.2 to use. I didn't address the log_bin or server_id issue. Todd reports that this fixes the big problem, and that he has a workaround for setting those two variables.

Made the PR; it is in Andrew's queue.

Comment by Patrick LeBlanc (Inactive) [ 2020-01-03 ]

As a side note, the real problem here is that one process or another searches a log file for strings to decide whether a command or script succeeded. It sees the error messages from ld.so and concludes the command failed when it didn't. By itself, ld.so failing to find a library listed in LD_PRELOAD isn't fatal.
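To illustrate the point, here is a minimal Python sketch contrasting two ways of judging a command: scanning its captured output for the string "ERROR" (roughly the kind of check described in the comment; the exact strings ColumnStore looked for are not shown in this ticket) versus checking its exit status. The function names are hypothetical.

```python
import subprocess
import sys
from typing import List, Tuple


def looks_failed_by_grep(log_text: str) -> bool:
    """Fragile check: treat any 'ERROR' in the captured output as failure."""
    return "ERROR" in log_text


def run_and_judge(cmd: List[str]) -> Tuple[bool, bool]:
    """Run `cmd`; return (grep_says_failed, actually_failed).

    `actually_failed` comes from the exit status, which is unaffected by
    non-fatal warnings such as ld.so's "cannot be preloaded: ignored".
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    log_text = result.stdout + result.stderr
    return looks_failed_by_grep(log_text), result.returncode != 0


if __name__ == "__main__":
    # A command that succeeds but prints an ld.so-style warning to stderr.
    noisy_ok = [sys.executable, "-c",
                "import sys; print('ERROR: ld.so: object ignored', file=sys.stderr)"]
    print(run_and_judge(noisy_ok))
```

With a noisy but successful command, the grep-style check reports a failure while the exit status does not, which is the mismatch that made the replication step report an error here.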

Comment by Andrew Hutchings (Inactive) [ 2020-01-03 ]

Added fixes for everything else.

Comment by Patrick LeBlanc (Inactive) [ 2020-01-03 ]

LGTM; merged it.

Generated at Thu Feb 08 02:44:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.