[MCOL-3565] UM1 has no / zombie mysqld_safe processes after postConfigure Created: 2019-10-17  Updated: 2019-11-18  Resolved: 2019-11-18

Status: Closed
Project: MariaDB ColumnStore
Component/s: MariaDB Server
Affects Version/s: 1.4.0
Fix Version/s: Icebox

Type: Bug
Priority: Major
Reporter: Jens Röwekamp (Inactive)
Assignee: Andrew Hutchings (Inactive)
Resolution: Not a Bug
Votes: 0
Labels: SkySQLMVP
Environment:

VirtualBox VMs, SkySQL dev environment

1UM 2PM multi node cluster (root installation)

gitversionEngine: 1f47534



 Description   

After configuring a 1UM 2PM cluster with postConfigure, UM1 executes:

/bin/sh /usr/local/mariadb/columnstore/mysql//bin/mysqld_safe --datadir=/usr/local/mariadb/columnstore/mysql/db --pid-file=/usr/local/mariadb/columnstore/mysql/db/um1.pid -

but no zombie (defunct) mysqld_safe process is visible on the VM; only the live mysqld_safe (PID 9507) appears:

[root@um1 jens]# ps -aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 125320  3832 ?        Ss   05:06   0:00 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
root         2  0.0  0.0      0     0 ?        S    05:06   0:00 [kthreadd]
root         4  0.0  0.0      0     0 ?        S<   05:06   0:00 [kworker/0:0H]
root         5  0.0  0.0      0     0 ?        S    05:06   0:00 [kworker/u4:0]
root         6  0.0  0.0      0     0 ?        S    05:06   0:00 [ksoftirqd/0]
root         7  0.0  0.0      0     0 ?        S    05:06   0:00 [migration/0]
root         8  0.0  0.0      0     0 ?        S    05:06   0:00 [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    05:06   0:00 [rcu_sched]
root        10  0.0  0.0      0     0 ?        S<   05:06   0:00 [lru-add-drain]
root        11  0.0  0.0      0     0 ?        S    05:06   0:00 [watchdog/0]
root        12  0.0  0.0      0     0 ?        S    05:06   0:00 [watchdog/1]
root        13  0.0  0.0      0     0 ?        S    05:06   0:00 [migration/1]
root        14  0.0  0.0      0     0 ?        S    05:06   0:00 [ksoftirqd/1]
root        16  0.0  0.0      0     0 ?        S<   05:06   0:00 [kworker/1:0H]
root        18  0.0  0.0      0     0 ?        S    05:06   0:00 [kdevtmpfs]
root        19  0.0  0.0      0     0 ?        S<   05:06   0:00 [netns]
root        20  0.0  0.0      0     0 ?        S    05:06   0:00 [khungtaskd]
root        21  0.0  0.0      0     0 ?        S<   05:06   0:00 [writeback]
root        22  0.0  0.0      0     0 ?        S<   05:06   0:00 [kintegrityd]
root        23  0.0  0.0      0     0 ?        S<   05:06   0:00 [bioset]
root        24  0.0  0.0      0     0 ?        S<   05:06   0:00 [bioset]
root        25  0.0  0.0      0     0 ?        S<   05:06   0:00 [bioset]
root        26  0.0  0.0      0     0 ?        S<   05:06   0:00 [kblockd]
root        27  0.0  0.0      0     0 ?        S<   05:06   0:00 [md]
root        28  0.0  0.0      0     0 ?        S<   05:06   0:00 [edac-poller]
root        29  0.0  0.0      0     0 ?        S<   05:06   0:00 [watchdogd]
root        30  0.0  0.0      0     0 ?        S    05:06   0:00 [kworker/0:1]
root        35  0.0  0.0      0     0 ?        S    05:06   0:00 [kswapd0]
root        36  0.0  0.0      0     0 ?        SN   05:06   0:00 [ksmd]
root        37  0.0  0.0      0     0 ?        SN   05:06   0:00 [khugepaged]
root        38  0.0  0.0      0     0 ?        S<   05:06   0:00 [crypto]
root        46  0.0  0.0      0     0 ?        S<   05:06   0:00 [kthrotld]
root        48  0.0  0.0      0     0 ?        S<   05:06   0:00 [kmpath_rdacd]
root        49  0.0  0.0      0     0 ?        S<   05:06   0:00 [kaluad]
root        50  0.0  0.0      0     0 ?        S    05:06   0:00 [kworker/1:1]
root        51  0.0  0.0      0     0 ?        S<   05:06   0:00 [kpsmoused]
root        52  0.0  0.0      0     0 ?        S<   05:06   0:00 [ipv6_addrconf]
root        66  0.0  0.0      0     0 ?        S<   05:06   0:00 [deferwq]
root       102  0.0  0.0      0     0 ?        S    05:06   0:00 [kauditd]
root       259  0.0  0.0      0     0 ?        S<   05:06   0:00 [ata_sff]
root       296  0.0  0.0      0     0 ?        S    05:06   0:00 [scsi_eh_0]
root       297  0.0  0.0      0     0 ?        S<   05:06   0:00 [scsi_tmf_0]
root       298  0.0  0.0      0     0 ?        S    05:06   0:00 [scsi_eh_1]
root       299  0.0  0.0      0     0 ?        S<   05:06   0:00 [scsi_tmf_1]
root       301  0.0  0.0      0     0 ?        S    05:06   0:00 [scsi_eh_2]
root       302  0.0  0.0      0     0 ?        S<   05:06   0:00 [scsi_tmf_2]
root       303  0.0  0.0      0     0 ?        S    05:06   0:00 [kworker/u4:3]
root       318  0.0  0.0      0     0 ?        S<   05:06   0:00 [kworker/0:1H]
root       325  0.0  0.0      0     0 ?        S<   05:06   0:00 [bioset]
root       326  0.0  0.0      0     0 ?        S<   05:06   0:00 [xfsalloc]
root       327  0.0  0.0      0     0 ?        S<   05:06   0:00 [xfs_mru_cache]
root       328  0.0  0.0      0     0 ?        S<   05:06   0:00 [xfs-buf/sda1]
root       329  0.0  0.0      0     0 ?        S<   05:06   0:00 [xfs-data/sda1]
root       330  0.0  0.0      0     0 ?        S<   05:06   0:00 [xfs-conv/sda1]
root       331  0.0  0.0      0     0 ?        S<   05:06   0:00 [xfs-cil/sda1]
root       332  0.0  0.0      0     0 ?        S<   05:06   0:00 [xfs-reclaim/sda]
root       333  0.0  0.0      0     0 ?        S<   05:06   0:00 [xfs-log/sda1]
root       334  0.0  0.0      0     0 ?        S<   05:06   0:00 [xfs-eofblocks/s]
root       335  0.0  0.0      0     0 ?        S    05:06   0:00 [xfsaild/sda1]
root       336  0.0  0.0      0     0 ?        S<   05:06   0:00 [kworker/1:1H]
root       415  0.0  0.1  39080  3292 ?        Ss   05:06   0:00 /usr/lib/systemd/systemd-journald
root       434  0.0  0.0  44760  1884 ?        Ss   05:06   0:00 /usr/lib/systemd/systemd-udevd
root       435  0.0  0.0      0     0 ?        S    05:06   0:00 [kworker/1:2]
root       510  0.0  0.0  55528   892 ?        S<sl 05:06   0:00 /sbin/auditd
root       605  0.0  0.0  21536  1224 ?        Ss   05:06   0:00 /usr/sbin/irqbalance --foreground
polkitd    607  0.0  0.4 612244 14156 ?        Ssl  05:06   0:00 /usr/lib/polkit-1/polkitd --no-debug
dbus       608  0.0  0.0  58236  2412 ?        Ss   05:06   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root       614  0.0  0.3 550184  8988 ?        Ssl  05:06   0:00 /usr/sbin/NetworkManager --no-daemon
root       615  0.0  0.0  26380  1748 ?        Ss   05:06   0:00 /usr/lib/systemd/systemd-logind
root       617  0.0  0.0 126292  1568 ?        Ss   05:06   0:00 /usr/sbin/crond -n
root       624  0.0  0.0 110108   856 tty1     Ss+  05:06   0:00 /sbin/agetty --noclear tty1 linux
root       884  0.0  0.5 574200 17424 ?        Ssl  05:06   0:00 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
root       885  0.0  0.1 112920  4316 ?        Ss   05:06   0:00 /usr/sbin/sshd -D
root      8517  0.0  0.1 218548  3180 ?        Ssl  05:08   0:00 /usr/sbin/rsyslogd -n
root      8600  0.0  0.0 113184  1428 ?        S    05:08   0:00 /bin/bash /usr/local/mariadb/columnstore/bin/run.sh -l /tmp/columnstore_tmp_files /usr/local/mariadb/columnstore/bin/ProcMon
root      9128  4.5  1.2 530472 36592 ?        Sl   05:13   0:31 /usr/local/mariadb/columnstore/bin/ProcMon
root      9507  0.0  0.0 113320  1640 ?        S    05:14   0:00 /bin/sh /usr/local/mariadb/columnstore/mysql//bin/mysqld_safe --datadir=/usr/local/mariadb/columnstore/mysql/db --pid-file=/usr/local/mariadb/columnstore/mysql/db/um1.pid -
mysql     9670  0.0  3.6 1942984 105328 ?      Sl   05:14   0:00 /usr/local/mariadb/columnstore/mysql//bin/mysqld --basedir=/usr/local/mariadb/columnstore/mysql/ --datadir=/usr/local/mariadb/columnstore/mysql/db --plugin-dir=/usr/local/m
root      9740  0.0  1.1 225876 32844 ?        Sl   05:14   0:00 /usr/local/mariadb/columnstore/bin/ServerMonitor
root      9756  0.0  1.3 245420 38312 ?        Sl   05:14   0:00 /usr/local/mariadb/columnstore/bin/workernode DBRM_Worker2 fg
root     10663  0.0  1.6 279076 47252 ?        S<l  05:14   0:00 [ExeMgr]
root     10681  0.0  1.1 240564 32156 ?        Sl   05:14   0:00 [DDLProc]
root     10721  0.0  1.4 299004 41076 ?        Sl   05:14   0:00 [DMLProc]
root     11923  0.0  0.0      0     0 ?        R    05:18   0:00 [kworker/0:2]
root     13519  0.0  0.0      0     0 ?        S    05:23   0:00 [kworker/0:0]
root     13674  0.0  0.1 102896  5516 ?        S    05:24   0:00 /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-enp0s3.pid -lf /var/lib/NetworkManager/dhclient-21babde8-ca18-4d13-ae77-f865427ea90c-enp0s3.lease
root     13778  0.1  0.2 161532  6176 ?        Ss   05:24   0:00 sshd: jens [priv]
jens     13780  0.0  0.0 161532  2340 ?        D    05:24   0:00 sshd: jens@pts/0
jens     13781  0.0  0.0 115448  2032 pts/0    Ss   05:24   0:00 -bash
root     13799  0.1  0.1 243976  5292 pts/0    S    05:24   0:00 sudo su
root     13800  0.0  0.0      0     0 ?        S    05:24   0:00 [kworker/0:3]
root     13801  0.0  0.0 191780  2348 pts/0    S    05:24   0:00 su
root     13802  0.0  0.0 115448  2052 pts/0    S    05:24   0:00 bash
root     13902  0.0  0.0 155372  1872 pts/0    R+   05:24   0:00 ps -aux

In our SkySQL dev environment, however, 7 zombie mysqld_safe processes are seen:

[root@cs-test-mdb-cs-um-module-0 /]# ps -aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0 110268  6720 ?        Ssl  12:13   0:00 /mnt/config-map/mariacmd start columnstore -t multi-node
root           9  0.0  0.0   4392  1168 ?        S    12:13   0:00 runsvdir -P -H /etc/service log: .........................................................................................................................................
root         126  0.0  0.0  11692  2756 ?        S    12:15   0:00 /bin/bash /usr/local/mariadb/columnstore/bin/run.sh -l /tmp/columnstore_tmp_files /usr/local/mariadb/columnstore/bin/ProcMon
root         129  0.0  0.0   4240   736 ?        Ss   12:15   0:00 runsv systemd-journald
root         130  0.0  0.0   4240   644 ?        Ss   12:15   0:00 runsv rsyslogd
root         135  0.0  0.0  11692  2644 ?        S    12:15   0:00 /bin/sh ./run
root         140  0.0  0.0  39096  6696 ?        S    12:15   0:00 /usr/lib/systemd/systemd-journald
root         448  0.0  0.0      0     0 ?        Z    12:15   0:00 [mysqld_safe] <defunct>
root         832  0.0  0.0 218548  4920 ?        Sl   12:16   0:00 /sbin/rsyslogd -n
root         867  0.3  0.2 431060 19880 ?        Sl   12:16   0:17 /usr/local/mariadb/columnstore/bin/ProcMon
root        1054  0.0  0.0  11828  2964 ?        S    12:16   0:00 /bin/sh /usr/local/mariadb/columnstore/mysql//bin/mysqld_safe --datadir=/usr/local/mariadb/columnstore/mysql/db --pid-file=/usr/local/mariadb/columnstore/mysql/db/cs-test
mysql       1232  0.1  1.3 1958296 104020 ?      Sl   12:16   0:04 /usr/local/mariadb/columnstore/mysql//bin/mysqld --basedir=/usr/local/mariadb/columnstore/mysql/ --datadir=/usr/local/mariadb/columnstore/mysql/db --plugin-dir=/usr/local
root        1296  0.0  0.2 230084 19224 ?        Sl   12:16   0:02 /usr/local/mariadb/columnstore/bin/ServerMonitor
root        1313  0.0  0.3 249632 26064 ?        Sl   12:16   0:01 /usr/local/mariadb/columnstore/bin/workernode DBRM_Worker2 fg
root        1376  0.0  0.0      0     0 ?        Z    12:16   0:00 [mysqld_safe] <defunct>
root        1555  0.0  0.0      0     0 ?        Z    12:16   0:00 [mysqld_safe] <defunct>
root        1739  0.0  0.0      0     0 ?        Z    12:16   0:00 [mysqld_safe] <defunct>
root        1918  0.0  0.0      0     0 ?        Z    12:16   0:00 [mysqld_safe] <defunct>
root        2100  0.0  0.0      0     0 ?        Z    12:16   0:00 [mysqld_safe] <defunct>
root        2293  0.0  0.0      0     0 ?        Z    12:16   0:00 [mysqld_safe] <defunct>
root        2465  0.0  0.2 283284 18352 ?        Sl   12:16   0:01 [ExeMgr]
root        2509  0.0  0.3 519316 24168 ?        Sl   12:16   0:01 [DDLProc]
root        2558  0.0  0.3 303212 26560 ?        Sl   12:16   0:02 [DMLProc]
root       30007  0.1  0.0  11832  2948 pts/0    Ss   13:36   0:00 /bin/bash
root       30070  0.0  0.0  51752  3512 pts/0    R+   13:36   0:00 ps -aux



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2019-10-17 ]

Can you please run ps -aufx on the node that shows the defunct processes so we can see where they come from?

Comment by Jens Röwekamp (Inactive) [ 2019-10-17 ]

Absolutely:

[root@cs-test-mdb-cs-um-module-0 /]# ps -auxf
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        7092  0.0  0.0  11832  2956 pts/0    Ss   15:27   0:00 /bin/bash
root        7146  0.0  0.0  51748  3556 pts/0    R+   15:27   0:00  \_ ps -auxf
root           1  0.0  0.0 111676  6720 ?        Ssl  15:03   0:00 /mnt/config-map/mariacmd start columnstore -t multi-node
root           9  0.0  0.0   4392  1236 ?        S    15:03   0:00 runsvdir -P -H /etc/service log: .........................................................................................................................................
root         380  0.0  0.0   4240   648 ?        Ss   15:10   0:00  \_ runsv systemd-journald
root         386  0.0  0.0  11692  2496 ?        S    15:10   0:00  |   \_ /bin/sh ./run
root         388  0.0  0.0  39096  6752 ?        S    15:10   0:00  |       \_ /usr/lib/systemd/systemd-journald
root         381  0.0  0.0   4240   656 ?        Ss   15:10   0:00  \_ runsv rsyslogd
root         385  0.0  0.0 218548  4940 ?        Sl   15:10   0:00      \_ /sbin/rsyslogd -n
root         377  0.0  0.0  11692  2652 ?        S    15:10   0:00 /bin/bash /usr/local/mariadb/columnstore/bin/run.sh -l /tmp/columnstore_tmp_files /usr/local/mariadb/columnstore/bin/ProcMon
root        1014  1.2  0.2 431060 19868 ?        Sl   15:12   0:11  \_ /usr/local/mariadb/columnstore/bin/ProcMon
root        1440  0.1  0.2 230084 18828 ?        Sl   15:12   0:01      \_ /usr/local/mariadb/columnstore/bin/ServerMonitor
root        1454  0.1  0.3 249632 25940 ?        Sl   15:12   0:01      \_ /usr/local/mariadb/columnstore/bin/workernode DBRM_Worker2 fg
root        3491  0.1  0.2 299676 18404 ?        Sl   15:12   0:01      \_ [ExeMgr]
root        3537  0.1  0.3 523416 24036 ?        Sl   15:12   0:01      \_ [DDLProc]
root        3578  0.1  0.3 303212 26480 ?        Sl   15:12   0:01      \_ [DMLProc]
root         694  0.0  0.0      0     0 ?        Z    15:11   0:00 [mysqld_safe] <defunct>
root        1198  0.0  0.0  11828  2800 ?        S    15:12   0:00 /bin/sh /usr/local/mariadb/columnstore/mysql//bin/mysqld_safe --datadir=/usr/local/mariadb/columnstore/mysql/db --pid-file=/usr/local/mariadb/columnstore/mysql/db/cs-test
mysql       1373  0.0  1.3 1958304 104332 ?      Sl   15:12   0:00  \_ /usr/local/mariadb/columnstore/mysql//bin/mysqld --basedir=/usr/local/mariadb/columnstore/mysql/ --datadir=/usr/local/mariadb/columnstore/mysql/db --plugin-dir=/usr/l
root        1509  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        1688  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        1864  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        2043  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        2219  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        2418  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        2594  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        2773  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        2952  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        3148  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
root        3327  0.0  0.0      0     0 ?        Z    15:12   0:00 [mysqld_safe] <defunct>
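
In the forest view above, the defunct mysqld_safe entries are not nested under ProcMon or any runsv process. To see each zombie's parent explicitly, the PPID column can be printed next to the process state. The following is a minimal sketch, assuming procps ps and awk are available in the container; list_zombies is a hypothetical helper name, not part of ColumnStore.

```shell
# Filter `ps -eo pid,ppid,stat,comm` style output read from stdin:
# keep the header row and every row whose STAT column starts with Z.
list_zombies() {
    awk 'NR == 1 || $3 ~ /^Z/'
}

# Typical use on a live system:
#   ps -eo pid,ppid,stat,comm | list_zombies
```

If every listed PPID turns out to be 1, the process running as PID 1 is the one that is not collecting the exit status of its children.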

Comment by Jens Röwekamp (Inactive) [ 2019-11-18 ]

Using tini helped. Thanks a lot.
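
For context: tini is a small init process meant to run as PID 1 in a container, where it forwards signals and reaps orphaned children, so exited mysqld_safe processes should no longer linger as defunct entries. The ticket does not record how tini was wired in. The following is only a sketch of an entrypoint wrapper, assuming a tini binary is on PATH; the function name run_with_init is hypothetical, and the mariacmd command line in the comment is taken from the ps output above.

```shell
# Hypothetical entrypoint wrapper: start the given command under tini when
# tini is installed, otherwise start the command directly.
run_with_init() {
    if command -v tini >/dev/null 2>&1; then
        # tini becomes the parent of the command and reaps orphaned children.
        exec tini -- "$@"
    else
        exec "$@"
    fi
}

# Example call from a container entrypoint script:
#   run_with_init /mnt/config-map/mariacmd start columnstore -t multi-node
```

The exec replaces the calling shell, so when the entrypoint script itself is PID 1, tini takes over that PID rather than running as a child of the shell.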
