[MXS-4599] AVX instructions end up being executed on startup Created: 2023-04-26  Updated: 2023-05-08  Resolved: 2023-05-08

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 22.08.5, 23.02.1
Fix Version/s: 6.4.7, 22.08.6, 23.02.2

Type: Bug Priority: Major
Reporter: Brad Chin (Inactive) Assignee: Niclas Antti
Resolution: Fixed Votes: 0
Labels: None
Environment:

Red Hat Enterprise Linux release 9.1
Rocky Linux release 9.1


Sprint: MXS-SPRINT-182

 Description   

Program received signal SIGILL, Illegal instruction.
0x00007ffff7b9fa5c in _GLOBAL__sub_I_simd_canonical.cc () at /usr/include/c++/11/ext/new_allocator.h:79
79	      new_allocator() _GLIBCXX_USE_NOEXCEPT { }
Missing separate debuginfos, use: dnf debuginfo-install maxscale-23.02.1-1.rhel.9.x86_64
(gdb) bt
#0  0x00007ffff7b9fa5c in _GLOBAL__sub_I_simd_canonical.cc () at /usr/include/c++/11/ext/new_allocator.h:79
#1  0x00007ffff7fd11ae in call_init (env=0x7fffffffe3e8, argv=0x7fffffffe3b8, argc=5, l=<optimized out>) at dl-init.c:70
#2  call_init (l=<optimized out>, argc=5, argv=0x7fffffffe3b8, env=0x7fffffffe3e8) at dl-init.c:26
#3  0x00007ffff7fd129c in _dl_init (main_map=0x7ffff7ffe220, argc=5, argv=0x7fffffffe3b8, env=0x7fffffffe3e8) at dl-init.c:117
#4  0x00007ffff7fe7b0a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#5  0x0000000000000005 in ?? ()
#6  0x00007fffffffe663 in ?? ()
#7  0x00007fffffffe675 in ?? ()
#8  0x00007fffffffe678 in ?? ()
#9  0x00007fffffffe681 in ?? ()
#10 0x00007fffffffe684 in ?? ()
#11 0x0000000000000000 in ?? ()
(gdb)


Original description:

I started the installation:

[root@karma033 clustrix]# yum -y install ./maxscale-23.02.1-1.rhel.9.x86_64.rpm
Updating Subscription Management repositories.
Unable to read consumer identity
 
This system is not registered with an entitlement server. You can use subscription-manager to register.
 
Clustrix custom EL 9 packages                     23 MB/s | 491 kB     00:00
RHEL-9 - AppStream                                59 MB/s |  18 MB     00:00
RHEL-9 - BaseOS                                   58 MB/s |  10 MB     00:00
RHEL-9 - CRB                                      53 MB/s | 4.8 MB     00:00
Extra Packages for Enterprise Linux 9 - x86_64    60 MB/s |  15 MB     00:00
Dependencies resolved.
================================================================================
 Package        Architecture   Version             Repository      Size
================================================================================
Installing:
 maxscale       x86_64         23.02.1-1.rhel.9    @commandline    82 M
Installing dependencies:
 libatomic      x86_64         11.3.1-2.1.el9      base            59 k
 libtool-ltdl   x86_64         2.4.6-45.el9        AppStream       39 k
 unixODBC       x86_64         2.3.9-4.el9         AppStream      495 k
 
Transaction Summary
================================================================================
Install  4 Packages
 
Total size: 82 M

I checked the usual locations where MaxScale files should be, then tried to start the service:

[root@karma033 etc]# cd /var/lib/maxscale
[root@karma033 maxscale]# ls
[root@karma033 maxscale]# cd
[root@karma033 ~]# ls
anaconda-ks.cfg  original-ks.cfg  pxe_command
[root@karma033 ~]# systemctl restart maxscale
Job for maxscale.service failed because a fatal signal was delivered causing the control process to dump core.
See "systemctl status maxscale.service" and "journalctl -xeu maxscale.service" for details.
[root@karma033 ~]# ls /var/log/maxscale
[root@karma033 ~]#

There is no maxscale.log.

I had the following in my /etc/maxscale.cnf

[maxscale]
threads          = auto
admin_host       = 0.0.0.0
admin_secure_gui = false

This is what I see in dmesg

[  646.714809] show_signal: 11 callbacks suppressed
[  646.714815] traps: maxscale[44390] trap invalid opcode ip:7fd8ed566a5c sp:7ffc245ef3d0 error:0 in libmaxscale-common.so.1.0.0[7fd8ed528000+338000]
[  646.976665] traps: maxscale[44399] trap invalid opcode ip:7faa3028aa5c sp:7ffcc1971450 error:0 in libmaxscale-common.so.1.0.0[7faa3024c000+338000]
[  647.226668] traps: maxscale[44408] trap invalid opcode ip:7f4ba177da5c sp:7ffed9fe9ab0 error:0 in libmaxscale-common.so.1.0.0[7f4ba173f000+338000]
[  647.476661] traps: maxscale[44417] trap invalid opcode ip:7f34f49bea5c sp:7ffeece09ce0 error:0 in libmaxscale-common.so.1.0.0[7f34f4980000+338000]
[  647.727664] traps: maxscale[44426] trap invalid opcode ip:7f3003144a5c sp:7ffd8454f6b0 error:0 in libmaxscale-common.so.1.0.0[7f3003106000+338000]
[  647.976657] traps: maxscale[44435] trap invalid opcode ip:7f11d636ea5c sp:7ffe879f8fe0 error:0 in libmaxscale-common.so.1.0.0[7f11d6330000+338000]
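
The trap lines above can be cross-checked with a short script. This is a minimal sketch, not something that was run as part of this ticket: the function name `trap_offsets` and the regular expression are illustrative assumptions, and the offset it computes is relative to the start of the mapping that the kernel reports in brackets, which is not necessarily the load base of the library. Two of the six lines are embedded as sample input.

```python
import re

# Two of the trap lines from the dmesg output above, copied verbatim.
DMESG = """\
[  646.714815] traps: maxscale[44390] trap invalid opcode ip:7fd8ed566a5c sp:7ffc245ef3d0 error:0 in libmaxscale-common.so.1.0.0[7fd8ed528000+338000]
[  646.976665] traps: maxscale[44399] trap invalid opcode ip:7faa3028aa5c sp:7ffcc1971450 error:0 in libmaxscale-common.so.1.0.0[7faa3024c000+338000]
"""

# Hypothetical pattern for this specific log format: ip, library name,
# and the [start+size] of the mapping that contains the faulting address.
TRAP = re.compile(
    r"trap invalid opcode ip:(?P<ip>[0-9a-f]+) .* in "
    r"(?P<lib>\S+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def trap_offsets(text):
    """Return (library, offset) pairs, where offset = ip - mapping start."""
    return [
        (m["lib"], int(m["ip"], 16) - int(m["start"], 16))
        for m in TRAP.finditer(text)
    ]


for lib, offset in trap_offsets(DMESG):
    print(lib, hex(offset))
```

Both sample lines give the same offset, 0x3ea5c, which suggests that every restart attempt faults on the same instruction in libmaxscale-common.so.1.0.0 rather than at a random location.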



 Comments   
Comment by markus makela [ 2023-04-27 ]

bchin what does cat /proc/cpuinfo show on the machine?

Also, could you check if /usr/share/maxscale contains any files? If it does, I think it installed correctly but the invalid instruction is causing it to not start.

Comment by Brad Chin (Inactive) [ 2023-04-27 ]

Created symlink /etc/systemd/system/multi-user.target.wants/maxscale.service → /usr/lib/systemd/system/maxscale.service.
 
  Verifying        : libtool-ltdl-2.4.6-45.el9.x86_64      1/4
  Verifying        : unixODBC-2.3.9-4.el9.x86_64           2/4
  Verifying        : libatomic-11.3.1-2.1.el9.x86_64       3/4
  Verifying        : maxscale-23.02.1-1.rhel.9.x86_64      4/4
 
Installed:
  libatomic-11.3.1-2.1.el9.x86_64
  libtool-ltdl-2.4.6-45.el9.x86_64
  maxscale-23.02.1-1.rhel.9.x86_64
  unixODBC-2.3.9-4.el9.x86_64
 
Complete!
[root@karma033 ~]# ls /usr/share/maxscale/
COPYRIGHT      LICENSE-THIRDPARTY.TXT  LICENSE2302.TXT  ReleaseNotes.txt           cdc_schema.go  create_roles.sql  maxscale               maxscale.conf     maxscale_logrotate  prerm
Changelog.txt  LICENSE.TXT             README.md        UpgradingToMaxScale12.txt  create_grants  gui               maxscale.cnf.template  maxscale.service  postinst
[root@karma033 ~]# cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping	: 2
microcode	: 0x1f
cpu MHz		: 2401.000
cache size	: 12288 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
vmx flags	: vnmi preemption_timer invvpid ept_x_only ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips	: 4799.70
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

The output continues with 15 more processor entries.

Comment by markus makela [ 2023-04-27 ]

Looks like the CPU doesn't support AVX or AVX2: neither flag appears in the flags line. Is this a virtualized environment of some sort?
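
The same check can be scripted instead of reading the flags line by eye. A minimal sketch, assuming the usual Linux /proc/cpuinfo layout of `key : value` lines; the helper name `missing_cpu_flags` is hypothetical and the sample is an abbreviated subset of the flags posted in this ticket.

```python
def missing_cpu_flags(cpuinfo_text, wanted=("avx", "avx2")):
    """Return the wanted flags that are absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match only the exact key "flags", so "vmx flags" is skipped.
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in wanted if flag not in present]
    # No flags line at all: report every wanted flag as missing.
    return list(wanted)


# Abbreviated flags line taken from the cpuinfo output in this ticket.
SAMPLE = "flags\t\t: fpu vme de pse tsc msr sse sse2 ssse3 sse4_1 sse4_2 popcnt aes"

print(missing_cpu_flags(SAMPLE))
```

On a live machine the function could be fed the contents of /proc/cpuinfo directly, for example `missing_cpu_flags(open("/proc/cpuinfo").read())`. An empty list would mean both flags are present.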

Comment by Brad Chin (Inactive) [ 2023-04-27 ]

This is a bare-metal machine, just an old CPU.

Comment by markus makela [ 2023-04-27 ]

Can you start MaxScale under GDB and see if you can catch where the problem is occurring? Something like this should work:

sudo gdb -batch -ex 'r -d -lstdout -U maxscale' /usr/bin/maxscale

Comment by Brad Chin (Inactive) [ 2023-04-27 ]

[root@karma033 ~]# gdb -batch -ex 'r -d -lstdout -U maxscale' /usr/bin/maxscale
...
Downloading separate debug info for /lib64/libresolv.so.2...
Downloading separate debug info for /root/.cache/debuginfod_client/dd26798426928fb454335411ecfeb883030b1f6c/debuginfo...
Downloading separate debug info for /lib64/libsasl2.so.3...
Downloading separate debug info for /root/.cache/debuginfod_client/78a77f792072693b0a303f6d8924eeee033fb498/debuginfo...
 
Program received signal SIGILL, Illegal instruction.
0x00007ffff7b9fa5c in _GLOBAL__sub_I_simd_canonical.cc () at /usr/include/c++/11/ext/new_allocator.h:79
79	      new_allocator() _GLIBCXX_USE_NOEXCEPT { }
[root@karma033 ~]#

Comment by markus makela [ 2023-04-27 ]

Oh yeah, it doesn't print the stack trace unless it's requested.

Could you try again by running GDB manually:

sudo gdb /usr/bin/maxscale

Then once GDB has started, start MaxScale with:

r -d -lstdout -U maxscale

Once it stops at that problem, run the bt command to get the full stack trace.
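
The manual steps above can probably also be folded into a single batch invocation, because GDB executes its `-ex` commands in order and so a second `-ex bt` would run once the program has stopped on the signal. A minimal sketch that only builds the command line; the helper name `gdb_backtrace_command` is hypothetical and this variant was not run as part of this ticket.

```python
import shlex


def gdb_backtrace_command(binary="/usr/bin/maxscale",
                          run_args="-d -lstdout -U maxscale"):
    """Build a gdb batch command that runs the program, then prints a backtrace."""
    return ["gdb", "-batch", "-ex", f"r {run_args}", "-ex", "bt", binary]


# Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
print(shlex.join(["sudo"] + gdb_backtrace_command()))
```

The list form can be passed to `subprocess.run` as-is if the output needs to be captured rather than pasted by hand.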

Comment by Brad Chin (Inactive) [ 2023-04-27 ]

This doesn't look useful, but hopefully it helps:

[root@karma033 ~]# gdb /usr/bin/maxscale
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-10.el9
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
 
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/maxscale...
(gdb) r -d -lstdout -U maxscale
Starting program: /usr/bin/maxscale -d -lstdout -U maxscale
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
 
Program received signal SIGILL, Illegal instruction.
0x00007ffff7b9fa5c in _GLOBAL__sub_I_simd_canonical.cc () at /usr/include/c++/11/ext/new_allocator.h:79
79	      new_allocator() _GLIBCXX_USE_NOEXCEPT { }
Missing separate debuginfos, use: dnf debuginfo-install maxscale-23.02.1-1.rhel.9.x86_64
(gdb) bt
#0  0x00007ffff7b9fa5c in _GLOBAL__sub_I_simd_canonical.cc () at /usr/include/c++/11/ext/new_allocator.h:79
#1  0x00007ffff7fd11ae in call_init (env=0x7fffffffe3e8, argv=0x7fffffffe3b8, argc=5, l=<optimized out>) at dl-init.c:70
#2  call_init (l=<optimized out>, argc=5, argv=0x7fffffffe3b8, env=0x7fffffffe3e8) at dl-init.c:26
#3  0x00007ffff7fd129c in _dl_init (main_map=0x7ffff7ffe220, argc=5, argv=0x7fffffffe3b8, env=0x7fffffffe3e8) at dl-init.c:117
#4  0x00007ffff7fe7b0a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#5  0x0000000000000005 in ?? ()
#6  0x00007fffffffe663 in ?? ()
#7  0x00007fffffffe675 in ?? ()
#8  0x00007fffffffe678 in ?? ()
#9  0x00007fffffffe681 in ?? ()
#10 0x00007fffffffe684 in ?? ()
#11 0x0000000000000000 in ?? ()
(gdb)

Comment by markus makela [ 2023-04-27 ]

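I think that actually turned out to be very useful output. The SIGILL is raised in _GLOBAL__sub_I_simd_canonical.cc, the static initializer for simd_canonical.cc, which the dynamic linker runs through _dl_init when it loads the core MaxScale library, before MaxScale's own startup code is reached. So somehow AVX instructions have probably ended up in that initialization code, where they get executed regardless of what the CPU supports.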

Generated at Thu Feb 08 04:29:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.