MariaDB Server, MDEV-35428

After upgrading from 10.6.16 to 10.6.19, a DB crashed frequently

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.6.19
    • Fix Version/s: N/A
    • Component/s: None
    • Labels: None
    • Environment: RHEL8, VMWare, 2 active nodes plus a witness in a cluster running Galera 26.4.19.

    Description

      Hi MariaDB support team,

      Observation:

      • No stack trace or core dump is produced for further analysis.
      • No assertion error message appears in mariadb-error.log.
      • The crash is sudden: no signal 6, signal 11, or other signal is recorded in mariadb-error.log.
      • sar shows spare CPU capacity.
      • sar shows sufficient memory and no swapping.
      • The same version, 10.6.19, runs on more than 300 other similar systems, but only 2 DBs with a similar schema have this issue.

      Next action:

      • We suspect the problem is specific to certain SQL or to the schema, but could not pinpoint it from the binlog because failed SQL statements are not captured there.
      • We plan to enable auditing (initially for DML) to track the offending SQL, and may extend it to DDL or even queries later.
      • Please advise if there is a better method to capture the offending SQL.

      Thanks and best regards,

      Lawrence

          Activity

            serg Sergei Golubchik added a comment (edited)

            It's likely MDEV-34683, which was fixed in 10.6.20.
            Please comment if it doesn't disappear after the upgrade.

            LawrenceMan Lawrence Man added a comment -

            Our site has a lot of 10.6.19 DBs and, to ease administration, we're inclined to keep using 10.6.19 for the particular DB that is crashing, unless it is certain that the problem is resolved in 10.6.20.

            We ran the 10.6.19 DB for a while and captured some core dumps. Please see if more information can be derived from the dump files to suggest a better way forward.

            Auditing was turned on, but the SQL statements captured just before the crash had also been executed well before the crash. Hence, it appears these statements may not be the true cause of the crash.
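
The comparison described above (statements seen just before the crash versus statements seen earlier) can be sketched roughly as follows. This is only an illustration, not a tool used in this report: it assumes the audit log has already been parsed into (timestamp, SQL text) pairs, and the function name, the window length, and the sample data are all hypothetical. It compares exact statement text, so statements that differ only in literal values would count as distinct.

```python
from datetime import datetime, timedelta


def statements_unique_to_window(events, crash_time, window):
    """Return SQL texts seen in the window before crash_time but never earlier.

    events: iterable of (timestamp, sql_text) pairs, assumed to be already
    parsed from the audit log (parsing is not shown here).
    """
    start = crash_time - window
    seen_earlier = {sql for ts, sql in events if ts < start}
    seen_in_window = {sql for ts, sql in events if start <= ts <= crash_time}
    return seen_in_window - seen_earlier


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        (datetime(2024, 1, 1, 10, 0, 0), "UPDATE t SET a=1"),
        (datetime(2024, 1, 1, 11, 59, 0), "UPDATE t SET a=1"),
        (datetime(2024, 1, 1, 11, 59, 30), "DELETE FROM t2"),
    ]
    crash = datetime(2024, 1, 1, 12, 0, 0)
    print(statements_unique_to_window(sample, crash, timedelta(minutes=5)))
```

An empty result would match the observation in the comment: nothing ran shortly before the crash that had not already run without trouble earlier.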


            serg Sergei Golubchik added a comment -

            Can you run thread apply all bt in gdb on your core dump?

            I don't have exactly the same set of libraries, in exactly the same versions, as you have, and a core is less useful without them.
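
One possible way to collect the requested output non-interactively is to run gdb in batch mode on the machine that produced the core, so that the matching binary and libraries are used. The sketch below is an assumption about how this could be scripted, not something taken from this report: the binary path and core file path are hypothetical placeholders and must be replaced with the real ones. Installing the matching debuginfo packages beforehand generally makes the backtraces more readable.

```python
import shlex
import subprocess


def gdb_backtrace_command(binary, core):
    """Build a gdb batch-mode command that prints backtraces of all threads."""
    return ["gdb", "--batch", "-ex", "thread apply all bt", str(binary), str(core)]


def dump_backtraces(binary, core, out_path):
    """Run gdb on the core file and write all thread backtraces to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(
            gdb_backtrace_command(binary, core),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,
        )
    return out_path


if __name__ == "__main__":
    # Hypothetical paths: substitute the actual server binary and core file.
    cmd = gdb_backtrace_command("/usr/sbin/mariadbd", "/var/lib/mysql/core.12345")
    print(shlex.join(cmd))
```

The resulting text file is much smaller than the core itself and can be attached to the issue.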
            LawrenceMan Lawrence Man added a comment -

            In the end, we decided on an organization-wide adoption of 10.6.20, as more than one DB hit this issue and 10.6.19 is not stable enough for production use.

