[MDEV-30411] main.explain_json_format_partitions fails on Debian armel and armhf builders with mismatches Created: 2023-01-14  Updated: 2023-10-09  Resolved: 2023-04-27

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.11.1
Fix Version/s: 11.1.1, 10.11.3, 11.0.2

Type: Bug Priority: Critical
Reporter: Otto Kekäläinen Assignee: Daniel Black
Resolution: Fixed Votes: 0
Labels: None

Attachments: File my_timer_cycles-default.patch     PNG File screenshot-1.png    
Issue Links:
Problem/Incident
is caused by MDEV-28926 Make ANALYZE FORMAT=JSON show time sp... Closed
Relates
relates to MDEV-30875 main.analyze_stmt_orderby/rowid_filte... Closed

 Description   

On the first ever upload of MariaDB 10.11 to Debian, I noticed that various tests that compare the JSON output from the optimizer fails with mismatches on armel and armhf builds (amd64 and arm64 pass):

Failing test(s): main.rowid_filter_innodb main.analyze_stmt_orderby main.explain_json_format_partitions main.cte_recursive main.derived_cond_pushdown main.order_by main.rowid_filter main.intersect main.except main.except_all

main.except_all                          w5 [ fail ]
        Test ended at 2023-01-12 20:20:18
 
CURRENT_TEST: main.except_all
--- /<<PKGBUILDDIR>>/mysql-test/main/except_all.result	2022-11-14 18:10:21.000000000 +0000
+++ /<<PKGBUILDDIR>>/mysql-test/main/except_all.reject	2023-01-12 20:20:18.132443098 +0000
@@ -115,13 +115,9 @@
 ANALYZE format=json select * from ((select a,b from t1) except all (select c,d from t2)) a;
 ANALYZE
 {
-  "query_optimization": {
-    "r_total_time_ms": "REPLACED"
-  },
   "query_block": {
     "select_id": 1,
     "r_loops": 1,
-    "r_total_time_ms": "REPLACED",
     "nested_loop": [
       {
         "table": {
@@ -146,7 +142,6 @@
                     "query_block": {
                       "select_id": 2,
                       "r_loops": 1,
-                      "r_total_time_ms": "REPLACED",
                       "nested_loop": [
                         {
                           "table": {
@@ -169,7 +164,6 @@
                       "select_id": 3,
                       "operation": "EXCEPT",
                       "r_loops": 1,
-                      "r_total_time_ms": "REPLACED",
                       "nested_loop": [
                         {
                           "table": {
@@ -199,13 +193,9 @@
 ANALYZE format=json select * from ((select a from t1) except all (select c from t2)) a;
 ANALYZE
 {
-  "query_optimization": {
-    "r_total_time_ms": "REPLACED"
-  },
   "query_block": {
     "select_id": 1,
     "r_loops": 1,
-    "r_total_time_ms": "REPLACED",
     "nested_loop": [
       {
         "table": {
@@ -230,7 +220,6 @@
                     "query_block": {
                       "select_id": 2,
                       "r_loops": 1,
-                      "r_total_time_ms": "REPLACED",
                       "nested_loop": [
                         {
                           "table": {
@@ -253,7 +242,6 @@
                       "select_id": 3,
                       "operation": "EXCEPT",
                       "r_loops": 1,
-                      "r_total_time_ms": "REPLACED",
                       "nested_loop": [
                         {
                           "table": {
@@ -469,9 +457,6 @@
 ANALYZE format=json (select a,b,e,f from t1,t3) except all (select c,d,g,h from t2,t4);
 ANALYZE
 {
-  "query_optimization": {
-    "r_total_time_ms": "REPLACED"
-  },
   "query_block": {
     "union_result": {
       "table_name": "<except1,2>",
@@ -483,7 +468,6 @@
           "query_block": {
             "select_id": 1,
             "r_loops": 1,
-            "r_total_time_ms": "REPLACED",
             "nested_loop": [
               {
                 "table": {
@@ -525,7 +509,6 @@
             "select_id": 2,
             "operation": "EXCEPT",
             "r_loops": 1,
-            "r_total_time_ms": "REPLACED",
             "nested_loop": [
               {
                 "table": {
@@ -569,13 +552,9 @@
 ANALYZE format=json select * from ((select a,b,e,f from t1,t3) except all (select c,d,g,h from t2,t4)) t;
 ANALYZE
 {
-  "query_optimization": {
-    "r_total_time_ms": "REPLACED"
-  },
   "query_block": {
     "select_id": 1,
     "r_loops": 1,
-    "r_total_time_ms": "REPLACED",
     "nested_loop": [
       {
         "table": {
@@ -600,7 +579,6 @@
                     "query_block": {
                       "select_id": 2,
                       "r_loops": 1,
-                      "r_total_time_ms": "REPLACED",
                       "nested_loop": [
                         {
                           "table": {
@@ -642,7 +620,6 @@
                       "select_id": 3,
                       "operation": "EXCEPT",
                       "r_loops": 1,
-                      "r_total_time_ms": "REPLACED",
                       "nested_loop": [
                         {
                           "table": {
 
mysqltest: Result length mismatch

Logs:

This is somewhat similar to MDEV-11711 that had mismatches in JSON output on armhf only.

This is also similar to MDEV-8981 and MDEV-11866 where the issue was specifically the row

"r_total_time_ms": "REPLACED",

Jira MDEV-20538 is the newest in this category and only one I found still open (assigned to psergei).



 Comments   
Comment by Daniel Black [ 2023-01-14 ]

Debian non-supported-upstream arches already notified of this and shown where to do a 1-2 line change.

Request ref.

noting armel, armhf, hppa, sparc64 unsupported upstream:

If a (include/my_rdtsc.h) static inline ulonglong my_timer_cycles(void) implementation for those is provided, all the "ANALYZE format=json" tests, will pass.

Contributions welcome ( https://github.com/MariaDB/server/tree/10.11/include/my_rdtsc.h) (10.4+)

I guess we'll see if they are going to support their architectures.

If MTR supported json expression output, mysql-test/include/analyze-format.inc could remove r_total_time_ms.

Comment by Otto Kekäläinen [ 2023-01-15 ]

The same is visible also on https://launchpadlibrarian.net/644154363/buildlog_ubuntu-lunar-armhf.mariadb_1%3A10.11.1-1~exp1~ubuntu23.04.1~1672817558.953d5feaf75.debian.experimental_BUILDING.txt.gz - seems all tests with JSON EXPLAIN output has this and they fail. Thus it can't be worked around by simply skipping a couple of tests.

Comment by Adrian Bunk [ 2023-01-20 ]

I've just attached my_timer_cycles-default.patch, which uses my_timer_microseconds() as fallback implementation for my_timer_cycles().

This is the fastest (effectively one-line change) hack for getting this test failure fixed, but likely not a good fix.

Comment by Otto Kekäläinen [ 2023-01-21 ]

This is tracked in Debian as: https://bugs.debian.org/1029163

I think this is quite a relevant thing to fix as it affects the architecture used in Raspberry PIs, and they are quite popular.

Eric_Herman Are you familiar with the topic https://github.com/MariaDB/server/tree/10.11/include/my_rdtsc.h?
Do you want to review the patch attached here (and same patch pending at https://salsa.debian.org/mariadb-team/mariadb-server/-/merge_requests/28)?

Comment by Daniel Black [ 2023-01-21 ]

AdrianBunk thanks for your patch an comments.

From your comments you suggest std::chrono::high_resolution_clock may be an option. I'm open to moving the few c files that use this to C++ (especially as its C++11).

list of c files using my_rdtsc.h

  • mysys/my_cpu.c
  • mysys/my_delete.c
  • mysys/my_rdtsc.c
  • sql/sql_analyze_stmt.h
  • storage/perfschema/pfs_timer.cc
  • unittest/mysys/my_rdtsc-t.c

Can you please:

  • use std::chrono::high_resolution_clock in include/my_rdtsc.h
  • rename unittest/mysys/my_rdtsc-t.c -> unittest/mysys/my_rdtsc-t.cc (as a first test and its entry in CMakeList.txt)
  • optional: my_timer_init_frequency might be better initialized from std::chrono::high_resolution_clock.period (though if a comparison with the existing self measurement is possible that would be insightful)
  • build and run unit test cmake --build . --target my_rdtsc-t && unittest/mysys/my_rdtsc-t, (include output here)

I can handle the renaming of other files.

Comment by Eric Herman [ 2023-01-22 ]

I am not yet familiar with the topic, yet it seems like it might be "up my street", thus I shall take a look in the coming days.

Comment by Adrian Bunk [ 2023-01-22 ]

@danblack Your first test does not work due to MY_ADD_TESTS having SET(ARG_EXT "c"). That's likely easier to test by someone who understands your build system better.

Using std::chrono::high_resolution_clock is not my suggestion, it's from the TODO in the comment above my_timer_cycles() in include/my_rdtsc.h. Based on the Notes at https://en.cppreference.com/w/cpp/chrono/high_resolution_clock std::chrono::steady_clock might be a better option.

Comment by Daniel Black [ 2023-01-23 ]

I had a look at the C++ and while resturning std::chrono::time_point was easy I didn't want to restructure it enough to go back to getting intervals and returning difference.

I found that clock_gettime is a vdso function linux on all architectures (except hppa) and this is what's used by my_interval_timer.

lizardo, is this acceptable? https://github.com/MariaDB/server/pull/2448

AdrianBunk, If not acceptable, could my_timer_cycles would return my_timer_nanoseconds (rather than milliseconds) be acceptable?

Comment by Otto Kekäläinen [ 2023-03-19 ]

For the record, I have not encountered this failure on Launchpad or Debian builders since https://github.com/MariaDB/server/pull/2448 was backported to Debian in https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/340b190ac8a38dc60e176b136966229d40d14b2b

Comment by Sergei Petrunia [ 2023-04-24 ]

Review comments in the PR. Ok to push after addressed.

Comment by Daniel Black [ 2023-04-27 ]

Thanks psergei.

Generated at Thu Feb 08 10:16:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.