Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33305

Testing for MDEV-21829: Packed keys in Unique

Details

    • Task
    • Status: Stalled (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.6
    • Tests
    • None

    Description

      This is about doing testing for MDEV-21829: Use packed sort keys in Unique objects

      Queries affected by MDEV-21829

      Packed Unique has these uses:

      USE1: Histogram collection in EITS
      USE2: agg_func(DISTINCT col,,, )
      USE3: Index_merge stores rowids.

      We need to test the scenario where the data doesn't fit into the in-memory Unique and is spilled on disk.

      The focus is on the scenario where Unique stores variable-sized data, although it's nice to have a test for fixed-size data, too.

      There is no way to turn off the new behavior. You will need to compare with the old server.

      There is a bpftrace script to watch tempfile usage at MDEV-32472.

      Variable-sized data

      This is VARCHAR(n) columns.
      (TODO: what happens with CHAR(n)?) cvicentiu ?

      Controlling amount of memory that Unique uses

      How much memory Unique will use for COUNT(DISTINCT): The logic is here:

        size_t Item_sum::ram_limitation(THD *thd)
        {     
          return MY_MAX(1024,
                   (size_t)MY_MIN(thd->variables.tmp_memory_table_size,
                                  thd->variables.max_heap_table_size));
      

      How much memory Unique will use in EITS ANALYZE:

        void Column_statistics_collected::init(THD *thd, Field *table_field)
        {
          size_t max_heap_table_size= (size_t)thd->variables.max_heap_table_size;
      

      Plan for test#1:

      Create a table with VARCHAR column.

      • Try different charsets (latin, utf8mb3, utf8mb4 if you have the data, ucs2)
      • Try a few different collations (There's no need to try all collations)

      Populate the table with enough random data of different lengths and chars.

      Set @@max_heap_table_size low enough to force Unique to flush to disk.

      Run

      ANALYZE TABLE t PERSISTENT FOR COLUMNS (col) INDEXES ();
      

      Compare the output of

      select * from mysql.column_stats where db_name=datababase() and table_name='t'
      

      with the server before the patch.

      Also compare bpftrace tempfile usage output for both servers.

      Testing for agg_func(DISTINCT ...)

      TODO

      Attachments

        Issue Links

          Activity

            lstartseva, followup to previous talk:

            Attached script indeed counts writes to tempfile, not tempfile size. If tempfile is rewritten, its size is actually smaller.

            Please find attached tempfile-usage3.bt which catches lseek calls, and also tracks read to see how large the file was. Example output.

            XX Lseek on fd=57
            Lseek on fd=57
            ^C
             
            @lseek_in_progress[15244]: 57
             
             
            @tempfile_max_read[58]: 5954293
            @tempfile_max_read[57]: 5968974
             
            @tempfile_size[58]: 5954293
            @tempfile_size[57]: 11920744
            

            Note that tempfile_max_read[57] < tempfile_size[57]: 11920744, and tempfile_max_read is how large the file actually was.

            psergei Sergei Petrunia added a comment - lstartseva , followup to previous talk: Attached script indeed counts writes to tempfile, not tempfile size. If tempfile is rewritten, its size is actually smaller. Please find attached tempfile-usage3.bt which catches lseek calls, and also tracks read to see how large the file was. Example output. XX Lseek on fd=57 Lseek on fd=57 ^C   @lseek_in_progress[15244]: 57     @tempfile_max_read[58]: 5954293 @tempfile_max_read[57]: 5968974   @tempfile_size[58]: 5954293 @tempfile_size[57]: 11920744 Note that tempfile_max_read [57] < tempfile_size [57] : 11920744, and tempfile_max_read is how large the file actually was.
            psergei Sergei Petrunia added a comment - - edited

            I've also discovered this piece of code in:

            bool Unique::walk(TABLE *table, tree_walk_action action, void *walk_action_arg)
            

              if (buff_sz < full_size * (file_ptrs.elements + 1UL))
                res= merge(table, merge_buffer, buff_sz,
                           buff_sz >= full_size * MERGEBUFF2) ;
             
              if (!res)
              {
                res= merge_walk(merge_buffer, buff_sz, full_size,
                                (Merge_chunk *) file_ptrs.buffer,
            

            file_ptrs.elements is "how many unique objects were dumped".
            buff_sz is the amount of buffer space we have
            full_size is one element size.

            This means if the buffer is so so small that we can't have one element from each file in the buffer, then... we use merge_walk() procedure that merges all the buffers at once.

            This involves doing a lot of small read calls:

            @tempfile_read_size: 
            [8, 16)             2297 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
            [16, 32)             495 |@@@@@@@@@@@                                         |
            [32, 64)             458 |@@@@@@@@@@                                          |
            [64, 128)            465 |@@@@@@@@@@                                          |
            [128, 256)           463 |@@@@@@@@@@                                          |
            [256, 512)           491 |@@@@@@@@@@@                                         |
            [512, 1K)            517 |@@@@@@@@@@@                                         |
            [1K, 2K)             511 |@@@@@@@@@@@                                         |
            [2K, 4K)             329 |@@@@@@@                                             |
            [4K, 8K)            1198 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
            

            this is an edge case. It would be nice if testing involved this edge case but mostly "normal" case.
            One can recognize which case occurred by checking whether lseek call was made (see the trace script).

            psergei Sergei Petrunia added a comment - - edited I've also discovered this piece of code in: bool Unique::walk(TABLE *table, tree_walk_action action, void *walk_action_arg) if (buff_sz < full_size * (file_ptrs.elements + 1UL)) res= merge(table, merge_buffer, buff_sz, buff_sz >= full_size * MERGEBUFF2) ;   if (!res) { res= merge_walk(merge_buffer, buff_sz, full_size, (Merge_chunk *) file_ptrs.buffer, file_ptrs.elements is "how many unique objects were dumped". buff_sz is the amount of buffer space we have full_size is one element size. This means if the buffer is so so small that we can't have one element from each file in the buffer, then... we use merge_walk() procedure that merges all the buffers at once. This involves doing a lot of small read calls: @tempfile_read_size: [8, 16) 2297 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [16, 32) 495 |@@@@@@@@@@@ | [32, 64) 458 |@@@@@@@@@@ | [64, 128) 465 |@@@@@@@@@@ | [128, 256) 463 |@@@@@@@@@@ | [256, 512) 491 |@@@@@@@@@@@ | [512, 1K) 517 |@@@@@@@@@@@ | [1K, 2K) 511 |@@@@@@@@@@@ | [2K, 4K) 329 |@@@@@@@ | [4K, 8K) 1198 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ | this is an edge case. It would be nice if testing involved this edge case but mostly "normal" case. One can recognize which case occurred by checking whether lseek call was made (see the trace script).

            There are usually two temporary files generated. To understand when it happens see the stacktraces below:
            Stacktrace of the creation of the first temporary file:

            #0  create_temp_file (to=0x7f99afefc570 "\001", dir=0x556bfef55838 ".../MariaDB/10.6/bld/mysql-test/var/tmp/mysqld.1", prefix=0x7f998401bf00 "MY", mode=0, MyFlags=80)
                at .../MariaDB/10.6/mysys/mf_tempfile.c:64
            #1  0x0000556bfb918003 in real_open_cached_file (cache=0x7f998401bdc8) at .../MariaDB/10.6/mysys/mf_cache.c:61
            #2  0x0000556bfb91d387 in my_b_flush_io_cache (info=0x7f998401bdc8, need_append_buffer_lock=0) at .../MariaDB/10.6/mysys/mf_iocache.c:1706
            #3  0x0000556bfb91aba7 in _my_b_write (info=0x7f998401bdc8, Buffer=0x7f9984ae6774 "p%dJ&zX(^#2p`m&n|F&)=66KXZ:Sre\"uLZaYyv\217\217\217\217\217\217\370\253\235\204\231\177", Count=38)
                at .../MariaDB/10.6/mysys/mf_iocache.c:559
            #4  0x0000556bfad83099 in my_b_write (info=0x7f998401bdc8, Buffer=0x7f9984ae6700 "\226", Count=154) at .../MariaDB/10.6/include/my_sys.h:522
            #5  0x0000556bfad83154 in Unique::unique_write_to_file (key=0x7f9984ae6700 "\226", count=1, unique=0x7f998401bd90) at .../MariaDB/10.6/sql/uniques.cc:45
            #6  0x0000556bfb94e8a9 in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984ae66e8, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, 
                argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:590
            #7  0x0000556bfb94e85a in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984992a20, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, 
                argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:588
            #8  0x0000556bfb94e8ce in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984ad4928, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, 
                argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:593
            #9  0x0000556bfb94e8ce in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984a7a838, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, 
                argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:593
            #10 0x0000556bfb94e85a in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f99849d4730, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, 
                argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:588
            #11 0x0000556bfb94e8ce in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984a701f0, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, 
                argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:593
            

            Stacktrace of the creation of the second temporary file:

            #0  create_temp_file (to=0x7f99afefc090 "m\034\357\373kU", dir=0x556bfef55838 ".../MariaDB/10.6/bld/mysql-test/var/tmp/mysqld.1", prefix=0x7f99afefc708 "MY", mode=0, MyFlags=80)
                at .../MariaDB/10.6/mysys/mf_tempfile.c:64
            #1  0x0000556bfb918003 in real_open_cached_file (cache=0x7f99afefc5d0) at .../MariaDB/10.6/mysys/mf_cache.c:61
            #2  0x0000556bfb91d387 in my_b_flush_io_cache (info=0x7f99afefc5d0, need_append_buffer_lock=0) at .../MariaDB/10.6/mysys/mf_iocache.c:1706
            #3  0x0000556bfb91aba7 in _my_b_write (info=0x7f99afefc5d0, 
                Buffer=0x7f9984bc1769 "P_ju12GM0CCdN:vf{Z1$^2x-pnYTvwpNrw{%Eh`'?H,@>6AXyvg#r8`Y|)RA/g9'TS#RW<gr*t/+))4h2\\Y*d:48X7F:2)v:", '\245' <repeats 104 times>..., Count=45)
                at .../MariaDB/10.6/mysys/mf_iocache.c:559
            #4  0x0000556bfafa86c2 in my_b_write (info=0x7f99afefc5d0, Buffer=0x7f9984bc172c "f", Count=106) at .../MariaDB/10.6/include/my_sys.h:522
            #5  0x0000556bfafaf45a in merge_buffers (param=0x7f99afefc7d0, from_file=0x7f998401bdc8, to_file=0x7f99afefc5d0, sort_buffer=..., lastbuff=0x7f9984a10118, Fb=0x7f9984a10118, Tb=0x7f9984a10268, flag=0)
                at .../MariaDB/10.6/sql/filesort.cc:1980
            #6  0x0000556bfafae290 in merge_many_buff (param=0x7f99afefc7d0, sort_buffer=..., buffpek=0x7f9984a10118, maxbuffer=0x7f99afefc7a4, t_file=0x7f998401bdc8)
                at .../MariaDB/10.6/sql/filesort.cc:1657
            #7  0x0000556bfad84d8b in Unique::merge (this=0x7f998401bd90, table=0x7f99849bbf20, buff=0x7f9984ac1f88 "i", buff_size=1046436, without_last_merge=true)
                at .../MariaDB/10.6/sql/uniques.cc:769
            #8  0x0000556bfad84a06 in Unique::walk (this=0x7f998401bd90, table=0x7f99849bbf20, action=0x556bfb0ccf82 <item_sum_distinct_walk_for_count(void*, element_count, void*)>, walk_action_arg=0x7f998401bbd8)
                at .../MariaDB/10.6/sql/uniques.cc:689
            #9  0x0000556bfb0ce3ef in Aggregator_distinct::endup (this=0x7f998401bbd8) at .../MariaDB/10.6/sql/item_sum.cc:1100
            #10 0x0000556bfb0d2022 in Item_sum_count::val_int (this=0x7f99840184f8) at .../MariaDB/10.6/sql/item_sum.cc:1946
            #11 0x0000556bfae95a3b in Type_handler::Item_send_longlong (this=0x556bfc6aa7c0 <type_handler_slonglong>, item=0x7f99840184f8, protocol=0x7f99840013e0, buf=0x7f99afefcaa0)
                at .../MariaDB/10.6/sql/sql_type.cc:7514
            

            lstartseva Lena Startseva added a comment - There are usually two temporary files generated. To understand when it happens see the stacktraces below: Stacktrace of the creation of the first temporary file: #0 create_temp_file (to=0x7f99afefc570 "\001", dir=0x556bfef55838 ".../MariaDB/10.6/bld/mysql-test/var/tmp/mysqld.1", prefix=0x7f998401bf00 "MY", mode=0, MyFlags=80) at .../MariaDB/10.6/mysys/mf_tempfile.c:64 #1 0x0000556bfb918003 in real_open_cached_file (cache=0x7f998401bdc8) at .../MariaDB/10.6/mysys/mf_cache.c:61 #2 0x0000556bfb91d387 in my_b_flush_io_cache (info=0x7f998401bdc8, need_append_buffer_lock=0) at .../MariaDB/10.6/mysys/mf_iocache.c:1706 #3 0x0000556bfb91aba7 in _my_b_write (info=0x7f998401bdc8, Buffer=0x7f9984ae6774 "p%dJ&zX(^#2p`m&n|F&)=66KXZ:Sre\"uLZaYyv\217\217\217\217\217\217\370\253\235\204\231\177", Count=38) at .../MariaDB/10.6/mysys/mf_iocache.c:559 #4 0x0000556bfad83099 in my_b_write (info=0x7f998401bdc8, Buffer=0x7f9984ae6700 "\226", Count=154) at .../MariaDB/10.6/include/my_sys.h:522 #5 0x0000556bfad83154 in Unique::unique_write_to_file (key=0x7f9984ae6700 "\226", count=1, unique=0x7f998401bd90) at .../MariaDB/10.6/sql/uniques.cc:45 #6 0x0000556bfb94e8a9 in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984ae66e8, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:590 #7 0x0000556bfb94e85a in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984992a20, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:588 #8 0x0000556bfb94e8ce in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984ad4928, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:593 #9 0x0000556bfb94e8ce in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984a7a838, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:593 #10 0x0000556bfb94e85a in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f99849d4730, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:588 #11 0x0000556bfb94e8ce in tree_walk_left_root_right (tree=0x7f998401bf38, element=0x7f9984a701f0, action=0x556bfad830f8 <Unique::unique_write_to_file(unsigned char*, unsigned int, Unique*)>, argument=0x7f998401bd90) at .../MariaDB/10.6/mysys/tree.c:593 Stacktrace of the creation of the second temporary file: #0 create_temp_file (to=0x7f99afefc090 "m\034\357\373kU", dir=0x556bfef55838 ".../MariaDB/10.6/bld/mysql-test/var/tmp/mysqld.1", prefix=0x7f99afefc708 "MY", mode=0, MyFlags=80) at .../MariaDB/10.6/mysys/mf_tempfile.c:64 #1 0x0000556bfb918003 in real_open_cached_file (cache=0x7f99afefc5d0) at .../MariaDB/10.6/mysys/mf_cache.c:61 #2 0x0000556bfb91d387 in my_b_flush_io_cache (info=0x7f99afefc5d0, need_append_buffer_lock=0) at .../MariaDB/10.6/mysys/mf_iocache.c:1706 #3 0x0000556bfb91aba7 in _my_b_write (info=0x7f99afefc5d0, Buffer=0x7f9984bc1769 "P_ju12GM0CCdN:vf{Z1$^2x-pnYTvwpNrw{%Eh`'?H,@>6AXyvg#r8`Y|)RA/g9'TS#RW<gr*t/+))4h2\\Y*d:48X7F:2)v:" , '\245' <repeats 104 times>..., Count=45) at .../MariaDB/10.6/mysys/mf_iocache.c:559 #4 0x0000556bfafa86c2 in my_b_write (info=0x7f99afefc5d0, Buffer=0x7f9984bc172c "f", Count=106) at .../MariaDB/10.6/include/my_sys.h:522 #5 0x0000556bfafaf45a in merge_buffers (param=0x7f99afefc7d0, from_file=0x7f998401bdc8, to_file=0x7f99afefc5d0, sort_buffer=..., lastbuff=0x7f9984a10118, Fb=0x7f9984a10118, Tb=0x7f9984a10268, flag=0) at .../MariaDB/10.6/sql/filesort.cc:1980 #6 0x0000556bfafae290 in merge_many_buff (param=0x7f99afefc7d0, sort_buffer=..., buffpek=0x7f9984a10118, maxbuffer=0x7f99afefc7a4, t_file=0x7f998401bdc8) at .../MariaDB/10.6/sql/filesort.cc:1657 #7 0x0000556bfad84d8b in Unique::merge (this=0x7f998401bd90, table=0x7f99849bbf20, buff=0x7f9984ac1f88 "i", buff_size=1046436, without_last_merge=true) at .../MariaDB/10.6/sql/uniques.cc:769 #8 0x0000556bfad84a06 in Unique::walk (this=0x7f998401bd90, table=0x7f99849bbf20, action=0x556bfb0ccf82 <item_sum_distinct_walk_for_count(void*, element_count, void*)>, walk_action_arg=0x7f998401bbd8) at .../MariaDB/10.6/sql/uniques.cc:689 #9 0x0000556bfb0ce3ef in Aggregator_distinct::endup (this=0x7f998401bbd8) at .../MariaDB/10.6/sql/item_sum.cc:1100 #10 0x0000556bfb0d2022 in Item_sum_count::val_int (this=0x7f99840184f8) at .../MariaDB/10.6/sql/item_sum.cc:1946 #11 0x0000556bfae95a3b in Type_handler::Item_send_longlong (this=0x556bfc6aa7c0 <type_handler_slonglong>, item=0x7f99840184f8, protocol=0x7f99840013e0, buf=0x7f99afefcaa0) at .../MariaDB/10.6/sql/sql_type.cc:7514

            How to get EITS to always put the same data into Unique when analyze_sample_percentage<100: Set the Rand's Seed.

            set @@rand_seed1=N1, @@rand_seed2=N2; analyze table t1 persistent for all;
            

            psergei Sergei Petrunia added a comment - How to get EITS to always put the same data into Unique when analyze_sample_percentage<100: Set the Rand's Seed. set @@rand_seed1=N1, @@rand_seed2=N2; analyze table t1 persistent for all;
            lstartseva Lena Startseva added a comment - - edited

            Result of testing:

            Results for "CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci" and 10000350 records in the table:

            Tables were created like:

            CREATE TABLE `t1a` (
              `a` varchar(1000) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL,
              KEY `a` (`a`(766))
            ) ENGINE=Aria;
            

            Case number Case type Engine Size on disk without indexes Temp file size before changes (Mb) Temp file size after changes (Mb) hex(histogram) is the same before and after changes
            1 EITS 20% Aria 975 Mb 73.47 5033.38
            2 EITS 20% InnoDB 1,1 Gb 164.89 5030.46
            3 EITS 20% MyISAM 825 Mb 73.43 5033.54
            4 EITS 50% Aria 975 Mb 82.22 5630.58
            5 EITS 50% InnoDB 1,1 Gb 825.18 5630.69
            6 EITS 50% MyISAM 825 Mb 82.22 5630.33
            7 EITS 100% Aria 975 Mb 82.30 5635.99
            8 EITS 100% InnoDB 1,1 Gb 1649.00 5635.99
            9 EITS 100% MyISAM 825 Mb 82.30 5635.99
            10 count(DISTINCT a) Aria 975 Mb 78.55 5628.49
            11 count(DISTINCT a) InnoDB 1,1 Gb 1573.15 5628.49
            12 count(DISTINCT a) MyISAM 825 Mb 78.55 5628.49

            Results for "CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci" and 10001000 records in the table:

            Tables were created like:

            CREATE TABLE `t100mb4a` (
              `a` varchar(1000) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
              KEY `a` (`a`(575))
            ) ENGINE=Aria;
            

            Case number Case type Engine Size on disk without indexes Temp file size before changes (Mb) Temp file size after changes (Mb) hex(histogram) is the same before and after changes
            1 EITS 20% Aria 2.6 Gb 448.52 15277.57
            2 EITS 20% InnoDB 2.8 Gb 897.04 15277.57
            3 EITS 20% MyISAM 2.4 Gb 73.43 15277.57
            4 EITS 50% Aria 2.6 Gb 2242.91 38212.40
            5 EITS 50% InnoDB 2.8 Gb 2242.91 38212.40
            6 EITS 50% MyISAM 2.4 Gb 2242.91 38212.40
            7 EITS 100% Aria 2.6 Gb 4485.20 76414.26
            8 EITS 100% InnoDB 2.8 Gb 4485.20 76414.26
            9 EITS 100% MyISAM 2.4 Gb 4485.20 76414.26
            10 count(DISTINCT a) Aria 2.6 Gb 4408.90 76337.96
            11 count(DISTINCT a) InnoDB 2.8 Gb 4408.90 76337.96
            12 count(DISTINCT a) MyISAM 2.4 Gb 4408.90 76337.96

            Task for the problem with histogram for charset utf8mb4 is MDEV-33457

            lstartseva Lena Startseva added a comment - - edited Result of testing: Results for "CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci" and 10000350 records in the table: Tables were created like: CREATE TABLE `t1a` ( `a` varchar (1000) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL , KEY `a` (`a`(766)) ) ENGINE=Aria; Case number Case type Engine Size on disk without indexes Temp file size before changes (Mb) Temp file size after changes (Mb) hex(histogram) is the same before and after changes 1 EITS 20% Aria 975 Mb 73.47 5033.38 2 EITS 20% InnoDB 1,1 Gb 164.89 5030.46 3 EITS 20% MyISAM 825 Mb 73.43 5033.54 4 EITS 50% Aria 975 Mb 82.22 5630.58 5 EITS 50% InnoDB 1,1 Gb 825.18 5630.69 6 EITS 50% MyISAM 825 Mb 82.22 5630.33 7 EITS 100% Aria 975 Mb 82.30 5635.99 8 EITS 100% InnoDB 1,1 Gb 1649.00 5635.99 9 EITS 100% MyISAM 825 Mb 82.30 5635.99 10 count(DISTINCT a) Aria 975 Mb 78.55 5628.49 11 count(DISTINCT a) InnoDB 1,1 Gb 1573.15 5628.49 12 count(DISTINCT a) MyISAM 825 Mb 78.55 5628.49 Results for "CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci" and 10001000 records in the table: Tables were created like: CREATE TABLE `t100mb4a` ( `a` varchar (1000) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL , KEY `a` (`a`(575)) ) ENGINE=Aria; Case number Case type Engine Size on disk without indexes Temp file size before changes (Mb) Temp file size after changes (Mb) hex(histogram) is the same before and after changes 1 EITS 20% Aria 2.6 Gb 448.52 15277.57 2 EITS 20% InnoDB 2.8 Gb 897.04 15277.57 3 EITS 20% MyISAM 2.4 Gb 73.43 15277.57 4 EITS 50% Aria 2.6 Gb 2242.91 38212.40 5 EITS 50% InnoDB 2.8 Gb 2242.91 38212.40 6 EITS 50% MyISAM 2.4 Gb 2242.91 38212.40 7 EITS 100% Aria 2.6 Gb 4485.20 76414.26 8 EITS 100% InnoDB 2.8 Gb 4485.20 76414.26 9 EITS 100% MyISAM 2.4 Gb 4485.20 76414.26 10 count(DISTINCT a) Aria 2.6 Gb 4408.90 76337.96 11 count(DISTINCT a) InnoDB 2.8 Gb 4408.90 76337.96 12 count(DISTINCT a) MyISAM 2.4 Gb 4408.90 76337.96 Task for the problem with histogram for charset utf8mb4 is MDEV-33457

            People

              lstartseva Lena Startseva
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.