Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4760

Unify hash implementation

    XMLWordPrintable

    Details

      Description

      Plz take a look at the first comment

      There are multiple implementation of the same functionality that must be shared: RowStorage::hashData(), Row::hash(), Row::colUpdateMariaDBHasher(), colUpdateMariaDBHasherTypeless().
      They all contains the same or almost the same code that must be merged so that changes in the common hashing code are shared.

      The benchmark tests from xxHash project demonstrates that MDB hash_bin_sort is slow in terms of hash calculation speed for arguments of different lenghts > 1 byte and its hash lacks avalance properties. So hash_bin_sort looks like a function mapping.

      benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27) 
      MDBHSB ,   383,   381,   381,   380,   380,   381,   382,   382,   382,   382,   382,   382,   382,   382,   382,   381,   382,   382,   381
      Throughput small inputs of fixed size (from 1 to 127 bytes): 
      MDBHSB , 295024704, 221975236, 176417305, 145091235, 117256116,  96673542,  83958535,  73986830,  65592337,  57141176,  49911601,  44737604,  40951247,  37909058,  35393027,  32953202,  30992028,  28786576,  26987866,  25276406,  23892856,  22451620,  21232112,  20026773,  19072129,  18164413,  17314950,  16607676,  15879450,  15254507,  14668375,  14066414,  12932428,  13084167,  12695770,  12265419,  11952779,  10862721,  11256207,  10222532,  10624014,   9783401,  10056642,   9777877,   9571974,   9299894,   9110920,   8895325,   8690382,   8488004,   8308099,   8134366,   7926744,   7791090,   7626070,   7479251,   7317875,   7205873,   7065789,   6954038,   6784444,   6705094,   6595014,   6467644,   6357642,   6260518,   6152652,   6035240,   5948864,   5846997,   5733851,   5697956,   5628485,   5406450,   5353351,   5264790,   5206617,   5114699,   5053051,   4993530,   4935846,   4862844,   4801407,   4746994,   4690869,   4632266,   4572240,   4594217,   4539712,   4496971,   4434655,   4365478,   4309741,   4288916,   4134032,   4068177,   4022139,   3974391,   3963883,   3904032,   3869541,   3825133,   3787679,   3763346,   3723585,   3681054,   3661020,   3620390,   3578960,   3545808,   3509987,   3487164,   3449164,   3407913,   3387210,   3364107,   3319239,   3309640,   3274259,   3250296,   3214470,   3182862,   3163310,   3122483,   3119702,   3088232,   3066980
      

      MDB's hash has a peculiar collision bench test output. All other hash functions that have avalance produce a way more collisions.

      root@drrtuy-u20-3:/data/xxHash/tests/collisions# ./collisionsTest --nbh=100000000 MDBMHSB
       *** Collision tester for 64+ bit hashes ***  
       
      Testing MDBMHSB algorithm (32-bit) 
      This program will allocate a lot of memory,
      generate 100000000 32-bit hashes from samples of 256 bytes, 
      and attempt to produce 1164153 collisions. 
       
       Storing hash candidates (762 MB) 
       Stored 100000000 hash candidates in 1 mn 8s  
       Sorting candidates...  Completed in 14s  
       Looking for duplicates:  
      Worst collision ratio at 0 high bits: x0.00 
       Completed in 1s  
       
       
      *===>   Found 310 collisions (x0.00, 1164153.2 expected) in 1 mn 23s *
      ===>   MaxRSS(self) 0MB, MaxRSS(children) 0MB
      

      The current MM3 works good for arguments less then 4 bytes. However it becomes slower comparing with xxh3 algo when the argument size rizes, e.g. 20% for 8 byte ints.

      benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
      untMM3 ,  2155,  2146,  2147,  2143,  2143,  2139,  2139,  2137,  2144,  2134,  2136,  2133,  2127,  2124,  2115,  2093,  2092,  2092,  2090
      Throughput small inputs of fixed size (from 1 to 127 bytes): 
      untMM3 , 204284868, 189941763, 187683043, 182659021, 161042711, 156535892, 150589280, 153351099, 137775956, 134420450, 130829023, 127511769, 116809979, 115851507, 112099070, 108798420,  98950005,  98777625,  95562293,  93433347,  86244149,  86587886,  84061656,  81636141,  77233807,  76733327,  76669764,  73881067,  69981605,  70070235,  69198969,  67117863,  64342817,  64408593,  62881163,  62322846,  59897757,  58350184,  56656276,  55414579,  53951981,  53166370,  51384075,  50595045,  46912379,  46872338,  45945803,  45131230,  42949705,  42113625,  41784170,  42067208,  40655509,  40212051,  38945872,  39705839,  38105405,  38196277,  36771063,  36819016,  35011767,  35868562,  36412883,  34272545,  33222025,  34771724,  34663263,  33860232,  31834301,  32748014,  32669688,  31576345,  30503543,  31189141,  31686580,  31044214,  29228665,  29938116,  30231256,  29125699,  28406851,  29013015,  29016415,  28254045,  27827256,  27593192,  28016709,  27246513,  26182600,  26799719,  26694069,  26033691,  25957675,  26168739,  25888978,  25351701,  25214387,  24911826,  25102338,  24546225,  24310765,  24368056,  24254358,  23994280,  23522953,  23664822,  23493898,  23232668,  23026452,  22849098,  22873393,  22430808,  22128842,  22069674,  21935954,  21855169,  21434595,  21546590,  21130468,  21189418,  20914317,  20681515,  20541955,  20506514,  19925455,  19937362,  19752999
      

      Here is the stats for the proposed xxh3.

      benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27) 
      xxh3   , 12195, 13933, 14246, 14637, 14661, 14847, 14845, 14882, 14879, 14870, 14995, 14867, 14763, 14718, 13570, 11421,  7505,  7115,  6951
      Throughput small inputs of fixed size (from 1 to 127 bytes): 
      xxh3   , 166328851, 166047117, 164750104, 222359743, 220859434, 222076138, 220839877, 222124593, 222410937, 222153846, 222561873, 222417751, 221551744, 222360951, 221106940, 222137297, 148433684, 148402082, 147931004, 147603247, 148186608, 148506960, 147230193, 147981851, 148233496, 148699782, 147402912, 148129418u, 148103169, 146850057, 148423203, 148411703, 102949607, 102925862, 102418241, 102952443, 102703214, 102907196, 102680279, 102889139, 102771179, 102832211, 102810108, 102866670, 102589097, 102200641, 103008151, 102449118, 102728269, 102587431, 102773045, 102686431, 102727251 , 102723369, 102729421, 102691531, 102830333, 102712080, 102590246, 102598743, 102992448, 102928081, 102657147, 102861148,  88892164,  88829285,  89086126,  89052115,  88852884,  89043314,  88769220,  89014418,  88356691,  89123644,  88574436,  89034979,  88701565,  89158585,  89123412,  89034997,  89076156,  88761990,  89213032,  89087844,  89147630,  89110953,  89248370,  87508480,  88759711,  88953844,  88634257,  88771926,  88888889,  89030197,  89178675,  89160849,  70232638,  70094447,  70366298,  70294825,  70395110,  70282285,  70282717,  70225001,  70365661,  70368055,  70293172,  69595835,  69942559,  70188431,  70137656,  70233269,  70026864,  70295104,  70063321,  70262560,  69990198,  70338534,  69965830,  70379123,  70409426,  70261515,  70024664,  70353988,  70142783,  70294010,  70315257
      

      The modified code for the tests is available here. I also included the simple test that demonstrates the overal latency calculating hashes with our current MurMur3, MDB's hash_bin_sort, xxh3. The test code calls for xxHash static or dynamic library.

      #include <iostream>
      #include <bits/stdc++.h>
      #include <chrono>
       
      using int128_t = __int128;
       
      #include "hasher.h"
      #include "/data/xxHash/xxhash.h"
       
      static constexpr size_t ITERS = 100000000LL;
       
      using namespace std;
       
      int main(int argc, char** argv)
      {
          volatile uint32_t seed = 0;
          {
              utils::Hasher_r h;
              auto start = chrono::high_resolution_clock::now(); 
              for (size_t i = 0; i < ITERS; ++i)
              {
                  seed = h((const char*)&i, sizeof(size_t), seed);
                  seed = h.finalize(seed, 32);
              }
              auto end = chrono::high_resolution_clock::now();
              double time_taken = chrono::duration_cast<chrono::nanoseconds>(end-start).count();
              cout << "Existing MurMur3 Loop time is : " << fixed << time_taken * 1e-9 << setprecision(9) << " sec" << endl;
          }
       
          {
              auto start = chrono::high_resolution_clock::now(); 
              for (size_t i = 0; i < ITERS; ++i)
              {
                  seed = XXH3_64bits_withSeed((const void*)&i, sizeof(size_t), seed);
                  //seed = XXH64((const void*)&i, sizeof(size_t), seed);
              }
              auto end = chrono::high_resolution_clock::now();
              double time_taken = chrono::duration_cast<chrono::nanoseconds>(end-start).count();
              cout << "XXH3_64 Loop time is : " << fixed << time_taken * 1e-9 << setprecision(9) << " sec" << endl;
          }
       
          {
              auto start = chrono::high_resolution_clock::now(); 
              volatile size_t nr1 = 1;
              volatile size_t nr2 = 4;
              for (size_t i = 0; i < ITERS; ++i)
              {
                  nr1 = utils::my_hash_sort_bin((const unsigned char*)&i, sizeof(size_t), nr1, nr2);
              }
              auto end = chrono::high_resolution_clock::now();
              double time_taken = chrono::duration_cast<chrono::nanoseconds>(end-start).count();
              cout << "my_hash_sort_bin Loop time is : " << fixed << time_taken * 1e-9 << setprecision(9) << " sec" << endl;
          }
       
       
      }
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              drrtuy Roman
              Reporter:
              drrtuy Roman
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:

                  Git Integration