  1. MariaDB Server
  2. MDEV-18726

InnoDB gets confused when using large pages if the page size is 1G

Details

    Description

      On older machines, large pages are 2MB, and if you have enough of them, say 10G worth, you can put this in my.cnf:
      large_pages=1
      innodb_buffer_pool_size=10G
      and InnoDB correctly allocates from this faster, never-swapped memory pool.
      But if the machine is newer and you booted with a kernel command line containing
      hugepagesz=1G default_hugepagesz=1G
      then you only need to allocate 10 pages to get 10GB of memory. This makes memory management much faster;
      however, InnoDB gets confused. If you add
      innodb_buffer_pool_size=5G
      it will allocate 50G from the OS, which can be verified with
      cat /proc/meminfo | grep HugePages
      yet internally it thinks it has only 5G; the rest is wasted.
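
As a rough aid for the /proc/meminfo check above, the following Python sketch turns the HugePages counters into byte totals. The function name `hugepage_usage` and the sample text are hypothetical and not taken from the report; the sketch assumes the usual `HugePages_Total`, `HugePages_Free`, `HugePages_Rsvd` and `Hugepagesize` fields, and treats pages that are reserved but not yet touched as committed.

```python
import re


def hugepage_usage(meminfo_text):
    """Summarize default-size huge page usage from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        # Lines look like "HugePages_Total:      16" or "Hugepagesize: 2048 kB".
        match = re.match(r"(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))

    page_bytes = fields["Hugepagesize"] * 1024  # the kernel reports kB
    total = fields["HugePages_Total"]
    free = fields["HugePages_Free"]
    reserved = fields.get("HugePages_Rsvd", 0)
    # Reserved pages are still counted as free, but are already promised
    # to a mapping, so count them as committed here.
    committed_pages = total - free + reserved
    return {
        "page_bytes": page_bytes,
        "total_bytes": total * page_bytes,
        "committed_bytes": committed_pages * page_bytes,
    }


# Made-up sample input, for illustration only.
SAMPLE = """\
MemTotal:       65843000 kB
HugePages_Total:      16
HugePages_Free:       12
HugePages_Rsvd:        2
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

usage = hugepage_usage(SAMPLE)
print(usage["committed_bytes"] // (1024 * 1024), "MiB committed")
```

On a live system the same function could be fed `open("/proc/meminfo").read()`, and the committed figure compared with the configured innodb_buffer_pool_size.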

      I have a box ready to show the issue to Elena if she wants to see it. I have seen the issue on many boxes.

          Activity

            philip_38 Philip orleans added a comment -

            My database has all possible numbers in North America, 17 billion, plus all associated information.
            Anyway, I stopped using InnoDB for the main table. It requires about 8 times the disk space of RocksDB for the same information. It may be faster, but it is inferior.

            danblack Daniel Black added a comment - - edited

            To highlight the problem: in the session below, a 2M chunk size is increased by about 2% (from 2097152 to 2146304 bytes), and on a system with a 2M large_page_size this is rounded up to 4M per chunk, of which only 51% is used.

            gdb --args sql/mysqld --no-defaults --skip-networking --datadir=/tmp/datadir --log-bin=/tmp/datadir/mysqlbin --socket /tmp/s.sock --lc-messages-dir=/home/dan/repos/build-mariadb-server-10.4-upstream/sql/share --verbose --innodb-buffer-pool-size=10M --innodb-buffer-pool-instances=2 --innodb-buffer-pool-chunk-size=2M --large-pages
            (gdb) break buf_chunk_init
            Breakpoint 1 at 0x51d17f: buf_chunk_init. (2 locations)
            (gdb) r
            Thread 1 "mysqld" hit Breakpoint 1, buf_chunk_init (buf_pool=0x5555574a47e0, chunk=0x5555574a4e20, mem_size=2097152) at /home/dan/repos/mariadb-server/storage/innobase/buf/buf0buf.cc:1560
            1560	{
            (gdb) p my_large_page_size 
            $1 = 2097152
            (gdb) n
            1567		mem_size = ut_2pow_round(mem_size, ulint(srv_page_size));
            (gdb) 
            1569		mem_size += ut_2pow_round((mem_size >> srv_page_size_shift)
            (gdb) p mem_size
            $2 = 2097152
            (gdb) n
            1576		chunk->mem = buf_pool->allocator.allocate_large(mem_size,
            (gdb) p mem_size
            $3 = 2146304
            (gdb) p mem_size - 2097152
            $4 = 49152
            (gdb) p 49152 * 100 / 2097152
            $5 = 2
            (gdb) s
            ut_allocator<unsigned char, true>::allocate_large (dontdump=true, pfx=0x5555574a4e30, n_elements=2146304, this=0x5555574a4850) at /home/dan/repos/mariadb-server/storage/innobase/include/ut0new.h:634
            634		allocate_large(
            (gdb) s
            os_mem_alloc_large (n=0x7fffffff59c0) at /home/dan/repos/mariadb-server/storage/innobase/os/os0proc.cc:66
            66	{
            (gdb) n
            73		if (!os_use_large_pages || !os_large_page_size) {
            (gdb) 
            79		size = ut_2pow_round(*n + (os_large_page_size - 1),
            (gdb) 
            82		shmid = shmget(IPC_PRIVATE, (size_t) size, SHM_HUGETLB | SHM_R | SHM_W);
            (gdb) p size
            $6 = 4194304
            (gdb) n
            83		if (shmid < 0) {
            (gdb) 
            88			ptr = shmat(shmid, NULL, 0);
            (gdb) 
            89			if (ptr == (void*)-1) {
            (gdb) p ptr
            $1 = (void *) 0x7fffe1000000
            

            OS confirms:

            $ cd /proc/`pidof mysqld` ; egrep -A 20 '/(SYS|anon_huge)' smaps | more
            7fffe1000000-7fffe1400000 rw-s 00000000 00:0f 31064141                   /SYSV00000000 (deleted)
            Size:               4096 kB
            KernelPageSize:     2048 kB
            MMUPageSize:        2048 kB
            Rss:                   0 kB
            Pss:                   0 kB
            Shared_Clean:          0 kB
            Shared_Dirty:          0 kB
            Private_Clean:         0 kB
            Private_Dirty:         0 kB
            Referenced:            0 kB
            Anonymous:             0 kB
            LazyFree:              0 kB
            AnonHugePages:         0 kB
            ShmemPmdMapped:        0 kB
            Shared_Hugetlb:        0 kB
            Private_Hugetlb:       0 kB
            Swap:                  0 kB
            SwapPss:               0 kB
            Locked:                0 kB
            VmFlags: rd wr sh mr mw me ms de ht sd 
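
The arithmetic in the gdb session can be replayed outside the server. This is a minimal Python sketch, not the server code: it assumes `ut_2pow_round(n, m)` rounds `n` down to a multiple of the power of two `m`, and that the truncated call at line 79 passes `os_large_page_size` as its second argument. The helper names are hypothetical; the byte values are the ones printed by gdb above.

```python
def round_down_pow2(n, align):
    """Round n down to a multiple of align (align must be a power of two)."""
    return n & ~(align - 1)


def large_alloc_size(requested, large_page_size):
    """Adding (page size - 1) before rounding down rounds up to whole large pages."""
    return round_down_pow2(requested + (large_page_size - 1), large_page_size)


def chunk_usage(requested, large_page_size):
    """Return (bytes allocated, integer percent of the allocation requested)."""
    allocated = large_alloc_size(requested, large_page_size)
    return allocated, requested * 100 // allocated


MIB = 1024 * 1024
REQUESTED = 2146304  # the 2M chunk plus 49152 bytes, as printed by gdb ($3)

allocated, used_pct = chunk_usage(REQUESTED, 2 * MIB)
print(allocated, used_pct)  # 4194304 51
```

Under the same assumptions, `large_alloc_size(REQUESTED, 1024 * MIB)` comes to a full 1G page for the same chunk, which is the situation the issue title describes.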
            


            marko Marko Mäkelä added a comment -

            This is a welcome idea, but there are a couple of minor problems with the implementation that cause a mismatch related to innodb_buffer_pool_size. Please address my review comments.

            rs@databay.de Ralf Schenk added a comment - - edited

            When using 1G hugepages with 10.1.38 on Ubuntu, I get 32GB of used SHM memory when setting innodb_buffer_pool_size=16G. I think 16GB are wasted.
            In earlier days (before 10.1.38), using 2MB hugepages with innodb_buffer_pool_size=16G and innodb_buffer_pool_instances=16, I got exactly 16 shared memory segments of 1GB. Now I get 16 segments of 2GB!
            On 10.3.x I found no way (I tried different innodb_buffer_pool_chunk_size and innodb_buffer_pool_instances settings) to get a buffer pool of the configured size. MySQL tried to allocate several times the innodb_buffer_pool_size in RAM.
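
One possible reading of these numbers, offered as a sketch and not as a confirmed diagnosis: if each of the 16 segments holds 1G of buffer pool plus some nonzero extra space, and each segment is rounded up to whole large pages, then 1G pages force every segment to 2G. The function name and the 1-byte overhead are hypothetical stand-ins; the real overhead is not stated in this comment.

```python
GIB = 1 << 30


def segment_size(pool_bytes, overhead_bytes, large_page_bytes):
    """Bytes reserved for one segment when rounded up to whole large pages."""
    needed = pool_bytes + overhead_bytes
    pages = -(-needed // large_page_bytes)  # ceiling division
    return pages * large_page_bytes


SEGMENTS = 16
# Assumption: 16G pool split evenly over 16 segments, with the smallest
# possible nonzero overhead; any larger overhead below 1G gives the same result.
per_segment = segment_size(1 * GIB, 1, 1 * GIB)
total = SEGMENTS * per_segment

print(per_segment // GIB, "GiB per segment,", total // GIB, "GiB in total")
```

Under these assumptions the total is twice the configured pool, and the surplus is almost entirely unused padding.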

            danblack Daniel Black added a comment -

            rs@databay.de you may be interested in MDEV-18851 too.


            People

              marko Marko Mäkelä
              philip_38 Philip orleans