[MDEV-11520] Extending an InnoDB data file unnecessarily allocates a large memory buffer on Windows Created: 2016-12-09  Updated: 2019-01-23  Resolved: 2017-03-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2.3, 10.2
Fix Version/s: 5.5.55, 10.0.30, 10.1.22, 10.2.5

Type: Bug Priority: Major
Reporter: Lawrin Novitsky Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None
Environment: Windows


Attachments: error.log, mysqld.7z, mysqld_after_noheap.dmp, mysqld_noheap.dmp, mysqld_pdb.7z
Issue Links:
Blocks
blocks MDEV-12250 rpl.rpl_domain_id_filter fails in bui... Closed
Relates
relates to MDEV-13941 Innodb/Windows, 10.2 : High NTFS fra... Closed
relates to MDEV-18349 InnoDB file size changes are not safe... Closed
relates to MDEV-5746 Slow file extend when innodb_use_fall... Closed
relates to MDEV-11968 With innodb_page_size=8K crash with '... Closed
relates to MDEV-12097 Innodb allocates almost 3GB instead i... Closed
relates to MDEV-13177 MariaDB 10.2.6 eats virtual memory Closed
relates to MDEV-14244 MariaDB 10.2.10 fails to run on Debia... Closed
relates to MDEV-16015 Unhandled EOPNOTSUPP of posix_falloca... Closed

 Description   

A debug build of the 10.2 server hits an assertion failure while importing the 'employees' database (https://github.com/datacharmer/test_db). The server was built from the current 10.2 branch ("current" here means commit 7ca1e2abad42a7436e6b668b4568d6fadc2ca165).

2016-12-09 23:37:00 16112 [Note] InnoDB: Buffer pool(s) load completed at 161209 23:37:00
2016-12-09 23:46:00 0x4624  InnoDB: Assertion failure in thread 17956 in file ut0byte.ic line 89
InnoDB: Failing assertion: ptr

The failure occurs in ut_align, called from fil_write_zeros. The call stack:

> mysqld.exe!my_sigabrt_handler(int sig) Line 477 C
  [External Code]
  mysqld.exe!ut_dbg_assertion_failed(const char * expr, const char * file, unsigned long line) Line 68  C++
  mysqld.exe!ut_align(const void * ptr, unsigned long align_no) Line 89 C++
  mysqld.exe!fil_write_zeros(const fil_node_t * node, unsigned long page_size, unsigned __int64 start, unsigned long len, bool read_only_mode) Line 4891  C++
  mysqld.exe!fil_space_extend(fil_space_t * space, unsigned long size) Line 5067  C++
  mysqld.exe!fsp_try_extend_data_file(fil_space_t * space, unsigned char * header, mtr_t * mtr, unsigned long * n_pages_added) Line 1581  C++
  mysqld.exe!fsp_reserve_free_extents(unsigned long * n_reserved, unsigned long space_id, unsigned long n_ext, fsp_reserve_t alloc_type, mtr_t * mtr, unsigned long n_pages) Line 3506  C++
  mysqld.exe!btr_cur_pessimistic_insert(unsigned long flags, btr_cur_t * cursor, unsigned long * * offsets, mem_block_info_t * * heap, dtuple_t * entry, unsigned char * * rec, big_rec_t * * big_rec, unsigned long n_ext, que_thr_t * thr, mtr_t * mtr) Line 3501 C++
  mysqld.exe!row_ins_clust_index_entry_low(unsigned long flags, unsigned long mode, dict_index_t * index, unsigned long n_uniq, dtuple_t * entry, unsigned long n_ext, que_thr_t * thr, bool dup_chk_only) Line 2651  C++
  mysqld.exe!row_ins_clust_index_entry(dict_index_t * index, dtuple_t * entry, que_thr_t * thr, unsigned long n_ext, bool dup_chk_only) Line 3386 C++
  mysqld.exe!row_ins_index_entry(dict_index_t * index, dtuple_t * entry, que_thr_t * thr) Line 3490 C++
  mysqld.exe!row_ins_index_entry_step(ins_node_t * node, que_thr_t * thr) Line 3640 C++
  mysqld.exe!row_ins(ins_node_t * node, que_thr_t * thr) Line 3782  C++
  mysqld.exe!row_ins_step(que_thr_t * thr) Line 3967  C++
  mysqld.exe!row_insert_for_mysql_using_ins_graph(const unsigned char * mysql_rec, row_prebuilt_t * prebuilt) Line 1784 C++
  mysqld.exe!row_insert_for_mysql(const unsigned char * mysql_rec, row_prebuilt_t * prebuilt) Line 1915 C++
  mysqld.exe!ha_innobase::write_row(unsigned char * record) Line 9094 C++
  mysqld.exe!handler::ha_write_row(unsigned char * buf) Line 5924 C++
  mysqld.exe!write_record(THD * thd, TABLE * table, st_copy_info * info) Line 1883  C++
  mysqld.exe!mysql_insert(THD * thd, TABLE_LIST * table_list, List<Item> & fields, List<List<Item> > & values_list, List<Item> & update_fields, List<Item> & update_values, enum_duplicates duplic, bool ignore) Line 1003  C++
  mysqld.exe!mysql_execute_command(THD * thd) Line 4328 C++
  mysqld.exe!mysql_parse(THD * thd, char * rawbuf, unsigned int length, Parser_state * parser_state, bool is_com_multi, bool is_next_command) Line 7799 C++
  mysqld.exe!dispatch_command(enum_server_command command, THD * thd, char * packet, unsigned int packet_length, bool is_com_multi, bool is_next_command) Line 1808 C++
  mysqld.exe!do_command(THD * thd) Line 1368  C++
  mysqld.exe!threadpool_process_request(THD * thd) Line 319 C++
  mysqld.exe!tp_callback(TP_connection * c) Line 158  C++
  mysqld.exe!tp_callback(_TP_CALLBACK_INSTANCE * instance, void * context) Line 377 C++
  mysqld.exe!work_callback(_TP_CALLBACK_INSTANCE * instance, void * context, _TP_WORK * work) Line 451  C++
  [External Code]
  [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]

After that, mysqld cannot recover: it crashes on startup in memcpy, writing to or reading from a NULL pointer. The "after" dump is for that case, but there is not much useful information in it; at least, there is no meaningful backtrace. Again, the RelWithDebInfo build can start with the same configuration.



 Comments   
Comment by Elena Stepanova [ 2016-12-09 ]

Doesn't crash for me. Is it reproducible? Are there any non-default server options?

Comment by Lawrin Novitsky [ 2016-12-09 ]

I get it every time, and I have rebuilt the server from updated sources more than once.

[mysqld]
datadir=C:/Program Files (x86)/MariaDB 10.2/data
port=3308
sql_mode="STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"
default_storage_engine=innodb
innodb_buffer_pool_size=1500M
innodb_log_file_size=800M
max_allowed_packet=100M

I think it occurred on different tables for me, and probably after I changed max_allowed_packet. I also increased innodb_log_file_size while testing it.

Comment by Vladislav Vaintroub [ 2016-12-09 ]

The error log claims the buffer pool was 2.75G (with 11 instances), but my.ini has 1.5G.

From error.log : Initializing buffer pool, total size = 2.75G, instances = 11, chunk size = 128M

Comment by Lawrin Novitsky [ 2016-12-12 ]

Indeed,

MariaDB [test]> select @@innodb_buffer_pool_size;
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
|                2952790016 |
+---------------------------+
which does look like 2.75G
If I make it 1000M in the ini file, it looks like 1000M though
MariaDB [test]> select @@innodb_buffer_pool_size;
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
|                1073741824 |
+---------------------------+
1 row in set (0.00 sec)
But if I do innodb_buffer_pool_size=1200M
MariaDB [test]> select @@innodb_buffer_pool_size/1024/1024;
+-------------------------------------+
| @@innodb_buffer_pool_size/1024/1024 |
+-------------------------------------+
|                       2304.00000000 |
+-------------------------------------+
1 row in set (0.00 sec)

Comment by Vladislav Vaintroub [ 2016-12-12 ]

there is a bug somewhere

Comment by Lawrin Novitsky [ 2016-12-12 ]

With innodb_buffer_pool_size=1000M it did not crash.
But this is not the whole story. RelWithDebInfo does not crash on any values, but it gets the same "wrong" values for different values of innodb_buffer_pool_size set in the ini file.

Comment by Elena Stepanova [ 2016-12-24 ]

Regarding the wrong size of the buffer pool, we have a bug report about it: MDEV-10961 (which turns out to be not a bug, but the new InnoDB design, see my comment in the report).

Comment by Vladislav Vaintroub [ 2016-12-25 ]

Going from 1.5GB to 2.75GB can hardly be attributed to rounding (which is what MDEV-10961 suggests).

Comment by Elena Stepanova [ 2016-12-25 ]

wlad,
It is, if you actually read the description in the manual (quoted in MDEV-10961). I don't think it really needs detailed explanation, but here you go.

  • The buffer size is rounded up to the closest innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size value.
  • You have innodb_buffer_pool_instances == 11 (auto-sized on 32-bit systems, rather weirdly, but that's another story), and innodb_buffer_pool_chunk_size == 128M.
  • So, it's aligned to 1476395008 bytes: 1476395008, 2952790016, 4429185024, ...
  • Initial value is 1500M, so 1476395008 is not acceptable, hence 2952790016.

I think your confusion comes from the sequence of events – it might appear that first the buffer pool size is set, and then buffer pool instances are calculated. But according to the manual, it's the other way round – number of buffer pool instances is set first (auto-sized by default on 32-bit systems), and based on it the buffer pool size is calculated.
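
The arithmetic in the bullet points above can be checked with a minimal sketch. The helper name round_up_buffer_pool is hypothetical and this is not InnoDB's actual code; it only assumes the rule quoted from the manual, that the requested size is rounded up to the next multiple of innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not InnoDB code): round the requested buffer pool
// size up to the next multiple of instances * chunk_size.
std::uint64_t round_up_buffer_pool(std::uint64_t requested,
                                   std::uint64_t instances,
                                   std::uint64_t chunk_size)
{
    const std::uint64_t unit = instances * chunk_size;
    assert(unit > 0);
    // Integer ceiling division, then scale back up to bytes.
    return (requested + unit - 1) / unit * unit;
}
```

With 11 instances and 128M chunks the unit is 1476395008 bytes, so a request of 1500M (1572864000 bytes) does not fit into one unit and becomes 2952790016 bytes, the value reported by the server above.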

Comment by Vladislav Vaintroub [ 2016-12-25 ]

So, we will proceed by treating a 3GB buffer pool on 32-bit systems as a feature rather than a bug. Note that 32-bit Windows only gives a process 2GB of virtual address space (there are boot parameters to give it somewhat more, but 3GB is the absolute maximum).

Anything beyond 1.5GB with default parameters would give OOMs on a 32-bit system (possibly also on Linux).

Comment by Elena Stepanova [ 2016-12-25 ]

There is most certainly a bug here, I never said otherwise, the comments above only addressed the side question about the strange size of the buffer pool.

And FWIW, current documentation claims that buffer size up to 4G should work on 32-bit systems. Maybe that's why the buffer pool instances are auto-sized – to split it into pieces that can be addressed? (I'm just guessing, don't know how it is implemented).

It doesn't work well though – not only auto-sizing, but the whole "above 2G" thing, you're right about it.
There was an upstream bug https://bugs.mysql.com/bug.php?id=57707 which could be blamed for that, but it's supposed to be fixed in all current versions.

Comment by Vladislav Vaintroub [ 2016-12-25 ]

4GB is possible for 32-bit processes running on a 64-bit OS, yes. But hardly on a 32-bit OS (not as common these days, but according to https://mariadb.org/feedback_plugin/stats/architecture/ still in use for about 15% of installations).

Comment by Elena Stepanova [ 2016-12-25 ]

In this case, I guess there are at least two obvious problems here already:

  • first, whatever auto-sizing algorithm is used for innodb_buffer_pool_instances, if it ends up converting a perfectly good 1.5G buffer pool size into de-facto unsupported 2.75G, it's a bad algorithm;
  • second, if the buffer pool size is of a value that de-facto cannot work (be it via auto-adjusting or due to an explicit setting), the server/plugin should refuse to start, rather than imitate activity and then go down when it actually has to use the whole buffer.

I've run some experiments on 32-bit Windows and Linux, results are all over the place, I'll try to summarize them in the next comment.

Comment by Elena Stepanova [ 2016-12-25 ]

To marko:

There are various problems that might be related or not related to the issue, but affect the outcome. I can't filter them meaningfully, so I'll just list here what I've seen, in order of appearance, and will set as generic Affect/Fix versions as possible. Please feel free to do whatever you deem right with this report: change affect/fix versions, split it into separate reports about different issues, etc.
One thing which is common for everything here is that it's about 32-bit builds.

1. Buffer pool autosizing (10.2)
Alignment to instances * chunk is pain everywhere, but on 32-bit systems it's worse.
The innodb_buffer_pool_instances number has been auto-sized on 32-bit systems since (at least) 5.5. It didn't affect the user until InnoDB 5.7, but now it does: as in Lawrin's configuration above, the initial value is 1.5G, but InnoDB chooses to set instances to 11, and after that raises the buffer pool size to 2.75G. Obviously, when people set 1.5G (especially on a 32-bit system, which means they probably don't have so many resources to begin with), they don't expect to have 2.75G taken instead.

2. Buffer pool size over 2G (5.5-10.2)
MySQL manual claims that on 32-bit systems buffer pool size up to 4G is supported. I don't see it happen.
In Lawrin's case, the buffer appears to be initialized without errors, but when the time comes to use it, the assertion failure happens. It's reliably reproducible for me on a 64-bit Windows running a 32-bit build of current 10.2 (I can't say anything about real 32-bit Windows, because I don't have one handy).
The failure does not seem to happen on 10.1. However, earlier versions show different problems, both on Windows and Linux.

Windows, 10.1, --innodb_buffer_pool_size=2000M --innodb_buffer_pool_instances=1

2016-12-25 19:17:20 4120 [Note] InnoDB: Initializing buffer pool, size = 2.0G
InnoDB: VirtualAlloc(2186248192 bytes) failed; Windows error 8
2016-12-25 19:17:20 1018  InnoDB: Assertion failure in thread 4120 in file mem0dbg.cc line 680
InnoDB: Failing assertion: heap->magic_n == MEM_BLOCK_MAGIC_N

Windows, 10.1, --innodb_buffer_pool_size=3500M --innodb_buffer_pool_instances=8

2016-12-25 19:19:49 308 [Note] InnoDB: Initializing buffer pool, size = 3.4G
InnoDB: VirtualAlloc(478248960 bytes) failed; Windows error 8
2016-12-25 19:19:50 134  InnoDB: Assertion failure in thread 308 in file mem0dbg.cc line 680
InnoDB: Failing assertion: heap->magic_n == MEM_BLOCK_MAGIC_N

Windows, 10.2 32-bit, --innodb_buffer_pool_size=3500M --innodb_buffer_pool_instances=1

2016-12-25 17:08:43 4628 [Note] InnoDB: Initializing buffer pool, total size = 3.5G, instances = 1, chunk size = 128M
2016-12-25 17:08:47 4628 [Note] InnoDB: VirtualAlloc(138477568 bytes) failed; Windows error 8
2016-12-25 17:08:47 4628 [ERROR] InnoDB: Cannot allocate memory for the buffer pool
2016-12-25 17:08:47 4628 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1810] with error Generic error
2016-12-25 17:08:47 4628 [ERROR] Plugin 'InnoDB' init function returned error.
2016-12-25 17:08:47 4628 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2016-12-25 17:08:47 4628 [Note] Plugin 'FEEDBACK' is disabled.
2016-12-25 17:08:47 4628 [ERROR] Unknown/unsupported storage engine: InnoDB
2016-12-25 17:08:47 4628 [ERROR] Aborting

Memory on the machine is not a problem here, it has 14G, only 1.8G is used. It is also confirmed by the fact that 64-bit build with the same options starts fine:

Windows, 10.2 64-bit, --innodb_buffer_pool_size=3500M --innodb_buffer_pool_instances=1

2016-12-25 17:06:35 2120 [Note] InnoDB: Initializing buffer pool, total size = 3.5G, instances = 1, chunk size = 128M
2016-12-25 17:06:46 2120 [Note] InnoDB: Completed initialization of buffer pool
2016-12-25 17:06:47 2120 [Note] InnoDB: Tablespace ID 0 name innodb_system:Default tablespace encryption mode scheme unencrypted
2016-12-25 17:06:47 2120 [Note] InnoDB: Highest supported file format is Barracuda.
2016-12-25 17:06:51 2120 [Note] InnoDB: Creating shared tablespace for temporary tables
2016-12-25 17:06:51 2120 [Note] InnoDB: Setting file '.\ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2016-12-25 17:06:51 2120 [Note] InnoDB: File '.\ibtmp1' size is now 12 MB.
2016-12-25 17:06:51 2120 [Note] InnoDB: Tablespace ID 13 name innodb_temporary:Default tablespace encryption mode scheme unencrypted
2016-12-25 17:06:51 2120 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2016-12-25 17:06:51 2120 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2016-12-25 17:06:52 2120 [Note] InnoDB: Waiting for purge to start
2016-12-25 17:06:52 1084 [Note] InnoDB: page_cleaner: 1000ms intended loop took 5426ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
2016-12-25 17:06:52 2120 [Note] InnoDB: 5.7.14 started; log sequence number 292311925
2016-12-25 17:06:52 3224 [Note] InnoDB: Loading buffer pool(s) from E:\elenst\10.2\sql\data\ib_buffer_pool
2016-12-25 17:06:52 2120 [Note] Plugin 'FEEDBACK' is disabled.
2016-12-25 17:06:52 2120 [Note] Server socket created on IP: '::'.
2016-12-25 17:06:52 2120 [Note] sql\Debug\mysqld: ready for connections.
Version: '10.2.3-MariaDB-debug'  socket: ''  port: 3308  Source distribution
2016-12-25 17:06:56 3224 [Note] InnoDB: Buffer pool(s) load completed at 161225 17:06:56

Linux is not any better. On a system with 6G memory:

Linux, 5.5

161225 11:59:28 InnoDB: Initializing buffer pool, size = 2.4G
InnoDB: mmap(2672640000 bytes) failed; errno 12
161225 11:59:28  InnoDB: Assertion failure in thread 3073656576 in file mem0dbg.c line 684
InnoDB: Failing assertion: heap->magic_n == MEM_BLOCK_MAGIC_N

Linux, 10.1

2016-12-25 12:01:23 3052226944 [Note] InnoDB: Initializing buffer pool, size = 2.4G
2016-12-25 12:01:24 3052226944 [Note] InnoDB: Completed initialization of buffer pool
2016-12-25 12:01:24 3052226944 [Note] InnoDB: Created tablespace for space 0 name ./ibdata1 key_id 1 encryption 0.
2016-12-25 12:01:24 3052226944 [Note] InnoDB: Created tablespace for space 4294967280 name ./ib_logfile0 key_id 0 encryption 0.
2016-12-25 12:01:24 3052226944 [Note] InnoDB: Created tablespace for space 4294967281 name arch_log_space key_id 0 encryption 0.
2016-12-25 12:01:24 3052226944 [Note] InnoDB: Highest supported file format is Barracuda.
2016-12-25 12:01:25 3052226944 [Note] InnoDB: Created tablespace for space 3 name mysql/gtid_slave_pos key_id 0 encryption 0.
2016-12-25 12:01:25 3052226944 [Note] InnoDB: Created tablespace for space 2 name mysql/innodb_index_stats key_id 0 encryption 0.
2016-12-25 12:01:25 3052226944 [Note] InnoDB: Created tablespace for space 1 name mysql/innodb_table_stats key_id 0 encryption 0.
2016-12-25 12:01:26 3052226944 [Note] InnoDB: 128 rollback segment(s) are active.
InnoDB: Error: pthread_create returned 11

This one is also nice – the server starts, but doesn't behave well (note the "out of memory" error on startup):

Linux, 10.1

2016-12-25 12:04:40 3052161408 [Note] InnoDB: Initializing buffer pool, size = 2.2G
2016-12-25 12:04:41 3052161408 [Note] InnoDB: Completed initialization of buffer pool
2016-12-25 12:04:42 3052161408 [Note] InnoDB: Created tablespace for space 0 name ./ibdata1 key_id 1 encryption 0.
2016-12-25 12:04:42 3052161408 [Note] InnoDB: Created tablespace for space 4294967280 name ./ib_logfile0 key_id 0 encryption 0.
2016-12-25 12:04:42 3052161408 [Note] InnoDB: Created tablespace for space 4294967281 name arch_log_space key_id 0 encryption 0.
2016-12-25 12:04:42 3052161408 [Note] InnoDB: Highest supported file format is Barracuda.
2016-12-25 12:04:42 3052161408 [Note] InnoDB: Created tablespace for space 3 name mysql/gtid_slave_pos key_id 0 encryption 0.
2016-12-25 12:04:42 3052161408 [Note] InnoDB: Created tablespace for space 2 name mysql/innodb_index_stats key_id 0 encryption 0.
2016-12-25 12:04:42 3052161408 [Note] InnoDB: Created tablespace for space 1 name mysql/innodb_table_stats key_id 0 encryption 0.
2016-12-25 12:04:43 3052161408 [Note] InnoDB: 128 rollback segment(s) are active.
2016-12-25 12:04:43 3052161408 [Note] InnoDB: Waiting for purge to start
2016-12-25 12:04:44 3052161408 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.34-79.1 started; log sequence number 1622918
2016-12-25 12:04:44 71273280 [Note] InnoDB: Dumping buffer pool(s) not yet started
2016-12-25 12:04:44 3052161408 [ERROR] mysqld: Out of memory (Needed 130760704 bytes)
2016-12-25 12:04:44 3052161408 [Note] Plugin 'FEEDBACK' is disabled.
2016-12-25 12:04:44 3052161408 [Note] Server socket created on IP: '::'.
2016-12-25 12:04:44 3051634496 [Note] InnoDB: Read page 0 from tablespace for space 1 name mysql/innodb_table_stats key_id 0 encryption 0 handle 37.
2016-12-25 12:04:44 3051634496 [Note] InnoDB: Read page 0 from tablespace for space 2 name mysql/innodb_index_stats key_id 0 encryption 0 handle 38.
2016-12-25 12:04:44 3051634496 [Note] InnoDB: Read page 0 from tablespace for space 3 name mysql/gtid_slave_pos key_id 0 encryption 0 handle 39.
2016-12-25 12:04:44 3052161408 [Note] /home/buildbot/mariadb-10.1.21/sql/mysqld: ready for connections.
Version: '10.1.21-MariaDB-debug'  socket: '/home/buildbot/mariadb-10.1.21/data/tmp/mysql.sock'  port: 3306  Source distribution

3. Documentation and over-promise
As wlad said in the comments above, in his view a buffer pool over 2G cannot work reliably on 32-bit systems. If that is indeed so, then we shouldn't claim otherwise in the documentation, shouldn't allow InnoDB to start on 32-bit systems with the buffer pool over 2G, and certainly shouldn't auto-size it over this value.

Comment by Vladislav Vaintroub [ 2016-12-25 ]

The 2GB on 32 bit is subject to some amendments, at least on Windows.

a) wow64 processes (32 bit process running on 64bit Windows) can use 4GB virtual address space
b) 32bit on 32bit Windows can be configured to use up to 3GB virtual address space, via boot option, but the default is 2GB. 3GB for user processes might result in some memory pressure in kernel (as I remember from my previous experience using it)

a) and b) only work if executables are linked with /LARGEADDRESSAWARE (we have this option).

Source: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366912(v=vs.85).aspx

Comment by Marko Mäkelä [ 2016-12-28 ]

The code in question does not use the InnoDB buffer pool for memory allocation, but instead normal malloc() via a wrapper:

	/* Extend at most 1M at a time */
	ulint	n_bytes = ut_min(static_cast<ulint>(1024 * 1024), len);
	byte*	ptr = reinterpret_cast<byte*>(ut_zalloc_nokey(n_bytes
							      + page_size));
	byte*	buf = reinterpret_cast<byte*>(ut_align(ptr, page_size));

Apparently the code has been compiled without PERFORMANCE_SCHEMA. In that case, ut_zalloc_nokey() will map to calloc(), which indeed can return NULL.

Of course, this particular code probably should invoke fallocate() or similar to extend the file. Failing that, it could use a statically allocated always-zero buffer, shared with field_ref_zero and other always-zero buffers.

Comment by Marko Mäkelä [ 2017-02-20 ]

In MariaDB Server 5.5, we are using mem_alloc() which would crash the server on allocation failure.

	/* Extend at most 64 pages at a time */
	buf_size = ut_min(64, size_after_extend - start_page_no) * page_size;
	buf2 = mem_alloc(buf_size + page_size);
	buf = ut_align(buf2, page_size);

Similar code is used in MariaDB Server 10.0 and 10.1. In MariaDB Server 10.2 we are using ut_zalloc_nokey() which can actually return NULL when PERFORMANCE_SCHEMA is disabled.

I am not sure if a failure to allocate 1 megabyte of memory should be treated as a fatal error, like it currently is. InnoDB is rather unforgiving when it comes to running out of memory or file space. Theoretically we could try a smaller allocation that works, and then use that to write zero bytes to the file.
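
The "try a smaller allocation" idea could look roughly like the sketch below. It is an illustration only, not the patch that was pushed: alloc_zeroed_with_fallback and its min_size parameter are hypothetical, and the halving strategy is an assumption. A real caller would still need to add page_size for alignment and check for NULL before calling ut_align().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: try to get a zero-filled buffer of `want` bytes and,
// if calloc() fails, keep halving the request down to `min_size`.
// On success *got holds the size actually allocated; on failure the
// function returns nullptr and sets *got to 0.
void* alloc_zeroed_with_fallback(std::size_t want, std::size_t min_size,
                                 std::size_t* got)
{
    assert(got != nullptr);
    assert(min_size > 0 && want >= min_size);
    for (std::size_t n = want; n >= min_size; n /= 2) {
        if (void* p = std::calloc(1, n)) {
            *got = n;
            return p;
        }
    }
    *got = 0;
    return nullptr;
}
```

The caller would then write zeros in chunks of *got bytes instead of a fixed 1M, and release the buffer with free().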

Note: With innodb_use_fallocate=1 (which is not the default), we would avoid the file allocation if HAVE_POSIX_FALLOCATE. Maybe we should use SetEndOfFile() to quickly extend files on Windows. And maybe we should make innodb_use_fallocate=1 the default?

Comment by Marko Mäkelä [ 2017-02-20 ]

bb-5.5-marko

Comment by Marko Mäkelä [ 2017-02-20 ]

Based on this Microsoft blog article “Why does my single-byte write take forever?” we should seek to the desired end of the file and write a single zero byte, to have Windows extend the file with zero bytes. There is absolutely no need to allocate a huge 1MiB buffer for extending the file.
Maybe in 10.2, we can remove the logic to handle posix_fallocate() failures by writing zeroes to the file.
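
As a rough illustration of "seek to the desired end and write a single zero byte", here is a POSIX stand-in, assuming pwrite() in place of the Windows file API the comment is about. The function name extend_by_last_byte is hypothetical. Note the behavioural difference: on most POSIX file systems this produces a sparse file, whereas the comment describes Windows filling the gap with zero bytes.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: grow an existing file to new_size bytes by writing
// one zero byte at offset new_size - 1. No large zero-filled buffer is
// needed. Returns true on success.
bool extend_by_last_byte(const char* path, off_t new_size)
{
    assert(new_size > 0);
    const int fd = open(path, O_RDWR);
    if (fd < 0) {
        return false;
    }
    struct stat st;
    bool ok = fstat(fd, &st) == 0;
    // Only write when the file really is shorter, so existing data is
    // never overwritten.
    if (ok && st.st_size < new_size) {
        const char zero = 0;
        ok = pwrite(fd, &zero, 1, new_size - 1) == 1;
    }
    return close(fd) == 0 && ok;
}
```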

Comment by Marko Mäkelä [ 2017-02-21 ]

The tests failed on all Windows platforms, so it definitely needs some more work.

Comment by Marko Mäkelä [ 2017-02-21 ]

There was a logic error in the patch. The revised patch works.
While looking at the surrounding code, I noticed that os_file_set_size() could also do with a smaller buffer on Windows.
And worse, I noticed that posix_fallocate() is misused. We should read the error code directly from the return value, because errno is not supposed to be set. This would avoid bogus error messages such as that in MDEV-12027 in the future.

Comment by Marko Mäkelä [ 2017-02-21 ]

In MariaDB Server 5.5, XtraDB contains Yasufumi Kinoshita’s patch to introduce fil_system->file_extend_mutex to address MySQL Bug #56433. I will port that also for the innodb_plugin as part of this fix. In MariaDB Server 10.x, there is no new mutex but instead a status flag, as described by Inaam Rana’s blog post.

Comment by Marko Mäkelä [ 2017-02-21 ]

Revised patch: bb-5.5-marko

Comment by Jan Lindström (Inactive) [ 2017-02-21 ]

ok to push.

Comment by Marko Mäkelä [ 2017-02-21 ]

I also ported the patch to 10.0, but will wait for buildbot results before pushing.
A port to 10.1 and 10.2 will require some more work. Either we should ensure that we are really allocating space into sparse files, also on Windows, or we should just logically extend the file when page_compression is enabled.

Comment by Marko Mäkelä [ 2017-02-21 ]

In 10.1 and 10.2 where InnoDB supports page compression via sparse files, we should extend the files using SetEndOfFile() on Windows, and on POSIX, either posix_fallocate() or ftruncate(), depending on whether the file is supposed to be sparse.
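
The POSIX half of that decision might be sketched as follows. This is a hedged illustration, not the code in bb-10.1-marko: extend_posix and its sparse flag are hypothetical names, the Windows SetEndOfFile() branch is omitted, and EINTR handling is left out for brevity.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: extend fd to new_size bytes. Sparse files (for
// example page_compression tables) are only extended logically with
// ftruncate(); other files get their blocks preallocated.
// Returns 0 on success or an errno-style error code.
int extend_posix(int fd, off_t new_size, bool sparse)
{
    assert(new_size >= 0);
    struct stat st;
    if (fstat(fd, &st) != 0) {
        return errno;
    }
    if (st.st_size >= new_size) {
        return 0;  // already large enough; never shrink the file
    }
    if (sparse) {
        return ftruncate(fd, new_size) == 0 ? 0 : errno;
    }
    // posix_fallocate() returns the error code directly and does not
    // set errno.
    return posix_fallocate(fd, st.st_size, new_size - st.st_size);
}
```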

Comment by Marko Mäkelä [ 2017-02-22 ]

bb-10.1-marko implements the use of sparse files via ftruncate() on POSIX, for tables that use page_compression. On Windows, SetEndOfFile() is used, but because the files are not being declared as sparse, this will end up physically extending the files (preallocating, wasting space for all-zero, unused pages).

Comment by Marko Mäkelä [ 2017-02-22 ]

bb-10.1-marko employs SetFileInformationByHandle() and FILE_END_OF_FILE_INFO on Windows, and fixes some bugs in the error messages.

Comment by Marko Mäkelä [ 2017-03-03 ]

It looks like we should implement some retry logic for EINTR, to address a rpl.rpl_domain_id_filter test failure:

2017-03-02 18:25:53 139979823866624 [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file ./test/t3.ibd, desired size 65536 Operating system error number 4. Check that the disk is not full or a disk quota exceeded. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html

Comment by Marko Mäkelä [ 2017-03-03 ]

I added a retry loop if posix_fallocate() returns EINTR.
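
A retry loop of this kind could look like the sketch below. It is not the pushed patch: retry_on_eintr, preallocate and the max_attempts bound are hypothetical. The one fact it relies on is that posix_fallocate() reports failure through its return value rather than errno, as noted in an earlier comment.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>

// Hypothetical helper: call op() until it returns something other than
// EINTR, giving up after max_attempts calls. op() must return 0 or an
// errno-style error code.
template <typename Op>
int retry_on_eintr(Op op, int max_attempts = 100)
{
    assert(max_attempts > 0);
    int err;
    do {
        err = op();
    } while (err == EINTR && --max_attempts > 0);
    return err;
}

// The error code is read from the return value of posix_fallocate(),
// not from errno.
int preallocate(int fd, off_t offset, off_t len)
{
    return retry_on_eintr([=] { return posix_fallocate(fd, offset, len); });
}
```

The bound on attempts is a design choice of this sketch, so that a signal storm cannot keep the loop spinning forever; an unbounded loop is equally plausible.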
