Sending a replication event to a client currently copies it through multiple intermediate buffers, even though the event could be loaded directly into the memory that is sent to the client. Most of this can be fixed in 2.5 by making the network packet creation code smarter.
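A minimal sketch of what "loading the event directly into the memory that is sent" could look like. `Packet` and `make_event_packet` are hypothetical names used only for illustration; `Packet` stands in for GWBUF, whose real API is not shown here. The sketch assumes the MariaDB client protocol framing (3-byte little-endian payload length plus a 1-byte sequence id) and the 0x00 OK byte that prefixes each binlog event in the replication stream, and it only handles events that fit in a single packet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for GWBUF: one contiguous allocation that holds the
// 4-byte packet header, the 0x00 OK byte and the binlog event itself.
struct Packet
{
    std::unique_ptr<uint8_t[]> data;
    size_t size = 0;
};

// Allocate the final buffer once, write the header, then read the event from
// the source straight into place. No intermediate buffer is involved.
Packet make_event_packet(std::istream& src, size_t event_len, uint8_t seq)
{
    const size_t payload_len = event_len + 1;   // +1 for the OK byte
    // Sketch only: payloads of 0xffffff bytes or more must be split.
    assert(payload_len < 0xffffff && "large events need multi-packet handling");

    Packet pkt;
    pkt.size = 4 + payload_len;
    // new uint8_t[n] without "()" default-initializes: the bytes are not zeroed.
    pkt.data.reset(new uint8_t[pkt.size]);

    uint8_t* p = pkt.data.get();
    p[0] = payload_len & 0xff;           // payload length, little-endian
    p[1] = (payload_len >> 8) & 0xff;
    p[2] = (payload_len >> 16) & 0xff;
    p[3] = seq;                          // sequence id
    p[4] = 0x00;                         // OK byte before the event

    // The event lands directly after the header in the buffer that is sent.
    if (!src.read(reinterpret_cast<char*>(p + 5), static_cast<std::streamsize>(event_len)))
    {
        throw std::runtime_error("short read while loading binlog event");
    }

    return pkt;
}
```

The design point is that the header size is known before the event is read, so the space for it can be reserved up front and the event never needs to be copied a second time.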
Switching from std::vector to GWBUF would also fix a second problem: the current code allocates memory with std::vector::resize, which value-initializes (zero-fills) the new bytes before the event data overwrites them. This means every replicated event incurs an unnecessary memset. A one-time measurement showed this to be about half of the CPU time spent outside MaxScale's own code. The figure needs verification, but the inefficiency exists regardless of its severity.
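The sketch below illustrates where the extra memset comes from: `std::vector<uint8_t>::resize` value-initializes new elements, so the bytes are zeroed and then immediately overwritten. The `default_init_allocator` shown here is a commonly used C++ technique that is not part of the original proposal (which is to switch to GWBUF); it is included only to show that the cost lies in value-initialization. `load_event`, `ZeroingBuffer` and `RawBuffer` are hypothetical names that mimic the resize-then-overwrite pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator whose no-argument construct() default-initializes instead of
// value-initializing, so resize() on a vector of bytes skips the zero fill.
template<class T, class A = std::allocator<T>>
class default_init_allocator : public A
{
    using traits = std::allocator_traits<A>;

public:
    template<class U>
    struct rebind
    {
        using other = default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;

    // Used by resize(n): placement-new without "()" leaves bytes indeterminate.
    template<class U>
    void construct(U* ptr) noexcept(std::is_nothrow_default_constructible<U>::value)
    {
        ::new (static_cast<void*>(ptr)) U;
    }

    // Every other construction (copies, emplace with arguments) is unchanged.
    template<class U, class... Args>
    void construct(U* ptr, Args&&... args)
    {
        traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};

using ZeroingBuffer = std::vector<uint8_t>;                                // current behavior
using RawBuffer = std::vector<uint8_t, default_init_allocator<uint8_t>>;   // no zero fill

// Mimics the pattern described above: grow the buffer, then overwrite it.
template<class Buffer>
void load_event(Buffer& buf, const uint8_t* event, size_t len)
{
    buf.resize(len);                              // ZeroingBuffer: bytes zeroed here
    std::copy(event, event + len, buf.begin());   // ...and overwritten right away
}
```

With `ZeroingBuffer` every call pays for the zero fill; with `RawBuffer` the bytes are written only once. A GWBUF-based buffer avoids the same cost without needing a custom allocator.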