I recently was performing an optimize on some InnoDB tables and experienced an issue that causes the process to hang without intervention. I was running the optimize on my 3 galera nodes while rolling through them with wsrep_osu_method set to RSU to not affect the running cluster. The Innodb_checkpoint_age keeps increasing until it gets to a maximum level that is about 90% of the Innodb_checkpoint_max_age at which point the process stops advancing (MySQL disk IO stops). I have to set innodb_idle_flush_pct to a non-zero value to get the checkpoint age down and allow the process to continue. The statistics I have of the last time I ran into this issue are too low resolution to be useful; I'll save everything useful that I can the next time I do an online DDL and update the issue, but hopefully this information is enough to start looking into the issue. The issue is really only a problem for larger tables (multiple GB) since the redo log has enough space for smaller tables.
I have tried adjusting the following options up and down to extremes, but only innodb_idle_flush_pct helps:
I have attached my lightly redacted server config file.