[MDEV-28772] Mariabackup locks database for minutes since 10.8.3 Created: 2022-06-08  Updated: 2023-06-30  Resolved: 2022-10-17

Status: Closed
Project: MariaDB Server
Component/s: mariabackup
Affects Version/s: 10.8.3
Fix Version/s: 10.8.6, 10.9.4, 10.10.2, 10.11.1

Type: Bug Priority: Major
Reporter: Vicente Rossello Jaume Assignee: Marko Mäkelä
Resolution: Fixed Votes: 3
Labels: regression

Issue Links:
Duplicate
is duplicated by MDEV-28955 MariaBackup is locking my database (a... Closed
is duplicated by MDEV-29631 Mariabackup blocks connections until ... Closed
Relates
relates to MDEV-28758 Mariabackup copies binary logs to bac... Closed

 Description   

We just upgraded MariaDB from 10.7.3 to 10.8.3, and now the backup locks the whole database for minutes. Before the upgrade the lock lasted only 3 or 4 seconds (no hardware change; it is a beefy server).

This is the command we use for backup:

    mariabackup --backup --host="production.travelc.internal" --stream=xbstream  --target-dir /backup/dump --user someuser --password somepassword --use-memory=12288M | pigz -p 4 > /backup/dump/mariadbbackup-2022-06-08.gz

With 10.8.3 the lock phase (executed at the end of the backup) is much slower now:

   11:45:25  PRODUCTION: [00] 2022-06-08 09:45:25 Acquiring BACKUP LOCKS...
    11:45:26  PRODUCTION: [00] 2022-06-08 09:45:25 Starting to backup non-InnoDB tables and files
    11:45:26  PRODUCTION: [01] 2022-06-08 09:45:25 Streaming ./travelc/powerbi_bookedhotelservice.frm to <STDOUT>
    11:45:26  PRODUCTION: [01] 2022-06-08 09:45:25         ...done
    11:45:26  PRODUCTION: [01] 2022-06-08 09:45:25 Streaming ./travelc/contract_apartment_room_facility.frm to <STDOUT>
    11:45:27  PRODUCTION: [00] 2022-06-08 09:45:27 >> log scanned up to (3530424892051)
    11:45:27  PRODUCTION: [01] 2022-06-08 09:45:27         ...done
    11:45:27  PRODUCTION: [01] 2022-06-08 09:45:27 Streaming ./travelc/promotioncode_use.frm to <STDOUT>
    11:45:27  PRODUCTION: [01] 2022-06-08 09:45:27         ...done
    11:45:27  PRODUCTION: [01] 2022-06-08 09:45:27 Streaming ./travelc/microsite_loyalty_settings.frm to <STDOUT>
    11:45:27  PRODUCTION: [01] 2022-06-08 09:45:27         ...done
    11:45:27  PRODUCTION: [01] 2022-06-08 09:45:27 Streaming ./travelc/atlante_pushed_cancelled_services.frm to <STDOUT>
    11:45:27  PRODUCTION: [01] 2022-06-08 09:45:27         ...done
    11:45:27  PRODUCTION: [01] 2022-06-08 09:45:27 Streaming ./travelc/channel_rate_room_closed_to_arrival.frm to <STDOUT>
    11:45:30  PRODUCTION: [00] 2022-06-08 09:45:29 >> log scanned up to (3530424918499)
    11:45:30  PRODUCTION: [01] 2022-06-08 09:45:29         ...done
    11:45:30  PRODUCTION: [01] 2022-06-08 09:45:29 Streaming ./travelc/cruise_route_region.frm to <STDOUT>
    11:45:30  PRODUCTION: [01] 2022-06-08 09:45:29         ...done
    11:45:30  PRODUCTION: [01] 2022-06-08 09:45:29 Streaming ./travelc/cruiseline_included_i18n.frm to <STDOUT>
    11:45:30  PRODUCTION: [01] 2022-06-08 09:45:29         ...done
    11:45:30  PRODUCTION: [01] 2022-06-08 09:45:29 Streaming ./travelc/contract_closedtour_modality.frm to <STDOUT>
    11:45:30  PRODUCTION: [01] 2022-06-08 09:45:29         ...done
    11:45:30  PRODUCTION: [01] 2022-06-08 09:45:29 Streaming ./travelc/contract_event.frm to <STDOUT>
    11:45:32  PRODUCTION: [00] 2022-06-08 09:45:32 >> log scanned up to (3530424949156)
    11:45:32  PRODUCTION: [01] 2022-06-08 09:45:32         ...done
    11:45:32  PRODUCTION: [01] 2022-06-08 09:45:32 Streaming ./travelc/channel_rate_room.frm to <STDOUT>
    11:45:32  PRODUCTION: [01] 2022-06-08 09:45:32         ...done
    11:45:32  PRODUCTION: [01] 2022-06-08 09:45:32 Streaming ./travelc/contract_private_tour_price.frm to <STDOUT>
    11:45:32  PRODUCTION: [01] 2022-06-08 09:45:32         ...done
    11:45:32  PRODUCTION: [01] 2022-06-08 09:45:32 Streaming ./travelc/vecilatam_airline_cash_payment.frm to <STDOUT>
    11:45:32  PRODUCTION: [01] 2022-06-08 09:45:32         ...done
    11:45:32  PRODUCTION: [01] 2022-06-08 09:45:32 Streaming ./travelc/world2meet_hotel_provider_configuration_copy.frm to <STDOUT>
    11:45:35  PRODUCTION: [00] 2022-06-08 09:45:34 >> log scanned up to (3530424992606)
    11:45:35  PRODUCTION: [01] 2022-06-08 09:45:34         ...done
    11:45:35  PRODUCTION: [01] 2022-06-08 09:45:34 Streaming ./travelc/contract_hotel_rate_room_stop_sales.frm to <STDOUT>
    11:45:35  PRODUCTION: [01] 2022-06-08 09:45:34         ...done
    11:45:35  PRODUCTION: [01] 2022-06-08 09:45:34 Streaming ./travelc/trippaymentrefund.frm to <STDOUT>
    11:45:35  PRODUCTION: [01] 2022-06-08 09:45:34         ...done
    11:45:35  PRODUCTION: [01] 2022-06-08 09:45:34 Streaming ./travelc/hotel_roomingtemplates.frm to <STDOUT>
    11:45:35  PRODUCTION: [01] 2022-06-08 09:45:34         ...done
    11:45:35  PRODUCTION: [01] 2022-06-08 09:45:34 Streaming ./travelc/contract_hotel_remarks_i18n.frm to <STDOUT>
    11:45:37  PRODUCTION: [00] 2022-06-08 09:45:37 >> log scanned up to (3530425039695)
    11:45:37  PRODUCTION: [01] 2022-06-08 09:45:37         ...done
    11:45:37  PRODUCTION: [01] 2022-06-08 09:45:37 Streaming ./travelc/gift_box_promotion_code.frm to <STDOUT>
    11:45:37  PRODUCTION: [01] 2022-06-08 09:45:37         ...done
    11:45:37  PRODUCTION: [01] 2022-06-08 09:45:37 Streaming ./travelc/europassistance_territorial_scope.frm to <STDOUT>
    11:45:37  PRODUCTION: [01] 2022-06-08 09:45:37         ...done
    11:45:37  PRODUCTION: [01] 2022-06-08 09:45:37 Streaming ./travelc/insurance_configuration_terms_i18n.frm to <STDOUT>
    11:45:37  PRODUCTION: [01] 2022-06-08 09:45:37         ...done
    11:45:37  PRODUCTION: [01] 2022-06-08 09:45:37 Streaming ./travelc/carrental_office_i18n.frm to <STDOUT>
    11:45:40  PRODUCTION: [00] 2022-06-08 09:45:39 >> log scanned up to (3530425048820)
    11:45:40  PRODUCTION: [01] 2022-06-08 09:45:39         ...done
    11:45:40  PRODUCTION: [01] 2022-06-08 09:45:39 Streaming ./travelc/powerbi_providerconfiguration.frm to <STDOUT>
    11:45:40  PRODUCTION: [01] 2022-06-08 09:45:39         ...done
    11:45:40  PRODUCTION: [01] 2022-06-08 09:45:39 Streaming ./travelc/bookedeventservice.frm to <STDOUT>
    11:45:40  PRODUCTION: [01] 2022-06-08 09:45:39         ...done
    11:45:40  PRODUCTION: [01] 2022-06-08 09:45:39 Streaming ./travelc/contract_closedtour_supplement_i18n.frm to <STDOUT>
    11:45:40  PRODUCTION: [01] 2022-06-08 09:45:39         ...done
    11:45:40  PRODUCTION: [01] 2022-06-08 09:45:39 Streaming ./travelc/promotion_code_country.frm to <STDOUT>
    11:45:42  PRODUCTION: [00] 2022-06-08 09:45:42 >> log scanned up to (3530425067126)
    11:45:42  PRODUCTION: [01] 2022-06-08 09:45:42         ...done
    11:45:42  PRODUCTION: [01] 2022-06-08 09:45:42 Streaming ./travelc/insurance_microsite.frm to <STDOUT>
    11:45:42  PRODUCTION: [01] 2022-06-08 09:45:42         ...done
    11:45:42  PRODUCTION: [01] 2022-06-08 09:45:42 Streaming ./travelc/pushbookingconfiguration.frm to <STDOUT>
    11:45:42  PRODUCTION: [01] 2022-06-08 09:45:42         ...done
    11:45:42  PRODUCTION: [01] 2022-06-08 09:45:42 Streaming ./travelc/qr_configuration_destination.frm to <STDOUT>
    11:45:42  PRODUCTION: [01] 2022-06-08 09:45:42         ...done
    11:45:42  PRODUCTION: [01] 2022-06-08 09:45:42 Streaming ./travelc/repricing_hotel.frm to <STDOUT>
    11:45:45  PRODUCTION: [00] 2022-06-08 09:45:44 >> log scanned up to (3530425076298)
    11:45:45  PRODUCTION: [01] 2022-06-08 09:45:44         ...done
    11:45:45  PRODUCTION: [01] 2022-06-08 09:45:44 Streaming ./travelc/bookedtrippdf.frm to <STDOUT>
    11:45:45  PRODUCTION: [01] 2022-06-08 09:45:44         ...done
    11:45:45  PRODUCTION: [01] 2022-06-08 09:45:44 Streaming ./travelc/powerbi_destination.frm to <STDOUT>
    11:45:45  PRODUCTION: [01] 2022-06-08 09:45:44         ...done
    11:45:45  PRODUCTION: [01] 2022-06-08 09:45:44 Streaming ./travelc/tripnode.frm to <STDOUT>
    11:45:45  PRODUCTION: [01] 2022-06-08 09:45:44         ...done
    11:45:45  PRODUCTION: [01] 2022-06-08 09:45:44 Streaming ./travelc/service_offer_supplier.frm to <STDOUT>
    11:45:47  PRODUCTION: [00] 2022-06-08 09:45:47 >> log scanned up to (3530425080390)
    11:45:47  PRODUCTION: [01] 2022-06-08 09:45:47         ...done
    11:45:47  PRODUCTION: [01] 2022-06-08 09:45:47 Streaming ./travelc/campaign.frm to <STDOUT>
    11:45:47  PRODUCTION: [01] 2022-06-08 09:45:47         ...done
    11:45:47  PRODUCTION: [01] 2022-06-08 09:45:47 Streaming ./travelc/transport_service_offer_continent_arrival.frm to <STDOUT>
    11:45:47  PRODUCTION: [01] 2022-06-08 09:45:47         ...done
    11:45:47  PRODUCTION: [01] 2022-06-08 09:45:47 Streaming ./travelc/themei18n.frm to <STDOUT>
    11:45:47  PRODUCTION: [01] 2022-06-08 09:45:47         ...done
    11:45:47  PRODUCTION: [01] 2022-06-08 09:45:47 Streaming ./travelc/contract_transfer_prices.frm to <STDOUT>
    11:45:49  PRODUCTION: [00] 2022-06-08 09:45:49 >> log scanned up to (3530425099530)
    11:45:49  PRODUCTION: [01] 2022-06-08 09:45:49         ...done
    11:45:49  PRODUCTION: [01] 2022-06-08 09:45:49 Streaming ./travelc/atlante_additional_practice_references.frm to <STDOUT>
    11:45:49  PRODUCTION: [01] 2022-06-08 09:45:49         ...done
    ....

With 10.7.3 the whole lock phase took just 3 seconds. A log sample:

    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 Acquiring BACKUP LOCKS...
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040507593)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040507891)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040508531)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040508742)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040509143)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040509425)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040509738)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040510037)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040510330)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040510645)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040510929)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040511226)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040511868)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040512225)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040512527)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040512810)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040513123)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040513406)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040513703)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040514143)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040514747)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040515031)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040515354)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040515653)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040516058)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040516140)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040517523)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040517852)
    00:24:11  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040518356)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040518828)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040518877)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040519296)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040519576)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040519868)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040520163)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040520464)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040520760)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040521271)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040521507)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040521695)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040522017)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040522313)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040522608)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040522860)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040523216)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040523568)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040524062)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040524486)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040524791)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040525105)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040525527)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:11 >> log scanned up to (3495040525838)
    00:24:12  PRODUCTION: 2022-06-02 22:24:12 0 [Note] InnoDB: Read redo log up to LSN=3495040526336
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040525838)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040526125)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040526425)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040526766)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040527076)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040527376)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040527668)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040527983)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 Starting to backup non-InnoDB tables and files
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/powerbi_bookedhotelservice.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/contract_apartment_room_facility.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/promotioncode_use.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/microsite_loyalty_settings.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/atlante_pushed_cancelled_services.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/channel_rate_room_closed_to_arrival.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/cruise_route_region.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/cruiseline_included_i18n.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/contract_closedtour_modality.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/contract_event.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/channel_rate_room.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/contract_private_tour_price.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/vecilatam_airline_cash_payment.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/world2meet_hotel_provider_configuration_copy.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/contract_hotel_rate_room_stop_sales.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/trippaymentrefund.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/hotel_roomingtemplates.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/contract_hotel_remarks_i18n.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/gift_box_promotion_code.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/europassistance_territorial_scope.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/insurance_configuration_terms_i18n.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/carrental_office_i18n.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/powerbi_providerconfiguration.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/bookedeventservice.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/contract_closedtour_supplement_i18n.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/promotion_code_country.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/insurance_microsite.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/pushbookingconfiguration.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/qr_configuration_destination.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/repricing_hotel.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/bookedtrippdf.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/powerbi_destination.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/tripnode.frm to <STDOUT>
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/service_offer_supplier.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/campaign.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/transport_service_offer_continent_arrival.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/themei18n.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/contract_transfer_prices.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12 Streaming ./travelc/atlante_additional_practice_references.frm to <STDOUT>
    00:24:12  PRODUCTION: [00] 2022-06-02 22:24:12 >> log scanned up to (3495040528266)
    00:24:12  PRODUCTION: [01] 2022-06-02 22:24:12         ...done

We tried purging all binary logs at the beginning of the backup, with no luck. Where should we look?
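For reference, "purging all binary logs at the beginning" can be sketched roughly as follows. This is an illustrative reconstruction, not the reporter's exact script; the host name, credentials, and paths are placeholders taken from the command above.

```shell
# Hypothetical sketch: purge all binary logs, then immediately start the backup.
# Host, user, and password are placeholders from the reported command line.
mariadb --host="production.travelc.internal" --user=someuser --password=somepassword \
  --execute="PURGE BINARY LOGS BEFORE NOW();"

mariabackup --backup --host="production.travelc.internal" --stream=xbstream \
  --user=someuser --password=somepassword --use-memory=12288M \
  | pigz -p 4 > /backup/dump/mariadbbackup-$(date +%F).gz
```

Note that `PURGE BINARY LOGS` only removes logs not in use by any connected replica, so on a replication primary the purge may remove less than expected.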



 Comments   
Comment by Marko Mäkelä [ 2022-06-08 ]

The frequent "log scanned up to" messages were output due to the bug MDEV-27919. Will the 10.8.3 backup complete eventually? What happens if you specify a shorter --log-copy-interval?

Note that there is no easy way to downgrade from 10.8 due to MDEV-14425 and MDEV-27199.
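The suggestion above can be sketched as follows. This is a hypothetical invocation, not a command from the report; --log-copy-interval takes a value in milliseconds (the default is 1 second, i.e. 1000), and the host and credentials are placeholders.

```shell
# Hypothetical sketch: poll the InnoDB redo log every 100 ms instead of the
# default 1000 ms, so less log accumulates between copy attempts.
mariabackup --backup --host="production.travelc.internal" \
  --user=someuser --password=somepassword \
  --log-copy-interval=100 \
  --target-dir=/backup/dump
```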

Comment by Vicente Rossello Jaume [ 2022-06-08 ]

10.8.3 completes in about 4 minutes (this is just the lock phase); it was 4 seconds before.

Log size is not a problem. The test with --log-copy-interval is meant to reduce the log messages, right? I mean, it will not have any impact on the lock duration, will it?

Comment by Vicente Rossello Jaume [ 2022-06-13 ]

We can see that the problem is caused by the binary logs, which are now much bigger than before (I don't have the numbers at hand).

Lock durations with MariaDB 10.8.3 look like this:

  • Backup with binary logs (3 days retention): about 2.5 minutes
  • Backup purging all the binary logs at the beginning: 14 seconds
  • Backup deactivating binary logs: 8 seconds

With 10.7.3:

  • Backup with binary logs (3 days retention): 10 seconds
Comment by Stefan May [ 2022-06-17 ]

I believe this might be the same bug as described in MDEV-28758. Recent versions of mariabackup include the binary logs in the backup, resulting in bloated backups and (depending on binlog size) massively increased lock durations.

Comment by Nuno [ 2022-06-27 ]

The same is happening to me. My database is locked for 30 minutes every day, causing huge downtime.

I raised MDEV-28955 with many details.

The difference is that I don't have binary logs enabled, so I don't think that's the cause for me.

Comment by Marko Mäkelä [ 2022-06-27 ]

cocorossello and nunop, did you try specifying a shorter --log-copy-interval like I suggested? If an attempt to copy recently written logs is made less frequently, the backup may take a much longer time. The size of the backed up ib_logfile0 depends on how many persistent changes were made between the latest log checkpoint (at the start of the backup) and the end of the backup.

Comment by Nuno [ 2022-06-27 ]

Thanks marko

I can try, but I wonder if that will just reduce the downtime from 30 minutes to something lower while still not "fixing" the problem.

I didn't have this problem with 10.5. Do you know if this is a bug, or something we can no longer work around?

I see there's an option "--no-backup-locks" that I wonder if I could use, even if temporarily: https://mariadb.com/kb/en/mariabackup-options/#-no-backup-locks

But it says: "This option may eventually be removed. See MDEV-19753 for more information."
I would like to read more about why, but MDEV-19753 redirects to MENT-1084, which I can't access.

Thank you!

Comment by Ian Gilfillan [ 2022-06-27 ]

You can view the archive at https://web.archive.org/web/20200809201840/https://jira.mariadb.org/browse/MDEV-19753 - this appears to refer only to a Percona Server feature, so it doesn't seem likely to help.

Comment by Marko Mäkelä [ 2022-06-27 ]

nunop, it is unclear where the time is being consumed, and I am not convinced that the problem is the time needed for acquiring the locks. Since you are not using binlog, you can’t be affected by MDEV-28758. Many other users are. I would strongly recommend that you try specifying --log-copy-interval=1 or similar. The default was restored to 1 second; it accidentally was 1 millisecond in 10.6 and later, before MDEV-27919 was fixed.

Comment by Nuno [ 2022-06-27 ]

Thanks marko

I would strongly recommend that you try specifying --log-copy-interval=1 or similar

I'll have to postpone this test until tomorrow morning, as it is too late to run it now (and it would unnecessarily cause downtime if the problem happens again)

the default was restored to 1 second; it accidentally was 1 millisecond in 10.6 and later,

In my raised issue - MDEV-28955 - I mention I was on 10.5, where this didn't happen. So the accidental change you mention on 10.6 (MDEV-27919) didn't apply to me.

it is unclear where the time is being consumed

If it helps, I can provide my full logs to you, but I'd prefer if I could send it through some Private Message or email address, rather than fully public here.

I can understand how "--log-copy-interval=1" would at least reduce the time taken by "log scanned up to", though.
Are there any cons to such a short interval? (I wonder why it isn't the default.)

But again, on 10.5 I can see that "log scanned up to" appeared once per second, yet the database wasn't locking as it is now on 10.8.

There were, however, far fewer "log scanned up to" messages from what I can see (not sure why there are so many now on 10.8),
but I'm certain that the database wasn't locked during the 2 minutes the backup took (while on 10.5)

Thank you very much!

Comment by Vicente Rossello Jaume [ 2022-06-27 ]

I will try --log-copy-interval=1 tonight as suggested. I had thought that this interval only affected the frequency of those "log scanned up to..." messages

Comment by Nuno [ 2022-06-28 ]

Hi marko

I tried now running:

/usr/bin/mariabackup --backup --parallel=4 --log-copy-interval=1 --target-dir=/backup/mysql/

LOADS of "log scanned up to" entries appeared really fast, and the backup took 1:30 minutes to run.
Absolutely no locked queries while this happened...

I understand how --log-copy-interval=1 made it faster, but I don't understand how that would resolve the database locks?

And are there any cons to keeping --log-copy-interval=1? (Why is it not the default?)

Thank you very much.

Comment by Vicente Rossello Jaume [ 2022-06-28 ]

I also tried that without much luck. First I tried --log-copy-interval=1, but I aborted the backup after 70 minutes because it hadn't finished.

Then I ran with --log-copy-interval=100 and the results were not good: almost 1 minute of locks.

log-copy-interval=100

  • mariabackup --backup --log-copy-interval=100 --stream=xbstream --target-dir /backup/dump --user ... --password ... --use-memory=12288M

[00] 2022-06-27 22:16:29 Connecting to MariaDB server host: production.travelc.internal, user: onlyscripts, password: set, port: 3306, socket: /var/run/mysqld/mysqld.sock
[00] 2022-06-27 22:16:29 Using server version 10.8.3-MariaDB-1:10.8.3+maria~jammy-log
mariabackup based on MariaDB server 10.8.3-MariaDB debian-linux-gnu (x86_64)

[00] 2022-06-28 01:30:36 Acquiring BACKUP LOCKS...
[00] 2022-06-28 01:30:37 >> log scanned up to (3665225576819)

[00] 2022-06-28 01:31:34 >> log scanned up to (3665226113706)
[00] 2022-06-28 01:31:34 Executing BACKUP STAGE END
[00] 2022-06-28 01:31:34 All tables unlocked

Full backup duration: 3 hours 16 minutes

no log copy interval:

  • mariabackup --backup --stream=xbstream --target-dir /backup/dump --user ... --password ... --use-memory=12288M

[00] 2022-06-26 22:11:40 Connecting to MariaDB server host: production.travelc.internal, user: onlyscripts, password: set, port: 3306, socket: /var/run/mysqld/mysqld.sock
[00] 2022-06-26 22:11:40 Using server version 10.8.3-MariaDB-1:10.8.3+maria~jammy-log
mariabackup based on MariaDB server 10.8.3-MariaDB debian-linux-gnu (x86_64)

[00] 2022-06-26 22:43:46 Acquiring BACKUP LOCKS...
[00] 2022-06-26 22:43:47 Starting to backup non-InnoDB tables and files
[00] 2022-06-26 22:43:47 >> log scanned up to (3656370393330)

[00] 2022-06-26 22:43:58 Executing BACKUP STAGE END
[00] 2022-06-26 22:43:58 All tables unlocked
[00] 2022-06-26 22:43:58 Streaming /var/lib/mysql/ib_buffer_pool to <STDOUT>

Full backup duration: 32 minutes

Comment by Marko Mäkelä [ 2022-06-28 ]

The purpose of the backup locks is to prevent any changes to non-InnoDB tables. mariadb-backup is only treating the InnoDB log specially; the Aria storage engine would rely on backup locks. As far as I understand, the problem here is that the steps executed while holding the backup lock are taking longer than expected or acceptable. InnoDB performs some writes also internally (for example when purging the history of committed transactions), so even when the strictest backup lock is being held, some InnoDB writes are possible. That is why we must keep copying the ib_logfile0 in the final phase.

nunop, I see that your question was answered by cocorossello: Too frequent polling of the log file may be a bad idea. One size does not fit all.

cocorossello, if you are enabling the binary log, to work around MDEV-28758 you could try specifying --sst-max-binlogs=1 so that less data will be copied.

Alternatively, you can use the mariadb-backup executable from the 10.8.2 distribution, for executing the --backup step.

Reverting mariadb-backup to 10.8.2 would also revert --log-copy-interval to its buggy meaning (counting microseconds, not milliseconds).

For the --prepare step, I think that it is better to use the 10.8.3 version, because several recovery bugs were fixed since 10.8.2.

I have one more possible explanation or solution. In 10.8.3, MDEV-28111 made the InnoDB redo log use O_DIRECT by default. That would essentially disable the file system cache between mariadbd and mariadb-backup and could ruin the performance. MDEV-28766 (in the upcoming 10.8.4 release) would introduce the parameter innodb_log_file_buffering. It could be a good idea to execute SET GLOBAL innodb_log_file_buffering=OFF; before starting a backup and SET GLOBAL innodb_log_file_buffering=ON; after it. If this is the problem that you both are facing, then maybe this logic should be made part of mariadb-backup itself.

In MariaDB 10.8.3, to disable the use of O_DIRECT for the redo log, you must set innodb_flush_method to something else than O_DIRECT, O_DIRECT_NO_FSYNC, or O_DSYNC. Before MDEV-24854 in 10.6, the default was innodb_flush_method=fsync. This setting will also affect data page writes.

I suggest that you try innodb_flush_method=fsync on both mariadbd and mariadb-backup to check if enabling the file system cache on the ib_logfile0 fixes the performance regression.
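The SET GLOBAL toggling suggested here can be wrapped around the backup invocation. A minimal sketch, assuming a 10.8.4+ server; `MARIADB_CLIENT`, `MARIABACKUP_BIN`, the connection details, and the function name are placeholders, not anything from this ticket:

```shell
# Sketch: disable innodb_log_file_buffering (available from 10.8.4) on the
# server only for the duration of the backup, then restore it afterwards,
# even if mariabackup itself failed.
MARIADB_CLIENT="${MARIADB_CLIENT:-mariadb}"
MARIABACKUP_BIN="${MARIABACKUP_BIN:-mariabackup}"

backup_with_log_buffering_off() {
  target_dir="$1"
  status=0
  "$MARIADB_CLIENT" --execute="SET GLOBAL innodb_log_file_buffering=OFF;"
  "$MARIABACKUP_BIN" --backup --target-dir="$target_dir" || status=$?
  # Restore the setting regardless of the backup outcome.
  "$MARIADB_CLIENT" --execute="SET GLOBAL innodb_log_file_buffering=ON;"
  return "$status"
}
```

In practice the two client binaries would carry the same --host/--user/--password options as the backup command shown in the description.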

Comment by Nuno [ 2022-06-28 ]

Hi marko

Thank you for your reply.

For many many years, I've been using O_DIRECT, and I never had this database lock problem up to v10.5. Only now that I moved to 10.8.3, this problem started happening.

To me, everything indicates that something with 10.8 isn't behaving well, which on 10.5 was working perfectly.

> It could be a good idea to execute SET GLOBAL innodb_log_file_buffering=OFF; before starting a backup and SET GLOBAL innodb_log_file_buffering=ON; after it.

I suppose I can try this when 10.8.4 is released. (Although I hope --log-copy-interval=1 alleviates the issue in the meantime.)

Thank you.

Comment by Marko Mäkelä [ 2022-06-28 ]

nunop, before MDEV-14425 in 10.8 replaced the 512-byte log block structure, it was not possible or feasible to enable O_DIRECT on the redo log, only on the data files. To enable O_DIRECT, we must be able to detect the physical block size, which usually is 512 or 4096 bytes.

Comment by Nuno [ 2022-06-28 ]

marko - thank you for your reply. I'm not sure what you mean by that (sorry). Are you saying that O_DIRECT was not doing anything before 10.8 (meaning that even though I had this option enabled, it was not taking effect), or that I need to do something now to accommodate this problem on 10.8.3?

Sorry for the confusion.

Comment by Marko Mäkelä [ 2022-06-28 ]

nunop, I mean that before MDEV-28111 in MariaDB Server 10.8.3, O_DIRECT was never enabled on log files. (To be exact, something equivalent was enabled on Microsoft Windows if the physical block size of the log file was detected to be 512 bytes.) Settings like innodb_flush_method=O_DIRECT only affected the reads and writes of data files.

Comment by Nuno [ 2022-06-28 ]

marko Thanks. That's clear now!

In MDEV-28766, I read this:

O_DIRECT should not be used when innodb_flush_log_at_trx_commit = 2 is set.

I actually always had both of these enabled (O_DIRECT and "innodb_flush_log_at_trx_commit = 2"), for many years.
Could this be one of the reasons for what's happening now?

(what is the reason that O_DIRECT should not be used when innodb_flush_log_at_trx_commit = 2 is set, btw?)

I actually also just raised MDEV-28974 with another problem with mariabackup (cannot be prepared) - not sure if that's another consequence of the same issues.

Thank you very much.

Comment by Marko Mäkelä [ 2022-06-28 ]

nunop, that opening statement in MDEV-28766 did not seem to hold on my Intel Optane 960 NVMe PCIe drive. Based on the performance tests that I ran on my system (you can find details in the ticket), I did not identify any simple rule that would work with all 3 types of storage that I had at my disposal. Therefore, I thought that it is simplest to control the O_DIRECT-ness of the redo log file with a separate Boolean parameter. While I was at it, I made it changeable without a server restart.

Can you please try to test if disabling O_DIRECT on the redo log (in 10.8.3, you’d have to use innodb_flush_method=fsync) would fix the problem for you? That is, would that work as an alternative to shortening the --log-copy-interval?

Comment by Vicente Rossello Jaume [ 2022-06-29 ]

I just tried using mariabackup 10.8.2 and the results were not good

full docker command :

echo "apt update; apt install pigz; mariabackup --backup --host="somehost.internal" --parallel=4 --stream=xbstream --target-dir /backup/dump --user ... --password ... --use-memory=12288M | pigz -p 4 > /backup/dump/mariadbbackup-2022-06-29.gz" | docker run --name mariabackup -v /services/mysql/data/mysql:/var/lib/mysql -v /opt/backups/mariadb/dump/:/backup/dump -v /services/mysql/conf:/etc/mysql/conf.d:ro --rm -i --entrypoint sh mariadb:10.8.2

[00] 2022-06-29 07:20:55 Connecting to MariaDB server host: production.travelc.internal, user: onlyscripts, password: set, port: 3306, socket: /var/run/mysqld/mysqld.sock
[00] 2022-06-29 07:20:55 Using server version 10.8.3-MariaDB-1:10.8.3+maria~jammy-log
mariabackup based on MariaDB server 10.8.2-MariaDB debian-linux-gnu (x86_64)
[00] 2022-06-29 07:20:55 uses posix_fadvise().
[00] 2022-06-29 07:20:55 cd to /var/lib/mysql/

[00] 2022-06-29 07:32:54 Acquiring BACKUP LOCKS...
[00] 2022-06-29 07:32:54 Starting to backup non-InnoDB tables and files
[01] 2022-06-29 07:32:54 Streaming ./travelc/powerbi_bookedhotelservice.frm to <STDOUT>
[00] 2022-06-29 07:32:54 >> log scanned up to (3677498865960)
[01] 2022-06-29 07:32:54 ...done

[00] 2022-06-29 07:33:18 Executing BACKUP STAGE END
[00] 2022-06-29 07:33:18 All tables unlocked
[00] 2022-06-29 07:33:18 Streaming ib_buffer_pool to <STDOUT>

So the full backup time is much better at 13 minutes (half), but the lock time is 24 seconds, double that of 10.8.3.

Maybe I should specify log-copy-interval=1000000 to get the same log copy interval behaviour as with 10.8.3?

I cannot test changing innodb_log_file_buffering, as I would have to restart the production server and I can't do that unplanned. Another fact: backup lock times are more or less 2 seconds in the dev environment (just like with 10.7.3); I only get degraded lock time performance on the live server. (I disabled binary logs on the production server when this started to happen; with binary logs the lock time is more than 1 minute, and I haven't tested with binary logs since then.)

Comment by Nuno [ 2022-06-29 ]

Hey

I was waiting for this morning to be able to change to innodb_flush_method = fsync as that requires a restart.

Then I ran /usr/bin/mariabackup --backup --parallel=4 --target-dir=/backup/mysql/ and it went fine; it took just 1 minute to run everything, and the database wasn't locked during that period. But I suspect that because this was right after a restart, the checkpoint had already been made. I don't see the usual table renames, and not so many "log scanned up to" entries.

I'll try again tomorrow and let you know, marko.

Thank you

Comment by Marko Mäkelä [ 2022-06-29 ]

Thank you both. Currently it seems like the fix would be SET GLOBAL innodb_log_file_buffering=OFF during the last phase of backup. It could be done automatically by mariadb-backup itself. Let’s see how it turns out for nunop tomorrow.

Comment by Marko Mäkelä [ 2022-06-29 ]

cocorossello, yes, you should try to specify --log-copy-interval=1000000 to the 10.8.2 backup, to get the 1-second copying interval that was intended.
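As a sanity check on the unit mismatch: the 10.8.2 binary interprets the value in microseconds (MDEV-27919), so an intended interval in milliseconds has to be multiplied by 1000 before being passed to the old binary. A trivial sketch:

```shell
# 10.8.2 reads --log-copy-interval in microseconds (MDEV-27919), while fixed
# versions read it in milliseconds, so convert before invoking the old binary.
intended_ms=1000                          # the intended 1-second default
legacy_interval=$((intended_ms * 1000))   # value to pass to the 10.8.2 binary
echo "--log-copy-interval=$legacy_interval"
```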

Comment by Vicente Rossello Jaume [ 2022-06-29 ]

Thanks Marko, with 10.8.2 and --log-copy-interval=1000000 I get a 3 second lock, more or less same lock as with 10.7.3.

I will try innodb_log_file_buffering patch when available.

Comment by Nuno [ 2022-06-30 ]

Hello marko

No luck... it seemed to be going very well for a while, but after 1 or 2 minutes, everything started to get locked, with:

Waiting for backup lock

As a reminder, I have:

innodb_doublewrite              = 1
innodb_flush_method             = fsync

I ran:
/usr/bin/mariabackup --backup --parallel=4 --target-dir=/backup/mysql/

It was doing those "loops" of "DDL tracking" and "log scanned up to" for a while, until it started locking all the INSERT/UPDATE queries, so I killed the process right away.

Then I added --log-copy-interval=1,
loads of "log scanned up to" appearing for a while, and in about 1 minute, the backup finished and no locks detected.

Then tried to run it again, with --log-copy-interval=1
And this time I did notice for 1 or 2 seconds, some queries locked with Waiting for backup lock, but wasn't so bad. Completed quickly.

EDIT - sorry - the last time I ran was still with --log-copy-interval=1.
I clarified above.

Comment by Marko Mäkelä [ 2022-06-30 ]

nunop, thank you. Your issue does not seem to be the O_DIRECT then. It might be that thanks to MDEV-14425 and MDEV-27774, the 10.8.3 server can generate log faster than earlier versions, and the 1-second interval is not optimal or sufficient for your workload.

Comment by Nuno [ 2022-06-30 ]

Thank you marko. So, what do you think could be the solution for me? Continue to use --log-copy-interval=1, or eventually try SET GLOBAL innodb_log_file_buffering=OFF ?

Comment by Marko Mäkelä [ 2022-07-01 ]

nunop, I think that --log-copy-interval=1 could be the right thing for you. It looks like we have to keep this ticket open until 10.8.4 is available and further experiments with SET GLOBAL innodb_log_file_buffering can be performed.

mleich just told me that he is running into something similar when testing on a PMEM device (but not on "fake PMEM", that is, using mmap on a log file in /dev/shm). At the default 1-second interval, the backup could not keep up with the server. It is worth noting that back when we tested MDEV-14425, the log copying interval was 1 millisecond instead of the intended 1 second, and everything worked fine. He got into this because of MDEV-28994, which is some kind of race condition between backup and the server when using a memory-mapped log. We are still narrowing it down.

I think that the proper fix to all this trouble would be to let the server process be responsible for duplicating its log (something that is a part of MDEV-14992).

Comment by Nuno [ 2022-07-01 ]

Thank you very much for explaining, marko

Yeah, I'm also interested on that MDEV-14992 (I'm watching it). Great job.

Hopefully cocorossello can also find a way to resolve the situation.

For now, I think I'm good, and I guess MDEV-28955 can be closed (maybe it's a duplicate of this one, at this point).
I can give innodb_log_file_buffering a try when 10.8.4 is released.
Although, if there are no side-effects of using --log-copy-interval=1, then I'm happy to keep it, and the backup runs faster as well.

Thank you!

Comment by Vicente Rossello Jaume [ 2022-07-01 ]

We are fine at the moment. The suggested workaround of using mariabackup 10.8.2 and log-copy-interval=1000000 is enough for us for now.

Comment by Marko Mäkelä [ 2022-08-02 ]

I expect 10.8.4 and the other quarterly MariaDB Server releases to be out within a week or so. While I remember: I'm requesting feedback on whether this works in 10.8.4 when using suitable settings.

Comment by Nuno [ 2022-08-02 ]

Thank you very much marko.

I have a number of changes to try, pending the 10.8.4 release, and then I'll give feedback.
Thanks for giving me an estimate; I thought about asking for a date a few times, but just kept waiting a bit longer anyway.

The only other problem I keep having for now is the swap usage - MDEV-29097.
At least I found that the rise seems to happen every time mariabackup runs! But I have no idea why, or how to investigate further.

Thanks again!
Have a great day.

Comment by Nuno [ 2022-08-16 ]

marko
Tomorrow I'm going to restart MariaDB with the following options:

innodb_page_size = 16384
innodb_doublewrite = 1
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2
innodb_log_file_buffering = ON   (default value for innodb_flush_log_at_trx_commit = 2)

Then the idea is to experiment with MariaBackup using these settings, and see if all is good.

I can also try removing the --log-copy-interval=1, with innodb_log_file_buffering = ON.
And if it locks the database, I'll SET GLOBAL innodb_log_file_buffering=OFF and see what happens.

Comment by Nuno [ 2022-08-18 ]

Backup ran OK with the configurations above, with --log-copy-interval=1

Tomorrow I'm going to run it manually, without --log-copy-interval=1 as planned.

I see also that MDEV-28781 is fixed, so I'll look into adding --rsync back as well.

Comment by Nuno [ 2022-08-19 ]

Hi marko

Ran:

/usr/bin/mariabackup --backup --parallel=4 --target-dir=/backup/mysql/

And eventually started getting many queries locked with Waiting for backup lock, so I killed it.

Then I ran:

SET GLOBAL innodb_log_file_buffering=OFF

and confirmed the variable value changed.

Tried again:

/usr/bin/mariabackup --backup --parallel=4 --target-dir=/backup/mysql/

And eventually started getting many queries locked with Waiting for backup lock, again, so I killed it.

So, unfortunately, innodb_log_file_buffering=OFF doesn't seem to do it.

Or do you mean I need to leave innodb_log_file_buffering=OFF for 1 day after a successful backup, to be able to do the test properly?

Comment by Nuno [ 2022-08-20 ]

Hi marko

One full day of innodb_log_file_buffering=OFF since last successful backup.

I ran the backup again with the default --log-copy-interval, and got Waiting for backup lock.

MariaBackup runs fine for a while, always, until it reaches a log line that says:

Waiting for log copy thread to read lsn 10609231931267

Until then, it wasn't locking. I notice the same behavior always. Once that log line appears, the locks start happening.

If I use --log-copy-interval=1, I only notice the Waiting for backup lock at the very end of the backup, but it's so fast that it doesn't affect anyone on the website.

Anyway – I'll go ahead and keep --log-copy-interval=1, and the default value for innodb_log_file_buffering.

Thank you.

Comment by Marko Mäkelä [ 2022-09-19 ]

nunop, thank you. Your findings are in line with what I wrote on 2022-06-28. Because of the "One size does not fit all" observation, I do not think that we should change any default parameters.

Because of recovery regressions (MDEV-29374, MDEV-29383), I cannot recommend 10.8.4, as you know. An unscheduled batch of releases including 10.8.5 should become available for download this week.

Comment by Nuno [ 2022-09-19 ]

Thank you marko, and no problem. I understand. And great to know that 10.8.5 is coming this week!

I guess the only problem outstanding that I have is the swap issue described in MDEV-29097.
I can see if that improves in 10.8.5, but based on the release notes, it probably will not - https://mariadb.com/kb/en/mariadb-1085-release-notes/

Thanks very much!

Comment by acsfer [ 2022-09-25 ]

MariaDB 10.8.5 doesn't solve this; I only saw this ticket afterwards, so I had already created MDEV-29631 (feel free to close it if it's a duplicate)

Comment by Nuno [ 2022-09-25 ]

acs-ferreira - have you tried to add --log-copy-interval=1 to the command? It resolved the issue for me, and sped up the backup time as well.

Comment by acsfer [ 2022-09-25 ]

I've tried it. It blocks here:

[00] 2022-09-25 22:22:27 Waiting for log copy thread to read lsn 59357301845
[00] 2022-09-25 22:22:27 >> log scanned up to (59357301845)
[00] 2022-09-25 22:22:27 >> log scanned up to (59357301845)
[00] 2022-09-25 22:22:27 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
[00] 2022-09-25 22:22:27 mariabackup: The latest check point (for incremental): '57455996308'
mariabackup: Stopping log copying thread.[00] 2022-09-25 22:22:27 >> log scanned up to (59357301845)

Comment by Nuno [ 2022-09-25 ]

For me too, but only for 1 second or so. No queries time out and users don't notice.
Is it locking for longer for you? (Or is 1 second not acceptable for your usage?)

Comment by acsfer [ 2022-09-25 ]

I stopped the process after ~20 seconds of locking. We're talking about an NVMe disk + 256 GB RAM on an 8 GB dataset (this is a test server).

This has to be a bug/major regression somewhere as the same dataset on a 10.4 server has no such behavior.

Comment by Nuno [ 2022-09-25 ]

Ah ok, that's very strange then!

Yeah, I was on 10.5 before, also in an NVMe with 128GB RAM and never had a problem.

I also started getting these locks (with the website down for several minutes due to everything locking), but adding --log-copy-interval=1 resolved it for me.

One thing I know is that MariaDB says that:

If you would like to use a version of Mariabackup that uses the BACKUP STAGE commands in an efficient way, then one option is to use MariaDB Enterprise Backup that is bundled with MariaDB Enterprise Server.

https://mariadb.com/kb/en/backup-stage/

Comment by Vicente Rossello Jaume [ 2022-09-25 ]

Have you tried using mariabackup from 10.8.2 and --log-copy-interval=1000000 ?

For me this worked.

Comment by Marko Mäkelä [ 2022-10-03 ]

acs-ferreira, in the recently released 10.8.5, would SET GLOBAL innodb_log_file_buffering=OFF work for you? (Note: I recommend avoiding 10.8.4; see MDEV-29364.)

cocorossello, thank you for the idea to try the 10.8.2 mariadb-backup for the --backup part. Does it work for you even when the server runs with SET GLOBAL innodb_log_file_buffering=ON?

On Linux or Windows, a later version of mariadb-backup would likely attempt to read the server’s log file in O_DIRECT mode. Because the backup operation is basically polling the log file for updates, it could make sense to open that file in buffered mode, so that if nothing was changed, the unchanged data would be returned from the file system cache. Currently, there is no way to specify innodb_log_file_buffering=OFF for the backup process. An even better solution might be to invoke the Linux system call ppoll(2) or poll(2) in the polling loop, to only read from the file when it has been changed.
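The polling loop described here can be pictured with a toy model; this is an illustration of the idea only, not mariadb-backup's actual implementation, and both helper names (`bytes_grown`, `poll_and_copy`) are made up:

```shell
# Toy model of a log-polling loop: every interval, check whether the file has
# grown and account for the newly written bytes. Re-reading the tail on every
# tick even when nothing changed is the waste that poll(2)/ppoll(2) would avoid.
bytes_grown() {   # bytes added to file $2 since previous size $1
  prev_size="$1"; file="$2"
  size=$(wc -c < "$file")
  if [ "$size" -gt "$prev_size" ]; then
    echo $((size - prev_size))
  else
    echo 0
  fi
}

poll_and_copy() {   # args: file, interval in seconds, number of polling rounds
  file="$1"; interval="$2"; rounds="$3"
  last=0; copied=0; i=0
  while [ "$i" -lt "$rounds" ]; do
    grown=$(bytes_grown "$last" "$file")
    copied=$((copied + grown))
    last=$((last + grown))
    sleep "$interval"
    i=$((i + 1))
  done
  echo "$copied"
}
```

With a 1-millisecond interval this loop would re-check the file up to 1,000 times per second, which is the trade-off discussed later in this thread.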

Comment by acsfer [ 2022-10-04 ]

mysql -e "SET GLOBAL innodb_log_file_buffering=OFF"
mariabackup --backup --log-copy-interval=1 ....
mysql -e "SET GLOBAL innodb_log_file_buffering=ON"

So setting both

--log-copy-interval=1

and

innodb_log_file_buffering=OFF

works fine on 10.8.5. Missing either option results in long blocking.

Comment by Marko Mäkelä [ 2022-10-05 ]

acs-ferreira, thank you, this is as I expected. I have not debugged mariadb-backup yet, so I do not know for sure whether it would read the server’s ib_logfile0 via O_DIRECT in this case, but I assume that it would. I think that the 1-millisecond polling interval that worked for you could cause trouble for others who have a different type of storage.

Could you please also test the 10.8.5 server with the default innodb_log_file_buffering=OFF and a copy of mariadb-backup from MariaDB Server 10.8.2 for executing the --backup part, as cocorossello suggested? In that version, the log copying interval was incorrectly specified in microseconds instead of milliseconds (MDEV-27919), so you have to specify it as 1000000 to get the intended default value of 1000 milliseconds (1 second).

I’d appreciate it if all of you (acs-ferreira, nunop, cocorossello) could conduct this experiment.

Comment by acsfer [ 2022-10-05 ]

Marko, where can I find the mariadb-backup binary from MariaDB Server 10.8.2 for aarch64?

Comment by Marko Mäkelä [ 2022-10-05 ]

acs-ferreira, I did not get any response from the maintainers of the package repositories yet. I see that the web UI only offers the latest version in each series. Which package or GNU/Linux distribution are you using? It should be possible to get the package straight from a repository mirror. I successfully did that to get some older .deb or .rpm for analyzing some core dumps or checking the addresses in some stack traces.

Comment by Marko Mäkelä [ 2022-10-05 ]

You should find packages of all types in https://archive.mariadb.org/mariadb-10.8.2/. For example for Debian, the .deb files for all CPU architectures and supported Debian releases are in https://archive.mariadb.org/mariadb-10.8.2/repo/debian/pool/main/m/mariadb-10.8/.

Comment by acsfer [ 2022-10-05 ]

Ubuntu 22.04 Jammy. Can't find the binary at https://archive.mariadb.org/mariadb-10.8.2/repo/ubuntu/pool/main/m/mariadb-10.8/

Comment by Nuno [ 2022-10-05 ]

One of these packages, maybe?

mariadb-backup_10.8.2+maria~bionic_amd64.deb 6.3 MiB 2022-Feb-12 03:42
mariadb-backup_10.8.2+maria~bionic_arm64.deb 5.9 MiB 2022-Feb-12 03:48
mariadb-backup_10.8.2+maria~bionic_ppc64el.deb 6.4 MiB 2022-Feb-12 03:43
mariadb-backup_10.8.2+maria~focal_amd64.deb 6.3 MiB 2022-Feb-12 03:51
mariadb-backup_10.8.2+maria~focal_arm64.deb 5.9 MiB 2022-Feb-12 03:54
mariadb-backup_10.8.2+maria~focal_ppc64el.deb 6.4 MiB 2022-Feb-12 03:52
mariadb-backup_10.8.2+maria~focal_s390x.deb 5.8 MiB 2022-Feb-12 03:57
mariadb-backup_10.8.2+maria~impish_amd64.deb

Comment by acsfer [ 2022-10-05 ]

None of them are for Jammy. I can find such packages on the 10.8.3+ branch; maybe Jammy packages were not available before?

Comment by Ralf Gebhardt [ 2022-10-05 ]

Correct, Ubuntu 22.04 GA came after MariaDB Community Server 10.8.2 was released.

Comment by acsfer [ 2022-10-05 ]

Thanks Ralf. So I'm sorry, but I can't test your case, Marko. My only staging machine runs on Jammy+ARM64.

Comment by Marko Mäkelä [ 2022-10-05 ]

I could provide a custom build once I get to implement this. I would first like to hear from nunop and cocorossello and anyone else if disabling O_DIRECT in mariadb-backup --backup (but enabling it in the server) would be the right fix. That is the purpose of the 10.8.2 experiment.

Comment by Vicente Rossello Jaume [ 2022-10-05 ]

I'm testing this now. If I understand correctly, you want me to test with 10.8.5 on the server and 10.8.2 for the backup. Something like:

  echo "apt update; apt install pigz; mariabackup --backup --log-copy-interval=1000000 --host="mariadb-host" --parallel=4 --stream=xbstream  --target-dir /backup/dump --user xxxx --password xxxx --use-memory=12288M | pigz -p 4 > /backup/dump/mariadbbackup-2022-10-04.gz" | docker run --name mariabackup -v /services/mysql/data/mysql:/var/lib/mysql -v /opt/backups/mariadb/dump/:/backup/dump -v /services/mysql/conf:/etc/mysql/conf.d:ro --rm -i --entrypoint sh mariadb:10.8.2

Comment by Marko Mäkelä [ 2022-10-05 ]

cocorossello, yes, thank you. I am not familiar with Docker or containers, but as far as I understand your command, it should work.

My old-school approach would have been to extract the mariadb-backup executable and any needed libraries from the 10.8.2 package to a custom directory and invoke it from there. I would think that also the environment variable LD_LIBRARY_PATH would have to be set to point to the libraries extracted from the package.
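That extraction approach might look like the following sketch; `run_extracted_backup` is a made-up helper, and the .deb layout (usr/bin, usr/lib under the extraction root) is assumed from standard MariaDB packaging:

```shell
# Sketch: run an older mariadb-backup without installing it system-wide by
# unpacking the .deb into a scratch directory and pointing LD_LIBRARY_PATH at
# the bundled libraries before invoking the extracted binary.
DPKG_DEB="${DPKG_DEB:-dpkg-deb}"

run_extracted_backup() {
  deb="$1"; dest="$2"; shift 2
  "$DPKG_DEB" -x "$deb" "$dest"
  LD_LIBRARY_PATH="$dest/usr/lib" "$dest/usr/bin/mariabackup" "$@"
}
```

For example, with a package fetched from archive.mariadb.org: `run_extracted_backup mariadb-backup_10.8.2+maria~focal_amd64.deb /tmp/mb1082 --backup --log-copy-interval=1000000 --target-dir=/backup/dump`.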

Comment by Nuno [ 2022-10-05 ]

I'll try to test whenever I can, if not today, I'll try tomorrow.

So is it two experiments?

1) disable O_DIRECT in mariadb-backup --backup on 10.8.5

2) run 10.8.2's mariadb-backup --backup --log-copy-interval=1000000, with innodb_log_file_buffering=OFF

Is my understanding correct?

How do I disable O_DIRECT just for mariadb-backup, but keep it on the server?
Can it be done in my.cnf?

Seems this is the one I need (for AlmaLinux 8):
https://archive.mariadb.org/mariadb-10.8.2/yum/centos8-amd64/rpms/MariaDB-backup-10.8.2-1.el8.x86_64.rpm

Comment by Vicente Rossello Jaume [ 2022-10-05 ]

Didn't see much difference with innodb_log_file_buffering:

10.8.3 server with 10.8.2 backup (base): 4 seconds lock

[00] 2022-10-05 11:58:11 Acquiring BACKUP LOCKS...
[00] 2022-10-05 11:58:15 All tables unlocked

10.8.5 server, innodb_log_file_buffering=OFF (default) and 10.8.2 backup: 2 seconds lock

[00] 2022-10-05 12:27:33 Acquiring BACKUP LOCKS...
[00] 2022-10-05 12:27:35 All tables unlocked

10.8.5 server, innodb_log_file_buffering=ON and 10.8.2 backup: 4 seconds lock

[00] 2022-10-05 12:44:43 Acquiring BACKUP LOCKS...
[00] 2022-10-05 12:44:47 All tables unlocked

10.9.3 server, innodb_log_file_buffering=OFF (default) and 10.8.2 backup: 2 seconds lock

[00] 2022-10-05 13:08:54 Acquiring BACKUP LOCKS...
[00] 2022-10-05 13:08:56 All tables unlocked

Comment by Marko Mäkelä [ 2022-10-10 ]

As far as I understood (I did not debug it), there is no way to disable the O_DIRECT for reading the server’s ib_logfile0 in the 10.8.5 mariadb-backup. The purpose of the experiment with the 10.8.2 mariadb-backup is to copy the log without involving O_DIRECT.

nunop, I’d like to see 2 experiments, both with 10.8.2's mariadb-backup --backup --log-copy-interval=1000000 and the 10.8.5 server. The difference would only be on the server:

SET GLOBAL innodb_log_file_buffering=OFF;
# experiment 1
SET GLOBAL innodb_log_file_buffering=ON;
# experiment 2
SET GLOBAL innodb_log_file_buffering=OFF;
# back to the default setting

I am hoping that the performance will be acceptable when using the default innodb_log_file_buffering=OFF on the server.

Comment by Marko Mäkelä [ 2022-10-14 ]

nunop, cocorossello, acs-ferreira and any others, please test this fix. I checked in a debugger that mariadb-backup --backup will skip the O_DIRECT logic, in os_file_create_func(). What I did not test is whether the buffering is disabled on Microsoft Windows. Maybe wlad could test this and suggest an improvement if needed?

Installation packages for this build will be available for some time here as soon as they have been created, within a few hours.

Comment by Nuno [ 2022-10-14 ]

marko - Many apologies, this week was a really busy one and I didn't get a chance to do the other test. Thanks for this; I'll see if I can test this mariadb-backup build tomorrow morning or at lunch time. Cheers!

Comment by Nuno [ 2022-10-16 ]

Hey marko

My current innodb_log_file_buffering is ON.

# ./mariadb-backup --version
./mariadb-backup based on MariaDB server 10.8.6-MariaDB Linux (x86_64)
 
# ./mariadb-backup --backup --parallel=4 --log-copy-interval=1000 --target-dir=/backup/mysql-test/ --rsync

All going well until this bit:

[00] 2022-10-16 12:17:48 Starting prep copy of non-InnoDB tables and files
[00] 2022-10-16 12:17:48 Starting rsync as: rsync -t . --files-from=/var/lib/mysqltmp/xtrabackup_rsyncfiles_pass1 /backup/mysql-test/
[00] 2022-10-16 12:17:48 rsync finished successfully.
[00] 2022-10-16 12:17:48 Finished a prep copy of non-InnoDB tables and files
[00] 2022-10-16 12:17:48 Acquiring BACKUP LOCKS...
[00] 2022-10-16 12:17:48 Starting to backup non-InnoDB tables and files
[00] 2022-10-16 12:17:48 Starting rsync as: rsync -t . --files-from=/var/lib/mysqltmp/xtrabackup_rsyncfiles_pass2 /backup/mysql-test/
[00] 2022-10-16 12:17:48 rsync finished successfully.
[00] 2022-10-16 12:17:48 Finished backing up non-InnoDB tables and files
[01] 2022-10-16 12:17:48 Copying ./aria_log.00000002 to /backup/mysql-test/aria_log.00000002
[01] 2022-10-16 12:17:49         ...done
[01] 2022-10-16 12:17:49 Copying ./aria_log_control to /backup/mysql-test/aria_log_control
[01] 2022-10-16 12:17:49         ...done
[00] 2022-10-16 12:17:49 Waiting for log copy thread to read lsn 10961878316079

Then the Waiting for backup lock started to appear a lot.

Just like what happened with the other version I tried before (I think 10.8.3), at the same spot, before I changed the interval to 1 ms.

Please let me know if you still want me to test the experiment with innodb_log_file_buffering=OFF,
and if you want me to test with 10.8.2's mariabackup.

Thank you.

Comment by Marko Mäkelä [ 2022-10-17 ]

nunop, thank you for the update. Apparently, you will need --log-copy-interval=1 (1 millisecond) in any case. Possibly the server's log writing speed changed so much due to MDEV-14425 and MDEV-27774 that the old polling interval no longer works.

A test of 10.8.2 would have been an alternative to testing with this custom build of mariadb-backup.

I see that on 2022-06-28, you reported that the 1-millisecond log copying interval worked for you, apparently even with the unbuffered (O_DIRECT) log file I/O. On the same day, cocorossello reported the following:

I also tried that without much luck. First I tried with log-copy-interval=1, but after 70 minutes I aborted the backup because it didn't finish.

Then I ran with log-copy-interval=100 and the results were not good: almost 1 minute of locks.

On 2022-10-05, cocorossello reported the following:

Didn't see much difference with innodb_log_file_buffering:

10.8.3 server with 10.8.2 backup (base): 4 seconds lock

[00] 2022-10-05 11:58:11 Acquiring BACKUP LOCKS...
[00] 2022-10-05 11:58:15 All tables unlocked

10.8.5 server, innodb_log_file_buffering=OFF (default) and 10.8.2 backup: 2 seconds lock

I wonder which --log-copy-interval cocorossello specified to the 10.8.2 backup. Due to MDEV-27919, the default value would have been 1 millisecond instead of the intended 1 second. I believe that with the custom build, a backup with --log-copy-interval=1 (millisecond) would deliver similar results.

Based on what has been reported so far, I assume that the problem would be solved by disabling O_DIRECT in mariadb-backup --backup, and I would think that a 2-second lock is OK. That lock is needed due to other storage engines than InnoDB.

To confirm this, nunop, can you please repeat your experiment with innodb_log_file_buffering=OFF on the server, and similar concurrent workload while executing the backup?

Possibly, the default value of --log-copy-interval should be lowered. Would 100 work reasonably well? Attempting to re-read the last log file block 1,000 times per second (the minimum polling interval of 1 millisecond) would feel too steep to me.

Comment by Nuno [ 2022-10-17 ]

Thanks marko

SET GLOBAL innodb_log_file_buffering=OFF;
 
# ./mariadb-backup --version
./mariadb-backup based on MariaDB server 10.8.6-MariaDB Linux (x86_64)
 
# ./mariadb-backup --backup --parallel=4 --log-copy-interval=1000 --target-dir=/backup/mysql-test/ --rsync

Locked everything again, from here:

[00] 2022-10-17 11:48:41 Starting prep copy of non-InnoDB tables and files
[00] 2022-10-17 11:48:41 Starting rsync as: rsync -t . --files-from=/var/lib/mysqltmp/xtrabackup_rsyncfiles_pass1 /backup/mysql-test/
[00] 2022-10-17 11:48:41 rsync finished successfully.
[00] 2022-10-17 11:48:41 Finished a prep copy of non-InnoDB tables and files
[00] 2022-10-17 11:48:41 Acquiring BACKUP LOCKS...
[00] 2022-10-17 11:48:41 Starting to backup non-InnoDB tables and files
[00] 2022-10-17 11:48:41 Starting rsync as: rsync -t . --files-from=/var/lib/mysqltmp/xtrabackup_rsyncfiles_pass2 /backup/mysql-test/
[00] 2022-10-17 11:48:41 rsync finished successfully.
[00] 2022-10-17 11:48:41 Finished backing up non-InnoDB tables and files
[01] 2022-10-17 11:48:41 Copying ./aria_log.00000002 to /backup/mysql-test/aria_log.00000002
[01] 2022-10-17 11:48:41         ...done
[01] 2022-10-17 11:48:41 Copying ./aria_log_control to /backup/mysql-test/aria_log_control
[01] 2022-10-17 11:48:41         ...done
[00] 2022-10-17 11:48:41 Waiting for log copy thread to read lsn 10970032449773

Then I tried with 100ms:

# ./mariadb-backup --backup --parallel=4 --log-copy-interval=100 --target-dir=/backup/mysql-test/ --rsync

But the locks still happened and were affecting the website again (because it takes longer).
They happened later in the process, though; maybe because I had run a (partial) backup just before, not sure.

[00] 2022-10-17 11:55:39 Starting prep copy of non-InnoDB tables and files
[00] 2022-10-17 11:55:39 Starting rsync as: rsync -t . --files-from=/var/lib/mysqltmp/xtrabackup_rsyncfiles_pass1 /backup/mysql-test/
[00] 2022-10-17 11:55:39 rsync finished successfully.
[00] 2022-10-17 11:55:39 Finished a prep copy of non-InnoDB tables and files
[00] 2022-10-17 11:55:39 Acquiring BACKUP LOCKS...
[00] 2022-10-17 11:55:39 >> log scanned up to (10969172724194)
[00] 2022-10-17 11:55:39 Starting to backup non-InnoDB tables and files
[00] 2022-10-17 11:55:39 Starting rsync as: rsync -t . --files-from=/var/lib/mysqltmp/xtrabackup_rsyncfiles_pass2 /backup/mysql-test/
[00] 2022-10-17 11:55:39 rsync finished successfully.
[00] 2022-10-17 11:55:39 Finished backing up non-InnoDB tables and files
[01] 2022-10-17 11:55:39 Copying ./aria_log.00000002 to /backup/mysql-test/aria_log.00000002
[00] 2022-10-17 11:55:39 >> log scanned up to (10969174810812)
[00] 2022-10-17 11:55:39 DDL tracking : create 348860 "./mydb1_archive/mytable1_posts_new.ibd"
[00] 2022-10-17 11:55:39 DDL tracking : rename 348858 "./mydb1_archive/mytable1_posts.ibd","./mydb1_archive/mytable1_posts_old.ibd"
[00] 2022-10-17 11:55:39 DDL tracking : rename 348860 "./mydb1_archive/mytable1_posts_new.ibd","./mydb1_archive/mytable1_posts.ibd"
[00] 2022-10-17 11:55:39 DDL tracking : delete 348858 "./mydb1_archive/mytable1_posts_old.ibd"
[00] 2022-10-17 11:55:39 DDL tracking : create 348861 "./mydb1_archive/mytable1_posts_internal_new.ibd"
[00] 2022-10-17 11:55:39 DDL tracking : rename 348859 "./mydb1_archive/mytable1_posts_internal.ibd","./mydb1_archive/mytable1_posts_internal_old.ibd"
[00] 2022-10-17 11:55:39 DDL tracking : rename 348861 "./mydb1_archive/mytable1_posts_internal_new.ibd","./mydb1_archive/mytable1_posts_internal.ibd"
[00] 2022-10-17 11:55:39 DDL tracking : delete 348859 "./mydb1_archive/mytable1_posts_internal_old.ibd"
[00] 2022-10-17 11:55:39 >> log scanned up to (10969176907855)
[01] 2022-10-17 11:55:39         ...done
[01] 2022-10-17 11:55:39 Copying ./aria_log_control to /backup/mysql-test/aria_log_control
[01] 2022-10-17 11:55:39         ...done
[00] 2022-10-17 11:55:39 Waiting for log copy thread to read lsn 10970050879235
[00] 2022-10-17 11:55:39 >> log scanned up to (10969179001072)
[00] 2022-10-17 11:55:39 >> log scanned up to (10969181097554)
[00] 2022-10-17 11:55:39 >> log scanned up to (10969183189061)
[00] 2022-10-17 11:55:39 >> log scanned up to (10969185285265)
[00] 2022-10-17 11:55:40 >> log scanned up to (10969187380764)
[00] 2022-10-17 11:55:40 >> log scanned up to (10969189473614)
[00] 2022-10-17 11:55:40 DDL tracking : create 348862 "./mydb1_archive/mytable1_posts_new.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : rename 348860 "./mydb1_archive/mytable1_posts.ibd","./mydb1_archive/mytable1_posts_old.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : rename 348862 "./mydb1_archive/mytable1_posts_new.ibd","./mydb1_archive/mytable1_posts.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : delete 348860 "./mydb1_archive/mytable1_posts_old.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : create 348863 "./mydb1_archive/mytable1_posts_internal_new.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : rename 348861 "./mydb1_archive/mytable1_posts_internal.ibd","./mydb1_archive/mytable1_posts_internal_old.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : rename 348863 "./mydb1_archive/mytable1_posts_internal_new.ibd","./mydb1_archive/mytable1_posts_internal.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : delete 348861 "./mydb1_archive/mytable1_posts_internal_old.ibd"
[00] 2022-10-17 11:55:40 >> log scanned up to (10969191566355)
[00] 2022-10-17 11:55:40 >> log scanned up to (10969193662654)
[00] 2022-10-17 11:55:40 >> log scanned up to (10969195759673)
[00] 2022-10-17 11:55:40 >> log scanned up to (10969197850097)
[00] 2022-10-17 11:55:40 >> log scanned up to (10969199946253)
[00] 2022-10-17 11:55:40 >> log scanned up to (10969202037704)
[00] 2022-10-17 11:55:40 >> log scanned up to (10969204134446)
[00] 2022-10-17 11:55:40 DDL tracking : create 348864 "./mydb1_archive/mytable1_posts_new.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : rename 348862 "./mydb1_archive/mytable1_posts.ibd","./mydb1_archive/mytable1_posts_old.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : rename 348864 "./mydb1_archive/mytable1_posts_new.ibd","./mydb1_archive/mytable1_posts.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : delete 348862 "./mydb1_archive/mytable1_posts_old.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : create 348865 "./mydb1_archive/mytable1_posts_internal_new.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : rename 348863 "./mydb1_archive/mytable1_posts_internal.ibd","./mydb1_archive/mytable1_posts_internal_old.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : rename 348865 "./mydb1_archive/mytable1_posts_internal_new.ibd","./mydb1_archive/mytable1_posts_internal.ibd"
[00] 2022-10-17 11:55:40 DDL tracking : delete 348863 "./mydb1_archive/mytable1_posts_internal_old.ibd"
[00] 2022-10-17 11:55:40 >> log scanned up to (10969206226378)
[00] 2022-10-17 11:55:41 >> log scanned up to (10969208319006)
[00] 2022-10-17 11:55:41 >> log scanned up to (10969210408293)
[00] 2022-10-17 11:55:41 >> log scanned up to (10969212505156)
[00] 2022-10-17 11:55:41 >> log scanned up to (10969214601676)
[00] 2022-10-17 11:55:41 >> log scanned up to (10969216690876)
[00] 2022-10-17 11:55:41 DDL tracking : create 348866 "./mydb1_archive/mytable1_posts_new.ibd"
[00] 2022-10-17 11:55:41 DDL tracking : rename 348864 "./mydb1_archive/mytable1_posts.ibd","./mydb1_archive/mytable1_posts_old.ibd"
[00] 2022-10-17 11:55:41 DDL tracking : rename 348866 "./mydb1_archive/mytable1_posts_new.ibd","./mydb1_archive/mytable1_posts.ibd"
[00] 2022-10-17 11:55:41 DDL tracking : delete 348864 "./mydb1_archive/mytable1_posts_old.ibd"
[00] 2022-10-17 11:55:41 DDL tracking : create 348867 "./mydb1_archive/mytable1_posts_internal_new.ibd"
[00] 2022-10-17 11:55:41 DDL tracking : rename 348865 "./mydb1_archive/mytable1_posts_internal.ibd","./mydb1_archive/mytable1_posts_internal_old.ibd"
[00] 2022-10-17 11:55:41 DDL tracking : rename 348867 "./mydb1_archive/mytable1_posts_internal_new.ibd","./mydb1_archive/mytable1_posts_internal.ibd"
[00] 2022-10-17 11:55:41 DDL tracking : delete 348865 "./mydb1_archive/mytable1_posts_internal_old.ibd"
[00] 2022-10-17 11:55:41 >> log scanned up to (10969218783344)
[00] 2022-10-17 11:55:41 >> log scanned up to (10969220880331)
[00] 2022-10-17 11:55:41 >> log scanned up to (10969222977088)
[00] 2022-10-17 11:55:41 >> log scanned up to (10969225073615)
[00] 2022-10-17 11:55:41 >> log scanned up to (10969227158129)
[00] 2022-10-17 11:55:42 >> log scanned up to (10969229253135)
[00] 2022-10-17 11:55:42 >> log scanned up to (10969231346395)
[00] 2022-10-17 11:55:42 >> log scanned up to (10969233440101)
[00] 2022-10-17 11:55:42 >> log scanned up to (10969235535021)
[00] 2022-10-17 11:55:42 >> log scanned up to (10969237625883)
[00] 2022-10-17 11:55:42 >> log scanned up to (10969239719443)
[00] 2022-10-17 11:55:42 >> log scanned up to (10969241815562)
[00] 2022-10-17 11:55:42 >> log scanned up to (10969243905584)
[00] 2022-10-17 11:55:42 >> log scanned up to (10969246002257)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969248094079)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969250190191)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969252281980)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969254365322)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969256454850)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969258550337)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969260643359)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969262740022)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969264836532)
[00] 2022-10-17 11:55:43 >> log scanned up to (10969266933212)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969269026995)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969271124050)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969273214551)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969275305953)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969277388411)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969279485166)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969281580929)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969283673058)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969285766465)
[00] 2022-10-17 11:55:44 >> log scanned up to (10969287860777)
[00] 2022-10-17 11:55:45 >> log scanned up to (10969289946756)
[00] 2022-10-17 11:55:45 >> log scanned up to (10969292041353)
^C
(killed process)

Comment by Marko Mäkelä [ 2022-10-17 ]

nunop, thank you. It appears that the only thing that works for you is to enable the file system cache for the redo log also on the server while the backup is running. How much log must be copied during backup depends on the type of the workload and also on how long ago the latest log checkpoint was written.

I will apply a simple fix for now, introducing the parameter --innodb-log-file-buffering to mariadb-backup and setting it ON by default.

A more comprehensive fix might involve the poll() system call. Edit: the Linux inotify family of system calls would allow us to wait for IN_MODIFY events of the server’s ib_logfile0.
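The inotify idea can be sketched from the command line with the inotify-tools package (inotifywait). This is only an illustration, not the actual fix; the datadir path is an assumption, and a real implementation would use the inotify system calls directly:

```shell
# Sketch only: instead of re-reading ib_logfile0 every --log-copy-interval
# milliseconds, block until the server actually modifies the file.
# Requires inotify-tools; the log file path is an assumption.
while inotifywait -qq -e modify /var/lib/mysql/ib_logfile0; do
    # In mariadb-backup, this is where the newly written log bytes
    # would be read and appended to the backup stream.
    echo "ib_logfile0 modified"
done
```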

This problem would also be solved by MDEV-14992.

Comment by Vicente Rossello Jaume [ 2022-10-18 ]

Marko, sorry, but I don't really know how to put this mariadb-backup executable in a Docker container.

Comment by Nuno [ 2022-10-18 ]

cocorossello You could put the mariadb-backup file somewhere on your host server, and then mount it as a volume in your Docker container (using the "-v" parameter if you use "docker run", or adding it to the "volumes:" section if you use "docker-compose.yml").

This will require you to restart the container for it to get the new volume (not sure if it can be done without restarting the container).
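For example, with "docker run" (the container name, image tag, host path, and password are placeholders here; adjust them to your setup):

```shell
# Hypothetical example: bind-mount the patched mariadb-backup binary
# into the container, read-only.
docker run -d --name mariadb \
  -v /opt/custom/mariadb-backup:/mnt/new-volume/mariadb-backup:ro \
  -e MARIADB_ROOT_PASSWORD=secret \
  mariadb:10.8
```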

Then enter the terminal/shell of your docker container and run something like:

cp /mnt/new-volume/mariadb-backup /root/mariadb-backup
chmod +x /root/mariadb-backup
/root/mariadb-backup (your parameters here)

Comment by Vicente Rossello Jaume [ 2022-10-18 ]

Where can I find the executable? Sorry, I'm rather a newbie with .deb files and that stuff.

Comment by Nuno [ 2022-10-18 ]

Files here: https://hasky.askmonty.org/archive/bb-10.8-release/build-50973/

Would it be the "kvm-deb-focal-amd64" one, maybe?
https://hasky.askmonty.org/archive/bb-10.8-release/build-50973/kvm-deb-focal-amd64/debs/binary/

https://hasky.askmonty.org/archive/bb-10.8-release/build-50973/kvm-deb-focal-amd64/debs/binary/mariadb-backup_10.8.6+maria~ubu2004_amd64.deb

Then extract the files of the ".deb" file with 7-zip, and go into the folder "\mariadb-backup_10.8.6+maria~ubu2004_amd64.deb\data.tar\.\usr\bin\".
There you will find the "mariadb-backup" file.
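On a Linux host, the ".deb" can also be unpacked without 7-zip and without installing the package, e.g. with dpkg-deb (part of dpkg; the filename is the one linked above):

```shell
# Unpack the package contents into ./extracted without installing it.
dpkg-deb -x mariadb-backup_10.8.6+maria~ubu2004_amd64.deb extracted/

# The binary then sits at:
ls -l extracted/usr/bin/mariadb-backup
```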

Comment by Vicente Rossello Jaume [ 2022-10-18 ]

Thanks, I was trying to decompress the wrong deb file

Comment by Marko Mäkelä [ 2022-10-18 ]

nunop, thank you for helping out. Yes, it is exactly that build number 50973. Alternatively, you can find the final version at http://hasky.askmonty.org/archive/10.8/build-51011/ or a later build number.

Comment by Vicente Rossello Jaume [ 2022-10-18 ]

A 2-second lock with that version of mariadb-backup, innodb_log_file_buffering=OFF on the server, and the default log-copy-interval. It worked fine for us.

Generated at Thu Feb 08 10:03:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.