MDEV-37605 Extend mariadb-binlog to convert innodb based binary logs to legacy#5276
MDEV-37605 Extend mariadb-binlog to convert innodb based binary logs to legacy#5276tarunw07 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the --convert-engine-binlog option to mariadb-binlog for converting InnoDB-based engine binlog files into legacy-format binary logs, along with helper functions for event serialization and new test suites. The code review identified several critical issues, including potential NULL pointer dereferences of gtid_state, fdev, gtid_set, and list, as well as a memory leak on write failures in write_event_to_legacy_binlog. Additionally, the feedback highlights return type mismatches, inconsistent boolean usage, and multiple coding standard violations regarding spacing after if keywords.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if (ev->get_type_code() == GTID_EVENT) { | ||
| rpl_gtid ev_gtid; | ||
| Gtid_log_event *gle= (Gtid_log_event*) ev; | ||
| ev_gtid= {gle->domain_id, gle->server_id, gle->seq_no}; | ||
|
|
||
| gtid_state->update_nolock(&ev_gtid); | ||
| } |
There was a problem hiding this comment.
If gtid_state is not initialized (e.g., if the conversion starts or encounters a GTID_EVENT before gtid_state is allocated), dereferencing it via gtid_state->update_nolock(&ev_gtid) will cause a crash. We should ensure gtid_state is initialized or add a defensive NULL check.
if (ev->get_type_code() == GTID_EVENT) {
rpl_gtid ev_gtid;
Gtid_log_event *gle= (Gtid_log_event*) ev;
ev_gtid= {gle->domain_id, gle->server_id, gle->seq_no};
if (gtid_state)
gtid_state->update_nolock(&ev_gtid);
}| bool | ||
| Gtid_list_log_event::to_packet(String *packet) | ||
| { | ||
| uint32 i; | ||
| uchar *p; | ||
| uint32 needed_length; | ||
|
|
||
| DBUG_ASSERT(count < 1<<28); | ||
|
|
||
| needed_length= packet->length() + get_data_size(); | ||
| if (packet->reserve(needed_length)) | ||
| return true; | ||
| p= (uchar *)packet->ptr() + packet->length();; | ||
| packet->length(needed_length); | ||
| int4store(p, (count & ((1<<28)-1)) | gl_flags); | ||
| p += 4; | ||
| /* Initialise the padding for empty Gtid_list. */ | ||
| if (count == 0) | ||
| int2store(p, 0); | ||
| for (i= 0; i < count; ++i) | ||
| { | ||
| int4store(p, list[i].domain_id); | ||
| int4store(p+4, list[i].server_id); | ||
| int8store(p+8, list[i].seq_no); | ||
| p += 16; | ||
| } | ||
|
|
||
| return false; | ||
| } |
There was a problem hiding this comment.
If list is NULL (which can happen if my_malloc failed during construction or if the event is invalid), dereferencing list[i] in the loop will cause a crash. We should check is_valid() or list != NULL before proceeding. This also cleans up a double semicolon typo on line 2850.
bool
Gtid_list_log_event::to_packet(String *packet)
{
uint32 i;
uchar *p;
uint32 needed_length;
if (!is_valid())
return true;
DBUG_ASSERT(count < 1<<28);
needed_length= packet->length() + get_data_size();
if (packet->reserve(needed_length))
return true;
p= (uchar *)packet->ptr() + packet->length();
packet->length(needed_length);
int4store(p, (count & ((1<<28)-1)) | gl_flags);
p += 4;
/* Initialise the padding for empty Gtid_list. */
if (count == 0)
int2store(p, 0);
for (i= 0; i < count; ++i)
{
int4store(p, list[i].domain_id);
int4store(p+4, list[i].server_id);
int8store(p+8, list[i].seq_no);
p += 16;
}
return false;
}| Gtid_list_log_event::Gtid_list_log_event(rpl_binlog_state_base *gtid_set) | ||
| : count(gtid_set->count_nolock()), gl_flags(0), list(0), sub_id_list(0) | ||
| { | ||
| cache_type= EVENT_NO_CACHE; | ||
| /* Failure to allocate memory will be caught by is_valid() returning false. */ | ||
| if (count < (1<<28) && | ||
| (list = (rpl_gtid *)my_malloc(PSI_INSTRUMENT_ME, | ||
| count * sizeof(*list) + (count == 0), MYF(MY_WME)))) | ||
| { | ||
| gtid_set->get_gtid_list_nolock(list, count); | ||
| } | ||
| } |
There was a problem hiding this comment.
If gtid_set is NULL, calling gtid_set->count_nolock() in the initializer list will result in a NULL pointer dereference and crash. We should add a defensive check to handle a NULL gtid_set gracefully.
Gtid_list_log_event::Gtid_list_log_event(rpl_binlog_state_base *gtid_set)
: count(gtid_set ? gtid_set->count_nolock() : 0), gl_flags(0), list(0), sub_id_list(0)
{
cache_type= EVENT_NO_CACHE;
/* Failure to allocate memory will be caught by is_valid() returning false. */
if (gtid_set && count < (1<<28) &&
(list = (rpl_gtid *)my_malloc(PSI_INSTRUMENT_ME,
count * sizeof(*list) + (count == 0), MYF(MY_WME))))
{
gtid_set->get_gtid_list_nolock(list, count);
}
}5caf469 to
e38182e
Compare
e38182e to
cfd3e68
Compare
Summary
This is an initial work-in-progress patch for MDEV-37605: Convert InnoDB Binary Logs to Legacy Format.
The goal of this work is to extend
mariadb-binlogso that it can read InnoDB-based binary log files (.ibb) and generate legacy-format binary log files as output. This is intended to support compatibility use cases such as downgrades, migrations, and tooling that expects the legacy binlog format.The current approach is to avoid fully deserializing and reserializing regular events. Since most event contents are already encoded in the
.ibbfile, the converter should mostly copy event bytes into the generated legacy binlog, while handlingFORMAT_DESCRIPTION_EVENT,GTID_LIST_EVENT,BINLOG_CHECKPOINT_EVENT&ROTATE_EVENT.Current status
This PR currently includes:
--convert-engine-binlogFORMAT_DESCRIPTION_EVENThandlingGTID_LIST_EVENTBINLOG_CHECKPOINT_EVENTend_log_poscorrectly in generated legacy binlog events.ibbfile, converting it to legacy binlog format, and checking the converted outputWork still in progress
This PR is not ready for final review yet. I am opening it early so that discussion, progress tracking, CI feedback, and initial code review can happen on the PR.
The main remaining pieces are:
ROTATE_EVENTFSP_BINLOG_TYPE_GTID_STATEfrom the first page of the.ibbfile to initialize the converter’s GTID state.FORMAT_DESCRIPTION_EVENThandlingDesign notes
The current design direction is:
FORMAT_DESCRIPTION_EVENT,GTID_LIST_EVENT,BINLOG_CHECKPOINT_EVENT&ROTATE_EVENT) that must be synthesized for legacy binlog output.FSP_BINLOG_TYPE_GTID_STATEfrom the first page of the.ibbfile to initialize the converter’s GTID state.rpl_binlog_state_base, so thatGTID_LIST_EVENTcan be generated correctly at the beginning of each output legacy binlog file.BINLOG_CHECKPOINT_EVENTnear the beginning of each generated legacy binlog file.end_log_posfor all the generated legacy binlog events.