Skip to content

MDEV-37605 Extend mariadb-binlog to convert innodb based binary logs to legacy#5276

Draft
tarunw07 wants to merge 1 commit into
MariaDB:mainfrom
tarunw07:MDEV-37605-convert-innodb-binlog-to-legacy
Draft

MDEV-37605 Extend mariadb-binlog to convert innodb based binary logs to legacy#5276
tarunw07 wants to merge 1 commit into
MariaDB:mainfrom
tarunw07:MDEV-37605-convert-innodb-binlog-to-legacy

Conversation

@tarunw07

@tarunw07 tarunw07 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

This is an initial work-in-progress patch for MDEV-37605: Convert InnoDB Binary Logs to Legacy Format.

The goal of this work is to extend mariadb-binlog so that it can read InnoDB-based binary log files (.ibb) and generate legacy-format binary log files as output. This is intended to support compatibility use cases such as downgrades, migrations, and tooling that expects the legacy binlog format.

The current approach is to avoid fully deserializing and reserializing regular events. Since most event contents are already encoded in the .ibb file, the converter should mostly copy event bytes into the generated legacy binlog, while handling FORMAT_DESCRIPTION_EVENT, GTID_LIST_EVENT, BINLOG_CHECKPOINT_EVENT & ROTATE_EVENT.

Current status

This PR currently includes:

  • New option: --convert-engine-binlog
  • Initial legacy binlog writer/conversion path
  • Basic FORMAT_DESCRIPTION_EVENT handling
  • Generate/write GTID_LIST_EVENT
  • Generate/write BINLOG_CHECKPOINT_EVENT
  • Update end_log_pos correctly in generated legacy binlog events
  • Basic test coverage for generating a .ibb file, converting it to legacy binlog format, and checking the converted output

Work still in progress

This PR is not ready for final review yet. I am opening it early so that discussion, progress tracking, CI feedback, and initial code review can happen on the PR.

The main remaining pieces are:

  • Handle ROTATE_EVENT
  • Use the initial FSP_BINLOG_TYPE_GTID_STATE from the first page of the .ibb file to initialize the converter’s GTID state.
  • Finalize minor parts of FORMAT_DESCRIPTION_EVENT handling
  • Add more complete test coverage

Design notes

The current design direction is:

  • Copy regular events byte-by-byte where possible.
  • Add serializer/generation logic for events(FORMAT_DESCRIPTION_EVENT, GTID_LIST_EVENT, BINLOG_CHECKPOINT_EVENT & ROTATE_EVENT) that must be synthesized for legacy binlog output.
  • Use the initial FSP_BINLOG_TYPE_GTID_STATE from the first page of the .ibb file to initialize the converter’s GTID state.
  • Maintain GTID state during conversion, using rpl_binlog_state_base, so that GTID_LIST_EVENT can be generated correctly at the beginning of each output legacy binlog file.
  • Write a synthetic/dummy BINLOG_CHECKPOINT_EVENT near the beginning of each generated legacy binlog file.
  • Update end_log_pos for all the generated legacy binlog events.
  • Rotate output legacy binlogs when needed

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the --convert-engine-binlog option to mariadb-binlog for converting InnoDB-based engine binlog files into legacy-format binary logs, along with helper functions for event serialization and new test suites. The code review identified several critical issues, including potential NULL pointer dereferences of gtid_state, fdev, gtid_set, and list, as well as a memory leak on write failures in write_event_to_legacy_binlog. Additionally, the feedback highlights return type mismatches, inconsistent boolean usage, and multiple coding standard violations regarding spacing after if keywords.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread client/mysqlbinlog.cc
Comment on lines +1951 to +1957
if (ev->get_type_code() == GTID_EVENT) {
rpl_gtid ev_gtid;
Gtid_log_event *gle= (Gtid_log_event*) ev;
ev_gtid= {gle->domain_id, gle->server_id, gle->seq_no};

gtid_state->update_nolock(&ev_gtid);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If gtid_state is not initialized (e.g., if the conversion starts or encounters a GTID_EVENT before gtid_state is allocated), dereferencing it via gtid_state->update_nolock(&ev_gtid) will cause a crash. We should ensure gtid_state is initialized or add a defensive NULL check.

  if (ev->get_type_code() == GTID_EVENT) {
    rpl_gtid ev_gtid;
    Gtid_log_event *gle= (Gtid_log_event*) ev;
    ev_gtid= {gle->domain_id, gle->server_id, gle->seq_no};

    if (gtid_state)
      gtid_state->update_nolock(&ev_gtid);
  }

Comment thread client/mysqlbinlog.cc
Comment thread sql/log_event.cc
Comment on lines +2838 to +2866
bool
Gtid_list_log_event::to_packet(String *packet)
{
uint32 i;
uchar *p;
uint32 needed_length;

DBUG_ASSERT(count < 1<<28);

needed_length= packet->length() + get_data_size();
if (packet->reserve(needed_length))
return true;
p= (uchar *)packet->ptr() + packet->length();;
packet->length(needed_length);
int4store(p, (count & ((1<<28)-1)) | gl_flags);
p += 4;
/* Initialise the padding for empty Gtid_list. */
if (count == 0)
int2store(p, 0);
for (i= 0; i < count; ++i)
{
int4store(p, list[i].domain_id);
int4store(p+4, list[i].server_id);
int8store(p+8, list[i].seq_no);
p += 16;
}

return false;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If list is NULL (which can happen if my_malloc failed during construction or if the event is invalid), dereferencing list[i] in the loop will cause a crash. We should check is_valid() or list != NULL before proceeding. This also cleans up a double semicolon typo on line 2850.

bool
Gtid_list_log_event::to_packet(String *packet)
{
  uint32 i;
  uchar *p;
  uint32 needed_length;

  if (!is_valid())
    return true;

  DBUG_ASSERT(count < 1<<28);

  needed_length= packet->length() + get_data_size();
  if (packet->reserve(needed_length))
    return true;
  p= (uchar *)packet->ptr() + packet->length();
  packet->length(needed_length);
  int4store(p, (count & ((1<<28)-1)) | gl_flags);
  p += 4;
  /* Initialise the padding for empty Gtid_list. */
  if (count == 0)
    int2store(p, 0);
  for (i= 0; i < count; ++i)
  {
    int4store(p, list[i].domain_id);
    int4store(p+4, list[i].server_id);
    int8store(p+8, list[i].seq_no);
    p += 16;
  }

  return false;
}

Comment thread sql/log_event_client.cc
Comment on lines +2362 to +2373
Gtid_list_log_event::Gtid_list_log_event(rpl_binlog_state_base *gtid_set)
: count(gtid_set->count_nolock()), gl_flags(0), list(0), sub_id_list(0)
{
cache_type= EVENT_NO_CACHE;
/* Failure to allocate memory will be caught by is_valid() returning false. */
if (count < (1<<28) &&
(list = (rpl_gtid *)my_malloc(PSI_INSTRUMENT_ME,
count * sizeof(*list) + (count == 0), MYF(MY_WME))))
{
gtid_set->get_gtid_list_nolock(list, count);
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If gtid_set is NULL, calling gtid_set->count_nolock() in the initializer list will result in a NULL pointer dereference and crash. We should add a defensive check to handle a NULL gtid_set gracefully.

Gtid_list_log_event::Gtid_list_log_event(rpl_binlog_state_base *gtid_set)
  : count(gtid_set ? gtid_set->count_nolock() : 0), gl_flags(0), list(0), sub_id_list(0)
{
  cache_type= EVENT_NO_CACHE;
  /* Failure to allocate memory will be caught by is_valid() returning false. */
  if (gtid_set && count < (1<<28) &&
      (list = (rpl_gtid *)my_malloc(PSI_INSTRUMENT_ME,
                          count * sizeof(*list) + (count == 0), MYF(MY_WME))))
  {
    gtid_set->get_gtid_list_nolock(list, count);
  }
}

Comment thread client/mysqlbinlog.cc Outdated
Comment thread client/mysqlbinlog.cc
Comment thread client/mysqlbinlog.cc Outdated
Comment thread client/mysqlbinlog.cc Outdated
Comment thread client/mysqlbinlog.cc Outdated
Comment thread client/mysqlbinlog.cc Outdated
@ParadoxV5 ParadoxV5 marked this pull request as draft June 25, 2026 05:22
@ParadoxV5 ParadoxV5 added GSoC External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. Replication Patches involved in replication labels Jun 25, 2026
@tarunw07 tarunw07 force-pushed the MDEV-37605-convert-innodb-binlog-to-legacy branch from 5caf469 to e38182e Compare June 25, 2026 18:46
@tarunw07 tarunw07 force-pushed the MDEV-37605-convert-innodb-binlog-to-legacy branch from e38182e to cfd3e68 Compare June 25, 2026 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. GSoC Replication Patches involved in replication

Development

Successfully merging this pull request may close these issues.

3 participants