Skip to content

feat(logging): add Logger interface and default logger (2/6)#723

Merged
wgtmac merged 12 commits into
apache:mainfrom
kamcheungting-db:logging-block2-interface
Jun 22, 2026
Merged

feat(logging): add Logger interface and default logger (2/6)#723
wgtmac merged 12 commits into
apache:mainfrom
kamcheungting-db:logging-block2-interface

Conversation

@kamcheungting-db

@kamcheungting-db kamcheungting-db commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Part 2 of the logging stack (builds on #722). Adds the logging API and a swappable default logger — the foundation the backends and macros plug into.

What's here

  • Logger: the pluggable sink interface (ShouldLog / Log / SetLevel / Flush / Initialize). ShouldLog() is the single source of truth for runtime filtering.
  • LogMessage owns its formatted text so a sink can safely keep a record; reserves an attributes field for future structured logging.
  • Process-global default logger: GetDefaultLogger / SetDefaultLogger / SetDefaultLevel, with a lock-free thread-local fast path so logging stays cheap.
  • Initialize applies the level property, so config-driven levels actually work. CurrentLogger() is safe to call even from a thread_local destructor during thread shutdown.
  • logger.h stays backend-agnostic (never includes the build config header), so consumers see one stable API regardless of backend.

Examples — using the API directly (the LOG_* macros that wrap it arrive in #725):

// A custom sink, installed as the process default.
class MySink : public Logger {
 public:
  bool ShouldLog(LogLevel level) const override { return level >= level_; }
  void Log(LogMessage&& m) noexcept override { write_line(m.message); }
  void SetLevel(LogLevel level) override { level_ = level; }
  LogLevel level() const override { return level_; }
 private:
  std::atomic<LogLevel> level_{LogLevel::kInfo};
};

SetDefaultLogger(std::make_shared<MySink>());   // install process-wide
SetDefaultLevel(LogLevel::kDebug);              // adjust the threshold

auto logger = GetDefaultLogger();               // borrow the current default
if (logger->ShouldLog(LogLevel::kInfo)) {
  logger->Log(LogMessage{.level = LogLevel::kInfo, .message = "scan ready"});
}

// Or configure from catalog-style properties (applies the "level" key):
auto sink = std::make_shared<MySink>();
auto status = sink->Initialize({{std::string(kLevelProperty), "warn"}});  // -> kWarn

The same example is documented inline in logger.h.

Testslogger_test: default-logger API, level-from-property, invalid level rejected, concurrent swap/read, and logging during thread teardown. Built and run with clang/libc++ (spdlog ON and OFF).

This pull request and its description were written by Isaac.

@kamcheungting-db kamcheungting-db changed the title [Iceberg Logger] [Part-2] Logger Interface feat: [Iceberg Logger] [Part-2] Logger Interface Jun 10, 2026
@kamcheungting-db kamcheungting-db changed the title feat: [Iceberg Logger] [Part-2] Logger Interface feat(logging): add Logger interface and default logger Jun 11, 2026
@kamcheungting-db kamcheungting-db changed the title feat(logging): add Logger interface and default logger feat(logging): add Logger interface and default logger (2/6) Jun 11, 2026
@kamcheungting-db kamcheungting-db force-pushed the logging-block2-interface branch 7 times, most recently from fd8c3b4 to c2b6a92 Compare June 16, 2026 03:27
@kamcheungting-db kamcheungting-db force-pushed the logging-block2-interface branch from c2b6a92 to cf09b69 Compare June 16, 2026 03:40
Second block: the backend-agnostic logging API and the swappable default logger.

- Logger: pure-virtual sink (ShouldLog/Log/SetLevel/level/Flush/IsNoop), with
  ShouldLog() as the single authority for runtime filtering and a documented
  no-reentrancy / thread-safety / noexcept contract. Mirrors MetricsReporter.
- LogMessage owns its formatted text (moved in) so a sink may retain records;
  reserves an owning attributes vector for future structured logging without an
  ABI break. logger.h is backend-agnostic -- it never includes the build config
  header nor references the spdlog feature macro, so consumers see one stable API.
- Process-global default logger: GetDefaultLogger / SetDefaultLogger /
  SetDefaultLevel, plus a lock-free thread-local generation cache on the hot path
  (no lock or refcount churn in steady state; slot mutex only on swap/refresh).
  NoopLogger is the placeholder default here; CerrLogger and the spdlog backend
  arrive in the following blocks.

Builds the source into both CMake and Meson and installs logger.h alongside
log_level.h; logger_test covers the default-logger API, level lowering taking
effect immediately, and concurrent swap/read safety.

Co-authored-by: Isaac
@kamcheungting-db kamcheungting-db force-pushed the logging-block2-interface branch from cf09b69 to 246cab6 Compare June 16, 2026 03:58
Comment thread src/iceberg/test/logger_test.cc Outdated
Comment on lines +103 to +104
(void)l->ShouldLog(LogLevel::kError);
(void)GetDefaultLogger();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(void)l->ShouldLog(LogLevel::kError);
(void)GetDefaultLogger();
std::ignore = l->ShouldLog(LogLevel::kError);
std::ignore = GetDefaultLogger();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread src/iceberg/test/logger_test.cc Outdated
auto status =
logger.Initialize({{std::string(kLevelProperty), std::string("not-a-level")}});
ASSERT_FALSE(status.has_value());
EXPECT_EQ(status.error().kind, ErrorKind::kInvalidArgument);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use IsError matcher.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread src/iceberg/logging/logger.h Outdated
/// Both key and value are owned so a sink may retain the record safely.
/// Unused in v1; reserved so structured logging can be added without an ABI
/// break to LogMessage.
struct LogAttribute {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add ICEBERG_EXPORT to this and below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread src/iceberg/logging/logger.cc
Comment thread src/iceberg/logging/logger.h Outdated
}

/// \brief Cheap check whether a record at \p level would be emitted.
virtual bool ShouldLog(LogLevel level) const = 0;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc says logger implementations must not throw. ShouldLog() is also in the hot path.

But the virtual method is not noexcept. A custom engine logger may throw here by mistake. Then logging can throw from the filter path. Adding noexcept later is also an API/ABI change.

Can we mark ShouldLog() noexcept now? Maybe SetLevel() and level() too, if we want the same rule for all logger methods.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread src/iceberg/logging/logger.h Outdated
#pragma once

/// \file iceberg/logging/logger.h
/// \brief Pluggable logging interface, the process-global default logger, and

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small doc issue. This header says it has logging macros, but this PR does not add them.

The PR text says macros are in a later block. Users who read the installed header may be confused.

Maybe say this header has the logger API and macro helper hooks. Or move this sentence to the later PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment thread src/iceberg/logging/logger.h
Comment thread src/iceberg/logging/logger.h
Comment thread src/iceberg/logging/logger.h
Resolves @wgtmac's review comments on PR apache#723:

- Mark ShouldLog()/SetLevel()/level() noexcept so a custom logger cannot throw
  from the hot filter path; propagated to NoopLogger and CapturingLogger.
- Make CurrentLogger() safe during thread teardown without UB: the per-thread
  cache is reached via a trivially-destructible thread_local pointer to a
  never-freed heap slot (kept LSan-reachable by a registry), so logging from a
  thread_local destructor in any order is safe. Replaces the AliveFlag guard.
- Add ICEBERG_EXPORT to LogAttribute/LogMessage, matching Error/ScanReport.
- Trim the logger.h \file brief (the macros it referenced arrive in PR apache#725).
- Tests: use the IsError matcher and std::ignore; strengthen the teardown test
  to log from a thread_local destroyed after the cache.

Co-authored-by: Isaac
Addresses @wgtmac's review comment on PR apache#723 asking for a std::format-style API
that builds the string only after the level check.

- A single overloaded iceberg::Log():
    Log(level, "fmt {}", args...)           -> process-default logger
    Log(logger, level, "fmt {}", args...)   -> explicit logger
  disambiguated by the first argument.
- Formats only after ShouldLog() (no wasted work when disabled) and never throws
  -- a formatting failure routes to EmitFormatError, matching the macros.
- The call-site source_location is captured via a small consteval FmtWithLoc
  wrapper (the std::print technique), preserving compile-time format-string
  checking without a macro.

The sink interface (Logger::Log(LogMessage&&)) intentionally stays
pre-formatted: passing std::format_args across the virtual boundary is
lifetime-unsafe and would break safe record retention. The format-style win is
delivered at the call layer instead.

Co-authored-by: Isaac
Addresses @wgtmac's review comment on PR apache#723 that LogMessage::attributes had no
easy fill path.

Adds a fluent LogMessage::Builder so engine loggers can attach structured fields
(query id, task id, table name, snapshot id, file path, ...):

  auto record = LogMessage::Builder(LogLevel::kInfo)
                    .Message("scan finished")
                    .Attribute("table", table_name)
                    .Attribute("snapshot_id", std::to_string(id))
                    .Build();
  logger->Log(std::move(record));

- Message() sets the formatted text; Attribute() appends an owned key/value;
  Location() overrides the source location (defaults to the build site).
- Build() moves the accumulated state into a LogMessage.

This defines how attributes get populated, so structured logging can grow
without an ABI break.

Co-authored-by: Isaac
Replaces the intentional permanent per-thread leak with a cache that is freed at
thread exit, while staying safe to call CurrentLogger() from any thread_local
destructor during teardown.

The per-thread state is split by lifetime:
- `dead` (bool) and `cache` (pointer) are trivially destructible, so their
  storage stays valid for the whole thread and is readable from any other
  thread_local's destructor, in any destruction order.
- A `guard` (the only piece with a destructor) frees the cache and sets `dead`
  at thread exit. A thread_local destroyed after it reads `dead == true` and
  falls back to an immortal no-op; one destroyed before it still sees a live
  cache. Either order is safe.

This is the correct version of the earlier AliveFlag attempt, which was UB
because the flag object itself had a destructor (reading it after that
destructor ran was the very use-after-lifetime it tried to prevent). Drops the
LSan-reachability registry, which is no longer needed now that nothing leaks.

Validated with a dedicated stress harness under ThreadSanitizer and
AddressSanitizer (no leaks): logging from a thread_local destroyed after the
cache; logging interleaved with concurrent SetDefaultLogger swaps and thread
teardown; and a thread whose only logging happens in a thread_local destructor.
Adds ConcurrentTeardownAndSwapIsSafe to cover the combined path in CI.

Co-authored-by: Isaac
Bind a logger for the current thread + scope so the default path
(CurrentLogger/Log(level,...)/LOG_* macros) routes to it, with the global
default as fallback. Engines can route Iceberg's own internal logs into a
per catalog/session/query/task context without touching call sites.
@kamcheungting-db kamcheungting-db requested a review from wgtmac June 21, 2026 10:16
///
/// Example: `iceberg::Log(LogLevel::kInfo, "loaded {} files", n);`
template <typename... Args>
void Log(LogLevel level, internal::FmtWithLoc<std::type_identity_t<Args>...> fmt,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says this API never throws, but the public Log overload itself is not marked noexcept. The helper is noexcept, so the contract is only missing at the public boundary.

Can we mark both public Log overloads noexcept to make the no-throw contract explicit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
Now, both public Log() overloads are now noexcept.

Comment thread src/iceberg/logging/logger.h Outdated
/// Location() (e.g. to forward a caller's std::source_location).
class ICEBERG_EXPORT LogMessage::Builder {
public:
explicit Builder(LogLevel level) : level_(level) {}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default location is not the user call site. Since location_ is initialized in the header, Builder(...).Build() records logger.h instead of the caller file/line.

Can we take std::source_location in the Builder constructor, with current() as the default argument, and add a test that checks the recorded line is the call site?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

The doc promised "never throws" but only the internal FormatAndEmit helper
was noexcept; the public boundary was missing it. Both overloads only call
noexcept functions, so the contract is now explicit.

Co-authored-by: Isaac
@kamcheungting-db kamcheungting-db requested a review from wgtmac June 22, 2026 00:14
Comment thread src/iceberg/logging/logger.h Outdated
/// Location() (e.g. to forward a caller's std::source_location).
class ICEBERG_EXPORT LogMessage::Builder {
public:
explicit Builder(LogLevel level)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still gives the wrong default location. Now the test makes the default point to logger.h / Builder, but users will expect the record to point to their call site.

Can we make the constructor take std::source_location with current() as a default argument, and test that the default location is the Builder call site?

Example:

explicit Builder(LogLevel level,
std::source_location location = std::source_location::current())
: level_(level), location_(location) {}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.

explicit Builder(LogLevel level,
                  std::source_location location = std::source_location::current())
     : level_(level), location_(location) {}
TEST(LoggerTest, BuilderDefaultLocationIsCallerSite) {
  auto here = std::source_location::current();
  auto record = LogMessage::Builder(LogLevel::kInfo).Message("m").Build();
  EXPECT_STREQ(record.location.file_name(), here.file_name());
  EXPECT_EQ(record.location.line(), here.line() + 1);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Take std::source_location as a defaulted constructor argument so Builder(level)
records the caller's file/line by default; current() in the init-list would
record logger.h instead. Add tests for the default caller-site location and the
Location() override.

Co-authored-by: Isaac
@kamcheungting-db kamcheungting-db force-pushed the logging-block2-interface branch from b35fc10 to d1223eb Compare June 22, 2026 08:18
@kamcheungting-db kamcheungting-db requested a review from wgtmac June 22, 2026 08:22

@wgtmac wgtmac left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @kamcheungting-db!

@wgtmac wgtmac merged commit a38e0cd into apache:main Jun 22, 2026
20 checks passed
wgtmac pushed a commit that referenced this pull request Jun 24, 2026
Part 3 of the logging stack — now based on `main` (#723 merged). Adds
the first concrete sink: a dependency-free `std::cerr` logger, also the
default backend when spdlog is off.

**What's here**
- `CerrLogger` writes one line per record: `YYYY-MM-DDThh:mm:ss.mmmZ
LEVEL [tid] [file:line] message`.
- UTC millisecond timestamps, OS-native cached thread id (extracted to
`util/thread_util.h`), lock-free level check, and a whole-line mutex so
concurrent records never interleave.
- Pure standard library, always available. `Log`/`Flush` never throw.
Inherits the base `Initialize("level")`; `pattern` is ignored (fixed
layout; only the spdlog backend honors `pattern`).

**Tests** (`cerr_logger_test`): level filtering, the fixed line layout
incl. the `.mmm` field and `[file:line]`, inherited `Initialize`,
`Flush`, never-throw when the sink itself throws, and concurrent-write
non-interleaving. Built/run under clang-libc++; ASan + TSan clean.

This pull request and its description were written by Isaac.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants