Skip to content

Metastore static#1704

Merged
nikhilsinhaparseable merged 2 commits into
mainfrom
metastore-static
Jun 26, 2026
Merged

Metastore static#1704
nikhilsinhaparseable merged 2 commits into
mainfrom
metastore-static

Conversation

@parmesant

@parmesant parmesant commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Modifications to introduce a new static variable which holds the Metastore implementation at runtime

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • Bug Fixes
    • Improved stream metadata handling across catalog, query, storage, and hot-tier operations for more consistent behavior.
    • Fixed migration-related metadata lookups so upgrades and snapshot updates can use the correct stream state.
    • Updated startup and metadata access flow to reuse shared metastore state more reliably.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

The PR adds a shared metastore singleton during startup, extends get_stream_json with a migration flag, and updates catalog, query, storage, HTTP hot-tier, migration, and enterprise metadata reads to the new signature.

Changes

Shared metastore access

Layer / File(s) Summary
Startup wiring and exports
src/lib.rs, src/parseable/mod.rs
cli and clap are exported from the crate root, and PARSEABLE now initializes one shared METASTORE and reuses it during startup.
Metastore contract and migration fetch
src/metastore/metastore_traits.rs, src/metastore/metastores/object_store_metastore.rs, src/migration/mod.rs
Metastore::get_stream_json accepts is_migration, is_missing_optional_dir is public, and migration metadata reads pass true.
Stream metadata readers
src/catalog/mod.rs, src/enterprise/utils.rs, src/handlers/http/logstream.rs, src/query/mod.rs, src/query/stream_schema_provider.rs
Catalog, query, HTTP hot-tier, and enterprise code pass the updated get_stream_json arguments when loading stream JSON.
Storage metadata update helpers
src/storage/object_storage.rs
The object-storage update helpers for partition limits, custom partitions, source metadata, labels, timestamps, stats, and retention all use the new get_stream_json call shape.
Upsert and stream creation paths
src/storage/object_storage.rs
The fallback path in upsert_stream_metadata and the querier/ingestor creation flows use the updated call signature when loading stream metadata.

Sequence Diagram(s)

sequenceDiagram
  participant PARSEABLE
  participant METASTORE
  participant Cli
  participant ObjectStoreMetastore
  participant ParseableNew as Parseable::new

  PARSEABLE->>METASTORE: get_or_init(shared Arc<dyn Metastore>)
  METASTORE->>Cli: try_parse()
  METASTORE->>ObjectStoreMetastore: construct shared metastore
  PARSEABLE->>ParseableNew: metastore.clone()
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • parseablehq/parseable#1453 — Touches the same hot-tier HTTP endpoints in src/handlers/http/logstream.rs with related ObjectStoreFormat handling changes.

Suggested labels

for next release

Suggested reviewers

  • nikhilsinhaparseable

Poem

A rabbit hopped through METASTORE light,
With bools and streams all tucked just right.
The carrots nodded, “Hop, hop, hooray!”
For shared metadata all in one place today. 🐰

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description only states the high-level goal and leaves the required template sections and checklist items unfilled. Add Fixes #XXXX if applicable, complete the Description section with goal, approach, and key changes, and fill the checklist items.
Title check ❓ Inconclusive The title is related to the change, but it's too vague to clearly convey the main addition. Use a more specific title like "Introduce runtime Metastore static" to state the primary change.
✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch metastore-static

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/metastore/metastores/object_store_metastore.rs (1)

868-887: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Wire the migration flag or remove it.

src/migration/mod.rs Line 310 now passes true, but this implementation names the parameter _is_migration and fetches the same path regardless. That makes the new migration flag a no-op.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/metastore/metastores/object_store_metastore.rs` around lines 868 - 887,
The migration flag is currently ignored in get_stream_json, so the true value
passed from the migration flow has no effect. Update get_stream_json to use the
is_migration state when choosing the object path, or remove the parameter if the
same path is always correct; make sure the behavior matches the call site in
migration/mod.rs and that the path logic in get_base/stream_json_path reflects
the intended migration-specific lookup.
src/parseable/mod.rs (1)

140-183: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Parse CLI once before initializing the metastore.

prompt_missing_envs, Cli::try_parse, and save_collected_envs run inside METASTORE.get_or_init and then again immediately after. This duplicates startup side effects and risks constructing the shared metastore from a different parse snapshot than the Parseable instance.

Proposed shape
 pub static PARSEABLE: Lazy<Parseable> = Lazy::new(|| {
-    let metastore = METASTORE.get_or_init(|| {
-        let collected_envs = crate::interactive::prompt_missing_envs();
-
-        let cli = match Cli::try_parse() {
-            Ok(cli) => {
-                crate::interactive::save_collected_envs(&collected_envs);
-                cli
-            }
-            Err(e) => {
-                e.exit();
-            }
-        };
-        let storage = match &cli.storage {
-            StorageOptions::Local(args) => args.storage.construct_client(),
-            StorageOptions::S3(args) => args.storage.construct_client(),
-            StorageOptions::Blob(args) => args.storage.construct_client(),
-            StorageOptions::Gcs(args) => args.storage.construct_client(),
-        };
-        tracing::warn!("creating objectstoremetastore");
-        let metastore = ObjectStoreMetastore { storage };
-        Arc::new(metastore)
-    });
-
     let collected_envs = crate::interactive::prompt_missing_envs();
 
     let cli = match Cli::try_parse() {
         Ok(cli) => {
             crate::interactive::save_collected_envs(&collected_envs);
@@
             e.exit();
         }
     };
+
+    let metastore = METASTORE.get_or_init(|| {
+        let storage = match &cli.storage {
+            StorageOptions::Local(args) => args.storage.construct_client(),
+            StorageOptions::S3(args) => args.storage.construct_client(),
+            StorageOptions::Blob(args) => args.storage.construct_client(),
+            StorageOptions::Gcs(args) => args.storage.construct_client(),
+        };
+        Arc::new(ObjectStoreMetastore { storage }) as Arc<dyn Metastore>
+    });
+
     match cli.storage {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/parseable/mod.rs` around lines 140 - 183, Parse the CLI and prompt/save
env vars only once before calling METASTORE.get_or_init, since
`prompt_missing_envs`, `Cli::try_parse`, and `save_collected_envs` are currently
duplicated and can produce inconsistent startup state. Move that shared setup
out of the `get_or_init` closure, then pass the already-parsed `cli` into the
metastore construction path that creates `ObjectStoreMetastore` and
`storage.construct_client()`.
🧹 Nitpick comments (2)
src/metastore/metastore_traits.rs (1)

222-233: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Document the new migration contract.

The trait docs still only define get_base; callers and implementations have no contract for what is_migration changes. Add a short doc sentence here so the object-store implementation and migration call sites stay aligned.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/metastore/metastore_traits.rs` around lines 222 - 233, The trait docs for
get_stream_json only describe get_base and do not explain the new is_migration
contract. Update the doc comment on MetastoreTraits::get_stream_json to briefly
state what is_migration changes for callers and implementations, so object-store
behavior and migration call sites stay aligned. Keep the wording near the
existing get_base explanation and make sure the intent is clear without changing
the method signature.
src/parseable/mod.rs (1)

102-103: 🗄️ Data Integrity & Integration | 🔵 Trivial

Keep METASTORE private. parseable::parseable::METASTORE is public, so any caller can seed the OnceCell before PARSEABLE initializes and force reuse of the wrong metastore. Expose a crate-local initializer or read-only accessor instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/parseable/mod.rs` around lines 102 - 103, The METASTORE singleton is
publicly exposed, which allows external callers to seed it before PARSEABLE
initializes and lock in the wrong metastore. Make METASTORE private in
parseable::mod and keep the initialization path inside the module, then provide
a crate-local initializer or a read-only accessor tied to PARSEABLE so other
code can only retrieve the metastore, not set it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/metastore/metastores/object_store_metastore.rs`:
- Around line 868-887: The migration flag is currently ignored in
get_stream_json, so the true value passed from the migration flow has no effect.
Update get_stream_json to use the is_migration state when choosing the object
path, or remove the parameter if the same path is always correct; make sure the
behavior matches the call site in migration/mod.rs and that the path logic in
get_base/stream_json_path reflects the intended migration-specific lookup.

In `@src/parseable/mod.rs`:
- Around line 140-183: Parse the CLI and prompt/save env vars only once before
calling METASTORE.get_or_init, since `prompt_missing_envs`, `Cli::try_parse`,
and `save_collected_envs` are currently duplicated and can produce inconsistent
startup state. Move that shared setup out of the `get_or_init` closure, then
pass the already-parsed `cli` into the metastore construction path that creates
`ObjectStoreMetastore` and `storage.construct_client()`.

---

Nitpick comments:
In `@src/metastore/metastore_traits.rs`:
- Around line 222-233: The trait docs for get_stream_json only describe get_base
and do not explain the new is_migration contract. Update the doc comment on
MetastoreTraits::get_stream_json to briefly state what is_migration changes for
callers and implementations, so object-store behavior and migration call sites
stay aligned. Keep the wording near the existing get_base explanation and make
sure the intent is clear without changing the method signature.

In `@src/parseable/mod.rs`:
- Around line 102-103: The METASTORE singleton is publicly exposed, which allows
external callers to seed it before PARSEABLE initializes and lock in the wrong
metastore. Make METASTORE private in parseable::mod and keep the initialization
path inside the module, then provide a crate-local initializer or a read-only
accessor tied to PARSEABLE so other code can only retrieve the metastore, not
set it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: da92fdf1-d62a-4b50-88dc-7bd72422f577

📥 Commits

Reviewing files that changed from the base of the PR and between 5e7a0f2 and 669b6ca.

📒 Files selected for processing (11)
  • src/catalog/mod.rs
  • src/enterprise/utils.rs
  • src/handlers/http/logstream.rs
  • src/lib.rs
  • src/metastore/metastore_traits.rs
  • src/metastore/metastores/object_store_metastore.rs
  • src/migration/mod.rs
  • src/parseable/mod.rs
  • src/query/mod.rs
  • src/query/stream_schema_provider.rs
  • src/storage/object_storage.rs

@nikhilsinhaparseable nikhilsinhaparseable merged commit 824e00d into main Jun 26, 2026
12 checks passed
@nikhilsinhaparseable nikhilsinhaparseable deleted the metastore-static branch June 26, 2026 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants