
feat: split bundler-friendly bundle in to individual VFSs to reduce import size #168

Open

jurerotar wants to merge 2 commits into sqlite:main from jurerotar:feat/treeshaking


@jurerotar

Contributor

I've wanted to do this one for a while, but I'll need feedback from both of you, @tomayac & @sgbeal, and ideally from any other sqlite-wasm users.

Currently, importing the browser bundle adds about 620KB (excluding sqlite3.wasm) to the total bundle size. This browser bundle isn't tree-shakable, so users have to load all of it, regardless of how much of it they actually need.
From a quick check of open-source projects, most users stick with a single VFS and don't use the others, so there's a lot of unused code that folks still have to ship to their users.

This PR adds support for importing only individual VFSs. I've added types to make sure the split works correctly, along with narrowed types (so you can't use kvvfs if you use opfs-wl).

Rough comparison is here. We save roughly 150-200KB, depending on VFS used, which is around 50KB gzip on average. To be 100% honest, I was going for a bit larger numbers, but core is mostly Emscripten code and there's not much I can do about it.

| Import mode | Raw JS | Gzip JS |
| --- | --- | --- |
| default | 623.2 KiB | 177.6 KiB |
| core | 434.6 KiB | 115.2 KiB |
| core + vtab | 448.9 KiB | 120.0 KiB |
| core + kvvfs | 489.9 KiB | 133.3 KiB |
| core + opfs | 484.8 KiB | 132.3 KiB |
| core + opfs-wl | 482.7 KiB | 131.4 KiB |
| core + opfs-sahpool | 478.6 KiB | 129.5 KiB |
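
As a quick sanity check, the savings relative to the default import can be recomputed from the sizes listed above. This is only an illustrative sketch: the `sizes` record and the `savingsVsDefault` helper are names made up for this example, and the numbers are copied from the comparison.

```typescript
// Sizes in KiB, copied from the comparison above: [raw JS, gzip JS].
const sizes: Record<string, [raw: number, gzip: number]> = {
  "default": [623.2, 177.6],
  "core": [434.6, 115.2],
  "core + vtab": [448.9, 120.0],
  "core + kvvfs": [489.9, 133.3],
  "core + opfs": [484.8, 132.3],
  "core + opfs-wl": [482.7, 131.4],
  "core + opfs-sahpool": [478.6, 129.5],
};

// Round to one decimal place to hide floating-point noise.
const round1 = (n: number): number => Math.round(n * 10) / 10;

// Hypothetical helper: how much smaller a mode is than the default import.
function savingsVsDefault(mode: string): { raw: number; gzip: number } {
  const entry = sizes[mode];
  if (!entry) throw new Error(`unknown import mode: ${mode}`);
  const [defaultRaw, defaultGzip] = sizes["default"];
  return {
    raw: round1(defaultRaw - entry[0]),
    gzip: round1(defaultGzip - entry[1]),
  };
}

for (const mode of Object.keys(sizes)) {
  if (mode === "default") continue;
  const { raw, gzip } = savingsVsDefault(mode);
  console.log(`${mode}: saves ${raw} KiB raw, ${gzip} KiB gzip`);
}
```

For the listed rows this works out to between 133.3 and 188.6 KiB raw, and between 44.3 and 62.4 KiB gzip.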

@tomayac

tomayac commented Jun 23, 2026

Collaborator

While we're at it, Emscripten just added support for the proposed Cross-Origin Storage (COS) API. I'd love it if you could see if there's a way to (i) compile with the new flag and (ii) see if the JS bundles could be optimized for COS. This would be a pretty big lever given SQLite Wasm is used on many sites. Full disclosure: I'm on the team that wants to make COS happen in browsers. Having SQLite as a project supporter would be huge. Are you interested in looking into it, @jurerotar and @sgbeal?

@jurerotar

Contributor Author

COS sounds awesome! Count me in, I'd be happy to help with this!

@jurerotar

Contributor Author

I created a test build with the cross-origin storage enabled, seems to (at least) build with no issues 😄

https://github.com/jurerotar/sqlite-wasm-emcc-cos

@tomayac

tomayac commented Jun 23, 2026

Collaborator

> I created a test build with the cross-origin storage enabled, seems to (at least) build with no issues 😄
>
> https://github.com/jurerotar/sqlite-wasm-emcc-cos

Amazing! The glue code might need some changes. If you install the COS extension and open a SQLite demo page using this build, you should see the Wasm binary be used (in DevTools, it loads a blob: URL) when it works correctly, but I think the glue code has custom loading.

@jurerotar

Contributor Author

Yup, you're right, getting no resources found :(
We'll need some upstream changes then.

image

@tomayac

tomayac commented Jun 23, 2026

Collaborator

I hacked something together in https://googlechrome.github.io/samples/sqlite-wasm-cos/. Maybe you can extrapolate from there? I don't think this would work for all variants of the various scripts yet.

Screenshot 2026-06-23 at 18 31 15

@jurerotar

Contributor Author

Alrighty, as you correctly assessed, SQLite-WASM uses a custom loader in the form of Module.instantiateWasm. I've tried monkey-patching said loader and it seems to work. It's not the prettiest solution, but it gets us there with the least amount of breaking changes.

image

@sgbeal, I know you haven't said yes to this work, but if I'm operating under the assumption that you will: do we still require Module.instantiateWasm to load WASM? If not, Emscripten provides their own implementation of COS-compliant file loading. If yes, I can post a (so far) very crude monkey-patch we can add.
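
For readers unfamiliar with this kind of patch, here is a minimal sketch of wrapping an Emscripten-style `instantiateWasm` hook so that it first tries an alternative byte source and otherwise defers to the existing loader. It is not the patch mentioned above: `wrapInstantiateWasm` and `loadBytes` are hypothetical names, `loadBytes` stands in for whatever lookup would be used (no COS API calls are shown), and the hook signature reflects my understanding of Emscripten's convention rather than anything confirmed in this thread.

```typescript
// Assumed shape of an Emscripten-style hook: it receives the import object and
// a callback to invoke once an instance exists.
type OnInstance = (instance: WebAssembly.Instance, module: WebAssembly.Module) => void;
type InstantiateWasm = (imports: WebAssembly.Imports, onSuccess: OnInstance) => object;

// Hypothetical alternative source; resolves to null when nothing is available.
type LoadBytes = () => Promise<ArrayBuffer | null>;

function wrapInstantiateWasm(original: InstantiateWasm, loadBytes?: LoadBytes): InstantiateWasm {
  // Without an alternative source there is nothing to patch.
  if (!loadBytes) return original;

  return (imports, onSuccess) => {
    loadBytes()
      .then((bytes) => {
        if (bytes === null) throw new Error("alternative source had no binary");
        return WebAssembly.instantiate(bytes, imports);
      })
      .then(
        ({ instance, module }) => onSuccess(instance, module),
        // Loading or compiling failed: fall back to the existing loader.
        () => {
          original(imports, onSuccess);
        },
      );
    // Assumption: an empty object signals that instantiation finishes asynchronously.
    return {};
  };
}
```

The two-argument `then` keeps the fallback limited to load and compile failures, so an error thrown inside `onSuccess` does not trigger a second instantiation.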

@sgbeal

sgbeal commented Jun 23, 2026

Collaborator

i've not looked closely at any of this because (A) i'm nursing a newly-operated dog and (B) it's so hot there that my computer is off most of the time, but FWIW...

i fully sympathize with the complaint and am not against, e.g., adding flags to the upstream build to filter select pieces out at build-time. The infrastructure is all there to support it. It's likely to be at least 2-3 weeks before i can explore that, but i'm flagging this email thread so that it's far less likely to get lost.

COS i'm hearing of for the first time. The link says:

> This feature is experimental. The underlying Cross-Origin Storage browser API is a WICG proposal that has not yet shipped in any browser.

Until it's in one of the Big Three browsers and in progress in the other two, we're not likely to look twice at it upstream.

@tomayac

tomayac commented Jun 23, 2026

Collaborator

> COS i'm hearing of for the first time. The link says:
>
> > This feature is experimental. The underlying Cross-Origin Storage browser API is a WICG proposal that has not yet shipped in any browser.
>
> Until it's in one of the Big Three browsers and in progress in the other two, we're not likely to look twice at it upstream.

It's no wonder you're hearing about this for the first time; it's relatively new. We have an extension that you can install to benefit from it today; it's a progressive enhancement. The chicken and egg problem for making it happen in ideally all browsers is to show developer demand, so if we can say SQLite uses this (just like Transformers.js, WebLLM, wllama, Flutter, and Emscripten today already*), this would be a huge lever. The cache benefits for a widely deployed resource like SQLite Wasm are real, and I'd love for you to take part in this.

(Hope the dog is fine! ❤️🐶)

——

@jurerotar

jurerotar commented Jun 23, 2026

Contributor Author

I hope your dog has a speedy and painless recovery, Stephan!

My solution for splitting modules is pretty brittle, because I have to split already generated code (https://github.com/jurerotar/sqlite-wasm/blob/a99c7de5234cbb57e846b1dac574c2c09ba7bc18/scripts/split-bundler-friendly-vfss.mjs)

Stephan, if you're okay with it, I'd be happy to prepare an example PR on how we could implement the changes that I did in this PR, upstream. I would of course focus on backwards compatibility, so existing consumers wouldn't notice any difference. Main changes would be introducing a module system, so treeshakability becomes built-in at the source. If you then don't like my solution, we can think of an alternative one, or stick with this PR.

Please let me know if this is something you'd like to see from my side! 😄

@sgbeal

sgbeal commented Jun 23, 2026

Collaborator

> The chicken and egg problem for making it happen in ideally all browsers is to show developer demand, so if we can say SQLite uses this (just like Transformers.js, WebLLM, wllama, Flutter, and Emscripten today already*), this would be a huge lever.

A long list of long-term commitments and responsibilities forces me to be highly selective, and features which aren't imminent in all of the major browsers are among the first to get filtered out as candidate side-projects.

i won't quite say "never" but will point to the recent OPFS Web Locks addition as an example: Web Locks was never seriously considered as an option until it was in all of the Big Three Browsers.

> (Hope the dog is fine! ❤️🐶)

She's recovering but it will be about two weeks before she can take the stairs and hop on and off the bed (until then i'm sleeping on the floor so that she won't rip her stitches trying to get into the bed). Sigh. #Parenthood

Jure:

We already have some pieces in place anticipating this capability but it's never been fleshed out. It's really just a matter of figuring out how it should plug into the build process. This is just spitballing, but maybe just something like:

```
$ make opfs=0 opfs-ws=1 kvvfs=0
# or maybe:
$ make options="opfs-ws"
```

defaulting to everything on for compatibility. The resulting sqlite3.mjs would simply be missing any of the disabled options.

:-?

@tomayac

tomayac commented Jun 23, 2026

Collaborator

At Chrome, we're confident enough to put COS in Emscripten. What if Jure were to remodel the Wasm instantiation logic so it stays closer to Emscripten's (i.e., reducing the code you as SQLite are responsible for), would this be an option?

Literally LOL'ing about your dog story. Hope you at least have a good air mattress!

@sgbeal

sgbeal commented Jun 23, 2026

Collaborator

> What if Jure were to...

It's still nowhere on my radar for the time being. My local bandwidth simply doesn't support nice-to-haves right now. This filtering of the build is arguably also nice-to-have, but it's one which has bugged me for a couple of years and some work has already been done in that direction so it won't be a big effort to get that working. COS, on the other hand, lives in a whole other scope. Maybe someday, but not today.

> Hope you at least have a good air mattress!

i'd completely forgotten about my air mattress, but it's too big for the current room arrangements :/. It was bought when i had to wait a few weeks for my furniture after moving in.

@sgbeal

sgbeal commented Jun 23, 2026

Collaborator

BTW:

> Rough comparison is here. We save roughly 150-200KB, depending on VFS used, which is around 50KB gzip on average. To be 100% honest, I was going for a bit larger numbers, but core is mostly Emscripten code and there's not much I can do about it.

Search the makefile for "bare-bones". That removes all of the "optional" stuff from the WASM file but does not (AFAIR) affect the JS (which instead (currently) loads everything but skips adding bindings for optional C APIs which aren't in the WASM exports). Like you, we were underwhelmed at the amount of real savings so have never really explored a more modular build:

```
$ make oz barebones=1 # oz=optimize for smaller size, barebones=1 omits a bunch of C APIs
...
$ ls -la jswasm/*.{mjs,wasm}
-rw-rw-r-- 1 stephan stephan 740572 Jun 23 22:05 jswasm/speedtest1.wasm
-rw-rw-r-- 1 stephan stephan 788310 Jun 23 22:06 jswasm/sqlite3-64bit.mjs
-rw-rw-r-- 1 stephan stephan 626136 Jun 23 22:05 jswasm/sqlite3-64bit.wasm
-rw-rw-r-- 1 stephan stephan 785423 Jun 23 22:06 jswasm/sqlite3.mjs
-rw-rw-r-- 1 stephan stephan 614096 Jun 23 22:05 jswasm/sqlite3.wasm
-r--r--r-- 1 stephan stephan   1516 Jun 23 22:05 jswasm/sqlite3-worker1-bundler-friendly.mjs
-r--r--r-- 1 stephan stephan   1499 Jun 23 22:05 jswasm/sqlite3-worker1.mjs
-r--r--r-- 1 stephan stephan  12692 Jun 23 22:05 jswasm/sqlite3-worker1-promiser-bundler-friendly.mjs
-r--r--r-- 1 stephan stephan  12675 Jun 23 22:05 jswasm/sqlite3-worker1-promiser.mjs

# mjs stripped of comments:
$ stripccomments < jswasm/sqlite3.mjs | wc
  13883   33926  414149
```

@sgbeal

sgbeal commented Jun 24, 2026

Collaborator

Upstream we now have an experimental branch:

https://sqlite.org/src/timeline?r=js-omit-pieces

which allows us to do:

```
$ make clean; make omit-api="kvvfs OPFS"; # OPFS=>all OPFS vfses
$ make clean; make omit-api="opfs opfs-wl"; # retains opfs-sahpool
```

The feature/API names are: opfs opfs-sahpool opfs-wl kvvfs oo1 vtab

(You don't want to disable oo1, as that's the API 99% of users (and the test suite) are using, but the option is there for those who really do want it, and more options may be added as good candidates are found.)
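
To make the omit semantics concrete, here is a small sketch of how such a list could be resolved, including the `OPFS` shorthand for all three OPFS VFSs. It only models the behavior described in this comment; it is not how the upstream makefile is implemented, and `resolveOmitList` is a name invented for the example.

```typescript
// Feature/API names as listed in this comment.
const FEATURES = ["opfs", "opfs-sahpool", "opfs-wl", "kvvfs", "oo1", "vtab"] as const;
type Feature = (typeof FEATURES)[number];

// "OPFS" is described as shorthand for all OPFS VFSs.
const ALIASES = new Map<string, readonly Feature[]>([
  ["OPFS", ["opfs", "opfs-sahpool", "opfs-wl"]],
]);

// Hypothetical helper: split a space-separated omit list into omitted/retained sets.
function resolveOmitList(spec: string): { omitted: Feature[]; retained: Feature[] } {
  const omitted = new Set<Feature>();
  for (const token of spec.split(/\s+/).filter(Boolean)) {
    const alias = ALIASES.get(token);
    if (alias) {
      alias.forEach((f) => omitted.add(f));
    } else if ((FEATURES as readonly string[]).includes(token)) {
      omitted.add(token as Feature);
    } else {
      throw new Error(`unknown feature: ${token}`);
    }
  }
  return {
    omitted: FEATURES.filter((f) => omitted.has(f)),
    retained: FEATURES.filter((f) => !omitted.has(f)),
  };
}

console.log(resolveOmitList("kvvfs OPFS").retained);   // oo1 and vtab remain
console.log(resolveOmitList("opfs opfs-wl").retained); // opfs-sahpool is retained
```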

This can be used together with the npm target:

```
$ make npm omit-apis="kvvfs"
```

However, the resulting file names are not unique to each build, so a small amount of scripting would be involved in getting multiple distinct builds going.

Jure, would that get you where you want to go? Please suggest any changes. If you have a concrete set of builds in mind we can add the aforementioned small amount of scripting in the makefile to generate whatever combinations you'd like to host.
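
One way the "small amount of scripting" could look is a table of desired builds, each turned into a make invocation plus a distinct output directory. This is a sketch under several assumptions: `planBuild` and the `dist/<name>` directories are hypothetical, combining `make clean` with the `npm` target is my guess at a sensible sequence, and because the thread spells the variable both `omit-api` and `omit-apis`, its name is left as a parameter.

```typescript
// Feature names from the thread; oo1 is left out of the omit candidates
// because the thread advises against disabling it.
const OPTIONAL_FEATURES = ["opfs", "opfs-sahpool", "opfs-wl", "kvvfs", "vtab"] as const;
type OptionalFeature = (typeof OPTIONAL_FEATURES)[number];

interface BuildPlan {
  outDir: string;  // hypothetical per-build output directory
  command: string; // make invocation that would produce the build
}

// Hypothetical helper: everything not listed in `keep` goes into the omit list.
function planBuild(
  name: string,
  keep: readonly OptionalFeature[],
  omitVar = "omit-apis",
): BuildPlan {
  const omit = OPTIONAL_FEATURES.filter((f) => !keep.includes(f));
  const omitArg = omit.length > 0 ? ` ${omitVar}="${omit.join(" ")}"` : "";
  return { outDir: `dist/${name}`, command: `make clean; make npm${omitArg}` };
}

const plans: BuildPlan[] = [
  planBuild("core", []),
  planBuild("core-kvvfs", ["kvvfs"]),
  planBuild("core-opfs-sahpool", ["opfs-sahpool"]),
];
for (const plan of plans) {
  console.log(`${plan.outDir}: ${plan.command}`);
}
```

Copying the generated files into `outDir` after each run would be the remaining step, since the resulting file names are the same for every build.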

@jurerotar

Contributor Author

You beat me to it, Stephan! 😄

I took some time and prepared how I envisioned this to work. A preview is available here: https://github.com/jurerotar/sqlite-modular-wasm/pull/1/changes. Keep in mind, it's very much in a first draft state.

The idea behind it was to move away from globalThis.sqlite3ApiBootstrap.initializers. This is what essentially prevents bundlers from tree-shaking the unused modules. I've instead reworked the individual .js files to behave like modules. Each one only imports and exports stuff, but doesn't add anything to global namespace.

Backwards compatibility is then easily ensured by just creating an additional module, that imports all individual modules and registers them to the global namespace.
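
A rough sketch of the idea, collapsed into one file for brevity: the initializers below stand in for what would be separate modules, and `bootstrapWith`, `registerAll`, and the `Sqlite3Like` shape are all hypothetical. It is not the code from the linked preview; the only name taken from the real library is the `sqlite3ApiBootstrap.initializers` queue.

```typescript
// Hypothetical, heavily simplified stand-in for the sqlite3 API object.
type Sqlite3Like = { vfs: string[] };
type Initializer = (sqlite3: Sqlite3Like) => void;

// Stand-ins for individual modules. In a real layout each would live in its
// own file and be exported, with no side effects at import time.
const kvvfsInit: Initializer = (sqlite3) => { sqlite3.vfs.push("kvvfs"); };
const opfsInit: Initializer = (sqlite3) => { sqlite3.vfs.push("opfs"); };
const opfsSahpoolInit: Initializer = (sqlite3) => { sqlite3.vfs.push("opfs-sahpool"); };

// Modular entry point: only the initializers the caller imported are run,
// so a bundler can drop the ones that are never referenced.
function bootstrapWith(initializers: readonly Initializer[]): Sqlite3Like {
  const sqlite3: Sqlite3Like = { vfs: [] };
  for (const init of initializers) init(sqlite3);
  return sqlite3;
}

// Compatibility module: references every initializer and pushes it onto a
// shared queue, mirroring globalThis.sqlite3ApiBootstrap.initializers.
interface BootstrapHost {
  sqlite3ApiBootstrap?: { initializers: Initializer[] };
}
function registerAll(host: BootstrapHost): Initializer[] {
  if (!host.sqlite3ApiBootstrap) host.sqlite3ApiBootstrap = { initializers: [] };
  host.sqlite3ApiBootstrap.initializers.push(kvvfsInit, opfsInit, opfsSahpoolInit);
  return host.sqlite3ApiBootstrap.initializers;
}

console.log(bootstrapWith([kvvfsInit]).vfs);           // only kvvfs is wired in
console.log(bootstrapWith(registerAll({})).vfs.length); // all three registered
```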

As a result, the generated bundle is also much smaller. Previously, the bundler-friendly, node, and worker-1 scripts all generated full implementations (around 24k LoC each) of everything. Now, each of these files is just Emscripten code + individual imports.

Here's what the output looks like:

image

If you find the time, please take a look and let me know what you think of this approach! 😄

@sgbeal

sgbeal commented Jun 24, 2026

Collaborator

> The idea behind it was to move away from globalThis.sqlite3ApiBootstrap.initializers.

That is unlikely to happen upstream. Those functions are a necessary level of indirection for the various build modes we support and central to our ability to plug the sub-APIs in and out (plus it gives downstream folks a handy way to plug pieces in to their custom builds).

> I've instead reworked the individual .js files to behave like modules. Each one only imports and exports stuff, but doesn't add anything to global namespace.

Upstream has to support vanilla JS as well, as well as communicate certain state across boundaries which Emscripten walls off, making the global scope a necessary evil (but a temporary one - the library removes all of that during its bootstrapping). Had it been developed about 3 years later we might have skipped vanilla support altogether, but we're stuck with it forever. Had i known at the time that import() can be used like a function to pull ESM modules into non-ESM code, we might have targeted only ESM, too. Mistakes were made and now i've got to support them forever (because that's the project policy upstream).

We could maybe factor out the initializer queue for (just) ESM builds, but that would be a highly disruptive change i've not got the motivation to pursue.

@jurerotar

Contributor Author

Stephan, thank you for the explanation!

I think your solution would work best then. On our side, we'll just need to make sure to run both the "regular" npm build (to continue supporting existing API) and the "specific" ones.

> However, the resulting file names are not unique to each build, so a small amount of scripting would be involved in getting multiple distinct builds going.

This should be fine to handle on our side. Let me see if I can get it running today and I'll update you on my progress.

Thank you for the help!

@jurerotar

Contributor Author

Alrighty, I ran the build, here are the results:

| Import mode | This PR (raw) | omit API (raw) | Raw Δ | This PR (gzip) | omit API (gzip) | Gzip Δ |
| --- | --- | --- | --- | --- | --- | --- |
| default | 623.2 KiB | 627.6 KiB | +4.4 KiB | 177.6 KiB | 170.6 KiB | -7.0 KiB |
| core | 434.6 KiB | 455.7 KiB | +21.1 KiB | 115.2 KiB | 121.2 KiB | +6.0 KiB |
| core + vtab | 448.9 KiB | 470.3 KiB | +21.4 KiB | 120.0 KiB | 125.2 KiB | +5.2 KiB |
| core + kvvfs | 489.9 KiB | 509.8 KiB | +19.9 KiB | 133.3 KiB | 137.1 KiB | +3.8 KiB |
| core + opfs | 484.8 KiB | 504.2 KiB | +19.4 KiB | 132.3 KiB | 135.4 KiB | +3.1 KiB |
| core + opfs-wl | 482.7 KiB | 502.0 KiB | +19.3 KiB | 131.4 KiB | 134.6 KiB | +3.2 KiB |
| core + opfs-sahpool | 478.6 KiB | 498.0 KiB | +19.4 KiB | 129.5 KiB | 133.3 KiB | +3.8 KiB |

I ran this only for bundler-friendly builds. Since this is a potential "new" API, I'd like to support only bundler-friendly. Other users can still use the current way.

Stephan, there's a consistent 20KB difference. The reason for it is the worker1 API being included in your builds. Any chance it could be omitted from the omit builds?
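
The raw-size gap can be recomputed from the results listed earlier in this comment. The sketch below is illustrative: `rows`, `rawDelta`, and `meanSplitDelta` are names made up for the example, and the figures are copied from the raw-size columns.

```typescript
// [import mode, this PR raw KiB, omit-API raw KiB], copied from the results.
const rows: Array<[mode: string, thisPr: number, omitApi: number]> = [
  ["default", 623.2, 627.6],
  ["core", 434.6, 455.7],
  ["core + vtab", 448.9, 470.3],
  ["core + kvvfs", 489.9, 509.8],
  ["core + opfs", 484.8, 504.2],
  ["core + opfs-wl", 482.7, 502.0],
  ["core + opfs-sahpool", 478.6, 498.0],
];

// Round to one decimal place to hide floating-point noise.
const round1 = (n: number): number => Math.round(n * 10) / 10;

// Raw-size difference (omit-API build minus this PR) for one import mode.
function rawDelta(mode: string): number {
  const row = rows.find(([name]) => name === mode);
  if (!row) throw new Error(`unknown import mode: ${mode}`);
  return round1(row[2] - row[1]);
}

// Mean difference across the split builds, i.e. everything except "default".
function meanSplitDelta(): number {
  const deltas = rows
    .filter(([name]) => name !== "default")
    .map(([name]) => rawDelta(name));
  return round1(deltas.reduce((sum, d) => sum + d, 0) / deltas.length);
}

console.log(`mean raw delta across split builds: ${meanSplitDelta()} KiB`);
```

Across the six split builds the mean comes out at about 20.1 KiB, with individual values between 19.3 and 21.4 KiB.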


3 participants