
Add NVFP4 Conv3d export for diffusers VAE (Wan 2.2)#1809

Open
jingyu-ml wants to merge 1 commit into main from feat/nvfp4-conv3d-export

Conversation

@jingyu-ml

@jingyu-ml jingyu-ml commented Jun 23, 2026

Contributor

What does this PR do?

Type of change: new feature

Adds NVFP4 Conv3d weight export to the unified Hugging Face diffusers export. Quantized Conv3d layers — concretely the Wan 2.2 VAE WanCausalConv3d stack — are serialized in the same logical flattened-K NVFP4 schema already used for NVFP4 Linear. Each filter [O, C, kt, kh, kw] is flattened to [O, K_flat] (PyTorch-contiguous), K_flat is padded to a multiple of the block size 16, and the result is stored as packed weight ([O, K_pad/2] uint8), per-block weight_scale ([O, K_pad/16] FP8 E4M3) and a scalar weight_scale_2; input_scale is emitted when the activation amax is calibrated.
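The shape arithmetic in the paragraph above can be checked with a few lines of plain Python. This is a minimal sketch, not ModelOpt code: the helper name `nvfp4_conv3d_export_shapes` and the example filter shape are assumptions made for illustration; only the block size of 16 and the `[O, K_pad/2]` / `[O, K_pad/16]` / scalar layout come from the description.

```python
import math

BLOCK_SIZE = 16  # NVFP4 block size stated in the PR description


def nvfp4_conv3d_export_shapes(weight_shape, block_size=BLOCK_SIZE):
    """Hypothetical helper: expected on-disk shapes for one Conv3d weight."""
    out_channels, *rest = weight_shape           # [O, C, kt, kh, kw]
    k_flat = math.prod(rest)                     # flattened reduction dim C*kt*kh*kw
    k_pad = math.ceil(k_flat / block_size) * block_size  # round up to a block multiple
    return {
        "k_flat": k_flat,
        "k_pad": k_pad,
        "weight": (out_channels, k_pad // 2),               # two FP4 codes per uint8
        "weight_scale": (out_channels, k_pad // block_size),  # one FP8 E4M3 scale per block
        "weight_scale_2": (),                                # scalar
    }


# Illustrative shape only (not taken from the Wan 2.2 VAE): O=96, C=3, 3x3x3 kernel.
shapes = nvfp4_conv3d_export_shapes((96, 3, 3, 3, 3))
print(shapes)
```

For this example K_flat is 81, which is not a multiple of 16, so the padding step matters: K_pad becomes 96, the packed weight is `[96, 48]` and the per-block scale is `[96, 6]`.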

Previously these modules fell through _process_quantized_modules unpacked and could leak quantizer buffers. This PR:

  • adds an is_quantconv3d predicate (Conv3d-scoped; excludes ConvTranspose / Conv2d / Conv1d) and a conv branch in _process_quantized_modules;
  • generalizes hide_quantizers_from_state_dict to strip every *_quantizer child so no quantizer state (_amax) is serialized;
  • excludes Conv3d from the opt-in pad_nvfp4_weights / swizzle_nvfp4_scales post-processing so conv stays in logical layout (kernel-side layout preparation is a downstream-runtime concern).

Scope: dynamic NVFP4 Conv3d. Out of scope (downstream or future work): Conv2d / ConvTranspose / Conv1d packing, static/MSE NVFP4 conv, and kernel-side 128x4 SF swizzle / channel alignment / KTRSC repack / runtime alpha.

Usage

python quantize.py \
    --model wan2.2-t2v-5b --backbone vae \
    --format fp4 --quant-algo max --collect-method default \
    --model-dtype BFloat16 --trt-high-precision-dtype BFloat16 \
    --batch-size 1 --calib-size 32 --n-steps 30 \
    --hf-ckpt-dir ./wan22_vae_nvfp4_hf

Testing

  • New CPU unit tests under tests/unit/torch/export/: byte-exact packing vs NVFP4QTensor.quantize, dequant round-trip, exported-tensor schema (dtypes/shapes/scalar weight_scale_2), no quantizer-state leakage, ConvTranspose and static-NVFP4 exclusion, and a pad/swizzle conv-exclusion regression; plus a tiny AutoencoderKLWan save/reload test.
  • Target suite passes (39 tests) including the existing NVFP4 export tests — no regression.
  • End to end: a real Wan 2.2 5B VAE calib-32 export produced 48 NVFP4 conv layers, all schema-valid, with zero quantizer keys on disk.
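The "schema-valid, zero quantizer keys" check from the last bullet can be approximated on a list of checkpoint key names. The sketch below is illustrative only: `check_nvfp4_keys` is a hypothetical helper, the key names are made up, and it inspects names rather than tensor dtypes or shapes.

```python
def check_nvfp4_keys(keys):
    """Hypothetical helper: return (leaked_quantizer_keys, incomplete_prefixes)."""
    keys = set(keys)
    # Any key that still mentions a quantizer child counts as leaked state.
    leaked = sorted(k for k in keys if "_quantizer" in k)
    # Treat every prefix that owns a weight_scale_2 as an NVFP4 layer.
    prefixes = {k[: -len(".weight_scale_2")] for k in keys if k.endswith(".weight_scale_2")}
    incomplete = sorted(
        p for p in prefixes
        if f"{p}.weight" not in keys or f"{p}.weight_scale" not in keys
    )
    return leaked, incomplete


good = [
    "vae.conv1.weight",
    "vae.conv1.weight_scale",
    "vae.conv1.weight_scale_2",
    "vae.conv1.bias",
]
bad = good + ["vae.conv1.weight_quantizer._amax"]

print(check_nvfp4_keys(good))
print(check_nvfp4_keys(bad))
```

On a real export the key list would come from the saved safetensors file; that loading step is left out here to keep the sketch dependency-free.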

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅ (additive; new parameters default to prior behavior; non-conv paths unchanged).
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A (no new dependencies; reuses existing NVFP4 helpers).
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ✅
  • Did you get Claude approval on this PR?: ❌ — please run /claude review on the PR.

Additional Information

Kernel-side conv layout (128x4 SF swizzle, channel alignment, KTRSC repack) and the runtime alpha are intentionally left to the downstream runtime (e.g. TRT-LLM); ModelOpt stores only the logical checkpoint. The produced Wan 2.2 5B VAE checkpoint itself is kept out of this PR.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added NVFP4 Conv3d weight export support to Hugging Face diffusers format with serialized packed weights, scale tensors, and logical flattened-K schema layout.
  • Documentation

    • Updated Wan 2.2 VAE quantization guide to document Conv3d weight export behavior and --hf-ckpt-dir checkpoint generation.
  • Tests

    • Added comprehensive unit and integration tests for NVFP4 Conv3d export validation, including byte-exact packing verification and roundtrip accuracy checks.

@jingyu-ml jingyu-ml requested review from a team as code owners June 23, 2026 21:19
@jingyu-ml jingyu-ml requested a review from Edwardf0t1 June 23, 2026 21:19
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fe179b8b-8baa-4841-9954-a16af14872c5

📥 Commits

Reviewing files that changed from the base of the PR and between e0ebadb and ebb674f.

📒 Files selected for processing (7)
  • CHANGELOG.rst
  • examples/diffusers/README.md
  • modelopt/torch/export/diffusers_utils.py
  • modelopt/torch/export/layer_utils.py
  • modelopt/torch/export/unified_export_hf.py
  • tests/unit/torch/export/test_nvfp4_conv_export.py
  • tests/unit/torch/export/test_nvfp4_conv_export_diffusers.py
✅ Files skipped from review due to trivial changes (2)
  • examples/diffusers/README.md
  • CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (5)
  • modelopt/torch/export/layer_utils.py
  • modelopt/torch/export/unified_export_hf.py
  • tests/unit/torch/export/test_nvfp4_conv_export_diffusers.py
  • tests/unit/torch/export/test_nvfp4_conv_export.py
  • modelopt/torch/export/diffusers_utils.py

📝 Walkthrough

Walkthrough

Adds NVFP4 Conv3d weight export support to the unified Hugging Face diffusers export pipeline. A new is_quantconv3d classifier routes Conv3d modules through a dedicated _export_quantized_conv_weight packing routine (flattened-K layout). Pad/swizzle postprocessing helpers gain an exclude_layers parameter so Conv3d tensors bypass linear-oriented transforms. hide_quantizers_from_state_dict is generalized to strip any _quantizer-named submodule.
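The `exclude_layers` idea can be sketched as a filter over layer prefixes. This is a simplified illustration under stated assumptions: the real `_find_nvfp4_layers` may detect layers differently, and both the detection by `.weight_scale_2` suffix and the exact-match exclusion are guesses; the function name and placeholder state dict are hypothetical.

```python
def find_nvfp4_layers_sketch(state_dict, exclude_layers=None):
    """Hypothetical: list NVFP4 layer prefixes, skipping excluded ones."""
    exclude = exclude_layers or set()   # None keeps the prior behavior (nothing excluded)
    prefixes = []
    for key in state_dict:
        if key.endswith(".weight_scale_2"):
            prefix = key[: -len(".weight_scale_2")]
            if prefix not in exclude:
                prefixes.append(prefix)
    return prefixes


# Placeholder values stand in for tensors; only the key names matter here.
sd = {
    "transformer.proj.weight": None,
    "transformer.proj.weight_scale": None,
    "transformer.proj.weight_scale_2": None,
    "vae.conv1.weight": None,
    "vae.conv1.weight_scale": None,
    "vae.conv1.weight_scale_2": None,
}

print(find_nvfp4_layers_sketch(sd))
print(find_nvfp4_layers_sketch(sd, exclude_layers={"vae.conv1"}))
```

With the default `None`, every NVFP4 layer is returned, which is what keeps the new parameter backward compatible; passing the conv prefixes leaves only the Linear layer for pad/swizzle.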

Changes

NVFP4 Conv3d export pipeline

Layer / File(s) Summary
Conv3d classifier and quantizer-hiding generalization
modelopt/torch/export/layer_utils.py, modelopt/torch/export/diffusers_utils.py
Adds is_quantconv3d boolean predicate (type-name matching, excludes Transpose/Conv1d/Conv2d). Rewrites hide_quantizers_from_state_dict to temporarily delete any immediate child submodule whose name ends with "_quantizer", removing the is_quantlinear dependency.
Exclude-layers support in pad/swizzle helpers
modelopt/torch/export/diffusers_utils.py
Adds optional exclude_layers: set[str] | None to _find_nvfp4_layers, pad_nvfp4_weights, and swizzle_nvfp4_scales, so Conv3d-packed tensors are skipped during NVFP4 linear-oriented postprocessing.
_export_quantized_conv_weight packing routine
modelopt/torch/export/unified_export_hf.py
New function that flattens Conv3d weight [O, C, kt, kh, kw] → [O, K_flat], pads to BLOCK_SIZE, derives per-block weight_scale and global weight_scale_2, packs bytes into uint8, optionally registers input_scale, and raises NotImplementedError for static NVFP4 quantizers.
Export pipeline routing and postprocess wiring
modelopt/torch/export/unified_export_hf.py
Adds is_quantconv3d branch in _process_quantized_modules. Extends _postprocess_safetensors with nvfp4_exclude_layers. Computes conv_nvfp4_prefixes in the diffusers checkpoint export path and passes it as the exclusion set to postprocessing.
Unit tests for Conv3d export schema and helpers
tests/unit/torch/export/test_nvfp4_conv_export.py
New CPU-only test module: routing predicate, packed shape/dtype schema, static-NVFP4 rejection, byte-exact packing + round-trip dequantization, K-order sensitivity, _process_quantized_modules integration, quantizer-hiding hygiene, logical-layout scale assertion, and pad/swizzle/postprocess exclusion.
End-to-end Wan 2.2 VAE diffusers export test
tests/unit/torch/export/test_nvfp4_conv_export_diffusers.py
Quantizes a tiny AutoencoderKLWan with NVFP4, runs the full pack+save pipeline, reloads safetensors, asserts no _quantizer keys leaked, and validates every NVFP4 Conv3d on disk matches the flattened-K schema.
Docs and changelog
examples/diffusers/README.md, CHANGELOG.rst
Documents --hf-ckpt-dir HF export behavior for Conv3d NVFP4 with updated example command. Adds CHANGELOG entry for the new export feature.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as Diffusers Export CLI
  participant Exporter as _export_diffusers_checkpoint
  participant Process as _process_quantized_modules
  participant ConvPack as _export_quantized_conv_weight
  participant Post as _postprocess_safetensors
  participant Pad as pad_nvfp4_weights
  participant Swizzle as swizzle_nvfp4_scales

  CLI->>Exporter: --hf-ckpt-dir provided
  Exporter->>Exporter: compute conv_nvfp4_prefixes via is_quantconv3d
  Exporter->>Process: iterate named modules
  Process->>Process: is_quantconv3d(sub_module)?
  Process->>ConvPack: fsdp2_aware_weight_update → flatten/pad/quantize
  ConvPack-->>Process: weight(uint8), weight_scale(fp32), weight_scale_2(fp32)
  Exporter->>Post: save safetensors, nvfp4_exclude_layers=conv_nvfp4_prefixes
  Post->>Pad: state_dict, exclude_layers=conv_nvfp4_prefixes
  Post->>Swizzle: state_dict, exclude_layers=conv_nvfp4_prefixes
  Post-->>Exporter: safetensors written (conv tensors unchanged by pad/swizzle)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/Model-Optimizer#1794: Modifies _postprocess_safetensors in unified_export_hf.py, the same function this PR extends with the nvfp4_exclude_layers parameter.

Suggested reviewers

  • vishalpandya1990
  • ynankani
  • kevalmorabia97
  • jenchen13
  • Edwardf0t1
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.76% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding NVFP4 Conv3d export support for diffusers VAE (Wan 2.2), which is the primary objective of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed Security scan complete. No violations of SECURITY.md coding practices found in the PR: no torch.load with weights_only=False, numpy.load with allow_pickle=True, hardcoded trust_remote_code=True, ev...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.23.0)
modelopt/torch/export/layer_utils.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

[00.15][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

modelopt/torch/export/diffusers_utils.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

[00.18][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

tests/unit/torch/export/test_nvfp4_conv_export_diffusers.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

[00.17][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

  • 2 others
🔧 markdownlint-cli2 (0.22.1)
examples/diffusers/README.md

markdownlint-cli2 v0.22.1 (markdownlint v0.40.0)
Error: Unable to use configuration file '/coderabbit-0.markdownlint-cli2.jsonc'; ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at throwForConfigurationFile (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:48:9)
at readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:169:5)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
[cause]: Error: ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at async open (node:internal/fs/promises:640:25)
at async Object.readFile (node:internal/fs/promises:1287:14)
at async readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:141:17)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
errno: -2,
code: 'ENOENT',
syscall: 'open',
path: '/coderabbit-0.markdownlint-cli2.jsonc'
}
}


Comment @coderabbitai help to get the list of available commands.

@jingyu-ml jingyu-ml requested review from cjluo-nv, mxinO and sychen52 June 23, 2026 21:22
@jingyu-ml

Contributor Author

/claude review

@github-actions

github-actions Bot commented Jun 23, 2026

Contributor
PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1809/

Built to branch gh-pages at 2026-06-23 21:38 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@coderabbitai coderabbitai Bot left a comment

Contributor


Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/torch/export/test_nvfp4_conv_export_diffusers.py (1)

108-123: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Scope on-disk schema checks to the Conv3d modules discovered earlier.

conv_prefixes is currently built from every *.weight_scale_2 key, so this test can pass even if Conv3d export regresses but other NVFP4 layers remain valid.

Suggested patch
-    conv_prefixes = [
-        k[: -len(".weight_scale_2")] for k in state_dict if k.endswith(".weight_scale_2")
-    ]
-    assert conv_prefixes, "no NVFP4 conv layers found on disk"
-    for prefix in conv_prefixes:
+    exported_prefixes = {
+        k[: -len(".weight_scale_2")] for k in state_dict if k.endswith(".weight_scale_2")
+    }
+    expected_conv_prefixes = set(quant_convs)
+    missing = expected_conv_prefixes - exported_prefixes
+    assert not missing, f"missing exported NVFP4 conv layers on disk: {sorted(missing)}"
+    for prefix in sorted(expected_conv_prefixes):
         weight = state_dict[f"{prefix}.weight"]
         scale = state_dict[f"{prefix}.weight_scale"]
         scale_2 = state_dict[f"{prefix}.weight_scale_2"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/torch/export/test_nvfp4_conv_export_diffusers.py` around lines 108
- 123, The conv_prefixes list is currently built from all state_dict keys ending
with .weight_scale_2, which means the subsequent assertion checks can pass even
if Conv3d export regresses. Instead, build conv_prefixes from the list of Conv3d
modules that were discovered earlier in the test (before this section), so that
the assertions in this block only validate the Conv3d modules that were actually
expected to be exported. This ensures the test properly scopes its validation to
the specific Conv3d modules rather than all NVFP4 layers present in the
state_dict.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8ecadbbc-27bb-42c7-ba9f-6e8897cfd119

📥 Commits

Reviewing files that changed from the base of the PR and between 37dbbda and e0ebadb.

📒 Files selected for processing (8)
  • CHANGELOG.rst
  • examples/diffusers/README.md
  • examples/diffusers/quantization/quantize.py
  • modelopt/torch/export/diffusers_utils.py
  • modelopt/torch/export/layer_utils.py
  • modelopt/torch/export/unified_export_hf.py
  • tests/unit/torch/export/test_nvfp4_conv_export.py
  • tests/unit/torch/export/test_nvfp4_conv_export_diffusers.py


def test_postprocess_safetensors_excludes_conv(tmp_path):
    """Conv stays logical on disk when pad/swizzle are enabled for other layers."""
    from safetensors.torch import load_file, save_file

Contributor


📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Move safetensors imports to module scope.

Line 318 imports inside a test function without justification; this delays import failures until runtime and violates the test import convention.

As per path instructions, “Imports belong at the top of the file ... The only acceptable in-function imports are for circular imports or optional dependencies ... with a brief comment naming the reason.”

Suggested patch
 import pytest
 import torch
 import torch.nn as nn
+from safetensors.torch import load_file, save_file
 
 import modelopt.torch.quantization as mtq
 from modelopt.torch.export.diffusers_utils import (
@@
 def test_postprocess_safetensors_excludes_conv(tmp_path):
     """Conv stays logical on disk when pad/swizzle are enabled for other layers."""
-    from safetensors.torch import load_file, save_file
 
     sd = {**_mk_nvfp4_layer("transformer.proj", 64, 256), **_mk_nvfp4_layer("vae.conv1", 120, 240)}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/torch/export/test_nvfp4_conv_export.py` at line 318, Move the
`from safetensors.torch import load_file, save_file` import statement from line
318 inside the test function to the module scope at the top of the file with the
other imports. This ensures import failures are caught at module load time
rather than at runtime, and follows the project's import convention that
requires imports at the top of the file unless there is a specific reason
(circular imports or optional dependencies) to place them inside functions.

Source: Path instructions

@codecov

codecov Bot commented Jun 23, 2026


Codecov Report

❌ Patch coverage is 60.49383% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.86%. Comparing base (c6f8f07) to head (ebb674f).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/export/unified_export_hf.py 52.94% 32 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1809       +/-   ##
===========================================
+ Coverage   62.89%   75.86%   +12.97%     
===========================================
  Files         511      511               
  Lines       56632    58620     +1988     
===========================================
+ Hits        35616    44472     +8856     
+ Misses      21016    14148     -6868     
Flag Coverage Δ
examples 42.11% <25.53%> (+4.11%) ⬆️
gpu 57.88% <25.53%> (+37.31%) ⬆️
regression 14.71% <6.38%> (+0.04%) ⬆️
unit 54.70% <85.45%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.


@cjluo-nv cjluo-nv left a comment

Collaborator


Bot review — DM the bot to share feedback.

Adds NVFP4 Conv3d weight export for the diffusers Wan 2.2 VAE to the unified HF export path. +636/-18, 8 files. I traced the core logic and it holds up: the conv flatten weight.reshape(O, -1) matches the calibration-time flatten in _nvfp4_quantize_weight_along_k (so byte-exactness is plausible and is pinned by a real test); the pad-before-scale ordering is correct (get_weights_scaling_factor requires block-divisible last dim); the static-quantizer rejection via NVFP4QTensor._is_static_quantizer is sound and tested; is_quantconv3d correctly matches _QuantDiffusersWanCausalConv3d/QuantConv3d while excluding Conv1d/Conv2d (no "Conv3d" substring) and ConvTranspose; and the nvfp4_exclude_layers plumbing correctly keeps conv prefixes out of the Linear-targeted pad/swizzle. New optional kwargs are backward-compatible. The lazy ONNX import in quantize.py has a valid documented justification (heavy optional dependency). Test coverage is strong (byte-exact vs NVFP4QTensor.quantize, dequant round-trip, schema, K-order sensitivity, dispatch routing, quantizer-hiding, pad/swizzle exclusion, plus a tiny real Wan VAE e2e). New test files' license headers match the canonical LICENSE_HEADER (standard-header exception applies — no licensing concern). No prompt-injection in PR text.

Why nudge rather than approve:

  • hide_quantizers_from_state_dict was generalized from "strip QuantLinear's weight/input/output quantizers" to "strip every *_quantizer child of every module." This now also strips attention bmm/softmax quantizers across ALL diffusers exports, not just conv. My read is this is a cleanup (the unified HF safetensors path never converted attention quantizers into usable scale buffers — FP8 MHA scales go through the ONNX export_fp8_mha symbolic path, so the previously-serialized *_quantizer._amax keys were unusable junk), but it changes a shared code path with broad blast radius (Flux/SD3/Wan/LTX-2) and only has manual GPU e2e validation.
  • The real numeric conv export was only manually validated on a Wan 2.2 5B VAE (48 layers); CI coverage is CPU-only.
  • Minor: some scale-derivation/input_scale duplication between _export_quantized_conv_weight and _export_quantized_weight, acceptable given the conv-specific flatten/static-reject.

Recommend a diffusers-export owner confirm the hide_quantizers_from_state_dict broadening is safe for the FP8-MHA/Linear paths and that the GPU conv export numerics are signed off.
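To illustrate the broadened behavior under discussion, the following sketch hides every child whose name ends in `_quantizer` for the duration of a `with` block and restores it afterwards. It is not the ModelOpt implementation: `Node` is a minimal stand-in for a module tree (so the example runs without torch), and `hide_quantizers_sketch` and all child names are hypothetical.

```python
from contextlib import contextmanager


class Node:
    """Minimal stand-in for a module tree: just named children."""

    def __init__(self, **children):
        self.children = dict(children)

    def walk(self, prefix=""):
        yield prefix, self
        for name, child in list(self.children.items()):
            yield from child.walk(f"{prefix}.{name}" if prefix else name)


@contextmanager
def hide_quantizers_sketch(root):
    removed = []  # (parent, name, child) triples to restore on exit
    for _, node in list(root.walk()):
        for name in [n for n in node.children if n.endswith("_quantizer")]:
            removed.append((node, name, node.children.pop(name)))
    try:
        yield
    finally:
        for parent, name, child in removed:
            parent.children[name] = child


def child_names(root):
    return [name for name, _ in root.walk() if name]


model = Node(
    conv=Node(weight_quantizer=Node(), input_quantizer=Node()),
    attn=Node(softmax_quantizer=Node(), proj=Node()),
)
before = child_names(model)
with hide_quantizers_sketch(model):
    inside = child_names(model)
after = child_names(model)
print(inside)
```

Inside the block only `conv`, `attn` and `attn.proj` remain visible, so the attention `softmax_quantizer` is hidden along with the conv quantizers. That is the broader reach the review asks an owner to confirm.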

Route quantized Conv3d modules through the unified HuggingFace diffusers export
so their weights serialize in the logical flattened-K NVFP4 schema, matching
NVFP4 Linear. Previously such modules fell through `_process_quantized_modules`
unpacked and leaked quantizer buffers.

- layer_utils: add `is_quantconv3d` (Conv3d-scoped; excludes ConvTranspose /
  Conv2d / Conv1d).
- unified_export_hf: add `_export_quantized_conv_weight` + a dispatch branch.
  Flatten `[O,C,kt,kh,kw] -> [O,K_flat]`, pad K to a multiple of 16, derive scales
  from the flattened weight (dynamic NVFP4), and pack to `weight [O,K_pad/2]`
  uint8 + `weight_scale [O,K_pad/16]` fp8 + scalar `weight_scale_2` (+ `input_scale`
  when the activation amax is calibrated). Static NVFP4 conv is rejected.
- diffusers_utils: strip every `*_quantizer` child in
  `hide_quantizers_from_state_dict` so no quantizer state leaks; exclude Conv3d
  layers from the opt-in `pad_nvfp4_weights` / `swizzle_nvfp4_scales`
  post-processing (conv stays logical; the conv kernel does its own layout prep).
- examples/diffusers: document the Wan 2.2 VAE NVFP4 `--hf-ckpt-dir` command.
- tests: CPU coverage (byte-exact vs `NVFP4QTensor.quantize`, dequant round-trip,
  schema, no-leak, ConvTranspose/static exclusion, pad/swizzle conv-exclusion) +
  a tiny Wan VAE save/reload test.

Verified end to end: a Wan 2.2 5B VAE calib-32 export produces 48 NVFP4 conv
layers, all schema-valid, with zero quantizer keys on disk.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml force-pushed the feat/nvfp4-conv3d-export branch from e0ebadb to ebb674f Compare June 23, 2026 21:34

@claude claude Bot left a comment



Claude review passed — no blocking issues found. LGTM

Scope reviewed: All 8 changed files (modelopt/torch/export/{diffusers_utils,layer_utils,unified_export_hf}.py, examples/diffusers/quantization/quantize.py, both new test files, README, CHANGELOG). Note: the two-dot git diff origin/main HEAD was polluted by unrelated MoE-refactor commits already on main; I reviewed against the authoritative gh pr diff (8 files).

What I verified:

  • Algorithm correctness — The export-time flatten weight.reshape(O, -1) exactly matches the calibration-time flatten in _nvfp4_quantize_weight_along_k (quant_conv.py:81), so per-block grouping is identical between calibration and export. Grouped-conv reduction dim K = (in_channels // groups) * kt*kh*kw is correctly captured. Padding K with zeros leaves amax unchanged, so weight_scale_2 and per-block scales are stable and the byte-exact round-trip holds.
  • Scope guards — static NVFP4 weight quantizer is rejected loudly (NotImplementedError); non-NVFP4 conv warns and leaves the weight unpacked. Both are strict improvements over the prior fall-through that leaked quantizer buffers.
  • Mode/State — the generalized hide_quantizers_from_state_dict strips every *_quantizer child and restores on exit; the delete-during-named_modules() walk is safe (the child _modules mutation precedes that level's .items() iterator) and matches the prior pattern.
  • Export compatibility — Conv3d prefixes are correctly excluded from pad_nvfp4_weights / swizzle_nvfp4_scales via nvfp4_exclude_layers, keeping the logical flattened-K layout for the downstream conv kernel. input_scale emission mirrors the NVFP4 Linear path verbatim.
  • Routing: is_quantconv3d matches _QuantConv3d / QuantConv3d / _QuantDiffusersWanCausalConv3d, and excludes Conv1d/Conv2d and all ConvTranspose variants; consistent string-matching style with is_quantlinear.
  • Lazy ONNX import in quantize.py is confined to export_onnx (its only use site).

Additive, backward-compatible, and covered by focused CPU unit + e2e tests. No correctness, mode/state, export, compatibility, or performance concerns found.
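Two of the claims verified above, the grouped-conv reduction dimension and the fact that zero padding leaves amax unchanged, can be reproduced with a small pure-Python sketch. The helper names and the example numbers are assumptions for illustration; real export code operates on tensors rather than lists.

```python
BLOCK_SIZE = 16


def reduction_dim(in_channels, groups, kt, kh, kw):
    """K = (in_channels // groups) * kt * kh * kw, as stated in the review."""
    return (in_channels // groups) * kt * kh * kw


def pad_row(row, block_size=BLOCK_SIZE):
    """Append zeros so the row length is a multiple of block_size."""
    return row + [0.0] * ((-len(row)) % block_size)


def amax(values):
    return max(abs(v) for v in values)


def block_amaxes(row, block_size=BLOCK_SIZE):
    return [amax(row[i:i + block_size]) for i in range(0, len(row), block_size)]


# Illustrative grouped conv: 8 input channels in 2 groups, 3x1x1 kernel -> K = 12.
k = reduction_dim(in_channels=8, groups=2, kt=3, kh=1, kw=1)
row = [(-1) ** i * (i + 1) / 10 for i in range(k)]  # alternating-sign values
padded = pad_row(row)

print(k, len(padded))
print(amax(row) == amax(padded))
print(block_amaxes(padded))
```

Because the padding values are zeros, neither the global amax (which feeds `weight_scale_2`) nor the amax of the final partial block changes, which is consistent with the byte-exact round-trip argument in the review.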

export_dir: Path,
pipe: Any | None = None,
hf_quant_config: dict | None = None,
nvfp4_exclude_layers: set[str] | None = None,

Collaborator


what are the layers that we would like to exclude? can we infer by naming instead of introducing a new arg?
