
fix(mm): support diffusers FLUX LoRAs on NF4/8-bit quantized base models (#9118)

CustomInvokeLinearNF4 and CustomInvokeLinear8bitLt were missing the
_cast_weight_bias_for_input / _cast_tensor_for_input methods that the
sidecar-patches branch in autocast_linear_forward_sidecar_patches calls.
As a result, applying any patch other than a LoRALayer or
FluxControlLoRALayer to a quantized FLUX module raised an AttributeError.
One such patch is the MergedLayerPatch that the diffusers FLUX LoRA
converter produces when it fuses Q/K/V/mlp into linear1.
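
The failure mode described above can be sketched in a few lines. This is
only an illustration, not the InvokeAI code: the class names, the
sidecar_forward function, and the dict used as a weight placeholder are all
hypothetical. It assumes the generic patch path is the only one that calls
the cast helper, as the description implies.

```python
class QuantizedLinearBefore:
    """Hypothetical quantized module that lacks the cast helper (pre-fix)."""

    weight_shape = (4, 8)  # logical (out_features, in_features), made up


class QuantizedLinearAfter(QuantizedLinearBefore):
    """Hypothetical module after the fix: the helper now exists."""

    def _cast_weight_bias_for_input(self, x):
        # Return a shape-only placeholder for the weight, and no bias.
        return {"shape": self.weight_shape, "device": "meta"}, None


def sidecar_forward(module, x, patch_kind):
    """Hypothetical dispatcher mirroring the two branches described above."""
    if patch_kind in ("LoRALayer", "FluxControlLoRALayer"):
        # Assumed fast path that never touches the cast helper.
        return "fast-path"
    # Generic path: needs the helper, so it fails if the module lacks it.
    weight, _bias = module._cast_weight_bias_for_input(x)
    return weight["shape"]


def raises_attribute_error(module, patch_kind):
    try:
        sidecar_forward(module, x=None, patch_kind=patch_kind)
    except AttributeError:
        return True
    return False


before_fails = raises_attribute_error(QuantizedLinearBefore(), "MergedLayerPatch")
after_shape = sidecar_forward(QuantizedLinearAfter(), None, "MergedLayerPatch")
```

In this sketch the pre-fix module still works with the fast-path patch
kinds, which matches the report that only other patch types triggered the
error.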

The weight is now exposed as a meta-device tensor with the correct logical
shape. For Params4bit the shape is read from quant_state, since .shape
reports the packed-byte layout. Patches that need only the weight's shape
(LoRA, LoHA, MergedLayerPatch) now work; SetParameterLayer and DoRA remain
unsupported on quantized modules.
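
The difference between the packed and the logical shape can be illustrated
with a small stdlib-only sketch. FakeQuantState, FakeParam, and the helper
functions are hypothetical stand-ins, not torch or bitsandbytes types, and
the packed layout (a single column of bytes holding two 4-bit values each)
is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeQuantState:
    """Hypothetical stand-in for the quant_state of a 4-bit parameter."""

    shape: Tuple[int, int]  # logical (out_features, in_features)


@dataclass
class FakeParam:
    """Hypothetical stand-in for a weight parameter."""

    shape: Tuple[int, ...]
    quant_state: Optional[FakeQuantState] = None


def make_packed_4bit(out_features: int, in_features: int) -> FakeParam:
    # Assumed layout: two 4-bit values per byte, stored as one column.
    numel = out_features * in_features
    return FakeParam(
        shape=((numel + 1) // 2, 1),
        quant_state=FakeQuantState((out_features, in_features)),
    )


def logical_weight_shape(param: FakeParam) -> Tuple[int, ...]:
    # Prefer quant_state, because .shape describes the packed storage.
    if param.quant_state is not None:
        return tuple(param.quant_state.shape)
    return tuple(param.shape)


def lora_delta_shape(up_shape, down_shape):
    # up is (out, rank) and down is (rank, in), so up @ down is (out, in).
    assert up_shape[1] == down_shape[0], "rank mismatch"
    return (up_shape[0], down_shape[1])


packed = make_packed_4bit(out_features=3072, in_features=3072)
delta = lora_delta_shape((3072, 16), (16, 3072))
shapes_match = delta == logical_weight_shape(packed)
```

A patch that only checks its output against the weight's shape is
satisfied by the logical shape, whereas comparing against packed.shape
would fail.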

Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>

Alexander Eichhorn committed 1mo ago

71102efab8b0801afb05284d7d885d4282e0bbd9

Parent: 04d476d

Committed by GitHub <noreply@github.com> on 5/15/2026, 8:25:58 PM