fix(mm): support diffusers FLUX LoRAs on NF4/8-bit quantized base models (#9118)
CustomInvokeLinearNF4 and CustomInvokeLinear8bitLt were missing the _cast_weight_bias_for_input / _cast_tensor_for_input methods that the sidecar-patches branch in autocast_linear_forward_sidecar_patches calls. This caused an AttributeError whenever a patch other than LoRALayer/FluxControlLoRALayer (e.g. the MergedLayerPatch produced by the diffusers FLUX LoRA converter for fused Q/K/V/mlp into linear1) was applied to a quantized FLUX module.

The weight is exposed as a meta-device tensor with the correct logical shape. For Params4bit the shape is read from quant_state, since .shape reports the packed-byte layout.

Shape-only patches (LoRA, LoHA, MergedLayerPatch) work; SetParameterLayer / DoRA on quantized modules remain unsupported.

Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
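Why the logical shape must come from quant_state, and why shape-only patches are enough, can be illustrated with a minimal pure-Python sketch. It does not use torch or bitsandbytes: PackedParam, QuantState, pack_nf4, logical_weight_shape and lora_delta_shape are hypothetical stand-ins, and it assumes uint8 storage with two 4-bit values per byte packed into an (n_bytes, 1) column. In the real code the resulting shape would presumably be wrapped in something like torch.empty(shape, device="meta").

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple


@dataclass
class QuantState:
    # Hypothetical: the logical (unpacked) [out_features, in_features] shape.
    shape: Tuple[int, ...]


@dataclass
class PackedParam:
    # Hypothetical stand-in for a 4-bit quantized parameter.
    shape: Tuple[int, ...]  # what .shape reports: the packed-byte layout
    quant_state: Optional[QuantState] = None


def pack_nf4(logical_shape: Tuple[int, ...]) -> PackedParam:
    # Assumption: two 4-bit values per uint8 byte, stored as an (n_bytes, 1) column.
    numel = prod(logical_shape)
    return PackedParam(shape=((numel + 1) // 2, 1),
                       quant_state=QuantState(tuple(logical_shape)))


def logical_weight_shape(param: PackedParam) -> Tuple[int, ...]:
    # Prefer quant_state.shape: .shape describes packed bytes, not [out, in].
    if param.quant_state is not None:
        return tuple(param.quant_state.shape)
    return tuple(param.shape)


def lora_delta_shape(up: Tuple[int, int], down: Tuple[int, int]) -> Tuple[int, int]:
    # A LoRA delta is up @ down: [out, rank] @ [rank, in] -> [out, in].
    if up[1] != down[0]:
        raise ValueError("rank mismatch between up and down matrices")
    return (up[0], down[1])


weight = pack_nf4((3072, 3072))
print(weight.shape)                  # packed layout: (4718592, 1)
print(logical_weight_shape(weight))  # logical layout: (3072, 3072)
```

A shape-only patch only needs the logical [out, in] shape to build its delta, so a placeholder carrying that shape suffices; patches that must read or overwrite the actual weight values (such as SetParameterLayer or DoRA) cannot be served by a shape-only placeholder.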
Alexander Eichhorn committed
71102efab8b0801afb05284d7d885d4282e0bbd9
Parent: 04d476d
Committed by GitHub <noreply@github.com>
on 5/15/2026, 8:25:58 PM