
fix: correct Qwen3.5 causal masking, add lazy logits, compiled verify, KOD benchmarks

Previous Qwen3.5 benchmarks were INVALID: a bidirectional attention-mask bug
caused artificially high acceptance rates and degenerate outputs. Fixed by
passing cache[fa_idx] (KVCache) to create_attention_mask instead of cache[0]
(ArraysCache).
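A minimal sketch of why the mask has to come from the full-attention cache, in plain Python. It assumes a hybrid layout in which cache[0] belongs to a linear-attention layer that keeps recurrent state but no token offset, as the fix implies. `LinearAttnCache`, `FullAttnCache`, `full_attention_index` and `mask_for_verify` are hypothetical stand-ins, not the mlx_lm classes or API. During a verify pass the target scores several drafted tokens at once. Only the full-attention cache knows how many tokens are already cached, which is what places the causal diagonal. With a bidirectional mask, the query for draft token i can also see draft tokens after it, so the target partly "peeks" at the draft and agrees with it more often than it should.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for a hybrid model's per-layer caches (NOT mlx_lm classes).
@dataclass
class LinearAttnCache:
    """Recurrent state for a linear-attention layer; has no token offset."""
    state: list = field(default_factory=list)


@dataclass
class FullAttnCache:
    """KV cache for a full-attention layer; offset = tokens already cached."""
    offset: int = 0


def causal_mask(n_new, offset):
    """Boolean mask: query i (absolute position offset + i) may attend to
    key j iff j <= offset + i. Keys span all offset + n_new positions."""
    total = offset + n_new
    return [[j <= offset + i for j in range(total)] for i in range(n_new)]


def full_attention_index(cache):
    """Index of the first full-attention cache (plays the role of fa_idx)."""
    for idx, c in enumerate(cache):
        if isinstance(c, FullAttnCache):
            return idx
    raise ValueError("no full-attention layer in cache")


def mask_for_verify(cache, n_new):
    """Build the verify-step mask from the full-attention cache, not cache[0]."""
    c = cache[full_attention_index(cache)]
    return causal_mask(n_new, c.offset)
```

For example, with three linear-attention layers followed by one full-attention layer holding 5 cached tokens, `mask_for_verify(cache, 3)` lets the first draft token see positions 0 to 5 only, while a bidirectional (all-True) mask would also expose positions 6 and 7.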

Also adds:
- Lazy logit computation (no speedup; MLX eval overhead)
- Accept-all-block path (no speedup; MLX lazy eval handles it)
- Compiled full-attention verify (~5% speedup on Qwen3.5)
- New benchmark files spec-2.json and spec-with-kod-2.json with corrected results
- Server --no-think flag to set enable_thinking=False for Qwen3.5
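The acceptance rates these benchmarks report come from the verify step. Below is a minimal sketch assuming greedy (argmax) acceptance, which this commit does not specify; `accept_draft` is a hypothetical helper, not a function from this repo. It also shows why a dedicated accept-all-block path adds little: the general slice already covers the case where every draft token matches, because the target's extra prediction after the block becomes the bonus token. The result is only meaningful if `target_argmax[i]` was computed under a causal mask, i.e. from tokens up to draft position i alone.

```python
def accept_draft(draft_tokens, target_argmax):
    """Greedy speculative-decoding acceptance (sketch).

    draft_tokens: the k tokens proposed by the draft model.
    target_argmax: k + 1 greedy predictions from one causal verify pass;
        target_argmax[i] is the target's choice at draft position i and
        target_argmax[k] is the token that follows the whole block.
    Returns the tokens to append for this decoding step.
    """
    k = len(draft_tokens)
    if len(target_argmax) != k + 1:
        raise ValueError("expected k + 1 target predictions")

    # Length of the longest prefix on which draft and target agree.
    n = 0
    while n < k and draft_tokens[n] == target_argmax[n]:
        n += 1

    # One expression handles both outcomes: on a mismatch, target_argmax[n]
    # is the correction; when n == k (accept-all-block), it is the bonus token.
    return list(draft_tokens[:n]) + [target_argmax[n]]
```

For instance, `accept_draft([1, 2, 3], [1, 9, 3, 4])` keeps the matched token 1 and replaces the draft's 2 with the target's 9, while a full match on `[1, 2, 3]` returns all three draft tokens plus the bonus token 4.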

clandestine.eth committed 7d ago

e220fb3594b408996d43745406c0b555e0cf625a

Parent: 8284f4d