COMMITS
April 16, 2026
Merge pull request #5 from 0xClandestine/docs/improve
clandestine.eth committed
docs: slim down README to ANE W8A16 work with story arc and citations
clandestine.eth committed
Merge pull request #4 from 0xClandestine/fix/ane-qk-norm
clandestine.eth committed
docs: update README paths python/ → scripts/
clandestine.eth committed
refactor: rename python/ to scripts/
clandestine.eth committed
refactor: move bench/test scripts to python/
clandestine.eth committed
docs: rewrite README with full story and measured ANE results
clandestine.eth committed
fix: switch ANE RoPE to neox (split-half) style, fix large ctx_len compile
clandestine.eth committed
fix: q8 path in profiled benchmark wrapper + README with ANE results
clandestine.eth committed
feat: W8A16 ANE quantization — 1.43x step speedup (67ms→47ms)
clandestine.eth committed
fix: restore run_uncached + _ane_kernels_pending pattern, default block_size=32
clandestine.eth committed
docs: log EXP-005 results (read_output in thread, -2% step, -13% combined)
clandestine.eth committed
perf: move read_output + lm_head-submit into ANE thread (IDEA-14)
clandestine.eth committed
docs: integrate ANE_RULES.md findings into optimization backlog
clandestine.eth committed
docs: update optimization logs with EXP-003 results and new ideas
clandestine.eth committed
perf: defer GPU lm_head onto _draft_stream to overlap with verify
clandestine.eth committed
tooling: add bench_ane_pipeline.py — full draft step pipeline profiler
clandestine.eth committed
tooling: add ANE power meter, timing+power profiler, and optimization logs
clandestine.eth committed
ANE: switch to run_cached for all kernels, add run_cached/run_direct/pre_map to Rust wrapper
clandestine.eth committed
April 15, 2026
feat: three acceptance-rate improvements for ANE||GPU pipeline
clandestine.eth committed
feat: add fused final-norm+lm_head ANE kernel infrastructure
clandestine.eth committed
docs: document prefix-split analysis, disable dead code
clandestine.eth committed
fix: sync ANE draft cache rope_offset from absolute sequence position
clandestine.eth committed
feat: ANE||GPU pipeline with thread-safe run_kernels/read_output split
clandestine.eth committed
perf: release GIL in read_f32 output buffer read
clandestine.eth committed
perf: fp16 weight path + GIL release in ANE kernel dispatch
clandestine.eth committed
docs: add ANE DFlash profile report and benchmark script
clandestine.eth committed
April 14, 2026
Merge pull request #2 from 0xClandestine/feat/benchmark-chart-2pass-sliding-window
clandestine.eth committed
feat: add 3-way benchmark chart, 2pass SDPA kernel, sliding window draft KV cache
clandestine.eth committed
Merge pull request #1 from 0xClandestine/perf/spec-decode-speed
clandestine.eth committed