COMMITS
May 19, 2026
update links to quant.exposed now that it's in llm-almanac (#1567)
Charles Frye committed
May 18, 2026
Rename web_endpoint in repo structure (#1566)
Michael Waskom committed
Rename 'web endpoint' to 'Web Function' (#1565)
Michael Waskom committed
May 12, 2026
extend GitHub Actions timeout to match timeout extension in #1562 (#1563)
Charles Frye committed
extend timeouts (#1562)
Charles Frye committed
fix typo in deepgemm-cache path (#1560)
Charles Frye committed
May 7, 2026
remove all llm-structed examples and ollama example (#1557)
Charles Frye committed
update whisper version 20230314->20250625 (#1556)
Charles Frye committed
May 4, 2026
Add restricted volumes example (#1548)
Lucy Zhang committed
April 29, 2026
update SGL, CUDA, Qwen 3.5->3.6 (#1555)
Charles Frye committed
April 26, 2026
extends timeouts on sglang_low_latency (#1554)
Charles Frye committed
April 24, 2026
add deployment example for deepseek v4 pro (#1552)
Charles Frye committed
April 21, 2026
Fix async usage warning in doc_ocr_webapp (#1550)
Elias Freider committed
April 20, 2026
yolo: add curl to image for ultralytics download retry path (#1547)
Charles Frye committed
April 17, 2026
add daily deployment (#1545)
Charles Frye committed
April 16, 2026
removes endpoint, perf, and full inference examples from getting_started (#1544)
Charles Frye committed
Drop -FP8 from Qwen model name in 01_getting_started examples (#1543)
Charles Frye committed
April 15, 2026
remove GPU packing example (#1542)
Charles Frye committed
April 14, 2026
[feat] adding vector similarity search example (#1539)
Alex Korbonits committed
April 7, 2026
Update generate_music (#1540)
Charles Frye committed
April 6, 2026
fix: add -m flag to modal deploy for module paths (#1538)
Charles Frye committed
Fix 'Goole' typo to 'Google' in vllm_inference.py (#1537)
Charles Frye committed
April 4, 2026
Revert "try out 31B NVFP4"
Charles Frye committed
try out 31B NVFP4
Charles Frye committed
move to Qwen 3.5 MoE, update SGLang (#1536)
Charles Frye committed
April 3, 2026
update vllm, pin speculator revision (#1535)
Charles Frye committed
switch from removed experimental API to stable API (#1534)
Charles Frye committed
March 24, 2026
remove old file, remove outdated line in internal readme (#1529)
Charles Frye committed
March 12, 2026
update nemotron config (#1526)
Charles Frye committed
March 11, 2026
fix typo (#1525)
Charles Frye committed