Commit Graph

  • f1668af932 uv packages must be 7 days old (#994) main Sebastian Raschka 2026-03-31 20:10:15 -04:00
  • d977841fad Swap urllib.request with requests (#993) Sebastian Raschka 2026-03-30 22:03:13 -04:00
  • 6b9502056f fix: pin 1 unpinned action(s) (#987) dagecko 2026-03-26 12:49:44 -04:00
  • 9320a5e252 fix: added KVcache in generate_text_basic_stream (#981) casinca 2026-03-21 01:47:52 +01:00
  • 130cc1f63c harded the link checker rasbt 2026-03-07 17:05:41 -06:00
  • 9ab6e894ac Minor typo fix (#974) Sebastian Raschka 2026-03-07 17:31:40 -05:00
  • 052c2dea4f Bpe whitespace fixes (#975) Sebastian Raschka 2026-03-07 14:56:25 -05:00
  • 3a7b98a36a Add more analysis to qwen3.5 image Sebastian Raschka 2026-03-04 08:47:06 -06:00
  • ae8eebf0d7 Use full HF url Sebastian Raschka 2026-03-03 16:38:05 -06:00
  • 7892ec9435 Qwen3.5 from scratch (#969) Sebastian Raschka 2026-03-03 17:31:16 -05:00
  • 4612d20fa8 User argpars utils to show default args on command line rasbt 2026-03-01 20:15:21 -06:00
  • c079904491 Jupyter scrolling glitch tips (#965) Sebastian Raschka 2026-02-27 18:33:39 -05:00
  • ec78de32dc image size rasbt 2026-02-19 16:42:19 -06:00
  • 10bffd62b7 image size rasbt 2026-02-19 16:41:43 -06:00
  • c745ded43d formatting fix rasbt 2026-02-19 16:40:28 -06:00
  • 62f0356e0d Add Tiny Aya from scratch (#962) Sebastian Raschka 2026-02-19 17:33:22 -05:00
  • 1ed48c2450 remove redundant assignment (#961) Sebastian Raschka 2026-02-18 23:03:49 -05:00
  • 2d600ccb5b Use correct input in layernorm example (#960) Sebastian Raschka 2026-02-18 22:35:57 -05:00
  • be5e2a3331 Readability and code quality improvements (#959) Sebastian Raschka 2026-02-17 19:44:56 -05:00
  • 7b1f740f74 Fix flex attention in PyTorch 2.10 (#957) Sebastian Raschka 2026-02-09 15:12:40 -05:00
  • 82010e2c77 Fix docstring parameter names in compute_dpo_loss function (#953) Dawid Woźniak 2026-01-29 23:51:17 +01:00
  • e155d1b02c Update unit tests for CI (#952) Sebastian Raschka 2026-01-27 17:44:55 -06:00
  • 59d9262047 chore: Update outdated GitHub Actions versions (#951) Pádraic Slattery 2026-01-19 19:22:29 +01:00
  • 47cfc61800 link GRPO notebook (#950) Sebastian Raschka 2026-01-18 11:42:03 -06:00
  • 9c4be478f8 Optional weight tying for Qwen3 and Llama3.2 pretraining (#949) casinca 2026-01-14 16:07:04 +01:00
  • e0dbec3331 Fix encoding of multiple preceding spaces in BPE tokenizer. (#945) Maxwell De Jong 2026-01-10 11:27:23 -05:00
  • 90e0f3cc15 Chapter 5 with alternative LLMs (Qwen3, Llama 3) (#943) Sebastian Raschka 2026-01-09 14:58:20 -06:00
  • 9df9e69cd2 Correct batch_idx in appendix A logging (#942) Henry 2026-01-08 09:14:17 +08:00
  • 491fd58463 Fix Olmo3 YaRN RoPE implementation bug (#940) Gerardo Moreno 2026-01-03 16:59:57 -08:00
  • b26fa01381 Correct 'pix' to 'pixi' in README.md (#935) Maheshkumar P 2026-01-03 02:47:56 +05:30
  • e10af0a1b9 Clean up native-uv.md documentation (#938) Sebastian Raschka 2026-01-02 15:17:43 -06:00
  • 14c7afaa58 Fix GitHub CI timeout issue for link checker (#937) Sebastian Raschka 2026-01-02 14:34:31 -06:00
  • 5f3268c2a6 yearly update rasbt 2026-01-01 18:19:44 -06:00
  • d85ba93799 Cover Python 3.12 (#933) Sebastian Raschka 2025-12-27 16:30:13 -06:00
  • 1cea30f9b1 upload saved nb rasbt 2025-12-20 18:41:34 -06:00
  • 2b9a67c00d Update memory efficient loading nb rasbt 2025-12-20 18:35:13 -06:00
  • 695ecb61ce update submodule rasbt 2025-12-20 11:38:06 -06:00
  • 1c9f49c812 Add some appendix E runtimes (#927) Sebastian Raschka 2025-12-19 14:13:43 -06:00
  • 57430d2a13 Gated DeltaNet updates (#926) Sebastian Raschka 2025-12-18 20:28:53 -06:00
  • d7f178d28b Sliding window KV Cache bug fix (#925) talentJay-ux 2025-12-15 16:47:01 -08:00
  • a11965fbd9 Remove persistent flag from cache buffers (#916) Sebastian Raschka 2025-11-24 20:10:02 -06:00
  • c19533851f Add Olmo 3 README (#915) Sebastian Raschka 2025-11-23 10:53:48 -06:00
  • bc6f335526 Olmo 3 from scratch (#914) Sebastian Raschka 2025-11-22 22:42:18 -06:00
  • 398b079efa RoPE decay plot (#910) Sebastian Raschka 2025-11-17 17:29:49 -06:00
  • 28a8408d4d Update README wrt multi-query attention Sebastian Raschka 2025-11-17 16:39:32 -06:00
  • a4094470c7 Write-up on how to get the most out of this book (#909) Sebastian Raschka 2025-11-12 20:20:48 -06:00
  • 7d92267170 fix(GatedDeltaNet): Init param A from log of a uniform distrib (#906) casinca 2025-11-09 21:22:52 +01:00
  • 35354fac80 Use consistent title case rasbt 2025-11-06 15:22:24 -06:00
  • 58f45ae5a7 Fix empty device issue (#904) Sebastian Raschka 2025-11-05 20:04:44 -06:00
  • bcc73f731d n_heads × d_head -> d_head × d_head in DeltaNet (#903) Sebastian Raschka 2025-11-05 18:28:37 -06:00
  • 488bef7e3f Image resizing Sebastian Raschka 2025-11-02 21:05:38 -06:00
  • c6b8332a59 Gated DeltaNet write-up (#901) Sebastian Raschka 2025-11-02 21:03:42 -06:00
  • d6c3990c57 Training on MPS in PyTorch 2.9 (#900) Sebastian Raschka 2025-11-01 16:55:09 -05:00
  • 27d52d6378 Fix MHAEinsum weight dimension bug when d_in != d_out (#857) (#893) Aviral Garg 2025-11-01 08:15:31 +05:30
  • b1db33b384 simplify uv command (#898) Sebastian Raschka 2025-10-31 19:44:57 -05:00
  • 760f4c9ecc Add bonus dependencies to pyproject (#897) Sebastian Raschka 2025-10-28 20:36:21 -05:00
  • 0adb5b8c65 Fix ffn link (#892) Sebastian Raschka 2025-10-21 21:19:44 -05:00
  • 7ca7c47e4a Make quote style consistent (#891) Sebastian Raschka 2025-10-21 19:42:33 -05:00
  • 9276edbc37 - docs(moe): correct arXiv link for DeepSeekMoE (#890) casinca 2025-10-21 02:29:06 +02:00
  • 218221ab62 Mixture-of-Experts intro (#888) Sebastian Raschka 2025-10-19 22:17:59 -05:00
  • 27b6dfab9e Make it easier to toggle between thinking and instruct variants (#887) Sebastian Raschka 2025-10-16 20:37:31 -05:00
  • 7fe4874dda Update the compression rate comment in MLA (#883) Sebastian Raschka 2025-10-14 11:10:06 -05:00
  • b969b3ef7a Use figure numbers in ch05-7 (#881) Sebastian Raschka 2025-10-13 16:26:35 -05:00
  • bf039ff3dc Add alternative attention structure (#880) Sebastian Raschka 2025-10-13 14:31:13 -05:00
  • 6eb6adfa33 sliding window attention (#879) Sebastian Raschka 2025-10-12 22:13:20 -05:00
  • 21f0617ea3 Add other appendices for completeness (#878) Sebastian Raschka 2025-10-12 19:04:53 -05:00
  • 44eda5340a rm plot rasbt 2025-10-12 08:55:03 -05:00
  • 9b9586688d Multi-Head Latent Attention (#876) Sebastian Raschka 2025-10-11 20:08:30 -05:00
  • bf27ad1485 Use GB instead of GiB consistently (#875) Sebastian Raschka 2025-10-11 09:11:33 -05:00
  • c814814d72 Grouped-Query Attention memory (#874) Sebastian Raschka 2025-10-11 08:44:19 -05:00
  • b8e12e1dd1 Use inference_device rasbt 2025-10-09 10:59:17 -05:00
  • fecfdd16ff Add simpler BPE, and make previous BPE better (#870) Sebastian Raschka 2025-10-08 22:22:34 -05:00
  • 1164cb3e8f Qwen3 and evaluation bonus materials (#869) Sebastian Raschka 2025-10-08 18:22:19 -05:00
  • 7bd263144e Switch from urllib to requests to improve reliability (#867) Sebastian Raschka 2025-10-07 15:22:59 -05:00
  • 9f7dbb2493 Update docker file dockerfile rasbt 2025-10-06 18:31:59 -05:00
  • 8552565bda Add missing comma in imports in README (#865) Sebastian Raschka 2025-10-06 16:03:04 -05:00
  • 7084123d10 Note about output dimensions (#862) Sebastian Raschka 2025-10-01 10:47:04 -05:00
  • 4d9f9dcb6c Update ollama address (#861) Sebastian Raschka 2025-09-30 21:05:53 -05:00
  • 00c240ff87 some typo fixes (#858) casinca 2025-09-30 18:18:02 +02:00
  • 458f2d9b67 Test dependencies with Python 3.13 (#843) Sebastian Raschka 2025-09-27 08:38:07 -05:00
  • 47867bc1cb Update generate script (#847) Sebastian Raschka 2025-09-27 08:03:54 -05:00
  • 9bc827ea7e Numerically stable generate on mps (#849) Sebastian Raschka 2025-09-26 22:42:44 -05:00
  • f492c949d3 Requirements update (#851) Sebastian Raschka 2025-09-26 22:19:57 -05:00
  • b1f852c1ba Update requirements.txt requirements-update rasbt 2025-09-26 21:57:22 -05:00
  • 3c10919c32 Numerically stable generate on mps rasbt 2025-09-26 21:37:25 -05:00
  • 322000d833 Windows compile (#845) Sebastian Raschka 2025-09-26 12:01:19 -05:00
  • 3b83705988 Update package dependencies (#842) Sebastian Raschka 2025-09-22 18:32:39 -05:00
  • e742d8af2c Improve MoE implementation (#841) Sebastian Raschka 2025-09-22 15:21:06 -05:00
  • 20041fb94b Note about devcontainer root usage (#833) Sebastian Raschka 2025-09-21 11:12:44 -05:00
  • 2aa8e8130d Note about RoPE usage (#839) Sebastian Raschka 2025-09-20 11:25:58 -05:00
  • 42c130623b Qwen3Tokenizer fix for Qwen3 Base models and generation mismatch with HF (#828) casinca 2025-09-17 15:14:11 +02:00
  • bfc6389fab fix code comment (#834) Synix 2025-09-17 09:36:02 +08:00
  • 862df48e38 use apply_chat_template qwen-tokenizer-fix rasbt 2025-09-16 08:12:01 -05:00
  • 8237b3fda0 removed duplicate code fragment in test_chat_wrap_and_equivalence casinca 2025-09-16 11:32:05 +02:00
  • 16f30a0395 added copy of test def test_tokenizer_equivalence() from reasoning-from-scratch in test_qwen3.py casinca 2025-09-16 11:12:29 +02:00
  • 4ea2fb4a76 copied download_file in utils from https://github.com/rasbt/reasoning-from-scratch/blob/main/reasoning_from_scratch/utils.py casinca 2025-09-16 11:10:01 +02:00
  • 186e83c579 Revert "prevent self.apply_chat_template being applied for base Qwen models" casinca 2025-09-16 09:43:01 +02:00
  • 02a1cb1159 Revert "- added no chat template comparison in test_chat_wrap_and_equivalence" casinca 2025-09-16 09:42:47 +02:00
  • 701b5ad54d Merge branch 'main' into qwen-tokenizer-fix casinca 2025-09-16 09:38:45 +02:00
  • b6cd0a312f More efficient angles computation in RoPE (#830) Sebastian Raschka 2025-09-15 22:23:33 -05:00