Making large AI models cheaper, faster and more accessible
[Infer] Serving example w/ ray-serve (multiple GPU case) (#4841)
* fix imports
* add ray-serve with Colossal-Infer tp
* trivial: send requests script
* add README
* fix worker port
* fix readme
* use app builder and autoscaling
* trivial: input args
* clean code; revise readme
* testci (skip example test)
* use auto model/tokenizer
* revert imports fix (fixed in other PRs)
Yuanheng Zhao committed
573f270537ac1ec1de4aef80a8f74b42af89db35
Parent: 3a74eb4
Committed by GitHub <noreply@github.com>
on 10/2/2023, 9:48:38 AM