[Infer] Serving example with Ray Serve (multi-GPU case) (#4841)

* fix imports

* add Ray Serve example with Colossal-Infer tensor parallelism (TP)

* trivial: add script for sending requests

* add README

* fix worker port

* fix README

* use app builder and autoscaling

* trivial: input args

* clean up code; revise README

* test CI (skip example test)

* use auto model/tokenizer classes

* revert imports fix (fixed in other PRs)

Yuanheng Zhao committed

573f270537ac1ec1de4aef80a8f74b42af89db35

Parent: 3a74eb4

Committed by GitHub <noreply@github.com> on 10/2/2023, 9:48:38 AM