sampling : refactor init to use llama_sampling_params (#3696)

* sampling : refactor init to use llama_sampling_params

* llama : combine repetition, frequency and presence penalties in 1 call

* examples : remove embd-input and gptneox-wip

* sampling : rename penalty params + reduce size of "prev" vector

* sampling : add llama_sampling_print helper

* sampling : hide prev behind API and apply #3661
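
As a rough illustration of the "combine repetition, frequency and presence penalties in 1 call" item above, here is a minimal standalone sketch of applying all three penalties in a single pass over the candidates. It is not the code from this commit: the `Candidate` struct, the `apply_penalties` helper and its parameter names are hypothetical, and the formula (scale the logit by the repetition penalty, then subtract a count-scaled frequency term and a flat presence term) is an assumption based on the commonly used definitions of these penalties.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a candidate token (not the llama.cpp type).
struct Candidate {
    int   id;     // token id
    float logit;  // raw score before sampling
};

// Hypothetical helper: applies repetition, frequency and presence penalties
// in one pass. `prev` holds the most recent tokens to penalize against.
void apply_penalties(std::vector<Candidate> & candidates,
                     const std::vector<int> & prev,
                     float penalty_repeat,
                     float penalty_freq,
                     float penalty_present) {
    if (prev.empty()) {
        return;
    }

    // Count how often each token occurs in the recent history.
    std::unordered_map<int, int> counts;
    for (int tok : prev) {
        counts[tok]++;
    }

    for (auto & c : candidates) {
        const auto it = counts.find(c.id);
        if (it == counts.end()) {
            continue; // token not seen recently: leave its logit untouched
        }
        const int n = it->second;

        // Repetition penalty: push the logit toward "less likely" whatever its sign.
        if (c.logit <= 0.0f) {
            c.logit *= penalty_repeat;
        } else {
            c.logit /= penalty_repeat;
        }

        // Frequency penalty grows with the count; presence penalty is flat.
        c.logit -= static_cast<float>(n) * penalty_freq + penalty_present;
    }
}
```

With `prev = {1, 1, 2}`, `penalty_repeat = 2.0`, `penalty_freq = 0.5` and `penalty_present = 0.25`, a candidate for token 1 with logit 2.0 ends at -0.25, token 2 with logit -1.0 ends at -2.75, and a token absent from `prev` is unchanged. Doing the work in one call means the history is counted once instead of once per penalty type.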

ggml-ci

Georgi Gerganov committed 2y ago

d1031cf49c3b958b915fd558e23453471c29ac33

Parent: 8cf19d6

Committed by GitHub on 10/20/2023, 6:07:23 PM