News in English

Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands

For 100 concurrent users, the card delivered 12.88 tokens per second—just slightly faster than average human reading speed

If you want to scale a large language model (LLM) to a few thousand users, you might think a beefy enterprise GPU is a hard requirement. However, at least according to Backprop, all you actually need is a four-year-old graphics card.…

Читайте на 123ru.net