Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands

24.08.2024 00:00

The Register

For 100 concurrent users, the card delivered 12.88 tokens per second—just slightly faster than average human reading speed

If you want to scale a large language model (LLM) to a few thousand users, you might think a beefy enterprise GPU is a hard requirement. However, at least according to Backprop, all you actually need is a four-year-old graphics card.…
