LIBRISTO

AI Inference Optimization Engineering

Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

Language: English
Format: Paperback
Author: ChatVariety Team
Libristo code: 52770465
Publisher: Independently published, June 2026
Price: €10.80
Expected restock: 7 June 2026


Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:
  • Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
  • State-of-the-Art Quantization: Apply GPTQ and AWQ quantization and the GGUF model format to shrink massive neural networks with minimal loss of model accuracy.
  • Advanced Acceleration Methods: Implement speculative decoding (using separate draft models or draft heads such as Medusa and EAGLE), PagedAttention, and FlashAttention to boost throughput by up to 2-3x.
  • Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
  • Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.


Book information

Full title: AI Inference Optimization Engineering
Language: English
Binding: Paperback
Publication date: 2026
Number of pages: 96
EAN: 9798199720021
Libristo code: 52770465
Weight: 142 g
Dimensions: 152 × 229 × 5 mm
