Groq LPU (Inference Engine)

Groq · LPU (Tensor Streaming)

Status: Shipping (limited)
Tags: inference, groq, alternative, innovative, high-throughput
Memory: TBD (SRAM-based)
FP8 TFLOPS: 1500
Power: 300 W
Released: Feb 1, 2024

Technical Specifications

Manufacturer: Groq
Architecture: LPU (Tensor Streaming)
Memory: TBD (SRAM-based)
Memory Bandwidth: 80 TB/s (on-die SRAM)
FP16 TFLOPS: 750
FP8 TFLOPS: 1500
Interconnect: GroqLink (multi-chip)
Power (TDP): 300 W
Released: Feb 1, 2024
Status: Shipping (limited)
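
To compare these figures with other accelerators, it can help to derive a few ratios from them. The sketch below is plain arithmetic on the listed values (FP16/FP8 TFLOPS, memory bandwidth, TDP). It treats them as peak datasheet numbers rather than measured results, and the constant and function names are illustrative, not part of any Groq tooling.

```python
# Figures copied from the specification list above (listed peak values,
# not measurements; treat the derived ratios as rough estimates).
FP16_TFLOPS = 750.0
FP8_TFLOPS = 1500.0
MEM_BANDWIDTH_TBPS = 80.0  # on-die SRAM bandwidth in TB/s
TDP_WATTS = 300.0


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Peak compute efficiency: TFLOPS delivered per watt of TDP."""
    return tflops / watts


def flops_per_byte(tflops: float, bandwidth_tbps: float) -> float:
    """Peak compute-to-bandwidth ratio in FLOP per byte.

    TFLOP/s divided by TB/s; the tera prefixes cancel.
    """
    return tflops / bandwidth_tbps


if __name__ == "__main__":
    for name, tflops in (("FP16", FP16_TFLOPS), ("FP8", FP8_TFLOPS)):
        eff = tflops_per_watt(tflops, TDP_WATTS)
        ratio = flops_per_byte(tflops, MEM_BANDWIDTH_TBPS)
        print(f"{name}: {eff:.2f} TFLOPS/W, {ratio:.3f} FLOP/byte")
```

With the listed values, FP8 works out to 5 TFLOPS/W and 18.75 FLOP per byte of SRAM bandwidth, and FP16 to half of each. A low FLOP-per-byte ratio suggests the design leans on memory bandwidth, which is consistent with the page's emphasis on inference throughput.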

Availability

Limited availability. The LPU targets very high LLM inference throughput (tokens/sec) and is deployed on GroqCloud, where it backs a fast inference API.