From Rubin CPX to Groq 3 LPX: Nvidia’s Inference Stack Shifts Toward SRAM and Split-Phase Serving
Nvidia’s Groq 3 LPX rack does not replace GPUs in the company’s inference stack; it adds a memory-centric decode layer beside them. That matters because agentic AI workloads are increasingly constrained by latency stability and context handling, not just raw compute, and Nvidia is now framing Vera Rubin as a…