The race for AI inference hardware is shifting into a new gear. AI chip startup Groq is significantly scaling up semiconductor manufacturing as demand for efficient AI processing surges. According to industry sources, Groq has asked Samsung Foundry to raise production of its 4-nanometer AI chips, pushing orders from roughly 9,000 to approximately 15,000 wafers, an increase of about 67%.
The timing is no coincidence. Nvidia has been deepening its strategic involvement with Groq, entering a technology licensing agreement and recruiting several of the company's senior engineers and executives as part of a broader partnership. Groq continues to operate independently, but the collaboration signals Nvidia's push to extend its footprint from GPU-based model training into the fast-growing inference segment.
Why Groq's SRAM Architecture Changes the Energy Equation
What sets Groq apart is its chip architecture, engineered from the ground up for inference.
Energy efficiency has become a major challenge for large-scale AI deployments: running trained models continuously requires enormous computational resources.
Rather than leaning on off-chip high-bandwidth memory (HBM), Groq's design places static random-access memory (SRAM) directly next to its compute cores.
This cuts data-travel distance dramatically, delivering ultra-low latency while slashing power draw - two properties that matter enormously when running AI models at scale. Real-world AI deployments are already generating significant revenue, which makes inference efficiency a direct business priority, not just an engineering goal.
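For readers who want the intuition in numbers, the short Python sketch below is a back-of-envelope model, not Groq's design data: the per-byte energy figures and the bytes-moved-per-token value are all illustrative assumptions, chosen only to show why keeping model weights in on-chip SRAM changes the energy math.

    # Back-of-envelope energy model: on-chip SRAM vs. off-chip DRAM/HBM.
    # Every figure below is an illustrative assumption, not a Groq spec.

    SRAM_PJ_PER_BYTE = 0.1    # assumed on-chip SRAM access energy (pJ/byte)
    DRAM_PJ_PER_BYTE = 10.0   # assumed off-chip DRAM/HBM access energy (pJ/byte)
    BYTES_PER_TOKEN  = 2e9    # assumed weight bytes streamed per generated token

    def joules_per_token(pj_per_byte: float, bytes_moved: float) -> float:
        """Convert per-byte access energy into joules of data movement per token."""
        return pj_per_byte * 1e-12 * bytes_moved

    sram_j = joules_per_token(SRAM_PJ_PER_BYTE, BYTES_PER_TOKEN)
    dram_j = joules_per_token(DRAM_PJ_PER_BYTE, BYTES_PER_TOKEN)

    print(f"On-chip SRAM path:  {sram_j:.2e} J of data movement per token")
    print(f"Off-chip DRAM path: {dram_j:.2e} J of data movement per token")
    print(f"Off-chip costs {dram_j / sram_j:.0f}x more energy just moving data")

Even with generous error bars on those assumed numbers, the off-chip path loses by one to two orders of magnitude on data movement alone, which is the heart of Groq's efficiency pitch.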
Samsung 4nm Ramp Reflects a Shifting Competitive Landscape
While Nvidia and AMD continue to dominate chips used to train large AI models, the inference segment is carving out its own competitive space. Running those models in production demands processors optimized for throughput and efficiency rather than raw training power. Groq's production ramp at Samsung Foundry is a direct response to that shift - and a signal that purpose-built inference silicon is moving from niche to mainstream.
For data-center operators, the calculus is straightforward: reducing power consumption per inference query means lower operating costs and a smaller carbon footprint. Samsung is also in talks to supply chips for OpenAI's hardware device, underscoring how central the Korean foundry has become to the next phase of AI infrastructure buildout. Meanwhile, Nvidia's $6.9B deal with Groq signals that the inference race is now drawing the biggest players in the industry.
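To put the operators' calculus in concrete terms, here is a minimal Python sketch; every input is a hypothetical assumption, used only to show how a small per-query energy saving compounds at data-center scale.

    # Illustrative cost math: what a per-query energy saving is worth per year.
    # All inputs are hypothetical assumptions chosen for the example.

    QUERIES_PER_SECOND = 10_000    # assumed fleet-wide inference load
    JOULES_SAVED_PER_QUERY = 0.5   # assumed energy saved per query
    USD_PER_KWH = 0.10             # assumed electricity price
    SECONDS_PER_YEAR = 365 * 24 * 3600

    # Joules per second is watts, so the saving is a continuous power reduction.
    watts_saved = QUERIES_PER_SECOND * JOULES_SAVED_PER_QUERY
    kwh_saved_per_year = watts_saved * SECONDS_PER_YEAR / 3.6e6  # 3.6e6 J per kWh
    usd_saved_per_year = kwh_saved_per_year * USD_PER_KWH

    print(f"Continuous power saved: {watts_saved / 1000:.1f} kW")
    print(f"Energy saved per year:  {kwh_saved_per_year:,.0f} kWh")
    print(f"Cost saved per year:    ${usd_saved_per_year:,.0f}")

At these assumed numbers the saving comes to roughly $4,400 a year for a 10,000-query-per-second load; scale the load up by a few orders of magnitude, as hyperscalers do, and the same per-query saving becomes a line item worth redesigning silicon for.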
Victoria Bazir