From Natural Language to Silicon: The Representation Bottleneck in LLM Hardware Design
Weimin Fu, Zeng Wang, Minghao Shao, Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri, Muhammad Shafique, Xiaolong Guo
Abstract
Edge applications increasingly demand custom hardware, yet Field-Programmable Gate Array (FPGA) design requires expertise that domain engineers lack. Large Language Models (LLMs) promise to bridge this gap through zero-knowledge hardware programming, in which users describe circuits in natural language and an LLM compiles them to a hardware intermediate representation (IR) targeting silicon. Modeling this flow as a cascade of binary filters, this work demonstrates that IR choice, not model choice, is the dominant factor governing end-to-end success, a phenomenon termed the representation bottleneck. Three frontier LLMs are evaluated across six IRs, Verilog, VHDL, Chisel, Bluespec, PyMTL3, and HLS C, on 202 tasks, each passing through a pipeline of compilation, simulation, FPGA synthesis on a Lattice iCE40UP5K, and LLM-based repair. Simulation pass rates range from 3% to 88% across IRs but typically vary by less than 1.25x across models within any single IR. On the resource-constrained iCE40, LLM designs achieve a higher conditional FPGA pass rate than reference solutions, 86.5% vs. 68.7%, not because they are better but because a simplicity bias makes them small enough to fit. The analysis reveals an accessibility-competence paradox: the most user-friendly IRs yield the worst LLM performance, suggesting that optimal IR selection will evolve as LLM capabilities grow.