From Natural Language to Silicon: The Representation Bottleneck in LLM Hardware Design
Weimin Fu, Zeng Wang, Minghao Shao, Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri, Muhammad Shafique, Xiaolong Guo
Abstract
Edge applications increasingly demand custom hardware, yet Field-Programmable Gate Array (FPGA) design requires expertise that domain engineers lack. Large Language Models (LLMs) promise to bridge this gap through zero-knowledge hardware programming, in which users describe circuits in natural language and an LLM compiles them to a hardware intermediate representation (IR) targeting silicon. Modeling this flow as a cascade of binary filters, this work demonstrates that IR choice, not model choice, is the dominant factor governing end-to-end success, a phenomenon termed the representation bottleneck. Three frontier LLMs are evaluated across six IRs, Verilog, VHDL, Chisel, Bluespec, PyMTL3, and HLS C, on 202 tasks, each passing through a pipeline of compilation, simulation, FPGA synthesis on a Lattice iCE40UP5K, and LLM-based repair. Simulation pass rates range from 3% to 88% across IRs but typically vary by less than 1.25x across models within any single IR. On the resource-constrained iCE40, LLM designs achieve a higher conditional FPGA pass rate than reference solutions, 86.5% vs. 68.7%, not because they are better but because a simplicity bias makes them small enough to fit. The analysis reveals an accessibility-competence paradox: the most user-friendly IRs yield the worst LLM performance, suggesting that optimal IR selection will evolve as LLM capabilities grow.