License: CC BY 4.0
arXiv:2406.06549v1 [cs.AR] 24 May 2024

Large Language Model (LLM) for Standard Cell Layout Design Optimization

Chia-Tung Ho
NVIDIA Research
Santa Clara, CA, USA

Haoxing Ren
NVIDIA Research
Austin, TX, USA

Abstract

Standard cells are essential components of modern digital circuit designs. As process technologies advance toward 2nm, routability issues have become more frequent because of the decreasing number of routing tracks, the growing number and complexity of design rules, and strict patterning rules. The state-of-the-art standard cell design automation framework can automatically design standard cell layouts in advanced nodes, but it still struggles to generate routable cell layouts with highly competitive Performance-Power-Area (PPA) for complex sequential cells. Consequently, a novel and efficient methodology that incorporates the expertise of experienced human designers to incrementally optimize the PPA of cell layouts is essential.

High-quality device clustering that considers netlist topology, diffusion sharing/break, and routability in the layouts can reduce complexity and help find routable layouts with highly competitive PPA faster. In this paper, we leverage the natural language and reasoning abilities of Large Language Models (LLMs) to incrementally generate high-quality cluster constraints that optimize cell layout PPA, and to debug routability with ReAct prompting. On a benchmark of sequential standard cells in 2nm, the proposed method achieves up to 19.4% smaller cell area and produces 23.5 percentage points more LVS/DRC-clean cell layouts than previous work. Overall, it reduces cell area by 4.65% on average and is able to fix routability issues in cell layout designs.

I Introduction

Standard cells are essential components of modern digital circuit designs. As process technologies advance toward 2nm, designing a cell with competitive Performance-Power-Area (PPA) while maintaining routability becomes increasingly challenging because of the decreasing number of routing tracks, the increasing complexity of design rules, and strict patterning rules. The state-of-the-art standard cell design automation framework can automatically design standard cell layouts in advanced nodes, but it still struggles to generate routable cell layouts with highly competitive PPA for complex sequential cells. As a result, there is a critical need for a novel and efficient methodology that leverages the expertise of experienced human designers to incrementally optimize the PPA and routability of cell layouts.

Recently, automated standard cell synthesis tools such as NVCell [1] and BonnCell [2] have been shown to generate high-quality cell layouts in advanced technology nodes. A key challenge is routability: the placement generated for a given cell may be unroutable, or routable only with DRC errors. NVCell2 [3] introduces a lattice graph routability model and successfully improves routability in advanced technology nodes. However, it does not scale to cells with hundreds of transistors, because model inference must be performed for every action in the simulated annealing-based placement algorithm [1], and cell-level metrics (i.e., cell width (CW) and total wirelength (TWL)) are compromised for routability. Ho et al. [4] proposed a transformer model-based clustering approach that generates high-quality device cluster constraints considering diffusion sharing/break, routability, and the DRCs of routing metals in layouts of different technology nodes, and it achieved better PPA, routability, and performance than [3] in advanced nodes. However, selecting a good set of LVS/DRC-clean layouts to train the transformer cluster model to jointly optimize the PPA and routability of complex sequential cells is quite challenging. Cells with routability issues typically use a larger cell width to reduce transistor pin density, whereas cells with more compact layouts can exacerbate routability issues. Moreover, only a limited number of LVS/DRC-clean layouts are available for training the transformer cluster model in the early development stage of a cell library in a new technology node.

Figure 1: The PPA and routability optimization loop, which includes an agent with design expertise, adjusted cluster constraints, and a standard cell layout automation framework (i.e., NVCell).

An agent with designers’ expertise can adjust and fine-tune device clustering constraints incrementally based on the netlist and the layout of the previous iteration to efficiently optimize PPA and routability together as shown in Figure 1. Lately, Large Language Models (LLMs) have shown great promise across various tasks in language understanding and interactive decision-making, incorporating reasoning and actions. In this paper, we leverage the natural language and reasoning ability of LLMs to adjust the device clustering constraints incrementally, optimizing cell layout PPA and routability with guidance from designers’ expertise and ReAct [5] prompting techniques. Our main contributions are as follows.

  • We are the first to explore LLMs for optimization in Electronic Design Automation (EDA) on an industrial-level benchmark. We propose a novel and efficient LLM-based methodology for standard cell layout design optimization that generates high-quality cluster constraints to optimize cell layout PPA and debug routability on an industrial technology node, guided by designers' expertise and ReAct [5] prompting techniques. The proposed methodology improves cell-level PPA by generating device clusters incrementally while simultaneously considering the netlist, previous cluster constraints, routability, and the physical layout.

  • We conduct holistic assessments of the capabilities and domain knowledge of existing LLMs on the SPICE netlist language, the cluster design constraint format, and physical layout descriptions. We then automate domain knowledge extraction, following guidelines from designers' expertise, to optimize PPA and routability together.

  • The proposed LLM-based methodology for standard cell layout design optimization achieves up to 19.4% smaller cell area and generates 23.5 percentage points more LVS/DRC-clean cell layouts than a state-of-the-art baseline on a benchmark of sequential standard cells in an industrial 2nm technology node.


The remaining sections are organized as follows: Section II assesses how well existing LLMs understand netlist and standard cell layout domain knowledge. Section III describes the proposed LLM-based methodology for standard cell layout design optimization. Section IV presents our main experiments. Section V concludes the paper.

II Standard Cell Layout Design Domain Knowledge

We assess the capabilities and domain knowledge of existing LLMs regarding the SPICE netlist format, device cluster constraints, and the physical layout of standard cells.

Figure 2: Assessment of the standard cell layout design domain knowledge of an existing LLM on: (a) SPICE language, (b) cluster constraints, and (c) physical layout.

II-A SPICE Netlist Language

To assess the LLM's domain knowledge of the SPICE netlist language, we provide technology-independent device (i.e., MOSFET) descriptions in SPICE format and ask the LLM to explain the MOSFET information. A technology-independent MOSFET description includes the device name, its terminal connections, and the MOSFET type. Figure 2(a) shows that the LLM understands technology-independent MOSFET descriptions in SPICE format and identifies the nets connected to the drain, gate, and source terminals.¹

¹ The [MOSFET format] is: "A MOSFET can be described in SPICE format as MOSFET_NAME d:DRAIN g:GATE s:SOURCE MOSFET_TYPE".
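The footnoted format is regular enough to parse mechanically, which shows what information the LLM has to recover from each line. The following is a minimal sketch. It assumes whitespace-separated fields and the `d:`/`g:`/`s:` prefixes exactly as given in the footnote. The helper names and the example devices (`MP1`, `MN2`, `NET027`, lowercase `pmos`/`nmos` type labels) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mosfet:
    name: str
    drain: str
    gate: str
    source: str
    mos_type: str  # assumed labels such as "pmos" / "nmos"


def parse_mosfet(line: str) -> Mosfet:
    """Parse 'MOSFET_NAME d:DRAIN g:GATE s:SOURCE MOSFET_TYPE' (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    name, d, g, s, mos_type = fields
    terminals = {}
    for field, prefix in ((d, "d:"), (g, "g:"), (s, "s:")):
        if not field.startswith(prefix):
            raise ValueError(f"expected prefix {prefix!r} in {field!r}")
        terminals[prefix[0]] = field[len(prefix):]
    return Mosfet(name, terminals["d"], terminals["g"], terminals["s"], mos_type)


def describe(m: Mosfet) -> str:
    """The kind of explanation the LLM is asked to produce for one device."""
    return (f"{m.name} is a {m.mos_type} with drain on {m.drain}, "
            f"gate on {m.gate}, and source on {m.source}")


m = parse_mosfet("MP1 d:NET027 g:A1 s:VDD pmos")
print(describe(m))  # MP1 is a pmos with drain on NET027, gate on A1, and source on VDD
```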

II-B Cluster Constraint

We study how well existing LLMs understand the cluster design constraint format, which covers multiple devices and clusters. Figure 2(b) shows one example. We ask the LLM to summarize the provided clusters and their associated devices for use in subsequent reasoning tasks. The results indicate that the existing LLM can accurately identify the total number of clusters and the devices within each cluster.
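For reference, the summary the LLM is asked to produce can be computed directly from the constraints. The sketch below makes two assumptions. First, it takes the cluster constraints to be a JSON object that maps a cluster name to its device list. The actual format appears only in Figure 2(b). Second, all cluster and device names are hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical cluster-constraint payload; the real format is shown only in Figure 2(b).
constraints_json = """
{"cluster_0": ["MP1", "MP2", "MN1", "MN2"],
 "cluster_1": ["MP3", "MN3"]}
"""


def summarize_clusters(clusters: Dict[str, List[str]]) -> str:
    """Count the clusters and list the devices in each one."""
    lines = [f"There are {len(clusters)} clusters."]
    for name, devices in clusters.items():
        lines.append(f"{name}: {len(devices)} devices ({', '.join(devices)})")
    return "\n".join(lines)


clusters = json.loads(constraints_json)
print(summarize_clusters(clusters))
```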

II-C Standard Cell Layout

The standard cell layout consists of the placed device locations and the net connections of device terminals. We represent a standard cell layout by coordinates together with the device and terminal at each coordinate, as shown in Figure 2(c). We study whether the LLM can understand the locations of placed devices and the transistor terminal connections for the standard cell layout design optimization task. In the example in Figure 2(c), the LLM correctly explains the placed MOSFETs and their terminals at each coordinate.

In summary, an existing LLM (i.e., GPT-3.5) can understand netlist topology expressed in SPICE, device cluster constraints, and the device placement and terminal connections in a standard cell layout description. These capabilities allow it to generate better device cluster constraints for PPA optimization and routability debugging.

Figure 3: Overview of the LLM-based standard cell layout design optimization flow.

III LLM for Standard Cell Layout Design Optimization

This section details the LLM-based standard cell layout design optimization methodology. It leverages the natural language and reasoning abilities of LLMs to generate high-quality cluster constraints that optimize cell-level PPA and debug routability issues. Section III-A and Figure 3 outline the overall flow. Section III-B introduces the netlist tools we developed. These tools provide accurate sub-circuit retrieval from the netlist and cluster evaluation to assist the LLM in the layout design optimization task. Finally, Section III-C discusses the application of ReAct [5].

III-A Flow Overview

The proposed LLM-based methodology for standard cell layout design optimization comprises three components:
  • knowledge extraction, which initiates queries and provides domain knowledge prompts;
  • netlist tools, which assist the LLM in generating valid cluster constraints;
  • the reasoning-and-acting loop of ReAct [5], which explores high-quality cluster candidates for layout PPA and routability.

First, designers provide the initial layout and its corresponding cluster constraints. Knowledge extraction then builds the domain knowledge prompt, as described in Section II, from four inputs: the netlist topology (as technology-independent MOSFET descriptions), the initial cluster constraints, the standard cell layout, and guidance from designers' expertise. Next, ReAct prompting lets the LLM reason dynamically to create and adjust plans for acting (reason to act). At the same time, the LLM interacts with the netlist tools (e.g., grouping MOSFETs and evaluating clusters) to incorporate additional information into its reasoning (act to reason). The resulting cluster constraints are fed into NVCell [3] to generate layouts. Designers can repeat this process until the PPA meets the requirements and no routability issues remain.
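The outer iteration can be sketched as a loop that stops when the layout is routable and meets a cell-width target. This is a minimal sketch, not the authors' implementation. Three points are assumptions:

  • The callables `build_prompt`, `react_agent`, and `generate_layout` are hypothetical stand-ins for knowledge extraction, the ReAct loop, and an NVCell run.
  • The PPA requirement is reduced to a single CW target.
  • The loop is automated here, whereas the paper describes designers repeating the process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Layout:
    cell_width: int            # CW in CPPs
    unrouted_nets: List[str]   # empty when the layout is routable


def rank(layout: Layout) -> Tuple[bool, int]:
    """Lower is better: routable layouts first, then smaller cell width."""
    return (bool(layout.unrouted_nets), layout.cell_width)


def optimize_cell(
    layout: Layout,
    clusters: Dict,
    build_prompt: Callable[[Layout, Dict], str],  # knowledge extraction (hypothetical)
    react_agent: Callable[[str], Dict],           # ReAct loop -> new cluster constraints
    generate_layout: Callable[[Dict], Layout],    # stand-in for an NVCell run
    target_cw: int,
    max_rounds: int = 5,
) -> Tuple[Layout, Dict]:
    best_layout, best_clusters = layout, clusters
    for _ in range(max_rounds):
        if best_layout.cell_width <= target_cw and not best_layout.unrouted_nets:
            break  # PPA target met and no routability issues remain
        prompt = build_prompt(best_layout, best_clusters)
        new_clusters = react_agent(prompt)
        new_layout = generate_layout(new_clusters)
        if rank(new_layout) < rank(best_layout):
            best_layout, best_clusters = new_layout, new_clusters
    return best_layout, best_clusters


# Toy stubs: successive rounds produce layouts of 27, 26, and 25 CPPs.
widths = iter([27, 26, 25])
result, _ = optimize_cell(
    Layout(33, []), {},
    build_prompt=lambda lay, cl: f"CW={lay.cell_width}",
    react_agent=lambda prompt: {"from": prompt},
    generate_layout=lambda cl: Layout(next(widths), []),
    target_cw=26,
)
print(result.cell_width)  # 26
```

The loop keeps the best layout seen so far and ranks routability ahead of width. A narrower but unroutable layout therefore never replaces a routable one.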

III-B Netlist Tools

We developed a set of netlist tools to help the LLM generate valid cluster constraints and correctly retrieve sub-circuits in the ReAct reasoning-and-acting loop. The tools include a cluster evaluator, a function that retrieves the group of devices connected to given nets, a mechanism to save potential clusters, and a function that returns the best cluster result.

Cluster evaluator: This tool evaluates the quality of a generated cluster result with a simple cluster score that captures potential diffusion sharing and common gates in the layout. We use this simple score in ReAct because launching layout generation to collect accurate cell layout metrics (e.g., CW and TWL) has too long a turnaround time. Equation (1) defines the simple cluster score. A larger score means that the devices in each cluster can potentially be placed with more diffusion sharing and more common gates.

$$\mathit{cluster\_score}=\sum_{c\in\mathbf{C}}\left(\frac{\sum_{n\in\mathbf{N}^{d}_{c}}\left(\left\lfloor\frac{P_{n}}{2}\right\rfloor+\left\lfloor\frac{N_{n}}{2}\right\rfloor\right)}{T_{c}}+\frac{\sum_{n\in\mathbf{N}^{g}_{c}}\min(P_{n},N_{n})}{T_{c}}\right)\tag{1}$$

The symbols in Equation (1) are:
  • $c$ is a cluster in the set of clusters $\mathbf{C}$.
  • $\mathbf{N}^{d}_{c}$ and $\mathbf{N}^{g}_{c}$ are the sets of nets at the diffusion (i.e., source or drain) terminals and at the gate terminals in cluster $c$.
  • $P_n$ and $N_n$ are the numbers of PMOS and NMOS terminals, respectively, connected to net $n$. These are diffusion terminals for $n\in\mathbf{N}^{d}_{c}$ and gate terminals for $n\in\mathbf{N}^{g}_{c}$.
  • $T_c$ is the number of transistors in cluster $c$.

The first term measures potential diffusion sharing. For PMOS and NMOS separately, it counts the pairs of diffusion terminals on the same net. The second term measures potential common gates. For each gate net, the potential number of common gates is the minimum of the PMOS and NMOS gate counts on that net.
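Equation (1) translates almost term by term into code. The sketch below makes three assumptions. First, each device is a `(name, kind, drain, gate, source)` tuple with `kind` set to `"p"` or `"n"`. Second, all nets, including power nets, are counted. Third, the 2-input NAND example is hypothetical. It is not the authors' evaluator.

```python
from collections import Counter
from typing import List, NamedTuple


class Device(NamedTuple):
    name: str
    kind: str    # "p" for PMOS, "n" for NMOS
    drain: str
    gate: str
    source: str


def cluster_score(clusters: List[List[Device]]) -> float:
    """Simple cluster score of Equation (1)."""
    total = 0.0
    for cluster in clusters:
        t_c = len(cluster)                        # T_c
        if t_c == 0:
            continue
        diff = {"p": Counter(), "n": Counter()}   # net -> number of diffusion terminals
        gate = {"p": Counter(), "n": Counter()}   # net -> number of gate terminals
        for d in cluster:
            diff[d.kind][d.drain] += 1
            diff[d.kind][d.source] += 1
            gate[d.kind][d.gate] += 1
        diff_nets = diff["p"].keys() | diff["n"].keys()   # N^d_c
        gate_nets = gate["p"].keys() | gate["n"].keys()   # N^g_c
        # Same-net pairs at diffusion terminals, PMOS and NMOS counted separately.
        sharing = sum(diff["p"][n] // 2 + diff["n"][n] // 2 for n in diff_nets)
        # Potential common gates: min(P_n, N_n) for each gate net.
        common_gates = sum(min(gate["p"][n], gate["n"][n]) for n in gate_nets)
        total += sharing / t_c + common_gates / t_c
    return total


nand2 = [
    Device("MP1", "p", "Z", "A", "VDD"),
    Device("MP2", "p", "Z", "B", "VDD"),
    Device("MN1", "n", "Z", "A", "X"),
    Device("MN2", "n", "X", "B", "VSS"),
]
print(cluster_score([nand2]))                                  # 1.25
print(cluster_score([[nand2[0], nand2[2]], [nand2[1], nand2[3]]]))  # 1.0
```

Keeping the four NAND devices in one cluster scores 1.25, while splitting them into two inverter-like pairs scores 1.0. The split loses the shared `Z`, `VDD`, and `X` diffusions.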

Get group devices from nets: This netlist tool returns the group of transistors connected to any number of given nets in the netlist. The LLM can use it to search for and explore potential device clusters.
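A device-retrieval tool of this kind reduces to a set intersection over terminal nets. In the sketch below, both the function name `get_group_devices_from_nets` and the tuple-based device representation are hypothetical. The sketch also assumes that a device belongs to the group when any of its drain, gate, or source terminals lies on a queried net.

```python
from typing import Iterable, List, NamedTuple


class Device(NamedTuple):
    name: str
    kind: str    # "p" (PMOS) or "n" (NMOS)
    drain: str
    gate: str
    source: str


def get_group_devices_from_nets(netlist: List[Device], nets: Iterable[str]) -> List[str]:
    """Names of all transistors with at least one terminal on any of `nets`, in netlist order."""
    wanted = set(nets)
    return [d.name for d in netlist if wanted & {d.drain, d.gate, d.source}]


nand2 = [
    Device("MP1", "p", "Z", "A", "VDD"),
    Device("MP2", "p", "Z", "B", "VDD"),
    Device("MN1", "n", "Z", "A", "X"),
    Device("MN2", "n", "X", "B", "VSS"),
]
print(get_group_devices_from_nets(nand2, ["X"]))       # ['MN1', 'MN2']
print(get_group_devices_from_nets(nand2, ["A", "X"]))  # ['MP1', 'MN1', 'MN2']
```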

Save potential cluster: After the LLM inputs a new potential cluster, this tool returns the current clusters and their cluster score. Devices that appear in more than one cluster are resolved based on the number of nets each duplicated device shares with each cluster.
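The paper does not spell out the exact duplicate-resolution rule. The sketch below assumes one plausible reading: a duplicated device stays in the cluster whose other devices share the most distinct nets with it, ties go to the earliest cluster, and emptied clusters are dropped. The `score_fn` argument would normally implement Equation (1). Here `len` is passed only as a placeholder. The function and device names are hypothetical.

```python
from typing import Callable, List, NamedTuple, Set, Tuple

Clusters = List[List[str]]


class Device(NamedTuple):
    name: str
    kind: str    # "p" (PMOS) or "n" (NMOS)
    drain: str
    gate: str
    source: str


def nets_of(dev: Device) -> Set[str]:
    return {dev.drain, dev.gate, dev.source}


def shared_net_count(dev: Device, cluster: List[str], by_name: dict) -> int:
    """Distinct nets that `dev` shares with the other devices of `cluster`."""
    others: Set[str] = set()
    for name in cluster:
        if name != dev.name:
            others |= nets_of(by_name[name])
    return len(nets_of(dev) & others)


def save_potential_cluster(clusters: Clusters, new_cluster: List[str], netlist: List[Device],
                           score_fn: Callable[[Clusters], float]) -> Tuple[Clusters, float]:
    by_name = {d.name: d for d in netlist}
    result = [list(c) for c in clusters] + [list(new_cluster)]
    for name in new_cluster:
        homes = [i for i, c in enumerate(result) if name in c]
        if len(homes) < 2:
            continue
        dev = by_name[name]
        # Assumption: keep the device where it shares the most nets (ties -> earliest cluster).
        keep = max(homes, key=lambda i: shared_net_count(dev, result[i], by_name))
        for i in homes:
            if i != keep:
                result[i].remove(name)
    result = [c for c in result if c]   # drop clusters emptied by deduplication
    return result, score_fn(result)


nand2 = [
    Device("MP1", "p", "Z", "A", "VDD"),
    Device("MP2", "p", "Z", "B", "VDD"),
    Device("MN1", "n", "Z", "A", "X"),
    Device("MN2", "n", "X", "B", "VSS"),
]
clusters, score = save_potential_cluster([["MP1"], ["MP2"], ["MN1"], ["MN2"]],
                                         ["MN1", "MN2"], nand2, score_fn=len)
print(clusters, score)  # [['MP1'], ['MP2'], ['MN1', 'MN2']] 3
```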

Get best cluster result: This tool returns the cluster result with the best simple cluster score (i.e., Equation (1)). If the LLM gets stuck while searching for potential clusters, this tool lets it revert to the previous best cluster result and restart the search from there.
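The best-result tool needs only to remember the highest-scoring cluster result recorded so far. The minimal sketch below assumes that every saved cluster result is passed to `record` together with its Equation (1) score. The class and method names are hypothetical. Results are deep-copied so that later edits by the agent cannot corrupt the stored best.

```python
import copy
from typing import List, Optional, Tuple

Clusters = List[List[str]]


class BestClusterTracker:
    """Remembers the highest-scoring cluster result seen during one search."""

    def __init__(self) -> None:
        self.best_score = float("-inf")
        self.best_clusters: Optional[Clusters] = None

    def record(self, clusters: Clusters, score: float) -> None:
        if score > self.best_score:
            self.best_score = score
            self.best_clusters = copy.deepcopy(clusters)

    def get_best_cluster_result(self) -> Tuple[Optional[Clusters], float]:
        return copy.deepcopy(self.best_clusters), self.best_score


tracker = BestClusterTracker()
tracker.record([["MP1", "MN1"], ["MP2", "MN2"]], 1.0)
tracker.record([["MP1", "MP2", "MN1", "MN2"]], 1.25)
tracker.record([["MP1"], ["MP2", "MN1", "MN2"]], 0.5)   # worse: ignored
print(tracker.get_best_cluster_result())  # ([['MP1', 'MP2', 'MN1', 'MN2']], 1.25)
```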

Figure 4: An example of ReAct steps of Thought-Action-Observation for standard cell layout design optimization.

III-C ReAct: Reason + Act Loop

We enable LLMs to function as autonomous circuit design agents that reason and act with the netlist tools through the ReAct prompting mechanism [5]. In ReAct, the LLM generates a sequence of steps, each consisting of a Thought, an Action, and an Observation. Each action queries one of the netlist tools described in Section III-B, and the tool's output becomes the observation in the prompt. The agent continues this trace of reasoning and acting until it selects the "Final Answer" action. Figure 4 shows an example of optimizing the area of a standard cell. Here, the agent starts by querying the group of devices connected to NET027. It chooses NET027 because this net is one of the highly connected nets in the netlist topology and abuts the diffusion-break dummy device in the physical layout. From there, the agent incrementally explores good clusters that reduce diffusion breaks, and hence cell area. Finally, the agent generates a high-quality cluster result by interleaving reasoning with netlist tool calls.
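The Thought-Action-Observation cycle can be written as a short driver loop. The sketch below is not the authors' LangChain setup. It assumes that each LLM step ends with `Action:` followed by a JSON blob carrying `action` and `action_input` keys, a LangChain-style convention chosen here for illustration. The default of 15 iterations matches the limit in Section IV. The scripted `llm` and the tool output are hypothetical stand-ins, so the example runs without an API.

```python
import json
from typing import Any, Callable, Dict, Optional


def react_loop(llm: Callable[[str], str], tools: Dict[str, Callable[[Any], Any]],
               task: str, max_iters: int = 15) -> Optional[Any]:
    """Run Thought -> Action -> Observation until 'Final Answer' or the budget runs out."""
    transcript = task
    for _ in range(max_iters):
        step = llm(transcript)                     # "Thought: ...\nAction: {json}"
        transcript += "\n" + step
        action = json.loads(step.split("Action:", 1)[1])
        if action["action"] == "Final Answer":
            return action["action_input"]
        observation = tools[action["action"]](action["action_input"])
        transcript += f"\nObservation: {observation}"   # fed back as context
    return None  # iteration budget exhausted


# Scripted stand-in for the LLM, mirroring the NET027 example in spirit only.
script = iter([
    'Thought: NET027 abuts a diffusion break; inspect its devices.\n'
    'Action: {"action": "get_group_devices_from_nets", "action_input": "NET027"}',
    'Thought: These devices can share diffusion; finish.\n'
    'Action: {"action": "Final Answer", "action_input": "cluster_0: MP1 MN1"}',
])
tools = {"get_group_devices_from_nets": lambda nets: "MP1, MN1"}
answer = react_loop(lambda prompt: next(script), tools, "Task: reduce cell area.")
print(answer)  # cluster_0: MP1 MN1
```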

Figure 5: Cluster constraints and cell layouts of the Seq3 cell from the 2nm weak transformer cluster model [4] and from the proposed method. The proposed method reduces CW and TWL by 18.18% and 16.48%, respectively.
TABLE I: CW (in CPPs) of the 17 complex sequential cells in 2nm for the transformer cluster model [4], simulated annealing (SA), and the proposed method. The SA algorithm is implemented based on [6]. #Devs = number of transistors. Opt. (a)/(b) = optimizing the layouts from (a)/(b). X = unroutable. V = routability fixed. Success (%) = percentage of LVS/DRC-clean cell layouts.
| Cell | #Devs | Transformer Cluster [4] (a) 2nm weak | Transformer Cluster [4] (b) 5nm | SA Opt. (a) | SA Opt. (b) | Proposed Opt. (a) | Proposed Opt. (b) | Impr. (%) over (a) | Impr. (%) over (b) |
|---|---|---|---|---|---|---|---|---|---|
| Seq1 | 40 | 26 | 31 | X | 31 | 25 | 25 | 3.85 | 19.35 |
| Seq2 | 60 | X | 41 | X | X | 39 | 39 | V | 4.88 |
| Seq3 | 40 | 33 | 26 | 27 | 26 | 27 | 26 | 18.18 | 0.00 |
| Seq4 | 38 | 27 | 25 | X | 25 | 25 | 25 | 7.41 | 0.00 |
| Seq5 | 36 | 23 | 22 | 26 | 22 | 22 | 22 | 4.35 | 0.00 |
| Seq6 | 36 | 22 | 25 | 22 | 22 | 22 | 22 | 0.00 | 12.00 |
| Seq7 | 34 | 20 | 20 | 20 | 20 | 20 | 20 | 0.00 | 0.00 |
| Seq8 | 38 | 25 | 25 | 25 | 25 | 24 | 25 | 4.00 | 0.00 |
| Seq9 | 32 | 19 | 19 | 19 | 19 | 19 | 19 | 0.00 | 0.00 |
| Seq10 | 34 | 20 | 21 | 20 | 20 | 20 | 20 | 0.00 | 4.76 |
| Seq11 | 40 | 26 | 26 | 25 | 28 | 25 | 25 | 3.85 | 3.85 |
| Seq12 | 38 | 24 | 27 | 26 | 28 | 24 | 24 | 0.00 | 11.11 |
| Seq13 | 56 | X | X | X | X | 41 | 40 | V | V |
| Seq14 | 44 | X | 36 | X | X | 34 | 34 | V | 5.56 |
| Seq15 | 42 | X | 32 | X | X | 31 | 31 | V | 3.13 |
| Seq16 | 56 | 42 | 40 | X | 40 | 35 | 35 | 16.67 | 12.50 |
| Seq17 | 42 | 25 | 25 | 25 | 25 | 25 | 25 | 0.00 | 0.00 |
| Success (%) | | 76.50 | 94.10 | 58.80 | 76.50 | 100 | 100 | - | - |

IV Experimental Results

Our work is implemented with Python and LangChain [7]. We conduct all experiments with gpt-3.5-turbo-16k-0613 as the LLM through OpenAI APIs [8]. We set the sampling temperature of LLM to 0.1. For ReAct prompting, we restrict the LLM to a maximum of 15 iterations of Thought-Action-Observation.

We conduct extensive studies of cell area and routability using 17 complex sequential standard cells in an industrial 2nm technology node. Little cell layout data is available in the early development stage of a 2nm cell library, so we use two baselines: (a) a 2nm weak transformer cluster model [4] trained on 124 LVS/DRC-clean cell layouts, and (b) a 5nm transformer cluster model [4] trained on 512 LVS/DRC-clean cell layouts. To compare the efficiency of the proposed method, we also implemented a simulated annealing (SA) algorithm based on the modified Lam annealing schedule [6], which requires no hyperparameter tuning.

Each SA action proceeds as follows:
  • We sample between 1 and $k$ nets according to net weights and query the corresponding group of devices with the Get group devices from nets tool. Unrouted nets and nets abutting the diffusion-break dummy device have higher weights.
  • Using the Save potential cluster tool, we accept or reject the queried group of devices based on the temperature and the change in simple cluster score.

Finally, we apply the proposed method and SA to the 17 selected, unseen complex sequential cells, optimizing cell area and routability from the initial cluster constraints and cell layouts of (a) and (b). For the proposed method, we launch 10 optimization runs per cell and use the valid cluster results to generate cell layouts with NVCell [3].
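The SA baseline samples weighted nets, turns them into a candidate cluster, and applies a Metropolis acceptance test on the change in score. The sketch below illustrates that structure under three assumptions:

  • It substitutes plain geometric cooling for the modified Lam schedule of [6], which it does not implement.
  • The net weights are hypothetical.
  • `group_devices` and `save_cluster` are injected stand-ins for the Get group devices from nets and Save potential cluster tools.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Clusters = List[List[str]]


def anneal_clusters(
    clusters: Clusters,
    score: float,
    net_weights: Dict[str, float],
    group_devices: Callable[[List[str]], List[str]],
    save_cluster: Callable[[Clusters, List[str]], Tuple[Clusters, float]],
    k: int = 3,
    steps: int = 200,
    t0: float = 1.0,
    alpha: float = 0.98,
    seed: int = 0,
) -> Tuple[Clusters, float]:
    rng = random.Random(seed)
    nets = list(net_weights)
    weights = [net_weights[n] for n in nets]
    best, best_score = clusters, score
    temp = t0
    for _ in range(steps):
        # Sample 1..k nets, biased toward high-weight (e.g., unrouted) nets.
        picked = sorted(set(rng.choices(nets, weights=weights, k=rng.randint(1, k))))
        cand, cand_score = save_cluster(clusters, group_devices(picked))
        delta = cand_score - score
        # Metropolis rule for maximization: always accept improvements,
        # accept a worse candidate with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            clusters, score = cand, cand_score
            if score > best_score:
                best, best_score = clusters, score
        temp *= alpha  # geometric cooling, a stand-in for the modified Lam schedule of [6]
    return best, best_score


weights = {"NET027": 3.0, "A": 1.0, "B": 1.0}   # hypothetical net weights
best, best_score = anneal_clusters(
    clusters=[], score=0.0, net_weights=weights,
    group_devices=lambda nets: list(nets),                        # toy: one "device" per net
    save_cluster=lambda cs, group: ([group], float(len(group))),  # toy score: cluster size
)
print(best, best_score)
```

Every step samples nets blindly, with no view of the layout beyond the weights. This illustrates why the text calls such sampling naive compared with the LLM's netlist- and layout-aware exploration.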

Table I lists the CW (in CPPs) of the 17 selected sequential cells for the transformer cluster method [4], SA, and the proposed method. SA fails to optimize cell area and fix routability because the simple cluster score cannot fully capture diffusion sharing and routability in the layout. Cluster exploration therefore requires a more holistic and efficient methodology than naive net sampling. Compared to (a), the proposed method reduces CW by up to 18.18% and improves the success rate from 76.5% to 100%. When optimizing the initial cell layouts from (b), it reduces CW by up to 19.35% and improves the success rate from 94.10% to 100%. Figure 5 shows the cluster constraints and layouts of the Seq3 cell from (a) and from the proposed method starting from (a). The proposed method achieves an 18.18% reduction in CW and a 16.48% reduction in TWL.

In summary, excluding the cells whose routability was fixed (marked V in Table I), the proposed method reduces cell area on average by 4.48% over (a) and by 4.82% over (b). These results show that the proposed method combines holistic information from the netlist and layout of complex sequential cells to explore clusters efficiently with a simple cluster score for the standard cell layout design optimization task.
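The summary statistics can be recomputed from the CW values in Table I. In the sketch below, the table is transcribed with `None` for X. Cell height is assumed fixed, so area reduction equals CW reduction. Under these assumptions, the code reproduces the reported averages (4.48% and 4.82%), the maximum reductions (18.18% and 19.35%), and the Success row.

```python
from typing import List, Optional, Tuple

# CW in CPPs from Table I; None marks an unroutable (X) layout.
# Columns: (a), (b), SA Opt.(a), SA Opt.(b), Proposed Opt.(a), Proposed Opt.(b)
TABLE: dict = {
    "Seq1": (26, 31, None, 31, 25, 25),
    "Seq2": (None, 41, None, None, 39, 39),
    "Seq3": (33, 26, 27, 26, 27, 26),
    "Seq4": (27, 25, None, 25, 25, 25),
    "Seq5": (23, 22, 26, 22, 22, 22),
    "Seq6": (22, 25, 22, 22, 22, 22),
    "Seq7": (20, 20, 20, 20, 20, 20),
    "Seq8": (25, 25, 25, 25, 24, 25),
    "Seq9": (19, 19, 19, 19, 19, 19),
    "Seq10": (20, 21, 20, 20, 20, 20),
    "Seq11": (26, 26, 25, 28, 25, 25),
    "Seq12": (24, 27, 26, 28, 24, 24),
    "Seq13": (None, None, None, None, 41, 40),
    "Seq14": (None, 36, None, None, 34, 34),
    "Seq15": (None, 32, None, None, 31, 31),
    "Seq16": (42, 40, None, 40, 35, 35),
    "Seq17": (25, 25, 25, 25, 25, 25),
}


def improvements(base_col: int, opt_col: int) -> List[float]:
    """CW reduction (%) per cell, skipping cells unroutable in the baseline (marked V)."""
    return [100 * (row[base_col] - row[opt_col]) / row[base_col]
            for row in TABLE.values()
            if row[base_col] is not None and row[opt_col] is not None]


def success_rate(col: int) -> float:
    """Percentage of cells with a routable (LVS/DRC-clean) layout in column `col`."""
    values: List[Optional[int]] = [row[col] for row in TABLE.values()]
    return 100 * sum(v is not None for v in values) / len(values)


def stats(gains: List[float]) -> Tuple[str, str]:
    return f"{sum(gains) / len(gains):.2f}", f"{max(gains):.2f}"


print(stats(improvements(0, 4)))  # ('4.48', '18.18')  over (a)
print(stats(improvements(1, 5)))  # ('4.82', '19.35')  over (b)
print([round(success_rate(c), 1) for c in range(6)])  # [76.5, 94.1, 58.8, 76.5, 100.0, 100.0]
```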

V Conclusion

We propose a novel and efficient methodology, the first to use an LLM for standard cell layout design optimization. Guided by designers' expertise and ReAct [5] prompting techniques, it generates high-quality cluster constraints to optimize cell layout PPA and debug routability. We have demonstrated that the proposed method achieves up to 19.4% smaller cell area and generates 100% LVS/DRC-clean cell layouts on the selected complex sequential cell benchmark in 2nm. This research provides a novel autonomous LLM agent for standard cell layout design optimization and debugging. It also introduces LLM-assisted optimization as a new application in EDA for further exploration.

References

  • [1] Haoxing Ren and Matthew Fojtik. NVCell: Standard cell layout in advanced technology nodes with reinforcement learning. In 2021 58th ACM/IEEE Design Automation Conference (DAC), pages 1291–1294. IEEE, 2021.
  • [2] Pascal Van Cleeff, Stefan Hougardy, Jannik Silvanus, and Tobias Werner. BonnCell: Automatic cell layout in the 7-nm era. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 39(10):2872–2885, 2019.
  • [3] Chia-Tung Ho, Alvin Ho, Matthew Fojtik, Minsoo Kim, Shang Wei, Yaguang Li, Brucek Khailany, and Haoxing Ren. NVCell 2: Routability-driven standard cell layout in advanced nodes with lattice graph routability model. In Proceedings of the 2023 International Symposium on Physical Design, pages 44–52, 2023.
  • [4] Chia-Tung Ho, Ajay Chandna, David Guan, Alvin Ho, Minsoo Kim, Yaguang Li, and Haoxing Ren. Novel transformer model based clustering method for standard cell design automation. In Proceedings of the 2024 International Symposium on Physical Design, pages 195–203, 2024.
  • [5] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022.
  • [6] Vincent A Cicirello. On the design of an adaptive simulated annealing algorithm. In Proceedings of the international conference on principles and practice of constraint programming first workshop on autonomous search, 2007.
  • [7] Harrison Chase. LangChain, October 2022.
  • [8] OpenAI. OpenAI models API, 2023.

VI Appendix

We present examples of the netlist topology, physical layout, and routability report prompts produced by the proposed knowledge extraction component, as referenced in Fig. 4. The netlist topology and physical layout prompts are shown for the simple cell example in Fig. 4 (i.e., OA333X1). For the routability report prompt, we use Seq13 from Table I, because the OA333X1 layout in Fig. 4 has no unrouted nets.

Figure 6: An example of extracted netlist topology prompt of the simple cell example in Fig. 4.

VI-A Netlist topology prompt

The netlist topology prompt consists of the MOSFET connections and descriptions and the previous cluster constraints, as shown in Fig. 6. Each MOSFET is described using the technology-independent device description format introduced in Section II. The previous cluster constraints are presented as a JSON blob with the action labeled "Final Answer". At the end, the prompt gives the simple cluster score and the number of clusters resulting from the previous cluster constraints.
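The following sketch assembles a prompt in the order described above. Fig. 6 is the authoritative layout, so the section labels, the `"action"`/`"action_input"` keys of the JSON blob, and the helper name are assumptions chosen for illustration.

```python
import json
from typing import Dict, List


def netlist_topology_prompt(mosfet_lines: List[str], prev_clusters: Dict[str, List[str]],
                            prev_score: float) -> str:
    """Assemble a netlist topology prompt (hypothetical layout; see Fig. 6 for the real one)."""
    blob = {"action": "Final Answer", "action_input": prev_clusters}
    parts = ["MOSFET connection and description:"]
    parts += mosfet_lines   # technology-independent format from Section II
    parts.append("Previous cluster constraints:")
    parts.append(json.dumps(blob, indent=2))
    parts.append(f"Simple cluster score: {prev_score:.3f}; "
                 f"number of clusters: {len(prev_clusters)}")
    return "\n".join(parts)


prompt = netlist_topology_prompt(
    ["MP1 d:Z g:A s:VDD pmos", "MN1 d:Z g:A s:X nmos"],
    {"cluster_0": ["MP1", "MN1"], "cluster_1": ["MP2", "MN2"]},
    1.25,
)
print(prompt)
```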

This prompt gives the LLM agent each device's netlist connections, the previous cluster constraints, and their simple cluster score. The agent uses this information in subsequent reasoning and actions for cell layout design optimization.

VI-B Physical layout prompt

The physical layout prompt contains the placed device locations and the net connections of device terminals, so that the LLM can combine the netlist topology and the layout in the Thought-Action-Observation steps of ReAct.

Fig. 7 shows the extracted physical layout prompt of the OA333X1 layout. As shown in Fig. 7(a), the unit of the x coordinate is half a contacted poly pitch (CPP), and the unit of the y coordinate is half a cell row. As a result, the coordinate-based physical layout prompt of OA333X1 in Fig. 7(b) has 29 columns and 2 rows. For each coordinate, we show the net name, the placed device, and the terminal (i.e., source, drain, or gate) of the placed device. When no netlist device is placed at a coordinate, the net name and device are shown as dummy. This column-based physical layout report format helps the LLM identify common gate and diffusion connections between PMOS and NMOS.
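A column-based report of this kind is easy to generate from a coordinate map. The sketch below assumes the following: the top row (y=1) holds PMOS and the bottom row (y=0) holds NMOS; terminals are labeled `s`, `d`, `g`; and the line format is hypothetical rather than the exact format of Fig. 7(b). It also shows how the column view exposes a common gate.

```python
from typing import Dict, List, Tuple

Slot = Tuple[str, str, str]            # (net, device, terminal)
DUMMY: Slot = ("dummy", "dummy", "-")  # coordinate with no netlist device


def layout_prompt(placed: Dict[Tuple[int, int], Slot], n_cols: int, n_rows: int = 2) -> str:
    """One line per column; x in half-CPP units, y in half-row units (top row first)."""
    lines = []
    for x in range(n_cols):
        cells = []
        for y in reversed(range(n_rows)):
            net, dev, term = placed.get((x, y), DUMMY)
            cells.append(f"y={y}: net={net} dev={dev} term={term}")
        lines.append(f"x={x} | " + " | ".join(cells))
    return "\n".join(lines)


def common_gate_columns(placed: Dict[Tuple[int, int], Slot], n_cols: int) -> List[int]:
    """Columns where both rows hold a gate terminal on the same net."""
    out = []
    for x in range(n_cols):
        top, bottom = placed.get((x, 1)), placed.get((x, 0))
        if top and bottom and top[2] == bottom[2] == "g" and top[0] == bottom[0]:
            out.append(x)
    return out


# Hypothetical inverter: PMOS on the top row, NMOS on the bottom row.
placed = {
    (0, 1): ("VDD", "MP1", "s"), (1, 1): ("A", "MP1", "g"), (2, 1): ("Z", "MP1", "d"),
    (0, 0): ("VSS", "MN1", "s"), (1, 0): ("A", "MN1", "g"), (2, 0): ("Z", "MN1", "d"),
}
print(layout_prompt(placed, n_cols=4))
print(common_gate_columns(placed, n_cols=4))  # [1]
```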

Figure 7: An example of extracted physical layout prompt of the simple cell example in Fig. 4.

VI-C Routability report prompt

Because the OA333X1 layout has no unrouted nets, we use Seq13 from Table I to illustrate the routability report prompt format, as shown in Fig. 8. The routability report lists the unrouted nets, the corresponding pairs of x-coordinates of their net terminals, and the devices placed inside each unrouted region. These devices indicate the routing congestion and the transistor pin access required in that region, helping the LLM identify cluster constraints that can improve routability. For example, routing congestion or pin density may be too high in an unrouted net region. In that case, the LLM can create cluster constraints around the highly connected nets or problematic transistor-pin nets in that region. Exploiting shared transistor terminals across PMOS and NMOS, together with diffusion sharing, then reduces pin density and routing resource usage.
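The routability report can be derived from terminal x-coordinates and device placements. The sketch below rests on these assumptions:

  • A net's terminal x-pairs are its consecutive sorted terminal positions.
  • The unrouted region spans the leftmost to the rightmost terminal.
  • A device lies inside the region if any column it occupies falls within that span.

The net and device names and the output wording are hypothetical. Fig. 8 shows the actual format.

```python
from typing import Dict, List


def routability_report(unrouted: Dict[str, List[int]], device_x: Dict[str, List[int]]) -> str:
    """unrouted: net -> terminal x-coordinates; device_x: device -> occupied x-coordinates."""
    lines = []
    for net, xs in unrouted.items():
        xs = sorted(set(xs))
        lo, hi = xs[0], xs[-1]
        pairs = list(zip(xs, xs[1:]))   # consecutive terminal pairs left unconnected
        inside = sorted(d for d, cols in device_x.items() if any(lo <= c <= hi for c in cols))
        lines.append(f"Unrouted net {net}: terminal x-pairs {pairs}; "
                     f"devices in region: {', '.join(inside)}")
    return "\n".join(lines)


report = routability_report(
    {"NET1": [3, 10, 6]},
    {"MP1": [1, 2, 3], "MN4": [8, 9], "MP7": [14, 15]},
)
print(report)
# Unrouted net NET1: terminal x-pairs [(3, 6), (6, 10)]; devices in region: MN4, MP1
```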

Figure 8: An example of extracted routability report prompt of Seq13 in Table I.