Achieve Fast Design Closure with ExPass Design Flow


Shane (Zhen-shan) Wang
Alchip Technologies Limited
shanew@alchip.com

ABSTRACT

This paper discusses ALCHIP’s ExPass (Express Pass) VDSM IC design flow and the issues, solutions and advantages that ALCHIP observed while designing a multi-million-gate ASIC.

The ExPass design flow focuses on advanced design methodology and exploits the major features of ALCHIP’s internal tools and of Synopsys Module Compiler, Physical Compiler, Astro and related tools. It provides integrated VDSM IC design solutions for the whole backend flow, together with specific solutions for data-path architecture exploration and building, physical aware synthesis, signal integrity closure, hold time fixing, and timing correlation among Physical Compiler, Astro and Star RCXT.

The ExPass design flow and the solutions within it improve design performance, reduce layout surprises, and greatly shorten turnaround time and time to market. The flow has been fully qualified on a multi-million-gate design named “A-Chip”, which contains a high-performance 2.8 ns 17×17 multiplier data path in 0.18 um technology. This paper describes the flow and the solutions applied when implementing that design.

1. Introduction

In deep sub-micron technologies, wire delay dominates timing. Assumptions about interconnect delay that are based on statistical wire load models adversely affect the synthesized logic. During synthesis, logic designers generally over-constrain their designs to add a safety margin that should make timing closure easier during place and route. This causes P&R tools to optimize far more paths than really required, which increases congestion and leads to poor quality of results after routing. Timing may look different or worse after physical implementation, or the design may simply be unroutable because of unexpected congestion. Meanwhile, signal integrity, IR drop and antenna effects, which become more pronounced as geometries shrink, degrade design performance if they cannot be predicted accurately at an early design stage. Many iterations between netlist optimization and layout implementation are then needed to reach final design closure.

Traditional design automation tools are individually mature for each design task, such as timing analysis, synthesis, placement and routing. However, individual tools are no longer adequate for solving these design problems efficiently and improving design performance. A new systematic design methodology, with solutions built on the best current tools, is needed to overcome these challenges.

The ExPass design flow provides exactly such a methodology and set of solutions. ExPass is a VDSM IC design solution set from ALCHIP Inc. It provides comprehensive backend design solutions by integrating and exploiting the major features of internal tools and of Synopsys Module Compiler, Physical Compiler, Astro and related tools. It includes:

- Data-path architecture exploration and building

- Physically aware timing correction and optimization

- Hold time and crosstalk prevention, analysis and repair

- Clock tree distribution solution

- Post-layout design optimization and closure

This paper only discusses the first two topics.

The high-level design flow is described in figure 1; each specific solution is tightly integrated into it. Integrated solutions reduce the complexity of the design process and improve correlation across the whole design. The solution set has been fully proven on several 0.18 um and 0.13 um processes. This paper discusses the implementation details on a 1.2-million-gate ASIC named “A-Chip”, analyzes the quality of results, compares them with those of the traditional flow, and points out the advantages of the ExPass flow. “A-Chip” contains 168.1 K instances and 12 macros, as well as several high-performance 2.8 ns multiplier data paths. It is designed for a 160 MHz clock and targets a 0.18 um, 5-metal-layer CMOS process.

Section 2 presents the ExPass physical synthesis design flow based on Physical Compiler and Astro and compares its results with those of the traditional Astro-based flow; section 3 discusses the issues identified during the design process and their solutions; section 4 discusses timing correlation between Physical Compiler (PC) and Star RCXT. Finally, section 5 summarizes the conclusions for the PC and Astro based VDSM IC design solutions.

2. Physical Aware Synthesis Flow

In 0.18 um technology, a netlist optimized by a logic synthesizer does not always benefit the physical design process when it is followed by the traditional physical design flow (described in figure 2). The mismatch between wire load model estimates and real physical delay hinders design closure. Placement tools cannot minimize timing slack without optimizing the circuit, so additional effort is needed in the post-layout and ECO stages to improve timing and correct routing violations little by little. This slows design closure and puts great pressure on the post-layout optimizer, and much time and effort is spent on post-layout processing. A poor physical implementation can become a nightmare for the STA engineer and prolong the whole design cycle.

Because of the fundamental problems of the wire load model and the weak placement optimizer in the traditional physical design methodology, a high-performance circuit, and especially a high-performance data path, is very hard to achieve.

The ExPass physical aware synthesis flow integrates Module Compiler (MC) and Physical Compiler to perform physical synthesis for the data path and the entire design (described in figure 3). Physical Compiler takes placement information into account during logic optimization, which allows designers to perform logic optimization and timing-driven placement in one environment. Physical aware synthesis accurately predicts post-layout timing performance and at the same time relaxes routing congestion.

To evaluate the ExPass physical aware synthesis flow, a 17×17 multiplier data path was designed and then integrated into the 1.2 M gate ASIC “A-Chip” on the same architecture and technology, targeting a 0.18 um, 5-metal-layer process. The sample script below shows the basic steps from the RTL description to a placed gate-level netlist using Physical Compiler.

# -------------------------------------------------------------

# run in psyn_shell

source [getenv MCDIR]/lib/tcl/mcdc.tcl

# design environment setup

source ./setup.tcl

# read in and compile the MCL datapath description

read_mcl mul.mcl

compile_mcl

compile

# read in the remaining non-datapath gate-level logic

read_verilog -netlist top.v

read_sdc ./cons.tcl

link

# read in the floorplan

read_pdef top.pdef

current_design top

# create initial timing driven placement

create_placement -timing_driven

legalize_placement

# perform physical synthesis on the chip

physopt -timing_driven_congestion

# -------------------------------------------------------------

Table 1 compares the quality of results for the multiplier datapath under the traditional flow and under the ExPass physical aware datapath design flow.

Table 1 shows that the delay of the 17×17 multiplier improves from 3.14 ns to 2.87 ns with the new physical synthesis methodology. The penalty is an increase of approximately 18% in cell area.

Table 2 and table 3 show the performance of the whole design at each major stage under the traditional flow and under the ExPass physical aware synthesis flow, respectively.

As table 3 indicates, the traditional flow does not achieve good timing results and leaves some difficult DRC violations after detailed routing, which will certainly cause many iterations during post-layout optimization. In contrast, table 4 shows that with the ExPass physical aware design flow, timing closure is easily achieved after detailed routing at the cost of some minor DRC violations. After just one post-layout optimization pass, the violations decrease from 9646 to 284. The only penalty is that physical aware synthesis costs 25% more memory and three times the CPU time of the traditional flow. Because timing closure is achieved without any iteration, however, the total turnaround time is reduced dramatically.
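The improvements quoted above can be put on a common relative scale. The short Python sketch below is only an illustration of that arithmetic; the helper name percent_reduction is hypothetical, and the inputs are the multiplier delay figures (3.14 and 2.87) and the violation counts (9646 and 284) quoted in the text.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (before - after) / before


# Multiplier delay quoted for table 1 (3.14 -> 2.87)
print(round(percent_reduction(3.14, 2.87), 1))   # 8.6
# Violations before and after one post-layout optimization pass (9646 -> 284)
print(round(percent_reduction(9646, 284), 1))    # 97.1
```

In other words, the multiplier delay drops by roughly 8.6% for an area increase of about 18%, and a single post-layout pass removes roughly 97% of the remaining violations.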

Using Physical Compiler enabled ALCHIP to design a high-performance data path and to deliver timing-clean and congestion-clean databases that minimize post-layout surprises.

3. Identified Issues and Their Solutions

3.1 Data exchange between Physical Compiler and Astro

Physical Compiler does an excellent job of physical synthesis, but it cannot perform floorplanning or routing, which have to be done in Astro. As described in figure 1, Physical Compiler and Astro use different databases, db and Milkyway respectively. Data must therefore be exchanged in two directions: the floorplan from Astro to Physical Compiler, and the placed netlist from Physical Compiler to the Astro router.

3.1.1 Scheme to PDEF

Synopsys provides a binary utility, scheme2pdef, to convert the Astro floorplan into a PDEF file for Physical Compiler. The conversion with scheme2pdef involves the following steps:

1. scheme2pdef generates a scheme file dump.scm, that contains the procedure dumpEverything;

2. dumpEverything dumps the floorplan information into files in a specified directory (argument to dumpEverything);

3. scheme2pdef generates the PDEF file from the data dumped by dumpEverything, using the following command:

scheme2pdef -tech ./library/tsmc13fsg_6lm.tf -sitemap ./sitemap -dumpdir dump -output fp.pdef

The following data and files are required to convert the Astro floorplan scheme into PDEF:

- Astro design database

- Astro technology file

- site map file

The difficulty lies in preparing the sitemap file. The sitemap file defines the unit tile size for Physical Compiler; its width and height can be found in the Synopsys physical library (.plib) or the Milkyway technology file (.tf). For example, the .plib file contains lines such as:

site ( "sitemap46m" ) {

site_class : core;

symmetry : y;

size ( 0.56, 6.2 );

} /* end site */

As defined in the site group “sitemap46m”, the unit width and height are 0.56 and 6.2 respectively. Create a file named sitemap with the following contents:

sitemap46m 6.2 0.56

Height is in the second field and width in the third field.
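Because the field order is swapped between the site group (width, height) and the sitemap file (height, width), the entry is easy to get wrong by hand. The Python sketch below shows one way to derive the sitemap line mechanically. It is an illustration only: it assumes the site group has exactly the layout shown above, and the function name sitemap_lines and the regular expressions are hypothetical, not part of any Synopsys utility.

```python
import re

# Matches a site group such as:  site ( "sitemap46m" ) { ... }
# Assumption: the group body contains no nested braces, as in the example above.
SITE_RE = re.compile(r'site\s*\(\s*"?(?P<name>[\w.]+)"?\s*\)\s*\{(?P<body>[^}]*)\}')
# Matches the size attribute:  size ( <width>, <height> );
SIZE_RE = re.compile(r'size\s*\(\s*(?P<w>[\d.]+)\s*,\s*(?P<h>[\d.]+)\s*\)')


def sitemap_lines(plib_text):
    """Return sitemap lines of the form '<site name> <height> <width>'."""
    lines = []
    for site in SITE_RE.finditer(plib_text):
        size = SIZE_RE.search(site.group("body"))
        if size is None:
            continue  # site group without a size attribute
        # size() lists the width first; the sitemap file lists the height first
        lines.append(f'{site.group("name")} {size.group("h")} {size.group("w")}')
    return lines


example = '''
site ( "sitemap46m" ) {
  site_class : core;
  symmetry : y;
  size ( 0.56, 6.2 );
} /* end site */
'''
print("\n".join(sitemap_lines(example)))  # sitemap46m 6.2 0.56
```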

3.1.2 PDEF to Scheme

After Physical Compiler has completed placement and optimization, we use the Synopsys utility pdef2scheme to convert the PDEF file to scheme format. The basic steps are:

1. Translate the PDEF to scheme with pdef2scheme, using the following command line:

pdef2scheme -pdef placed.pdef -tech library/tsmc13fsg_6lm.tf -sitemap ./sitemap -output placed.scm

2. Load the placement scheme file into the Astro database; the design is then ready for routing.

While loading the placed cells in Astro, you may see errors such as:

fail to get net N4021

fail to get net N4015

fail to get net N4174

fail to get net N4011

Checking the PDEF file reveals entries such as the following:

(PIN out[17]

(DEF_NET_NAME "N4021")

(PIN out[13]

(DEF_NET_NAME "N4174")

The mismatch between net names and port names causes the problems when converting Physical Compiler PDEF data to Astro scheme. The mismatch is introduced during physical synthesis.

As a workaround, the problem can be solved either by using the “-ignoredefnetname” option of pdef2scheme or by using the following UNIX command to create a new .pdef file without the DEF net names:

grep -v DEF_NET_NAME placed.pdef > route_ready.pdef
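Removing whole lines is safe only if every DEF_NET_NAME entry sits on a line of its own, as in the excerpt above. The Python sketch below performs the same filtering and additionally refuses to drop a line whose parentheses are unbalanced, since that would break the nesting of the remaining file. It is an illustration under that layout assumption; the function name strip_def_net_names is hypothetical.

```python
def strip_def_net_names(pdef_lines):
    """Drop lines that mention DEF_NET_NAME, like `grep -v DEF_NET_NAME`.

    Raises ValueError if a dropped line has unbalanced parentheses, because
    removing it would corrupt the nesting of the rest of the PDEF file.
    """
    kept = []
    for line in pdef_lines:
        if "DEF_NET_NAME" in line:
            if line.count("(") != line.count(")"):
                raise ValueError(f"unbalanced DEF_NET_NAME line: {line.rstrip()}")
            continue
        kept.append(line)
    return kept


sample = [
    '(PIN out[17]\n',
    '(DEF_NET_NAME "N4021")\n',
    '(PIN out[13]\n',
    '(DEF_NET_NAME "N4174")\n',
]
print("".join(strip_def_net_names(sample)), end="")
```

For a real file, the list of lines would come from reading placed.pdef, and the filtered lines would be written to route_ready.pdef.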

3.2 Placing double height cells in Physical Compiler

In this design flow, some double-height cells are instantiated. Astro handles these cells correctly, but Physical Compiler fails to process them, as shown in figure 5.

The reason is that Physical Compiler aligns standard cells to the site array, and no double-height site is defined in the Synopsys physical library. Without that site definition to guide it, Physical Compiler cannot process these cells correctly. A definition for the double-height site must be added to the Synopsys physical library to solve this issue.

1. Add a new double-height site group to the plib (library/pdb/nlc18.plib), as below:

site ( "sitemap46m" ) {

site_class : core;

symmetry : y;

size ( 0.56, 6.2);

} /* unit height site definition */

site ( "sitemap46mx2" ) {

site_class : core;

symmetry : y;

size ( 0.56, 12.4);

} /* double height site definition */

2. Update the double height cell with the new definition:

macro ( "<double height cell>" ) {

cell_type : core;

symmetry : xy;

in_site : sitemap46mx2; /* original: in_site : sitemap46m */

After the new site definition is added to the Synopsys physical library and the related double-height cells are updated, Physical Compiler processes the double-height cells successfully.

4. PC and Star RCXT Timing Correlation

Physical Compiler uses Steiner global routing to estimate wire length and automatically derives RC values for timing calculation from the physical library and RC models. Based on these models and on the width, direction, capacitance, edge capacitance and resistance defined in the physical library, Physical Compiler extracts unit resistance and capacitance values for the routing layers.

As described in the figure below, during global routing the chip is divided into localized rectangular regions called tiles. The Steiner global router compresses the pin locations within each tile into a single pin location. All shapes, wires and openings are represented in terms of global track capacity and usage. At the pre-layout design stage, wire length is estimated from this global information.
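The effect of tile compression on a wire length estimate can be illustrated with a small Python sketch. Two simplifications are assumptions of this sketch and not a description of Physical Compiler: each tile's pins are collapsed to the tile centre, and the Steiner tree length is replaced by the simpler half-perimeter wire length (HPWL) of the bounding box, which is a lower bound on the rectilinear Steiner length. All names are hypothetical.

```python
def compress_to_tiles(pins, tile):
    """Collapse (x, y) pins to one point per tile, placed at the tile centre."""
    centres = set()
    for x, y in pins:
        col, row = int(x // tile), int(y // tile)
        centres.add(((col + 0.5) * tile, (row + 0.5) * tile))
    return sorted(centres)


def hpwl(points):
    """Half-perimeter wire length of the bounding box around the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Four pins of one net; the first two fall into the same 10 x 10 tile.
pins = [(1, 1), (2, 3), (12, 4), (27, 18)]
centres = compress_to_tiles(pins, tile=10)
print(centres)        # [(5.0, 5.0), (15.0, 5.0), (25.0, 15.0)]
print(hpwl(centres))  # 30.0
print(hpwl(pins))     # 43
```

The compressed estimate (30.0) differs from the estimate on exact pin positions (43), which shows how tile granularity alone can introduce a deviation between pre-route and post-route wire length.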

Parasitic estimation is a critical component of pre-route and post-route timing correlation. However, the initial RC values provided in the libraries or extracted by Physical Compiler deviate somewhat from the RC values produced by the Astro router and from the accurate parasitics of a 3D RC extraction tool such as Star RCXT.

The main reason for the miscorrelation is that the information contained in the physical library (.pdb) is incomplete. By default, the physical library uses only the “1D” capacitance model; it lacks the coupling capacitance information that the 2.5D capacitance model uses in addition to area and fringe capacitance. This lack of physical information causes Physical Compiler to under-buffer or over-buffer some critical timing paths and to ignore DRC violations or fix them too aggressively. Incremental post-route optimization runs are then needed to fix the problems.

The net delay computed by Physical Compiler depends on the derived auto RC values, the design topology, layer blockages, power nets (PNETs) and cell blockages. The ExPass design flow improves the RC correlation of Physical Compiler through three approaches:

1. Adding 2.5D extraction parameters to the .plib file with the command “extract2plib”.

The 2.5D capacitance model provides better capacitance estimation than the 1D capacitance model. Using the 2.5D model incurs some runtime penalty, but the penalty usually is less than 5 percent of the total runtime compared with using the 1D model.

With “compare_rc”, Physical Compiler compares the back-annotated capacitance on the nets with the estimated values and plots the percentage deviation against the percentage of nets falling in each deviation range. Good correlation is indicated by a tall, thin bell-shaped curve around the Y axis (figure 9).

From figure 6 we find that the average error decreases in magnitude from -48.3872 to -12.1553, while the standard deviation increases from 28.8313 to 48.2627. After the 2.5D parasitic model is implemented, timing correlation is improved.
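The two statistics discussed here, average error and standard deviation, can be reproduced from any list of per-net capacitance values. The Python sketch below is only an illustration: it assumes the deviation of a net is defined as (estimated - annotated) / annotated in percent, which may differ from the exact definition used by compare_rc, and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def deviation_stats(estimated, annotated):
    """Mean and standard deviation of the per-net deviation in percent.

    Assumed definition: 100 * (estimated - annotated) / annotated.
    Nets with a zero annotated value are skipped.
    """
    devs = [
        100.0 * (est - ann) / ann
        for est, ann in zip(estimated, annotated)
        if ann != 0
    ]
    return mean(devs), pstdev(devs)


# Hypothetical capacitance values for three nets (arbitrary units)
estimated = [0.5, 1.0, 1.5]
annotated = [1.0, 1.0, 1.0]
avg, spread = deviation_stats(estimated, annotated)
print(f"average error {avg:.4f}, standard deviation {spread:.4f}")
```

An average close to zero means the estimate is unbiased overall, while the standard deviation measures how widely individual nets scatter around that average, so the two numbers should be read together.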

2. Tuning the RC values with estimates based on the actual post-route parasitics and SDF.

Physical Compiler can use “estimate_rc” to compute a set of RC coefficients from the back-annotated set_load values and pin-to-pin delays that minimizes the error between annotated and computed values. With “set_delay_estimation_options”, the user can specify horizontal and vertical RC parameters that override the auto-RC values.

Given the built-in extraction capability of PC, specifying a user-defined RC from estimate_rc is not recommended. Instead, we manually derive a scaling factor and apply it to the auto-derived RC using:

set_delay_estimation_options \
    -max_unit_horizontal_capacitance_scaling_factor <scale> \
    -min_unit_vertical_capacitance_scaling_factor <scale> \
    -min_unit_horizontal_resistance_scaling_factor <scale>
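One possible way to arrive at such a scaling factor is a least-squares fit of a single multiplier that maps the estimated values onto the back-annotated ones. The Python sketch below illustrates that idea only; it is not the procedure used on A-Chip, and the function name and sample values are hypothetical.

```python
def least_squares_scale(estimated, annotated):
    """Scale k that minimizes sum((k * est - ann)^2) over all nets.

    Setting the derivative with respect to k to zero gives
    k = sum(est * ann) / sum(est^2).
    """
    num = sum(est * ann for est, ann in zip(estimated, annotated))
    den = sum(est * est for est in estimated)
    if den == 0:
        raise ValueError("estimated values are all zero")
    return num / den


# Hypothetical per-net capacitance: the estimate is consistently 1.5x too low.
estimated = [1.0, 2.0, 4.0]
annotated = [1.5, 3.0, 6.0]
print(least_squares_scale(estimated, annotated))  # 1.5
```

Because the fit minimizes squared error, nets with large capacitance dominate the result; a ratio of averages or medians would weight the nets differently.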

Finally, to ensure that Physical Compiler sees the same topology as the router, design constraints such as “create_obstructions” should be set in the design environment.

A sample script for post-layout optimization with Physical Compiler is listed below.

# -------------------------------------------------------------

# read design

read_verilog top.v

read_pdef top.pdef

# annotate block level annotation written out by PT

read_sdf $top.sdf ;# sdf

source $top.load ;# internal RC nets

source $top.tcl ;# timing cons

# the actual command

compare_rc

estimate_rc > $top.newrc

# scaling factor based on the values from compare_rc and estimate_rc

set_delay_estimation_options

# -------------------------------------------------------------

5. Conclusions

The ExPass design flow, based on Physical Compiler and Astro, provides an integrated VDSM IC design methodology and solution set. With it, designers can achieve design closure faster than with the traditional methodology. This is possible because physical design problems are identified early by Physical Compiler, which produces timing-clean and congestion-clean data. This in turn improves routing quality and makes post-layout tasks easier, so designers can focus most of their effort on improving design performance.

The ExPass design flow requires data format exchange at three points: from the Astro floorplan to PC physical synthesis, from the PC placed netlist to the Astro router, and during post-layout optimization. Data exchange between PC and Astro costs additional time and degrades technology, timing and data correlation. An integrated design environment or a consistent data exchange solution is needed to exploit the full power of PC and Astro. In this design, the final timing targets of both the high-performance 17×17 multiplier and A-Chip were achieved using PC and Astro.

