Using the SPG Flow to Improve Timing Correlation Between DCG and ICC


Yuping Lian (廉玉平), yplian@marvell.com

美满电子科技(上海)有限公司

Marvell Shanghai

Abstract

Topographical synthesis is now widely used and is increasingly integrated into chip implementation flows. Because Design Compiler topographical (DCG) can read the same floorplan that IC Compiler (ICC) uses, it helps reduce the number of iterations between front end and back end. Starting with the 2010.03 release, DCG supports the physical guidance feature, which lets IC Compiler reuse the placement information stored in the synthesis database. This greatly improves the timing correlation between ICC and DCG, which in turn reduces front-end/back-end iterations and shortens time to silicon.
This paper shows, on a real chip, how the Synthesis Physical Guidance (SPG) flow works and presents an effective strategy for improving timing and DCG-to-ICC timing correlation based on that flow. We start with a simple block-level SPG flow and analyze the timing correlation between the ICC placement result and the DCG synthesis result. The flow is then modified and applied to full-chip synthesis. We observed that, at full-chip level, several additional measures are needed to obtain satisfactory correlation between ICC placement timing and DCG synthesis timing, and this paper describes how to minimize the correlation problem.

1. Introduction

Synthesis plays a major role in chip implementation: its result is the starting point for place and route (P&R) and the main reference for RTL timing fixes. Because any physical or logical change to the design can cause the product schedule to slip, it is extremely important that synthesis timing correlates well with physical implementation. Good correlation reduces the iterations between front end and back end and avoids major surprises during physical implementation. Synopsys DCG provides a good synthesis result for physical implementation, and the latest DC version can also provide physical guidance for ICC placement, which greatly improves timing correlation.
We used physical guidance to compile the design in DCG and exported the database to ICC for placement on a real project. Specifically, we imported the DDC and the floorplan into ICC, ran place_opt, and compared the ICC placement timing with the original DCG timing [1][2]. With this approach we found that the timing results sometimes correlated poorly between DCG and ICC, or even between different compile stages within DCG. The following chapters describe block-level and full-chip experiments with the DCG SPG flow, analyze the correlation results, and study the cases where correlation was poor. We describe these challenges in detail together with the solutions we used to overcome them. In our project, considerable effort went into controlling tool options and variables to get the desired behavior.
The rest of the paper is organized as follows. We first introduce the SPG flow in DCG and ICC, then describe a block-level experiment that checks the timing correlation between DCG and ICC. Next, we describe the full-chip SPG synthesis flow and analyze its results. Finally, we present the measures we took to improve the timing correlation between DCG compile stages and between DCG and the ICC placement result.

2. Physical Guidance Flow in DCG and ICC

Design Compiler topographical synthesis performs placement-driven mapping and optimization to achieve high quality of results (QoR) and tight correlation between Design Compiler and IC Compiler. The physical guidance flow is supported jointly by Design Compiler and IC Compiler to improve both runtime and correlation. Enabling it is straightforward: use the -spg option with the compile_ultra command in Design Compiler and the -spg option with the place_opt command in IC Compiler. The minimal change to an existing flow is therefore to add "-spg" to the existing compile_ultra and place_opt commands [3][4][5].
We used the physical guidance feature in both block-level and full-chip synthesis. The rest of this chapter describes these flows and their timing results, analyzes the timing correlation, and explains the measures taken to improve it.
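
As a minimal sketch of the "-spg" change described above, the two tool sessions might look like the following. It assumes that libraries, constraints, and the floorplan are already loaded in each tool; the file name my_block.ddc is hypothetical and not taken from our scripts.

```tcl
# --- dc_shell (topographical mode) ---
# Compile with physical guidance so that placement information is kept.
compile_ultra -spg
# Save the result as DDC; a plain netlist would not carry the guidance.
write -format ddc -hierarchy -output my_block.ddc   ;# hypothetical file name

# --- icc_shell (after the DDC and floorplan have been loaded) ---
# Reuse the DCG placement guidance as the first placement-related command.
place_opt -spg
```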

2.1 Block experiment

2.1.1 Block level synthesis flow

We chose one functional block to evaluate the physical guidance feature. The block contains no SRAMs. We created an initial floorplan in ICC by fixing the pin locations and the core area, and then used this floorplan in both Design Compiler and IC Compiler so that the physical constraints were exactly the same. Using the same floorplan is fundamental to tight correlation between DCG and ICC: it provides identical physical constraints, and it is also a prerequisite for the SPG flow.
The block-level SPG synthesis flow is shown in Figure 1. It is similar to a normal synthesis flow except that both compile_ultra and place_opt take the additional -spg option. The SPG flow has two basic requirements:
1. Setup: Design Compiler and IC Compiler must use consistent physical and logical constraints, the same libraries, and the same floorplan.
2. Command order: place_opt -spg must be the first placement-related command in IC Compiler. Avoid commands such as create_placement and remove_buffer_tree, which reduce the benefit of physical guidance.
The synthesis script may create bounds or other floorplan-related constraints. To make sure ICC uses exactly the same floorplan as DCG, our flow writes the floorplan out of DCG before it exits, using the DCG floorplan modification feature with the commands shown later in this section. (The voltage areas and bounds must be shown in the DCG layout window; otherwise the dumped floorplan contains no voltage area or bound information.) This interactive, graphical feature of DCG lets the user open IC Compiler from the DC console. Before opening the floorplan exploration layout, make sure the ICC version is set to match the DCG version. In our DCG flow, we create and open the Milkyway library automatically by adding the following script to .synopsys_dc.setup:

# Create the Milkyway design library on first use, then open it.
if {![file isdirectory $mw_design_library]} {
    create_mw_lib -technology $TECH_FILE \
        -mw_reference_library $mw_reference_library \
        $mw_design_library
}
open_mw_lib $mw_design_library

However, when we tried to open ICC with the start_icc_dp command, the tool complained that the Milkyway cell was already open, because start_icc_dp internally starts IC Compiler, which sources .synopsys_dc.setup again. We updated the setup file so that the Milkyway commands run only in DCG, using a condition such as: if {$synopsys_program_name == "dc_shell-t"}. After this change, IC Compiler opened successfully. We then ran a few commands in DCG to dump the floorplan automatically: first create a layout window, then show the bounds and voltage areas, and finally call start_icc_dp to write the required physical constraints. An example script is shown below:

gui_start
gui_create_window -type Layout -show_state max
gui_set_setting -window [gui_get_current_window -types Layout -mru] -setting showMovebound -value true
gui_set_setting -window [gui_get_current_window -types Layout -mru] -setting showVoltageArea -value true
start_icc_dp -verbose -f scripts/icc_dp_fpdump.tcl
gui_stop
exit

The icc_dp_fpdump.tcl file contained only two simple commands:
write_floorplan -create_bound -sm_voltage_area -row -track -placement {io hard_macro terminal} -preroute icc_dp_fp.tcl
exit
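
The setup-file guard mentioned earlier in this section can be sketched as follows. This is only an illustration: it assumes that mw_design_library, mw_reference_library, and TECH_FILE are defined earlier in .synopsys_dc.setup, and it uses the program-name value quoted in this paper, which may differ in other tool versions.

```tcl
# Sketch of a guarded section in .synopsys_dc.setup.
# Assumption: the three variables used below are set earlier in this file.
if {$synopsys_program_name == "dc_shell-t"} {
    # Only the DCG session creates and opens the Milkyway library here.
    # The IC Compiler session launched by start_icc_dp sources this file
    # again, but skips this branch, so the library is not opened twice.
    if {![file isdirectory $mw_design_library]} {
        create_mw_lib -technology $TECH_FILE \
            -mw_reference_library $mw_reference_library \
            $mw_design_library
    }
    open_mw_lib $mw_design_library
}
```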
For ICC placement, we imported the DDC database and the floorplan file, then ran place_opt with the -spg option so that ICC used the physical guidance information stored in the DDC. Because the DDC already contains all the constraints the design needs, the only steps required in ICC are to read in the DDC, create the floorplan, and run place_opt -spg.
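
The ICC side of the flow might be scripted roughly as shown below. This is a sketch under assumptions rather than our production script: the design name my_block and the file my_block.ddc are hypothetical, the Milkyway library is assumed to exist already, and icc_dp_fp.tcl is the floorplan file dumped from DCG.

```tcl
# icc_shell sketch; design and DDC names are hypothetical.
open_mw_lib $mw_design_library

# Import the DDC written by DCG (it carries the constraints and the
# physical guidance information).
import_designs -format ddc -top my_block my_block.ddc

# Apply the floorplan that was dumped from DCG with write_floorplan.
read_floorplan icc_dp_fp.tcl

# place_opt -spg must be the first placement-related command.
place_opt -spg

# Report timing for comparison against the DCG result.
report_timing
```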

Figure 1 Block-synthesis flow

2.1.2 Experiment result

We analyzed the Design Compiler synthesis result and the IC Compiler placement result, checking the timing reports and studying the actual cell placement in both DCG and ICC. The results showed that the same cells (especially flops and the main non-combinational logic) were placed at similar locations in Design Compiler and IC Compiler.
Figure 2 shows the placement of the same two paths in DCG and ICC. The left picture is taken from the DCG layout window and the right one from the ICC cell view. Overall, the cell placement and density distribution in the two pictures are almost the same. However, DCG does not legalize cell locations. We chose two top paths as examples (shown in yellow and red in the pictures). The red path has almost exactly the same cell locations in DCG and ICC, while the cell locations of the yellow path differ slightly, because ICC legalizes cell locations and fixes overlaps whereas DCG uses a virtual placer. We synthesized this design with seven path groups and compared the timing deviation using the worst path in each path group. The timing comparison table in Figure 3 shows that all timing deviations were below 5 percent, and the largest variation between ICC and DCG was about 3 percent, which is small and acceptable.
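
The per-group comparison can be automated with a small helper like the one below. This is a sketch only: the paper does not give an exact formula for "deviation", so the relative difference against the DCG value is an assumption, and the path-group numbers are hypothetical and are not the data behind Figure 3.

```tcl
# Relative deviation (in percent) of an ICC value from its DCG counterpart.
# Assumption: deviation is measured relative to the DCG value.
proc timing_deviation_pct {dcg icc} {
    if {$dcg == 0} {
        error "DCG reference value must be non-zero"
    }
    return [expr {100.0 * abs($icc - $dcg) / abs($dcg)}]
}

# Hypothetical worst-path delays in ps, listed as: group dcg icc
foreach {group dcg icc} {G1 2000 2050 G2 1500 1470} {
    puts [format "%s: %.1f%%" $group [timing_deviation_pct $dcg $icc]]
}
```

The helper is plain Tcl, so it can be sourced in either tool shell or run with a standalone tclsh.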

Figure 2 DC and ICC cell placement comparison

Figure 3 DCG and ICC timing comparison

2.2 Full chip experiment

2.2.1 Full chip synthesis flow

The good block-level timing correlation gave us the confidence to apply the physical guidance flow to full-chip synthesis. Our full chip is much more complicated, with many SRAMs and thousands of pins. Because it is a real chip, we had to include all required steps, including DFT (design for test) and EDT (embedded deterministic test). Figure 4 shows our full-chip synthesis flow.

Figure 4 Full chip synthesis flow

2.2.2 Full chip synthesis result

In full-chip synthesis we added the DFT and EDT steps, plus a manual timing-fix step that swaps cells between the standard-threshold (SVT) and low-threshold (LVT) libraries. We then compared ICC placement timing with DCG synthesis timing. The timing reports showed that ICC placement timing correlated poorly with DCG and was in fact much worse. We defined several path groups for synthesis, and DCG always showed better timing than the ICC placement result. Table 1 shows the WNS (worst negative slack) of some path groups; the time unit is ps, and G1 denotes the first path group.

Table 1 WNS comparison


Table 1 shows that the ICC placement timing of most path groups was much worse than the DCG synthesis result. Some difference between ICC placement timing and DCG synthesis timing is expected, because DCG uses a virtual placer and does not consider cell overlap, while ICC performs real placement and legalization. With the SPG flow, however, cell locations in DCG and ICC should not differ significantly, as the block-level experiment confirmed. The block-level and full-chip scripts were almost the same, except that the full-chip flow added DFT/EDT and some LVT/SVT swapping. Could these steps introduce miscorrelation?
Our chip is a multi-threshold design. For the initial compile we did not set any leakage or area constraints on the top design, so DCG picked mostly LVT cells. After the initial compile, we ran the DFT/EDT steps, applied constraints to recover some leakage from the compiled database, and finally ran an incremental compile. The incremental result contained more SVT cells than the initial compile database. With this flow, incremental timing should not be much better than the initial compile result, because LVT cells are much faster than SVT cells. However, the incremental timing looked good, and some path groups showed much better timing than the initial result, which we did not expect at all. Table 2 shows the WNS of some path groups for the initial and incremental DCG results; the time unit is ps.

Table 2 Initial and incremental compile timing


Table 2 shows that most path groups have better timing in the incremental database. This did not happen in the earlier block-level synthesis, which made us suspect that the timing reported from the incremental database did not reflect the real situation, possibly because of the additional DFT/EDT steps or the cell swapping. We ran extract_rc -estimate on the incremental compile result and reported timing again. The resulting report was very different from the one produced without extract_rc -estimate. Table 3 shows the WNS of some path groups before and after extract_rc -estimate on the incremental compile database; the time unit is ps.
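
A before/after comparison of this kind can be captured with a few lines in dc_shell, sketched below. The report file names are hypothetical, and report_qor is used here simply because it summarizes timing per path group; it is not necessarily the report command used to produce Table 3.

```tcl
# dc_shell sketch, run on the incremental compile result.
# Report file names are hypothetical.

# Per-path-group summary with the RC data currently in the database.
redirect -file rpt/qor_before_extract_rc.rpt { report_qor }

# Force a standalone RC estimation based on the current placement.
extract_rc -estimate

# The same summary again, now using the freshly estimated RC.
redirect -file rpt/qor_after_extract_rc.rpt { report_qor }
```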

Table 3 Timing before and after extract_rc -estimate


Although these numbers are only per-group WNS, they indicate that after extract_rc -estimate the tool obtains reasonable RC delays that better represent the real timing. We also reported the same timing paths in the initial compile database and in the RC-extracted incremental result. In the incremental database, timing after extract_rc was worse than before extract_rc, but it was now broadly similar to the initial compile result. Sometimes the incremental result shows slightly better timing than the initial compile because of compile optimization, and sometimes slightly worse timing because the initial database uses all LVT cells while the incremental database contains some SVT cells to meet leakage or area constraints. Both situations make sense.
The difference between the initial and incremental results is now reasonable and acceptable, and ICC can also maintain and match the DCG result.
In our experiments, we found that the max_capacitance constraint affects the timing correlation between ICC and DCG. As a baseline, the same max_capacitance constraint must be set in both DCG and ICC. However, using the same value is not enough to get good timing and correlation; the value itself must be appropriate, and it depends on the design and the library. In our flow, we reported the worst timing path, found the largest capacitance on that path, and used that number as the max_capacitance constraint. Based on our experiments, we do not recommend a constraint tighter than the maximum capacitance on the WNS path, because a tight capacitance constraint makes the tool insert many inverters or buffers to fix DRC violations, which degrades both WNS and TNS. We compared two synthesis databases: one constrained with the maximum capacitance of the WNS path, and the other with 75% of that value. Table 4 shows the WNS of some critical path groups. It demonstrates that the tighter capacitance constraint makes it harder to achieve good overall timing, because the extra effort spent fixing DRC can add many unnecessary buffers or inverters that degrade timing. An appropriate constraint is therefore vital to synthesis.
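
The procedure for choosing the constraint can be sketched as follows. The capacitance value is hypothetical and must be read from the report of the actual design; the variable name wns_path_max_cap is introduced here only for illustration.

```tcl
# Step 1: report the worst timing path with capacitance shown.
report_timing -capacitance -max_paths 1

# Step 2: take the largest capacitance seen on that path.
# Hypothetical value, in the capacitance unit of the library.
set wns_path_max_cap 0.25

# Step 3: apply it as the design-level constraint. The same value should
# be applied in both DCG and ICC so that the two tools stay consistent.
set_max_capacitance $wns_path_max_cap [current_design]
```

Keeping the value in one variable that both tools source, for example from a shared setup file, is one way to make sure the constraint stays identical in DCG and ICC.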

Table 4 Timing comparison with different maximum capacitance


In ICC, we want to use the placement result as the starting point for incremental placement optimization or clock tree synthesis. In other words, we want the ICC placement result to be a truly usable database, not just a timing reference like the DCG database. We could add power rings/straps and the necessary physical cells, including TAP cells and ESD cells, on top of the SPG placement database, but adding cells after placement is error-prone, and in a UPF-based flow it is especially difficult and takes more effort to fix problems such as multi-voltage violations. We therefore considered adding the power shapes and required physical cells in DCG before the initial compile, that is, using a floorplan that contains all the physical information needed for a real ICC placement. We used the command "extract_physical_constraints -allow_physical_cells physical.def" to read all physical cells from the DEF and then ran the same SPG synthesis flow as described above.
In the DCG layout window for the initial compile result, all physical cells and power shapes appeared as expected, but the initial timing was much worse than in the original synthesis, which used the floorplan without power shapes or physical cells. This should not happen: the original floorplan already had placement blockages where the physical cells were placed, so the available placement area was unchanged and the tool had little reason to compile very differently. Based on our earlier experience, we asked whether the tool had calculated the RC incorrectly. We reported timing after the initial compile, ran extract_rc -estimate to force a standalone RC extraction, and reported timing again. The timing after extract_rc was better than before, which confirmed our suspicion. The capacitance after extract_rc -estimate was also generally lower than in the original timing report, which means the tool had been pessimistic in its delay calculation. We therefore strongly recommend running extract_rc -estimate before reporting timing, to make sure the timing report is reasonable.

3. Conclusions and Recommendations

We used physical guidance in both DCG and ICC by enabling the -spg option for compile_ultra in DCG and for place_opt in ICC. We carried out SPG flow synthesis at both block level and full-chip level, analyzed the timing reports, and evaluated the correlation between the DCG and ICC placement results. Based on these experiments, we recommend the following to get better timing correlation between DCG and ICC:
1. Use exactly the same floorplan and constraints in DCG and ICC. Because the DCG script may add floorplan-related constraints, a sample script is provided to dump the floorplan from DCG so that ICC can easily use the same floorplan.
2. Use the DDC as the starting point instead of a netlist, because DC physical guidance information is not stored in the netlist.
3. Make sure the first placement-related command in ICC is "place_opt -spg"; avoid running other placement-related commands first, such as remove_buffer_tree or create_buffer_tree.
4. Run "extract_rc -estimate" before each report_timing so that DCG re-estimates the RC delay, especially if the design has manual cell swapping, EDT/DFT steps performed with a third-party tool, or power information.
5. Set a reasonable max capacitance constraint for the design; a suggested value is the maximum capacitance on the WNS timing path.
6. Make sure the libraries and the dont_use list are the same in DCG and ICC. A good method is to put the libraries and the variable list in the .synopsys_dc.setup file so that DCG and ICC share the same environment and design setup information.

4. Acknowledgements

I would like to thank my coworker and team leader Hua Tang at the Marvell US site, who contributed a great deal to developing the SPG flow from DCG synthesis to ICC placement. I also want to thank the Synopsys support team in Shanghai, China, especially Info Ge and Johna He, for their continuous support.

5. References

[1] Laurent Besson, How Design-Compiler Graphical helps solving floorplanning issues, SNUG 2010
[2] Synopsys, Design Compiler and IC Compiler Physical Guidance Technology Application Note, Version D-2010.03-SP4, Sep 2010
[3] Synopsys, Design Compiler User Guide
[4] Synopsys, Using Design Compiler Topographical Technology
[5] Synopsys, Design Compiler 2010.03 Update Training, https://solvnet.synopsys.com/retrieve/030101.html?otSearchResultSrc=advSearch&otSearchResultNumber=10&otPageNum=1