Transistor Level STA and SI Analysis with NanoTime on Custom Digital Blocks


Lin Wei, Yao Gang, Ji Bingwu, Zhao Tanfu
HiSilicon Ltd.

wei.lin@huawei.com

Abstract

The design team at HiSilicon evaluated NanoTime as the standard transistor-level timing sign-off (Tx-STA) tool for custom digital blocks. The evaluation was based on an SRAM design in TSMC 65nm technology and focused on the decoding logic, the clock distribution tree, and the input/output data paths. Post-layout results from NanoTime and FastSPICE were compared, and the difference is generally around 5%. NanoTime is also reported to correlate well with HSPICE, with an acceptable margin in path delay. In addition, NanoTime runtime is significantly shorter than that of FastSPICE and HSPICE. For example, one of our test cases took only 6 minutes to finish NanoTime Tx-STA, while FastSPICE ran for 1.5 hours. Furthermore, the crosstalk analysis capability in NanoTime Ultra gives us a worst-case estimate of how badly timing can be degraded under severe noise conditions. Given the design complexity, long runtimes, and heavy workload, it is almost impossible to generate test vectors that cover all cases with a SPICE/FastSPICE solution.

1. Introduction

With process geometries reaching 90 and 65 nanometers, many nanometer effects can impact timing. Accurate analysis of these effects is required to identify real timing issues, especially in custom digital blocks and macros. The traditional verification method for custom digital designs is SPICE/FastSPICE simulation, with a tedious vector generation process and long simulation runtimes. Ever larger RC networks complicate simulation, and incomplete simulation coverage lowers productivity and leads to re-spins. A new discipline is required to keep re-spin rates low.

For certain SRAMs, TCAMs, and blocks with regular shapes and routing, in-house custom design can achieve better timing and area than memory compiler and P&R results. This is particularly true for some of HiSilicon's performance-, power-, and area-hungry network chips. Because these designs are large, both machine runtime and noise effects need to be examined carefully when setting up the design flow and choosing sign-off tools. PathMill was the previous industry-standard transistor-level static timing analysis tool, but it cannot adjust timing for noise effects. Synopsys claims that NanoTime is the next-generation timing analysis tool for transistor-level designs, much as PrimeTime is for standard-cell designs. We also had an immediate need for timing analysis on one of our in-house SRAM test runs. We therefore started evaluating NanoTime/NanoTime Ultra, with the goal of determining whether NanoTime can be integrated into our custom block design flow as the standard transistor-level timing sign-off tool. All result comparisons are based on NanoTime and FastSPICE post-layout outputs in TSMC 65nm technology.

1.1 About NanoTime

As Synopsys announced, NanoTime is the next-generation transistor-level Static Timing Analysis (Tx-STA) solution that addresses the emerging signal integrity (SI) analysis challenges of custom designs to achieve higher silicon accuracy. NanoTime offers concurrent timing and SI analysis, helps ensure silicon-accurate analysis, and delivers overnight results for complex million-transistor designs. It also integrates with Synopsys' PrimeTime, enabling hierarchical SoC full-chip STA that includes both gate-level and transistor-level blocks (Figure 1.1). NanoTime further boosts designer productivity with ease-of-use features, including interactive static timing analysis, save/restore, extracted timing model (ETM) creation with NLDM and CCS timing, Dynamic Clock Simulation (DCS), Dynamic Data Simulation (DDS), and Path Based Skew Analysis (PBSA).

Figure 1.1: NanoTime and PrimeTime in Full-chip STA

1.2 NanoTime Basic Flow

An analysis session of NanoTime consists of a sequence of phases that must be performed in the proper order: netlist, clock propagation and topology recognition, constraint specification, path tracing, and analysis reporting (Figure 1.2). In each phase, data and instructions are provided for NanoTime to use in that phase. NanoTime checks each instruction for correct syntax and usability in the current context of the flow. Each phase ends with a command that reports the successful completion or error condition for that phase.
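The ordering rule can be pictured as a small state machine. The Python sketch below is a toy model only. The phase names paraphrase the text, and the `PhaseTracker` class is hypothetical. It does not represent how NanoTime tracks phases internally. It accepts phases strictly in sequence and rejects any phase that is run out of order.

```python
# Hypothetical toy model of phase ordering; names paraphrase the five phases in the text.
PHASES = [
    "netlist",
    "clock propagation and topology recognition",
    "constraint specification",
    "path tracing",
    "analysis reporting",
]


class PhaseTracker:
    """Accepts analysis phases only in the prescribed order."""

    def __init__(self) -> None:
        self.next_index = 0  # index into PHASES of the next phase allowed to run

    def run(self, phase: str) -> str:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase!r}")
        if self.next_index >= len(PHASES):
            raise RuntimeError("session already complete")
        expected = PHASES[self.next_index]
        if phase != expected:
            raise RuntimeError(f"{phase!r} run out of order; expected {expected!r}")
        self.next_index += 1
        # Each phase ends by reporting its completion status.
        return f"{phase}: completed"
```

For example, calling `PhaseTracker().run("path tracing")` on a fresh session raises an error because the netlist phase has not run yet.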

NanoTime offers a Tcl-based shell interface for entering commands, writing scripts, and viewing results. Tcl shell provides easy customization. It is command-line-driven and runs under the UNIX or Linux operating system. The NanoTime command-line interface is called nt_shell. To start nt_shell, enter this command at the operating system prompt:

% nt_shell

For nanometer model technologies, if the SPICE technology file contains wrapper or macro transistor models, the technology SPICE file must be loaded with the read_spice_model command and also registered as a SPICE netlist with the register_netlist command. For example:

read_spice_model -name ss_85_120 tech_ss.sp

……

set_technology ss_85_120

……

check_topology

……

When read_spice_model is used, the set_technology command must be specified after link_design and before the check_topology stage to associate the technology name with the transistors. The same procedure supports analysis at multiple operating conditions: NanoTime reads multiple SPICE model files to create multiple technology corners, each assigned a name, and the set_technology command selects which one is used.

register_netlist -format spice {design.sp}

read_spice_model -name ss_85_120 tech_ss.sp

read_spice_model -name ff_0_150 tech_ff.sp

read_spice_model -name tt_25_135 tech_tt.sp

……

set_technology ss_85_120

……

NanoTime contains a new netlist engine that does not compile netlists until link_design. Netlists are registered with the register_netlist command.

register_netlist -format spice "cells.sp"

register_netlist -format verilog "test.v"

The extracted parasitics (SPF, SPEF, or DPF) are read in and back-annotated after the link_design stage.

read_parasitics -format spf/spef/dpf

Figure 1.2: NanoTime Analysis Flow Overview

During the evaluation we found that NanoTime does not recognize flip-flop structures by default, but the mark_flip_flop command can be used to specify flip-flops manually, including the setup and hold timing endpoints of the flip-flop structure. The command syntax is:

mark_flip_flop

-master_latch master_object

-slave_latch slave_object

[-name structure_name]

2. Methodology

Our current project is a fully custom SRAM module. It includes row/column address decoder, Read/Write/SenseAmp clock generation circuit, write data path, sense amplifier timing control circuit, read data output path, memory precharge circuit, memory read/write circuits, etc.

Fig 2.1 shows the design flow for our custom blocks. Some portions of the module, such as the sense amplifier and the write pull-down transistors, are not well suited to NanoTime's topology recognition and analysis. FastSPICE is used to simulate these special cells and to generate representative timing models for the NanoTime black-box flow. FastSPICE is also used to simulate the whole custom block with some typical-case input vectors to get a general idea of timing and slopes, and its results serve as the baseline against which NanoTime accuracy is checked. NanoTime serves two purposes in this project. First, the project evaluates its timing accuracy, ease of use, tool stability, and so on. Second, because only a limited number of input vectors can be run in FastSPICE, verification with FastSPICE alone cannot cover all corner cases; NanoTime is used as a fast supplementary tool that provides full coverage of path timing and noise impact. Regarding accuracy, our schedule was very tight, so NanoTime results are compared against FastSPICE rather than against HSPICE as the golden reference.

Fig 2.1: Design Flow Description

3. Procedure

3.1.1 General Description

We did not use NanoTime during the pre-layout design phase of this test case; all block design was based on SPICE and FastSPICE, simply because NanoTime was not yet available. Once the design flow is in place, we will use NanoTime for a sanity check on the pre-layout schematic to make sure there are no obvious timing problems before layout starts.

After the schematic and layout were complete, we used StarRCXT for RC extraction. Fast-corner and slow-corner parasitics were extracted, and the same parasitic data were back-annotated into both the FastSPICE and NanoTime netlists for post-layout simulation.

The netlist used by NanoTime is a SPICE netlist with the same process models. The parasitics are in SPF format, and both resistance and coupling capacitance are included in the SPF file.

We performed Tx-STA and SI analysis with back-annotated parasitics using a black-box flow. The black-boxed cells are represented by manually generated timing models within the NanoTime hierarchical analysis flow to ensure accurate results.

3.1.2 Hierarchical Tx-STA Analysis with Black Box and Representative Timing Model

Because our design is an SRAM module, it contains quite a few special analog cells that NanoTime cannot recognize well, and a black box must be created for each of these special sub-circuits. We black-box these analog sub-circuits and use SPICE/FastSPICE to generate their timing arcs in a representative .lib file, so that the tool achieves accurate timing with reasonable runtime.

 

Figure 3.1 NanoTime Hierarchical Analysis Flow

A large chip design can be analyzed in hierarchical stages by using timing models to represent the lower-level blocks. Timing models reduce total analysis time by breaking a large task into smaller, reusable units. The extract_model command generates a timing model from the netlist representation of a block. Figure 3.1 shows the process of generating a timing model in NanoTime. A block is read in as a subdesign; we then set the timing constraints and perform a conventional timing analysis of the block. If there are no violations, we reset the analysis and run it again, this time using extract_model to produce the timing model in .lib and Synopsys database (.db) formats. The generated model has the same timing behavior as the original netlist. For higher-level analysis, the block's original netlist is replaced with this model, which can be used in NanoTime and also in other tools such as PrimeTime.

The extracted timing model will contain lookup tables to determine the model behavior as a function of input transition times and output loads. A set of index values specifies the range and the intervals between parameter values used for characterization. Using more index values gives better model accuracy at the cost of more runtime. To set the input transition time index values:

nt_shell> set_model_input_transition_indexes \
    -nominal 0.1 {0.05 0.1 0.15 0.2} \
    {ck1 data1 data2}
1

This command causes the extracted model to use the values 0.05, 0.1, 0.15, and 0.2 as the input transition time index values. When the model is used, the timing analysis tool will use interpolation for transition times between these values or extrapolation for transition times outside this range of values. To set the output load index values:

nt_shell> set_model_load_indexes {.05 .10 .15} \
    {out1 out2}
1

This command causes the extracted model to use the values .05, .10, and .15 as the output load index values. The timing analysis tool will use interpolation for actual output loads between these values or extrapolation for output loads outside this range of values.
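The interpolation and extrapolation behavior can be illustrated with a two-dimensional (input transition by output load) table lookup. The Python sketch below uses standard bilinear interpolation, with the edge segment reused for extrapolation outside the index range. Reusing the edge segment is a common convention for NLDM-style tables, and we assume rather than know that NanoTime or PrimeTime work exactly this way. The index values match the commands above. The table contents are made up for illustration.

```python
from bisect import bisect_left


def _segment(axis, x):
    """Index i such that axis[i], axis[i+1] bracket x; the edge segment is used outside the range."""
    i = bisect_left(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def lookup(tr_axis, load_axis, table, tr, load):
    """Bilinear interpolation (linear extrapolation beyond the indexes) in table[tr_index][load_index]."""
    i = _segment(tr_axis, tr)
    j = _segment(load_axis, load)
    u = (tr - tr_axis[i]) / (tr_axis[i + 1] - tr_axis[i])
    w = (load - load_axis[j]) / (load_axis[j + 1] - load_axis[j])
    return ((1 - u) * (1 - w) * table[i][j] + u * (1 - w) * table[i + 1][j]
            + (1 - u) * w * table[i][j + 1] + u * w * table[i + 1][j + 1])


# Index values from the commands above; delay values are hypothetical.
TR = [0.05, 0.1, 0.15, 0.2]
LOAD = [0.05, 0.10, 0.15]
TABLE = [[0.1 + 0.5 * t + 2.0 * c for c in LOAD] for t in TR]
```

With this table, `lookup(TR, LOAD, TABLE, 0.12, 0.07)` interpolates inside the grid, and a query such as `(0.3, 0.2)` falls outside both index ranges and is extrapolated. Adding index points narrows each segment and reduces the error of this linear approximation. This is the accuracy versus runtime trade-off described above.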

The procedure for generating a black box for a special cell is about the same as the standard hierarchical analysis flow. The special analog sub-block is processed just like a normal module in NanoTime. The only difference comes at the end of the script, where the sub-block's timing .lib and .db files are extracted with the NanoTime ETM capability.

nt_shell > extract_model -name Block1

After executing this line, the timing lib and db files will be generated for the special analog sub-block. The extract_model command performs path tracing to determine the timing behavior of the block, and then generates the timing model. It writes out the following files:

• Block1.lib: The library block in Liberty source format

• Block1_lib.db: The library block in .db format

• Block1.sdc: A constraint script in Synopsys Design Constraints format that is to be run when the block is used

In our case, we then use FastSPICE to characterize each timing arc listed in Block1.lib, manually edit the measured values into the .lib file generated by extract_model, and replace the original values.
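Manual editing is error-prone when there are many arcs. A small script that formats the FastSPICE measurements as a Liberty `values(...)` attribute can help. The Python sketch below uses a hypothetical helper `liberty_values`. It assumes the rows follow the table template's index_1 and the columns follow its index_2. Check that assumption against the lu_table_template in the generated .lib. The numbers are placeholders, not measured data.

```python
def liberty_values(rows):
    """Format a 2-D table as a Liberty values(...) attribute, one quoted row per line.

    Assumes rows correspond to index_1 and columns to index_2 of the
    table template used in the generated .lib file.
    """
    quoted_rows = ['"' + ", ".join(f"{v:g}" for v in row) + '"' for row in rows]
    # Liberty allows a backslash-newline continuation between quoted rows.
    return "values ( " + ", \\\n         ".join(quoted_rows) + " );"


# Placeholder delays (ns) standing in for FastSPICE measurements.
print(liberty_values([[0.1, 0.2], [0.3, 0.4]]))
```

The printed text can be pasted over the corresponding `values` attribute of a timing arc before the .db is regenerated.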

We then read in the .lib and generate the corresponding Block1.db for higher-level STA. To use a timing model in place of a netlist, we add the model to the link_path variable and set one of the link_prefer_model variables. The link_prefer_model variable causes NanoTime to use the Block1 timing model in place of the Block1 SPICE subcircuit, using name-based port mapping: NanoTime replaces any subcircuit named Block1 with the timing model and looks for port names that match those used in the subcircuit reference. The following lines read in the subcell .db generated in the previous steps of our design.

nt_shell > set link_path {* subcell.db}

nt_shell > set link_prefer_model_port {subcell0 subcell1}

nt_shell > read_library subcell.db

Thus, the black-boxed special cells are replaced by their accurate timing models, and NanoTime can use these subcell models to perform full-chip STA.

3.1.3 SI Crosstalk Analysis

For designs at 90nm and below, noise is a serious issue. It is practically impossible to use HSPICE or any other simulator to check noise impact thoroughly for a very large custom digital design. NanoTime handles this well. Its noise analysis capability, together with full timing coverage of all potential paths, is one of the main reasons for including NanoTime in our standard custom design flow.

NanoTime Ultra offers crosstalk analysis, also known as signal integrity (SI) analysis. It analyzes the effects of crosstalk on transition arrival times at victim nets, the resulting changes in timing slack, and the effects of crosstalk noise on steady-state nets. When this option is enabled, NanoTime calculates the changes in transition arrival times on victim nets by taking into account the capacitive coupling between aggressor and victim nets, the aggressor and victim drive strengths, and the transition arrival time windows. It reports the resulting changes in path slack.

By default, noise analysis is not included in the NanoTime flow, so it must be enabled first:

nt_shell > set si_enable_analysis true

Then the following script is used for SI analysis

To start SI analysis, the coupling capacitance must be preserved, and then the SI parameters are set. NanoTime uses an iterative flow to calculate the timing impact. It starts with the most pessimistic first iteration, which assumes that all aggressors switch at the same time and ignores the possibility that they switch at different times, which would make the impact much smaller. Each subsequent iteration incorporates the timing changes computed in the previous one. Whenever the change in timing impact between two iterations falls within a set margin, NanoTime drops that net and continues with only those nets whose change exceeds the margin. The number of nets under analysis thus shrinks until it reaches zero or the user-defined iteration limit is reached. In our case, we examined the results and found that two iterations were sufficient, since the evaluated design is not very large.
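The iterative flow can be illustrated with a toy model. The Python sketch below is a simplified illustration of the idea described above and is not NanoTime's algorithm. Each victim's switching window is extended by its current delta-delay estimate, and only aggressors whose windows overlap it contribute. A net is dropped once its estimate changes by no more than a tolerance. The `Victim` and `Aggressor` types, the tolerance, and all window and delta values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Aggressor:
    window: tuple[float, float]  # earliest/latest switching time (ps)
    delta_ps: float              # victim delay push-out if this aggressor switches in the window


@dataclass
class Victim:
    window: tuple[float, float]  # victim switching window without SI (ps)
    aggressors: list[Aggressor] = field(default_factory=list)


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def iterate_si(victims, tol_ps=0.5, max_iter=5):
    # Iteration 1 (most pessimistic): every aggressor is assumed to switch with the victim.
    delta = {name: sum(a.delta_ps for a in v.aggressors) for name, v in victims.items()}
    active = set(victims)
    iteration = 1
    while active and iteration < max_iter:
        iteration += 1
        new_delta = {}
        for name in active:
            v = victims[name]
            # Victim window extended by its current delta-delay estimate.
            window = (v.window[0], v.window[1] + delta[name])
            new_delta[name] = sum(a.delta_ps for a in v.aggressors
                                  if overlaps(a.window, window))
        # Drop nets whose estimate changed by no more than the tolerance.
        converged = {n for n in active if abs(new_delta[n] - delta[n]) <= tol_ps}
        delta.update(new_delta)
        active -= converged
    return delta, iteration
```

A victim whose second aggressor switches far outside its window starts with the pessimistic sum of both deltas in iteration 1 and then settles on the contribution of the overlapping aggressor alone. This mirrors how the pessimism shrinks from one pass to the next.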

We asked the tool to report delta delays before and after SI analysis. The largest difference, around 7ps, occurs on a 300um clock line driven from the middle.

The numbers above are an example from the timing report: net A_LINE9 shows a 2ps noise-induced delta, while NET079 shows 6ps. For larger designs, the noise impact can become quite significant.

In our case, SI analysis did not increase machine runtime much; the total runtime grew by only about 3 minutes. This valuable information therefore comes at almost no cost.

3.1.4 Procedure Summary

During the process described above, we observed several strengths and weaknesses of NanoTime, which we summarize here based on our limited experience.

Pros:

The following items are the main reasons we consider NanoTime a good next-generation transistor-level timing analysis tool.

• The turn-around time (TAT) of NanoTime timing analysis is quite short. FastSPICE takes about 2 hours to simulate 20 read or write cycles with reasonable accuracy settings. NanoTime needs only 6 minutes to finish the analysis, find the worst-case condition, and generate MAX and MIN timing reports without noise. With noise analysis enabled, a 3-pass noise analysis adds only 1.5 minutes.

• NanoTime is easy to use. In our experience, an experienced designer who already knows PathMill and Synopsys PrimeTime but is new to NanoTime needs less than a week to read the manuals, write the first-cut scripts, and finish the first trial run. NanoTime commands are quite similar to PrimeTime commands, and the concepts behind the tool are similar to those of PathMill.

• PathMill takes considerable effort to recognize circuit topology, and NanoTime does a noticeably better job. Users generally need to do little before NanoTime identifies the circuit structure and produces a correct timing analysis. In our case, the only manual topology work was to identify one set of flip-flops; NanoTime recognized most logic structures automatically.

• NanoTime results are within an acceptable margin of FastSPICE results. Detailed comparisons are given in the next section.

• NanoTime offers many options for reporting delay, slope, and timing information, covering essentially every aspect of timing analysis a designer may be interested in.

• The noise analysis capability is another strength of the tool. It gives users good, accurate information on how much worse timing can become under coupling effects. This is increasingly useful as more designs move into the deep sub-micron regime.

Cons:

During the evaluation period, we found some areas that may need the developers' attention. However, these are individual issues that we believe Synopsys can fix in upcoming releases. For example:

• The tool's automatic topology recognition identifies most latches and clock gaters. However, when a net is both a latch net and a clock-gater net, the tool appears to get confused and produces unexpected timing results. We are investigating this further with Synopsys.

4. Timing Accuracy Comparison

For the accuracy analysis, we only compared post-layout timing between NanoTime and a FastSPICE tool; we have not yet had the bandwidth for a more detailed comparison against HSPICE. The parasitic extraction tool is StarRCXT. The generated SPF file is easily back-annotated into NanoTime by adding a "read_parasitics …" line to the Tcl script. Back-annotation completed without errors in FastSPICE, and also in NanoTime when SI analysis was not enabled. Table 4.1 lists typical delay numbers for some nodes as examples. Some FastSPICE parameters were set to trade accuracy for speed. This makes FastSPICE much faster than full SPICE simulation but introduces small differences, so our FastSPICE reference deviates slightly from a full-accuracy simulation. The differences below are therefore the combined effect of FastSPICE and NanoTime.

Table 4.1: FastSPICE vs. NanoTime post-layout results (path: input clock to write clock; percentages are differences relative to FastSPICE)

Metric                      FastSPICE   NanoTime with SI   Diff. with SI   NanoTime without SI   Diff. without SI
Case 1: path delay          0.364ns     0.383ns            5.2%            0.385ns               5.6%
Case 1: rise transition*    0.077ns     0.080ns            3.9%            0.080ns               3.9%
Case 2: path delay          0.57ns      0.609ns            6.8%            0.58ns                1.7%
Case 2: rise transition*    0.104ns     0.094ns            9.6%            0.096ns               7.7%

* Rise transition measured between the 20% and 80% points of the FastSPICE waveform.
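The percentage columns can be reproduced with a one-line relative-difference formula. The Python sketch below assumes each percentage is the absolute NanoTime-FastSPICE difference divided by the FastSPICE value and rounded to 0.1%. The helper name `pct_diff` is ours.

```python
def pct_diff(reference, value):
    """|value - reference| / reference, in percent, rounded to 0.1%."""
    return round(abs(value - reference) / reference * 100, 1)


# FastSPICE is the reference; NanoTime is the value under test.
case1_delay_with_si = pct_diff(0.364, 0.383)  # Case 1 path delay, with SI
case2_slope_with_si = pct_diff(0.104, 0.094)  # Case 2 rise transition, with SI
```

The absolute value is used because NanoTime can be either slower (Case 1 delay) or faster (Case 2 transition) than the reference.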

We believe a major contributor to the discrepancy is how the tools handle coupling capacitance. When noise analysis is not enabled, NanoTime splits each coupling capacitor between two wires into capacitors to ground, even though the wires may in fact switch against each other and produce a different result. By default, each of the two new grounded capacitors has the same value as the original coupling capacitor, as shown in Figure 4.1. In the FastSPICE simulation, by contrast, all coupling capacitors are kept, and their effect depends on the input vectors.
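Why the grounded approximation matters can be seen with the textbook Miller-factor model. Under this model, a coupling capacitor Cc is replaced by a grounded capacitor of k·Cc. Here k = 1 corresponds to a quiet neighbor, which is the default split described above. k = 0 corresponds to a neighbor switching in the same direction, and k = 2 to one switching in the opposite direction. The Python sketch below applies this to a lumped RC stage with a 50% delay of ln(2)·R·C. It is a simplified illustration and not NanoTime's internal calculation. The resistance and capacitance values are hypothetical.

```python
import math


def grounded_caps(c_couple_ff, miller_factor=1.0):
    """Replace a coupling cap by one grounded cap on each net (victim side, aggressor side)."""
    c = miller_factor * c_couple_ff
    return c, c


def rc_delay_ps(r_ohm, c_ground_ff, c_couple_ff, miller_factor=1.0):
    """50% step-response delay ln(2)*R*C of a lumped RC stage with the coupling cap grounded."""
    c_victim, _ = grounded_caps(c_couple_ff, miller_factor)
    c_total_farad = (c_ground_ff + c_victim) * 1e-15
    return math.log(2) * r_ohm * c_total_farad * 1e12


# Hypothetical stage: 1 kOhm driver, 10 fF to ground, 5 fF coupling.
quiet = rc_delay_ps(1000.0, 10.0, 5.0, miller_factor=1.0)     # default split
same_dir = rc_delay_ps(1000.0, 10.0, 5.0, miller_factor=0.0)  # neighbor switches with victim
opposite = rc_delay_ps(1000.0, 10.0, 5.0, miller_factor=2.0)  # neighbor switches against victim
```

A vector-driven simulation effectively samples one of these cases per transition, while the default grounded split always uses k = 1. The two approaches can therefore disagree in either direction.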

 

Figure 4.1: NanoTime Split Coupling Capacitor to Ground by Default

In the second case, the slope discrepancy is fairly large because of the measurement points. The load on the last stage is very large, so the FastSPICE waveform slows down considerably above 0.7 VDD. If we measure between the 20% and 70% points instead, the slope becomes 0.093ns, very close to the NanoTime result.
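The sensitivity to the measurement points is easy to reproduce on a sampled waveform. The Python sketch below finds threshold crossings of a rising waveform by linear interpolation between samples. It then measures the transition between two fractions of VDD. The sample waveform is hypothetical. It rises quickly to 0.7 VDD and then has a slow tail, like a heavily loaded output.

```python
def crossing_time(t, v, level):
    """First time a rising sampled waveform reaches `level`, by linear interpolation."""
    for i in range(1, len(t)):
        if v[i - 1] < level <= v[i]:
            frac = (level - v[i - 1]) / (v[i] - v[i - 1])
            return t[i - 1] + frac * (t[i] - t[i - 1])
    raise ValueError(f"waveform never crosses {level}")


def rise_transition(t, v, vdd, lo=0.2, hi=0.8):
    """Rise time between the lo*vdd and hi*vdd crossing points."""
    return crossing_time(t, v, hi * vdd) - crossing_time(t, v, lo * vdd)


# Hypothetical waveform (time in arbitrary units, voltage normalized to VDD = 1.0):
# fast rise to 0.7, then a slow tail.
T = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
V = [0.0, 0.35, 0.7, 0.75, 0.8, 0.85]
slope_20_80 = rise_transition(T, V, 1.0)          # includes the slow tail
slope_20_70 = rise_transition(T, V, 1.0, hi=0.7)  # stops before the tail
```

Here the 20%-80% measurement is more than twice the 20%-70% one, purely because of the tail above 0.7 VDD.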

Overall, NanoTime timing results are within about 5% of FastSPICE. For nodes with large loads and degraded slopes, the transition values should be used only as a reference.

5. Conclusions and Recommendations

The timing accuracy of NanoTime is acceptable. Its noise analysis capability is a big plus, giving designers a comprehensive understanding of how much impact noise can have. The tool is fairly easy to use, and its machine runtime is short. Compared with PathMill, its major advantage is noise analysis. In addition, PathMill uses quite a different configuration file, whereas NanoTime feels like part of the Synopsys tool family and shares many commands with other Synopsys tools.

On technical grounds, we recommend NanoTime for future custom digital block and macro design, based on the following considerations:

• In our assessment, it is currently the best tool on the market for fast transistor-level timing analysis

• It has good noise analysis capability, which gives a reliable estimate of how large the noise impact can be

• Fast throughput and ease of use

• Acceptable timing accuracy

• Full coverage of worst case paths

• Topology recognition capability is among the best within similar tools

• In the future, a system may contain portions of transistor level custom blocks. We can use NanoTime to generate their corresponding libs, and then provide them to higher level PrimeTime analysis.

6. Acknowledgements

Many people from HiSilicon and Synopsys contributed to this work during the roughly 2.5-month evaluation period. At HiSilicon, we would like to thank Lei Ting for his continuous support of this activity and Liang Wei for her IT support. At Synopsys, we would like to thank Jing Tao for his help with license availability.