Static Timing Verification for Complex SoC Design – Application of “Internal Path” Approach
Ahaneku Ogu, Qu Fu Yang
Infineon Technologies
ABSTRACT
Various methodologies for the timing analysis of DDR interfaces have been presented in SNUG literature. The “Internal path” approach presented by Fu Hui in his SNUG Singapore 2002 paper, Static Timing Verification for Complex SoC Design-Part I, is one of the most elegant and intuitive of these: it uses STAMP models of the external memory devices to simplify the DDR interface timing analysis. Although DDR design is becoming increasingly common, each implementation presents its own unique challenges. In this paper we describe the application of the “Internal path” approach to the timing closure of the 156 MHz DDR interface of a 90 nm EGPRS baseband chip.
1.0 Introduction
Faced with the challenge of performing timing closure on a DDR controller for an EGPRS chip, we searched the literature for existing STA methodologies. We came across several, such as the latch approach and the virtual clock approach [1].
https://solvnet.synopsys.com/wwwauth/wwwauth:8000/news/pubs/sjsnug/sj02
However, the most promising approach for post-layout timing analysis that we came across is the internal path methodology presented by Fu Hui in his SNUG Singapore 2002 paper, “Static Timing Verification for Complex SoC Design-Part I” [2].
https://solvnet.synopsys.com/wwwauth/wwwauth:8000/news/pubs/snug/singapore2002/papers/8_STA_Infineon.pdf
The internal path methodology is also described in another SNUG paper [3] by Robin W L Ko, where it is called the System level STA (SSTA) methodology.
https://solvnet.synopsys.com/wwwauth/wwwauth:8000/news/pubs/sjsnug/sj04/ko_paper.pdf
The main drawback of the “Internal path” methodology is that it is not possible to use the same constraints through the complete flow. The constraints used for post-layout STA of the interface are different from the constraints used for synthesis and layout.
In this paper we outline how we applied the internal path methodology to the STA of the DDR controller of our EGPRS baseband chip. The paper also covers the challenges posed by the unique architecture of our DDR implementation.
The internal path methodology required some minor adaptations. Techniques were borrowed from other SNUG technical papers. All points of difference from the original methodology will be clearly highlighted in the paper.
2.0 “Internal Path” Methodology
We shall start with a brief description of the “Internal Path” or SSTA methodology. The internal path methodology avoids the need to derive constraints for the DDR controller interface from the timing requirements of the memory devices. This is achieved by creating a timing test harness for the DUTA (design under timing analysis, which in our case is the DDR controller module). Within the test harness the DUTA is connected to timing models of the memory devices. This arrangement makes the paths between the controller and the memory devices internal. These internal paths are treated like register-to-register paths by PrimeTime, which removes the need for I/O constraints at the interface of the controller.
PrimeTime is used to compile a description of the memory device timing in the Stamp modeling language into timing models for use in the test harness. See [5] to find out more about the Stamp modeling language.
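As a rough sketch, the compilation step might look like the following in pt_shell. It assumes the compile_stamp_model command documented in the PrimeTime Modeling User Guide of that period and the RAM.mod and RAM.da file names used in the appendix. The output name and the link_path handling are illustrative assumptions, not the authors' actual script.

```tcl
# Compile the Stamp description of the memory device into a timing model.
# Assumes RAM.mod and RAM.da are in the current working directory.
compile_stamp_model -model_file RAM.mod -data_file RAM.da -output RAM

# Assumption: the compiled library is written as RAM_lib.db. Adding it to
# the link path lets the RAM instance in the test harness resolve to it.
lappend link_path RAM_lib.db
```

The library cell name RAM_lib/RAM that results from such a compile is consistent with the name used in the derating commands of section 4.1.2.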
In our application of the internal path methodology Stamp models were not used to model the PCB track delays. We decided to annotate aggregate loads on the controller I/Os to account for the PCB track delays and the package parasitics.
For a detailed description of the internal path methodology, please refer to paper [2] or [3].
Figure 1 “Internal Path Approach” model
3.0 DDR Controller Architecture
Table 1 Clock Description
In this section we cover only the unique architectural features of our DDR Controller that have a bearing on the timing analysis.
Figure 2 DDR Controller Architecture
3.1 Generation of CK and CK_N
The internal Controller clock (IN_CK) is a pulse-swallowed clock, which is unsuitable for the DDR interface. A 50:50 duty cycle clock is generated within the Controller by the circuitry shown in the figure below. A delay line (dcc_dll_delayline) is used to phase shift the pulse-swallowed clock by 180°. The clock circuit combines the original pulse-swallowed clock IN_CK and the phase-shifted clock to generate a 50:50 duty cycle clock (see the circuits and waveform diagram below).
Figure 3 CK Generation
Figure 4 CK_N Generation
Figure 5 Waveform For Generation of 1:1 Duty Cycle Clock (CK)
3.2 Generation of DQS
The generation of the DQS write strobe is similar to that of the CK/CK_N differential clock.
Figure 6 DQS generation
3.3 Out Data Generation
In order to match the clock and data paths, the 2n-prefetch architecture is implemented in a unique way.
The data is encoded across the two n-bit words fed into the FFs OUT_FFP and OUT_FFN in such a way that the XOR of the FF outputs results in the two required data words per clock cycle. The first encoded n-bit data word is clocked into the OUT_FFP FF by the clock IN_CK and the second encoded data word is clocked into the OUT_FFN FF by the 180° phase-shifted IN_CK clock.
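The paper does not spell out the encoding, so the following pure-Tcl sketch shows only one encoding that has the stated property. It assumes that OUT_FFP updates at the start of each cycle, OUT_FFN updates half a cycle later, both start at zero, and the output is the bitwise XOR of the two FFs. The procedure names are hypothetical.

```tcl
# Encode data words so that P xor N reproduces them in transmit order.
# 'data' must hold an even number of words: {d0 d1 d0 d1 ...}.
proc encode_xor_stream {data} {
    set p 0
    set n 0
    set encoded {}
    foreach {d0 d1} $data {
        # First half cycle: output is p_new xor n_old, which must equal d0.
        set p [expr {$d0 ^ $n}]
        # Second half cycle: output is p_new xor n_new, which must equal d1.
        set n [expr {$d1 ^ $p}]
        lappend encoded $p $n
    }
    return $encoded
}

# Replay the encoded words through the two FFs and the XOR gate.
proc replay_xor_stream {encoded} {
    set p 0
    set n 0
    set out {}
    foreach {p_new n_new} $encoded {
        set p $p_new
        lappend out [expr {$p ^ $n}]
        set n $n_new
        lappend out [expr {$p ^ $n}]
    }
    return $out
}

set data {0xA5 0x3C 0xFF 0x00}
puts [replay_xor_stream [encode_xor_stream $data]]
# prints: 165 60 255 0  (the original words, in decimal)
```

Because each FF changes only once per cycle, both FFs run at the single data rate of IN_CK while the XOR output changes twice per cycle.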
The delay lines wr_dll_delayline and wr_n_dll_delayline are used to adjust the data in relation to the DQS write strobe so that DQS is centre aligned to the data. This is achieved by phase shifting the clock to the data registers by a quarter of a clock period.
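The delay that each phase shift corresponds to follows directly from the clock period. The short Tcl sketch below works it out for the 6.41 ns period used in the clock definitions later in the paper; the procedure name is hypothetical.

```tcl
# Convert a phase shift in degrees to a delay in ns for a given period.
proc phase_delay_ns {period_ns degrees} {
    return [expr {$period_ns * $degrees / 360.0}]
}

set period_ns 6.41
puts [format "180 degrees: %.4f ns" [phase_delay_ns $period_ns 180]]
puts [format " 90 degrees: %.4f ns" [phase_delay_ns $period_ns 90]]
# 180 degrees is 3.2050 ns (half period), 90 degrees is 1.6025 ns (quarter period)
```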
Figure 7 DQ generation
Figure 8 Waveform Data Transfer Transaction
3.4 Read Path
This part of the DDR controller is relatively straightforward. The read data is latched into a FIFO by the DQS read strobe which is centre aligned to the data by the delay line, rd_dll_delayline.
The read data is asynchronously transferred from the DQS read strobe (DQS_IN) clock domain to the internal controller clock domain via the FIFO.
4.0 Component Modeling
In this section we cover the modeling of critical elements of the timing environment.
4.1 Memory Devices
A central feature of the “Internal Path” methodology is the modeling of memory devices. As in paper [2], Stamp models were used to model the external memory devices.
All the Stamp models were generated from the memory timing requirements specified in the memory datasheets. Stamp models were not available from the vendor for the memory devices supported by our DDR controller.
4.1.1 DDR timing parameters
It is noted by R. Ko in his paper [3] that not all the relevant DDR timing parameters were modeled in the paper published by Fu Hui [2]. These parameters are:
tDQSQ
tDSS
tDSH
We specified these DDR timing parameters in our Stamp model description. See the appendix for a complete description of the Stamp model from which the memory timing models were derived.
4.1.2 OCV treatment of memory timing models
The use of on-chip variation (OCV) was an STA sign-off criterion for our 90 nm CMOS process. It is important to ensure that OCV is not applied to the memory timing models. This can be achieved by the following PrimeTime commands, which were specified after the global OCV derate commands:
set_timing_derate -cell_delay -late 1.00 [get_lib_cells "RAM_lib/RAM"]
set_timing_derate -cell_delay -early 1.00 [get_lib_cells "RAM_lib/RAM"]
4.1.3 Modeling limitations
As in papers [2] and [3], we did not model the memory setup and hold checks as a function of the input transition. The derating factors specified in the datasheets were deemed too pessimistic.
Nonetheless, ignoring the input transition introduces a modeling error.
4.2 Delay Lines
The delay lines of the DLL were built from discrete inverter and multiplexor library cells.
The pulse width requirement of the DDR memory devices makes the asymmetric rise and fall times through the delay lines a concern. Therefore, unlike in paper [2], it was decided not to use a Quick Timing model for the delay lines, as the actual rise and fall times through the delay lines were of interest. See the Miscellaneous timing checks section.
Case analysis is used to set each delay line, including the master delay line, to its required tap value. These values are worked out before the STA testcases are run.
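Setting a delay line tap with case analysis might look like the sketch below. The select pin name TAP_SEL, the binary coding of the tap value, the bus width and the tap number are all assumptions made for illustration; only the set_case_analysis command and the delay line instance name come from PrimeTime and this paper respectively.

```tcl
# Hypothetical helper: drive a binary tap code onto a delay line's select pins.
# The pin name TAP_SEL and the binary coding are assumptions for illustration.
proc set_delayline_tap {inst width tap} {
    for {set bit 0} {$bit < $width} {incr bit} {
        set value [expr {($tap >> $bit) & 1}]
        # Brackets are escaped so Tcl does not treat [n] as a command.
        set_case_analysis $value [get_pins "$inst/TAP_SEL\[$bit\]"]
    }
}

# Example: tap 12 on a 6-bit select bus of the DCC delay line (made-up values).
set_delayline_tap ASIC_int/dcc_dll_delayline 6 12
```

With the select pins held constant, PrimeTime propagates only the path through the chosen tap, so the real rise and fall delays of that tap are analyzed.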
4.2.1 OCV treatment of delay lines
The OCV treatment of the delay lines was an issue that we were unable to solve to our satisfaction. The global derating was applied to all the delay lines.
4.3 External Loads
The external loads of the DDR controller were modeled as lumped capacitances which were annotated onto the output ports. The lumped capacitance represented the following:
- PCB track load
- Loading of the external memory devices
- Package load (bump and ball routing)
For example:
set_load 8.241837 [get_nets {pd_ad_s2[28]}]
set_load 7.931107 [get_nets {pd_ad_s2[29]}]
set_load 8.81701 [get_nets {pd_ad_s2[30]}]
set_load 8.96805 [get_nets {pd_ad_s2[31]}]
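The way one such lumped value is assembled from its three components can be sketched as follows. The component values are placeholders chosen for illustration and are not taken from the design; the units are those of the library capacitance unit.

```tcl
# Placeholder component capacitances (not real design data).
set pcb_track_cap 3.0   ;# PCB track load
set memory_cap    4.0   ;# input loading of the external memory devices
set package_cap   1.2   ;# bump and ball routing

# Lumped load annotated on the net driving the output port.
set lumped_cap [expr {$pcb_track_cap + $memory_cap + $package_cap}]
set_load $lumped_cap [get_nets {pd_ad_s2[28]}]
```

The net name is written in braces so that Tcl does not attempt command substitution on the bus index.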
5.0 Timing Analysis of Read Transaction
The read and write transactions are analyzed in separate STA runs.
The waveform below shows the timing relationship between DQ and DQS during the read operation.
Figure 9 Basic READ Timing Parameters for DQs
The DQS read strobe is phase shifted by 90° by the delay line rd_dll_delayline in the DDR controller, to centre align it with the data. The phase-shifted DQS read strobe is used to latch data into the FIFO.
Clock definitions
# DDR clocks
create_generated_clock -name "ddrclk" -source [get_port IN_CK] \
-multiply_by 1 -invert -duty_cycle 50 [get_pins "ASIC_int/CK_PAD/DQ"]
create_generated_clock -name "ddrclkn" -source [get_port IN_CK] \
-multiply_by 1 -duty_cycle 50 [get_pins "ASIC_int/CK_N_PAD/DQ"]
# DQS Clocks
create_clock -name "dqs_in" -period $CLK_PERIOD \
-waveform {0.0 3.205} [get_pins "ASIC_inst/DQS_PAD/PAD"]
create_generated_clock -name "fifo_clk_a_o" \
-source [get_pins "ASIC_inst/DQS_PAD/PAD"] \
-multiply_by 1 -duty_cycle 50 [get_pins "ASIC_inst/rd_dll_delayline/OUT"]
create_generated_clock -name "fifo_clk_b_o" \
-source [get_pins "ASIC_inst/DQS_PAD/PAD"] \
-multiply_by 1 -invert -duty_cycle 50 [get_pins "ASIC_inst/Inverter/Z"]
set_propagated_clock [list [get_clocks *] [get_generated_clocks *]]
The generated clocks, fifo_clk_a_o and fifo_clk_b_o are defined to make the reports more readable.
Figure 10 DDR Read Waveform
In addition to the clock definitions above, false path and multicycle path timing exceptions are required to ensure that the correct timing checks are performed.
False paths timing constraints
The false path timing exceptions are required to ensure that the data checks are performed at the correct clock edges (see DDR read waveform above).
set_false_path -setup -rise_from dqs_in -fall_to fifo_clk_a_o
set_false_path -setup -fall_from dqs_in -rise_to fifo_clk_a_o
set_false_path -setup -rise_from dqs_in -rise_to fifo_clk_b_o
set_false_path -setup -fall_from dqs_in -fall_to fifo_clk_b_o
set_false_path -hold -rise_from dqs_in -rise_to fifo_clk_a_o
set_false_path -hold -fall_from dqs_in -fall_to fifo_clk_a_o
set_false_path -hold -rise_from dqs_in -fall_to fifo_clk_b_o
set_false_path -hold -fall_from dqs_in -rise_to fifo_clk_b_o
Multi cycle path timing constraints
In order to prevent PrimeTime from performing a setup check between edge 1 of DQS and edge 3 of fifo_clk_a_o, a multicycle path timing exception is used. The command set_multicycle_path ensures that the setup and hold checks are performed on the correct edges of fifo_clk_a_o and fifo_clk_b_o.
set_multicycle_path -setup -from dqs_in -to fifo_clk_a_o 0
set_multicycle_path -hold -from dqs_in -to fifo_clk_a_o -1
set_multicycle_path -setup -from dqs_in -to fifo_clk_b_o 0
set_multicycle_path -hold -from dqs_in -to fifo_clk_b_o -1
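Once the exceptions are in place, the launch and capture edges that PrimeTime actually uses can be inspected per clock pair. The loop below is a minimal sketch of such a check using report_timing; it is not the authors' script, and the choice of options is an assumption.

```tcl
# Report the worst setup (max) and hold (min) path from the DQS strobe
# into each FIFO capture clock, to confirm which edges are being checked.
foreach capture_clk {fifo_clk_a_o fifo_clk_b_o} {
    foreach delay_type {max min} {
        report_timing -delay_type $delay_type \
            -from [get_clocks dqs_in] \
            -to [get_clocks $capture_clk]
    }
}
```

In each report, the edge named on the launch clock line and on the capture clock line shows whether the intended edge pair is being checked.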
Timing report:
Clock domain : fifo_clk_a_o
Setup check :
Startpoint: ASIC_int/DQS_PAD/PAD
(clock source 'dqs_in')
Endpoint: ASIC_int/FIFO/ff_pos_reg
(rising edge-triggered flip-flop clocked by fifo_clk_a_o)
Path Group: IO
Path Type: max
Point Incr Path
————————————————————————————————————
clock dqs_in (rise edge) 0.00 0.00
clock source latency 0.00 0.00
ASIC_int/DQS_PAD/PAD (XXX_PAD) 0.00 0.00 r
ASIC_int/MEM_DQS (ASIC_int) 0.00 + 0.00 r
inst_RAM/DQS (RAM) 0.00 + 0.00 r
inst_RAM/Tdqsq_pos_max_in_join (RAM) 0.00 0.00 r
inst_RAM/DQ (RAM) 0.50 H 0.50 r
ASIC_int/MEM_DQ (ASIC_int) 0.00 0.50 r
ASIC_int/DQ_PAD/PAD (XXX_PAD) 0.00 0.50 r
ASIC_int/DQ_PAD/OUTI (XXX_PAD) <- 0.80 & 1.30 r
ASIC_int/FIFO/u10/Z (R_SMX2X010) 0.33 & 1.63 r
ASIC_int/FIFO/u620_C4_1/Z (R_SMX2X015) 0.27 & 1.90 r
ASIC_int/FIFO/u102_C2/Z (R_SAN2IX015) 0.23 & 2.13 r
ASIC_int/FIFO/u98_C3_2/Z (R_SAORI1X010) 0.17 & 2.30 r
ASIC_int/FIFO/fifo_data_regX0Xhb1_0_I/Z (BUFX010) 0.16 & 2.46 r
ASIC_int/FIFO/fifo_data_regX0Xhb3_0_I/Z (BUFX010) 0.13 & 2.60 r
ASIC_int/FIFO/fifo_data_regX0Xhb3_0_I_1/Z (BUFX010) 0.15 & 2.75 r
ASIC_int/FIFO/ff_pos_reg/D (R_SFD6QSX010) <- 0.00 & 2.75 r
data arrival time 2.75
clock fifo_clk_a_o (rise edge) 0.00 0.00
clock network delay (propagated) 3.17 3.17
clock reconvergence pessimism 0.00 3.17
ASIC_int/FIFO/ff_pos_reg/CP (R_SFD6QSX010) 3.17 r
library setup time -0.21 2.96
data required time 2.96
————————————————————————————————————
data required time 2.96
data arrival time -2.75
————————————————————————————————————
slack (MET) 0.20
Hold check :
Point Incr Path
————————————————————————————————————
clock dqs_in (fall edge) 3.20 3.20
clock source latency 0.00 3.20
ASIC_int/DQS_PAD/PAD (XXX_PAD) 0.00 3.20 f
ASIC_int/MEM_DQS (ASIC_int) 0.00 + 3.20 f
inst_RAM/DQS (RAM) 0.00 + 3.20 f
inst_RAM/Tdqsq_neg_max_in_join (RAM) 0.00 3.20 f
inst_RAM/DQ (RAM) -0.50 H 2.70 f
ASIC_int/MEM_DQ (ASIC_int) 0.00 2.70 f
ASIC_int/DQ_PAD/PAD (XXX_PAD) 0.00 2.70 f
ASIC_int/DQ_PAD/OUTI (XXX_PAD) <- 0.57 & 3.28 f
ASIC_int/FIFO/u10/Z (R_SMX2X010) 0.18 & 3.45 f
ASIC_int/FIFO/u620_C4_1/Z (R_SMX2X015) 0.16 & 3.62 f
ASIC_int/FIFO/u67_C2/Z (R_SND2X015) 0.10 & 3.72 r
ASIC_int/FIFO/u87_C6_2/Z (R_SOND1I1X010) 0.08 & 3.79 f
ASIC_int/FIFO/fifo_data_regX6Xhb1_0_I/Z (BUFX010) 0.08 & 3.87 f
ASIC_int/FIFO/fifo_data_regX6Xhb3_0_I/Z (BUFX010) 0.09 & 3.97 f
ASIC_int/FIFO/ff_pos_reg/D (R_SFD6QSX010) <- -0.01 & 3.96 f
data arrival time 3.96
clock fifo_clk_a_o (rise edge) 0.00 0.00
clock network delay (propagated) 3.96 3.96
clock reconvergence pessimism 0.00 3.96
ASIC_int/FIFO/ff_pos_reg/CP (R_SFD6QSX010) 3.96 r
library hold time -0.00 3.96
data required time 3.96
————————————————————————————————————
data required time 3.96
data arrival time -3.96
————————————————————————————————————
slack (MET) 0.00
Clock domain : fifo_clk_b_o
Setup check :
Point Incr Path
————————————————————————————————————
clock dqs_in (fall edge) 3.20 3.20
clock source latency 0.00 3.20
ASIC_int/DQS_PAD/PAD (XXX_PAD) 0.00 3.20 f
ASIC_int/MEM_DQS (ASIC_int) 0.00 + 3.20 f
inst_RAM/DQS (RAM) 0.00 + 3.20 f
inst_RAM/Tdqsq_neg_max_in_join (RAM) 0.00 3.20 f
inst_RAM/DQ (RAM) 0.50 H 3.70 r
ASIC_int/MEM_DQ (ASIC_int) 0.00 3.70 r
ASIC_int/DQ_PAD/PAD (XXX_PAD) 0.00 3.70 r
ASIC_int/DQ_PAD/OUTI (XXX_PAD) <- 0.80 & 4.51 r
ASIC_int/FIFO/u10/Z (R_SMX2YX010) 0.37 & 4.87 r
ASIC_int/FIFO/u570_C4_1/Z (R_SMX2YX040) 0.26 & 5.13 r
ASIC_int/FIFO/u101_C2/Z (R_SAN2IX010) 0.27 & 5.40 r
ASIC_int/FIFO/u98_C3_2/Z (R_SAORI1X010) 0.15 & 5.55 r
ASIC_int/FIFO/fifo_data_regX5Xhb1_0_I/Z (BUFX010) 0.14 & 5.69 r
ASIC_int/FIFO/fifo_data_regX5Xhb3_0_I/Z (BUFX010) 0.13 & 5.83 r
ASIC_int/FIFO/ff_neg_reg/D (R_SFD6QSX010) <- 0.01 & 5.84 r
data arrival time 5.84
clock fifo_clk_b_o (rise edge) 3.20 3.20
clock network delay (propagated) 3.15 6.36
clock reconvergence pessimism 0.00 6.36
ASIC_int/FIFO/ff_neg_reg/CP (R_SFD6QSX010) 6.36 r
library setup time -0.22 6.14
data required time 6.14
———————————————————————————————————–
data required time 6.14
data arrival time -5.84
———————————————————————————————————–
slack (MET) 0.31
Hold check :
Point Incr Path
———————————————————————————————————–
clock dqs_in (rise edge) 6.41 6.41
clock source latency 0.00 6.41
ASIC_int/DQS_PAD/PAD (XXX_PAD) 0.00 6.41 r
ASIC_int/MEM_DQS (ASIC_int) 0.00 + 6.41 r
inst_RAM/DQS (RAM) 0.00 + 6.41 r
inst_RAM/Tdqsq_pos_max_in_join (RAM) 0.00 6.41 r
inst_RAM/DQ (RAM) -0.50 H 5.91 f
ASIC_int/MEM_DQ (ASIC_int) 0.00 5.91 f
ASIC_int/DQ_PAD/PAD (XXX_PAD) 0.00 5.91 f
ASIC_int/DQ_PAD/OUTI (XXX_PAD) <- 0.57 & 6.48 f
ASIC_int/FIFO/u10/Z (R_SMX2X015) 0.18 & 6.67 f
ASIC_int/FIFO/u587_C4_1/Z (R_SMX2X020) 0.12 & 6.79 f
ASIC_int/FIFO/u94_C1/Z (R_SAN2X015) 0.08 & 6.87 f
ASIC_int/FIFO/u61_C3_2/Z (R_SANR2X010) 0.11 & 6.98 r
ASIC_int/FIFO/u61_C3_2_MP_INV/Z (R_SIVYX060) 0.01 & 6.99 f
ASIC_int/FIFO/fifo_data_regX2Xhb1_0_I/Z (BUFX010) 0.08 & 7.07 f
ASIC_int/FIFO/fifo_data_regX2Xhb3_0_I/Z (BUFX010) 0.09 & 7.16 f
ASIC_int/FIFO/ff_neg_reg/D (R_SFD6QSX010) <- 0.00 & 7.16 f
data arrival time 7.16
clock fifo_clk_b_o (rise edge) 3.20 3.20
clock network delay (propagated) 3.95 7.16
clock reconvergence pessimism 0.00 7.16
ASIC_int/FIFO/ff_neg_reg/CP (R_SFD6QSX010) 7.16 r
library hold time -0.00 7.16
data required time 7.16
———————————————————————————————————–
data required time 7.16
data arrival time -7.16
———————————————————————————————————–
slack (MET) 0.00
6.0 Timing Analysis of Write Transaction
The timing analysis of the write transaction is more complex than that of the read transaction because of the structure of the clock and data paths.
Timing checks performed by write STA testcase:
Figure 11 DDR Interface Write Path
Figure 12 Basic WRITE Timing Parameters for DQs
The half-cycle phase-shifted output of the DCC delay line is not recognized by PrimeTime as the inverted version of IN_CK. This problem is overcome by avoiding timing checks between IN_CK and IN_CK_N.
The symmetrical nature of the FF-XOR circuits of the datapath and the FF-XNOR circuit of DQS allows us to avoid timing checks between IN_CK and IN_CK_N without compromising the accuracy of the timing analysis.
These circuits are symmetrical in the sense that the delays between the FFs and the XOR/XNOR gates are balanced, and the latencies to the complementary FF pairs are matched during clock tree synthesis.
Avoiding these checks involved two PrimeTime runs, each with a different set of XOR/XNOR timing arcs disabled. It was later realized that this could be done in one run with the use of additional generated clocks.
Timing arcs disabled in the first write STA run:
A -> Z timing arc of the DQS XNOR gate
A -> Z timing arc of the data path XOR gates
Timing arcs disabled in the second write STA run:
B -> Z timing arc of the DQS XNOR gate
B -> Z timing arc of the data path XOR gates
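The arc selection for the two runs can be sketched with set_disable_timing. The variable WRITE_RUN is hypothetical, and the cell names are the single instances that appear in the skew-check scripts later in the paper; a real script would have to cover every data bit.

```tcl
# WRITE_RUN is a hypothetical variable selecting the first or second write run.
set WRITE_RUN 1

# XOR/XNOR gates of the write data path and of the DQS generation circuit.
set gate_cells [get_cells {ASIC_int/write_data_inst/xor_gate \
                           ASIC_int/DQS_inst/xnor_gate}]

# Run 1 disables the A -> Z arcs, run 2 disables the B -> Z arcs.
if {$WRITE_RUN == 1} {
    set from_pin A
} else {
    set from_pin B
}
set_disable_timing -from $from_pin -to Z $gate_cells
```

With one input arc disabled, only the flip-flop on the other input launches data through the gate, so each run analyzes one of the two complementary FFs.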
Clock definition
The clock definitions are the same for both runs.
create_clock -name IN_CK -waveform {0.0 3.205} -period 6.41 [get_port "IN_CK"]
create_generated_clock -name "dqs_out" -source [get_port IN_CK] \
-multiply_by 1 -invert -duty_cycle 50 [get_pins "ASIC_int/DQS_PAD/DQ"]
Figure 13 DDR Write Waveform
Setup checks : The setup checks are performed from the rising edge of wrck to the falling edge of DQS and from the rising edge of wrck_n to the rising edge of DQS.
Hold checks : The hold checks are performed from the rising edge of wrck to the rising edge of DQS and from the rising edge of wrck_n to the falling edge of DQS.
The following false paths are specified so that timing checks are performed at the right edges.
set_false_path -setup -rise_from IN_CK -through inst_RAM/DQ -fall_to dqs_out
set_false_path -setup -fall_from IN_CK -through inst_RAM/DQ -rise_to dqs_out
set_false_path -hold -rise_from IN_CK -through inst_RAM/DQ -rise_to dqs_out
set_false_path -hold -fall_from IN_CK -through inst_RAM/DQ -fall_to dqs_out
Timing report:
out_ffp
Hold check :
Startpoint: ASIC_int/write_data_inst/out_ffp/Q_reg
(rising edge-triggered flip-flop clocked by IN_CK)
Endpoint: inst_RAM
(rising edge-triggered flip-flop clocked by dqs_out')
Path Group: IO
Path Type: min
Point Incr Path
————————————————————————————————————
clock IN_CK (rise edge) 0.00 0.00
clock network delay (propagated) 3.30 3.30
ASIC_int/write_data_inst/out_ffp/Q_reg/CP (R_SFD6QSX060) 0.00 3.30 r
ASIC_int/write_data_inst/out_ffp/Q_reg/Q (R_SFD6QSX060) 0.14 & 3.44 f
ASIC_int/write_data_inst/xor_gate/Z (R_SEO2YX010) 0.08 & 3.52 f
ASIC_int/write_data_inst/BL2_BUF15/Z (R_SBUFX060) 0.05 & 3.58 f
ASIC_int/write_data_inst/u11/Z (R_SMX2IX080) 0.05 & 3.62 r
ASIC_int/DQ_PAD/PAD (XXX_PAD) 1.87 H 5.49 f
ASIC_int/MEM_DQ (ASIC_int) 0.00 + 5.49 f
inst_RAM/DQ (RAM) 0.00 + 5.49 f
inst_RAM/DQ__check_pin_1 (RAM) 0.00 5.49 f
data arrival time 5.49
clock dqs_out (rise edge) 0.00 0.00
clock network delay (propagated) 4.94 4.94
clock reconvergence pessimism -0.21 4.73
inst_RAM/DQS (RAM) 4.73 r
library hold time 0.65 5.38
data required time 5.38
———————————————————————————————————–
data required time 5.38
data arrival time -5.49
———————————————————————————————————–
slack (MET) 0.11
Setup check :
Startpoint: ASIC_int/write_data_inst/out_ffp/Q_reg
(rising edge-triggered flip-flop clocked by IN_CK)
Endpoint: inst_RAM
(rising edge-triggered flip-flop clocked by dqs_out')
Path Group: IO
Path Type: max
Point Incr Path
———————————————————————————————————–
clock IN_CK (rise edge) 0.00 0.00
clock network delay (propagated) 4.20 4.20
ASIC_int/write_data_inst/out_ffp/Q_reg/CP (R_SFD6QSX060) 0.00 4.20 r
ASIC_int/write_data_inst/out_ffp/Q_reg/Q (R_SFD6QSX060) 0.22 & 4.42 r
ASIC_int/write_data_inst/xor_gate/Z (R_SEO2YX010) 0.16 & 4.58 r
ASIC_int/write_data_inst/BL2_BUF7/Z (R_SBUFX060) 0.13 & 4.71 r
ASIC_int/write_data_inst/u11/Z (R_SMX2IX080) 0.06 & 4.77 f
ASIC_int/DQ_PAD/PAD (XXX_PAD) 2.12 H 6.89 r
ASIC_int/MEM_DQ (ASIC_int) 0.00 + 6.89 r
inst_RAM/DQ (RAM) 0.00 + 6.89 r
inst_RAM/DQ__check_pin_1 (RAM) 0.00 6.89 r
data arrival time 6.89
clock dqs_out (fall edge) 3.20 3.20
clock network delay (propagated) 4.19 7.40
clock reconvergence pessimism 0.21 7.60
inst_RAM/DQS (RAM) 7.60 f
library setup time -0.65 6.95
data required time 6.95
———————————————————————————————————–
data required time 6.95
data arrival time -6.89
———————————————————————————————————–
slack (MET) 0.06
out_ffn
Hold check :
Point Incr Path
———————————————————————————————————-
clock IN_CK (rise edge) 0.00 0.00
clock network delay (propagated) 5.60 5.60
ASIC_int/write_data_inst/out_ffn/Q_reg/CP (R_SFD6QSX060) 0.00 5.60 r
ASIC_int/write_data_inst/out_ffn/Q_reg/Q (R_SFD6QSX060) 0.17 & 5.76 r
ASIC_int/write_data_inst/out_xor/xor_gate/Z (R_SEO2YX010) 0.06 & 5.82 f
ASIC_int/write_data_inst/out_xor/BL2_BUF18/Z (R_SBUFX060) 0.05 & 5.87 f
ASIC_int/write_data_inst/u11/Z (R_SMX2IX080) 0.05 & 5.92 r
ASIC_int/DQ_PAD/PAD (XXX_PAD) 1.89 H 7.81 f
ASIC_int/MEM_DQ (ASIC_int) 0.00 + 7.81 f
inst_RAM/DQ (RAM) 0.00 + 7.81 f
inst_RAM/DQ__check_pin_1 (RAM) 0.00 7.81 f
data arrival time 7.81
clock dqs_out' (rise edge) 0.00 0.00
clock network delay (propagated) 8.18 8.18
clock reconvergence pessimism -1.06 7.12
inst_RAM/DQS (RAM) 7.12 r
library hold time 0.65 7.77
data required time 7.77
———————————————————————————————————-
data required time 7.77
data arrival time -7.81
———————————————————————————————————-
slack (MET) 0.04
Setup check :
Startpoint: ASIC_int/write_data_inst/out_ffn/Q_reg
(rising edge-triggered flip-flop clocked by IN_CK)
Endpoint: inst_RAM
(rising edge-triggered flip-flop clocked by dqs_out')
Path Group: IO
Path Type: max
Point Incr Path
———————————————————————————————————-
clock IN_CK (rise edge) 0.00 0.00
clock network delay (propagated) 7.43 7.43
ASIC_int/write_data_inst/out_ffn/Q_reg/CP (R_SFD6QSX060) 0.00 7.43 r
ASIC_int/write_data_inst/out_ffn/Q_reg/Q (R_SFD6QSX060) 0.22 & 7.64 r
ASIC_int/write_data_inst/out_xor/xor_gate/Z (R_SEO2YX010) 0.16 & 7.80 r
ASIC_int/write_data_inst/out_xor/BW1_BUF14003/Z (R_SBUFX060) 0.12 & 7.92 r
ASIC_int/write_data_inst/u11/Z (R_SMX2IX080) 0.06 & 7.98 f
ASIC_int/DQ_PAD/PAD (XXX_PAD) 2.13 H 10.11 r
ASIC_int/MEM_DQ (ASIC_int) 0.00 + 10.11 r
inst_RAM/DQ (RAM) 0.00 + 10.11 r
inst_RAM/DQ__check_pin_1 (RAM) 0.00 10.11 r
data arrival time 10.11
clock dqs_out' (fall edge) 3.20 3.20
clock network delay (propagated) 6.55 9.76
clock reconvergence pessimism 1.04 10.80
inst_RAM/DQS (RAM) 10.80 f
library setup time -0.65 10.15
data required time 10.15
———————————————————————————————————-
data required time 10.15
data arrival time -10.11
———————————————————————————————————-
slack (MET) 0.03
7.0 Miscellaneous timing checks
7.1 Clock and Data Skew Check
The skew between the clocks to the complementary FFs of the FF-XNOR/XOR circuits was checked in PrimeTime to ensure that these clocks were closely balanced. This is necessary to maximize the data valid window at the output of the XNOR/XOR gate.
Figure 14 Skew Checks
Skew checking on CP pin
Scripts :
DQ
get_attribute [get_pin ASIC_int/write_data_inst/out_ffn/Q_reg/CP] max_rise_arrival
get_attribute [get_pin ASIC_int/write_data_inst/out_ffp/Q_reg/CP] max_rise_arrival
DQS
get_attribute [get_pin ASIC_int/DQS_inst/out_ffn/Q_reg/CP] max_rise_arrival
get_attribute [get_pin ASIC_int/DQS_inst/out_ffp/Q_reg/CP] max_rise_arrival
CK
get_attribute [get_pin ASIC_int/CK_inst/out_ffn/Q_reg/CP] max_rise_arrival
get_attribute [get_pin ASIC_int/CK_inst/out_ffp/Q_reg/CP] max_rise_arrival
CK_N
get_attribute [get_pin ASIC_int/CK_N_inst/out_ffn/Q_reg/CP] max_rise_arrival
get_attribute [get_pin ASIC_int/CK_N_inst/out_ffp/Q_reg/CP] max_rise_arrival
Report :
Object name max_rise_arrival
ASIC_int/write_data_inst/out_ffn/Q_reg/CP 2.483227
ASIC_int/write_data_inst/out_ffp/Q_reg/CP 2.483556
Delta : -0.000329
ASIC_int/DQS_inst/out_ffn/Q_reg/CP 2.529315
ASIC_int/DQS_inst/out_ffp/Q_reg/CP 2.569854
Delta : -0.040539
inst_combo_core_int/CK_inst/out_ffn/Q_reg/CP 2.001494
inst_combo_core_int/CK_inst/out_ffp/Q_reg/CP 2.009042
Delta : -0.007548
inst_combo_core_int/CK_N_inst/out_ffn/Q_reg/CP 1.994411
inst_combo_core_int/CK_N_inst/out_ffp/Q_reg/CP 2.008940
Delta : -0.014529
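Each Delta in the report is the difference between the arrival attributes of the two pins in a pair. A small helper along the following lines could compute it directly in pt_shell; the procedure name is hypothetical and the sketch is not taken from the authors' scripts.

```tcl
# Return the difference in an arrival attribute between two pins.
# attr defaults to max_rise_arrival; max_fall_arrival can be passed instead.
proc pin_arrival_delta {pin_a pin_b {attr max_rise_arrival}} {
    set a [get_attribute [get_pins $pin_a] $attr]
    set b [get_attribute [get_pins $pin_b] $attr]
    return [expr {$a - $b}]
}

set ffn ASIC_int/write_data_inst/out_ffn/Q_reg/CP
set ffp ASIC_int/write_data_inst/out_ffp/Q_reg/CP
puts [format "Delta : %f" [pin_arrival_delta $ffn $ffp]]
```

For the DQ pair reported above (2.483227 and 2.483556) this difference is -0.000329. The same helper applies to the A and B pins of the XOR/XNOR gates by passing the gate pin names and, for falling edges, max_fall_arrival.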
Skew on XOR pins :
Scripts :
DQ
get_attribute [get_pin ASIC_int/write_data_inst/xor_gate/A] max_rise_arrival
get_attribute [get_pin ASIC_int/write_data_inst/xor_gate/A] max_fall_arrival
get_attribute [get_pin ASIC_int/write_data_inst/xor_gate/B] max_rise_arrival
get_attribute [get_pin ASIC_int/write_data_inst/xor_gate/B] max_fall_arrival
DQS
get_attribute [get_pin ASIC_int/DQS_inst/xnor_gate/A] max_rise_arrival
get_attribute [get_pin ASIC_int/DQS_inst/xnor_gate/A] max_fall_arrival
get_attribute [get_pin ASIC_int/DQS_inst/xnor_gate/B] max_rise_arrival
get_attribute [get_pin ASIC_int/DQS_inst/xnor_gate/B] max_fall_arrival
CK
get_attribute [get_pin ASIC_int/CK_inst/xnor_gate/A] max_rise_arrival
get_attribute [get_pin ASIC_int/CK_inst/xnor_gate/A] max_fall_arrival
get_attribute [get_pin ASIC_int/CK_inst/xnor_gate/B] max_rise_arrival
get_attribute [get_pin ASIC_int/CK_inst/xnor_gate/B] max_fall_arrival
CK_N
get_attribute [get_pin ASIC_int/CK_N_inst/xnor_gate/A] max_rise_arrival
get_attribute [get_pin ASIC_int/CK_N_inst/xnor_gate/A] max_fall_arrival
get_attribute [get_pin ASIC_int/CK_N_inst/xnor_gate/B] max_rise_arrival
get_attribute [get_pin ASIC_int/CK_N_inst/xnor_gate/B] max_fall_arrival
Report :
Object name max_rise_arrival max_fall_arrival
ASIC_int/write_data_inst/xor_gate/A 2.699914 2.669041
ASIC_int/write_data_inst/xor_gate/B 2.700185 2.669248
Delta: -0.000271 -0.000207
ASIC_int/DQS_inst/xnor_gate/A 2.818815 2.783856
ASIC_int/DQS_inst/xnor_gate/B 2.820106 2.764866
Delta : -0.001291 0.018990
inst_combo_core_int/CK_inst/xor_gate/A 2.286438 2.236481
inst_combo_core_int/CK_inst/xor_gate/B 2.255162 2.215980
Difference : 0.031276 0.020501
inst_combo_core_int/CK_N_inst/xor_gate/A 2.285899 2.236262
inst_combo_core_int/CK_N_inst/xor_gate/B 2.251448 2.210799
Difference : 0.034451 0.025463
8.0 Conclusions and Recommendations
In this paper we have shown the successful application of the Internal Path approach (also known as the System level STA methodology) to the timing closure of our DDR interface.
The challenges arising from the complex timing requirements of the DDR interface are elegantly addressed by the internal path methodology. The authors agree with R. Ko [3] that the benefits of this methodology are flexibility, scalability and visibility. We would like to add that this methodology is very intuitive. The authors speculate that the elegance and intuitive nature of this technique will result in its widespread adoption in our industry.
9.0 Acknowledgements
The authors would like to thank all members of the Infineon Bristol design center EBU team and the Infineon Xi’an design center COMBO Implementation team for their kind support and valuable inputs.
10.0 References
[1] A. Cheng, “Working with DDRs in PrimeTime”, SNUG San Jose 2002.
[2] F. Hui, “Static Timing Verification for Complex SoC design-Part I, DDR SDRAM Timing Check in Primetime Revisit”, SNUG Singapore 2002.
[3] R. Ko, “System level STA methodology with DDR”, SNUG San Jose 2004.
[4] Double data rate (DDR) SDRAM HYB/E18M512160AF datasheet, Infineon.
[5] Synopsys, “PrimeTime Modeling User Guide”, Ch 6, App A-D.
For information on how we accomplished the timing closure of the delay lines and the phase detector please email
fu-yang.qu@infineon.com
11.0 Appendix - Stamp model
RAM.mod :
MODEL
MODEL_VERSION "1.0";
DESIGN "RAM";
/*CTRL =
INPUT CAS;
INPUT RAS;
INPUT WE;
INPUT CS[0:3];
INPUT CSA[0:3];
INPUT ADV;
*/
INPUT CKE;
INPUT CTRL[0:11];
INOUT DQS;
INPUT DM[0:1];
INPUT CK;
INPUT CK_N;
INPUT A[0:26];
INOUT DQ[0:15];
MODE rw = read, write;
/* Data Out */
Tacc_pos_max : DELAY (POSEDGE, EQUIVALENT) CK DQ MODE(rw=read);
Tacc_neg_max : DELAY (NEGEDGE, EQUIVALENT) CK DQ MODE(rw=read);
Tdqsq_pos_max : DELAY (POSEDGE, EQUIVALENT) DQS DQ MODE(rw=read);
Tdqsq_neg_max : DELAY (NEGEDGE, EQUIVALENT) DQS DQ MODE(rw=read);
Tdqsq_retain_pos_max : RETAIN (DELAY = Tdqsq_pos_max);
Tdqsq_retain_neg_max : RETAIN (DELAY = Tdqsq_neg_max);
/* DQS Out */
Tacc_dqs_pos_max : DELAY (POSEDGE, EQUIVALENT) CK DQS MODE(rw=read);
Tacc_dqs_neg_max : DELAY (NEGEDGE, EQUIVALENT) CK DQS MODE(rw=read);
/* CTRL & ADDR */
Tctrl_setup : SETUP (POSEDGE, EQUIVALENT) CTRL CK;
Tctrl_hold : HOLD (POSEDGE, EQUIVALENT) CTRL CK;
Taddr_setup : SETUP (POSEDGE, EQUIVALENT) A CK;
Taddr_hold : HOLD (POSEDGE, EQUIVALENT) A CK;
Tcke_setup : SETUP (POSEDGE) CKE CK;
Tcke_hold : HOLD (POSEDGE) CKE CK;
/* DQS IN */
Tdss : SETUP (POSEDGE, EQUIVALENT) DQS CK;
Tdsh : HOLD (POSEDGE, EQUIVALENT) DQS CK;
/* DQS[0] Data In */
Tinput_setup_pos : SETUP (POSEDGE, EQUIVALENT) DQ DQS MODE(rw=write);
Tinput_hold_pos : HOLD (POSEDGE, EQUIVALENT) DQ DQS MODE(rw=write);
Tinput_setup_neg : SETUP (NEGEDGE, EQUIVALENT) DQ DQS MODE(rw=write);
Tinput_hold_neg : HOLD (NEGEDGE, EQUIVALENT) DQ DQS MODE(rw=write);
Tinput_setup_dm_pos : SETUP (POSEDGE, EQUIVALENT) DM DQS MODE(rw=write);
Tinput_hold_dm_pos : HOLD (POSEDGE, EQUIVALENT) DM DQS MODE(rw=write);
Tinput_setup_dm_neg : SETUP (NEGEDGE, EQUIVALENT) DM DQS MODE(rw=write);
Tinput_hold_dm_neg : HOLD (NEGEDGE, EQUIVALENT) DM DQS MODE(rw=write);
ENDMODEL
RAM.da :
MODELDATA
DESIGN "RAM";
DATE "01 Aug 05";
PROGRAM "Manually Composed";
VERSION "1.0";
PORTDATA
CTRL[0:11] : CAP(0.0);
CK : CAP(0.0);
CK_N : CAP(0.0);
CKE : CAP(0.0);
DQS : CAP(0.0);
DM[0:1] : CAP(0.0);
A[0:26] : CAP(0.0);
DQ[0:15] : CAP(0.0);
ENDPORTDATA
TIMINGDATA
/* Data Out */
ARCDATA
Tacc_pos_max :
CELL_RISE(SCALAR) {
VALUES("5.4");
}
CELL_FALL(SCALAR) {
VALUES("5.4");
}
ENDARCDATA
ARCDATA
Tacc_neg_max :
CELL_RISE(SCALAR) {
VALUES("5.4");
}
CELL_FALL(SCALAR) {
VALUES("5.4");
}
ENDARCDATA
ARCDATA
Tdqsq_pos_max :
CELL_RISE(SCALAR) {
VALUES("0.5");
}
CELL_FALL(SCALAR) {
VALUES("0.5");
}
ENDARCDATA
ARCDATA
Tdqsq_neg_max :
CELL_RISE(SCALAR) {
VALUES("0.5");
}
CELL_FALL(SCALAR) {
VALUES("0.5");
}
ENDARCDATA
ARCDATA
Tdqsq_retain_pos_max :
CELL_RISE(SCALAR) {
VALUES("-0.5");
}
CELL_FALL(SCALAR) {
VALUES("-0.5");
}
ENDARCDATA
ARCDATA
Tdqsq_retain_neg_max :
CELL_RISE(SCALAR) {
VALUES("-0.5");
}
CELL_FALL(SCALAR) {
VALUES("-0.5");
}
ENDARCDATA
/* DQS OUT */
ARCDATA
Tacc_dqs_pos_max :
CELL_RISE(SCALAR) {
VALUES("5.4");
}
CELL_FALL(SCALAR) {
VALUES("5.4");
}
ENDARCDATA
ARCDATA
Tacc_dqs_neg_max :
CELL_RISE(SCALAR) {
VALUES("5.4");
}
CELL_FALL(SCALAR) {
VALUES("5.4");
}
ENDARCDATA
/* CTRL & ADDR */
ARCDATA
Tctrl_setup :
CONSTRAINT(SCALAR) {
VALUES("1.3");
}
ENDARCDATA
ARCDATA
Tctrl_hold :
CONSTRAINT(SCALAR) {
VALUES("1.3");
}
ENDARCDATA
ARCDATA
Taddr_setup :
CONSTRAINT(SCALAR) {
VALUES("1.3");
}
ENDARCDATA
ARCDATA
Taddr_hold :
CONSTRAINT(SCALAR) {
VALUES("1.3");
}
ENDARCDATA
ARCDATA
Tcke_setup :
CONSTRAINT(SCALAR) {
VALUES("1.3");
}
ENDARCDATA
ARCDATA
Tcke_hold :
CONSTRAINT(SCALAR) {
VALUES("1.3");
}
ENDARCDATA
/* DQS IN */
ARCDATA
Tdss :
FALL_CONSTRAINT(SCALAR) {
VALUES("1.3");
}
ENDARCDATA
ARCDATA
Tdsh :
FALL_CONSTRAINT(SCALAR) {
VALUES("1.3");
}
ENDARCDATA
/* related to DQS Data In */
ARCDATA
Tinput_setup_pos :
CONSTRAINT(SCALAR) {
VALUES("0.65");
}
ENDARCDATA
ARCDATA
Tinput_hold_pos :
CONSTRAINT(SCALAR) {
VALUES("0.65");
}
ENDARCDATA
ARCDATA
Tinput_setup_neg :
CONSTRAINT(SCALAR) {
VALUES("0.65");
}
ENDARCDATA
ARCDATA
Tinput_hold_neg :
CONSTRAINT(SCALAR) {
VALUES("0.65");
}
ENDARCDATA
ARCDATA
Tinput_setup_dm_pos :
CONSTRAINT(SCALAR) {
VALUES("0.65");
}
ENDARCDATA
ARCDATA
Tinput_hold_dm_pos :
CONSTRAINT(SCALAR) {
VALUES("0.65");
}
ENDARCDATA
ARCDATA
Tinput_setup_dm_neg :
CONSTRAINT(SCALAR) {
VALUES("0.65");
}
ENDARCDATA
ARCDATA
Tinput_hold_dm_neg :
CONSTRAINT(SCALAR) {
VALUES("0.65");
}
ENDARCDATA
ENDTIMINGDATA
ENDMODELDATA



