Improve Synthesis Results with Useful Clock Skew


Lei Zhang (Jerry), Feng Wang (Richard), Pei Wen (Adrian), Hui Fu
Infineon Technologies (Xi’an) Co., Ltd.
5F, Block A, No.38, Gaoxin 6th Road, Xi’an, Shaanxi 710075

Abstract

Completing a complex chip design under tight cost constraints, such as a small die size and a short development cycle, is a challenge for designers, because silicon in UDSM technology is much more expensive and the requirements are much more complicated. How to ensure timing closure has become one of the hottest topics in the EDA community.

Synthesis is a very important step in the chip design flow. Synopsys tools provide tapeout-proven RTL-to-GDSII implementation flows with relatively predictable results. This paper investigates how to improve synthesis results by using useful clock skew for timing optimization. An automotive microcontroller is used as an example to show the synthesis strategies, especially how useful clock skew improves the synthesis results during implementation.

1. Background of design

Semiconductors for the automotive industry demand quality, reliability and a wider temperature range. To improve quality and test coverage, the designer also needs to consider the implementation of DFT. The designer must reduce chip size and power consumption while fully meeting these high quality requirements. Synthesis is a key step of the design flow, and its result depends on several factors, such as the complexity of the design, the customer requirements and the clock frequency. It is therefore important that the designer chooses a good synthesis strategy with reasonable and precise constraints.

The XC866 is a new 8-bit microcontroller that integrates a high-performance version of the popular 8051 8-bit core, on-chip flash and additional functional peripherals. The instruction cycle of this 8051 core is 75 to 150 ns at a CPU clock of 26.67 MHz (for memory access without wait states). The XC866 has an on-chip oscillator and a PLL for clock generation. Its I/O ports support supplies of 3.3 V to 5.0 V, and the core logic is supplied at 2.5 V (generated by an embedded voltage regulator).

The block diagram of Elektra is shown in Figure 1-1.

The clock generation structure of Elektra is shown in Figure 1-2.

Because UDSM technology is expensive and customers require a low selling price for an 8-bit chip, synthesis must produce a die small enough to keep the cost down. The difficulties are listed below:

1. High clock frequency, small die-size

2. Complex clock structure and tight timing in the flash interface circuit; any timing problem can cause read/write failures.

3. >99% test coverage

2. RTL to GDS flow and Synthesis strategy

We use Synopsys tools for our main synthesis flow. First, the RTL design is coded in VHDL. Second, Design Compiler performs logic synthesis and Test Compiler performs scan insertion for ATPG. If there is no large timing violation (setup violations within 1 ns in this project), we move to Astro for placement and routing. If pre-routing shows no congestion problem, the design proceeds to detailed routing. The final result is checked by STA with PrimeTime. If a violation cannot be resolved at a given stage, the design goes back to the previous step.

The synthesis strategy should be decided at the beginning, according to the design specification. It influences how the modules are split and how the timing budget is assigned to the sub-modules. For a complex design, designers use a top-down strategy. The steps of the synthesis flow are as follows (please refer to the appendix for the details of the synthesis script file):

1. Compile to map the gdb file to a db file, without considering timing or area.

2. Prepare for scan chain insertion for DFT.

3. Use compile_ultra to reduce area, get regular timing results and insert the scan chain.

4. Do boundary optimization to reduce area, and write out the final synthesis results.

After these traditional synthesis methods, the critical path still had timing violations, even after we had tried various tricks to improve timing and area. By adding useful clock skew on the clock tree, according to the clock structure, the synthesis results can be improved.

Synthesis platform: server running the Solaris operating system

Synthesis tool: SYNOPSYS Design Compiler

Script language: TCL

3. Influence of timing and area constraints

Timing and area constraints drive the design in different directions. Normally, tight timing constraints increase area. On the other hand, reducing the area may introduce timing violations on the critical path. The aim is to balance the timing and area constraints to get the best synthesis results.

If timing closure cannot be reached (setup violations remain) with the traditional synthesis techniques, such as wire load models, flattening sub-modules, set_critical_range and group_path, the designer can try useful clock skew to improve the synthesis results, reducing area and getting better timing at the same time.

4. Useful clock skew implementation

4.1 The timing view before using the useful clock skew

The clock structure of the Elektra project is shown in the diagram below.

The design is divided into four clock domains: PCLK_O (26.6 MHz), CCLK_O (26.6 MHz), MCCLK_O (26.6 MHz) and CCLK3_O (80 MHz), all of which are generated clocks from the same PLL source. We therefore set multi-cycle paths from CCLK3_O to the other clock domains.

Using the traditional synthesis methodology and the DC Ultra compile tool, we obtain good synthesis results. However, the timing of some critical paths still cannot be met. The timing report is shown below.
Timing histogram diagram

From the timing report and the histogram, we found that the maximum negative slack is about 2.5 ns. Let us analyze one of the critical paths, which runs from ‘Flash_array’ to the 8051w module.

The timing report shows that the combinational logic cells have already been replaced by high-speed cells and that DC has already tried to optimize the combinational logic. However, the data required time is 32.12 ns while the data arrival time is more than 33.19 ns, so the setup violation cannot be avoided.

Apart from changing the design strategy or even the design concept, there seems to be no good way to fix this setup violation. Such a change would lengthen the development cycle, so we had to look for another way to resolve the problem.
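The slack implied by these two figures can be checked with a short calculation. The sketch below is only illustrative: the function and variable names are ours, values are held in picoseconds to avoid floating-point rounding, and the arrival time is treated as exactly 33.19 ns although the report describes it as a lower bound.

```python
# Setup slack of the reported critical path, in picoseconds.
# Both figures are the ones quoted from the timing report above.
DATA_REQUIRED_PS = 32_120   # data required time: 32.12 ns
DATA_ARRIVAL_PS = 33_190    # data arrival time: 33.19 ns (reported as a lower bound)


def setup_slack_ps(required_ps: int, arrival_ps: int) -> int:
    """Return the setup slack; a negative value is a setup violation."""
    return required_ps - arrival_ps


slack_ps = setup_slack_ps(DATA_REQUIRED_PS, DATA_ARRIVAL_PS)
print(f"setup slack = {slack_ps / 1000:.2f} ns")  # setup slack = -1.07 ns
```

A negative result means the data arrives after it is required, which is the setup violation on this path.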

4.2 Application of useful clock skew

Fortunately, by studying the clock structure, we found a way to solve the issue. The key is useful clock skew. What is useful clock skew? Put simply, it is intentionally introduced clock latency. In DC, you can set a different latency for each clock domain, for example:

set_clock_latency  xxx  clock_domain

The different clock latencies change the clock arrival time at the clock pins of the sequential cells (flip-flops, latches, memories, etc.). By reducing the latency at the clock pin of the launch cell and adding latency at the capture cell, the setup violation on the critical path is resolved without changing the design.
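The effect of the two latencies on setup slack can be sketched numerically. This is a simplified single-cycle model, not the tool's full calculation, and every number in it (period, logic delay, setup time, latencies) is a hypothetical value chosen for illustration rather than taken from the design.

```python
def setup_slack_ps(period_ps, launch_latency_ps, capture_latency_ps,
                   data_delay_ps, setup_ps):
    """Setup slack = data required time - data arrival time (all in ps)."""
    required = period_ps + capture_latency_ps - setup_ps
    arrival = launch_latency_ps + data_delay_ps
    return required - arrival


# Hypothetical path: 10 ns period, 10.8 ns of logic delay, 0.2 ns setup time.
PATH = dict(period_ps=10_000, data_delay_ps=10_800, setup_ps=200)

# Balanced clock tree: the same latency at launch and capture flops.
balanced = setup_slack_ps(launch_latency_ps=1_000, capture_latency_ps=1_000, **PATH)

# Useful skew: less latency at the launch flop, more at the capture flop.
skewed = setup_slack_ps(launch_latency_ps=500, capture_latency_ps=1_800, **PATH)

print(balanced)  # -1000
print(skewed)    # 300
```

In this model, every picosecond removed from the launch latency or added to the capture latency is gained directly as setup slack on the path.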

The clock structure diagram after inserting the clock skew is shown below:

We use the same critical path to show the timing results after applying the clock skew. The new critical path circuit is shown below:

The timing histogram diagram:

From the above timing report and histogram, the maximum setup violation is now less than 0.9 ns, and the total setup violation has decreased by about 78% compared with the previous results. In our experience, this logic synthesis result is good enough to start layout placement and routing.

Although we had obtained better results for setup violations, we noticed that the adjusted clock skew introduced a certain number of hold violations on data paths in test mode.

The main reason is that, to save package cost, this design has a minimal number of pins. In addition, ATPG compression is used, so only one scan clock pin can be defined, and it is not possible to have multiple scan clocks with different clock skews. Furthermore, Test Compiler inserts the scan chain and stitches it in alphabetical order. It cannot follow the order of latency among the clocks unless we write very complex scripts for that purpose. The scan flip-flops are therefore connected without regard to their clock latency, which induces a great number of hold violations, as shown in Figure 4-7.

If all of them were fixed on the data paths by adding delay buffers, which is the usual approach with a balanced clock tree, the chip area would increase significantly, or a lot of development time would be spent balancing the clock tree between function mode and test mode. The remaining problem is therefore how to find an easy way to implement the clock tree and reduce the iterations needed to balance function mode against test mode.
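Why the stitching order matters can be illustrated with a toy scan chain. The sketch below assumes a direct flop-to-flop scan connection with no logic in between; the flop names, clock latencies, clock-to-output delay and hold time are all hypothetical values, not figures from the design.

```python
CLK_TO_Q_PS = 300   # hypothetical clock-to-output delay of a scan flop
HOLD_PS = 100       # hypothetical hold requirement at the scan input


def hold_violations(chain):
    """Return the (launch, capture) flop pairs of the chain that violate hold."""
    bad = []
    for (launch, launch_lat), (capture, capture_lat) in zip(chain, chain[1:]):
        # Data must change after the capture clock edge plus the hold time.
        slack = (launch_lat + CLK_TO_Q_PS) - (capture_lat + HOLD_PS)
        if slack < 0:
            bad.append((launch, capture))
    return bad


# (flop name, clock latency in ps), stitched in alphabetical order.
alphabetical = [("ff_a", 1_000), ("ff_b", 4_000), ("ff_c", 1_000), ("ff_d", 4_000)]

# The same flops, stitched from the largest clock latency to the smallest.
by_latency = sorted(alphabetical, key=lambda ff: ff[1], reverse=True)

print(hold_violations(alphabetical))  # [('ff_a', 'ff_b'), ('ff_c', 'ff_d')]
print(hold_violations(by_latency))    # []
```

Each hop from an early-clocked flop to a late-clocked flop violates hold, so a chain stitched without regard to latency collects a violation at every such hop.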

4.3 Fixing the hold violations for test mode

Because we intentionally introduced the difference in clock skew, fixing the hold violations on the clock for test mode would affect the setup timing of function mode. After analyzing the clock domain structure, we finally found a way to solve the issue:

1. Make the test mode and function mode go through different paths.

2. Fix the hold violations of test mode on the clock tree, instead of fixing them on the data path.

The diagram below shows the adjusted clock structure.

As Figure 4-8 shows, we inserted a multiplexer (MUX) and some delay buffers in each clock tree. The MUX separates the timing paths of function mode and test mode. Note that a pre-layout iteration is needed to obtain preliminary values of the clock skew. The delay buffers are used to compensate for these skews.

Finally, DC recompiles the design with these useful skew settings and generates the SDC file for the back end (BE). The file includes all constraints in detail and instructs the BE tool how to implement the clock tree. DC also needs to take care of the large hold violations and deliver a clean result to the back end. The diagram below serves as a reference for the layout engineer: this clock tree skew requirement is used for building the clock tree in layout.

We now explain how to implement the clock tree shown in the diagram, taking the CCLK_O and CCLK3_O domains as an example. In function mode, CCLK_O goes through path 1, and its clock skew is 5.6 ns (0.6 + 2.0 + 0.2 + 2.8). CCLK3_O goes through path 3, and its clock skew is 1.8 ns (0.6 + 0.2 + 1.0). This gives the critical path about 3.8 ns (5.6 - 1.8) of additional slack in function mode. In test mode, however, only path 2 and path 4 matter. The clock skew of path 2 is 3.4 ns (0.6 + 2.8), and that of path 4 is also 3.4 ns (0.6 + 2.8). The functional clock skew therefore does not affect the clock skew in test mode. With this methodology, it is easier for the BE engineer to fix the hold violations.

A side benefit of a heavily skewed clock tree is better power signal quality, especially in a design like ours that mixes digital and analog power/ground. Because not all sequential cells switch at the same time, there is less peak current and less noise on the power lines. We consider this only an extra bonus, not our purpose.
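The arithmetic above can be reproduced with a short script. The segment delays are the ones quoted in the text, stored in picoseconds so the sums are exact; the dictionary keys and variable names are our own labels for the four paths.

```python
# Delay segments of each clock path, in picoseconds, as quoted in the text.
SEGMENTS_PS = {
    "path1": [600, 2_000, 200, 2_800],  # CCLK_O, function mode
    "path3": [600, 200, 1_000],         # CCLK3_O, function mode
    "path2": [600, 2_800],              # test mode
    "path4": [600, 2_800],              # test mode
}

# Total inserted clock delay of each path.
latency_ps = {name: sum(segments) for name, segments in SEGMENTS_PS.items()}

# Difference between the two domains in each mode.
function_skew_ps = latency_ps["path1"] - latency_ps["path3"]
test_skew_ps = latency_ps["path2"] - latency_ps["path4"]

for name in sorted(latency_ps):
    print(f"{name}: {latency_ps[name] / 1000:.1f} ns")
print(f"function mode skew: {function_skew_ps / 1000:.1f} ns")  # 3.8 ns
print(f"test mode skew: {test_skew_ps / 1000:.1f} ns")          # 0.0 ns
```

The function-mode difference is the slack gained on the critical path, while the zero difference in test mode is what keeps the scan paths free of skew-induced hold violations between these two domains.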

5. Limitation for useful clock skew

Although useful skew helps us solve a complex timing issue, we also noticed some limitations.

1. Make the clock structure more complex

Because the clock path is split for the different modes, a MUX is inserted, and we need to define more clocks than in the original clock structure. The relationships between the clock groups therefore need special attention. In particular, a pre-layout run is needed to find the skew values, as mentioned before. In addition, all MUXes need to be instantiated and must be preserved during synthesis optimization, so the design has to be changed slightly.

2. Timing is squeezed on other clock paths

As discussed, useful skew solves the timing issues on the critical paths. However, the paths that start from the capture cells of the critical path in the reverse direction are squeezed in timing. They may become new critical paths. The following shows such a case.

In Figure (5-1), the path shown is our original critical path. After useful skew is applied, its timing violation disappears. However, the timing from the CCLK_O clock domain to the CCLK3_O clock domain becomes very tight. As Figure (5-2) shows, because of the unbalanced clock tree, the previous slack of this path has been eaten up, and a new critical path appears. In this situation, the designer needs to analyze the circuit very carefully and balance the timing of these critical paths to get the best synthesis results. The layout result was checked by STA, and timing closure of the chip implementation was achieved with this useful clock skew method.
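The trade-off between the two directions can be sketched as a simple slack transfer. The 3.8 ns skew is the function-mode value derived in section 4.3; the two starting slacks are hypothetical numbers chosen only to illustrate the effect, and the function name is ours.

```python
def apply_useful_skew(forward_slack_ps, reverse_slack_ps, skew_ps):
    """Skew that delays the capture clock helps the forward path and
    takes the same amount of setup slack away from the reverse path."""
    return forward_slack_ps + skew_ps, reverse_slack_ps - skew_ps


SKEW_PS = 3_800  # function-mode skew between CCLK_O and CCLK3_O (5.6 ns - 1.8 ns)

# Hypothetical setup slacks before skew: CCLK3_O -> CCLK_O violates,
# CCLK_O -> CCLK3_O has comfortable margin.
forward_after, reverse_after = apply_useful_skew(-2_500, 4_000, SKEW_PS)

print(forward_after)  # 1300
print(reverse_after)  # 200
```

With these assumed numbers the original violation is removed, but the reverse path is left with very little margin and becomes the next candidate for the critical path.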

6. Summary

Synthesis plays an increasingly important role in today’s complex ASIC designs. It helps designers implement the real circuit from their design code and provides guidance for the subsequent layout task. Synopsys Design Compiler, as a leader in the synthesis industry, gives the designer powerful support through its comprehensive command set, evolving optimization algorithms and advanced synthesis strategies. To meet stringent timing and area requirements, designers should not only use the commands provided by the tool but also analyze the real circuit situation. In this article, the authors illustrate the use of clock skew, which designers normally try to avoid in the synthesis process, and show how to turn this disadvantage of the design into a strength that fully exploits the tool’s capabilities.

7. Acknowledgment

The authors would like to thank our Singapore layout engineers Koh Keng Kuay and Nabiwullah Rehman for their cooperation on this strategy, all colleagues of the AIM MC Dep. of Infineon Technologies (Xi’an) for their help, and last but not least, Synopsys AC Mrs. Tina (Jiang Tianjing) for her strong support.

8. Reference

[1] Synopsys, Design Compiler User Manual.
[2] Lee, VHDL Coding and Logic Synthesis with Synopsys, Advanced Micro Devices, Inc.
[3] Synthesis Highlights in DC 2002.05, SNUG 2002.
[4] Module Compiler: Incorporating Module Compiler into a Design Compiler Synthesis Flow.
[5] Choosing a Compile Strategy, Synopsys SolvNet, Doc ID: 901176.
[6] Design Compiler Compile Strategies, Synopsys SolvNet, Doc ID: 001999.