SNUG 2009 Proceedings
Using Separate Compile Technology to Improve VCS Compile Performance
Lin Xuemei, Kong Yingqi, Wang Xin
Vimicro Corporation
linxuemei@vimicro.com
kongyingqi@vimicro.com
wangxin@vimicro.com
Abstract
Increasing system complexity has created a pressing need for better design tools and associated methodologies. Design based on the reuse of software and hardware functionality has also gained increasing exposure. More and more demands have been placed on the verification environment, such as coverage analysis, verification IP, and reference modules. For compiled simulators, traditional incremental compile technology can no longer meet all verification needs. This article describes in detail how separate compile improves compile performance further and how an existing environment can be adapted to it.
Introduction
With the increasing complexity of design and verification, more and more demands for intelligent testing have been placed on the verification environment, such as coverage analysis, verification IP, and reference modules. In pursuit of higher simulation speed, compiled simulators gradually replaced traditional interpreted simulators and became dominant in the industry. However, a compiled simulator needs more time to analyze all files together at compile time, and gate-level compile times of several hours have become more and more common. Traditional technologies such as incremental compile cannot meet the requirements of today's verification complexity. Specific methods, such as compiler directives, dynamic parameter switching, and compile option optimization, are not universal. Considering reuse and efficiency together, what approach is common to all users, and how can it improve compile efficiency further? This article describes how to build such a common method. Based on the latest VCS compile technology, separate compile, the method offers high performance and flexibility.
Problems and requirements in the classic verification flow
The classic IC design flow is shown in Figure 1. In most cases the design team and the verification team work together from the behavioral level to the block and system levels. At each stage, different tasks become the focus of design and verification.

Figure 1: Classic IC design flow
After RTL freeze, functional verification becomes the most important task for both the design and verification teams. At this stage the scenarios are more complex than in block-level verification, and the testbench and test cases are modified frequently to cover the scenarios of different applications. For an SoC of medium complexity, hundreds of test cases are common. To speed up simulation, engineers often run test cases in parallel, so testbenches with only tiny differences are compiled many times. A recompile is required even when only the compile options change. Long compile time therefore becomes a general problem.
How can compile efficiency be improved? Analysis shows that most of the compile time is spent in three steps: re-scanning all files, compiling the updated files, and complete linking. Even when most files are unchanged, the scan is unavoidable at compile time. Now that compiled simulators dominate because of their simulation performance, compile time can be expected to drop drastically if separate compile and dynamic linking are supported. As in all advanced compilers, shared libraries are beneficial, and they allow a more efficient and flexible method to be built on the simulator technology. From a project-management point of view, besides improving compile performance and saving disk space, shared libraries also help keep code consistent.
Separate compile is a new technology, so does introducing it create new problems? In tests on an image-processing chip at Vimicro, separate compile showed about a 10x compile performance improvement without any major code changes. This article shows the reader how to adopt separate compile step by step.
Solution
The first step in introducing separate compile is to understand the verification environment and then partition the testbench. The test environment is shown in Figure 2. This TB (testbench) complies with the VMM standard and includes the TC/transaction layer, driver layer, and interface layer. The DUT (design under test) and the TB are connected through an interface. In most cases the entire TB can be reused within a series of similar projects, so solving the separate compile problem for this TB benefits the whole series. The TB contains random control, transaction analysis, DUT case management, and some complex control, which makes it a demanding candidate for separate compile. Separate compile requires that the verification environment first be divided into two major parts and that some code then be changed to support the partition. The detailed process is marked in red in Figure 2:
Figure 2: Separate compile adaptation flow
To generate the shared library and complete separate compile, the key is to work through steps 1 to 6 of Figure 2. The details are as follows:
a) XMR (cross module reference in TB)
i. TB->DUT (output) (TB to DUT)
vclp_intra_drv.sv:
vclp_test_top.dut.dut.u_pad_ip.u_otp_rom2.mem[16'h1800+cnt*6] = 8'h22;
1. Add interface port
vclp.if.sv
logic [7:0] bd_data;
logic [15:0] bd_addr;
2. Connect signal port
vclp_top.sv
always @* vclp_test_top.dut.dut.u_pad_ip.u_otp_rom2.mem[vc_dut.bd_addr] = vc_dut.bd_data;
3. Control memory (back door)
vclp_intra_drv.sv
`ifndef VCS_SEP_COMP
vclp_test_top.dut.dut.u_pad_ip.u_otp_rom2.mem[16'h1800+cnt*6] = 8'h22;
`else
#0
sigs.bd_addr = 16'h1800+cnt*6;
sigs.bd_data = 8'h22;
`endif
ii. DUT->TB (Input) (DUT to TB)
1. Add DPI task (monitor event)
sync_cb.sv
`ifndef VCS_SEP_COMP
@(posedge (vclp_test_top.dut.dut.u_asicbody.u_asicbody_nor.u_sif.sg_en|vclp_test_top.dut.dut.u_asicbody.u_asicbody_nor.u_sif.mipi_en));
`else
xmr_dut_task(`XMR_PATH,1);
`endif
xmr_dpi.c
#include "svdpi.h"
#include "vcsuser.h"
// DUT task and function exported from SystemVerilog.
// A DPI task maps to a C function returning int (the disable status).
extern int xrm_sync(int i);
extern int xrm_rd(int i);
// Switch to the DUT scope, then call the exported task there.
int xmr_dut_task(const char *task_scope, int i) {
    svSetScope(svGetScopeFromName(task_scope));
    return xrm_sync(i);
}
// Switch to the DUT scope, then return the value read by the exported function.
int xmr_dut_function(const char *task_scope, int i) {
    svSetScope(svGetScopeFromName(task_scope));
    return xrm_rd(i);
}
vclp_top.sv
module vc338lp (vc338lp_if vc338_dut);
export "DPI-C" task xrm_sync;
…
task xrm_sync(input int i);
begin
@(posedge (vc338lp_test_top.dut.dut.u_asicbody.u_asicbody_nor.u_sif.sg_en | vc338lp_test_top.dut.dut.u_asicbody.u_asicbody_nor.u_sif.mipi_en));
end
endtask
2. Add DPI task (fetch context)
`ifndef VCS_SEP_COMP
case (rx_cnt % 4)
0: rx_data[i] = `USB_MEM[reg_addr+i];
1: rx_data[i] = `USB_MEM[reg_addr+i] & 32'hff;
…
endcase // case (rx_cnt % 4)
`else
case (rx_cnt % 4)
0: rx_data[i] = xmr_dut_function(`XMR_PATH,reg_addr+i);
1: rx_data[i] = xmr_dut_function(`XMR_PATH,reg_addr+i) & 32'hff;
…
endcase
`endif
vclp_top.sv
module vc338lp (vc338lp_if vc338_dut);
export "DPI-C" function xrm_rd;
…
function int xrm_rd(input int i);
begin
xrm_rd = vclp_test_top.dut.U_usb_bfm_top.U_dsram2kx32.mem[i];
end
endfunction
b) DUT partition
Remove the TB from the file list and create a separate file list for the DUT. A limitation of separate compile is that the global definition header must be included either inside a package or in a program block. The solution is to use a guard macro so that the header is parsed only once and parameters are not overridden. The same rule applies to overlapping class definitions.
`ifndef Global_def
`define Global_def
…
`endif
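The guard idiom has a direct counterpart in the C preprocessor, which may be a familiar reference point. The following is only a minimal sketch: the names GLOBAL_DEF, BUS_WIDTH, and bus_width are hypothetical and do not come from the project, and the two guarded regions stand in for the same header being included twice.

```c
#include <assert.h>

/* First "inclusion" of the hypothetical global header: the guard macro is
   not yet defined, so the definitions are parsed. */
#ifndef GLOBAL_DEF
#define GLOBAL_DEF
enum { BUS_WIDTH = 32 };
#endif

/* Second "inclusion": GLOBAL_DEF is already defined, so this body is
   skipped and the conflicting value never reaches the compiler. */
#ifndef GLOBAL_DEF
#define GLOBAL_DEF
enum { BUS_WIDTH = 64 };
#endif

/* Returns the value that survived, i.e. the one from the first inclusion. */
static int bus_width(void) {
    assert(BUS_WIDTH > 0);
    return BUS_WIDTH;
}
```

Calling bus_width() returns 32. In C an unguarded second definition would be a compile error, whereas the concern stated above for the Verilog flow is a silent parameter override. The guard prevents both.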
c) Package partition
i. Create a package from the TB (without VIP)
`ifndef VCS_SEP_COMP
`include "vmm.sv"
`include "define.h"
`include "vclp_tb_cfg.sv"
`include "vclp_data.sv"
`else
package vclp_pkg;
`define XMR_PATH "vclp_test_top.dut"
import "DPI-C" context task xmr_dut_task(string task_scope, int i);
// "context" is needed because the C side calls svSetScope and an exported function
import "DPI-C" context function int xmr_dut_function(string task_scope, int i);
`include "vmm.sv"
`include "define.h"
`include "vclp_tb_cfg.sv"
`include "vclp_data.sv"
`endif
d) Program Modification
vclp_tb.sv: program changes to support separate compile
`ifndef VCS_SEP_COMP
//program vc338lp_main();
`include "vmm.sv"
`include "define.h"
`include "vclp_tb_cfg.sv"
…
`else
import vc347lp_pkg::*;
`endif
e) Generate shell file for separate compile
Command: vcs -sep_cmp -ntb_opts genShellOnly p1 -lca -override_timescale=1ns/10ps -l ./logs/gen_shell.log +notimingcheck
f) Linking & simulation
i. Control delay unit
Separate compile strictly requires a consistent timescale between the TB and the DUT. In this project the timescale does not affect delays or back-annotation. Because of varying situations such as protected IP, it is not easy to define the timescale at the top level only. To resolve timescale inconsistencies, it is better to use the option -override_timescale=1ns/10ps to define the DUT and TB timescale at compile time.
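To make the meaning of a 1ns/10ps timescale concrete, the sketch below converts a source-level delay into simulator ticks by rounding it to the time precision. It is an illustration of the general Verilog timescale rule only; the function name delay_to_ticks and its parameters are assumptions introduced here, not part of VCS or of the project scripts.

```c
#include <assert.h>

/* Convert a source-level delay (e.g. the 1.234 in "#1.234") into simulator
   ticks, given the time unit and time precision in picoseconds.
   The delay is rounded to the nearest multiple of the precision. */
static long delay_to_ticks(double delay, long unit_ps, long precision_ps) {
    assert(delay >= 0.0 && unit_ps > 0 && precision_ps > 0);
    double ticks = delay * (double)unit_ps / (double)precision_ps;
    return (long)(ticks + 0.5);  /* round half up; valid for non-negative values */
}
```

With a unit of 1000 ps and a precision of 10 ps, delay_to_ticks(1.234, 1000, 10) gives 123 ticks, that is 1.23 ns. A module originally written with a finer precision would lose that resolution under the override, so it is worth confirming that the delays in the design are not affected.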
ii. Random stability
With multi-threaded execution (for example, a fork-join of two or more threads with non-blocking code), different optimization strategies can affect the repeatability of random numbers. If those random numbers are on the control path, they affect simulation execution and result confirmation. A safe approach is to replace $urandom_range or $random with $dist_uniform(seed,min,max) or $random(seed). The seed argument lets the random generator produce the same random numbers and keeps execution matched. Simulation results can then be confirmed easily by direct comparison of results and logs. DVE is recommended to speed up debugging of random behavior.
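The reason an explicit seed helps can be sketched outside the simulator. In the C sketch below each "thread" owns its seed, so the value it draws does not depend on which thread the scheduler runs first. Everything here is an assumption made for illustration: seeded_uniform is a hypothetical stand-in built on a simple linear congruential step, not the simulator's algorithm, and the seeds 11 and 22 are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a seeded call such as $dist_uniform(seed, min, max):
   advances the caller-owned seed and returns a value in [min, max]. */
static int seeded_uniform(uint32_t *seed, int min, int max) {
    assert(seed != 0 && min <= max);
    *seed = *seed * 1664525u + 1013904223u;      /* simple LCG step */
    uint32_t span = (uint32_t)(max - min) + 1u;  /* assumes a small range */
    return min + (int)((*seed >> 16) % span);
}

/* Draw one value per "thread", each from its own seed. The b_first flag
   models a scheduler that runs thread B before thread A. */
static void draw_pair(int b_first, int *a_out, int *b_out) {
    uint32_t seed_a = 11u, seed_b = 22u;  /* illustrative seeds */
    if (b_first) {
        *b_out = seeded_uniform(&seed_b, 0, 99);
        *a_out = seeded_uniform(&seed_a, 0, 99);
    } else {
        *a_out = seeded_uniform(&seed_a, 0, 99);
        *b_out = seeded_uniform(&seed_b, 0, 99);
    }
}
```

Calling draw_pair(0, ...) and draw_pair(1, ...) yields the same value for thread A and the same value for thread B in both orders. With a single generator shared by both threads, the value each thread receives would depend on the execution order.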
iii. Modification validation
It is recommended to put all changes inside a macro, e.g. `ifdef VCS_SEP_COMP, instead of making inline changes or overwriting code. Run and test the changes in the same verification environment and confirm them by matching results and logs. This step is a precondition for using separate compile.
iv. Environment setup and script overview:
Once the steps above have been completed, the separate compile flow can be set up. It should include the working library definitions in synopsys_sim.setup and a makefile update that adds the package/program/DUT compile and shell generation steps. With a server farm (HPCC) managed by LSF or Sun Grid Engine, the separate compile flow supports both distributed and local execution. Of course, compile dependencies should be considered in advance. For example:
while (-e ./work/AN.DB/p1)
./simv -sep_cmp=p1 +direct +vcs+lic+wait -l ./logs/run.log
break
end
g) Limitation & suggestion:
i. Modules cannot be used inside a package or program block.
ii. All classes must be inside either a package or a program block.
iii. Because introducing separate compile requires code changes, it is recommended to first try options that improve compile-time performance, such as parallel compilation (-j) or the DPI-C shared library compile flow.
For a new project, it is better to introduce separate compile from the beginning. With the same code, program stability should be guaranteed.
Result Analysis
The test report for separate compile is shown in Figure 3.

Figure 3: Separate compile Result
1. Single compile
a) Compile time: 284 s
b) Incremental compile time: 191 s
2. Single compile with changes (+VCS_SEP_COMP)
a) Compile time: 284 s
b) Incremental compile time: 226 s
3. Separate compile
a) First compile: comp_pack (16 s) + comp_prog (4 s) + Gen_shell (1 s) + comp_dut (227 s) = 248 s
b) Second compile:
i. TB change: comp_pack: ~16 s
ii. DUT change: comp_dut: ~227 s
So the maximum compile time reduction from separate compile is (191-16)/16 = 10.94x for a TB change. For a DUT change, compile time increases by (227-191)/191 = 0.19x. During RTL regression and GLS regression, compiling once should be enough if the DUT does not change, so separate compile delivers its maximum benefit there. Disk space can also be saved by using the shared library. At the same time, the flow helps avoid mistakes introduced by differing code.
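The two ratios can be reproduced from the measured times as a quick check. The helper names reduction_ratio and increase_ratio below are introduced only for this sketch. The inputs are the incremental compile time (191 s), the package compile time (16 s), and the DUT compile time (227 s) reported above.

```c
#include <assert.h>

/* Relative reduction as used in the text: (old - new) / new. */
static double reduction_ratio(double old_s, double new_s) {
    assert(new_s > 0.0);
    return (old_s - new_s) / new_s;
}

/* Relative increase: (new - old) / old. */
static double increase_ratio(double old_s, double new_s) {
    assert(old_s > 0.0);
    return (new_s - old_s) / old_s;
}
```

reduction_ratio(191.0, 16.0) evaluates to 175/16 = 10.9375, and increase_ratio(191.0, 227.0) evaluates to 36/191, roughly 0.188.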
In fact, separate compile is helpful throughout the whole development process. As Figure 4 shows, to meet project goals both the TB and the design need to be modified according to detailed requirements until all functions are correct. These modifications are independent of each other, so separate compile lets the compile step reach maximum efficiency and speed. Beyond the results above, we have more data to support this.

Figure 4: Coverage driven methodology

Figure 5: Simulation result
Figure 5 shows that separate compile has no effect on simulation performance.
Conclusion
The data above show that separate compile improves compile performance. With distribution control and shared library management, the compile problem can be resolved easily by avoiding extra work such as re-scanning unchanged files. As a new technology supported by VCS, separate compile can be introduced into the verification flow as a common approach that reduces project-specific operations and improves reusability.
Acknowledgements
Synopsys verification specialist: Chunlin Zhang
Synopsys VG CAE: Nitin Agarwal