Integrated Co-Simulation and Verification for Microprocessor on VCS™ Platform
Zhao Yulai, Sun Hanxin, Xie Jinsong, He Xi, Lin Xinfen
Microprocessor Research and Design Center
Peking University, China
{zhaoyulai, sunhanxin, xiejinsong, hexi, linxinfen}@mprc.pku.edu.cn
ABSTRACT
Traditional verification tools lack efficiency for integrated co-simulation of different modeling languages. The newly released VCS™ platform, with its Native Testbench (NTB) kernel, enables native compilation of testbenches written in OpenVera as well as DUTs written in SystemC and Verilog. In this way, modules in different languages and at different abstraction levels can be flexibly connected and compiled into executable simulators. A new modeling and verification tool flow is proposed on the VCS™ platform to improve efficiency over the traditional methodology. The following problems are discussed:
● How to choose an efficient abstraction level for co-simulation with SystemC modeling?
● How to use a reference model for automatic randomization and vector test?
● How to exploit platform facilities and strategies for code and functional coverage?
● How to schedule computing resources efficiently with a VCS simulation farm?
This paper addresses these problems by illustrating the modeling and verification process of a concrete microprocessor using the VCS™ platform. The paper shows that the integrated platform simplifies the verification task and improves verification performance.
1 Introduction
The great success of VLSI design technology in the last decade has paved the way for System-on-Chip (SoC). Technological advances enable an SoC to integrate more processors and peripheral IPs, which dramatically increases complexity. Meanwhile, embedded systems must satisfy many constraints such as cost, performance, and design time. Under these constraints, HW/SW co-design methodologies have been widely adopted and have continuously evolved. The HW/SW co-design language SystemC is well supported in advanced verification platforms such as VCS™, alongside HDL and HVL, for co-simulation, which greatly improves verification efficiency.
A new modeling and verification tool flow for microprocessors is shown in Figure 1. At the early stages of design definition, microarchitects start with analytical CPI (cycles-per-instruction) performance models that lead to execution-driven, cycle-by-cycle simulators. The goal of this design space exploration phase is to optimize the choice of microarchitectural parameters for CPI performance under the design constraints known at that stage. These models are written in C/C++ and can be refined to form a behavior level SystemC model at the modeling stage. Meanwhile, the register transfer level (RTL) model is developed from the specification using a hardware description language such as Verilog. At the verification stage, the behavior level SystemC and RT level Verilog models are co-simulated and co-verified. At the end of the high-level design phase, the RT level model is ready for synthesis and the behavior model is ready for integration into other HW/SW co-simulation environments.

In the modeling and verification tool flow, a complete behavior level SystemC model is used as a bridge between software and hardware. As a reference model, it can be refined to be timing-accurate and interface-consistent with the RT level description. Meanwhile, its capacity for co-simulation with large-scale software helps in debugging and improves functional coverage. How to choose an appropriate abstraction level for co-simulation oriented behavior level SystemC modeling is discussed.
The VCS™ platform provides powerful and flexible features for this co-simulation based modeling and verification flow: RTL assertions, NTB compiled simulation, a co-simulation interface, graphical debugging, built-in coverage metrics, etc. These can dramatically accelerate the convergence of the verification process. We focus on using these tools for co-simulation and functional coverage. How to schedule large-scale verification tasks with a VCS farm is also discussed in detail.
1.1 Overview
Section 2 gives an overview of the VCS™ platform, marking the key components and flows. Section 3 describes the microarchitecture of the superpipeline RISC microprocessor under design. Section 4 presents the verification methodology. Section 5 details the behavior level SystemC modeling and RT level Verilog modeling, along with some comparison. Section 6 illustrates writing testbenches with OpenVera for co-simulation based verification. Section 7 is the focus of the paper; it concentrates on the integrated co-simulation and verification process and elaborates on the approach to gaining functional coverage. The remaining sections show how to schedule large-scale verification tasks and wrap up the ideas presented in the paper with conclusions and future directions.
2 An Overview of the VCS™ Platform
VCS™ is a high-performance, high-capacity Verilog® simulator that incorporates advanced, high-level abstraction verification technologies into a single open native platform. As the foundation for a complete functional verification solution, VCS supports Verilog, VHDL, mixed-HDL, SystemVerilog, OpenVera, assertion and mixed-signal simulation for complex SoC designs. In addition to its standard Verilog compilation and simulation capabilities, VCS includes an integrated set of features and tools.
More detailed information on the main VCS components and the VCS workflow can be found in Appendix A and Appendix B.
3 Application Focus: Superpipeline RISC Microprocessor
Our microprocessor uses a single-issue pipeline and a high-frequency clock to obtain high performance and low power. The main integer pipeline has eight stages, and memory operations with non-zero-shift address calculation follow a nine-stage pipeline. The cache employs separate I-Cache and D-Cache structures, which are virtual-tagged and physical-indexed. A simplified diagram of the processor pipeline is shown in Figure 2, where the stage boundaries are indicated in gray.

The stages have the following functions:
● IF1: parallel I-TLB tag reading and I-Cache line fetching.
● IF2: tag comparison and instruction fetching.
● DEC: instruction decoding and branch prediction.
● ISS: operand reading from the register file with dependency check.
The execution core is a three-stage pipeline organization, with three distinct pipelines.
MAC pipeline
● MULT1: first pipeline stage of multiplier.
● MULT2: second pipeline stage of multiplier.
● MADD: accumulator processing.
ALU pipeline
● EXE1: shifter processing.
● EXE2: ALU processing.
● MSW: status writing back.
MEM pipeline
● AGEN: memory access address calculation; for a non-zero-shift address, another stage is inserted after it.
● MEM1: parallel D-TLB tag reading and D-Cache line fetching.
● MEM2: tag comparison and memory data loading.
The last stage has the same function for all instructions.
● WB: operand writing back to register file and D-Cache.
Features that allow the microarchitecture to achieve high performance are as follows.
● Decoupled Instruction Fetch Unit. A two-instruction deep queue is implemented between the second fetch and decode pipe stages. This allows stalls generated later in the pipe to be deferred by one or more cycles in the earlier pipe stages, thereby allowing instruction fetches to proceed when the pipe is stalled.
● PHT Based Branch Prediction. The instruction fetch unit employs a gshare branch predictor with a 1K-entry PHT and 4-bit history. The PHT is looked up at the DEC stage and updated at the X2 stage. If the direction is predicted to be taken, the target PC is calculated and sent out at the DEC stage. The actual direction is resolved at the X2 stage.
● Pipelined Cache with Store Buffer. The caches are fully pipelined, allowing concurrent TLB lookup. The I-Cache takes two stages and the D-Cache takes three stages. All cache operations (load, store, fill, and replace) can issue on each cycle. A 3-entry store buffer makes the pipeline operation transparent, with bypassing and interlocks to resolve data and structural hazards. Cache operations such as invalidate, clean and flush satisfy the operating system's requirements for resource management.
● Two-level On-chip TLB. If the first-level TLB misses, two extra cycles are spent looking up the second-level TLB; a hit there reduces the penalty of page table walking.
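The gshare lookup and update described in the branch prediction item can be sketched in plain C++. Only the 1K-entry PHT, the 4-bit history, and the lookup/update stages come from the text. The 2-bit saturating counters, the word-aligned PC indexing, the initial counter value, and the class name GsharePredictor are assumptions made for illustration. The sketch also updates with the current history, whereas real hardware would carry the history used at prediction time down the pipeline.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch of a gshare predictor: 1K-entry PHT, 4-bit global history (from the text).
// Assumptions (not from the paper): 2-bit saturating counters, word-aligned PCs,
// counters initialised to "weakly not taken".
class GsharePredictor {
public:
    GsharePredictor() { pht_.fill(1); }

    // Lookup, as performed at the DEC stage.
    bool predict(std::uint32_t pc) const { return pht_[index(pc)] >= 2; }

    // Update, as performed at X2 once the actual direction is resolved.
    void update(std::uint32_t pc, bool taken) {
        std::uint8_t& counter = pht_[index(pc)];
        if (taken && counter < 3) ++counter;
        if (!taken && counter > 0) --counter;
        // Shift the outcome into the 4-bit global history.
        history_ = ((history_ << 1) | (taken ? 1u : 0u)) & kHistoryMask;
    }

private:
    static constexpr std::uint32_t kEntries = 1024;
    static constexpr std::uint32_t kHistoryMask = 0xF;

    // gshare: XOR the PC bits with the global history to form the PHT index.
    std::uint32_t index(std::uint32_t pc) const {
        return ((pc >> 2) ^ history_) & (kEntries - 1);
    }

    std::array<std::uint8_t, kEntries> pht_{};
    std::uint32_t history_ = 0;
};
```

Training the predictor with a branch that is always taken makes predict() return true for that PC once the history register has saturated.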
4 Verification Methodology
4.1 Verification Work in the Design Flow

The figure above shows the verification work in our design flow.
As soon as the first version of the hardware specification is produced, we begin three lines of work: the SystemC architectural level model, block level design, and the verification plan. After the SystemC architectural level model is implemented, we run some basic tests to make sure the specification has been properly produced. In the meantime, we implement the design and develop directed test vectors. Both the SystemC and Verilog models can be tested with the directed test vectors.
After the directed test vectors are developed, we run pseudorandom tests and compare the results between the RTL design and the SystemC reference model. Generally, we believe that the SystemC reference model is functionally correct because it has been exercised by a large number of application programs.
FPGA prototyping is also used; therefore, we reuse our directed test vectors from simulation.
If the comparison does not reveal any bug for a period of time, we sign off the RTL code and deliver it to the physical design group.
4.2 Verification Metric
In our project, both code coverage and functional coverage are our verification metrics.
When we develop directed test vectors, we use code coverage as the metric of test efficiency; it consists of line coverage, state machine coverage, toggle coverage, and condition coverage.
However, using directed tests and reaching 100 percent code coverage is not enough. We must make sure every function of the design is exercised. Therefore, we introduce functional coverage into our verification work and make it the most important metric in the whole verification process.
In our opinion, functional coverage consists of both specification coverage and structure coverage. The former can be used to guide black-box testing, while the latter can be used to guide white-box testing.
4.3 Bug and Coverage Driven Verification Activity
Our verification activity is driven by both bugs and coverage.
If a bug is discovered, both the verification engineer and the design engineer debug and resolve it. As soon as the bug is fixed, regression tests are run to make sure the fix does not introduce new bugs. Every week the bugs are classified and analyzed, which steers our verification work in a more efficient bug-finding direction.
Functional coverage is also a very important verification metric. Every week the functional coverage report is analyzed, which makes our test vector generation more efficient.
4.4 Selection of Simulation Tools
We use the VCS 7.2 Native Testbench technology to make our simulation faster than with VCS 7.0. VCS 7.2 also supports Verilog, SystemC, assertions and testbench automation.
5 Behavior Level SystemC and RT Level Verilog Modeling
A top-down design methodology is often used to partition the design into abstraction levels and to bridge the gaps between them. As the level goes down, the model becomes more accurate in low-level detail and simulation speed becomes more constrained. For the microprocessor design, four levels are explored: instruction level, microarchitecture level, logic level and physical level. Each of these macro-levels may be divided into sub-levels, which may contain small dependent procedures with some regressions. At the end of microarchitecture level exploration, a detailed specification is documented covering pipeline organization, timing information, microarchitecture parameters, etc. We also obtain a cycle-by-cycle microarchitecture simulator. As shown in Figure 1, we develop the behavior level SystemC model and the RT level Verilog model concurrently at the beginning of logic level design. The behavior level SystemC model is derived from the microarchitecture simulator written in C/C++ and is refined through module partitioning and interface wrapping. The objective of this model is to be capable of co-simulation with the RT level design. The behavior level SystemC model produces the same sequence of certain events as the RT level model, but its control logic is quite different from the RT level FSM (finite state machine) control. Meanwhile, the RT level design is developed from the specification, and some datapath components can be reused. Finally, the two models are co-simulated on the VCS platform, with certain events compared on each clock cycle.
5.1 Behavior Level SystemC Modeling
High level modeling allows the designer to partition the design in a coarse-grained way. We partition the microprocessor into its main components: core, icache, dcache, immu, dmmu, coprocessor and biu. The functionality of the micro structures is implemented uniformly within each macro module. Although the data structures and data flows within these modules may be quite different from the RT level model, the ports of each module are consistent with the specification and the RT level design. Meanwhile, the algorithm must guarantee that the behavior model and the RTL model have the same external signal values and physical register contents on each clock cycle. By regarding each macro module as a black box whose behavior is defined by the specification, the algorithm can be implemented without considering logic delays.
A module declaration starts with the definition of HW/SW data variables, processes and methods. For simplicity we use a basic set of data types, including sc_in, sc_out, sc_signal, sc_bv, sc_int, sc_uint and sc_bool, together with some basic C++ data types. All processes we declare are of the sc_thread type, with a basic process sensitive to the positive edge of the clock. The auxiliary variables and methods are used by the basic process directly or indirectly. The basic process is activated on each clock cycle to communicate and synchronize between modules. It performs the calculation and state transition for each pipeline stage between this cycle and the next. For simplicity, we describe the concurrency by traversing the pipeline stages backward, synchronizing the events and producing the external signals. For example, in the core module we declare a process entry as the basic process, and we declare variables of type sc_in or sc_out to model the hardware ports that communicate with other modules. Variables of type sc_signal are used to connect different modules. sc_int, sc_bv, sc_uint and sc_bool, together with C++ basic data types, are used to model the internal state of a module. In the example below, the variable CLK is declared as sc_in to receive the synchronization clock signal, IA is sent to other modules outside the core module, and the variable nreset1_reg identifies a specific state of the core module.
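#include <systemc.h>

SC_MODULE(core)
{
    // Ports: signals exchanged with other modules
    sc_in<bool> CLK;
    sc_in<bool> nWAIT;
    sc_in<bool> nRESET;
    sc_in<bool> IABORT;
    sc_in<bool> DABORT;
    sc_out<bool> IA;
    sc_in<bool> CFHiVec; // signal from coprocessor
    sc_out<bool> pass;   // signal to coprocessor
    sc_out<sc_bv<32> > WDATA;
    // Internal state of the module (not ports)
    bool nreset1_reg;
    sc_bv<32> INSTR_reg;
    …
    void entry(); // basic process, run once per clock cycle
    SC_CTOR(core) {
        SC_THREAD(entry);
        sensitive_pos (CLK); // trigger on the positive clock edge
        …
    }
}; // a module declaration ends with a semicolon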
As the code below shows, within a pipeline cycle the Writeback stage is processed before the other stages, which means the instruction in Writeback is executed before the others in the pipeline. By making use of the sequential semantics of software, we model the simple concurrency of hardware easily and resolve data forwarding in a natural way. For example, suppose there is a data hazard between the instructions in stages X2 and X3. The hardware resolves this hazard by forwarding. In our software execution order, by the time stage X2 uses the data that causes the hazard, stage X3 has already been processed, so the register that causes the data dependence has been updated with the newest value. Therefore, the model does not need the complicated forwarding that the hardware does.
void core::entry()
{
    while(true)
    {
        if(nWAIT)
        {
            RegisterInput();
            Writeback();
            X3();
            X2();
            X1();
            Issue();
            Decode();
            Fetch2();
            Fetch1();
        }
        wait();
    }
}
The module is a black-box module that generates the correct signals at the correct cycles. The hardware state and register file in the model serve only to create correct output signals. In our SystemC modeling of the microprocessor, we adopt a very effective strategy. With this strategy, our module not only keeps its output signals consistent with the logic implementation, but also simplifies hazard control and speeds up simulation. The strategy is summarized below.
1) Because we do not need to consider hardware delay, an operation that hardware places in a particular cycle because of delay can be executed elsewhere for convenience, as long as the output signals are not affected.
2) As long as the correctness of the output signals is ensured, we perform register writes as early as possible and register reads as late as possible. The register file is actually implemented as an array, so operating on a register is just accessing a variable, and reads and writes can be done at any time.
3) Every cycle, the module detects hazards that would stall the pipeline and inserts the needed bubbles.
4) Every cycle, the module operates sequentially by traversing the pipeline stages backward to simulate the concurrency of the logic implementation. That is, the module performs the operation of the Writeback stage, then that of the X3 stage, and so on.
Register reads and writes use the following strategy. Instructions do not read the register file at Issue, but later, when the value is needed. For example, a store instruction must send its data address at the X1 stage, so it reads the register file at X1. Data calculation instructions are different: they read the register file for calculation at the X2 stage. Register writes are more complex. The module must keep the register file cycle accurate, so it must write a register in exactly the cycle in which the physical implementation does. For example, the add instruction must write the register file at Writeback to keep the register file cycle accurate, but the pipeline must bypass the register value after the X2 stage. The SystemC model does not use pipeline registers; it implements forwarding with a backup copy of the register file. There are two register files in the SystemC module of the microprocessor: alias_regfile and regfile. The alias_regfile is used for forwarding, which means that any instruction that reads the register file actually accesses alias_regfile. The regfile keeps the register file content cycle accurate; it is the real register file that is visible outside the module. For example, the add instruction must be executed at the X2 stage so that its result can be bypassed to the next instruction at the X1 stage (the earliest stage at which the register file may be read in the pipeline), so the instruction writes its result to alias_regfile at X2. When the add instruction later enters the Writeback stage, it writes the result to regfile to keep the register file cycle accurate.
Because of the module’s backward execution order, when a load instruction at the X1 stage needs the result of the add instruction at the X2 stage, that result has already been generated and written to alias_regfile.

The following is an illustration of this new execution strategy with more details. Consider the following instruction sequence:
add r1,r2,r3
str r1,[r4] //store r1 to memory space addressed by r4
and r5,r1,r6
or r7,r1,r8
The three instructions after add use r1, the destination register of add, so data hazards exist. The model handles all data hazards between them. Figure 4 shows all the forwarding paths in real hardware. The execution of the instructions in the module is illustrated in Figure 5. In every cycle the module executes the instructions from top to bottom in the figure. For example, at cycle 5 the add instruction is executed first, adding the values of r2 and r3 into r1. Then the str instruction is executed; the value of r4 is read directly and used to address the memory. The next two instructions are executed in order. After every instruction in the pipeline has been executed, the module enters cycle 6. By then, register r1 has already been written into alias_regfile, so in subsequent cycles no read of r1 causes a data hazard. Comparing the two figures, the model using this strategy handles the data hazards without any extra operations.
Inter-Module Synchronization is based, under most circumstances, on signal communication between modules at each clock edge. There are also synchronizations within a cycle, in which a signal value transition influences the behavior of other modules. This handshake-like synchronization is common in real hardware, where it is built from combinational logic. Take the interaction between the core and the coprocessor as an example: the core module sends the handshake signal pass to the coprocessor in the X1 stage according to the type of the instruction in X1 in this cycle, and in the same clock cycle the coprocessor module uses pass to take the corresponding action. In hardware this is natural and easily implemented with wires in combinational logic. In the software model it is not easy, because our basic process is sensitive only to the positive clock edge. The basic process executes only at the positive clock edge, so the coprocessor does not see the update of the pass signal in this cycle. To handle this, dynamic sensitivity is used for inter-module synchronization. Dynamic sensitivity means that the events that trigger the process are not specified in the static sensitivity list of the process. We now describe in detail the dynamic sensitivity used in module communication. First we declare a global event pass_event. In every core cycle, we notify pass_event to let the coprocessor receive the pass signal; the coprocessor module then executes wait(SC_ZERO_TIME) to obtain the updated value of the pass signal. In this way the coprocessor responds to the pass signal in the same cycle in which the core module changes its pass output.
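The two-register-file strategy can be illustrated with a reduced model in plain C++ (no SystemC). The names alias_regfile and regfile and the backward stage order come from the text. The CoreModel struct, the four-slot pipeline (X1, X2, X3, Writeback), the 16-entry register files and the Op encoding are assumptions made for illustration, and only add and a simplified store are modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op { Nop, Add, Str };

struct Instr {
    Op op = Op::Nop;
    int rd = 0, rs1 = 0, rs2 = 0;   // for Str, rs1 is the register that is read at X1
    std::uint32_t result = 0;
};

struct CoreModel {
    std::array<std::uint32_t, 16> regfile{};        // cycle-accurate, externally visible
    std::array<std::uint32_t, 16> alias_regfile{};  // early copy used for every read
    Instr x1, x2, x3, wb;                           // one slot per modelled stage
    std::vector<std::uint32_t> stored;              // values read by Str at X1

    // One clock cycle: stages are evaluated from the last to the first.
    void cycle(const Instr& issued) {
        // Writeback: commit to the visible register file in the hardware's cycle.
        if (wb.op == Op::Add) regfile[wb.rd] = wb.result;
        // X3: nothing to do in this reduced sketch.
        // X2: execute add and publish the result early through alias_regfile.
        if (x2.op == Op::Add) {
            x2.result = alias_regfile[x2.rs1] + alias_regfile[x2.rs2];
            alias_regfile[x2.rd] = x2.result;
        }
        // X1: the store reads after X2 has already run in this same cycle,
        // so it sees the new value without any explicit forwarding path.
        if (x1.op == Op::Str) stored.push_back(alias_regfile[x1.rs1]);
        // Advance every instruction by one stage.
        wb = x3; x3 = x2; x2 = x1; x1 = issued;
    }
};
```

Issuing an add followed immediately by a store that reads the add's destination shows the effect: the store picks up the fresh value from alias_regfile while regfile still holds the old value until the add reaches Writeback.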
5.2 RT Level Verilog Modeling
Specification Development: The specification is developed according to the requirements of both hardware and software, such as frequency, power, performance, cost and instruction set. For SystemC and Verilog co-simulation, the specification should include the following content:
Module Partition: This is a functional partition of the microprocessor. In our design, the processor is divided into core, icache, dcache, immu, dmmu, cp0 and biu. SystemC and Verilog co-simulation is based on this partition; each module is co-simulated separately to make sure every part of the microprocessor is implemented correctly.
Interface Signal and Internal Register: The functional modules of a design may be implemented with different internal structures in the two models. There is only one thing they have to follow: the interface protocol and the key registers, for example the timing sequence of memory accesses, the interaction between core and coprocessor, and the program-visible registers. These signals and registers are checked every cycle to make sure the SystemC and Verilog modules are identical during co-simulation.
Pipeline Partition: The pipeline partition is developed to meet the requirements of performance, frequency, power, cost, etc. The action of each kind of instruction in every pipeline stage is needed to build a cycle-accurate SystemC module.
RTL Coding starts when the specification is done. In fact, while the specification is being developed, a certain amount of Verilog code is written to estimate power and timing. Such code is somewhat different from finished RTL code, but based on it we can quickly develop RTL code for simulation and synthesis. RTL code must follow a certain coding style; when coding is done, we use LEDA to check the code for syntax and coding style errors.
Sometimes the RTL code does not meet the timing constraints of the design; we can use techniques such as pre-computing or register retiming to solve this problem. Pre-computing generates a temporary result a few cycles ahead of the real computing cycle, so that the result is available sooner. Register retiming adds registers to combinational logic and moves them to proper locations so that the delay of the combinational path is shortened. If those methods do not work, one possible way is to adjust the pipeline or control structure of the design. This is rarely necessary, because timing is evaluated while the specification is developed, so when coding is done the timing results do not differ much from our evaluation.
6 Writing Testbenches Using OpenVera
Testbenches written in the OpenVera language can check and drive the DUT easily and efficiently. During testing, the testbench can generate input signals flexibly while checking the output signals of the DUT. With OpenVera we can collect functional coverage and add temporal assertions for the DUT. OpenVera also supports DUTs written in VHDL, Verilog or SystemC.
6.1 Structure of OpenVera Testbench

Above is the structure of the OpenVera testbench, which contains two main parts: the co-simulation top level module (DUT) and the OpenVera testbench. The two parts interact through the interface structure in OpenVera.
The DUT contains two modules: the behavior level SystemC module and the RT level Verilog module. The DUT is written in Verilog, so the SystemC module needs a Verilog wrapper to be placed in the DUT. Because both modules have the same interface signals, they share the same inputs from the OpenVera testbench. The output signals and internal key registers of both modules are sent to the OpenVera testbench for checking at every cycle; once any signal or register does not match in a cycle, the testbench stops and reports an error. The top level module looks like this:

The interface structure in OpenVera is important because it is how Vera interacts with the DUT. An interface structure is declared as follows:

This structure can be generated automatically from the interface signals of the DUT. Sometimes this is not enough: if we want to check the internal registers of both modules, some extra work has to be done. For the Verilog module, we can easily add an internal signal or register to the interface using:
“path.to.reg0” is the hierarchical path of reg0; in our testbench it is “co_sim_top.Verilog_module.datapath.regfile.reg0”.
For the SystemC module, we need to use the hdl_connect method in the SystemC module to connect the SystemC internal signal to its Verilog counterpart and then add the Verilog signal to the interface. A more direct solution is to turn the internal registers into output ports of the SystemC module and add them to the output ports of the DUT, so that the internal registers of the SystemC module become visible.
6.2 Module Checking in OpenVera Testbench
Checking the output signals of the two modules is quite simple: compare them, and on a mismatch print an error and stop the testbench if necessary. But not every signal is meaningful in every cycle. For example, if the cache is not accessed, IA does not need to be checked. This allows the SystemC and Verilog modules to differ slightly under such negligible conditions, avoids unnecessary debugging between SystemC and Verilog, and thus makes the verification process faster and more efficient.
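The per-cycle comparison with don't-care signals can be sketched in plain C++ as follows. This is not OpenVera code and not the authors' checker. The Sample struct, the relevant flag and the function find_mismatches are hypothetical names introduced only to show how a signal such as IA can be excluded from the check in cycles where it carries no meaning.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One compared item: a port or key register sampled from both models in a cycle.
struct Sample {
    std::string name;
    std::uint32_t systemc_value;
    std::uint32_t verilog_value;
    bool relevant;   // false when the value is a don't-care in this cycle
};

// Returns the names of all relevant signals that differ in this cycle.
// An empty result means the two models agree wherever agreement is required.
std::vector<std::string> find_mismatches(const std::vector<Sample>& cycle_samples) {
    std::vector<std::string> mismatches;
    for (const Sample& s : cycle_samples) {
        if (s.relevant && s.systemc_value != s.verilog_value) {
            mismatches.push_back(s.name);
        }
    }
    return mismatches;
}
```

A testbench built along these lines would call the check once per clock cycle and stop on the first non-empty result, setting relevant to false for IA whenever the cache is not accessed.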
6.3 Module Driving in OpenVera Testbench
Our testbench can drive the DUT in the following ways: directed asm code, restricted random asm code, full random asm code, restricted interrupt signals and full random interrupt signals. Each driving mode has a different use during verification. Generally, as verification proceeds, we run our testbench from directed asm to restricted random asm and then to full random asm. Interrupt tests are similar to asm code tests: in a restricted interrupt test the DUT runs directed asm code, and Vera checks internal signals of the processor to decide when to send the interrupt signal, while in a full random interrupt test Vera sends interrupt signals randomly. Only when the directed tests pass can we start the restricted tests and then the random tests.
The driver module of the OpenVera testbench has three main sub-blocks: the memory, the random instruction generator and the other control signal generator. The memory module is used for directed asm tests; it reads asm code from a given file and simulates the behavior of the cache, sending instructions and writing data. The random instruction generator sends fully random or restricted random instruction words. The other control signal generator sends interrupt and reset signals to the DUT.
There is a problem when driving the DUT with interrupt signals. We check internal signals of the processor to determine when to send an interrupt, and when a certain condition is met we want to send the interrupt in the same cycle. However, by default Vera samples input signals at the positive clock edge and drives output signals at the positive clock edge, so by the time we send the interrupt signal we have already missed the cycle in which we wanted to send it. One solution is to sample inputs and drive outputs at the negative clock edge, so that the interrupt signal is sent in the second half of the intended cycle.
7 Integrated Co-Simulation and Verification
7.1 Functional Co-Simulation between ISS and SystemC
The ability to run large-scale software such as an operating system and its applications has often been desired for pre-silicon verification. An FPGA is capable of running the whole operating system but is weak in debugging. A software simulator such as VCS™ is capable of debugging the design but is weak in simulating large-scale software. Hardware emulators provide strong debugging and simulation ability, but they are expensive and inflexible. ISS and SystemC co-simulation provides an alternative for simulating large-scale software. It also helps reach larger coverage of the design, because real software exercises the instruction set the way it is actually used. An important problem that restricts software simulation is speed. In our experience, an ISS can simulate at approximately 1 to 5 MIPS of the target processor. The behavior level SystemC model simulates about two orders of magnitude slower than the ISS, but two orders of magnitude faster than the RTL model. If the ISS can boot the kernel within 2 minutes, it will take about 200 minutes and 20000 minutes to boot the kernel on the behavior level model and the RTL model respectively. By dynamically switching simulation speed, the co-simulation model can simulate at a preferable speed between that of the ISS and that of the behavior level SystemC model. Figure 5 shows this co-simulation flow.
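The boot-time figures above follow from multiplying the ISS time by a power of ten. A small C++ helper makes the arithmetic explicit. The function name slowed_minutes is hypothetical, and the round factors of 100 and 10000 are the approximate ratios stated in the text, not measured values.

```cpp
#include <cassert>

// Time needed by a model that runs `orders` orders of magnitude slower than the ISS.
long slowed_minutes(long iss_minutes, int orders) {
    assert(orders >= 0);
    long minutes = iss_minutes;
    for (int i = 0; i < orders; ++i) {
        minutes *= 10;   // one order of magnitude per iteration
    }
    return minutes;
}
```

With a 2 minute ISS boot, slowed_minutes(2, 2) gives 200 minutes for the behavior level model (about 3 hours 20 minutes), and slowed_minutes(2, 4) gives 20000 minutes for the RTL model, which is close to 14 days.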

ISS Integration enables co-simulation between the ISS and the behavior level SystemC model. One method of implementing the co-simulation mechanism is through IPC (inter-process communication) and a bus wrapper, but this assumes a single copy of the machine status (register file and memory). For verification purposes, our method keeps two copies of the machine status: one in the verified ISS, and the other in the SystemC model under test. Each of the two models simulates on its private machine status, and they communicate through ISS primitives. Each primitive is an atomic operation invoked by the behavior level SystemC model. These primitives are inserted into the SystemC code and used for bug detection. The general ISS primitives include:
void iss_init(); /* initialize the machine status of the ISS */
int iss_exec_instr(); /* simulate one instruction */
word_t iss_get_pc(); /* get program counter of current instruction */
word_t iss_get_IR(); /* get the instruction just simulated */
word_t iss_get_psr(); /* get processor status register content */
word_t iss_get_reg(int num); /* get register content */
Two types of ISS are considered for integration: a user-level ISS for simulating applications and a system-level ISS for simulating the operating system kernel. The user-level ISS simulates target system calls by wrapping and invoking host system calls. With two copies of the machine status, contention may arise if a host system call is invoked more than once. One solution is to let only the ISS simulate each system call and then copy its modifications of the machine status to the behavior SystemC model. System-level ISS integration encounters similar contention problems. To avoid them, the interrupts and buses simulated in the ISS must deliver signals and data to the ISS and the behavior SystemC model simultaneously, while only the ISS is allowed to write data to the buses. Several additional primitives are therefore needed to synchronize the ISS and the behavior SystemC model.
The functional verification of the behavior level SystemC model is done through run-time status checking using the ISS primitives. The status of instruction execution, such as PC, IR, PSR, and register file contents, is compared against the ISS to detect faults. Status checking not only identifies the instruction on which a fault occurs but also reveals the patterns of instruction sequences that cause it. These instruction sequences can be reused as new test vectors, which substantially improves coverage as the spectrum of test software grows.
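The run-time status checking described above can be sketched as a lockstep loop. The primitive names are those listed earlier; their bodies here are toy stand-ins (every "instruction" adds 1 to register 1), and `word_t` as a 32-bit type, the 8-entry register file, the `Model` struct and its `bug_at` fault injection are all assumptions made only to keep the example self-contained.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using word_t = std::uint32_t;          // assumption: 32-bit machine word
constexpr std::size_t kNumRegs = 8;    // assumption: small register file

// Toy stand-ins for the ISS primitives; a real ISS would decode and execute
// target instructions instead of this fixed behavior.
namespace toy_iss {
word_t pc, psr;
std::array<word_t, kNumRegs> regs;
}
void   iss_init()       { toy_iss::pc = 0; toy_iss::psr = 0; toy_iss::regs.fill(0); }
int    iss_exec_instr() { toy_iss::regs[1] += 1; toy_iss::pc += 4; return 0; }
word_t iss_get_pc()     { return toy_iss::pc; }
word_t iss_get_psr()    { return toy_iss::psr; }
word_t iss_get_reg(int num) {
    assert(num >= 0 && static_cast<std::size_t>(num) < kNumRegs);
    return toy_iss::regs[static_cast<std::size_t>(num)];
}

// Hypothetical model under test with an optional injected fault.
struct Model {
    word_t pc = 0, psr = 0;
    std::array<word_t, kNumRegs> regs{};
    long bug_at = -1;   // instruction index that computes a wrong result
    long executed = 0;
    void step() {
        regs[1] += (executed == bug_at) ? 2 : 1;
        pc += 4;
        ++executed;
    }
};

// Compares the model's private machine status with the ISS.
bool state_matches(const Model& m) {
    if (m.pc != iss_get_pc() || m.psr != iss_get_psr()) return false;
    for (std::size_t r = 0; r < kNumRegs; ++r)
        if (m.regs[r] != iss_get_reg(static_cast<int>(r))) return false;
    return true;
}

// Runs both models in lockstep and returns the index of the first
// instruction after which their status differs, or -1 if none does.
long first_mismatch(Model& m, long max_instrs) {
    iss_init();
    for (long i = 0; i < max_instrs; ++i) {
        iss_exec_instr();
        m.step();
        if (!state_matches(m)) return i;
    }
    return -1;
}
```

Checking after every instruction is what lets the flow report the exact instruction at which the two machine statuses diverge.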
As the spectrum of test software grows, dynamically switching simulation speed is no longer optional for debugging and co-simulation. The technique allows simulation in fast mode or detailed mode, selectable at run time. The underlying ISS and behavior SystemC model provide this capability, but the status copying between them must be implemented correctly for the switch to be smooth. Since the switching point lies between successive instructions, the behavior SystemC model must flush the instructions in its pipeline stages to nullify subsequent instructions. Once this is implemented, the designer can quickly skip over code segments of no interest and go straight to the occurrence of the fault.
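A minimal C++ sketch of the mode switch follows. It assumes that in fast mode only the ISS advances, and that entering detailed mode copies the ISS status into the model with an empty pipeline. All type and member names (`ArchState`, `PipelinedModel`, `CoSim`, `in_flight`) are hypothetical and are not taken from the authors' implementation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using word_t = std::uint32_t;  // assumption: 32-bit machine word

// Architectural machine status shared by both models (hypothetical layout).
struct ArchState {
    word_t pc = 0, psr = 0;
    std::array<word_t, 8> regs{};
};

struct PipelinedModel {
    ArchState arch;
    std::vector<word_t> in_flight;  // instructions currently in pipeline stages

    // Nullify everything after the last completed instruction.
    void flush() { in_flight.clear(); }
};

enum class Mode { Fast, Detailed };

struct CoSim {
    Mode mode = Mode::Detailed;
    ArchState iss;         // status of the ISS
    PipelinedModel model;  // behavior level model

    // Switches at an instruction boundary.
    void switch_to(Mode next) {
        if (next == mode) return;
        model.flush();                 // no partially executed instructions survive
        if (next == Mode::Detailed) {
            model.arch = iss;          // status copy: ISS -> behavior model
        }
        assert(model.in_flight.empty());
        mode = next;
    }
};
```

Flushing in both directions keeps the rule simple: the model never resumes with stale instructions in flight, whichever mode it is leaving.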
7.2 Timing Co-Simulation between SystemC and Verilog
When co-simulating the SystemC model and the RTL Verilog model, testbenches with randomizations and directed vectors are used with per-cycle signal checking. For example, suppose the SystemC core module is in the file sysc_core.cpp. The general processing flow is as follows:
● Generating the SystemC wrapper:
syscan -cpp g++ sysc_core.cpp:core -Mdir=sc_res -cflags "-g" -V -o sysc_core
● Generating the testbench template, the interface, and the top-level Verilog module from the design:
ntb_template core_ref.v -t core_ref -c clk
● Compiling a Verilog design containing SystemC modules:
vcs -ntb -pp -Mupdate -cpp g++ -Mdir=sc_res -sysc -timescale=1ps core.v core_ref.if.vrh core_ref.test_top.v core_ref.vr.tmp …
● Executing the simulation:
./simv
At the beginning of co-simulation, the SystemC model generally does not yet match the cycle accuracy of the RTL model. A checker written in the Vera language detects this: it stops simv as soon as the outputs of the SystemC model and the RTL model differ at any cycle, and the differences are dumped into a file. The SystemC model must then be adjusted to become cycle-accurate. At the same time, the adjusted SystemC model is checked to confirm that it still passes the test and remains consistent with the ISS. If the SystemC model fails the test while its output matches that of the Verilog model, both models contain the same error, and both must be modified to meet the functional requirements. The debugging effort then focuses on the SystemC model and the Verilog model. Because the ISS can locate the instruction on which the error occurs, such errors are easy to fix. This stage is iterative: by repeating the procedure with single-instruction tests and simple applications, the SystemC model and the RTL Verilog model are made consistent in both function and timing.
After the timing verification between the SystemC model and the RTL Verilog model is complete, random tests written in the Vera language follow. At this stage the SystemC model disables the integrated ISS, because the input signals are completely random, and the comparison is limited to the SystemC model and the RTL Verilog model. When the checker finds a difference between the two outputs at any cycle, simv stops at once. Generally both models then have to be modified to restore functional and timing consistency.
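The behavior of the per-cycle checker can be illustrated in C++. The actual checker in this flow is written in Vera and runs inside simv; the sketch below only mirrors its logic on recorded traces, and the `CycleOutputs` fields and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

// Observable outputs of a model in one clock cycle (hypothetical signals).
struct CycleOutputs {
    std::uint32_t addr;
    std::uint32_t data;
    bool write_enable;

    bool operator==(const CycleOutputs& o) const {
        return addr == o.addr && data == o.data &&
               write_enable == o.write_enable;
    }
};

// Compares the two traces cycle by cycle. On the first difference it writes
// a report to `log` and stops, returning that cycle number; returns -1 when
// the traces agree on every cycle.
long check_per_cycle(const std::vector<CycleOutputs>& systemc_trace,
                     const std::vector<CycleOutputs>& rtl_trace,
                     std::ostream& log) {
    assert(systemc_trace.size() == rtl_trace.size());
    for (std::size_t c = 0; c < rtl_trace.size(); ++c) {
        if (!(systemc_trace[c] == rtl_trace[c])) {
            log << "cycle " << c
                << ": SystemC addr=" << systemc_trace[c].addr
                << " data=" << systemc_trace[c].data
                << " we=" << systemc_trace[c].write_enable
                << " | RTL addr=" << rtl_trace[c].addr
                << " data=" << rtl_trace[c].data
                << " we=" << rtl_trace[c].write_enable << '\n';
            return static_cast<long>(c);
        }
    }
    return -1;
}
```

Stopping at the first differing cycle keeps the dumped report short and points directly at the earliest divergence, which is the one worth debugging first.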
7.3 Validation of SystemC model
After the random tests have been completed, the SystemC model re-enables the integrated instruction set simulator, and a new round of tests across the ISS, the SystemC model and the RTL Verilog model begins. This time the inputs are single-instruction tests, simple applications, SPEC 2000, etc. The goal is to adjust the cycle behavior while ensuring functional correctness.
Through continual iterations, the SystemC model and the RTL Verilog model become increasingly accurate, reliable and robust.
8 Running Large-Scale Verification Tasks with VCS Simulation Farm
8.1 Some Problems of Simulation-Based Verification
Verification engineers have to deal with several problems:
● Automation of the test bench.
● Test vector efficiency.
● Computing resource scheduling.
In order to solve the problems above, we use VCS Farm. VCS Farm plays an important role in the following verification tasks:
● Directed test
● Constrained random test
● Post Layout Simulation
8.2 Grid Computing of Sun Grid Engine 5.3
A grid is a collection of computing resources that perform tasks. It appears to users as a large system, providing a single point of access to powerful distributed resources. Users treat the grid as a single computational resource. Resource management software, such as Sun Grid Engine, accepts jobs submitted by users and schedules them for execution on appropriate systems in the grid based upon resource management policies. Users can literally submit thousands of jobs at a time without being concerned about where they run.
We use Sun Grid Engine 5.3 and submit a large number of simulation jobs with the VCS run-time option “+vcs+lic+wait” added. Sun Grid Engine schedules every job automatically, so we spend little time scheduling simulation jobs manually and can concentrate on verification hot spots.
8.3 Dividing Test Vector into a Series of Test Jobs
In constrained random testing, it is worth dividing a single test job into a series of small test jobs and running them in parallel.
For example, if the constraint on a stimulus ranges from 1 to 100, we can divide it into ten stimuli: (1,10), (11,20), (21,30), ..., (91,100). Each job is submitted and runs whenever a license is available.
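The division of a constraint range into per-job sub-ranges can be sketched in C++ as below. The function name and the policy for ranges that do not divide evenly (the earliest jobs receive one extra value) are assumptions; the text only gives the evenly divisible case.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Splits the inclusive range [lo, hi] into at most `jobs` contiguous,
// non-overlapping sub-ranges of nearly equal size, one per test job.
std::vector<std::pair<int, int>> split_range(int lo, int hi, int jobs) {
    assert(lo <= hi && jobs > 0);
    const int total = hi - lo + 1;
    if (jobs > total) jobs = total;  // never create an empty job
    const int base = total / jobs;   // minimum values per job
    const int extra = total % jobs;  // the first `extra` jobs get one more
    std::vector<std::pair<int, int>> out;
    int start = lo;
    for (int j = 0; j < jobs; ++j) {
        const int size = base + (j < extra ? 1 : 0);
        out.emplace_back(start, start + size - 1);
        start += size;
    }
    return out;
}
```

Calling `split_range(1, 100, 10)` yields (1,10), (11,20), and so on up to (91,100), as in the example above; each pair can then be passed to one simulation job as its constraint bounds.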
If only a few test vectors exist, VCS Farm is not necessary. However, if the verification work has more than one hundred jobs to run, VCS Farm is very useful. It takes full advantage of all the computing resources and licenses, and saves a lot of valuable time. VCS coverage features (code and functional), such as merging and auto-grading, can also be applied to the results of the multiple simulations run on the farm. VCS Farm has become an important environment in our verification flow.
9 Conclusions
Engineers can dramatically improve the efficiency of co-simulation and verification using the integrated VCS™ platform. The newly released version integrates many features and tools that maximize design automation and hide the differences between languages, abstraction levels, etc. from the engineer. We have summarized a smooth co-simulation and verification flow under VCS for a general-purpose microprocessor design. Behavior level SystemC and RT level Verilog modeling are discussed in detail, along with the integrated co-simulation flow for functional and timing verification. ISS integration enables automatic verification of the behavior SystemC model. The OpenVera Assertions compiled by the Native Testbench kernel support run-time status checking between the behavior level SystemC and RT level Verilog models. This bi-directional co-simulation process improves co-simulation capability and coverage. VCS Coverage Metrics provide automatic quality checking and an evaluation method for the design. The evaluations drive the design of high-coverage testbenches such as randomizations and directed vectors, so a simulation-evaluation-regression procedure is indispensable. VCS Farm provides efficient scheduling of machine resources when the number of verification tasks increases dramatically.
10 Acknowledgements
Our sincere thanks go to the verification group of the Microprocessor Research and Design Center of Peking University. We thank all the group members for their hard work and efforts, and Prof. Cheng and Assoc. Prof. Tong for their advice and inspiration. Special thanks to Synopsys Inc. for their wonderful EDA software and technical support.
11 References
1 “VCS Native Testbench User Guide Version 7.2”, Synopsys Inc., 2004.
2 “VCS™ / VCSi™ User Guide”, Synopsys Inc., 2004.
3 “VCS / VCS MX Coverage Metrics User Guide Version 7.2”, Synopsys Inc., 2004.
4 “OpenVera® Language Reference Manual: Assertions Version 1.4.2”, Synopsys Inc., 2004.
5 “OpenVera® Assertions Checker Library Reference Manual: Version 7.2”, Synopsys Inc., 2004.
6 “Native ISS-SystemC integration for the co-simulation of multi-processor SoC”, Fummi F., Martini S., Perbellini G., Poncino M., Design, Automation and Test in Europe Conference and Exhibition, 2004. Proceedings; Volume 1, 16-20 Feb. 2004 Page(s):564 – 569 Vol.1.
7 “RTL/ISS co-modeling methodology for embedded processor using SystemC”, Yuyama Y., Aramoto M., Kobayashi K., Onodera H., Circuits and Systems, 2004, ISCAS ‘04. Proceedings of the 2004 International Symposium on, Volume 5, 23-26 May 2004, Page(s): V-305 – V-308 Vol.5.
8 “A timing-accurate HW/SW co-simulation of an ISS with SystemC”, Formaggio L., Fummi F., Pravadelli G., Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. International Conference on, 8-10 Sept. 2004 Page(s):152 – 157.
9 “SystemC 2.0.1 Language Reference Manual Revision 1.0”, Open SystemC Initiative, San Jose, California, 2003. www.systemc.org
10 “SystemC Version 2.0 Users Guide”, San Jose, Open SystemC Initiative, California, 2002. www.systemc.org
11 “A SystemC™ Primer”, J. Bhasker, Star Galaxy Publishing, 2002
12 “A VHDL Primer”, 3rd ed., J. Bhasker, Prentice Hall, 1999
13 “A Verilog HDL Primer”, 2nd ed., J. Bhasker, Star Galaxy Publishing, 1999
14 “Integrating SystemC Models with Verilog and SystemVerilog Models Using the SystemVerilog Direct Programming Interface”, Stuart Sutherland, SNUG Boston 2004
15 “Design Flow for Processor Development using SystemC”, Mario Steinert, Oliver Schliebusch, Olaf Zerres, SNUG Europe 2003
Appendix A: VCS Main Components
OpenVera Assertions (OVA) is a hardware verification language which provides a clear, easy way to describe sequences of events and to test for their occurrence. With clear definitions and less code, testbench design is faster and easier. Testing starts with a temporal assertion file, which contains the descriptions of the sequences and instructions for how they should be tested. Temporal expressions are descriptions of the event sequences, which are changes in value of any Verilog regs, integers, or nets. A temporal assertion specifies an expression or combination of expressions to be tested. The temporal expressions and assertions must be associated with a clock that specifies when the assertions are to be tested. A functional set of such expressions and assertions associated with a specified module or instance is a checker.
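To make the idea of a clocked temporal assertion concrete, the C++ sketch below checks one simple sequence: once `req` is sampled high, `ack` must be sampled high within a fixed number of later clock edges. This is not OVA syntax and not a Synopsys API; the class, the signal names `req` and `ack`, and the latency rule are hypothetical and serve only to illustrate what such a checker evaluates on each clock.

```cpp
#include <cassert>

// Checks the sequence "req is followed by ack within max_latency clocks".
// Signals are sampled once per clock edge; req is treated as level-sensitive,
// and a new request is only tracked when no earlier one is still pending.
class ReqAckChecker {
public:
    explicit ReqAckChecker(int max_latency) : max_latency_(max_latency) {
        assert(max_latency > 0);
    }

    // Call on every clock edge with the sampled signal values.
    // Returns false on the edge where the assertion fails.
    bool on_clock(bool req, bool ack) {
        bool ok = true;
        if (pending_) {
            ++elapsed_;
            if (ack) {
                pending_ = false;                  // sequence completed in time
            } else if (elapsed_ == max_latency_) {
                pending_ = false;                  // deadline missed
                ++failures_;
                ok = false;
            }
        }
        if (req && !pending_) {                    // start tracking a request
            pending_ = true;
            elapsed_ = 0;
        }
        return ok;
    }

    int failures() const { return failures_; }

private:
    int max_latency_;
    int elapsed_ = 0;
    int failures_ = 0;
    bool pending_ = false;
};
```

The explicit `on_clock` call plays the role of the clock association described above: the expression is only evaluated at the sampling points that the clock defines.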
Native Testbench (NTB) is a high-performance, single-kernel technology in VCS that enables native compilation of testbenches written in the OpenVera hardware verification language and their subsequent simulation along with their designs. NTB is built around the preferred methodology of keeping the testbench separate from the design, which ensures a smooth synthesis flow. Compiling testbenches separately from the design and loading them at run time saves the designer from unnecessary recompilations of the design, and facilitates the maintenance and reuse of the testbench.
VirSim is a graphical debugging environment which enables the designer to control an interactive simulation or to analyze saved simulation results. It is used to trace signals of interest while showing annotated values in the source code or schematic diagrams. It can also be used to compare waveforms, to extract specific signal information, and to generate test benches based on waveform outputs.
Coverage Metrics is a built-in coverage analysis facility that includes condition, toggle, line, observed, finite-state-machine, path and branch coverage. It can be used to determine the quality or coverage of the verification test during simulation. The results of the analysis are reported in several ways to reveal the shortcomings of the testbench.
These four components, plus the basic vcs Verilog compiler, are the essential parts of our modeling and verification tool flow. Other components, such as the DirectC Interface and Mixed Signal Simulation, are not the focus of our discussion and are not covered here.
Appendix B: VCS NTB-SystemC Workflow
The basic process of using VCS to simulate a model consists of two steps: 1. compiling the source files with the vcs command into the simv executable binary file, 2. running the simv binary file. The compiled simulation approach is faster and uses less memory than interpretive simulation. Compiling an executable binary avoids the extra layers and inefficiency of an interpretive simulation environment. VCS can generate object code directly without generating C or assembly language files on Linux, Solaris, and HP platforms. Incremental compilation saves time by not recompiling unchanged source code.
Figure 2 illustrates the VCS workflow for compilation and co-simulation of SystemC and Verilog designs and OpenVera testbenches. This workflow is more complex and forms the basis of our integrated co-simulation and verification. First, the SystemC source code is automatically scanned by the syscan command to produce Verilog wrappers for the SystemC modules. Second, the Verilog design code, along with the Vera testbenches and SystemC wrappers, is probed by the ntb_template command to produce native testbenches. Third, all Verilog source code as well as the generated code is compiled by the vcs compiler to build the executable binary simv. Finally, the simulator is executed to produce waveforms as well as SystemC debugging information. During compilation, the VCS/SystemC co-simulation interface creates the infrastructure necessary to co-simulate SystemC models with Verilog models. The infrastructure consists of the required build files and any generated wrapper or stimulus code. VCS writes these files in subdirectories of the ./csrc directory. The whole procedure is transparent to the designer. During co-simulation, the VCS/SystemC co-simulation interface is responsible for synchronizing the SystemC kernel and VCS and for exchanging data between the two environments.

The modeling style in Figure 8 is a Verilog design containing SystemC modules, which means that the SystemC modules are wrapped and instantiated in the Verilog design. The ports of the created wrapper are connected to the signals attached to the ports of the corresponding SystemC modules. Figure 9 illustrates VCS DKI communication.

The other modeling style is a SystemC design containing Verilog modules, which means that the Verilog modules are wrapped and instantiated in the SystemC design. In this modeling style, the vlogan tool is used to generate the wrapper of a Verilog module, and syscsim is used to compile the SystemC design containing Verilog modules. Because the HDL design is the main body under verification and the testbench written in the OpenVera HVL is closely tied to the HDL design, the first modeling style is the more straightforward choice. Nevertheless, the flexible design, compilation and co-simulation model of the VCS™ platform can serve many different requirements.



