Verification of UniCore2-F64 using SystemVerilog and VMM
Zheng Shang, Zichao Xie, Ziming Gong, Jingwei Xu
Microprocessor Research and Development Center of Peking University
Abstract
Experience shows that, of design and verification, it is verification that dominates the schedule. Reaching our verification goals with the old verification method became increasingly difficult. To minimize the time needed to meet the verification requirements, we chose SystemVerilog as our verification language and VMM as our guide in our new projects. Their powerful new features make our verification easier and more effective. With Synopsys’s tool support, our design passes all the SPEC95 and SPEC2000 floating-point benchmarks and runs the graphical applications of GNOME on FPGA without any bugs, after a short verification period.
Introduction
SystemVerilog is a Hardware Description and Verification Language. It is an extensive set of enhancements to the IEEE 1364 Verilog-2001 standard. These enhancements provide powerful new capabilities for modeling hardware at the RTL and system level, along with a rich set of new features for verifying model functionality.
VMM (Verification Methodology Manual) is a new methodology proposed by Synopsys. It improves the productivity of a verification project through four mechanisms: assertions, abstraction, automation, and reuse. The practical experience described in this paper confirms the benefits of SystemVerilog in design and verification and of VMM in verification.
The design verified in this work is UniCore2-F64, an IEEE754-compatible FPU (Floating-Point Unit). SystemVerilog was instrumental in obtaining concise, high-quality verification code, which made the verification convenient and efficient. We used VMM as our guide throughout, and it considerably reduced our verification workload.
Features of SystemVerilog and VMM
SystemVerilog is built on top of Verilog and improves the productivity, readability, and reusability of Verilog-based code. It adds verification enhancements in several important areas: verification functionality, synchronization, classes, dynamic memory, and cycle-based functionality. First, for verification functionality, SystemVerilog adds data types and functions for reusable and reactive testbenches, including the built-in string, associative array, and dynamic array types. Second, it provides mechanisms for dynamic process creation, process control, and inter-process communication. Third, it provides classes and an object-oriented mechanism that offers abstraction, encapsulation, and safe pointer capabilities. Last, clocking domains and cycle-based attributes help reduce development effort, ease maintenance, and promote reusability.
The methodology presented in VMM improves the productivity of a verification project through four mechanisms: assertions, abstraction, automation, and reuse. VMM recommends a hierarchical testbench. Once low-level functionality is verified, verification can proceed at higher levels using a layered testbench architecture. Reusing code avoids duplicating its functionality. With reuse, test cases become simple reconfigurations of highly reusable verification components that form a design-specific verification environment or platform. Automation can accelerate the verification process. Complete automation of the verification process is impossible, but random stimulus can emulate automation. A properly designed random source can eventually generate the desired stimulus. Random stimulus also creates conditions that may not have been foreseen. Constraints can be added to the random stimulus to steer it toward the required stimulus. Because the stimulus is random, a coverage mechanism is needed to identify which test cases have been pseudo-automatically produced so far. The coverage metrics measure the progress and productivity of the verification process.
Verification goal of UniCore2-F64
Introduction to the design of UniCore2-F64
UniCore2-F64 is a 600MHz, 8-stage (IF1, IF2, DEC, ISSUE, EXE1, EXE2, MEM, WB) pipelined coprocessor that performs floating-point arithmetic. It is compatible with the IEEE754 standard and supports single- and double-precision operations. It also supports the exceptions defined in the IEEE754 standard and gives system software an interface to handle them. Figure 1 shows the basic structure of our design.

Figure 1
Figure 1 shows that our design contains multiple arithmetic components. They receive their input operands from the UniCore2-F64 control logic and compute the final result over several cycles. The arithmetic components share the same input format (IEEE754 operands), but they place different requirements on the input operands for rarely used operations. Table 1 shows the number of cycles for each kind of operation.
Operation         Cycles (single/double)
ADD/SUB           1/1
MULTIPLY          1/6
DIV               28/40
CONVERT-FORMAT    1/1
ABS-NEG-MOV       1/1
SQRT              23/54
Table 1
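As a small illustration, the latencies of Table 1 can be held in a lookup table that a monitor or scoreboard consults to know when a result is expected. The sketch below is written in Python for brevity and is not part of our SystemVerilog testbench; the names LATENCY and result_cycle are hypothetical, and the rule "result cycle = issue cycle + latency" is a simplifying assumption.

```python
# Latencies from Table 1, in cycles, keyed by operation and precision.
# The dictionary layout and the function name are illustrative assumptions.
LATENCY = {
    "ADD/SUB":        {"single": 1,  "double": 1},
    "MULTIPLY":       {"single": 1,  "double": 6},
    "DIV":            {"single": 28, "double": 40},
    "CONVERT-FORMAT": {"single": 1,  "double": 1},
    "ABS-NEG-MOV":    {"single": 1,  "double": 1},
    "SQRT":           {"single": 23, "double": 54},
}


def result_cycle(issue_cycle, op, precision):
    """Cycle at which the result of an operation issued at issue_cycle is expected."""
    return issue_cycle + LATENCY[op][precision]
```

For example, under this assumption a double-precision DIV issued at cycle 10 is expected to finish at cycle 50.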
The control unit decodes the instructions and determines the instruction types in the DEC stage. It reads the register file and sends operands and control signals to the arithmetic components in the ISSUE stage. The arithmetic components compute the results in the EXE1 and EXE2 stages. All of the results from the arithmetic components are sent to the register file in the MEM stage. If a memory access is required, UniCore2-F64 accesses memory in the MEM stage. In the WB stage, results are written into the register file. The bypass control logic delivers the right operands to the right stages or stalls the pipeline. The control logic also makes sure the coprocessor stays in step with the main processor.
Verification goal
Our verification goal is to make our design free of bugs in just 4 months. Verifying the many arithmetic components and control units in our complex design is a tough job. We finally decided to use SystemVerilog and VMM to accelerate our original method, and they gave us an excellent result.
Verification of UniCore2-F64 using VMM and SystemVerilog
Background analysis
At first, we wanted to use our original testbench for the verification, but we found that it could not achieve our goal. The original testbench is based on separate assembly files: we write one test case in assembly language and run it in the simulation environment of the entire SoC. Writing one test case after another in this way is inefficient. It is also impractical for testing the arithmetic components, because we need a huge number of test cases, including corner cases that are very difficult to generate in advance. By using VMM to guide our verification, we found good methods to solve these problems.
We found that our arithmetic components share the same interface to our control logic unit. They all receive operands and enable signals from the control logic unit, and the operands have the same formats. Although the components may have different corner cases or special requirements for the operands, we decided to develop a testbench that generates reusable test vectors for all of them. We also wanted the testbench to be hierarchical, so that we could avoid writing tedious binary strings.
We also wanted test vector generation to be easy and powerful, so that we could apply more useful test vectors to our design and reach more corner cases.
Finally, we had to decide when our verification is done. Line, toggle, and state coverage are not enough for that decision. VMM and SystemVerilog gave us everything we needed.
Abstraction & Reuse
VMM states that once low-level functionality is verified, verification can proceed at higher levels using a hierarchical testbench architecture. Accordingly, we do not need to change the high-level testbench when the lower-level testbench is modified. SystemVerilog supports object-oriented programming, so we get all the benefits of the object-oriented methodology and can easily build a hierarchical testbench for reuse.
Automation
It is hard to generate constrained random signals in a traditional HDL (Verilog or VHDL). VMM recommends randomization as an easier and more efficient way to obtain test vectors. SystemVerilog supports various kinds of randomization, so we can easily implement different randomization schemes and place various constraints on them.
Functional coverage
VMM tells us that functional coverage is our verification goal: it defines the end point of our verification. SystemVerilog provides a simple way to implement functional coverage. We implement the items of our verification documents easily with simple coverage and cross coverage. We also get detailed coverage reports that show which coverage points have been hit and which have not.
Verification flow
Figure 2 shows our verification flow.
Figure 2
First, we make a verification plan that states what we shall test. It is the most important step in our verification because it determines which test vectors we generate and when the work is finished. Second, we build our hierarchical testbench in SystemVerilog under Synopsys’s simulation environment, making sure all the components can reuse the test vectors. Last, we map the entries of the verification plan to test suites using scenario units. If we find bugs in our design, we analyze them carefully to see whether the verification plan needs to be refined. After running a set of test vectors, we check the coverage report to see whether all the test points are covered. If not, we modify our constrained test vectors and continue the verification. When all the coverage points in our verification plan are hit and no bugs are found, we can sign off our design. The following sections describe our testbench in detail.
Verification plan
The verification plan defines all the test vectors we should test. Writing it is the most important task in our verification. We develop the verification plan carefully and make it easy to convert into test vectors and functional coverage points in SystemVerilog. We also organize our test vectors hierarchically so that the design is tested at different levels. The details of the verification plan follow.
We develop the test suites for the arithmetic components according to the IEEE754 standard and the special properties of each arithmetic component. We make sure they are easy to implement in SystemVerilog, as VMM recommends. Table 2 shows some test vectors for addition in a low-level testbench.

Table 2
In Table 2, we only define what we shall test in the design. When we simulate our design, we do not write these vectors directly, because we use scenario verification units based on constrained randomization. We use coverage to see whether the planned cases occur and a reference model to check their correctness. Our reference model is written in SystemC and implements a different algorithm from the hardware arithmetic components, so we do not need to calculate the tedious results by hand.
We can also collect coverage on the result. With the original method, it is very hard to construct input test vectors that produce a desired result. Now we simply run a large number of random vectors to find one and use coverage to confirm that it happened.
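The idea of covering the result can be sketched as follows. This is a Python model for illustration only, not our SystemVerilog testbench or our SystemC reference model: ref_add is a stand-in single-precision adder, the five result bins are an assumed binning, and all function names are hypothetical.

```python
import random
import struct


def bits_to_f32(bits):
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def ref_add(a_bits, b_bits):
    """Stand-in reference model: add two single-precision bit patterns."""
    total = bits_to_f32(a_bits) + bits_to_f32(b_bits)
    try:
        return struct.unpack("<I", struct.pack("<f", total))[0]
    except OverflowError:
        # Finite sum too large for single precision: overflow to infinity.
        return 0x7F800000 if total > 0 else 0xFF800000


def classify(bits):
    """Coverage bin of a single-precision bit pattern."""
    exp, frac = (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if exp == 0xFF:
        return "infinity" if frac == 0 else "nan"
    if exp == 0:
        return "zero" if frac == 0 else "denormal"
    return "normal"


def run_until_covered(seed=1, max_vectors=100_000):
    """Apply random operand pairs until every result bin is hit or the budget ends."""
    rng = random.Random(seed)
    hit = {name: 0 for name in ("zero", "denormal", "normal", "infinity", "nan")}
    for n in range(1, max_vectors + 1):
        result = ref_add(rng.getrandbits(32), rng.getrandbits(32))
        hit[classify(result)] += 1
        if all(hit.values()):
            return n, hit
    return max_vectors, hit
```

With unconstrained operands some result bins, such as an exact zero, are hit very rarely, so the returned counts play the role of the coverage report: they show which results still need constrained stimulus.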
At a higher level, we check the interaction of different kinds of arithmetic computation. Table 3 shows a simplified test case.

Table 3
These sequences are easy to test in our hierarchical testbench: we simply write the test vectors at the higher level of the hierarchy.
At the instruction level, we use a simulator (SimpleScalar 3.0) as our reference model. Thanks to our hierarchical testbench, we can also test instruction dependencies and achieve good coverage. Table 4 shows some test vectors at the instruction level.

Table 4
At this level, we mainly consider the interaction between instructions (data dependencies or control dependencies).
After establishing our verification plan, we begin to implement our hierarchical testbench.
Testbench structure
Hierarchy of Test Platform
The testbench for UniCore2-F64 follows the methodology advocated by VMM. SystemVerilog, with its object-oriented features, is used to greatly enhance the reusability of the testbench components. Our main goal is to establish a hierarchical verification environment for UniCore2-F64 that not only contains many reusable components but also works more efficiently than our previous environments. Our task is to generate all the new test vectors according to the test documents while transferring some old test vectors to the new reusable environment. Other IP design companies have met the same migration problem when establishing a new verification environment with the VMM methodology.
This section describes the testbench architecture for UniCore2-F64. To support reuse, we place the test cases on top of the verification environment and implement abstraction and automation functions in every layer, which minimizes the number and detail of the test cases that need to be written, as VMM recommends. Another benefit of this organization is that verification engineers can easily assemble previously developed test vectors and test environments into the new verification platform. From top to bottom, the platform has four layers: the scenario layer, the functional layer, the command layer, and the signal layer. We group closely related layers together to form several small structures. As VMM repeatedly emphasizes, the benefit of doing so is to maximize the reusability of the test platform. Investing in a few verification platforms to save a single line in potentially thousands of test cases is worthwhile. Figure 3 and Figure 4 show the general structure used to implement the complete UniCore2-F64 testbench.

Figure 3

Figure 4
Signal layer
The signal layer connects the signals of the DUT to the platform. Its implementation is easy and reusable because the interfaces of the IEEE754-compatible modules are stable and simple.
Command layer
This layer is one of the key layers of the verification platform. It offers the upper layer the atomic functions of each module by assigning and combining signals that have been defined and bound in the lower layer. In processor verification, we define the basic data operations and transfers as atomic functions through assignments to the operator, precision, rounding mode, etc. In this layer, we do not constrain the concrete values of those parameters. This is the difference between a bus-based platform and a processor-based one. The parameters are constrained or assigned in the layer above, the functional layer. This layer is also easy to establish because some of its parts, such as the operation generators, can be translated or migrated from the previous test platform.
Functional layer
This is the middle layer of our platform. It supplies basic services to the upper layer and sends pseudo-randomly generated parameters to the command layer. These services can be regarded as basic data transfers and operations, such as floating-point add, multiply, and divide. This layer has little dependence on the concrete logic implementation.
Scenario layer
This layer forms test cases from the units of the lower layers. Scenarios are composed of organization units, which are mainly made of test cases from the functional layer. For a floating-point processor, this layer acts as instruction combinations that describe the basic floating-point calculations and transfers.
Test vectors
With all four layers in place, we define the test suites according to the verification plan. Because the test suites sit at the highest abstraction level of the test platform, they can be implemented from the verification plan without laborious translation and decomposition.
Functional Coverage
Once the platform is established, developing coverage is an easy task. Functional coverage is the primary supporting method that makes the whole verification environment more efficient. The coverage is divided into two parts: module level and system level. At both levels, the functional coverage collects all kinds of coverage information and creates a coverage report. We apply a large number of randomized test vectors to our design while collecting coverage at both levels. Through a feedback mechanism, the coverage can also guide the stimulus generator effectively.
Detailed Architecture of UniCore2-F64 environment
In our verification environment, we adopt many techniques to make the platform more efficient and hierarchical. We describe them in detail below.
Signal Connecting Scheme
The signal connecting scheme is implemented in the signal layer, which is fully tied to the concrete logic implementation. In this layer, there are two differences between the new verification platform and previous ones, which were usually built with the OpenVera language: one is clocking, and the other is modport. The clocking block, which includes the declaration and direction of the signals, defines the synchronization event in order to avoid race conditions between the design and the verification environment. The modport binds the signals defined by the clocking block to individual perspectives.

Besides binding the signals of the DUT, we bind interfaces for the monitors to sample the related signals.
Stimulus and Response
The stimulus and response mechanism is the most important part of the test platform. The content of these two functions can be derived easily by transferring the code of the generators, monitors, and checkers developed in previous projects. In previous work, verification engineers usually put value assignment and random generation together. As VMM recommends, we separate this code into three layers: the command layer, the functional layer, and the scenario layer. As a result, modifying one of these parts does not affect the others.
Command layer
Most previous verification work can be regarded as focusing on the command layer. Verification engineers designed many test vectors with concrete signals, which contain many duplicate definitions and drivers and therefore invite more faults. There are three important members in this layer: the driver, the monitor, and the module-based checker. The driver receives the parameter values generated by the functional layer and assigns them to the corresponding concrete signals. It is actually the atomic generator that implements the basic data operations and transfers of the specified DUT. The class fpu_driver contains every atomic operation assignment task, such as do_fadd(), do_mov(), do_convert(), etc.

The monitor cooperates with the module-based checker to accomplish the co-simulation (we use a reference model to check the result). It monitors the stimulus applied to the DUT and records both the stimulus and the results. The monitor belongs to the command layer because it samples concrete signals. To suit floating-point processing, we record the operators, operands, and other operation information at the cycle in which they are sent. When monitoring an atomic operation applied to the DUT, the monitor traces the operation until its end and then records the results, unless an interrupt occurs. We define several tasks with the trace_fifo prefix to accomplish the trace function in cycle-accurate mode. The trace includes the timing information of the DUT because a processor behaves differently from a bus transfer: many modules, especially the calculation modules, receive operations cycle by cycle, must serve these requests, and must provide the results in order. There are often two or three operations (depending on the number of calculation stages) in one component at a time. A normal mechanism that fits bus behavior is therefore not suitable for processor verification.
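The in-order tracing described above can be modelled with a simple queue. The following Python sketch is illustrative only; it is not our trace_fifo tasks, and the class name, method names, and the fixed-latency assumption are hypothetical.

```python
from collections import deque


class TraceFifo:
    """In-order record of the operations in flight inside one component."""

    def __init__(self):
        self.in_flight = deque()  # oldest operation at the left
        self.completed = []       # finished records, ready for the checker

    def on_issue(self, cycle, op, operands, latency):
        # Record operator, operands and timing at the sending cycle.
        self.in_flight.append(
            {"op": op, "operands": operands, "issued": cycle, "due": cycle + latency}
        )

    def on_result(self, cycle, result):
        # Results must come back in issue order and at the expected cycle.
        if not self.in_flight:
            raise RuntimeError(f"cycle {cycle}: result with no operation in flight")
        entry = self.in_flight.popleft()
        if cycle != entry["due"]:
            raise RuntimeError(
                f"cycle {cycle}: {entry['op']} result was due at cycle {entry['due']}"
            )
        entry["result"] = result
        self.completed.append(entry)
        return entry

    def on_interrupt(self):
        # An interrupt cancels the traced operations; no results are recorded.
        self.in_flight.clear()
```

Two operations issued on consecutive cycles sit in the queue together, which corresponds to the case of two or three operations inside one component.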

The checker generates the correct results for the stimulus. It has two components: the reference model and the co-sim module. In our platform, the reference model is written in SystemC. The co-sim module then compares the result of the DUT simulation with that of the reference model. For the floating-point processor, we check only the data outputs and status flags, which improves the efficiency of the co-simulation.
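The comparison step can be pictured as below. This Python sketch only illustrates checking data outputs and status flags; it is not our co-sim module. The record layout and the function name are assumptions, and the flag list is the set of five IEEE754 exceptions.

```python
# The five IEEE754 exception flags; the dictionary-based record is hypothetical.
FLAGS = ("invalid", "divide_by_zero", "overflow", "underflow", "inexact")


def compare(dut, ref):
    """Return a list of mismatch descriptions; an empty list means a pass."""
    mismatches = []
    if dut["data"] != ref["data"]:
        mismatches.append(f"data: dut={dut['data']:#x} ref={ref['data']:#x}")
    for flag in FLAGS:
        dut_flag = dut["flags"].get(flag, False)
        ref_flag = ref["flags"].get(flag, False)
        if dut_flag != ref_flag:
            mismatches.append(f"flag {flag}: dut={dut_flag} ref={ref_flag}")
    return mismatches
```

The data are compared bit for bit, so a check of this kind also treats two NaN results with different payloads as a mismatch.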


Functional layer
This layer is the basis of the random mechanism in our test platform. It uses directed or pseudo-random methods to generate reusable verification test cases at a higher level of the hierarchy. The constraints determine the legal values that can be assigned to the random variables. The combination of directed and pseudo-random testing that VMM suggests can be more effective than a traditional, single testing approach. We use these two methods to generate the parameters that the lower layer needs, such as operands, operators, single or double precision, and rounding modes. Because the data formats of the IEEE754 standard are complex, the generation of test cases concentrates mainly on this part.

The constraint of operands is described as below:

In this layer, there are several classes with the _cfg postfix. They define the random variables and constraint blocks. We also use the := operator to specify the weight of each item. When executing a pseudo-random test, the platform generates operations based on these specified values and their weights. Note that in the code above the generation ratios of zero, infinity, NaN (Not a Number), and normal numbers are nearly the same. The platform therefore hits some corner cases more often than it would without individual weights.
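The effect of such weights can be sketched in Python. This is not our _cfg classes or our constraint blocks; the weight table, the function names, and the choice of single precision are assumptions made for illustration, loosely analogous to a weighted distribution constraint.

```python
import random

# Equal weights for the four operand classes named in the text (hypothetical values).
CLASS_WEIGHTS = {"zero": 1, "infinity": 1, "nan": 1, "normal": 1}


def make_operand(rng, kind):
    """Build a single-precision bit pattern of the requested class."""
    sign = rng.getrandbits(1) << 31
    if kind == "zero":
        return sign
    if kind == "infinity":
        return sign | (0xFF << 23)
    if kind == "nan":
        # All-ones exponent with a nonzero fraction.
        return sign | (0xFF << 23) | rng.randrange(1, 1 << 23)
    # Normal number: exponent field between 1 and 254, any fraction.
    return sign | (rng.randrange(1, 0xFF) << 23) | rng.getrandbits(23)


def classify(bits):
    """Class of a single-precision bit pattern."""
    exp, frac = (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if exp == 0xFF:
        return "infinity" if frac == 0 else "nan"
    if exp == 0:
        return "zero" if frac == 0 else "denormal"
    return "normal"


def random_operand(rng, weights=CLASS_WEIGHTS):
    """Pick a class according to its weight, then randomize within the class."""
    kinds = list(weights)
    kind = rng.choices(kinds, weights=[weights[k] for k in kinds])[0]
    return kind, make_operand(rng, kind)
```

For comparison, only 4 of the 2^32 single-precision bit patterns are a zero or an infinity, so unweighted random patterns almost never produce these operands.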
Scenario Layer
This is the highest layer of the platform. It generates and schedules various scenarios by calling the functional units defined in the lower layer. In the UniCore2-F64 verification environment, these scenarios can be regarded as instruction sequences. The scenarios are usually extracted from the verification plan. For example, interlock, bypass, and multi-cycle operation are the main elements of the test suites.
For example:

As mentioned above, the se_multicycle task is composed of a series of functional layer units. This is an obvious advantage of the hierarchical verification structure.
Finally, we define the class fpu_vmm and instantiate the scenario layer:

Test Cases
After the four layers are established, the next task for the verification engineers is to organize the verification units of the layers into test suites. For example, one test case verifies the sequence in which a floating-point addition instruction follows a multi-cycle scenario. We simply combine verification units from the scenario layer and the functional layer to meet this requirement. The hierarchical structure clearly makes writing test cases easier and quicker.
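The composition of layers in this example can be sketched as follows. The code is a Python illustration, not our SystemVerilog classes; fn_op, sc_multicycle, and tc_add_after_multicycle are hypothetical names, and the choice of DIV and SQRT as the multi-cycle operations is an assumption.

```python
import random

# The four IEEE754 rounding directions; the names are illustrative.
ROUND_MODES = ("nearest_even", "toward_zero", "toward_positive", "toward_negative")


def fn_op(rng, op):
    """Functional-layer unit: one operation with pseudo-random parameters."""
    return {
        "op": op,
        "precision": rng.choice(("single", "double")),
        "round_mode": rng.choice(ROUND_MODES),
        "operands": (rng.getrandbits(64), rng.getrandbits(64)),
    }


def sc_multicycle(rng):
    """Scenario-layer unit: back-to-back long-latency operations."""
    return [fn_op(rng, "DIV"), fn_op(rng, "SQRT")]


def tc_add_after_multicycle(seed):
    """Test case: a floating-point addition right after a multi-cycle scenario."""
    rng = random.Random(seed)
    return sc_multicycle(rng) + [fn_op(rng, "ADD")]
```

The test case itself is one line of composition; the operand and parameter details stay in the lower layers, and the seed makes a failing sequence reproducible.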

Functional Coverage implementation
As mentioned earlier, the coverage is divided into two parts: module level and system level. For the low-level coverage, we mainly consider module input coverage, output coverage, and state coverage. Input coverage and output coverage concentrate mainly on the data formats of the IEEE754 standard. We drive operations and transfers with special data types in the command layer and the functional layer, and then establish coverage for each operation and transfer.

Our strategy is first to find the minimum elements by analyzing the verification plan. For example, the basic elements of the data formats are the fields with different meanings, such as the sign field, the exponent field, and the significand field. Using cross coverage, we combine these elements to form a data format that stands for a special number. We also use state coverage to verify the correctness of the internal state transitions of the DUT. The other direction is the system level. We use the same strategy there, but the elements are operations and transfers. Mapped directly from the verification plan, some coverage points are sequential actions corresponding to sequential events, such as a floating-point addition that follows a floating-point multiplication.
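The cross of field-level elements can be modelled as below. This Python sketch is not our SystemVerilog covergroups; the bin names, the single-precision field widths, and the class name are assumptions chosen for illustration.

```python
from itertools import product

# Field-level bins for a single-precision operand (illustrative binning).
SIGN_BINS = ("positive", "negative")
EXP_BINS = ("all_zeros", "all_ones", "other")
FRAC_BINS = ("zero", "nonzero")


def field_bins(bits):
    """Map a 32-bit pattern to its (sign, exponent, fraction) bins."""
    sign = "negative" if (bits >> 31) & 1 else "positive"
    exp = (bits >> 23) & 0xFF
    exp_bin = "all_zeros" if exp == 0 else "all_ones" if exp == 0xFF else "other"
    frac_bin = "zero" if bits & 0x7FFFFF == 0 else "nonzero"
    return sign, exp_bin, frac_bin


class CrossCoverage:
    """Cross of the three field bins; each combination is one coverage point."""

    def __init__(self):
        self.hits = {c: 0 for c in product(SIGN_BINS, EXP_BINS, FRAC_BINS)}

    def sample(self, bits):
        self.hits[field_bins(bits)] += 1

    def percent(self):
        covered = sum(1 for count in self.hits.values() if count)
        return 100.0 * covered / len(self.hits)

    def holes(self):
        return [combo for combo, count in self.hits.items() if count == 0]
```

In this binning, the combination of an all-ones exponent and a zero fraction stands for infinity, and an all-zeros exponent with a nonzero fraction stands for a denormal number, so each special number is one cross point.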

Our verification environment also applies a functional feedback mechanism driven by the sampled functional coverage. Its main purpose is to keep redundant tests from consuming valuable CPU cycles and thus to shorten the verification cycle. Verification engineers usually modify the weights of the pseudo-random generator manually. With SystemVerilog, the query() function can be used to change the weights automatically. This function accumulates coverage information for a coverage group.
These are all the details of our verification environment. The platform is implemented under Synopsys’s VCSMX-2006.06-SP1.
Using the VMM recommended by Synopsys and SystemVerilog in Synopsys’s EDA tools, we finished our verification successfully in just 4 months. Our design passes all the SPEC95 and SPEC2000 floating-point benchmarks and runs the graphical applications of GNOME on FPGA without any bugs, after a short verification period.
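The feedback loop can be sketched as follows. This Python model only illustrates the principle of steering generator weights from sampled coverage; it does not use or reproduce the query() function, and the function names, weights, and bin names are hypothetical.

```python
import random


def rebalance(hits, covered_weight=1, uncovered_weight=10):
    """Give bins that have not been hit yet a larger generation weight."""
    return {
        name: (uncovered_weight if count == 0 else covered_weight)
        for name, count in hits.items()
    }


def run_with_feedback(bins, seed=0, covered_weight=1, max_draws=10_000):
    """Draw bins with coverage-driven weights until all are hit or the budget ends."""
    rng = random.Random(seed)
    hits = {name: 0 for name in bins}
    draws = 0
    while draws < max_draws and not all(hits.values()):
        weights = rebalance(hits, covered_weight)
        names = list(weights)
        pick = rng.choices(names, weights=[weights[n] for n in names])[0]
        hits[pick] += 1
        draws += 1
    return draws, hits
```

In the extreme setting covered_weight=0, a covered bin is never generated again, so four bins are closed in exactly four draws. A nonzero covered weight keeps some repeated stimulus while still favoring the coverage holes.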
Summary
In this paper, we present a successful project supported by VMM and SystemVerilog. They sped up our verification considerably and freed us from developing low-level test vectors.
In the future, we will adopt more of the recommendations of Synopsys’s VMM, and we will use more of the new features of SystemVerilog in our design and verification with Synopsys’s tool support.
Acknowledgement
We would like to express our great thanks to the Microprocessor Research and Development Center of Peking University, which gave us an excellent lab in which to do our work with Synopsys’s tools. We thank Prof. Cheng and Assoc. Prof. Tong for their advice and inspiration. We also give our sincere thanks to Synopsys’s technical support, which helped us make progress in IC design and verification. We greatly appreciate SNUG for giving users the chance to share their engineering experience.
References
[1] Jason C. Chen, “Applying CRV to Microprocessor Verification”, Synopsys Professional Services, Synopsys Inc.
[2] “IEEE Standard for SystemVerilog – Unified Hardware Design, Specification, and Verification Language”, IEEE Std 1800-2005.
[3] Janick Bergeron, Eduard Cerny, Alan Hunter, Andrew Nightingale, “Verification Methodology Manual for SystemVerilog”, Springer, 2005.
[4] Janick Bergeron, “Writing Testbenches: Functional Verification of HDL Models, Second Edition”, Springer, 2003.
[5] Stuart Sutherland, “An Overview of SystemVerilog 3.1”, EEdesign, May 23, 2003.
[6] Richard Raimi, “A Unique Functional Coverage Flow using SystemVerilog and NTB”.
[7] “SystemVerilog testbench constructs”, Synopsys.





