Performance Modeling and Analysis Using VHDL and SystemC: Processor Model

Processor Model

The processor model has a fairly simple structure. It has three methods: the constructor and two member functions. The first member function describes the performance-only behavior of the processor, and the second describes the mixed-level behavior. The mixed-level functionality is described in detail in Section 77.5.9.

In addition to those methods, the processor model has a number of member objects: three integer variables for passing command arguments to the interface methods; a pointer to a command_in object that opens the command file and parses the model execution commands for the processor model; a command_type object that is used to return the commands from the command_in object; a signal of the enumerated type action_type that displays the current action; an unsigned signal that displays the current task number; and a pointer to a refined computation model. Figure 77.41 graphically shows the objects in the processor model. On the far right is the IO port, which is mapped to the interconnect interface. Next to it are the three integer variables used to pass information to the interconnect calls. Below the variables are boxes representing the two signals that allow the state of the processor to be viewed in the ModelSim waveform window. Further to the left are the performance and mixed-level descriptions; one of these member functions is turned into an SC_THREAD and controls the model’s behavior during simulation. At the top, labeled refined model, is an outlined box representing the pointer to a refined model object. Below it is a storage location (labeled command) in which the command_in object returns values. In the bottom left is a dashed box representing the pointer to a command_in object, and in the top left is a box representing the constructor. The IO port and the constructor both sit on the edge of the processor model because they are the only members that interact with other objects in the simulation.

The constructor receives a processor number and creates a command_in object with the appropriate processor number for the processor that the current instance represents. The command_in object builds a file name string with the processor number in the middle and uses it to open that processor’s command file. The active behavior calls the command_in object to read in the next command once the previous one has executed. Its primary purpose is to remove command parsing from the processor model: because it is a separate object, changing how the processor’s commands are read in or generated is simply a matter of including a different implementation of the object. Once the command_in object is created and initialized, the processor model constructor opens its log file and instantiates either the performance-only or the mixed-level implementation by registering the corresponding member function with the simulation kernel as an SC_THREAD. During simulation, the processor model essentially reads in a command from a file, uses a case statement to perform that command, and repeats until it reaches a done command, the end of the file, or a command it does not recognize. Figure 77.42 shows the framework of the main processor loop.
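
The read-dispatch-repeat structure of the main processor loop can be approximated outside SystemC with the minimal C++ sketch below. It is an illustration under stated assumptions, not the authors’ code: the command keywords (compute, send, receive, done), the one-argument line format, the file-name pattern returned by command_file_name, and the names CommandIn and run_processor are all hypothetical. Instead of calling wait() or an interface method through the port, each branch only records the action it would perform.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical command set; the real command-file format is not given in the text.
enum class CommandKind { Compute, Send, Receive, Done, Unknown, EndOfFile };

struct Command {
    CommandKind kind = CommandKind::EndOfFile;
    int arg = 0;  // assumed meaning: delay for compute, partner processor for send/receive
};

// Hypothetical naming pattern: the processor number sits in the middle of the file name.
std::string command_file_name(int processor_number) {
    return "processor_" + std::to_string(processor_number) + "_commands.txt";
}

// Stand-in for command_in: parses one "keyword [argument]" command per call.
class CommandIn {
public:
    explicit CommandIn(std::istream& in) : in_(in) {}

    Command next() {
        Command cmd;  // defaults to EndOfFile
        std::string word;
        if (!(in_ >> word)) return cmd;
        if (word == "compute") {
            cmd.kind = CommandKind::Compute;
        } else if (word == "send") {
            cmd.kind = CommandKind::Send;
        } else if (word == "receive") {
            cmd.kind = CommandKind::Receive;
        } else if (word == "done") {
            cmd.kind = CommandKind::Done;
            return cmd;
        } else {
            cmd.kind = CommandKind::Unknown;
            return cmd;
        }
        if (!(in_ >> cmd.arg)) cmd.kind = CommandKind::Unknown;  // missing argument
        return cmd;
    }

private:
    std::istream& in_;
};

// Main loop: read a command, dispatch with a switch, and repeat until a done
// command, end of file, or an unrecognized command. In a SystemC model the
// branches would call wait() or an interface method through the port instead.
std::vector<std::string> run_processor(CommandIn& commands) {
    std::vector<std::string> trace;
    for (;;) {
        const Command cmd = commands.next();
        switch (cmd.kind) {
        case CommandKind::Compute:
            trace.push_back("compute " + std::to_string(cmd.arg));
            break;
        case CommandKind::Send:
            trace.push_back("send " + std::to_string(cmd.arg));
            break;
        case CommandKind::Receive:
            trace.push_back("receive " + std::to_string(cmd.arg));
            break;
        case CommandKind::Done:
        case CommandKind::Unknown:
        case CommandKind::EndOfFile:
            return trace;
        }
    }
}
```

Because parsing lives entirely in CommandIn, swapping in a different command source only means replacing that class, which mirrors the motivation the text gives for the separate command_in object.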

Notice in the source code that the processor has a port of the same type as the base type of the interconnect channels, my_basic_rw_port. This port is bound to the interface of the channel object. If the command read in from the command file is a send or receive command, the processor uses the port as a pointer to the channel object’s interface and calls the appropriate interface method to perform the send or receive operation. In these models, the processor model’s thread actually executes all the code in the blocking send and receive methods, so the processor model cannot do anything else until the blocking I/O function returns.

Channels

The channels used in these models are hierarchical channels: they are not any of the predefined SystemC primitive channel types, they are composed of multiple objects, and they contain a number of threads. To allow a variable number of processors to connect through them, the channels have only an interface and no ports. In SystemC, every port must be bound to something, be it another port, an interface, or a signal, whereas an interface may exist even if nothing is bound to it. So, for maximum flexibility, a channel should provide an interface, and each connected module should have a port of that interface type bound to the interface on the channel.
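
The binding arrangement can be illustrated in plain C++, with an abstract class standing in for the channel interface and a small wrapper standing in for a port. This is a sketch, not SystemC: BasicRwIf, RwPort, CountingChannel, and Processor are hypothetical names, and throwing when an unbound port is used only loosely mirrors SystemC’s rule that every port must be bound (SystemC reports unbound ports before simulation starts).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the channel's interface class.
class BasicRwIf {
public:
    virtual ~BasicRwIf() = default;
    virtual void send(int src, int dst, int size) = 0;
    virtual void receive(int src, int dst, int size) = 0;
};

// Hypothetical stand-in for a port: it forwards calls to the interface it is bound to.
class RwPort {
public:
    void bind(BasicRwIf& iface) { iface_ = &iface; }
    BasicRwIf* operator->() const {
        if (iface_ == nullptr) throw std::logic_error("port used before being bound");
        return iface_;
    }

private:
    BasicRwIf* iface_ = nullptr;
};

// Channel with an interface and no ports; any number of ports may bind to it.
// The processor count is a constructor argument, as in the text.
class CountingChannel : public BasicRwIf {
public:
    explicit CountingChannel(int num_processors)
        : sends_(static_cast<std::size_t>(num_processors), 0) {}
    void send(int src, int /*dst*/, int /*size*/) override { ++sends_.at(src); }
    void receive(int /*src*/, int /*dst*/, int /*size*/) override { ++receives_; }
    int sends_from(int processor) const { return sends_.at(processor); }
    int receives() const { return receives_; }

private:
    std::vector<int> sends_;  // per-processor send counts
    int receives_ = 0;
};

// A module with one port that gets bound to the channel's interface.
struct Processor {
    int id;
    RwPort io;
};
```

Usage: construct one CountingChannel, call bind on every processor’s io port with that same channel, and then call io->send(...) or io->receive(...); all calls land in the single shared channel object, which is why the channel itself needs no ports.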

Since multiple ports can be bound to a single interface, any number of processors can be bound to a channel object. However, the behavior of the crossbar and fully connected channels depends in part on the number of processors present, so every channel is passed a constructor argument that tells it how many processors are present in the simulation. The channel models are the most extensive models, since their behavior is an abstract representation of all of the characteristics of an interconnect topology. They model the arbitration, data transfer, and blocking/nonblocking characteristics of the interconnect without restricting the designer to a particular implementation. Because nonblocking sends are allowed, they also implicitly model a sending queue.

All the channel models are based on the comm_medium class. The comm_medium class provides a logical connection between processors, with signals for the source processor number, the destination processor number, and the transaction type, as well as blocking and nonblocking read and write methods and an arbitration thread. In addition, the comm_medium class provides two threads to allow for nonblocking reads and writes. Figure 77.43 shows the members of the comm_medium class. The four signals in the top left show the current state of the logical connection in the waveform window. To their right are the two wait queues, one for write requests and one for read requests; new transaction requests received through the interface methods are placed into these queues. The boolean no_match variable records whether there is currently a match between read operations and received data. The integers below it store the numbers of the current sending and receiving processors, as well as the queue positions of the send request and the receive request currently being executed. Below the integers is a dummy variable whose sole purpose is to work around an existing bug in the implementation of the SystemC simulator. The member functions of the object are in the bottom left of the figure. The functions at the far left are intended to be accessed by other objects; the remaining four are intended for internal use only, although they are declared public and are therefore visible to other objects. The functions marked with a star are registered with the simulation scheduler as SC_THREADs. The top two events in the bottom right of the figure are used by the read and write methods to notify the arbitrator process that there may be new pairs of requests that could be activated to communicate. The event at the very bottom coordinates the execution of two threads when they communicate.
In the top right of the figure are two transaction pointers, which the arbitrator uses to keep track of the two transactions it is dealing with. The event pointers in the bottom center of the figure are also used by the arbitrator. When the arbitrator has selected two transaction requests to communicate, it does not handle the communication itself. Instead, a write thread and a read thread execute the transact() code in each of the two transaction objects. For a blocking transaction object, the calling processor’s thread is suspended in the interface call, waiting for the transaction to notify it that the transaction is complete. For a nonblocking transaction object, the calling thread has already returned to the processor model code. The arbitrator has no need to check for any potential request pairs until the two threads have completed their transaction.

These two pointers are set so that the arbitrator can suspend until both threads have completed, rather than wasting simulation resources polling to see whether they are done.

Although the comm_medium object supports nonblocking reads, the channels do not provide methods that give the processor models access to that functionality. This was done on purpose to avoid having to check data dependencies before beginning a computation; it also keeps the simulation simple and more efficient in terms of simulation time. The functionality was built into the comm_medium object because it was easy to do and makes adding nonblocking reads in the future much easier. The arbitration scheme is longest-waiting-first: as soon as at least one transaction pair (a matching send and receive) is present, the pair with the largest sum of positions in the wait queues is selected to transact next. The crossbar channel uses a variant of the comm_medium class. In this variant there are pointers to the transaction wait queues, which allow a single set of queues to be used by all of the logical channels.
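
A minimal sketch of the longest-waiting-first selection follows. It rests on two assumptions the text does not state: a request’s position value grows with how long it has been waiting (so the largest sum marks the longest-waiting pair), and a write matches a read when both name the same source and destination processors. Request, PairIndex, and select_pair are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical request record for this sketch.
struct Request {
    int src;       // sending processor
    int dst;       // receiving processor
    int position;  // assumed: grows the longer the request has waited
};

using PairIndex = std::pair<std::size_t, std::size_t>;  // (write index, read index)

// Longest-waiting-first: among all matching write/read pairs, pick the one with
// the largest sum of positions. Returns std::nullopt when no pair matches.
std::optional<PairIndex> select_pair(const std::vector<Request>& writes,
                                     const std::vector<Request>& reads) {
    std::optional<PairIndex> best;
    int best_sum = 0;
    for (std::size_t w = 0; w < writes.size(); ++w) {
        for (std::size_t r = 0; r < reads.size(); ++r) {
            // Assumed matching rule: same source and destination processors.
            if (writes[w].src != reads[r].src || writes[w].dst != reads[r].dst) continue;
            const int sum = writes[w].position + reads[r].position;
            if (!best || sum > best_sum) {  // strict '>' keeps the first pair on ties
                best = PairIndex{w, r};
                best_sum = sum;
            }
        }
    }
    return best;
}
```

The exhaustive scan is quadratic in the queue lengths, which is acceptable for a sketch; the text does not say how the original implementation searches the queues or breaks ties.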

The comm_medium class also makes use of the transaction_wait_queue class, a specialized linked list that allows for a large number of waiting transactions without allocating a large fixed amount of memory. The elements of the linked list are of the class transaction_element, which holds all the essential information about a transaction request; it is the only part of the linked list implementation of real importance here. Figure 77.44 graphically depicts the key elements of the transaction_element class.
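
The idea behind transaction_wait_queue, a list that grows by one node per pending request instead of reserving a large fixed buffer, can be sketched as a singly linked FIFO. The field names follow the transaction_element description in this section, but the class layout, the remove_at method, and the omission of the SystemC events and complement pointer are simplifications chosen for illustration; TransactionElement and TransactionWaitQueue are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Fields follow the text's description of transaction_element; the SystemC
// events and the complement pointer are left out of this sketch.
struct TransactionElement {
    int source = 0;
    int destination = 0;
    int task_id = 0;  // destination task's id
    int size = 0;     // size associated with the transaction
    bool is_write = false;
    bool is_blocking = true;
};

// Singly linked FIFO: memory grows one node per waiting request, so no large
// fixed-size buffer is reserved up front.
class TransactionWaitQueue {
public:
    TransactionWaitQueue() = default;
    TransactionWaitQueue(const TransactionWaitQueue&) = delete;
    TransactionWaitQueue& operator=(const TransactionWaitQueue&) = delete;

    ~TransactionWaitQueue() {
        // Free nodes iteratively so a very long queue cannot exhaust the stack.
        while (head_) head_ = std::move(head_->next);
    }

    void push_back(const TransactionElement& e) {
        auto node = std::make_unique<Node>();
        node->data = e;
        Node* raw = node.get();
        if (tail_ != nullptr) tail_->next = std::move(node);
        else head_ = std::move(node);
        tail_ = raw;
        ++size_;
    }

    // Element at queue position pos (0 = oldest).
    const TransactionElement& at(std::size_t pos) const {
        assert(pos < size_);
        const Node* n = head_.get();
        for (std::size_t i = 0; i < pos; ++i) n = n->next.get();
        return n->data;
    }

    // Unlinks and returns the request at position pos, e.g. once it is selected.
    TransactionElement remove_at(std::size_t pos) {
        assert(pos < size_);
        std::unique_ptr<Node>* link = &head_;
        Node* prev = nullptr;
        for (std::size_t i = 0; i < pos; ++i) {
            prev = link->get();
            link = &(*link)->next;
        }
        std::unique_ptr<Node> victim = std::move(*link);
        *link = std::move(victim->next);
        if (tail_ == victim.get()) tail_ = prev;
        --size_;
        return victim->data;
    }

    std::size_t size() const { return size_; }

private:
    struct Node {
        TransactionElement data;
        std::unique_ptr<Node> next;
    };
    std::unique_ptr<Node> head_;
    Node* tail_ = nullptr;
    std::size_t size_ = 0;
};
```

Requests are appended at the tail and can be removed from any position, which is what an arbitrator needs when the selected pair is not at the front of either queue.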

The integers in the bottom center contain the source processor number, the destination processor number, the destination task’s ID number, and the size, in nanoseconds, associated with the transaction. The boolean variables in the top right indicate whether the transaction element is a write or a read, and whether or not it is blocking. The handshakes object in the upper middle is a set of four events that the transact method uses to logically “perform” the transaction. The complement pointer below the handshakes is a pointer to a transaction_element object. To communicate, two transaction_element objects must be paired up by the channel’s arbitrator process, which does this by setting the complement pointers of a read and a write to point to each other. The activate_me event in the top left is used by the arbitrator to activate the thread executing the element’s side of a transaction, while the im_done event is used to notify the arbitrator (and, for a blocking transaction, the requesting processor’s thread) that the transaction is complete. The constructor for this object, depicted in the bottom left, has three overloaded implementations with different parameter lists; the first, which sets all of the transaction values, is the one used in the current version of the models. The others were kept to maintain backward compatibility with previous versions and may be useful in future versions. The purposes of the methods displayed in the bottom left of the figure are self-explanatory. The transact method controls the actual behavior of a transaction pair once it has been scheduled; its nondebugging parts are shown in Figure 77.45. The blocking and nonblocking versions of the read and write routines are the same in this version of the models.
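
The pairing step can be sketched as follows. Transaction, pair_transactions, and unpair are hypothetical names, the matching rule (one read and one write with the same source and destination) is an assumption, and the SystemC events are omitted; the sketch shows only how an arbitrator could link a read and a write by making their complement pointers refer to each other.

```cpp
#include <cassert>

// Hypothetical, stripped-down transaction record (SystemC events omitted).
struct Transaction {
    int source = 0;
    int destination = 0;
    bool is_write = false;
    Transaction* complement = nullptr;  // set by the arbitrator when paired
};

// Links a write and a read by pointing their complement pointers at each
// other. Returns false and leaves both untouched if they cannot be paired.
bool pair_transactions(Transaction& a, Transaction& b) {
    if (&a == &b) return false;
    if (a.is_write == b.is_write) return false;  // need one write and one read
    if (a.source != b.source || a.destination != b.destination) return false;
    if (a.complement != nullptr || b.complement != nullptr) return false;  // already paired
    a.complement = &b;
    b.complement = &a;
    return true;
}

// Clears the link once the transaction pair has completed.
void unpair(Transaction& t) {
    if (t.complement != nullptr) {
        assert(t.complement->complement == &t);  // links must be symmetric
        t.complement->complement = nullptr;
        t.complement = nullptr;
    }
}
```

Keeping the link symmetric lets either side’s thread reach its partner directly, which is what allows the read and write threads to run transact() without going back through the arbitrator.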

The nondebugging versions of the blocking read and write methods are shown in Figure 77.46. These methods show how the four-way handshaking is implemented. The use of full four-way handshaking in the channel model is somewhat arbitrary, but it makes incremental refinement of the channel easier; with the abstract behavior described here, a single line for the write method and two for the read method would be sufficient. Figure 77.47 shows how the methods could be implemented in this way. All of the channel models also read parameter information from the channel_param.txt file located in the directory in which the simulation is running. This file contains two lines: the first is the bus speed in megabytes per second, and the second is the fixed communication overhead per communication transaction. In the top-level channel models, the data size parameter passed to the read interface method is run through a data_to_delay function that returns the delay, in nanoseconds, that the communication should take based on the specified bandwidth and communication overhead.
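
A minimal C++ sketch of reading channel_param.txt and of a data_to_delay-style calculation follows. The text gives only the file layout and the bandwidth unit, so the rest is assumed: the data size is in bytes, 1 MB is 10^6 bytes, the overhead on the second line is in nanoseconds, and the delay is transfer time plus overhead. ChannelParams and read_channel_params are hypothetical names, and this data_to_delay takes the parameters explicitly rather than reading them from a channel object.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>

// Contents of channel_param.txt as described in the text.
struct ChannelParams {
    double megabytes_per_second = 0.0;  // line 1: bus speed
    double overhead_ns = 0.0;           // line 2: fixed overhead (unit assumed: ns)
};

// Reads the two-line file from any stream, e.g. std::ifstream("channel_param.txt").
ChannelParams read_channel_params(std::istream& in) {
    ChannelParams p;
    if (!(in >> p.megabytes_per_second >> p.overhead_ns) || p.megabytes_per_second <= 0.0) {
        throw std::runtime_error("invalid channel parameters");
    }
    return p;
}

// Assumes size in bytes and 1 MB = 1e6 bytes: 1 MB/s moves 1e-3 bytes per ns,
// so transfer time in ns is size * 1000 / (MB/s). The fixed overhead is added on top.
double data_to_delay(double size_bytes, const ChannelParams& p) {
    return size_bytes * 1000.0 / p.megabytes_per_second + p.overhead_ns;
}
```

For example, with a file containing 100 and 50, a 64-byte transfer takes 64 × 1000 / 100 + 50 = 690 ns under these assumptions.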
