Performance Modeling and Analysis Using VHDL and SystemC: Performance and Mixed-Level Modeling Using SystemC

By Ahmed Farahat - October 13, 2015

Performance and Mixed-Level Modeling Using SystemC

This section describes a performance-modeling environment capable of mixed-level modeling that is based on the SystemC language [50]. The environment is intended to model the system at the Processor Memory Switch level much like the Honeywell PML environment described earlier. The goal of this work was to show how SystemC could be used to construct a mixed-level modeling capability.

In the SystemC-based PBMT (Performance-Based Modeling Tool), the user begins by describing the functions executed by the system as a task graph. A task graph is a representation of the flow of execution through an application. The nodes in a task graph represent computational tasks, and the edges represent the flow of control, or the actual transfer of data, between tasks. An example of a task graph for a simple application is shown in Figure 77.36. Note that, given the topology shown in the figure, some tasks in the example application, such as Task 2, Task 3, and Task 4, can be executed in parallel if the system architecture on which the application runs allows for it.
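The task graph idea can be illustrated with a minimal C++ sketch. This is not the PBMT data structure; the names TaskGraph, add_edge, levels, and example_graph are hypothetical, and the eight-task topology is only an assumption loosely based on the description of Figure 77.36 (Task 1 fans out to Tasks 2 through 4, which feed Tasks 5 through 7, which join at Task 8). The sketch groups tasks into levels so that tasks in the same level have no dependencies on each other and could run in parallel.

```cpp
#include <cassert>
#include <map>
#include <vector>

// A task graph: nodes are tasks, edges are control or data dependencies.
struct TaskGraph {
    // successors[t] lists the tasks that depend on task t.
    std::map<int, std::vector<int>> successors;

    void add_edge(int from, int to) {
        assert(from != to);  // a task cannot depend on itself
        successors[from].push_back(to);
        successors[to];      // make sure the target also exists as a node
    }

    // Groups tasks into levels. A task is placed one level after the last of
    // its predecessors, so tasks sharing a level are candidates for parallel
    // execution. Assumes the graph has no cycles.
    std::vector<std::vector<int>> levels() const {
        std::map<int, int> pending;  // unfinished predecessors per task
        for (const auto& node : successors) pending[node.first];
        for (const auto& node : successors)
            for (int s : node.second) ++pending[s];

        std::vector<int> ready;
        for (const auto& entry : pending)
            if (entry.second == 0) ready.push_back(entry.first);

        std::vector<std::vector<int>> result;
        while (!ready.empty()) {
            result.push_back(ready);
            std::vector<int> next;
            for (int t : ready)
                for (int s : successors.at(t))
                    if (--pending[s] == 0) next.push_back(s);
            ready = next;
        }
        return result;
    }
};

// Hypothetical eight-task graph (an assumed topology, not a copy of the figure).
TaskGraph example_graph() {
    TaskGraph g;
    g.add_edge(1, 2); g.add_edge(1, 3); g.add_edge(1, 4);
    g.add_edge(2, 5); g.add_edge(3, 6); g.add_edge(4, 7);
    g.add_edge(5, 8); g.add_edge(6, 8); g.add_edge(7, 8);
    return g;
}
```

Calling example_graph().levels() on this assumed topology places Tasks 2, 3, and 4 in the same level, which is the parallelism the text refers to. Whether it is actually exploited depends on the architecture chosen next.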

Once the task graph model is constructed, the user then selects a system architecture on which the application will execute. The system architecture is specified by the number of processors in the architecture and the interconnect topology used to provide communications between them. The available interconnect topologies are a bus (a single, shared communications resource), a crossbar switch (a partially shared communications resource), and a fully connected network (a completely nonshared communications resource). Note that in this high-level architecture model, what actually constitutes a processor in the system is not specified. That is, a processor is simply modeled as a computational resource, and in an implementation it may be a general-purpose processor (of any clock speed), a special-purpose processor such as a DSP, or custom hardware for a specific task.
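As a rough illustration of how the three interconnect choices differ in their degree of sharing, the following C++ sketch decides whether two transfers would contend for the same resource. The types Architecture and Transfer, the function conflicts, and the contention rules themselves are assumptions made for this example; the text does not describe how PBMT models contention internally.

```cpp
#include <cassert>

// Interconnect topologies named in the text.
enum class Interconnect { Bus, Crossbar, FullyConnected };

// Architecture as described: a processor count plus an interconnect topology.
struct Architecture {
    int num_processors;
    Interconnect interconnect;
};

// A point-to-point transfer between two processors, identified by index.
struct Transfer {
    int src;
    int dst;
};

// Checks that a transfer refers to processors that exist in the architecture.
bool valid(const Architecture& arch, const Transfer& t) {
    return t.src >= 0 && t.src < arch.num_processors &&
           t.dst >= 0 && t.dst < arch.num_processors && t.src != t.dst;
}

// Returns true if the two transfers cannot proceed at the same time.
// Assumed contention rules (illustrative only):
//   Bus            - any two transfers conflict (one shared resource)
//   Crossbar       - transfers conflict when they share a source or a destination
//   FullyConnected - transfers conflict only when they use the same directed link
bool conflicts(const Architecture& arch, const Transfer& a, const Transfer& b) {
    assert(valid(arch, a) && valid(arch, b));
    switch (arch.interconnect) {
        case Interconnect::Bus:
            return true;
        case Interconnect::Crossbar:
            return a.src == b.src || a.dst == b.dst;
        case Interconnect::FullyConnected:
            return a.src == b.src && a.dst == b.dst;
    }
    return true;  // unreachable for valid enum values; be conservative
}
```

Under these assumed rules, a transfer from processor 0 to 1 and another from 2 to 3 would serialize on a bus but proceed concurrently on a crossbar or a fully connected network.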

Once the system architecture is specified, all that remains is for the user to specify the processor on which each task is to execute and the total execution time for that task on that processor. This delay value may be either fixed or dependent on the amount of data passed into the task by the previous task in the graph. Once this task-to-processor mapping and delay specification is done, the complete SystemC model is constructed and simulated using either the reference simulator, included as part of the SystemC distribution available in Ref. [12], or the commercial Mentor Graphics ModelSim simulator, which can co-simulate SystemC models along with Verilog or VHDL models.

Figure 77.37 shows the results of executing the task graph of Figure 77.36 on three different system architectures. All of the architectures use a single shared bus for communications. The first result is for an architecture with only a single processor. In this case, the obvious result is that all of the tasks execute in sequence on the single processor and the run time is simply the sum of the individual task execution times.

The second result is for a three-processor architecture. In this case, after Task 1 completes, some latency can be seen before Tasks 3 and 4 begin execution. This accounts for the communication time required to send data from the processor that executed Task 1 to the processors that are executing Tasks 3 and 4. In addition, the graph shows that Task 4 begins execution after Task 3 because of contention for the single shared bus. Likewise, the latency between the end of execution of Tasks 5–7 and the start of execution of Task 8 accounts for the time required to communicate the results of Tasks 6 and 7 back to the single processor that is scheduled to execute Task 8. The overall run time for this configuration is much less than in the first example because the inherent parallelism in the application is being exploited by the selected architecture.

Finally, the third result is for a four-processor architecture. In this simulation, Tasks 1 and 8 are allocated to the fourth processor, separate from Tasks 2–7. This requires additional communication time on the single bus to transfer all of the data from, and back to, that extra processor. Thus, as can be seen from the graph, this architecture actually takes longer than the three-processor architecture to execute the application.
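The effect of the mapping and of bus contention on run time can be approximated with a small static calculation. The C++ sketch below is not the SystemC model that PBMT generates and does not reproduce the numbers behind Figure 77.37. It is a simplified estimate under stated assumptions: tasks are listed in a valid topological order, every cross-processor edge occupies the single shared bus for a fixed transfer_time, and the bus serves transfers in task order. The names Task, execution_time, and run_time are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Task {
    int processor;           // index of the processor the task is mapped to
    long fixed_delay;        // execution time independent of the input size
    long delay_per_item;     // extra time per input data item (0 = fixed delay)
    long input_items;        // amount of data passed in by the previous task(s)
    std::vector<int> preds;  // indices of predecessor tasks (all smaller than own index)
};

// Delay model from the text: either fixed or dependent on the input data volume.
long execution_time(const Task& t) {
    return t.fixed_delay + t.delay_per_item * t.input_items;
}

// Estimates the overall run time on num_processors sharing one bus.
// Assumption: the bus is granted to transfers in the order tasks are listed.
long run_time(const std::vector<Task>& tasks, int num_processors, long transfer_time) {
    std::vector<long> finish(tasks.size(), 0);
    std::vector<long> proc_free(num_processors, 0);
    long bus_free = 0;
    long makespan = 0;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        const Task& t = tasks[i];
        assert(t.processor >= 0 && t.processor < num_processors);
        long start = proc_free[t.processor];  // wait for the processor
        for (int p : t.preds) {
            assert(p >= 0 && static_cast<std::size_t>(p) < i);
            long arrival = finish[p];
            if (tasks[p].processor != t.processor) {
                // Data must cross the bus; wait until the bus is free.
                long begin = std::max(finish[p], bus_free);
                bus_free = begin + transfer_time;
                arrival = bus_free;
            }
            start = std::max(start, arrival);  // wait for all inputs
        }
        finish[i] = start + execution_time(t);
        proc_free[t.processor] = finish[i];
        makespan = std::max(makespan, finish[i]);
    }
    return makespan;
}
```

With every task mapped to processor 0, run_time reduces to the sum of the task execution times, matching the single-processor case described above. Spreading tasks across processors shortens the run only while the saved computation time exceeds the transfer time added on the bus.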
