Dynamic Voltage Scaling for Low-Power Hard Real-Time Systems: Run-Time Scheduling
Run-Time Scheduling
Off-line DVS techniques cannot adapt to run-time variations in task execution times. To take advantage of the idle times arising from such variations, run-time scheduling techniques must be employed. Nevertheless, run-time DVS is most often used in conjunction with static techniques, since many of the system parameters are already known at design time. In this context, run-time methods often perform small speed adjustments on top of an already existing schedule or scheduling strategy. For task sets with large run-time variations in execution pattern, however, such techniques are essential for further reducing the energy consumption.
In general, run-time DVS decisions may be taken at any time while the system is in operation. Nevertheless, the time points most likely to introduce variations relative to the off-line decisions taken for a certain task set are those at which task instances finish their execution. These are the moments when, if tasks finish early, new idle times are introduced into the system. Consequently, in most run-time DVS techniques, the scheduling points, namely task arrivals and task completions, also become the points where speed-related decisions are taken. However, not all scheduling points need to include a speed decision. Furthermore, the moments at which speed-related decisions are taken may be totally decoupled from the scheduling points, with the scheduler only giving suggestions (hints) to a speed manager that changes the processor speed at its own pace, as in the intratask approach from Ref. [74]. Regardless, the efficiency of run-time DVS depends essentially on two mechanisms. First, idle time detection/estimation determines the available slack time used to slow down tasks. Second, the method of distributing this slack to instances determines the time spent at low speeds, and thus the efficiency of the DVS strategy. In the following, the two mechanisms will be referred to as slack estimation and slack distribution.
Normally, for systems where variations occur frequently, DVS techniques should be employed often. In this case, the overhead introduced by the scheduling decisions must be small enough to be worthwhile. In contrast, more expensive DVS techniques may be used infrequently, if run-time variations are also infrequent. Since the overhead of run-time DVS methods must be rather low, such run-time decisions must be simple, usually taken through fast heuristics. Additionally, they affect only a small number of task instances, often just those waiting to execute. Finally, for hard real-time systems, such decisions must not negatively affect the real-time properties of the system established by the off-line analyses: all deadlines must still be met.
Overview of Techniques
One of the simplest run-time DVS mechanisms can be found in the lppsRM/EDF strategies, introduced in Refs. [59,65]. Basically, DVS is in effect only when there is a single instance ready to run on the processor. In that case, the instance may occupy the processor until the nearest arrival of another task instance. The execution speed for the instance ready to run is then chosen in such a manner that the execution will complete right before the next task arrival (NTA). For this reason, this method will be referred to as stretch-to-NTA. Note that the processor runs at a lower speed only for the occasional isolated instance, which amounts to a greedy slack distribution mechanism. Furthermore, although slack might be produced early in the schedule, it is only used by the last instance in the ready queue. This leads to a rather inefficient use of the slack, as most instances run at the maximal speed, while occasionally an instance runs at a very low speed.
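To make the computation concrete, the sketch below shows how a stretch-to-NTA speed could be selected, assuming a processor with a small set of discrete normalized speeds; the function and parameter names are illustrative assumptions, not code from Refs. [59,65].

```python
def stretch_to_nta(wcet_remaining, now, next_arrival, speeds=(0.25, 0.5, 0.75, 1.0)):
    """Pick the lowest speed (as a fraction of f_max) that still lets the single
    ready instance finish before the next task arrival (NTA).

    wcet_remaining -- remaining worst-case execution time at full speed
    now, next_arrival -- current time and the nearest future release time
    speeds -- available normalized speed settings, assumed discrete
    """
    window = next_arrival - now
    if window <= 0:
        return max(speeds)                  # no room to stretch: run at full speed
    required = wcet_remaining / window      # ideal continuous speed
    # Round up to the nearest available discrete speed, so the instance still
    # completes before the NTA and the real-time guarantee is preserved.
    for s in sorted(speeds):
        if s >= required:
            return s
    return max(speeds)

# Example: 2 ms of worst-case work left, 8 ms until the next arrival
# -> the ideal speed is 0.25, so the lowest setting suffices.
print(stretch_to_nta(wcet_remaining=2.0, now=0.0, next_arrival=8.0))
```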
A rather similar but more efficient approach is the cycle-conserving RM (ccRM) from Ref. [66]. Again, the slack produced by tasks finishing early is used to lower the speed of the instances ready to execute, of which there can now be more than one. This method employs a better slack distribution strategy, as more than one isolated instance can use lower speeds. Along with ccRM, the same authors describe in Ref. [66] two EDF-based techniques. Cycle-conserving EDF (ccEDF) distributes newly produced slack more evenly, over a larger number of instances than ccRM. In particular, at each scheduling point the processor speed is adjusted according to the current processor utilization, as in situation (a) in Section 18.4.4.2. The current utilization is computed as the sum of the actual utilization for tasks that completed their instances in the current period and the worst-case utilization for the tasks waiting to execute. Techniques that recompute the processor utilization in order to determine a new speed are often referred to as utilization updating methods. An extension of ccEDF, the more aggressive look-ahead EDF (laEDF), is also presented in Ref. [66]. The processor speed is further reduced by deferring as much work as possible beyond the current deadline, while still making sure that the deadlines can be met even in the worst case.
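The utilization-updating step of a ccEDF-like scheme can be sketched as follows; the task records and field names are illustrative assumptions, not the notation of Ref. [66].

```python
def cc_edf_speed(tasks, f_max=1.0):
    """Cycle-conserving, EDF-style speed selection (utilization updating).

    Each task is a dict with:
      'period' -- task period (equal to the relative deadline)
      'wcet'   -- worst-case execution time at f_max
      'actual' -- actual execution time of the instance that completed in the
                  current period, or None if the instance is still pending
    The speed is set to the current utilization, clamped to [0, f_max].
    """
    u = 0.0
    for t in tasks:
        if t['actual'] is not None:      # instance done: charge what it really used
            u += t['actual'] / t['period']
        else:                            # instance pending: assume the worst case
            u += t['wcet'] / t['period']
    return min(max(u, 0.0), f_max)

# Example: one task finished early this period, one is still waiting.
tasks = [
    {'period': 10.0, 'wcet': 4.0, 'actual': 1.5},   # completed, used 1.5 of 4
    {'period': 20.0, 'wcet': 6.0, 'actual': None},  # still pending
]
print(cc_edf_speed(tasks))   # 0.15 + 0.30 = 0.45 of f_max
```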
The two approaches presented in Ref. [64], designed on top of the EDF policy, handle the available slack in a more explicit manner. The mechanism used by the dynamic reclaiming algorithm (DRA) distributes the newly produced slack in a greedy manner to the next highest priority instance in the ready queue. Assigning slack according to task priorities is essential, as the real-time behavior of the system is preserved: lower priority tasks do not interfere with higher priority tasks. The second approach presented in Ref. [64] is the more aggressive speed reduction (AGR) technique, which speculatively assumes that future instances will probably exhibit a computational demand lower than the worst case. In this approach, slack is distributed to tasks according to their expected computational demands.
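One plausible reading of the greedy handover step in a DRA-like scheme is sketched below; the data structures and the exact reclaiming rule are simplifying assumptions for illustration, not the algorithm of Ref. [64].

```python
def dra_head_speed(ready_queue, slack_records, f_max=1.0):
    """Greedy, DRA-style speed selection for the highest-priority ready instance.

    ready_queue   -- EDF-ordered list (earliest deadline first) of dicts with
                     'wcet_remaining' (time at f_max) and 'deadline'
    slack_records -- list of (deadline, unused_time) pairs left behind by
                     instances that completed earlier than their worst case
    Only slack left by instances with a deadline no later than the head's own
    deadline is reclaimed, so lower-priority work never delays higher-priority
    work and deadlines are preserved.
    """
    head = ready_queue[0]
    usable = sum(unused for d, unused in slack_records if d <= head['deadline'])
    budget = head['wcet_remaining'] / f_max + usable   # original allotment + reclaimed slack
    if budget <= 0:
        return f_max
    return min(f_max, head['wcet_remaining'] / budget)

# Example: the head task has 3 ms of worst-case work left and may reclaim
# 3 ms of slack from an earlier-deadline instance -> it can run at half speed.
queue = [{'wcet_remaining': 3.0, 'deadline': 20.0}]
print(dra_head_speed(queue, slack_records=[(15.0, 3.0)]))   # 0.5
```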
A similar mechanism for explicitly managing slack is used in the EDF-targeted lpSEH [23]. Slack is considered to be produced by both higher and lower priority instances that finished before the currently executing instance. More slack can thus be detected and used to lower the speed of the current task. Nevertheless, all of the slack is used to compute the speed of the task about to execute, making the slack distribution strategy greedier than that of AGR or the utilization-based laEDF.
Explicit slack management can also be successfully employed in RM-based DVS policies. The lpWDA approach [68] uses a work-demand analysis to estimate the slack available to the current task instance. All processor time not demanded by other tasks, whether higher-priority tasks that are allowed to execute or lower-priority tasks that got the chance to execute before the current task, is considered slack and is used by the current instance. Since exact knowledge of the work demand is costly to obtain online, safe heuristics are employed to estimate it. A more detailed comparison between different existing run-time strategies, including experimental evaluations, can be found in Refs. [23] and [75].
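Purely as an illustration, a simplified work-demand heuristic in the spirit of lpWDA might look as follows; for brevity it accounts only for higher-priority demand, and all names and formulas are hypothetical rather than those of Ref. [68].

```python
import math

def lpwda_like_slack(now, current, higher_prio_tasks):
    """Conservative work-demand heuristic: the slack available to the current
    instance is the time to its deadline that is not claimed by its own
    remaining worst case or by higher-priority work demanded before that deadline.

    current           -- dict with 'deadline' and 'wcet_remaining'
    higher_prio_tasks -- dicts with 'wcet_remaining' (active instance),
                         'next_release', 'period', and 'wcet'
    """
    horizon = current['deadline'] - now
    demand = current['wcet_remaining']
    for t in higher_prio_tasks:
        demand += t['wcet_remaining']                   # pending higher-priority work
        if t['next_release'] < current['deadline']:     # future releases before the deadline
            releases = math.ceil((current['deadline'] - t['next_release']) / t['period'])
            demand += releases * t['wcet']
    return max(0.0, horizon - demand)

# Example: 10 ms to the deadline, 3 ms of own work, one higher-priority task
# with 1 ms pending and one more 2 ms release inside the window -> 4 ms slack.
current = {'deadline': 10.0, 'wcet_remaining': 3.0}
hp = [{'wcet_remaining': 1.0, 'next_release': 6.0, 'period': 5.0, 'wcet': 2.0}]
print(lpwda_like_slack(0.0, current, hp))   # 4.0
```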
A Slack Management Policy for Fixed Priority Tasks
In this section, we give an example of a slack distribution strategy, first introduced in Ref. [61], built on top of RM scheduling. The method is designed to work on processors with an arbitrary number of speeds and has a low computational complexity, independent of the characteristics of the task set. Briefly, in this policy, an early finishing task may pass its unused processor time on to any of the tasks executing next. This slack cannot, however, be used by any task at any time, since deadlines still have to be met. The problem is solved by considering several levels of slack with different priorities, similar to the slack stealing algorithm [76].
For a task set {t_i}, 1 ≤ i ≤ N, that exhibits M different priorities, M levels of real-time slack {S_j}, 1 ≤ j ≤ M, are used. Without great loss of generality, consider that the tasks have different priorities, that is, M = N. Also consider that the task set and the slack levels are already ordered by priority, where level 1 corresponds to the highest priority. The slack in each level is a cumulative value, the sum of the unused processor times remaining from the tasks with higher priority. Initially, all level slacks S_j are set to 0. At run time, the slack levels are managed as follows:
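The sketch below illustrates one plausible bookkeeping scheme for such slack levels, in the spirit of the description above and of slack stealing [76]; the class, the update rules, and the greedy/proportional claiming options are simplifying assumptions, not the exact rules of Ref. [61].

```python
class SlackLevels:
    """Minimal sketch of multi-level slack bookkeeping for fixed-priority tasks."""

    def __init__(self, num_priorities):
        # S[j] = slack usable by tasks of priority j (1 = highest priority).
        self.S = [0.0] * (num_priorities + 1)   # index 0 unused

    def task_completed(self, prio, budget, actual):
        """A priority-`prio` task finished early: its unused budget becomes
        slack for tasks of equal or lower priority (levels prio..M)."""
        unused = max(0.0, budget - actual)
        for j in range(prio, len(self.S)):
            self.S[j] += unused

    def claim(self, prio, greedy=True, share=0.5):
        """A priority-`prio` task is dispatched and claims slack from its level.
        Greedy: take all of S[prio]; otherwise take only a fraction of it
        (a stand-in for the mean proportional policy). The claimed processor
        time is gone, so it is deducted from every level, clamped at zero,
        which is conservative and keeps all deadlines safe."""
        amount = self.S[prio] if greedy else share * self.S[prio]
        for j in range(1, len(self.S)):
            self.S[j] = max(0.0, self.S[j] - amount)
        return amount

# Example: a priority-1 task budgeted 5 ms finishes after 2 ms; the next
# priority-3 task may stretch its execution by the reclaimed 3 ms.
levels = SlackLevels(num_priorities=3)
levels.task_completed(prio=1, budget=5.0, actual=2.0)
extra = levels.claim(prio=3)            # 3.0 ms of slack, greedy policy
print(extra, levels.S[1:])              # 3.0 [0.0, 0.0, 0.0]
```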
Experimentally [61], the mean proportional policy turns out to perform marginally better than the greedy one. However, depending on the processor, the overhead needed to maintain the information required by the mean proportional method might cancel out its gains compared to the simpler greedy strategy. It is important to note that the scheduling policy described above can easily be combined with other mechanisms, whether run-time, off-line, or intratask. The stretch-to-NTA run-time strategy can be employed in addition. The off-line computation of the minimum required speed for RM may also be used in a preparation phase. Furthermore, at the task level, intratask DVS can additionally be employed. Nevertheless, as mentioned before, at some point the overhead needed by an additional mechanism will cancel out its gain. It is therefore important to evaluate and tune the methods to be used, selecting only those that give the largest energy reductions. A simulator such as SimDVS [23] is essential in this sense.