System-Level Power Management: An Overview
Markovian Decision Process-Based Approaches
The most aggressive predictive power management policies turn off every PMC as soon as it becomes idle. Whenever a component is needed to carry out some task, the component must first be turned on and restored to its fully functional state. As mentioned above, the transition between the inactive and the functional state has latency and energy overheads. As a result, “eager” policies are often unacceptable because they can degrade performance without decreasing power dissipation. Heuristic power management policies are useful in practice, although no strong optimality result has been proved for this class of policies. On the other hand, stochastic control based on Markov models has emerged as an effective power management framework. In particular, stochastic PM techniques have a number of key advantages over predictive techniques. First, they capture a global view of the system, thus allowing the designer to search for a global optimum that can exploit multiple inactive states of multiple interacting resources. Second, they compute the exact solution (in polynomial time) for the performance-constrained power optimization problem. Third, they exploit the strength and robustness of randomized policies. However, a number of key points must be considered when deciding whether or not to utilize a stochastic DPM technique. First, the performance and power obtained by a policy are expected values, and there is no guarantee that the results will be optimal for a specific workload instance (i.e., a single realization of the corresponding stochastic process). Second, policy optimization requires a priori Markov models of the service provider (SP) and the service requester (SR). One can safely assume that the SP model can be precharacterized; however, this assumption may not hold for the SR model. Third, policy implementation tends to be more involved. An implicit assumption of most DPM techniques is that the power consumption of the PM itself is negligible. This assumption must be validated on a case-by-case basis, especially for stochastic approaches. Finally, the Markov model for the SR or SP may be only an approximation of a much more complex stochastic process. If the model is not accurate, then the “optimal” policies are also only approximate solutions.
In the following, we consider a discrete-time (i.e., slotted time) setting [9]. Time is described by an infinite series of discrete values tn = nT, where T is the time resolution (or period) and n is a nonnegative integer. The EMC is modeled with a single SR (or user) whose requests are enqueued in a single queue, the service queue (SQ), and serviced by a single SP. The PM controls the behavior of the SP over time (Figure 15.5).
Service Requester.
This unit sends requests to the SP. The SR is modeled as a Markov chain whose observed variable is the number of requests sr sent to the SP during time period tn. The service request process and its relevant parameters are known. Moreover, it is known that in each time period a maximum of Sr requests can be generated.
Example
Consider a “bursty” workload with a maximum of one request per period, i.e., the SR has two states as depicted in Figure 15.6. Since the workload comes in bursts, a request will be generated at time tn+1 with a probability of 0.85 if a request is received at time tn. On the other hand, if there is no request at time tn, then with a probability of 0.92 there will be no request at time tn+1. The mean duration of a stream of requests is equal to 1/0.15 = 6.67 periods.
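This two-state chain is small enough to write out directly. The following minimal Python sketch (the dictionary encoding is ours, not part of the model) reproduces the transition probabilities above and the 1/0.15 ≈ 6.67-period mean burst length.

P_SR = {                       # P_SR[current][next]; 1 = request issued, 0 = no request
    1: {1: 0.85, 0: 0.15},     # a request is followed by another request w.p. 0.85
    0: {0: 0.92, 1: 0.08},     # an idle period is followed by another idle period w.p. 0.92
}

# A burst ends in each period w.p. 0.15, so its length is geometric with mean 1/0.15.
print(1.0 / P_SR[1][0])        # 6.666... periods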
Service Provider.
The SP is a PMC which services requests issued by the SR. In each time period, the SP can be in only one state. Each state sp ∈ {1, …, Sp} is characterized by a performance level and by a power consumption level. In the simplest example, one could have two states: ON and OFF. In each period, transitions between power states are controlled by a PM through commands: cmd ∈ CMD = {1, …, NC}. For example, one can define two simple commands: Go2Active and Go2Sleep. When a specific command is issued, the SP will move to a new state with a fixed probability depending on the command cmd and on the departing and arriving states. In other words, when a command is issued by the PM, there is no guarantee that the command is immediately executed. Instead, the command influences the way in which the SP will act in the future. This probabilistic model describes the view where the evolution in time of the power states is modeled by a Markov process whose transition probability matrix depends on the commands issued by the PM. In other words, there is one transition probability matrix for each command cmd.
Coming back to the SA-1100 example with two states, Figure 15.7 depicts the probabilistic behavior of the device under the influence of the Go2Sleep and Go2Active commands.
The transition time from OFF to ON when Go2Active has been issued is a geometric random variable with an average of 1/0.04 = 25 time periods. Each power state has a specific power consumption rate and performance level (e.g., clock speed), which is a function of the state itself. In addition, each transition between two power states is annotated with an energy cost and a latency, representing the overhead of transitioning between the two corresponding states. Such information is usually provided in the data sheets of the PMCs.
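In code, the command-conditioned behavior of this two-state SP amounts to keeping one transition matrix per command. In the Python sketch below only the 0.04 OFF-to-ON probability under Go2Active comes from the example; the remaining entries are illustrative assumptions.

ON, OFF = 0, 1

P_SP = {
    "Go2Active": [          # rows: current state (ON, OFF); columns: next state (ON, OFF)
        [1.00, 0.00],       # assumed: an ON device stays ON when told to be active
        [0.04, 0.96],       # from the text: OFF -> ON with probability 0.04 per period
    ],
    "Go2Sleep": [
        [0.30, 0.70],       # assumed per-period shutdown probability
        [0.00, 1.00],       # assumed: an OFF device stays OFF when told to sleep
    ],
}

# The OFF -> ON latency under Go2Active is geometric with mean 1/0.04 = 25 periods.
print(1.0 / P_SP["Go2Active"][OFF][ON])   # 25.0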
Service Queue.
When service requests arrive during one period, they are buffered in a queue of length Sq (Sq ≥ 1). The queue is usually considered to be a FIFO, although other schemes can also be modeled efficiently. The request is processed and serviced within the period with a probability dependent on the power state of the SP. In this way, the model captures the nondeterministic service time of a request as a geometric random variable, similar to the exponential service time for the G/M/1 class in queuing theory [21]. It follows that the queue length (denoted by sq, with 0 ≤ sq ≤ Sq) is also a Markov process, with transition matrix PSQ(sp, sr). Coming back to the example again, if a request is serviced with probability 0.9 in the ON state and with probability 0 in the OFF state and the buffer can contain at most one request, then the transition matrix PSQ(sp, sr) can be written down directly for each combination of sp and sr.
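A minimal Python sketch of these queue transition probabilities follows; the within-period convention (an arriving request may be serviced in the same period it arrives) is an assumption made here for illustration, and a different convention would change the entries.

def p_sq(sp_is_on, sr):
    # 2x2 matrix M with M[sq][sq2] = Prob(next queue length = sq2 | current length = sq).
    s = 0.9 if sp_is_on else 0.0        # per-period service probability
    if sr == 0:
        # No arrival: an empty queue stays empty; a pending request is serviced w.p. s.
        return [[1.0, 0.0],
                [s, 1.0 - s]]
    # One arrival: it is enqueued (or dropped if the buffer is already full), and the
    # head-of-line request is then serviced with probability s.
    return [[s, 1.0 - s],
            [s, 1.0 - s]]

print(p_sq(True, 1))    # SP ON, request arriving: [[0.9, 0.1], [0.9, 0.1]]
print(p_sq(False, 0))   # SP OFF, no arrival: the queue can never drain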
Power Manager.
This component communicates with the SP and attempts to set its state at the beginning of each period by issuing commands from among a finite set, CMD. This goal is in turn achieved only in a probabilistic sense; that is, the PM changes the transition matrix of the SP by issuing a particular command. In the aforementioned example, the two possible commands are Go2Active and Go2Sleep. The PM has all specifications and collects all relevant information (by observing the SR, SQ, and SP) needed for implementing a power management policy. The power consumption of the PM is assumed to be much smaller than that of the PMCs it controls and so it is not a concern. The state of the EMC, composed of the SP, the SR, and the queue, is then a triplet, s = (sp, sq, sr). Being the composition of three Markov chains, s is a Markov chain (with S = Sr × Sp × Sq states), whose transition matrix P(cmd) depends on the command cmd issued to the SP by the PM. Hence, the system is fully described by a set of NC transition matrices, one for each command.
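Given the factored structure just described (the SR evolves autonomously, the SP transitions depend on the command, and the SQ transitions depend on the current sp and sr), the per-command global transition matrix can be assembled state by state. The Python sketch below does this for the small running example; the component numbers are the same hypothetical ones used earlier, and the product form is an assumption consistent with the description rather than the exact construction of Ref. [9].

from itertools import product

# Illustrative component models (the same hypothetical numbers used in the sketches above).
P_SR = {1: {1: 0.85, 0: 0.15}, 0: {0: 0.92, 1: 0.08}}
P_SP = {"Go2Active": {"ON": {"ON": 1.0, "OFF": 0.0}, "OFF": {"ON": 0.04, "OFF": 0.96}},
        "Go2Sleep":  {"ON": {"ON": 0.3, "OFF": 0.7}, "OFF": {"ON": 0.0, "OFF": 1.0}}}

def p_sq(sp_is_on, sr):
    # Queue transition rows [from sq=0, from sq=1]; see the service-queue sketch above.
    s = 0.9 if sp_is_on else 0.0
    return [[1.0, 0.0], [s, 1.0 - s]] if sr == 0 else [[s, 1.0 - s], [s, 1.0 - s]]

STATES = list(product(["ON", "OFF"], [0, 1], [0, 1]))   # global states s = (sp, sq, sr)

def global_matrix(cmd):
    # One transition matrix per command, assembled from the factored component models.
    return {s: {t: P_SP[cmd][s[0]][t[0]]
                   * p_sq(s[0] == "ON", s[2])[s[1]][t[1]]
                   * P_SR[s[2]][t[2]]
                for t in STATES}
            for s in STATES}

P_active = global_matrix("Go2Active")                    # an 8x8 matrix in dictionary form
print(sum(P_active[("ON", 0, 0)].values()))              # ~1.0: each row sums to one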
In the above description no mention is made of the energy source (i.e., the battery). In the stochastic approaches the goal is to minimize (or bound) the average power consumption of the SP, and not to maximize the expected battery life. This choice has several advantages: it does not need to consider the details of the power source (rate-dependent energy discharge characteristics, and energy recovery), while it still retains the primary feature of minimizing (or constraining) the power consumption level. However, there have been recent attempts to incorporate the battery behavior in modeling the EMC systems while finding a solution to the DPM problem [22].
At the beginning of each time period tn, the PM observes the “history” of the system, i.e., the sequence of states and commands up to tn-1. It then controls the SP by making a decision. In deterministic policies, the PM makes a single decision to issue a specific command on the basis of the history of the system. In contrast, in the much broader set of policies called randomized policies, the PM assigns probabilities to every available command and then chooses the command to issue according to this probability distribution. In this way, even if the same decision is taken in different periods, the actual commands issued by the PM can be different. Mathematically speaking, let Hn represent the history of the EMC; then a decision d(Hn) in a randomized policy is a set of probabilities, pcmd, where each pcmd represents the probability of issuing command cmd given that the history of the system is Hn. A deterministic decision is the special case with pcmd = 1 for some command cmd. Over a finite time horizon, the decisions taken by the PM form a finite discrete sequence d(1), …, d(n). We call this sequence a policy π. The policy π is the free variable of our optimization problem. If a policy π = (d(1), …, d(n)) is adopted, then we can define Pπ, the n-step transition matrix under policy π.
Example
Consider the example of the previous section. Suppose that the PM observes the following history: s1 = (0, ON, 0), s2 = (1, OFF, 0) (the states in periods 1 and 2), and cmd(1) = Go2Sleep (the action taken at time 1). A possible decision at time 2 in a randomized policy, when state s2 is observed, consists of setting the probabilities of issuing the commands Go2Active and Go2Sleep to pGo2Active = 0.34 and pGo2Sleep = 0.66, respectively. In contrast, in the case of a deterministic policy, the PM would, for example, simply decide to issue the Go2Active command to the underlying PMC.
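A randomized decision of this kind is simply a sampling step. The short Python sketch below (the helper function is hypothetical; only the 0.34/0.66 probabilities come from the example) draws one command according to the distribution assigned by the PM.

import random

def issue_command(decision, rng=random):
    # Draw one command according to the probabilities assigned by a randomized decision.
    r = rng.random()
    cumulative = 0.0
    for cmd, p in decision.items():
        cumulative += p
        if r < cumulative:
            return cmd
    return cmd                  # guard against floating-point round-off

# The randomized decision of the example, taken when state s2 = (1, OFF, 0) is observed.
print(issue_command({"Go2Active": 0.34, "Go2Sleep": 0.66}))

# A deterministic decision is the degenerate case with all probability on one command.
print(issue_command({"Go2Active": 1.0}))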
Policy Optimization.
The problem is to find the optimal policy (set of state-action pairs) for the PM such that some power-related cost function is minimized subject to a set of performance constraints. Consider that the system is in state s = (sp, sq, sr) at the beginning of time period n. A typical example of a cost function is the expected power consumption of the SP in that time period, which is denoted as c(sp, d(n)) and represents the power consumption when the SP starts in state sp and the PM takes decision d(n). (Note that c(sp, d(n)) = Σcmd∈d(n) pcmd c(sp, cmd).) A second parameter of interest is the performance penalty in that time period, which is denoted by d(sq) and is typically set to the queue length, sq. Finally, one can consider the request loss in the time period, denoted by b(sr, sq). The loss factor is in general set to one when a request arrives (sr = 1) and the queue buffer is full (sq = Sq); otherwise it is set to zero. We are interested in finding the optimum stationary PM policy for the system. This means that the decision, d(n), is only a function of the system state, s, and not of the time period at which the decision is made. In other words, the policy sought is one in which the same decision is made for a given global state regardless of time. With this in mind, we can now define a power consumption vector, c(ds), a performance penalty vector, d(ds), and a request loss vector, b(ds). Each vector has |S| elements. Furthermore, the sth element of each vector is the expected value of the corresponding quantity when the system is in global state s and decision ds is taken. When performing policy optimization, we want to minimize the expected power consumption while keeping the average performance penalty and request loss below levels specified by the user. Given the probability distribution of the initial system state at the beginning of the first period, p1, the problem of determining the optimal stationary policy can be formally described as follows:
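One way to write this problem, assuming long-run per-period averages and writing E^π_{p1} for the expectation under policy π with initial state distribution p1, is the following (in LaTeX notation):

\begin{aligned}
\min_{\pi}\quad & \lim_{N \to \infty} \frac{1}{N}\, E^{\pi}_{p_1}\left[\sum_{n=1}^{N} c\big(d(n)\big)\right] \\
\text{subject to}\quad & \lim_{N \to \infty} \frac{1}{N}\, E^{\pi}_{p_1}\left[\sum_{n=1}^{N} d\big(s_q(n)\big)\right] \le D_M \\
& \lim_{N \to \infty} \frac{1}{N}\, E^{\pi}_{p_1}\left[\sum_{n=1}^{N} b\big(s_r(n), s_q(n)\big)\right] \le B_M
\end{aligned}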
where c(d(n)) = c(ds) if s = s(n), and DM and BM denote the upper bounds on the average performance penalty and request loss in any time period, respectively. The optimization is carried out over the set of all possible policies. Hence, solving the aforementioned optimization appears to be a formidable task. Fortunately, if the delay constraint for the EMC is an active constraint, the optimal power management policy will generally be a randomized policy [23]. The randomized optimal policy can be obtained by solving a linear programming problem as explained in Ref. [9]. This approach offers significant improvements over previous power management techniques in terms of its theoretical framework for modeling and optimizing the EMC.
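As a concrete illustration of the linear programming route, the Python sketch below sets up the standard occupation-measure LP for a constrained average-cost Markov decision process (a textbook construction under a unichain assumption, shown with tiny made-up data; it is not the exact program of Ref. [9]).

import numpy as np
from scipy.optimize import linprog

# Tiny made-up instance: 2 global states, 2 commands (illustrative numbers only).
n_states, n_cmds = 2, 2
P = np.array([                      # P[cmd, s, s2]: transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],       # command 0
    [[0.5, 0.5], [0.6, 0.4]],       # command 1
])
power = np.array([[1.0, 0.2],       # power[cmd, s]: expected power per period
                  [1.5, 0.1]])
delay = np.array([0.0, 1.0])        # d(s): performance penalty (queue length) in state s
D_M = 0.6                           # bound on the average performance penalty

# Variables x[s, cmd]: long-run fraction of periods spent in state s while issuing cmd.
idx = lambda s, cmd: s * n_cmds + cmd
c = np.array([power[cmd, s] for s in range(n_states) for cmd in range(n_cmds)])

# Balance: for every s2, sum_cmd x[s2, cmd] - sum_{s, cmd} x[s, cmd] * P[cmd, s, s2] = 0,
# plus the normalization that all x sum to one.
A_eq = np.zeros((n_states + 1, n_states * n_cmds))
for s2 in range(n_states):
    for s in range(n_states):
        for cmd in range(n_cmds):
            A_eq[s2, idx(s, cmd)] += (1.0 if s == s2 else 0.0) - P[cmd, s, s2]
A_eq[n_states, :] = 1.0
b_eq = np.zeros(n_states + 1)
b_eq[n_states] = 1.0

# Performance constraint: sum_{s, cmd} x[s, cmd] * d(s) <= D_M.
A_ub = np.array([[delay[s] for s in range(n_states) for _ in range(n_cmds)]])
b_ub = np.array([D_M])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(n_states, n_cmds)
policy = x / x.sum(axis=1, keepdims=True)   # Prob(cmd | s): a randomized stationary policy
print(policy)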
There are, however, some shortcomings. First, because the EMC is modeled in the discrete-time domain, some assumptions about the PMCs may not hold for real applications, such as the assumption that each event arrives at the beginning of a time slice, or the assumption that the transition of the SQ is independent of the transition of the SP, etc. Second, the state transition probabilities of the system model cannot be obtained accurately. For example, the discrete-time model cannot distinguish between the busy state and the idle state because the transitions between these two states are instantaneous; however, the transition probabilities of the SP in these two states are different. Moreover, the PM needs to send control signals to the PMCs in every time slice, which results in heavy signal traffic and a heavy load on the system resources (and, therefore, more power).
Ref. [10] overcomes these shortcomings by introducing a new system model based on a continuous-time Markov decision process (CTMDP). As a result of this model, the power management policy becomes asynchronous, which is more appropriate for implementation as part of the operating system. The new model considers the correlation between the state of the SQ and the state of the SP, which is more realistic than previous models. Moreover, the service requester model is capable of capturing complex workload characteristics, and the overall system model is constructed exactly and efficiently from the component models. An analytical approach is used to calculate the generator matrix for the joint process of SP–SQ, and a tensor sum-based method is utilized to calculate the generator matrix of the joint process of SP–SQ and SR. The policy optimization problem under the CTMDP model can be solved using (exact) linear programming or (heuristic) policy iteration algorithms. Moreover, this work models the service queue as two queues consisting of low- and high-priority service requests, which further captures the behavior of real-life EMCs.
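The tensor (Kronecker) sum used in that construction combines the generator matrices of independent continuous-time components. The Python sketch below shows the operation itself on two small made-up generators; the correlated SP–SQ part requires the analytical construction of Ref. [10] and is not captured by this operation alone.

import numpy as np

def tensor_sum(A, B):
    # Kronecker (tensor) sum: A kron I + I kron B, i.e., the generator of the joint
    # process of two independent continuous-time Markov chains.
    return np.kron(A, np.eye(B.shape[0])) + np.kron(np.eye(A.shape[0]), B)

# Made-up generator matrices (rows sum to zero): a two-state SP-SQ block and a two-state SR.
G_spsq = np.array([[-1.0, 1.0],
                   [0.1, -0.1]])
G_sr = np.array([[-0.2, 0.2],
                 [0.5, -0.5]])

G_joint = tensor_sum(G_spsq, G_sr)   # 4x4 generator of the joint SP-SQ and SR process
print(G_joint.sum(axis=1))           # rows still sum to zero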
Because the CTMDP policy is a randomized policy, at times it may not turn off the SP even when there is no request in the SQ. If the stochastic model exactly represents the system behavior, then this policy is optimal. However, in practice, because the stochastic model is not accurate enough, the CTMDP policy may cause unnecessary energy dissipation by not turning off the SP. For example, the real request pattern seen by the SP may be quite different from what has been assumed in theory, and the SP idle time may be much longer than one would expect based on the assumption of exponential request interarrival times. In this case, keeping the SP on while it is idle results in wasted power. Qiu et al. [10] thus present an improved CTMDP policy (called CTMDP-Poll) by adding a polling state. The functionality of the polling state is very simple. After adding this state, even if the CTMDP policy allows the SP to stay on when the SQ is empty, the policy will re-evaluate this decision after some random-length period of time. For example, if the SQ is empty and the PM has made a decision (with probability 0.1) of letting the SP stay ON, then after 2 s, if there is no change in the SQ, the model will enter the polling state, and the PM will have to re-evaluate its decision. At this time, the probability of again letting the SP remain on is 0.1. Thus, as time goes on, the total probability of the SP remaining in the ON state decreases geometrically. In this way, one can make sure that the SP will not be idle for too long, resulting in less wasteful energy dissipation.
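The geometric decay described above is easy to check numerically; in the sketch below, the 0.1 stay-on probability and the 2 s re-evaluation interval from the example are treated as illustrative numbers.

p_stay_on = 0.1        # probability of deciding to keep the idle SP on (from the example)
poll_interval = 2.0    # seconds between re-evaluations (illustrative)

for k in range(1, 5):
    print(k, p_stay_on ** k)                 # 0.1, 0.01, 0.001, ... : geometric decay

# Expected number of extra poll intervals spent on after the queue empties:
# sum over k >= 1 of 0.1**k = 0.1 / (1 - 0.1), i.e., about 0.22 s with a 2 s interval.
print(poll_interval * p_stay_on / (1.0 - p_stay_on))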
The timeout policy is an industry standard that has been widely supported by many real systems. A DPM technique based on timeout policies may thus be easier and safer for users to implement. At the same time, it helps them achieve a reasonably good energy-performance trade-off. Implementing a more elaborate DPM technique requires the users to directly control the power-down and wake-up sequences of system components, which normally necessitates detailed knowledge of the hardware and involves a large amount of low-level programming dealing with the hardware interface and device drivers. Notice also that the various system modules typically interact with each other, implying that a sudden power-down of a system module may cause the whole system to malfunction or become unstable; i.e., direct control over the state of a system module is a big responsibility that should not be delegated unceremoniously. A DPM technique based on a simple and well-tested timeout policy and incorporated in the operating system has none of the above concerns. For these reasons, Rong and Pedram [24] present a timeout-based DPM technique, which is constructed based on the theory of Markovian processes and is capable of determining the optimal timeout values for an electronic system with multiple power-saving states. More precisely, a Markovian process-based stochastic model is described to capture the power management behavior of an electronic system under the control of a timeout policy. Perturbation analysis is used to construct an offline gradient-based approach for determining the set of optimal timeout values. Finally, online implementation of this approach is also discussed.
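The offline flavor of this approach can be illustrated with a very small sketch: given sampled idle-period lengths and a single sleep state, the expected energy of a candidate timeout can be evaluated on the samples and its gradient estimated numerically. The cost model and numbers below are illustrative stand-ins, not the perturbation-analysis machinery of Ref. [24].

# Illustrative single-sleep-state cost model: during an idle period of length T, a timeout
# policy with threshold tau stays on for min(T, tau) and, if T > tau, pays the sleep/wake
# transition energy and sleeps for the remainder. All numbers are made up.
P_ON, P_SLEEP, E_TRANS = 1.0, 0.1, 0.8        # watts, watts, joules

def energy(tau, idle_periods):
    total = 0.0
    for T in idle_periods:
        total += P_ON * T if T <= tau else P_ON * tau + E_TRANS + P_SLEEP * (T - tau)
    return total / len(idle_periods)

def gradient(tau, idle_periods, eps=1e-3):
    # Finite-difference estimate of d(expected energy)/d(tau), usable by a gradient search.
    return (energy(tau + eps, idle_periods) - energy(tau - eps, idle_periods)) / (2 * eps)

idle_periods = [0.3, 1.2, 5.0, 0.8, 9.5]      # sampled idle-period lengths in seconds
for tau in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(tau, round(energy(tau, idle_periods), 3))
print(gradient(2.0, idle_periods))            # ~0.36 with these samples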