Nagpal, Rahul and Srikant, YN (2012) Compiler-assisted energy optimization for clustered VLIW processors. In: JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 72 (8). pp. 944-959.
jou_par_dis_com_72-8_944-959_2012.pdf - Published Version
Restricted to Registered users only
Download (1811Kb) | Request a copy
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long global wires having high load capacitance which leads to delay in execution and significantly high energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects to improves the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains on average 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure. (C) 2012 Elsevier Inc. All rights reserved.
|Item Type:||Journal Article|
|Additional Information:||Copy right for this article belongs to Elsiver Ltd|
|Keywords:||Scheduling;Clustered VLIW processors;Energy-aware scheduling|
|Department/Centre:||Division of Electrical Sciences > Computer Science & Automation (Formerly, School of Automation)|
|Date Deposited:||03 Aug 2012 04:52|
|Last Modified:||03 Aug 2012 04:54|
Actions (login required)