Cross-Layer Theory and Models (Theme 1)

Initial objectives

To capture the interplay between energy-efficiency and reliability in embedded systems across the layers of system software and hardware, and the effects of core scaling on that interplay.
To develop sound models allowing reasoning about power, performance and reliability in many-core systems to support power-proportionality, multiple operation modes, graceful degradation in the quality of computation and reconfiguration.
To develop model-based algorithmic support (plug-ins) for design-time correctness checking and optimisation and run-time adaptation of energy-reliability trade-offs.

Introduction

A solid understanding of the relationships between energy pathways and reliability in many-core systems is vital for identifying the key principles of sustainability in many-core scaling and cross-layer system optimisation at design time and run time. Currently there is little theory to underpin the interplay between energy and reliability across system layers. This research theme addresses the energy-reliability interplay (ERI) at the hardware and software layers. We first analyse the ways in which ERI appears in modern embedded systems, learning about the effects of power management on reliability parameters and, conversely, the effects of uncertainty and error control on energy consumption. We then capture these relations in formal models and algorithms that are tested on the new system design ideas in Themes 2-4. These theories, models and algorithms are being used to reason about ERI, enabling and exploiting potential trade-offs between energy-efficiency and resilience in many-core architectures. Our vision of the interplay is depicted in simplified form below.

[Figure: performance, power and reliability factors as functions of the number of cores]

Performance, power and reliability factors are shown as functions of the number of cores. One possible design scenario is as follows. Suppose the system functionality demands a certain level of performance, which can only be achieved with k* cores (at some point Amdahl’s law limits the performance increase, although efforts are made to prolong scalability along the axis n, for example by introducing heterogeneity). The reliability factor, determined by the probability of a fault in one core, pc, then stipulates that the system needs a redundancy level of r* additional cores (we simplify the situation here by treating redundancy as an additive factor). As a result, the power consumption is determined by the fact that k* + r* cores must be constantly powered on. This may not be sustainable within the energy budget, so trade-offs between energy and reliability will need to be activated in order to maintain the fixed performance level. This is just one scenario out of a myriad of possibilities and options that this research needs to address, and redundancy will certainly have to be considered at more than the core level. For this, we will need to analyse existing system design considerations and develop theory and formal models for incorporation into the methods developed in Themes 2 and 3.
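The scenario above can be made concrete with a small numerical sketch. This is an illustration only, not a model from the PRiME project: it assumes that performance follows the textbook form of Amdahl's law, that core faults are independent with the same probability pc, that reliability means "at least k* cores are fault-free", and that every powered core draws the same fixed power. All function names and numeric values (parallel fraction, target speedup, pc, reliability target, watts per core) are hypothetical.

```python
import math


def amdahl_speedup(n, parallel_fraction):
    """Amdahl's law: speedup on n cores for a given parallelisable fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def min_cores_for_speedup(target, parallel_fraction, max_cores=1024):
    """Smallest core count k* reaching the target speedup, or None if unreachable."""
    for k in range(1, max_cores + 1):
        if amdahl_speedup(k, parallel_fraction) >= target:
            return k
    return None


def prob_at_least_k_working(k, total, p_c):
    """Probability that at least k of `total` cores are fault-free.

    Assumes independent faults, each core faulty with probability p_c.
    """
    return sum(
        math.comb(total, i) * (1.0 - p_c) ** i * p_c ** (total - i)
        for i in range(k, total + 1)
    )


def min_spare_cores(k, p_c, target_reliability, max_spares=64):
    """Smallest additive redundancy r* meeting the reliability target, or None."""
    for r in range(max_spares + 1):
        if prob_at_least_k_working(k, k + r, p_c) >= target_reliability:
            return r
    return None


def always_on_power(k, r, watts_per_core):
    """Power drawn when all k + r cores are constantly powered on."""
    return (k + r) * watts_per_core


# Hypothetical inputs, chosen only to exercise the functions.
k_star = min_cores_for_speedup(target=8.0, parallel_fraction=0.95)
r_star = min_spare_cores(k_star, p_c=0.01, target_reliability=0.999)
power = always_on_power(k_star, r_star, watts_per_core=0.5)
print(k_star, r_star, power)
```

With these hypothetical inputs the sketch gives k* = 13 and r* = 2, so 15 cores are kept powered at 7.5 W in total. If that exceeds the energy budget, the only levers in this simplified picture are the reliability target (fewer spares) or the performance target (fewer active cores), which is the trade-off the scenario describes.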

A key conceptual challenge of Theme 1 lies not only in providing models that help many-core systems to function in an energy-efficient way with effective use of fault-tolerance mechanisms, but also in allowing the system to operate in an energy- and power-proportional manner and to avoid interruption when the supplied energy level drops. This requires a significant paradigm change in the context of embedded systems. For example, modern CPUs in the mobile and embedded domains can consume one-tenth or less of their peak power when idle, even without engaging software-visible energy-saving modes. However, as soon as systems begin to scale and involve large memory resources (DRAM), the dynamic power range narrows to a mere 50%. Power-proportionality thus becomes an issue that affects performance across the layers of abstraction. This theme is investigating the ways in which an embedded many-core system consumes and distributes energy while maintaining its dependability over a wide dynamic power range, allowing for multiple modes and graceful degradation of performance and Quality of Service (QoS).
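The effect of memory on power-proportionality can be illustrated with simple arithmetic. The sketch below is one possible reading of the figures quoted above, not a measurement: it defines dynamic power range as the fraction of peak power that varies with load, (peak - idle) / peak, and combines a CPU that idles at one-tenth of its peak with a DRAM subsystem whose range is 50%. The wattages and function names are hypothetical.

```python
def dynamic_power_range(idle_watts, peak_watts):
    """Fraction of peak power that varies with load: (peak - idle) / peak."""
    if peak_watts <= 0:
        raise ValueError("peak power must be positive")
    return (peak_watts - idle_watts) / peak_watts


def system_dynamic_range(components):
    """Dynamic range of a system given (idle_watts, peak_watts) per component."""
    idle = sum(c[0] for c in components)
    peak = sum(c[1] for c in components)
    return dynamic_power_range(idle, peak)


# Hypothetical component figures (idle W, peak W).
cpu = (0.2, 2.0)   # idles at one-tenth of peak: range of 90%
dram = (1.0, 2.0)  # idles at half of peak: range of 50%

cpu_range = dynamic_power_range(*cpu)
dram_range = dynamic_power_range(*dram)
system_range = system_dynamic_range([cpu, dram])
print(cpu_range, dram_range, system_range)
```

Under these assumptions the combined system has a dynamic range of about 70%, well below the 90% of the CPU alone, and the gap widens as the share of memory in peak power grows.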

This theme is being led by Professor Alex Yakovlev (Newcastle), with individual work packages led by investigators across institutions. The theme’s research will be carried out by Post-Doctoral Research Associates at Newcastle and Southampton.

Key Outputs and Connections to Other Themes

The outputs from this theme are: characteristics of cross-layer energy-reliability in state-of-the-art embedded systems; methods for profiling energy-reliability parameters; models of energy-performance scalability in state-of-the-art many-core systems; methods for empirical measurement of scalability and power-proportionality; a theory of cross-layer energy-reliability interplay; scalability models for performance, energy and reliability factors, including models for power-proportionality, QoS and system correctness; model validation and chip fabrication; updated models based on observations from Themes 2-4; and algorithms for optimisation and verification of cross-layer energy-reliability for adaptation and graceful degradation. The outputs from this theme will be used in research across Themes 2-4.

Theme 1 is linked to the other PRiME themes through constant monitoring of developments in Themes 2-4, drawing observations from existing and novel embedded system platforms. In return, the theory, models and algorithms developed in Theme 1 will influence both the development of optimisation and verification methods in Theme 2 (e.g. in the form of closed-form expressions and algorithmic plug-ins) and the design of monitors and controls between hardware and OS in Theme 3 (e.g. by identifying the correct lists of sensor and actuator parameters at the interface between hardware and software in order to maximise the effect of run-time optimisation). Finally, all models will be evaluated in the technology platform developments in Theme 4.