R. Mameesh and M. Franklin (USA)
Multi-core, micro-architecture, parent thread, child thread, speculation, result integration.
This paper proposes a new architecture that obtains high performance for single-threaded applications in a multi-core environment while using simpler cores to meet high-throughput and low-power requirements. The proposed execution paradigm, hierarchical execution, divides the main thread into multiple threads arranged in a hierarchy, with the main thread at the top. This arrangement creates a parent-child relationship between adjacent threads, with each parent located directly above its child. Each child executes a subset of the instructions executed by its parent, so it runs faster, is more speculative, and can better hide the latency of long-latency instructions. Results generated by a child are forwarded to its parent, which consumes the correct ones without executing the corresponding instructions. The parent in turn forwards its own results, together with its child's results, to its own parent, and so on up to the highest level of the hierarchy (the main thread). We explored hierarchical execution with up to 5 levels in the hierarchy. We show that, with minor hardware additions, hierarchical execution achieves a 22% average performance improvement over a more complex single-thread superscalar scheme with a much larger instruction window.
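The result-forwarding idea described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism of the paper: the names `Instr`, `Forwarded`, `run_level`, `run_hierarchy` and `demo` are hypothetical, and the rule used to decide that a forwarded result is "correct" (the child's operand values must equal the parent's) is an assumption, since the abstract does not specify how results are validated.

```python
import operator
from typing import Callable, Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str                        # destination register
    src1: str                        # first source register
    src2: str                        # second source register
    op: Callable[[int, int], int]    # operation applied to the two sources


class Forwarded(NamedTuple):
    inputs: Tuple[int, int]          # operand values the producer saw
    result: int                      # value the producer computed


def run_level(program: List[Instr],
              keep: Callable[[int], bool],
              init_regs: Dict[str, int],
              from_child: Dict[int, Forwarded]):
    """Run one thread of the hierarchy.

    `keep(idx)` selects the subset of instructions this thread executes.
    A forwarded result is consumed only if its recorded operands match
    this thread's operands (assumed validation rule, not from the paper).
    """
    regs = dict(init_regs)
    to_parent: Dict[int, Forwarded] = {}
    executed = reused = 0
    for idx, ins in enumerate(program):
        if not keep(idx):
            continue                 # instruction not in this thread's subset
        inputs = (regs[ins.src1], regs[ins.src2])
        hint = from_child.get(idx)
        if hint is not None and hint.inputs == inputs:
            result = hint.result     # consume the child's result, skip execution
            reused += 1
        else:
            result = ins.op(*inputs) # missing or mis-speculated: execute here
            executed += 1
        regs[ins.dest] = result
        # Forward own results together with those consumed from the child.
        to_parent[idx] = Forwarded(inputs, result)
    return regs, to_parent, executed, reused


def run_hierarchy(program, keeps, init_regs):
    """`keeps` is ordered from the lowest child up to the main thread."""
    forwarded: Dict[int, Forwarded] = {}
    stats = []
    regs = dict(init_regs)
    for keep in keeps:
        regs, forwarded, executed, reused = run_level(
            program, keep, init_regs, forwarded)
        stats.append((executed, reused))
    return regs, stats


def demo():
    program = [
        Instr("a", "x", "y", operator.add),  # 0: a = x + y
        Instr("b", "a", "x", operator.mul),  # 1: b = a * x (treated as long latency)
        Instr("c", "y", "y", operator.add),  # 2: c = y + y
        Instr("d", "b", "c", operator.sub),  # 3: d = b - c (depends on 1)
    ]
    init = {"x": 2, "y": 3, "a": 0, "b": 0, "c": 0, "d": 0}
    keeps = [
        lambda i: i != 1,   # child: skips instruction 1, so it speculates on b
        lambda i: True,     # main thread: responsible for every instruction
    ]
    return run_hierarchy(program, keeps, init)


if __name__ == "__main__":
    final_regs, per_level = demo()
    print(final_regs)       # architectural state of the main thread
    print(per_level)        # (executed, reused) per level, child first
```

In this toy run the child skips instruction 1, so its result for instruction 3 is computed from a stale `b` and is rejected by the main thread, which re-executes it. The main thread reuses the child's results for instructions 0 and 2 and executes only instructions 1 and 3 itself. The model runs the levels one after another, so it shows only which work is avoided, not the timing benefit of running the threads concurrently on separate cores.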