This paper proposes two new trace/region selection algorithms. Many dynamic optimizers attempt to detect hot execution paths (i.e., traces) through the control flow graph at runtime. These methods must be lightweight so as not to incur too much overhead. At the time of publication, the de facto method was an algorithm called NET (next executing tail). The first new technique is LEI (last executed iteration), which only selects traces that are cyclic (the branch at the end of the trace branches back to the beginning).
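To make the idea of a cyclic trace concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the function name `first_cyclic_trace`, the bounded history buffer, and the `history_size` parameter are all assumptions made for illustration. It walks a stream of executed basic-block labels and returns the first path whose final branch goes back to its own first block.

```python
from collections import deque


def first_cyclic_trace(executed_blocks, history_size=16):
    """Return the first cyclic path seen in a stream of executed blocks.

    Hypothetical helper for illustration only. A path is treated as cyclic
    when the next executed block is already in the recent history, i.e. the
    branch at the end of the path jumps back to the path's first block.
    """
    history = deque(maxlen=history_size)  # bounded buffer keeps the cost low
    for block in executed_blocks:
        if block in history:
            recent = list(history)
            # Blocks in the history are unique here, because we return
            # at the very first repeat.
            start = recent.index(block)
            return recent[start:]
        history.append(block)
    return None  # no cycle observed within the history window


if __name__ == "__main__":
    # A -> B -> C -> D -> back to B: the cyclic trace is B, C, D.
    print(first_cyclic_trace(["A", "B", "C", "D", "B", "C"]))
```

The bounded buffer reflects the requirement that selection stay lightweight: a loop longer than the window is simply not detected in this sketch.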
I just got an email today indicating that the ISCA (2008) program committee didn't hit their deadline for the rebuttal period. This email was sent out:
from: ISCA08 Submissions
to: Jason Mars
cc: Matt Frank
date: Jan 15, 2008 10:39 AM
subject: [ISCA 08] Rebuttal period

Dear Jason Mars,

The rebuttal period for ISCA will begin on January 22 (rather than January 16) and close on January 25.

-Wen-mei Hwu
We can take the dynamic hot paths of a program, optimize them assuming we won't exit them early, and patch the new regions into our executing code. However, in the rare case that we need to take a cold path, exiting the hot path early requires undoing these optimizations. This compensation code adds complexity and overhead. In this work the authors show a scheme for removing this complexity by using hardware structures to enable code region atomicity. They demonstrate their technique with Java dynamic optimization.
In this work, the authors use Itanium's performance monitoring capabilities to detect and form dynamic traces of hot code. The Branch Trace Buffer provided by the Itanium architecture is used to detect dynamic paths and their hotness. Cache performance information about these traces can then be collected. Phases are determined using two tables, a local one and a global one. The local table holds recent traces; the global table holds predictions of which traces are hot. As long as these tables are 60% similar, we are in a hot phase. They detect a phase change if fewer than 60% of sampled traces are optimized.
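The two-table check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the summary does not say how similarity is measured, so the sketch assumes it is the fraction of traces in the local table that also appear in the global table. The names `table_similarity` and `in_hot_phase` are hypothetical, and only the 60% threshold comes from the post.

```python
def table_similarity(local_traces, global_traces):
    """Fraction of recent (local) traces that are also predicted hot (global).

    Assumed definition for illustration; the paper may measure similarity
    differently.
    """
    if not local_traces:
        return 0.0
    predicted_hot = set(global_traces)
    hits = sum(1 for trace in local_traces if trace in predicted_hot)
    return hits / len(local_traces)


def in_hot_phase(local_traces, global_traces, threshold=0.60):
    """True while the two tables are at least `threshold` similar."""
    return table_similarity(local_traces, global_traces) >= threshold


if __name__ == "__main__":
    global_table = ["t1", "t2", "t3", "t4"]
    # 4 of 5 recent traces are predicted hot: still in the same phase.
    print(in_hot_phase(["t1", "t2", "t3", "t4", "t9"], global_table))
    # Only 2 of 5 match: the sketch would report a phase change.
    print(in_hot_phase(["t1", "t2", "t7", "t8", "t9"], global_table))
```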
This work proposes Trident, a hardware-based framework to support dynamic optimization at the native binary level. The framework uses an intricate hardware-based hot path profiler that uses taken-branch histories to extract hot path information. It also uses specialized trace management performance counters and hardware to manage the trace code cache. In addition, the authors propose hot value profiler hardware support to allow the implementation of low-overhead value specialization optimization techniques.
This work states that round robin is the wrong way to schedule the compilation thread of Java VMs. It then demonstrates that setting a static processor utilization for the compilation thread performs better. Their experiments show that setting the utilization of the compilation thread to 100% gives the best results. Basically this says that whenever we see a hotspot, we should recompile and optimize it immediately and run the compilation to completion.
This work presents a Java virtual machine (JVM) that is designed to present a homogeneous layer of abstraction on top of the Cell heterogeneous processor. Multithreaded Java programs run across both the PPE and the SPEs. The PPE is used for bytecode instructions that involve syscalls. The SPEs are used for all other bytecode instructions. Thread state is kept in main memory, and the local store of each SPE is used as a software-controlled cache.
First and foremost, Bit Sect is a place to come and share knowledge. Everyone has to read papers and keep informed on the work that is out there, so why not write a 'bit' about the paper? Everyone has opinions about the latest conference, news updates about the state of our scene, the recent work that's getting published, etc., so why not write a piece about it? Bit Sect is a place for you to read and write bits and pieces about cutting-edge research. Secondly, there is the Bit Sect think tank.