|Lecture:||Mon Wed 11:00 - 12:15, EGR-1108|
|Required Text:||Jacob, Ng, & Wang, Memory Systems: Cache, DRAM, Disk, Morgan Kaufmann, 2007|
|Recommended Text:||Hennessy & Patterson, Computer Architecture: A Quantitative Approach, 4th Ed., Morgan Kaufmann|
|Recommended Text:||Smith & Franzon, Verilog Styles for Synthesis of Digital Systems, Prentice Hall|
|Professor:||Bruce L. Jacob, Electrical & Computer Engineering|
|Office:||1333 A.V. Williams Building|
|Office Hours:||Monday/Wednesday, Open-door policy ...|
|Teaching Assistant:||Jack Wang|
|Office Hours:||to be determined|
Course Handouts and General Information
Background on out-of-order execution
This describes how to schedule instructions in order of data dependences. It essentially recreates dynamically, at runtime, the data-flow graph that a compiler generates statically, at compile time.
This describes how to ensure that instructions update the register file in program order, even if they execute and complete out of order. Updating state in program order is what makes precise interrupts possible in an out-of-order pipeline.
This combines the previous two ideas. :)
This describes an old Verilog implementation of a double-issue RUU (the original RUU was single-issue).
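To make the two ideas above concrete, here is a minimal Python sketch of a toy out-of-order core: instructions start as soon as the producers of their source registers have finished (data-flow order), but they retire from the head of a reorder buffer in program order. It is not taken from any of the papers or from the Verilog RUU. The `Instr` fields, the four-instruction program, the latencies, and the assumptions of unlimited functional units, unlimited commit width, and perfect register renaming (so only true dependences matter) are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str       # destination register
    srcs: tuple     # source registers
    latency: int    # execution latency in cycles (made-up values)
    done: int = 0   # cycle at which execution finishes


def simulate(program):
    """Return (completion_order, commit_log) for a toy out-of-order core."""
    last_writer = {}  # register -> most recent earlier instruction that writes it
    for ins in program:
        # An instruction may start once every producer of its sources has finished.
        start = max((last_writer[r].done for r in ins.srcs if r in last_writer),
                    default=0)
        ins.done = start + ins.latency
        last_writer[ins.dest] = ins

    # Execution completes in data-flow order, not program order.
    completion_order = [i.name for i in sorted(program, key=lambda i: i.done)]

    # Commit walks the reorder buffer from its head: an entry retires only when
    # it has finished and every older entry has already retired.
    commit_log, cycle = [], 0
    for ins in program:
        cycle = max(cycle, ins.done)
        commit_log.append((ins.name, cycle))
    return completion_order, commit_log


program = [
    Instr("load", "r1", ("r2",), 4),
    Instr("add", "r3", ("r1", "r4"), 1),   # depends on load
    Instr("sub", "r5", ("r6", "r7"), 1),   # independent of load and add
    Instr("mul", "r8", ("r5", "r6"), 2),   # depends on sub
]
completion_order, commit_log = simulate(program)
print("completion:", completion_order)
print("commit:    ", commit_log)
```

In this example `sub` and `mul` finish executing before `load` and `add`, yet they are not allowed to retire until the older instructions have. If `load` were to fault, none of the younger results would have reached the register file, which is the property that precise interrupts rely on.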
Background on branch prediction
This describes, among other things, the use of adaptive/dynamic state machines (e.g. saturating counters) to track branch behavior and to predict future behavior based on that observed past behavior.
A nice description of the branch-target buffer and a description of what Yeh & Patt call a "static training" branch prediction scheme in which a branch history (string of bits signifying taken/not-taken status of most recent branch outcomes) is used to generate a prediction. That prediction can be generated by profiling a set of benchmarks. The article also explores a few other adaptive state machines besides saturating counters.
Combines the concept of the history-indexed table, as described in Lee & Smith (called static training by Yeh & Patt), and the use of adaptive state machines, as described in Smith. Instead of putting a static prediction bit into the table, an adaptive state machine is put into each entry in the table. Note that later papers by Yeh & Patt change the name of the scheme to "two-level adaptive branch prediction" (dropping the "training").
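The ideas in the three descriptions above can be sketched in a few lines of Python. This is an illustrative toy, not the exact design from any of the papers: the class names, the 2-bit counter width, the 2-bit history length, and the use of a single global history register (rather than a per-branch history) are all assumptions made for brevity.

```python
class SaturatingCounter:
    """n-bit up/down counter; the upper half of its states predicts taken."""

    def __init__(self, bits=2):
        self.max = (1 << bits) - 1
        self.state = self.max // 2 + 1  # start in the weakly-taken state

    def predict(self):
        return self.state > self.max // 2

    def update(self, taken):
        # Count up on taken, down on not-taken, saturating at both ends.
        if taken:
            self.state = min(self.state + 1, self.max)
        else:
            self.state = max(self.state - 1, 0)


class TwoLevelPredictor:
    """History register (level 1) indexes a table of counters (level 2)."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.history = 0  # most recent outcomes, newest in the low bit
        self.table = [SaturatingCounter() for _ in range(1 << history_bits)]

    def predict(self):
        return self.table[self.history].predict()

    def update(self, taken):
        # Train the counter selected by the old history, then shift in the outcome.
        self.table[self.history].update(taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, outcomes):
    """Fraction of outcomes predicted correctly, updating after each branch."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)


# A branch that strictly alternates taken / not-taken.
pattern = [True, False] * 50
print("single counter:", accuracy(SaturatingCounter(), pattern))
print("two-level:     ", accuracy(TwoLevelPredictor(), pattern))
```

On the alternating pattern a lone saturating counter keeps predicting taken and is right only half the time, while the history-indexed table learns that "last outcome taken" is followed by not-taken and vice versa. Replacing each `SaturatingCounter` entry with a fixed bit obtained from profiling would give the flavor of the static-training scheme.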
Articles well worth reading
Documents describing the RiSC-16 (previous-generation 16-bit, single-issue architecture):
|File Name||Document Name||Document Description|
|F2002-RiSC-ISA.pdf||The RiSC-16 Instruction-Set Architecture||Describes the instruction-set architecture: machine-code forms, assembly-code forms, etc.|
|F2002-RiSC-seq.pdf||RiSC-16: Sequential Implementation||Describes a sequential implementation of the architecture: control flow, data flow, etc.|
|F2002-RiSC-pipe.pdf||The Pipelined RiSC-16||Describes a pipelined implementation of the architecture: control flow, data flow, pipeline stages, pipeline hazards, data forwarding, etc.|
|F2002-RiSC-oo.pdf||An Out-of-Order RiSC-16: Tomasulo + Reorder Buffer = Interruptible Out-of-Order||Describes an out-of-order implementation: instruction queue (ROB/RUU), fetch buffers, forwarding logic, wakeup/scheduling logic, recovery from branch misspeculations, memory request queue, commit logic, etc. Version 1 does not implement precise interrupts (sorry; I ran out of time).|