ENEE 446 - Digital Computer Design by B. Jacob

Spring 2018

Course Information

Lecture:	Mon Wed 11:00 - 12:15, EGR-1108
Mailing List:	enee446-0101-spr18@coursemail.umd.edu
Required Text:	Jacob, Ng, & Wang, Memory Systems: Cache, DRAM, Disk, Morgan Kaufmann, 2007
Recommended Text:	Hennessy & Patterson, Computer Architecture: A Quantitative Approach, 4th Ed., Morgan Kaufmann
Recommended Text:	Smith & Franzon, Verilog Styles for Synthesis of Digital Systems, Prentice Hall

Instructor Information

Professor:	Bruce L. Jacob, Electrical & Computer Engineering
Office:	1333 A.V. Williams Building
Phone:	(301) 405-0432
Email:
Office Hours:	Monday/Wednesday, Open-door policy ...

Teaching Assistant:	Jack Wang
Phone:
Email:	jackwangumd@gmail.com
Office Hours:	to be determined

Midterm Results

Course Handouts and General Information

Syllabus.pdf
verilog-handbook.pdf. This is a concise overview of the Verilog programming language.
realize-verilog.pdf. Gives a functional view of Verilog; i.e. if you want to build a processor model, this shows how. However, it confuses blocking/non-blocking assignments (calls "=" non-blocking and "<=" blocking). Otherwise, it is a decent overview.
IBM 360/91's Out-of-Order Fixed-Point Pipe. Describes (my guess as to) the implementation of instruction enqueue, issue, and commit in the IBM 360/91 fixed-point pipeline. In contrast, the 360/91's floating-point pipeline is famous for its implementation of Tomasulo's algorithm, which is well documented.
"PC processor microarchitecture." Keith Diefendorff. Microprocessor Report, vol. 13, no. 9, pp. 16-22, July 12 2000.
Analysis of Cost and Performance
"Virtual memory: Issues of implementation." Bruce Jacob and Trevor Mudge. IEEE Computer, vol. 31, no. 6, pp. 33-43. June 1998.
The OS and Multitasking: An Example
"Object code optimization." E. S. Lowry and C. W. Medlock. Communications of the ACM, vol. 12, no. 1, pp. 13-22. January 1969.

Background on out-of-order execution

"An efficient algorithm for exploiting multiple arithmetic units." Robert. M. Tomasulo. IBM Journal of Research and Development, vol. 11, no. 1, pp. 25-33, January 1967.
This describes how to schedule instructions in order of data dependences. Basically recreates dynamically, at runtime, the control-flow graph that a compiler generates statically, at compile time.
"Implementing precise interrupts in pipelined processors." James. E. Smith and Andrew. R. Pleszkun. IEEE Transactions on Computers, vol. 37, no. 5, pp. 562-573, May 1988.
This describes how to ensure that instructions update the register file in program order, even if they execute and complete execution out of order. This ensures that precise interrupts can be guaranteed in an out-of-order pipeline.
"Instruction issue logic for high-performance, interruptable pipelined processors." G. S. Sohi and S. Vajapeyam. In Proc. 14th Annual International Symposium on Computer Architecture (ISCA-14), pp. 27ā-34. June 1987.
This combines the previous two ideas. :)
"An Out-of-Order RiSC-16 -- Tomasulo + Reorder Buffer = Interruptible Out-of-Order." ENEE 446: Digital Computer Design, Fall 2000. Prof. Bruce Jacob, http://www.ece.umd.edu/~blj/
This describes an old Verilog implementation of a double-issue RUU (the original RUU was single-issue).

Background on branch prediction

"A study of branch prediction strategies." Jim Smith. In Proc. 8th Annual International Symposium on Computer Architecture (ISCA), pp. 135-148. Minneapolis MN, May 1981.
This describes, among other things, the use of adaptive/dynamic state machines (e.g. saturating counters) to track branch behavior and to predict future behavior based on that observed past behavior.
"Branch prediction strategies and branch target buffer design." Johnny K. F. Lee and Alan Jay Smith. IEEE Computer, vol. 17, no. 1, pp. 6-22. January 1984.
A nice description of the branch-target buffer and a description of what Yeh & Patt call a "static training" branch prediction scheme in which a branch history (string of bits signifying taken/not-taken status of most recent branch outcomes) is used to generate a prediction. That prediction can be generated by profiling a set of benchmarks. The article also explores a few other adaptive state machines besides saturating counters.
"Two-level adaptive training branch prediction." Tse-Yu Yeh and Yale N. Patt. In Proc. 24th Annual International Symposium on Microarchitecture (MICRO), pp. 51-61. Albuquerque NM, November 1991.
Combines the concept of the history-indexed table, as described in Lee & Smith [1984] (called static training by Yeh & Patt) and the use of adaptive state machines, as described in Smith [1981]. Instead of putting a static prediction bit into the table, an adaptive state machine is put into each entry in the table. Note that later papers by Yeh & Patt change the name of the scheme to "two-level adaptive branch prediction" (dropping the "training").

Articles well worth reading

"The MIPS R10000 superscalar microprocessor." Kenneth C. Yeager. IEEE Micro, vol. 16, no. 2, pp. 28-40, April 1996.
"Implementing precise interrupts in pipelined processors." James. E. Smith and Andrew. R. Pleszkun. IEEE Transactions on Computers, vol. 37, no. 5, pp. 562-573, May 1988.
"Dynamic instruction scheduling and the Astronautics ZS-1." James E. Smith. IEEE Computer, vol. 22, no. 7, pp. 21-35, July 1989.
"An efficient algorithm for exploiting multiple arithmetic units." Robert. M. Tomasulo. IBM Journal of Research and Development, vol. 11, no. 1, pp. 25-33, January 1967.

Documents describing the RiSC-16 (previous-generation 16-bit, single-issue architecture):

File Name	Document Name	Document Description
F2002-RiSC-ISA.pdf	The RiSC-16 Instruction-Set Architecture	Describes the instruction-set architecture: machine-code forms, assembly-code forms, etc.
F2002-RiSC-seq.pdf	RiSC-16: Sequential Implementation	Describes a sequential implementation of the architecture: control flow, data flow, etc.
F2002-RiSC-pipe.pdf	The Pipelined RiSC-16	Describes a pipelined implementation of the architecture: control flow, data flow, pipeline stages, pipeline hazards, data forwarding, etc.
F2002-RiSC-oo.pdf	An Out-of-Order RiSC-16: Tomasulo + Reorder Buffer = Interruptible Out-of-Order	Describes an out-of-order implementation: instruction queue (ROB/RUU), fetch buffers, forwarding logic, wakeup/scheduling logic, recovery from branch misspeculations, memory request queue, commit logic, etc. Version 1 does not implement precise interrupts (sorry; I ran out of time).

Assignments

ID	Out	Due	Write-up	Distribution	Diagram
Project 1	24 Jan 2018	12 Feb 2018	p1.pdf	Project 1
Project 2	14 Feb 2018	12 Mar 2018	p2.pdf	Project 2
Project 3	26 Mar 2018	9 Apr 2018	p3.pdf	Project 3
Project 4	18 Apr 2018	2 May 2018	p4.pdf	Project 4