Contents

CHAPTERS

1 Computer Abstractions and Technology 2
1.1 Introduction 3
1.2 Seven Great Ideas in Computer Architecture 10
1.3 Below Your Program 13
1.4 Under the Covers 16
1.5 Technologies for Building Processors and Memory 25
1.6 Performance 29
1.7 The Power Wall 40
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 43
1.9 Real Stuff: Benchmarking the Intel Core i7 46
1.10 Going Faster: Matrix Multiply in Python 49
1.11 Fallacies and Pitfalls 50
1.12 Concluding Remarks 53
1.13 Historical Perspective and Further Reading 55
1.14 Self-Study 55
1.15 Exercises 59

2 Instructions: Language of the Computer 66
2.1 Introduction 68
2.2 Operations of the Computer Hardware 69
2.3 Operands of the Computer Hardware 73
2.4 Signed and Unsigned Numbers 80
2.5 Representing Instructions in the Computer 87
2.6 Logical Operations 95
2.7 Instructions for Making Decisions 98
2.8 Supporting Procedures in Computer Hardware 104
2.9 Communicating with People 114
2.10 RISC-V Addressing for Wide Immediates and Addresses 120
2.11 Parallelism and Instructions: Synchronization 128
2.12 Translating and Starting a Program 131
2.13 A C Sort Example to Put it All Together 140
2.14 Arrays versus Pointers 148
2.15 Advanced Material: Compiling C and Interpreting Java 151
2.16 Real Stuff: MIPS Instructions 152
2.17 Real Stuff: ARMv7 (32-bit) Instructions 153
2.18 Real Stuff: ARMv8 (64-bit) Instructions 157
2.19 Real Stuff: x86 Instructions 158
2.20 Real Stuff: The Rest of the RISC-V Instruction Set 167
2.21 Going Faster: Matrix Multiply in C 168
2.22 Fallacies and Pitfalls 170
2.23 Concluding Remarks 172
2.24 Historical Perspective and Further Reading 174
2.25 Self-Study 175
2.26 Exercises 178

3 Arithmetic for Computers 188
3.1 Introduction 190
3.2 Addition and Subtraction 190
3.3 Multiplication 193
3.4 Division 199
3.5 Floating Point 208
3.6 Parallelism and Computer Arithmetic: Subword Parallelism 233
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 234
3.8 Going Faster: Subword Parallelism and Matrix Multiply 236
3.9 Fallacies and Pitfalls 238
3.10 Concluding Remarks 241
3.11 Historical Perspective and Further Reading 242
3.12 Self-Study 242
3.13 Exercises 246

4 The Processor 252
4.1 Introduction 254
4.2 Logic Design Conventions 258
4.3 Building a Datapath 261
4.4 A Simple Implementation Scheme 269
4.5 Multicycle Implementation 282
4.6 An Overview of Pipelining 283
4.7 Pipelined Datapath and Control 296
4.8 Data Hazards: Forwarding versus Stalling 313
4.9 Control Hazards 325
4.10 Exceptions 333
4.11 Parallelism via Instructions 340
4.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53 354
4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply 363
4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 365
4.15 Fallacies and Pitfalls 365
4.16 Concluding Remarks 367
4.17 Historical Perspective and Further Reading 368
4.18 Self-Study 368
4.19 Exercises 369

5 Large and Fast: Exploiting Memory Hierarchy 386
5.1 Introduction 388
5.2 Memory Technologies 392
5.3 The Basics of Caches 398
5.4 Measuring and Improving Cache Performance 412
5.5 Dependable Memory Hierarchy 431
5.6 Virtual Machines 436
5.7 Virtual Memory 440
5.8 A Common Framework for Memory Hierarchy 464
5.9 Using a Finite-State Machine to Control a Simple Cache 470
5.