Computer Architecture

updated 2009-04-29.

Some notes on Computer Architecture. Very incomplete.


[FIXME: should I put links to Beowulf here ?]

David also maintains related files:

``I wisdom dwell with prudence, and find out knowledge of witty inventions.'' -- Proverbs 8:12


computer architecture news

The latest computer architecture information is at

computer architecture comic strips

The Freedom CPU Project


is one of the architectures suggested for the F-CPU. Its major advantage is: gcc has already been ported to it, so we don't need to write a new compiler on top of everything else. So we can immediately compile the Linux kernel and other code for this chip *now*, before we even start doing anything else, which should make it easy to cache analysis and other trade-off analysis. Also, it means that when the hardware is turned on for the first time, it could boot directly into Linux, which would be cool.

(Can we really go to 64 bit architecture and still use this compiler ?)

(Can we really use a TTA architecture for the attached FPUs ?)


is one of the architectures suggested for the F-CPU.


is one of the architectures suggested for the F-CPU.

Using FPGAs to simulate CPUs

(only with FPGAs can you have reconfigurable computing)

see also robot_links.html#PLD Programmable Logic (FPGA, PLD, CPLD, etc.) and the devices needed to program them.

see also #simple_cpu for simple CPU architectures that one would think would be simple to implement in a FPGA.

see also vlsi.html#PCI_on_FPGA for FPGA devices compatible with the PCI bus.

[backup copy of something I wrote on ]

I imagine that just about anyone who has programmed 80x86 assembly language has dreamed about building their own CPU with their own assembly language.

If you've dreamed about building your own CPU that runs your own assembly language, today is a wonderful time to be living. There are many, many ways to fulfull your vision. (A few of them are even useful). -- DavidCary ( )

(Note: this page is *not* about building 80x86-based desktop computers. That sort of thing is discussed over at . Building a custom 80x86-based laptop -- I don't know.).

* alt.comp.hardware.homebuilt FAQ by Mark Sokos

== useful restrictions ==

=== tiny size ===

Sometimes you want a tiny little processor -- say, you want it to fit inside a model rocket.

Imagine a fully custom tiny little processor, running exactly the instruction set you've always dreamed of. How is that different from a tiny PIC or Atmel processor, programmed to *emulate* your instruction set?

[ Taniwha Flight Computer Home Page]

[ "Picaro: A Stamp-like Interpreted Controller"] article by Tom Napier 1998-04 in _Circuit Cellar Ink_

I'm thinking of building one of these with an external *serial* memory (much smaller and cheaper than the normal parallel memory ... slow, but probably plenty fast enough for what I want it to do). Unlike most processors, this way you can execute code in serial memory. (I think this is also going to give the lowest-cost and also the lowest-power way to get your own custom instruction set).

=== higher speed ===

You can build a "processor" that performs some kinds of special-purpose tasks faster than Intel's latest device. Use FPGAs.

You can build a CPU that is "instant on" and "instant off" (use FLASH, and enough capacitance to dump the essential bits of the current state of RAM to FLASH).

=== extreme environments ===

Can you get it working over the entire "industrial temperature range"? ( -40 °C to +125 °C )

== challenging, educational, "fun" restrictions ==

=== built with "classic" TTL chips only ===

Generally built around a 74LS181 ALU or similar ( 74HC181 ). (Several homebuilt CPUs have been built like this)

=== bizzareness ===

Since we're building thing by scratch, why count in the "natural binary code"



Couldn't we increment address registers in some *other* sequence ?

What other "interesting" sequences would be interesting to experiment with ?

Is there some sequence that uses *less* hardware (easier to build) -- perhaps LFSR ? Is there some sequence that uses less power (runs longer on batteries) -- perhaps gray code ?

=== all one part ===

As many of you are aware, most projects are built from a multitude of different parts. There's always one part that takes the longest time to ship in. And then I always break something, and I have to wait even longer for the replacement to ship in.

However, computers have been built out of large numbers of the *same* device wired together appropriately.

* transistors * [ 4,100 ICs, each containing a single 3-input NOR logic gate.] * dual 3-input NOR gate ICs

I (DavidCary) have been wondering:

* given what we now know about "simple" RISC and zero-operand instruction sets, could I design a CPU with significantly *less* than 4,100 NOR gates ? * What other "universal" chips can be used (in large enough quantities) to build an entire CPU ?

I want to use chips that are readily available, and also "dense". (Obviously, the densest chips are the all-in-one CPU microcontrollers ... but I can't customize those. What other points on the spectrum are available?)

Let's focus on a 8 bit register (8 D flip-flops) for a moment. I imagine I'll use a bunch of them in my CPU. (program counter, registers, address pointers, etc.)

 chips/bit;  chips/ 8 bit register; chips/3-input-NOR

 '''universal chips'''
 5(?) 40 1 single NOR
 3(?) 24 1/2  dual NOR
 2(?) 16 1/3 triple NOR 74HC27
 1    8  1  dual 4-input mux 74HC153
 1/2  4  3(?) quad 2-input mux inverting 74HC158

 '''non-universal chips'''
 1/2  4  N dual D flip-flop 74HC74
 1/4  2  N quad D flip-flip 74HC173
 1/6  2  N hex D flip-flop 74HC174
 1/8  1  N octal D flip-flop 74HC564

Obviously, anything that could be built from single NOR gate ICs, could also be built from the much "denser" dual NOR gate ICs, and it would take significantly less space, time, effort, weight, etc.

It looks like the octal D flip-flop is the densest chip. Unfortunately, it is *not* "universal" -- parts of the CPU (the ALU, etc.) need to act in ways that I don't think D flip-flops can act. So if I want to stick to the "all one part" idea, I can't use it.

It looks like it takes three '158 chips to emulate a simple 3-input NOR, but only one '153 chip. So I suspect the '153 is better for building the random logic in the control section and the ALU.

* What other "universal" chips are there ? * Which of the universal chips can implement a given CPU in the fewest number of chips ? (If the CPU is dominated by registers, I suspect it will be one that can store the most bits in the fewest number of chips -- the '158 is the best I've found so far).

=== all 2 parts ===

Similar to the above, but easing the restriction to allow 2 different kinds of chips.

* If I use 2 different kinds of chips, which 2 chips can implement a given CPU in the fewest number of chips ? (The '564 is 4 times as dense as the '158 for storing bits ... and the '153 looks like it will be more dense than the '158 for most random control logic.)

Extremely Simple CPU Architectures

see also "Tiny robots" robot_links.html#tiny and

I am fascinated by CPU designs that are extremely simple, approaching "minimal", in several (sometimes incompatible) senses of the word "simple".

Here "CPU", "MCU", and "PE" are almost interchangeable.

Often these characteristics exhibit synergy -- when you eliminate some opcodes, that eliminates some gates (making it lower-cost) and makes it easier to understand (less documentation required). Occasionally, though, driving a design to extreme simplicity according to one measure causes extra complexity in another area to compensate. This general concept is known as the waterbed theory . When applied to CPU designs, this is called the Turing tarpit #tarpit .

For purposes of robotics, "low-cost" and "Simple interface" are usually the dominant considerations. Some of the concepts of massively parallel processing are also present when one builds swarms of simple robots, but the kind of random communication between constantly-changing arrangements of simple robots is very different than the communication between the rigidly connected (most commonly in a 2 D mesh or a hypercube) elements of common cellular automata and multi-processors.

See also Using FPGAs to simulate CPUs #FPGA for some very simple CPUs designed to fit onto FPGAs. 2 very different reasons for simplicity there: (a) a simpler CPU can fit onto a smaller FPGA, making it much less expensive. (b) making the CPU smaller allows you to fit more copies of the CPU on a given FPGA, increasing MIPs at no cost. [FIXME: should combine these into 1 section ? Since the same op-code set may be implemented many ways, in TTL, in FPGA, in custom VLSI, it doesn't really make sense to split them into seperate sections ... On the other hand, ``less hardware'' means something a little different in these 3 technologies. ]

[here I *list* simple CPUs I know about; Opcode considerations #considerations and #FPGA also talk about tips for designing new CPU architectures. ]

... Here I also list all the CPUs I know about that were built up out of TTL.

cellular automata

cellular automata is related to other interests I have:

[FIXME: CA stuff scattered elsewhere ...]

The cellular automata questions I'm most interested in are:

There's a little loop here -- first we use (simulated) cellular automata to learn about replication, then we use that information to design replicating tools. We also use (simulated) cellular automata to explore good rules and good patterns for computronium. Then we use those replicating tools to (build enough copies of themselves to) build computronium. Then we use *that* computronium (hardware cellular automata) as a better computer.


It has been shown that self-replicators can be designed in cellular automata #cellular_automata . some of this deals with ``real'', physical replication. While other parts -- cellular automata, quines, etc -- deal only with patterns in a computer. Should I separate them ? But some of the theory applies to both.

related to reconfigurable robots robot_links.html#reconfigurable and robot construction (humans building robots) robot_links.html#construction and tool closure 3d_design.html#closure . and the bootstrap problem [FIXME:] [FIXME: cross-link all the self-replication stuff on my web pages. nano, robot, cellular automata, etc. Point back and forth from ``replication'' section to computer architecture # cellular automata robots nanotech idea_space and unknowns ]

Minimum Instruction Set Computing (MISC)

see #simple_cpu and minimal_instruction_set.html

other attempts at a minimal instruction set

Steamer16: a high-performance homebrewer's microprocessor by Myron Plichota <myron.plichota at>. written in VHDL and prototyped on a single wire-wrapped protoboard. Has only 7 instructions. Packets of 5 instructions (5 instructions * 3 bits = 15 bits/packet) execute in 6 cycles/packet at a cycle rate of 20 MHz. Both the address and data busses are 16 bits. Unusual ``ArF'' call/return protocol.

qUark ../mirror/quark.txt a viable stack-computer with 4-bit opcodes (c) vic plichota, original concept by Myron Plichota Dec '98.

Date: Sat, 20 Feb 1999 12:54:34 +0300
From: "Stas Pereverzev" 
Subject: Re: nFORTH v2.3
Content-Type: text/plain;
Content-Transfer-Encoding: 7bit

>Comments, folks?

You need only five instructions, not 16. They are:

1. nand
2. shr

3. store ( addr n -- )
4. lit   ( -- n )

5. ncret  \ JUMP to addr in N, if carry flag isn't set in T,
          \ also drop both T and N

Also, if PC is memory variable (or can be addressed as memory variable)
we can awoid "ncret" instruction:
In that case we sholud use NCSTORE instead STORE:
ncstore (addr n flag -- )

ncstore (addr n 0 ) - same as store,
ncstore (PC n -1 )  - same as ncret

We need only 2 bits per instruction in that case.

That all folks ;-)



(does this really work ???)

Turing tarpit

Turing Tar Pit.

Opcode considerations

Some things you might think about if you are designing a new instruction set (for a Von Neumann machine).


[FIXME: I have a lot of information on subroutines and the importance of well-factored code scattered about ... should I gather it all together here, or since it's so very important, leave a brief reference at each place ?]

(see also data_compression.html#program_compression )

Taxonomy of Multiprocessors

also see Koopman's taxonomy of microprocessors [FIXME: ]

_Computation Structures_ by Stephen A. Ward and Robert H. Halstead, Jr. (c)1990 by The Massachusetts Institute of Technology The MIT Press

p.606 Models of Computation The von Nuemann model of computation ... alternative models ... include:

The above list is neither exhaustive nor even representative...

... p.612 Taxonomy of Multiprocessors ... Michael Flynn's taxonomy ...

Computation Structures

Ward and Halstead

quotes from _Computation Structures_ by Stephen A. Ward and Robert H. Halstead, Jr. (c)1990 by The Massachusetts Institute of Technology The MIT Press

... p. 347 Programming languages offer several /classes/ (allocation disciplines) for storage. These differ ... in the degree to which responsibility for allocation and deallocation of storage is assumed implicitly by the system (as opposed as being left to the programmer).

... common storage classes:

... p.354 the /general-register/ organization, in which ... active registers are available for explicit use by programs to hold frequently accessed data. ...

Proponents of the general-register organization cite two relatively independent argument in its favor:

... objections to the general-register scheme: * Lack of implementation transparency. The number of general registers is effectively "frozen" in the machine-language specification; a higher-performance implementation, offering a greater number of registers, will require modification of the machine langaguage and hence of the software designed for it. * Programming difficulty. The need to allocate registers to variables complicates the programming task.

... There are alternatives to the general-register organization that offer, at some additional implementation cost, its performance and compactness advantages while avoiding the above objections. ... cache-memory ... ... tricks for address encoding ... [DAV: is it really possible to have a "no programmer-visible registers" architecture ?]

... p.365 Pascal and Ada are also concerned with /safety/ in the sense that they strive to make as many programming errors as possible detectable at compile time rather than run time.

neural networks (NNs)

[FIXME: gather other NN information scattered over my other pages here.]

minimal-gate CPU

[FIXME: move to #simple_cpu says:

From: (Michael Josefsson)
Subject: How to do the smallest processor ever?
Date: 13 Nov 1998 00:00:00 GMT
Newsgroups: comp.arch

Hi all,

while everybody seem to make everything as parallell as possible (to get the
speed up), I tinker with the idea of making the easiest possible processor.

Wherever I can, I want to get away from hardware. So these are some ideas that I have
come up with (discussions baits perhaps):

1. Make it serial(!). Yes, it slows down things but OTOH there is only 1 bit
busses. Adding 2 numbers (8-bit) would require a great number of clocks in
multiples of eight. Slow? Yes! But feasible.

2. The original idea used only one 1 bit bus but I have come to the conclusion
(opposite the reason to do this in the first place) that it is far more easy
to implement something using 2 or 3 buses.

3. Minimise the number of registres. Yes a PC, IR and AR (address register)
would be hard to get by without, but otherwise it would suffice to have only
one other register - the accumulator. Incrementing PC is done with the ALU.

4. External memory is always parallell so I imagine that there would be 8
bit busses to the outside.

5. Speed would be slow with some 10-15 clocks to get the smallest operation
done. Given a TTL implementation the total number of mips, at 10 MHz say,
would be around 0.7 - not much, but then again what would one expect from a
handful of TTLs?

Some ASCII graphics might help

                (for shifts)                        * 1bit buses everywhere!
           |-----------<------------                * Accumulator knows how to shift
	|  -|\   ------------------ |               * Everything is clocked bit-by-bit
        |---| |--|8bit accumulator|--------|        * PC and IR better parallell?
        |   |/   ------------------        |
        |                                  |
        |                                  |
        |                    0/1(ld/pc++)  |            Proc Control
        |              /|     |            |            /\/\/\
        |            /  |-----/------------|             | | |
        |-----------|ALU<                             --------------
        |            \  |---------/-------------|-----|IR/OP decode|
        |              \|         |             |     --------------
        |           1bit ALU      |             |
        |                         |             |
        |          -------------  |             |
        |----------|Program cntr|-|       --------------
        |          -------------  |       |Parallell to|
        |                         |       |serial conv.|
        |          -------------  |       --------------
        |----------|Addr reg    |-|             |
        |          -------------  |      From ext memory
        |                         |
        |                         |
   -----------               -----------
   |serial to|               |serial to|
   |parallell|               |parallell|
   |converter|               |converter|
   -----------               ----------
        |                         |
        V                   Ext address
    To ext memory

Has this been done anywhere? The focus seems to get everything as fast as
possible and not as small as possible.

/Micke Josefsson
University of Linkoping

One reply

From: Gary Boone <>
Subject: Re: How to do the smallest processor ever?
Date: 13 Nov 1998 00:00:00 GMT
X-Posted-Path-Was: not-for-mail
Content-Type: text/plain; charset=us-ascii
X-ELN-Date: Fri Nov 13 09:05:21 1998
Organization: Micro Methods
Mime-Version: 1.0
Newsgroups: comp.arch

Regarding minimal processor history aspects:

1. See "Computer Structures ..." Siewiorek, Bell & Newell,
   1982, beginning at page 581, for a description of TMS1000
   microcontrollers, an increased function (and cost
   re-design based on TMS0100 architecture that I worked on.

2. In 1971 "state of the art" 10-micron 1-metal PMOS,
   4-bit data paths were used, not for performance, rather
   allow less chip area (and single-chip design) compared to
   then-conventional 1-bit serial data paths. Consider, for
   example, area savings due to 3-transistor RAM compared to
   conventional 6-transistor shift register memory. Control
   decode for 4-bit data paths also saved (no "bit" timing).

3. The "most minimal" processor I know of was made at
   circa 1976, for use in digital watches. As I recall, this
   processor could only count, not even add. The architect
   Steve McCrystal. I've lost track of my copy of his
   instruction set. Steve, if you see this, please post it!

Gary Boone

The Motorola MC14500B had only 16 pins; processed everything serially.

The Museum of HP Calculators mentions some zero-IC calculators (two types: slide rules and discrete transistors).

stack machines

(see Minimum Instruction Set Computing (MISC) for more stack machines)

see also data_compression.html#space-optimizing_compiler and data_compression.html#compact_instruction_set for more thoughts on compact, dense, non-bloated code.

operating systems development (your own OS)

David Cary thinks that if you know enough to develop an adequate OS, your time and skill could be applied to other projects more worthy of your effort todo.html .

Nevertheless, there's nothing like producing a system where you wrote every byte of its software. (It may be a waste of time, but it's sure a learning experience). You might want to start with a simple PIC robot_links.html#pic .

See also

[FIXME: consider moving to ... ... or perhaps ]

David Cary is fascinated by the variety of computer architectures, and nearly all of the "layers of abstraction" idea_space.html#level of a computer system (from the raw atoms, electrons, and photons at the bottom ... through typography ... to the user interface at the surface). However, I am singularly uninterested in the OS layer. I've seen this sequence of events far too often:

  1. Someone proposes a cool new computer architecture.
  2. then insists that a cool new OS must be written for it.
  3. Which means that all-new applications must be written for this OS.
  4. The time and effort of all these software projects drains the energy of the visionary. Time and resources are spread too thin.
  5. None of these projects get enough critical mass to finish a "killer application".
  6. all this effort comes to naught.

(For example of the ``everything must change'' mentality, see ``a major architectural overhaul is still required, on pretty much all fronts'' -- Ron , or check out Luke 5:36-38 )

Why re-invent the wheel ? There are already far too many OSes out there (in my opinion). Alan Turing proved that any OS can be ported to any computer architecture. Of course we want systems that are better in *every* area, but (correct me if I'm wrong here) it should be possible to just improve one area at a time. ( That's the main idea behind "scaffolding" and "levels" idea_space.html#level )

If you're interested in computer architecture, fine, invent a new architecture, and get an ugly old OS running on this clean new hardware (I hear that Linux is *designed* to be easy to port).

If you want to make your own OS, fine, get it running on ugly old hardware and port a bunch of the ugly old applications to it (there are tons of "free source" applications *designed* to be easy to port .

There's tons of applications (both open and closed source) that are written in C or C++. So you might consider

Porting "gcc" or another C compiler #porting_c to your OS. Once you have the C compiler and C libraries ported to your OS, suddenly thousands of programs "just work".

Porting a FORTH or FROTH compiler is much easier than gcc, but not as many applications are written in this language.

If you're really serious about creating your own OS, here's some links I think you might be interested in.

First we have some general links relevant to developing a OS:

Next we have lots of stuff about particular OS ideas (usually embedded in a specific OS):

Some systems have the GUI mixed up in the OS. Other systems have a distinct GUI layer between between the application and the OS.

[FIXME: should I split out Beowulf and similar "operating systems" into a seperate category ?]

Hope you have fun !

compiler quotes

There seems to be some confusion about what a compiler can and cannot do.

Here I have information about compilers in general. Also see Porting C compilers #porting_c

I'm especially interested in:

compiler quotes

Porting C compilers

Porting the "gcc" compiler; Porting other C compilers

see also general compiler quotes #compiler

2 sections here:

Porting "gcc", a full-size commercial-strength open-source compiler, and re-targeting it as a cross-compiler.

porting simpler C compilers, especially ones targeting very simple processors such as the Microchip PIC, especially ones that optimize space over time. This includes "Small C", a highly restricted subset of C (no structures; DAV finds this annoying) that was designed to require a far simpler compiler than one that can handle the full C language.

other C Compilers:

Since the full source to GCC is so large, perhaps it would be good to get a simple C compiler working first on a new OS or architecture.

(Does it make sense to bootstrap gcc using Small C or another C compiler ? Or should we just use gcc on a previous machine as a cross-compiler ? )

information about porting "gcc":

low power design

low power design [Should this be moved to vlsi.html ?]

Designing for low power has secondary benefits ... assembling a quiet PC hardware_david_uses.html#quiet_pc

There are many levels to low-power design. To get a low-power web server (top level) requires software that doesn't waste power, a low-power CPU, low-power clocking, and low-power circuits (including low-power oscillator). [FIXME: the last section is over at schematic.html; should I break up the other levels into their own sections ?]

see also:

external links:

top N books on CPU development

[FIXME: move to Wikibooks: Microprocessor Design/Resources

historical computer architectures (computer museums)

Here I point to a few museums that list lots and lots of computer architectures. [FIXME: delete all the redundant stuff on this page that can be found easier at these ``museums''].

Some of these architectures can be simulated on FPGAs .

Also see robot_links.html#ucontrollers for information about a few of the most popular architectures currently in production.

dealing with interrupts

dealing with interrupts: some ideas about interrupt-time code. [FIXME: make a clean seperation between peripheral interrupts here (which the CPU can safely ignore for one or two cycles), and MMU traps f_mmu.html which must block WRITE instructions before they overwrite ``protected'' memory. ] [FIXME: should I seperate out things that are only useful to CPU designers, vs. things that are only useful to CPU users ?]


(moved to forth.html )


benchmarks for computer architecture. CPU design goals. benchmarking.

While faster is better, all else being equal, many times all else is not equal. Speed is becoming less important in computer design (many tasks can already be completed ``fast enough''). Power usage, reliability, ease-of-use, and other factors are becoming more important. We must make sure to test for the things we really want creed.html#really_want , and we don't need to test for things that are irrelevant.

What do we really want ? Here are a few of the design goals for various systems. Naturally, if your particular goals heavily emphasize one of these, it will require a different architecture than one that emphasizes a different goal.

CPU design goals: (see computer_architecture.html#CPU_goals )(2003-01-10)

Over at data_compression.html#benchmark I list benchmark test images for image compression.

assembling a quiet PC hardware_david_uses.html#quiet_pc

One might think that comparing 2 pieces of software ``which one compresses my files smaller ?'', vs. comparing 2 CPUs ``which one executes my code faster / better / with less power'' are completely unrelated. But there's one tiny bit of overlap: data_compression.html#program_compression indicates that while dealing with compressed data and compressed executable code is often slower than uncompressed, sometimes there's a synergy -- if we compress it enough (using techniques I talk about there) so it all fits in a higher-level cache, then our programs run faster (what I'm talking about here).

Over at bignums.html#cpu and at robot_links.html#ucontrollers and at vlsi.html#cRAM I list information about some historical CPUs. [FIXME: should I merge them ?]

Why are there not more ``reliability'' benchmarks ?

Other pages about lots of CPUs, reviews, compare and contrast, benchmarks, etc.

hardware compilers

a few selected CPUs

A few unusual, interesting CPUs.



I'm not quite sure why has mirrored this computer architecture page ...

Part of the

Previous Utopia Webring Next
Random Choice

This page started 1998-04-21 and has backlinks

known by AltaVista

known by Yahoo!

known by infoseek

Send comments, suggestions, errors, bug reports to

David Cary

Return to index // end