Computer Architecture

updated 2002-12-21.

Some notes on Computer Architecture. Very incomplete.


[FIXME: should I put links to Beowulf here ?]

David also maintains related files:

``I wisdom dwell with prudence, and find out knowledge of witty inventions.'' -- Proverbs 8:12

news

computer architecture news

The latest computer architecture information is at

computer architecture comic strips

The Freedom CPU Project

DLX

is one of the architectures suggested for the F-CPU. Its major advantage is that gcc has already been ported to it, so we don't need to write a new compiler on top of everything else. So we can immediately compile the Linux kernel and other code for this chip *now*, before we even start doing anything else, which should make it easy to do cache analysis and other trade-off analysis. Also, it means that when the hardware is turned on for the first time, it could boot directly into Linux, which would be cool.

(Can we really go to 64 bit architecture and still use this compiler ?)

(Can we really use a TTA architecture for the attached FPUs ?)

MMIX

is one of the architectures suggested for the F-CPU.

TTA

is one of the architectures suggested for the F-CPU.

Using FPGAs to simulate CPUs

(only with FPGAs can you have reconfigurable computing)

see also robot_links.html#PLD Programmable Logic (FPGA, PLD, CPLD, etc.) and the devices needed to program them.

see also simple_cpu for simple CPU architectures that one would think would be simple to implement in an FPGA.

see also vlsi.html#PCI_on_FPGA for FPGA devices compatible with the PCI bus.


[backup copy of something I wrote on http://electronicschat.org/echatwiki/HomebuiltCpu ]

I imagine that just about anyone who has programmed 80x86 assembly language has dreamed about building their own CPU with their own assembly language.

If you've dreamed about building your own CPU that runs your own assembly language, today is a wonderful time to be living. There are many, many ways to fulfill your vision. (A few of them are even useful.) -- DavidCary ( http://david.carybros.com/html/computer_architecture.html#simple_cpu )

(Note: this page is *not* about building 80x86-based desktop computers. That sort of thing is discussed over at http://en.wikibooks.org/wiki/How_To_Build_A_Computer . Building a custom 80x86-based laptop -- I don't know.)

* alt.comp.hardware.homebuilt FAQ by Mark Sokos http://www.faqs.org/faqs/homebuilt-comp-FAQ/index.html

== useful restrictions ==

=== tiny size ===

Sometimes you want a tiny little processor -- say, you want it to fit inside a model rocket.

Imagine a fully custom tiny little processor, running exactly the instruction set you've always dreamed of. How is that different from a tiny PIC or Atmel processor, programmed to *emulate* your instruction set?

[http://www.taniwha.com/~paul/fc/ Taniwha Flight Computer Home Page]

[http://img.cmpnet.com/edtn/ccellar/e023pdf1.pdf "Picaro: A Stamp-like Interpreted Controller"] article by Tom Napier 1998-04 in _Circuit Cellar Ink_

I'm thinking of building one of these with an external *serial* memory (much smaller and cheaper than the normal parallel memory ... slow, but probably plenty fast enough for what I want it to do). Unlike most processors, it would execute code directly out of serial memory. (I think this is also going to give the lowest-cost and lowest-power way to get your own custom instruction set.)
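To make the "emulate your instruction set on a tiny chip" idea concrete, here is a minimal Python sketch of an interpreter that fetches every byte of its program, bit by bit, from a serial memory. Everything in it is hypothetical: the four-opcode accumulator instruction set, and the cost model of 24 clocks to send a read command plus address and then 8 clocks per data byte. Real SPI EEPROMs can also stream consecutive bytes without resending the address, which this model ignores, so it overestimates the cost of straight-line code.

```python
# Hypothetical opcodes for a tiny accumulator machine (not any real chip).
HALT, LOAD, SUB, JNZ = 0, 1, 2, 3


class SerialMemory:
    """Model of a bit-serial memory that counts clock pulses.

    Assumption: each byte read costs `setup_clocks` pulses to send a read
    command and address, then 8 pulses to shift the data byte in MSB first.
    """

    def __init__(self, data, setup_clocks=24):
        self.data = bytes(data)
        self.setup_clocks = setup_clocks
        self.clocks = 0

    def read_byte(self, addr):
        self.clocks += self.setup_clocks
        value = 0
        for bit in range(7, -1, -1):  # one data bit per clock pulse
            self.clocks += 1
            value = (value << 1) | ((self.data[addr] >> bit) & 1)
        return value


def run(mem, max_steps=10_000):
    """Fetch-decode-execute loop; every fetch goes through the serial memory."""
    pc, acc, executed = 0, 0, 0
    while executed < max_steps:
        op = mem.read_byte(pc)
        executed += 1
        if op == HALT:
            return acc, executed
        arg = mem.read_byte(pc + 1)
        pc += 2
        if op == LOAD:
            acc = arg
        elif op == SUB:
            acc = (acc - arg) & 0xFF
        elif op == JNZ:
            if acc != 0:
                pc = arg
        else:
            raise ValueError(f"bad opcode {op} at address {pc - 2}")
    raise RuntimeError("step limit reached")


# Count down from 5 to 0, then halt.
program = SerialMemory([LOAD, 5, SUB, 1, JNZ, 2, HALT])
acc, executed, = run(program)
print(acc, executed, program.clocks)
```

This count-down loop executes 12 instructions from 23 byte fetches, so at 32 clocks per fetch it prints `0 12 736`: hundreds of serial clocks for a trivial loop, which is the "slow, but probably plenty fast enough" trade-off in numbers.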

=== higher speed ===

You can build a "processor" that performs some kinds of special-purpose tasks faster than Intel's latest device. Use FPGAs.

You can build a CPU that is "instant on" and "instant off" (use FLASH, and enough capacitance to dump the essential bits of the current state of RAM to FLASH).
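How much capacitance is "enough"? A back-of-the-envelope sketch: the capacitor must supply the energy the circuit uses while it copies its state to FLASH, E = P·t, and a capacitor discharging from V_start down to V_min releases E = ½·C·(V_start² - V_min²). Solving for C gives the function below. The 50 mW load, 2000 bytes of state, 100 kB/s FLASH write rate and 5 V to 3 V window are made-up example numbers, and the sketch ignores regulator losses and leakage.

```python
def holdup_capacitance(power_w, save_time_s, v_start, v_min):
    """Capacitance (farads) that can power a `power_w` load for `save_time_s`
    while its voltage sags from v_start to v_min.

    From P*t = C*(v_start**2 - v_min**2)/2; ignores regulator losses and leakage.
    """
    return 2 * power_w * save_time_s / (v_start ** 2 - v_min ** 2)


# Hypothetical example numbers, not measurements.
state_bytes = 2000        # RAM state that must be saved
flash_rate = 100_000      # FLASH write throughput in bytes per second
save_time = state_bytes / flash_rate
c = holdup_capacitance(power_w=0.05, save_time_s=save_time, v_start=5.0, v_min=3.0)
print(f"{save_time * 1e3:.1f} ms of save time needs about {c * 1e6:.0f} uF")
```

With these numbers the save takes 20 ms and needs about 125 µF; doubling the amount of state or halving the voltage window scales the capacitor accordingly.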

=== extreme environments ===

Can you get it working over the entire "industrial temperature range"? ( -40 °C to +85 °C )

== challenging, educational, "fun" restrictions ==

=== built with "classic" TTL chips only ===

Generally built around a 74LS181 ALU or similar ( 74HC181 ). (Several homebuilt CPUs have been built like this)
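The 74181 is a 4-bit ALU slice, so an 8-bit datapath uses two of them, with the carry-out of the low slice feeding the carry-in of the high slice. The sketch below models only that ripple-carry chaining for addition. It deliberately ignores the real chip's other functions, its mode and function-select inputs, and its carry-polarity conventions, so treat `alu4_add` as a hypothetical stand-in rather than a model of the datasheet.

```python
def alu4_add(a, b, carry_in):
    """Hypothetical 4-bit slice, addition only: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def add8(a, b, carry_in=0):
    """8-bit add built from two 4-bit slices with a rippled carry."""
    low, carry_mid = alu4_add(a & 0xF, b & 0xF, carry_in)
    high, carry_out = alu4_add((a >> 4) & 0xF, (b >> 4) & 0xF, carry_mid)
    return (high << 4) | low, carry_out


print(add8(0x7F, 0x01))  # (128, 0)
print(add8(0xFF, 0x01))  # (0, 1)
```

Wider machines chain more slices the same way; the 74182 look-ahead carry generator exists so that the carry does not have to ripple through every slice in turn.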

=== bizarreness ===

Since we're building things from scratch, why count in the "natural binary code"

 000
 001
 010
 011
 100
 ...
?

Couldn't we increment address registers in some *other* sequence ?

What other "interesting" sequences would be interesting to experiment with ?

Is there some sequence that uses *less* hardware (easier to build) -- perhaps LFSR ? Is there some sequence that uses less power (runs longer on batteries) -- perhaps gray code ?
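One quick way to compare candidate sequences is to count how many register bits flip per step, a rough stand-in for dynamic power (an assumption: real power also depends on the logic that computes the next value). The sketch compares an 8-bit binary counter, an 8-bit reflected Gray code, and an 8-bit Galois LFSR using the maximal-length toggle mask 0xB8 (taps 8, 6, 5, 4). The LFSR's next-state logic is just a shift plus three XOR gates, but it visits only 255 of the 256 states (never all zeros).

```python
def gray(n):
    """Reflected binary Gray code of n: consecutive values differ in one bit."""
    return n ^ (n >> 1)


def lfsr8(state):
    """One step of an 8-bit Galois LFSR, toggle mask 0xB8 (maximal length 255)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB8
    return state


def toggles(seq):
    """Total bit flips over one closed cycle of register values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:] + seq[:1]))


binary_seq = list(range(256))
gray_seq = [gray(n) for n in range(256)]
lfsr_seq = [1]                      # any non-zero seed works
for _ in range(254):
    lfsr_seq.append(lfsr8(lfsr_seq[-1]))

print(toggles(binary_seq), toggles(gray_seq), toggles(lfsr_seq))
```

Over a full cycle the binary counter flips 510 bits (about 2 per step) while the Gray code flips exactly 256 (1 per step); the LFSR figure is printed rather than predicted here.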

=== all one part ===

As many of you are aware, most projects are built from a multitude of different parts. There's always one part that takes the longest time to ship in. And then I always break something, and I have to wait even longer for the replacement to ship in.

However, computers have been built out of large numbers of the *same* device wired together appropriately.

* transistors
* [http://en.wikipedia.org/wiki/Apollo_Guidance_Computer 4,100 ICs, each containing a single 3-input NOR logic gate.]
* dual 3-input NOR gate ICs

I (DavidCary) have been wondering:

* given what we now know about "simple" RISC and zero-operand instruction sets, could I design a CPU with significantly *fewer* than 4,100 NOR gates ?
* What other "universal" chips can be used (in large enough quantities) to build an entire CPU ?
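One way to start answering the first question is to count gates. The sketch below treats each call to `nor3` as one physical 3-input NOR gate (unused inputs tied low) and builds a full adder from textbook NOR-only subcircuits. `NorNetlist` is a hypothetical helper, and the construction is deliberately naive (18 gates, no shared inverters), so it gives an upper bound rather than a minimal design.

```python
class NorNetlist:
    """Hypothetical helper: every nor3() call stands for one 3-input NOR gate."""

    def __init__(self):
        self.gates = 0

    def nor3(self, a, b=0, c=0):
        self.gates += 1
        return 0 if (a or b or c) else 1

    def inv(self, a):                       # 1 gate
        return self.nor3(a)

    def or2(self, a, b):                    # 2 gates
        return self.inv(self.nor3(a, b))

    def and2(self, a, b):                   # 3 gates: NOR of the inverted inputs
        return self.nor3(self.inv(a), self.inv(b))

    def xor2(self, a, b):                   # 5 gates
        x = self.nor3(a, b)
        xnor = self.nor3(self.nor3(a, x), self.nor3(b, x))
        return self.inv(xnor)


def full_adder(net, a, b, cin):
    """Sum and carry-out of one bit, built only from 3-input NOR gates."""
    p = net.xor2(a, b)
    s = net.xor2(p, cin)
    cout = net.or2(net.and2(a, b), net.and2(p, cin))
    return s, cout


net = NorNetlist()
print(full_adder(net, 1, 1, 0), net.gates)  # (0, 1) 18
```

At 18 gates per bit, an 8-bit ripple-carry adder built this way costs 8 × 18 = 144 NOR gates.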

I want to use chips that are readily available, and also "dense". (Obviously, the densest chips are the all-in-one CPU microcontrollers ... but I can't customize those. What other points on the spectrum are available?)

Let's focus on an 8-bit register (8 D flip-flops) for a moment. I imagine I'll use a bunch of them in my CPU (program counter, registers, address pointers, etc.).

 chips/bit  chips/8-bit register  chips/3-input NOR  part

 '''universal chips'''
 5(?)       40                    1                  single NOR
 3(?)       24                    1/2                dual NOR
 2(?)       16                    1/3                triple NOR 74HC27
 1           8                    1                  dual 4-input mux 74HC153
 1/2         4                    3(?)               quad 2-input mux inverting 74HC158

 '''non-universal chips''' (N = cannot build a NOR gate)
 1/2         4                    N                  dual D flip-flop 74HC74
 1/4         2                    N                  quad D flip-flop 74HC173
 1/6         2                    N                  hex D flip-flop 74HC174
 1/8         1                    N                  octal D flip-flop 74HC564

Obviously, anything that could be built from single NOR gate ICs, could also be built from the much "denser" dual NOR gate ICs, and it would take significantly less space, time, effort, weight, etc.

It looks like the octal D flip-flop is the densest chip. Unfortunately, it is *not* "universal" -- parts of the CPU (the ALU, etc.) need to act in ways that I don't think D flip-flops can act. So if I want to stick to the "all one part" idea, I can't use it.

It looks like it takes three '158 chips to emulate a simple 3-input NOR, but only one '153 chip. So I suspect the '153 is better for building the random logic in the control section and the ALU.
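The '153 entry in the table can be sanity-checked in a few lines. A 4-input multiplexer is a 2-input lookup table, and on a '153-style part each section also has an active-low enable that forces its output low; driving the selects with a and b, the enable with c, and tying the data inputs to 1, 0, 0, 0 gives a 3-input NOR. On the real '153 both sections share the select lines, so the spare section is only useful for a second function of the same a and b. The latch function shows the other half of "universal": a 2-input mux whose output feeds back to its own input holds a bit (this generic mux ignores the '158's inverted outputs). This is a logical sketch only, not a timing model.

```python
def mux4(s1, s0, d, enable_n=0):
    """One section of a '153-style 4-input mux: d[2*s1 + s0], forced to 0
    while the active-low enable input is high."""
    if enable_n:
        return 0
    return d[2 * s1 + s0]


def nor3_from_mux(a, b, c):
    """3-input NOR from one mux section: a, b on the selects, c on the enable,
    data inputs tied to the constants 1, 0, 0, 0."""
    return mux4(a, b, [1, 0, 0, 0], enable_n=c)


def mux2(sel, d0, d1):
    """Generic non-inverting 2-input mux section."""
    return d1 if sel else d0


def latch_step(q, d, enable):
    """Transparent D latch: the mux output is fed back to its own d0 input."""
    return mux2(enable, q, d)


print([nor3_from_mux(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)])
```

Two such latches with opposite enables form an edge-triggered flip-flop, which is the usual way to get a register bit out of a mux-only design.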

* What other "universal" chips are there ?
* Which of the universal chips can implement a given CPU in the fewest number of chips ? (If the CPU is dominated by registers, I suspect it will be the one that can store the most bits in the fewest chips -- the '158 is the best I've found so far.)

=== all 2 parts ===

Similar to the above, but easing the restriction to allow 2 different kinds of chips.

* If I use 2 different kinds of chips, which 2 chips can implement a given CPU in the fewest number of chips ? (The '564 is 4 times as dense as the '158 for storing bits ... and the '153 looks like it will be more dense than the '158 for most random control logic.)
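Both questions can be turned into a rough calculator. The sketch below estimates total chip count for a hypothetical CPU with 64 bits of registers and 300 NOR gates' worth of logic, first using a single chip type for everything, then using the best storage chip plus the best logic chip. The per-bit and per-NOR figures come straight from the table above (several are marked "(?)" there), and the model optimistically rounds once over all bits instead of per register, so treat the results as a starting point only.

```python
import math
from fractions import Fraction as F

# (chips per stored bit, chips per 3-input NOR or None if not universal),
# copied from the table above; the "(?)" entries there are guesses.
CHIPS = {
    "single NOR": (F(5), F(1)),
    "dual NOR":   (F(3), F(1, 2)),
    "74HC27":     (F(2), F(1, 3)),
    "74HC153":    (F(1), F(1)),
    "74HC158":    (F(1, 2), F(3)),
    "74HC74":     (F(1, 2), None),
    "74HC173":    (F(1, 4), None),
    "74HC174":    (F(1, 6), None),
    "74HC564":    (F(1, 8), None),
}


def one_part(bits, nor_gates):
    """Best single (universal) chip type for both the registers and the logic."""
    totals = {name: math.ceil(bits * per_bit + nor_gates * per_nor)
              for name, (per_bit, per_nor) in CHIPS.items() if per_nor is not None}
    best = min(totals, key=totals.get)
    return best, totals[best]


def two_parts(bits, nor_gates):
    """Best storage chip plus best logic chip (possibly the same type)."""
    storage = min(CHIPS, key=lambda name: CHIPS[name][0])
    logic = min((name for name in CHIPS if CHIPS[name][1] is not None),
                key=lambda name: CHIPS[name][1])
    total = math.ceil(bits * CHIPS[storage][0]) + math.ceil(nor_gates * CHIPS[logic][1])
    return storage, logic, total


print(one_part(64, 300))
print(two_parts(64, 300))
```

With these made-up sizes the best single part is the 74HC27 (228 chips) and the best pair is the 74HC564 for storage plus the 74HC27 for logic (108 chips). Both answers can change once the "(?)" entries are firmed up.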


Extremely Simple CPU Architectures

see also "Tiny robots" robot_links.html#tiny and http://electronicschat.org/index.cgi/HomebuiltCpu

I am fascinated by CPU designs that are extremely simple, approaching "minimal", in several (sometimes incompatible) senses of the word "simple".

Here "CPU", "MCU", and "PE" are almost interchangeable.

Often these characteristics exhibit synergy -- when you eliminate some opcodes, that eliminates some gates (making it lower-cost) and makes it easier to understand (less documentation required). Occasionally, though, driving a design to extreme simplicity according to one measure causes extra complexity in another area to compensate. This is called the Turing tarpit #tarpit .
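SUBLEQ ("subtract and branch if less than or equal to zero") is a well-known one-instruction computer, used here only as an illustration of the tarpit; it is not one of the CPUs listed on this page. The interpreter is a handful of lines, but even adding two numbers takes three instructions and a scratch cell. The halt-on-negative-address rule is a common convention, assumed here.

```python
def run_subleq(mem, pc=0, max_steps=100_000):
    """Each instruction is three words a, b, c:
    mem[b] -= mem[a]; if mem[b] <= 0, jump to c, else fall through.
    A negative jump target halts the machine (assumed convention)."""
    steps = 0
    while pc >= 0:
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem


# B := B + A, using Z as a scratch cell that starts (and ends) at zero.
A, B, Z = 9, 10, 11
program = [
    A, Z, 3,    # Z = -A
    Z, B, 6,    # B = B - Z = B + A
    Z, Z, -1,   # Z = 0, then halt
    7, 35, 0,   # data: A = 7, B = 35, Z = 0
]
print(run_subleq(program)[B])  # 42
```

Everything else (copying, comparing, calling subroutines) has to be built the same way, which is where the extra complexity moves once the opcode list shrinks to one.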

For purposes of robotics, "low-cost" and "Simple interface" are usually the dominant considerations. Some of the concepts of massively parallel processing are also present when one builds swarms of simple robots, but the kind of random communication between constantly-changing arrangements of simple robots is very different from the communication between the rigidly connected (most commonly in a 2-D mesh or a hypercube) elements of common cellular automata and multi-processors.

See also Using FPGAs to simulate CPUs #FPGA for some very simple CPUs designed to fit onto FPGAs. There are 2 very different reasons for simplicity there: (a) a simpler CPU can fit onto a smaller FPGA, making it much less expensive; (b) making the CPU smaller allows you to fit more copies of the CPU on a given FPGA, increasing MIPS at no extra cost. [FIXME: should combine these into 1 section ? Since the same op-code set may be implemented many ways, in TTL, in FPGA, in custom VLSI, it doesn't really make sense to split them into separate sections ... On the other hand, ``less hardware'' means something a little different in these 3 technologies. ]

[here I *list* simple CPUs I know about; Opcode considerations #considerations and #FPGA also talk about tips for designing new CPU architectures. ]

... Here I also list all the CPUs I know about that were built up out of TTL.

cellular automata

cellular automata is related to other interests I have:

[FIXME: CA stuff scattered elsewhere ...]

The cellular automata questions I'm most interested in are:

There's a little loop here -- first we use (simulated) cellular automata to learn about replication, then we use that information to design replicating tools. We also use (simulated) cellular automata to explore good rules and good patterns for computronium. Then we use those replicating tools to (build enough copies of themselves to) build computronium. Then we use *that* computronium (hardware cellular automata) as a better computer.
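As a starting point for such experiments, here is a minimal simulator for one-dimensional "elementary" cellular automata (two states, each cell looking at itself and its two neighbours), with the edges wrapped around. Rule 110 is the default because it is known to be Turing-complete, so even this tiny rule family can in principle compute anything; the 32-cell grid and single-cell starting pattern are arbitrary choices.

```python
def step(cells, rule=110):
    """Return the next generation of an elementary CA with wraparound edges."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right   # neighbourhood as 3 bits
        nxt.append((rule >> index) & 1)               # that bit of the rule number
    return nxt


cells = [0] * 31 + [1]
for _ in range(16):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```

Changing `rule` explores the other 255 rules; rule 90, for example, makes each cell the XOR of its two neighbours.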

self-replication

It has been shown that self-replicators can be designed in cellular automata #cellular_automata . Some of this deals with ``real'', physical replication, while other parts -- cellular automata, quines, etc. -- deal only with patterns in a computer. Should I separate them ? But some of the theory applies to both.
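The purely software end of self-replication is easy to try. A quine is a program that prints its own source; the two statements after the comment below are a standard Python quine construction (not from the original page). The string `s` plays the role of a genome: `%r` inserts a quoted copy of the genome into its own output, and `%%` becomes a literal `%` on the way out.

```python
# The two lines below print exactly themselves.
s = 's = %r\nprint(s %% s)'
print(s % s)
```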

related to reconfigurable robots robot_links.html#reconfigurable and robot construction (humans building robots) robot_links.html#construction and tool closure 3d_design.html#closure . and the bootstrap problem [FIXME:] [FIXME: cross-link all the self-replication stuff on my web pages. nano, robot, cellular automata, etc. Point back and forth from ``replication'' section to computer architecture # cellular automata robots nanotech idea_space and unknowns ]

Minimum Instruction Set Computing (MISC)

see #simple_cpu and minimal_instruction_set.html

other attempts at a minimal instruction set

Steamer16: a high-performance homebrewer's microprocessor http://www3.sympatico.ca/myron.plichota/steamer1.htm by Myron Plichota <myron.plichota at sympatico.ca>. Written in VHDL and prototyped on a single wire-wrapped protoboard. It has only 7 instructions. Packets of 5 instructions (5 instructions * 3 bits = 15 bits/packet) execute in 6 cycles/packet at a cycle rate of 20 MHz. Both the address and data buses are 16 bits. It uses an unusual ``ArF'' call/return protocol.
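The packet arithmetic is easy to sketch. The code below packs five 3-bit opcodes into a 16-bit word and computes the peak instruction rate implied by 5 instructions per 6 cycles at 20 MHz. The slot order (first instruction in the most significant bits) and the unused top bit are assumptions for illustration, not taken from the Steamer16 documentation, and no particular opcode numbering is implied.

```python
OPCODE_BITS = 3
SLOTS = 5  # 5 slots x 3 bits = 15 of the 16 bits in a word


def pack(opcodes):
    """Pack five 3-bit opcodes into one 16-bit word.

    Assumed layout: slot 0 in the most significant used bits, bit 15 left at 0.
    """
    if len(opcodes) != SLOTS or not all(0 <= op < 2 ** OPCODE_BITS for op in opcodes):
        raise ValueError("need exactly five opcodes in the range 0..7")
    word = 0
    for op in opcodes:
        word = (word << OPCODE_BITS) | op
    return word


def unpack(word):
    """Inverse of pack(): recover the five opcode fields from a 16-bit word."""
    mask = 2 ** OPCODE_BITS - 1
    return [(word >> (OPCODE_BITS * (SLOTS - 1 - i))) & mask for i in range(SLOTS)]


def peak_mips(clock_hz=20e6, instructions_per_packet=5, cycles_per_packet=6):
    """Peak instruction rate in millions per second, ignoring any stalls."""
    return clock_hz * instructions_per_packet / cycles_per_packet / 1e6


word = pack([1, 2, 3, 4, 5])
print(hex(word), unpack(word), round(peak_mips(), 2))
```

The peak rate works out to 20 MHz × 5/6 ≈ 16.7 million instructions per second, before any memory stalls.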

qUark ../mirror/quark.txt a viable stack-computer with 4-bit opcodes (c) vic plichota, original concept by Myron Plichota Dec '98.

------------------------------
Date: Sat, 20 Feb 1999 12:54:34 +0300
From: "Stas Pereverzev" 
To: MISC
Subject: Re: nFORTH v2.3

>Comments, folks?