updated 2002-12-21.
Some notes on Computer Architecture. Very incomplete.
Contents:
[FIXME: should I put links to Beowulf here ?]
David also maintains related files:
``I wisdom dwell with prudence, and find out knowledge of witty inventions.'' -- Proverbs 8:12
computer architecture news
The latest computer architecture information is at
news:comp.arch.hobbyist mirror http://groups.google.com/groups?group=comp.arch.hobbyist and
news:comp.arch.embedded mirror http://groups.google.com/groups?group=comp.arch.embedded
The LEON core is a SPARC* compatible integer unit developed for future space missions. It has been implemented as a highly configurable, synthesisable VHDL model. To promote the SPARC standard and enable development of system-on-a-chip (SOC) devices using SPARC cores, the European Space Agency is making the full source code freely available under the GNU LGPL license.
The LEON core has been extensively tested against the SPARC V8 architecture manual and the IEEE-P1754 (SPARC) standard, but have not been formally tested and certified by SPARC international as being SPARC V8 compliant. ...
DO YOU WANT TO HELP? If you wish to contribute to LEON and work on (or donate) any of these modules, please Jiri Gaisler.
X-URL: http://www.egroups.com/list/f-cpu/ Date: Tue, 12 Jan 1999 21:47:57 -0500 (EST) From: Robert Dale <rob at nb.net> To: F-CPU <f-cpu at egroups.com> Subject: [f-cpu] CVS server We have anonymous (read-only) CVS access to AlphaRISC's TTA sim he kindly wrote. $ export CVSROOT=:pserver:anonymous@www.deepfreeze.org:/home/src/f-cpu $ cvs login (Logging in to anonymous@www.deepfreeze.org) CVS password: <hit Enter> $ cvs co f-cpu If you need help and/or want write access, feel free to email me. We'll probably put some help documentation on the website. ;) -- Robert Dale
is one of the architectures suggested for the F-CPU. Its major advantage is: gcc has already been ported to it, so we don't need to write a new compiler on top of everything else. So we can immediately compile the Linux kernel and other code for this chip *now*, before we even start doing anything else, which should make it easy to cache analysis and other trade-off analysis. Also, it means that when the hardware is turned on for the first time, it could boot directly into Linux, which would be cool.
(Can we really go to 64 bit architecture and still use this compiler ?)
(Can we really use a TTA architecture for the attached FPUs ?)
is one of the architectures suggested for the F-CPU.
// $X = ($Y ^ rM) v ($Z|Z ^ ~rM). MUX $X, $Y, $Z|Z // sequence equivalent to MUX $X, $Y, $Z|Z AND $t1, $Y, $rM ORN $t2, $rM, $Z|Z ORN $X, $t1, $t2. // sequence equivalent to MUX $X, $Y, $Z|Z NAND $t1, $Y, $rM ORN $t2, $rM, $Z|Z NAND $X, $t1, $t2. // special peephole optimization for MUX $X, $Y, 0 AND $X, $Y, $rM // special peephole optimization for MUX $X, $Y, $Z // where $Z contains all ones: ORN $X, $Y, $rM
// simulate MXOR $X, $Y, $Z|Z NOT $znot, $Z|Z NOT $ynot, $y MOR $t3, $y, $znot MOR $t4, $ynot, $Z|Z AND $X, $t2, $t4)
NOT $X, $Z|Z // set $X = ~$Z, i.e., bitcomplement; i.e., flip all the bitscan often be merged with previous or following bit instructions by replacing AND, OR with NAND, NOR, ANDN, ORN. When it can't, the assembler can use any of these equivalent instructions:
// When Z is a register, either one of these instructions // bitcomplements it. NOR $X, $Z, 0 ANDN $X, $Z, 0 // When Z is a literal in the range -1 <= Z < #FF, // we can complement it in one instruction // at compile time. // if Z is outside this ranges, // just load ~Z directly into a register. SUB $X, zero, (1+Z) // sets $X = ~Z for -1 <= Z <= #FE. ANDN $X, zero, Z // sets $X = ~Z for 0 <= Z <= #FF.
is one of the architectures suggested for the F-CPU.
(only with FPGAs can you have reconfigurable computing)
see also robot_links.html#PLD Programmable Logic (FPGA, PLD, CPLD, etc.) and the devices needed to program them.
see also simple_cpu for simple CPU architectures that one would think would be simple to implement in a FPGA.
see also vlsi.html#PCI_on_FPGA for FPGA devices compatible with the PCI bus.
I imagine that just about anyone who has programmed 80x86 assembly language has dreamed about building their own CPU with their own assembly language.
If you've dreamed about building your own CPU that runs your own assembly language, today is a wonderful time to be living. There are many, many ways to fulfull your vision. (A few of them are even useful). -- DavidCary ( http://david.carybros.com/html/computer_architecture.html#simple_cpu )
(Note: this page is *not* about building 80x86-based desktop computers. That sort of thing is discussed over at http://en.wikibooks.org/wiki/How_To_Build_A_Computer . Building a custom 80x86-based laptop -- I don't know.).
* alt.comp.hardware.homebuilt FAQ by Mark Sokos http://www.faqs.org/faqs/homebuilt-comp-FAQ/index.html
== useful restrictions ==
=== tiny size ===
Sometimes you want a tiny little processor -- say, you want it to fit inside a model rocket.
Imagine a fully custom tiny little processor, running exactly the instruction set you've always dreamed of. How is that different from a tiny PIC or Atmel processor, programmed to *emulate* your instruction set?
[http://www.taniwha.com/~paul/fc/ Taniwha Flight Computer Home Page]
[http://img.cmpnet.com/edtn/ccellar/e023pdf1.pdf "Picaro: A Stamp-like Interpreted Controller"] article by Tom Napier 1998-04 in _Circuit Cellar Ink_
I'm thinking of building one of these with an external *serial* memory (much smaller and cheaper than the normal parallel memory ... slow, but probably plenty fast enough for what I want it to do). Unlike most processors, this way you can execute code in serial memory. (I think this is also going to give the lowest-cost and also the lowest-power way to get your own custom instruction set).
=== higher speed ===
You can build a "processor" that performs some kinds of special-purpose tasks faster than Intel's latest device. Use FPGAs.
You can build a CPU that is "instant on" and "instant off" (use FLASH, and enough capacitance to dump the essential bits of the current state of RAM to FLASH).
=== extreme environments ===
Can you get it working over the entire "industrial temperature range"? ( -40 °C to +125 °C )
== challenging, educational, "fun" restrictions ==
=== built with "classic" TTL chips only ===
Generally built around a 74LS181 ALU or similar ( 74HC181 ). (Several homebuilt CPUs have been built like this)
=== bizzareness ===
Since we're building thing by scratch, why count in the "natural binary code"
000 001 010 011 100 ...?
Couldn't we increment address registers in some *other* sequence ?
What other "interesting" sequences would be interesting to experiment with ?
Is there some sequence that uses *less* hardware (easier to build) -- perhaps LFSR ? Is there some sequence that uses less power (runs longer on batteries) -- perhaps gray code ?
=== all one part ===
As many of you are aware, most projects are built from a multitude of different parts. There's always one part that takes the longest time to ship in. And then I always break something, and I have to wait even longer for the replacement to ship in.
However, computers have been built out of large numbers of the *same* device wired together appropriately.
* transistors * [http://en.wikipedia.org/wiki/Apollo_Guidance_Computer 4,100 ICs, each containing a single 3-input NOR logic gate.] * dual 3-input NOR gate ICs
I (DavidCary) have been wondering:
* given what we now know about "simple" RISC and zero-operand instruction sets, could I design a CPU with significantly *less* than 4,100 NOR gates ? * What other "universal" chips can be used (in large enough quantities) to build an entire CPU ?
I want to use chips that are readily available, and also "dense". (Obviously, the densest chips are the all-in-one CPU microcontrollers ... but I can't customize those. What other points on the spectrum are available?)
Let's focus on a 8 bit register (8 D flip-flops) for a moment. I imagine I'll use a bunch of them in my CPU. (program counter, registers, address pointers, etc.)
chips/bit; chips/ 8 bit register; chips/3-input-NOR '''universal chips''' 5(?) 40 1 single NOR 3(?) 24 1/2 dual NOR 2(?) 16 1/3 triple NOR 74HC27 1 8 1 dual 4-input mux 74HC153 1/2 4 3(?) quad 2-input mux inverting 74HC158 '''non-universal chips''' 1/2 4 N dual D flip-flop 74HC74 1/4 2 N quad D flip-flip 74HC173 1/6 2 N hex D flip-flop 74HC174 1/8 1 N octal D flip-flop 74HC564
Obviously, anything that could be built from single NOR gate ICs, could also be built from the much "denser" dual NOR gate ICs, and it would take significantly less space, time, effort, weight, etc.
It looks like the octal D flip-flop is the densest chip. Unfortunately, it is *not* "universal" -- parts of the CPU (the ALU, etc.) need to act in ways that I don't think D flip-flops can act. So if I want to stick to the "all one part" idea, I can't use it.
It looks like it takes three '158 chips to emulate a simple 3-input NOR, but only one '153 chip. So I suspect the '153 is better for building the random logic in the control section and the ALU.
* What other "universal" chips are there ? * Which of the universal chips can implement a given CPU in the fewest number of chips ? (If the CPU is dominated by registers, I suspect it will be one that can store the most bits in the fewest number of chips -- the '158 is the best I've found so far).
=== all 2 parts ===
Similar to the above, but easing the restriction to allow 2 different kinds of chips.
* If I use 2 different kinds of chips, which 2 chips can implement a given CPU in the fewest number of chips ? (The '564 is 4 times as dense as the '158 for storing bits ... and the '153 looks like it will be more dense than the '158 for most random control logic.)
-- Jan Gray http://www3.sympatico.ca/jsgray/home1.txtIt is amazing what you can squeeze onto these parts if you design the machine architecture carefully to exploit FPGA resources. In contrast, there was a very interesting article in a recent EE Times by a fellow from VAutomation doing virtual 6502's ... Although the 6502 design used only about 4000 "ASIC gates" it didn't quite fit in a XC4010, a so- called "10,000 gate" FPGA. That a dual-issue 32-bit RISC should fit, and a 4 MHz 6502 does not, states a great deal about VHDL synthesis vs. manual placement, about legacy architectures vs. custom ones, and maybe even something about CISC vs. RISC...
"according to Dataquest, the business of embedding CPUs into other chips is growing at about 25% per year and stood at about $20 billion in 2000. ...
The average is 2.3 processors per intelligent ASIC (those with any programmability) and rising. ...
... in the free category ... processor designs from CMOSexod and Free-IP cores. Both designs have little 8-bit processors you’ve never heard of but come with free tools to get you started. The Free-IP Project ... OpenCores ...
Elixent combines configurable processors with field-programmable logic to create a constantly changing processor. Elixent's tool switches implementations on the fly based on your workload. ...
Elixent enters the world of dynamic reconfigurability, or reconfigurable computing (RC). RC ... change the hardware on the fly to suit the application. ...
... It stands to reason that whenever one part of the circuit is working the other parts must be idle. ... Hardware systems have always been created from a superset of all the functions required over the life of the product, rolled into one.
If RC takes off, that might change. For example, future generations of cell phones may use half the transistors to do 10 times the work. Fewer transistors may be all that’s needed, if none are wasted on functions that aren’t required right now.
... Jim Turley ... visit his web site at www.jimturley.com. "
mentions these companies [FIXME: add to my other list of companies?] : VAutomation ARC International http://www.vautomation.com ... CMOSexod http://www.cmosexod.com/freeip.htm ... Elixent Ltd. http://www.elixent.com ... Lexra Inc. http://www.lexra.com ... OpenCores http://www.opencores.org/projects ... ARCtangent, ARChitect ARC International http://www.arccores.com ... Handel-C Celoxica http://www.celoxica.com ... LEON-1 Distributed by The European Space Agency http://www.estec.esa.nl/wsmwww/leon/ ... ST210 Agilent Technologies http://we.home.agilent.com ... STMicroelectronics http://us.st.com ... Jazz processor Improv Systems, Inc. http://www.improvsys.com ... Xtensa Tensilica http://www.tensilica.com ... The Free-IP Project http://www.free-ip.com with XESS, Corp. http://www.xess.com ...
... the FPGA High Performance Computing Alliance (FHPCA) ... The FHPCA aims to advance the development and adoption of FPGA-based high-performance computing through collaboration on the design and development of world's most powerful FPGA-based supercomputer using COTS (Commercial Off The Shelf) technology. ... provide opportunities for education and training ... facilitate technology transfer, including assistance with porting conventional supercomputing applications to an FPGA-based supercomputer; and promote research in the application of FPGA-based high-performance computing by making facilities available to visiting academics. ... "Nallatech ... We have systems installed in applications as diverse as genomic processing and military radar. ..." said Allan Cantle, CEO of Nallatech. "FPGA computing is today where conventional microprocessor-based computing was 15 years ago. The potential exists to deliver unprecedented computational capacity, using less power in a smaller space, however to unleash that potential the industry needs to develop the means to give users low risk access to that power." ... his white paper, "FPGA Centric Computing Institute" ... in October 2003.
http://www.microswiss.ch/tld/2004/abstracts.html ; http://www.microswiss.ch/tld/2004/papers/Brueckner.pdfAccording to DataQuest, 17% of all FPGA design starts in 2003 included an embedded microprocessor. By 2007, estimates predict embedded microprocessor utilization in FPGAs will grow to 37%. ...
The b16 Processor is a minimalistic stack processor, inspired by Chuck Moore's recent work. The original incarnation is 16 bit, byte addressed. There are 32 instructions, each is 5 bits. Three and a bit (literally!) are packed into a 16 bit word (bundle). The first slot in the bundle can only be a nop or a call.
-- http://b16-cpu.de/ (designed in Verilog)
DAV:
As CPUs get relatively faster than RAM, preload becomes more important.
After deciding on an address and sending it to RAM,
it takes several CPU cycles before the RAM is able to reply with the data.
If the very next instruction requires that data,
then the CPU has to wait.
But if the instruction set is designed so that
the programmer can designate an address ahead of time,
this allows the programmer to squeeze in a few ``internal'' instructions
instead of pausing.
(This is very similar to the "branch delay slot" on some RISC processors).
Early (asynchronous) RAM chips allowed you to present the data at the same time as the address
when you wanted to write to the RAM chip. This *seems* faster
than presenting the address, then later presenting the data, in isolation.
But in the context of alternating reads and writes,
where naturally the data coming from the RAM only arrives a memory cycle *after*
the RAM is presented with an address,
it turns out to be faster to always delay the data (whichever direction it is going)
for writes as well. This leads to synchronous RAM ( in particular, SDRAM --
but SRAM stands for something different: static RAM).
DAV: does simply A! help, or do we need to also indicate whether it will be a load or a store ?
There are several improvements that Chuck has added to his newer Forth virtual machine model.
One of them is the address register and the other is the circular stacks.
Chuck has explained that hardware considerations aside,
the idea of the address register was that the Forth words @ and ! ( fetch and store)
were clumsy
at the top of the stack and
were based on smaller atomic operations that the programmer could take advantage of.
@ was broken into two operations A! and @A. Likewise ! was broken into A! and !A.
...
advantage is the use of auto-increment addressing memory access opcodes
...
--
Jeff Fox, talking about Chuck Moore
http://www.ultratechnology.com/forth3.htm
[FIXME: Why isn't this listed on that page ? ]
[FIXME: get this book _HDL Chip Design_ book by Douglas J. Smith, 1998, Doone Publications, ISBN 0-9651934-3-8 ( Computer Literacy , $65+tax). ]... I strongly recommend the book HDL Chip Design ...
Prentice-Hall has published a Xilinx FPGA lab book and win95/NT software which includes a coupon for an XESS FPGA board. Adding the cost of a power supply, you can be doing experiments for less than $250. ...
Only 3 instructions:
br adr Load adr into PC if F=0 Clear F add adr1,adr2 Add the contents of memory location adr1 and the contents of memory location adr2 and store the result in memory location adr2. Store the carry of the addition in F. nor adr1,adr2 Calculate the NOR function of the contents of memory location adr1 and the contents of memory location adr2 and store the result in memory location adr2. F=1 if result =0 , else F=0.and only 1 user-accessible register (PC) and only 1 user-accessible flag (F).
DAV: Unfortunately, self-modifying code is required for subroutine call/return. This makes re-entrant code impossible.
DAV: some simple macros by DAV:
; These macros require 3 special locations: ; ZERO: a location reserved to handle the value 0. ; ONES: a location reserved to handle the value with every bit set (sometimes called -1 or ~0). ; Set them up with ; really_clear ZERO ; really_mov -1, ONES ; . ;;;;;;;;; single-instruction macros (aliases) not a ; invert each bit in a nor a,a ; or alternatively nor ZERO, a ; clear a ; clear all bits in a to zero. Side effect: sets F1 nor ONES, a ; mov 0, a ; clear a ;;;;;;;;; multi-instruction macros (perhaps should be subroutines) really_clear a; -- this is used only to set up location ONES and ZERO; add a,a add a,a add a,a ... (repeated for 16 bits, or whatever the wordsize is) add a,a really_mov -1, a; -- this is used only to set up location ONES; really_clear a not a mov -1, a ; set location a to the value -1 clear a not a mov -2, a ; set location a to the value -2 mov -1, a add a,a mov 1, a ; set location a to the value 1 -- this is used to set up location ONE mov -2, a not a ; this expands to the 4 instructions clear a not a add a,a not a rotl a ; rotate a left, moving MSB to LSB ; side effect: always clears F. add a,a br continue: add ONE, a continue: rotlc a ; move F into LSB of a, shifting all bits left, leaving old MSB in F br F_was_0: add a,a br F_was_1_high_bit_was_0 ; ; handle case of ``F was 1, high bit was 1'' add ONE, a br set_F_and_continue ; execution never gets here F_was_1_high_bit_was_0: add ONE, a br continue ; execution never gets here F_was_0: add a,a ; the next 3 instruction seem redundant in the F_was_zero case, ; but they are necessary to handle the F_was_1 case. br continue set_F_and_continue: ; this must fall through to the end, because branches always clear F nor ONES, ZERO ; continue: mov a, b ; set location b to the value in location a. WARNING: doesn't work when a=b. clear b add a,b jump addr ; unconditionally jump to given address (side effect: clears F) ; isochronous code should use this: add ZERO, ZERO br addr ; alternatively, one instruction quicker when F=0: br addr br addr
lots of details -- the complete schematic diagram of the CPU, a pretty photograph of the test board (FPGA + RAM + ROM + some LEDs to simulate a traffic light) ... some assembly-language code ...
Date: Thu, 11 Jun 1998 05:32:23 +0200 (MET DST)
To: David Cary <d.cary at ieee.org>
From: Majordomo at lslsun.epfl.ch
Subject: Welcome to nlc
--
Welcome to the nlc mailing list!
If you ever want to remove yourself from this mailing list,
you can send mail to "Majordomo@lslsun.epfl.ch" with the following command
in the body of your email message:
unsubscribe nlc David Cary <d.cary@ieee.org>
Here's the general information for the list you've
subscribed to, in case you don't already have it:
This is a mailing list for discussing the 'C -> netlist' compiler, or 'nlc'
for short.
To send mail messages to the nlc mailing list, send your message to:
nlc@lslsun.epfl.ch
I have made nlc, the C to netlist compiler, available for anonymous
ftp on lslsun5.epfl.ch (128.178.150.25) in /pub/nlc-0.9.tar.gz.
It is compressed using gzip (available at a GNU archive site near you).
A compiled binary version for Sparc Solaris 2.x is in the file
/pub/nlc-0.9.bin.gz.
If you have any trouble getting the file, let me know. I can probably
mail it to you.
For those that are interested in the Spyder project, you will find
below references to already published work.
Please let me know if you have trouble locating these papers, and I'll
see what I can do.
Have fun and please keep in touch,
Christian
---
@Article{key,
author = "Christian Iseli and Eduardo Sanchez",
title = "Spyder: A {SURE} ({SU}perscalar and
{RE}configurable) Processor",
journal = "The Journal of Supercomputing",
year = 1995,
volume = 9,
number = 3,
pages = "231-252"
}
@InProceedings{key0,
author = "Christian Iseli and Eduardo Sanchez",
title = "A Superscalar and Reconfigurable Processor",
volume = 849,
series = "Lecture Notes in Computer Science",
pages = "168--174",
booktitle = "Field-Programmable Logic Architectures, Synthesis
and Applications",
year = 1994,
publisher = "Springer-Verlag",
address = "Prague, Czech Republic",
month = "September"
}
@inproceedings{key1,
author = "Christian Iseli and Eduardo Sanchez",
title = "{Beyond Superscalar Using FPGAs}",
booktitle = "IEEE International Conference on Computer Design",
address = "Cambridge Mass.",
month = "October",
year = 1993
}
@inproceedings{key2,
author = "Christian Iseli and Eduardo Sanchez",
title = "{Spyder: A reconfigurable VLIW processor using FPGAs}",
booktitle = "IEEE Workshop on FPGAs for Custom Computing Machines",
address = "Napa",
month = "April",
year = 1993
}
@InProceedings{key3,
author = "Christian Iseli and Eduardo Sanchez",
title = "Augmentation du parall\'{e}lisme par la reconfigurabilit\'{e}",
editor = "L. Boug\'{e} and M. Cosnard and P. Fraigniaud",
pages = "3--6",
booktitle = "Actes des $6^{\`{e}me}$ Rencontres Francophones du
Parall\'{e}lisme, RenPar'6",
year = 1994,
organization = "\'{E}cole normale sup\'{e}rieure",
address = "Lyon",
month = "June"
}
@InProceedings{key4,
author = "Serge Durand and Christian Iseli",
title = "Developing a reconfigurable coprocessor on an {SBus} board",
pages = "5--9",
booktitle = "Sun User Group 1993 Proceedings",
year = 1993,
address = "San Jose, California",
month = "December"
}
@inproceedings{key5,
author = "Christian Iseli and Eduardo Sanchez",
title = "{A \Cplusplus{} compiler for FPGA custom execution units synthesis}",
pages = "173--179",
booktitle = "IEEE Symposium on FPGAs for Custom Computing Machines",
address = "Napa, CA",
month = "April",
year = 1995
}
---
Here are some more info about Brent Chapman's "Majordomo" mailing list manager.
In the description below items contained in []'s are optional. When
providing the item, do not include the []'s around it.
Majordomo understands the following commands:
subscribe <list> [<address>]
Subscribe yourself (or <address> if specified) to the named <list>.
(You probably have done that already if you are receiving this message).
unsubscribe <list> [<address>]
Unsubscribe yourself (or <address> if specified) from the named <list>.
get <list> <filename>
Get a file related to <list>.
index <list>
Return an index of files you can "get" for <list>.
which [<address>]
Find out which lists you (or <address> if specified) are on.
who <list>
Find out who is on the named <list>.
info <list>
Retrieve the general introductory information for the named <list>.
lists
Show the lists served by this Majordomo server.
help
Retrieve this message.
end
Stop processing commands (useful if your mailer adds a signature).
Commands should be sent in the body of an email message to
majordomo@lslsun.epfl.ch
Commands in the "Subject:" line NOT processed.
If you have any questions or problems, please contact
majordomo-owner@lslsun.epfl.ch
From: Christian Iseli <chris@lslsun.epfl.ch>
Date: Wed, 12 Aug 1998 13:53:07 +0200 (MET DST)
To: nlc@lslsun.epfl.ch
Subject: Re: Is this list dead ?
Sender: owner-nlc@lslsun.epfl.ch
Precedence: bulk
>I heard this mailing list talked about converting (mathematically
>intensive) C programs to run at high speed on FPGAs.
>
>I thought I would lurk and learn, but I haven't gotten any messages since I
>got the "Welcome to nlc" message on 1998-06-11.
>
>Is this list dead ?
Not really, but it was never very lively either... ;-)
As it happens, I'm now done with my PhD thesis and since I had no means of
pursuing nlc further, I now have another job and have little time to actively
develop nlc.
The PhD thesis report is available at:
http://lslwww.epfl.ch/pages/publications/rcnt_theses/home.html
The latest nlc source code is available at:
ftp://lslwww.epfl.ch/pub/
HTH. Cheers,
Christian
The exact opposite thing is also being developed:
vhdl2c "vhdl2c is a vhdl to `C' converter by Michael Knieser"
see also "Tiny robots" robot_links.html#tiny and http://electronicschat.org/index.cgi/HomebuiltCpu
I am fascinated by CPU designs that are extremely simple, approaching "minimal", in several (sometimes incompatible) senses of the word "simple".
Here "CPU", "MCU", and "PE" are almost interchangeable.
Often these characteristics exhibit synergy -- when you eliminate some opcodes, that eliminates some gates (making it lower-cost) and makes it easier to understand (less documentation required). Occasionally, though, driving a design to extreme simplicity according to one measure causes extra complexity in another area to compensate. This is called the the Turing tarpit #tarpit .
For purposes of robotics, "low-cost" and "Simple interface" are usually the dominant considerations. Some of the concepts of massively parallel processing are also present when one builds swarms of simple robots, but the kind of random communication between constantly-changing arrangements of simple robots is very different than the communication between the rigidly connected (most commonly in a 2 D mesh or a hypercube) elements of common cellular automata and multi-processors.
See also Using FPGAs to simulate CPUs #FPGA for some very simple CPUs designed to fit onto FPGAs. 2 very different reasons for simplicity there: (a) a simpler CPU can fit onto a smaller FPGA, making it much less expensive. (b) making the CPU smaller allows you to fit more copies of the CPU on a given FPGA, increasing MIPs at no cost. [FIXME: should combine these into 1 section ? Since the same op-code set may be implemented many ways, in TTL, in FPGA, in custom VLSI, it doesn't really make sense to split them into seperate sections ... On the other hand, ``less hardware'' means something a little different in these 3 technologies. ]
[here I *list* simple CPUs I know about; Opcode considerations #considerations and #FPGA also talk about tips for designing new CPU architectures. ]
... Here I also list all the CPUs I know about that were built up out of TTL.
...
The Block I version used 4,100 ICs, each containing a single 3-input NOR gate. The later Block II version used dual 3-input NOR gates in a flat-pack; appoximately 5,600 gates in all. The gates were resistor-transistor logic (RTL). ...
The instruction format was 3 bits for opcode, 12 bits for address.
(details about each instruction in the instruction set there). (Knowing what we now know about MISC, how much smaller could we make a "reasonable" CPU out of NOR gates ? )
For as long as I've been using microprocessors -- and I was designing the Intel 8080 into radiation-monitoring equipment in 1977 -- I've always itched to have control over the instruction set. The makers always seem to leave something out....
Picaro makes a pleasant change from systems with 16-MB minimum memories and operating systems no human being can comprehend.
describes a very easy-to-build system: There's the CPU connected to some EPROM (for programs and data) and a crystal. The EPROM is serial EPROM (makes things *much* easier to wire up). The "CPU" is really a PIC with an internal interpreter with 2 modes. If the magic value is discovered in EPROM, the PIC starts fetching and "executing" (interpreting) the program in EPROM. Otherwise, it assumes the EPROM is blank, and waits for a properly-formatted program to stream in on the serial port which it programs into the EPROM. [FIXME: todo: build something like this ... but with a completely different instruction set, natch.]
(in other words, no ROM or FLASH is needed for initial testing).From: Ben Franchuk Date: Wed Jul 25, 2001 7:10 am Subject: Re: [fpga-cpu] A debugger for xr16 the easy way... On my cpu on reset a bootstrap loader is run from the serial input. This way all I need for a external parts is [RAM] and a buffer for serial lines from the FPGA.
``The Jupiter Ace hardware page: How to build your own !'' by Grant Searle BSc. http://www.home-micros.freeserve.co.uk/JupiterAce/JupiterAce.html
Build your own ZX81 http://www.home-micros.freeserve.co.uk/zx80/zx80nmi.html
interesting wire-wrap style ... see ``Back of PCB'' photo. While this uses a Z80 cpu (and so is ``cheating'' in the context of building the CPU itself out of standard components), the TV / monitor output circuitry may be interesting... [FIXME: computer museum] [FIXME: crosslink to schematic.html ?]... old micros which can still be built since they don't use custom components. ...
-- http://www.pcengines.com/toy2.htm`` TOY/2 - a minimal 16 bit CPU
TOY/2 was designed by Pascal Dornier and Stephan Paschedag ... in 1988 ...
TOY/2 can address a full 64K word program and memory address space. ... and takes up a whopping 3300 transistors (excluding I/O pads). ...
All instructions are 16 bit, 4 bits for the opcode, and 12 bits for the direct address. ''
DAV: I think the JMP instruction should be P := [vec] to make consistent with description and the conditional branches, and to allow jumping to code anywhere in the full 64 Kword program space. Only 15 defined instructions:
Programmer-visible registers: A: accumulator P: program counter T: temporary register for indirect store C: carry flag Z: zero flag (DAV: does this *always* indicate the current state of A ?) ; 3 for program flow control JMP vec P := [vec] ; indirect jump BCC vec IF (carry clear) then PC := [vec] BNE vec IF (not zero, i.e., 0==Z) then PC := [vec] ; 3 for direct load and store LDC src A := [src], C:= 0. ; load A and clear carry LDA src A := [src] ; load A, don't clear carry STA dest [dest] := A ; ; 3 logical ops (op) src A := A (op) [src] ; arithmetic ops: ; (op) is one of: XOR, OR, AND. ; 2 arithmetic ADC src A := A + [src] SBC src A := A - [src] - C ; subtract ; 4 implicit instructions that don't use the 12 bit address: ROR A,C := A,C ror 1 ; rotate right through carry TAT T := A ; transfer A to T LDI A := [A] ; load A indirect STT [A] := T ; store T indirect
DAV: Given that code is in ROM, and we choose some arbitrary RAM area ``stack_start'' to hold the stack, and we choose some other arbitrary RAM location SP to hold the return stack pointer, call instructions could be implemented like this:
(Is this the most compact implementation of CALL ?). ... ; DAV: This implementation of CALL eats up ; 2 instructions + 1 word of ROM = 3 words per call. LDA $+2 JMP &(banana1) LDA $+2 JMP &(banana2) ... ; linker must allocate a word of ROM to hold the $+2 value, 1 per CALL. ; linker must allocate a word of ROM to hold the &(banana2) value as well, ; but only 1 per subroutine, shared among all calls of banana2(). banana1: ; non-leaf subroutine ; we have return address in A, ; and SP points to an empty location to store it. ; adjust SP to point to empty location TAT LDA SP STT ADC &(1) ; assembler must allocate word of ROM to hold the constant 1 STA SP ... ; return sequence JMP &(non_leaf_return) non_leaf_return: ; common to all non-leaf subroutines LDA SP ADC &(-1) STA SP LDI STA temp JMP temp banana2: ; leaf subroutine ; we have return address in A, ; and SP points to an empty location to store it. TAT LDA SP STT ; leave SP full until end of routine ... ; return sequence JMP &(subroutine_return) subroutine_return: ; common to all subroutines LDA SP LDI STA temp JMP temp
DAV: From the instruction set listed here, I attempted to reverse-engineer a schematic suitable for implementation in TTL. I wonder how close this matches the original design by Pascal Dornier and Stephan Paschedag ?
TOY/2 as reverse-engineered by David Cary
(Warning: not yet implemented -- does this really work ?)
Every instruction takes 2 cycles:
1 SRAM memory cycle cycle (either reading [address] or [A], or writing T to [A]).
1 SRAM memory cycle to read next instruction (from [PC]).
Registers are labeled with their contents in the 1st cycle;
PC and A are swapped in the last cycle of the instruction.
+------------------------------+
| +----------+ |
| | V V
| | +-------------------+ +-------+
| | | opcode /\ address | | A |(PC)
| | +-------/ \--------+ +-------+
| | V V V
| | (control | +--------+
| | lines) | | V
| | V V |
| | +-\/--+ |
| | 1->| mux | |
| | +-----+ |
| | V |
| | (note 1) V address |
| | | +------------+ |
| | V | | |
| | +---+ | SRAM | |
| |2->| T | | | |
| | +---+ +------------+ |
| | V ^ data I/O |
| | | V V
| | +--------->+<| buf |--<+
| ^ V V
| +----------------+ |
| V V
| +-----------+ +----+
| | [address] | | PC |(A)
| +-----------+ +----+
| V V
| +------\ /------+
| 4->\ \/ /-->Z,C
| \ ALU /
| +----------+
| V
^ |
+-----------------------+
Control line summary:
0: instruction register : opcode : next value always latched at end of last cycle.
0: instruction register : address : next value always latched at end of last cycle,
never used during last cycle. (Perhaps share with some other register only used during last cycle ?)
0: [address] transparent register: always transparent during 1st cycle,
latched at end of 1st cycle, and held during last cycle.
0: A(PC) next value always latched at end of every cycle
0: PC(A) next value always latched at end of every cycle
2: Mux selection: depends on opcode and C and Z during 1st cycle;
always comes from PC on last cycle.
2: T: output enabled only on STI ( [A] := T ) instruction; latched only during TAT instruction.
1: SRAM: last cycle always a read; 1st cycle may be read or write.
4: ALU function select (is this really all the control signals needed ?)
0: Z: zero flag always set or cleared during last cycle to reflect current state of A.
1: C: new value of carry flag only latched on LDC, ADC, SBC, ROR instructions.
--
10 bits ? Is this all ?
Notes:
1. Only 1 instruction loads T: the TAT instruction.
There are several ways to implement this:
* the output of A(PC) (during the 1st cycle)
* the output of PC(A) (during the last cycle), or
* the output of the ALU (during the last cycle),
whichever is easiest.
2. The register to hold [address] must be transparent,
to implement JMP and BCC and BNE properly.
3. There's still room for 1 more instruction. What should it be ?
Some options that require *zero* more hardware (just setting up the uinstruction decoder):
* PC-relative jumps: PC := PC + [vec] (for position-independent code)
* TPT: Transfer PC to T: T := PC (to allow position-independent code)
* T := PC + [vec] (to allow position-independent code)
* [A] := PC+1 (to allow position-independent CALL)
This implementation calculates next PC first (using transparent [address] register), then next A.
(I think this ended up simpler than calculate next A first, then next PC).
(Of course, faster CPUs calculate both at the same time,
and have Harvard-style fetches from data cache simultaneous with fetches from instruction cache).
"Microcoded Versus Hard-wired Control: A comparison of two methods for implementing the control logic for a simple CPU" article by Phil Koopman BYTE Magazine January 1987 http://www.skidmore.edu/~pvonk/cs318/calendars/Documents/hard-wired.pdf describes the Toy CPU.
The schematic diagram of the PISC-1a can be downloaded here. DAV: While the paper notes ``no conditional branch microinstruction -- an important need'', I agree that conditionals are important, but "machines do not need conditional branches, they only need conditional subroutine return instructions." -- Philip J. Koopman, Jr. #conditional_returnThe PISC is a processor constructed from discrete TTL logic, which illustrates the operation of both hardwired and microcoded CPUs. ... simple hardware modifications demonstrate interrupts, memory segmentation, microsequencers, parallelism, and pipelining. A standalone PISC board should be an economical and effective tool for teaching processor design.
... Pathetic Instruction Set Computer ... Requiring only 22 standard TTL chips (excluding memory), it is well within the ability of a student to construct and understand. Its writeable microprogram store uses inexpensive EPROM and RAM. ...
Microprogram store and main program store are one and the same. Indeed, the PISC has characteristics of both a hardwired CPU and a microcoded CPU.
... Many weaknesses of the PISC become evident after a short period of use ... It can be argued that the PISC is a valuable educational tool because these faults, and several potential solutions, are painfully obvious. ...
... 2100 gates ...
Subject:
Re: Home-made CPUs
Date:
20 Nov 1998 00:00:00 GMT
From:
jones@cs.uiowa.edu (Douglas W. Jones,201H MLH,3193350740,3193382879)
Organization:
The University of Iowa
Newsgroups:
sci.electronics.design
From article <365533b8.334246@wingate>,
by t.dorrington@dial.pipex.com (Tim Dorrington):
>
> Has anyone out there managed to design/build their own very simplistic
> CPU from basic logic chips, RAM and ROM? I know that this is
> obviously a huge project which has no real practical use, but I just
> wondered if anyone has done it for the challenge?
It's been done, and it's not that huge a project. Reasonable CPU designs
take about 50 SSI and MSI TTL chips. See the Ultimate RISC and the
Minimal CISC architectures indexed on http://www.cs.uiowa.edu/~jones/arch/.
A fair number of people have built that particular RISC.
Of course, I just wrote the specs for it, you've got to figure out how to
reduce it to a chip level design. It's a fun exercise!
Doug Jones
jones@cs.uiowa.edu
Mountain View Press: The Forth Source http://www.TheForthSource.com/ Sells
WISC CPU/16 - A patented hardware design of a 16-bit stack machine with a user writable instruction set. Available in several forms: Complete schematics and documentation with which you can build your own ($50.00); Kits with all of the parts to wire wrap your own ($500.00); bare PC board which you can stuff with your own chips ($250.00); and Assembled and tested system ($750.00). Design uses a PC/AT as an I/O server. FAST!
and also sells lots of educational materials for learning the Forth programming language.
(FIXME: move to http://c2.com/cgi/wiki?WritableInstructionSetComputer ? )
-- _Stack Computers: the new wave_ book by Philip J. Koopman, Jr. 1989 http://www.cs.cmu.edu/~koopman/stack_computers/... one major design tradeoff ... hardwired control and microcoded control. ...
An introduction to the concepts ... may be found in (Koopman 1987a).
Hardwired designs traditionally allow faster and more space efficient implementations to be made. ...
... a microcoded machine can use fewer bits to specify the same possible instructions, ...
... microcoded implementations are more convenient to implement in discrete component designs, so they predominate in board-level implementations. Most single-chip implementations are hardwired.
... Koopman, P. (1987a) Microcoded versus hard-wired control. Byte, January 1987, 12(1) 235-242
DAV: the ``Mark's 8 chip uP'' at the venturalink site looks very clever. I am very impressed. It is far more compact than any other TTL/PAL based CPU I've seen. (It is even smaller than most ``single-board computers'' that use monolithic CPU chips). It makes me want to design a CPU again.
From: Bill Buzbee (bill at buzbees.com) Subject: Re: building a 8 or 16 bit cpu out of TTL parts Newsgroups: comp.arch.hobbyist Date: 2002-10-30 14:30:45 PST "john dobbs" wrote... > Anyone ever thought of building a 8 or 16bit cpu using traditional TTL > chips (NANDs,ORs,Flipflops etc) I could see how it would be a useful > learning device, or for the hardcore guys using transistors/diodes and > no IC's period. I agree with the other posters that there's no *rational* reason to do this using TTL - FPGA is the way to go these days. However, if you're feeling irrational, it can be a fun experience. Here's a link to recent simple CPU project you should check out: http://www.venturalink.net/~jamesc/ttl/ Also, I found the following books very useful when trying to come up to speed on the subject: Digital Computer Electronics, by Albert Paul Malvino, McGraw-Hill, 1977 Understanding Digital Computers, by Forrest Mimms III, Radio Shack, 2nd. Edition,1987. Good luck, ..Bill Buzbee
``... building my own computer from scratch. By "scratch", I mean designing my own instruction set, wire-wrapping a CPU out of a pile of 74 series TTL devices and writing (or porting) my own assembler, compiler, linker, text editor and very rudimentary operating system.
... User and supervisor modes exist, along with hardware address translation ...
... I really want to better understand hardware and operating systems. ... pushing ... the limits of my hobbyist abilities, ... support for preemptive multitasking and paging to enable me to support a "real" (though, I hope, greatly simplified) operating system for my machine.
... wherever possible I'll trade off speed for simpler circuits. Also, as I get closer to actually having to start wrapping wires, I find myself more freely trading off speed for fewer connections.
Compactness. One of my pet peeves is how bloated modern software is. I think there's a lot you can do with 128K bytes of addressing, and I like the idea of keeping things compact and utilitarian. I've tried to construct an expressively dense set of 1-byte opcodes.
At the end of the day, I'd like to have a working, and useful, machine that I understand completely. Oh, and it's also got to have a real front panel with lots and lots of cool blinky lights. ''
The ``links'' page points to many other web pages and books that deal with: small processor design and implementation, digital logic, retargeting C compilers and Forth compilers. [FIXME: does that make this section redundant ?]
[FIXME: finish reading]... relatively inexpensive project (<$100 including power supply, ICS, breadboards and assorted hardware) ...
General Features: All relays are the identical part (Four-Pole-Double-Throw, 12 Volts) 415 Relays 111 Switches 350 LEDs Max Power Consumption: Estimated 12 Amps @ 13.5 Volts (160 Watts) Features: 8 general purpose registers (of 8 bits each) instruction register (8 bits) program counter (16 bits) 2 additional registers (16 bits each) 7 bit ALU (operations: AND, OR, XOR, NOT, SHL, ADD, INC) 16 bit increment unit 32 Kbytes of main memory (implemented using one CMOS chip -- static RAM) Each instruction is 8 bits long. Some instructions (JUMP, CALL, BRANCH, SET-16) are followed by 16 additional bits of immediate data. The instruction set: MOVE register to register CLEAR set register to zero LOAD 1 byte from memory to register STORE 1 byte from register to memory AND 8 bit, register to register OR 8 bit, register to register XOR 8 bit, register to register NOT 8 bit, register to register SHL 8 bit, register to register, shift left 1 bit ADD 8 bit, register to register INCREMENT 8 bit, register to register INCR-XY 16 bit, XY register GOTO unconditional CALL return address is stored in XY register RETURN jump indirect through a register BRANCH conditional (beq, bne, blt, bcy) SET-16 load 16-bit immediate value into register (The 2-byte value follows the instruction.) SET-8 load 5-bit sign-extended immediate value into register (with sign extension) HALT suspend execution The clock cycle time is approximately 5 Hz. Instructions take between 8 and 24 cycles. Obviously not fast, but lights blink and it makes noise. The general purpose registers, which each have 8 bits, are called: A, B, C, D, M1, M2, X, and Y In some instructions, M1 and M2 are combined to form a 16-bit register, called M. Likewise, in some instructions, X and Y are combined to form a 16-bit register, called XY. The program counter (PC) is 16-bits. There is a 16 increment unit, which adds 1, and there is a 16-bit register called INC dedicated to the increment unit. This register is not visible to the programmer's model and is used only (1) to increment the PC during each instruction, and (2) in the INCR-16 instruction. The instruction register (INST) is 8-bits and holds the instruction being executed. There is a 16-bit register, called J, which is used during the CALL, JUMP, and GOTO instructions. These instructions load the register from the immediate 16-bit value following the instruction, and then to complete the transfer of control, the J register is copied to the PC register. ... The LOAD and STORE Instructions =============================== The memory is organized as 32K bytes, addressed from hex 0000 through 7FFF. The LOAD instruction uses the 16-bit value in the M register as an address and reads a byte from Main Memory into either the A, B, C, or D registers. Likewise, the STORE instruction assumes the address is in the M register and stores the value from either A, B, C, or D into a byte in Main Memory. ... The CALL Instruction ==================== The CALL instruction is exactly like the GOTO instruction, except that, in addition, the address of the next instruction after the CALL instruction is saved in the XY register. This instruction is encoded as: 1 1 1 0 0 1 1 1 a a a a a a a a a a a a a a a a where the field "aaaaaaaa aaaaaaaa" is the address of the subroutine. The RETURN Instruction ====================== The RETURN instruction moves the contents of XY back to the PC. This can be used to effect a RETURN (when used after a CALL) or an indirect jump, when XY contains a computed value. This instruction is slightly more flexible and is actually a 16-bit move instruction that can perform any one of the following moves: PC = XY PC = M PC = J XY = M XY = J XY = 0 ... Timeline, History, and General Comments ======================================= I teach CS and I have always loved computers and been interested in how they work. Over the years I have found that you can read and study and make paper designs, but there is no substitute for actually building things. There are always processes at work that you can't understand until you actually build a working unit. I had wanted to build a computer out of relays when I was much younger, but I didn't have the knowledge, patience, money, or time. ... Giving final exams are a panic for students, but are really different for the professors. It is the only time that there are no lectures to plan and no exams to grade. As I sat there during a final one term, I sketched out the outline of an architecture that I might use. So I designed the ALU to "fit" into a complete machine. However, when I decided to build the ALU, I still wasn't committed to building anything more. It is important to stay focussed on a project that you can complete. It is easy (and fun) to fantasize about building something big, but unless one is able to concentrate long enough to get something completed, it is really just idle day-dreaming! Once I finished the ALU, I decided to go ahead and build the second cabinet, the register unit. However, in my mind, I had not committed to building anything beyond that. I figured that if I finished it and it worked, I would then decide whether I wanted to proceed. I knew there was a possibility I might be burned out, or might run into problems that would make a full computer non-functional. I always want successes, not failures, so I decided I would rather have a fully-working half-computer that a half-finished full-computer. After finishing the ALU and the register unit, I found I enjoyed all aspects of the construction. Usually design is the most fun and, when building stuff, we tend to leave the most unpleasant or difficult tasks until the very end, but after completing 2 cabinets, I knew pretty much what would be involved. At that time I decided to keep going. ... Costs ===== 415 Relays, 4PRLY-12, plus extras for prototyping ($3.40 each) 350 LEDs ($1.49 each) 111 Switches, SPDT, on-on mini toggle, MTS-4 ($0.90 each) Acrylic boards, pre-cut and pre-drilled, 4 cabinets (total $1,095.37, aprx $275 per cabinet) ALU (1 board), 82.19 Register Cabinet (10 boards), $251.60 total PGM-CTRL Cabinet (10 boards), $381.80 total Sequencer Cabinet (5 boards): $379.78 total 4 Mahogany Cabinets (est $300 each) 2 Power Supplies, 12V, 10Amp ($79.99 each) 20 Capacitors, 100UF, 100V non-polar ($1.55 each) Memory Board: 1 SRAM, 32K dip, Jameco # 82472CA, Other #: 62256LP-70 ($5.49) 1 eight-channel FET driver module, NCD, www.controlanything.com, IOTEST-L ($49.00) 3 eight-channel LED array module, NCD, www.controlanything.com, 8-FET ($10 each) 1 Prototyping board ($5.99) 1 small power supply (for memory) 5V, 4Amp, 20Watt, Jameco #: 213583CA, ($26.95) DB-9 Sub-miniature Connectors 32 Plugs ($1.59 each) 32 Receptacles ($1.76 each) 64 Connector hoods ($0.49 each) 22-Gauge black solid copper wire, 100 feet ($4.49 each), est. 20 rolls 8-connector cable, CAT5e, 4 pairs, 24AWG solid, 328 feet ($42.00) 4 Fans ($10 each) Misc Hardware. (Est $100) Summary: Relays, $1,411 LEDs, $521 Switches, $100 Acrylic Boards, $1,095 Cabinets, $1,200 Power Supplies $160 SRAM Memory, $117 Capacitors, $148 Connectors, $138 Wire, $132 Fans, $20 Hardware, $100 Grand Total, $5,142 (per unit: $1,285)
cellular automata is related to other interests I have:
[FIXME: CA stuff scattered elsewhere ...]
The cellular automata questions I'm most interested in are:
There's a little loop here -- first we use (simulated) cellular automata to learn about replication, then we use that information to design replicating tools. We also use (simulated) cellular automata to explore good rules and good patterns for computronium. Then we use those replicating tools to (build enough copies of themselves to) build computronium. Then we use *that* computronium (hardware cellular automata) as a better computer.
"Excellent paper (several chapters of a forthcoming book, actually), if you are interested in cellular automata, reversible computation, hardware implementations and CAM modelling, you should check it out. ... Caution, this expands into 17381429 Bytes of PostScript." -- recc. Eugene Leitl <eugene@liposome.genebee.msu.su>
DAV: I agree. This paper has good stuff about the interaction (co-evolution) of hardware and software, efficient realization of computational architectures, some of the background thinking behind the massively parallel CAM-8 processor, and likely directions of future computing devices." -- DAV
the theory behind the CAM-8 http://www.im.lcs.mit.edu/cam8.html a dedicated cellular automata processor. http://www.im.lcs.mit.edu/broch/med1.html mentions that it is fast at generating the data needed for holograms.
Small size is not everything - Q*Bert-based systems will require more cells for some tasks than the equivalent when expressed in (say) the Margolus neighbourhood. However for the majority of applications which involve performing cmputations, the additional richness provided by a larger LUT seems likely to be wasted much of the time, doing the equivalent of transporting signals from one point to another. We believe the advantage of the smaller cells in terms of their smaller size and higher update frequency will commonly win out.
Finally the Q*Bert neighbourhood is essentially hexagonal - and there is a general additional efficiency of hexagonal packing schemes compared to rectangular ones - a matter I will not go into further here.
[89] Toffoli, T. and N. Margolus, _Cellular Automata Machines -- a new environment for modeling_, MIT Press (1987)
DAV: Which of the 2 CA rules detailed in the _Crystalline Computation_ paper is more "interesting" (can build more compact logic circuits out of): "BBMCA", or "Critters" ? (David Cary thinks they are both "more interesting" than the Conway rules) More importantly, since hydrodynamics required a triangular grid, what rules are most "interesting" on triangular grids ? (Perhaps that grid has rules that are "more interesting" than any square-grid rules).
(a) blocking on even steps: aabbccddee..... aabbccddee..... ffgghhiijj..... ffgghhiijj..... kkllmmnnoo..... kkllmmnnoo..... ... ...zz blocking on odd steps: abbccddee.....a fgghhiijj.....f fgghhiijj.....f kllmmnnoo.....k kllmmnnoo.....k ... abbccddee...zza (b) OO -> OO OO OO *O -> *O OO OO *O -> O* O* *O *O -> O* *O O* O* -> ** ** *O ** -> ** ** **
Fig. 1.5 The invertible "Critters" CA.
(a) The solid and dotted blockings are used alternately.
(b) The Critters rule.
The Critters rule is ... "rotationally symmetric" ... "conserves 1's (particles), and only 3 cases change." ... "each of the 16 possible initial states of a block is turned into a distinct result state. Thus the Critters rule is invertable." ... "Unlike the gliders in Life, Critters gliders are quite robust. ... when two of these gliders collide in an empty region ... ... If nothing hits this blob for a while, we always see at least one of the gliders emerge."
(a) OO -> OO OO OO *O -> OO OO O* *O -> O* O* *O *O -> *O *O *O O* -> O* ** ** ** -> ** ** **
Fig. 1.8 A invertible Billiard Ball Model CA.
(a) The BBMCA rule
...
"rotationally symmetric"
"conserves 1's (particles), and only 2 cases change."
"we can recover macroscopic 2D hydrodynamics from a model that is only slightly more complicated than the HPP gas. A single-speed model with 6 particles per site, moving in 6 directions on a triangular lattice, will do. If all zero-net-momentum collisions cause the molecules at the collision site to scatter into a rotated configuration, and otherwise the particles go straight, then in the low speed limit we recover isotropic macroscopic fluid dynamics."
[100] Yakhot, V. and S. Orszag, "Reynolds number scaling of cellular-automaton hydrodynamics", _Phys. Rev. Lett._ (1986), 1691-1693.
It has been shown that self-replicators can be designed in cellular automata #cellular_automata . some of this deals with ``real'', physical replication. While other parts -- cellular automata, quines, etc -- deal only with patterns in a computer. Should I separate them ? But some of the theory applies to both.
related to reconfigurable robots robot_links.html#reconfigurable and robot construction (humans building robots) robot_links.html#construction and tool closure 3d_design.html#closure . and the bootstrap problem [FIXME:] [FIXME: cross-link all the self-replication stuff on my web pages. nano, robot, cellular automata, etc. Point back and forth from ``replication'' section to computer architecture # cellular automata robots nanotech idea_space and unknowns ]
The ``Game of Life'' http://www.astro.virginia.edu/~eww6n/life/ is the most well-known Cellular Automaton, invented by John Conway and popularized by Martin Gardner. Lots and lots of interesting properties and pretty animations have been collected for this cellular automaton. But David Cary doubts that this is really the most interesting ("computationally compact") cellular automata; other cellular automata are likely to have even more pretty animations and properties. "replicator - a Life object which repeatedly forms copies of itself. Such things are known to be possible in Life, but no example is known. But in the HighLife variant, there is a simple replicator." "HighLife - an alternate set of rules similar to Conway's, but with the additional rule that 6 neighbors generates a birth. Most of the interest in this variant is due to the replicator that evolves from:
***. ...* ...* ...*-- http://www.cs.jhu.edu/~callahan/glossary.html "unit Life cell: a pattern with two states, which is determined by its previous state and the previous state of its neighbors, using exactly the rules used to compute it; that is, it simulates its own universe. None have been constructed in Conway's Life yet." When I (DAV) talk about one entity replicating, I mean that we end up with 2 (or more) practically identical copies of the original, in every way (in particular, size). A ``unit Life cell'' seems to me to be somehow related to the so-called ``replicating-tile'' http://www.geocities.com/alclarke0/PolyPages/Reptiles.htm (which creates identical copies, but scaled much larger)
"Looking (and dreaming) toward the future, one can imagine nano-scale (bioware) systems becoming a reality, which will be endowed with evolutionary, reproductive, regenerative, and learning capabilities." http://lslwww.epfl.ch/pages/tutorials/poe/home.html
-- ``Beyond computation: a talk with Rodney Brooks'' 2002 http://www.kurzweilai.net/meme/frame.html?main=/articles/art0475.html | http://www.edge.org/3rd_culture/brooks_beyond/beyond_p4.htmlWe're also trying to build self-reproducing robots. We've been doing experiments with Fischer Technik and Lego. We're trying to build a robot out of Lego which can put together a copy of itself with Lego pieces. Obviously you need motors and some little computational units, but the big question is to determine what the fixed points in mechanical space are to create objects that can manipulate components of themselves and construct themselves. There is a deep mathematical question to get at there, and for now we're using these off-the-shelf technologies to explore that. Ultimately we expect we're going to get to some other generalized set of components which have lots and lots of ways of cooperatively being put together, and hope that we can get them to be able to manipulate themselves. You can do this computationally in simulation very easily, but in the real world the mechanical properties matter. What is that self-reflective point of mechanical systems? ...
see #simple_cpu and minimal_instruction_set.html
------------------------------ Date: Mon, 19 Jun 2000 16:27:44 +0200 From: Tim Böscke To: <MISC> Subject: Re: MISC > >The _real_ minimum instruction set is one with a > >subtract-and-branch-when-negative instruction btw. > > > > Eugene Styer mentions six 'one instruction computers' at > <http://eagle.eku.edu/faculty/styer/oisc.html> > Well - if you look at them closely it becomes obvious that the six machines are after all just flavours of two ideas. 1) The move machine 2) subtract, branch The mentioned "subtract" machine is a combination of both. However I dont really consider 1) as a one instruction machine. After all it is just a fancy way of doing microcoding. A move to the PC is after all not just a move, but a JMP. And thus you already have two instructions. The same goes for the ALU stuff. ------------------------------
------------------------------ Date: Thu, 22 Jun 2000 00:27:00 -0600 From: Roger Ivie <rivie at teraglobal.com> To: MISC Subject: Re: MISC Content-Type: text/plain; charset="us-ascii" ; format="flowed" >Well - if you look at them closely it becomes obvious that >the six machines are after all just flavours of two ideas. > >1) The move machine >2) subtract, branch > >The mentioned "subtract" machine is a combination of both. > >However I dont really consider 1) as a one instruction machine. >After all it is just a fancy way of doing microcoding. A move >to the PC is after all not just a move, but a JMP. And thus >you already have two instructions. The same goes for the ALU >stuff. Bear in mind that the PC doesn't have to be an actual register. A good example of this is the PDP-5, which stored the PC in location 0. Executing an instruction began by fetching location 0 and using it as the address of the instruction to be executed. It was possible for a DMA device to cause a jump by writing to location 0; this was, in fact, how the front panel loaded an address into the PC. -- Roger Ivie rivie@teraglobal.com Not speaking for TeraGlobal Communications Corporation ------------------------------
Steamer16: a high-performance homebrewer's microprocessor http://www3.sympatico.ca/myron.plichota/steamer1.htm by Myron Plichota <myron.plichota at sympatico.ca>. written in VHDL and prototyped on a single wire-wrapped protoboard. Has only 7 instructions. Packets of 5 instructions (5 instructions * 3 bits = 15 bits/packet) execute in 6 cycles/packet at a cycle rate of 20 MHz. Both the address and data busses are 16 bits. Unusual ``ArF'' call/return protocol.
qUark ../mirror/quark.txt a viable stack-computer with 4-bit opcodes (c) vic plichota, original concept by Myron Plichota Dec '98.
------------------------------ Date: Sat, 20 Feb 1999 12:54:34 +0300 From: "Stas Pereverzev" To: MISC Subject: Re: nFORTH v2.3 Content-Type: text/plain; charset="koi8-r" Content-Transfer-Encoding: 7bit >Comments, folks?