!Instruction set overview\nWith the following decisions in mind, it is time to have a look at what the instruction set now really is comprised of.\n* Minimum number of arithmetic and logic instructions.\n* Only Branch and BranchIfZero.\n!!Loading\n* Load an immediate value into the DataRegister (LD DR,const)\n* Load an immediate value into the ProgramCounter (LD PC,const)\n* Load a direct addressed value into the DataRegister (LD DR,M[index])\n* Load a direct addressed value into the ProgramCounter (LD PC,M[index])\n* Load a value pointed to by a memory cell into the DataRegister (LD DR,M[M[index]])\n* Load a value pointed to by a memory cell into the ProgramCounter (LD PC,M[M[index]])\n!!Storing\n* Store the value of the DataRegister in the ProgramCounter (ST DR,PC)\n* Store the value of the ProgramCounter in the DataRegister (ST PC,DR)\n* Store the value of the DataRegister into the direct address (ST DR,M[index])\n* Store the value of the ProgramCounter into the direct address (ST PC,M[index])\nLoading and storing are constrained by the fact that to keep things as simple as possible, instructions will fetch at most one extra operand.\n\nHowever, is this enough to use the contents of a memory cell as an address ?\n\nSuppose I need to be able to have access to an array. How can this be executed ?\n\nWhen a program has been built, it will consist of three parts : code, static data and and interface to the program stack. We assume C as a programming language.\n\nSince the computer is fairly simple, only one program executing is assumed. Addresses will have been fixed by the program linker, and the program is loaded absolute into memory.\n\nThis means that program code will be first, static data will be second, and the stack will start from the end of memory growing downwards.\n\nThis also means that the starting address for the array will be absolute. When using a global variable as index, its address will also be absolute.\n!!!Case 1 : Storing a value somewhere in the array\n{{{\na[6] = 10;\n\nLD DR,a\nADD DR,6\nST DR,M[TMP]\nLD DR,10\n}}}\nAnd here we are stuck, because the value of the DataRegister should be moved to the address which is contained in M[TMP]. However, this is impossible, because only addresses in the instruction stream are currently usable.\n\nThis means that the instruction set must be extended for storing with instructions which use the address after the instruction as the place to fetch another address from, which is then used as the target.\n\nIf we then add :\n* Store the value of the DataRegister into the indirect address (ST DR,M[M[index]])\n* Store the value of the ProgramCounter into the direct address (ST PC,M[M[index]])\nwhich may need another way of expressing, then the previous problem can be solved.\n{{{\na[6] = 10;\n\nLD DR,a\nADD DR,6\nST DR,M[TMP]\nLD DR,10\nST DR,M[M[TMP]]\n}}}\n!!!Case 2 : Adding two elements of an array together\n{{{\na[7] = a[6] + a[5]\n\nLD DR,a\nADD DR,6\nST DR,M[TMP]\nLD DR,a\nADD DR,5\nST DR,M[TMP2]\nLD DR,M[TMP]\nADD DR,M[TMP2]\nST DR,M[TMP]\nLD DR,a\nADD DR,7\nST DR,M[TMP2]\nLD DR,M[TMP]\nST DR,M[M[TMP]]\n}}}\nIt is apparent that we have enough instructions to do more complex work. However, the usage of the TMP and TMP2 memory locations is flawed, because relying on absolute addresses is dangerous.\n\nWith the current instructionset we could use one absolute address as the stack pointer. What does this give us then ?\n{{{\na[7] = a[6] + a[5]\n\n; Increment stack pointer\nLD DR,M[SP]\nADD DR,1\nST DR,M[SP]\n\n; Store a[6] on top of stack\nLD DR,a\nADD DR,6\nLD DR,M[DR]\nST DR,M[M[SP]]\n\n; Increment stack pointer\nLD DR,M[SP]\nADD DR,1\nST DR,M[SP]\n\n; Store a[5] on top of stack\nLD DR,a\nADD DR,5\nLD DR,M[DR]\nST DR,M[M[SP]]\n}}}\nThe final part is the most difficult, because the first term is always second on the stack. For this part, we really do need a temporary place to store the value.\n{{{\n; Store top of stack in memory\nLD DR,M[M[SP]]\nST DR,M[TMP]\n\n; Decrement stack pointer\nLD DR,M[SP]\nADD DR, 0xFFFF\nST DR,M[SP]\n\n; Get first operand\nLD DR,M[M[SP]]\n\n; Add temporary\nADD DR,M[TMP]\n\n; Store result on stack\nST DR,M[M[SP]]\n}}}\nAn intelligent compiler can probably shorten this code by removing the\nstore of the last operand to be used and putting it immediately in the\nTMP memory.\n{{{\na[7] = a[6] + a[5]\n\n; Increment stack pointer\nLD DR,M[SP]\nADD DR,1\nST DR,M[SP]\n\n; Store a[6] on top of stack\nLD DR,a\nADD DR,6\nLD DR,M[DR]\nST DR,M[M[SP]]\n\n; Increment stack pointer\nLD DR,M[SP]\nADD DR,1\nST DR,M[SP]\n\n; Store a[5] in TMP memory\nLD DR,a\nADD DR,5\nLD DR,M[DR]\nST DR,M[TMP]\n\n;Get first operand\nLD DR,M[M[SP]]\nADD DR,M[TMP]\nST DR,M[M[SP]]\n\n; Store result\nLD DR,a\nADD DR,7\nST DR,M[TMP]\n\nLD DR,M[M[SP]]\nST DR,M[M[TMP]]\n\n;Decrement stack again\nLD DR,M[SP]\nADD DR,0xFFFF\nST DR,M[SP]\n}}}\nThe problem with using a stack is that the order in which an operation must be executed, is inverse from the order of the operands on the stack.\n\nIf I want to support a high-level language, then I need to have a stack architecture to compute expressions. This stack architecture will be completely written in software, without any hardware assistance. Remember, the goal is to have a working system, which can compile programs, which can then be measured. From these measurements, actions can then be taken to optimise the architecture.\n!!Arithmetic and logic operations\nFor this we have to distinguish between unary and binary operations.\n!!!Unary operations\nThese are operations which only work on one operand, which in this case is the DataRegister. The following operations are unary :\n* NOT\n* SL\n* SR\n* INC\n* DEC\nThe INC operation is needed for the ProgramCounter. Since this hardware has to be added, we can as well add the DEC instruction, which makes stack operations faster.\n!!!Binary operations\nSince there is only one internal register to operate on, binary operations are always between the DataRegister and memory contents. Since this implies fetching data, we should have the same possibilities as for loading data :\n* OP DR,const\n* OP DR,M[index]\n* OP DR,M[M[index]]\nThe following operations are binary :\n* AND\n* OR\n* ADD\nThese operations could also be combined with the ProgramCounter. The question in that case is, will it simplify the MicroProgram ?\n\nIf an instruction contains a bit which selects the DataRegister or the ProgramCounter, then this only has to be taken out from the instruction register and nothing else must be added to the MicroProgram.\n\nIf, however, instructions implicitly touch one of the registers, then 2 extra lines are necessary in the MicroInstruction.\n!!Branching\nIf it is possible to do certain operations on the ProgramCounter, then it is not necessary to include branching instructions. However, to be able to optimise the MicroProgram, branch instructions must be explicitly specified.\n\nHowever, it is necessary to include the BranchIfZero instruction, because that is the only way decisions can be taken.\n!!Subroutines\nUntil through benchmarking and measurements is known how the current architecture should be evolved to support subroutines, these will be emulated by sequences of instructions which execute subroutine calls and returns from subroutine calls.\n\nSince the architecture must support high-level languages, a stack mechanism is needed. Since there is no register for doing that, the program code must include features for supporting a stack, which is first and foremost a stack pointer. This will just be an absolute memory address which contains another address.\n\nIf a subroutine must be executed, then the value of the ProgramCounter must be saved on the stack, and the program should jump to the subroutine.\n{{{\n; Increment the SP\nLD DR,M[SP]\nINC DR\nST DR,M[SP]\n\n; Store the PC\nST PC,M[M[SP]]\nLD PC,_SUB\n}}}\nIn the last case, the ProgramCounter value stored on the Stack will point to the load instruction. This means that if we want to return, we need to add a constant to the ProgramCounter so that it points to the instruction after the load.\n\nThere is another problem. The StackPointer must be decremented after the return from the subroutine. This means that that code must be added to the subroutine call.\n\nIn that case, the return from subroutine is :\n{{{\nLD DR,M[M[SP]]\nADD DR,2\nST DR,PC\n}}}\nand the previous instruction sequence is extended with :\n{{{\nLD DR,M[SP]\nDEC DR\nST DR,M[SP]\n}}}\nThe return sequence, however, must be analysed at the MicroProgram level to see if using the standard instructions works fine, or if something else must be done because the normal control flow is interrupted.\n!!Instruction size\nThe operation selection code needs to be added to the instruction. This will need four bits, because it will also be\nnecessary to be able to pass operands through the ALU unmodified.\n\nSince there are three addressing modes, two bits are needed, which gives us an orthogonal instruction of 7 bits, of which probably not all instructions make sense.\n\n|Type |Op |Kind|Source|Address|\n|Load | 3| 1| 1| 2|\n|Store | 3| 1| 1| 2|\n|Unary | 3| 3| 1| 2|\n|Binary | 3| 2| 1| 2|\n|BranchIfZero| 3| 1| 1| 2|\n\nDerived from this table, we can see that we need 10 bits, which gives a possible 1024 instructions. However, much of these combinations will probably not make sense, or even just have too much bits, which is a waste.\n\nThe beauty of this scheme is that only the operation and the address source are needed for decoding, and that the kind and source do not need to pass through the decoding ROM.\n\nOn the other hand it is probable that the subset of needed instructions is smaller than 1024 (e.g. there are only three addressing modes), which could be encoded in a smaller field, that is then decoded in parallel with the decoding ROM, and yields direct signals to the necessary registers and ALU operations.\n\nThis might be an advantage, because it is not sure that loading the ProgramCounter directly with a value is safe, but that it is explicitly needed in the MicroProgram that a jump is being taken.\n\nThis means that two things must be done now :\n* Analyse the MicroProgram flow for loading the ProgramCounter.\n* Give a complete enumeration of instructions.\n!!Implementation process overview\nThese are the steps in the process :\n# Create/Adapt hardware architecture\n# Update MicroController\n# Update MicroAssembler\n# Update MicroProgram\n# Update Assembler\n# Update system tests\n# Run system tests\n# Adapt C compiler\n# Recompile testing programs\n# Make new measurements\n# Find out bottlenecks\n# Write spec for update of hardware architecture\n\nThis is of course a big task. However, the initial part will probably be the most work, because it is initial work. At the end of the cycle, it should be possible to save and deliver a working system with all necessary development tools as a complete release.\n\nAfterwards adding modifications to new versions should prove to be much less work, although still not trivial.\n!Targeting a compiler\nFor creating code out of a high level language, the best possibility is to start using LCC.\n\nLCC is a retargetable compiler. Once it is implemented, it should be possible to build libraries for the target machine using uClibc, and from there go on to build simple applications which can be loaded and run.\n\nAnother goal of the system, but which is not emphasised, is to be able to compile a Scheme system for the target machine. SIOD is a likely candidate, which can already be tested under Debian with uClibc. This will give an indication on the size of the system, which can then give a bound on addressing needs.\n\n!Advanced features overview\nThis is not detailed, but provides an overview of things which may be important in the future.\n!!Interrupts\nThis is probably the single most important feature after the first system is up and running. Interrupts break asynchronously into the instruction stream. After we have a running system with an assembler, interrupts should probably be developed.\n!!Hardware assisted memory protection\nThis is indispensable when running multiple processes. It can only be developed after interrupts have been created, because it should also generate an interrupt.\n!!Larger memory\nThis depends on the word length ultimately chosen for the implementation. If it is possible to build a 24 bit or 32 bit system, then there is no upper bound on memory.\n!!Supervisor and user mode\nThis is necessary for protecting hardware access from normal programs.\n!!Virtual memory\nVirtual memory is a way for extending memory by using a peripheral for storage. This is probably the least necessary part for a working educational system.
!Summary of [[r3]] changes\nMost changes in [[r2]]
|Name |Description |Type | Values |\n|JUMP-1 |Jump destination 1 |Integer |0-1023 |\n|JUMP-2 |Jump destination 2 |Integer |0-1023 |\n|MEMORY-ENABLE|Memory access |Boolean ||\n|MEMORY-RW |Read/Write |Boolean ||\n|IR-LOAD |Load IR |Boolean ||\n|DB-IN-LOAD |Get databus value |Boolean ||\n|MAR-LOAD |Activate MAR |Boolean ||\n|DB-OUT-ENABLE|Activate DB out |Boolean ||\n|DB-OUT-LOAD |Register DB out |Boolean ||\n|AD-SELECT |Select MAR address source |Boolean ||\n|DR-ENABLE |Activate DataRegister |Boolean ||\n|DR-LOAD |Load DataRegister |Boolean ||\n|PC-ENABLE |Activate ProgramCounter |Boolean ||\n|PC-LOAD |Load ProgramCounter |Boolean ||\n|OP-SELECT |Select ALU OP source |Boolean ||\n|ALU-uC-OP |ALU operation |Integer |0-7|\n|CONDITION |Conditionally select address|?||\n\n|Name |Description |Type | Values |Bits|\n|JUMP-1 |Jump destination 1 |Integer |0-1023 | 10|\n|JUMP-2 |Jump destination 2 |Integer |0-1023 | 10|\n|MEMORY-ENABLE|Memory access |Boolean | | 1|\n|MEMORY-RW |Read/Write |Boolean | | 1|\n|IR-LOAD |Load IR |Boolean | | 1|\n|DB-IN-LOAD |Get databus value |Boolean | | 1|\n|MAR-LOAD |Activate MAR |Boolean | | 1|\n|DB-OUT-ENABLE|Activate DB out |Boolean | | 1|\n|DB-OUT-LOAD |Register DB out |Boolean | | 1|\n|AD-SELECT |Select MAR address source |Boolean | | 1|\n|DR-ENABLE |Activate DataRegister |Boolean | | 1|\n|DR-LOAD |Load DataRegister |Boolean | | 1|\n|PC-ENABLE |Activate ProgramCounter |Boolean | | 1|\n|PC-LOAD |Load ProgramCounter |Boolean | | 1|\n|OP-SELECT |Select ALU OP source |Boolean | | 1|\n|ALU-uC-OP |ALU operation |Integer |0-7 | 3|\n|CONDITION |Test condition |Boolean | | 1|\n|COND-TRUE |Which address if zero = true |Integer |0-3 | 2|\n|COND-FALSE |Which address if zero = false|Integer |0-3 | 2|\n
!Sidetrack : Common Lisp EVAL-WHEN\nIt is possible to detect in which phase CL is, by using the special operator EVAL-WHEN.\n\nThis operator obeys the following symbols :\n#:compile-toplevel\n#:load-toplevel\n#:execute\n
|Load DR|Immediate|\n|~|Direct|\n|~|Indirect|\n|Store DR|Direct|\n|~|Indirect |\n|Store PC|DR|\n|~|Direct|\n|~|Indirect|\n|Branch|Direct|\n|~|Indirect|\n|BranchIfZero|Direct|\n|~|Indirect|\n|Arithmetic|Immediate|\n|~|Direct|\n|~|Indirect|\n|System|STOP|\n
Previous entry : [[25 January 2007]]\n!Progress report\n!!Status of SYS-2 simulation\nAfter two weeks of relative inactivity, mostly spent on learning macros and rehearsing recursion, I was finally able to complete the {{{create-accessor-functions}}} macro, which creates from a list of data abbreviations for all the connections in the {{{sys-2-struct}}}.\n\nThe next step is to add the controller connections, and to define a macro for defining meaningful names for the connections of the controller pipeline register.\n\nAfter that it should be possible to define a function which initialises and returns a fully functional SYS-2 environment, for which it is possible to modify the microcode, the jump table and the memory contents.\n
!Component count estimation for 12-bit and 16-bit implementations\n\nThe next table provides an overview of the SYS-2 components, how much there are needed for a 12-bit or a 16-bit implementation, and consequently the real estate needed to implement it. Currently, the estimation will be based upon TTL, but it could be augmented with PAL, PLA and FPGA.\n\n|Component name|Width |Footprint|Number|12-bit|Total|16-bit|Total|Comment|\n|Controller |41 bits|DIP-24 |6 |1 |6 |1 |6 ||\n|Pipeline register |41 bits|DIP-20 |6 |1 |6 |1 |6 ||\n|uAdres mux |10 |DIP-14 |5 |1 |5 |1 |5 ||\n|uAdres counter |10 |DIP-14 |3 |1 |3 |1 |3 ||\n|Incrementer |10 |DIP-14 |3 |1 |3 |1 |3 ||\n|Condition |2 bits |DIP-16 |4 |1 |4 |1 |4 ||\n|Jump-rom |10 bits|DIP-24 |2 |1 |2 |1 |2 ||\n|Splitter |8 |DIP-16 |1 |1 |1 |1 |1 ||\n|OP-SELECT |4 |DIP-16 |1 |1 |1 |1 |1 ||\n|ALU |4 |DIP-20 |1 |3 |3 |4 |4 ||\n|DR |4 |DIP-16 |1 |3 |3 |4 |4 ||\n|PC |4 |DIP-16 |1 |3 |3 |4 |4 ||\n|AD-SELECT |4 |DIP-16 |1 |3 |3 |4 |4 ||\n|MAR |4 |DIP-16 |1 |3 |3 |4 |4 ||\n|DB-OUT |4 |DIP-16 |1 |3 |3 |4 |4 ||\n|DB-IN |4 |DIP-16 |1 |3 |3 |4 |4 ||\n|IR |4 |DIP-16 |1 |3 |3 |4 |4 ||\n|>|>|>||Total|55|Total|63|>|\n\nThis is a minimum. It is necessary to add some additional circuitry for driving multiple parts. However, these are mostly part of the controller, and thus make the relative difference between a 12- and 16-bit implementation smaller.\n\nA 12-bit implementation is nice for simulation, but the difference between a 12-bit and a 16-bit implementation is too small to not choose for a 16-bit implementation if starting work on one.
!Roadmap\n!!Current progress\nThe microprogram is ready, and I have reviewed it. This took out a whole lot of errors already. The following part is to enhance the system test harness, to be able to run it with several parameters :\n* The microprogram file\n* The decoder ROM\n* A main program\n* The number of steps to be executed\n* A list of fields to be shown in the output trace\n!!Overview of changes after the current system works\nThis is not necessarily in chronological order, but a list of enhancements to be done. Every enhancement needs a corresponding change to the microprogram. It is preferable, once each enhancement is finished and tested, that it be archived under a new tag.\n!!!Last changes in the current phase.\n* Change MAR to register, instead of using latch.\n* Create a controller with single register subroutine.\n* Create a display interface peripheral.\n* Create a keyboard interface peripheral.\nOnce this part is finished, this architecture can be archived, and the project should enter a new phase.\n!!Changing the architecture\nHowever, we will stay in the current 12-bit model.\n* Create a 74LS670 component\n* Add a 8x12 register file.\n* Update the instruction set, taking into account the use of extra registers.\n** PC is register 0.\n** Add a stack pointer, register 1.\n** Make operations between registers possible.\n** Look into indirect addressing vs. register addressing.\n** Look into register relative addressing.\n* Add interrupt processing\n* Change the keyboard and display peripherals to be interrupt driven.\n* Add a timer peripheral which generates interrupts.\n!!Code generation\nIt should be possible to create a gcc code generator, and have knowledge of this technology, to generate code for the created architectures. The main goal is not speed, but dense code.\n\nIn order for this to be helpful, there should also be some C benchmark programs which can be used to generate code and measure size and speed of it.\n!!Extending the width of the architecture\nThe next logical step would be to define an architecture which has a larger width in both the data and address sizes.\n\nThe main problem with extending the address sizes, is that either we need\n# to extend the data size accordingly\n# perform multicycle operations on addresses\n# manipulate paging registers\nPoint 1. means we need to use more resources, but when I get to this stage, I will possible already have gone for implementation on FPGA or CPLD.\n\nPoint 2. requires a modest amount of resources. Addresses would need to be loaded in two memory cycles (microprogram), an extra MAR (hardware) would be needed, saving of the carry would be needed to be able to add long integers (hardware & microprogram). Logically, it would mean the least problems. There would be a drop in speed, because of multi-cycle address calculations, but this could be offset by speeding up long integer calculations.\n\nPoint 3. requires the least resources, but an overlying virtual machine is needed in order to be able to program the system easily.\n\nIn case of points 2 and 3, we could stay with the 12-bit model and goto a 24-bit address space, which is large enough to program a whole lot of things. In case of point 2, there are two solutions to the register problem.\n\nEither it is solved like in the 8080, use standard 12bit registers, and have only half 24 bit registers, or use 16 registers, but have only half of them useable, both as 12-bit or 24-bit registers. This complicates register access, but gives a faster system.\n\nIn any case, the amount of memory needed for this platform depends upon the ability of the compiler backend to generate dense code, which runs relatively fast due to the optimisations of the underlying instruction set.\n\n
!Work breakdown for system harness\n!!Requirements\nIt shall be possible to run the system using three files as parameters.\n\nThese files shall contain the microprogram, the decoder ROM and the application program.\n\nThe contents of the three files shall be in a form that can be easily used as a data structure.\n!!Analysis\nHere the process steps needed to go to the end result will be elaborated.\n!!!Microprogram\nThe process starts with the microprogram. This should deliver three data structures which can be used further :\n# The microprogram, which is immediately usable.\n# A list of microprogram entry points.\n# A list of constants, which is more needed for documentation.\nThe basic microassembler can deliver this as computed data, but there should be an application which can assemble the input and write the results in three separate files.\n!!!Decoder ROM\nThe list of microprogram entry points should be combined with a list of instructions to generate an array of addresses, which is immediately usable as an argument to the system setup.\n\nThere should be an application which can use two files, a hand coded input which maps instruction codes against labels, and the file generated by the microassembler, to combine this into a one dimensional array which is indexed by the instruction codes, and gives the value of the matching label.\n!!!IPL\nIPL should be delivered as a one-dimensional array of (unsigned-byte 32). Initially this can be hand assembled, but an assembler should certainly be written.\n
!Progress report\n!!SYS-2 structure initialisation\nThis is a part that is is more work in volume, than work in depth. It consists of enumerating the connections that have to be made, and of course the more complex the design, the more work there is.\n\nLuckily, the CL macro system helps us. It has made it possible to create abbreviations for all component connections, which reduces errors in creating the connections themselves (which is task that needs to be done by the designer. Here the computer cannot help).\n\nThe following step is to make it easy to set the connector comments to their name.\n\n!!Help in creating connector comments\n!!!Problem description\n\nWhen creating the connector aliases, two things are done. The first one is that a (progn) is created which then creates all the alias functions for all the connections in the SYS-2 structure.\n\nThese functions take one argument, a structure of type SYS-2-STRUCT. When using them, the particular connector is returned.\n\nHowever, another list is built in parallel, which contains all the names of the aliases. With this list, it should be possible to do the following.\n\nWhen the SYS-2 initialisation function is called, it should be possible to call a function with the SYS-2 structure as its argument and the previously built list, and then execute\n{{{\n (setf (alias sys-2) (symbol-name alias))\n}}}\nThis is now done using eval, but this is a bad solution.\n\nThe result of the macro must be a single function, which takes a SYS-2-STRUCT as argument, and names the individual connections through their aliases.\n{{{\n(defun comment-connections (sys-2)\n (setf (alias sys-2) (symbol-name alias))\n (setf (alias sys-2) (symbol-name alias))\n (setf (alias sys-2) (symbol-name alias))\n (setf (alias sys-2) (symbol-name alias)))\n}}}\nor\n{{{\n(defun comment-connections (sys-2)\n (dolist (alias '(list with aliases))\n (setf (funcall (first alias) sys-2) (symbol-name (second alias)))))\n}}}\n\nHowever, in the second part, the aliases should be anonymous functions. This means that the list of aliases should be transformed into a list of lambda's and the aliases.\n{{{\n(a1 b2 c3) -> (\n (lambda (sys-2) (setf (a1 sys2) (symbol-name a1))\n (lambda (sys-2) (setf (b2 sys2) (symbol-name b2))\n (lambda (sys-2) (setf (c3 sys2) (symbol-name c3))\n)\n}}}\n
!Progress report\n!!Bugs\nFound an oversight in the design : it should be possible to set the contents of the ProgramCounter to 0.\n\nA possible solution is to shift the PC left, filling it with zeroes from the right, while the reset signal is high. Another one is doing the same after reset, until ZeroStatus is reached for the PC.\n\n!!MicroAssembler\nStarted work on the MicroAssembler. The main documentation is in a noweb file, from which a document and a source file can be extracted.\n\nDefining the control lines seems trivial. This can be done through a function which processes its arguments, which define the control lines.\n\nDefining [[MicroInstruction]]s is also easy, even after defining some rules as what can constitute in a MicroInstruction.\n\nCreating reusable MicroInstructions is also easy, because they can just be defined through another function. This way, they can even use parameters.\n\nHowever, common patterns of individual MicroInstruction assignments are difficult or maybe even impossible.\n\nHowever, is that really necessary ? The ultimate goal is to shrink the MicroInstruction word size. By defining and reusing complete [[MicroInstruction]]s, this goal can be achieved.\n\nSummary : do not search further for substitution of small patterns, but only implement complete [[MicroInstruction]]s or replace reused patterns with parametrised functions.\n!!Technology choices\n[Reference : www.interfacebus.com]\n\nIt seems that there have been some changes and breakthroughs in CMOS logic, so I might as well try to find a current CMOS manual in one of these ranges :\n* ACT\n* ACTQ\n* ACQ\n* FCT-AT\n* AC\n* ABT\n* BCT\nThat these technology choices are good enough, is proved by Dennis Kuschel's myCPU, which can be run at 8MHz. My feeling says that that is probably the maximum speed that is easy to obtain using SSI- and MSI-logic components.\n!!ALU implementation\nThis is also a challenge. None of the standard ALU's has shift capabilities. What's more, shifting is something that uses up much gates.\n\nHowever, it is possible to build a simple left-right shifter using 4-1 multiplexers.\n\nThe ALU would then change to have its output sent through the shifter. This is easy. Adding the shifter in parallel to the ALU would introduce more connections and problems with both outputs on the same\nbus.\n\nIt could also be possible that I have to create ALU's by using PALs or PLCs.\n!!Things still to do before creating the SYS-2 component\nThe following things still need to be implemented and tested :\n* Instruction splitter.\n* Conditional logic.\n* Zero status detection.\n* Jump ROM\nIt is better to do this first, and then to develop the MicroProgram and the SYS-2 component in parallel in order to test them.\n\nSeveral diagnostic [[MicroProgram]]s should be written in order to test the SYS-2 component. Using the emulator, it is easy to upload new MicroCode and do test runs of the system.\n\nIt should be possible to write a whole stack of diagnostic routines for each component of the system and their interactions.
!Defining components and test cases\n\nIt should be possible to have standard tests for the component library. This test is needed to ensure that every component used in the assembly of computer systems is correct.\n\nThe basic components to be tested are :\n* [[connector]]\n* [[bus]]\n* [[clock]]\n* [[single-shot]]\nThe [[connector]] and the [[bus]] are used by every [[Component]], and thus should be rigorously tested, and where possible, agressively optimised.\n\nThe [[clock]] and the [[single-shot]] are [[Component]]s which do not need input from other components, but which deliver output for other components. They are normally only activated once per SimulationStep, and since they are relatively simple, their system time is relatively small compared to the rest of the system.\n\nFor implementing the current design, the following components are next for reviewing and testing :\n* [[latch]]\n* [[multiplexer]]\nA [[counter]] is currently not needed anymore and can stay on its current implementation.\n\nWhat must now be taken into account, however, is the simulation of an [[ALU]]. Included in the [[ALU]], there must also be a mask to restrict the length of the output word.\n\nThe next component in the hierarchy is the [[memory]]. Memory is only restricted to [[word]]s. When necessary, it is probably possible to create byte addressed memory. Memory does not need a mask to restrict its output.\n\nThe component which glues everything together is the [[controller]]. This needs to be tested in conjunction with other components. The controller also needs other surrounding components for instruction decoding. The instruction register should be a special case of a [[latch]], but with outputs which are split of from the main instruction word. Part of this should be decoded from an instruction ROM, which really should be a simple lookup table or array, with an input [[connector]] and an output [[connector]].
!An answer on comp.lang.lisp\nYeah, I think so too, but...\n\nI have this project where I am designing and implementing a\nmicroprocessor. All design is verified through a simulation which I\nwrote in Common Lisp. Because of this, I can implement whatever\ninstruction set I want.\n\nBy having this possibility, I have already been thinking for quite\nsome months how feasible it would be to implement a C compiler or a\nLisp system on the ISA, and what things could be done to the\ninstruction set to help in this regard.\n\nThere can be much found in a whole lot of sources :\n\n- Computer Architecture : A Quantitative Approach (Patterson,\nHennessy)\n- Computer Organization and Design: The Hardware/Software Interface\n(Patterson, Hennessy)\n- Information about Lisp Machines (Wikipedia)\n- Information about the Master Control Program on the B5000, which was\nwritten in a high-level language\n- Movitz\n\nThinking about the issues, I see two problems :\n\n1. Spanning the gap between assembler and your not-C high-level\nlanguage.\n2. Implementing a core library which can be used to communicate\nbetween your high-level language and your processor.\n\nIf you read the first two books in the list, then you will understand\nwhy C (don't know about C++) is used for these tasks. There is a body\nof knowledge about porting and building optimising C compilers for\nmodern microprocessors that is huge. This does not exist for other\nlanguages. Which means that for point 1, any other language that you\nuse will not be able to take advantage of the things that C compilers\ncan take advantage (except for the GNU compilers maybe, I do not know,\nwhould it be possible to use GNAT to write an OS ?).\n\nThe core library of point 2) should be written in assembler, and\nshould only take into account the things needed for low-level access\nto the system and return structures which can be used and manipulated\nimmediately by the supposed HL language, or do things through the\nlibrary calls, which do not even have an equivalent in a HLL, like\nswitching between CPU access levels (user/supervisor).\n\nThink about this bootstrapping part. It is very difficult and\narchitecture dependent. Of course, what Linus could do with C, someone\nelse might do with another language. However, C was already well\nestablished, and if you know the structure of your object files, then\nit is relatively easy to manipulate them into a binary image, which\nyou can supply to your computer's boot loader and take control of the\nhardware. I think that this is the most difficult part in other\nlanguages, to be able to generate object files which do not have any\nother dependencies.\n\nIs it possible in CL to generate a new system with features removed or\nchanged like you can in FORTH ? I suppose that this should mean that\ncertain key sources would be written and interpreted in Common Lisp. I\nknow that SBCL e.g. has parts written in assembler. Would it not be\nbetter for the sake of portability if this where done in a platform\nneutral way, and write per architecture an assembler for this code in\nLisp ?\n\nI do not think I have answered your questions per se, but I hope I can\ngive you an insight into why things are now as they are, and some\npossible directions to look into. \n\n!!Follow-up : Movitz\nWhen, after this answer, I searched for Movitz, I discovered that they effectively do what I proposed here. The minimal library system between the bare metal and the CL implementation is called Muerte.
!Intermediate evaluation of reached design goals\nSince the journal of [[7 December 2006]] the main architecture definition is frozen. This has lead to the definition of the instruction set and the necessary hardware architecture, which consists of at most 10 datapath elements.\n\nA reconciliation between the data- and control paths must now be reached to start implementation of the microprogram for the current architecture. No interface and implementation changes are allowed from this point onwards. The goal is now definitely to have a working implementation of a self designed computer system, in which the following goals have to be implemented :\n* A micro assembler\n* An assembler\n* Visualisation\nIf these things can be done, then a working system has been reached, on which it should be possible to write and execute programs, and which can be controlled and visualised through a front-end on a host computer.\n\n\nThe step after that should be to define components for input and output, so that much more interactive usage of the system is possible.\n!!Missing components\nDerived from the schematic of the datapath, the following components are still missing :\n* 2-1 multiplexer\n* ALU\n* 3-state latch\n!!Micro-assembler theory\nA micro-assembler is a program which makes it possible to use symbolic instructions to define a MicroProgram.\n\nEvery MicroInstruction consists of fields which control all components. This means that a MicroInstruction can be constructed from fields which have the names of the different components from the datapath. These fields should be ordered in the way that the MicroInstruction should appear at the output of the MicroController.\n\nEvery field has some values, and every field should have an inactive value. A MicroInstruction can then be assembled by specifying the fields and their values, which are basically commands to selected components.\n\nThese lists of commands can be put together to form larger commands, but then it should probably be possible to substitute values from parameters.\n\nThe biggest problem in assembling different fields together is the fact that several things can be executed at the same time. At that point, how is a micro-instruction the defined ? This is possibly a case of starting to write the MicroProgram, and then detecting the patterns which pop up, and summarise these into meaningful MicroInstructions.
!Annotations to CPU2.lisp\nThese annotations are ways of organising code to make a more data-driven development of the system possible.\n\nThe first part of this data-driven development is the *alias-database*, which compiles abbreviation functions for all interconnections. However, it currently lacks the possibility to create abbreviations for the controller address inputs and the controller pipeline register outputs.\n!!Annotation 1 : let the database support all components\nIn both cases, it is about using {{{aref}}} to a structure part of the component. The only thing different from a normal alias is that an index is supplied.\n\nThis means that the *alias-database* could be set up using elements with length 3 or 4, in which case the latter has an extra index, and the alias must be compiled with {{{aref}}}.\n!!Annotation 2 : Let the program create the connections\nThis should also be done using a database which enumerates all the connections via the created aliases.\n\nThere is one catch to it : buses have a different way of connecting. The output works the same way as other components, but inputs are connected through the function bus-add-output.\n\nConnecting an input and an output together consists of calling the respective aliases on the system data structure and supplying them to the connect function.\n\nConnecting an input or an output to the bus consists of supplying the object to the function bus-add-output, which then adds it to its internal array and creates the connections internally. Maybe two connect functions should be created, one for connecting real connectors and one for adding a connector to a bus.\n{{{\n(defun connect (c1 c2) ())\n(defun connect (c1 b1) ())\n}}}\n!!Annotation 3 : Let the program create the system structure\nThe structure created to hold the system architecture is not difficult. Instead of creating it manually, it should also be possible to create it through a macro by enumerating the components in a list. This would also make it possible to automatically have every component update its comment field (through the :comment keyword when making all the individual structures).\n!!Summary of data-driven development\nThe idea has its merits, and since it has already been implemented partially, it certainly can work.\n\nStarting from an existing library with components, it should be possible to create a system in the following way.\n{{{\n(defmacro create-system-structure (name list)\n (#| code here |#))\n\n(defvar *component-table*\n '((component-name component-type)\n (component-name component-type)))\n\n(create-system-structure name *component-table*)\n\n;; This should expand to\n\n(defstruct name\n (component-name\n (make-component-type :comment "COMPONENT-NAME"))\n (component-name\n (make-component-type :comment "COMPONENT-NAME")))\n}}}\n\nAfter that, aliases should be created. In the current version, the alias table should be composed of three columns, the component-name, the alias name and the accessor for the connection on the component.\n\nBy using the *component-table*, the compilation of the aliases can be checked against the component names, while the aliases themselves are checked against the accessors of the components.\n\nThis step will compile functions which can be used as abbreviations to access component connections in the previously defined structure.\n\nThe third step is to specify the interconnections between the components. This will be done in a third table where the interconnections should be specified as pairs of aliases (remark : find a solution for buses). The contents of this table can be checked against the first and second tables. Processing of the third table should yield a function which can be used to initialise the previously defined structure.
!A simpler and faster microcontroller\nAfter spending some days studying the optimisations that can be done to a microcontroller, I think I have finally nailed down the requirements for a new implementation.\n\nIt is much simpler and only contains the really necessary code. Parts that where previously optional have been delegated to outside the component, and should be implemented themselves as components.\n\nThe most important part outside the MicroController now is the external logic, which should deliver a signal between 0 and 3 to select the address source.\n\nIn the current architecture, there should be only one status signal, to signify that a result is 0.\n\nThe control of the external logic would then be governed also with two bits with the following meaning :\n\n* 0 : output 0 to select the MicroProgramCounter.\n* 1 : output 1 to select the external jump address input after instruction decoding.\n* 2 : If the logic input is 1, output 2 to select the conditional jump address, otherwise output 0 for the MicroProgramCounter.\n\nThe value 3 will currently not be used, but it is possible to extend the MicroController with another jump address. However, this will only be considered in the future, after careful measurements.
!Design of a software clock simulator\n!!Goal\nWhen creating a new test harness for the [[clock]] component,it was noticed that the current implementation was not completely correct when implementing signals with asymmetric duty-cycles. This journal entry tries to remedy this by looking at the clock parameters and deriving a new way for implementing a periodic pulse train.\n!!Clock features\nThe purpose is that the clock is able to generate periodic pulses between 0% and 100% duty-cycle.\n\nThe parameters of the clock which define this are the clock period and the duty-cycle.\n\nThe clock period is the time needed to move from the beginning of pulse N, to the beginning of pulse N+1.\n\nThe duty-cycle is the number of clock pulses that are high, versus the period. This can be translated into the sum of the time the clock is high and the time the clock is low.\n\nNormally, the duty-cycle would be passed as an floating value between 0 and 1. We are, however, working with discrete values, and our clock can be as fast as one pulse high, followed by one pulse low.\n\nTherefore, the duty-cycle will be expressed as the number of [[SimulationStep]]s the clock is high.\n!!Clock setup\nThe most general way to set up the clock is to specify the counts of its high cycle and its low cycle. Its period is then the sum of these numbers.\n\nFrom this, a method must be derived for the system to drive the clock output the appropriate number of times high, and the appropriate number of times low.\n!!Clock functionality\nMathematically, the output of the counter could be expressed as :\n{{{\nf(x) = 1 | x = [1, 2, 3, ... M]\nf(x) = 0 | x = [M + 1, M + 2, M + 3, ... M + N]\n}}}\nIn this case, a counter should be incremented and its value compared like {{{v >= 1 and v <= M}}}, and also as {{{ v > M and v <= M+N}}}.\n\nThis can, however, be simplified by only testing using the second clause. If the counter falls outside, then f(x) is high, otherwise it is low.\n\nIf the clock is initialised, then the counter can be set to one.\n\nWhen computing, first the value of the output will be decided, using the previous comparison.\n\nAfter that, the counter will be incremented. If the value of the counter exceeds M+N, then it is reset again to 1.\n\nHowever, this can be turned around. As long as the counter is smaller or equal to M, the output is 1, else it is 0. The update of the counter still needs to be done after this.
!Minimal changes to current implementation\nThe current system was designed to use part of the instruction for commanding the [[ALU]]. When the MicroProgram was written however, it was found that the current implementation could do without this feature, and thus save some real estate.\n\nDue to this, three bits in the instruction can be spared, which means that only 5 bits are needed for instructions. This leaves seven bits usable as data or address.\n\nHow can this be exploited to its fullest ?\n\nThe first thing that meaningfully can be done, is to make sure that the output of the DB-IN register can be masked out, so that the 7 bits available from the IR (both should be loaded at the same time) can be used as input to MAR, to PC or DR, or as input to the B input of the ALU.\n\nWhat will happen of course, is that either extra addressing modes are added, which increases the number of instructions and reduces the number of available bits to 6, or addressing modes are changed and only use short addresses. Increasing addressing even with one extra mode, makes it necessary to go to 6 bits.
!BranchIfZero\nStill working on the ZeroStatusHandling and BranchIfZero issues.\n\nThe last result was how to check for BranchIfZero. The problem is that when doing a computation, the result can be zero. If there is only one register available, this result can be tied to the register. However, if there is a register file available, then there should be an explicit check for zero.\n\nChecking for zero is a function of a NOR gate, which ands together all inputs on the bus and delivers a signal Zero. This is normally done at the output of the ALU, but in our case the goal is to separate the computation from the test. This means that it is not necessary that the output from the ALU is tested, but that we can test the contents of the desired register. We can do this by reading the register(file), which is much faster than passing the register through the ALU. The ZeroStatus is then very quickly available to the conditional jump logic, which means the test can select the next MicroInstruction in the current MicroInstructionCycle.\n\nIf we then constrain the system to only having the possibility to do a MicroInstruction BranchIfZero, this means that in case the output is Zero, we need to branch, otherwise we just continue.\n\nThe MicroInstruction layout which is needed now is the one to implement the InstructionSet BranchIfZero. This instruction tests the selected register for ZeroStatus, and if it is Zero, uses the contents of the next memory cell for loading into the PC, otherwise the PC is incremented and the next instruction is fetched.\n# Move PC to MAR; PC <- PC + 1 ;; PC points at jump addres\n# Output from memory is latched onto databus and into IR\n# Next MicroInstruction address is selected from decoding IR\n# Selected register is tested for Zero Status (select uPC(NotZero) or JumpField(Zero))\n# PC <- PC + 1; Jump to instruction fetch (branch not taken : 5 cycles)\n# Move PC to MAR (branch taken)\n# Output from memory is latched onto databus (branch taken : 6 cycles)\n# Databus -> MAR; Databus + 1 -> PC; Jump to instruction latch\nIf we lay this out using a branch ifnot zero, what do we get then ?\n# Move PC to MAR; PC <- PC + 1 ;; PC points at jump addres\n# Output from memory is latched onto databus and into IR\n# Next MicroInstruction address is selected from decoding IR\n# Selected register is tested for Zero Status (select uPC(Zero) or JumpField(NotZero))\n# Move PC to MAR (branch taken)\n# Output from memory is latched onto databus (branch taken : 6 cycles)\n# Databus -> MAR; Databus + 1 -> PC; Jump to instruction latch\n# PC <- PC + 1; Jump to instruction fetch (5 cycles)\nThis does not really make a difference, which means that it is safe to restrict the MicroInstructionSet to only use BranchIfZero.
It's been a while since I last wrote a journal. In the meantime I have succeeded in writing a micro assembler and running the simulation of SYS-2.\n\nA consequence of the simulation is that it pointed out that the way the memory was controlled was wrong.\n\nThe first implementation made the address and the control lines active at the same time, but due to the delays in the circuit to the memory address register, the memory already got written to before the real address was available.\n\nThis has the consequence that the way the memory is addressed has to be rethought.\n\nThe first way is a regrouping of the necessary steps, by using shorter clock periods. The first part of the cycle starts with the rising edge, which puts out the control signals.\n\nThis\n# enables the PC output (tEN)\n# selects the PC output (tDEL)\n# enables the MAR (tSETUP+tDEL+tHOLD)\nIn the same time the signal must go through\n# op select (tDEL)\n# ALU (tDEL)\n# setup for registry (tSETUP + tHOLD)\nIn the second phase, on the rising edge the memory enable signal is activated. If read stays low, then the memory is read, and the data must be available before the falling edge of the clock.\nIf the memory must be written, then the data comes from one of the registers, and must pass through the DO latch. The enable signal stays active until the next clock cycle.
!Compile time system structure code\n!!Introduction\nThe current way of working for creating a usable SYS-2 library should be streamlined.\n\nCurrently there is a SYS-2 structure definition available, but its initialisation needs some extra code, which is code that can be reused from another module.\n\nThis streamlining process should be put up and hold also for future developments.\n!!Structure code\nSetting up a new structure should be easy and always be done in the same way.\n#Import the component module\n#Define the system structure using standard components\n#Define a list of aliases for connections\n#Create a connection database\n#Compile the aliases into functions\n#Compile the alias database and the connection database into a function taking a system structure, and initialises it by\n##Commenting all the connectors through aliases.\n##Connecting all the specified connections with each other.\nIt is then up to the programmer to define additional functions for the needed functionality.\n\nCan this be specified through a complete external template ?\n{{{\n(load #P"file")\n\n(use-package "components")\n(use-package "system")\n\n(defstruct system-structure\n ()\n ())\n\n(defvar *alias-database*)\n(defvar *connection-database*)\n\n(system:setup-initialisation\n *alias-database*\n *connection-database*\n initialisation-function-name)\n\n(defun system-structure-new ()\n (let\n ((rv (make-system-structure)))\n (initialisation-function-name rv)\n (values rv)))\n\n(defun system-structure-compute ()\n )\n\n(defun system-structure-propagate ()\n )\n\n(defun system-structure-cycle ()\n )\n}}}
!Where to go from first successful test case ?\nAs a first test of the system, after modification of the ROM generator program, the HALT program was a success.\n\nThe next step should be to write test sequences, first with simple instructions, later progressively increasing in complexity. Each test sequence should deliver results in memory locations, which can then be tested after the simulation ends.\n\nIn the meantime, designing and building an assembler will also make these things somewhat easier. This is a project for tool_R0.4.
!Optimising branching\nThis is the previous way the conditional branch has been laid out.\n# Move PC to MAR; PC <- PC + 1 ;; PC points at jump addres\n# Output from memory is latched onto databus and into IR\n# Next MicroInstruction address is selected from decoding IR\n# Selected register is tested for Zero Status (select uPC(NotZero) or JumpField(Zero))\n# PC <- PC + 1; Jump to instruction fetch (branch not taken : 5 cycles)\n# Move PC to MAR (branch taken)\n# Output from memory is latched onto databus (branch taken : 6 cycles)\n# Databus -> MAR; Databus + 1 -> PC; Jump to instruction latch\nBy overlapping things in the system, it seems possible to overlap parts, especially if the address select mechanism in the system can be laid out like in the microcontroller, ie. by using a multiplexer to load the PC from alternative sources.\n\nHow can the previous MicroInstruction sequence be arranged like this ?\n# Move PC to MAR; PC <- PC + 1 ;; PC points at jump addres\n# Output from memory is latched onto databus and into IR\n# Next MicroInstruction address is selected from decoding IR; PC -> MAR; PC + 1 -> PC;\n# Selected register is tested for Zero Status (select uPC(NotZero) or JumpField(Zero)); Latch DB\nAt this point, the data available on the databus is the jump address, and the ProgramCounter contains the address after the jump instruction. If the address loading mechanism is adapted, then in the following cycle either the current program counter is selected, or the jump address is selected. In both cases the selected address is incremented by 1 and loaded into the ProgramCounter. The next instruction can be fetched. This means that the BranchIfZero can always be executed in four cycles.\n\n[img[Simple Dataprocessor|datapath_3_mux_1.png][datapath_3_mux_1.png]]\n\nIncluding the multiplexer at the MemoryAddressRegister does not really add very much to our architecture, but makes it possible to execute overlapping instructions.\n\nWhen executing normal instructions, this means that we can have an overlap every time of instruction fetch (2 [[ClockCycle]]s), and InstructionDecoding and InstructionExecution.\n\nWhen taking interrupts into account, the biggest problem with them is that they are asynchronous. Since our system is now effectively pipelined, interrupts pose some more problems. In the previous, unpipelined case, it is relatively simple to sample the interrupt at the end of the instruction and select a branch instruction if there is an interrupt.\n\nThe problem in the overlapping case is that the MicroProgram is rigid. If parallel fetch and execution is encoded everywhere, it is difficult to take interrupts into account.\n\nAt the same time there is either an address out and an instruction decode going on, or an instruction fetch and an instruction execute.\n\nIn the first case, PC - 1 is decoded, PC is sent to the addressbus, and PC is also incremented.\n\nIn the second case, the instruction at PC - 1 is executed, PC contains PC + 1, and the instruction at PC is being fetched.\n\nThe biggest problem in the interrupt is saving the current program counter somewhere and jumping to the address of the interrupt handler.\n\nSo what is the best place to make the system react to an interrupt ? An interrupt can be likened to a conditional branch, but the state at the moment the interrupt is serviced should be restorable. This means first and foremost the value of the program counter. The rest can be done by the interrupt handler itself.\n\nThe best state then to react to an interrupt is in the first case, because no work is being done and the system is in a known state. When the interrupt is handled, this can be used conditionally to disable the incrementing of the program counter. The only thing that is needed then, is to jump to the MicroProgram address which saves the program counter and executes the interrupt handler.\n\nIn the second case, the interrupt is sampled.\n\nIn the first case, PC - 1 is decoded, but PC is not sent to the addressbus and not incremented.\n\nIn the second case, the instruction at PC - 1 is executed, PC stays PC and no instruction is being fetched. The next microinstruction should be the that of the InterruptHandling. That part will define how interrupts can be handled, but [[Interrupts]] at present will not be elaborated further, because the current goal is to keep the architecture still as simple as possible. Currently adding interrupts will add too much complexity at once.\n\n
!Summary of r2 changes.\nIt makes no sense to have point versions of the cpu development process. When the current version is completed, it will be [[r1]]. [[r2]] will be an adaptation of r1, mainly in the hardware department, not in the functionality department. Before adding an interface to a keyboard and a screen, this should be settled.\n\n!!Birdseye view\n* Remove JMP-SRC-2\n* Remove JUMP-2\n* Remove OP-SELECT component\n* Remove OP-SELECT control line\n* Change MAR from latch to register\n* Consider removing AD-SELECT and using 3-STATE register for DB-IN\n* Redefine conditions\n* Replace field splitter by 5-bit mask\n* Create logic to generate write pulse from MEMORY-ENABLE and MEMORY-RW\n* Update the microprogram for pipelined execution with MAR\n\n!!Elaboration\n!!!Remove JMP-SRC-2\nPractice shows that the design does not need an extra address. The ability to choose between PC and another address, and inverting the branching test is enough. Even inverting the branching test can be removed, at the expense of extra micro-code.\n!!!Remove JUMP-2\nThis address is not needed anymore. See the previous point.\n!!!Remove OP-SELECT component and control line\nThere is no selection done between instructions from the micro-program and the instruction register.\n!!!Consider removing AD-SELECT and using 3-STATE register for DB-IN\nSince DB-IN has the same targets as the registers, it should be possible to interconnect them without the multiplexer which is now used. This saves again in micro-instruction space and real estate.\n!!!Change MAR from latch to register\nThe basic operation for moving addresses to the MAR has been to do this in the first cycle, and only to activate the memory in the second cycle. Replacing the latch with a register does not change this fact, but it makes it possible to generate a new address for MAR while another memory cycle is busy.\n!!!Redefine conditions\nThe following conditions are needed :\n* Select the program counter\n* Select the ROM decoder\n* Select the supplied address\n* If zero, branch to the supplied address, else branch PC\nSince there is no BRNZ operation, it is not necessary to add a test at this time.\n!!!Replace field splitter by 5-bit mask\nPractice shows that encoding the ALU operations in the instruction is not practical, or that it needs a more elaborate way of dealing with it. Since it is not used, the splitter can be removed and replaced by a mask system.\n!!!Create logic to generate write pulse from MEMORY-ENABLE and MEMORY-RW\nSince the memory address and the control lines come at the same time active, there must be some circuitry which is able to generate a write pulse from the memory control lines. This is necessary to isolate one write cycle from another.\n!!!Update the microprogram for pipelined execution with MAR\nThe point here is that by using a register instead of a latch, it is possible to have more overlap between external memory fetches and internal housekeeping.\n\nThe reason is that currently a memory access now takes up two cycles, which takes 12 steps. If it is possible to have a memory access each cycle, which might take a little longer, then it is e.g. possible to have a memory access each 8 steps. This is a net speedup of 33%.\n\nIn practice this might lead to a lower clock cycle, but a faster system.
!Second journal\n!Problem : overlap between previous microinstruction and rising clock.\n\nWhen testing the instruction {{{LD DR, constant}}}, I found out that its execution was wrong, because the microinstruction was not changed yet, while the clock was already active. This lead to loading the wrong value in the MAR.\n\nHow can a solution be implemented to this problem ? At the moment the clock rises, a rising pulse is generated. However, at this moment the microinstruction is still the same.\n\nThe weird thing is that this error is only on MAR.\n\nCurrently, all signals are generated in the clock system. This generates a delay between the rising edge and the next micro-instruction. If the edge detection would be in the latch/registers itself, would this make a difference ?
!Analysis of how to add interrupts to SYS-2\nAs of this design phase, interrupts are not yet part of the current architecture. First some simple peripherals should be added, but these will be handled synchronously by the initial program.\n\nHowever, it is interesting to think about interrupts now, so that problems beforehand can be handled.\n\nAn interrupt is an asynchronous signal which is used to divert the current program flow to handle some external events. After this interrupt has been handled, the computer should return to into its original program flow.\n\nBecause of the possibility of doing some operations in SYS-2 in parallel with other things, instruction fetch will overlap in the MicroInstruction with other instructions.\n\nIn SYS-2, the order of execution is FETCH/DECODE. This means that the DECODE stage should sample the interrupt line, in order to change the control flow of the MicroProgram. The normal control flow would be that in the next cycle the PC is again updated. However, when an interrupt is pending, the PC should not be touched, the current MicroInstruction should be finished and the MicroProgram should then transfer to interrupt handling.\n\nIn the interrupt handling, the PC should be saved and it should be possible to execute an interrupt handler.\n\nThe problem with the interrupt is that it forces a split into something that is rigidly defined. When an interrupt has been sampled, the control of the instruction fetch should be disentangled from the currently executing instruction.\n\nThis forces the system into a configuration of three possibilities :\n* Abandon the overlapped execution of IF/EX.\n* Create separate instruction fetch logic.\n* Augment the controller with immediate outputs.\nThe first solution has the advantage of simplicity, at the cost of speed. The second solution will add more complexity, and was in fact already rejected from SYS-1. This means that the third option will be chosen, where the output from the MicroController is sent through additional logic, in order to block [[MicroInstruction]]s, depending on the current state of the CPU. It is probably also necessary to shrink the size of the MicroCode store to compensate for the extra needed logic.\n\nOf course, this means once again that work has to start on the MicroProgram, in order to know what instructions are necessary and to see if the program is really comprised of only a small subset of those.\n
!Loading and storing data\nThis is an important aspect which has to be inventarised before anything can be implemented.\n\nFor loading and storing data, a source and a destination are needed.\n\nWhen starting out, it is necessary to be able to load the DataRegister with a value from memory. This can be expressed as LD D,[MEM].\n\nAfter an operation has been done, the value of the DataRegister must be stored into memory. This can be expressed as ST D,[MEM].\n\nIs it necessary to have other load and store operations ?\n\nAn interesting one is to use the DataRegister itself as source of an address, and load it with the value it points to itself. This can be expressed as LD D,[D]. The inverse operation, however, is meaningless.
!Roadmap planning\nAfter setting up the basic SYS-2 structure, there is still some preparation work to be done before real implementation of the program can be done.\n\n#Review aliases for component connections.\n#Review component interconnections.\n#Create export list for aliases.\n#Implement and export compute and propagate functions.\n
!Status of affairs\n\nToday, I want to try to make a birds eye overview of what I have already realised in terms of design and implementation.\n\nIn the last week I have been busy analysing the control flow of the MicroProgram to see how (conditional) jumps can be executed when the instruction stream is processed in overlapping parts. The biggest drive in this was the fact that there are no flags in the system and the only condition implemented was BranchIfZero, which could easily be implemented without assistance of the ArithmeticLogicUnit.\n\nBy using an address selection mechanism and feedback like it is implemented in the OptimizedMicroController, it has also become possible to double the throughput of the system with only a little additional hardware. What will probably be required is some additional hardware to control the instruction fetch phases when jumps are initiated.\n\nWhen translating this to the simulation, an effective speedup of almost 100% is created, with only slightly additional overhead in the necessary components of the system.\n\nWork has also been done on the interrupt mechanism, but this will probably require other additional hardware, so this is postponed until the first working version of the current system is working.\n\nWorking in this case means having the following features :\n* A working simulation.\n* A complete MicroProgram.\n* A way to handle peripheral devices.\n* An input device.\n* An output device.\n* An assembler.\n* A higher level programming language.\n* An interactive visualisation environment.\n* Ability to boot a selected program.\nA plus on selected features would be that a small system program becomes available which has control over the input and output device, and can load external programs. For this there will possibly be the need for a third mass storage peripheral or an emulation of this through the visualisation environment.\n\nThrough use of the assembler and the higher level programming language, it should be possible to decide how the instruction set can be enhanced by measuring what instruction patterns do occur and then implement the instructions which really enhance the throughput, but with the least impact on the architecture.\n\nAnother thing to analyse are the branch distances. The current implementation will always use an extra external word after the instruction to specify a jump target. Maybe it is better to have branch instructions which have this target or distance embedded in them. However, BranchTargeting proves that this is no advantage in speed. It is an advantage in resources, a program takes less space.\n
!Missing pieces\nThere are still some pieces missing in the complete picture of the system.\n\nThe following parts of the instruction set are defined :\n# Arithmetic and logic instructions\n# BranchIfZero for conditional execution\n# Absolute branching\n# Load and store instructions\nWith these instructions, I have sequential execution, conditional execution and iteration.\n\nWhat is still needed is being able to execute subroutines. Is it possible to also emulate this in software ?\n!! Subroutine story\nAt a certain point in the instruction stream, a subroutine must be executed. This means a branch address will be taken, but it should be possible to return to the instruction stream after the point where the branch was taken.\n\nFor this it should be possible to save the ProgramCounter somewhere and then take the branch. At the end of the subroutine, the value of the ProgramCounter should be retrieved, an offset added and then moved to the ProgramCounter again.\n\nThis system implies that a jump to subroutine and a return from subroutine are created by inserting the appropriate instructions, which can be taken from the above instruction set.
!ALU requirements review\nThis journal will just review the ALU that must be built for this project, more precisely its necessary functions and its behaviour.\n!!Function overview\n* NOT\n* Left shift\n* Right shift\n* Increment\n* Decrement\n* AND\n* OR\n* ADD\nHowever, these operations are based upon an architecture which contained an internal value selection mechanism, which made it possible to use the OR or ADD operations to pass data unchanged through the ALU to the registers.\n\nThe NotOperation is necessary to to have the complete logic functionality together with the AndOperation and the OrOperation.\n\nThe AddOperation is also necessary for arithmetic and does not need to be complemented with a subtraction operation.\n\nThen there are the shift operations and the inc/decrement operations.\n\nThe shift operations are necessary for doing multiplication and division.\n\nThe increment operation is necessary because of the ProgramCounter. Would this have a separate circuit, then the increment operation could be removed from the ALU. However, the decrement operation is not strictly necessary for fundamental operations. This can be removed and an ALU can be constructed with the following operations.\n# NOT A\n# Left shift A\n# Right shift A\n# Increment A\n# AND A,B\n# OR A,B\n# ADD A,B\n# PASS B\nThis makes it possible to construct an ALU with only three control lines.
!CPU-2 benchmark results\n\nI have added the file speed.lisp, which is derived from the current system, and which runs it, depending upon the system, 1000 (CLISP), or 125000 (SBCL) times.\n\nYes, that is right. I can test the system at 125kHz, which is more than adequate to do real-time simulations using I/O and interrupts.\n\n!!Followup\nBy proper byte compiling of all source files, and apparently through the use of compile-when (I think), I was able to obtain an 80-fold speedup. Execution time dropped from 25s to 0.313s.\n\nThis means that the effective clock speed of a CLISP based simulation is now 3200 Hz. This opens up more prospects for interactive simulation with an input and output peripheral.\n\nIn this context, SBCL is 40 times faster than CLISP.
!Additional functions for the MicroAssembler\nTo provide more functionality, the MicroAssembler should return more values.\n\nThese should be :\n# The assembled program\n# A table with the constants\n# A table with the label values\nThe tables are for extra control, but also to be used to generate a jump ROM. This ROM should be computed by creating a list of indexes, together with references to the necessary labels.
!Progress: Instruction set test coverage\n|Instruction type|Number|Finished|\n|LD mode|3|Y|\n|ST PC or DR, mode|4|Y|\n|BRA mode|3|Y|\n|BRZ mode|3|Y|\n|NOT, INC, SL, SR DR|4|Y|\n|ADD, AND, OR DR,mode|Y|N|\n|HALT|1|Y|\n|Total instructions :|25||\n|Test coverage :|60%|
!Accomplishments\nI started this project in August 2006. I am now going into its ninth month. The next phase of the project is to implement the already defined SYS-2-InstructionSet. This is necessary, because the current design is stable, and the implementation must be done before moving to the next level, which is finding out what minimal changes can be implemented to make the system better.\n\nHowever, what goals have I set myself and which have I accomplished ?\n\n#Use this project as a way to learn Common Lisp.\n#Learn about computer architecture and implementation.\n#Design, analyse and document before implementing.\n#Stay on course : try to not change requirements, even if there is a problem.\n#Write a digital component simulation library.\n\n!!Learn Common Lisp\nFirst, choosing Common Lisp was good for the project. Part of it was learning CL, which I did, and also to use it as completely as possible, which I did too.\n\nI am especially about the implementation of the macro system to ease the interconnection of standard components. Using CL made it also easy to do everything that was needed in CL or a variant thereof. The microassembler was implemented in CL, but has itself a Lisp syntax. This means that the same can be done for implementing the assembler.\n\nThe complete exercise has also given me more insight into functional programming, which I have deployed into my Perl programming at work.\n\n!!Learn about computer architecture and implementation\nAfter being for 25 years fascinated by microprocessors and digital electronics, having started, but never completed, several computer simulations, I decided to really go for it and to concentrate me, in my spare time, only on the design of a computer.\n\nThe way I decided to do this, was to write a library of simulated digital components, effectively SSI and MSI components, to be able to design a computer from standard logic components. This library is somewhat optimised in that it does not completely simulate existing components, but ideal implementations of them.\n\nI decided to build this library myself because existing simulation where either too fine-grained (e.g. wire for wire by diglog), or too course grained (no possibility to create an interconnected layout).\n\nThe way the library is setup, is that a standard interconnection can simulate a bus from 1 bit to 32 bits wide. This way components can be created which process whole words, but which also have single wire controls.\n\nThe goals I have reached have certainly given me the opportunity to learn more about implementing a controller and a data path, while a working simulation and the ability to reason about it, have also given me more insight into pipelining, a feature which will certainly be used when this (or a related system) will be implemented for real.\n\n!!Design, analyse and document before implementing\nThis was important, because I knew what I was going to do was complex. It made no sense to start programming (except for small proofs of concept), unless I knew beforehand where I would end up. Thus, this was an exercise in requirements, analysis and planning.\n\n!!Stay on course : try to not change requirements, even if there is a problem\nWith the previous point, this was an important part of discipline, to not change requirements or meanings, in order to reach beforehand set milestones. E.g. I had to wait with starting the analysis and design of an architecture until I had several key components needed for that.\n\nAlso, once I had met the minimum requirements for a working computer system, I knew I had to keep it at that, and start implementing it. Doing the implementation at that stage was more important than optimising. Optimisation needs to be done by profiling.\n\n!!Write a digital component simulation library\nImplementing the components has taken some time, but specifying and analysing the requirements and sticking to it beforehand proved positive.\n\nBut, you say, why reinvent the wheel ? Aren't there any other simulation libraries and programs available, even with graphic visualisation ?\n\nYes, of course, but do not forget that this project was to learn, and learn in several aspects. If it would have been only about computer architecture, I could have settled with a simulation library or program (like diglog), but I wanted to have practical experience from top to bottom. Also, writing the simulation myself gave me insight in the simulation process, and having the sources available in Common Lisp gave me first, a portability advantage, I can run the simulation on different platforms using different CL implementations, and second, the possibility to optimise the code and run it through the SBCL or CMUCL compilers.\n\nAnother part of writing the library was to obtain experience with unit testing. In this case, the requirements of each component were laid out and a test harness was specified before implementation started. Then, after each component implementation, it was possible to validate the component and the test harness. At the end this gave a complete library test function, which makes it possible to validate the functionality of the components after changes have been committed.
!Goals I haven't reached yet\nThe previous journal, [[5 May 2007]], gave an overview of some important goals I have met in the past nine months.\n\nThis journal will give an overview of goals I haven't reached yet in the current design phase, and other things I would like to reach.\n\n!!I/O\n!!!Desired kinds of I/O\n!!!!Keyboard input\nOne way to interact with the system is by means of a keyboard. This is an input device which only delivers input. To interact with it, the keyboard should have the following functions :\n* When a character is ready, a flag should be raised.\n* When a character is read, the system should be able to signal this to the keyboard.\nBecause currently a 12-bit architecture is used, the keyboard should not need more than one register, which can be used bidirectionally. The keyboard can deliver an 8-bit value, and there are 4 bits left which can be used for handshaking.\n!!!!Displayed or printed output\nIt is always nice to have feedback from the system. This can be done through a printer or a character display.\n\nA printer merely serves as an output, while a character display, depending upon the implementation, can also act partially as an input device by reading the characters at certain positions from the screen.\n\nHowever, let's take the simple route, and define a screen as something which the system merely writes data to, using a handshake protocol in the same way as the keyboard. This means that the display also only needs on 12-bit register with which to communicate.\n\nHowever, certain control characters can be used to control the display.\n!!!!Mass storage I/O\nMass storage makes the system independent from the host computer.\n\nWhile in a first phase the host computer is used to perform all kinds of processing and storage, it should be possible to define external storage such that the system is able to boot from its own storage definition, and also to read and store external data by itself.\n\nThe simplest kind of store to emulate is serial (paper tape, punched cards, magnetic tape). To be able to use this fully, it should be possible to define more than one device for storage and retrieval.\n\nThe next kind of storage is block device storage. The problem with this is that access to it is more complex.\n!!!I/O architecture\nThere are two ways to do I/O, the first using memory addresses, the second using separate I/O addresses.\n\nIn the first version, program space is lost. This can be minimised by using external circuitry which activates I/O and suspends memory. The I/O memory can be manipulated using all standard instructions. No changes to the architecture are needed.\n\nIn the second version, no program space is lost, but input and output instructions must be created and an extra control line is needed to inform memory and I/O that an I/O operation is busy.\n\nFrom the standpoint of the current design, the first system is better because it incurs no changes to it. Only a definition of some external circuitry is needed. This has a second advantage, that it is possible to profile I/O operations to see which instructions are used in I/O.\n!!!Interrupts\nWhatever I/O strategy is implemented, sooner or later interrupts are necessary for servicing I/O. This must be thought out deeply, in order to have a positive effect on the system performance.\n!!Programming\n!!!Assembler\nThis is really important. Building an assembler should enhance productivity and form the basis of enhanced system programming.\n\nThe assembler should take in input file, and deliver a binary file which can be loaded into the system's main memory for execution.\n\nThe assembly code should be defined and written as s-expressions.\n!!!More registers\nCurrently there are only two registers, the DataRegister and the ProgramCounter. While it is entirely possible to do all that is necessary, this puts a great burden on the memory, because instructions are interleaved with data to and from memory. Adding more registers increases the number of instructions that can be supplied to the processor.\n\nIn the current design, however, there is a trade-off. It is possible to replace the ProgramCounter and the DataRegister with 2 register files of 4x4. The trade-off is that an extra intermediate register is needed for doing operations between two registers. This means that such operations take two clock cycles. This register should not be accessible through the InstructionSetArchitecture, which means that component count increases with 1 IC.\n\nDoing the same between two memory places takes two cycles to fetch the first datum, two cycles to get the second datum and perform the operation, two cycles to store the result. This can even be worse. This means that having extra registers increases the processing speed.\n\nThis exercise must be made in order to see what the reached speed up is. This exercise should even be made using 4 register files, in order to do the same tests with 16 registers. However, for this really to be appreciated, the same code should be generated through a compiler with different backends.\n!!!Porting a C compiler\nSince the instruction set can be adapted to the target language, it should be possible to build a compiler which needs almost no adaptation for emitting efficient code. This has the advantage of having a big body of code for reuse. There are two possible candidates for this, LCC or GCC.\n\nOne goal of this project would be to be able to compile code and libraries without needing any form of assembler. Is this possible ? A C compiler is normally delivered as a system consisting of binaries, header files, libraries and even object files.\n\nThere is, however, an impedance mismatch between the C virtual machine, which refers to subroutines and variables, and the underlying processor architecture, which can only reference addresses and registers.\n\nThe problem is that the C virtual machine needs some functionality from the underlying platform, which can always be written in machine language. Adapting the C compiler to the underlying platform is much more difficult.\n\nWhat could be done is taking into account in the design of the processor that C will be used as its system language. Even then, there are some things which are difficult to reach through the C compiler immediately. For access to the low level instruction architecture, adapters are created which mediate between the C language and the underlying platform.\n\nThe problem lies in the abstract concepts that are embodied in the C language. C represents all variables as being in main memory, with their own name, but does not permit direct access to all registers in the instruction set architecture, except through subroutines which can be invoked through C, but which have all the overhead of a C subroutine call.\n\nIt is e.g. not possible to define something like :\n{{{\n DataRegister += 1;\n}}}\nwhich then translates immediately to\n{{{\n ADD DR,1\n}}}\nThis would mean that it should be possible to change the syntax rules of the compiler to accomodate such constructions (which is possible in (Common) Lisp).\n!!!Lisp on the bare metal\nInstead of using C as a systems programming language, is it possible to use a Lisp (CL or Scheme) as a systems and application programming language on the bare metal ?\n\nOne thing that should certainly be possible, is to write an assembler with Lisp syntax. This means that it is easier to add an assembler to an existing Lisp implementation.\n\nIs it thus possible to completely bootstrap a usable Lisp system, with underlying OS and drivers ? What features would be needed to do this ?\n\nProbably first, an existing Lisp implementation should be used to create an assembler. This makes it possible to generate code for the system.\n!!Memory management\nSince MemoryManagement is missing, but not really on the roadmap yet, I have devoted a tiddler to it.\n!!Implementation\n!!!Pipelining or not\nI have observed that different timing schemes are possible in the implementation, based on the way registers or latches are defined. More registers means more pipeline stages, more latches means more or less, longer but less clock cycles.\n\nOne exercise would be to create three different constructions, but only in the way latches or registers are defined, and to see which one of the three can reach the highest speed, in terms of data processed.\n\nThis should be done before adding more registers, which means that two optimisations could take place. First, find out which construction is able to attain the largest speed advantage, then, change to architecture to take advantage of 8 or 16 registers and find out what the speed advantage is in that case.
!Instruction set follow-up\nAfter revising the journal [[1 December 2006]], the following decisions can now be made on the supported instruction set.\n* Three addressing modes.\n* Loads only act on the DataRegister.\n* Stores act on both the DataRegister and the ProgramCounter.\n* Loading the ProgramCounter is done by branches.\n* Arithmetic and logical operations only act on the DataRegister.\n* Orthogonality would imply that the addressing modes are usable by\n** Loads\n** Stores\n** Branch\n** BranchIfZero\n** Arithmetic and logical operations\nThis in turn then gives an instruction count of :\n* Load DataRegister : 3\n* Store DataRegister : 3\n* Store ProgramCounter : 3\n* Branch : 3\n* BranchIfZero : 3\n* Arithmetic and logic : 3 * 8 = 24\nTotal count = 15 + 24 = 39.\n\nEncoding this needs 6 bits.\n\n|Instruction|Instruction Type|Addressing Mode|Arithmetic Operation|Filler|\n|Description|TTT|AA|OOO|XXXX|\n|Load DR|000|AA|XXX|XXXX|\n|Store DR|001|AA|XXX|XXXX|\n|Store PC|010|AA|XXX|XXXX|\n|Branch|011|AA|XXX|XXXX|\n|BranchIfZero|100|AA|XXX|XXXX|\n|Filler|101||||\n|Filler|110||||\n|NOP|111|XX|000|XXXX|\n|NOT|111|XX|001|XXXX|\n|INC|111|XX|010|XXXX|\n|SL|111|XX|011|XXXX|\n|SR|111|XX|100|XXXX|\n|ADD|111|AA|101|XXXX|\n|AND|111|AA|110|XXXX|\n|OR|111|AA|111|XXXX|\n\nSplitting this up in the instruction itself would need 3 bits for the type of instruction, 2 bits for the addressing mode and an optional 3 bits for arithmetic and logic, which comes down to 5 or 8 bits, depending upon the requested instruction.\n\nSince the rest of the instruction is not used, the first way is a waste of resources, because extra decoding hardware is needed.\n\nIn the second case, the first five bits can be used immediately for decoding from the JumpROM, while the optional 3 bits can be supplied through a multiplexer to the ALU. This is necessary to be able to make the ALU increment the ProgramCounter e.g.\n\nUsing the rest of the operand as data makes it necessary to expand the instruction set, or to sacrifice some of the current possibilities given by using a full-width data/address operand.\n\nHowever, with 2 bits still free for operations, and 1 bit for the addressing mode, it is probably possible to add enhancements to the platform through these extra bits, without introducing much extra hardware.\n\nThis instruction set gives us currently space for deployment on a platform ranging from 12 bits to 32 (64) bits.\n!!Implementation\nThis report, together with the conclusions from [[1 December 2006]], now gives us the final information to start implementing the desired processor architecture.\n\nTo do this, the following specifications are needed :\n* Detailed InstructionSet description.\n* InstructionSetArchitecture overview.\n* Detailed ImplementationArchitecture.\n* InstructionSet execution timing diagram.\nFrom these specifications, the JumpROM and the MicroProgram must be implemented.\n\nThe ImplementationArchitecture itself is implemented through the DigitalComponents library. The necessary components are derived from the ImplementationArchitecture.\n\nThe implementation itself is done through a Common Lisp program which uses the DigitalComponents library, instantiates the necessary components and connects them with connections and buses.\n\nRunning the simulation means executing compute and propagate functions for all component instances.\n\nHowever, having a program which only does this, is pretty meaningless and misses the point of the simulation : to be able to know what is really going on in the system.\n\nThe simulation of the system should be inside another level of abstraction, which executes one cycle, and then transfers control back to something which can do visualisation of the results.\n\nThe system can be partitioned, where the DigitalComponents library is one partition. Another partition can be formed by creating a SystemLibrary, which should mostly hide the implementation, but which gives the possibility to have easier control.\n\nWhen implementing the SystemLibrary, the DigitalComponents library is used. Inside the SystemLibrary, variables with indefinite extent can be created and connected to each other. This is done at run-time. This\nmeans that an application which wishes to use the SystemLibrary should have a way to inform that a system can be activated. However, this could be implemented class-like, in that a NEW function inside the SystemLibrary is responsible for returning a data structure which completely contains the implemented system.\n\nOn this datastructure, other functions from the SystemLibrary package can then be applied. The most important is {{{step}}}, which executes one compute/propagate cycle.\n\nHowever, the application should also be able to influence inputs from the system. This should also be done through a package function. E.g. a reset function is surely needed. This function would then control the system reset function.\n\nThe application should also be able to visualise things, and pass control from the visual interface to the underlying system. It is the SystemLibrary which is responsible for supplying functions to query parts of the implementation. The data structure which is returned through NEW should never be queried directly.\n\nBy pulling the system in parts like this, it is possible for the visualisation to only take up part of the processing, selectable through the application. This is an MVC implementation, where the Model is the SystemLibrary, the View is the visualisation process, and the Controller is the application.\n\nThrough the View, the Model can be stepped by the Controller, or be run at full-speed. It can be stopped, started and reset.\n\nThrough the SystemLibrary, it should be possible to query the InstructionSetArchitecture, or to have more detailed inquiries of the ImplementationArchitecture.\n\nAnother function of the SystemLibrary is to be able to initialise the system memory. With initially no access to emulation of peripherals, this is necessary to be able to boot and test the system.\n\nWhat would be necessary in the current visualisation ?\n* DataRegister\n* ProgramCounter\n* AddressBus\n* [[MemoryValue]]s around the AddressBus value\n* DataBus\n* Clock\n* Reset\n* ALU A input\n* ALU B input\n* ALU operation\n* ALU output\n* InstructionRegister\n* MicroInstruction\n* Reset Control\n* Stop/Run Control\n* Step Control\n* Breakpoint Control
!Current status\nReceived yesterday finally 'Bit-Slice Microprocessor Design' ([[BSMD]]). I find this a real good reference for computer architectural decisions. It also confirms some of the design decisions I have taken.\n\nImmediately ordered 'Bit-Slices : Controllers and ALUs'.\n\nWhile looking for 'Automated Digital Systems' from A.D. Booth, I found out that there are three different prints, one from 1953, one from 1958 and one from 1965. Having them all three would of course be valuable, since this makes it possible to compare technological changes.\n\nBased upon the first impressions from [[BSMD]], I decided to go with the current design unchanged, but only a 12-bit datawidth. The basic reason is that using the current instruction set wastes memory in a 16-bit configuration.\n\nWhat should still be added to the design, is the possibility of IO, the possibility of interrupt, and the possibility of adding a memory mapping system to expand the memory.\n\n|Page size|Address bits|Mapping bits|Expansion bits|Address bits|Memory|\n|4k|12|0|0|1|4k|\n|2k|11|1|4|15|32k|\n|1k|10|2|4|14|16k|\n\nI should be able to build an assembler using Common Lisp, and use the created architecture to study LCC.\n!Looking at the future\nThe future is to design a bigger system, based upon the current architecture and experiences. The main goal is to create an architecture which can run one of three high-level programming languages :\n# C (by using LCC)\n# Common Lisp (by providing a CLISP byte-code interpreter)\n# Scheme (By implementing SIOD)\nThe main problems with both of the latter languages is being able to do IO and setting interrupts.\n\nHowever, before this can be done, the current system must be made working.
!Component overview for first implementation\nAn extensive library has already been built, but some components are not needed yet, and other components may change function.\n\nStarting from the DataPath3 schematics, the following components are necessary.\n* An ALU with basic possibilities.\n* Registers : ProgramCounter, DataRegister\n* Multiplexer\n* Latches : MemoryAddressRegister, databus drivers\n* Memory\n* Special components : InstructionRegister\n* Clock\n* Single shot\n* Controller\n* JumpROM\n* ZeroStatusHandling\nTo have a scalable system, it should be possible to limit the size of the operands where necessary.\n\nThe place where this can mostly be a problem is the ALU. This means that when creating an ALU, it should be possible to define its word width.\n\nAll other components are passive. They only pass data, and do not perform arithmetic.\n\nWhen writing an assembler the system word width should also be taken into account, which means that for the sake of simplicity, the word width should already be defined before implementation.\n!!Selecting a word width\nThis is more difficult, because this defines the system possibilities.\n!!!12 bits\nThis gives us some space to play with assembler and simple peripherals. Characters would be limited to 6 bits (two characters per word).\n\nThe memory size would be limited to 4k words in first instance. It is possible to expand this system with bank switching hardware, but it would be limited to rather small pages. This could be an interesting exercise, however.\n!!!16 bits\nThis gives the possibilities of early 16-bit minicomputers, a character width of 8 bits, and a memory of 64k words. The memory is actually 128 kB, because no attempt is made to byte-address the memory.\n\nIf more memory is necessary, then bank switching is also here a possibility.\n\nThe biggest issue to know if this architecture is viable is to have a port of the LCC compiler where compilations can be optimised for space, together with a port of uClibc, and then some applications which are built around these tools.\n!!!18 bits\nFrom here on, life gets much more easy, with some room to spare. It should definitely be possible to have a small high-level language like Scheme available in this space, with room to spare for applications.\n\nHowever, from a hardware implementation point of view, an 18 bit width falls a bit short, because it is not a multitude of 4, which means looking for weird components or wasting space. It could be implemented\nusing an FPGA.\n!!!20 bits\nThis does not have the hardware drawback of 18 bits, because it is a multiple of 4, but it is not very much suited to store multiple characters. However, with the increased memory size, this may be less of an issue.\n!!!24 bits\nA popular format in the beginning of the minicomputers, it offers 16 Mb of memory space, and the possibility to use 8 bit or 6 bit character sets.\n\nFrom a possible hardware point of view, additions define the maximum speed of the ALU because of the carry delays.\n!!!32 bits\nCurrently the most popular system width, it offers a large address space and usage of 8 and 16 bit character sets. It is, however, slower from a hardware point of implementation, due to carry delays. Alleviating this means using more, faster and more expensive hardware.\n\nThe note about the assembler can possibly be taken with a grain of salt. The main instruction set architecture stays the same over all these word widths, so it should be possible upon the generation of the assembler, to specify its generated word size by means of a compile time switch.\n!!Moving from cross compilation to native compilation\nIn first instance, an assembler will be written in C, running in a Linux environment. This is practical, but also needed, as the first implementation will severely lack IO capabilities.\n\nAfter that, it should be possible to port LCC as a cross compiler to generate code for the target platform, and to compile uClibc.\n* Assembly source -> assembler -> target code\n* C source -> C compiler -> assembly source -> assembler -> target code\n* Assembler C source -> C compiler -> assembly source -> assembler -> native assembler\n* C compiler C source -> C compiler -> assembly source -> assembler -> native C compiler\n* uClibc C source -> native C compiler -> assembly source -> native assembler -> native uClibc libraries\nOf course, for assembling and compiling natively, a file storage system is needed, which can only be added when peripherals are added to the system.
!Register file musings\nIn anticipation of possible optimisations, one of the greatest will probably be the expansion of the CPU registers. This opens up a whole lot of possibilities with respect to the ISA.\n\nHowever, this will not be as clear cut as it seems to be. Having more registers means having more control lines. The current architecture is very scalable with respect to word width. It is possible to implement the current architecture with word widths ranging from 12 bits to 64 bits.\n\nIn the current ISA definitions, we need either 6 bits, or 5/8 bits. This gives in the first system 6 bits extra with which to specify registers.\n* Load DataRegister : 3\n* Store DataRegister : 3\n* Store ProgramCounter : 3\n* Branch : 3\n* BranchIfZero : 3\n* Arithmetic and logic : 3 * 8 = 24\nTotal count = 15 + 24 = 39.\n\nHaving multiple registers makes it possible to load and store to and from registers.\n!!Load register N\nIn this case, a register must be specified as destination, and a source must be specified. This source can be one of three memory addressing modes, or another register. In this case the register must also be specified, which takes log(2, M) bits.\n!!Store register N\nThe same can be specified for storing a register. Additionally, the separate storing of the ProgramCounter can be moved here.\n!!Branch and BranchIfZero\nAlso, for branching, we could add a branch from a register. We need this for subroutines.\n!!Arithmetic and logic\nIn this case we must move from doing arithmetic on only one possible operand, to having a choice of operands.\n\nUnary operations where always only on the DataRegister. If we expand on this, then all unary operations can have at most one source.\n\nBinary operations operated on the DataRegister plus a a memory address source, and placed the result back in the DataRegister. In the new case, we can still do that, but have as an extra benefit the addition of a register as a source.\n!!Summary\nWe have the following basic operations : Load, Store, Branch, BranchIfZero and Arithmetic.\n\nLoad, Store, Branch and BranchIfZero have 4 possible modes of operation, which gives us 16 operations.\n\nArithmetic has 7 modes of operation, and four different sources, which gives us 28 operations.\n\nIn total, this gives us 44 operations.\n\nIf we encode this densely, then we need 6 bits to specify the operation and its mode, which gives 6 extra bits for specifying a register source.\n\nIf we encode this sparse, then we need 3 bits for specifying basic operations, 2 bits for the addressing mode, 3 bits in case of arithmetic, which gives us a maximum of 8 bits, and which supplies us with 4 extra bits for specifying a register.\n\nIn the first case we get 64 registers, in the second case 16.\n\nOf course, the addressing modes can change too, due to the extra registers.\n\nImmediate addressing (loading a constant from memory) does not change, because only the ProgramCounter is involved.\n\nDirect addressing, which involves loading a value from the address supplied with the instruction, changes to immediate addressing (load the constant into a register), followed by loading the value which is pointed to by the register.\n\nIndirect addressing is also to changed to immediate addressing, loading the value pointed to in a register, and then again using this register as a pointer to the final value.\n\nThis means that direct and indirect addressing can be replaced, by pointer addressing through a register. This reduces the addressing mode to 1 bit, but needs an extra register to be passed as the address source. This changes the instruction count to two modes of operation, which halves the LSBZ count from 16 to 8, as for arithmetic from 28 to 14, which gives us now 22 instructions. These can be represented by 5 bits in dense format, or 7 bits in sparse format.\n\nThe first format leaves 7 bits in 12-bit mode, which can supply 8 registers as destination and source, and in the second case 5 bits, which reduces the po an ssibilities to 4 registers.\n\nIf we still use a 12-bit word as base, then we can specify a source register, a destination register and 2 bits for addressing, which uses up 8 bits, which gives us 4 bits to specify operations.\n\nLoading a register can then be done by specifying a destination, a source register and an addressing mode : immediate, register or register indirect.\n\nStoring a register can also be done by specifying a source register, an addressing mode and optionally an address register : store into direct address, into register or into indirect address.\n\nBranching always has the ProgramCounter as implicit destination, which means that the destination register is removed from the instruction. Branching can be done to an immediate address, to the contents of a register or to in indirect address pointed to by a register.\n\nArithmetic must be done by specifying one of eight operations, a source and a destination. Unary operations can only be executed on a register. Binary operations must then have as destination only a register, but can have as second operand an immediate value, a register or a register indirect value.\n\n|Operation| Instruction Layout|\n|Unary A/L| 0 | 000-100 | XX | XXX | DDD |\n|Binary A/L| 0 | 101-111 | AA | SSS | DDD |\n|Load| 1 | 000 | AA | SSS | DDD |\n|Store| 1 | 001 | AA | SSS | DDD |\n|Branch| 1 | 010 | AA | SSS | XXX |\n|BranchIfZero| 1 | 011 | AA | SSS | DDD |\n\nThis instruction set is still scalable from 12 bits to 64 bits. However, when scaling up it is possibly worthwhile to add additional hardware to load and process multiple instructions at once, and maybe make it possible to specify the sizes of immediate data, to make more efficient use of memory.\n\nAlso, since there is an addressing mode free, this could be used to implement an indirect register + offset addressing mode.
!Test harness and testing for CPU2,r1 are finished\nThis evening I completed CPU2,r1. Now it is time for a rest, in between there are some other important things to do.\n\nLooking through the log of the repository, I found that I took the decision to choose for Common Lisp on 25 July 2006. Previously, I had developed a CPU emulator using C++. It is clear that I had previous experience, and I had delivered something, but it was not as structured as this exercise.\n\nI started working for real on the project on 5 August 2006. This means that it took me 11 months to get where I wanted, from concept to working implementation.
!Large TO DO list\n* Specify usable names for architectural components.\n* Source list of books.\n* List of books to look for.\n** A. D. Booth, "Automatic Digital Calculators".\n* OK : Add and inspect 'comment' fields on components.\n* Implement :print methods for connectors (PROBLEM : infinite recursion).\n* Define register interconnections on SYS-2 as bus.\n* Add controller to SYS-2 schematic.\n* Add splitter component to SYS-2 schematic.\n* Add control logic between zero detect and controller.\n* Specify meaningful and usable names for controlling lines.\n* Investigate clock signals for registers and latches : timing diagrams.\n* Create splitter component for InstructionRegister.\n* Add initialisation function on memory component.\n* Add STOP instruction to instruction set.\n* Integrate components to 12-bit SYS-2 system.\n* Define and implement SYS-2 microprogram.\n* Define test case for separate instructions.\n* Add a journal entry about SYS-1.\n* Add a journal entry on SYS-2 and history of design decisions.\n!!Visualisation\n|Instructie register|Meaningful string (disassembly)|Characters|\n|DI register|Hex values|4|\n|DO register|Hex values|4|\n|MAR|Hex values|4|\n|Databus contents|Hex values|4|\n|Address multiplexer|Hex values4||\n|Address multiplexer select|Data/Register|4|\n|DataRegister|Hex values/Decimal values|4/5|\n|ProgramCounter|Hex values|4|\n|ALU output|Hex values/Decimal values|4|\n|ALU operation|Meaningful operation string|3|\n|ALU operation select|IR/uController|4|\n|Zero result|Zero/Not Zero|2|\n|Next address select|PC/F1/F2/Instruction|2|\n|Controller output fields|Component dependent||\n|Gedeelte van geheugen rond inhoud adresbus|Hex/Decimal/Disassembly|4 + 4|\n* Zoals (n)curses, puur karakter gestuurd.\n* mcclim, enkel geschikt voor Linux.\n\n
!Investigating possibilities for high-level languages\nOne of the things that I like to investigate, is the possibility of fast implementation of programs into new architectures.
!Architectural d overview\nThis article discusses the features that are shortly needed to do further develop on computer architectures.\n!!Current developments\nThe main objective now is to develop a real simple computer architecture with the components that are now defined. This architecture would consist of the following :\n*A microcontroller.\n*A jump ROM.\n*An instruction splitter.\n*A program counter.\n*A memory address register.\n*A data register.\n*An arithmetic and logic unit.\n*The capability of detecting a zero result.\n*A databus shared by the address path, the data path, the instruction path and the memory.\nTwo tools should help make the implementation easier : a micro-assembler to create the microprogram, and an assembler to create programs for the system.\n\nAdditionally, it should also be possible to monitor the status of the system by means of a kind of front-end display.\n\nThe system itself should be capable of the following through the right setup :\n\n*Sources to the program counter :\n**The memory\n**The data register\n*Sources to the memory data register\n**The memory\n**The program counter\n**The ALU output\n*Sources to the data register\n**The program counter\n**The memory\nThe current register will be extended with two extra operations except for loading, that will be shift left and shift right. Either way, left or right, a zero will be entered, and the LSB or MSB will be lost. This means that the programmer is responsible for testing the bits that he needs to, since there are no flags for such purposes.\n\nA multiplexer will be used to move the output from the register to the ALU. Since the ALU will be kept simple, with only four operations (AND, OR, NOT, ADD), it should be possible to move the contents of the data register unchanged through the ALU. This will be done by executing OR and selecting zero from the ALU input multiplexer.\n!!!ALU Operations\n*A AND B -> D0\n*A OR B -> D0\n*NOT A -> D0\n*A + B -> D0\n*0 AND B -> 0\n*0 OR B -> B\n*NOT 0 -> 0xFFFF\n*0 + B -> B\nThe main goal of the complete project has always been to writesimulation software to design something that could be implemented in real hardware. The software provides the possibilities to test ideas fast by supplying standard usable building blocks, but to still have a detailed design by wire and bus. This makes it possible to have a quick design time, but still have enough detail to be able to move to an electronic implementation quickly.\n\nMoving to an electronic implementation means choosing one of the following courses to follow :\n*Use standard building blocks, 74 type logic IC's\n*Use a mix of standard building blocks and custom implemented IC's\n*Use only custom implemented IC's\nThe first option takes the least time to get components and to create a layout. There is also no necessity to get extra tools to custom burn components. The software is readily available, but extra hardware or a service is needed to build custom IC's.\n\nThe second option should get the best mix of standard components and custom designed IC's (programmable logic). Like was said before, additional tools and time is needed to create the custom components.\n\nIf the third option is chosen, then it is best to use an FPGA or something, because otherwise too much identical functionality to standard components must be implemented, and then it is better to head down the second option.\n\nTo build a complete system, also the mounting technology must be chosen, which again leads to three possible choices :\n*Pre-printed boards and wire bridges.\n*Printed circuit boards.\n*Wire wrap.\nFor these to choose from, the possibilities and costs must be computed for each technology.\n\nThe advantage of the approach with the simulation is that it is now possible to start with something that is really simple, and then by implementing programs and doing measurements on them, see what small (or large) modifications to the architecture do to the overall result. It also gives a view of how the complexity increases, e.g. what advantages does the architecture get from a simple register stack, to move to a register stack with two output ports and one input port, and so on. This makes it also possible to look for trade-offs about existing and not existing components.\n\nE.g. an change using a register file would only be possible using the 74LS170 type 4x4 register file, or doing an implementation using an FPGA, which gives more possibilities.\n\nPossibly the biggest problem to solve, for knowing what influence architectural decisions have, is to have a real compiler which can be adapted to to changes in the architecture.\n!!Current changes\n!!!General\nThere should be a way to limit the width of the components in the simulation, especially counters and arithmetic units.\n\nTo make drafting easier, gEDA will be used. All currently available components will be created in the components library. This makes it possible to move components without having to redraw their\nconnections.\n!!!Counter\nRemoved direction line. This saves one control line.\n!!!Data register\nShould be replaced with a type which can shift logical left and right. This will add one control line, which is only active if load is inactive.\n!!Agenda derived from the previous points\n*For every necessary component, create a gEDA symbol.\n**Clock\n**Single-shot\n**Controller\n**Up-counter\n**2-1 multiplexer\n**3-state buffer\n**Register\n**Memory\n**ALU\n**Shift-register\n**Splitters.\n*Update the CL definitions and their test harnesses\n**Splitter : macros possible.\n*Redraw the current implementation in gEDA using the previously defined symbols.
See [[ArithmeticLogicUnit]] for details about what this component does.\n!Requirements\n!Test cases\n
The AddOperation performs the addition of two operands.\n\n|!Operand A| 0101100|\n|!Operand B| 1000101|\n|!Result C| 1110001|\n\nThe problem in computer engineering versus mathematics is the finite word length of operands. In mathematics you just add one digit if the result is larger, but in computer engineering this is not possible.\n\nThe solution employed in older and smaller systems (vs. RISC) is the usage of a carry flag. However, some investigation and design upfront showed that implementing a flag register was a complex issue. Not that it was not possible, but at this moment in time the goal is to quickly implement a complete (maybe slow) workable processor system. Designing an extra status path, if possible, was to be avoided.\n\nAfter a little bit of searching, it was found that it was avoidable. This should be investigated further, but it this is a property of modulo arithmetic.\n\nIn modulo N, if A + B > N, then the result will always be smaller than either A or B.\n\nThus, adding two numbers, storing the result and then comparing the result with one of the terms makes it possible to know if there was a carry. If A + B = C, and C < A (or C < B), there is a carry. Testing X < Y is done through the CompareOperation.
Further work should expand upon SYS-2, but make the system more powerful.\n\nWhat things can all be done, that are worthwile ?\n\n* Add more registers (4, 8, 12 or 16).\n* Separate instruction fetch from datapath.\n* Adding interrupts.\n* Adding paged/protected mode.
The AndOperation is the logical AND of two LogicalValues.\n\n!The AndOperation for two logical values\n\n|!A|!B|!Y|\n|0|0|0|\n|0|1|0|\n|1|0|0|\n|1|1|1|\n\n!The AndOperation for a whole SystemWord\n\n|!Operand A| 0101100|\n|!Operand B| 1000101|\n|!Result C| 0000100|
This part gives an overview of the SimulationLibrary with increasingly complex designs, up to the design of a relatively simple processor.\n\n* [[Microprogram Controlled Counter]]\n* [[Microprogram Controlled Counter Addressing Memory and Reading Data]]\n* [[Basic Stored Program System]]\n* [[First Architecture]]\n* [[Simple Dataprocessing Architecture]]\n
Architecture is a broad word. In the context of this project, it defines two targets, the InstructionSetArchitecture and the ImplementationArchitecture.\n\nIn essence, the InstructionSetArchitecture is a virtual architecture, which can be implemented in different ways through the ImplementationArchitecture.\n\nCurrently, it is mostly the ImplementationArchitecture which will define how the InstructionSetArchitecture will look. This is dependent upon the available DigitalComponents, and also the way the MicroController is implemented. The reason is that the MicroController is a MicroProgram controlled system with a JumpROM for InstructionDecoding.\n\nThe biggest problem with [[Architecture]] is the tracking of architectural evolution through several design changes. These changes will inevitably come through addition of features and implementation changes gained through experience with the built system.
The ArithmeticLogicUnit is the part that can process data. The operations that can be executed are ArithmeticOperations and LogicalOperations. ShiftOperations are something different, which can be implemented through the DataRegister, through the ArithmeticLogicUnit or through a special circuit called a BarrelShifter.\n\nThe operations themselves are more or less straightforward.\n\nThe basic logic operations are the AndOperation, the OrOperation and the NotOperation. AlternativeLogicalOperations are the NandOperation and the NorOperation. Implementing a NotOperation would then not be necessary, but it would be necessary to be able to do an operation on the same word.\n\nThe basic ArithmeticOperations are the AddOperation. This is really all that is needed.\n\nSince the main objective is to keep the design simple, this is all that will be implemented. However, because of the way the datapath is implemented, also a PassOperation is needed, which passes the B input unmodified to the output. However, this could be implemented by using the AndOperation together with a word 0xFFFF, or the OrOperation with the word 0x0000.\n\nThe biggest problem in ArithmeticOperations is how to derive [[Decisions]] from them. Bits in the word can be tested through LogicalOperations, where their result is then either 0, or not 0. The ZeroStatus is the only thing that is really needed and which should be handled by the MicroController (ZeroStatusHandling). All the rest can be derived through software.
The basic arithmetic operation is the addition. From this, all other operations can be derived. Of course, some algorithms are more efficient than others.
The [[Assembler]] is a program which takes symbolic instructions and translates them into machine code which can then be loaded into the SystemMemory.
Bit-Slice Microprocessor Design, Jim Mick, John Brick, MacGraw-Hill.
[img[Circuit 3|circuit_3.png][circuit_3.png]]\n\nThis is an even more complicated example, because of the interconnections and the necessary functionality.\n\nThe purpose here is to let the counter read out instructions from the memory, which will then be put into the instruction register and decoded by the JumpROM in order to do something else.\n\nIn this configuration, the following things are possible :\n\n* The counter can be loaded with a value from memory.\n* The counter can be loaded with a value from the data register.\n* The data register can be loaded with a value from memory.\n
The BranchIfZero instruction is fundamental for taking decisions in programs.\n\nDesigning the BranchIfZero instruction is somewhat difficult, because some trade-offs have to be taken into account.\n\nThe difficulty lies in the way the branch will get its parameters. Testing for zero is simple, but what strategies are there to really take a branch ?\n\nThe most minimalistic way to implement the branch is to split it up in two separate instructions. one to do the test and one which does the real branching. Effectively the conditional branching is split up into a test and branch.\n\nHowever, when laying out the instructions, which are normally a translation of a programming language into machine code, what is then the best way to deal with this ?\n\nWe should analyse this for several high-level constructs.\n* [[If-Then-Else]]\n* [[While-Do]]\n* [[Do-While]]\n* [[Until-Do]]\n* [[Do-Until]]\nFrom this can be seen what sequences will be used :\nBranch if true :\n# Compute ConditionalExpression\n# Negate ConditionalResult\n# BranchIfZero\nBranch if false :\n#Compute ConditionalExpression\n# BranchIfZero\nJump unconditional :\n# Unconditional jump\n\nSkipping the next instruction is difficult, because the current instruction needs to know how long the next instruction is. This can be alleviated by encoding the length of the next instruction in the skip, but then this becomes a BranchIfZero.\n\nUsing single-instruction branch and combining this with a unconditional long branch wastes one instruction. It is also slower.\n\nIf the BranchIfZero consists of the test and the loading of the branch address, when the branch is not taken, one memory access is needed, otherwise two.\n\nIn the other case, one memory access is needed for the first instruction. If the branch needs to be taken however, two more memory accesses are needed.\n\nAs a design decision, BranchIfZero will be implemented as a two word instruction. After a compiler has been implemented, statistics can then be used afterward to see if it pays to have one word jump instructions.
When implementing [[Branching]], there are still several factors to take into account.\n* How large is our address space ?\n* Do we use addresses which can span the complete address space ?\n* Do we use absolute or relative jumps ?\nThe combination of the first and second question leads to the question, what width of addresses will be used ?\n\nIf we use a 32-bit address- and datapath, then we have ample opportunity to implement anything, while we need to only load one address.\n\nWhen we use a 24-bit address- and datapath, we have mostly the same possibilities as the previous case.\n\nHowever, when we use a 16-bit address- and datapath, we are getting constrained. Only 64k words can be addressed and this limit mostly precludes multiprocessing.\n\nIn this case, the simplest mechanism to extend the address space, is to use a bank switching system. This mostly means that for addressing something in the whole address space, two memory accesses are needed.\n\nBut, the whole point of the current design is to minimise memory accesses. This can only be achieved by being limited to a chosen address- and datapath width.\n\nSizing down on address- and data size results in less performance and more hardware.\n* Less performance : because more memory accesses are needed.\n* Less performance : multiple cycles needed for address computation.\n* More hardware : more intermediate registers are needed to assemble longer data and addresses.\nUsing less wide data- and addresspaths means more time is needed to perform computations. However, hardware cannot just be halved e.g., there comes extra overhead with extra registers and extra control lines to control these extra registers.\n\nHaving a simple, uniform architecture makes the most of the used hardware and needs the least control lines. The current architecture scales well from 16 bits onwards. The only things needed are applications which fit in several categories. The problem with a 16-bit address space is that there are probably not enough applications which fit in there.\n\nHowever, since this system is still simulated, it is probably possible to emulate a 32-bit architecture and still have applications which have modest (albeit larger) requirements.\n\nIn this case it is then possible to span the whole address space with one address, or have an instruction which contains a relative addres.\n\nIn the first case, fetching the branch address is done automatically, but a complete branch always takes 4 cycles.\n\nIn the second case, the loaded instruction contains an offset which is sign-extended (extra hardware needed) and fed to the ALU, where it is added to the current value of the ProgramCounter. However, this value can only be used in the next ClockCycle, because the result has to be isolated from the addition.\n\nTracing through the instruction execution however, shows that there is no additional advantage by including a relative offset. While the instruction is decoded, no work is done. In the execution phase, the offset will be added to the current value of the program counter. However, in this phase, this value cannot be passed to the memory address register, because this is also the feedback loop to increment the address value. Since in this cycle an addition is already busy, the system must wait until the next clock cycle for the new ProgramCounter value to appear. This means that two [[ClockCycle]]s extra are needed to begin fetching instructions again, and thus a relative jump takes the same time as an absolute jump.\n
A bus is a data structure which makes provisions for different outputs to be routed to the same input. It does this by storing the output references in an array.\n\nIn the OutputComputation, it will take into account the state of the outputs and also garble the result if more than one input is active.\n\nThe result is stored in an output, which can then again be referenced by several other inputs.
The [[C compiler]] takes input in the form of the C language and translates it into machine code which can then be loaded into the SystemMemory.\n\nA [[C compiler]] makes it possible to reuse code from all kinds of sources, including maybe even a Linux kernel.
The clock is the part that makes the complete system tick, literally. A ClockCycle (or period) consists of a high pulse with a certain duration, and a low pulse with a certain duration. A running clock forms a train of pulses which loads registers, latches and runs counters.
Comparison is done through subtraction.\n\nThe following table gives the necessary relationships of A - B.\n\n|A - B > 0| A > B|\n|A - B = 0| A = B|\n|A - B < 0| A < B|\n\nBy OR'ing together the result of two comparisons, the relations A >= B and A <= B can be defined.\n\nBy inverting the result a A - B = 0, A != B can be defined.
A [[Component]] can be considered as a kind of class, although it is not defined through any object-oriented framework. It should contain at least the following declarations :\n* A structure containing all necessary [[Inputs]], [[Outputs]] and other necessary fields needed for the state of the [[Component]].\n* An OutputComputation function, called ''//component//-compute''.\n* A PropagationStep function, called ''//component//-propagate''.\nOptionally, extra functions for initialisation and setup of the particular [[Component]]. These should consistently be called ''//component//-//function//''.
The DataPath is the most complex part, but the complexity stems from the requirements. Simple requirements yield a simple datapath, but modest additions yield quickly a larger complexity.\n\nThe basic part which defines the requirements of the datapath is also the tasks which are needed to do : is a controller being built, a number cruncher or only a simple dataprocessor. Doing a general purpose implementation makes it necessary to put compromises in the design.
The DataRegister (in this [[Architecture]]) is a register which is connected to the databus. Its contents can be updated by reading out the [[Memory]], and its contents can be stored there too.\n\nIt is used to hold intermediate values in computations and has a ZeroStatus output for ZeroStatusHandling.
[[1 December 2006]]\n[[7 December 2006]]\n[[8 December 2006]]\n
[[Decisions]] are needed to alter the flow of control based upon the outcome of ArithmeticOperations or LogicalOperations.\n\nThis outcome must be fed back to the MicroController, so the MicroProgram can alter its flow of control based upon this result.\n\nThis outcome can be called the OperationStatus.
Diagrams are a big necessity in this kind of project. They help to get an accurate overview of what needs to be accomplished, what must be done yet and how things should change.\n\nOne of the bigger problems is how to publish electronic diagrams. This wiki is only capable of documenting the whole process. Adding diagrams needs to be done through other software. This means using pure graphical formats or making provisions to use PDF.\n* ComponentDescription\n* ImplementationDiagram\n* TimingDiagrams\n* MicroProgram\n
This is the main SimulationLibrary created to build digital [[Component]]s on the functional level, interconnected with InterConnections.\n\nThe language chosen for the simulation is CommonLisp, because of the maturity of the language and the HighPerformance of the compiled code.
Do CodeBlock\nUntil ConditionalExpression.\n\n# Execute CodeBlock\n# Compute ConditionalExpression\n# BranchIfZero to CodeBlock Entry
Do CodeBlock\nWhile ConditionalExpression.\n\n# Execute CodeBlock\n# Compute ConditionalExpression\n# Negate ConditionalResult\n# BranchIfZero to CodeBlock Entry
This is an overview of the features which can be implemented in the SimpleProcessor, as designed through successive stages of increasing complexity. This incremental design also provides us with the necessary confidence in the basic DigitalComponents, because in addition to existing UnitTests, they are also tested in different integration environments.\n* Controlled InstructionCounter.\n* InstructionCounter reads out SystemMemory and stores data in DataRegister.\n* Additional storage of data in MicroController InstructionRegister.\n* Usage of JumpROM to control MicroInstruction flow.\n* Adaptation of architecture to be able to load contents of memory into InstructionRegister.\n* Addition of MemoryAddressRegister.\n* Define instruction to load contents of memory into MemoryAddressRegister and load DataRegister with it.\n* Define instruction to load contents of memory into MemoryAddressRegister and store DataRegister to it.\n* Define instruction to store contents of DataRegister into MemoryAddressRegister and load DataRegister with it.
[img[Circuit 4|circuit_4.png][circuit_4.png]]\n\nAfter some work, needed for finding out what I wanted and how I wanted it, I got a result in this architecture. The system consists of an address path, a datapath, a memory and a microcontroller.\n\nThe goal was to be as simple as possible, and one of the things that where eliminated was the status path. When designing one which would be versatile, I found that I almost needed the same amount of resources as for the rest of the architecture.\n\nAfter studying the issues for some time, I found out that all things that are normally tested through status could be done in software, of course at the cost of speed. However, I wanted to evolve this system by building it, having a compiler for it and doing measurements on the generated code. This meant that I better started real simple, and only implement optimisations for whicinh there was a definitive need.\n\nWhen reviewing this architecture, I found out that I could simplify even more by moving the program counter to the datapath, giving [[A Simple Dataprocessing Architecture]].
A first simulation attempt was made by using two different structures for [[Inputs]] and [[Outputs]].\nHowever, when going further into the creation of components, particularly [[Memory]] in this case, it was found that a bidirectional connection was needed.\n\nSince this bidirectional component contained the parts of both [[Inputs]] and [[Outputs]], it seemed easier to only define one structure which could be used as input, output or both at the same time. This also meant that no special precautions need to be taken for an input component to be able to reference an input or a bidirectional connection.
|Project|Technology|\n|[[Homebrew CPU Home Page|http://www.homebrewcpu.com/]]|TTL|\n|[[Simplex-III|http://members.iinet.net.au/~daveb/simplex/simplex.html]]|TTL|\n|[[The 4-bit homemade CPU|http://wiesi.uttx.net/4bit/index.html]]|TTL|\n|[[Home-Built TTL CPU|http://cpuville.4t.com/index.htm]]|TTL|\n|[[Mark 1 FORTH Computer|http://www.holmea.demon.co.uk/Mk1/Architecture.htm]]|TTL|\n|[[Main Page - TriPU|http://tripu.triphoenix.de/Main_Page]]|TTL|\n|[[Mippy Home|http://madscientistroom.org/mippy/]]|TTL|\n|[[Another boring CPU with TTL|http://freenet-homepage.de/dieter.02/index.htm]]|TTL/Transistors|\n|[[Selbstbau-Computer von Dennis Kuschel|http://mycpu.eu/]]|HC/HCT|\n|[[D16/M Minicomputer|http://www.timefracture.org/D16.html]]|HC/HCT|\n|[[OSU8 Microprocessor|http://www.pjrc.com/tech/osu8/index.html]]|FPGA|\n|[[Harry Porter's Relay Computer|http://web.cecs.pdx.edu/~harry/Relay/index.html]]|Relay|\n|[[Bau eines Relaiscomputers von Kilian Leonhardt|http://www.relaiscomputer.de/]]|Relay|\n|[[DIY Calculator :: Heath Robinson Rube Goldberg Computer|http://www.diycalculator.com/sp-hrrgcomp.shtml]]|Diverse|\n
Although the (simulated) system is able to run autonomously, there are some extra things that should be done to make the system communicate with the outside world.\n\nIt should be possible to communicate with the system using a keyboard and a character display. It should also have provisions to store and retrieve data. The biggest problem with this scheme is that IO becomes very complex very quickly.\n\nIn keeping up with the simplicity already achieved, there should be a way to extend this to IO.\n\nIn the simulation, this is no problem, but once I take a decision to implement something real, the complexity becomes a bigger issue, because it should be possible to interface the system with real peripheral devices.
# Compute ConditionalExpression\n# BranchIfZero else\n# Execute if part\n# Jump to end of statement\n# Execute else part\n# End of statement\n\nSince everything depends on the BranchIfZero statement, the conditional expression must be evaluated for testing on zero.\n\nThere are ultimately two possibilities; use an instruction which contains an offset, or use an instruction for which the jump address must be loaded.\n\nIn the first case, only one instruction must be loaded, but some modifications to the architecture are needed.\n\nIn the second case, after loading the instruction, when the branch is taken, the jump address should be loaded, or the program counter should be incremented again.
Type the text for 'implementation'
This is the way the complete system has been built up from the ground by use of DigitalComponents.
The ImplementationIssues are about the framework that is needed around every [[Component]].\n\nTo start with, every [[Component]] consists of a {{{struct}}}, together with functions to initialise them, a compute function and a propagate function.\n\nHowever, every component also needs a detailed description (requirements) together with test cases.\n\nThe test cases should be gathered in a list with descriptions of the tests, and it should be possible to run all the tests in one pass.
The moment after SystemBootTime when the system reads the first working program into [[Memory]].
An [[Input]] is a field which references an [[Output]].\n\nA [[Component]] should never reference the value of the [[Output]] itself directly, but should always use the function [[connection-value]].
|Instruction|Constant|Direct|Indirect|\n|LD DR|X|X|X|\n|ST DR| |X|X|\n|ST PC| |X|X|\n|BRA|X|X|X|\n|BRZ|X|X|X|\n|NOT DR| | | |\n|INC DR| | | |\n|SL DR| | | |\n|SR DR| | | |\n|ADD DR|X|X|X|\n|AND DR|X|X|X|\n|OR DR|X|X|X|\n|HALT| | | |
The InstructionCounter is a ModuloCounter which increments every MicroProgram cycle, and is used to address the SystemMemory.\n\nIn a more advanced architecture the InstructionCounter would be replaced by a normal [[Register]] and an [[Adder]] or [[Incrementer]] circuit.\n\nIt should be possible to load the InstructionCounter with a value retrieved from SystemMemory.
InstructionDecoding is the process in the MicroController which takes the contents of the InstructionRegister, and uses these contents as an index into the JumpROM. The contents of the JumpROM are then taken as the next address for the MicroProgram. This is in effect a MicroProgram jump.
The InstructionRegister, which is part of the MicroController, can be used to store the contents of memory to influence the MicroProgram control flow. It does this by indexing the JumpROM and using the retrieved address to make a MicroProgram jump.\n\nIt can also contain fields which can be directly used to address registers or select arithmetic operations.
This is the [[Architecture]] which is visible for the assembly programmer and the compiler implementor.\n\nThe current InstructionSetArchitecture will be mostly derived from the ImplementationArchitecture, because it is the goal to define a working system architecture with the least components possible.
When simulating DigitalComponents, [[Interconnection]]s are necessary.\n\nThese [[Interconnection]]s are simulated using a structure which provides for simulation of [[Output]]s and [[Input]]s (see HistoryOfInterconnections for the reasons why this is done using one structure).\n\nAs can be seen from the implementation of these simple structures, it is possible for several [[Input]]s to reference one [[Output]]. When it is necessary for several [[Output]]s to be able to supply information to one [[Input]], [[Buses]] are a necessity.
InterruptHandling needs to do two things :\n\n# Save the ProgramCounter in a location where it can be retrieved.\n# Jump to an InterruptHandler.
[[2 January 2007]]
[[November]]\n[[December]]\n[[Januari]]
The JumpROM is the part of the MicroController which decodes an instruction and provides the address for the next MicroInstruction.\n\nIf possible, the JumpROM should be generated by the MicroAssembler.
LogicalOperations are the AndOperation, OrOperation and NotOperation from Boolean algebra.\n\nIn this case, they are bit operators, because they work on all the bits of the ArithmeticLogicUnit.
LogicalValues are FALSE or TRUE, and can eg. be denoted with 0 and 1, or 0V and +5V.\n\nOn LogicalValues, the operations AndOperation, OrOperation and NotOperation are defined.
[[Tools]]\nSimulationGoal\nSimulationLibrary\nArchitecturalTrail\n[[Journal]]
The MemoryAddressRegister is used to be able to address the [[SystemMemory]] with other addresses than only those of the InstructionCounter.
Memory management is an important point in the future of the systems I am designing. While currently using a 12 bit system for R&D purposes, it makes probably sense to expect the system to have a larger memory in the future.\n!!Memory expansion\nThis is about about expanding the maximum usable memory in a system. In effect, it comes down to making the address bus wider. There are, however, different strategies to do this, depending upon the system design.\n!!!Paging registers\nThis is a solution for a system which does not have possibilities to expand its address space.\n\nThe implementation consists of registers which can be loaded with an extra address part, and which are addressed by one or more bits of the standard address. In this way the number of address bits can be increased.\n\nThis effectively divides the memory in pages, where each page can be addressed by only N bits of the A address bits, where N = A - P, and P is the number of address bits used to address the paging registers.\n\nWhen using 1 bit to address a page register, the usable address in the page is only 11 bits, which gives a page size of 2k. In order to expand the memory, the addressed page register must be at least 2 bits in size, which gives an effective memory address of 13 bits, or 8k words.\n\nThe problem with this solution is that is not possible to contiguously address the contents of the memory. One bit gives only 2 page registers, but 2 bits extra gives four pages. This means that either the application must be structured to accomodate this, or a virtual system must be written which takes into account the status of the page registers for all memory addressing operations.\n\nAlso, more page registers means pages of smaller sizes.\n!!!Longer addresses\nBy extending the architecture, it is possible to expand the address space at the level of the InstructionSetArchitecture. This makes it faster than the previous solution, and instructions and the ImplementationArchitecture can be adapted to be able to use contigous memory without a problem. By introducing relative addressing, it is also possible to relieve the main drawback, that instructions using addresses now take up more space.\n!!!Expand system width\nBy expanding the complete system width, the previous drawbacks are removed at the expense of expanding the hardware resources proportional to the desired system width. However, when going for a system larger than 12 or 16 bits, it should be implemented on FPGA or CPLD.\n!!Protected memory\nThis gives the system the ability to protect itself against bad code. To be able to use this, it is necessary in the architecture to have provisions for running programs in different protection modes.\n\nThe basic property needed, is that when a program runs in a kind of protection mode, the system checks memory accesses against some boundaries. If these boundaries are overstepped, then an interrupt should be generated. This would drop the system in supervisor mode, where it would be able to remove the offending process.\n!!Memory paging\nMemory paging is a system where it is possible to extend address space with parts of external storage.
This tool should help to generate [[MicroProgram]]s. When these are small, it is rather easy to assemble them by hand, but if the InstructionSetArchitecture grows and the machine needs more features, then the chance that errors are introduced grows.\n\nA MicroAssembler should help with this, by making it possible to define fields of bits, and to combine these into operations.\n\nThe MicroAssembler should be written in Common Lisp.\n\n!Analysis and design\nIn [[2 January 2007]] a preliminary analysis of [[MicroAssembler]]s was done. In this part, a further analysis is done of what must be possible with a MicroAssembler, or in other words, what are its\nrequirements ?\n\nThe first requirement is that it should be possible to use symbolic names for signals to control. In fact, it should be possible to design the MicroProgram using a structured logic chart, and use the there defined names.\n\nA second requirement is that it should be possible to use labels for addresses in the MicroProgram, and it should be possible to substitute their values in the MicroInstructions.\n\nA further requirement is that it should be possible to substitute multiple occurrences of the fields by macros, so that it is possible to use abbreviated forms.\n\nAnother requirement is that the output should be in an intermediate form, which can be used by the CL simulator or by an external program to generate ASCII HEX files.\n\nIt should be possible to dump a list of defined labels in the MicroProgram, to generate jump tables from them.\n\nAt the start of the MicroProgram, it should be possible to define a list of control signals and their attributes. When this is done, it should be possible to do the following things :\n* Generate a default MicroInstruction, based upon default values.\n* Generate a MicroInstruction, by referencing or setting a control signal.\n* Use the value of a label in a MicroInstruction.\nThe control signals should be defined from most significant to least significant in the context of a MicroInstruction (another option would be to number them manually, but this way the numbering is automated and thus less error prone).\n\nThe attributes should be the name, the type (boolean or integer) and the number of bits and their default value.\n\nWhen fields are not referenced, they assume their default value in the current MicroInstruction.\n\nFields can be assigned values. An error should be generated if the assigned value falls outside the field's definition.\n\nWhen a boolean field is not assigned to, but referenced only, then its value should be the inverse of its default value.\n\nInteger fields should always be assigned to.\n\nAn assignment can be a numerical value or a label.\n\nSince a MicroInstruction can comprise several actions, all these actions have to be grouped together. This means that a grouping statement is needed, together with allowable actions within that group. A group can be preceded by a label, which assumes as value the address of the group within the MicroProgram.\n\nMacros have to encompass everything from single statements to complete groups.\n\nTo know what the MicroAssembler must be able to do, it is probably best to start with writing down the MicroProgram and then develop a MicroAssembler for it.
This is the controlling circuitry which contains the MicroProgram, the JumpROM, an input instruction register and programmable decision making for controlling the implemented processor.\n\nSee MicroInstructionCycle for an explanation of the working of the MicroController.
One word of the MicroProgram, being equivalent of one step of control in the implemented processor.
One step in the MicroProgram which defines a parallel set of actions for the system.\n\nOne MicroInstructionCycle is mostly the size of one ClockCycle.\n\nTo be able to think about the way the complete system works, the MicroInstructionCycle begins with the moment the MicroInstructionPipelineRegister is loaded with the contents of the MicroProgram, which happens at the falling edge of the clock. Since the next MicroInstruction is then loaded at the next falling edge, everything which happens in one MicroInstruction should be finished between these two edges.\n\nHowever, there are two registers in the MicroController, the MicroProgramCounter and the MicroInstructionPipelineRegister.\n\nUpon SystemBootTime the MicroProgramCounter is loaded with zero and indexes the first word of the MicroProgram, which should set all necessary components to a known state. This means that the MicroInstructionPipelineRegister will always be loaded at the first edge of the ClockCycle.\n\nWhen the ResetPulse ceases its activity, it becomes possible for the MicroProgramCounter to be incremented or loaded with a new value. If the first instruction sets everything in a known state, then the next clock pulse should increment the MicroProgramCounter to select MicroInstruction[1], which should set up the datapath for moving the ProgramCounter to the MemoryAddressLatch, increment it through the ALU and prepare the memory for reading an instruction for reading from the first memory address.\n\nHowever, while the MicroProgramCounter is being incremented, the MicroInstructionPipelineRegister is still loaded with MicroInstruction[0]. Generally, while MicroInstruction[n] is being generated, MicroInstruction[n-1] is being executed. However, the outcome of MicroInstruction[n-1] should be able to influence the selection of MicroInstruction[n].\n\nAt ClockCycle[n], the PipelineRegister will contain MicroInstruction[n-1], and the MicroProgramCounter will be incremented from [n-1] to [n], to select the next MicroInstruction. However, the outcome of MicroInstruction should have the possibility to do two things : go on with MicroInstruction[n], or load the MicroProgram counter with a jump address based upon the outcome of the computation at MicroInstruction[n-1], or the loading of an instruction from main memory.\n\nGoing on with the next microinstruction is the easiest and the fastest, because the selection from the MicroProgram can be done from the edge at which the MicroProgramCounter is incremented, until the next edge.\n\nHowever, if the MicroProgramCounter has to be loaded conditionally with a new value after it has been incremented, then there is a problem, because the MicroController has to wait for the results of the current computation to arrive. This means that selection of the next MicroInstruction is delayed by the time needed to do a computation. The result of this is that the cycle time has to increase.\n\nTo resolve this, a conditional instruction could be split up in more microcycles. MicroInstruction[n-1] executes the computation and should deliver a conditional result, which is only available at the end of the MicroCycle. In the meantime, the MicroProgram counter has been incremented and selects MicroInstruction[n].\n\nOn the following edge MicroInstruction[n] is sent to the PipelineRegister, while at the same time the results and the controls from MicroInstruction[n-1] are fed back to the MicroController just before the MicroProgramCounter is clocked. These results will then decide if the MicroProgramCounter will be incremented at the clock edge, or loaded with a new addres [m], which will then select MicroInstruction[m] for the next cycle [n+1].\n\nThe big trouble with this system is that two cycles are needed for a computation which takes only one cycle, and that a complete cycle is wasted
The MicroInstructionFormat is derived from the control fields of the MicroController.\n\nThese are summarised in the following table.\n\n|Name |Description |Type | Values |\n|JUMP-1 |Jump destination 1 |Integer |0-1023 |\n|JUMP-2 |Jump destination 2 |Integer |0-1023 |\n|MEMORY-ENABLE|Memory access |Boolean ||\n|MEMORY-RW |Read/Write |Boolean ||\n|IR-LOAD |Load IR |Boolean ||\n|DB-IN-LOAD |Get databus value |Boolean ||\n|MAR-LOAD |Activate MAR |Boolean ||\n|DB-OUT-ENABLE|Activate DB out |Boolean ||\n|DB-OUT-LOAD |Register DB out |Boolean ||\n|AD-SELECT |Select MAR address source |Boolean ||\n|DR-ENABLE |Activate DataRegister |Boolean ||\n|DR-LOAD |Load DataRegister |Boolean ||\n|PC-ENABLE |Activate ProgramCounter |Boolean ||\n|PC-LOAD |Load ProgramCounter |Boolean ||\n|OP-SELECT |Select ALU OP source |Boolean ||\n|ALU-uC-OP |ALU operation |Integer |0-7|\n|CONDITION |Conditionally select address|Integer |0-31|\n\nAs such, it can be seen that the output for the jump input still needs to be designed.\n\nThere are four possible values for the conditional logic :\n# ProgramCounter\n# JumpField1\n# JumpField2\n# JumpROM\nThere is only one conditional, and that is the ZeroStatus. The combination of the ZeroStatus and the Condition field must yield one of these four output values.\n\nFirst of all, it should be possible to select unconditionally the next address. This yields four possibilities as described above.\n\nThen, since there is one condition, which can be tested true or false, this gives the possibility to choose from two paths. These paths should be encoded.\n\n|Condition|Path 1 |Path 2|\n|None |ProgramCounter |None |\n|None |JumpField1 |None |\n|None |JumpField2 |None |\n|None |JumpROM |None |\n|Test |Code 1 |Code 2|\n\nThus, it comes down to the fact that the condition is an encoding of a test bit and two path select fields, of two bits each.\n\nWhen the test bit is zero, no condition is tested, and the contents of Path1 are selected. When the test bit is one, the ZeroCondition is tested. If it is true (ie. the DR is zero), then the flow through Path1 is taken, else the flow through Path2 is taken. This gives us the possibility to encode 32 different ways for jumping in the MicroProgram.\n\n|Name |Description |Type | Values |Bits|\n|JUMP-1 |Jump destination 1 |Integer |0-1023 | 10|\n|JUMP-2 |Jump destination 2 |Integer |0-1023 | 10|\n|MEMORY-ENABLE|Memory access |Boolean | | 1|\n|MEMORY-RW |Read/Write |Boolean | | 1|\n|IR-LOAD |Load IR |Boolean | | 1|\n|DB-IN-LOAD |Get databus value |Boolean | | 1|\n|MAR-LOAD |Activate MAR |Boolean | | 1|\n|DB-OUT-ENABLE|Activate DB out |Boolean | | 1|\n|DB-OUT-LOAD |Register DB out |Boolean | | 1|\n|AD-SELECT |Select MAR address source |Boolean | | 1|\n|DR-ENABLE |Activate DataRegister |Boolean | | 1|\n|DR-LOAD |Load DataRegister |Boolean | | 1|\n|PC-ENABLE |Activate ProgramCounter |Boolean | | 1|\n|PC-LOAD |Load ProgramCounter |Boolean | | 1|\n|OP-SELECT |Select ALU OP source |Boolean | | 1|\n|ALU-uC-OP |ALU operation |Integer |0-7 | 3|\n|CONDITION |Test condition |Boolean | | 1|\n|COND-TRUE |Which address if zero = true |Integer |0-3 | 2|\n|COND-FALSE |Which address if zero = false|Integer |0-3 | 2|\n\nThis gives a MicroInstruction width of 41 bits.\n\nThis could be reduced by 10 bits from a jump field, if the JumpROM output is multiplexed with the other jump field. The CONDITION field can then be kept, but COND-TRUE and COND-FALSE can be replaced with 1 bit, which then selects the address input.\n\nThis creates the following possibilities :\n\n|CONDITION|JMP-SELECT |NEXT-ADDRESS|ZeroStatus|Result |\n| 0| 0| 0| 0|PC |\n| 0| 0| 0| 1|PC |\n| 0| 0| 1| 0|Jump field|\n| 0| 0| 1| 1|Jump field|\n| 0| 1| 0| 0|PC |\n| 0| 1| 0| 1|PC |\n| 0| 1| 1| 0|Jump ROM |\n| 0| 1| 1| 1|Jump ROM |\n| 1| 0| 0| 0|PC |\n| 1| 0| 0| 1|Jump field|\n| 1| 0| 1| 0|Jump field|\n| 1| 0| 1| 1|PC |\n| 1| 1| 0| 0|PC |\n| 1| 1| 0| 1|Jump ROM |\n| 1| 1| 1| 0|Jump ROM |\n| 1| 1| 1| 1|PC |\n\n12 bits can be saved which reduces the MicroInstruction width to 29 bits. If we consider that the ROM can be 1024 instructions long, this gives a reduction in size from 41k to 29k, which is 29% reduction. This depends of course on the size of the MicroProgram. In real numbers this is a reduction of 12288 transistors/switches.\n\nThe total number of branch possibilities is reduced to 11 instead of 32, but this will probably pose no real problem.\n\n|Name |Description |Type | Values |Bits|\n|JUMP |Jump destination |Integer |0-1023 | 10|\n|MEMORY-ENABLE|Memory access |Boolean | | 1|\n|MEMORY-RW |Read/Write |Boolean | | 1|\n|IR-LOAD |Load IR |Boolean | | 1|\n|DB-IN-LOAD |Get databus value |Boolean | | 1|\n|MAR-LOAD |Activate MAR |Boolean | | 1|\n|DB-OUT-ENABLE|Activate DB out |Boolean | | 1|\n|DB-OUT-LOAD |Register DB out |Boolean | | 1|\n|AD-SELECT |Select MAR address source |Boolean | | 1|\n|DR-ENABLE |Activate DataRegister |Boolean | | 1|\n|DR-LOAD |Load DataRegister |Boolean | | 1|\n|PC-ENABLE |Activate ProgramCounter |Boolean | | 1|\n|PC-LOAD |Load ProgramCounter |Boolean | | 1|\n|OP-SELECT |Select ALU OP source |Boolean | | 1|\n|ALU-uC-OP |ALU operation |Integer |0-7 | 3|\n|CONDITION |Test condition |Boolean | | 1|\n|JMP-SELECT |Select jump field/ROM |Boolean | | 1|\n|NEXT-ADDRESS |Select next address PC/JMP |Boolean | | 1|\n\nWhen the MicroProgram is finished, a further reduction can probably be gotten, since the program may possibly fit in less than 1024 steps. This can mean a further reduction with 1 to 4 bits (maybe).\n\nHowever, this step will not be taken yet, because it is time for implementing, not for changing requirements anymore.\n\nThere is e.g. another step that could be taken in the controller, and that is to have the possibility to execute part of the MicroProgram as a subroutine. This leads to an even smaller MicroProgram. However, at this phase the changes to any components should stop and SYS-2 should be implemented. When the system gets stable, it could be possible that the architecture of SYS-2 may be adapted or expanded, but the used omponents should not change anymore.
This register loads the MicroInstruction. The goal is to have a stable MicroInstruction while the propagation and evaluation of the MicroInstruction results can be done in the same MicroInstructionCycle.
A MicroInstruction is a coded sequence of signals which controls the operation of the attached electronics.\n\nA sequence of [[MicroInstruction]]s is called a MicroProgram.\n\n[[MicroInstruction]]s can be encoded, in which case they are followed by optional decoding logic which expands their usage, or they can be assigned bits and fields per component.
This is the information which controls the inner workings of the assembled system.\n\nThe reason to choose for a microprogrammed architecture is that it parallellises the control of the system completely, and that no other decoding or processing is necessary, except to store the result of the current MicroInstruction in the output register of the MicroController.
The MicroProgramCounter steps the MicroProgram through the MicroInstructions. It is possible, depending upon the status of the current MicroInstructionCycle, to load the MicroProgramCounter with another address instead of the next one, effectively executing a MicroProgram jump.
[img[Circuit 1|circuit_1.png][circuit_1.png]]\n\nThis is the first test of the microcontroller in an application where it it is used to control other hardware.\n\nThe MicroProgram consists of three steps :\n\n* A reset instruction\n* An increment instruction\n* A NOP instruction\n
[img[Circuit 2|circuit_2.png][circuit_2.png]]\n\nThis is a more complicated example, using a modulo counter to address a memory and subsequently load the contents of the memory into a data register.\n\nThe MicroProgram here consists of three steps :\n\n* A reset instruction\n* An increment instruction\n* A load instruction
The NotOperation is the logical NOT of a LogicalValues.\n\n!The NotOperation for a logical value\n\n|!A|!Y|\n|0|1|\n|1|0|\n\n!The NotOperation for a whole SystemWord\n\n|!Operand A| 0101100|\n|!Result C| 1010011|
[[20 November 2006]]\n[[22 November 2006]]\n[[25 November 2006]]\n[[27 November 2006]]\n[[28 November 2006]]\n
The OperationStatus is the result status of ArithmeticOperations or LogicalOperations.\n\nThe most basic OperationStatus is the ZeroStatus. This status cannot be detected through any kind of software emulation. However, it is easy to implement, because all is needed is an AND of all bits in the result.\n\nOther results can then be detected through usage of the AndOperation, which results either in ZeroStatus or not. This is for testing bits through BitMasks.\n\nBy using subtraction, values can be compared. If the value is zero, then they are equal.\n\nComparison of A and B :\n\nA - B > 0\n\nA -B = 0\n\nA -B < 0\n
[img[Optimised MicroController|modified_controller.png][modified_controller.png]]\n\nThe MicroController can be optimised by removing the MicroProgramCounter in front of the MicroProgram and moving its output to the branch multiplexer.\n\nIn the first version, the MicroProgramCounter effectively acts as a register which creates an additional pipeline stage in the controller. This leads two a two stage pipeline, where in case of a branch, the pipeline needs to be flushed and refilled.\n\nIn the optimised scheme, this does not happen, and the output pipeline register acts for what it is really needed : make it possible to select the next microinstruction in parallel with the current MicroInstruction, and at the same make it possible to select the next microinstruction based upon the outcome of the current operation, be this a branch or not.
The OrOperation is the logical OR of two LogicalValues.\n\n!The OrOperation for two logical values\n\n|!A|!B|!Y|\n|0|0|0|\n|0|1|1|\n|1|0|1|\n|1|1|1|\n\n!The OrOperation for a whole SystemWord\n\n|!Operand A| 0101100|\n|!Operand B| 1000101|\n|!Result C| 1101101|
An [[Output]] is the simplest part of the simulation. It needs to be able to provide the following things.\n* An output value, settable by the component using the simulation.\n* A state value, which simulates the floating or active status of a digital output.\nIn addition, through the way in which the [[Simulation]] is defined, an extra result value is defined, in which first the result of the OutputComputation is stored, and then propagated to the output value through the PropagationStep.
Every component must define a compution step, which is mostly defined as ''//component//-compute''.\n\nThis step must compute the OutputResult for each of its [[Outputs]], derived from its direct inputs.\nIn this step, no values are placed yet on the [[Outputs]] of the component, this is the task of the PropagationStep.
As can be seen with the [[Outputs]], every output structure has an OutputValue and an OutputResult. The OutputResult is the computed value from the OutputComputation step, but this is an intermediate value which is not visible to the components which reference the output.\n\nIn the PropagationStep, the OutputResult is stored in the OutputValue, and only then becomes available for ReferencingInputs.
The OutputValue is the value from an output which can be used by the ReferencingInputs for it.
The PassOperation is necessary to move one input of the ArithmeticLogicUnit unchanged to its output. Doing this saves on wires in the implementation of the system.
Planning is important because it gives guidelines on what to develop, and to know what things still need to be developed.\n* FeatureOverview\n* [[Requirements]]\n* [[Design]]\n* [[Implementation]]\n* [[Testing]]
The ProgramCounter is used by the system to fetch the next instruction from MainMemory.\n\nUpon SystemBootTime it should be set to a known address where the InitialProgram should be located.\n\nWhen nothing special is requested, the contents of the ProgramCounter are moved to the MemoryAddressLatch and also incremented through the ArithmeticLogicUnit.\n\nAt the end of the this MicroInstructionCycle, the MemoryAddress is latched into the MemoryAddressLatch and in the ProgramCounter.
In this step, the OutputResult of the OutputComputation is placed in the OutputValue of all [[Outputs]] defined for the [[Component]].
*Emacs folding-mode\n*The ~One-Page Guide to Using {{{noweb}}} with ~LaTeX\n*gEDA schematic overview of SYS-2\n*Instruction set reference\n*SLIME reference card\n*ILISP reference card
A referencing input of an output is the input of another component which connects to the output.
[[Requirements]] are derived from features and influence the [[Design]], the [[Implementation]] and the [[Testing]] phases.\n\nThe total volume of [[Requirements]] is probably too big to fit easy into this ~TiddlyWiki, but not attempting it is probably even worse.
This pulse is activated on SystemBootTime or initial power up and serves to set some registers in a known state.
SYS-2 is the name given to the second architecture for a self-designed CPU and computer system. SYS-1 was only a preliminary attempt, superseded after noting that the architecture could be further simplified.\n\nSYS-2 is an architecture which tries to have a minimal number of components and instructions.
* [[r1]] : Initial basic implementation\n* SYS-2-OutputDevice\n* SYS-2-InputDevice\n* SYS-2-Monitor : Application for interactive use\n* SYS-2-Interrupts\n* [[r2]] : Optimisation of ImplementationArchitecture of r2.\n* [[r3]] : Update of the controller, shorter microprogram.\n* [[r4]] : Compression of microinstruction width, electronic design.\nNothing changes in the basic ISA. Once this is needed, SYS-3 will be created, based upon an overview of all the optimisations that already have been proposed.
This peripheral is a keyboard.
!Summary of the SYS-2 instruction set\n!!Addressing mode notation\nThere are three addressing modes, for which three notations need to be found.\n\nThe first one is immediate addressing, which loads a constant, addressed by the program counter, into the PC or the DR. This can be noted as {{{OP, constant}}}. This only works for loading.\n\nThe second addressing mode is direct addressing. The program counter loads the next value, and this is used as an address. This address can then be used to load its contents from, or to store one of the registers into. As an extension of immediate addressing, this could be noted {{{OP, M[address]}}}. Another notation used is {{{OP, @address}}}.\n\nThe third addressing mode is indirect addressing. This is an extension of direct addressing, where the contents of the memory location are used as another address, which is then the target of a load or store operation. This means that the contents of {{{M[address]}}} are used as another index into the memory. Previously, the notation {{{M[M[address]]}}} was used, which is basically correct, but awkward. When the {{{@}}} notation is used, however, it can be expressed as {{{M[@address]}}}\n!!Operations\n*LD mode\n*ST PC|DR, mode\n*BR mode\n*BRZ mode\n*NOP|NOT|INC|SL|SR DR\n*ADD|AND|OR DR,mode\n*HALT
After I/O has been defined, it gets time to implement an interrupt strategy. There is not much that can be said about it for now, except that it should be simple and should not make deviations from the current setup, since it should be based upon [[r1]].
!!Overview\nThe SYS-2 MicroProgram can now be developed in a symbolic way, i.e. not as [[MicroInstruction]]s, but in pseudo-code.\n\nThis development in pseudo-code is also necessary to get a feeling of how a MicroAssembler can be defined and implemented. The problem here, vs. a normal assembler, is that several operations at one time can be executed, and so several statements must be grouped in one block.\n\nTogether with developing the MicroProgram comes also the analysis of the timing diagrams. They influence each other and interact also with the clock. This is necessary to know when actions initiated by the MicroProgram start and stop.\n\nKey [[MicroInstruction]]s must be labelled. It should be possible to extract these labels, in order to generate a jump table from them.\n\nAll MicroInstruction fields should have default (inactive) values. This makes it easier to explicitly say what has to be done in a real MicroInstruction.\n\nThe key to managing a MicroProgram properly is probably in the recognition of patterns, and abbreviating these by using macro's, and probably even recognising that certain [[MicroInstruction]]s are repeating and being able to manage these by grouping them into one macro. Possibly these macro's may need the ability to introduce them\nwith parameters (e.g. jump to another location should be possible by passing the destination address as a parameter).\n\nThe MicroProgram cannot be developed without the system schematics. A colour code may be helpful here, to highlight the different phases in the execution of an instruction. Another useful tool might be an overview derived from the electronics schematics which contains just the necessary components to create the MicroProgram from and which can be used to markup the different phases in instruction execution.\n!!Breakdown\nThe work to be done can be broken down into the following parts.\n\n|Load DR|Immediate|Load the DR with a value|\n|~|Direct|Load the contents of address into the DR|\n|~|Indirect|Load the contents through an indirect pointer into the DR|\n|Store DR|Direct|Store the DR into memory|\n|~|Indirect |Store the DR into memory through a pointer|\n|Store PC|DR|Store the PC into the DR|\n|~|Direct|Store the PC into memory|\n|~|Indirect|Store the PC into memory through a pointer|\n|Branch|Direct|Branch to the address supplied|\n|~|Indirect|Branch to the address pointed to by the operand|\n|BranchIfZero|Direct|Branch conditional to the address supplied|\n|~|Indirect|Branch conditional to the address pointed to by the operand|\n|Arithmetic|Immediate|DR OP value|\n|~|Direct|DR OP M[value]|\n|~|Indirect|DR OP M[M[value]]|\n|System|STOP|Halt the processor until reset|\n\nSetting up the MicroProgram should start with the definition of a MicroInstruction, which is a sequence of fields with distinct types and values. Since everything is based around Common Lisp, why not set it up as a struct ?\n\n{{{\n\n(deftype control ()\n `(integer 0 1))\n\n(deftype operation ()\n `(integer 0 7))\n\n(deftype source ()\n `(integer 0 3))\n\n(deftype address ()\n `(integer 0 1023))\n\n(defstruct SYS-2-Instruction\n (memory-enable 0 :type control)\n (memory-rw 0 :type control)\n (ir-load 0 :type control)\n (db-in-load 0 :type control)\n (mar-load 0 :type control)\n (db-out-enable 0 :type control)\n (db-out-load 0 :type control)\n (ad-select 0 :type control)\n (dr-enable 0 :type control)\n (dr-load 0 :type control)\n (pc-enable 0 :type control)\n (pc-load 0 :type control)\n (op-select 1 :type control)\n (alu-uC-OP 7 :type operation)\n (condition 0 :type control)\n (jmp-src-1 0 :type source)\n (jmp-src-2 0 :type source)\n (jump-1 0 :type address)\n (jump-2 0 :type address))\n\n}}}\n\nTo be able to create [[MicroInstruction]]s, a macro framework is necessary to be able to specify them separately and to tally them for labels.
\n
[[Scheme]] is a functional language, derived from Lisp. There are many implementations of it. Having a C compiler handy would make it easy to port an existing implementation to this system. However, to be able to write system software, a Scheme compiler would make more sense.
ShiftOperations work by shifting the contents of a data word to the left or to the right. To keep things easy, bits which are located on the extremes (MostSignificantBit and LeastSignificantBit) are dropped when shifted to left or right, and filled with 0 when shifted to right or left.
[img[Simple Dataprocessor|datapath_3.png][datapath_3.png]]\n\nThis architecture consists of two registers, to be used as a DataRegister and as a ProgramCounter. These can be used through the ALU. The output from the ALU is isolated from the paths which serve as inputs.\n\nFor speed, the system is constrained by the memory, because all data operations must come from the memory.\n\nHelp registers in the architecture are the memory address latch, the data write latch, the data read latch and the instruction latch.\n\nOne of the differences between this architecture and the previous one is that the ALU now must be controlled by both the MicroController and the InstructionRegister. This stems from the fact that the PC is not autonomous anymore, but must be incremented through the ALU.
Designing a simple processor architecture using the DigitalComponents simulation kit.
Simulating a digital circuit is not that easy.\n\nThrough the way the connection between [[Outputs]], [[Inputs]] and [[Buses]] are defined, one could start by trying to compute the value of an input by recursively following back from an input to an output. However, most digital circuits contain loops, which mean that the simulation will never end, or that a stack based algorithm is needed to find out if an output has already been referenced.\n\nThe solution is to do the simulation in two steps :\n# In the OutputComputation, a result value from the components direct inputs is computed. This should be done for all components.\n# In the PropagationStep, all output results are propagated to the output values of the components.\nThis gives the added advantage that there is a delay, which is also the case in a real digital circuit (however, in a real digital circuit the propagation delay time is not constant).\n\nOnly [[Buses]] should not have a delay. This means that they should not have an OutputComputation, but only a PropagationStep. To have a good simulation, this means that the bus propagation should be computed after all its connected outputs have propagated.
The SimulationGoal is to create a SimulationLibrary of functional [[Components]], to wire together a computer system in an ImplementationArchitecture, and adding an InstructionSetArchitecture through a [[MicroProgram]]med design.\n\nFrom this, experience in the following fields should be derived :\n\n* Basic computer design\n* Managing and documenting a rather complex project\n* Microprogramming\n* Knowledge and exercise in CommonLisp, since that is the implementation language\n* Building an assembler
This library defines in CommonLisp the following [[Component]]s (list can be extended).\n* [[Interconnections]]\n* [[Clock]]\n* SingleShot\n* ModuloCounter\n* [[Latch]]\n* [[Memory]]\n* [[Multiplexer]]\n* [[Splitter]]\n* MicroController\n* OptimizedMicroController\n* ArithmeticLogicUnit\nThere is also a topic on the ImplementationIssues.
A SimulationStep is one part of the simulation, where every [[Component]] is controlled to execute an OutputComputation and a PropagationStep.
Putting computer design into practice.
SimpleProcessor
There is an implementation of the LISP defined in AIM-8 at:\n\nhttp://www.informatimago.com/develop/lisp/small-cl-pgms/aim-8/\n\nbut there are a lot of small lisp implemented in lisp in textbooks, such as SICP (see chapter 4)\nhttp://mitpress.mit.edu/sicp/full-text/book/book.html\n\nand of course in LiSP "Lisp In Small Pieces"\nhttp://www.lmet.fr/fiche.cgi?_ISBN=9782916466033&_WORDS=lisp#couverture\n\n it is probably not exactly the Lisp-interpreter in Lisp you are looking for, as there are so many, but it is my attempt in this domain\n(not quite fulfilling the request of being terse):\n\nhttp://www.aviduratas.de/lisp/meval.html\n\nYou find it under the heading "Some sources" and a more expanded version through the link given directly below there (this expanded version contains also a rudimentary 'read' function in Lisp).\n\n(In case you are interested in such a thing too: On the the same page, further above, there is a presentation of a small compiler + runtime system + CPU simulator for a subset of Lisp).\n\nPaul Graham translated John McCarthy's metacircular eval to Common Lisp. Somebody put it up on Cafepress, so you can get it printed on a coffee mug if you want:\n\nhttp://www.cafepress.com/buy/lisp/-/pv_design_prod/p_storeid.14330770...\n\nI noticed that they also sell "My next car is a cdr" bumper stickers.\n\nYou may also be interested in "Write Yourself a Scheme in 48 Hours: a Haskell tutorial":\n\nhttp://halogen.note.amherst.edu/~jdtang/scheme_in_48/tutorial/overvie...\n\nMaybe someone will write "Write Yourself a Haskell in 48 Years: a Lisp tutorial" but it sounds like handling infix syntax from Lisp might be a real sticking point. ;-) \n\nYou may be thinking of Graham's "Roots of Lisp" code[1], in which he defines McCarthy's original LISP in CL using only quote, atom, eq, cons, car, cdr, cond.\n\nAlong those lines, and not all that much larger, are Norvig's scheme interpreters from PAIP, which are also available online[2]. The Lambda Papers[3] are the ultimate source of metacircular enlightenment IMO, and are excellent reading.\n\nFinally, though not at all tiny, Sacla[4] is a Common Lisp in Common Lisp. Neato. \n\nhth,\n\ndrewc\n\n[1] http://lib.store.yahoo.net/lib/paulgraham/jmc.lisp\n\n[2] http://norvig.com/paip/README.html\n\n[3] http://library.readscheme.org/page1.html\n\n[4] http://homepage1.nifty.com/bmonkey/lisp/sacla/index-en.html\n\nYou may want to look at the Movitz project, which seems to have similar goals.\n\n> The other possibility is to write a GCC backend for the architecture\n> and take advantage of all the free/open software that is available for\n> C.\n\nMy personal favorite is to help port ECL to whatever system I'm using.\nThen, I get to use the system's native C compiler, and also write CL.
The [[Splitter]] is a component which takes as input an [[Interconnection]], which in this case is always a full dataword, and splits it up in several other [[Interconnection]]s.\n\nThis is needed to derive subfields from [[Buses]] or [[Latch]]es, e.g. in the case of InstructionSplitting.\n\nThe [[Splitter]] is a nice example of a datastructure which should be created through a macro. The code for splitting up the input of the component is regular, and thus can be computed at compile time. A structure definition can then be created from which an object can be derived, which has an easy accessible [[Input]]s and where other inputs can refer to its [[Output]]s.
Also called InitialProgramLoad, but in the case of the [[Simple Dataprocessing Architecture]], which is simulated with software, the InitialProgram will mostly be located in [[MainMemory]] before the system starts.\n\nSystemBootTime is started when the system is powered on, which activates the ResetPulse. This ResetPulse will set the MicroProgramCounter and the ProgramCounter to a known value. This will be zero in case of the MicroProgramCounter, but could be a different value in case of the ProgramCounter.\n\nAs an extension to this scheme with a preloaded [[Memory]], if there is an IODeviceStrategy, it is possible to create a default IO device from which the system autonomously reads boot code and which then handles the InitialProgramLoad.
SystemMemory is the [[Memory]] used in the implemented SimpleProcessor to store instructions and data.\n\nThese contents are retrieved through either direct use of the InstructionCounter or indirect via a value loaded in the MemoryAddressRegister.
The SystemWord is a binary number with the logical size of the system, which is a number of bits.\n\nIt should be possible to change the setup of the system so that it can run with 4 bits, 8 bits, 12 bits, 16 bits, 24 bits and 32 bits (on a 64 bit platform, 48 and 64 bits can be implemented).
!What technology to choose ?\nAs fast as possible, but with a slight view on the power requirements, and of course it should be easy to purchase them nearby.\n\nIt is easier to turn down the system clock to decrease the speed of the system, than it is to turn down the gate delay time to increase the speed of the system.\n\nSince it is the edges of the logic signals which cause trouble, it does not make a difference if a slow or fast technology is chosen.\n\nThis depends on the purchase of the catalog 'De Componenten Specialist'.\n
Tools are the things with which the project is built.\n* ~OpenOffice.org for preparing documents, plans and diagrams\n* Emacs for editing source files\n* SLIME for interactively developing and testing Common Lisp programs\n* SBCL and CLISP for compiling and running the developed code\n* Subversion for versioning\n* ~TiddlyWiki to provide a web view of this project\nMissing tools are\n* A planning/tracking database for ideas, features and changes\n* a MicroAssembler\n* an [[Assembler]] for the InstructionSetArchitecture\n* a [[C compiler]] for the InstructionSetArchitecture\n* a [[Scheme]] interpreter or compiler
All parts of the SimulationLibrary have tests defined for them which can be run every time an update of the source is done. This is to ensure that at least the basic functionality of the components does not raise issues.
Until ConditionalExpression\nDo CodeBlock.\n\n# Compute ConditionalExpression\n# Negate ConditionalResult\n# BranchIfZero to Exit\n# Execute CodeBlock\n# Jump to beginning\n# Exit
!How to visualise SYS-2\nFor the further development and testing, a visualisation system is absolutely necessary. Initial analysis has already been done in [[9 January 2007]], where our choices seemed to be restricted to a graphical environment using <b>mcclim</b>, or to a character environment using <b>ncurses</b>, or something related.\n\nThe big advantage of using ncurses, together with CLISP, is that the complete environment can be used cross-platform, whereas with mcclim, one is restricted to an X Window environment.\n\nThe disadvantage of using ncurses, is that the view on the system is somewhat coarser, and not everything fits on a standard display.\n\nAlmost all signals need 4 characters, and several others may need labels to designate them. Some can be replaced by a symbolic value.\n!!What visualisation is needed ?\nVisualisation must come in a form that resembles the architecture of the system. An earlier system created showed that this was feasible, but also that much more information was needed, and that it also should be possible to draw and highlight buses.\n\nThe system should be able to get enough information from components to show which are currently active. This means that the terminal must be able to use normal and standout modes.\n\nThe main visualisation component is something which should be able to control the SYS-2 component, so that the system can be reset, started, stopped, stepped, run free and possibly have breakpoints.\n\nUsing a mouse will also not be possible, which means that the control will ultimately be done through the keyboard.
While ConditionalExpression\nDo CodeBlock.\n\n# Compute ConditionalExpression\n# BranchIfZero to Exit\n# Execute CodeBlock\n# Jump to beginning\n# Exit
This signifies if the result of a computation in the DataRegister is zero or not.
ZeroStatusHandling refers to the fact that in the [[Simple Dataprocessing Architecture]] the only status result that will be considered is if the result is zero or not.\n\nThe ultimate goal of this way of working is to do away with status registers. The reason for this is that implementing these in a fashion which is workable is at least as much work as designing a complete datapath.\n\nAfter studying some examples in the books of Patterson and Hennessy, I found out that a computer could do away with flag registers and base handling of status flag related tasks with ordinary instructions.\n\nThe only thing that is really needed is to know whether a result is zero or not.\n\nFor future expandability, restoration of the current state of the computation has to be taken into account. This means that if an interrupt occurs at a certain point in time, with the DataRegister a certain value and the ProgramCounter a certain value, the system should be able to continue at the next instruction (we let interrupts only take place when the current instruction is finished).\n\nFor this to be possible, the zero status should be a feature of the dataregister, not of the ArithmeticLogicUnit. If a zero is loaded into the DataRegister, then its output will be True, else it will be False.\n\nWhen returning from an interrupt this way, and reloading the DataRegister, the status is automatically zero again and available for the next instruction which may or may not test it.\n\nHowever, this is also not an ideal situation. In the current architecture, it should be easily possible to expand the number of dataregisters.\n\nIn that case it is not possible to store the ZeroStatus as part of the register file.\n\nThe real solution is to have an explicit test mechanism, which translates to an update of the ZeroStatus. Thus, the computation and the test are decoupled from each other. In this case, the ZeroStatus is part of the ArithmeticLogicUnit. Testing for zero will then always be part of an indivisible instruction which is executed first and can then be interrupted.\n\nThe MicroProgram will then proceed like this. After InstructionDecoding, the current MicroInstruction will pass the DataRegister through the [[ALU]] for detecting if its value is zero or not. We will use it as a branch mechanism, which will execute the sequence BranchIfZero.\n\nThe BranchIfZero can be adjusted to test and branch in the same cycle by incorporating the changes defined in [[21 November 2006]].\n# Test DataRegister result.\n# BranchIfZero Address Fetch\n# Increment PC (Skip branch address)\n# Jump to InstructionFetch\n# Address Fetch :\n## Load MAR with [PC];PC<-MAR+1\n## Fetch next instruction\n\n
!Requirements\n!Test cases
!Requirements\n!Test cases
This inline function returns the output value of the output which is referenced through the supplied input.\n\n''Syntax:''\n''connection-value'' //input// => [[word]]
!Requirements\n!Test cases
!Requirements\n!Test cases
!Requirements\n!Test cases
!Requirements\n!Test cases
!Requirements\n!Test cases
!Requirements\n!Test cases
This is the first iteration of the SYS-2 system. Its main goal is to create a working implementation of the SYS-2-InstructionSet on the current ImplementationArchitecture.
This is the second iteration of the SYS-2 architecture.\n\nFollowing the implementation and testing of [[r1]], [[r2]] adds changes to the ImplementationArchitecture and the SYS-2-MicroProgram, but has fundamentally the same instruction set. Existing test harnesses should thus be usable for verification. Due to this changes, the r2 implementation should be faster and use less resources.
The basic changes will be in the controller, which will be updated with a single-level subroutine mechanism and the removal of the second jump address. This should lead to one more register in the controller, but much less space in the microprogram. This is a reduction in depth of the microprogram.\n\nThe goal is to prepare for [[r4]], where the microprogram should be reduced in width, by analysing the used microinstructions and replacing them by different codes.
This iteration should reduce the width of the micro-instructions, by encoding them in a denser format. This is the preparation of the system for implementation in an electronic form.\n\nThe basic goal is to use standard MSI components, but this will not succeed. For special functions, PALs should be used. This will include the output functions of the controller. The microprogram should be reduced in width and depth, which will hopefully lead to an implementation which can be done by small footprint PROMs.\n\nReally building the system would need the following services :\n# PLA programming\n# PROM programming\n# PCB manufacture\nTo be really useful, the system should consist of the following components :\n# The system as designed in the simulation\n# An interactive input mechanism\n# A visual output mechanism\n# A way to boot the system\n# A general purpose I/O expansion system
!Requirements\n!Test cases
[[word]] is the datatype defined for manipulating 32-bit unsigned integers.\n\n{{{\n(deftype word ()\n `(unsigned-byte 32))\n}}}\n