 |
Welcome to the Natami / Amiga Forum. This forum is for AMIGA fans interested in the new NATAMI platform.
Please read the forum usage manual.
|
Do you have ideas and feature wishes? Post them here and discuss your ideas. |
|
|---|
Matt Hey USA
| | Posts 733 17 Jun 2012 21:14
| I received an e-mail from Gunnar von Boehn on March 26, 2012 that stated:
Gunnar von Boehn wrote:
| I have some information for you. The 68K CPU development is going to be separated out from the Natami into its own project. The goal of the CPU team is to bring the core to market now. The CPU team is relatively small but focused. We are 3 HW developers. I would like to also get 2 good software views into the boat - to be able to discuss features and ideas from a wider angle.
|
I started to document a new 68k ISA with ideas that our Apollo Team came up with on the Apollo Forum that Gunnar created. We never planned to keep all the ideas but the documentation allows us to better evaluate ideas. Gunnar stopped posting to the Apollo Forum several weeks ago leaving the team in limbo. I believed that I was helping create a new 68k ISA unaware that the whole fpga softcore team and development had not split off from the Natami. I think our team came up with some good ideas that would be useful. I think we need a new 68k ISA to standardize enhancements made to 68k fpga softcores (not just N68k or Apollo) for compatibility and ease of development. Maybe a future ASIC could be burned as well. I hope to have some discussion about the new ISA. Nothing is set in stone yet.
The first question is what name to use. I have come up with 68koolFusion (68kF for short) but it can easily be changed if someone comes up with a better name. Cool spelled starting with a k is not uncommon here in the U.S. and has the same slang meaning (fun, neat, interesting, hip, good, excellent, elegant, sophisticated). The lowercase k represents 1000 of course as in 68000. Fusion is for the fusing of ColdFire instructions and other new ideas ;). I didn't find any other use of the name so copyright or trademark infringement is unlikely. The different versions could be 68kF1, 68kF2, etc. and they could be specified to a compiler or assembler in that way.
I have put a copyright on the ISA and documentation because we can't have people modifying a standard (beyond what is permitted) as it would defeat its purpose. It would be better for some standards organization to hold the copyright and I would be willing to turn it over if an appropriate one was ever created. The documentation would of course be free to distribute and annotate, and the ISA free to implement. The documentation is in OpenOffice Writer format. I have created html and PDF formats from this which are quite large and some formatting is lost. OpenOffice Writer EXTERNAL LINK PDF EXTERNAL LINK html EXTERNAL LINK
One of the changes is allowing 16 bit immediates extended to 32 bit values almost everywhere; this would be done automatically where a size for the immediate is not specified. This can improve the speed, code density and/or register utilization (no more trash register with moveq+op). The 68kF2 ISA has OPI.L instructions with 16 bit immediates extended to longs but it replaces the instructions CMP2, CHK2, CAS.B, CAS.W and CAS2.W. These instructions would not be added to a modern processor (CAS.L would and CAS2.L might but they could be reimplemented later), are good candidates to delete (most already trapped on 68060) and are very rare on the Amiga. These old instructions would no longer be trappable and would limit maximum 68020+ compatibility so they are not implemented in 68kF1. Any other modifications that could allow 68kF1 to be adopted by the likes of the TG68 or other retro cores will be considered.
There isn't much encoding space for 16 bit instructions. Many of the 32 bit instructions have been split from existing ones. I hope this doesn't cause muxes or other slowdowns. I would like to hear performance considerations or enhancement encodings for any ideas. Some instructions are not documented but would still be supported of course. I can add any missing instructions that should be in the ISA.
Address registers are opened up and allowed where possible. This does create some questions about setting the CC. I currently have the CC documented as not being set with a destination address register except for Bxxx, BF, MUL and DIV instructions. Read the descriptions of these instructions before telling me that all destination address registers should not set the CC. Operations to an address register would probably have to be longword operations which I don't have documented. It might be better not to allow some operations if they are more trouble than they are worth. I'm open to ideas.
The SATS/SATU instructions would need some additional logic to make happen with MUL and DIV which don't set the overflow on the ColdFire. Setting some (hidden?) CC bits when the overflow CCR[V] bit is set that represent signed/unsigned underflow may be necessary. The other option is using an unused bit in MUL and DIV to turn on saturation. There are no free bits in ADD or SUB and there are multiple variations. Creating new saturated ADD and SUB instructions is the other option. I created the MULADD/MULSUB as 32*32=32 only because that is the only MUL that can overflow and would require 2 saturations yet is common. This needs evaluating.
The new shift and rotate instructions could be handled like:
1) Add the new 68k shift/rotate instructions with a new name (currently I and R forms).
a) Allow the old form of shift/rotate to be upgraded to the new form when there is a limitation that would normally generate an error.
b) Make the programmer specify the exact instruction wanted with no promotion/aliases.
2) Put the old and new shift/rotate instructions together using the same name and document the 2 different encodings. This is how the 2 different forms of shift/rotate work currently using the same name (i.e. LSR #<1-8>,Dn and LSR #1,<ea>) but only 1 encoding is possible for any given syntax unlike the new shift/rotate instructions.
3) Rename the existing 68k shift/rotate instructions with a q (for quick) on the end (i.e. LSR -> LSRQ). Add the new shift/rotate instructions with the old names but optimize to the old "quick" versions where possible (if no Q suffix on the instruction, count<=8 and destination Dn).
I don't like option 2 and I'm starting to like option 3 more as it's the way the 68k might have been done if it was created today. It would be possible to simplify this with rotate in 1 direction and by combining lsl and asl. Large shifts are the biggest gain and moving while shifting is nice.
SBcc and SELcc were added because a CMOVcc (conditional move) type instruction and branch predication do not work well when the base register is updated. I think they are a good fit. Some instructions are simplified mnemonics (aliases) as PPC would call them. Some ColdFire instructions are remapped: FF1->BFFO, BYTEREV->PERM, BITREV->PERM, etc. Have fun with PERM. It's quite powerful and kool ;). We never started to discuss addressing modes so I don't have anything there yet. Any questions, ideas or comments (likes or dislikes)?
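As a rough illustration of option 3 for the shift/rotate instructions, the selection rule an assembler might apply can be sketched as below. This is only a sketch under assumptions: the function names, the `optimize` flag and the operand syntax check are hypothetical, only immediate counts with a register destination are modelled, and the memory form (LSR #1,<ea>) is ignored.

```python
import re

# Shift/rotate mnemonics that get a "quick" (Q) variant in this sketch.
SHIFT_MNEMONICS = {"ASL", "ASR", "LSL", "LSR", "ROL", "ROR", "ROXL", "ROXR"}


def is_data_register(operand):
    """True for D0-D7 (case-insensitive)."""
    return re.fullmatch(r"[Dd][0-7]", operand) is not None


def select_shift(mnemonic, count, dest, optimize=True):
    """Pick the mnemonic to encode for an immediate-count shift (option 3).

    Hypothetical helper: not taken from any real assembler.
    """
    name = mnemonic.upper()
    # The old 16 bit encoding only covers counts 1-8 with a Dn destination.
    quick_ok = 1 <= count <= 8 and is_data_register(dest)
    if name.endswith("Q") and name[:-1] in SHIFT_MNEMONICS:
        # Explicit Q suffix: the programmer asked for the old encoding.
        if not quick_ok:
            raise ValueError(f"{name} needs a count of 1-8 and a Dn destination")
        return name
    if name not in SHIFT_MNEMONICS:
        raise ValueError(f"not a shift/rotate mnemonic: {mnemonic}")
    # No Q suffix: fall back to the short quick encoding where it fits.
    if optimize and quick_ok:
        return name + "Q"
    return name
```

With this rule, `select_shift("LSR", 3, "D1")` picks the quick form, while a count above 8 or a non-Dn destination keeps the new full-width form.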
| |
Marcel Verdaasdonk Netherlands
| | Posts 3976 17 Jun 2012 21:38
| The original had 14 addressing modes IIRC, which is quite a lot. Be honest with yourself: which modes are least used?
| |
Wawa Tk Germany
| | Posts 581 17 Jun 2012 21:48
| i dont know if i grasp the importance of these extensions. to be honest i think it is too early. i think first goal for 68k compatible softcores (im not aware of that many, especially extended) should be to stay compatible, though faster. the day it is achieved we can think on improving on that and then establishing standards might become a valid subject, though good luck with that especially in this community. as for name its the last thing id worry about, we are not seriously going to need to market it, are we? i really liked following motorola numbers principle if you ask me. i always hated pretentious names.
| |
Matt Hey USA
| | Posts 733 17 Jun 2012 21:53
| Marcel Verdaasdonk wrote:
| The original had 14 addressing modes IIRC, which is quite a lot. Be honest with yourself: which modes are least used?
|
Gunnar did mention that the memory indirect post and pre indexed modes would likely be trapped because they are difficult to implement in a superscalar environment. The 68060 did manage to get them working in HW though. They would be the least used of course. They might be useful (especially for C++ programs) if they were fast enough and if they worked ok with superscalar. At least the Amiga doesn't use them much. I bet some Unix/Linux systems back in the day used them a lot.
| |
Matt Hey USA
| | Posts 733 17 Jun 2012 22:19
| wawa tk wrote:
| i dont know if i grasp the importance of these extensions. to be honest i think it is too early. i think first goal for 68k compatible softcores (im not aware of that many, especially extended) should be to stay compatible, though faster. the day it is achieved we can think on improving on that and then establishing standards might become a valid subject, though good luck with that especially in this community. as for name its the last thing id worry about, we are not seriously going to need to market it, are we? i really liked following motorola numbers principle if you ask me. i always hated pretentious names. |
When fpga implementations start adding their own extensions it's too late to have a standard. The last thing we want is something like the x86 AMD and Intel extensions. I am attempting to create an easier to implement and more 68k compatible 68kF1 ISA together with a more powerful, consistent and modern 68kF2 ISA that can continue into the future and maybe even compete with ARM. Developing a good ISA takes time and having one ready when needed would save time for developers using it. Thanks to Gunnar, much of the work is already done too :/. The ISA is different than the processor name. 68k never made a distinction but ColdFire has ISA_A, ISA_B, ISA_B+ and ISA_C while the processors are mcfxxxx or mxxxx. It also has generations like mcfv1, mcfv2, mcfv3, mcfv4, mcfv4e and mcfv5 which are usually used instead of specifying the ISA. We would have ISAs of 68kF1, 68kF2 with processor names of N68050, N68060, TG68, Apollo etc.
| |
Marcel Verdaasdonk Netherlands
| | Posts 3976 17 Jun 2012 22:37
| A clean way of adding extra instructions is to see them as coprocessor instructions IMHO (which in some cases would be true). As for those addressing modes, I shall try to devise a clean way to get them superscalar, no promises.
| |
Wawa Tk Germany
| | Posts 581 17 Jun 2012 22:38
| if you can impose the standards ;) btw, matt, a guy is working on netbsd drivers for mediator, which are simultaneously ported to aros. in case you are interested in that. as yet mostly 1200 models are supported, but im going to join testing 4k.
| |
Samuel D Crow USA
| | (Natami Team) Posts 1295 17 Jun 2012 22:42
| I've looked through the draft document and it looks good to me. I agree with having LSLQ and LSL as separate encodings, which was plan 3. That way the old software will still see the LSLQ as a small shift, which was what it always was, and just ignore the new full-width LSL instruction which it shouldn't be aware of unless it's recompiled anyway. As for the usage of VTables in C++, the Amiga uses a different encoding in its shared libraries anyway. As you probably know, we get JSR (negativeOffset, A6) to an address which contains an absolute jump instruction. C++ compilers programmed to use the Amiga ABIs should not need the extended addressing modes. Implementing multiple inheritance is as easy as putting an additional library handle in the address space of the original library handle and doing a JSR (negativeOffset, A3) to the interface handle and have the library heap still at the positive offsets of A6, and the locals at the positive offsets of A3. The class library would have to be marked as such so that the implemented interfaces inherited will be obtainable as marked in A3. I'd suggest using a small hash table in the positive offsets of A6 so that the interfaces can be named.
| |
Matt Hey USA
| | Posts 733 17 Jun 2012 22:59
| Marcel Verdaasdonk wrote:
| As for those addressing modes, I shall try to devise a clean way to get them superscalar, no promises. |
Are you playing with fpga programming now too?
wawa tk wrote:
| if you can impose the standards ;) |
I don't want to impose anything. I would like developers to come together and adopt standards. I've even done a lot of the work for them :).
Samuel D Crow wrote:
| I agree with having LSLQ and LSL as separate encodings, which was plan 3. That way the old software will still see the LSLQ as a small shift, which was what it always was, and just ignore the new full-width LSL instruction which it shouldn't be aware of unless it's recompiled anyway. |
Yea, old software wouldn't know about the new instruction and would continue to generate what we would now call LSLQ. New software would use LSL and let the assembler optimize to LSLQ where possible which also works. It works well until a programmer loads an old assembler program into the new assembler environment (new ISA) without optimizations on (vasm default) so the longer LSL would not be converted to LSLQ. One solution is to turn on optimizations for LSL->LSLQ as the default when a 68kF ISA or CPU using it is selected. That might be the best solution. It would be the way it should have been from the beginning (a word shift of 1 in memory was powerful back then 8) and it would simplify the documentation by combining the LSLI and LSLR into LSL. We'll see what others think before I go changing things around though.
Samuel D Crow wrote:
| As for the usage of VTables in C++, the Amiga uses a different encoding in its shared libraries anyway. As you probably know, we get JSR (negativeOffset, A6) to an address which contains an absolute jump instruction. C++ compilers programmed to use the Amiga ABIs should not need the extended addressing modes. Implementing multiple inheritance is as easy as putting an additional library handle in the address space of the original library handle and doing a JSR (negativeOffset, A3) to the interface handle and have the library heap still at the positive offsets of A6, and the locals at the positive of A3. The class library would have to be marked as such so that the implemented interfaces inherited will be obtainable as marked in A3. I'd suggest using a small hash table in the positive offsets of A6 so that the interfaces can be named.
|
Going through a jump table is slow. I believe it's possible to use the indirect addressing modes here also:
jsr ([function+2,A6]) ;6 bytes 8+ cycles
The normal way (jsr+jmp) of going through the jump table looks like it's barely faster on the 68060. Both ways are pretty slow though. I don't think it would be any problem to trap the memory indirect pre and post indexed modes with our CPU library always available in flash memory. If there is never an advantage to them, then it's probably better to get rid of (trap) them.
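To make the two call paths concrete, here is a small model of the address arithmetic only (not the timing). It assumes each jump-table entry is the 6 byte sequence of a JMP (xxx).L opcode word (0x4EF9) followed by a 32 bit target, stored at negative offsets from the library base in A6. The function names and the example target addresses are made up for illustration.

```python
import struct

JMP_ABS_LONG = 0x4EF9  # 68k opcode word for JMP (xxx).L
ENTRY_SIZE = 6         # opcode word + 32 bit absolute target


def build_library(targets):
    """Return (memory, a6). The first target ends up at offset -6 from a6."""
    table = bytearray()
    for target in reversed(targets):
        table += struct.pack(">HI", JMP_ABS_LONG, target)
    return bytes(table), len(table)


def target_via_jmp(memory, a6, offset):
    """Model jsr offset(A6): land on the entry, then follow its JMP."""
    opcode, target = struct.unpack_from(">HI", memory, a6 + offset)
    if opcode != JMP_ABS_LONG:
        raise ValueError("entry does not hold a JMP (xxx).L")
    return target


def target_via_memory_indirect(memory, a6, offset):
    """Model jsr ([offset+2,A6]): read the longword after the opcode word."""
    (target,) = struct.unpack_from(">I", memory, a6 + offset + 2)
    return target


# Hypothetical two-entry library; both paths resolve to the same address.
memory, a6 = build_library([0x00F80000, 0x00F81234])
same = target_via_jmp(memory, a6, -6) == target_via_memory_indirect(memory, a6, -6)
```

Both paths read the same longword; the difference is whether the CPU executes an extra JMP or does the extra memory read inside the addressing mode.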
| |
SID Hervé France
| | Posts 663 17 Jun 2012 23:45
| Matt Hey wrote:
| The first question is what name to use. I have come up with 68koolFusion (68kF for short) but it can easily be changed if someone comes up with a better name. Cool spelled starting with a k is not uncommon here in the U.S. and has the same slang meaning (fun, neat, interesting, hip, good, excellent, elegant, sophisticated). The small case k represent 1000 of course as in 68000. Fusion is for the fusing of ColdFire instructions and other new ideas ;). I didn't find any other use of the name so copyright or trademark infringement is unlikely. The different versions could be 68kF1, 68kF2, etc. and they could be specified to a compiler or assembler in that way.
|
Hello. Some quick considerations: Any new product results from market research, which will shape the design and the communication. If the demand does not exist, then it must be created. The technical market is atypical and realistic. The name of a product must be in accordance with its final environment and it must not be misleading. You can create a product line and derive variants of it according to several criteria. For example:
Apollo for the name of a range
N68k is a version for the project NatAmi
68kF is a version for the consumer market
"acronym of your choice" for the professional market
Of course, this will complicate the communication because you will have to consider several types of audiences.
Marcel Verdaasdonk wrote:
| A clean way of adding extra instructions is to see them as coprocessor instructions IMHO.
|
I'm not sure that's a good idea because I remember that this way has a significant impact (it was a topic dealing with extension)
| |
Matt Hey USA
| | Posts 733 18 Jun 2012 01:29
| SID Hervé wrote:
| Marcel Verdaasdonk wrote:
| A clean way of adding extra instructions is to see them as coprocessor instructions IMHO. |
I'm not sure that's a good idea because I remember that this way has a significant impact (it was a topic dealing with extension)
|
None of the ISA instructions are in F-line or A-line. We left out the ColdFire MOV3Q and MAC instructions in A-line. 68k MAC software emulation should work with 68kF1 and may work with 68kF2 or may require some patching. Motorola did use F-line (coprocessor) instructions which are more 68k compatible but eat up FPU and SIMD encoding space. MOVE16 is F-Line as are the CPU32 table instructions. I don't like the encoding for MOVE16 and I don't like how the table instructions are implemented. We did look at MOVE16 and possibly a MOVE8 as well. It would be easy to document MOVE16 in F-line as the original. It's probably not a priority but a move that avoids the caches on large copies is probably more important with large caches and more memory. MOVE16 is implemented a lot like a separate coprocessor so maybe that makes it ok?
| |
Marcel Verdaasdonk Netherlands
| | Posts 3976 18 Jun 2012 10:28
| Matt, I was aware that CPU32 used the F-line. As for pre-decrement/increment, it is possible with a small performance penalty; post has a larger cost.
| |
Wojtek P Poland
| | Posts 1597 19 Jun 2012 06:45
| SID Hervé wrote:
| Apollo for the name of a range N68k is a version for the project NatAmi
|
Natami is already scrapped. 68k compatible core can have "established" market now so it will be continued.
|
|
|
Marcel Verdaasdonk Netherlands
| | Posts 3976 19 Jun 2012 07:30
| Wojtek, without the Amiga or Atari there would be no need for a 68K, because the supporting software base would be too small! Besides that, if Thomas really quits I will continue the project with or without his permission. Comments like that should at least result in a temp ban.
| |
Megol .
| | Posts 675 19 Jun 2012 09:28
| IMHO reserving an expensive 16 bit encoding for something like ABS is a mistake. While such an instruction is useful the alternatives of using a longer encoding or just doing it in software aren't bad. Instead the encoding could be saved for a future expansion to 64 bits as a prefix:
[Z][REG] Z=1 indicates data size override, Long (32 bits) -> Quad (64 bits). REG specifies destination register (same type as the following instruction) for three address operations.
or
[REG ] REG specifies destination register, data or address. Operation size is always changed when using the prefix so that Word -> Quad. Byte and Long sized operations aren't changed.
or even
[Z][C][X][R] Z=size override. C=condition code update. X=register expansion, 1=instruction field will be followed by a register field. Could allow 3 address instructions + more registers. R=reserved, must be zero.
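As a rough illustration of the last variant, decoding the four flag bits might look like the sketch below. The post does not fix an encoding, so the 4 bit field width, the bit order (Z in the most significant position, R in the least) and the names `PrefixFlags` and `decode_prefix_flags` are all assumptions made for the example.

```python
from typing import NamedTuple


class PrefixFlags(NamedTuple):
    size_override: bool   # Z: Long (32 bits) -> Quad (64 bits)
    cc_update: bool       # C: condition code update
    reg_expansion: bool   # X: a register field follows the instruction


def decode_prefix_flags(nibble):
    """Decode an assumed 4 bit [Z][C][X][R] field, Z being the top bit."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("prefix flag field must fit in 4 bits")
    if nibble & 0b0001:
        # R is reserved and must be zero.
        raise ValueError("reserved bit R must be zero")
    return PrefixFlags(
        size_override=bool(nibble & 0b1000),
        cc_update=bool(nibble & 0b0100),
        reg_expansion=bool(nibble & 0b0010),
    )
```

Rejecting a set R bit up front keeps that bit available for a later extension, which is the usual reason for a must-be-zero field.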
| |
SID Hervé France
| | Posts 663 19 Jun 2012 20:24
| Wojtek P wrote:
| SID Hervé wrote:
| Apollo for the name of a range N68k is a version for the project NatAmi |
Natami is already scrapped. 68k compatible core can have "established" market now so it will be continued. |
| Hello. Taking an element out of its context is not very elegant. Here is an antibiotic. For example: R is the name of the range; J is a version for the project; M is a version for the mass market; P is a version for the professional market. This should prevent any misappropriation.
|
|
Evil Igel Germany
| | Posts 154 19 Jun 2012 21:19
| Wojtek P wrote:
| Natami is already scrapped. 68k compatible core can have "established" market now so it will be continued. |
Ah, interesting, thanks for a bit of info, even if it is so bad. Months of silence, Thomas split off to create his own MD-Board and what is left after all these years is a lot of these now: ?????? The Natami Project is (was) maybe one of the greatest attempts to bring an awesome architecture back to life but recently everything seems to fall apart. Yes, there is maybe a market for a 68k-compatible, enhanced FPGA core but as one of countless AMIGA users and potential buyers of a Natami out there I want to know: Can I expect something sometime that represents the original idea of the Natami or is it really scrapped? I don't want to offend anyone, you are all awesome guys, but I (and many others here) feel left behind in a way.
| |
André Jernung Sweden
| | (MX-Board Owner) Posts 988 19 Jun 2012 22:04
| Wojtek P wrote:
| Natami is already scrapped. 68k compatible core can have "established" market now so it will be continued.
|
Would you kindly stop spreading nonsense, please?
| |
Nixus Minimax Germany
| | Posts 272 19 Jun 2012 22:21
| Evil Igel wrote:
| | Can I expect something sometime that represents the original idea of the Natami or is it really scrapped? |
I think the answer to both questions is "no".
| |
Matt Hey USA
| | Posts 733 20 Jun 2012 01:16
| Samuel D Crow wrote:
| I agree with having LSLQ and LSL as separate encodings, which was plan 3. That way the old software will still see the LSLQ as a small shift, which was what it always was, and just ignore the new full-width LSL instruction which it shouldn't be aware of unless it's recompiled anyway.
|
Alright, here is what the PRM looks like with the old shift/rotate instructions becoming quick versions as mentioned above. OpenOffice Writer EXTERNAL LINK PDF EXTERNAL LINK html EXTERNAL LINK
Megol . wrote:
| IMHO reserving an expensive 16 bit encoding for something like ABS is a mistake. While such an instruction is useful the alternatives of using a longer encoding or just doing it in software aren't bad.
|
Actually, there are plenty of possible encoding slots that have 4 bits free for a register. The current location used by ABS and POPCNT has 6 more slots free for similar instructions. This space has 7 bits free. This is the space for LEA An,An and LEA Dn,An that don't make sense. There are other instructions where some addressing modes don't make sense that are free also. I would like to keep the writable (PC) destination addressing modes free in the ISA and let the particular implementation decide what to do with them. They could implement them, trap them or declare the result undefined. Frank Wille and ThoR didn't like them and some may not consider an ISA with them "professional". Even so, I could probably find some other unused slots to stick these 2 instructions as you are correct that this space with 7 bits free could be used better. I just started filling the empty slots that Gunnar had left free and suggested using. Here is his Apollo encoding OpenOffice spreadsheet which I have modified and fixed bugs in: EXTERNAL LINK
It could use some improvements and updating but may give you a little different perspective. I was originally against adding ABS if the replacement code's branch could be predicated. We found that predication and conditional instructions do not work very well with addressing modes that update the base register. ABS would save a branch that can be difficult to predict and it's a fairly useful and common DSP like instruction. The encoding space and logic should be minimal for this type of instruction. We could even add more register only DSP like instructions in other empty slots where a branch or loop can be eliminated. PPC has this instruction by the way.
Megol . wrote:
| Instead the encoding could be saved for a future expansion to 64 bits as a prefix: [Z][REG] Z=1 indicates data size override, Long (32 bits)-> Quad (64 bits). REG specifies destination register (same type as the following instruction) for three address operations. or [REG ] REG specifies destination register, data or address. Operation size is always changed when using the prefix so that Word -> Quad. Byte and Long sized operations aren't changed. or even [Z][C][X][R] Z=size override. C=condition code update. X=register expansion, 1=instruction field will be followed by a register field. Could allow 3 address instructions + more registers. R=reserved, must be zero.
|
The prefix idea could be good but I don't know if the 68k ISA is consistent enough to take full advantage. Personally, I wouldn't try to add full 64 bit support with a .Q (quadword) encoding. I've looked at ways to do it and it would make the 68k too complex and ruin the code density. The condition code flag in the prefix would be good. I would try to add a full <ea> rather than just a register to the prefix. The 68k is set up to handle 2x <ea>. The 3 op would be nice but I don't know if the added complexity is worthwhile. We already added some powerful data movement operations with 68koolFusion. Adding more registers is easier than adding 64 bit support but it looks like it would be difficult to take full advantage of them because of inconsistencies and limitations of the 68k. I don't want something confusing like ARM with its different register sets. I think 16 registers is the best for code density and enough with powerful instructions that don't needlessly trash them. We can also work in memory (actually cache) efficiently which RISC can not. If I were designing a new Super CISC, I would do the prefix with CC flag, <ea> for 3 op, and leave some bits for other options depending on the instruction like saturation for ADD/SUB/MUL/DIV. If you can work out a good system for the 68k with the prefix that is simple and consistent then let's take a look. I think you will find that it's not as easy as you think to retrofit to the existing 68k. It would actually be easier to create a completely new ISA but we want to keep current software compatibility and development tools. I have tried to keep some of the feel and look of 68k also.
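On the ABS point earlier in this post, the kind of sequence a register-only ABS would replace can be modelled as below. This is a sketch in Python of 32 bit two's complement arithmetic, not 68k code. The function names are hypothetical, and the behaviour for 0x80000000 (it wraps to itself) is an assumption, since the post does not say how overflow would be handled.

```python
MASK32 = 0xFFFFFFFF


def abs32_branching(value):
    """Absolute value with the test-and-negate branch that ABS would remove."""
    value &= MASK32
    if value & 0x80000000:           # negative in two's complement
        value = (-value) & MASK32    # negate, wrapping to 32 bits
    return value


def abs32_branchless(value):
    """Same result without a branch, using a sign mask."""
    value &= MASK32
    sign = -(value >> 31) & MASK32   # 0 for positive, 0xFFFFFFFF for negative
    return ((value ^ sign) - sign) & MASK32
```

Both functions agree on every 32 bit input; the branchless form is what a single-instruction ABS would make unnecessary to spell out in software.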
| |
|
|