| 68050 Gets Bitfield Instructions |
|
|---|
Gunnar von Boehn Germany
| | (Natami Team Member) Posts 3738 25 Feb 2010 10:01
| We decided to complete the 68K instruction set support of our 68050 CPU. To that end we have now started work on adding the bitfield instructions to our core. Bitfield instructions were introduced with the 68020 and are supported by the 020, 030, 040 and 060 CPUs. I think it should be possible to implement them in a much faster way than they were on the original 68K CPUs. I'm personally very curious whether we manage to do this. :-)
| |
Jacek Rafal Tatko Espania
| | Posts 515 25 Feb 2010 12:03
| Good Luck .·'* Bitfield
| |
Samuel D Crow USA
| | Posts 546 25 Feb 2010 16:45
| @Gunnar Excellent! I hope the 68050 is a complete success!
| |
Thomas Richter Germany
| | Posts 699 25 Feb 2010 22:01
| Gunnar von Boehn wrote:
| We decided to complete our full 68K instruction set support of our 68050 CPU. To complete it we now started work on adding support for the Bitfield instructions to our core.
|
Very nice, bitfield instructions might come in handy for some applications, e.g. locating free sectors in the FFS bitmap, and working on bitmap graphics. If I may add a suggestion, and an experience from my projects: It was often helpful to have an (albeit incomplete) prototype that "kind of" works to detect problems in the design - there are often more than enough bugs that require fixing, and usually many bad surprises still happen even though the goal seems "almost complete". This is, of course, the ugly and annoying part of the job, I know. (-: What I want to suggest is, at this point, not to get lost in details like bitfields, but to get the Natami hardware and software ready to a degree where the whole system integration can be tested. I would be surprised if there weren't enough problems waiting to be solved by you in that final stage. (-: Let bitfields wait for a while if necessary - it is nothing that needs to be available immediately. Greetings, Thomas
| |
Matt Hey USA
| | Posts 204 25 Feb 2010 23:40
| @Gunnar Great! The N050 will have an advantage over UAE instead of a disadvantage. A lot of existing 68k code can be used "as is" saving effort for better problems. You even worked out reading or writing across memory bounds?
| |
Gunnar von Boehn Germany
| | (Natami Team Member) Posts 3738 26 Feb 2010 05:43
| Thomas Richter wrote:
| If I may add a suggestion, and an experience from my projects: It was often helpful to have an (albeit incomplete) prototype that "kind of" works to detect problems in the design. |
You are of course right. We are running a lot of 68k code on the 050 in simulation to find and fix issues in it. This is "normal" chip design technique. All of today's modern chips (IBM/Intel/etc) are developed like this. We have a complete regression set of 68K code that tests the individual addressing modes and the different instructions. This has helped us a lot to find and fix issues.
| |
Team Chaos Leader USA
| | (Natami Team Member) Posts 1199 26 Feb 2010 06:27
| I have never used bitfield instructions before. I wanted to use them but it was always faster to use a sequence of simpler & faster instructions. It would be pretty amazing if bitfield instructions suddenly became faster than a wheelbarrow full of simple instructions. :)
| |
Thomas Richter Germany
| | Posts 699 26 Feb 2010 07:48
| Team Chaos Leader wrote:
| I have never used bitfield instructions before. I wanted to use them but it was always faster to use a sequence of simpler & faster instructions. It would be pretty amazing if bitfield instructions suddenly became faster than a wheelbarrow full of simple instructions. :)
|
The only time I used bitfields was in the MMU library, namely to extract the fields from the MMU descriptors. Not because it was faster, but because it was simpler to write. Otherwise, you need to shift and mask, and since so many bitfield extractions were required, I got lazy computing all the masks.... (-: So long, Thomas
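The convenience described above can be illustrated with a minimal Python sketch. It models only the register form of an unsigned bitfield extract, where the offset is counted from the most significant bit, and it assumes the field does not wrap around the register. The function names and the sample value are hypothetical and are not taken from the MMU library.

```python
MASK32 = 0xFFFFFFFF

def bfextu(value, offset, width):
    """Unsigned bitfield extract from a 32-bit value.

    offset counts from the most significant bit (bit 31 is offset 0).
    Simplification: the field must fit inside the 32 bits (no wrap-around).
    """
    if not (1 <= width <= 32 and offset >= 0 and offset + width <= 32):
        raise ValueError("field must lie inside the 32-bit value")
    shift = 32 - offset - width          # distance of the field from bit 0
    return ((value & MASK32) >> shift) & ((1 << width) - 1)

def shift_and_mask(value, shift, mask):
    """The manual alternative: the caller has to work out shift and mask."""
    return ((value & MASK32) >> shift) & mask

# Hypothetical 32-bit word with an 8-bit field starting 4 bits from the top.
word = 0x12345678
field_a = bfextu(word, offset=4, width=8)     # offset and width only
field_b = shift_and_mask(word, 20, 0xFF)      # precomputed shift and mask
```

Both calls return the same field; the bitfield form only needs the offset and width that a descriptor layout would list, which is the "simpler to write" argument.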
| |
Marcel Verdaasdonk Netherlands
| | Posts 2100 26 Feb 2010 10:04
| IMHO, if this is faster than the workaround, then this is terrific.
| |
Morgan Johansson Sweden
| | Posts 45 01 Mar 2010 08:29
| Nice, I recall using those instructions in my old NES-emulator "A/NES" for decoding Action Replay/Game Genie codes. :)
| |
Phil G. France
| | (Natami Team Member) Posts 151 04 Mar 2010 14:47
| Quick bit-field operations can be a nice edge over other architectures, and I'm happy to see you care about them. They can be useful for multimedia, in bit readers - e.g. my PNG viewer uses bfextu for deflate decoding. IMO they need to be fast for registers, moderately fast for memory, and whatever when it comes to accessing several longwords (could even trap, I've never seen any code needing this). You may wish to extend them, too, as the extension word has bit #15 free. It can be used e.g. for counting bits in the reverse order. What do you think about it?
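For the memory case mentioned above, here is a minimal Python sketch of an unsigned bitfield extract from a byte buffer. It assumes the usual 68020 convention that the offset is counted in bits from the most significant bit of the byte at the base address, so a 32-bit field can touch up to five bytes. The function name and the sample buffer are hypothetical; this is not code from the PNG viewer.

```python
def bfextu_mem(memory, base, offset, width):
    """Extract `width` bits (1..32) starting `offset` bits after `base`.

    Bits are numbered MSB-first. The offset may be negative or larger
    than 7; it is split into a byte displacement and a bit position.
    """
    if not 1 <= width <= 32:
        raise ValueError("width must be 1..32")
    byte_addr = base + (offset >> 3)            # floor division, works for negatives
    bit_in_byte = offset & 7                    # 0..7 inside the first byte
    nbytes = (bit_in_byte + width + 7) // 8     # 1..5 bytes are touched
    chunk = int.from_bytes(memory[byte_addr:byte_addr + nbytes], "big")
    shift = nbytes * 8 - bit_in_byte - width    # drop the bits after the field
    return (chunk >> shift) & ((1 << width) - 1)

buf = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC])
small = bfextu_mem(buf, 0, 4, 8)     # stays inside two bytes
wide = bfextu_mem(buf, 0, 4, 32)     # unaligned 32-bit field, spans five bytes
```

The second call is the "several longwords" situation: an unaligned 32-bit field that no single aligned longword access can satisfy.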
| |
Gunnar von Boehn Germany
| | (Natami Team Member) Posts 3738 04 Mar 2010 18:07
| Phil G. wrote:
| You may wish to extend them, too, as the extension word has bit #15 free. |
Bit 15 in the extension word selects between Data (=0) and Address (=1) registers. All the 68K instructions using the extension word were designed by Motorola so that in theory they could work on all 16 registers (not only on the Data registers). But this ability was never used. The 68050 uses this feature.
| |
Phil G. France
| | (Natami Team Member) Posts 151 07 Mar 2010 10:37
| Gunnar von Boehn wrote:
| Bit 15 in the extension word selects between Data (=0) and Address (=1) registers. All the 68K instructions using the extension word were designed by Motorola so that in theory they could work on all 16 registers (not only on the Data registers). But this ability was never used. The 68050 uses this feature.
|
It wasn't really "never used", as MOVES, MOVEC, CMP2 and CHK2 actually use it. Well, I won't complain if address registers have more uses, as I'm constantly out of data regs ;-) But perhaps it might be useful to document these new 68050 features somewhere. Just saying "An registers are usable everywhere" isn't enough, as you can't do e.g. Scc An.
| |
Marcel Verdaasdonk Netherlands
| | Posts 2100 08 Mar 2010 14:34
| Phil G. is right, it is interesting to see the setup and design goals. Maybe it could be made in a checklist style so everyone knows what is currently being worked on and what is next on the todo list. I doubt something would be set up on this end, though. Perhaps they fear that they might get advice on how to implement. :P
| |
Thomas Richter Germany
| | Posts 699 08 Mar 2010 19:52
| Gunnar von Boehn wrote:
| Bit 15 in the extension word selects between Data (=0) and Address (=1) registers. All the 68K instructions using the extension word were designed by Motorola so that in theory they could work on all 16 registers (not only on the Data registers). But this ability was never used. The 68050 uses this feature.
|
Hmm. Probably all the bit-fiddling instructions are too special to allow a broad use of them on address registers - I don't quite see the need there, honestly. I personally run out of address registers much more easily, but the 020 alternative via dp(za0,dx.l), i.e. suppressed address register with long indexed data register, was too slow to be useful. (a7 taken as stack, a6 as library, a4 as data pointer, sometimes even a5 as frame pointer. a0 and a1 as scratch registers, only a2 and a3 remained, sometimes also a5. A bit tight.) I would rather keep the instruction space for some future, more interesting use. But anyhow, let's not make a big fuss about it... So long, Thomas
Hey, an idea! The following *would* be actually useful: Address register indirect with multiplier: move.l (a0*4),d0 -or- move.l (d2*4),d1 A perfect use for Tripos and its BPTRs. (-; (The 020 addressing modes would again have that with suppressed base register, address register as index and multiplier).
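Why a scaled register-indirect mode suits BPTRs can be shown with a minimal Python sketch. A BPTR holds a longword index, so the byte address is the BPTR multiplied by 4. The helper names and the toy memory below are hypothetical; the sketch only models the address arithmetic of the proposed move.l (a0*4),d0, not any real CPU behaviour.

```python
import struct

def read_long(memory, address):
    """Read a big-endian 32-bit longword, as a 68k move.l would."""
    return struct.unpack_from(">L", memory, address)[0]

def move_l_scaled(memory, reg_value, scale=4):
    """Model of the proposed move.l (Rn*scale),Dm: load from reg_value * scale."""
    address = (reg_value * scale) & 0xFFFFFFFF
    return read_long(memory, address)

# Toy memory with one longword stored at byte address 0x20.
memory = bytearray(64)
struct.pack_into(">L", memory, 0x20, 0xDEADBEEF)

bptr = 0x20 >> 2                      # the BPTR form of byte address 0x20
value = move_l_scaled(memory, bptr)   # one access, no separate shift needed
```

Without such a mode the BPTR first has to be converted with a shift (or two adds) into a scratch register before the load can be issued.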
| |
Team Chaos Leader USA
| | (Natami Team Member) Posts 1199 08 Mar 2010 23:12
| I personally run out of address registers before I run out of data registers. It is a common, ongoing, never-ending problem.
| |
Gunnar von Boehn Germany
| | (Natami Team Member) Posts 3738 09 Mar 2010 06:43
| Thomas Richter wrote:
| Hmm. Probably all the bit-fiddling instructions are too special to allow a broad use of them on address registers
|
The 68K instruction set defines a 4-bit field (bits 15/14/13/12) to allow addressing of all 16 registers freely. The N68050 can do full logic and arithmetic on address registers. This means the N68050 can do MUL, AND, OR and BIT operations on them.
Thomas Richter wrote:
| Hey, an idea! The following *would* be actually useful: Address register indirect with multiplier: move.l (a0*4),d0 -or- move.l (d2*4),d1
|
Yes, this is supported and these address modes are free. They do NOT cost any extra clock. But one comment on the address calculation: mind that any register used in the address calculation should NOT be touched in the cycle before.
This is bad:
    addq.l #4,D0
    move (A0,D0),D1
This is bad:
    AND.l #-2,A0
    move (A0),D1
These bad constructs will cause a "bubble" between those two instructions. This behavior was always the case on the 040 and 060 as well. The only exceptions are SUBA and ADDA. SUBA and ADDA use a special ALU which is closer to the top, therefore this ALU can forward without penalty. This means that the below is OK to do:
    adda.l #4,A0
    move (A0),D1
All clear now? Cheers
| |
Gunnar von Boehn Germany
| | (Natami Team Member) Posts 3738 09 Mar 2010 06:58
| I think we should write an update of the 68K manual and clearly outline the changes (enhancements) to the instruction set in it. Basically the N68050 does what all the 68K CPUs did before. But we have 1 major enhancement.
The 68040, 68060 and 68050 have the following pipeline structure:
1) Decoding
2) Register Operand fetch
3) EA-Calculation
4) Cache/Memory fetch
5) NORMAL ALU operation
6) Result Storeback To Register and Cache/Memory
These CPUs have 2 ALUs. 1 ALU is used for the EA calculations in stage 3. 1 ALU is used for the NORMAL ALU operation in stage 5. The instructions ADDA, LEA and SUBA are executed in the EA-ALU in stage 3. ALL other ALU operations are executed in stage 5. The two ALUs increase CPU performance a lot. They allow the 68K to do an EA calculation and a normal ALU operation per clock.
In general the 68K has EXCELLENT forwarding. Sequential code is handled very well by the 68K architecture. There is one small thing you need to mind - if a register is updated by the ALU in stage 5, then the next 2 instructions should NOT use this register in the EA-ALU in stage 3.
The N68050 follows the original 68K CPU design to 100%. But we did a small enhancement. The original ALU in stage 5 only updated DATA registers. We dropped this limit - the ALU can now update both DATA and ADDRESS registers. Does this make sense? Cheers
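The forwarding rule in this post can be sketched as a small Python checker. It flags an instruction whose address calculation uses a register that one of the previous two instructions updated in the stage-5 ALU, and treats ADDA, SUBA and LEA as safe because they run in the EA-ALU. The program representation, the function name and the two-instruction window parameter are assumptions made for illustration; the sketch does not model actual cycle counts.

```python
# Instructions that the post says execute in the EA-ALU (stage 3).
EA_ALU_OPS = {"adda", "suba", "lea"}

def find_ea_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each hazard.

    program: list of (mnemonic, destination_register_or_None, ea_registers)
    where ea_registers is a tuple of registers used to form the address.
    """
    hazards = []
    for i, (_, _, ea_regs) in enumerate(program):
        for j in range(max(0, i - window), i):
            op, dest, _ = program[j]
            if op.lower() not in EA_ALU_OPS and dest in ea_regs:
                hazards.append((j, i, dest))
    return hazards

# The three sequences quoted in the thread, written in the toy format.
bad_index = [("addq", "d0", ()), ("move", "d1", ("a0", "d0"))]
bad_base = [("and", "a0", ()), ("move", "d1", ("a0",))]
ok_adda = [("adda", "a0", ()), ("move", "d1", ("a0",))]

report = [find_ea_hazards(p) for p in (bad_index, bad_base, ok_adda)]
```

A checker like this could be run over compiler output or hand-written loops to spot places where an unrelated instruction should be scheduled between the register update and the memory access.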
| |
Gunnar von Boehn Germany
| | (Natami Team Member) Posts 3738 09 Mar 2010 07:10
| Regarding which instructions can now operate on ADDRESS registers and which can not. This is actually very simple to explain: A 68K instruction has three different ways to encode which target it updates:
A) 6-bit EA field. Most instructions use this encoding. This encoding always allowed targeting an address register, but the instructions executed in the normal ALU were just not able to do this. This is "fixed", so you can now do arithmetic on address registers.
B) 4-bit register pointer in the 1st extension word. This encoding was used by some new instructions which were added to the 68K instruction set with the 68020 CPU. These 4 bits allow targeting data and address registers. The old 68K did support targeting the address registers for some instructions but not for all. This is simplified with the N050. It can simply address all now.
C) Sometimes the DATA register to update is encoded in a 3-bit field. Examples of instructions encoded like this are MOVEQ or Scc. The 3-bit encoding only allows targeting data registers, of course.
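Encoding B can be illustrated with a minimal Python sketch that decodes the 4-bit register specifier from an extension word, using the layout given earlier in the thread: bit 15 selects data (0) or address (1) and bits 14 to 12 hold the register number. The function name is hypothetical, and the lower 12 bits are ignored because their meaning depends on the instruction.

```python
def decode_ext_register(ext_word):
    """Decode bits 15..12 of a 16-bit extension word into a register name."""
    is_address = (ext_word >> 15) & 1      # 0 = data register, 1 = address register
    number = (ext_word >> 12) & 0b111      # register number 0..7
    return ("a" if is_address else "d") + str(number)

# All 16 registers are reachable through the 4-bit field.
all_regs = [decode_ext_register(n << 12) for n in range(16)]
```

A 3-bit field, as in encoding C, has no room for the data/address selector bit, which is why those instructions stay limited to data registers.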
| |
Team Chaos Leader USA
| | (Natami Team Member) Posts 1199 09 Mar 2010 07:40
| Thomas Richter wrote:
| Hey, an idea! The following *would* be actually useful: Address register indirect with multiplier: move.l (a0*4),d0 -or- move.l (d2*4),d1 |
Gunnar von Boehn wrote:
| Yes, this is supported and these address modes are free. They do NOT cost any clock extra. |
Maybe I am mixed up, but I thought Thor was inventing a brand new ASM BCPL addressing mode? Did you use your time-machine to back-support this new mode? (:
| |
|