|
|---|
Andreas G. Szabo Germany
| | Posts 134 08 Jan 2012 10:22
| Hi, I don't know if this has already been written here. I have two new opcodes in mind:
- return on condition (rtcc)
- making ascii chars same case (cas dx)
Andreas
| |
Team Chaos Leader USA
| | (Moderator) Posts 2094 08 Jan 2012 15:14
| Andreas G. Szabo wrote:
| - making ascii chars same case (cas dx)
|
Please explain.
| |
Thomas Hirsch Germany
| | (MX-Board Owner) Posts 647 08 Jan 2012 15:54
| Team Chaos Leader wrote:
|
Andreas G. Szabo wrote:
| - making ascii chars same case (cas dx) |
Please explain.
|
He might mean a TOUPPER(dx), which would be equivalent to andi.l #$dfdfdfdf,dx or a TOLOWER(dx) which would be ori.l #$20202020,dx
| |
Team Chaos Leader USA
| | (Moderator) Posts 2094 08 Jan 2012 16:31
| @Thomas Ah that makes much more sense!
| |
Andreas G. Szabo Germany
| | Posts 134 08 Jan 2012 18:57
| Yes, toupper/tolower is what I mean. But it is not that simple: only characters within the a-z or A-Z range may be touched. Otherwise it does not work correctly. That's the reason why I ask for this. It may be dx or (ax) or whatever.
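To illustrate the objection, here is a minimal C sketch comparing the blind mask with a range-checked conversion. The function names are made up for this example and are not from the thread. With the blind mask, '1' ($31) becomes $11 and '{' ($7b) becomes '[' ($5b), which is exactly the problem described above.

```c
#include <stdint.h>

/* Blind TOUPPER on four packed bytes: the C equivalent of
   andi.l #$dfdfdfdf,dx. It clears bit 5 of every byte, letter or not. */
uint32_t toupper4_blind(uint32_t x) {
    return x & 0xDFDFDFDFu;
}

/* Range-checked TOUPPER on one byte: only a-z is touched. */
uint8_t toupper1_checked(uint8_t c) {
    if (c >= 'a' && c <= 'z')
        c = (uint8_t)(c - 0x20);
    return c;
}

/* Range-checked TOUPPER on four packed bytes, one byte at a time. */
uint32_t toupper4_checked(uint32_t x) {
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t c = (uint8_t)(x >> shift);
        out |= (uint32_t)toupper1_checked(c) << shift;
    }
    return out;
}
```

The checked version needs a compare pair and a conditional per byte, which is the extra cost the proposed opcode is meant to remove.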
| |
André Jernung Sweden
| | (MX-Board Owner) Posts 988 08 Jan 2012 19:11
| In which use cases for case conversion is the existing instruction set insufficient or too slow? I cannot think of one. I can see why RTcc would be a useful instruction, though.
| |
Andreas G. Szabo Germany
| | Posts 134 08 Jan 2012 19:25
| It's a string comparison for a sort algorithm that should be very efficient. If I use the existing instructions I must check the chars for range a-z or A-Z before I may change bit 5. This makes my code three times as large and three times as slow. This is the case sensitive code:
.cmploop
        move.b  (a1)+,d1        ; fetch next char from (a1)
        move.b  (a0)+,d0        ; fetch next char from (a0)
        beq.s   .cmp            ; terminating zero in (a0) string: stop
        cmp.b   d0,d1
        beq.s   .cmploop        ; chars equal: keep going
.cmp    sub.b   d1,d0           ; d0 = difference (zero if the strings match)
.rts    rts
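For comparison, a case-insensitive version of the same loop could look like the following C sketch. The names `fold_lower` and `cmp_nocase` are hypothetical, and only ASCII A-Z is folded; this is one possible way to write it, not the code the poster actually uses.

```c
#include <stdint.h>

/* Fold A-Z to a-z; every other byte value is returned unchanged. */
uint8_t fold_lower(uint8_t c) {
    return (c >= 'A' && c <= 'Z') ? (uint8_t)(c + 0x20) : c;
}

/* Case-insensitive counterpart of the compare loop:
   returns a negative value, zero, or a positive value. */
int cmp_nocase(const char *a, const char *b) {
    for (;;) {
        uint8_t ca = fold_lower((uint8_t)*a++);
        uint8_t cb = fold_lower((uint8_t)*b++);
        /* Stop at the end of the first string or at the first mismatch. */
        if (ca == 0 || ca != cb)
            return (int)ca - (int)cb;
    }
}
```

Each character now costs two range checks in addition to the compare, which matches the size and speed complaint above.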
| |
Samuel D Crow USA
| | (Natami Team) Posts 1295 08 Jan 2012 21:02
| Couldn't an opcode fused predication do the work of RTcc? Like this:
        Bcc.b   label
        RTS
label:
It would become a single opcode internally, even though the encoding looks like 2 opcodes.
| |
Matt Hey USA
| | Posts 726 09 Jan 2012 00:15
| Samuel D Crow wrote:
| Couldn't an opcode fused predication do the work of RTcc? |
Probably, but that's not that common of a sequence. There is usually a return code put in d0 and/or registers restored with movem.l and/or an unlk instruction before the rts.
@Andreas We looked at bsrcc which is somewhat similar to rtscc. It would be possible to use the unused least significant "odd" bit of the branch distance to modify a bcc to a bsrcc but most programmers thought that this free bit would be better used for other purposes. Using a bcc plus bsr instruction is more flexible allowing for variables to be loaded only if the branch is taken. It looks to me like rtscc would also reduce flexibility as the return value and register/stack restoration would have to be done first and this could only occur at the end of the function. I'm open to seeing some real world examples that would use it though. Preferably larger examples that use rtscc multiple times in the same function.
I think there would be some benefit to a lower/upper instruction but again I have my doubts that it's worthwhile. It would avoid 2 branches but my concerns are that 8 bit ascii is being used less and less on modern computers and that ascii processing is usually not time critical. The exception I can think of is for compilers where extra speed is quite useful. We did discuss x86 string handling which does not look like a good idea but it is more with loops...
I would prefer more general purpose instructions if possible. We did discuss min, max, saturate, range type instructions which the majority (and Gunnar ;) thought could be useful additions. I suppose that a range type instruction would be most to the point in this case but takes a bit of encoding space.
tolower:
        range   {$41,$5a},d0    ;set CC Z flag if d0 is $41-$5a (A-Z ascii)
        bne     .notupper
        add.l   #$20,d0         ;adding $20 to d0 converts to lower case
.notupper:
toupper:
        range   {$61,$7a},d0    ;set CC Z flag if d0 is $61-$7a (a-z ascii)
        bne     .notlower
        sub.l   #$20,d0         ;subtracting $20 from d0 converts to upper case
.notlower:
A range instruction would have a rather large encoding but if it could be done it would be powerful (2 branches avoided) and general purpose (I can think of many other places it could be used). The CC V flag could be set if the value is above the range and the CC C or N flag if the value is below the range. A min or max instruction could be used instead with a subtraction before but it would destroy the contents of the register and still require 2 branches. The N68k will be able to fetch large instructions with no penalty but there may be other problems to this type of instruction as I'm no expert on the hardware side.
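As a rough software model of the proposed range test, the following C sketch may help. The names `range_cmp`, `in_range`, `to_lower` and `to_upper` are made up here, and the three-way return value only imitates the suggested flag behaviour; it is not a specification of the instruction. `in_range` uses the well-known trick of folding both bounds checks into one unsigned compare, and it assumes `lo <= hi`.

```c
#include <stdint.h>

/* Three-way result modelled on the flag idea above:
   0 = inside the range, -1 = below it, +1 = above it. */
int range_cmp(uint32_t v, uint32_t lo, uint32_t hi) {
    if (v < lo) return -1;
    if (v > hi) return 1;
    return 0;
}

/* Both bounds checked with a single unsigned compare (requires lo <= hi).
   If v < lo, v - lo wraps around to a huge value and the test fails. */
int in_range(uint32_t v, uint32_t lo, uint32_t hi) {
    return (v - lo) <= (hi - lo);
}

/* A-Z ($41-$5a) becomes a-z; everything else is left alone. */
uint32_t to_lower(uint32_t c) {
    return in_range(c, 0x41, 0x5A) ? c + 0x20 : c;
}

/* a-z ($61-$7a) becomes A-Z; everything else is left alone. */
uint32_t to_upper(uint32_t c) {
    return in_range(c, 0x61, 0x7A) ? c - 0x20 : c;
}
```

The single-compare form shows why one range test can stand in for two branches.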
| |
Nixus Minimax Germany
| | Posts 272 09 Jan 2012 12:02
| Matt Hey wrote:
| I would prefer more general purpose instructions if possible. We did discuss min, max, saturate, range type instructions which the majority (and Gunnar ;) thought could be useful additions. |
I think "count leading zeros" and "count leading ones" are useful general purpose instructions. They can be used for an estimate of the logarithm and hence in a lot of optimised mathematical functions (divisions, square roots and so on). They do just what the name says: count the number of 0s or 1s with which a word begins and place the result [0..32] in the destination register.
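A portable C sketch of what these two instructions would compute, plus the logarithm estimate mentioned above. The names `clz32`, `clo32` and `ilog2` are assumptions for this example, and the loop is the plain reference version rather than anything a CPU would do internally.

```c
#include <stdint.h>

/* Count leading zero bits of a 32-bit word; returns 32 for x == 0. */
unsigned clz32(uint32_t x) {
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading one bits: the leading zeros of the inverted word. */
unsigned clo32(uint32_t x) {
    return clz32((uint32_t)~x);
}

/* floor(log2(x)) for x > 0: the bit position of the highest set bit. */
unsigned ilog2(uint32_t x) {
    return 31u - clz32(x);
}
```

An estimate like `ilog2` is a typical starting point for the iterative division and square root routines the post refers to.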
| |
Thomas Richter Germany
| | (MX-Board Owner) Posts 1425 09 Jan 2012 12:38
| Andreas G. Szabo wrote:
| It's a string comparison for a sort algorithm that should be very efficient. If I use the existing instructions I must check the chars for range a-z or A-Z before I may change bit 5. This makes my code three times as large and three times as slow.
|
This is too specific, and it is likely an opcode that is almost never of any use unless you fit the character set to the instruction set of the CPU, which seems to be a very bad idea to me. Why ISO-Latin-1 specifically, and why hardwire it into the processor? If anything, programs should probably use UTF-8 in the future. If ISO-Latin is good enough for you, XOR the two values and mask out bit 5; if the result is zero, it is a "near hit" that then requires an additional, but slower, validation step. In most cases your algorithm would take the fast compare and get a result. Hardwiring this into the CPU is pretty much a premature optimization.
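The XOR approach can be sketched in C roughly as follows. The function names are hypothetical, and the slow validation step here only accepts ASCII a-z/A-Z, which is an assumption of this example; accented ISO-Latin-1 letters would need a wider check.

```c
#include <stdint.h>

/* Fast test: do a and b agree in every bit except possibly bit 5 ($20)? */
int near_hit(uint8_t a, uint8_t b) {
    return ((a ^ b) & 0xDF) == 0;
}

/* Slow validation: is c an ASCII letter (A-Z or a-z)? */
int is_ascii_letter(uint8_t c) {
    uint8_t u = (uint8_t)(c & 0xDF);   /* fold to the upper case position */
    return u >= 'A' && u <= 'Z';
}

/* Case-insensitive equality of two characters. */
int equal_nocase(uint8_t a, uint8_t b) {
    if (a == b)
        return 1;               /* exact match, the common fast case */
    if (!near_hit(a, b))
        return 0;               /* differ in more than bit 5 */
    return is_ascii_letter(a);  /* near hit: only valid for letters */
}
```

For example, '[' and '{' differ only in bit 5 and so count as a near hit, but the validation step rejects them because neither is a letter.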
| |
Thierry Atheist Canada
| | Posts 1828 09 Jan 2012 17:42
| Could a sorting subroutine be turned into its own co-processor? With several instructions and even its own registers? Any use?
| |
Deep Sub Micron Germany
| | (MX-Board Owner) Posts 566 09 Jan 2012 19:30
| What I have learned so far is that it is really hard to find new and useful opcodes. Most of the ideas so far are shot down early. Even some opcodes implemented by Motorola should have been shot down early :-)
| |
Andreas G. Szabo Germany
| | Posts 134 09 Jan 2012 19:33
| Deep, you are probably looking for a RISC CPU. ;)
| |
Matt Hey USA
| | Posts 726 09 Jan 2012 20:45
| Nixus Minimax wrote:
| I think "count leading zeros" and "count leading ones" are useful general purpose instructions. They can be used for an estimate of the logarithm and hence in a lot of optimised mathematical functions (divisions, square roots and so on). They do just what the name says: count the number of 0s or 1s with which a word begins and place the result [0..32] in the destination register.
|
A bit/population count was one of the other possible new instructions that was thought to be beneficial. I suggested a bitfield instruction for the flexibility like bfcnt. I can't seem to find the thread or I would link it. I don't think it would help with changing the case though ;).
@all Combining 2 condition tests allowing the removal of a branch is pretty powerful. The 68k is not very good at combining 2 comparisons and acting on them with 1 branch because the CC gets destroyed often. Certain well thought out instructions can overcome this problem. DBcc is another example that can avoid 2 branches (even more powerful in a loop) but is difficult to use as is. Even long encoded instructions may be worthwhile if 2 branches can be combined into 1 and are general purpose enough. It's at least worth thinking about for those that overlooked this interesting detail :).
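For reference, a population count and a bitfield variant could be modelled in C as below. The semantics of `bfcnt32` are an assumption made for this sketch, since the thread does not define bfcnt: the offset is counted from the most significant bit, as in the existing 68020 bitfield instructions, the field must not wrap around, and `1 <= width` with `offset + width <= 32` is required.

```c
#include <stdint.h>

/* Population count: number of set bits in a 32-bit word. */
unsigned popcount32(uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical bitfield count: set bits in a field of `width` bits that
   starts `offset` bits below the most significant bit.
   Requires 1 <= width and offset + width <= 32. */
unsigned bfcnt32(uint32_t x, unsigned offset, unsigned width) {
    /* Shift the field to the top, then down to the bottom of the word. */
    uint32_t field = (x << offset) >> (32u - width);
    return popcount32(field);
}
```

Note that a population count answers a different question from counting leading zeros: it counts all set bits, not just the run at the top of the word.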
| |
Thierry Atheist Canada
| | Posts 1828 09 Jan 2012 21:32
| Thierry Atheist wrote:
| Could a sorting subroutine be turned into its own co-processor? With several instructions and even its own registers? Any use?
|
I don't really understand how CPUs work. Could other, really small co-processors be made to do small tasks just like the AMIGA's original co-processors?
| |
Ceti 331 United Kingdom
| | Posts 282 09 Jan 2012 22:59
| Thierry Atheist wrote:
|
Thierry Atheist wrote:
| Could a sorting subroutine be turned into its own co-processor? With several instructions and even its own registers? Any use? |
I don't really understand how CPUs work. Could other, really small co-processors be made to do small tasks just like the AMIGA's original co-processors?
|
A CPU is the ultimate jack of all trades, but you are describing a DSP-type approach. Many early 3D machines would have a simple CPU, then a DSP for bulk repetitive maths. Re. sorting: sort networks implemented on GPUs are pretty interesting. With the ability to sort and scatter/gather, very wide data parallelism becomes more useful.
| |
Megol .
| | Posts 672 10 Jan 2012 16:23
| Thierry Atheist wrote:
| Could a sorting subroutine be turned into its own co-processor? With several instructions and even its own registers? Any use?
|
Yes, and it has been done. Not worth it for normal use, but some server tasks can be accelerated.
| |
|