
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.
Please read the forum usage manual.



Do you have questions about the Natami?
Post it here and we will answer it!

Advantages of X86?
Marcel Verdaasdonk
Netherlands

Posts 3974
12 Oct 2009 13:33


Aw man, that reminds me: on an x86, if you don't enable the A20 line, forget having more than 1 MB. (The A20 gate used to be part of the keyboard controller.)
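
As a rough illustration of the A20 point, here is a minimal Python sketch of how a real-mode segment:offset pair turns into a physical address with the A20 line on and off. The function and parameter names are made up for this example; only the "shift the segment left by 4 and add the offset" rule and the masking of address bit 20 are taken as given.

```python
# Sketch of real-mode address formation with and without the A20 line.
# physical_address and a20_enabled are hypothetical names for illustration.

def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Return the physical address for a real-mode segment:offset pair."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)  # at most 0x10FFEF
    if not a20_enabled:
        linear &= ~(1 << 20)  # address bit 20 forced to 0, so 1 MB wraps to 0
    return linear


if __name__ == "__main__":
    # FFFF:0010 is the first byte above 1 MB.
    print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=True)))
    print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=False)))
```

With A20 disabled, the first byte above 1 MB lands back on address 0, which is why nothing beyond the first megabyte is reachable until the gate is opened.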

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
12 Oct 2009 14:13


Gunnar von Boehn wrote:

  Recently we brainstormed about options to enhance the 68K instruction set.
 
  During this discussion we found some instructions which the x86 CPUs have but the 68K CPUs are missing.
  Examples were MOVEcc or SHIFT #imm,Reg with an immediate higher than 8.
 
  We know that GCC support for x86 is actually good. This made me wonder if generally adding instructions which the x86 has but 68K lacks would be a good idea, as this might help us to reuse compiler tricks on x86 for us.
 

  The next "damn useful" instructions in my experience - for fast code at least - are vector instructions that operate on four integers or floats in parallel. I suggest leaving that to the 070, though, as it requires rethinking the design, I guess.
 
  The x86 does have some string/block instructions as well, but those are also pretty complex to implement and not just merged from several existing instructions - besides, similar to movem they cause some headache in exception processing, so don't bother.
 
  One particularly nice thing about the x86 (and the PPC) is that a move does not necessarily modify the condition codes, which helps in branch optimization, i.e. the test can be scheduled a couple of moves ahead of the branch that uses it (a move never modifies the flags on the x86; the PPC had two forms of moves, IIRC). A move that leaves the condition codes alone would probably be useful.

The x86 also has - but I think that's already on your list - the "move and extend" operation that automatically zero-extends (or sign-extends) the operand. It comes in handy. This would be needed in byte-sized and word-sized forms, of course.
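
For readers who want the "move and extend" semantics spelled out, a minimal Python sketch follows. It only models what ends up in a 32-bit register after loading a byte or word operand; the function names are invented for this example and do not correspond to any real mnemonic.

```python
# Sketch: what a "move and extend" leaves in a 32-bit register.
# zero_extend / sign_extend are illustrative names, not real instructions.

def zero_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits and fill the upper register bits with zeros."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Copy the operand's top bit into all upper bits of a 32-bit register."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # operand is negative
        value -= 1 << bits
    return value & 0xFFFFFFFF          # view the result as a 32-bit pattern


if __name__ == "__main__":
    print(hex(zero_extend(0x80, 8)))      # byte 0x80 as an unsigned value
    print(hex(sign_extend(0x80, 8)))      # byte 0x80 as a signed value (-128)
    print(hex(sign_extend(0x8000, 16)))   # word 0x8000 as a signed value
```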

So long,
  Thomas
 

R. Leffmann
Sweden

Posts 16
12 Oct 2009 15:29


How about extending the addressing capabilities to make the instruction set a bit more orthogonal? It's a shame you can't use PC-relative addressing everywhere.


Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
12 Oct 2009 15:52


R. Leffmann wrote:

How about extending the addressing capabilities to make the instruction set a bit more orthogonal? It's a shame you can't use PC-relative addressing everywhere.

There is an idea behind this restriction: PC-relative addressing is only allowed for instructions that don't alter the stored data. The rationale is here that data you address indirectly over the PC is part of the "code" segment of the program and thus non-alterable. Modifiable data is either on the stack (indirect over a7 or the frame pointer, usually a5) or on the data segment (indirect over the BSS/DATA pointer, usually a4).

Of course, you can break this concept by first loading a PC-relative address into an address register, but this is "unintentional".

I personally would keep this restriction, it makes a lot of sense and supports good programming practice.

So long,
Thomas


Ernest Unrau
Canada

Posts 32
12 Oct 2009 16:53


Ernest Unrau wrote:

Gunnar von Boehn wrote:

      Recently we brainstormed about options to enhance the 68K instruction set...
       
      This made me wonder if generally adding instructions which the x86 has but 68K lacks would be a good idea, as this might help us to reuse compiler tricks on x86 for us.
       
        I'm not really an x86 expert and my x86 ASM days are also over a decade ago.
       
        So maybe someone can step in here and help us a bit...
       
        Are there any really cool instructions on x86 which are damn useful - that would make sense to "borrow" for our new 68K-family?
       
     

     
  Gunnar - I'm not a programmer at all on the level that you guys are. But I have had a fair bit of communication with a fellow who is - he's also nearing completion of his own entire x86 OS and his own compiler. He's the fellow who has ported Ghostscript (latest version 8.60) to the Amiga, and you can find out more here:
     
    EXTERNAL LINK - - ghostscript page at "www.whoosh777.pwp.blueyonder.co.uk"
    EXTERNAL LINK  - - x86 os project at "www.whoosh777.pwp.blueyonder.co.uk/os.html"
   

Okay, I alerted Whoosh as to the tenor of this thread, and he took the time and effort to send the following, which I am "forwarding" here. I suggest, though, if you have questions, that you contact him directly at the above links, because I am way over my head here :-)

Regards,

Ernest Unrau
Morden, Manitoba
CANADA

***FORWARDED REPLY***

well you can forward the following reply provided you clearly note that
it is forwarded

Gunnar von Boehn (Natami Team Member) posed this question:

GVB>Recently we brainstormed about options to enhance the 68K 
GVB>instruction set.
GVB>During this discussion we found some instructions which the x86-CPUs 
GVB>do have but the 68K-CPUs is missing.
GVB>Examples were MOVEcc or SHIFT #imm,Reg with a immediate higher than 8.

68020 upwards has bfffo which finds the leftmost 1 of the operand

x86 has
bsr (bit scan reverse) which finds the leftmost 1
but also has
bsf (bit scan forwards) which finds the rightmost 1,

x86 being little endian regards  «---  as forwards and ---» as reverse
x86 counts both bits and bytes thus:  ... 5, 4, 3, 2, 1, 0

ie for x86 the entire memory is one gigantic number, which is demarcated
leftwards either in bits, bytes, words, ints, or quads.

much more logical than big endian where bits are done in the opposite
direction from bytes.

bfffo has more options than bsr, but 68k doesnt appear to have an equivalent
of bsf (maybe it does but a cursory look at the docs I only found bfffo)
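
To make the three bit scans concrete, here is a small Python sketch. It assumes x86-style bit numbering (bit 0 is the rightmost bit) for bsr and bsf, and models bfffo only for the simplest case of a full 32-bit register field (offset 0, width 32), where the result counts from the leftmost bit. The function names mirror the mnemonics but are just illustrative helpers.

```python
# Sketch of bit-scan semantics; only the full-register case of bfffo is modelled.

def bsr(x: int) -> int:
    """Index of the leftmost 1 bit, counting bit 0 as the rightmost bit."""
    if x == 0:
        raise ValueError("result is undefined for 0 on x86")
    return x.bit_length() - 1


def bsf(x: int) -> int:
    """Index of the rightmost 1 bit, counting bit 0 as the rightmost bit."""
    if x == 0:
        raise ValueError("result is undefined for 0 on x86")
    return (x & -x).bit_length() - 1   # x & -x isolates the lowest set bit


def bfffo32(x: int) -> int:
    """Offset of the first 1 counted from the leftmost bit of a 32-bit field.

    Returns 32 (offset + width) when no bit is set.
    """
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


if __name__ == "__main__":
    value = 0b10100
    print(bsr(value), bsf(value), bfffo32(value))
```

For a nonzero 32-bit value the two "leftmost 1" results are mirror images of each other: `bfffo32(x) == 31 - bsr(x)`.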

x86 has HUGE amounts of supervisor things which 68k doesnt, x86 supervisor
is literally a lot of apps.

68k has 2 MMU tables, one for user and one for supervisor, which is a
much better idea than x86's having just 1 MMU table.
very tricky to make a total change of the VM mapping with x86 whereas
straightforward with 68k as you switch to the supervisor table and
then change the user table and then switch back.

also 68k allows you to use physical memory directly, whereas 64 bit x86
doesnt. (I think 32 bit x86 does)

64 bit x86 can only be done with an MMU, whereas 68k and 32 bit x86 IIRC can
be used without an MMU.

but supervisor and bit scans are typically not used by compilers,
they are more for handcoded asm.

TBH at the user level 68k is fine for compilers, but see the later comments.

the BIG thing x86 has which 68k doesnt is 64 bit. if 64 bit versions of the
68k registers can be done then 68k would catch up with x86 at the user and
compiler level.

the way x86 extends things is with prefix bytes called REX prefixes, these are
only relevant to assemblers and are implicit at the user asm level.

the REX prefix has the binary form: 0100WRXB
if W is 1 it means the instruction is promoted to 64 bit.
(R X and B are very complicated, x86 specific and not relevant to 68k)

user asm will say eg

  add rax, rbx  ;  rax = rax + rbx, 64 bit registers

but the ACTUAL asm will be 32 bit with a REX prefix with W set to promote it
to 64 bit,

more precisely the above compiles to the following bits:

4801d8  in hex

and the 32 bit version

  add eax, ebx

compiles to:

01d8

the 64 bit version is just the 32 bit version with the promotion prefix
01001000
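
The byte patterns above can be reproduced with a few lines of Python. This is only a sketch for register-to-register `add` on the eight original registers; it deliberately ignores the R, X and B bits of the prefix, and the helper name `encode_add` and the register table are invented for this example.

```python
# Sketch: encode "add dst, src" (register to register) with and without REX.W.
# Only the eight legacy registers are handled, so REX.R/X/B stay 0.

REG = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def encode_add(dst: str, src: str) -> bytes:
    """Encode e.g. 'eax','ebx' or 'rax','rbx' using opcode 01 (ADD r/m, r)."""
    if dst[0] != src[0] or dst[0] not in "er":
        raise ValueError("expected two 32-bit (e..) or two 64-bit (r..) registers")
    wide = dst[0] == "r"
    # ModRM: mod=11 (register direct), reg=source, rm=destination
    modrm = 0xC0 | (REG[src[1:]] << 3) | REG[dst[1:]]
    rex = bytes([0b0100_0000 | 0b1000]) if wide else b""   # 0100WRXB with W=1
    return rex + bytes([0x01, modrm])


if __name__ == "__main__":
    print(encode_add("eax", "ebx").hex())   # 01d8
    print(encode_add("rax", "rbx").hex())   # 4801d8
```

The only difference between the two encodings is the leading 0x48 byte, which is the point being made about the promotion prefix.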

in x86 asm the 8 bit, 16 bit, 32 bit, 64 bit versions of register a are
denoted al, ax, eax, and rax respectively.
similarly bl, bx, ebx and rbx for b etc.

68k denotes the size at the opcode:

  add.w  d0, d1    ; 68k
  add.l  d0, d1    ; 68k

x86 denotes it at the register:

  add eax, ebx    ; x86
  add rax, rbx    ; x86

the 68k notation is more efficient as it regards size as an opcode attribute
and not a register attribute.

also with 68k the dest is the right operand, and with x86 it is the left
operand.

if you reflect 68k you get k86 which looks like x86!
x86 is a reflection of 68k,

I have written an entire pre-emptive multi processor kernel in 64 bit x86 asm
so x86 64 bit asm is pretty good at the notational level. At the bit level
it is enormously cumbersome. at the user asm level it is like a 64 bit version
of 68k asm. but at the bit level 68k is a much better design.

x86 instructions have LOTS of prefixes and the total size of a 64 bit x86
instruction can be up to 15 bytes.

using amigaos pattern matching notation an x86 instruction is:

#legacy_prefix (rex_prefix | ) opcode1 opcode2? (ModRM SIB?)? ( | (disp) |
(disp disp) | (disp disp disp disp) ) ( | imm | (imm imm ) | (imm imm imm imm)
)

but REX.W is the only one relevant here.

the success of x86 is based on brute force hackwork to catch up with other
systems. instead of giving up which Motorola did, x86 patched a design much worse than
68k and have maintained backwards compatibility all the way to around 1980. when you
switch the newest PC on, it starts off in 1980 as a 16 bit computer with about 1MB of
ram, and bootstraps in stages to modernity. when you use the early startup on a PC
you are in 1980 on a 1MB 16 bit PC with 64K ROM.

the Amiga abandoned backwards compatibility right at the A600 where they
changed some disk interface chip, some games no longer functioned. todays PCs have
maintained backwards compatibility all the way back to 1980.

x86 is an EXTREME form of CISC, but is based on a RISC core.
if you code x86 supervisor asm you soon realise that x86 is an app, ie it is
done in software in the microcode. eg when an interrupt occurs, x86 literally runs
some microcode software to enact the interrupt. the microcode software at the end
of each instruction polls for interrupts, ie x86 interrupts are not true
interrupts but are polled by the microcode. but that makes them well behaved,

the other major advantage of x86 asm is that ALL registers are gpr,
ALL can be used for addresses AND numbers,

with 68k  a0 to a7 are address registers, and d0 to d7 are data registers
eg with 68k you cannot do * and / with address registers,
whereas with x86 you can do anything with everything which doubles the power
of the registers.

and 64 bit x86 has 16 registers, just like 68k. 32 bit x86 has just 8
registers which arguably is worse than 68k. 8 gprs versus 16 nongprs

"64 bit" and "promotion to gprs" perhaps could be done using a promotion
prefix, 68k uses 16 bit aligned instructions, so for 68k you could use a 16 bit
promotion prefix using an undefined instruction (if any are left, you are supposed to
always leave SOME instructions unallocated!)
that would still be just 4 bytes, much less than the 15 x86 can use.
a sweet feature of x86 is that some instructions are just one byte!

x86 instructions also are completely unaligned, eg the above "add rax, rbx" is
3 bytes. with caches alignment isnt such a big problem, unalignment has ZERO cost IF
its all within a cache line.

x86 is completely at odds with RISC principles, it flouts ALL the principles
of RISC and yet is the most successful CPU ever.

GVB>We know that GCC support for x86 is actually good. This made me 
GVB>wonder if generally adding instructions which the x86 has but 68K 
GVB>lacks would be a good idea, as this might help us to reuse compiler 
GVB>tricks on x86 for us.
GVB>
GVB>I'm not really an x86 expert and my x86 ASM days are also over 
GVB>decade ago.
GVB>
GVB>So maybe someone can step in here and help us a bit.
GVB>
GVB>Are there any really cool instructions on x86 which are damn useful 
GVB>- that would make sense to "borrow" for our new 68K-family?

the main way to catch up with x86 at the user level is to
make all the registers into 64 bit gprs using the same methodology x86 has
used, which is modifier prefixes.
see above for how its done.

the modifier prefix is only used where the original 68k cannot express the
action.

eg

  muls.l a0, a1

would use a modifier
as would

  mul.q d0, d1

I would say 68k is better than user level 32 bit x86

***END FORWARDED REPLY***



Rune Stensland
Norway
(MX-Board Owner)
Posts 871
12 Oct 2009 17:54


Implement MMX (as a parallel coprocessor)
 
  Eight MMX registers (MM0..MM7).
  Four MMX data types (packed bytes, packed words, packed double words, and quad word).
  57 MMX Instructions.
 
  Almost all mpeg / inline x86 asm is mmx.
 
  C programs will run faster

..

Or implement a fast compatible 68882 fpu..

Samuel D Crow
USA
(Natami Team)
Posts 1295
12 Oct 2009 18:34


@SP

The Robin core will have all of that.  We won't need it in the '050.

Ernest Unrau
Canada

Posts 32
12 Oct 2009 18:44


re. x86 instructions:

Ernest Unrau wrote:

  Okay, I alerted Whoosh as to the tenor of this thread, and he took the time and effort to send the following, which I am "forwarding" here. I suggest, though, if you have questions, that you contact him directly at the above links, because I am way over my head here :-)
 

More comments from Whoosh on the x86 instruction subject below...

-Ernest Unrau

***FORWARDED REPLY FROM WHOOSH***

basically the problem isnt so much the instruction set
but is at a lower structural level. mainly the registers.

68k has        sixteen 32 bit registers
64 bit x86 has sixteen 64 bit gpr registers

but 68k's registers are:

a0 a1 a2 a3 a4 a5 a6 a7

which can only be used for address ops, ie they are meant to only
contain memory addresses.

and

d0 d1 d2 d3 d4 d5 d6 d7

which can only be used for maths ops,

what is needed is to "generalise" them so that ALL can be used
for address and maths ops. a register which can be used for anything
is referred to as a "general purpose register" or gpr.
all the registers on x86 and say PPC are gpr's.

the registers also need to be widened to 64 bits to catch up with x86.
registers are just hardware variables, which are much faster than software
variables.

compilers typically do all maths with registers, moving the operands to register,
doing the maths and then moving the result from the dest register. This led to
the RISC idea where maths can only be done on registers.

when a compiler does a = b + c, where a,b,c are memory cells, it typically
does:

  mov reg1, b
  mov reg2, c
  add reg1, reg2
  mov a, reg1

with RISC that is the only way to do maths on memory, and its called a
load-store architecture. Memory can only be loaded to and stored from
registers, no other ops. whereas CISC eg 68k and x86 allow ops directly on memory,
with x86 only one operand can be in memory.

Now 68k's instructions are specified in 16 bits, eg
the add instruction is:

1 1 0 1 reg2 reg1 reg0 opmode2 opmode1 opmode0 mode2 mode1 mode0 ea_reg2 ea_reg1 ea_reg0

the 1101 is the opcode, encoding "add"
and each symbol represents 1 binary digit, 16 in total.
now there are no spare bits, but
opmode can be 000 001 010 100 101 110
which means there are 2 "free" values: 011 and 111,
maybe you could encode 64 bit with the existing opcode, unfortunately a
problem!

adda uses the same format and uses opmode==011 and 111.
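
The field layout described above can be checked with a short Python sketch that packs the fields into the 16-bit instruction word. The helper name `encode_add_word` is made up; the field positions and opmode values are the ones listed in the text, and register-direct addressing (mode 000) is the only effective-address mode used here.

```python
# Sketch: pack the 68k ADD/ADDA fields into one 16-bit instruction word.
# Layout: 1101 | reg (3) | opmode (3) | ea mode (3) | ea reg (3)

def encode_add_word(reg: int, opmode: int, ea_mode: int, ea_reg: int) -> int:
    for name, field in (("reg", reg), ("opmode", opmode),
                        ("ea_mode", ea_mode), ("ea_reg", ea_reg)):
        if not 0 <= field <= 7:
            raise ValueError(f"{name} must fit in 3 bits")
    return (0b1101 << 12) | (reg << 9) | (opmode << 6) | (ea_mode << 3) | ea_reg


if __name__ == "__main__":
    # add.l d0,d1: destination d1, opmode 010 (long, result to Dn), source d0
    print(hex(encode_add_word(reg=1, opmode=0b010, ea_mode=0b000, ea_reg=0)))
    # adda.l d0,a1: same format, but opmode 111 selects the address register form
    print(hex(encode_add_word(reg=1, opmode=0b111, ea_mode=0b000, ea_reg=0)))
```

Since opmode 011 and 111 are taken by adda, every combination of the 16 bits is spoken for, which is why a separate prefix word is the proposed way out.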

how to achieve 64 bit add?

you have to use a special prefix instruction which "modifies" the next
instruction,

eg modifies 32 bits to 64, or modifies address register to data register

eg modifies

  add.l a0, d1

to 

  add.q a0, d1

.l means we only add the first 32 bits of the registers,
.q would mean we add the first 64 bits.

once you have 64 bit registers, you need to also extend the MMU to 64 bits.

at this moment in time 32 bit is still fine, but eg if you wanted to store
a DVD in ram, 4.7G that is a bit beyond 32 bit which is 4G

things wont go beyond 64 bit for the foreseeable future because
it would take centuries just to zero 2^64 bytes at todays speeds.
ie 64 bit is unfeasible to utilise.

similarly there is a practical limit on the sizes of drives, eg it took me
nearly 3 hours to backup a 50G drive. 5MB/s copy speed
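
The size and time figures above are easy to check with a few lines of Python. The 5 MB/s copy speed and the 50G and 4.7G sizes come from the text; the 10 GB/s memory fill rate is purely an assumed round number for illustration, not a measurement, and the function names are invented here.

```python
# Back-of-the-envelope check of the figures above.
# The 10 GB/s fill rate used in the example run is an assumption.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def hours_to_copy(num_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to move num_bytes at a constant rate."""
    return num_bytes / bytes_per_second / 3600


def years_to_zero(num_bytes: float, bytes_per_second: float) -> float:
    """Years needed to write num_bytes at a constant rate."""
    return num_bytes / bytes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{hours_to_copy(50e9, 5e6):.2f} hours for 50G at 5MB/s")
    print(f"4.7G DVD fits in 32 bit address space: {4.7e9 <= 2**32}")
    print(f"{years_to_zero(2**64, 10e9):.0f} years to zero 2^64 bytes at 10 GB/s")
```

The result for 2^64 bytes scales inversely with the fill rate, so a slower assumed rate pushes it well past a century.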

x86 had the same problem of all bits being accounted for, how to extend
to 64 bits? they decided to use a prefix "opcode" to modify the following
instruction.

and it worked because x86 is damn fast despite the bloated instruction code
format.

you can upload the above if you want,

***END FORWARDED REPLY***


Gunnar von Boehn
Germany
(Moderator)
Posts 5775
12 Oct 2009 20:24


It looks like my post was a bit unclear and caused some confusion.
I think I need to clarify a few things:

We do not want to create a x86 CPU

We do not want to execute x86 code.

We do not want to support little endian/wrong endian.

But what I would like to do, is to steal some good ideas.

E.g. if another architecture has an instruction which helps to do a certain task more easily or more efficiently, then why not add this instruction to the 68K as well?

R. Leffmann wrote:

It's a shame you can't use PC-relative addressing everywhere.

The 68050 does NOT implement this limitation.
In other words writing to PC-relative is possible on 68050.
PC-relative is a "normal" full featured address-mode on 68050.
This means you can read AND write to it.
This greatly helps code generation in GCC, as GCC is not able to handle context-sensitive address modes.

Regarding 64-bit.
We will NOT implement 64-bit integer registers.
The disadvantages of 64-bit integer registers outweigh the advantages by far. It's much better for us to stay 32-bit.

Cheers
 

Marcel Verdaasdonk
Netherlands

Posts 3974
12 Oct 2009 21:24


Gunnar, would it be possible to create more registers then? (I know that this has been answered in the past.)
 

Chris S
United Kingdom

Posts 18
12 Oct 2009 21:48


Marcel Verdaasdonk wrote:

  Gunnar, would it be possible to create more registers then? (I know that this has been answered in the past.)
   
 

 
  How about register sets for fast context switching, or indeed any other purpose, like the z80?
 

Marcel Verdaasdonk
Netherlands

Posts 3974
12 Oct 2009 22:31


Chris, you know that a Z80 is an 8080 clone but better, right?

In general, I think the 68050 should be compared to the 68060 in the same manner a Z80 compares to an 8080. ;)

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
12 Oct 2009 22:37


Marcel Verdaasdonk wrote:

Gunnar, would it be possible to create more registers then? (I know that this has been answered in the past.)

The problem is that the register used is encoded in all the 68K instructions in such a way that selecting more than 16 registers is not possible.

So even if we add more registers, selecting them with the current 68K instruction set is not possible.

If anyone has a good idea how ...

But the 68050 does change one thing:
Many instructions which only worked on DATA registers can now also access ADDRESS registers.



Matt Hey
USA

Posts 726
13 Oct 2009 00:20


I think it's best to keep the 68050 integer unit small and efficient. No 64 bit, no more registers, only simple fast instructions added. It could do most of the simple work on small data sizes and bit manipulations. Then for the 68070, actually make a combination fpu/vector unit that is compatible with the 68k fpu. It would be much easier to add more and larger registers and be able to add powerful instructions without messing up the efficiency of the 68k itself. Most fpu and vector units do some similar work but were designed many years apart, so they don't necessarily work efficiently together and waste chip space with similar functionality. Full 68k fpu compatibility might not be possible, but rounding to 64 bit from 96 bit fp would not cause a problem in most cases. A 64 bit fp move to 2 integer data registers and vice versa could be added. I like this better than the idea of the Robin being a device, even though extra registers have to be saved during task switching. Again, if possible, I would like to have 68k compatibility. I haven't heard many specifics about the Robin, though, and would appreciate more info on it (were there threads?).

I do like the idea of making the registers more general purpose and flexible where it's easy and fast. I thought Thomas Richter's idea of a move without setting condition codes was interesting as long as it could be kept 2 bytes long. Anything longer and a movem could be used instead. The move with extension is already available when moving a word to an address register. Would it be possible to allow a byte move to address register so we could get free byte extension too?

Whoosh talks about us using the 68k bfffo, but if it goes through a trap it will be too slow to be usable. If we had fast register and immediate 68k bit field instructions then we would have an advantage on UAE instead of a similar bottleneck. ff1 could replace it, but I thought this was something the fpga could do well. Either a bfffo or ff1 instruction coupled with byterev should make an x86 bsr (bit scan reverse) instruction unnecessary.


Mr. de Brun
USA

Posts 17
13 Oct 2009 02:34


I think the SMARTEST thing to do is ask current/recent Amiga software developers.  Perhaps if you cater your hardware to them, they might develop for it specifically utilizing their recommended x86 instructions added to your fpga. Ask Olaf Barthel or people/companies with a track record.

Fabian Nunez
USA

Posts 312
13 Oct 2009 03:51


My suggestion is to look at CPU intensive tasks that are expected to be used frequently, eg mp3 or H.263/H.264 decoding, and add opcodes that help with the most frequent calculations (there should be plenty of fast x86/amd64 asm implementations to look at and see what clever tricks coders use).  The original designers of the M68K actually did that (with mainframe CPUs of the time - the 68000 is a lot like the PDP11), but obviously the kind of programs people run in 2009 are different to the programs they were running in 1976.
 

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
13 Oct 2009 06:26


Matt Hey wrote:

I think it's best to keep the 68050 integer unit small and efficient. No 64 bit, no more registers, only simple fast instructions added.

Yes.

* Going to 64 bit int would be more of a disadvantage than advantage.

* More registers are a nice idea but difficult to implement with the encoding

Also, more INT registers are not really needed.
The 68K has 16 registers which is quite a lot.
And the 68K can operate on cache/memory without penalty.
You can even keep counters in memory without penalty.


subq #1,xxx(a7)
bne .loop

is as fast as

subq.l #1,d0
bne .loop

Because of the memory-manipulation capabilities of 68k CISC, more registers are not really needed and not that important for us.

Matt Hey wrote:

Then for the 68070, actually make a combination fpu/vector unit that is compatible with the 68k fpu. It would be much easier to add more and larger registers and be able to add powerful instructions without messing up the efficiency of the 68k itself.

The implementation challenge stays the same whether you add it to the 050 or the 070.

Matt Hey wrote:

  I like this better than the idea of the Robin being a device even though extra registers have to be saved during task switching.

If you add a vector unit or a powerful FPU to the 68k then this unit needs to fit into the "world" of the 68k design.

The 68k is optimized for a certain "workload".
Its pipeline length is optimized for this too.

A good integer core wants to keep a short pipeline for short jumps.

A good vector core wants a long pipeline for more throughput. For a pure vector calculation task you do not need many of the features of the 68K CPU at all.

This means splitting the core would increase the power of the 68K and it would allow increasing the power of the vector core.
Also by splitting you would have two cores which can do two things in parallel. Splitting would thereby allow to increase the peak performance by 2-3 times over the combined solution.

But this thread was actually about options to improve the 68K integer unit.

BTW, adding new instructions to the 68K does not need to slow it down. Certain things can be done nearly for free with the current 050 design. We already talked about how MOVEcc could be added for free. The same goes for the MOVse type instructions - they are only an ea-extension of the EXT instruction. In other words, the core already knows this instruction but only uses a limited version of it (the EXT) as of today.

So anyone any ideas?

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
13 Oct 2009 07:14


BTW, I agree that adding an FPU to the 68K could be of advantage.

Adding a really powerful FPU with instructions like FMAD is difficult.
For this it would be beneficial to extend the pipeline.
But extending the pipeline would be a disadvantage for the Integer-Unit.

Not easy to decide what to do here. :-/

If we add an FPU to the 050/070 I would drop 96bit mode.
This mode is fluff and not supported by C-language anyway.

What would be nice and important is increasing the number of FPU registers.
And this is actually easy to do as the 68K instruction set luckily  has free encoding space here  :-)

Having 16 or 32 FPU registers would be a real big improvement on its own - if you want to do complex Matrix calculations as you need a lot of scratch registers for this.


Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
13 Oct 2009 07:51


Matt Hey wrote:

Most fpu and vector units do some similar work but were designed many years apart, so they don't necessarily work efficiently together and waste chip space with similar functionality. Full 68k fpu compatibility might not be possible, but rounding to 64 bit from 96 bit fp would not cause a problem in most cases. A 64 bit fp move to 2 integer data registers and vice versa could be added. I like this better than the idea of the Robin being a device, even though extra registers have to be saved during task switching. Again, if possible, I would like to have 68k compatibility. I haven't heard many specifics about the Robin, though, and would appreciate more info on it (were there threads?).

MMX/XMM and FPU registers on the x86 use the same hardware AFAIK, thus no chip space is wasted. The problem is of course that you cannot use them in parallel. It is quite interesting to note that even though the old x86 stack-based FPU looks much less elegant than the register-based XMM unit, the AMDs are faster in the "old crappy mode" in my experience. For the Intels this is no longer the case.

Task switching with an FPU, or any type of vector unit does not cause quite as much overhead as you might think. For the FPU, for example, exec first runs an "fsave" and then checks whether the FPU is in reset state or not. If the FPU is not used at all, exec doesn't save any registers, only the status word. FPU registers are only saved if the FPU is either idle or busy, but not if it is unused.
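
The idea of skipping the register save for an unused FPU can be sketched in a few lines of Python. This is a toy model and not exec's actual code: the `Fpu` class, the `used` flag standing in for the "not in reset state" check, and the dictionary standing in for the saved frame are all invented for illustration.

```python
# Toy model of "save FPU registers only if the FPU was actually used".
# Fpu, save_context and the frame layout are hypothetical.

from dataclasses import dataclass, field


@dataclass
class Fpu:
    used: bool = False                                   # False = reset state
    regs: list = field(default_factory=lambda: [0.0] * 8)


def save_context(fpu: Fpu) -> dict:
    """Return the state a task switch would have to store for this FPU."""
    frame = {"null": not fpu.used}       # stands in for the small state frame
    if fpu.used:
        frame["regs"] = list(fpu.regs)   # the expensive part, done only if needed
    return frame


if __name__ == "__main__":
    print(save_context(Fpu()))
    print(save_context(Fpu(used=True, regs=[1.5] * 8)))
```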

Similar methods could be used with a 68070 coprocessor, which should probably also contain a vector unit. Don't save the state if it's not needed.

So long,
Thomas


Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
13 Oct 2009 08:14


Gunnar von Boehn wrote:

BTW, I agree that adding an FPU to the 68K could be of advantage.
 
  Adding a really powerful FPU with instructions like FMAD is difficult.

Not really required, at least for signal processing purposes. Most algorithms there use lifting right away, and a full matrix multiplication is either too costly anyhow, or too special to be supported by hardware. Don't bother.

Gunnar von Boehn wrote:

  If we add an FPU to the 050/070 I would drop 96bit mode.
  This mode is fluff and not supported by C-language anyway.

Since when? (-: C doesn't say anything about floating point types except that double must be at least as precise as float, and long double must be at least as precise as double. "long double" is the 80 (resp. 96) bit mode on MC68K, is well supported by gcc, and is AFAIK part of the C99 standard.

I would prefer to keep it; it came in handy from time to time when I really needed more precision. If so, however, I would prefer the 68040-and-up "precision selection" where the rounding precision is selected directly in the instruction and not in a floating point status register. The status register approach caused a lot of mess in AmigaOS; the problem there is that the status register is local to the task, which means that the mathieee libraries must always be opened by the task that uses them, quite unlike any other libraries. If you check the RKRMs, you'll find a strange clause there that libraries are "task local" and a new task has to re-open all the libraries it wants to use. The FPU rounding is the reason why this has been put in. Usually nobody bothers and things "just work" - but not quite "as imprecise as intended" if the FPU comes into play.

Gunnar von Boehn wrote:

  What would be nice and important will be increasing the number of FPU registers.
  And this is actually easy to do as the 68K instruction set luckily  has free encoding space here  :-)

Interestingly, I never ever run out of FPU registers, but often run out of address registers. (One wasted for the stack, one wasted for the library base, one wasted as base register for the data section, and some compilers wasted another one for the frame pointer. With two taken as scratch registers, only a2 and a3 were available for register variables for some C compilers - tight.) Hard to fix. The Z80 had an "exchange register set" instruction with all registers existing twice, but only one half of the register set available to the program - still an ugly solution.

Gunnar von Boehn wrote:

  Having 16 or 32 FPU registers would be a real big improvement on its own - if you want to do complex Matrix calculations as you need a lot of scratch registers for this.

Think so? That only helps if the matrix fits into the registers completely, otherwise it is just a loop over rows and columns which is usually fast enough. If you want something faster, you usually don't want a full matrix multiplication in first place; then you have special algorithms that work on sparse matrices, and data structures to represent them, or you use lifting instead of full multiplication.

It would help if we had instructions that would run several operations in parallel on the registers, something like

fsmul fp0,fp1:fp2:fp3:fp4

multiply fp1 to fp4 with fp0 in single precision. For that I have an immediate use-case. (Yes, single precision would be perfectly ok here).
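
To pin down what such an instruction would compute, here is a Python sketch of the proposed semantics. The instruction itself is hypothetical, and so is everything in the sketch: the register file is a plain list, Python doubles stand in for the FPU's internal format, and rounding each product to single precision is modelled by a round trip through a 32-bit float.

```python
# Sketch of the proposed "fsmul fp0,fp1:fp2:fp3:fp4" semantics.
# fsmul_parallel and the list-based register file are illustrative only.

import struct


def to_single(x: float) -> float:
    """Round a value to IEEE single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fsmul_parallel(fp: list, src: int, dsts: list) -> None:
    """Multiply every destination register by fp[src], rounding to single."""
    factor = fp[src]
    for d in dsts:
        fp[d] = to_single(fp[d] * factor)


if __name__ == "__main__":
    fp = [2.0, 1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0]
    fsmul_parallel(fp, src=0, dsts=[1, 2, 3, 4])
    print(fp[1:5])
```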

Probably another extension: a couple of applications, most notably OpenGL and also OpenEXR, support a half-float format taking up 16 bits (1+5+10 for sign, exponent and mantissa). This would also come in handy for graphics manipulations.
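
For reference, the 16-bit half-float layout can be inspected from Python, whose `struct` module supports the IEEE 754 binary16 format via the `e` format code: 1 sign bit, 5 exponent bits and 10 fraction bits. The helper name `half_fields` is made up for this sketch.

```python
# Sketch: split a value's half-precision encoding into its three bit fields.

import struct


def half_fields(x: float) -> tuple:
    """Return (sign, biased exponent, fraction) of x encoded as binary16."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15              # 1 bit
    exponent = (bits >> 10) & 0x1F  # 5 bits, bias 15
    fraction = bits & 0x3FF         # 10 bits
    return sign, exponent, fraction


if __name__ == "__main__":
    print(half_fields(1.0))    # (0, 15, 0)
    print(half_fields(-2.0))   # (1, 16, 0)
```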

So long,
Thomas

