
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.
Please read the forum usage manual.



Welcome to the Natami lounge.
Meet new AMIGA friends here and enjoy having a friendly chit chat.

Frontend - Backend System to Use I386 Processors?
Wojtek P
Poland

Posts 1597
18 Feb 2011 21:28


Thomas Clarke wrote:

Hi Jorge,
 
  It's an interesting idea you've got there. I do like that Natami has a native 68K CPU, but your idea of a FPGA-based JIT/code translator could make for an interesting accelerator for classic Amigas.
 
  I suppose the question boils down to whether you can build a FPGA JIT that's much faster than a native CPU. FPGAs generally seem very good at speeding up tasks if they can be parallelised. I don't know enough about JITs to know if this is applicable, anyone help fill us in? Are there any existing software-based JITs that make use of multi-core CPUs,
 

You could put a high-end x86 CPU plus its own memory on a separate board, with a small FPGA just to interface it to the S-Zorro bus, and run 68k emulator/translator code from, say, UAE.
Then plug it into the Natami mainboard instead of the 68060 board and it will work. Most probably faster than the current 68050 in many cases.

Just don't forget to include a 100W power supply ;)


Thomas Clarke
United Kingdom

Posts 286
18 Feb 2011 22:38


Wojtek, the thing I'm trying to establish is whether you can make a parallel JIT.

If you can, then a FPGA has a decent chance of being part of a very fast x86-based, 68k-compatible Amiga accelerator. In fact the bottleneck would most likely be the CPU card interface rather than on the accelerator itself.

Wojtek P
Poland

Posts 1597
18 Feb 2011 23:28


Thomas Clarke wrote:

Wojtek, the thing I'm trying to establish is whether you can make a parallel JIT.
 
  If you can, then a FPGA has a decent chance of being part of a very fast x86-based, 68k-compatible Amiga accelerator. In fact the bottleneck would most likely be the CPU card interface rather than on the accelerator itself.

Your idea of connecting a modern x86 (or PowerPC, or whatever) processor to a hardware translator is simply impossible, and I already explained why.



Marcel Verdaasdonk
Netherlands

Posts 3979
19 Feb 2011 00:51


Wojtek, you're being a naysayer here.

It is an interesting idea to talk about, but be realistic: no sane developer would use an i7 for a power-up card.
This is because the power requirement cannot be met in a clean way by the SZorro bus.
Furthermore, there would be an issue of heat dissipation.

It would be smarter to use either an ELV or an embedded processor (Atom, ARM, you name it).


Thomas Clarke
United Kingdom

Posts 286
19 Feb 2011 01:02


Wojtek P wrote:

 
Thomas Clarke wrote:

  Wojtek, the thing I'm trying to establish is whether you can make a parallel JIT.
   
    If you can, then a FPGA has a decent chance of being part of a very fast x86-based, 68k-compatible Amiga accelerator. In fact the bottleneck would most likely be the CPU card interface rather than on the accelerator itself.
 

  your idea of connecting modern x86 (or powerpc, or whatever) processor to hardware translator is simply impossible and i already explained why.

 
No you didn't.
 
It's not impossible to design HDL that takes 68k instructions as input and outputs x86 instructions. The question is, is it worth designing this device? My argument is, it might be, if you can decode multiple instructions in parallel.
 
How might such a device work? With most 68k programs you're looking at code that needs to be executed in sequence, so even if you decode in parallel the instructions must keep their place in the queue. Let's say we have 4 68k to x86 translators in our FPGA (the number 4 is chosen completely arbitrarily). Decoding might look like this:
 
68k instruction 1 -> Core 1 -> x86 position 1 (completed in clock 1000)
68k instruction 2 -> Core 2 -> x86 position 2 (completed in clock 3000)
68k instruction 3 -> Core 3 -> x86 position 3 (completed in clock 2000)
 
The example above is overly simplistic, but what I wanted to get across was the following: if a later instruction (which doesn't directly depend on the preceding instruction) is completed in less time (such as 3rd instruction before 2nd instruction in example above), you would most likely still wait for the earlier instruction to execute before moving on in sequence. However, you've still saved yourself time as the later instruction is cached, ready to use.
 
Of course there's a limit to how far you can push this, as later instructions may depend on values that are still to be calculated. Please also note I do not doubt other people could make a more elegant design. Nevertheless, I don't see any technical reason why you can't make 68k-to-x86 decoding a parallelised task. If you doubt it can be done, say so, and back it up with proof.
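
The in-order hand-off described above can be modelled in a few lines. This is only a rough sketch, not a hardware design: it assumes all translators start together at clock 0 and that an instruction may only be released once every earlier one has finished. The function name `in_order_ready_clocks` is made up for illustration.

```python
def in_order_ready_clocks(finish_clocks):
    """Clock at which each translated instruction can be handed on in program order.

    finish_clocks[i] is the clock at which translator i finishes instruction i,
    assuming (hypothetically) that all translators start together at clock 0.
    """
    ready = []
    latest = 0
    for clock in finish_clocks:
        # An instruction cannot be released before every earlier one is done.
        latest = max(latest, clock)
        ready.append(latest)
    return ready


finish = [1000, 3000, 2000]           # the three instructions from the example
parallel = in_order_ready_clocks(finish)
serial_total = sum(finish)            # one translator doing them back to back

print(parallel)       # [1000, 3000, 3000]
print(serial_total)   # 6000
```

Under these assumptions instruction 3 waits for instruction 2, but the whole group is ready at clock 3000 instead of 6000.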

Jorge Windmeisser Oliver

Posts 32
19 Feb 2011 01:08


Wojtek P wrote:

Thomas Clarke wrote:

  Wojtek, the thing I'm trying to establish is whether you can make a parallel JIT.
 
  If you can, then a FPGA has a decent chance of being part of a very fast x86-based, 68k-compatible Amiga accelerator. In fact the bottleneck would most likely be the CPU card interface rather than on the accelerator itself.
 

  your idea of connecting modern x86 (or powerpc, or whatever) processor to hardware translator is simply impossible and i already explained why.
 
 

Hi Wojtek,

I don't know what gives you the authority to make such a bold statement.

EXTERNAL LINK 
From the document linked above (emphasis mine):

<i>This chapter provides an overview of the options Altera® provides to connect an external processor to an Altera FPGA or Hardcopy® device. These interface options include the PCI Express, PCI, RapidIO®, serial peripheral interface (SPI) interface or a <b>simple custom bridge that you can design yourself</b></i>.

<i>Offload pre- or post- processing of data to the external processor.</i>

In my case the post-processing data bit seems to be the thing I'm looking for.

So connecting an external processor to an FPGA for post-processing data is possible.

Really, it would have astounded me if such a thing weren't possible...

Cheers :)


Jorge Windmeisser Oliver

Posts 32
19 Feb 2011 01:11


Hi Marcel,

I agree, better to put in 40 new dual-core ARM Cortex designs that would consume roughly 30 watts and give you the same speed as the fastest consumer i7 ;) The "only" problem: real estate :D

Cheers ;)

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
19 Feb 2011 08:43


Jorge Windmeisser Oliver wrote:

Hello All,
 
  last day I was thinking if it was possible to use a FPGA to translate assembler statements from 68000 code to, for example, i386 code.
 
  Would such a thing be possible? Or would data dependencies make this impossible? Or is it just a silly thought?

I fail to see the benefit of this concept.

This proposal sounds:
A) definitely slower and less performance than doing the whole 68K.
B) More expensive than doing the complete CPU in the FPGA
C) A lot more work


Deep Sub Micron
Germany
(MX-Board Owner)
Posts 567
19 Feb 2011 10:04


Gunnar von Boehn wrote:

Jorge Windmeisser Oliver wrote:

  Hello All,
 
  last day I was thinking if it was possible to use a FPGA to translate assembler statements from 68000 code to, for example, i386 code.
   
  Would such a thing be possible? Or would data dependencies make this impossible? Or is it just a silly thought?
 

 
  I fail to see the benefit of this concept.
 
  This proposal sounds:
  A) definitely slower and less performance than doing the whole 68K.
  B) More expensive than doing the complete CPU in the FPGA
  C) A lot more work
 

Well, the proposal of a HW-accelerated JIT can be implemented with the same kind of 68k decoder we use, but instead of feeding an execute pipeline it has to generate machine code for another back-end CPU. In the first pass this can only be as fast as the decoder, so there is no benefit yet (except for instructions like divide). If the code is executed a second time, then the back-end CPU can use its cache and execute it faster. But there is a much higher latency when jumps are mispredicted or when fetching new code.

While B) and C) are obviously true, A) seems to be more of a guess.

But there are a lot of fallacies and pitfalls to take care of, so a real estimate of which one is faster is a lot of work. So I think it is better to just execute the decoder output. That is already enough work to do :-)

It reminds me a little of Transmeta's code morphing microprocessors, but the other way around.
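
The first-pass versus second-pass point can be sketched as a translation cache: the first visit to a block pays for translation, later visits reuse the result. This is a software toy under stated assumptions, not the decoder discussed here; `TranslationCache` and `fake_translate` are hypothetical names, and blocks are keyed by their 68k start address only.

```python
class TranslationCache:
    """Remembers translated blocks so repeated code is translated only once."""

    def __init__(self, translate):
        self._translate = translate   # callable: 68k bytes -> host code
        self._blocks = {}             # 68k start address -> translated block
        self.hits = 0
        self.misses = 0

    def lookup(self, address, code_bytes):
        block = self._blocks.get(address)
        if block is None:
            # First pass: pay the full translation cost.
            self.misses += 1
            block = self._translate(code_bytes)
            self._blocks[address] = block
        else:
            # Later passes: reuse what was translated before.
            self.hits += 1
        return block


def fake_translate(code_bytes):
    # Stand-in for the real 68k-to-host translator (hypothetical).
    return "host code for " + code_bytes.hex()


cache = TranslationCache(fake_translate)
cache.lookup(0x1000, b"\x4e\x75")   # first pass: translated (0x4E75 is RTS)
cache.lookup(0x1000, b"\x4e\x75")   # second pass: served from the cache
print(cache.misses, cache.hits)     # 1 1
```

The sketch ignores the higher latency on mispredicted jumps and new code fetches mentioned above, which is exactly where the real comparison gets difficult.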


Marcel Verdaasdonk
Netherlands

Posts 3979
19 Feb 2011 11:34


Gunnar von Boehn wrote:

Jorge Windmeisser Oliver wrote:

  Hello All,
 
  last day I was thinking if it was possible to use a FPGA to translate assembler statements from 68000 code to, for example, i386 code.
   
  Would such a thing be possible? Or would data dependencies make this impossible? Or is it just a silly thought?
 

 
  I fail to see the benefit of this concept.
 
  This proposal sounds:
  A) definitely slower and less performance than doing the whole 68K.
  B) More expensive than doing the complete CPU in the FPGA
  C) A lot more work
 

Let's say we use a PPC CPU for this idea.
We then would have a PPC CPU card.

The FPGA would translate 68K code to PPC code.
An added advantage with the PPC would be that you have a real PPC when you need to run PPC code.

So the whole idea is an interesting subject for CS classes.
Another advantage is that it no longer matters whether we have access to real 68Ks for the CPU board.
This idea could be applied to a real 68K CPU too, to add the instructions it is missing compared with the N68050 and N68070.
It would work for a DragonBall or ColdFire in the same manner.

So it creates a form of independence.

A.) It doesn't by default mean you get less performance; it would, however, add to the latency.
B.) AMD sells CPUs for 30 euros; I cannot make one for that price.
C.) This is indeed some work, but whether it's more than designing a complete CPU is debatable. (Now that the CPU is nearly done, yes, it would be, but if you only had a decoder...)


Gunnar von Boehn
Germany
(Moderator)
Posts 5775
19 Feb 2011 14:03


Marcel Verdaasdonk wrote:

  Let's say we use a PPC CPU for this idea.
  We then would have a PPC CPU card.

The first misconception here: translating a 68K instruction to another CPU is not a simple 1-to-1 translation.

Handling of flags in particular can take very many instructions.
This means your decoder will have to output not 1 instruction but sometimes 10 to the x86 or PPC backend.

A good JIT will optimise this: it checks whether the flags are used by the following instructions, and if not, it removes the emulated instructions that would create the flags.

This means your translator cannot work on single instructions. You need to "compile" big amounts of code and rewrite it, recalculating all branch and jump addresses for the emulating host.

The next problem is that you need to keep track of the translation.
If 68K instructions are replaced or changed, you need to destroy the translated code and flush the caches of the x86 system.

In a nutshell:
With a simple 1-to-1 instruction translation you gain nothing.
This would not perform at all.

What you want is a clever just-in-time compiler with code analysis.
Such a project is absolutely NOT suited to being done in hardware.
The same as you would not want to develop a C-compiler ASIC.
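
The flag optimisation described above amounts to a backward liveness scan over a block of code. The following is a minimal sketch, not how any particular JIT does it: `Insn` and `flags_needed` are hypothetical names, the condition codes are treated as one unit (a real 68k has separate X, N, Z, V and C bits), and flags are conservatively assumed to be needed after the block ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    name: str
    sets_flags: bool = False
    reads_flags: bool = False


def flags_needed(block, live_out=True):
    """For each instruction, say whether its flag results must be emulated.

    Simplification: all condition codes are tracked as a single unit.
    live_out=True assumes code after the block may still read the flags.
    """
    needed = [False] * len(block)
    live = live_out
    for i in range(len(block) - 1, -1, -1):
        insn = block[i]
        if insn.sets_flags:
            # Flags matter only if something later reads them before
            # another instruction overwrites them.
            needed[i] = live
            live = False
        if insn.reads_flags:
            live = True
    return needed


block = [
    Insn("add.l d0,d1", sets_flags=True),
    Insn("move.l d1,d2", sets_flags=True),
    Insn("cmp.l d3,d2", sets_flags=True),
    Insn("beq.s target", reads_flags=True),
]
print(flags_needed(block))   # [False, False, True, False]
```

Only the compare needs its flags emulated here, which shows why the translator has to look at several instructions at once rather than one at a time.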



Michael Ward
USA

Posts 234
19 Feb 2011 14:27


Jakob Eriksson wrote:

It is a cool idea. However something far, far easier to implement would be to just put the x86 processor in there and do JIT, like Amithlon did.

Speaking of Amithlon... With the advent of an open-sourced Kickstart replacement, I think it is more feasible to resurrect this concept, and all in open source of course. Use a mainstream Linux kernel or maybe even something like NetBSD, and have the ability to run AROS 68K or the original Workbench. While one was at it, rewrite a 68K JIT in the manner Bernd discussed in the Umilator specs (64-bit with lookup tables). Between this little package and Natami, 68K should be around for a long time to come. Bounty, anyone?

Jakob Eriksson
Sweden
(Moderator)
Posts 1097
19 Feb 2011 14:45


Thank you, Gunnar, for your explanation; this is what I wanted to say but could not find the words. There is NO 1-to-1 translation between 68k and x86 instructions!

This is the killer problem. Add to that practical problems like bus latency etc. If anyone is crazy enough to do this (this will not happen in Natami), I suggest a test:
Design an accelerator for the A1200 (like someone in this thread said). But instead of a 68k processor, there is an x86 in there with its own memory and everything, PLUS an FPGA to adapt buses etc. Then boot that x86 from a local ROM which contains nothing but a fast JIT. As also mentioned in this thread, a good JIT can reach near-native speeds with code inlining.

This is quite a lot of work, in physical hardware, cards, HDL programming, JIT programming etc., but entirely doable. I imagine it will take as much effort as the Natami project did, if not more. Good luck.
 
 

Jorge Windmeisser Oliver

Posts 32
19 Feb 2011 15:45


Gunnar von Boehn wrote:

Jorge Windmeisser Oliver wrote:

  Hello All,
 
  last day I was thinking if it was possible to use a FPGA to translate assembler statements from 68000 code to, for example, i386 code.
   
  Would such a thing be possible? Or would data dependencies make this impossible? Or is it just a silly thought?
 

 
  I fail to see the benefit of this concept.
 
  This proposal sounds:
  A) definitely slower and less performance than doing the whole 68K.
  B) More expensive than doing the complete CPU in the FPGA
  C) A lot more work
 

Hi Gunnar,

The backend-frontend system has an inherent benefit: independence from the underlying technology. But like all abstraction layers, it doesn't come without a penalty.

You could do a card that has the master (the FPGA/translator/JIT) with a specific interface. Then do another card with the slave (the processor) that is attached to that interface on the master card. So the slave card could have any processor; the "only" thing that would change is the code in the FPGA, which would have to be adapted to the slave processor. I know it's far more complicated to design such a thing than I make it sound, and way less efficient than using the slave directly.

But for $50 you can get a pretty fast i386-family processor.

I'm not suggesting here to do this. Just exploring possibilities.

Cheers :)


Jorge Windmeisser Oliver

Posts 32
19 Feb 2011 16:07


Jakob Eriksson wrote:

Thank you Gunnar for your explanation, this is what wanted to say but could not find words. THERE is not 1 to 1 translation between 68k and x86 instructions!

Hi Jakob,

So one would need to work on blocks of code to gain any efficiency from a 68000 interface to an i386 processor; JIT territory, so to speak.

With a ColdFire you could implement the missing/extra instructions in the FPGA and wire the rest through? Would that also be very complicated?

Cheers :)

Sampo S.
Finland

Posts 12
19 Feb 2011 16:31


I wonder what the point is, other than if you had a laptop Natami and didn't have a PC laptop with you. I mean, most of us have some kind of PC, and this CPU wouldn't be much faster than the 68050 anyway.

IMO, if you need i386 support, then use your PC.

Gunnar von Boehn
Germany
(Moderator)
Posts 5775
19 Feb 2011 17:51


Jorge Windmeisser Oliver wrote:

With a coldfire you could implement the missing/extra instructions in the FPGA and wire the rest through? Would that also be very complicated?

As the 050/070 is faster than the ColdFire, where would the point be?

Wojtek P
Poland

Posts 1597
19 Feb 2011 22:49


Thomas Clarke wrote:

Wojtek, the thing I'm trying to establish is whether you can make a parallel JIT.
 
  If you can, then a FPGA has a decent chance of being part of a very fast x86-based, 68k-compatible Amiga accelerator. In fact the bottleneck would most likely be the CPU card interface rather than on the accelerator itself.

You could build FPGA hardware for JIT translation only, cooperating with software on the x86 CPU.
But I doubt it will be faster.


Wojtek P
Poland

Posts 1597
19 Feb 2011 22:50


Jorge Windmeisser Oliver wrote:

  Hi Wojtek,
 
  I don't know what gives you the authority to make such a bold statement.
 
  EXTERNAL LINK 


I never said it is not possible to connect an x86 processor to an FPGA.

Wojtek P
Poland

Posts 1597
19 Feb 2011 22:53


Gunnar von Boehn wrote:

  The next problem is that you need keep track of the translation.
  If 68K instruction are replaced / changed you need to destroy the translated code -and flush the caches of the x86 system.

Moreover, as I already said, there is no way even to detect such a case. The whole code change may happen in the CPU's L2 cache, so the FPGA will not know about it unless it implements full cache coherency control.
Even then it will still not work, as the translated 68k program will "see" the x86 instruction stream at the address of the original 68k code.
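
For comparison, this is roughly the bookkeeping a software JIT on the x86 side would need in order to drop stale translations when 68k code is overwritten. It is only a sketch with hypothetical names (`CodeMap`, `on_write`), and it assumes every write to 68k memory is reported to it, which is exactly what an external FPGA cannot rely on in the scenario above.

```python
class CodeMap:
    """Tracks which 68k address ranges have been translated."""

    def __init__(self):
        self._blocks = {}   # 68k start address -> (length in bytes, translated block)

    def add(self, start, length, translated):
        self._blocks[start] = (length, translated)

    def get(self, start):
        entry = self._blocks.get(start)
        return None if entry is None else entry[1]

    def on_write(self, address, size=1):
        """Drop every translated block overlapping the written bytes."""
        stale = [
            start
            for start, (length, _) in self._blocks.items()
            if start < address + size and address < start + length
        ]
        for start in stale:
            del self._blocks[start]
        return sorted(stale)


code_map = CodeMap()
code_map.add(0x1000, 16, "block A")
code_map.add(0x2000, 8, "block B")

print(code_map.on_write(0x100F))   # [4096]  (last byte of block A was overwritten)
print(code_map.get(0x1000))        # None
print(code_map.get(0x2000))        # block B
```

Keeping the translated code in a separate map, rather than at the original 68k addresses, also sidesteps the second objection: the 68k program still reads its own bytes when it looks at its code.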

