
Welcome to the Natami / Amiga Forum

This forum is for AMIGA fans interested in the new NATAMI platform.
Please read the forum usage manual.



Welcome to the Natami lounge.
Meet new AMIGA friends here and enjoy having a friendly chit chat.

Even Without Optic Computing... 1 Second NatAmi?
Thierry Atheist
Canada

Posts 1828
02 May 2012 13:20


EXTERNAL LINK 
Now THAT would make for a NatAmi that beats the four core i7's!!!! (Unless they're made in the same process.... but our OS is LIGHT YEARS ahead!)

Chuck T
USA

Posts 673
02 May 2012 13:28


"... but graphene is still a long way away from being used in current silicon processes."



Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
03 May 2012 08:40


Chuck T wrote:

"... but graphene is still a long way away from being used in current silicon processes."

Besides, just clocking the processor that high still leaves the problem of fetching data fast enough - a problem already existing today, and the reason why the x86 have such huge caches. Memory bandwidth cannot cope with the operating speeds of the core by simple physical constraints. The connection from the core to the memory is basically nothing but an antenna for the high-frequency signals.

Besides, just to mention another limitation: with a clock speed of 3 GHz, getting a signal from one end of the die to the other already takes more than one clock cycle. That was the reason for the unfortunate "NetBurst" architecture.


Nixus Minimax
Germany

Posts 272
03 May 2012 09:34


Thomas Richter wrote:
With a clock speed of 3 GHz, getting a signal from one end of the die to the other already takes more than one clock cycle. That was the reason for the unfortunate "NetBurst" architecture.

I don't think that netburst had anything to do with the problem of clock distribution. Besides, even though today's x86 dice are usually greater than 1 cm (~wavelength of a 3GHz clock signal), each core only covers a minor part of the die.


Megol .

Posts 672
03 May 2012 12:34


Thomas Richter wrote:

Chuck T wrote:

  "... but graphene is still a long way away from being used in current silicon processes."
 

 
  Besides, just clocking the processor that high still leaves the problem of fetching data fast enough - a problem already existing today, and the reason why the x86 have such huge caches. Memory bandwidth cannot cope with the operating speeds of the core by simple physical constraints. The connection from the core to the memory is basically nothing but an antenna for the high-frequency signals.

Just for clarity: this is the so-called Von Neumann bottleneck, and it is something all high-performance processors have to compensate for with large amounts of multi-level cache memory. Just look at the IBM Power series.

  Besides, just to mention another limitation: with a clock speed of 3 GHz, getting a signal from one end of the die to the other already takes more than one clock cycle. That was the reason for the unfortunate "NetBurst" architecture.

And that's what some call the real Von Neumann bottleneck. Can't outrun the speed of light. :)
While NetBurst was clearly designed for high clock rates, with dedicated drive stages in the pipeline (just to allow signals to propagate), the real problems with it had nothing to do with the speed of light. The instruction scheduling was speculative (which isn't a big problem in itself), but it was combined with a very inefficient mechanism to retry failed speculations.

Wojtek P
Poland

Posts 1597
04 May 2012 18:59


Thomas Richter wrote:

Chuck T wrote:

  "... but graphene is still a long way away from being used in current silicon processes."
 

 
  Besides, just clocking the processor that high still leaves the problem of fetching data fast enough - a problem already existing today, and the reason why the x86 have such huge caches. Memory

With real software doing complex things, code that isn't a tight loop and doesn't get almost all cache hits, ALL of today's CPUs are slow.

In most server workloads x86 CPUs usually reach an IPC of 0.2-0.3, instead of the four they are capable of in theory.

Memory latency doesn't account for most of this slowdown (large L2 caches are quite effective); branch misprediction does.

As I already explained a long time ago, no branch predictor can properly predict outcomes that depend on input data, which is the common case.

Example C program:

char *instr, t1;  /* instr must point at a NUL-terminated string */

while ((t1 = *(instr++)))
  switch (t1) {
  case 'a':
    /* do_something ... */
    break;
  case 'b':
    /* do_something_else ... */
    break;
  default:
    /* do_default_action ... */
    break;
  }

Unless the string is all 'a', all 'b', or contains neither 'a' nor 'b', this will execute slowly.

In every other case it will execute FAR SLOWER on a 2 GHz x86 CPU than on a simple 200 MHz MIPS RISC, the latter having 1/100 of the transistor complexity and using milliwatts of power.

This is because of enormously long pipelines and a huge branch misprediction penalty: stalls of as much as 20 cycles are produced.

Recently I tried to understand why a FreeBSD system call takes around 3000 cycles even if it is reading 1 byte from a file, and even if done in a loop so it is fully cached.

Actually, fewer than 500 opcodes are executed.
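
For anyone who wants to try the loop above on their own machine, here is a minimal self-contained sketch. The counters stand in for the elided do_something bodies, and struct tally, dispatch and fill_mixed are hypothetical names that do not appear in the post. How much slower the mixed input runs is not claimed here: the compiler may turn the switch into a jump table or into branch-free code, and the outcome depends on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts produced by one pass over the input (stand-ins for the
 * elided do_something / do_something_else / do_default_action). */
struct tally {
    size_t a, b, other;
};

/* Same shape as the loop in the post: one data-dependent switch per byte. */
struct tally dispatch(const char *instr)
{
    struct tally t = {0, 0, 0};
    char t1;

    assert(instr != NULL);
    while ((t1 = *instr++) != '\0') {
        switch (t1) {
        case 'a': t.a++;     break;
        case 'b': t.b++;     break;
        default:  t.other++; break;
        }
    }
    return t;
}

/* Fill buf with n pseudo-random characters from {a, b, c} using a
 * xorshift32 generator, then NUL-terminate. buf must hold n + 1 bytes.
 * Hypothetical helper: it models input the predictor cannot guess. */
void fill_mixed(char *buf, size_t n, uint32_t seed)
{
    static const char alphabet[3] = {'a', 'b', 'c'};
    uint32_t x = seed ? seed : 1u;   /* xorshift state must be non-zero */

    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        buf[i] = alphabet[x % 3u];
    }
    buf[n] = '\0';
}
```

Calling dispatch once on a buffer of all 'a' and once on a buffer from fill_mixed, each wrapped in a timer of your choice, gives the predictable and the unpredictable case side by side.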

Wojtek P
Poland

Posts 1597
04 May 2012 19:01


Nixus Minimax wrote:

Thomas Richter wrote:
With a clock speed of 3 GHz, getting a signal from one end of the die to the other already takes more than one clock cycle. That was the reason for the unfortunate "NetBurst" architecture.

 
  I don't think that netburst had anything to do with the problem of

Pentium IV took pipeline length to an extreme, something like 40 stages or so, producing a similarly large branch misprediction penalty.

Pentium IV is great for MP3 or video encoding, and nothing else.


Wojtek P
Poland

Posts 1597
04 May 2012 19:07


Nixus Minimax wrote:

Thomas Richter wrote:
With a clock speed of 3 GHz, getting a signal from one end of the die to the other already takes more than one clock cycle. That was the reason for the unfortunate "NetBurst" architecture.

 
  I don't think that netburst had anything to do with the problem of clock distribution. Besides, even though today's x86 dice are usually greater than 1 cm (~wavelength of a 3GHz clock signal), each core only covers a minor part of the die.
 

The wavelength at 3 GHz is 10 cm. Even accounting for the reduced speed of light over copper interconnect (still more than half of vacuum speed) and the worst-case connection (1.41 cm from corner to corner), that is still below 1/3 of the wavelength.

As you mentioned, a core doesn't take up the whole die, though you exaggerated.

x86 cores are huge and take considerable space.

Long wires create a capacitance problem: when you go over some minimal length (like 1 mm on modern ICs), wire resistance plus capacitance becomes a problem, and the time needed to fully charge or discharge the wire grows as length^2.

When long connections are needed, designers put repeaters every few mm, which, in the case of huge buses, increases power.

On modern ICs there are often 1-2 CPU cycles dedicated just to flight time between the core and the L2 cache.
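
The length^2 growth and the effect of repeaters can be sketched with the usual distributed-RC (Elmore) estimate, t = 0.5 * r * c * L^2. This is a simplified model, not a statement about any particular process: the function names are hypothetical, repeaters are treated as ideal stages with a fixed delay, and any per-length resistance and capacitance you plug in are your own assumptions.

```c
#include <assert.h>

/* Delay of an unrepeated wire under the distributed-RC (Elmore) estimate.
 * r_per_m in ohm/m, c_per_m in F/m, length_m in m; result in seconds. */
double wire_delay_s(double r_per_m, double c_per_m, double length_m)
{
    assert(r_per_m >= 0.0 && c_per_m >= 0.0 && length_m >= 0.0);
    return 0.5 * r_per_m * c_per_m * length_m * length_m;
}

/* The same wire cut into `segments` equal pieces, each driven by an
 * idealised repeater that adds repeater_delay_s. Because every piece is
 * short, the total grows roughly linearly with length instead of as L^2. */
double repeated_wire_delay_s(double r_per_m, double c_per_m, double length_m,
                             int segments, double repeater_delay_s)
{
    assert(segments >= 1 && repeater_delay_s >= 0.0);
    double seg = length_m / segments;
    return segments * (wire_delay_s(r_per_m, c_per_m, seg) + repeater_delay_s);
}
```

Doubling length_m in wire_delay_s quadruples the result, while splitting the same wire into two repeated segments halves the wire part of the delay at the cost of two repeater delays (and the power those repeaters burn).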

Wojtek P
Poland

Posts 1597
04 May 2012 19:09


Thierry Atheist wrote:

EXTERNAL LINK 
  Now THAT would make for a NatAmi that beats the four core i7's!!!! (Unless they're made in the same process.... but our OS is LIGHT YEARS ahead!)

Amiga OS is just a system written properly. The question is whether good software would exist and be written for it, as the OS itself is not enough.

Porting modern crap like OpenOffice is just a bad idea.

CPU speed is really a secondary issue.

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
04 May 2012 21:13


Nixus Minimax wrote:

  I don't think that netburst had anything to do with the problem of clock distribution. Besides, even though today's x86 dice are usually greater than 1 cm (~wavelength of a 3GHz clock signal), each core only covers a minor part of the die.

Let's compute a little bit. Speed of light in vacuum: 3*10^8 m/s; clock frequency: 3*10^9 Hz; duration of one clock: (1/3)*10^-9 s. Distance an electric signal in vacuum travels during that period: 3*10^8 m/s * (1/3)*10^-9 s = 0.1 m = 10 cm.

Now, speed of light in silicon is only 60% of that in vacuum, plus you need a bit more precision than just a single cycle to synchronize, plus paths of the lines on the chip are not exactly straight: Basically, we *are* at the edge of the physical possibilities.

IOW, yes, I believe this is indeed a problem. On today's processors, you are already off by approximately one clock to half a clock if the signal has to travel from one end to the other.
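
The arithmetic above is easy to redo for other clock rates. A minimal sketch, with hypothetical function names: velocity_factor is the fraction of c at which the signal travels (1.0 for vacuum, 0.6 being the figure used in this post). It covers pure propagation only; routing detours and the RC charging time of the wires are not modelled.

```c
#include <assert.h>

#define SPEED_OF_LIGHT_M_PER_S 299792458.0

/* Distance in metres that a signal covers during one clock period. */
double distance_per_clock_m(double clock_hz, double velocity_factor)
{
    assert(clock_hz > 0.0);
    assert(velocity_factor > 0.0 && velocity_factor <= 1.0);
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor / clock_hz;
}

/* Number of clock periods needed to cross a straight path of path_m metres. */
double clocks_to_cross(double path_m, double clock_hz, double velocity_factor)
{
    assert(path_m >= 0.0);
    return path_m / distance_per_clock_m(clock_hz, velocity_factor);
}
```

distance_per_clock_m(3e9, 1.0) gives roughly 0.1 m, matching the 10 cm computed by hand, and clocks_to_cross lets you plug in whatever path length and velocity factor you consider realistic.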



Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
04 May 2012 21:14


Wojtek P wrote:

  Amiga OS is just a system written properly.

No.

SID Hervé
France

Posts 663
04 May 2012 22:03


Is there an operating system which has been or is written correctly?

Marcel Verdaasdonk
Netherlands

Posts 3975
04 May 2012 23:39


SID Hervé wrote:

Is there an operating system which has been or is written correctly?

Not that I am aware of; each has its flaws.

Børge Nøst
Norway

Posts 53
05 May 2012 02:10


SID Hervé wrote:

Is there an operating system which has been or is written correctly?

We are probably entering the taste realm here.

Personally I'm interested in microkernels and SASOS. But then you have issues like the OS being a complete ecosystem and not missing any parts, or how you construct your screendrawing (networkable or not).
I consider something like datatypes part of the OS, but for Linux there is a vast divide between the kernel, libraries, and the visual system. There is no-one to enforce any standard. Apple dictates their entire ecosystem, but I don't know if any other unix or X11 based system could repeat that.

SID Hervé
France

Posts 663
05 May 2012 09:20


Almost all operating systems are or were inspired by UNIX. And this may be the problem.

Some (e.g. DragonFly, with inter-process communication via message passing as in the Amiga kernel) try other approaches.

Another problem is the startup delay. According to a discussion topic, the OS libs could be one source of inspiration.

As for Apple's OS, even if it gets repainted every year, it is a child of UNIX.

Nixus Minimax
Germany

Posts 272
05 May 2012 09:55


Wojtek P wrote:
In every other case it will execute FAR SLOWER on a 2 GHz x86 CPU than on a simple 200 MHz MIPS RISC, the latter having 1/100 of the transistor complexity and using milliwatts of power.
 
  This is because of enormously long pipelines and a huge branch misprediction penalty: stalls of as much as 20 cycles are produced.

Well, the 20 cycle stalls in the 2 GHz CPU are equivalent to only 2 cycles in the 200 MHz CPU...
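
The conversion behind that remark, as a small sketch (function names are hypothetical): a stall is turned into wall-clock time at one clock rate and then re-expressed in cycles of another.

```c
#include <assert.h>

/* Wall-clock duration of a stall, in nanoseconds. */
double stall_ns(double stall_cycles, double clock_hz)
{
    assert(stall_cycles >= 0.0 && clock_hz > 0.0);
    return stall_cycles / clock_hz * 1e9;
}

/* The same stall expressed in cycles of a CPU running at to_hz. */
double equivalent_cycles(double stall_cycles, double from_hz, double to_hz)
{
    assert(to_hz > 0.0);
    return stall_ns(stall_cycles, from_hz) * 1e-9 * to_hz;
}
```

stall_ns(20, 2e9) is 10 ns, and equivalent_cycles(20, 2e9, 200e6) comes out at 2 cycles, which is the figure given above.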

Nixus Minimax
Germany

Posts 272
05 May 2012 09:57


Wojtek P wrote:
3GHz wavelength is 10cm

Oh, yes, of course. I did microchip design at clock rates quite a bit above 3 GHz where wavelengths were in the cm range so I somehow mixed the numbers.


Nixus Minimax
Germany

Posts 272
05 May 2012 10:00


Thomas Richter wrote:
IOW, yes, I believe this is indeed a problem. On today's processors, you are already off by approximately one clock to half a clock if the signal has to travel from one end to the other.

That's why the cores are usually divided into several local clock domains and clock is distributed along with data flow which cancels out most of the latency. Also, they put dozens of PLLs and DLLs in different parts of the chips to cancel out clock skew and to regenerate the clock signal.


Megol .

Posts 672
05 May 2012 11:26


Wojtek P wrote:

Thomas Richter wrote:

 
Chuck T wrote:

  "... but graphene is still a long way away from being used in current silicon processes."
 

 
  Besides, just clocking the processor that high still leaves the problem of fetching data fast enough - a problem already existing today, and the reason why the x86 have such huge caches. Memory
 

  With real software doing complex things, code that isn't a tight loop and doesn't get almost all cache hits, ALL of today's CPUs are slow.
 
  In most server workloads x86 CPUs usually reach an IPC of 0.2-0.3, instead of the four they are capable of in theory.

Current processors can execute 5+ instructions per clock. More if converted to RISCy terms.

  Memory latency doesn't account for most of this slowdown (large L2 caches are quite effective); branch misprediction does.
 
  As I already explained a long time ago, no branch predictor can properly predict outcomes that depend on input data, which is the common case.

Then you are provably, absolutely, completely wrong. The majority of input data is predictable to a very high degree. What do you think branch prediction would be used for if it wasn't?

<removed example>
  Unless the string is all 'a', all 'b', or contains neither 'a' nor 'b', this will execute slowly.
 
  In every other case it will execute FAR SLOWER on a 2 GHz x86 CPU than on a simple 200 MHz MIPS RISC, the latter having 1/100 of the transistor complexity and using milliwatts of power.

Yeah right. Even with fully random input data the branch prediction would be ~50% correct. Branch misprediction penalty for Intel Sandy Bridge is ~15 clocks when fetching from the µop cache. Work it out.
 
  This is because of enormously long pipelines and a huge branch misprediction penalty: stalls of as much as 20 cycles are produced.
 
  Recently I tried to understand why a FreeBSD system call takes around 3000 cycles even if it is reading 1 byte from a file, and even if done in a loop so it is fully cached.
 
  Actually, fewer than 500 opcodes are executed.

The common case isn't reading 1 byte so a smart coder would optimize for larger reads. I guess you wouldn't?
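
One way to "work it out" is a simple additive stall model: CPI = base CPI + branch fraction * misprediction rate * penalty. This is a back-of-envelope sketch, not a description of how any real core behaves. The function names are hypothetical, the 15-clock penalty and the ~50% rate are the figures from this post, and the base CPI and the share of instructions that are branches are assumptions you have to supply yourself.

```c
#include <assert.h>

/* Average cycles per instruction when each mispredicted branch adds
 * penalty_cycles on top of base_cpi. branch_fraction is the share of
 * instructions that are branches; mispredict_rate is the share of those
 * branches that are predicted wrongly. */
double effective_cpi(double base_cpi, double branch_fraction,
                     double mispredict_rate, double penalty_cycles)
{
    assert(base_cpi > 0.0 && penalty_cycles >= 0.0);
    assert(branch_fraction >= 0.0 && branch_fraction <= 1.0);
    assert(mispredict_rate >= 0.0 && mispredict_rate <= 1.0);
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles;
}

/* Instructions retired per second at a given clock rate and CPI. */
double instructions_per_second(double clock_hz, double cpi)
{
    assert(clock_hz > 0.0 && cpi > 0.0);
    return clock_hz / cpi;
}
```

With an assumed base CPI of 0.25 (four instructions per clock) and an assumed branch fraction of 0.2, a 50% misprediction rate at 15 clocks gives effective_cpi(0.25, 0.2, 0.5, 15.0) = 1.75. Feeding that and a competing CPU's numbers into instructions_per_second lets each side of the argument check its own figures.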

Thomas Richter
Germany
(MX-Board Owner)
Posts 1425
05 May 2012 13:39


Børge Nøst wrote:

 
SID Hervé wrote:

  Is there an operating system which has been or is written correctly?
 

  We are probably entering the taste realm here.
 

  To some degree of course, but look, the main job of an operating system is to keep the system operating - thus the name. And I'm afraid AmigaOs fails miserably in this respect. Of course, this is because back then anything like resource management and memory protection was off-limits, but this doesn't make the system better. It only explains why it became what it was.
 
 
Børge Nøst wrote:

  Personally I'm interested in microkernels and SASOS. But then you have issues like the OS being a complete ecosystem and not missing any parts, or how you construct your screendrawing (networkable or not).
  I consider something like datatypes part of the OS, but for Linux there is a vast divide between the kernel, libraries, and the visual system. There is no-one to enforce any standard. Apple dictates their entire ecosystem, but I don't know if any other unix or X11 based system could repeat that.
 

 
  Linux has problems, too - the kernel is too large, and not exactly modular, and there is no clean interface you could ever program against to implement a device driver or a filing system. Linux changes from release to release, and kernel hackers prefer to re-invent the wheel for every release or so. No wonder it is not exactly widespread. Either a company spends man-hours to keep their interfaces updated from release to release, or the next minor Linux update would break their software. Bad idea!
 
  The Unix interface layer to the user space is "dusty" (why are there four versions of wait(), just to mention one oddity?), and Linux invents all sorts of clever mechanisms just to fit somehow into this old picture: udev, device-manager, network-manager... Why do I need to run a desktop environment to be able to use remote media in a comfortable way? It is a complete misconception. Why is "everything a device", but there is no "/dev/eth0" for the network? Makes no sense.
 
  Apple probably understood this problem better, though I only know the old 68K MacOs - which was also terribly broken. I do not know much about Windows (and I probably don't even want to).
 
  Yet, all these systems do what AmigaOs never did: keep the system running. In most situations AmigaOs would require a reboot, and you couldn't be sure that saving the text right now wouldn't ruin the disk structure because a faulty program had written into the internals of the filing system...
 
 
