Why is In Order Execution so Popular?
Hi,
I've been digging around the architecture of the 360 and noticed the PowerPC chips inside have In Order Execution (IOE). After looking around some more, apparently the Wii and the PS3 also have IOE. Is this a coincidence?
The first Xbox had an Intel x86 processor, which had Out of Order Execution (OOE) and therefore branch prediction. After more investigation, the only advantage I could see for IOE over OOE is a simpler chip design. I would think something like 3 tier branch prediction would have benefits in a tight game loop. Why do all 3 game consoles have IOE? What benefits does IOE have over OOE? I can't seem to find any.
Thanks.
P.S.
Sorry if this isn't appropriate for the XNA Framework Forum. It did seem to fit here best of all the others.
By bagjuice at 2007-12-29
I'd guess that it's to keep chip costs down. I'm no CPU expert, but essentially all 3 consoles use PowerPC chips with varying speeds and features. I don't know if a desktop PPC (like in the old Apples) is "in order" or not (although Google probably knows), but extra functionality == extra transistors == extra cost. I'm sure that most of the time it's cheaper to ramp up the clock speed and rely on the additional speed to make up for the performance lost by having fewer features.
A quote from an "unnamed" developer in Edge a couple of years ago suggested that the (his words, not mine) 2GHz PPC in the 360 would often run the same code slower than the Intel CPU in the Xbox.
Now we all know that there are three 3.2GHz cores in the 360, which (even if the above is true) probably offsets the problem, but I'd guess that the quote still holds somewhat true: for certain workloads it will be slower (MHz for MHz), but with a combined 9.6GHz running at once, you can probably sleep safe knowing that the code will run pretty fast anyway :)
Neil
It is indeed to keep costs down: the smaller the die area, the greater the yield. Although in-order execution is a bit of a performance killer, careful compiler design and hand-written micro-optimisations (for the very hottest loops) can win back a great deal of that performance loss. This only works because, unlike desktop PCs, there is only one processor variant to design for (rather than all of the AMD and Intel CPUs or, if we are talking PowerPC, Motorola and IBM). It's a trade-off: console cost vs coding effort.
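To give a feel for what a hand-written micro-optimisation for an in-order core can look like, here is a minimal sketch in C. The function names are made up for illustration, and whether it actually helps depends on the compiler and the CPU, so treat it as an example of the idea rather than what any console SDK does. The naive loop forms one long dependency chain, so an in-order core has to wait for each add to finish before it can start the next one. Splitting the work across four independent accumulators gives the core something else to issue while an earlier add is still in flight.

```c
#include <stddef.h>

/* Straightforward sum: every add depends on the result of the previous
   one, so an in-order core cannot overlap them. */
float sum_naive(const float *v, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}

/* Hand-scheduled version (illustrative): four independent accumulators,
   so consecutive adds do not depend on each other. */
float sum_unrolled(const float *v, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    /* Handle the 0 to 3 leftover elements. */
    for (; i < n; ++i) {
        s0 += v[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Note that the second version adds the floats in a different order, so for general input the result can differ slightly in the last bits. That is also why a compiler will usually not make this change on its own unless you allow it to relax floating point rules.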
That's right. There's nothing inherently wrong with in-order execution per se. Out-of-order execution is an advantage on x86 architectures simply because there are so many different x86 implementations that the compiler cannot possibly know beforehand how many integer, floating point, branch, fetch, store, etc. units will be available; only the CPU itself can know how to optimally order instructions to take advantage of as much of the hardware as possible.
With a compiler specifically targeting the Xenon CPU, all this information is known already, and so the burden of ordering instructions optimally is placed on the compiler. This was actually one of the design tenets of the original RISC idea, and more recently of VLIW architectures like the Itanium.
In-order execution does not preclude branch prediction either; there's just a larger penalty to be paid on a failed prediction.
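Because a failed prediction is expensive, one common trick for hard-to-predict branches in hot code is to remove the branch altogether. The sketch below (function names are mine, purely for illustration) shows a branchy max next to a version that picks the result with a bit mask instead. A good compiler may already generate branch-free code for the first version, so this is only worth doing by hand after checking the generated code or profiling.

```c
#include <stdint.h>

/* Branchy version: if the comparison is hard to predict, every
   misprediction costs a pipeline flush. */
int32_t max_branchy(int32_t a, int32_t b) {
    if (a > b) {
        return a;
    }
    return b;
}

/* Branch-free version: build a mask that is all ones when a > b and
   all zeros otherwise, then use it to select a or b. */
int32_t max_branchless(int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(a > b);
    return (a & mask) | (b & ~mask);
}
```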
The benefit of the simpler, in-order cores is that they are smaller. OOO execution accounts for a significant number of transistors in a modern CPU like the Pentium 4 or PPC970. Checking some numbers quickly, it seems that the Xenon CPU, with its 3 64-bit cores and 1MB cache, is only about 30% larger than a P4 Prescott CPU with only 1 32-bit core and 512KB cache. More modern CPUs are even larger. The Xenon is only 1/3 the size of a Pentium D 900 and about 45% the size of a Core 2 Duo with 2MB cache.
The low number of transistors leads to smaller dies at a given process, and the smaller the die, the less silicon goes to waste when a defect occurs in the manufacturing process. Combined with process shrinks (the 130nm to 90nm transition cuts die size roughly in half), this makes the chips cheaper to produce over the lifetime of the console.
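The "roughly in half" figure follows from simple geometry: if every linear feature shrinks by 90/130, the area shrinks by that ratio squared, which is about 0.48. The sketch below works that out, along with the share of a wafer that is lost when one defect kills one die. The function names and the die and wafer areas used in the usage note are made-up illustration values, not real Xenon figures, and real shrinks rarely scale this cleanly.

```c
/* Ideal area scale factor when every linear dimension shrinks from
   old_nm to new_nm. */
double shrink_area_factor(double old_nm, double new_nm) {
    double linear = new_nm / old_nm;
    return linear * linear;
}

/* Fraction of the wafer's area thrown away when a single defect ruins
   one die: a bigger die means more silicon lost per defect. */
double wafer_fraction_lost_per_defect(double die_mm2, double wafer_mm2) {
    return die_mm2 / wafer_mm2;
}
```

For example, shrink_area_factor(130.0, 90.0) comes out at about 0.479. With a hypothetical 70000 mm2 of wafer area, a 200 mm2 die loses twice as much of the wafer per defect as a 100 mm2 die does.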
Ultimately, the decision to go with in-order execution was due to the fact that it offered a "bang for the buck" (!/$) ratio that OOO cores didn't offer on terms acceptable to Microsoft.