Reverse engineering the 386 processor's prefetch queue circuitry (righto.com)
164 points by todsacerdoti 1 day ago | 49 comments
kens 1 day ago [-]
Author here. I hope you're not tired of the 386... Let me know if you have any questions.
sitkack 1 day ago [-]
I'll never tire of any analysis you do. But if you are taking requests, I'd love two chips.

The AMD 29000 series, a RISC chip with many architectural advances that eventually morphed into the K5.

And the Inmos Transputer, a Forth-like chip with built-in scheduling and networking, designed to be networked together into large systems.

https://en.wikipedia.org/wiki/AMD_Am29000

https://en.wikipedia.org/wiki/Transputer

kens 1 day ago [-]
Those would be interesting chips to examine, if I ever get through my current projects :-)
Zeetah 1 day ago [-]
If you are doing requests, I'd love to see the M68k series analyzed.
moosedev 1 day ago [-]
Another vote for the 68000 series :)
dboreham 20 hours ago [-]
I have some Transputer die and die plots if you ever need those.
kragen 20 hours ago [-]
Have you thought about doing these yourself?
sitkack 3 hours ago [-]
Great question. I have and I should.

This would make concrete, and bring coherence to, the grab bag of skills and experience I have. Though I think it would be 10x as much in a small group setting. It is like trying to recover the source code of a binary where you don't even know the source language.

https://www.youtube.com/watch?v=M3nFcTpAwoM&list=PLUg3wIOWD8...

rogerbinns 5 hours ago [-]
I'd love to see an analysis of byte ordering impact on CPU implementation. Does little vs big endian make any difference to the complexity of the algorithms and circuits?
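For anyone wanting a concrete picture of what the two orderings mean at the byte level (the circuit-complexity question above is a separate matter), a minimal Python sketch:

```python
import struct

value = 0x12345678

# Little-endian: least significant byte stored first (x86 convention).
le = struct.pack("<I", value)

# Big-endian: most significant byte stored first (network order, classic 68k).
be = struct.pack(">I", value)

print(le.hex())  # 78563412
print(be.hex())  # 12345678
```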
sitkack 1 day ago [-]
At what number of layers is it difficult to reverse engineer a processor from die photos? I would think at some point, functionality would be too obscured to be able to understand the internal operation.

Do they ever put a solid metal top layer?

kens 1 day ago [-]
I've been able to handle the Pentium with 3 metal layers. The trick is that I can remove metal layers to see what is underneath, either chemically or with sanding. Shrinking feature size is a bigger problem since an optical microscope only goes down to about 800 nm.

I haven't seen any chips with a solid metal top layer, since that wouldn't be very useful. Some chips have thick power and ground distribution on the top layer, so the top is essentially solid. Secure chips often cover the top layer with a wire that goes back and forth, so the wire will break if you try to get underneath for probing.

bgnn 1 day ago [-]
Interesting! What is the reason for the 800 nm limit? I have successfully photographed my own designs down to 130 nm with optical microscopes, though not with metal layer removal. The resolution isn't perfect but features were clearly visible.
HappMacDonald 15 hours ago [-]
The first thing I thought was that he was referring to the wavelength of visible light, which is generally between 400 and 800 nm IIRC. I take it your 130 nm optical microscopes are imaging using ultraviolet? Regardless, let's just get this man a scanning tunneling microscope already. :D
anyfoo 1 day ago [-]
Never, the 386 is way too important.
leeter 21 hours ago [-]
So Epictronics recently looked at the 386SX, the version with the 16-bit external bus, which was slower than the 286 at the same clock. What changed between that and this? Was the major difference the double clock hit on fetch? Or did it have a shorter prefetch queue as well, like the 8088?
adrian_b 13 hours ago [-]
The 386SX was slower than a 286 at the same clock only for legacy 16-bit programs, and only for 16-bit programs that did not use a floating-point coprocessor, as the 80387 coprocessors available for the 386SX were much faster at the same clock frequency than the 80287 available for the 286.

Moreover, there was only a small time interval when the 286 and 386SX overlapped in clock frequency. In later years the 286 could be found only at 12 MHz or 16 MHz, while the 386SX was available at 25 MHz or 33 MHz, so the 386SX was noticeably faster at running any program.

Rewriting or recompiling a program as a 32-bit executable could gain a lot of performance, but it is true that in the early years of 386DX and 386SX most users were still using 16-bit MS-DOS applications.

neuroelectron 1 day ago [-]
Ok, now do 486.
kens 1 day ago [-]
I'm not as interested in the 486; I went straight to the Pentium: https://www.righto.com/2025/03/pentium-multiplier-adder-reve...
guerrilla 1 day ago [-]
I totally agree with your methodology. Stick to the classic leaps.
neuroelectron 1 day ago [-]
Fair enough. But why?
kens 1 day ago [-]
Because I saw a Navajo weaving of a Pentium and wanted to compare the weaving to the real chip: https://www.righto.com/2024/08/pentium-navajo-fairchild-ship...
neuroelectron 1 day ago [-]
I was only joking but I'm glad you have decided to take it seriously.
specialist 20 hours ago [-]
That was great. Thank you.

Too bad (for the Navajo Nation) about the armed standoff and its aftermath.

bananaboy 21 hours ago [-]
Never!
myself248 1 day ago [-]
I remember reading about naive circuits like ripple-carry, where a signal has to propagate across the whole width of a register before it's valid. These seem like they'd only work in systems with very slow clocks relative to the logic itself.

In this writeup, something that jumps out at me is the use of the equality bus, and Manchester carry chain, and I'm sure there are more similar tricks to do things quickly.

When did the transition happen? Or were the shortcuts always used, and the naive implementations exist only in textbooks?
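For illustration, here is a toy Python model of the ripple-carry behavior described above (a sketch, not any real chip's circuit): each loop iteration corresponds to one full-adder stage, so in hardware the carry must traverse all `width` stages before the top sum bit is valid, and worst-case delay grows linearly with register width.

```python
def ripple_carry_add(a, b, width=8):
    """Bit-by-bit ripple-carry addition. Each iteration models one
    full-adder stage; the carry computed here feeds the next stage,
    which is exactly the serial dependency that limits clock speed."""
    carry, result = 0, 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i          # sum bit for this stage
        carry = (ai & bi) | (carry & (ai ^ bi))   # carry out to next stage
    return result  # masked to `width` bits by construction

# 0xFF + 0x01 forces the carry to ripple through all 8 stages.
print(ripple_carry_add(0xFF, 0x01))  # 0
```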

kragen 20 hours ago [-]
As I understand it, you can use slower carry propagation techniques in parts of a design that aren't on the timing critical path. Speeding up logic that isn't on the critical path won't speed up your circuit; it just wastes space and power.

Clock dividers (for example, for PLLs and for generating sampling clocks) commonly use simple ripple carry because nobody is looking at multiple bits at a time.

kens 1 day ago [-]
Well, the Manchester carry chain dates back to 1959. Even the 6502 uses carry skip to increment the PC. As word sizes became larger and transistors became cheaper, implementations became more complex and optimized. And mainframes have been using these tricks forever.
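As a rough sketch of the carry-skip idea (a toy Python model, not the 6502's or 386's actual logic): carries ripple normally within a small group, but when every bit position in a group propagates, the group's carry-in can bypass the ripple chain entirely.

```python
def carry_skip_add(a, b, width=16, group=4):
    """Toy carry-skip adder. Within each `group`-bit slice the carry
    ripples, but if every bit in the slice propagates (a XOR b is all
    ones), the carry-out must equal the carry-in, so hardware can
    forward it directly and skip the slice's ripple delay."""
    mask = (1 << group) - 1
    carry, result = 0, 0
    for g in range(0, width, group):
        ag = (a >> g) & mask
        bg = (b >> g) & mask
        s = ag + bg + carry
        result |= (s & mask) << g
        if (ag ^ bg) == mask:
            carry = carry        # skip path: carry-in passes straight through
        else:
            carry = s >> group   # slow path: wait for the group's ripple
    return result

print(carry_skip_add(0xFFFF, 0x0001))  # 0
```

In this software model both paths compute the same carry, which is exactly why the hardware skip is safe: it changes timing, not the result.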
gitroom 19 hours ago [-]
Man, reading all this makes me wanna bust out some old manuals and mess with assembly again. Those jumps in chip power back then always blew my mind.
yukIttEft 1 day ago [-]
When are you going to implement the first electron-level 386 emulator?
siliconunit 1 day ago [-]
Very nice analysis! Personally I'm a DEC Alpha fan... but I guess that's too big an endeavor... (or maybe a selected portion?)
kens 1 day ago [-]
So many chips, so little time :-)
RetroTechie 1 day ago [-]
May I suggest a video chip? The Yamaha V9958.

I hope some day the tedious part of what you do can be automated (AI?), so that you (or others) can spend time on whatever aspect is most interesting, vs. all the grunt work needed to get to the point where you understand what you're looking at.

Btw, any 4-bit CPUs/uCs in your collection? Back in the day I had a small databook (OKI, early '90s IIRC) that had a bunch of those. These seem to have sort of disappeared (e.g. I never saw a PDF of that particular databook on sites like Bitsavers).

rasz 20 hours ago [-]
https://www.twitch.tv/tubetimeus is currently reversing IBM MCGA chip down to gate level diagram.
andrewf 18 hours ago [-]
This is the same fellow who reverse engineered and cloned the Sound Blaster. https://blog.adafruit.com/2019/02/05/the-snark-barker-a-soun...
rasz 17 hours ago [-]
and inspired me to reverse some random vintage RAM crap

https://github.com/raszpl/FIC-486-GAC-2-Cache-Module

https://github.com/raszpl/386RC-16

andrewstuart 10 hours ago [-]
The 386 was so cool when it came out; it was an absolute powerhouse.

Having said that, there was no operating system for it. All that 32-bit power just got used for faster DOS and sometimes Concurrent DOS.

It’s weird to think how long it took for the operating systems to be developed for it.

lizknope 8 hours ago [-]
Microsoft and IBM were developing OS/2 together. There were a lot of disagreements between the two companies. IBM wanted to keep supporting the 286.

https://en.wikipedia.org/wiki/OS/2#1990:_Breakup

> OS/2 1.x targets the Intel 80286 processor and DOS fundamentally does not. IBM insisted on supporting the 80286 processor, with its 16-bit segmented memory mode, because of commitments made to customers who had purchased many 80286-based PS/2s as a result of IBM's promises surrounding OS/2.[30] Until release 2.0 in April 1992, OS/2 ran in 16-bit protected mode and therefore could not benefit from the Intel 80386's much simpler 32-bit flat memory model and virtual 8086 mode features. This was especially painful in providing support for DOS applications. While, in 1988, Windows/386 2.1 could run several cooperatively multitasked DOS applications, including expanded memory (EMS) emulation, OS/2 1.3, released in 1991, was still limited to one 640 kB "DOS box".

> Given these issues, Microsoft started to work in parallel on a version of Windows which was more future-oriented and more portable. The hiring of Dave Cutler, former VAX/VMS architect, in 1988 created an immediate competition with the OS/2 team, as Cutler did not think much of the OS/2 technology and wanted to build on his work on the MICA project at Digital rather than creating a "DOS plus". His NT OS/2 was a completely new architecture.[31]

DOS extenders had started appearing in the 1980s, but they weren't a real OS; then again, I would barely call DOS an OS either.

https://en.wikipedia.org/wiki/DOS_extender

But Unix was ported to the 386 in 1987.

https://en.wikipedia.org/wiki/Xenix#Transfer_of_ownership_to...

> In 1987, SCO ported Xenix to the 386 processor, a 32-bit chip, after securing knowledge from Microsoft insiders that Microsoft was no longer developing Xenix.[41] Xenix System V Release 2.3.1 introduced support for i386, SCSI and TCP/IP. SCO's Xenix System V/386 was the first 32-bit operating system available on the market for the x86 CPU architecture.

https://en.wikipedia.org/wiki/Xenix

I had friends running Linux from the very beginning in 1991.

lysace 1 day ago [-]
I miss those dramatic performance leaps in the 80s. 10x in 5 years, give or take.

Now we get like 2x in a decade (single core).

forinti 9 hours ago [-]
In the 90s, every time you got a new computer it would have at least twice the RAM, sometimes 4x.
rasz 1 day ago [-]
There was no performance improvement clock-for-clock between the 286 and 386 when running contemporary 16-bit code: https://www.vogons.org/viewtopic.php?t=46350
vnorilo 1 day ago [-]
I wrote blitters in assembly back in those days for my teenage hobby games. When I could actually target the 386 with its dword moves, it felt blisteringly fast. Maybe the 386 didn't run 286 code much faster, but I recall the chip being one of the most mind-blowing target machine upgrades I experienced. Much later I recall the FPU-supported quadword copy on the 486DX, and of course P6 meeting MMX in the Pentium II. Good times.
magicalhippo 6 hours ago [-]
I was programming in Turbo Pascal at the time, which was still 16-bit. But when I upgraded my 286 to a Cyrix 486, on a 386 motherboard[1], I could utilize the full 32-bit registers by prefixing assembly instructions with 0x66 using db[2].

This was a huge boost for a lot of my 3D rendering code, despite the prefix not being free compared to pure 32-bit mode.

[1]: https://en.wikipedia.org/wiki/Cyrix_Cx486DLC

[2]: http://www.c-jump.com/CIS77/ASM/DataTypes/T77_0030_allocatio...
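For readers unfamiliar with the trick: in a 16-bit code segment, the 0x66 operand-size override prefix flips an instruction to its 32-bit form, so emitting it as a raw byte (db 0x66) before a 16-bit instruction yields the 32-bit version. A small Python sketch of the resulting encodings (opcode bytes per the x86 manuals):

```python
import struct

# In a 16-bit code segment, "mov ax, imm16" encodes as B8 followed by
# the 16-bit immediate.
mov_ax = bytes([0xB8]) + struct.pack("<H", 0x1234)

# Prefixing the same opcode with 0x66 (operand-size override) turns it
# into "mov eax, imm32" -- which is what a "db 0x66" before the
# instruction achieves in a 16-bit assembler like Turbo Pascal's.
mov_eax = bytes([0x66, 0xB8]) + struct.pack("<I", 0x12345678)

print(mov_ax.hex())   # b83412
print(mov_eax.hex())  # 66b878563412
```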

to11mtm 1 day ago [-]
You're 100% right that the 386 had a huge amount of changes that were pivotal in the future of x86 and the ability to write good/fast code.

I think a bigger challenge back then was the lack of software that could take advantage of it. Given the nascent state of the industry, lots of folks wrote for the 'lowest common denominator' and left it at that (i.e., the expense of test hardware made things like switching routines based on CPU sniffing impractical).

And even then, of course, sometimes folks were lazy. One of my (least) favorite examples of this is the PC 'version' (it's not at all the original) of Mega Man 3. On a 486/33 you had the option of it being almost impossibly twitchy-fast, or dog slow thanks to the turbo button. Or the fun thing where Turbo Pascal compiled apps could start crapping out if the CPU was too fast...

Sorry, I digress. The 386 was a seemingly small step that was actually a leap forward. Folks just had to catch up.

lysace 1 day ago [-]
As did I :).

Imagine how it felt going from an 8086 @ 8 MHz to an 80486SX (the cheapo version without an FPU) @ 33 MHz. With blazingly fast REP MOVSD over some form of proto local bus that Compaq implemented using a Tseng Labs ET4000/W32i VGA chip.

Grosvenor 20 hours ago [-]
Well, that's not at all true.

The 286 in the benchmark was using 60 ns Siemens RAM, and was a 25 MHz unit which virtually no one has ever seen in the wild. The 286s that people actually bought topped out at 12 MHz.

The 386 in the test was using 70 ns RAM.

Let's see them both with 60 ns RAM.

lysace 1 day ago [-]
Ok.

I'm speaking of e.g. the leap between the IBM PC in 1981 and the Compaq 386 five years later.

Or between that and the 486 another five years later or so.

shihabkhanbd 1 day ago [-]
[dead]
shihabkhanbd 1 day ago [-]
[flagged]
kens 1 day ago [-]
This appears to be a bot reposting comments from an older article on my blog.
dboreham 20 hours ago [-]
Can we reverse-engineer the purpose of the bot? Just for lulz?
kens 20 hours ago [-]
For the most part, the account posts comments on HN that previously appeared on reddit discussions of the same article (you can check this with Google). My guess is that it's an experiment in karma farming.