I agree that when you start comparing across generations of technology, you have another variable to deal with. However, there is still physics at play and that never changes, only our understanding of it does.
My comparison is based on a couple of things. First is bclaff's PDR calculations, which show that FX has an ideal dynamic range 1 stop greater than DX. The reason for this is the larger sensor.
This is borne out by the D800 vs. the D7000, which DxO showed to have identical pixel-level performance (look at the "screen mode" tests), yet the D800 has overall 1-stop better noise and dynamic range performance. Try to reconcile that in any way with pixel pitch. You can't.
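To put a rough number on the sensor-size argument, here is a back-of-the-envelope sketch. It is not bclaff's actual PDR calculation. It only assumes that, at the same exposure and with the same per-area efficiency, total light collected scales with sensor area, so the advantage in stops is log2 of the area ratio. The dimensions are nominal figures I am assuming (36 x 24 mm for FX, 24 x 16 mm for DX); real sensors differ slightly by model, and the function names are just made up for illustration.

```python
import math

def sensor_area_mm2(width_mm, height_mm):
    """Sensor area in square millimetres."""
    return width_mm * height_mm

def stop_advantage(area_large, area_small):
    """Stops of extra total light for the larger sensor at equal exposure.

    Assumes light collected is proportional to area (same per-area efficiency).
    """
    return math.log2(area_large / area_small)

# Nominal dimensions (assumed, not exact for any particular camera).
FX = sensor_area_mm2(36.0, 24.0)
DX = sensor_area_mm2(24.0, 16.0)

print(round(stop_advantage(FX, DX), 2))  # 1.17
```

Note that pixel pitch does not appear anywhere in that calculation, only total area, which is the point.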
Also, this indicates the D800 is equiv-tech with the D7000, which was introduced something like 18 months previously. That goes some way to answering your D4 question. The D4, for one, is targeted at a different audience than the D800 and therefore might have undergone greater R&D to hit its high-ISO target. Possibly the assumption that the D4 is equiv-tech to the D800 isn't right. The D800 is actually D7000-like tech and the D4 is a couple of years newer.