Now, thinking of smaller pixels, the same camera movement would spill the light aimed at a single pixel into a larger number of neighboring pixels.
This is a pretty obscure discussion, but I think one fallacy is "single pixel". The width of any blur from camera shake will necessarily span a few pixels, at least if we are going to see it. If it were in one pixel, it would just be a mostly undetectable tiny spot, a dot, or maybe a thin line, but I doubt we would call it blur.
Assuming the same size DX sensor, 12 MP pixels are only about 41% larger (in width) than 24 MP pixels. The linear ratio is the square root of 2 for double the area: 1.414 width x 1.414 height = 2x area, which is the 24/12 MP ratio in the same total sensor area. Since the square root of 2 is irrational, I doubt these pixel borders would ever actually line up, but an area of 5x5 of the 1.414-size pixels is nearly the same dimension and area as 7x7 pixels of unit size (5 x 1.414 ≈ 7, and 5x5x2 = 50 vs 7x7 = 49; close, no cigar). This is the scale we are getting excited about.
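Just to check that arithmetic, here is a quick sketch. The 23.5 mm DX width and the 4288 x 2848 / 6000 x 4000 pixel counts are assumed typical figures for 12 MP and 24 MP DX bodies, not any particular camera's spec.

```python
# Rough check of the pixel-pitch arithmetic. Sensor width and pixel counts are
# assumed typical 12 MP / 24 MP DX figures, not any specific camera's spec.
import math

dx_width_mm = 23.5
pitch_12 = dx_width_mm / 4288 * 1000   # ~12 MP (4288 x 2848), pitch in microns
pitch_24 = dx_width_mm / 6000 * 1000   # ~24 MP (6000 x 4000), pitch in microns

print(f"12 MP pitch ~{pitch_12:.2f} um, 24 MP pitch ~{pitch_24:.2f} um")
print(f"linear ratio ~{pitch_12 / pitch_24:.3f}, sqrt(2) = {math.sqrt(2):.3f}")

# The 5-vs-7 pixel comparison from the text:
print(5 * math.sqrt(2))    # ~7.07, so 5 large pixels span about 7 small ones
print(5 * 5 * 2, 7 * 7)    # relative areas: 50 vs 49 ("close, no cigar")
```

The linear ratio comes out near 1.4, which is where the five-versus-seven pixel comparison below comes from.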
Again, any blur from camera shake will necessarily be a few pixels wide in order for us to see it. If it were contained in one pixel, it would just be a spot, probably not even noticeable; we call it blur because we see blur, and blur denotes width. One pixel would be little problem. Even if by magic the pixel borders could line up and the blur were only one pixel, the image detail the lens projects onto the sensor would not line up on pixel borders anyway; it has to straddle them. But what we see as blur is wider than that, containing various gray tones, etc. We are talking several pixels.
I have no clue what the actual shake blur width would be, but it is what it is, regardless of which sensor, and it is clearly wide enough that we are complaining about it. Just for convenience, let's make up that one instance of the blurred edge width we see is at least five 12 MP pixels wide, or equivalently seven 24 MP pixels. This width may be too small (opinion), but the difference between sensors would only affect the resolution of the blur, i.e., how much detail we can see within the blur. Even the sharpest imaginable edge is anti-aliased and is 2 or 3 pixels wide. I really doubt it would make much difference in whether we perceive blur or not. I still think there is a bigger problem, better solved by the conventional, accepted anti-shake methods (tripods, VR, faster shutter speed, higher ISO, adding more light, bracing against a tree, paying better attention, etc.).
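To put numbers on that made-up example, a minimal sketch, assuming the same illustrative pitches as above (about 5.5 and 3.9 microns): the physical blur width on the sensor is fixed, and only the count of pixels describing it changes.

```python
# A fixed physical blur width on the sensor, expressed in pixels of each sensor.
# Pitches are the assumed illustrative values (~5.5 um and ~3.9 um), not measured specs.
pitch_12_um = 5.5
pitch_24_um = 3.9
blur_width_um = 5 * pitch_12_um        # the made-up "five 12 MP pixels wide" blur = 27.5 um

print(blur_width_um / pitch_12_um)     # 5.0 pixels on the 12 MP sensor
print(blur_width_um / pitch_24_um)     # ~7.05 pixels on the 24 MP sensor
# Same physical smear either way; only the number of pixels describing it changes.
```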
This is a D800 36 mp image... at 10 feet, lens was 120 mm, f/8.
This is a 600% crop from it, to show the pixels.
Not 100%, but 600%.
I don't think there is much blur there, but how many pixels wide would you say the edge of the catchlight is? What if it were blurred, what might we see then? And this is 36 MP. All edges in images are anti-aliased (intentionally blurred slightly with intermediate colors to hide the jaggies a bit better). The eyelashes, for example (click the image to enlarge it a bit), are black, but lighter brown pixels are added (blending with the background), which still shows pixels (jaggies), just not as strongly as black pixels would. Edges have width, and of course blur would have more width.
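As a toy illustration of that anti-aliasing point (the gray levels below are invented, not measured from the photo), a hard edge versus one with the usual 2-3 intermediate pixels might look like this:

```python
# Toy gray levels (0-255, invented for illustration) for a dark lash against a lighter area.
hard_edge   = [180, 180, 180, 10, 10, 10]   # an impossibly sharp one-pixel transition
antialiased = [180, 180, 120, 60, 10, 10]   # 2-3 intermediate pixels soften the jaggies
print(hard_edge)
print(antialiased)   # the edge already has width; shake blur would only widen it further
```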
I kinda like the resolution, it is kinda the point (600% is not the best view; at 6x size it would print at 50 dpi, roughly 12 x 8 feet), but perhaps there are reasons to select 12 MP or 16 MP over 24 MP (file storage size is all I can imagine, which is hardly the most important thing about any image). I really don't think camera shake is one of the reasons. Most likely we are going to resample the 24 MP image to no more than about 2 MP to show it full screen on the monitor (which will hide much of the detail anyway). Regardless of sensor details, the blur is what it is, and surely we would instead want to just eliminate the camera shake.
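For the numbers in those parentheses, a quick sketch, assuming the D800's 7360 x 4912 pixels, a 300 dpi reference print, a 6000 x 4000 (24 MP) image, and a 1920 x 1080 monitor:

```python
# Arithmetic behind "6x size would print at 50 dpi, at 12 x 8 feet" and
# "resample the 24 MP image to no more than 2 MP for the screen".
# Assumes D800 = 7360 x 4912 px, a 300 dpi reference print, a 6000 x 4000 image,
# and a 1920 x 1080 monitor.
w_px, h_px = 7360, 4912
dpi_at_6x = 300 / 6                                   # viewing at 600% ~ printing at 50 dpi
print(w_px / dpi_at_6x / 12, h_px / dpi_at_6x / 12)   # ~12.3 x 8.2 feet

mon_w, mon_h = 1920, 1080
img_w, img_h = 6000, 4000
scale = min(mon_w / img_w, mon_h / img_h)             # scale factor to fit the screen
shown_mp = (img_w * scale) * (img_h * scale) / 1e6
print(f"shown on screen: ~{shown_mp:.2f} MP")         # ~1.7 MP, i.e. about 2 MP at most
```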
And the most modern sensors sure do have a lot going for them.