I don't get how ETTR takes exposure out of the hands of the photographer. It's not auto, it's a specific exposure choice that has to be tightly controlled in order to prevent blown highlights. It IS a choice to see the camera as a data collector, not something that makes pictures. With ETTR you are making the conscious choice to collect data at capture, and make the picture in post.
ETTR is a method that uses your camera to optimize data capture. With optimum data, your pictures don't necessarily look as intended out of camera, but you have the most data which gives you the most flexibility in post processing. Whether you shoot this way, or shoot for results out of camera, both are conscious choices and neither is auto. I'm not sure what look you'd be going for in post that you couldn't achieve with ETTR, unless you blew the exposure and lost highlights.
As to why, there's a reason behind the prevailing wisdom. In fact, there are a couple of reasons.
We measure light in stops, which is a geometric scale, but our cameras record it on a linear scale. 1 stop of light is a doubling of the amount of light. Converted to a numeric value by analog to digital conversion, that means the difference between 16 and 32 is one stop, 32 to 64 is one more stop, and 64 to 128 is one more stop, etc. In a 12-bit conversion, the numeric value can range from 0 to 2^12-1, or 0-4095. The brightest stop of light that can be recorded therefore contains a numeric range of 2048-4095. That's a full half of the numeric scale! The next stop has a range of 1024-2047. Your brightest 2 stops contain 3/4 of your numeric range! Now consider the numeric range of all 12 stops:
2048-4095 or 2048 counts
1024-2047 or 1024 counts
512-1023 or 512 counts
256-511 or 256 counts
128-255 or 128 counts
64-127 or 64 counts
32-63 or 32 counts
16-31 or 16 counts
8-15 or 8 counts
4-7 or 4 counts
2-3 or 2 counts
0-1 or 1 count
So, on a 12-bit capture, your brightest stop can hold 2048 gradations, and your darkest stop only 1. This is why you expose to the right. Let's say your data spans 8 stops. You really want it to occupy the top 2/3 of that table, not the middle, and certainly not the bottom! 8 stops of data in your file could span a numeric range of 16-4095 (ETTR, 4080 potential gradations), or 4-1023 (centered histogram, 1020 potential gradations), or 0-255 (exposed to the left, 256 potential gradations, equivalent to JPEG bit depth).
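If you want to check the arithmetic yourself, here is a rough Python sketch of the same idea. It assumes an idealized, perfectly linear 12-bit conversion, and the function names (stop_table, span_levels) are made up for illustration. It counts every code value in a span inclusively, so 16-4095 comes out as 4080 values.

```python
BIT_DEPTH = 12  # the 12-bit conversion used in the example above


def stop_table(bit_depth=BIT_DEPTH):
    """Return (low, high, count) for each stop, brightest first.

    Assumes an ideal linear conversion; code value 0 is treated as pure black
    and is not assigned to any stop here.
    """
    rows = []
    for k in range(bit_depth - 1, -1, -1):
        low = 2 ** k
        high = 2 ** (k + 1) - 1
        rows.append((low, high, high - low + 1))
    return rows


def span_levels(stops_below_clipping, n_stops, bit_depth=BIT_DEPTH):
    """Count the code values available to n_stops of scene data.

    stops_below_clipping is how far the brightest stop of the data sits
    under the top of the scale (0 means exposed fully to the right).
    """
    top = bit_depth - stops_below_clipping
    bottom = top - n_stops
    if bottom < 0:
        raise ValueError("data does not fit in the available stops")
    high = 2 ** top - 1
    low = 2 ** bottom if bottom > 0 else 0  # bottom of the scale includes 0
    return low, high, high - low + 1


if __name__ == "__main__":
    for low, high, count in stop_table():
        print(f"{low}-{high}: {count} counts")
    for label, offset in [("ETTR", 0), ("centered", 2), ("left", 4)]:
        low, high, count = span_levels(offset, 8)
        print(f"{label}: {low}-{high}, {count} values")
```

Sliding the same 8 stops down by two stops each time cuts the available values to a quarter, which is the whole argument for pushing the histogram to the right.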
At the top you have captured the most data, numerically the most detail, and you have the most flexibility in post processing. This is one reason why, when you try to brighten shadows, they have a tendency to be noisy and lack detail, because you're re-mapping an area on the scale where there is little potential for detail.
Another aspect of ETTR is signal-to-noise ratio. If your sensor produces a fixed amount of noise, you improve the SNR by giving the sensor more signal (light) relative to that fixed noise, although with today's cameras I'm not sure this is as much of a concern.
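To put a rough number on that, here is a small Python sketch under the same simplifying assumption of a fixed noise floor. The signal and noise values are arbitrary, hypothetical units, not measurements from any camera, and the model ignores every other noise source.

```python
import math


def snr_db(signal, fixed_noise):
    """SNR in decibels for a signal over a fixed noise floor (same units)."""
    return 20 * math.log10(signal / fixed_noise)


if __name__ == "__main__":
    noise = 10.0  # hypothetical fixed noise floor, arbitrary units
    for signal in (100.0, 200.0, 400.0):  # each step is one more stop of light
        print(f"signal {signal:.0f}: {snr_db(signal, noise):.1f} dB")
```

In this simple model each extra stop of exposure doubles the signal against the same noise, which is worth about 6 dB of SNR.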