Why Does Diffusion Work Better than Auto-Regression?