Let's reproduce GPT-2 (124M)