The Attention Mechanism in Large Language Models