Emil Rijcken
1 min read · Jan 22, 2020

Thanks for this article; it makes the concept easier to understand. What I still struggle with, though, is the intuition behind the keys, values and queries. In your example, you initialise them and mention that normally they are initialised randomly. If that's the case, what is their role? If they are just random weights, then the calculated attention is also more or less a random score that has been put through a few operations, right?

(How) do they learn? And do they interact with each other?
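To make the question concrete, here is a minimal sketch of what I mean, assuming standard scaled dot-product self-attention. The names `w_q`, `w_k`, `w_v`, the sizes, the Gaussian initialisation and the seed are my own illustration, not taken from your article. With untrained, randomly drawn weights, each row of `weights` is still a valid probability distribution, but which tokens it favours depends only on the inputs and the random draw:

```python
import math
import random

def matmul(a, b):
    # Plain-Python matrix product: rows of a times columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_matrix(rows, cols, rng):
    # Assumption: Gaussian random initialisation (std 1), as in an untrained model
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def softmax(row):
    # Numerically stable softmax over one row of scores
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    # Project the same inputs into queries, keys and values
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(w_k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    # Scale by sqrt(d_k), then normalise each query's scores into weights
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted sum of the value vectors
    return weights, matmul(weights, v)

rng = random.Random(0)
d_model, d_k, n_tokens = 4, 3, 3
x = random_matrix(n_tokens, d_model, rng)  # hypothetical token embeddings
w_q = random_matrix(d_model, d_k, rng)
w_k = random_matrix(d_model, d_k, rng)
w_v = random_matrix(d_model, d_k, rng)

weights, output = self_attention(x, w_q, w_k, w_v)
for row in weights:
    print([round(w, 3) for w in row])
```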

Written by Emil Rijcken

PhD candidate in Natural Language Processing