WEEK8 : Attention

2020. 12. 27. 02:42

A post that I think explains attention really well

WikiDocs

A platform service for creating and sharing online books

wikidocs.net

1. Why attention is needed

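In a seq2seq (= encoder-decoder) model,

the encoder compresses its information into a single vector called the context vector and passes it on,

and the decoder produces the output sequence using only that one context vector.

Everything the encoder has seen must fit into a single fixed-size vector, so information is lost in the process, and the loss tends to grow as the input sequence gets longer.

Attention is the mechanism that was introduced to address this problem.

At every decoder step, tell the decoder which words on the encoder side it should focus on.

That is the idea behind attention.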

Attention(Q, K, V) = Attention Value

Computing the attention scores
- Take the dot product of the decoder hidden state at time t with the encoder hidden state at every time step to get the attention scores.
Computing the attention distribution
- Apply softmax to the attention scores to get the attention distribution.
Computing the attention value
- Multiply each encoder hidden state by its weight in the attention distribution and sum the results to get the attention value.
Computing the decoder hidden state at time t+1
- Use the decoder hidden state at time t together with the attention value to compute the decoder hidden state at time t+1.
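
The first three steps above can be sketched in plain Python. This is only a minimal illustration, not the implementation from the referenced post: the function names (`softmax`, `dot`, `dot_product_attention`) and the toy vectors are assumptions made for the example. The last line shows one common way to combine the attention value with the decoder state (concatenation), which is also an assumption, since the notes do not say how the two are combined.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dot_product_attention(decoder_state, encoder_states):
    # Step 1: attention scores, one dot product per encoder time step.
    scores = [dot(decoder_state, h) for h in encoder_states]
    # Step 2: attention distribution (weights are positive and sum to 1).
    weights = softmax(scores)
    # Step 3: attention value, the weighted sum of encoder hidden states.
    dim = len(decoder_state)
    value = [sum(w * h[i] for w, h in zip(weights, encoder_states))
             for i in range(dim)]
    return scores, weights, value

# Toy example (hypothetical values): 3 encoder steps, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]  # decoder hidden state at time t

scores, weights, value = dot_product_attention(decoder_state, encoder_states)

# Step 4 (assumed form): concatenate the attention value with the decoder
# state; a trained layer would then map this to the next decoder state.
combined = value + decoder_state
```

In terms of Attention(Q, K, V), `decoder_state` plays the role of the query, and `encoder_states` serves as both the keys and the values. Encoder steps whose hidden state is more similar to the decoder state get a larger weight, so here the first and third steps receive more weight than the second.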
