A post that I think explains Attention really well

wikidocs.net/22893

 



1. Why attention is needed

 

A seq2seq (= encoder-decoder) model

compresses the information from the encoder into a single vector called the context vector and passes it on,

and the decoder generates the output sequence from that one context vector.

 

Everything the encoder has seen must fit into that one fixed-size vector, so information is lost in the process.

Attention is the technique that was introduced to solve this problem.

 

http://incredible.ai/nlp/2020/02/20/Sequence-To-Sequence-with-Attention/


2. attention

 

At every decoder step, tell the decoder which encoder words it should focus on.

That is the idea behind attention.

 

 


How to compute the attention value

Attention(Q, K, V) = Attention Value

 

  1. The attention function computes the similarity between the given query and each key,
  2. applies that similarity to the value mapped to each key,
  3. and then sums all of the similarity-weighted values and returns the result.
  4. The returned result is called the attention value.
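
The four steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the implementation from the referenced post: the function names `softmax` and `attention`, the choice of the dot product as the similarity measure, the softmax normalization of the similarities, and the toy numbers are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Return the attention value for a single query vector.

    Assumptions (not stated in the post): dot-product similarity,
    followed by softmax normalization of the similarities.
    """
    # 1. Similarity between the query and every key.
    similarities = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(similarities)
    # 2. Scale each value by the weight of the key it is mapped to.
    weighted = [[w * v for v in value] for w, value in zip(weights, values)]
    # 3. Sum the weighted values element-wise and return the result.
    return [sum(column) for column in zip(*weighted)]


# Toy data: the query matches the first key far better than the second.
query = [1.0, 0.0]
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 2.0], [3.0, 4.0]]

result = attention(query, keys, values)  # 4. this is the attention value
print(result)
```

Because the query is far more similar to the first key, the result lies very close to the first value, [1.0, 2.0].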

 

What query, key, and value mean

  • query : the decoder hidden state at time step t
  • key : the encoder hidden states at all time steps
  • value : the encoder hidden states at all time steps
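
Written in symbols, the mapping looks as follows. The symbol names are assumptions chosen for illustration and do not come from the post: s_t stands for the decoder hidden state at time step t, and h_1, ..., h_N for the encoder hidden states of an input with N time steps.

```latex
Q = s_t, \qquad K = V = [\, h_1, h_2, \dots, h_N \,]
```

Key and value are the same set of vectors here, so the encoder hidden states are used both to measure similarity with the query and to build the returned result.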

 

 

Attention calculation steps

 

 

  1. Compute the attention scores
    • Take the dot product of the decoder hidden state at time step t with the encoder hidden state at every time step to get the attention scores.

  2. Compute the attention distribution
    • Apply softmax to the attention scores to get the attention distribution.

  3. Compute the attention value
    • Multiply each encoder hidden state by its weight in the attention distribution and sum the results to get the attention value.

  4. Compute the decoder hidden state at time step t+1
    • The decoder hidden state at time step t and the attention value are used to compute the decoder hidden state at time step t+1.
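
The calculation steps above can be traced with a small plain-Python sketch. The helper names `dot` and `attention_step` and the toy vectors are hypothetical. For step 4 the post does not say how the two vectors are combined, so the concatenation below is an assumption, and the recurrent cell that would turn the combined vector into the next hidden state is left out.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_step(decoder_state, encoder_states):
    # 1. Attention scores: dot product with every encoder hidden state.
    scores = [dot(decoder_state, h) for h in encoder_states]

    # 2. Attention distribution: softmax over the scores.
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    distribution = [e / total for e in exps]

    # 3. Attention value: weighted sum of the encoder hidden states.
    dim = len(encoder_states[0])
    value = [
        sum(w * h[i] for w, h in zip(distribution, encoder_states))
        for i in range(dim)
    ]

    # 4. Combine the attention value with the current decoder state.
    #    Assumption: plain concatenation; a real decoder would feed this
    #    into its recurrent cell to obtain the state at time step t+1.
    combined = value + decoder_state
    return scores, distribution, value, combined


# Toy data: three encoder time steps, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]

scores, distribution, value, combined = attention_step(decoder_state, encoder_states)
print("scores:      ", scores)
print("distribution:", distribution)
print("value:       ", value)
print("combined:    ", combined)
```

With these toy inputs the first and third encoder states get the same score (1.0), so they receive equal weight in the attention distribution.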

 
