ResNet

  • ResNet์ด ํ•„์š”ํ•œ ์ด์œ 
  • ResNet์˜ ๊ตฌ์กฐ
  • ResNet์„ ์‚ฌ์šฉํ•˜๋ฉด ์„ฑ๋Šฅ์ด ์ข‹์•„์งˆ ์ˆ˜ ์žˆ๋Š” ์ด์œ 
  • ResNet ์ตœ์ข…


1. ResNet์ด ํ•„์š”ํ•œ ์ด์œ 

์ด์ „์˜ ๋งŽ์€ ์—ฐ๊ตฌ๋“ค์—์„œ ๋ชจ๋ธ์˜ layer๊ฐ€ ๊นŠ์–ด์งˆ์ˆ˜๋ก ์„ฑ๋Šฅ์ด ๋–จ์–ด์ง€๋Š” ํ˜„์ƒ์ด ๋ฐœ์ƒํ•จ์„ ๋ฐํ˜€๋ƒˆ๋‹ค.

์ด๊ฒƒ์˜ ์›์ธ์€ vanishing gradient / exploding gradient ๋ฌธ์ œ ๋•Œ๋ฌธ์— ํ•™์Šต์ด ์ž˜ ์ด๋ค„์ง€์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

์ถœ์ฒ˜ : https://arxiv.org/pdf/1512.03385.pdf

์ด๊ฒƒ์€ overfitting๊ณผ๋Š” ๋‹ค๋ฅธ "degradation" ๋ฌธ์ œ์ด๋‹ค.

overfitting์ด ์ผ์–ด๋‚ฌ๋‹ค๋ฉด 20-layer๋ณด๋‹ค 56-layer์˜ training error๊ฐ€ ๋” ๋‚ฎ์•„์•ผํ•˜์ง€๋งŒ,

์œ„์˜ ์‚ฌ์ง„์„ ๋ณด๋ฉด tranining error, test error ๋ชจ๋‘ 20-layer๋ณด๋‹ค 56-layer์—์„œ ๋†’๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

 

์ธต์ด ๊นŠ์–ด์งˆ์ˆ˜๋ก ํ•™์Šต์ด ์ œ๋Œ€๋กœ ์ง„ํ–‰๋˜์ง€ ์•Š๋Š” degradation ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ์•ˆ์œผ๋กœ ResNet์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค.


2. ResNet์˜ ๊ตฌ์กฐ

์ถœ์ฒ˜ : https://arxiv.org/pdf/1512.03385.pdf

 

$$ z^{[l+1]} = W^{[l+1]}a^{[l]}+b^{[l+1]} $$

$$ a^{[l+1]} = relu(z^{[l+1]}) $$

$$ z^{[l+2]} = W^{[l+2]}a^{[l+1]}+b^{[l+2]} $$

$$ a^{[l+2]} = relu(z^{[l+2]}+a^{[l]}) $$
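These four equations make up one residual block: the skip connection carries a^[l] forward and adds it to z^[l+2] before the final ReLU. A minimal PyTorch sketch of this block (the class name `ResidualBlock` and the fully connected layers are my own illustration; the paper's blocks are convolutional):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """a[l+2] = relu(W[l+2] a[l+1] + b[l+2] + a[l]), with a[l+1] = relu(W[l+1] a[l] + b[l+1])."""
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)   # W[l+1], b[l+1]
        self.fc2 = nn.Linear(dim, dim)   # W[l+2], b[l+2]

    def forward(self, a_l):
        z1 = self.fc1(a_l)               # z[l+1] = W[l+1] a[l] + b[l+1]
        a1 = torch.relu(z1)              # a[l+1] = relu(z[l+1])
        z2 = self.fc2(a1)                # z[l+2] = W[l+2] a[l+1] + b[l+2]
        return torch.relu(z2 + a_l)      # a[l+2] = relu(z[l+2] + a[l])  <- skip connection
```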



3. ResNet์„ ์‚ฌ์šฉํ•˜๋ฉด ์„ฑ๋Šฅ์ด ์ข‹์•„์งˆ ์ˆ˜ ์žˆ๋Š” ์ด์œ 

 

  • ResNet์„ ์‚ฌ์šฉํ•ด๋„ ์„ฑ๋Šฅ์ด ํ•˜๋ฝํ•˜์ง€๋Š” ์•Š๋Š”๋‹ค. ์™œ๋ƒํ•˜๋ฉด identity mapping์„ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์€ ๋งค์šฐ ์‰ฝ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.
    • $$ a^{[l+2]} = relu(z^{[l+2]}+a^{[l]}) $$
    • $$ a^{[l+2]} = relu(W^{[l+2]}a^{[l+1]}+b^{[l+2]}+a^{[l]}) $$
    • W, b๊ฐ€ 0์ด๊ธฐ๋งŒ ํ•˜๋ฉด a^[l+2] = a^[l]์ด ๋œ๋‹ค.
    • ์ด๊ฒƒ์ด ํฐ ์‹ ๊ฒฝ๋ง์˜ ์ค‘๊ฐ„์ด๋‚˜ ๋ ์–ด๋””์ธ๊ฐ€์— ๋”ํ•ด๋„ ์ˆ˜ํ–‰๋Šฅ๋ ฅ์ด ์ €ํ•ด๋˜์ง€ ์•Š๋Š” ์ด์œ ์ด๋‹ค.
    • ์ด layer๋“ค์ด ํ•ญ๋“ฑํ•จ์ˆ˜๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด ์‰ฝ๊ธฐ ๋•Œ๋ฌธ์— ์ˆ˜ํ–‰ ๋Šฅ๋ ฅ์ด ์ €ํ•ด๋˜์ง€ ์•Š๊ณ , ์‹ฌ์ง€์–ด ๋” ์ž˜ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋”ฐ๋Š” ๊ฒƒ์„ ๋ณด์žฅํ•œ๋‹ค.
  • ์ž ๊น, ๊ทธ๋ ‡๋‹ค๋ฉด ๊ทธ๋ƒฅ ์ผ๋ฐ˜ network๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด identity mapping์„ ํ•™์Šตํ•˜๊ธฐ๊ฐ€ ์–ด๋ ค์šด๊ฐ€?
    • $$ z^{[l+1]} = W^{[l+1]}a^{[l]}+b^{[l+1]} $$
    • $$ a^{[l+1]} = relu(z^{[l+1]}) $$
    • $$ z^{[l+2]} = W^{[l+2]}a^{[l+1]}+b^{[l+2]} $$
    • $$ a^{[l+2]} = relu(z^{[l+2]}) $$
    • ์—์„œ a^[l+2] = a^[l]์ด ๋˜๋„๋ก W, b๋“ค์„ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์€ ์–ด๋ ค์šด ์ผ์ด๋‹ค.
    • residual block ์—†์ด ํ•ญ๋“ฑํ•จ์ˆ˜๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์€ ์–ด๋ ค์šด ์ผ์ด๊ธฐ์—, ์ธต์„ ์Œ“์Œ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  ์„ฑ๋Šฅ์ด ๋‚˜๋น ์ง€๋Š” ์ด์œ ์ด๋‹ค.
  • ResNet์„ ์‚ฌ์šฉํ•˜๋ฉด ์„ฑ๋Šฅ์ด ํ–ฅ์ƒ๋  ์ˆ˜ ์žˆ๋Š” ์—ฌ์ง€๊ฐ€ ์กด์žฌํ•œ๋‹ค.
    • ์šฐ๋ฆฌ์˜ ๋ชฉํ‘œ๋Š” "์ธต์„ ์Œ“์•„๋„ ์ˆ˜ํ–‰๋Šฅ๋ ฅ์„ ๋–จ์–ดํŠธ๋ฆฌ์ง€ ๋ง์ž!"๊ฐ€ ์•„๋‹ˆ๋ผ,
      "์ธต์„ ์Œ“์•„๋„ ์ˆ˜ํ–‰๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ค์ž!"์ด๋‹ค.
    • residual block์„ ์ถ”๊ฐ€ํ•ด์„œ ์ธต์„ ์Œ“์„ ๋•Œ, residual block์˜ unit๊ฐ€ ์œ ์šฉํ•œ ๊ฒƒ์„ ํ•™์Šตํ•œ๋‹ค๋ฉด,
      ์œ„์—์„œ ์–ธ๊ธ‰ํ•œ ํ•ญ๋“ฑํ•จ์ˆ˜๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ํ›จ์”ฌ ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋‚ผ ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋‹ค.

 

+) 

  • $$ a^{[l+2]} = relu(z^{[l+2]}+a^{[l]}) $$
  • ์—ฌ๊ธฐ์„œ z^[l+2]์™€ a^[l]์˜ ํฌ๊ธฐ๊ฐ€ ๋™์ผํ•˜์ง€ ์•Š์„ ๊ฒฝ์šฐ์—๋Š”

 

  • $$ a^{[l+2]} = relu(z^{[l+2]}+Wa^{[l]}) $$
  • ์•„๋ž˜์™€ ๊ฐ™์ด W๋ฅผ ๊ณฑํ•ด์ฃผ์–ด ๋™์ผํ•œ shape์œผ๋กœ ๋ณ€๊ฒฝํ•ด์ฃผ๋ฉด ๋œ๋‹ค.
    • ์—ฌ๊ธฐ์„œ W๋Š” parameter matrix์ผ์ˆ˜๋„ ์žˆ๊ณ ,
    • a^[l]์— zero padding์„ ๋”ํ•ด์ฃผ๋Š” ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•ด์ค„ ์ˆ˜๋„ ์žˆ๋‹ค.

 

 


4. The Full ResNet

 

[Figure: the full ResNet architecture — source: https://arxiv.org/pdf/1512.03385.pdf]

 

 

3 layer ๋’ค์˜ feature map๊ณผ x์˜ shape๊ฐ€ ๋™์ผํ•œ ๊ฒฝ์šฐ ๋‹ค์Œ๊ณผ ๊ฐ™์ด skip connection ์ ์šฉ

 

 

3 layer ๋’ค์˜ feature map๊ณผ x์˜ shape๊ฐ€ ๋‹ค๋ฅผ ๊ฒฝ์šฐ ๋‹ค์Œ๊ณผ ๊ฐ™์ด skip connection ์ ์šฉ
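A sketch of such a 3-layer (bottleneck) block covering both cases (the class name `Bottleneck` is mine; batch normalization, which the paper applies after each convolution, is omitted for brevity):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Three conv layers (1x1 -> 3x3 -> 1x1) plus a skip connection."""
    def __init__(self, c_in, c_mid, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_mid, kernel_size=1)
        self.conv2 = nn.Conv2d(c_mid, c_mid, kernel_size=3, stride=stride, padding=1)
        self.conv3 = nn.Conv2d(c_mid, c_out, kernel_size=1)
        if stride == 1 and c_in == c_out:
            self.shortcut = nn.Identity()            # shapes match: add x as-is
        else:
            self.shortcut = nn.Conv2d(c_in, c_out,   # shapes differ: reshape x
                                      kernel_size=1, stride=stride)

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = torch.relu(self.conv2(out))
        out = self.conv3(out)
        return torch.relu(out + self.shortcut(x))    # skip connection before the final relu

x = torch.randn(1, 256, 56, 56)
same = Bottleneck(256, 64, 256)             # identity shortcut
down = Bottleneck(256, 128, 512, stride=2)  # projection (1x1 conv) shortcut
print(same(x).shape)   # torch.Size([1, 256, 56, 56])
print(down(x).shape)   # torch.Size([1, 512, 28, 28])
```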

'๐Ÿ™‚ > Coursera_DL' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

WEEK6 : convNet ์‚ฌ์šฉ์— ๋„์›€์ด ๋  ์ง€์‹  (0) 2020.12.23
WEEK6 : Inception (googLeNet)  (0) 2020.12.23
WEEK5 : CNN (convolutional neural network)  (0) 2020.12.21
WEEK5 : end to end DL  (0) 2020.12.21
WEEK5 : Multi-Task Learning  (0) 2020.12.20

+ Recent posts