• activation function
    • sigmoid
    • tanh
    • relu
    • leaky relu

 


1. Sigmoid

https://alexander-schiendorfer.github.io/2020/02/17/why-so.sigmoid.html

$$ \sigma(x) = \frac{1}{1+\exp(-x)} $$

$$ \sigma'(x)=\sigma(x)(1-\sigma(x)) $$

 

  • Squashes the output into the range [0, 1].
  • Was used for a long time because it resembles how neurons in the human brain fire.
  • 💥 Problems
    • Causes the vanishing gradient problem (see the sketch below)
    • Not zero-centered (the outputs are centered around 0.5) -> zig-zag updates occur 😢
    • exp() is relatively expensive to compute

 

 

* ์—ฌ๊ธฐ์„œ ์™œ not zero-centered๊ฐ€ ๋ฌธ์ œ๊ฐ€ ๋˜๋Š”๊ฐ€?

$$ x_i = \sigma(z_i) $$

$$ f = \sum w_ix_i+b $$

$$ \frac{df}{dw_i}=x_i $$

$$ \frac{dL}{dw_i} = \frac{dL}{df} \frac{df}{dw_i} = \frac{dL}{df} x_i $$

 

x_i๊ฐ€ ๋ชจ๋‘ sigmoid๋ฅผ ํ†ต๊ณผํ•œ ๊ฒฐ๊ณผ๋ฌผ๋กœ, ์–‘์ˆ˜(+)์ด๊ธฐ ๋•Œ๋ฌธ์—

dL/dw_1, dL/dw_2, ..., dL/dw_n ๋ชจ๋‘ ๋™์ผํ•œ ๋ถ€ํ˜ธ๊ฐ’์ด ๋‚˜์˜จ๋‹ค.

๋”ฐ๋ผ์„œ zig-zag ์‹์˜ parameter update๊ฐ€ ๋ฐœ์ƒํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ˆ˜๋ ด ์†๋„๊ฐ€ ๋Š๋ ค์ง€๋Š” ๋ฌธ์ œ์ ์ด ์žˆ๋‹ค. 

 

 

+) not zero-centered์— ๋Œ€ํ•œ ์ถ”๊ฐ€ ๋งํฌ

stats.stackexchange.com/questions/237169/why-are-non-zero-centered-activation-functions-a-problem-in-backpropagation


2. tanh

https://astralworld58.tistory.com/62

$$ tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}} $$

$$ \frac{d\tanh(x)}{dx} = 1-\tanh^{2}(x) $$

 

  • Squashes the output into the range [-1, 1].
  • Zero-centered (fixes sigmoid's problem ✨) -> no zig-zag updates.
  • Problems
    • The vanishing gradient problem is still severe (see the check below).

 

 

+)  

https://bhsmath.tistory.com/181

 

 


3. ReLU

https://astralworld58.tistory.com/62

$$ relu(x) = \begin{cases}x & x \geq 0\\ 0 & x < 0\end{cases} $$

$$ relu'(x) = \begin{cases}1 & x \geq 0\\ 0 & x < 0\end{cases} $$

 

  • ์ž…๋ ฅ์ด ์–‘์ˆ˜์ธ ๊ฒฝ์šฐ gradient๊ฐ€ ์‚ฌ๋ผ์ง€์ง€ ์•Š๋Š”๋‹ค.
  • ๊ณ„์‚ฐ ๋น„์šฉ์ด ๋‚ฎ๋‹ค (sigmoid, tanh๋Š” exp๊ฐ€ ์กด์žฌํ–ˆ์—ˆ๋‹ค)
  • sigmoid / tanh๋ณด๋‹ค 6๋ฐฐ ์ •๋„ ๋นจ๋ฆฌ ์ˆ˜๋ ดํ•œ๋‹ค.
  • ์ƒ๋ฌผํ•™์ ์œผ๋กœ sigmoid๋ณด๋‹ค ๊ทธ๋Ÿด๋“ฏํ•˜๋‹ค.
  • ๋ฌธ์ œ์ ๐Ÿ’ฅ
    • not zero-centered (sigmoid์˜ ๋ฌธ์ œ์ ์ด๊ธฐ๋„ ํ•˜๋‹ค)
    • ์ž…๋ ฅ์ด ์Œ์ˆ˜์ธ ๊ฒฝ์šฐ ํ•ญ์ƒ 0์„ ์ถœ๋ ฅํ•œ๋‹ค (parameter๊ฐ€ ํ•™์Šต๋˜์ง€ ์•Š๋Š”๋‹ค)
    • dying ReLU
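For reference, ReLU and its derivative reduce to a max and a comparison, which is why it is so cheap; a minimal sketch (the x = 0 case follows the convention in the formulas above):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    return (x >= 0).astype(float)   # 1 for x >= 0, 0 for x < 0

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [0.  0.  0.  1.5]
print(relu_grad(x))   # [0. 0. 1. 1.]
```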

* dying ReLU?

 

$$ a1 = w1x1+w2x2+b $$

$$ h1 = relu(a1) $$

$$ \frac{\partial L}{\partial w1} = \frac{\partial L}{\partial h1} \frac{\partial h1}{\partial a1} \frac{\partial a1}{\partial w1} $$

$$ \frac{\partial L}{\partial w2} = \frac{\partial L}{\partial h1} \frac{\partial h1}{\partial a1} \frac{\partial a1}{\partial w2} $$

$$ \frac{\partial L}{\partial b} = \frac{\partial L}{\partial h1} \frac{\partial h1}{\partial a1} \frac{\partial a1}{\partial b} $$

 

์ด ๋•Œ, b๊ฐ€ ๋งค์šฐ ์ž‘์€ ์ˆซ์ž๋ผ์„œ h1 = max(a1, 0) = 0์ด ๋‚˜์™”๋‹ค๊ณ  ํ•œ๋‹ค๋ฉด, $$  \frac{\partial h1}{\partial a1} = 0 $$

์ด๊ธฐ ๋•Œ๋ฌธ์— w1, w2, b ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ๋ณ€ํ•˜์ง€ ์•Š๋Š”๋‹ค. (=์Œ์ˆ˜ ์ž…๋ ฅ์— ๋Œ€ํ•ด gradient 0์ด๊ธฐ ๋•Œ๋ฌธ์— parameter update X)

 

์ด๋ ‡๊ฒŒ ๋˜๋ฉด ๊ณ„์‚ฐ๋˜๋Š” a1๊ฐ’์ด ๋™์ผํ•˜๊ธฐ ๋•Œ๋ฌธ์— parameter๊ฐ€ ๊ณ„์† update๋˜์ง€ ์•Š๋Š” ๋ฌธ์ œ์ ์ด ์žˆ๋‹ค.

์ด๋Ÿฌํ•œ relu์˜ ๋ฌธ์ œ์ ์„ dying relu๋ผ๊ณ  ํ•œ๋‹ค.

 

์ด ๋ฌธ์ œ์ ์„ ํ•ด๊ฒฐํ•˜๊ณ ์ž ๋‚˜์˜จ ๊ฒƒ์ด leaky relu์ด๋‹ค.

 

 

+) Deep Learning Best Practices: Activation Functions & Weight Initialization Methods (Part 1), medium.com


4. Leaky ReLU

$$ f(x) = \begin{cases}x & x \geq 0\\0.01x & x< 0\end{cases} $$

$$ f'(x) = \begin{cases}1 & x \geq 0\\0.01 & x< 0\end{cases} $$

 

  • ์ž…๋ ฅ์ด ์–‘์ˆ˜์ด๋“ , ์Œ์ˆ˜์ด๋“  gradient๊ฐ€ ์†Œ๋ฉธ๋˜์ง€ ์•Š๋Š”๋‹ค.
  • dying relu ๋ฌธ์ œ๊ฐ€ ์—†๋‹ค. (์–ธ์ œ๋“  gradient๊ฐ€ ํ๋ฅธ๋‹ค)
  • ๊ณ„์‚ฐํ•˜๊ธฐ ์‰ฝ๋‹ค (sigmoid, tanh๋Š” exp ์—ฐ์‚ฐ์„ ์‚ฌ์šฉํ•œ๋‹ค)
  • zero-centered์— ๊ฐ€๊น๋‹ค (relu๋‚˜ sigmoid ๋Œ€๋น„)

 


 

+) Why a non-linear activation function is needed

  • With a linear activation function, stacking layers deeply loses its meaning.
  • Applying several linear functions is equivalent to applying a single linear function (verified in the sketch below).

 
