
  • learning from multiple tasks
    • transfer learning
    • multi-task learning

Transfer Learning

 

1) Transfer Learning์ด๋ž€?

  • ์‚ฌ์ „ ์ž‘์—…(source task)์— ๋Œ€ํ•˜์—ฌ ํ•™์Šต๋œ ์ •๋ณด๋ฅผ ๋ชฉํ‘œ ์ž‘์—…(target task)์— ํ™œ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•

 


2) Transfer Learning์˜ ์žฅ์ 

  • target task์— ๋Œ€ํ•œ ์ˆ˜๋ ด ์†๋„ ํ–ฅ์ƒ, ์„ฑ๋Šฅ ํ–ฅ์ƒ
  • source task์— ๋Œ€ํ•œ ์ถฉ๋ถ„ํ•œ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋‹ค๋ฉด, target task์— ๋Œ€ํ•œ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ถ€์กฑํ•˜๋”๋ผ๋„ ๋น„๊ต์  ๋†’์€ ์„ฑ๋Šฅ ๋ณด์ž„

 


3) Transfer Learning์ด ๋„์›€์ด ๋˜๋Š” ์ด์œ 

  • ๋ฏธ๋ฆฌ pre-trained๋œ ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜๋ฅผ ์ด์šฉํ•˜๋ฉด low level์— ๋Œ€ํ•œ ์ง€์‹ ์Šต๋“ํ•˜๊ณ  ์‹œ์ž‘ํ•˜๋Š” ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ 
    • ์ด๋Ÿฌํ•œ ์žฅ์ ์„ ์ด์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ pre-trained ๋ชจ๋ธ์€ ๋ชจ๋“  task์— ์ ์šฉ๋  ์ˆ˜ ์žˆ๋Š” ๊ณตํ†ต์ ์ธ ํŠน์ง•์„ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด ์ข‹๋‹ค.

 


4) Transfer Learning์˜ ์ ‘๊ทผ ๋ฐฉ์‹

  • weight initialization
    • source task์— ์‚ฌ์šฉ๋œ ๊ตฌ์กฐ๋ฅผ target task๋ฅผ ์œ„ํ•œ ๋ชจ๋ธ์— ์ ์šฉ
      • ์ผ๋ถ€ layer๋ฅผ ์ถ”๊ฐ€/์ œ๊ฑฐ ๋“ฑ ์ˆ˜์ •ํ•˜๊ธฐ๋„ ํ•œ๋‹ค.
    • ์‚ฌ์ „ํ•™์Šตํ•œ ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜๋กœ ์ดˆ๊ธฐํ™”
      • target task ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•œ ๋ฐ์ดํ„ฐ๊ฐ€ ์ถฉ๋ถ„ํ•œ ๊ฒฝ์šฐ, ์ „์ฒด๋ฅผ fine-tuning
      • target task ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•œ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ถ€์กฑํ•œ ๊ฒฝ์šฐ, ์ผ๋ถ€ layer๋งŒ fine-tuning
      • * source task์— ๋Œ€ํ•ด ๊ฐ€์ค‘์น˜ ํ•™์Šต = pre-training
      • * pre-training๋œ weight๋ฅผ target task์— ๋Œ€ํ•ด ๊ฐฑ์‹  = fine-tuning
  • feature extraction
    • feature ์ถ”์ถœ์„ ์œ„ํ•ด์„œ source model ์‚ฌ์šฉ 
    • source task์— ์‚ฌ์šฉ๋œ ๊ตฌ์กฐ๋ฅผ target task๋ฅผ ์œ„ํ•œ ๋ชจ๋ธ์— ์ ์šฉํ•˜์ง€๋งŒ fine-tuning ์ˆ˜ํ–‰ํ•˜์ง€ ์•Š์Œ

 


5) Transfer Learning์„ ์‚ฌ์šฉํ•˜๋Š” ๊ธฐ์ค€

A์—์„œ์˜ ์ง€์‹์„ B์—๊ฒŒ๋กœ transferํ•˜๋Š” ์ƒํ™ฉ์ด๋ผ๋ฉด

  1. A task์™€ B task์˜ input์ด ๋™์ผํ•  ๋•Œ (ex. image, text, audio ๋“ฑ)
  2. A task ๋ฐ์ดํ„ฐ์˜ ์–‘ >> B task ๋ฐ์ดํ„ฐ์˜ ์–‘
  3. A์—์„œ์˜ low feature๊ฐ€ B ํ•™์Šต์— ๋„์›€์ด ๋˜๋Š” ๊ฒฝ์šฐ

 


6) Transfer Learning์˜ ๋‹ค์–‘ํ•œ ์ข…๋ฅ˜

  • same task (-> transductive transfer learning)
    • 1) domain adaptation : ์‚ฌ์ „์ž‘์—…๊ณผ ๋ชฉํ‘œ์ž‘์—… ๋™์ผํ•˜์ง€๋งŒ, ์˜์—ญ์ด ๋‹ค๋ฅธ ๊ฒฝ์šฐ
      • A. gradient reversal domain adaptation
        • task classifier์™€ domain classifier๊ฐ€ ์กด์žฌํ•˜๋Š”๋ฐ, domain classifier์˜ loss๋Š” reverseํ•˜์—ฌ ์ „๋‹ฌ
        • ์ฆ‰, domain classifier๋Š” loss๊ฐ€ maximize๋˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ
      • B. adversarial discriminative domain adaption
        • 1) feature extractor ๊ณ ์ • / domain classifier ํ•™์Šต (์ด ๋•Œ domain์— ๋Œ€ํ•œ real ์ •๋‹ต ์ด์šฉ)
        • 2) domain classifier ๊ณ ์ • / feature extractor์™€ task classifier ํ•™์Šต
          • ์ด ๋•Œ domain์— ๋Œ€ํ•œ ์˜ค๋‹ต์„ ์ฃผ๊ณ  feature extractor์˜ weight ํ•™์Šต (domain specificํ•œ ์˜์—ญ์ด ํ•™์Šต๋˜์ง€ ๋ชปํ•˜๋„๋ก)
    • 2) cross-lingual learning : ์‚ฌ์ „์ž‘์—…๊ณผ ๋ชฉํ‘œ์ž‘์—… ๋™์ผํ•˜์ง€๋งŒ, ์–ธ์–ด๊ฐ€ ๋‹ค๋ฅธ ๊ฒฝ์šฐ
  • different task (-> inductive transfer learning)
    • 1) multi-task learning : tasks learned simultaneously
      • ๊ด€๋ จ์žˆ๋Š” ์ž‘์—…๋“ค์˜ ํ‘œํ˜„์„ ๊ณต์œ ํ•˜์—ฌ ๋ชจ๋ธ์˜ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ
      • 1) hard parameter sharing
      • 2) soft parameter sharing
    • 2) sequential transfer learning : tasks learned sequentially
      • ์‚ฌ์ „์ž‘์—…๊ณผ ๋ชฉํ‘œ์ž‘์—…์ด ๋‹ค๋ฅด๊ณ  ๊ฐ ์ž‘์—…์— ๋Œ€ํ•ด์„œ ์ˆœ์ฐจ์ ์œผ๋กœ ํ•™์Šต์„ ์ˆ˜ํ–‰
