Lab 4c. Deep Learning - iNaturalist

1 Deep Learning with R / Python Exercises

You’ll first learn about Computer Vision techniques by going through the Chapter 5 lab exercises:

The subsequent lab exercises run into the limits of using a CPU rather than a GPU, and a GPU is not available on taylor.bren.ucsb.edu. Here’s as far as I was able to get, for demonstration’s sake, but you’re not expected to run this. You might want to try it if you have a personal computer with a GPU setup.

2 iNaturalist

The main lab that you’ll turn in is to apply these techniques to a small subset of the iNaturalist species imagery. These data were downloaded from the links provided at github.com/visipedia/inat_comp:2021/. The full dataset covers 10,000 species, each with many images, split into training (Train), training mini (Train Mini), validation (Val) and test sets; you’ll draw only from the Train Mini set of images:

The first step is to move the images into directories for the different models. The keras::flow_images_from_directory() function expects its first argument, directory, to “contain one subdirectory per class”. We are building models for two species, spp2 (binary), and ten species, spp10 (multiclass), and we want train (n=30), validation (n=10) and test (n=10) images assigned to each species. So we want a directory structure that looks something like this:

├── spp10
│   ├── test
│   │   ├── 01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata
│   │   │   ├── cfd17d74-c7aa-49a2-9417-0a4e6aa4170d.jpg
│   │   │   ├── d6c2cf8f-89ef-40a2-824b-f51c85be030b.jpg
│   │   │   └── ...[+n_img=8]
│   │   ├── 06033_Plantae_Tracheophyta_Liliopsida_Asparagales_Orchidaceae_Epipactis_atrorubens
│   │   │   └── ...[n_img=10]
│   │   └── ...[+n_spp=8]
│   ├── train
│   │   ├── 01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata
│   │   │   └── ...[n_img=30]
│   │   └── ...[+n_spp=9]
│   └── validation
│       ├── 01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata
│       │   └── ...[n_img=10]
│       └── ...[+n_spp=9]
└── spp2
    ├── test
    │   └── ...[n_spp=2]
    ├── train
    │   └── ...[n_spp=2]
    └── validation
        └── ...[n_spp=2]
librarian::shelf(
  digest, dplyr, DT, glue, purrr, readr, stringr, tidyr)

# path to folder containing species directories of images
dir_src  <- "/courses/EDS232/inaturalist-2021/train_mini"
dir_dest <- "~/inat"
dir.create(dir_dest, showWarnings = FALSE)

# get list of directories, one per species (n = 10,000 species)
dirs_spp <- list.dirs(dir_src, recursive = FALSE, full.names = TRUE)
n_spp <- length(dirs_spp)

# set seed (for reproducible results) 
# just before sampling (otherwise get different results)
# based on your username (unique amongst class)
Sys.info()[["user"]] %>% 
  digest::digest2int() %>% 
  set.seed()
i10 <- sample(1:n_spp, 10)

# show the 10 indices sampled of the 10,000 possible 
i10
 [1] 6034 6274 7031 6473 1173 8343 7846 6248 7246 7898
# show the 10 species directory names
basename(dirs_spp)[i10]
 [1] "06033_Plantae_Tracheophyta_Liliopsida_Asparagales_Orchidaceae_Epipactis_atrorubens"      
 [2] "06273_Plantae_Tracheophyta_Liliopsida_Poales_Cyperaceae_Carex_vulpinoidea"               
 [3] "07030_Plantae_Tracheophyta_Magnoliopsida_Asterales_Asteraceae_Symphyotrichum_sericeum"   
 [4] "06472_Plantae_Tracheophyta_Lycopodiopsida_Lycopodiales_Lycopodiaceae_Lycopodium_volubile"
 [5] "01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata"           
 [6] "08342_Plantae_Tracheophyta_Magnoliopsida_Gentianales_Rubiaceae_Galium_odoratum"          
 [7] "07845_Plantae_Tracheophyta_Magnoliopsida_Ericales_Primulaceae_Bonellia_macrocarpa"       
 [8] "06247_Plantae_Tracheophyta_Liliopsida_Poales_Cyperaceae_Carex_intumescens"               
 [9] "07245_Plantae_Tracheophyta_Magnoliopsida_Brassicales_Brassicaceae_Diplotaxis_tenuifolia" 
[10] "07897_Plantae_Tracheophyta_Magnoliopsida_Fabales_Fabaceae_Acacia_saligna"                
# show the first 2 species directory names
i2 <- i10[1:2]
basename(dirs_spp)[i2]
[1] "06033_Plantae_Tracheophyta_Liliopsida_Asparagales_Orchidaceae_Epipactis_atrorubens"
[2] "06273_Plantae_Tracheophyta_Liliopsida_Poales_Cyperaceae_Carex_vulpinoidea"         
# setup data frame with source (src) and destination (dest) paths to images
d <- tibble(
  set     = c(rep("spp2", 2), rep("spp10", 10)),
  dir_sp  = c(dirs_spp[i2], dirs_spp[i10]),
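  tbl_img = map(dir_sp, function(dir_sp){
    # assumes exactly 50 images per species directory (30 + 10 + 10),
    # as in Train Mini; otherwise the two columns differ in length
    tibble(
      src_img = list.files(dir_sp, full.names = TRUE),
      subset  = c(rep("train", 30), rep("validation", 10), rep("test", 10))) })) %>% 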
  unnest(tbl_img) %>% 
  mutate(
    sp       = basename(dir_sp),
    img      = basename(src_img),
    dest_img = glue("{dir_dest}/{set}/{subset}/{sp}/{img}"))

# show source and destination for first 10 rows of tibble
d %>% 
  select(src_img, dest_img)
# A tibble: 600 × 2
   src_img                           dest_img                         
   <chr>                             <glue>                           
 1 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
 2 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
 3 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
 4 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
 5 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
 6 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
 7 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
 8 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
 9 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
10 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
# … with 590 more rows
# iterate over rows, creating directory if needed and copying files 
d %>% 
  pwalk(function(src_img, dest_img, ...){
    dir.create(dirname(dest_img), recursive = TRUE, showWarnings = FALSE)
    file.copy(src_img, dest_img) })

# uncomment to show the entire tree of your destination directory
# system(glue("tree {dir_dest}"))
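
Before modelling, it can help to confirm that the copy produced the expected number of images in each folder. The following is a minimal sketch in base R and is not part of the original lab code: count_images() is a hypothetical helper, and it is demonstrated here on a small throwaway directory of empty placeholder files so that it runs anywhere. Pointing it at dir_dest instead should report 30, 10 and 10 images per species for train, validation and test, assuming the copy above completed.

```r
# hypothetical helper: count image files per set / subset / species
# assumes paths of the form <dir>/<set>/<subset>/<species>/<image>
count_images <- function(dir) {
  files <- list.files(dir, recursive = TRUE)
  parts <- strsplit(files, "/", fixed = TRUE)
  # keep only files nested exactly 4 levels deep (set/subset/species/image)
  parts <- parts[lengths(parts) == 4]
  if (length(parts) == 0) {
    return(data.frame(set = character(0), subset = character(0),
                      species = character(0), n_img = integer(0)))
  }
  d <- data.frame(
    set     = vapply(parts, `[`, character(1), 1),
    subset  = vapply(parts, `[`, character(1), 2),
    species = vapply(parts, `[`, character(1), 3),
    n_img   = 1L)
  # sum the per-file 1s within each set / subset / species group
  counts <- aggregate(n_img ~ set + subset + species, data = d, FUN = sum)
  counts[order(counts$set, counts$subset, counts$species), ]
}

# demo on a throwaway directory with empty placeholder "images" (not real data)
dir_demo <- file.path(tempdir(), "inat_demo")
unlink(dir_demo, recursive = TRUE)
n_per_part <- c(train = 3, validation = 1, test = 1)
for (part in names(n_per_part)) {
  for (sp in c("sp_a", "sp_b")) {
    dir_sp <- file.path(dir_demo, "spp2", part, sp)
    dir.create(dir_sp, recursive = TRUE)
    file.create(file.path(dir_sp, sprintf("img_%02d.jpg", seq_len(n_per_part[[part]]))))
  }
}

counts_demo <- count_images(dir_demo)
print(counts_demo)
```

To check your own images, call count_images(path.expand(dir_dest)) and look for any row whose n_img differs from the expected count.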

Your task is to apply your deep learning skills to build the following models:

  1. 2 Species (binary classification) - neural net. Draw from 3.4 🍿 Movies (binary classification). You’ll need to pre-process the images to be a consistent shape first though – see 5.2.4 Data preprocessing.

  2. 2 Species (binary classification) - convolutional neural net. Draw from the dogs vs cats example.

  3. 10 Species (multi-class classification) - neural net. Draw from 3.5 📰 Newswires (multi-class classification).

  4. 10 Species (multi-class classification) - convolutional neural net. Draw from the dogs vs cats example and update the necessary values to go from binary to multi-class classification.
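
The four models differ mainly in how the images are fed in and in the output layer. As a rough guide, the sketch below tabulates the usual Keras choices for binary versus multi-class classification, along with the input length a densely connected net sees once an image is flattened. model_settings() is a hypothetical helper, and the 150 x 150 pixel target size with 3 colour channels is an assumption rather than a requirement of the lab; the strings are the standard Keras names for the output activation, the loss, and the class_mode argument of keras::flow_images_from_directory().

```r
# hypothetical helper: typical Keras settings by number of classes
# (target_size and channels are assumed defaults, not lab requirements)
model_settings <- function(n_classes, target_size = c(150, 150), channels = 3) {
  stopifnot(n_classes >= 2)
  binary <- n_classes == 2
  list(
    # length of one image after flattening height x width x channels
    flat_input_length = prod(target_size) * channels,
    # one sigmoid unit for binary; one softmax unit per class otherwise
    output_units      = if (binary) 1 else n_classes,
    output_activation = if (binary) "sigmoid" else "softmax",
    loss              = if (binary) "binary_crossentropy" else "categorical_crossentropy",
    # class_mode argument of keras::flow_images_from_directory()
    class_mode        = if (binary) "binary" else "categorical")
}

settings_spp2  <- model_settings(2)
settings_spp10 <- model_settings(10)
str(settings_spp2)
str(settings_spp10)
```

The neural net models (1 and 3) would flatten each image to flat_input_length values, whereas the convolutional models (2 and 4) keep the height x width x channels shape as input.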

In your models, be sure to include the following:

3 Deep Learning Cheat Sheets

Parameterizing all the image processing and layers can be confusing. Here are a few cheat sheets to help with a deeper understanding:

4 Submit Lab 4

To submit Lab 4, please submit the path (/Users/*) on taylor.bren.ucsb.edu to your iNaturalist Rmarkdown (*.Rmd) or Jupyter Notebook (*.ipynb) file here: