You’ll first learn about Computer Vision techniques by going through the Chapter 5 lab exercises:
5.1 Introduction to convnets R: html, Rmd; Python: html, ipynb
5.2 Training a convnet from scratch on a small dataset R: html, Rmd; Python: html, ipynb
The subsequent lab exercises run into the limits of using a CPU rather than a GPU, which is not available on taylor.bren.ucsb.edu. Here’s as far as I was able to get, for demonstration’s sake, but you’re not expected to run this. You might want to try it if you have a personal computer with a GPU setup.
The main lab that you’ll turn in applies these techniques to a small subset of the iNaturalist species imagery. These data were downloaded from the links provided at github.com/visipedia/inat_comp:2021/. The full dataset covers 10,000 species, each with many images spread across the training (Train), training mini (Train Mini), validation (Val) and test sets. You’ll draw only from the Train Mini set of images.
The first step is to move the images into directories organized for each of the models. The keras::flow_images_from_directory() function expects its first argument, directory, to “contain one subdirectory per class”. We are building models for two species (spp2, binary) and ten species (spp10, multiclass), and we want train (n=30), validation (n=10) and test (n=10) images assigned to each species. So we want a directory structure that looks something like this:
├── spp10
│   ├── test
│   │   ├── 01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata
│   │   │   ├── cfd17d74-c7aa-49a2-9417-0a4e6aa4170d.jpg
│   │   │   ├── d6c2cf8f-89ef-40a2-824b-f51c85be030b.jpg
│   │   │   └── ...[+n_img=8]
│   │   ├── 06033_Plantae_Tracheophyta_Liliopsida_Asparagales_Orchidaceae_Epipactis_atrorubens
│   │   │   └── ...[n_img=10]
│   │   └── ...[+n_spp=8]
│   ├── train
│   │   ├── 01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata
│   │   │   └── ...[n_img=30]
│   │   └── ...[+n_spp=9]
│   └── validation
│       ├── 01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata
│       │   └── ...[n_img=10]
│       └── ...[+n_spp=9]
└── spp2
    ├── test
    │   └── ...[n_spp=2]
    ├── train
    │   └── ...[n_spp=2]
    └── validation
        └── ...[n_spp=2]
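Once the images are copied, one way to confirm the layout is to count the images in each class folder. Below is a minimal sketch in Python (the labs are also offered as Python notebooks), using only the standard library. The function name count_images and the throwaway demo tree with placeholder species sp_a and sp_b are assumptions for illustration; in practice you would point it at your own destination folder.

```python
import tempfile
from pathlib import Path

def count_images(dir_root):
    """Return {(set, subset, species): n_jpg} for a tree laid out as
    <dir_root>/<set>/<subset>/<species>/<image>.jpg"""
    counts = {}
    for img in Path(dir_root).glob("*/*/*/*.jpg"):
        key = img.relative_to(dir_root).parts[:3]
        counts[key] = counts.get(key, 0) + 1
    return counts

# Expected number of images per species in each subset (from the lab text).
expected = {"train": 30, "validation": 10, "test": 10}

# Build a throwaway demo tree; sp_a and sp_b are placeholder species names.
demo_root = Path(tempfile.mkdtemp())
for subset, n in expected.items():
    for sp in ["sp_a", "sp_b"]:
        d = demo_root / "spp2" / subset / sp
        d.mkdir(parents=True)
        for i in range(n):
            (d / f"img_{i:02d}.jpg").touch()

counts = count_images(demo_root)

# Any class folder whose count differs from the expected subset size.
bad = {k: n for k, n in counts.items() if n != expected[k[1]]}
print(len(counts), "class folders;", len(bad), "with unexpected counts")
```

An empty `bad` dictionary means every class folder holds the number of images its subset calls for.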
librarian::shelf(
  digest, dplyr, DT, glue, purrr, readr, stringr, tidyr)
# path to folder containing species directories of images
dir_src <- "/courses/EDS232/inaturalist-2021/train_mini"
dir_dest <- "~/inat"
dir.create(dir_dest, showWarnings = FALSE)
# get list of directories, one per species (n = 10,000 species)
dirs_spp <- list.dirs(dir_src, recursive = FALSE, full.names = TRUE)
n_spp <- length(dirs_spp)
# set seed (for reproducible results)
# just before sampling (otherwise get different results)
# based on your username (unique amongst class)
Sys.info()[["user"]] %>%
  digest::digest2int() %>%
  set.seed()
i10 <- sample(1:n_spp, 10)
# show the 10 indices sampled of the 10,000 possible
i10
[1] 6034 6274 7031 6473 1173 8343 7846 6248 7246 7898
# show the 10 species directory names
basename(dirs_spp)[i10]
[1] "06033_Plantae_Tracheophyta_Liliopsida_Asparagales_Orchidaceae_Epipactis_atrorubens"
[2] "06273_Plantae_Tracheophyta_Liliopsida_Poales_Cyperaceae_Carex_vulpinoidea"
[3] "07030_Plantae_Tracheophyta_Magnoliopsida_Asterales_Asteraceae_Symphyotrichum_sericeum"
[4] "06472_Plantae_Tracheophyta_Lycopodiopsida_Lycopodiales_Lycopodiaceae_Lycopodium_volubile"
[5] "01172_Animalia_Arthropoda_Insecta_Lepidoptera_Geometridae_Circopetes_obtusata"
[6] "08342_Plantae_Tracheophyta_Magnoliopsida_Gentianales_Rubiaceae_Galium_odoratum"
[7] "07845_Plantae_Tracheophyta_Magnoliopsida_Ericales_Primulaceae_Bonellia_macrocarpa"
[8] "06247_Plantae_Tracheophyta_Liliopsida_Poales_Cyperaceae_Carex_intumescens"
[9] "07245_Plantae_Tracheophyta_Magnoliopsida_Brassicales_Brassicaceae_Diplotaxis_tenuifolia"
[10] "07897_Plantae_Tracheophyta_Magnoliopsida_Fabales_Fabaceae_Acacia_saligna"
# show the first 2 species directory names
i2 <- i10[1:2]
basename(dirs_spp)[i2]
[1] "06033_Plantae_Tracheophyta_Liliopsida_Asparagales_Orchidaceae_Epipactis_atrorubens"
[2] "06273_Plantae_Tracheophyta_Liliopsida_Poales_Cyperaceae_Carex_vulpinoidea"
# setup data frame with source (src) and destination (dest) paths to images
d <- tibble(
  set     = c(rep("spp2", 2), rep("spp10", 10)),
  dir_sp  = c(dirs_spp[i2], dirs_spp[i10]),
  tbl_img = map(dir_sp, function(dir_sp){
    tibble(
      src_img = list.files(dir_sp, full.names = TRUE),
      subset  = c(rep("train", 30), rep("validation", 10), rep("test", 10))) })) %>%
  unnest(tbl_img) %>%
  mutate(
    sp       = basename(dir_sp),
    img      = basename(src_img),
    dest_img = glue("{dir_dest}/{set}/{subset}/{sp}/{img}"))
# show source and destination for first 10 rows of tibble
d %>%
  select(src_img, dest_img)
# A tibble: 600 × 2
src_img dest_img
<chr> <glue>
1 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
2 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
3 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
4 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
5 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
6 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
7 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
8 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
9 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
10 /courses/EDS232/inaturalist-2021… ~/inat/spp2/train/06033_Plantae_…
# … with 590 more rows
# iterate over rows, creating directory if needed and copying files
d %>%
  pwalk(function(src_img, dest_img, ...){
    dir.create(dirname(dest_img), recursive = TRUE, showWarnings = FALSE)
    file.copy(src_img, dest_img) })
# uncomment to show the entire tree of your destination directory
# system(glue("tree {dir_dest}"))
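If you are working in the Python notebooks instead, the same idea (a username-based seed, a sample of 10 species, and a 30/10/10 split in listing order) can be sketched with the standard library. This is an illustration under stated assumptions, not a port of the R code: seed_from_user and assign_subsets are hypothetical helper names, "student1" is a placeholder username, and the SHA-256 based seed will not reproduce the indices that digest::digest2int() gives in R.

```python
import hashlib
import random

def seed_from_user(user):
    """Map a username to a stable integer seed.
    Not the same value as R's digest::digest2int(), so the sampled
    indices will differ from those drawn in R."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def assign_subsets(images, n_train=30, n_val=10, n_test=10):
    """Label one species' images in listing order: train, validation, test."""
    if len(images) != n_train + n_val + n_test:
        raise ValueError(f"expected {n_train + n_val + n_test} images, got {len(images)}")
    labels = ["train"] * n_train + ["validation"] * n_val + ["test"] * n_test
    return list(zip(images, labels))

n_spp = 10_000                                   # species folders in Train Mini
rng = random.Random(seed_from_user("student1"))  # placeholder username
i10 = rng.sample(range(n_spp), 10)               # 0-based indices of 10 species
i2 = i10[:2]                                     # first 2 of those for the binary models

# Placeholder file names standing in for one species folder listing.
images = [f"img_{i:02d}.jpg" for i in range(50)]
rows = assign_subsets(images)
print(i2, rows[0], rows[30], rows[40])
```

The explicit length check mirrors an assumption the R code makes implicitly: each species folder must hold exactly 50 images for the 30/10/10 labels to line up.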
Your task is to apply your deep learning skills to build the following models:
2 Species (binary classification) - neural net. Draw from 3.4 🍿 Movies (binary classification). You’ll need to pre-process the images to be a consistent shape first though – see 5.2.4 Data preprocessing.
2 Species (binary classification) - convolutional neural net. Draw from the dogs vs cats example.
10 Species (multi-class classification) - neural net. Draw from 3.5 📰 Newswires (multi-class classification).
10 Species (multi-class classification) - convolutional neural net. Draw from the dogs vs cats example and update the necessary values to go from binary to multi-class classification.
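Going from binary to multi-class typically means changing the output layer from 1 unit with a sigmoid activation to 10 units with a softmax activation. In Keras the loss then changes from binary_crossentropy to categorical_crossentropy, and the image generator's class_mode from "binary" to "categorical". The sketch below is plain Python rather than Keras, with hand-written helper functions, and only illustrates what the two output types compute. It also shows the loss an uninformative model would give, a handy reference when reading your history plots.

```python
import math

def sigmoid(z):
    """Single output unit: probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """One output unit per class: probabilities that sum to 1."""
    m = max(zs)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def binary_crossentropy(y_true, p):
    """Loss for a 0/1 label and one predicted probability."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def categorical_crossentropy(y_onehot, probs):
    """Loss for a one-hot label and a vector of class probabilities."""
    return -sum(y * math.log(p) for y, p in zip(y_onehot, probs))

# Binary (spp2): a zero logit gives p = 0.5, i.e. no information.
p = sigmoid(0.0)
loss2 = binary_crossentropy(1, p)            # equals log(2)

# Multi-class (spp10): all-zero logits give a uniform 1/10 per class.
probs = softmax([0.0] * 10)
y = [0] * 10
y[3] = 1                                     # arbitrary true class for illustration
loss10 = categorical_crossentropy(y, probs)  # equals log(10)

print(round(loss2, 3), round(loss10, 3))
```

A validation loss that hovers near log(2) for the 2-species models, or near log(10) for the 10-species models, suggests the network has not learned much beyond guessing.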
In your models, be sure to include the following:
Split the original images per species (n=50) into train (n=30), validate (n=10) and test (n=10). These are almost absurdly few files to feed into these complex deep learning models but will serve as a good learning example.
Include the accuracy metric and validation data in the fitting process and history plot.
Evaluate loss and accuracy of each model on the test images. Compare standard neural network and convolutional neural network results.
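When comparing test accuracies, it helps to keep two reference points in mind: the accuracy expected from guessing, and how coarse the accuracy scale is with so few test images. The following Python sketch uses a deliberately synthetic example (a predictor that always answers class 0, not a real model result) with hypothetical helper names accuracy and chance_accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def chance_accuracy(n_classes):
    """Expected accuracy of random guessing with balanced classes."""
    return 1.0 / n_classes

n_test_per_species = 10

# spp2 test set: 10 images of each of 2 species (labels 0 and 1).
y_true = [0] * n_test_per_species + [1] * n_test_per_species

# Synthetic predictor that always answers class 0.
y_pred_constant = [0] * len(y_true)

acc = accuracy(y_true, y_pred_constant)   # half the labels are 0
resolution = 1 / len(y_true)              # one test image moves accuracy by this much

print(acc, chance_accuracy(2), chance_accuracy(10), resolution)
```

With 20 test images for the 2-species models, each image shifts accuracy by 5 percentage points, so small differences between the neural net and the convolutional neural net should be read with caution.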
Parameterizing all the image processing and layers can be confusing. Here are a few cheat sheets to help with a deeper understanding:
To submit Lab 4, please submit the path (/Users/*) on taylor.bren.ucsb.edu to your iNaturalist R Markdown (*.Rmd) or Jupyter Notebook (*.ipynb) file here: