Due: noon Wed, Jan 28 2015 via GauchoSpace
In this lab, you’ll explore physical and climatic agents that determine vegetation types. In lab 1, you qualitatively assessed that the evergreen land cover type associated with steep, northern exposures tend to recieve less sun and are therefore moister and cooler. Now we’ll quantiatively determine these physical-biotic associations using a more detailed land cover on vegetation available through USGS for the Santa Ynez area of Santa Barbara County with additional data layers prepared by Frank Davis per Davis & Dozier (1990). We’ll determine these associations by fitting a recursive partitioning model.
To get started:
Download *lab3_agents.7z** from GauchoSpace into your course home directory (eg H:\esm215
). Right-click on the file -> 7-Zip -> Extract Here. Navigate into the newly expanded lab3_agents
folder and double-click on the ArcMap document lab3.mxd
.
Download *lab3.Rmd** from GauchoSpace into the lab directory (ie H:\esm215\lab3_agents
). Right-click on it -> Open with -> RStudio. Replace the working directory with your own (be sure to replace slashes.).
geol A subregion of the 1:250,000 scale geologic map of CA. 30 m raster. Legend is available at http://ngmdb.usgs.gov/ngm-bin/ILView.pl?sid=329_1.sid&vtype=b
soil A subregion of the ****1:24,000 scale soil survey map (SSURGO, http://soils.usda.gov/survey/geography/ssurgo/description.html. 30 m raster data. SSURGO maps are the most detailed soil survey maps available for most of the
elev - 28m raster: Shuttle imaging radar topographic data. Values are elevations in meters above sea level.
slope****- 28 m raster: slope angle in degrees derived from synezdem
flow 28 m raster: flow accumulation model, derived from synezdem for a subregion corresponding to subsoil30. Pixel values are the drainage area for each pixel. (The data are noisy because errors in the dem propagate to disrupt drainage topology.)
sunw - 28 m the data raster: integrated clearsky shortwave radiation, units are watts/sq. m., for December-Feb.
suns - 28 m the data raster: annual ****clearsky shortwave radiation, units are watts/sq. m.
lc - 30 m raster: 2010 vegetation/land cover map produced from thematic mapper satellite imagery. Actual vegetation mapped by USFS/TNC to Alliances (dominant overstory species or community types) according to NatureServe/US vegetation standard based on Thematic mapper imagery and elevation data.
In lab spend some time learning to display the data. Overlay individual layers and combinations on the air photo. Zoom in and out. Play with the symbology.
In particular, examine apparent land cover pattern (air photo) and vegetation pattern (SBColfireveg) in relationship to geology, soils, fire history and topographic factors like elevation, slope, radiation and flow accumulation.
What controls vegetation pattern in the Santa Ynez Hills and Valley subsection? Landscape theory posits that pattern could vary from one landscape to another and reflect local physical controls, disturbance history and plant dispersal processes.
Various techniques exist to quantify the relationship between vegetation pattern and environmental factors at different scales. Here you will learn a method known as mutual information analysis. Basically, you will measure the statistical association between vegetation types and each of a set of environmental variables.
# load necessary libraries having useful functions, suppressing startup messages and warnings
suppressPackageStartupMessages(suppressWarnings({
library(foreign) # read.dbf
library(raster) # read rasters
library(rpart) # recursive partitioning
library(party) # recursive partytioning
library(partykit)
}))
# set your working directory
setwd('H:/esm215/lab3_agents')
# get lookup table matching hierarchical vegetative landcovers
lu = read.dbf('natgaplandcov_table.dbf')
# get stack of rasters
r = stack(
raster('rasters/lc.img'),
raster('rasters/elev.tif'),
raster('rasters/slope.tif'),
raster('rasters/aspect.tif'),
raster('rasters/suns.tif'),
raster('rasters/sunw.tif'),
raster('rasters/flow.tif'),
raster('rasters/geol.tif'))
# extract raster values into data frame
D = as.data.frame(getValues(r))
D$lc = factor(D$lc)
D$geol = factor(D$geol)
# add id
D = cbind(id=1:nrow(D), D)
# merge D with lookup table for landcovers
D = merge(D, lu, by.x='lc', by.y='VALUE', all.x=T)
# look at just the NLDC high level landcover for Forest & Woodland
nldc = 'Forest & Woodland'
table(subset(D, NVC_CLASS == nldc, lc))
##
## 39 40 41 42 45 165 183 277 278 282 296 297
## 22881 1121 64430 2057 3150 614 25 3319 6178 1 0 0
## 300 302 303 304 383 384 432 471 472 489 539 540
## 0 0 0 0 0 0 0 0 0 0 0 0
## 547 556 557 579 581 582 583 584
## 0 0 0 0 0 0 0 0
# fit model for recursive partitioning
mdl = rpart(lc ~ elev + slope + aspect + suns + sunw + flow,
method='class',
data=subset(D, NVC_CLASS == nldc))
# examine fitted model
print(mdl)
## n= 103776
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 103776 39350 41 (0.22 0.011 0.62 0.02 0.03 0.0059 0.00024 0.032 0.06 9.6e-06)
## 2) slope>=3.67 97330 34210 41 (0.23 0.0077 0.65 0.021 0.032 0.0059 0.00026 0.032 0.021 1e-05) *
## 3) slope< 3.67 6446 2297 278 (0.046 0.058 0.2 0.0028 0.0084 0.0057 0 0.033 0.64 0)
## 6) suns>=5.548e+05 1324 557 41 (0.15 0.046 0.58 0.0083 0.029 0.003 0 0.016 0.17 0) *
## 7) suns< 5.548e+05 5122 1192 278 (0.018 0.061 0.11 0.0014 0.0029 0.0064 0 0.038 0.77 0) *
summary(mdl)
## Call:
## rpart(formula = lc ~ elev + slope + aspect + suns + sunw + flow,
## data = subset(D, NVC_CLASS == nldc), method = "class")
## n= 103776
##
## CP nsplit rel error xerror xstd
## 1 0.07226 0 1.0000 1.0000 0.003972
## 2 0.01393 1 0.9277 0.9305 0.003912
## 3 0.01000 2 0.9138 0.9164 0.003898
##
## Variable importance
## slope suns elev sunw aspect
## 70 12 10 7 2
##
## Node number 1: 103776 observations, complexity param=0.07226
## predicted class=41 expected loss=0.3791 P(node) =1
## class counts: 22881 1121 64430 2057 3150 614 25 3319 6178 1
## probabilities: 0.220 0.011 0.621 0.020 0.030 0.006 0.000 0.032 0.060 0.000
## left son=2 (97330 obs) right son=3 (6446 obs)
## Primary splits:
## slope < 3.67 to the right, improve=3777.0, (0 missing)
## elev < 86.27 to the right, improve=3428.0, (0 missing)
## sunw < 154500 to the left, improve=1838.0, (0 missing)
## suns < 511100 to the left, improve=1600.0, (0 missing)
## aspect < 64.99 to the left, improve= 282.7, (0 missing)
## Surrogate splits:
## elev < 85.87 to the right, agree=0.945, adj=0.115, (0 split)
## aspect < -0.5 to the right, agree=0.940, adj=0.028, (0 split)
## flow < 2518 to the left, agree=0.938, adj=0.000, (0 split)
##
## Node number 2: 97330 observations
## predicted class=41 expected loss=0.3514 P(node) =0.9379
## class counts: 22585 750 63124 2039 3096 577 25 3104 2029 1
## probabilities: 0.232 0.008 0.649 0.021 0.032 0.006 0.000 0.032 0.021 0.000
##
## Node number 3: 6446 observations, complexity param=0.01393
## predicted class=278 expected loss=0.3563 P(node) =0.06211
## class counts: 296 371 1306 18 54 37 0 215 4149 0
## probabilities: 0.046 0.058 0.203 0.003 0.008 0.006 0.000 0.033 0.644 0.000
## left son=6 (1324 obs) right son=7 (5122 obs)
## Primary splits:
## suns < 554800 to the right, improve=638.0, (0 missing)
## sunw < 177800 to the right, improve=593.0, (0 missing)
## elev < 86 to the right, improve=578.0, (0 missing)
## slope < 1.958 to the right, improve=185.6, (0 missing)
## aspect < 46.47 to the right, improve=105.6, (0 missing)
## Surrogate splits:
## sunw < 179900 to the right, agree=0.911, adj=0.566, (0 split)
## elev < 269.4 to the right, agree=0.830, adj=0.170, (0 split)
##
## Node number 6: 1324 observations
## predicted class=41 expected loss=0.4207 P(node) =0.01276
## class counts: 202 61 767 11 39 4 0 21 219 0
## probabilities: 0.153 0.046 0.579 0.008 0.029 0.003 0.000 0.016 0.165 0.000
##
## Node number 7: 5122 observations
## predicted class=278 expected loss=0.2327 P(node) =0.04936
## class counts: 94 310 539 7 15 33 0 194 3930 0
## probabilities: 0.018 0.061 0.105 0.001 0.003 0.006 0.000 0.038 0.767 0.000
# plot the model using the pretty partytioning class
mdl_p = as.party(mdl)
print(mdl_p)
##
## Model formula:
## lc ~ elev + slope + aspect + suns + sunw + flow
##
## Fitted party:
## [1] root
## | [2] slope >= 3.67: 41 (n = 97330, err = 35%)
## | [3] slope < 3.67
## | | [4] suns >= 554759.22: 41 (n = 1324, err = 42%)
## | | [5] suns < 554759.22: 278 (n = 5122, err = 23%)
##
## Number of inner nodes: 2
## Number of terminal nodes: 3
plot(mdl_p)
# predict landcover classes
D_nldc = subset(D, D$NVC_CLASS == nldc)
D_nldc$lc_pred = predict(mdl, D_nldc, type='vector')
D_nldc$lc_pred = levels(D_nldc$lc)[D_nldc$lc_pred]
# examine matches
table(D_nldc[,c('lc','lc_pred')])
## lc_pred
## lc 278 41
## 39 94 22787
## 40 310 811
## 41 539 63891
## 42 7 2050
## 45 15 3135
## 165 33 581
## 183 0 25
## 277 194 3125
## 278 3930 2248
## 282 0 1
## 296 0 0
## 297 0 0
## 300 0 0
## 302 0 0
## 303 0 0
## 304 0 0
## 383 0 0
## 384 0 0
## 432 0 0
## 471 0 0
## 472 0 0
## 489 0 0
## 539 0 0
## 540 0 0
## 547 0 0
## 556 0 0
## 557 0 0
## 579 0 0
## 581 0 0
## 582 0 0
## 583 0 0
## 584 0 0
# turn into raster
D_nldc$lc_pred = as.integer(as.character(D_nldc$lc_pred))
#D = merge(D, D_nldc[,c('id','lc_pred')], by.x='id', by.y='id', all.x=T)
#...