Building a simple statistical model

Geog 315 T2 2023

Overview

The main focus of this week’s assignment is building a simple statistical (regression) model to help us understand a spatial pattern.

We will also look at the RMarkdown file format, which you can use to present results from analysis in this or other assignments. This is not required, but it can be a very effective and convenient way of presenting analysis results.

Because the lab material is presented in RMarkdown format in .Rmd files, we will consider that first, before going on to the assignment proper.

This means that the procedure this week is slightly different. You should download all the materials for this week’s lab from this link. You should then uncompress them to a folder on your computer, and set up a project in that folder as usual. Then inside the project, open up the file rmarkdown.Rmd.

That file explains itself, and you should spend a little bit of time with it to get used to the idea of RMarkdown.

When you are done, you can come back to these lab instructions in the usual way, or you can follow the instructions in the 07-lab-instructions.Rmd file instead (the instructions are the same in that document as in this one).

Building a simple statistical model

In this assignment you will build a simple regression model of the Airbnb listings in and around Wellington that we assembled a few weeks ago. The model will aim to account for variation in the number of listings in terms of the age structure of the population (from the census) and the numbers of various ‘amenities’ such as cafés, retail, and so on.
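To give a rough idea of what 'accounting for variation' looks like in R, here is a minimal sketch, assuming the model is an ordinary linear regression fitted with lm(). The data frame toy and its columns n_listings and n_cafes are made up for illustration and are not part of the lab data; a15_29 only borrows the name of one of the census attributes described later.

```r
# Toy data only: made-up values for six hypothetical areas
toy <- data.frame(
  n_listings = c(2, 5, 9, 14, 20, 31),     # hypothetical number of listings
  a15_29     = c(12, 15, 19, 24, 30, 38),  # % of population aged 15 to 29
  n_cafes    = c(0, 1, 1, 3, 4, 8)         # hypothetical number of cafés
)

# Fit a linear regression: listings as a function of age structure and amenities
m <- lm(n_listings ~ a15_29 + n_cafes, data = toy)

# Estimated coefficients (intercept plus one slope per predictor)
coef(m)

# Proportion of the variation in listings that the model accounts for
summary(m)$r.squared
```

The formula n_listings ~ a15_29 + n_cafes reads as 'listings modelled by these two variables'. The real model in this assignment will use the Wellington data rather than these toy numbers.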

Libraries

Before you start, as usual we need some libraries.

library(sf)
library(tmap)
library(dplyr)
tmap_mode("view")

The data

Provided you have unpacked this week’s materials to an accessible folder and opened a .Rproj file in the usual way, you should find the datasets by simply running the commands shown below. If that doesn’t seem to work, then download the data from the links provided in the section below. You should find all the data in a subfolder called data, if you unpacked the zip file correctly.

Base data

The base data are in this file. Open them with st_read:

welly <- st_read("data/wellington-base-data.gpkg")

and take a look with a plot command:

plot(welly, lwd = 0.5, pal = RColorBrewer::brewer.pal(9, "Reds"))

I’ve used the pal option here to get a nicer colour palette than the plot command default.

If you really want to get a feel for the distribution of different variables, then you should make some tmap maps of individual attributes. If you make web maps with tmap_mode("view") you will also be able to get a closer look at things.
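As a starting point, a map of a single attribute might look something like the sketch below. It assumes the tmap version 3 syntax (in tmap 4 the variable is given as fill = "pop" rather than col = "pop"), and the palette and legend title are just my choices, so change them as you see fit.

```r
library(sf)
library(tmap)

# Read the same base data file as above
welly <- st_read("data/wellington-base-data.gpkg")

# Map one attribute at a time; swap "pop" for any other attribute name.
# tmap version 3 syntax is assumed here (col = names the variable to map).
pop_map <- tm_shape(welly) +
  tm_polygons(col = "pop", palette = "Reds", title = "Population (2018)")

# Printing the map object draws it, as a web map if tmap_mode("view") is set
pop_map
```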

The attributes in this base dataset are:

| Attribute | Description |
|-----------|-------------|
| sa2_id | Statistical Area 2 (SA2) ID |
| sa2_name | Statistical Area 2 name |
| ta_name | Territorial Authority name, limited to just Wellington City and Lower Hutt City (this is a smaller area than we originally looked at, to allow easier mapping) |
| pop | Total population of SA2 per the 2018 Census |
| u15 | % of population under 15 |
| a15_29 | % of population aged from 15 to 29 |
| a30_64 | % of population aged from 30 to 64 |
| o65 | % of population aged 65 and over |
| dist_cbd | Approximate distance in km to the CBD. This is measured in a straight line, not over the road network |
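Since the four age-group percentages cover all ages between them, they should add up to roughly 100 in every area, give or take some rounding. The sketch below shows one way you might check that. It uses a made-up data frame called toy so that it runs on its own; with the real data you could use sf::st_drop_geometry(welly) in its place. The 1 percentage point tolerance is an arbitrary choice of mine.

```r
library(dplyr)

# Toy rows standing in for the attribute table; all values are made up
toy <- data.frame(
  sa2_name = c("Area A", "Area B", "Area C"),
  u15      = c(18.2,  9.5, 10.0),
  a15_29   = c(21.3, 41.0, 20.0),
  a30_64   = c(45.1, 40.2, 30.0),
  o65      = c(15.4,  9.3, 30.0)
)

# Add up the age groups and keep only areas more than 1 point away from 100
age_check <- toy %>%
  mutate(age_total = u15 + a15_29 + a30_64 + o65) %>%
  filter(abs(age_total - 100) > 1)

# Any rows left here deserve a closer look (only Area C in this toy example)
age_check
```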

Airbnb locations

We have already seen this dataset, which records all the Airbnb locations across the wider region.

abb <- st_read("data/abb.gpkg")
tm_shape(abb) + 
  tm_dots()