Landscape Ecology

Jeffery B. Cannon

Introduction and goals

Typically, LiDAR datasets are much larger than 1 km2 we have been working with. Often, you may wish to analyze 10s or 100s of km2 at a time. Working with large LiDAR datasets presents unique challenges. Apart from data storage needs, merely processing the data on a typical desktop computer takes careful attention. In the last module, we covered how to reduce LiDAR datsets to speed up some analyses. After completing this module, you will be able to:

Acquire large dataset

To start working with a large dataset, you will need to download one! I recommend starting with Discrete Point Lidar data from NEON. You can find data from several sites and several years from the NEON data & download portal. We will be using the data from the JERC site, 2023 which is (at the time of this writing) in a provisional stage.

Note that if you are in my lab, we already have a curated neon dataset which is available on our network I: drive: I:\neon\2023\lidar\ClassifiedPointCloud\

NEON discrete point lidar data download site, with data availability for JERC site shown

Once you have downloaded the data, you want to start by indexing all the files. Using the writelax function on each file can be tedious to do for every file. You can make the process more efficient by:

  1. Writing a loop that runs the writelax command on every file, or
  2. Sse the landecoutils package which can loop automatically, and uses parallel processing to speed up the process!
# Path to your lidar directory
dir = "C:/JERC/ClassifiedPointCloud/"

# list of all the files that end ($) with .las or (|) .laz in that directory. Look into regular expressions in R. They are a really powerful way to match strings
lazfiles = list.files(dir, pattern = '.las$|.laz$', full.names=TRUE)

# Loop through all files and writelax for each one.
for(f in lazfiles) writelax(f)

Our lab is developing the landecoutils package to provide code for several functions we use often. It contains the check_for_lax function which can loop through an entire directory automatically and create index files automatically. It skips the ones that are already indexed, and it can use multiple computing cores to speed up the process. You will need to download the landecoutils package from GitHub.  Important Note: The landeocutils package uses a now defunct dependency, so you have to also install TreeLS and spanner packages. They will need to be installed first for landecoutils to successfully install.

# download the landecoutils library from github AND its dependencies spanner and TreeLS


# Path to your lidar directory
dir = "C:/JERC/ClassifiedPointCloud/"

# Specify how many cores to use 
check_for_lax(dir, n_cores = 10) 

Dealing with edge effects

The basic process for working on a large dataset, is breaking into smaller chunks, working on each chunk separately. As long as the analysis can be run on each chunk separately, then you can use multiple processing cores to work on different chunks of the dataset. At the end, you stitch all the chunks back together like a quilt to get a seamless product.

However, it is not usually possible to break a problem up seamlessly into chunks without creating odd effects along the edges. For example, creating a digital elevation model for two adjacent chunks might result in a noticeable ‘seam’ where the chunks connect. Or if creating a stem map, what happens when a tree crown spills over the edge of a tile into another? Might it be counted twice? or not at all?

You will need to think carefully about the process of interest and decide how best to deal with possible edge effects. The typical solution is to slightly expand the area of each chunk. This buffer contains a little bit of information to help with the edge effects, and the final product can be clipped back down to size.

Setting chunk and parallel processing options

# Path to your lidar directory
dir = "C:/JERC/ClassifiedPointCloud/"


# Load the catalog object
ctg = readLAScatalog(dir) # may take a second.... 
class       : LAScatalog (v1.4 format 7)
extent      : 730957.3, 751916.4, 3449742, 3468765 (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 16N 
area        : 301.17 km²
points      : 5.02 billion points
density     : 16.7 points/m²
density     : 11.1 pulses/m²
num. files  : 337

Plot of LAScatalog object of the 2023 NEON JERC lidar data. Each lavender tile represents the extent of a single .laz file.

This data covers a very large area, over 300 km2 and contains over 5 billion lidar returns. Each tile represents a single .laz file as it is stored on your disk. The tiles allow us to access a small portion of the data at once.

Important Note: The LAScatalog object does not actually load the data into R. instead it just represents a map of where all the data is located. It is unlikely you can load all of the data at once anyway. So next we will define how we want to divide the data into chunks.

The data is already in 1000 × 1000 m files. But that does not mean we necessarily have to stick with these pre-defined boundaries. The LAScatalog object gives us a very convient way to define chunks however we like. We can also set a buffer in the way we would like, and we can even set filters like the ones you learned about in the last module. That way, when data is read in, you only load the data you need.

You can set options with the opt_filteropt_chunksize, and opt_chunk_buffer options.

# Let's make the chunks bigger!
opt_chunk_size(ctg) = 2000

# give each chunk a 10 m buffer 
opt_chunk_buffer(ctg) = 10 

# Choose the origin of where your chunks are drawn
# i like them to snap to round numbers, so start at (0,0).
opt_chunk_alignment(ctg) = c(0,0)

# let's thin data down to 0.5 m voxel
opt_filter(ctg) = '-thin_with_voxel 0.5'

# Now view your chunk pattern
plot(ctg, chunk_pattern = TRUE)

In the image below, the lavender squares still show how the original data is divided into tiles. However, the new larger black squares show the analytical chunks that you have defined. Here are some things to note:

LAScatalog object plotted with the chunk_pattern shown

  1. When we process data, the lidR package will start with the bottom-left chunk, first, and work rightward, and then upward.
  2. Each time it works on a chunk, it will automatically load the correct bits of each file for that chunk.
  3. It can quickly load just the portions you need for that chunk because you have already indexed the file!
  4. The hard to see red lines are actually slightly larger than the black squares. In addition to the 2000 × 2000 m tiles, lidR will also automatically load an extra 10 m buffer on each side. So, the tile that gets loaded will actually be 2020 × 2020 m.

Using the code above, define several different chunk sizes, buffers, and alignments so you get a better feel for how the LAScatalog chunk options work.

Next steps

Now you have all the tools you need to tackle a really big analysis. In the next module, you will put a lot of what you have learned together and create a digital elevation model for the entire 300 km22 area! In the next module you will learn to:

Continue learning