catalog_apply: Apply a user-defined function to an entire catalog in a continuous way

Description

This function enables application of a user-defined routine over an entire catalog using a multi-core process. When a user has a dataset organized into several files, it applies the user-defined function to the entire catalog by automatically splitting it into several clusters. The clustering pattern can be either split into a set of squared areas or split by file. The clustering pattern can be modified using the global catalog options with catalog_options. The "Examples" section describes the procedure for applying functions to the catalog, beginning with data loading (see example). Warning: there is a mechanism to load buffered data and to avoid edge artifacts, but no mechanism to remove the buffer after applying user-defined functions, since this task is very specific to each process. See section "Edge artifacts". lidR supports .lax files. Computation speed will be signifcantly improved with a spatial index.

Usage

catalog_apply(ctg, func, func_args = NULL, ...)

Arguments

ctg

A LAScatalog object.

func

A user-defined function for which the first input is a LAS object.

func_args

A list of extra arguments to pass in the function 'func'.

...

Any argument available in readLAS to reduce the amount of data loaded.

Edge artifacts

It is very important to take precautions to avoid 'edge artifacts' when processing LiDAR tiles. If the points from neighboring tiles are not included during certain processes, this could create 'edge artifacts' at the tile boundaries. For example, empty or incomplete pixels in a rasterization process. The lidR package provides internal tools to load buffered data. However, there is no mechanism to remove the results computed in the buffered area since this task depends on the output of the user-defined function. Therefore, depending on the metric being computed, some output results could appear several times. The LAS object received by the user-defined function has a special column called 'buffer_side' which indicates, for each point, if it comes from a buffered area or not. Points from non-buffered areas have a 'buffer_side' value of zero, while points from buffered areas have a 'buffer_side' value of 1, 2, 3 or 4, where 1 is the bottom buffer and 2, 3 and 4 are the left, top and right buffers, respectively (see example).

Examples

Run this code

# NOT RUN {
# Visit http://jean-romain.github.io/lidR/wiki for an illustrated and commented
# version of this example.
# This is a dummy example. It is more efficient to load the entire file than
# splitting it into several pieces to process, even when using multiple cores.

# 1. Build a project (here, a single file catalog for the purposes of this example).
LASfile <- system.file("extdata", "Megaplot.laz", package="lidR")
project = catalog(LASfile)
plot(project)

# 2. Set some global catalog options
# For this dummy example, the clustering size is 80 m and the buffer is 15 m using
# a single core (because this example is run on the CRAN server when the package is submitted).
catalog_options(buffer = 15, multicore = 1, tiling_size = 120)

# 3. Load the shapefile needed to filter your points.
folder <- system.file("extdata", "", package="lidR")
lake_shp = rgdal::readOGR(folder, "lake_polygons_UTM17")

# 4. Build the function that analyzes each cluster of the catalog.
# The function's first argument is a LAS object. The internal routine takes care of
# this part. The other arguments can be freely choosen by the user. See the following
# template:
tree_area = function(las, lake)
{
  # The las argument is a LAS object with each field loaded and an extra column 'buffer'

  # Associate geographic data with lidar points
  lasclassify(las, lake, field = "lake")

  # filter lakes, and low elevation points
  las %<>% lasfilter(lake == FALSE, Z > 4)

  if (is.null(las))
    return(NULL)

  # segment trees (in this example the low point density does not enable
  # accurate segmentation of trees. This is just a proof-of-concept)
  lastrees(las, algorithm = "li2012")

  # Here we used the function tree_metric to compute some metrics for each tree. This
  # function is defined later in the global environment.
  m = tree_metrics(las, myMetrics(X, Y, Z, buffer))

  # If min buffer is 0 it means the trees were at least partly in the non-buffered area, so we
  # want to keep these trees.
  # However, the trees that are on the edge of the buffered area will be counted
  # twice. So we must remove the trees on the right side and on the top side of the buffer
  # If max buffer is <= 2 it means that the trees belong inside the area of interest, on
  # the left side or the bottom side, or both.
  m = m[minbuff == 0 & maxbuff <= 2]

  # Remove buffering information that is no longer useful
  m[, c("minbuff","maxbuff") := NULL]

  return(m)
}

# This function enables users to extract, for a single tree, the position of the highest point
# and some information about the buffering position of the tree. The function tree_metrics takes
# care of mapping along each tree.
myMetrics <- function(x, y, z, buff)
{
  i = which.max(z)
  xcenter = x[i]
  ycenter = y[i]
  A = area(x,y)
  minbuff = min(buff)
  maxbuff = max(buff)

  return(
    list(
      x = xcenter,
      y = ycenter,
      area = A,
      minbuff = minbuff,
      maxbuff = maxbuff
    ))
}

# Everything is now well defined, so now we can process over an entire catalog with
# hundreds of files (but in this example we use just one file...)

# 4. Process the project. The arguments of the user-defined function must
# belong in a labelled list. We also pass extra arguments to the function readLAS
# to load only X, Y and Z coordinates. This way we save a huge amount of memory, which
# can be used for the current process.
fargs = list(lake = lake_shp)
output = catalog_apply(project, tree_area, fargs, XYZonly = TRUE)

# 5. Post-process the output result (depending on the output computed). Here, each value
# of the list is a data.table, so rbindlist does the job:
output = data.table::rbindlist(output)

output %$% plot(x,y, cex = sqrt(area/pi)/5, asp = 1)
# }

Run the code above in your browser using DataLab