Package 'pedometrics' reference manual

Title:	Miscellaneous Pedometric Tools
Description:	An R implementation of methods employed in the field of pedometrics, soil science discipline dedicated to studying the spatial, temporal, and spatio-temporal variation of soil using statistical and computational methods. The methods found here include the calibration of linear regression models using covariate selection strategies, computation of summary validation statistics for predictions, generation of summary plots, evaluation of the local quality of a geostatistical model of uncertainty, and so on. Other functions simply extend the functionalities of or facilitate the usage of functions from other packages that are commonly used for the analysis of soil data. Formerly available versions of suggested packages no longer available from CRAN can be obtained from the CRAN archive <https://cran.r-project.org/src/contrib/Archive/>.
Authors:	Alessandro Samuel-Rosa [aut, cre] , Lucia Helena Cunha dos Anjos [ths] , Gustavo Vasques [ths] , Gerard B M Heuvelink [ths] , Juan Carlos Ruiz Cuetos [ctb], Maria Eugenia Polo Garcia [ctb], Pablo Garcia Rodriguez [ctb], Joshua French [ctb], Ken Kleinman [ctb], Dick Brus [ctb] , Frank Harrell Jr [ctb], Ruo Xu [ctb]
Maintainer:	Alessandro Samuel-Rosa <[email protected]>
License:	GPL (>= 2)
Version:	0.12.1
Built:	2025-03-12 04:47:46 UTC
Source:	https://github.com/laboratorio-de-pedometria/pedometrics-package

Pedometric Tools and Techniques

Description

This package contains many tools and techniques used in the field of pedometrics (see https://en.wikipedia.org/wiki/Pedometric_mapping for a definition of pedometrics). These tools and techniques were developed to fulfill the demands created by the PhD research project (2012-2016) entitled “Contribution to the Construction of Models for Predicting Soil Properties”, developed by Alessandro Samuel-Rosa under the supervision of Dr Lúcia HC Anjos (Universidade Federal Rural do Rio de Janeiro, Brazil), Dr Gustavo M Vasques (Embrapa Solos, Brazil), and Dr Gerard B M Heuvelink (ISRIC - World Soil Information, the Netherlands). The project is/was funded by the CNPq Foundation (Process 140720/2012-0), Ministry of Science and Technology of Brazil, Brasília, DF, 70040-020, Brazil, phone +55 (61) 2022 6002, and the CAPES Foundation (Process ID BEX 11677/13-9), Ministry of Education of Brazil, Brasília, DF, 70040-020, Brazil, phone: +55 (61) 2022 6210.

Details

Several functions simply extend the functionalities of other functions commonly used for the analysis of pedometric data. It should be noted that changes are likely to occur quite often and the use of this package as a dependency for other packages is strongly discouraged.

Author(s)

Author and Maintainer: Alessandro Samuel-Rosa [email protected].

Adjusted coefficient of determination

Description

Calculates the adjusted coefficient of determination of a multiple linear regression model.

Usage

adjR2(r2, n, p)
adjR2(r2, n, p)

Arguments

`r2`	Numeric vector with the coefficient of determination to be adjusted.
`n`	Numeric vector providing the number of observations used to fit the multiple linear regression model.
`p`	Numeric vector providing the number of parameters included in the multiple linear regression model.

Value

A numeric vector with the adjusted coefficient of determination.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Coefficient of determination. Wikipedia, The Free Encyclopedia. Available at https://en.wikipedia.org/wiki/Coefficient_of_determination.

Examples

x <- adjR2(r2 = 0.95, n = 100, p = 80)
x <- adjR2(r2 = 0.95, n = 100, p = 80)

Create a Spatial* object from a bounding box

Description

Take the bounding box of a Spatial* object and create a SpatialPoints* or SpatialPolygons* object from it.

Usage

bbox2sp(obj, sp = "SpatialPolygons", keep.crs = TRUE)
bbox2sp(obj, sp = "SpatialPolygons", keep.crs = TRUE)

Arguments

`obj`	Object of class Spatial*.
`sp`	Class of the resulting object with options `"SpatialPolygons"` (default), `"SpatialPoints"`, `"SpatialPointsDataFrame"`, and `"SpatialPolygonsDataFrame"`.
`keep.crs`	Logical for assigning the same coordinate reference system to the resulting Spatial* object. Defaults to `keep.crs = TRUE`.

Value

An object of class SpatialPoints* or SpatialPolygons*.

Dependencies

The sp package, provider of classes and methods for spatial data in R, is required for bbox2sp() to work. The development version of the sp package is available on https://github.com/edzer/sp/ while its old versions are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/sp/.

Note

Some of the solutions used to build this function were found in the source code of the R-package intamapInteractive. As such, the authors of that package, Edzer Pebesma [email protected] and Jon Skoien [email protected], are entitled ‘contributors’ to the R-package pedometrics.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Edzer Pebesma, Jon Skoien with contributions from Olivier Baume, A. Chorti, D.T. Hristopulos, S.J. Melles and G. Spiliopoulos (2013). intamapInteractive: procedures for automated interpolation - methods only to be used interactively, not included in intamap package. R package version 1.1-10. https://CRAN.R-project.org/package=intamapInteractive.

Examples

if (require(sp)) {
  data(meuse, package = "sp")
  sp::coordinates(meuse) <- ~ x + y
  bb <- bbox2sp(obj = meuse, keep.crs = FALSE)
}
if (require(sp)) {
  data(meuse, package = "sp")
  sp::coordinates(meuse) <- ~ x + y
  bb <- bbox2sp(obj = meuse, keep.crs = FALSE)
}

Build a series of linear models using automated variable selection

Description

Build a series of linear models with stats::lm() using one or more automated variable selection methods implemented in the functions stepVIF() and MASS::stepAIC().

Usage

buildModelSeries(
  formula,
  data,
  vif = FALSE,
  vif.threshold = 10,
  vif.verbose = FALSE,
  aic = FALSE,
  aic.direction = "both",
  aic.trace = FALSE,
  aic.steps = 5000,
  ...
)

buildMS(
  formula,
  data,
  vif = FALSE,
  vif.threshold = 10,
  vif.verbose = FALSE,
  aic = FALSE,
  aic.direction = "both",
  aic.trace = FALSE,
  aic.steps = 5000,
  ...
)
buildModelSeries(
  formula,
  data,
  vif = FALSE,
  vif.threshold = 10,
  vif.verbose = FALSE,
  aic = FALSE,
  aic.direction = "both",
  aic.trace = FALSE,
  aic.steps = 5000,
  ...
)

buildMS(
  formula,
  data,
  vif = FALSE,
  vif.threshold = 10,
  vif.verbose = FALSE,
  aic = FALSE,
  aic.direction = "both",
  aic.trace = FALSE,
  aic.steps = 5000,
  ...
)

Arguments

`formula`	A list containing one or several model formulas (a symbolic description of the model to be fitted).
`data`	Data frame containing the variables in the model formulas.
`vif`	Logical for performing backward variable selection using the Variance-Inflation Factor (VIF). Defaults to `vif = FALSE`.
`vif.threshold`	Numeric value setting the maximum acceptable VIF value. Defaults to `vif.threshold = 10`.
`vif.verbose`	Logical for printing iteration results of backward variable selection using the VIF. Defaults to `vif.verbose = FALSE`.
`aic`	Logical for performing variable selection using Akaike's Information Criterion (AIC). Defaults to `aic = FALSE`.
`aic.direction`	Character string setting the direction of variable selection when using AIC, with options `"both"` (default), `"forward"`, and `"backward"`.
`aic.trace`	Logical for printing iteration results of variable selection using the AIC. Defaults to `aic.trace = FALSE`.
`aic.steps`	Integer value setting the maximum number of steps to be considered for variable selection using the AIC. Defaults to `aic.steps = 5000`.
`...`	Further arguments passed to `MASS::stepAIC()`.

Details

buildModelSeries() was devised to deal with a list of linear model formulas. The main objective is to bring together several functions commonly used when building linear models, such as automated variable selection. In the current implementation, variable selection can be done using stepVIF() or MASS::stepAIC() or both. stepVIF() is a backward variable selection procedure, while MASS::stepAIC() supports backward, forward, and bidirectional variable selection. For more information about these functions, please visit their respective help pages.

An important feature of buildModelSeries() is that it records the initial number of candidate predictor variables and observations offered to the model, and adds this information as an attribute to the final selected model. Such feature was included because variable selection procedures result biased linear models (too optimistic), and the effective number of degrees of freedom is close to the number of candidate predictor variables initially offered to the model (Harrell, 2001). With the initial number of candidate predictor variables and observations offered to the model, one can calculate penalized or adjusted measures of model performance. For models built using buildModelSeries(), this can be done using statsModelSeries().

Some important details should be clear when using buildModelSeries():

this function was originally devised to deal with a list of formulas, but can also be used with a single formula;
in the current implementation, stepVIF() runs before MASS::stepAIC();
function arguments imported from MASS::stepAIC() and stepVIF() were named as in the original functions, and received a prefix (aic or vif) to help the user identifying which function is affected by a given argument without having to go check the documentation.

Value

A list containing the fitted linear models.

TODO

Add option to set the order in which MASS::stepAIC() and stepVIF() are run.

Dependencies

The MASS package, provider of support functions and datasets for Venables and Ripley's Modern Applied Statistics with S, is required for buildModelSeries() to work. The development version of the MASS package is available on https://www.stats.ox.ac.uk/pub/MASS4/ while its old versions are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/MASS/.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Harrell, F. E. (2001) Regression modelling strategies: with applications to linear models, logistic regression, and survival analysis. First edition. New York: Springer.

Venables, W. N. and Ripley, B. D. (2002) Modern applied statistics with S. Fourth edition. New York: Springer.

A. Samuel-Rosa, G. B. M. Heuvelink, G. de Mattos Vasques, and L. H. C. dos Anjos, Do more detailed environmental covariates deliver more accurate soil maps?, Geoderma, vol. 243–244, pp. 214–227, May 2015, doi: 10.1016/j.geoderma.2014.12.017.

Examples

if (interactive()) {
  # based on the second example of MASS::stepAIC()
  library("MASS")
  cpus1 <- cpus
  for(v in names(cpus)[2:7])
    cpus1[[v]] <- cut(cpus[[v]], unique(stats::quantile(cpus[[v]])),
                      include.lowest = TRUE)
  cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
  cpus.samp <- sample(1:209, 100)
  cpus.form <- list(formula(log10(perf) ~ syct + mmin + mmax + cach + chmin +
                    chmax + perf),
                    formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
                    formula(log10(perf) ~ mmax + cach + chmin + chmax + perf))
  data <- cpus1[cpus.samp,2:8]
  cpus.ms <- buildModelSeries(cpus.form, data, vif = TRUE, aic = TRUE)
}
if (interactive()) {
  # based on the second example of MASS::stepAIC()
  library("MASS")
  cpus1 <- cpus
  for(v in names(cpus)[2:7])
    cpus1[[v]] <- cut(cpus[[v]], unique(stats::quantile(cpus[[v]])),
                      include.lowest = TRUE)
  cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
  cpus.samp <- sample(1:209, 100)
  cpus.form <- list(formula(log10(perf) ~ syct + mmin + mmax + cach + chmin +
                    chmax + perf),
                    formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
                    formula(log10(perf) ~ mmax + cach + chmin + chmax + perf))
  data <- cpus1[cpus.samp,2:8]
  cpus.ms <- buildModelSeries(cpus.form, data, vif = TRUE, aic = TRUE)
}

Evaluation of geostatistical models of uncertainty

Description

Evaluate the local quality of a geostatistical model of uncertainty (GMU) using summary measures and graphical displays.

Usage

checkGMU(
  observed,
  simulated,
  pi = seq(0.01, 0.99, 0.01),
  symmetric = TRUE,
  plotit = TRUE
)
checkGMU(
  observed,
  simulated,
  pi = seq(0.01, 0.99, 0.01),
  symmetric = TRUE,
  plotit = TRUE
)

Arguments

`observed`	Vector of observed values at the validation points. See ‘Details’ for more information.
`simulated`	Data frame or matrix with simulated values (columns) for each validation point (rows). See ‘Details’ for more information.
`pi`	Vector defining the width of the series of probability intervals. Defaults to `pi = seq(0.01, 0.99, 0.01)`. See ‘Details’ for more information.
`symmetric`	Logical for choosing the type of probability interval. Defaults to `symmetric = TRUE`. See ‘Details’ for more information.
`plotit`	Logical for plotting the results. Defaults to `plotit = TRUE`.

Details

There is no standard way of evaluating the local quality of a GMU. The collection of summary measures and graphical displays presented here is far from being comprehensive. A few definitions are given bellow.

Error statistics

Error statistics measure how well the GMU predicts the measured values at the validation points. Four error statistics are presented:

Mean error (ME): Measures the bias of the predictions of the GMU, being defined as the mean of the differences between the average of the simulated values and the observed values, i.e. the average of all simulations is taken as the predicted value.
Mean squared error (MSE): Measures the accuracy of the predictions of the GMU, being defined as the mean of the squared differences between the average of the simulated values and the observed values.
Scaled root mean squared error (SRMSE): Measures how well the GMU estimate of the prediction error variance (PEV) approximates the observed prediction error variance, where the first is given by the variance of the simulated values, while the second is given by the squared differences between the average of the simulated values, i.e. the squared error (SE). The SRMSE is computed as the average of SE / PEV, where SRMSE > 1 indicates underestimation, while SRMSE < 1 indicates overestimation.
Pearson correlation coefficient: Measures how close the GMU predictions are to the observed values. A scatter plot of the observed values versus the average of the simulated values can be used to check for possible unwanted outliers and non-linearities. The square of the Pearson correlation coefficient measures the fraction of the overall spread of observed values that is explained by the GMU, that is, the amount of variance explained (AVE), also known as coefficient of determination or ratio of scatter.

Coverage probabilities

The coverage probability of an interval is given by the number of times that that interval contains its parameter over several replications of an experiment. For example, consider the interquartile range $IQR = Q3 - Q1$ of a Gaussian distributed variable with mean equal to zero and variance equal to one. The nominal coverage probability of the IQR is 0.5, i.e. two quarters of the data fall within the IQR. Suppose we generate a Gaussian distributed random variable with the same mean and variance and count the number of values that fall within the IQR defined above: about 0.5 of its values will fall within the IQR. If we continue generating Gaussian distributed random variables with the same mean and variance, on average, 0.5 of the values will fall in that interval.

Coverage probabilities are very useful to evaluate the local quality of a GMU: the closer the observed coverage probabilities of a sequence of probability intervals (PI) are to the nominal coverage probabilities of those PIs, the better the modeling of the local uncertainty.

Two types of PIs can be used here: symmetric, median-centered PIs, and left-bounded PIs. Papritz & Dubois (1999) recommend using left-bounded PIs because they are better at evidencing deviations for both large and small PIs. The authors also point that the coverage probabilities of the symmetric, median-centered PIs can be read from the coverage probability plots produced using left-bounded PIs.

In both cases, the PIs are computed at each validation location using the quantiles of the conditional cumulative distribution function (ccdf) defined by the set of realizations at that validation location. For a sequence of PIs of increasing width, we check which of them contains the observed value at all validation locations. We then average the results over all validation locations to compute the proportion of PIs (with the same width) that contains the observed value: this gives the coverage probability of the PIs.

Deutsch (1997) proposed three summary measures of the coverage probabilities to assess the local goodness of a GMU: accuracy ($A$), precision ($P$), and goodness ($G$). According to Deutsch (1997), a GMU can be considered “good” if it is both accurate and precise. Although easy to compute, these measures seem not to have been explored by many geostatisticians, except for the studies developed by Pierre Goovaerts and his later software implementation (Goovaerts, 2009). Richmond (2001) suggests that they should not be used as the only measures of the local quality of a GMU.

Accuracy: An accurate GMU is that for which the proportion $p^*$ of true values falling within the $p$ PI is equal to or larger than the nominal probability $p$, that is, when $p^* \geq p$ . In the coverage probability plot, a GMU will be more accurate when all points are on or above the 1:1 line. The range of $A$ goes from 0 (lest accurate) to 1 (most accurate).
Precision: The precision, $P$, is defined only for an accurate GMU, and measures how close $p^*$ is to $p$. The range of $P$ goes from 0 (lest precise) to 1 (most precise). Thus, a GMU will be more accurate when all points in the PI-width plot are on or above the 1:1 line.
Goodness: The goodness, $G$, is a measure of the departure of the points from the 1:1 line in the coverage probability plot. $G$ ranges from 0 (minimum goodness) to 1 (maximum goodness), the maximum $G$ being achieved when $p^* = p$ , that is, all points in both coverage probability and interval width plots are exactly on the 1:1 line.

It is worth noting that the coverage probability and PI-width plots are relevant mainly to GMU created using conditional simulations, that is, simulations that are locally conditioned to the data observed at the validation locations. Conditioning the simulations locally serves the purposes of honoring the available data and reducing the variance of the output realizations. This is why one would like to find the points falling above the 1:1 line in both coverage probability and PI-width plots. For unconditional simulations, that is, simulations that are only globally conditioned to the histogram (and variogram) of the data observed at the validation locations, one would expect to find that, over a large number of simulations, the whole set of possible values (i.e. the global histogram) can be generated at any node of the simulation grid. In other words, it is expected to find all points on the 1:1 line in both coverage probability and PI-width plots. Deviations from the 1:1 line could then be used as evidence of problems in the simulation.

Value

A list of summary measures and plots of the coverage probability and width of probability intervals.

Note

Comments by Pierre Goovaerts [email protected] were important to describe how to use the coverage probability and PI-width plots when a GMU is created using unconditional simulations.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Deutsch, C. Direct assessment of local accuracy and precision. Baafi, E. Y. & Schofield, N. A. (Eds.) Geostatistics Wollongong '96. Dordrecht: Kinwer Academic Publishers, v. I, p. 115-125, 1997.

Papritz, A. & Dubois, J. R. Mapping heavy metals in soil by (non-)linear kriging: an empirical validation. Gómez-Hernández, J.; Soares, A. & Froidevaux, R. (Eds.) geoENV II – Geostatistics for Environmental Applications. Springer, p. 429-440, 1999.

Goovaerts, P. Geostatistical modelling of uncertainty in soil science. Geoderma. v. 103, p. 3 - 26, 2001.

Goovaerts, P. AUTO-IK: a 2D indicator kriging program for the automated non-parametric modeling of local uncertainty in earth sciences. Computers & Geosciences. v. 35, p. 1255-1270, 2009.

Richmond, A. J. Maximum profitability with minimum risk and effort. Xie, H.; Wang, Y. & Jiang, Y. (Eds.) Proceedings 29th APCOM. Lisse: A. A. Balkema, p. 45-50, 2001.

Ripley, B. D. Stochastic simulation. New York: John Wiley & Sons, p. 237, 1987.

Examples

if (interactive()) {
  set.seed(2001)
  observed <- round(rnorm(100), 3)
  simulated <- t(
    sapply(1:length(observed), function (i) round(rnorm(100), 3)))
  resa <- checkGMU(observed, simulated, symmetric = T)
  resb <- checkGMU(observed, simulated, symmetric = F)
  resa$error; resb$error
  resa$goodness; resb$goodness
}
if (interactive()) {
  set.seed(2001)
  observed <- round(rnorm(100), 3)
  simulated <- t(
    sapply(1:length(observed), function (i) round(rnorm(100), 3)))
  resa <- checkGMU(observed, simulated, symmetric = T)
  resb <- checkGMU(observed, simulated, symmetric = F)
  resa$error; resb$error
  resa$goodness; resb$goodness
}

Categorize/stratify numerical variable(s)

Description

Create break points, compute strata proportions, and stratify numerical variable(s) to create categorical variable(s).

Usage

cont2cat(x, breaks, integer = FALSE)

breakPoints(x, n, type = "area", prop = FALSE)

stratify(x, n, type = "area", integer = FALSE)
cont2cat(x, breaks, integer = FALSE)

breakPoints(x, n, type = "area", prop = FALSE)

stratify(x, n, type = "area", integer = FALSE)

Arguments

`x`	Vector, data frame or matrix with data on the numerical variable(s) to be categorized/stratified.
`breaks`	Vector or list containing the lower and upper limits that should be used to break the numerical variable(s) into categories. See ‘Details’ for more information.
`integer`	Logical value indicating if the categorical variable(s) should be returned as `integer`s. Defaults to `integer = FALSE`, i.e. the variable(s) will be returned as `factor`s.
`n`	Integer value indicating the number of categories/strata that should be created.
`type`	Character value indicating the type of categories/strata that should be used, with options `"area"` (default), for equal-area, and `"range"`, for equal-range strata.
`prop`	Logical value indicating if the strata proportions should be returned? Defaults to `prop = FALSE`.

Details

Argument breaks must be a vector if x is a vector, but a list if x is a data frame or matrix. Using a list allows breaking each column of x into different number of categories.

Value

A vector, data frame, or matrix, depending on the class of x.

Dependencies

The SpatialTools package, provider of tools for spatial data analysis in R, is required for breakPoints() and stratify() to work. The development version of the SpatialTools package is available on https://github.com/jfrench/SpatialTools while its old versions are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/SpatialTools/.

Reverse dependencies

The spsann package, provider of methods for the optimization of sample configurations using spatial simulated annealing in R, requires breakPoints(), cont2cat() and stratify() for some of its functions to work. The development version of the spsann package is available on https://github.com/Laboratorio-de-Pedometria/spsann-package.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

B. Minasny and A. B. McBratney. A conditioned Latin hypercube method for sampling in the presence of ancillary information. Computers & Geosciences, vol. 32, no. 9, pp. 1378–1388, Nov. 2006, doi: 10.1016/j.cageo.2005.12.009.

T. Hengl, D. G. Rossiter, and A. Stein. Soil sampling strategies for spatial prediction by correlation with auxiliary maps. Australian Journal of Soil Research, vol. 41, no. 8, pp. 1403–1422, 2003, doi: 10.1071/SR03005.

Examples

if (require(SpatialTools)) {
  ## Compute the break points of marginal strata
  x <- data.frame(x = round(rnorm(10), 1), y = round(rlnorm(10), 1))
  x <- breakPoints(x = x, n = 4, type = "area", prop = TRUE)

  ## Convert numerical data into categorical data
  # Matrix
  x <- y <- c(1:10)
  x <- cbind(x, y)
  breaks <- list(c(1, 2, 4, 8, 10), c(1, 5, 10))
  y <- cont2cat(x, breaks)

  # Data frame
  x <- y <- c(1:10)
  x <- data.frame(x, y)
  breaks <- list(c(1, 2, 4, 8, 10), c(1, 5, 10))
  y <- cont2cat(x, breaks, integer = TRUE)

  # Vector
  x <- c(1:10)
  breaks <- c(1, 2, 4, 8, 10)
  y <- cont2cat(x, breaks, integer = TRUE)

  ## Stratification
  x <- data.frame(x = round(rlnorm(10), 1), y = round(rnorm(10), 1))
  x <- stratify(x = x, n = 4, type = "area", integer = TRUE)
  x
}
if (require(SpatialTools)) {
  ## Compute the break points of marginal strata
  x <- data.frame(x = round(rnorm(10), 1), y = round(rlnorm(10), 1))
  x <- breakPoints(x = x, n = 4, type = "area", prop = TRUE)

  ## Convert numerical data into categorical data
  # Matrix
  x <- y <- c(1:10)
  x <- cbind(x, y)
  breaks <- list(c(1, 2, 4, 8, 10), c(1, 5, 10))
  y <- cont2cat(x, breaks)

  # Data frame
  x <- y <- c(1:10)
  x <- data.frame(x, y)
  breaks <- list(c(1, 2, 4, 8, 10), c(1, 5, 10))
  y <- cont2cat(x, breaks, integer = TRUE)

  # Vector
  x <- c(1:10)
  breaks <- c(1, 2, 4, 8, 10)
  y <- cont2cat(x, breaks, integer = TRUE)

  ## Stratification
  x <- data.frame(x = round(rlnorm(10), 1), y = round(rnorm(10), 1))
  x <- stratify(x = x, n = 4, type = "area", integer = TRUE)
  x
}

Defunct functions

Description

The functions listed here are no longer part of the pedometrics package. If you need to use any of these functions, you can still find them at https://github.com/samuel-rosa/ASRtools or https://cran.r-project.org/src/contrib/Archive/pedometrics/.

Usage

coordenadas(...)

cdfPlot(...)

cdfStats(...)

cdfTable(...)

gcpDiff(...)

trend.terms(...)

trend.matrix(...)
coordenadas(...)

cdfPlot(...)

cdfStats(...)

cdfTable(...)

gcpDiff(...)

trend.terms(...)

trend.matrix(...)

Arguments

...

Not used.

Value

No return value, called for side effects.

Author(s)

Alessandro Samuel-Rosa [email protected]
Tony Olsen [email protected]
Tom Kincaid [email protected]

Association between categorical variables

Description

Compute the Cramer's V, a descriptive statistic that measures the association between categorical variables.

Usage

cramer(x)
cramer(x)

Arguments

`x`	Data frame or matrix with a set of categorical variables.

Details

Any integer variable is internally converted to a factor.

Value

A matrix with the Cramer's V between the categorical variables.

Reverse dependencies

The spsann package, provider of methods for the optimization of sample configurations using spatial simulated annealing in R, requires cramer() for some of its functions to work. The development version of the spsann package is available on https://github.com/Laboratorio-de-Pedometria/spsann-package.

Note

The original code is available at https://sas-and-r.blogspot.com/, Example 8.39: calculating Cramer's V, posted by Ken Kleinman on Friday, June 3, 2011. As such, Ken Kleinman [email protected] is entitled a ‘contributor’ to the R-package pedometrics.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Cramér, H. Mathematical methods of statistics. Princeton: Princeton University Press, p. 575, 1946.

Everitt, B. S. The Cambridge dictionary of statistics. Cambridge: Cambridge University Press, p. 432, 2006.

Examples

if (interactive()) {
  data(meuse, package = "sp")
  str(meuse)
  test <- cramer(meuse[, c("ffreq", "soil", "lime", "landuse")])
}
if (interactive()) {
  data(meuse, package = "sp")
  str(meuse)
  test <- cramer(meuse[, c("ffreq", "soil", "lime", "landuse")])
}

Calculate module and azimuth

Description

Calculate the module and azimuth of the difference on x and y coordinates between two sets of ground control points (GCP). It is suited to perform calculations for topographical coordinates only. The origin is set in the y coordinate, and rotation performed clockwise.

Usage

gcpVector(dx, dy)
gcpVector(dx, dy)

Arguments

`dx`	Numeric vector containing the difference on the ‘x’ coordinate between two sets of GCP.
`dy`	Numeric vector containing the difference on the ‘y’ coordinate between two sets of GCP.

Value

A data frame containing the module, its square and azimuth. These three columns are named ‘module’, ‘sq.module’ and ‘azimuth’.

Note

This function was adapted from package's VecStatGraphs2D function LoadData().

Author(s)

Juan Carlos Ruiz Cuetos [email protected]
Maria Eugenia Polo Garcia [email protected]
Pablo Garcia Rodriguez [email protected]
Alessandro Samuel-Rosa [email protected]

References

Ruiz-Cuetos J.C., Polo M.E. and Rodriguez P.G. (2012). VecStatGraphs2D: Vector analysis using graphical and analytical methods in 2D. R package version 1.6. https://CRAN.R-project.org/package=VecStatGraphs2D.

Examples

x <- gcpVector(dx = rnorm(3, 5, 10), dy = rnorm(3, 5, 10))

x <- gcpVector(dx = rnorm(3, 5, 10), dy = rnorm(3, 5, 10))

Tests for data types

Description

Evaluate the data type contained in an object.

Usage

isNumint(x)

allNumint(x)

anyNumint(x)

whichNumint(x)

allInteger(x)

anyInteger(x)

whichInteger(x)

allFactor(x)

anyFactor(x)

whichFactor(x)

allNumeric(x)

anyNumeric(x)

whichNumeric(x)

uniqueClass(x)
isNumint(x)

allNumint(x)

anyNumint(x)

whichNumint(x)

allInteger(x)

anyInteger(x)

whichInteger(x)

allFactor(x)

anyFactor(x)

whichFactor(x)

allNumeric(x)

anyNumeric(x)

whichNumeric(x)

uniqueClass(x)

Arguments

`x`	Object to be tested.

Value

TRUE or FALSE depending on whether x contains a given data type.

Author(s)

Alessandro Samuel-Rosa [email protected]

Examples

# Vector of integers
x <- 1:10
isNumint(x) # FALSE

# Vector of numeric integers
x <- as.numeric(x)
isNumint(x) # TRUE

# Vector of numeric values
x <- c(1.1, 1, 1, 1, 2)
isNumint(x) # FALSE
allNumint(x) # FALSE
anyNumint(x) # TRUE
whichNumint(x)

# Single numeric integer
isNumint(1) # TRUE

# Single numeric value
isNumint(1.1) # FALSE

# Vector of integers
x <- 1:10
isNumint(x) # FALSE

# Vector of numeric integers
x <- as.numeric(x)
isNumint(x) # TRUE

# Vector of numeric values
x <- c(1.1, 1, 1, 1, 2)
isNumint(x) # FALSE
allNumint(x) # FALSE
anyNumint(x) # TRUE
whichNumint(x)

# Single numeric integer
isNumint(1) # TRUE

# Single numeric value
isNumint(1.1) # FALSE

Correlation plot

Description

Plotting correlation matrices.

Usage

plotCor(r, r2, col, breaks, col.names, ...)
plotCor(r, r2, col, breaks, col.names, ...)

Arguments

`r`	A square matrix with correlation values.
`r2`	(optional) A second square matrix with correlation values.
`col`	(optional) Color table to use for `image` – see `graphics::image()` for details. The default is a colorblind-friendly palette created using the RColorBrewer palette `"RdBu"`.
`breaks`	(optional) Break points in sorted order to indicate the intervals for assigning the colors. See `fields::image.plot()` for more details.
`col.names`	(optional) Character vector with short (up to 5 characters) column names.
`...`	(optional) Additional parameters passed to plotting functions.

Details

A correlation plot in an alternative way of showing the strength of the empirical correlations between variables. This is done by using a diverging color palette, where the darker the color, the stronger the absolute correlation value.

plotCor() can also be used to compare correlations between the same variables at different points in time or space or for different observations. This is done by passing two square correlation matrices using arguments r and r2. The lower triangle of the resulting correlation plot will contain correlations from r, correlations from r2 will be in the upper triangle, and the diagonal will be empty.

Value

A correlation plot.

Dependencies

The fields package, provider of tools for spatial data in R, is required for plotCor() to work. The development version of the fields package is available on https://github.com/dnychka/fieldsRPackage while its old versions are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/fields/.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Neuwirth E (2022). RColorBrewer: ColorBrewer Palettes. R package version 1.1-3, https://CRAN.R-project.org/package=RColorBrewer.

Examples

if (all(c(require(sp), require(fields)))) {
  data(meuse, package = "sp")
  cols <- c("cadmium", "copper", "lead", "zinc", "elev", "dist", "om")

  # A single correlation matrix
  r <- cor(meuse[1:20, cols], use = "complete")
  r <- round(r, 2)
  plotCor(r)

  # Two correlation matrices: r2 goes in the upper triangle
  r2 <- cor(meuse[21:40, cols], use = "complete")
  r2 <- round(r2, 2)
  plotCor(r, r2)
}
if (all(c(require(sp), require(fields)))) {
  data(meuse, package = "sp")
  cols <- c("cadmium", "copper", "lead", "zinc", "elev", "dist", "om")

  # A single correlation matrix
  r <- cor(meuse[1:20, cols], use = "complete")
  r <- round(r, 2)
  plotCor(r)

  # Two correlation matrices: r2 goes in the upper triangle
  r2 <- cor(meuse[21:40, cols], use = "complete")
  r2 <- round(r2, 2)
  plotCor(r, r2)
}

Plots for exploratory spatial data analysis (ESDA)

Description

Create four plots for exploratory spatial data analysis (ESDA): histogram + density plot, bubble plot, variogram plot, and variogram map.

Usage

plotESDA(
  z,
  lat,
  lon,
  lags = NULL,
  cutoff = NULL,
  width = c(cutoff/20),
  leg.pos = "right"
)
plotESDA(
  z,
  lat,
  lon,
  lags = NULL,
  cutoff = NULL,
  width = c(cutoff/20),
  leg.pos = "right"
)

Arguments

`z`	Vector of numeric values of the variable for with ESDA plots should be created.
`lat`	Vector of numeric values containing the y coordinate (latitude) of the point locations where the `z` variable was observed.
`lon`	Vector of numeric values containing the x coordinate (longitude) of the point locations where the `z` variable was observed.
`lags`	(optional) Numerical vector; upper boundaries of lag-distance classes. See argument `boundaries` of `gstat::variogram()` for more info.
`cutoff`	(optional) Integer value defining the spatial separation distance up to which point pairs are included in semi-variance estimates. Defaults to the length of the diagonal of the box spanning the data divided by three.
`width`	Integer value specifying the width of subsequent distance intervals into which data point pairs are grouped for semi-variance estimates. Defaults to `width = cutoff / 20`.
`leg.pos`	(optional) Character value indication the location of the legend of the bubble plot. Defaults to `leg.pos = "right"`.

Details

The user should visit the help pages of gstat::variogram(), plotHD(), sp::bubble() and sp::spplot() to obtain more details about the main functions used to built plotESDA().

Value

Four plots: histogram and density plot, bubble plot, empirical variogram, and variogram map.

Dependencies

The sp package, provider of classes and methods for spatial data in R, is required for plotESDA() to work. The development version of the sp package is available on https://github.com/edzer/sp/ while its old versions are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/sp/.

The gstat package, provider of methods for spatial and spatio-temporal geostatistical modelling, prediction and simulation in R, is required for plotESDA() to work. The development version of the sp package is available on https://github.com/r-spatial/gstat while its old versions are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/gstat/.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Cressie, N.A.C. (1993) Statistics for Spatial Data. New York: John Wiley and Sons, p.900, 1993.

Pebesma, E.J. (2004) Multivariable geostatistics in S: the gstat package. Computers and Geosciences, 30:683-691, 2004.

Webster, R., Oliver, M.A. Geostatistics for environmental scientists. Chichester: John Wiley and Sons, p.315, 2007.

Examples

if (all(require(sp), require(gstat))) {
  data(meuse, package = "sp")
  p <- plotESDA(z = meuse$zinc, lat = meuse$y, lon = meuse$x)
}
if (all(require(sp), require(gstat))) {
  data(meuse, package = "sp")
  p <- plotESDA(z = meuse$zinc, lat = meuse$y, lon = meuse$x)
}

Histogram and density plot

Description

Plot a histogram and a density plot of a single variable using the R-package lattice.

Usage

plotHist(
  x,
  HD = "over",
  nint = 20,
  digits = 2,
  stats = TRUE,
  BoxCox = FALSE,
  col = c("lightgray", "black"),
  lwd = c(1, 1),
  lty = "dashed",
  xlim,
  ylim,
  ...
)

plotHD(
  x,
  HD = "over",
  nint = 20,
  digits = 2,
  stats = TRUE,
  BoxCox = FALSE,
  col = c("lightgray", "black"),
  lwd = c(1, 1),
  lty = "dashed",
  xlim,
  ylim,
  ...
)
plotHist(
  x,
  HD = "over",
  nint = 20,
  digits = 2,
  stats = TRUE,
  BoxCox = FALSE,
  col = c("lightgray", "black"),
  lwd = c(1, 1),
  lty = "dashed",
  xlim,
  ylim,
  ...
)

plotHD(
  x,
  HD = "over",
  nint = 20,
  digits = 2,
  stats = TRUE,
  BoxCox = FALSE,
  col = c("lightgray", "black"),
  lwd = c(1, 1),
  lty = "dashed",
  xlim,
  ylim,
  ...
)

Arguments

`x`	Vector of numeric values of the variable for which the histogram and density plot should be created.
`HD`	Character value indicating the type of plot to be created. Available options are `"over"`, to create a histogram superimposed by the theoretical density plot of a normally distributed variable, and `"stack"`, to create a histogram and an empirical density plot in separated panels. Defaults to `HD = "over"`.
`nint`	Integer specifying the number of histogram bins. Defaults to `nint = 20`.
`digits`	Integer indicating the number of decimal places to be used when printing the statistics of the variable `x`. Defaults to `digits = 2`.
`stats`	Logical to indicate if descriptive statistics of the variable `x` should be added to the plot. Available only when `HD = "over"`. The function tries to automatically find the best location to put the descriptive statistics given the shape of the histogram. Defaults to `stats = TRUE`.
`BoxCox`	Logical to indicate if the variable `x` should be transformed using the Box-Cox family of power transformations. The estimated lambda value of the Box-Cox transform is printed in the console. It is set to zero when negative. Defaults to `BoxCox = FALSE`.
`col`	Vector of two elements, the first indicating the color of the histogram, the second indicating the color of the density plot. Defaults to `col = c("lightgray", "black")`.
`lwd`	Vector of two elements, the first indicating the line width of the histogram, the second indicating the line width of the density plot. Defaults to `lwd = c(1, 1)`.
`lty`	Character value indicating the line type for the density plot. Defaults to `lty = "dashed"`.
`xlim`	Vector of two elements defining the limits of the x axis. The function automatically optimizes `xlim` based on the density plot.
`ylim`	Vector of two elements defining the limits of the y axis. The function automatically optimizes `ylim` based both histogram and density plot.
`...`	Other arguments that can be passed to lattice functions. There is no guarantee that they will work.

Details

The user should visit the help pages of lattice::histogram(), lattice::densityplot(), lattice::panel.mathdensity(), car::powerTransform(), and car::bcPower() to obtain more details about the main functions used to built plotHD().

Value

An object of class "trellis". The lattice::update.trellis() method can be used to update components of the object and the lattice::print.trellis() print method (usually called by default) will plot it on an appropriate plotting device.

Dependencies

The car package, provider of functions to accompany Fox and Weisberg's An R Companion to Applied Regression, is required for plotHist() to work. The development version of the car package is available on https://r-forge.r-project.org/projects/car/ while its old versions are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/car/.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Sarkar, Deepayan (2008) Lattice: Multivariate Data Visualization with R, Springer. http://lmdvr.r-forge.r-project.org/

Examples

if (all(c(require(car), require(lattice), require(latticeExtra)))) {
  x <- rnorm(100, 10, 2)
  p1 <- plotHist(x, HD = "stack")
  p2 <- plotHist(x, HD = "over")
}
if (all(c(require(car), require(lattice), require(latticeExtra)))) {
  x <- rnorm(100, 10, 2)
  p1 <- plotHist(x, HD = "stack")
  p2 <- plotHist(x, HD = "over")
}

Model series plot

Description

Produce a graphical output to examine the effect of using different model specifications (design) on the predictive performance of these models (a model series). Devised to access the results of buildModelSeries() and statsMS(), but can be easily adapted to work with any model structure and performance measure.

Usage

plotModelSeries(
  obj,
  grid,
  line,
  ind,
  type = c("b", "g"),
  pch = c(20, 2),
  size = 0.5,
  arrange = "desc",
  color = NULL,
  xlim = NULL,
  ylab = NULL,
  xlab = NULL,
  at = NULL,
  ...
)

plotMS(
  obj,
  grid,
  line,
  ind,
  type = c("b", "g"),
  pch = c(20, 2),
  size = 0.5,
  arrange = "desc",
  color = NULL,
  xlim = NULL,
  ylab = NULL,
  xlab = NULL,
  at = NULL,
  ...
)
plotModelSeries(
  obj,
  grid,
  line,
  ind,
  type = c("b", "g"),
  pch = c(20, 2),
  size = 0.5,
  arrange = "desc",
  color = NULL,
  xlim = NULL,
  ylab = NULL,
  xlab = NULL,
  at = NULL,
  ...
)

plotMS(
  obj,
  grid,
  line,
  ind,
  type = c("b", "g"),
  pch = c(20, 2),
  size = 0.5,
  arrange = "desc",
  color = NULL,
  xlim = NULL,
  ylab = NULL,
  xlab = NULL,
  at = NULL,
  ...
)

Arguments

`obj`	Object of class `data.frame`, generally returned by `statsMS()`, containing: a series of performance statistics of several models, and the design information of each model. See ‘Details’ for more information.
`grid`	Vector of integer values or character strings indicating the columns of the `data.frame` containing the design data which will be gridded using the function `lattice::levelplot()`. See ‘Details’ for more information.
`line`	Character string or integer value indicating which of the performance statistics (usually calculated by `statsMS()`) should be plotted using the function `lattice::xyplot()`. See ‘Details’ for more information.
`ind`	Integer value indicating for which group of models the mean rank is to be calculated. See ‘Details’ for more information.
`type`	Vector of character strings indicating some of the effects to be used when plotting the performance statistics using `lattice::xyplot()`. Defaults to `type = c("b", "g")`. See `lattice::panel.xyplot()` for more information on how to set this argument.
`pch`	Vector with two integer values specifying the symbols to be used to plot points. The first sets the symbol used to plot the performance statistic, while the second sets the symbol used to plot the mean rank of the indicator set using argument `ind`. Defaults to `pch = c(20, 2)`. See `graphics::points()` for possible values and their interpretation.
`size`	Numeric value specifying the size of the symbols used for plotting the mean rank of the indicator set using argument `ind`. Defaults to `size = 0.5`. See `grid::grid.points()` for more information.
`arrange`	Character string indicating how the model series should be arranged, which can be in ascending (`"asc"`) or descending (`"desc"`, default) order.
`color`	Vector defining the colors to be used in the grid produced by function `lattice::levelplot()`. If `color = NULL`, defaults to `color = cm.colors(n)`, where `n` is the number of unique values in the columns defined by argument `grid`. See `grDevices::cm.colors()` to see how to use other color palettes.
`xlim`	Numeric vector of length 2, giving the x coordinates range. If `xlim = NULL` (which is the recommended value), defaults to `xlim = c(0.5, dim(obj)[1] + 0.5)`. This is, so far, the optimum range for adequate plotting.
`ylab`	Character vector of length 2, giving the y-axis labels. When `obj` is a `data.frame` returned by `statsMS()`, and the performance statistic passed to argument `line` is one of those calculated by `statsMS()` (`"candidates"`, `"df"`, `"aic"`, `"rmse"`, `"nrmse"`, `"r2"`, `"adj_r2"`, or `"ADJ_r2"`), the function tries to automatically identify the correct `ylab`.
`xlab`	Character vector of unit length, the x-axis label. Defaults `xlab = "Model ranking"`.
`at`	Numeric vector indicating the location of tick marks along the x axis (in native coordinates).
`...`	Other arguments for plotting, although most of these have no been tested. Argument `asp`, for example, is not effective since the function automatically identifies the best aspect for plotting based on the dimensions of the design data.

Details

This section gives more details about arguments obj, grid, line, arrange, and ind.

obj

The argument obj usually constitutes a data.frame returned by statsMS(). However, the user can use any data.frame object as far as it contains the two basic units of information needed:

design data passed with argument grid
performance statistic passed with argument line

grid

The argument grid indicates the design data which is used to produce the grid output in the top of the model series plot. By design we mean the data that specify the structure of each model and how they differ from each other. Suppose that eight linear models were fit using three types of predictor variables (a, b, and c). Each of these predictor variables is available in two versions that differ by their accuracy, where 0 means a less accurate predictor variable, while 1 means a more accurate predictor variable. This yields 2^3 = 8 total possible combinations. The design data would be of the following form:

⁠ > design a b c 1 0 0 0 2 0 0 1 3 0 1 0 4 1 0 0 5 0 1 1 6 1 0 1 7 1 1 0 8 1 1 1 ⁠

line

The argument line corresponds to the performance statistic that is used to arrange the models in ascending or descending order, and to produce the line output in the bottom of the model series plot. For example, it can be a series of values of adjusted coefficient of determination, one for each model:

⁠ adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89) ⁠

arrange

The argument arrange automatically arranges the model series according to the performance statistics selected with argument line. If obj is a data.frame returned by statsMS(), then the function uses standard arranging approaches. For most performance statistics, the models are arranged in descending order. The exception is when "r2", "adj_r2", or "ADJ_r2" are used, in which case the models are arranged in ascending order. This means that the model with lowest value appears in the leftmost side of the model series plot, while the models with the highest value appears in the rightmost side of the plot.

⁠ > arrange(obj, adj_r2) id a b c adj_r2 1 5 1 0 1 0.54 2 2 0 0 1 0.74 3 3 1 0 0 0.81 4 4 0 1 0 0.85 5 6 0 1 1 0.86 6 1 0 0 0 0.87 7 8 1 1 1 0.89 8 7 1 1 0 0.90 ⁠

This results suggest that the best performing model is that of id = 7, while the model of id = 5 is the poorest one.

ind

The model series plot allows to see how the design influences model performance. This is achieved mainly through the use of different colors in the grid output, where each unique value in the design data is represented by a different color. For the example given above, one could try to see if the models built with the more accurate versions of the predictor variables have a better performance by identifying their relative distribution in the model series plot. The models placed at the rightmost side of the plot are those with the best performance.

The argument ind provides another tool to help identifying how the design, more specifically how each variable in the design data, influences model performance. This is done by simply calculating the mean ranking of the models that were built using the updated version of each predictor variable. This very same mean ranking is also used to rank the predictor variables and thus identify which of them is the most important.

After arranging the design data described above using the adjusted coefficient of determination, the following mean rank is obtained for each predictor variable:

⁠ > rank_center a b c 1 5.75 6.25 5.25 ⁠

This result suggests that the best model performance is obtained when using the updated version of the predictor variable b. In the model series plot, the predictor variable b appears in the top row, while the predictor variable c appears in the bottom row.

Value

An object of class "trellis" consisting of a model series plot.

Dependencies

The grDevices package, provider of graphics devices and support for colours and fonts in R, is required for plotModelSeries() to work.

The grid package, a rewrite of the graphics layout capabilities in R, is required for plotModelSeries() to work.

Warning

Use the original functions lattice::xyplot() and lattice::levelplot() for higher customization.

Note

Some of the solutions used to build this function were found in the source code of the R-package mvtsplot. As such, the author of that package, Roger D. Peng [email protected], is entitled ‘contributors’ to the R-package pedometrics.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Deepayan Sarkar (2008). Lattice: Multivariate Data Visualization with R. Springer, New York. ISBN 978-0-387-75968-5.

Roger D. Peng (2008). A method for visualizing multivariate time series data. Journal of Statistical Software. v. 25 (Code Snippet), p. 1-17.

Roger D. Peng (2012). mvtsplot: Multivariate Time Series Plot. R package version 1.0-1. https://CRAN.R-project.org/package=mvtsplot.

Examples

if (all(require(grDevices), require(grid))) {
  # This example follows the discussion in section "Details"
  # Note that the data.frame is created manually
  id <- c(1:8)
  design <- data.frame(a = c(0, 0, 1, 0, 1, 0, 1, 1),
                       b = c(0, 0, 0, 1, 0, 1, 1, 1),
                       c = c(0, 1, 0, 0, 1, 1, 0, 1))
  adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
  obj <- cbind(id, design, adj_r2)
  p <- plotModelSeries(obj, grid = c(2:4), line = "adj_r2", ind = 1, 
              color = c("lightyellow", "palegreen"),
              main = "Model Series Plot")
}
if (all(require(grDevices), require(grid))) {
  # This example follows the discussion in section "Details"
  # Note that the data.frame is created manually
  id <- c(1:8)
  design <- data.frame(a = c(0, 0, 1, 0, 1, 0, 1, 1),
                       b = c(0, 0, 0, 1, 0, 1, 1, 1),
                       c = c(0, 1, 0, 0, 1, 1, 0, 1))
  adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
  obj <- cbind(id, design, adj_r2)
  p <- plotModelSeries(obj, grid = c(2:4), line = "adj_r2", ind = 1, 
              color = c("lightyellow", "palegreen"),
              main = "Model Series Plot")
}

Return the minimum value in each row of a numeric matrix

Description

This function returns the minimum value in each row of a numeric matrix.

Usage

rowMinCpp(x)
rowMinCpp(x)

Arguments

`x`	Numeric matrix with two or more rows and/or columns.

Details

This function is implemented in C++ to speed-up the computation time for large matrices.

Value

A numeric vector with the minimum value of each row if the matrix.

Author(s)

Alessandro Samuel-Rosa [email protected]

Examples

x <- matrix(rnorm(20), nrow = 5)
rowMinCpp(x)
x <- matrix(rnorm(20), nrow = 5)
rowMinCpp(x)

Moment coefficient of skewness

Description

Compute the moment coefficient of skewness of a continuous, possibly non-normal variable.

Usage

skewness(x)
skewness(x)

Arguments

`x`	Numeric vector, the values of the variable of interest.

Value

A numerical value: the moment coefficient of skewness of x.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

B. S. Everitt, The Cambridge Dictionary of Statistics, 3rd ed. Cambridge: Cambridge University Press, 2006, p. 432.

D. N. Joanes and C. A. Gill, Comparing measures of sample skewness and kurtosis, J Royal Statistical Soc D, vol. 47, no. 1, pp. 183–189, Mar. 1998, doi: 10.1111/1467-9884.00122.

H. Cramér, Mathematical Methods of Statistics. Princeton: Princeton University Press, 1946, p. 575.

Examples

x <- rlnorm(10)
skw <- skewness(x)
x <- rlnorm(10)
skw <- skewness(x)

Obtain performance statistics of a series of linear models

Description

Compute several statistics measuring the performance of a series of linear models built using buildModelSeries(), with an option to rank the models based on one of the returned performance statistics.

Usage

statsModelSeries(model, design.info, arrange.by, digits)

statsMS(model, design.info, arrange.by, digits)
statsModelSeries(model, design.info, arrange.by, digits)

statsMS(model, design.info, arrange.by, digits)

Arguments

`model`	A list of linear models returned by `buildModelSeries()`.
`design.info`	Extra information about the linear models in the series.
`arrange.by`	Character string defining if the table with the performance statistics of the linear models should be arranged, and which column should be used. Available options are `"candidates"`, `"df"`, `"aic"`, `"rmse"`, `"nrmse"`, `"r2"`, `"adj_r2"`, and `"ADJ_r2"`. Descending order is used by default and cannot be changed in the current implementation. See ‘Value’ for more information.
`digits`	Integer or vector with six integers indicating the number of decimal places to be used to round the performance statistics. If a vector is passed to the function, the number of decimal places should be in the following order: `c("aic", "rmse", "nrmse", "r2", "adj_r2", "ADJ_r2")`.

Details

This function was devised to deal with a list of linear models generated by the function buildModelSeries(). The main objective is to compare several linear models using several performance statistics. Such statistics can then be used to rank the linear models and identify, for example, the best performing model, given the selected performance statistics.

An important feature of buildModelSeries() is that it uses the information about the initial number of candidate predictor variables offered to the build the model to calculate penalized or adjusted measures of model performance. Such information is recorded as an attribute of the final model selected by buildModelSeries(). This feature was included in statsModelSeries() because data-driven variable selection results biased linear models (too optimistic), and the effective number of degrees of freedom is close to the number of candidate predictor variables initially offered to the model (Harrell, 2001).

Value

A data frame with several performance statistics:

id: Identification of the model.
candidates: Number of candidate predictor variables initially offered to the model.
df: Number of degrees of freedom of the final selected model.
aic: Akaike's Information Criterion (AIC). Obtained using extractAIC.
rmse: Root-mean squared error, calculated based on the number of candidate predictor variables initially offered to the model.
nrmse: Normalized Root-mean squared error, calculated as the ratio between the RMSE and the standard deviation of the observed values of the dependent variable.
r2: Multiple coefficient of determination.
adj_r2: Adjusted multiple coefficient of determination.
ADJ_r2: Adjusted multiple coefficient of determination. Calculations are done based on the number of candidate predictor variables initially offered to the model.

TODO

Include other performance statistics such as: PRESS, BIC, Mallow's Cp, max(VIF);
Add option to select which performance statistics should be returned.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Harrell, F. E. (2001) Regression modelling strategies: with applications to linear models, logistic regression, and survival analysis. First edition. New York: Springer.

Venables, W. N. and Ripley, B. D. (2002) Modern applied statistics with S. Fourth edition. New York: Springer.

Examples

if (interactive()) {
  # based on the second example of function MASS:stepAIC()
  require(MASS)
  cpus1 <- cpus
  for(v in names(cpus)[2:7])
    cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])), 
                      include.lowest = TRUE)
  cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
  cpus.samp <- sample(1:209, 100)
  cpus.form <- list(formula(log10(perf) ~ syct + mmin + mmax + cach + chmin +
                    chmax + perf),
                    formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
                    formula(log10(perf) ~ mmax + cach + chmin + chmax + perf))
  data <- cpus1[cpus.samp,2:8]
  cpus.ms <- buildModelSeries(cpus.form, data, vif = TRUE, aic = TRUE)
  cpus.des <- data.frame(a = c(0, 1, 0), b = c(1, 0, 1), c = c(1, 1, 0))
  stats <- statsModelSeries(cpus.ms, design.info = cpus.des, arrange.by = "aic")
}
if (interactive()) {
  # based on the second example of function MASS:stepAIC()
  require(MASS)
  cpus1 <- cpus
  for(v in names(cpus)[2:7])
    cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])), 
                      include.lowest = TRUE)
  cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
  cpus.samp <- sample(1:209, 100)
  cpus.form <- list(formula(log10(perf) ~ syct + mmin + mmax + cach + chmin +
                    chmax + perf),
                    formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
                    formula(log10(perf) ~ mmax + cach + chmin + chmax + perf))
  data <- cpus1[cpus.samp,2:8]
  cpus.ms <- buildModelSeries(cpus.form, data, vif = TRUE, aic = TRUE)
  cpus.des <- data.frame(a = c(0, 1, 0), b = c(1, 0, 1), c = c(1, 1, 0))
  stats <- statsModelSeries(cpus.ms, design.info = cpus.des, arrange.by = "aic")
}

Variable selection using the (generalized) variance-inflation factor (VIF)

Description

This function takes a linear model and selects the subset of predictor variables that meet a user-specific collinearity threshold measured by the (generalized) variance-inflation factor (VIF).

Usage

stepVIF(model, threshold = 10, verbose = FALSE)
stepVIF(model, threshold = 10, verbose = FALSE)

Arguments

`model`	Linear model (object of class 'lm') containing collinear predictor variables.
`threshold`	Positive number defining the maximum allowed VIF. Defaults to `threshold = 10`.
`verbose`	Logical indicating if iteration results should be printed. Defaults to `verbose = FALSE`.

Details

stepVIF starts computing the VIF of all predictor variables in the linear model. If the linear model contains categorical predictor variables, generalized variance-inflation factors (GVIF) (Fox and Monette, 1992), are calculated instead using car::vif(). GVIF is interpretable as the inflation in size of the confidence ellipse or ellipsoid for the coefficients of the predictor variable in comparison with what would be obtained for orthogonal, uncorrelated data. Since categorical predictors have more than one degree of freedom, df, the confidence ellipsoid will have df dimensions, and GVIF will need to be adjusted so that it can be comparable across predictor variables. The adjustment is made using the following equation:

$GVIF^{1/(2\times df)}$

The next step consists of evaluating if any of the predictor variables has a (G)VIF larger than the specified threshold, the function default being threshold = 10. For, GVIF^(1/(2*df)), the threshold will be sqrt(threshold).

If there is only one predictor variable that does not meet the VIF threshold, it is automatically removed from the model and no further processing occurs. When there are two or more predictor variables that do not meet the (G)VIF threshold, stepVIF() fits a linear model between each of them and the dependent variable. The predictor variable with the lowest adjusted coefficient of determination is dropped from the model and new coefficients are calculated, resulting in a new linear model.

This process lasts until all predictor variables included in the new model meet the (G)VIF threshold.

Nothing is done if all predictor variables have a (G)VIF value lower that the threshold, and stepVIF() returns the original linear model.

Value

A linear model (object of class 'lm') with low collinearity.

Dependencies

Note

More on the use of GVIF to measure the collinearity in linear models containing categorical predictor variables can be found on StackExchange.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Fox, J. and Monette, G. (1992) Generalized collinearity diagnostics. JASA, 87, 178–183.

Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, Second Edition. Sage.

Fox, J. and Weisberg, S. (2011) An R Companion to Applied Regression, Second Edition. Thousand Oaks: Sage.

Hair, J. F., Black, B., Babin, B. and Anderson, R. E. (2010) Multivariate data analysis. New Jersey: Pearson Prentice Hall.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Examples

if (require(car)) {
  fit <- lm(prestige ~ income + education + type, data = Duncan)
  fit <- stepVIF(fit, threshold = 10, verbose = TRUE)
}
if (require(car)) {
  fit <- lm(prestige ~ income + education + type, data = Duncan)
  fit <- stepVIF(fit, threshold = 10, verbose = TRUE)
}

Variogram binning

Description

Computation of bins for sample (experimental) variograms.

Usage

variogramBins(
  coords,
  n.lags = 7,
  type = "exp",
  cutoff = 0.5,
  base = 2,
  zero = 0.001,
  count = "pairs"
)

vgmLags(
  coords,
  n.lags = 7,
  type = "exp",
  cutoff = 0.5,
  base = 2,
  zero = 0.001,
  count = "pairs"
)
variogramBins(
  coords,
  n.lags = 7,
  type = "exp",
  cutoff = 0.5,
  base = 2,
  zero = 0.001,
  count = "pairs"
)

vgmLags(
  coords,
  n.lags = 7,
  type = "exp",
  cutoff = 0.5,
  base = 2,
  zero = 0.001,
  count = "pairs"
)

Arguments

`coords`	Data frame or matrix with the projected x- and y-coordinates.
`n.lags`	Integer value defining the number of variogram bins (distance classes) that should be computed. Defaults to `n = 7`.
`type`	Character value defining the type of variogram bins that should be computed, with options `"equi"` (equidistant, equal-width) and `"exp"` (exponential, exponentially growing). Defaults to `type = "exp"`.
`cutoff`	Numeric value defining the fraction of the diagonal of the rectangle that spans the data (bounding box) that should be used to set the maximum distance up to which variogram bins should be computed. Defaults to `cutoff = 0.5`, i.e. half the diagonal of the bounding box.
`base`	Numeric value defining the base of the exponential expression used to create exponentially growing variogram bins. Used only when `type = "exp"`. Defaults to `base = 2`, i.e. the width of the rightmost bin is equal to half the diagonal of `cutoff`, and so on.
`zero`	Numeric value setting the minimum pair-wise separation distance that should be used to compute the variogram bins. Defaults to `zero = 0.0001`.
`count`	Should the number of points (`"points"`) or point-pairs (`"pairs"`) per variogram bin be computed? Defaults to `count = "pairs"`.

Value

Vector of numeric values with the lower and upper boundaries of the variogram bins. The number of points or point-pairs per variogram bin is returned as an attribute.

Dependencies

The SpatialTools package, provider of tools for spatial data analysis in R, is required for variogramBins() to work. The development version of the SpatialTools package is available on https://github.com/jfrench/SpatialTools while its old versions are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/SpatialTools/.

Reverse dependencies

The spsann package, provider of methods for the optimization of sample configurations using spatial simulated annealing in R, requires variogramBins() for some of its functions to work. The development version of the spsann package is available on https://github.com/Laboratorio-de-Pedometria/spsann-package.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Truong, P. N.; Heuvelink, G. B. M.; Gosling, J. P. Web-based tool for expert elicitation of the variogram. Computers and Geosciences. v. 51, p. 390-399, 2013.

Examples

if (all(c(require(SpatialTools), require(sp)))) {
  data(meuse, package = "sp")
  lags_points <- variogramBins(coords = meuse[, 1:2], count = "points")
  lags_pairs <- variogramBins(coords = meuse[, 1:2], count = "pairs")
}
if (all(c(require(SpatialTools), require(sp)))) {
  data(meuse, package = "sp")
  lags_points <- variogramBins(coords = meuse[, 1:2], count = "points")
  lags_pairs <- variogramBins(coords = meuse[, 1:2], count = "pairs")
}

Guess the parameters of a spatial covariance function

Description

Guess the parameters of the spatial covariance function of a random, regionalized variable. A guess of such parameters is required to start the fitting functions of many geostatistical packages such as gstat, geoR, and georob.

Usage

variogramGuess(
  z,
  coords,
  lags,
  cutoff = 0.5,
  method = "a",
  min.npairs = 30,
  model = "matern",
  nu = 0.5,
  estimator = "qn",
  plotit = FALSE
)

vgmICP(
  z,
  coords,
  lags,
  cutoff = 0.5,
  method = "a",
  min.npairs = 30,
  model = "matern",
  nu = 0.5,
  estimator = "qn",
  plotit = FALSE
)
variogramGuess(
  z,
  coords,
  lags,
  cutoff = 0.5,
  method = "a",
  min.npairs = 30,
  model = "matern",
  nu = 0.5,
  estimator = "qn",
  plotit = FALSE
)

vgmICP(
  z,
  coords,
  lags,
  cutoff = 0.5,
  method = "a",
  min.npairs = 30,
  model = "matern",
  nu = 0.5,
  estimator = "qn",
  plotit = FALSE
)

Arguments

`z`	Numeric vector with the values of the regionalized variable for which the values for the spatial covariance parameters should be guessed.
`coords`	Data frame or matrix with the projected x- and y-coordinates.
`lags`	Numeric scalar defining the width of the variogram bins or a numeric vector with the lower and upper bounds of the variogram bins. If missing, the variogram bins are computed using `variogramBins()`. See ‘Details’ for more information.
`cutoff`	Numeric value defining the fraction of the diagonal of the rectangle that spans the data (bounding box) that should be used to set the maximum distance up to which variogram bins should be computed. Defaults to `cutoff = 0.5`, i.e. half the diagonal of the bounding box.
`method`	Character keyword defining the method used for guessing the spatial covariance parameters of the regionalized variable. Defaults to `method = "a"`. See ‘Details’ for more information.
`min.npairs`	Positive integer defining the minimum number of point-pairs required so that a variogram bin is used to guessing the spatial covariance parameters of the of the regionalized variable. Defaults to `min.npairs = 30`.
`model`	Character keyword defining the spatial covariance function that will be fitted to the data of the regionalized variable. Currently, most of the basic spatial covariance function are accepted. See `geoR::cov.spatial()` for more information. Defaults to `model = "matern"`.
`nu`	Numerical value for the additional smoothness parameter $\nu$ of the spatial covariance function of the regionalized variable. See `RandomFields::RMmodel()` and argument `kappa` of `geoR::cov.spatial()` for more information.
`estimator`	Character keyword defining the estimator for computing the sample (experimental) variogram of the regionalized variable, with options `"qn"` (default), `"mad"`, `"matheron"`, and `"ch"`. See `georob::sample.variogram()` for more details.
`plotit`	Should the guessed spatial covariance parameters be plotted along with the sample (experimental) variogram of the regionalized variable? Defaults to `plotit = FALSE`.

Details

There are five methods two guess the covariance parameters. Two of them, "a" and "c", rely on a sample variogram with exponentially growing variogram bins, while the other three, "b", "d", and "e", use equal-width variogram bins (see variogramBins()). All of them are heuristic.

Method "a" was developed in-house and is the most elaborated of them, specially for guessing the nugget variance.

Method "b" was proposed by doi:10.1016/0098-3004(95)00095-XJian et al. (1996) and is implemented in SAS/STAT(R) 9.22.

Method "c" is implemented in the automap-package and was developed by doi:10.1016/j.cageo.2008.10.011Hiemstra et al. (2009).

Method "d" was developed by doi:10.1007/s11004-012-9434-1Desassis & Renard (2012).

Method "e" was proposed by Larrondo et al. (2003) http://www.ccgalberta.com/ccgresources/report05/2003-122-varfit.pdf and is implemented in the VARFIT module of GSLIB http://www.gslib.com/.

Value

A vector of numerical values, the guesses for the spatial covariance parameters of the regionalized variable:

nugget
partial sill
range

Dependencies

The georob package, provider of functions for the robust geostatistical analysis of spatial data in R, is required for variogramGuess() to work. The old versions of the georob package are available on the CRAN archive at https://cran.r-project.org/src/contrib/Archive/georob/.

Author(s)

Alessandro Samuel-Rosa [email protected]

References

Desassis, N. & Renard, D. Automatic variogram modelling by iterative least squares: univariate and multivariate cases. Mathematical Geosciences. Springer Science + Business Media, v. 45, p. 453-470, 2012.

Hiemstra, P. H.; Pebesma, E. J.; Twenhöfel, C. J. & Heuvelink, G. B. Real-time automatic interpolation of ambient gamma dose rates from the Dutch radioactivity monitoring network. Computers & Geosciences. Elsevier BV, v. 35, p. 1711-1721, 2009.

Jian, X.; Olea, R. A. & Yu, Y.-S. Semivariogram modelling by weighted least squares. Computers & Geosciences. Elsevier BV, v. 22, p. 387-397, 1996.

Larrondo, P. F.; Neufeld, C. T. & Deutsch, C. V. VARFIT: a program for semi-automatic variogram modelling. Edmonton: Department of Civil and Environmental Engineering, University of Alberta, p. 17, 2003.

Examples

if (all(c(require(sp), require(georob)))) {
  data(meuse, package = "sp")
  icp <- variogramGuess(z = log(meuse$copper), coords = meuse[, 1:2])
}
if (all(c(require(sp), require(georob)))) {
  data(meuse, package = "sp")
  icp <- variogramGuess(z = log(meuse$copper), coords = meuse[, 1:2])
}

Spatially correlated variance (SCV)

Description

Compute the proportion of the variance that is spatially correlated.

Usage

vgmSCV(obj, digits = 4)

## S3 method for class 'variomodel'
vgmSCV(obj, digits = 4)

## S3 method for class 'variogramModel'
vgmSCV(obj, digits = 4)

## S3 method for class 'georob'
vgmSCV(obj, digits = 4)
vgmSCV(obj, digits = 4)

## S3 method for class 'variomodel'
vgmSCV(obj, digits = 4)

## S3 method for class 'variogramModel'
vgmSCV(obj, digits = 4)

## S3 method for class 'georob'
vgmSCV(obj, digits = 4)

Arguments

`obj`	Variogram model fitted with available function in geostatistical packages such as gstat, geoR, and georob.
`digits`	Integer indicating the number of decimal places to be used.

Value

Numeric value indicating the proportion of the variance that is spatially correlated.

Author(s)

Alessandro Samuel-Rosa [email protected]

Examples

if (all(c(require(gstat), require(sp)))) {
  library(gstat)
  library(sp)
  data(meuse)
  sp::coordinates(meuse) <- ~ x + y
  vgm1 <- gstat::variogram(log(zinc) ~ 1, meuse)
  fit <- gstat::fit.variogram(vgm1, gstat::vgm(1, "Sph", 300, 1))
  res <- vgmSCV(fit)
}
if (all(c(require(gstat), require(sp)))) {
  library(gstat)
  library(sp)
  data(meuse)
  sp::coordinates(meuse) <- ~ x + y
  vgm1 <- gstat::variogram(log(zinc) ~ 1, meuse)
  fit <- gstat::fit.variogram(vgm1, gstat::vgm(1, "Sph", 300, 1))
  res <- vgmSCV(fit)
}

Package 'pedometrics'

Help Index

Pedometric Tools and Techniques

Description

Details

Author(s)

Adjusted coefficient of determination

Description

Usage

Arguments

Value

Author(s)

References

Examples

Create a Spatial* object from a bounding box

Description

Usage

Arguments

Value

Dependencies

Note

Author(s)

References

Examples

Build a series of linear models using automated variable selection

Description

Usage

Arguments

Details

Value

TODO

Dependencies

Author(s)

References

See Also

Examples

Evaluation of geostatistical models of uncertainty

Description

Usage

Arguments

Details

Error statistics

Coverage probabilities

Value

Note

Author(s)

References

Examples

Categorize/stratify numerical variable(s)

Description

Usage

Arguments

Details

Value

Dependencies

Reverse dependencies

Author(s)

References

Examples

Defunct functions

Description

Usage

Arguments

Value

Author(s)

Association between categorical variables

Description

Usage

Arguments

Details

Value

Reverse dependencies

Note

Author(s)

References

Examples

Calculate module and azimuth

Description

Usage

Arguments