assert_col {eCerto} | R Documentation |
assert_col
assert_col checks a data frame for the name, position, and type of a
specific column and ensures that the returned data frame contains such a
column. Where possible, the current values are converted to the specified
type.
assert_col(
  df,
  name,
  pos = NULL,
  type = c("character", "integer", "numeric", "factor", "logical", "Date"),
  fuzzy_name = TRUE,
  default_value = NULL
)
df | Input data frame. |
name | Name of the column to ensure (and to search for). |
pos | Position of this column. Use NULL to keep the position where the column is found in df. |
type | Desired data type of this column. |
fuzzy_name | Allow fuzzy matching (the search is case insensitive and tolerates additional blanks). |
default_value | Default value used if the column needs to be created or cannot be converted to the specified type. Keep NULL to use the predefined default values. |
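The fuzzy_name behaviour described above can be pictured with a small base R sketch. The helper find_col_fuzzy is hypothetical and not part of eCerto; it assumes that "additional blanks" means leading and trailing whitespace, which may differ from what assert_col actually does.

```r
# Hypothetical helper (not eCerto code): locate a column by name,
# ignoring case and leading/trailing blanks.
find_col_fuzzy <- function(df, name) {
  normalize <- function(s) tolower(trimws(s))
  which(normalize(colnames(df)) == normalize(name))
}

x <- data.frame(analyte = c("A", "B"), tmp = rep(0L, 2), unit = c("x", "y"))
hit <- find_col_fuzzy(x, " Analyte")  # position of the matching column
miss <- find_col_fuzzy(x, "test")     # empty result: no such column
hit
miss
```

An empty result corresponds to the case in which the column would have to be created, for example from default_value.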
tbd.
A data frame with a column of the specified name and type at the specified position. An error message is attached to the result as an attribute in case of unexpected events.
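Because this page does not name the attribute that carries the error message, a caller may want to list whatever non-standard attributes are present on the result. The following sketch uses a stand-in data frame instead of a real assert_col result, and the attribute name "msg" is purely an assumption for illustration.

```r
# Stand-in for a value returned by assert_col(); the attribute name "msg"
# is hypothetical and chosen only for this sketch.
res <- data.frame(analyte = c("A", "B"))
attr(res, "msg") <- "column 'unit' could not be converted"

# Every data frame carries names, class and row.names; anything else is extra.
extra <- setdiff(names(attributes(res)), c("names", "class", "row.names"))
extra
attr(res, extra[1])
```

The same setdiff() call can be applied to an actual result to discover which attribute, if any, has been attached.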
x <- data.frame(
  "analyte" = c("A", "B"),
  "tmp" = rep(0L, 2),
  "unit" = c("x", "y")
)
str(x)
ac <- eCerto::assert_col
str(ac(df = x, name = "analyte", pos = 1, type = "factor"))
str(ac(df = x, name = "Analyte", pos = 3, type = "character"))
str(ac(df = x, name = " Analyte", pos = 2, type = "factor"))
str(ac(df = x, name = "Analyte", pos = 2, type = "factor", fuzzy_name = FALSE))
str(ac(df = x, name = "test", type = "factor", default_value = "test"))
# this yields NAs in column unit: the conversion to numeric does not raise
# an error, hence the default value is not used
str(ac(df = x, name = "unit", type = "numeric", default_value = 10))
# this yields the predefined default value in column unit because the
# conversion attempt to Date does raise an error
str(ac(df = x, name = "unit", type = "Date"))
str(ac(df = data.frame("test" = "2022-03-31"), name = "test", type = "Date"))
# show type and class of internal default values
x <- data.frame(
  "character" = "", "integer" = 0L, "numeric" = 0, "factor" = factor(NA),
  "logical" = NA, "date" = Sys.Date(), NA
)
sapply(seq_len(ncol(x)), function(i) {
  typeof(x[, i])
})
sapply(seq_len(ncol(x)), function(i) {
  class(x[, i])
})
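The different outcomes of the two unit examples above (NAs for numeric, a default value for Date) follow from how the base R converters treat unparseable strings. The sketch below uses only base R and does not reproduce eCerto internals; the fallback date is a hypothetical placeholder, not the package's predefined default.

```r
unit <- c("x", "y")

# as.numeric() does not fail on non-numeric strings; it warns and returns NA.
as_num <- suppressWarnings(as.numeric(unit))
as_num

# as.Date() without a format stops with an error for such strings, so a
# fallback value can be supplied via tryCatch().
fallback <- as.Date("1900-01-01")  # hypothetical default, not eCerto's
as_date <- tryCatch(
  as.Date(unit),
  error = function(e) rep(fallback, length(unit))
)
as_date
```

A conversion that merely warns leaves NAs in place, while one that stops gives the caller a chance to substitute a default.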