calc_type_metrics {qtkit} | R Documentation
Calculate Type Metrics for Text Data
Description
This function calculates type metrics for tokenized text data.
Usage
calc_type_metrics(data, type, document, frequency = NULL, dispersion = NULL)
Arguments
data
A data frame containing the tokenized text data.
type
The variable in
document
The variable in
frequency
A character vector indicating which frequency metrics to use. If NULL (default), only the
dispersion
A character vector indicating which dispersion metrics to use. If NULL (default), only the
Value
A data frame with columns:
- type: The unique types from the input data.
- n: The frequency of each type across all documents.
Optionally (based on the frequency and dispersion arguments):
- rf: The relative frequency of each type across all documents.
- orf: The observed relative frequency (per 100) of each type across all documents.
- df: The document frequency of each type.
- idf: The inverse document frequency of each type.
- dp: Gries' Deviation of Proportions of each type.
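To make the column definitions above concrete, here is a minimal base R sketch that computes each metric by hand on a toy data set. It is an illustration under stated assumptions, not qtkit's implementation: the toy data and the names tokens, counts, and metrics are hypothetical, idf is assumed to be the natural log of (number of documents / df), and dp uses the unnormalized form of Gries' DP. The package may scale or normalize these values differently.

```r
# Hypothetical toy data: one row per token, as in tokenized text
tokens <- data.frame(
  document = c("d1", "d1", "d1", "d2", "d2", "d3"),
  type     = c("a",  "a",  "b",  "a",  "c",  "b"),
  stringsAsFactors = FALSE
)

# Type-by-document count matrix (rows = types, columns = documents)
counts <- unclass(table(tokens$type, tokens$document))

n   <- rowSums(counts)           # frequency of each type across all documents
rf  <- n / sum(n)                # relative frequency
orf <- rf * 100                  # relative frequency per 100 tokens
df  <- rowSums(counts > 0)       # number of documents containing each type
idf <- log(ncol(counts) / df)    # ASSUMPTION: natural log of N / df

# Gries' DP (unnormalized): half the summed absolute differences between
# each document's share of the corpus and its share of the type's occurrences
s  <- colSums(counts) / sum(counts)  # expected shares (relative document sizes)
v  <- counts / n                     # observed shares, row-wise per type
dp <- 0.5 * rowSums(abs(sweep(v, 2, s)))

metrics <- data.frame(
  type = rownames(counts),
  n = n, rf = rf, orf = orf, df = df, idf = idf, dp = dp,
  row.names = NULL
)
metrics
```

On this toy data, type c occurs in a single document and gets the highest dp (about 0.67), while type a, which is spread roughly in line with document sizes, gets the lowest (about 0.17).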
References
Gries, Stefan Th. (2023). Statistical Methods in Corpus Linguistics. In Readings in Corpus Linguistics: A Teaching and Research Guide for Scholars in Nigeria and Beyond, pp. 78-114.
Examples
# Load the example data set shipped with qtkit
data_path <- system.file("extdata", "types_data.rds", package = "qtkit")
data <- readRDS(data_path)

# Request two frequency metrics and two dispersion metrics
calc_type_metrics(
  data = data,
  type = type,
  document = document,
  frequency = c("rf", "orf"),
  dispersion = c("df", "idf")
)