calc_type_metrics {qtkit}    R Documentation

Calculate Type Metrics for Text Data

Description

This function calculates type metrics for tokenized text data.

Usage

calc_type_metrics(data, type, document, frequency = NULL, dispersion = NULL)

Arguments

data

A data frame containing the tokenized text data.

type

The variable in data that contains the type (e.g., term, lemma) to analyze.

document

The variable in data that contains the document IDs.

frequency

A character vector indicating which frequency metrics to calculate. If NULL (default), no frequency metrics are calculated; with dispersion also NULL, only type and n are returned. Options: 'all' for every frequency metric, 'rf' for relative frequency, and 'orf' for observed relative frequency. Multiple options can be combined, e.g., c("rf", "orf").

dispersion

A character vector indicating which dispersion metrics to calculate. If NULL (default), no dispersion metrics are calculated; with frequency also NULL, only type and n are returned. Options: 'all' for every dispersion metric, 'df' for Document Frequency, 'idf' for Inverse Document Frequency, and 'dp' for Gries' Deviation of Proportions. Multiple options can be combined, e.g., c("df", "idf").

Value

A data frame with the columns type and n, plus one column for each metric requested through frequency and dispersion.

References

Gries, Stefan Th. (2023). Statistical Methods in Corpus Linguistics. In Readings in Corpus Linguistics: A Teaching and Research Guide for Scholars in Nigeria and Beyond, pp. 78-114.

Examples

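# Locate and load the example data set shipped with qtkit
data_path <- system.file("extdata", "types_data.rds", package = "qtkit")
data <- readRDS(data_path)

# Request two frequency metrics and two dispersion metrics
calc_type_metrics(
  data = data,
  type = type,
  document = document,
  frequency = c("rf", "orf"),
  dispersion = c("df", "idf")
)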
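
To make the metrics named under frequency and dispersion more concrete, the sketch below computes relative frequency, document frequency, inverse document frequency, and Deviation of Proportions by hand in base R. It is an illustration under stated assumptions, not qtkit's implementation: the toy data frame tokens is hypothetical, the formulas are the common textbook definitions, and the natural logarithm in idf is an assumption, so values returned by calc_type_metrics() may differ in scaling or log base.

```r
# Hypothetical toy data: one row per token with its document ID.
# This is not the types_data.rds file shipped with qtkit.
tokens <- data.frame(
  document = c("d1", "d1", "d1", "d1", "d2", "d2", "d3", "d3"),
  type     = c("a",  "a",  "b",  "c",  "a",  "b",  "a",  "a"),
  stringsAsFactors = FALSE
)

# Type-by-document count matrix (rows = types, columns = documents).
counts <- unclass(table(tokens$type, tokens$document))

n   <- rowSums(counts)            # raw frequency of each type
rf  <- n / sum(n)                 # relative frequency (share of all tokens)
df  <- rowSums(counts > 0)        # number of documents containing the type
idf <- log(ncol(counts) / df)     # inverse document frequency (natural log assumed)

# Deviation of Proportions: half the summed absolute difference between
# each document's share of the corpus (expected) and each document's
# share of the type's occurrences (observed).
expected <- colSums(counts) / sum(counts)
observed <- counts / n            # divides each row by that type's total
dp <- 0.5 * rowSums(abs(sweep(observed, 2, expected)))

# Collect the results in one data frame, one row per type.
metrics <- data.frame(
  type = names(n),
  n    = unname(n),
  rf   = unname(rf),
  df   = unname(df),
  idf  = unname(idf),
  dp   = unname(dp)
)
print(metrics)
```

In this toy data, type "c" occurs in a single document and therefore receives the highest idf and dp, while type "a" occurs in every document and receives an idf of zero.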


[Package qtkit version 1.0.0 Index]