mine_text {inlpubs}    R Documentation

Mine Text Components in the INLPO Publications

Description

Performs a word frequency text analysis of Idaho National Laboratory Project Office (INLPO) publications.

Usage

mine_text(
  pubs,
  components = c("title", "abstract"),
  ngmin = 1L,
  ngmax = ngmin,
  lowfreq = 1L
)

Arguments

pubs

'pub' class. Bibliographic information; see the pubs dataset for details.

components

character vector. One or more text components to analyze. Choices include the "title", "abstract", "annotation", and "bibentry" of the document.

ngmin, ngmax

integer number. Splits strings into n-grams with the given minimum and maximum numbers of grams. An n-gram is an ordered sequence of n words taken from the body of a text. Requires that the RWeka package be available and that the environment variable JAVA_HOME point to the location of the Java software. Recommended for a single text component only.
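To illustrate what an n-gram is, here is a minimal base-R sketch. It is not how mine_text tokenizes text (per the description above, n-gram splitting relies on RWeka); the helper make_ngrams and its simple lowercase, split-on-non-alphanumeric tokenization are hypothetical and chosen only to show a sliding window of n consecutive words.

```r
# Hypothetical helper (not part of inlpubs): split a string into word n-grams.
make_ngrams <- function(text, n) {
  # Crude tokenization: lowercase, then split on runs of non-alphanumeric characters
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  # Slide a window of n consecutive words across the text
  vapply(seq_len(length(words) - n + 1L),
         function(i) paste(words[i:(i + n - 1L)], collapse = " "),
         character(1))
}

txt <- "The quick brown fox jumps"
bigrams <- make_ngrams(txt, 2L)
# c("the quick", "quick brown", "brown fox", "fox jumps")

# Collect every n-gram from ngmin to ngmax words long, mirroring the arguments above
ngmin <- 1L
ngmax <- 2L
all_grams <- unlist(lapply(ngmin:ngmax, make_ngrams, text = txt))
```

With ngmin = 1 and ngmax = 2, the five single words and the four two-word sequences are returned together.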

lowfreq

integer number. Lower frequency bound. Words that occur fewer times than this bound are excluded from the returned frequency table.
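A minimal sketch of applying a lower frequency bound to a word frequency table, assuming the bound is compared with each term's total count across all publications (this page does not say whether the count is per publication or in total). The table tab, its terms, and the publication keys pub_a and pub_b are hypothetical toy data.

```r
# Hypothetical word frequency table: rows are terms, columns are publications
tab <- matrix(c(3L, 0L, 1L,   # counts for publication "pub_a"
                1L, 2L, 0L),  # counts for publication "pub_b"
              nrow = 3,
              dimnames = list(c("aquifer", "tritium", "basalt"),
                              c("pub_a", "pub_b")))

lowfreq <- 2L
# Assumption: keep a term only if its total count across publications meets the bound
keep <- rowSums(tab) >= lowfreq
tab_filtered <- tab[keep, , drop = FALSE]
# "basalt" occurs only once in total, so it is dropped
```

Using drop = FALSE keeps the result a matrix even when only one term survives the filter.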

Details

HTML entities are decoded when the textutils package is available.

Value

A word frequency table giving the number of times each word occurs in a publication's text component(s). Each column represents a single publication, identified by its bibentry key, and each row gives the frequency counts for a particular word (also known as a 'term').

Author(s)

J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center

See Also

make_wordcloud function to create a word cloud.

Examples

m <- head(pubs, 3) |> mine_text()
head(m)

## Not run: 
  d <- data.frame(word = rownames(m), freq = rowSums(m))
  file <- make_wordcloud(d, display = interactive())
  unlink(file)

## End(Not run)

[Package inlpubs version 1.1.1 Index]