mine_text {inlpubs} | R Documentation |
Mine Text Components in the INLPO Publications
Description
Performs a word frequency text analysis of Idaho National Laboratory Project Office (INLPO) publications.
Usage
mine_text(
pubs,
components = c("title", "abstract"),
ngmin = 1L,
ngmax = ngmin,
lowfreq = 1L
)
Arguments
pubs |
'pub' class.
Bibliographic information, see |
components |
character vector. One or more text components to analyze. Choices include the "title", "abstract", "annotation", and "bibentry" of the document. |
ngmin , ngmax |
integer number. Minimum and maximum number of words (grams) in the n-grams that strings are split into. An n-gram is an ordered sequence of n words taken from the body of a text. Requires that the RWeka package is available and that the environment variable JAVA_HOME points to the location of the Java software. Recommended for single text components only. |
lowfreq |
integer number. Lower frequency bound. Words that occur fewer times than this bound are excluded from the returned frequency table. |
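To illustrate what the ngmin, ngmax, and lowfreq arguments control, the following base R sketch splits a string into two-word n-grams and drops those that fall below a frequency bound. It is not the package's implementation (which, per the description above, relies on RWeka); the helper make_ngrams and the sample text are hypothetical and exist only for this illustration.

```r
# Hypothetical helper, not part of inlpubs: build word n-grams in base R.
make_ngrams <- function(x, n) {
  # Lowercase the text and split on anything that is not a letter.
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  # Slide a window of n words across the text.
  starts <- seq_len(length(words) - n + 1L)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1L)], collapse = " "),
    character(1)
  )
}

# Made-up sample text, not taken from the pubs dataset.
txt <- "Groundwater flow and groundwater quality near the groundwater flow divide"

bigrams <- make_ngrams(txt, n = 2L)
freq <- table(bigrams)

# Keep only n-grams that occur at least `lowfreq` times.
lowfreq <- 2L
kept <- freq[freq >= lowfreq]
kept
```

With a bound of 2, only n-grams that appear at least twice in the sample text remain; a bound of 1 would keep every n-gram.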
Details
HTML entities are decoded when the textutils package is available.
Value
A word frequency table giving the number of times each word occurs in a publication's text component(s). Each column represents a single publication, identified by its bibentry key. Each row gives the frequency counts for a particular word (also known as a 'term').
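The layout of the returned table (terms in rows, publications in columns) can be sketched in base R as shown below. This is only an illustration of the structure under simple assumptions: the texts and bibentry keys are made up, and the tokenization here is a plain split on non-letters, which may differ from the text processing that mine_text actually performs.

```r
# Hypothetical texts keyed by made-up bibentry keys (not from the pubs dataset).
texts <- c(
  Key2001 = "Water quality of the aquifer",
  Key2002 = "Aquifer tests and water levels"
)

# Split each text into lowercase words.
tokens <- lapply(texts, function(x) {
  w <- strsplit(tolower(x), "[^a-z]+")[[1]]
  w[nzchar(w)]
})

# All distinct terms across the texts, sorted alphabetically.
terms <- sort(unique(unlist(tokens)))

# Count each term per text: one row per term, one column per publication.
tab <- vapply(
  tokens,
  function(w) as.integer(table(factor(w, levels = terms))),
  integer(length(terms))
)
rownames(tab) <- terms
tab

# Total count of each term across publications, as used for a word cloud.
rowSums(tab)
```

Summing across rows, as in the Examples section, collapses the per-publication counts into one overall frequency per term.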
Author(s)
J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center
See Also
make_wordcloud
function to create a word cloud.
Examples
m <- head(pubs, 3) |> mine_text()
head(m)
## Not run:
d <- data.frame(word = rownames(m), freq = rowSums(m))
file <- make_wordcloud(d, display = interactive())
unlink(file)
## End(Not run)