bow_pp_create_vocab_draft {aifeducation}    R Documentation
Function for creating a first draft of a vocabulary
This function creates a list of tokens that belong to specific universal part-of-speech tags (UPOS) and provides the corresponding lemmas.
bow_pp_create_vocab_draft(
path_language_model,
data,
upos = c("NOUN", "ADJ", "VERB"),
label_language_model = NULL,
language = NULL,
chunk_size = 100,
trace = TRUE
)
path_language_model |
|
data |
|
upos |
|
label_language_model |
|
language |
|
chunk_size |
|
trace |
|
list with the following components.
vocab
data.frame containing the tokens, lemmas, tokens in lower case, and lemmas in lower case.
ud_language_model
udpipe language model that is used for tagging.
label_language_model
Label of the udpipe language model.
language
Language of the raw texts.
upos
Universal part-of-speech tags used.
n_sentence
int. Estimated number of sentences in the raw texts.
n_token
int. Estimated number of tokens in the raw texts.
n_document_segments
int. Estimated number of document segments/raw texts.
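To make the shape of the vocab component more concrete, the following base R sketch builds a comparable data.frame from a small hand-made annotation table. It does not call the package or udpipe. The annotation table, its column names (token, lemma, upos), and the output column names (token_tolower, lemma_tolower) are assumptions chosen for illustration and may differ from what the function actually returns.

```r
# Hypothetical token-level annotation, shaped like the output of a tagger.
# The column names are assumptions made for this illustration only.
annotation <- data.frame(
  token = c("The", "Students", "wrote", "long", "essays", "quickly"),
  lemma = c("the", "Student", "write", "long", "essay", "quickly"),
  upos  = c("DET", "NOUN", "VERB", "ADJ", "NOUN", "ADV"),
  stringsAsFactors = FALSE
)

# Tags to keep, mirroring the default value of the upos argument.
upos_keep <- c("NOUN", "ADJ", "VERB")

# Keep only the tokens whose tag is in the requested set.
selected <- annotation[annotation$upos %in% upos_keep, c("token", "lemma")]

# Add lower-case variants of tokens and lemmas and drop duplicate rows.
vocab <- unique(data.frame(
  token = selected$token,
  lemma = selected$lemma,
  token_tolower = tolower(selected$token),
  lemma_tolower = tolower(selected$lemma),
  stringsAsFactors = FALSE
))

print(vocab)
```

Tokens tagged DET or ADV are dropped here because they are not in the requested tag set; passing a different upos vector changes which rows survive.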
A list of possible UPOS tags is available at https://universaldependencies.org/u/pos/index.html.
A large selection of udpipe language models is available at https://ufal.mff.cuni.cz/udpipe/2/models.
Other Preparation:
bow_pp_create_basic_text_rep()