nlp_split_paragraphs {textpress}    R Documentation
Split Text into Paragraphs
Description
Splits text from the 'text' column of a data frame into individual paragraphs, based on a specified paragraph delimiter.
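The splitting idea can be illustrated with a minimal base-R sketch. It is not the textpress source. The function name `split_paragraphs_sketch` is hypothetical. The sketch assumes that fragments are trimmed, that empty fragments are dropped, and that `paragraph_id` counts from 1 within each document. It also returns a base `data.frame`, whereas `nlp_split_paragraphs` returns a data.table.

```r
# Hypothetical re-implementation for illustration only; not textpress's code.
split_paragraphs_sketch <- function(tif, paragraph_delim = "\\n+") {
  # Split each document's text on the delimiter regex (one list element per document)
  pieces <- strsplit(as.character(tif$text), paragraph_delim, perl = TRUE)
  rows <- lapply(seq_along(pieces), function(i) {
    paras <- trimws(pieces[[i]])
    paras <- paras[nzchar(paras)]  # assumption: empty fragments are dropped
    data.frame(doc_id = rep(tif$doc_id[i], length(paras)),
               paragraph_id = seq_along(paras),  # assumption: numbered per document
               text = paras,
               stringsAsFactors = FALSE)
  })
  # Stack the per-document pieces into one long table, one row per paragraph
  do.call(rbind, rows)
}

tif <- data.frame(doc_id = c('1', '2'),
                  text = c("Hello world.\n\nMind your business!",
                           "This is an example.\n\nThis is a party!"))
res <- split_paragraphs_sketch(tif)
res$text  # "Hello world." "Mind your business!" "This is an example." "This is a party!"
```

Each input row expands into as many output rows as it has paragraphs, and `doc_id` is repeated so that every paragraph can be traced back to its source document.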
Usage
nlp_split_paragraphs(tif, paragraph_delim = "\\n+")
Arguments
tif
    A data frame with at least two columns: 'doc_id' and 'text'.
paragraph_delim
    A regular expression pattern used to split text into paragraphs. The default, "\\n+", treats any run of one or more newline characters as a paragraph boundary.
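The choice of `paragraph_delim` matters for hard-wrapped text, because the default `"\\n+"` also splits on single line breaks. The sketch below uses base R's `strsplit` to show how the two patterns behave. It assumes that `nlp_split_paragraphs` applies `paragraph_delim` in a comparable split-on-match way. The pattern `"\\n\\s*\\n"` is an illustrative alternative and is not a textpress default.

```r
# Text with a hard-wrapped line inside the first paragraph
txt <- "First paragraph,\nstill the first paragraph.\n\nSecond paragraph."

# Default pattern: any run of newlines is a boundary, so the wrapped line splits too
by_newline <- strsplit(txt, "\\n+", perl = TRUE)[[1]]

# Alternative: only a blank line (optionally containing whitespace) is a boundary
by_blank_line <- strsplit(txt, "\\n\\s*\\n", perl = TRUE)[[1]]

length(by_newline)     # 3
length(by_blank_line)  # 2
```

Under the stated assumption, passing the alternative pattern (e.g. `nlp_split_paragraphs(tif, paragraph_delim = "\\n\\s*\\n")`) would keep wrapped lines together within one paragraph.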
Value
A data.table with columns: 'doc_id', 'paragraph_id', and 'text'. Each row represents a paragraph, along with its associated document and paragraph identifiers.
Examples
library(textpress)

# Two documents, each with two paragraphs separated by a blank line
tif <- data.frame(doc_id = c('1', '2'),
                  text = c("Hello world.\n\nMind your business!",
                           "This is an example.\n\nThis is a party!"))
paragraphs <- nlp_split_paragraphs(tif)
[Package textpress version 1.0.0 Index]