A Lightweight and Versatile NLP Toolkit

Documentation for package ‘textpress’ version 1.0.0

Help Pages

.decode_duckduckgo_urls     Decode DuckDuckGo Redirect URLs
.extract_links              Extract Links from a Search Engine Result Page
.get_site                   Get Site Content and Extract HTML Elements
.process_bing               Process Bing Search Results
.process_duckduckgo         Process DuckDuckGo Search Results
.process_yahoo              Process Yahoo News Search Results
abbreviations               Common Abbreviations for Sentence Splitting
api_huggingface_embeddings  Call Hugging Face API for Embeddings
extract_date                Extract Date from HTML Content
nlp_build_chunks            Build Chunks for NLP Analysis
nlp_cast_tokens             Convert Token List to Data Frame
nlp_melt_tokens             Tokenize Data Frame by Specified Column(s)
nlp_split_paragraphs        Split Text into Paragraphs
nlp_split_sentences         Split Text into Sentences
nlp_tokenize_text           Tokenize Text Data (Mostly) Non-Destructively
sem_nearest_neighbors       Find Nearest Neighbors Based on Cosine Similarity
sem_search_corpus           NLP Search Corpus
standardize_date            Standardize Date Format
web_scrape_urls             Scrape News Data from Various Sources
web_search                  Process Search Results from Multiple Search Engines
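For reference, `sem_nearest_neighbors` ranks items by cosine similarity. Assuming the standard definition (the index entry does not spell it out), the similarity between two embedding vectors u and v of dimension d, such as vectors obtained via `api_huggingface_embeddings`, is:

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{d} u_i v_i}{\sqrt{\sum_{i=1}^{d} u_i^2} \, \sqrt{\sum_{i=1}^{d} v_i^2}}
```

The value ranges from -1 to 1 and ignores vector length, so the nearest neighbors of a query are the items whose embeddings point in the most similar direction, i.e. those with the highest cosine similarity.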