Processing Text with R Essential Training

English | MP4 | AVC 1280×720 | AAC 48KHz 2ch | 0h 55m | 414 MB

Today’s big data and analytics pipelines are consuming more and more text generated through websites, social media, and private communications. But deriving insights from text isn’t straightforward: it requires a series of techniques for cleansing text and converting it into forms that analytics and machine learning can work with. In this course, learn the essential techniques for cleansing and processing text in R, and discover how to convert text into a form that’s ready for analytics and predictions. Kumaran Ponnambalam begins by reviewing techniques for extracting, cleansing, and processing text. He then shows how to convert text into an analytics-ready form, including how to use n-grams and TF-IDF. Throughout the course, he demonstrates these techniques with examples built on R and the tm library.

Topics include:

  • Acquiring text from various sources
  • Cleansing and transforming text data
  • Preparing TF-IDF matrices for machine learning
  • Building n-gram databases for text predictions
  • Best practices for scalability and storing text
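
The topics above map onto a fairly standard tm workflow. As a rough, hedged sketch only (the sample sentences and object names below are invented for this write-up, not taken from the course’s exercise files), the end-to-end shape looks something like this:

    library(tm)   # core R text-mining framework used throughout the course

    # Build a volatile (in-memory) corpus from a couple of toy documents
    docs <- c("Text analytics is growing fast.",
              "R and tm make text processing approachable.")
    corpus <- VCorpus(VectorSource(docs))

    # Basic cleansing: lower-case, strip punctuation and numbers, drop stop words
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stripWhitespace)

    # Convert to a TF-IDF weighted document-term matrix ready for modeling
    dtm <- DocumentTermMatrix(corpus,
                              control = list(weighting = weightTfIdf))
    inspect(dtm)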

Table of Contents

Introduction
1 The emergence of text analytics

Introduction to Text Mining
2 Purpose
3 Document
4 Corpus
5 R text processing libraries
6 Setting up the environment
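
For readers who want to follow along, the setup presumably amounts to installing the text-mining packages the later chapters rely on. A minimal, assumed version (the package list is inferred from the chapter titles, with SnowballC added because tm’s stemmer depends on it):

    # One-time installs: tm for the corpus and matrices, RWeka for n-gram
    # tokenization, SnowballC for the stemmer behind tm's stemDocument()
    install.packages(c("tm", "RWeka", "SnowballC"))

    library(tm)
    library(RWeka)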

Corpus in R
7 PCorpus and VCorpus
8 Reading files with CorpusReader
9 Exploring the corpus
10 Persisting the corpus
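
In tm terms, these chapters presumably come down to building, exploring, and saving a corpus. A hedged sketch under that assumption (the folder names are placeholders, not the course’s files):

    library(tm)

    # VCorpus: a volatile, in-memory corpus read from a folder of .txt files.
    # (PCorpus() builds a permanent, database-backed corpus instead and
    # additionally needs the filehash package.)
    vcorp <- VCorpus(DirSource("course-text/", pattern = "\\.txt$"),
                     readerControl = list(reader = readPlain, language = "en"))

    # Explore the corpus
    length(vcorp)         # number of documents
    inspect(vcorp[[1]])   # content and metadata of the first document

    # Persist the documents back to disk as plain-text files
    dir.create("saved-corpus", showWarnings = FALSE)
    writeCorpus(vcorp, path = "saved-corpus")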

Text Cleansing and Extraction
11 Setup for processing
12 Cleansing text
13 Stop word removal
14 Stemming
15 Managing metadata
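
A rough sketch of what this cleansing pipeline typically looks like with tm (the sample sentence and the "source" metadata tag are illustrative assumptions):

    library(tm)

    corpus <- VCorpus(VectorSource("The analysts were analyzing 3 texts quickly."))

    # Cleansing: lower-case, remove punctuation and numbers, squeeze whitespace
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, stripWhitespace)

    # Stop word removal and stemming (stemming relies on the SnowballC package)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stemDocument)

    # Attach and read back document-level metadata
    meta(corpus[[1]], "source") <- "example"
    meta(corpus[[1]])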

TF-IDF
16 Introduction to TF-IDF
17 Generating term frequency matrix
18 Improving term frequency matrix
19 Plotting term frequency
20 Generating TF-IDF
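
A hedged sketch of the term-frequency and TF-IDF steps with tm (the three toy documents and the sparsity threshold are illustrative choices, not the course’s):

    library(tm)

    corpus <- VCorpus(VectorSource(c("text mining with tm",
                                     "mining text data for analytics",
                                     "machine learning on text data")))

    # Raw term-frequency (document-term) matrix
    tf <- DocumentTermMatrix(corpus)

    # Improve the matrix by dropping very sparse terms
    tf <- removeSparseTerms(tf, sparse = 0.6)

    # Plot overall term frequencies
    freq <- sort(colSums(as.matrix(tf)), decreasing = TRUE)
    barplot(freq, las = 2, main = "Term frequency")

    # TF-IDF weighted matrix
    tfidf <- DocumentTermMatrix(corpus,
                                control = list(weighting = weightTfIdf))
    inspect(tfidf)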

N-Grams
21 N-grams concepts
22 Using RWeka NGramTokenizer
23 Creating an n-gram text frequency matrix
24 Extracting n-gram pairs
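
The RWeka-based bigram pattern these chapters refer to is well established; a minimal sketch, assuming two toy sentences:

    library(tm)
    library(RWeka)

    corpus <- VCorpus(VectorSource(c("text mining in r is fun",
                                     "mining text in r with the tm package")))

    # Bigram tokenizer built on RWeka's NGramTokenizer
    BigramTokenizer <- function(x)
      NGramTokenizer(x, Weka_control(min = 2, max = 2))

    # N-gram term-frequency matrix
    ngram_tdm <- TermDocumentMatrix(corpus,
                                    control = list(tokenize = BigramTokenizer))

    # Extract the bigram pairs with their counts, most frequent first
    ngram_counts <- sort(rowSums(as.matrix(ngram_tdm)), decreasing = TRUE)
    head(ngram_counts)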

Best Practices
25 Storing text
26 Processing text data
27 Scalability

Conclusion
28 Next steps