Processing Text with Python Essential Training

English | MP4 | AVC 1280×720 | AAC 48KHz 2ch | 0h 33m | 83 MB

In the world of big data, more and more information is consumed and analyzed in text form. Websites, social media, emails, and chats have become the key sources for data and insights. If you work with data, then understanding how to deal with unstructured text data is essential. In this course, instructor Kumaran Ponnambalam helps you build your text mining skill set, covering key techniques for extracting, cleansing, and processing text in Python. Kumaran reviews key text processing concepts like tokenization and stemming. He also looks at techniques for converting text into analytics-ready form, including n-grams and TF-IDF. Along the way, he provides examples of these techniques using Python and the NLTK library.
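
As a rough illustration of the tokenization and n-gram techniques named above, the following is a minimal sketch in plain Python. It is not taken from the course, which uses the NLTK library; the function names `tokenize` and `ngrams` and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A token here is any run of letters, digits, or apostrophes; this is a
    simplifying assumption, not the tokenizer used in the course.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text, used only for illustration.
sample = "Text mining turns raw text into insights. Text mining needs clean text."
tokens = tokenize(sample)

# Count how often each bigram (2-gram) occurs in the sample.
bigram_counts = Counter(ngrams(tokens, 2))

print(tokens[:4])
print(bigram_counts.most_common(1))
```

Counting n-grams this way is the basis for simple text prediction: given a word, the most frequent bigram starting with it suggests a likely next word.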

Topics include:

  • Text mining today
  • Reading text files using Python
  • Cleansing text data
  • Building n-gram databases for text predictions
  • Preparing TF-IDF matrices for machine learning
  • Scaling text processing for performance
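
To make the TF-IDF topic in the list above more concrete, here is a minimal sketch of building a TF-IDF matrix in plain Python. It assumes one common unsmoothed definition (term frequency as count divided by document length, inverse document frequency as the natural log of the number of documents divided by the number of documents containing the term); libraries often apply smoothing or normalization variants, and the course itself may compute it differently. The function name `tf_idf_matrix` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_matrix(documents):
    """Build a TF-IDF matrix from a list of tokenized documents.

    Returns (vocabulary, matrix), where matrix[i][j] is the weight of
    vocabulary[j] in documents[i]. Assumes the unsmoothed formulas
    tf = count / document length and idf = ln(N / document frequency).
    """
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))

    matrix = []
    for doc in documents:
        counts = Counter(doc)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(doc) if doc else 0.0
            idf = math.log(n_docs / doc_freq[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical, already tokenized documents.
docs = [
    ["text", "mining", "python"],
    ["python", "data"],
    ["text", "data", "data"],
]
vocabulary, matrix = tf_idf_matrix(docs)

print(vocabulary)
for row in matrix:
    print([round(weight, 3) for weight in row])
```

Under this definition a term that occurs in every document gets a weight of zero, which is why TF-IDF tends to favor terms that distinguish one document from the rest.
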

Table of Contents

1 The need for text mining skills in data science
2 Text mining today
3 Document concepts
4 Corpus concepts
5 Introduction to the NLTK library
6 Setting up the environment
7 Reading raw files
8 Reading files with corpus reader
9 Exploring the corpus
10 Analyzing the corpus
11 Tokenization
12 Cleansing text
13 Stop word removal
14 Stemming
15 Lemmatization
16 Building n-grams
17 Tagging parts of speech
18 Term frequency-inverse document frequency (TF-IDF)
19 Building a TF-IDF matrix
20 Storing text
21 Processing text data
22 Scalable processing of text data
23 Next steps