Hands-on NLP with NLTK and Scikit-learn

English | MP4 | AVC 1920×1080 | AAC 48KHz 2ch | 2h 46m | 716 MB

A complete Python guide to Natural Language Processing to build spam filters, topic classifiers, and sentiment analyzers

There is an overwhelming amount of text data online these days. As a Python developer, you need to build a Natural Language Processing solution for your next project, and your colleagues are depending on you to turn gigabytes of unstructured text data into value. What do you do?

Hands-on NLP with NLTK and scikit-learn is the answer. This course puts you right on the spot, starting off with building a spam classifier in the very first video. By the end of the course, you will walk away with three NLP applications: a spam filter, a topic classifier, and a sentiment analyzer. There is no need for fancy mathematical theory, just plain-English explanations of core NLP concepts and how to apply them using Python libraries.
Taking this course will help you create new applications with Python and NLP. You will be able to build real solutions backed by machine learning and NLP models with ease.

The course is full of hands-on instruction, illustrative visualizations, and clear explanations from a data scientist, and it is packed with useful tips and relevant advice. Throughout the course, we maintain a focus on practicality and getting things done rather than on mathematical theory.

What You Will Learn

  • Build end-to-end Natural Language Processing solutions, ranging from getting data for your model to presenting its results.
  • Understand core NLP concepts such as tokenization, stemming, and stop word removal (see the sketch after this list).
  • Use open source libraries such as NLTK, scikit-learn, and spaCy to perform routine NLP tasks.
  • Classify emails as spam or not spam using basic NLP techniques and simple machine learning models.
  • Assign documents to their relevant topics using techniques such as TF-IDF, LSA, and SVMs.
  • Apply common text data processing steps to increase the performance of your machine learning models.
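
As a taste of those core concepts, here is a minimal sketch of tokenization, stop word removal, and stemming with NLTK. The sample sentence is invented for illustration, and the download calls assume the relevant NLTK resources are not yet installed.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time downloads; newer NLTK releases may also need the "punkt_tab" model.
nltk.download("punkt")
nltk.download("stopwords")

text = "The cats are chasing the mice across the garden."  # invented example sentence

# Tokenization: split the raw string into individual word tokens.
tokens = word_tokenize(text.lower())

# Stop word removal: drop very common words that carry little meaning.
stop_words = set(stopwords.words("english"))
content_tokens = [t for t in tokens if t.isalpha() and t not in stop_words]

# Stemming: reduce each remaining word to a crude root form.
stemmer = PorterStemmer()
print([stemmer.stem(t) for t in content_tokens])
```
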
Table of Contents

Working with Natural Language Data
1 The Course Overview
2 Use Python, NLTK, spaCy, and Scikit-learn to Build Your NLP Toolset
3 Reading a Simple Natural Language File into Memory
4 Split the Text into Individual Words with Regular Expression
5 Converting Words into Lists of Lower Case Tokens
6 Removing Uncommon Words and Stop Words
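
To give a flavor of this opening section, here is a rough sketch of its workflow: read a plain-text file, split it into words with a regular expression, lowercase the tokens, and drop stop words and uncommon words. The file name corpus.txt and the frequency cutoff are invented for illustration.

```python
import re
from collections import Counter

from nltk.corpus import stopwords  # assumes the NLTK "stopwords" corpus is downloaded

# Reading a simple natural language file into memory (hypothetical file name).
with open("corpus.txt", encoding="utf-8") as f:
    raw_text = f.read()

# Splitting the text into individual words with a regular expression,
# then converting them into lower case tokens.
tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", raw_text)]

# Removing stop words.
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]

# Removing uncommon words (here: any word that appears only once, an arbitrary cutoff).
counts = Counter(tokens)
tokens = [t for t in tokens if counts[t] > 1]

print(tokens[:20])
```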

Spam Classification with an Email Dataset
7 Using an Open Source Dataset: What Is the Enron Dataset?
8 Loading the Enron Dataset into Memory
9 Tokenization, Lemmatization, and Stop Word Removal
10 Bag-of-Words Feature Extraction Process with Scikit-learn
11 Basic Spam Classification with NLTK’s Naive Bayes
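
Here is a minimal sketch of the kind of classifier this section builds: bag-of-words features fed to NLTK's Naive Bayes. A tiny invented list of messages stands in for the Enron dataset, and whitespace splitting replaces the fuller tokenization and lemmatization covered in the videos.

```python
from nltk import NaiveBayesClassifier

# Tiny invented stand-in for the Enron emails: (message, label) pairs.
messages = [
    ("win a free prize now", "spam"),
    ("limited time offer click here", "spam"),
    ("meeting moved to friday afternoon", "ham"),
    ("please review the attached report", "ham"),
]

def bag_of_words(text):
    # Represent a message as a dict of token -> True (a simple bag of words).
    return {token: True for token in text.lower().split()}

# Train NLTK's Naive Bayes classifier on the toy feature sets.
featuresets = [(bag_of_words(text), label) for text, label in messages]
classifier = NaiveBayesClassifier.train(featuresets)

print(classifier.classify(bag_of_words("click here to win a free prize")))
```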

Sentiment Analysis with a Movie Review Dataset
12 Understanding the Origin and Features of the Movie Review Dataset
13 Loading and Cleaning the Review Data
14 Preprocessing the Dataset to Remove Unwanted Words and Characters
15 Creating TF-IDF Weighted Natural Language Features
16 Basic Sentiment Analysis with Logistic Regression Model
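
A minimal sketch of the approach in this section: TF-IDF weighted features feeding a logistic regression sentiment model. The handful of reviews and their labels are invented stand-ins for the movie review dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented stand-ins for the movie reviews; labels are 1 = positive, 0 = negative.
reviews = [
    "a wonderful, moving film with great performances",
    "the plot was dull and the acting was terrible",
    "I loved every minute of it",
    "boring, predictable, and far too long",
]
labels = [1, 0, 1, 0]

# TF-IDF weighted natural language features plus a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
model.fit(reviews, labels)

print(model.predict(["a great film, I loved it"]))
```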

Boosting the Performance of Your Models with N-grams
17 Deep Dive into Raw Tokens from the Movie Reviews
18 Advanced Cleaning of Tokens Using Python String Functions and Regex
19 Creating N-gram Features Using Scikit-learn
20 Experimenting with Advanced Scikit-learn Models Using the NLTK Wrapper
21 Building a Voting Model with Scikit-learn
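
A rough sketch of the ideas in this section: n-gram features plus several scikit-learn models combined by voting. The toy data and parameters are illustrative only, and scikit-learn's VotingClassifier is used here, which may differ from the exact wrapper-based approach shown in the videos.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["great movie", "awful movie", "really great fun", "really awful and dull"]  # toy data
labels = [1, 0, 1, 0]

# ngram_range=(1, 2) adds bigrams such as "really great" on top of single words.
features = CountVectorizer(ngram_range=(1, 2))

# Hard voting over three different classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression()),
        ("svm", LinearSVC()),
    ],
    voting="hard",
)

model = make_pipeline(features, ensemble)
model.fit(texts, labels)
print(model.predict(["really great movie"]))
```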

Document Classification with a Newsgroup Dataset
22 Understanding the Origin and Features of the 20 Newsgroups Dataset
23 Loading the Newsgroup Data and Extracting Features
24 Building a Document Classification Pipeline
25 Creating a Performance Report of the Model on the Test Set
26 Finding Optimal Hyper-parameters Using Grid Search
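
A condensed sketch of the pipeline-plus-grid-search workflow on the 20 Newsgroups data, using scikit-learn's built-in loader. The classifier choice and parameter grid are arbitrary examples, not necessarily those used in the course.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Load the 20 Newsgroups train and test splits (downloaded on first use).
train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

# Document classification pipeline: TF-IDF features -> linear classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Find reasonable hyper-parameters with a small, arbitrary grid search.
grid = GridSearchCV(
    pipeline,
    param_grid={"tfidf__ngram_range": [(1, 1), (1, 2)], "clf__C": [0.1, 1.0, 10.0]},
    cv=3,
    n_jobs=-1,
)
grid.fit(train.data, train.target)

# Performance report of the model on the test set.
predictions = grid.predict(test.data)
print(classification_report(test.target, predictions, target_names=test.target_names))
```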

Advanced Topic Modelling with TF-IDF, LSA, and SVMs
27 Building a Text Preprocessing Pipeline with NLTK
28 Creating Hashing Based Features from Natural Language
29 Classify Documents into 20 Topics with LSA
30 Document Classification with TF-IDF and SVMs
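
Finally, a compact sketch of the LSA-plus-SVM idea from this last section: TF-IDF features are reduced with truncated SVD (the usual way to compute LSA in scikit-learn) and fed to a linear SVM over the 20 topics. The dataset loader, number of components, and classifier settings are illustrative assumptions.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Reuse the 20 Newsgroups data as the document collection.
train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

# TF-IDF features -> LSA (truncated SVD) -> linear SVM over the 20 topics.
# A HashingVectorizer could be swapped in for hashing-based features.
lsa_svm = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("lsa", TruncatedSVD(n_components=100)),  # 100 latent dimensions, an arbitrary choice
    ("svm", LinearSVC()),
])

lsa_svm.fit(train.data, train.target)
print("test accuracy:", lsa_svm.score(test.data, test.target))
```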