Mining Data from Text

English | MP4 | AVC 1280×720 | AAC 44 kHz 2ch | 2h 21m | 392 MB

This course covers building feature vectors from text and documents for use in machine learning models; topic modeling with latent semantic analysis, latent Dirichlet allocation, and non-negative matrix factorization; and keyword extraction with RAKE.

A large part of the appeal of deep learning models is their ability to work with unstructured data such as text, images, and video. However, such models are only as good as the feature vectors they operate on. In this course, Mining Data from Text, you will learn to build efficient feature vectors from text and document data. First, you will learn how to represent documents as numeric data, starting with simple numeric identifiers for individual words and moving on to more refined weightings such as term frequency and inverse document frequency. Next, you will discover how to perform topic modeling using latent semantic analysis, latent Dirichlet allocation, and non-negative matrix factorization. Finally, you will explore how to implement keyword extraction using the popular RAKE algorithm. When you're finished with this course, you will have the skills and knowledge to build efficient feature vectors from a large document corpus and to use those vectors in machine learning models.
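As a rough illustration of the term frequency and inverse document frequency weighting mentioned above, here is a minimal sketch in plain Python. It is not taken from the course materials: the function names `tokenize` and `tf_idf`, the whitespace tokenizer, and the unsmoothed formula idf = log(N / df) are all assumptions made for this example. Libraries commonly use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tf_idf(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vocabulary = sorted(doc_freq)
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[term] / length * idf[term] for term in vocabulary])
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = tf_idf(["the cat sat", "the dog barked"])
    for term, weight in zip(vocab, vecs[0]):
        print(f"{term}: {weight:.3f}")
```

Under this weighting, a term that appears in every document (here, "the") gets a weight of zero, while terms that appear in only one document receive the highest weights.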

Table of Contents

Course Overview
1 Course Overview

Modeling Text Using Natural Language Processing
2 Module Overview
3 Prerequisites and Course Outline
4 Mining Data from Text
5 Numeric Representations of Text – One Hot Encoding
6 Numeric Representations of Text – Frequency Based Encodings
7 Numeric Representations of Text – Prediction Based Embeddings
8 Feature Hashing
9 Bag of Words – Bag of N Grams
10 Install and Setup
11 Frequency Based Representation Using Bag of Words and Bag of N Grams Model
12 Representing Documents Using TFIDF Scores and Feature Hashes
13 Module Summary

Building Classification Models Using Text Data
14 Module Overview
15 Naive Bayes Classifier
16 Sentiment Analysis Using the Naive Bayes Classifier
17 scikit-learn Pipelines to Build Features
18 Multiclass Classification
19 Module Summary

Understanding Topic Modeling
20 Module Overview
21 Topic Modeling
22 Topic Modeling Algorithms
23 Module Summary

Implementing Topic Modeling
24 Module Overview
25 Latent Dirichlet Allocation – Topic Modeling with the Newspaper Headlines Dataset
26 Visualizing Topic Assignments Using Manifold Learning to Reduce Dimensions
27 Latent Dirichlet Allocation – Topic Modeling with the DBPedia Dataset
28 Visualizing Topics Using Manifold Learning to Reduce Dimensions
29 Interactive Topic Model Visualization Using PyLDAVis
30 Non-negative Matrix Factorization – Topic Modeling with the DBPedia Dataset
31 Interactive Topic Visualization Using Bokeh
32 Latent Semantic Indexing – Preprocessing Text
33 Concept Modeling Using LSI
34 Module Summary

Understanding and Implementing Keyword Extraction
35 Module Overview
36 Understanding RAKE for Keyword Extraction
37 Keyword Extraction Using RAKE
38 Summary and Further Study