Web Scraping with Python: BeautifulSoup, Requests & Selenium

English | MP4 | AVC 1280×720 | AAC 48KHz 2ch | 8 Hours | 1.23 GB

Web Scraping and Crawling with Python: Beautiful Soup, Requests & Selenium

Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique for extracting large amounts of data from websites and saving the extracted data to a local file or a database.
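
As a rough illustration of that extract-and-save loop, the sketch below parses a small HTML table with Beautiful Soup and writes the rows to a CSV file. The table markup, the `products.csv` filename, and the helper names `extract_rows` and `rows_to_csv` are hypothetical choices for the example. In a real scraper, the HTML would come from an HTTP response rather than a string constant.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical page content; a real scraper would use an HTTP response body here.
SAMPLE_HTML = """
<table id="products">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Kettle</td><td>24.99</td></tr>
  <tr><td>Toaster</td><td>39.50</td></tr>
</table>
"""


def extract_rows(html):
    """Return each data row of the #products table as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="products")
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


def rows_to_csv(rows):
    """Serialize a header plus the extracted rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "price"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Save the extracted data to a local file (hypothetical filename).
    with open("products.csv", "w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv(extract_rows(SAMPLE_HTML)))
```

Swapping the CSV step for a database insert would leave `extract_rows` unchanged. Keeping the extraction step separate from the storage step is the main point of the structure.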

In this course, you will learn how to perform web scraping using Python 3 and Beautiful Soup, a free, open-source Python library for parsing HTML.
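
Here is a minimal sketch of the Beautiful Soup basics the course lists: accessing tags, navigable strings, and moving down, up, and sideways through the parse tree. It assumes `beautifulsoup4` is installed. The HTML snippet is invented for illustration, and the example uses Python's built-in `html.parser`, so no extra parser is needed.

```python
from bs4 import BeautifulSoup, NavigableString

# Made-up page fragment for illustration.
HTML = """<html><body>
<div id="course"><h1>Web Scraping</h1><p class="intro">Learn <b>Beautiful Soup</b> today.</p></div>
</body></html>"""

soup = BeautifulSoup(HTML, "html.parser")

# Accessing tags: attribute-style access returns the first matching tag.
h1 = soup.h1
title_text = h1.string  # a NavigableString holding "Web Scraping"

# Going down: .contents lists the direct children, both tags and strings.
p_children = soup.p.contents

# Going up: .parent is the enclosing tag; .parents walks all the way to the root.
parent_name = soup.b.parent.name
ancestor_names = [tag.name for tag in soup.b.parents]

# Going sideways: .next_sibling is the next node at the same nesting level.
after_h1 = h1.next_sibling.name

print(title_text, parent_name, after_h1)
print(ancestor_names)
```

In this snippet, `<h1>` and `<p>` sit directly next to each other. In indented markup, `.next_sibling` would often return a whitespace string rather than the next tag.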

We will use lxml, an extensive library that parses XML and HTML documents very quickly and can even handle malformed tags. We will also use the Requests module instead of the built-in urllib2 module (urllib.request in Python 3), because Requests code is quicker to write and easier to read.
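
The paragraph above makes two practical claims: lxml tolerates broken markup, and Requests is easier to work with than the standard library's urllib. The sketch below shows both under stated assumptions. The URL, query parameters, User-Agent string, and helper names are hypothetical placeholders. The requests are only prepared (built), never sent, so the sketch runs offline. `lxml` must be installed separately for the `"lxml"` parser name to work.

```python
import urllib.parse
import urllib.request

import requests
from bs4 import BeautifulSoup

# Placeholder endpoint and User-Agent string: assumptions for illustration only.
URL = "https://example.com/search"
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; scraping-course-example)"}
PARAMS = {"q": "web scraping", "page": 2}


def get_with_urllib(url, params, headers):
    # urllib.request (Python 3's successor to urllib2): encode the query string yourself.
    return urllib.request.Request(f"{url}?{urllib.parse.urlencode(params)}", headers=headers)


def get_with_requests(url, params, headers):
    # Requests encodes the params for you; .prepare() builds the request without sending it.
    return requests.Request("GET", url, params=params, headers=headers).prepare()


def post_with_requests(url, form, headers):
    # A POST sends its form data in the request body rather than in the URL.
    return requests.Request("POST", url, data=form, headers=headers).prepare()


# Deliberately broken markup: unclosed <li> and <b> tags plus a stray </div>.
MESSY_HTML = "<ul><li>Requests<li>Beautiful <b>Soup</ul></div><p>lxml"


def list_items(html):
    # "lxml" selects the lxml parser (pip install lxml), which repairs the tree.
    soup = BeautifulSoup(html, "lxml")
    return [li.get_text() for li in soup.find_all("li")]
```

To actually send the GET request, call `requests.get(URL, params=PARAMS, headers=HEADERS, timeout=10)`. It returns a `Response` whose `.text` can be passed straight to `BeautifulSoup(..., "lxml")`.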

Finally, we will use Selenium alongside Beautiful Soup to crawl AJAX- and JavaScript-driven pages.
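
One common way to combine the two tools is sketched below; it is an assumption, not the course's exact code. Selenium drives a real browser until the JavaScript has built the page, and then `driver.page_source` is passed to Beautiful Soup for parsing. The sketch assumes Selenium 4 with Chrome and a matching ChromeDriver. The `div.results` selector and the function names are hypothetical. Selenium is imported inside the fetch function, so the parsing half can be reused without a browser.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_css="div.results", timeout=10):
    """Load url in headless Chrome and return the HTML after JavaScript has run.

    Assumes Selenium 4 and a working Chrome/ChromeDriver setup; wait_css is a
    hypothetical selector for an element the page builds via AJAX.
    """
    # Imported here so parse_results() below does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the AJAX-loaded element appears (raises TimeoutException otherwise).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_results(html):
    """Hand the rendered page to Beautiful Soup and pull out the result link texts."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select("div.results a")]
```

Waiting on an explicit condition is generally more reliable than a fixed `time.sleep()`, because AJAX responses arrive at unpredictable times.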

The course covers the following topics: accessing web pages programmatically; scraping web pages for the required data by parsing them with Beautiful Soup; interacting with web pages programmatically to perform different actions on them; and when and how to use Selenium for web scraping.

By the end of this course, you will understand how websites and servers function, and you will know diverse data extraction techniques and methods for handling and organizing data.

This Web Scraping course covers the following topics:

  • Review of data structures (Lists, Dictionaries, Tuples, File Handling)
  • How websites are hosted on servers
  • Calls to the server (GET, POST methods)
  • Review of HTML and CSS
  • Requests Module and BeautifulSoup Module overview
  • Parsing HTML using BeautifulSoup
  • Filtering elements using BeautifulSoup and navigating the Parse Tree
  • JavaScript and AJAX overview
  • Selenium and the need for it
  • Selecting elements using Selenium
  • CSS selectors
  • XPath selectors
  • Navigating pages using Selenium
  • Practical Projects
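
The sketch below makes the difference between filtering, CSS selectors, and XPath selectors concrete. It runs the same kind of query three ways against an invented lesson list. It assumes `beautifulsoup4`, whose `.select()` relies on the `soupsieve` package installed with it, and `lxml`. Beautiful Soup has no XPath support of its own, so the XPath version uses `lxml.html` directly. Selenium accepts the same kinds of expressions through `By.CSS_SELECTOR` and `By.XPATH`.

```python
import lxml.html
from bs4 import BeautifulSoup

# Invented markup for illustration.
HTML = """
<div class="course">
  <h2 class="title">Selenium Basics</h2>
  <ul class="lessons">
    <li class="lesson free">Installing Selenium</li>
    <li class="lesson">XPath Examples</li>
    <li class="lesson free">Clicking Elements</li>
  </ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Filtering with find_all: a tag name plus a class filter (matches any element whose classes include "lesson").
all_lessons = [li.get_text() for li in soup.find_all("li", class_="lesson")]

# CSS selector: <li> elements inside ul.lessons carrying both "lesson" and "free".
free_css = [li.get_text() for li in soup.select("ul.lessons li.lesson.free")]

# Roughly equivalent XPath query, evaluated with lxml.
tree = lxml.html.fromstring(HTML)
free_xpath = tree.xpath(
    '//ul[contains(@class, "lessons")]'
    '/li[contains(concat(" ", normalize-space(@class), " "), " free ")]/text()'
)

print(all_lessons)
print(free_css, list(free_xpath))
```

The `concat`/`normalize-space` idiom makes the XPath match `free` as a whole class name rather than as a substring of some longer class.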

What you’ll learn

  • Python Refresher: Review of Data Structures, Conditionals, File Handling
  • How Websites are Hosted on Servers; Basic Calls to Server (GET, POST Methods)
  • Web Scraping with Python Beautiful Soup and Requests
  • Using Selenium to handle JavaScript and AJAX
  • Diverse Web Scraping Exercises
  • Source code (*.py files) for all exercises can be downloaded
  • Q&A board to send your questions and get them answered quickly

Table of Contents

Web Scraping Course Overview
1 Web Scraping Course Overview

Python Refresher: Data Structures
2 Lists
3 Dictionaries
4 Tuples
5 List Comprehensions – Part 1
6 List Comprehensions – Part 2
7 Inline – if else and List Comprehensions
8 Installing xlrd and XlsxWriter to Read/Write Excel Files
9 Writing to Excel Files
10 Reading from Excel Files
11 Python Editor & Other Software
12 Exercise 1 YOU Web Scraping Expert

How Servers Work
13 How Websites are Hosted
14 HTML Revision

BeautifulSoup Warm-up Exercise
15 BeautifulSoup Solved Exercise

Installing Required Python Packages
16 Installing Required Python Packages

Introduction to Requests Python Library
17 Requests Get Method
18 User Agent
19 Installing fake_useragent Package

Introduction to Beautiful Soup Python Library
20 Web Scraping with Beautiful Soup – Overview
21 Web Scraping with Beautiful Soup – Overview P.2
22 Accessing Tags
23 Navigable Strings

Navigating with Beautiful Soup – Going Down
24 Navigating through Tag Names
25 Contents and Children Methods
26 Descendants Method

Navigating with Beautiful Soup – Going Up
27 Parent Method
28 Parents Method

Navigating with Beautiful Soup – Going Sideways
29 next_sibling
30 previous_sibling
31 next_siblings previous_siblings

Regular Expressions with Python
32 Metacharacters Overview
33 Compile Function and Character Class
34 Special Sequences
35 Repeating Things
36 Repeating Things
37 ? and {m,n} Repeating Things
38 Metacharacters – Part 2

Searching the Parse Tree Using Beautiful Soup
39 Introduction to Searching with BeautifulSoup
40 find_all Function
41 find_all More Parameters
42 find Function

Project 1: Scraping CustomerReports Website
43 Web Scraping CustomerReports – part 1
44 Web Scraping CustomerReports – part 2

Project 2: Web Scraping CodingBat Website with Beautiful Soup
45 Project 2 Description
46 Web Scraping CodingBat – part 1
47 Web Scraping CodingBat – part 2
48 Web Scraping CodingBat – part 3

Using Selenium to Handle AJAX & JavaScript-Driven Web Pages
49 JavaScript, AJAX and Selenium Intro
50 Installing Selenium
51 Installing ChromeDriver
52 Introduction to Selenium
53 Searching Elements and Inputting Data
54 Clicking Elements
55 XPath Introduction
56 XPath Examples

Project 3: Web Scraping Your Instagram Account
57 Project 3 Description
58 Logging in to Instagram
59 Settings Tab
60 Opening Target Profile (NEW)
61 Scrolling Down v.1 (NEW)
62 Scrolling Down v.2 (NEW)
63 Exception Handling (NEW)
64 Making Folders (NEW)
65 Downloading Images v.1 (NEW)
66 Downloading Images v.2 (NEW)
67 Downloading Captions (NEW)
68 Writing Captions to Excel File (NEW)
69 Instagram Final Code – Updated E8-05-10

Web Scraping Best Practices
70 Web Scraping Best Practices

Bonus: Scrapy, a Powerful Web Scraping and Crawling Framework in Python
71 Coupon for Scrapy Powerful Web Scraping Crawling with Python Course