Web Crawling with Nodejs (H&M, Amazon, LinkedIn, AliExpress)

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 2.5 Hours | 1.31 GB

Learn how to create a web crawler using various methods on popular sites like H&M, Amazon, LinkedIn, AliExpress!

Do you want to build a web crawler in Nodejs?

In this course you will learn how to build a web crawler using the newest JavaScript syntax, working with popular sites like H&M, Amazon, LinkedIn and AliExpress!

You’ll learn how to find hidden APIs on sites like H&M and AliExpress, and see how you can sometimes avoid building a web crawler in the first place. You can save a lot of time this way!

Then I show how to build a web crawler for Amazon the test-driven way, by writing tests for the various product page layouts that exist on Amazon.

After that we’ll take a look at how to automate logging in to LinkedIn and scraping profiles using Puppeteer, the automated Chromium browser!

What you’ll learn

  • Differences between web crawling and web scraping in Nodejs
  • The 3 main methods to use in web crawling, and when to use what method!
  • How to get data from sites like H&M and AliExpress easily and fast using their hidden APIs
  • How to build a web crawler for server rendered sites like Amazon to crawl all their products
  • How to build a Puppeteer based web crawler for a site that requires JavaScript like LinkedIn

Table of Contents

Intro to web crawling and web scraping
1 What web crawling and web scraping are and how they differ
2 Legality of web scraping and web crawling
3 Tools we will be using during development
4 Methods of web crawling and web scraping

Getting all products from H&M and saving them to MongoDB (Method 1)
5 Finding hidden API using Chrome Dev Tools
6 Testing hidden API inside Postman, and finding other section API endpoints
7 Initializing NPM + some info about Nodejs Request and Needle
8 Creating our HTTP request with needle inside Nodejs
9 Adding User-Agent header to get past denial in Nodejs
10 Creating MongoDB cluster for saving data
11 Connecting to MongoDB cluster from Nodejs
12 Saving data to MongoDB
13 Getting all products in MongoDB using a loop with offset variable and pagesize
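
The offset and page size loop from lecture 13 can be sketched roughly as follows. This is a minimal illustration, not the course's code: `getAllProducts`, `fetchPageFromApi`, the example URL, the query parameter names and the `products` response field are all assumptions made for the example, and it uses the global `fetch` (Node 18+) rather than needle so that it has no dependencies.

```javascript
// Hypothetical sketch: page through a JSON API using an offset and a page size.
// fetchPage(offset, pageSize) must resolve to an array of products.
async function getAllProducts(fetchPage, pageSize = 36, maxPages = 1000) {
  const products = [];
  for (let page = 0; page < maxPages; page++) {
    const offset = page * pageSize;
    const batch = await fetchPage(offset, pageSize);
    products.push(...batch);
    // A short (or empty) page means there is nothing left to request.
    if (batch.length < pageSize) break;
  }
  return products;
}

// Example fetchPage built on the global fetch (Node 18+).
// The URL, parameter names and response shape are placeholders,
// not the real H&M endpoint.
async function fetchPageFromApi(offset, pageSize) {
  const url = `https://example.com/api/products?offset=${offset}&page-size=${pageSize}`;
  const res = await fetch(url, {
    // Some sites reject requests that do not send a browser-like User-Agent.
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; course-example)' },
  });
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  const body = await res.json();
  return body.products ?? [];
}
```

Calling `getAllProducts(fetchPageFromApi)` would collect every page into one array, which could then be saved to MongoDB in a single insert. Passing the page fetcher in as a function keeps the loop easy to test without touching the network.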

AliExpress – getting lots of products and prices using Method 1 (hidden API)
14 Finding hidden API using Chrome Dev Tools
15 Making API request from Postman with correct headers
16 Making API request from Nodejs using Fetch API
17 Getting many items using a for loop and sleep function
18 Saving AliExpress products to MongoDB
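
The "for loop and sleep function" idea from lecture 17 might look something like this. It is only a sketch under assumptions: `getManyItems` and the `fetchPage` callback are hypothetical names, and the one-second default delay is an arbitrary choice, not a value taken from the course.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical sketch: request pages one at a time, pausing between
// requests so the site is not hit with a burst of traffic.
// fetchPage(page) must resolve to an array of items for that page.
async function getManyItems(fetchPage, pageCount, delayMs = 1000) {
  const items = [];
  for (let page = 1; page <= pageCount; page++) {
    const pageItems = await fetchPage(page);
    items.push(...pageItems);
    // No need to wait after the last page.
    if (page < pageCount) await sleep(delayMs);
  }
  return items;
}
```

Because each request is awaited inside the loop, the pages are fetched strictly one after another; firing them all at once with `Promise.all` would be faster but is more likely to trigger rate limiting.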

Building an Amazon web crawler in Nodejs (Method 2, HTTP Requests)
19 Intro to project
20 Making our second test and getting product links from page
21 Writing out our actual web crawler in 6 minutes!
22 Setting up so we crawl only unique product IDs
23 Adding a new test case for a different layout + outro
24 Why are we using HTTP requests and not Puppeteer
25 Initializing NPM + installing jest, cheerio and needle npm packages
26 Writing our reusable httpRequest module for our testing and crawling
27 Creating our test HTML file (check resources for URL)
28 Setting up testing and intro to testing
29 Writing our first test for our HTML parser
30 Getting title from product page and making our test pass
31 Getting the price from product page
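
As a rough illustration of getting product links from a page and keeping only unique product IDs (lectures 20 and 22), here is a small pure function that is easy to put under test. It is a sketch, not the course's parser: it assumes product links contain `/dp/` followed by a 10-character ID, and it uses a regular expression instead of cheerio to stay dependency-free. The name `getUniqueProductIds` is made up for this example.

```javascript
// Assumption: product links contain "/dp/" followed by a 10-character ID
// made of uppercase letters and digits.
const PRODUCT_ID_PATTERN = /\/dp\/([A-Z0-9]{10})(?=[/?"'#]|$)/g;

// Return each product ID found in the HTML once, in order of first appearance.
function getUniqueProductIds(html) {
  const ids = new Set(); // a Set silently drops duplicates
  for (const match of html.matchAll(PRODUCT_ID_PATTERN)) {
    ids.add(match[1]);
  }
  return [...ids];
}
```

In a Jest test, a saved HTML file can be passed in and the result compared with `expect(getUniqueProductIds(html)).toEqual([...])`. A different page layout then becomes one more saved file and one more test case.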

Puppeteer web crawling on LinkedIn
32 Intro to project
33 Initializing project with puppeteer and cheerio packages
34 Opening Puppeteer browser and navigating to URL
35 Logging in to LinkedIn using Puppeteer
36 Getting profile links on a LinkedIn profile
37 Building web crawler loop for Puppeteer
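
The crawler loop from lecture 37 could be structured along these lines. This is a hedged sketch rather than the course's implementation: `crawl` and `getProfileLinks` are hypothetical names, and `getProfileLinks(url)` stands in for the Puppeteer step that opens a profile and collects the profile links found on it. Keeping the browser work behind a callback means the loop itself runs without Puppeteer installed.

```javascript
// Hypothetical sketch of a breadth-first crawl loop.
// getProfileLinks(url) must resolve to an array of profile URLs found on
// that page; in a real crawler it would drive a Puppeteer page.
async function crawl(startUrl, getProfileLinks, maxProfiles = 50) {
  const visited = new Set();
  const queue = [startUrl];

  while (queue.length > 0 && visited.size < maxProfiles) {
    const url = queue.shift();
    if (visited.has(url)) continue; // already crawled this profile
    visited.add(url);

    const links = await getProfileLinks(url);
    for (const link of links) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return [...visited];
}
```

The `visited` set stops the crawler from looping between profiles that link to each other, and `maxProfiles` puts a hard cap on how far the crawl can spread.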