From 0 to 1: Hive for Processing Big Data

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 15h 16m | 4.53 GB

End-to-End Hive: HQL, Partitioning, Bucketing, UDFs, Windowing, Optimization, Map Joins, Indexes

Hive is like a new friend with an old face (SQL). This course is an end-to-end, practical guide to using Hive for Big Data processing. Let’s parse that.

A new friend with an old face: Hive helps you leverage the power of distributed computing and Hadoop for analytical processing. Its interface is like an old friend: the very SQL-like HiveQL. This course will fill in all the gaps between SQL and what you need to use Hive.

End-to-End: The course is an end-to-end guide for using Hive. Whether you are an analyst who wants to process data or an engineer who needs to build custom functionality or optimize performance, everything you’ll need is right here. New to SQL? No need to look elsewhere. The course has a primer on all the basic SQL constructs.

Practical: Everything is taught using real-life examples, working queries and code.

A 15-hour course that covers each topic in detail and uses clear graphics to explain concepts.

What You Will Learn

  • Write complex analytical queries on data in Hive and uncover insights
  • Leverage partitioning and bucketing to optimize queries in Hive
  • Customize Hive with user-defined functions in Java and Python
  • Understand what goes on under the hood of Hive with HDFS and MapReduce

Table of Contents

01 You, Us & This Course
02 Hive – An Open-Source Data Warehouse
03 Hive and Hadoop
04 Hive vs Traditional Relational DBMS
05 HiveQL and SQL
06 Hadoop Install Modes
07 Hadoop Install Step 1 – Standalone Mode
08 Hadoop Install Step 2 – Pseudo-Distributed Mode
09 Hive Install
10 Code-Along – Getting started
11 What is Hadoop
12 HDFS or the Hadoop Distributed File System
13 Primitive Datatypes
14 Collections: Arrays and Maps
15 Structs and Unions
16 Create Table
17 Insert Into Table
18 Insert Into Table 2
19 Alter Table
20 HDFS
21 HDFS CLI – Interacting with HDFS
22 Code-Along – Create Table
23 Code-Along – Hive CLI
24 Three types of Hive functions
25 The Case-When statement, the Size function, the Cast function
26 The Explode function
27 Code-Along – Hive Built-in Functions
28 Quirky Sub-Queries
29 More on subqueries – Exists and In
30 Inserting via subqueries
31 Code-Along – Use Subqueries to work with Collection Datatypes
32 Views
33 Indices
34 Partitioning Introduced
35 The Rationale for Partitioning
36 How Tables are partitioned
37 Using Partitioned Tables
38 Dynamic Partitioning – Inserting data into partitioned tables
39 Code-Along – Partitioning
40 Introducing Bucketing
41 The Advantages of Bucketing
42 How Tables are bucketed
43 Using Bucketed Tables
44 Sampling
45 Windowing Introduced
46 Windowing – A Simple Example – Cumulative Sum
47 Windowing – A More Involved Example – Partitioning
48 Windowing – Special Aggregation Functions
49 The basic philosophy underlying MapReduce
50 MapReduce – Visualized and Explained
51 MapReduce – Digging a little deeper at every step
52 MapReduce Overview – Basic Select-From-Where
53 MapReduce Overview – Group-By and Having
54 MapReduce Overview – Joins
55 Improving Join performance with tables of different sizes
56 The Where clause in Joins
57 The Left Semi Join
58 Map Side Joins – The Inner Join
59 Map Side Joins – The Left, Right and Full Outer Joins
60 Map Side Joins – The Bucketed Map Join and the Sorted Merge Join
61 Custom functions in Python
62 Code-Along – Custom Function in Python
63 Introducing UDFs – you’re not limited by what Hive offers
64 The Simple UDF – The standard function for primitive types
65 The Simple UDF – Java implementation for replacetext()
66 Generic UDFs, the Object Inspector and DeferredObjects
67 The Generic UDF – Java implementation for containsstring()
68 The UDAF – Custom aggregate functions can get pretty complex
69 The UDAF – Java implementation for max()
70 The UDAF – Java implementation for Standard Deviation
71 The Generic UDTF – Custom table generating functions
72 The Generic UDTF – Java implementation for namesplit()
73 Select Statements
74 Select Statements 2
75 Operator Functions
76 Aggregation Operators Introduced
77 The Group by Clause
78 More Group by Examples
79 Order by
80 Having
81 Introduction to SQL Joins
82 Cross Joins and Cartesian Joins
83 Inner Joins
84 Left Outer Joins
85 Right, Full Outer Joins, Natural Joins, Self Joins
86 [For Linux/Mac OS Shell Newbies] Path and other Environment Variables
87 Setting up a Virtual Linux Instance – For Windows Users