Solving 10 Hadoop’able Problems

English | MP4 | AVC 1920×1080 | AAC 44 kHz 2ch | 3h 12m | 813 MB

Need solutions to your big data problems? Here are 10 real-world projects that show how to solve them with Hadoop.

The Apache Hadoop ecosystem is a popular and powerful tool to solve big data problems. With so many competing tools to process data, many users want to know which particular problems are well suited to Hadoop, and how to implement those solutions.

To know which types of problems are Hadoop-able, it helps to start with a basic understanding of Hadoop's core components. You will learn about the ecosystem designed to run on top of Hadoop, as well as software deployed alongside it; together, these tools are the building blocks for data processing applications. This course first covers the core parts of the Hadoop ecosystem, giving you a broad understanding and getting you up and running fast. It then presents a number of common problems as case-study projects, each a specific use case for solving a big data problem with Hadoop.

By the end of this course, you will have been exposed to a wide variety of Hadoop software and examples of how it is used to solve common big data problems.

This course is filled with hands-on exercises and implementation and execution techniques to help you solve 10 real-world big data problems. First, you'll learn the Hadoop ecosystem in a nutshell, then set up a development environment and sandbox. Finally, you'll work through solutions to the problems using big data techniques.

What You Will Learn

  • Explore the Hadoop big data Ecosystem in a nutshell
  • Process payment data from an event stream using the streaming API: Payment Analyzer
  • Detect BOT traffic using Spark Streaming, make log data queryable, and investigate customer data
  • Supply chain analysis – find top-seller items in a streaming way and enrich them with additional information
  • Analyze customer churn, both quantitatively and by amounts, with DataFrame queries
  • Perform IoT sensor data analysis: store and query device data and build summaries on data streams
  • High-performance computation with neighborhood aggregations
  • Page ranking using Spark GraphX
  • Threat Analysis – Analyzing weblogs for suspicious activity and anomalies in network traffic
  • Extract information from unstructured text via Spark DataFrames
  • Perform sentiment analysis of posts using Logistic Regression, and find the author of a post
  • Find what product users want to buy using Cloudera Sandbox Toolkit
  • Use movie history to suggest content, and test and experiment with a recommendation engine
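The graph-processing bullets above (counting vertex degrees, neighborhood aggregations, PageRank) all build on the same iterative idea. As a rough sketch of that idea, here is a minimal PageRank over an edge list in plain Python; the course implements this at cluster scale with Spark GraphX, and the function and toy graph below are illustrative, not taken from the course:

```python
# Minimal illustrative PageRank over an edge list (plain Python).
# Spark GraphX runs the same computation at scale, e.g. graph.pageRank(tol).

def pagerank(edges, damping=0.85, iterations=20):
    nodes = {n for edge in edges for n in edge}
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start with a uniform rank distribution.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, targets in out_links.items():
            if targets:
                # Each node shares its damped rank among its out-links.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for n in nodes:
                    new_rank[n] += damping * rank[src] / len(nodes)
        rank = new_rank
    return rank

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
ranks = pagerank(edges)
```

With GraphX the equivalent is a single call on a `Graph` object, but the per-iteration rank redistribution is the same computation.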

Table of Contents

01 The Course Overview
02 Hadoop Distributed File System (HDFS)
03 Distributed Compute Capability YARN
04 Apache Hive for ETL and SQL-Like Queries
05 Message Queuing and Data Ingestion Kafka
06 NoSQL Datastores – Hadoop HBase, Accumulo
07 Machine Learning – Spark and Spark MLlib
08 Stream Processing – Spark Streaming
09 Processing Payment Data from an Event Stream
10 Advanced Aggregations Using Streaming API – PaymentAnalyzer
11 Storing Time Series Data in HBase
12 Detecting BOT Traffic Using Spark Streaming
13 Make Web Log Data Queryable – Hive Sink
14 Investigating Customers Data in Hive
15 Trending Supply Chain – Finding Top Seller Item in a Streaming Way
16 Enriching Top Sellers with Additional Information
17 Analyzing Customer Churn (Quantitative) Using DataFrame Queries
18 Analyzing Customer Churn (Amounts) Using DataFrame Queries
19 Storing Low Granularity Structured Sensor Data in HBase
20 Consuming Sensor Data Stored in HBase – Scan and Count
21 Building Summaries on Data Streaming from Devices
22 Introducing Spark GraphX – How to Represent a Graph
23 Perform Graph Operations Using GraphX
24 Counting Degree of Vertices
25 Neighborhood Aggregations – Collecting Neighbors
26 Structural Operators – Connected Components
27 Page Rank Using Spark GraphX
28 Anomaly Detection
29 Analyzing Web Logs for Suspicious Activity and Loading into Spark
30 Implementing Clustering – Choosing Number of Clusters
31 Detecting Anomalies in Network Traffic
32 Analyzing Posts for an Author
33 Extracting Information from Unstructured Text
34 Extracting Information Via Spark DataFrame
35 Sentiment Analysis of Posts Using Logistic Regression
36 Finding an Author of a Post
37 Downloading and Setting Up the Cloudera Sandbox
38 Finding What Products Users Want to Buy Using Cloudera Sandbox Toolkit
39 Using Movie History to Suggest Interesting Content
40 Testing and Experimenting with Recommendation Engine
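The final sections (39–40) build a recommendation engine from movie history. As a toy illustration of the underlying idea, here is a user-similarity recommender in plain Python; the viewing data and function names are invented for this example, and the course builds the real engine on the Cloudera/Spark stack:

```python
# Toy collaborative-filtering sketch (illustrative only): recommend movies
# a user has not seen, weighted by how similar other users' histories are.

def jaccard(a, b):
    """Jaccard similarity between two sets of watched movies."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(history, user):
    seen = history[user]
    scores = {}
    for other, movies in history.items():
        if other == user:
            continue
        sim = jaccard(seen, movies)
        # Unseen movies from similar users accumulate similarity-weighted votes.
        for movie in movies - seen:
            scores[movie] = scores.get(movie, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

history = {
    "alice": {"Alien", "Blade Runner", "Dune"},
    "bob": {"Alien", "Blade Runner", "Solaris"},
    "carol": {"Dune", "Titanic"},
}
recs = recommend(history, "alice")
print(recs)  # → ['Solaris', 'Titanic']
```

Section 40's testing-and-experimenting step corresponds to varying the similarity measure and checking whether suggestions still match users' tastes.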