Data Science on Google Cloud Platform: Designing Data Warehouses

English | MP4 | AVC 1280×720 | AAC 48 kHz 2ch | 1 hour | 150 MB

Cloud computing brings on-demand scalability and elasticity to data science applications. Expertise in the major platforms, such as Google Cloud Platform (GCP), is essential for IT professionals. This course, one of a series by veteran cloud engineering specialist and data scientist Kumaran Ponnambalam, shows how to design and build data warehouses using GCP. Explore the different storage options GCP offers for files, relational data, documents, and big data, including Cloud SQL, Cloud Bigtable, and BigQuery. Then learn how to use one of these solutions, BigQuery, to store and query data, and review advanced use cases such as working with partitioned tables and external data sources. Finally, learn best practices for table design, storage and query optimization, and monitoring of data warehouses in BigQuery.
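As an illustration of the partitioned-table use case mentioned above, the sketch below renders BigQuery standard SQL DDL for a table partitioned by day on a TIMESTAMP column, plus a query that filters on that column so BigQuery can prune partitions. It is a minimal sketch under stated assumptions, not material from the course: the `Column` class, the two helper functions, and the project, dataset, table, and column names are all hypothetical, and the code only builds SQL text. Running the statements would be done separately, for example in the BigQuery console, with the `bq query` command, or through a client library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column spec: a name and a BigQuery standard SQL type."""
    name: str
    bq_type: str  # e.g. "STRING", "INT64", "NUMERIC", "TIMESTAMP"


def partitioned_table_ddl(table_id: str, columns: list[Column], partition_col: str,
                          expiration_days: int | None = None) -> str:
    """Render CREATE TABLE DDL for a table partitioned by day on a TIMESTAMP column."""
    types = {c.name: c.bq_type for c in columns}
    if types.get(partition_col) != "TIMESTAMP":
        raise ValueError(f"{partition_col!r} must be a TIMESTAMP column")
    body = ",\n".join(f"  {c.name} {c.bq_type}" for c in columns)
    # Requiring a partition filter makes BigQuery reject queries that would scan every partition.
    options = ["require_partition_filter = TRUE"]
    if expiration_days is not None:
        # Partitions older than this many days are deleted automatically.
        options.append(f"partition_expiration_days = {expiration_days}")
    return (
        f"CREATE TABLE IF NOT EXISTS `{table_id}` (\n{body}\n)\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"OPTIONS ({', '.join(options)})"
    )


def daily_counts_query(table_id: str, partition_col: str) -> str:
    """Count rows per day, filtering on the partitioning column so only the
    partitions between the @start and @end named query parameters are scanned."""
    return (
        f"SELECT DATE({partition_col}) AS event_day, COUNT(*) AS row_count\n"
        f"FROM `{table_id}`\n"
        f"WHERE {partition_col} >= @start AND {partition_col} < @end\n"
        f"GROUP BY event_day\n"
        f"ORDER BY event_day"
    )


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    cols = [Column("event_ts", "TIMESTAMP"), Column("user_id", "STRING"),
            Column("amount", "NUMERIC")]
    print(partitioned_table_ddl("my-project.sales.events", cols, "event_ts", expiration_days=90))
    print(daily_counts_query("my-project.sales.events", "event_ts"))
```

Because the table is created with `require_partition_filter = TRUE`, BigQuery refuses queries against it that do not filter on the partitioning column; the query helper respects that by always bounding `event_ts`, and the `@start` and `@end` values would be supplied as query parameters at execution time.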

Topics include:

  • Options for storing data in Google Cloud Platform
  • Creating data assets in BigQuery
  • Querying data in BigQuery
  • Advanced data warehouse techniques
  • Best practices for data warehouses in GCP

Table of Contents

Introduction
1 Why data warehouses are important
2 Data science modules covered

Storing Data in GCP
3 GCP storage options
4 Google Cloud Storage
5 Cloud SQL
6 Cloud Spanner
7 Cloud Bigtable
8 Cloud Datastore
9 Cloud BigQuery

BigQuery Data Creation
10 Intro to BigQuery
11 Projects and datasets
12 Tables
13 Create a dataset
14 Create a table with schema
15 Create a table from CSV
16 Load data from Cloud Storage

Querying Data in BigQuery
17 Simple queries
18 Filter data
19 SQL functions
20 Regular expressions
21 Grouping and aggregations
22 Joins and sub-queries
23 Update data

Advanced BigQuery
24 Partition tables
25 External data sources
26 Create views
27 Create labels
28 Google Cloud shell
29 Other interfaces

Best Practices in BigQuery
30 Table design considerations
31 Optimize storage
32 Load data
33 Speed up queries
34 Monitoring and logging

Conclusion
35 Next steps