Data science on the Google cloud platform : implementing end-to-end real-time data pipelines : from ingest to machine learning

Data science on the Google cloud platform : implementing end-to-end real-time data pipelines : from ingest to machine learning / Valliappa Lakshmanan

Material type: Book
Publication: Beijing : O'Reilly, 2017
Physical description: xiv, 393 p. ; 24 cm
Author: Lakshmanan, Valliappa
ISBN: 9781491974568 [1491974567]
Bibliographic ID: BB25362614

Bibliographic details

Subjects:

Cloud computing; Computing platforms

Language:

English

Table of contents:

Preface

Making Better Decisions Based on Data / 1：

Many Similar Decisions

The Role of Data Engineers

The Cloud Makes Data Engineers Possible

The Cloud Turbocharges Data Science

Case Studies Get at the Stubborn Facts

A Probabilistic Decision

Data and Tools

Getting Started with the Code

Summary

Ingesting Data into the Cloud / 2：

Airline On-Time Performance Data

Knowability

Training-Serving Skew

Download Procedure

Dataset Attributes

Why Not Store the Data in Situ?

Scaling Up

Scaling Out

Data in Situ with Colossus and Jupiter

Ingesting Data

Reverse Engineering a Web Form

Dataset Download

Exploration and Cleanup

Uploading Data to Google Cloud Storage

Scheduling Monthly Downloads

Ingesting in Python

Flask Web App

Running on App Engine

Securing the URL

Scheduling a Cron Task

Code Break

Creating Compelling Dashboards / 3：

Explain Your Model with Dashboards

Why Build a Dashboard First?

Accuracy, Honesty, and Good Design

Loading Data into Google Cloud SQL

Create a Google Cloud SQL Instance

Interacting with Google Cloud Platform

Controlling Access to MySQL

Create Tables

Populating Tables

Building Our First Model

Contingency Table

Threshold Optimization

Machine Learning

Building a Dashboard

Getting Started with Data Studio

Creating Charts

Adding End-User Controls

Showing Proportions with a Pie Chart

Explaining a Contingency Table

Streaming Data: Publication and Ingest / 4：

Designing the Event Feed

Time Correction

Apache Beam/Cloud Dataflow

Parsing Airports Data

Adding Time Zone Information

Converting Times to UTC

Correcting Dates

Creating Events

Running the Pipeline in the Cloud

Publishing an Event Stream to Cloud Pub/Sub

Get Records to Publish

Paging Through Records

Building a Batch of Events

Publishing a Batch of Events

Real-Time Stream Processing

Streaming in Java Dataflow

Executing the Stream Processing

Analyzing Streaming Data in BigQuery

Real-Time Dashboard

Interactive Data Exploration / 5：

Exploratory Data Analysis

Loading Flights Data into BigQuery

Advantages of a Serverless Columnar Database

Staging on Cloud Storage

Access Control

Federated Queries

Ingesting CSV Files

Exploratory Data Analysis in Cloud Datalab

Jupyter Notebooks

Cloud Datalab

Installing Packages in Cloud Datalab

Jupyter Magic for Google Cloud Platform

Quality Control

Oddball Values

Outlier Removal: Big Data Is Different

Filtering Data on Occurrence Frequency

Arrival Delay Conditioned on Departure Delay

Applying Probabilistic Decision Threshold

Empirical Probability Distribution Function

The Answer Is…

Evaluating the Model

Random Shuffling

Splitting by Date

Training and Testing

Bayes Classifier on Cloud Dataproc / 6：

MapReduce and the Hadoop Ecosystem

How MapReduce Works

Apache Hadoop

Google Cloud Dataproc

Need for Higher-Level Tools

Jobs, Not Clusters

Initialization Actions

Quantization Using Spark SQL

Google Cloud Datalab on Cloud Dataproc

Independence Check Using BigQuery

Spark SQL in Google Cloud Datalab

Histogram Equalization

Dynamically Resizing Clusters

Bayes Classification Using Pig

Running a Pig Job on Cloud Dataproc

Limiting to Training Days

The Decision Criteria

Evaluating the Bayesian Model

Machine Learning: Logistic Regression on Spark / 7：

Logistic Regression

Spark ML Library

Getting Started with Spark Machine Learning

Spark Logistic Regression

Creating a Training Dataset

Dealing with Corner Cases

Creating Training Examples

Training

Predicting by Using a Model

Evaluating a Model

Feature Engineering

Experimental Framework

Creating the Held-Out Dataset

Feature Selection

Scaling and Clipping Features

Feature Transforms

Categorical Variables

Scalable, Repeatable, Real Time

Time-Windowed Aggregate Features / 8：

The Need for Time Averages

Dataflow in Java

Setting Up Development Environment

Filtering with Beam

Pipeline Options and Text I/O

Run on Cloud

Parsing into Objects

Computing Time Averages

Grouping and Combining

Parallel Do with Side Input

Debugging

BigQueryIO

Mutating the Flight Object

Sliding Window Computation in Batch Mode

Running in the Cloud

Monitoring, Troubleshooting, and Performance Tuning

Troubleshooting Pipeline

Side Input Limitations

Redesigning the Pipeline

Removing Duplicates

Machine Learning Classifier Using TensorFlow / 9：

Toward More Complex Models

Reading Data into TensorFlow

Setting Up an Experiment

Linear Classifier

Training and Evaluating Input Functions

Serving Input Function

Creating an Experiment

Performing a Training Run

Distributed Training in the Cloud

Improving the ML Model

Deep Neural Network Model

Embeddings

Wide-and-Deep Model

Hyperparameter Tuning

Deploying the Model

Predicting with the Model

Explaining the Model

Real-Time Machine Learning / 10：

Invoking Prediction Service

Java Classes for Request and Response

Post Request and Parse Response

Client of Prediction Service

Adding Predictions to Flight Information

Batch Input and Output

Data Processing Pipeline

Identifying Inefficiency

Batching Requests

Streaming Pipeline

Flattening PCollections

Executing Streaming Pipeline

Late and Out-of-Order Records

Watermarks and Triggers

Transactions, Throughput, and Latency

Possible Streaming Sinks

Cloud Bigtable

Designing Tables

Designing the Row Key

Streaming into Cloud Bigtable

Querying from Cloud Bigtable

Evaluating Model Performance

The Need for Continuous Training

Evaluation Pipeline

Evaluating Performance

Marginal Distributions

Checking Model Behavior

Identifying Behavioral Change

Book Summary

Considerations for Sensitive Data within Machine Learning Datasets / A：

Index

Similar items:

1 Book Design and implementation of datacenter protocols for cloud computing Liu, Alex X., Munir, Ali World Scientific
2 Book Zen of cloud : learning cloud computing by examples on Microsoft Azure (: pbk.) Bai, Haishi CRC Press
3 Book スケーラブルデータサイエンス : データエンジニアのための実践Google Cloud Platform Lakshmanan, Valliappa, 葛木, 美紀, 中井, 悦司(1971-), 長谷部, 光治(1980-) 翔泳社
4 Book Cloud computing (: pbk) Ruparelia, Nayan MIT Press
5 E-book Cloud computing / ((ebook) :) Ruparelia, Nayan
6 E-book Autonomous Learning Systems: From Data Streams to Knowledge in Real-time Angelov, Plamen Wiley Online Library - AutoHoldings Books, John Wiley & Sons, Inc.
7 E-book Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning Association for Computing Machinery-Digital Library. ACM Digital Library Proceedings, ACM
8 E-book Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning Association for Computing Machinery-Digital Library. ACM Digital Library Proceedings, ACM
9 E-book Machine Learning, Optimization, and Data Science Nicosia, Giuffrida, Giovanni, Pardalos, Panos, Sciacca, Vincenzo, Umeton, Renato SpringerLink Books - AutoHoldings, Springer International Publishing
10 E-book Machine Learning, Optimization, and Data Science Nicosia, Giuffrida, Giovanni, Pardalos, Panos, Sciacca, Vincenzo, Umeton, Renato SpringerLink Books - AutoHoldings, Springer International Publishing
11 E-book Machine Learning, Optimization, and Data Science Giuseppe Nicosia, Giuffrida, Giovanni, Jansen, Giorgio, La Malfa, Emanuele, Ojha, Varun, Pardalos, Panos, Sciacca, … SpringerLink Books - AutoHoldings, Springer International Publishing
12 E-book Machine Learning, Optimization, and Data Science Giuseppe Nicosia, Giuffrida, Giovanni, Jansen, Giorgio, La Malfa, Emanuele, Ojha, Varun, Pardalos, Panos, Sciacca, … SpringerLink Books - AutoHoldings, Springer International Publishing