1.

Book
Valliappa Lakshmanan (author) ; 葛木美紀 (translator)
Publication: [Tokyo] : 翔泳社 (Shoeisha), 2019.6  xv, 390 p. ; 23 cm
Table of contents:
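Making Better Decisions Based on Data
Ingesting Data into the Cloud
Creating Compelling Dashboards
Streaming Data Processing
Interactive Data Exploration
Bayes Classifier on Cloud Dataproc
Logistic Regression Analysis with Spark
Aggregation Using Sliding Windows
Classification Models Using TensorFlow
Real-Time Machine Learning
Appendix A: Considerations for Sensitive Data Within Machine Learning Datasets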
2.

Book
Valliappa Lakshmanan
Publication: Beijing : O'Reilly, 2017  xiv, 392 p. ; 24 cm
Table of contents:
Preface
Chapter 1: Making Better Decisions Based on Data
Many Similar Decisions
The Role of Data Engineers
The Cloud Makes Data Engineers Possible
The Cloud Turbocharges Data Science
Case Studies Get at the Stubborn Facts
A Probabilistic Decision
Data and Tools
Getting Started with the Code
Summary
Chapter 2: Ingesting Data into the Cloud
Airline On-Time Performance Data
Knowability
Training-Serving Skew
Download Procedure
Dataset Attributes
Why Not Store the Data in Situ?
Scaling Up
Scaling Out
Data in Situ with Colossus and Jupiter
Ingesting Data
Reverse Engineering a Web Form
Dataset Download
Exploration and Cleanup
Uploading Data to Google Cloud Storage
Scheduling Monthly Downloads
Ingesting in Python
Flask Web App
Running on App Engine
Securing the URL
Scheduling a Cron Task
Code Break
Chapter 3: Creating Compelling Dashboards
Explain Your Model with Dashboards
Why Build a Dashboard First?
Accuracy, Honesty, and Good Design
Loading Data into Google Cloud SQL
Create a Google Cloud SQL Instance
Interacting with Google Cloud Platform
Controlling Access to MySQL
Create Tables
Populating Tables
Building Our First Model
Contingency Table
Threshold Optimization
Machine Learning
Building a Dashboard
Getting Started with Data Studio
Creating Charts
Adding End-User Controls
Showing Proportions with a Pie Chart
Explaining a Contingency Table
Chapter 4: Streaming Data: Publication and Ingest
Designing the Event Feed
Time Correction
Apache Beam/Cloud Dataflow
Parsing Airports Data
Adding Time Zone Information
Converting Times to UTC
Correcting Dates
Creating Events
Running the Pipeline in the Cloud
Publishing an Event Stream to Cloud Pub/Sub
Get Records to Publish
Paging Through Records
Building a Batch of Events
Publishing a Batch of Events
Real-Time Stream Processing
Streaming in Java Dataflow
Executing the Stream Processing
Analyzing Streaming Data in BigQuery
Real-Time Dashboard
Chapter 5: Interactive Data Exploration
Exploratory Data Analysis
Loading Flights Data into BigQuery
Advantages of a Serverless Columnar Database
Staging on Cloud Storage
Access Control
Federated Queries
Ingesting CSV Files
Exploratory Data Analysis in Cloud Datalab
Jupyter Notebooks
Cloud Datalab
Installing Packages in Cloud Datalab
Jupyter Magic for Google Cloud Platform
Quality Control
Oddball Values
Outlier Removal: Big Data Is Different
Filtering Data on Occurrence Frequency
Arrival Delay Conditioned on Departure Delay
Applying Probabilistic Decision Threshold
Empirical Probability Distribution Function
The Answer Is…
Evaluating the Model
Random Shuffling
Splitting by Date
Training and Testing
Chapter 6: Bayes Classifier on Cloud Dataproc
MapReduce and the Hadoop Ecosystem
How MapReduce Works
Apache Hadoop
Google Cloud Dataproc
Need for Higher-Level Tools
Jobs, Not Clusters
Initialization Actions
Quantization Using Spark SQL
Google Cloud Datalab on Cloud Dataproc
Independence Check Using BigQuery
Spark SQL in Google Cloud Datalab
Histogram Equalization
Dynamically Resizing Clusters
Bayes Classification Using Pig
Running a Pig Job on Cloud Dataproc
Limiting to Training Days
The Decision Criteria
Evaluating the Bayesian Model
Chapter 7: Machine Learning: Logistic Regression on Spark
Logistic Regression
Spark ML Library
Getting Started with Spark Machine Learning
Spark Logistic Regression
Creating a Training Dataset
Dealing with Corner Cases
Creating Training Examples
Training
Predicting by Using a Model
Evaluating a Model
Feature Engineering
Experimental Framework
Creating the Held-Out Dataset
Feature Selection
Scaling and Clipping Features
Feature Transforms
Categorical Variables
Scalable, Repeatable, Real Time
Chapter 8: Time-Windowed Aggregate Features
The Need for Time Averages
Dataflow in Java
Setting Up Development Environment
Filtering with Beam
Pipeline Options and Text I/O
Run on Cloud
Parsing into Objects
Computing Time Averages
Grouping and Combining
Parallel Do with Side Input
Debugging
BigQueryIO
Mutating the Flight Object
Sliding Window Computation in Batch Mode
Running in the Cloud
Monitoring, Troubleshooting, and Performance Tuning
Troubleshooting Pipeline
Side Input Limitations
Redesigning the Pipeline
Removing Duplicates
Chapter 9: Machine Learning Classifier Using TensorFlow
Toward More Complex Models
Reading Data into TensorFlow
Setting Up an Experiment
Linear Classifier
Training and Evaluating Input Functions
Serving Input Function
Creating an Experiment
Performing a Training Run
Distributed Training in the Cloud
Improving the ML Model
Deep Neural Network Model
Embeddings
Wide-and-Deep Model
Hyperparameter Tuning
Deploying the Model
Predicting with the Model
Explaining the Model
Chapter 10: Real-Time Machine Learning
Invoking Prediction Service
Java Classes for Request and Response
Post Request and Parse Response
Client of Prediction Service
Adding Predictions to Flight Information
Batch Input and Output
Data Processing Pipeline
Identifying Inefficiency
Batching Requests
Streaming Pipeline
Flattening PCollections
Executing Streaming Pipeline
Late and Out-of-Order Records
Watermarks and Triggers
Transactions, Throughput, and Latency
Possible Streaming Sinks
Cloud Bigtable
Designing Tables
Designing the Row Key
Streaming into Cloud Bigtable
Querying from Cloud Bigtable
Evaluating Model Performance
The Need for Continuous Training
Evaluation Pipeline
Evaluating Performance
Marginal Distributions
Checking Model Behavior
Identifying Behavioral Change
Book Summary
Appendix A: Considerations for Sensitive Data Within Machine Learning Datasets
Index