Introduction / Part I: |
Basic Concepts and Problems / 1: |
The Timeout Problem / 1.1: |
System and Fault Models / 1.2: |
Preventive Maintenance / 1.3: |
Note on Terminology / 1.4: |
Outline / 1.5: |
Task Completion Time / 2: |
Bounded Downtime / 2.1: |
System Lifetime / 2.1.1: |
Cumulative Uptime / 2.1.2: |
Probability of Task Completion / 2.1.3: |
Bounded Accumulated Downtime / 2.2: |
System Lifetime / 2.2.1: |
Bounded Number of Failures / 2.2.2: |
Restart / Part II: |
Applicability Analysis of Restart / 3: |
Applications of Restart / 3.1: |
Randomised Algorithms / 3.1.1: |
Optimal Restart Time for a Randomised Algorithm / 3.1.2: |
Failure Detectors / 3.1.3: |
Congestion Control in TCP / 3.1.4: |
Criteria for Successful Restarts / 3.2: |
When Does Restart Improve the Expected Completion Time? / 3.2.1: |
When Does Restart Improve the Probability of Meeting a Deadline? / 3.2.2: |
Conclusions / 3.3: |
Moments of Completion Time Under Restart / 4: |
The Information Captured by the Moments of a Distribution / 4.1: |
Models for Moments of Completion Time / 4.2: |
Unbounded Number of Restarts / 4.2.1: |
Finite Number of Restarts / 4.2.2: |
Optimal Restart Times for the Moments of Completion Time / 4.3: |
Expected Completion Time / 4.3.1: |
Optimal Restart Times for Higher Moments / 4.3.2: |
Case Study: Optimising Expected Completion Time in Web Services Reliable Messaging / 4.4: |
Metrics for the Fairness-Timeliness Tradeoff / 4.4.1: |
Oracles for Restart / 4.4.2: |
Results / 4.4.3: |
HTTP Transport / 4.5: |
60 s Disruption / 4.5.1: |
Packet Loss / 4.5.2: |
Mail Transport / 4.5.3: |
Meeting Deadlines Through Restart / 5: |
A Model for the Probability of Meeting a Deadline Under Restart / 5.1: |
Algorithms for Optimal Restart Times / 5.2: |
An Engineering Rule to Approximate the Optimal Restart Time / 5.3: |
Towards Online Restart for Self-Management of Systems / 5.4: |
Estimating the Hazard Rate / 5.4.1: |
Experiments / 5.4.2: |
Software Rejuvenation / Part III: |
Practical Aspects of Preventive Maintenance and Software Rejuvenation / 6: |
Stochastic Models for Preventive Maintenance and Software Rejuvenation / 7: |
A Markovian Software Rejuvenation Model / 7.1: |
Aging in the Modelling of Software Rejuvenation / 7.2: |
Behaviour in State A under Policy I / 7.2.1: |
Behaviour in State A under Policy II / 7.2.2: |
A Petri Net Model / 7.3: |
A Non-Markovian Preventive Maintenance Model / 7.4: |
Stochastic Processes for Shock and Inspection-Based Modelling / 7.5: |
The Inspection Model with Alert Threshold Policy / 7.5.1: |
The Shock Model with a Risk Policy / 7.5.2: |
Inspection-Based Modelling using the Möbius Modelling Tool / 7.6: |
Comparative Summary of the Stochastic Models / 7.7: |
Further Reading / 7.8: |
Checkpointing / Part IV: |
Checkpointing Systems / 8: |
Checkpointing Single-Unit Systems / 8.1: |
Checkpointing in Distributed Systems / 8.2: |
Stochastic Models for Checkpointing / 9: |
Checkpointing at Program Level / 9.1: |
Equidistant Checkpointing / 9.1.1: |
Checkpointing Real-Time Tasks / 9.1.2: |
Random Checkpointing Intervals / 9.1.3: |
Algorithms for Optimum Checkpoint Selection / 9.1.4: |
Checkpointing at System Level / 9.2: |
Analytic Models for Checkpointing Transaction-Based Systems / 9.2.1: |
Checkpointing Policies for Transaction-Based Systems / 9.2.2: |
A Queueing Model for Checkpointing Transaction-Based Systems / 9.2.3: |
A Trade-Off Metric for Optimal Checkpoint Selection / 9.3: |
Summary / 9.4: |
Summary, Conclusion and Outlook / 10: |
Properties in Discrete Systems / A: |
Cumulative First Moment / A.1: |
The Gamma Function / A.2: |
Important Probability Distributions / B: |
Discrete Probability Distributions / B.1: |
The Binomial Distribution / B.1.1: |
The Multinomial Distribution / B.1.2: |
The Geometric Distribution / B.1.3: |
The Poisson Distribution / B.1.4: |
Continuous Probability Distributions / B.2: |
The Exponential Distribution / B.2.1: |
The Erlang Distribution and the Hypo-exponential Distribution / B.2.2: |
The Hyperexponential Distribution / B.2.3: |
The Mixed Hyper/Hypo-exponential Distribution / B.2.4: |
The Weibull Distribution / B.2.5: |
The Lognormal Distribution / B.2.6: |
Cumulative Hazard Rate / C: |
Epanechnikov Kernel / C.2: |
Bandwidth Estimation / C.3: |
The Laplace and the Laplace-Stieltjes Transform / D: |
References |
Index |
Glossary |