- Hardcover
Other customers were also interested in
- Eric Bauer: Beyond Redundancy, 121,99 €
- Andy J. A. Wang: Component-Oriented Programming, 160,99 €
- Steven A. Gabarro: Web Application Design and Implementation, 141,99 €
- Veli-Pekka Eloranta: Designing Distributed Control Systems, 71,99 €
- Scott Millett: Patterns, Principles, and Practices of Domain-Driven Design, 46,99 €
- Jeff Garland: Large-Scale Software Architecture, 101,99 €
- William R. Stanek: Microsoft Powershell, VBScript and JScript Bible, 62,99 €
System reliability, availability and robustness are often not well understood by system architects, engineers and developers. Many are unsure what drives customers' availability expectations, how to frame verifiable availability and robustness requirements, how to manage and budget availability and robustness, or how to methodically architect and design systems that meet those requirements. The book takes a pragmatic approach, framing reliability and robustness as a functional aspect of a system so that architects, designers, developers and testers can address it as a concrete, functional attribute rather than an abstract, non-functional notion.
Note: This item can only be delivered to a German shipping address.
Product details
- Publisher: Wiley & Sons
- 1st edition
- Pages: 348
- Publication date: 4 October 2010
- Language: English
- Dimensions: 240 mm x 161 mm x 23 mm
- Weight: 689 g
- ISBN-13: 9780470604656
- ISBN-10: 0470604654
- Item no.: 30882184
ERIC BAUER is Reliability Engineering Manager in the Wireline Division of Alcatel-Lucent. After two decades of software development experience, he joined the Lucent reliability team to lead a reliability group, and has since worked on reliability engineering for a variety of wireless and wireline products and solutions. Mr. Bauer currently focuses on increasing the reliability of Alcatel-Lucent's IP Multimedia Subsystem (IMS) solution and the network elements that make up the IMS solution. He has been awarded twelve U.S. patents, coauthored Practical System Reliability (Wiley), and has published several papers in the Bell Labs Technical Journal.
Figures. Tables. Preface. Acknowledgements.
PART ONE RELIABILITY BASICS.
1 Reliability and Availability Concepts. 1.1 Reliability and Availability. 1.2 Faults, Errors and Failures. 1.3 Error Severity. 1.4 Failure Recovery. 1.5 Highly Available Systems. 1.6 Quantifying Availability. 1.7 Outage Attributability. 1.8 Hardware Reliability. 1.9 Software Reliability. 1.10 Problems. 1.11 For Further Study.
2 System Basics. 2.1 Hardware and Software. 2.2 External Entities. 2.3 System Management. 2.4 System Outages. 2.5 Service Quality. 2.6 Total Cost of Ownership. 2.7 Problems.
3 What Can Go Wrong. 3.1 Failures in the Real World. 3.2 Eight-Ingredient Framework. 3.3 Mapping Ingredients to Error Categories. 3.4 Applying Error Categories. 3.5 Error Category: Field Replaceable Unit (FRU) Hardware. 3.6 Error Category: Programming Errors. 3.7 Error Category: Data Error. 3.8 Error Category: Redundancy. 3.9 Error Category: System Power. 3.10 Error Category: Network. 3.11 Error Category: Application Protocol. 3.12 Error Category: Procedures. 3.13 Summary. 3.14 Problems. 3.15 For Further Study.
PART TWO RELIABILITY CONCEPTS.
4 Failure Containment and Redundancy. 4.1 Units of Design. 4.2 Failure Recovery Groups. 4.3 Redundancy. 4.4 Summary. 4.5 Problems. 4.6 For Further Study.
5 Robust Design Principles. 5.1 Robust Design Principles. 5.2 Robust Protocols. 5.3 Robust Concurrency Controls. 5.4 Overload Control. 5.5 Process, Resource and Throughput Monitoring. 5.6 Data Auditing. 5.7 Fault Correlation. 5.8 Failed Error Detection, Isolation or Recovery. 5.9 Geographic Redundancy. 5.10 Security, Availability and System Robustness. 5.11 Procedural Considerations. 5.12 Problems. 5.13 For Further Study.
6 Error Detection. 6.1 Detecting Field Replaceable Unit (FRU) Hardware Faults. 6.2 Detecting Programming and Data Faults. 6.3 Detecting Redundancy Failures. 6.4 Detecting Power Failures. 6.5 Detecting Networking Failures. 6.6 Detecting Application Protocol Failures. 6.7 Detecting Procedural Failures. 6.8 Problems. For Further Study.
7 Analyzing and Modeling Reliability and Robustness. 7.1 Reliability Block Diagrams. 7.2 Qualitative Model of Redundancy. 7.3 Failure Mode and Effects Analysis. 7.4 Availability Modeling. 7.5 Planned Downtime. 7.6 Problems. 7.7 For Further Study.
PART THREE DESIGN FOR RELIABILITY.
8 Reliability Requirements. 8.1 Background. 8.2 Defining Service Outages. 8.3 Service Availability Requirements. 8.4 Detailed Service Availability Requirements. 8.5 Service Reliability Requirements. 8.6 Triangulating Reliability Requirements. 8.7 Problems.
9 Reliability Analysis. 9.1 Step 1: Enumerate Recoverable Modules. 9.2 Step 2: Construct Reliability Block Diagrams. 9.3 Step 3: Characterize Impact of Recovery. 9.4 Step 4: Characterize Impact of Procedures. 9.5 Step 5: Audit Adequacy of Automatic Failure Detection and Recovery. 9.6 Step 6: Consider Failures of Robustness Mechanisms. 9.7 Step 7: Prioritizing Gaps. 9.8 Reliability of Sourced Modules and Components. 9.9 Problems.
10 Reliability Budgeting and Modeling. 10.1 Downtime Categories. 10.2 Service Downtime Budget. 10.3 Availability Modeling. 10.4 Update Downtime Budget. 10.5 Robustness Latency Budgets. 10.6 Problems.
11 Robustness and Stability Testing. 11.1 Robustness Testing. 11.2 Context of Robustness Testing. 11.3 Factoring Robustness Testing. 11.4 Robustness Testing in the Development Process. 11.5 Robustness Testing Techniques. 11.6 Selecting Robustness Test Cases. 11.7 Analyzing Robustness Test Results. 11.8 Stability Testing. 11.9 Release Criteria. 11.10 Problems.
12 Closing the Loop. 12.1 Analyzing Field Outage Events. 12.2 Reliability Roadmapping. 12.3 Problems.
13 Design for Reliability Case Study. 13.1 System Context. 13.2 System Reliability Requirements. 13.3 Reliability Analysis. 13.4 Downtime Budgeting. 13.5 Availability Modeling. 13.6 Reliability Roadmap. 13.7 Robustness Testing. 13.8 Stability Testing. 13.9 Reliability Review. 13.10 Reliability Report. 13.11 Release Criteria. 13.12 Field Data Analysis.
14 Conclusion. 14.1 Overview of Design for Reliability. 14.2 Concluding Remarks. 14.3 Problems.
15 Appendix: Assessing Design for Reliability Diligence. 15.1 Assessment Methodology. 15.2 Reliability Requirements. 15.3 Reliability Analysis. 15.4 Reliability Modeling and Budgeting. 15.5 Robustness Testing. 15.6 Stability Testing. 15.7 Release Criteria. 15.8 Field Availability. 15.9 Reliability Roadmap. 15.10 Hardware Reliability.
Abbreviations. References. Photo Credits. About the Author. Index.