Tencent Cloud

Tencent Smart Advisor-Chaotic Fault Generator


Cross-AZ Experiment in CVM

Last updated: 2024-09-26 15:47:37

Background

To keep your business continuously available, CVM supports cross-AZ deployment so that your applications are protected from the impact of region- or availability-zone-level faults.
If you are not yet confident in your services or the cloud products they depend on, and worry that an IDC fault could make your business inaccessible in production, you can run fault simulations and experiments through Tencent Smart Advisor - Chaotic Fault Generator to uncover hidden risks in time.

Experiment Objectives

Objective 1

Check whether the cross-AZ service architecture can still provide normal service when the instances in one availability zone go down.

Objective 2

Check whether the service recovery time and recovery effect meet business requirements.

Experiment Implementation

Step 1: Preliminary Preparation

In different availability zones of the same region, prepare several test CVM instances that are close to the production environment and provide identical services.
Prepare complete log recording tools.
Prepare emergency measures for unexpected situations.
Measure daily business traffic, and write scripts to simulate user requests.
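The request-simulation script in the last step can be as simple as a curl loop that records status codes and a success rate. The sketch below makes several assumptions to replace with your own values: SERVICE_URL (for example, your CLB VIP and health-check path) and REQUESTS are placeholders, not part of the product.

```shell
#!/bin/sh
# Minimal user-request simulator for the drill. SERVICE_URL is a placeholder;
# point it at your own service entry (for example, the CLB VIP).

# probe URL: print only the HTTP status code ("000" on connection failure)
probe() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# success_rate OK TOTAL: integer success percentage
success_rate() {
    echo $(( $1 * 100 / $2 ))
}

SERVICE_URL="${SERVICE_URL:-http://your-clb-vip/health}"
REQUESTS="${REQUESTS:-0}"   # set e.g. REQUESTS=60 to start probing

i=0; ok=0
while [ "$i" -lt "$REQUESTS" ]; do
    code=$(probe "$SERVICE_URL")
    [ "$code" = "200" ] && ok=$((ok + 1))
    echo "$(date '+%H:%M:%S') request=$((i + 1)) status=$code"
    i=$((i + 1))
    sleep 1
done
if [ "$REQUESTS" -gt 0 ]; then
    echo "success rate: $(success_rate "$ok" "$REQUESTS")%"
fi
```

Run it before fault injection to record the steady-state baseline, and keep it running during the experiment so the success rate reflects the impact of the fault.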

Step 2: Experimental Design

1. Log in to Tencent Smart Advisor > Chaotic Fault Generator, go to the Experiment Management page, and click Create a New Experiment.
2. Click Skip and create a blank experiment, then fill in the experiment information.
3. Select the prepared test instance objects, and configure the instance shutdown action for the instances in one availability zone to simulate an instance-down fault. After a fault action is added, the 'start up' recovery action is added automatically. For this experiment, a shell script custom action is also added to simulate self-start on boot, so that the original services in the instance are started and the recovery of the instance is easy to observe.
4. Cloud monitoring metrics or a guardrail policy can be configured to observe the running status of the CVM instances.
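The shell script custom action mentioned in step 3 can look roughly like the sketch below: restart the workload and wait for its port to come up. SERVICE and PORT (nginx on port 80 here) are assumptions for illustration, not values the platform provides.

```shell
#!/bin/sh
# Sketch of a "self-start" custom action: restart the workload after the
# instance boots, then wait for its port to start listening.

restart_service() {
    # try systemd first, fall back to SysV service management
    systemctl restart "$1" 2>/dev/null || service "$1" restart
}

# port_open PORT: succeed if something is listening on the TCP port
port_open() {
    ss -ltn 2>/dev/null | grep -q ":$1 "
}

# wait_for_port PORT RETRIES: poll every 2 s until the port is open
wait_for_port() {
    n=0
    while [ "$n" -lt "$2" ]; do
        if port_open "$1"; then return 0; fi
        n=$((n + 1))
        sleep 2
    done
    return 1
}

# Usage (run as root on the instance): sh self_start.sh nginx 80
if [ "$#" -ge 2 ]; then
    restart_service "$1"
    if wait_for_port "$2" 30; then
        echo "service $1 is listening on port $2"
    else
        echo "service $1 did not come up on port $2 in time" >&2
        exit 1
    fi
fi
```

A non-zero exit code makes the failure visible in the action's execution result, which is what the recovery observation in this experiment relies on.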

Step 3: Experiment Implementation

1. Go to Experiment Details, and click Execute.
2. Execute fault injection to shut down the instances, and observe how the load balancer forwards traffic.
3. After the fault injection is complete, execute fault recovery: click Execute on the 'start up' recovery action to restore the instance status. The platform will execute the action and verify the recovery automatically.
Note
If start on boot is not configured, trigger the shell script manually to recover the service.
4. The experiment is complete when all experiment actions have finished. You can click Record Drill Conclusion in the top right corner to record the experiment results, and register any issues found so that the experiment can be replayed later.
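The recovery verification in step 3 can also be scripted: treat the instance as recovered once several consecutive probes succeed, rather than after a single lucky request. In this sketch SERVICE_URL, STREAK, and PROBES are placeholders for your own endpoint and thresholds.

```shell
#!/bin/sh
# Sketch of recovery verification: the instance counts as recovered once
# STREAK consecutive probes return HTTP 200.

# check_streak NEED: read one status code per line on stdin; succeed once
# NEED consecutive "200" lines have been seen
check_streak() {
    need=$1; run=0
    while read -r code; do
        if [ "$code" = "200" ]; then
            run=$((run + 1))
            if [ "$run" -ge "$need" ]; then return 0; fi
        else
            run=0
        fi
    done
    return 1
}

SERVICE_URL="${SERVICE_URL:-http://your-instance/health}"
STREAK="${STREAK:-3}"
PROBES="${PROBES:-0}"   # set e.g. PROBES=30 to run the live check

if [ "$PROBES" -gt 0 ]; then
    i=0
    while [ "$i" -lt "$PROBES" ]; do
        curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 "$SERVICE_URL"
        i=$((i + 1))
        sleep 2
    done | check_streak "$STREAK" && echo "recovered" || echo "not recovered"
fi
```

Requiring a streak filters out the flapping that is common right after a restart, when the port is open but the service is not yet serving requests reliably.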

Experiment Result Analysis

Monitoring Metrics via Platform Tools

For a target fault instance, fault injection brings the instance down during the execution window, and the CLB health check detects that the instance is inaccessible.
Traffic is then forwarded to the CVM instances in the other availability zone, causing a sudden traffic increase there at that point in time. When the fault instance is repaired, that is, after start-up and service restart are complete, the CLB health check detects that the instance port is healthy again, and traffic returns to the steady state.
Objective Attainment:
When one availability zone goes down, CLB automatically forwards traffic to the other availability zone, keeping the service as a whole available.
After the availability zone recovers and the service restarts, the steady-state metrics measured before fault injection are restored, and requests are received and processed normally.
Given these two results, the cross-AZ fault experiment in CVM meets expectations overall.

Theoretical Analysis

Qualitative Analysis: Compare system metrics during fault injection with the steady-state metrics.
Quantitative Analysis:
System Performance = Performance metrics during the experiment / Performance metrics at steady state
System Recovery Rate = Performance metrics after the experiment and recovery action / Performance metrics at steady state
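As a worked example of the two ratios, assume a steady-state throughput of 1000 QPS, 850 QPS while one availability zone is down, and 990 QPS after the recovery action. The numbers are made up for illustration.

```shell
#!/bin/sh
# Worked example of the quantitative-analysis ratios, using awk for the
# floating-point division.

# ratio A B: print A / B with two decimal places
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

STEADY_QPS=1000      # steady-state throughput
FAULT_QPS=850        # throughput while one AZ is down
RECOVERED_QPS=990    # throughput after the recovery action

echo "system performance:   $(ratio "$FAULT_QPS" "$STEADY_QPS")"      # 0.85
echo "system recovery rate: $(ratio "$RECOVERED_QPS" "$STEADY_QPS")"  # 0.99
```

A performance ratio of 0.85 means the service kept 85% of its steady-state throughput during the fault; a recovery rate close to 1.00 indicates the recovery action restored the steady state.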
Analysis of the Causes of System Defects:
Analyze system weaknesses.
Analyze deficiencies in fault handling.
Analyze the system's resistance to disturbance.
Analyze the effectiveness of monitoring and alarms.
Analyze dependencies between modules.
