Introducing the next generation of AWS Resilience Hub for generative AI-based SRE resilience journey

AWS has announced the next generation of AWS Resilience Hub, introducing generative AI-powered failure mode analysis, automated dependency discovery, and modular resilience policies to help SREs manage application availability at scale.

Introducing the next generation of AWS Resilience Hub for generative AI-based SRE resilience journey

Today, we’re announcing the next generation of AWS Resilience Hub with a significantly expanded experience that brings together a new application model, dependency discovery assessment, generative AI-powered failure mode analysis, modular resilience policies, and organization-wide reporting.

Organizations running hundreds of applications share a common challenge: availability is a top concern, yet there is no consistent way to set resilience goals, measure progress, or prove compliance across a portfolio. Teams set different standards, use different tools, and struggle to exchange information about whether applications actually meet expectations.

The next generation of AWS Resilience Hub changes this by giving Site Reliability Engineers (SREs) and development teams a structured way to align on resilience policy expectations, help application teams achieve them, and demonstrate compliance through testing. With integration into AWS Organizations, teams can now evaluate resilience at scale, identify failure modes, discover hidden dependencies, and report on progress across the enterprise.

The next generation of Resilience Hub walks you through your resilience journey and to help you there are the following concepts built into it.

Resilience policy: You can define your resilience expectations through modular, composable requirements. Rather than choosing a single rigid policy type, you construct policies by selecting the requirements that matter to your application, such as service level objective (SLO), multi-AZ and multi-Region disaster recovery, and data recovery requirements.

Business-level understanding: You can use new application modeling through critical end-user paths that map directly to business outcomes. Systems represent a business application, user journeys describe critical business paths, and services are the deployable units comprising AWS resources, code, and observability. Resilience Hub automatically discovers and maps them into a topology showing how resources connect.

AI failure mode assessments: You can run generative AI-powered assessments that analyze your services against your defined resilience policies, AWS Well-Architected best practices, and the AWS Resilience Analysis Framework. These assessments identify potential failure modes and provide actionable recommendations.

Dependency discovery assessment: You can automatically discover AWS services, internal endpoints, and third-party endpoints that your services depend on. This dependency assessment uses DNS query log analysis to identify dependencies you may not know about—including unexpected cross-region calls or critical third-party dependencies.

The next generation of AWS Resilience Hub in action

To get started, you configure a resilience policy, set up your first system and service, run a failure mode assessment, review the results, and implement the findings.

Before you begin, you should set up the invoker IAM role, which grants Resilience Hub read-only access to your AWS resources, cross-account roles (if not using AWS Organizations), or service-linked roles (SLRs) with AWS Organizations. Resilience Hub also integrates with AWS Organizations to enable organization-wide resilience management from a single delegated administrator account. This eliminates the need to log in to individual accounts to assess resilience posture across your enterprise. To learn more, visit For prerequisite details in the AWS Resilience Hub User Guide.

To configure a resilience policy, choose Create policy in the Policies menu through the AWS Resilience Hub console. Enter a policy name, description, and choose resilience requirements. For example, you can create a reusable policy for multi-Region disaster recovery used in financial applications—including 99.95% availability SLO, 15-minutes RTO, 5-minutes RPO for multi-Region disaster recovery, and disaster recovery approach that aligns with your RTO and RPO requirements.

If you choose data recovery requirements, you can define the data recovery time objective for restoring from backups for each service associated with this policy.

To create your first system representing your business application, choose Create a system in the Systems menu. Optionally, you can enable AWS Organizations account access for this system.

Now you can create a service that represents a deployable unit, like one of your microservices, and associate it with your system, and tell Resilience Hub where to find your resources. Enter a service name, for example, stock-exchange-service, choose your resilience policy and invoker AWS IAM role name. You can choose service Regions, service resources such as your resource tags, AWS CloudFormation stack, Terraform state file location, or Amazon EKS cluster and namespace.

When you enable dependency discovery for this service, AWS examines your VPC query logs for the VPCs associated with the resources in your service. You can disable this feature anytime from the dependency discovery settings in the service details page.

Now, you can run your first assessment with the service creation complete and a policy applied. Choose Run failure mode assessment in your service page and wait for the assessment to complete.

During the assessment, Resilience Hub assumes your invoker role, reads resources from your configured input sources, identifies parent-child relationships, queries the application topology service to map connections between resources, and builds a topology showing data flow, containment, and permissions.

By choosing Service topology, you can see service resources grouped by service functions in the graph, table, or JSON format.

By choosing Failure mode guidance, you can add assertions used to guide the agents while performing the failure mode assessment. Assertions are either generated by the agent or added by users. You can update them to improve assessment accuracy.

Once the assessment is complete, you can review findings and recommendations in the Assessment tab of your service page. Each finding tells you what the failure mode is, why it matters for your architecture, how to fix it, and which policy requirement it relates to.

You can choose Mark as resolved to implement the recommendation or Mark as irrelevant if the finding doesn’t apply to your use case.

If you’re an existing Resilience Hub customer, Resilience Hub provides migration APIs to simplify the transition of your previous applications. These APIs convert your previous assessment policies to new resilience policies, map your previous applications to the new model, such as multiple related applications to one system with multiple services.

For more information about new features, visit the AWS Resilience Hub User Guide.

Now available

The next generation of AWS Resilience Hub is now generally available in AWS commercial Regions where Resilience Hub is available. For Regional availability and the future roadmap, visit the AWS Capabilities by Region.

Resilience Hub uses a new service-based pricing model. Pricing includes two failure mode assessments per month for services, and optionally automated dependency assessment. You can try AWS Resilience Hub free. For pricing details, visit the AWS Resilience Hub pricing page.

Give the new AWS Resilience Hub a try in the Resilience Hub console and send feedback to AWS re:Post for Resilience Hub or through your usual AWS Support contacts.

— Channy

Source: AWS News Blog

Introducing the next generation of AWS Resilience Hub for generative AI-based SRE resilience journey

Introducing the next generation of AWS Resilience Hub for generative AI-based SRE resilience journey

More in this category

AWS Weekly Roundup: One-click Lambda setup prompt, OpenAI GPT-5.6 models on Bedrock, and more (July 20, 2026)

Amazon SQS turns 20: Two decades of reliable messaging at scale

AWS Weekly Roundup: AWS Builder Center at 1 year, Network Scanning in Security Hub, Loom for AWS, and more (July 13, 2026)

AWS Weekly Roundup: Claude Sonnet 5 on AWS, Amazon WorkSpaces for AI agents, AWS service availability updates, and more (July 6, 2026)

Upgrade Amazon EKS clusters with confidence using Kubernetes version rollbacks

Accelerate your infrastructure deployments by up to 4x with AWS CloudFormation Express mode

Most read

Discover All Categories