- Purpose / Background: This document outlines good practices for Authorized Institutions (AIs) to strengthen operational resilience. It is designed to assist AIs in fortifying critical operations against vulnerabilities stemming from ICT failures, cyber threats, third-party dependencies, and business continuity disruptions.
- One-line conclusion: AIs are expected to adopt "resilience-by-design" and "resilience-first" frameworks so that critical operations stay within defined tolerance levels during severe disruptions.
- Key Changes:
  - Implementation of redundant pathways, auto-switching, and "break-the-glass" manual bypasses to eliminate single points of failure.
  - Transition to active-active data center architectures and modular, virtualized IT infrastructure.
  - Adoption of threat-based cybersecurity strategies, including Secure Tertiary Data Backup (STDB).
  - Integration of "resilience-by-design" into Third-Party Risk Management Frameworks (TPRMF), including exit strategy planning.
  - Refinement of Business Continuity Planning (BCP) to focus on "critical-operation-centric" recovery rather than only team or system recovery.
  - Introduction of time-based incident management indicators set significantly shorter than the official tolerance-for-disruption thresholds.
- Key Dates / Deadlines: None specified. The practices are not presented as mandatory requirements, but AIs are expected to incorporate them into their ongoing resilience and risk management programs.
- Applicability / Impact scope: All Authorized Institutions (AIs) in Hong Kong, with a focus on those managing complex ICT, third-party, and critical banking operations.
- Recommended management actions:
  - Conduct a gap analysis of existing ICT systems against "resilience-by-design" principles.
  - Implement granular capacity planning that accounts for end-to-end critical operation dependencies.
  - Enhance TPRMF by including resilience metrics in SLAs and establishing exit/interoperability plans.
  - Integrate STDB arrangements and participate in industry-wide cyber simulation exercises.
  - Redesign incident management frameworks to incorporate time-based recovery indicators.
1) Document Overview
The document serves as a guide for AIs to enhance operational resilience through proactive risk management. It categorizes good practices into four pillars: ICT Risk Management, Cyber Security, Third-Party Dependency Management, and BCP/Incident Management.
2) Main Requirements
- ICT Resilience: Eliminate single points of failure via alternative pathways/auto-switching; implement active-active data center configurations; modernize systems through modular/virtualized architecture.
- Cyber Security: Adopt multi-year, threat-based programs; utilize advanced tools (e.g. threat detection powered by artificial intelligence); maintain STDB; and engage in collective industry intelligence sharing.
- Third-Party Risk: Integrate resilience into the TPRMF; use clear SLAs with defined recovery objectives; perform joint BCP and disaster recovery plan (DRP) testing with vendors; and prepare exit strategies.
- BCP/Incident Management: Establish critical-operation-centric recovery plans; improve dependency mapping; and set stringent, time-based incident response indicators.
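The dependency-mapping and single-point-of-failure requirements above can be illustrated with a minimal Python sketch. It is not taken from the source document: the map structure, the component names, and the `single_points_of_failure` helper are all hypothetical, and a real assessment would also need to cover shared infrastructure, people, and premises.

```python
from typing import Dict, List

# Hypothetical dependency map for one critical operation: each function the
# operation relies on, with the interchangeable components that can deliver it.
DependencyMap = Dict[str, List[str]]


def single_points_of_failure(dependencies: DependencyMap) -> Dict[str, str]:
    """Return {function: component} for functions served by exactly one component."""
    return {
        function: providers[0]
        for function, providers in dependencies.items()
        if len(set(providers)) == 1
    }


# Illustrative data only; none of these names come from the source document.
payments = {
    "payment gateway": ["gateway-dc1", "gateway-dc2"],
    "core ledger": ["ledger-primary"],
    "fraud screening": ["vendor-a-api"],
    "customer authentication": ["auth-dc1", "auth-dc2"],
}

spofs = single_points_of_failure(payments)
for function, component in sorted(spofs.items()):
    print(f"{function}: only served by {component}")
```

In this illustration the ledger and the vendor-hosted fraud screening are flagged, which is the kind of finding that would feed into alternative-pathway or exit-strategy planning.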
3) Key Changes
- Movement from "traditional disaster recovery (DR)" to "continuous availability" through active-active setups.
- Evolution of capacity planning from static server monitoring to end-to-end, granular customer journey monitoring.
- Shift from reactive vendor management to proactive "resilience-based" third-party governance.
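As a rough illustration of the shift from per-server monitoring to end-to-end capacity planning, the sketch below treats a customer journey as a chain of steps and measures headroom against the tightest step. The `Step` class, the step names, and all throughput figures are assumptions made up for this example, not values from the source document.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One hop in a customer journey (hypothetical model)."""
    name: str
    capacity_tps: float  # maximum sustainable transactions per second


def bottleneck(journey: List[Step]) -> Step:
    """The step with the lowest capacity limits the whole journey."""
    return min(journey, key=lambda step: step.capacity_tps)


def headroom(journey: List[Step], peak_tps: float) -> float:
    """Fraction of end-to-end capacity still unused at peak load."""
    limit = bottleneck(journey).capacity_tps
    return (limit - peak_tps) / limit


# Illustrative figures only.
transfer_journey = [
    Step("mobile app gateway", 1200.0),
    Step("authentication", 800.0),
    Step("core banking", 500.0),
    Step("payment switch", 650.0),
]

tightest = bottleneck(transfer_journey)
spare = headroom(transfer_journey, peak_tps=400.0)
print(f"Bottleneck: {tightest.name}, headroom at peak: {spare:.0%}")
```

Monitoring each server in isolation would show every step with ample spare capacity relative to its own limit; the end-to-end view makes clear that the journey as a whole has only the headroom of its tightest step.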
4) Important Dates & Transition
No specific regulatory deadlines are provided; however, these practices are aligned with the HKMA’s ongoing supervisory expectations (e.g., SPM TM-C-1).
5) Impact and Risks
- Operations: Requires significant investment in infrastructure redundancy and staff resource planning.
- Compliance: Increased burden of documenting "resilience-first" assessments for HKMA review.
- IT/Data: Shift towards modular architecture may increase initial complexity but reduces long-term legacy obsolescence risks.
6) Compliance Action Checklist
- [ ] Review system architecture for single points of failure; implement auto-switching or manual bypasses.
- [ ] Update TPRMF to include resilience-based exit strategies and interoperability requirements.
- [ ] Set time-based KPIs for incident response that are tighter than the tolerance-for-disruption thresholds.
- [ ] Establish or uplift STDB solutions with frequent, end-to-end testing.
- [ ] Conduct role-swap recovery exercises in production environments.
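The time-based indicator item in the checklist can be sketched as follows. This is a hedged illustration only: the four-hour tolerance, the indicator thresholds, the action names, and both helper functions are hypothetical and would be replaced by each institution's own values.

```python
from datetime import timedelta
from typing import List, Tuple

Indicator = Tuple[timedelta, str]

# Hypothetical tolerance for disruption for one critical operation.
TOLERANCE = timedelta(hours=4)

# Hypothetical indicators, each deliberately well inside the tolerance.
INDICATORS: List[Indicator] = [
    (timedelta(minutes=30), "notify incident manager"),
    (timedelta(hours=1), "escalate to senior management"),
    (timedelta(hours=2), "invoke alternative recovery arrangement"),
]


def validate_indicators(indicators: List[Indicator], tolerance: timedelta) -> None:
    """Reject any indicator that is not strictly shorter than the tolerance."""
    for threshold, action in indicators:
        if threshold >= tolerance:
            raise ValueError(f"'{action}' at {threshold} is not inside tolerance {tolerance}")


def actions_due(elapsed: timedelta, indicators: List[Indicator]) -> List[str]:
    """Return every action whose threshold the incident has already reached."""
    return [action for threshold, action in indicators if elapsed >= threshold]


validate_indicators(INDICATORS, TOLERANCE)
due = actions_due(timedelta(minutes=75), INDICATORS)
print(due)
```

Keeping the validation step separate from the escalation logic means a misconfigured indicator (one set at or beyond the tolerance) is caught when the thresholds are defined rather than during an incident.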
7) Appendices/Attachments Summary
The document references the HKMA’s "Supervisory Approach on Cyber Risk Management" (SPM TM-C-1) and the Cyber Resilience Assessment Framework (C-RAF) 2.0. These references serve as the foundational regulatory context for the good practices outlined in this Annex.