Annex - Good practices for addressing vulnerabilities related to operational resilience

Document Information

Title: Annex - Good practices for addressing vulnerabilities related to operational resilience

Type: Annex

URL: https://brdr.hkma.gov.hk/eng/doc-ldg/docId/20260401-2-EN

Email Received: 2026-04-02 19:34

Summary Created: 2026-04-02 14:01

English Summary (4996 chars)

Management Summary

Purpose / Background: This document outlines good practices for Authorized Institutions (AIs) to strengthen operational resilience. It is designed to assist AIs in fortifying critical operations against vulnerabilities stemming from ICT failures, cyber threats, third-party dependencies, and business continuity disruptions.
One-line conclusion: AIs are expected to shift toward "resilience-by-design" and "resilience-first" frameworks so that critical operations stay within their defined tolerance for disruption during severe disruptions.
Key Changes:
Implementation of redundant pathways, auto-switching, and "break-the-glass" manual bypasses to eliminate single points of failure.
Transition to active-active data center architectures and modular, virtualized IT infrastructure.
Adoption of threat-based cybersecurity strategies, including Secure Tertiary Data Backup (STDB).
Integration of "resilience-by-design" into Third-Party Risk Management Frameworks (TPRMF), including exit strategy planning.
Refinement of Business Continuity Planning (BCP) to focus on "critical-operation-centric" recovery rather than just team/system recovery.
Introduction of time-based incident management indicators that are set significantly shorter than the formal tolerance-for-disruption thresholds.
Key Dates / Deadlines: None stated. The Annex is not framed as a mandatory regulation, but AIs are expected to incorporate these practices into their ongoing resilience and risk management programs.
Applicability / Impact scope: All Authorized Institutions (AIs) in Hong Kong, with a focus on those managing complex ICT, third-party, and critical banking operations.
Recommended management actions:
Conduct a gap analysis of existing ICT systems against "resilience-by-design" principles.
Implement granular capacity planning that accounts for end-to-end critical operation dependencies.
Enhance TPRMF by including resilience metrics in SLAs and establishing exit/interoperability plans.
Integrate STDB arrangements and participate in industry-wide cyber simulation exercises.
Redesign incident management frameworks to incorporate time-based recovery indicators.

Detailed Summary

1) Document Overview
The document serves as a guide for AIs to enhance operational resilience through proactive risk management. It categorizes good practices into four pillars: ICT Risk Management, Cyber Security, Third-Party Dependency Management, and BCP/Incident Management.

2) Main Requirements

ICT Resilience: Eliminate single points of failure via alternative pathways/auto-switching; implement active-active data center configurations; modernize systems through modular/virtualized architecture.
Cyber Security: Adopt multi-year, threat-based programs; use advanced tools (e.g. threat detection powered by artificial intelligence); maintain STDB; and engage in collective industry intelligence sharing.
Third-Party Risk: Integrate resilience into the TPRMF; use clear SLAs with defined recovery objectives; perform joint BCP/DRP testing with vendors; and prepare exit strategies.
BCP/Incident Management: Establish critical-operation-centric recovery plans; improve dependency mapping; and set stringent, time-based incident response indicators.
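
The time-based indicator in the last item can be illustrated with a minimal Python sketch. The class name, the status labels, and the 50% ratio are assumptions made for illustration only; the summary does not state specific values or an implementation.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class CriticalOperation:
    """A critical operation and its tolerance for disruption (illustrative)."""
    name: str
    tolerance: timedelta           # maximum tolerable disruption (assumed value)
    indicator_ratio: float = 0.5   # assumption: act at 50% of the tolerance

    @property
    def escalation_indicator(self) -> timedelta:
        # The internal indicator is deliberately shorter than the tolerance.
        return self.tolerance * self.indicator_ratio


def incident_status(op: CriticalOperation, elapsed: timedelta) -> str:
    """Classify an ongoing incident by how long the operation has been down."""
    if elapsed >= op.tolerance:
        return "tolerance_breached"
    if elapsed >= op.escalation_indicator:
        return "escalate"
    return "within_indicator"


payments = CriticalOperation("payments", tolerance=timedelta(hours=4))
print(incident_status(payments, timedelta(hours=3)))  # escalate
```

Escalation here triggers at two hours, well before the four-hour tolerance, which leaves time to invoke recovery plans before the threshold is breached.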

3) Key Changes

Movement from "traditional DR" to "continuous availability" through active-active setups.
Evolution of capacity planning from static server monitoring to end-to-end, granular customer journey monitoring.
Shift from reactive vendor management to proactive "resilience-based" third-party governance.
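
To make the move from per-server monitoring to end-to-end capacity planning concrete, the following is a rough Python sketch. It assumes that every request in a customer journey passes through each component once, so the most utilized component bounds the whole journey. The component names and load figures are hypothetical.

```python
def journey_headroom(steps: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Return the bottleneck component and the remaining headroom of a journey.

    `steps` maps component name -> (current_load, max_capacity), both in the
    same unit, e.g. requests per second.
    """
    utilization = {name: load / cap for name, (load, cap) in steps.items()}
    bottleneck = max(utilization, key=utilization.get)
    return bottleneck, 1.0 - utilization[bottleneck]


# Hypothetical figures for a mobile payment journey.
mobile_payment = {
    "mobile_gateway": (300.0, 1000.0),
    "auth_service": (450.0, 500.0),
    "core_banking": (600.0, 1200.0),
}
component, headroom = journey_headroom(mobile_payment)
print(f"{component}: {headroom:.0%} headroom")  # auth_service: 10% headroom
```

Viewed server by server, the gateway and core banking system look comfortable; viewed end to end, the journey has only about 10% headroom because of the authentication service.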

4) Important Dates & Transition
No specific regulatory deadlines are provided; however, these practices are aligned with the HKMA’s ongoing supervisory expectations (e.g., SPM TM-C-1).

5) Impact and Risks

Operations: Requires significant investment in infrastructure redundancy and staff resource planning.
Compliance: Increased burden of documenting "resilience-first" assessments for HKMA review.
IT/Data: Shift towards modular architecture may increase initial complexity but reduces long-term legacy obsolescence risks.

6) Compliance Action Checklist

[ ] Review system architecture for single points of failure; implement auto-switching or manual bypasses.
[ ] Update TPRMF to include resilience-based exit strategies and interoperability requirements.
[ ] Set time-based KPIs for incident response that are tighter than the tolerance-for-disruption thresholds.
[ ] Establish or uplift STDB solutions with frequent, end-to-end testing.
[ ] Conduct role-swap recovery exercises in production environments.
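
The first checklist item, reviewing architecture for single points of failure, could start from a simple dependency inventory. The sketch below assumes a hypothetical map from each capability a critical operation needs to the components able to provide it, and flags capabilities with only one provider. It is a starting point, not a full architecture review.

```python
def single_points_of_failure(providers: dict[str, list[str]]) -> dict[str, str]:
    """Return capability -> sole component for capabilities with no alternative."""
    return {
        capability: components[0]
        for capability, components in providers.items()
        if len(set(components)) == 1
    }


# Hypothetical dependency map for one critical operation.
payment_clearing = {
    "network_link": ["carrier_a", "carrier_b"],
    "message_queue": ["mq_cluster_1"],
    "database": ["db_primary", "db_standby"],
    "hsm": ["hsm_1"],
}
print(single_points_of_failure(payment_clearing))
# {'message_queue': 'mq_cluster_1', 'hsm': 'hsm_1'}
```

Each flagged capability is a candidate for an alternative pathway, auto-switching, or a manual bypass.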

7) Appendices/Attachments Summary
The document references the HKMA’s "Supervisory Approach on Cyber Risk Management" (SPM TM-C-1) and the Cyber Resilience Assessment Framework (C-RAF) 2.0. These references serve as the foundational regulatory context for the good practices outlined in this Annex.

Chinese Summary (original: 1965 chars)

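Management Summary

Purpose / Background: As the financial sector grows increasingly reliant on ICT systems and third-party services, the Hong Kong Monetary Authority (HKMA) has issued this guidance to help Authorized Institutions (AIs) identify and remediate weak points in their operational resilience, so that critical business operations can be maintained during severe disruptions.
One-line conclusion: Institutions need to move from a "defensive design" mindset to a "resilience-first" mindset. By eliminating single points of failure, strengthening cyber security, managing third-party risk, and refining business continuity planning (BCP), they should ensure that indicators such as the recovery time objective (RTO) stay within their resilience tolerance.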
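Key Changes (core practice recommendations):

ICT resilience: Implement auto-switching, built-in "break-the-glass" bypass mechanisms, and active-active architecture to eliminate single points of failure.
Recovery testing: Upgrade testing from tabletop exercises to "role-swap" and unannounced tests in the production environment.
Monitoring uplift: Build monitoring dashboards with end-to-end visibility, moving from reactive anomaly monitoring to forward-looking early warning.
Cyber security: Implement a threat-based, long-term resilience strategy and put Secure Tertiary Data Backup (STDB) in place.
Third-party risk: Embed resilience requirements in third-party contracts (SLAs) and exit arrangements, and conduct joint testing across the supply chain.
Crisis handling: Build recovery plans centered on "critical operations" rather than on individual systems or teams alone.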

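Key Dates / Deadlines: This document is best-practice guidance with no specific compliance deadline, but institutions are advised to begin a self-assessment and gap analysis against it immediately.
Applicability / Impact scope: All Authorized Institutions (AIs) in Hong Kong, in particular their ICT risk management, cyber security, third-party vendor management, and business continuity management functions.
Recommended management actions:

Inventory single points of failure in critical operations such as core banking systems, and review the feasibility of automated switching.
Factor resilience assessments into contract-renewal decisions and routine performance reviews for third-party vendors.
Raise testing rigor by promoting partial functional recovery tests in the production environment.
Establish cross-departmental recovery sequencing to resolve competition for resources among business lines.
Ensure monitoring systems capture abnormal customer-experience indicators (such as response latency) rather than infrastructure data alone.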
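
Recovery sequencing across business lines can be sketched as a dependency-aware ordering. The example below uses Python's standard `graphlib.TopologicalSorter` so that shared dependencies are restored before the operations that need them, and breaks ties with a business priority. The operation names, the priority values, and the default of 99 are illustrative assumptions, not taken from the source.

```python
from graphlib import TopologicalSorter


def recovery_sequence(depends_on: dict[str, set[str]],
                      priority: dict[str, int]) -> list[str]:
    """Order recovery so dependencies come first, then by business priority.

    `depends_on` maps each item to the items it needs restored beforehand.
    Lower `priority` numbers recover earlier; unlisted items default to 99.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order: list[str] = []
    while sorter.is_active():
        # Everything in a wave has all of its dependencies already restored.
        wave = sorted(sorter.get_ready(), key=lambda n: (priority.get(n, 99), n))
        order.extend(wave)
        sorter.done(*wave)
    return order


# Hypothetical dependencies shared by two business lines.
depends_on = {
    "payments": {"core_banking", "network"},
    "online_banking": {"core_banking", "auth"},
    "core_banking": {"network"},
    "auth": {"network"},
}
print(recovery_sequence(depends_on, {"payments": 1, "online_banking": 2}))
# ['network', 'auth', 'core_banking', 'payments', 'online_banking']
```

Agreeing the priority table in advance is the cross-departmental part: it settles which business line is served first when several become recoverable at the same time.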

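Detailed Summary

1) Document Overview
This document is the HKMA's best-practice reference on operational resilience. It covers four areas (ICT risk, cyber security, third-party dependency management, and BCP) and is intended to guide institutions in addressing the operational disruption risks that come with digital transformation.

2) Main Requirements

ICT resilience: Eliminate single points of failure, establish automated switching mechanisms, and adopt a modular system architecture to aid fault isolation.
Cyber security: Adopt a threat-based strategy, strengthen threat detection technology (driven by artificial intelligence), implement STDB, and establish joint defense arrangements with the supply chain.
Third-party management (TPRMF): Integrate resilience considerations (tolerance indicators) into contracts so that third-party vendors' recovery capabilities are consistent with the institution's own requirements.
BCP and incident management: Reassess recovery strategies and set stringent time-based indicators (such as RTO) so that business can be maintained even under extreme disruption.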

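3) Key Changes

From "disaster recovery" to "operational resilience": A shift from pure backup arrangements to end-to-end management that ensures the continuity of "critical operations".
Evolution of testing methods: From traditional tabletop exercises to more realistic production-environment tests (such as role swaps).
Monitoring depth: From monitoring technical indicators to monitoring business-side indicators (customer behavior and experience).

4) Important Dates & Transition
There is no mandatory compliance deadline; the guidance is intended for continuous improvement. Institutions are advised to build it into their annual risk assessment cycle.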

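5) Impact and Risks

Compliance: The "tolerance indicators" for critical operations need to be reassessed regularly.
Operations: Involves re-examining third-party contracts and restructuring the technical architecture.
IT/Data: Resources must be committed to implementing active-active architecture and STDB, which may increase operating costs.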

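6) Compliance Action Checklist

[ ] Have all critical operations been inventoried and a complete technology dependency map drawn up?
[ ] Do critical operations have an auto-switching mechanism or a manual bypass ("break-the-glass") function?
[ ] Do contracts with critical third parties define an RTO consistent with our own resilience tolerance?
[ ] Has Secure Tertiary Data Backup (STDB) been implemented for core data?
[ ] Do recovery exercises include third-party vendors and complex cross-departmental coordination testing?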
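
The checklist question on third-party RTOs lends itself to a simple automated comparison. The sketch below assumes three hypothetical inputs (the tolerance per critical operation, the contractual RTO per vendor, and which operations each vendor supports) and lists every pairing where the contractual RTO is longer than the operation can tolerate.

```python
from datetime import timedelta


def rto_gaps(tolerance: dict[str, timedelta],
             vendor_rto: dict[str, timedelta],
             supports: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (vendor, operation) pairs where the vendor RTO exceeds tolerance."""
    return sorted(
        (vendor, op)
        for vendor, ops in supports.items()
        for op in ops
        if vendor_rto[vendor] > tolerance[op]
    )


# Hypothetical tolerances, contractual RTOs, and vendor-to-operation mapping.
tolerance = {"payments": timedelta(hours=2), "online_banking": timedelta(hours=4)}
vendor_rto = {"cloud_host": timedelta(hours=1), "sms_gateway": timedelta(hours=3)}
supports = {
    "cloud_host": ["payments", "online_banking"],
    "sms_gateway": ["payments", "online_banking"],
}
print(rto_gaps(tolerance, vendor_rto, supports))
# [('sms_gateway', 'payments')]
```

A non-empty result points to contracts that may need renegotiation, or to operations that need an alternative provider.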

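7) Appendices/Attachments Summary
This document is itself the practical guidance on operational resilience vulnerabilities and has no separate appendices. Its content is fully incorporated in the sections above, spanning technical ICT design through to management-level strategies targeted at third parties.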