Migrating payment infrastructure to post-quantum cryptography: a phased approach for network operators

October 13, 2025 · Robert Lindstrom, CEO & Co-Founder · 14 min read

[Figure: migration architecture diagram showing the phased transition from classical RSA/ECC to hybrid and full post-quantum payment networks]

Payment infrastructure — by which I mean the HSM clusters, key management systems, network-level TLS sessions, and application-layer cryptographic controls that protect card authorization, PIN processing, and interbank settlement — is among the most complex environments in which to execute a cryptographic migration. The combination of regulatory requirements, multi-party key ceremonies, network certification dependencies, and zero-tolerance latency budgets means that a PQC migration cannot be executed as a single deployment event.

This is a practical guide for payment network operators and bank infrastructure architects planning a post-quantum migration. It covers the phased approach we have developed through work with early evaluation customers across card authorization, ACH, and SWIFT environments. The phases are not Cryptrig-specific — the structure reflects the constraints of payment infrastructure, regardless of which PQC hardware vendor you are evaluating.

Prerequisites: what you need before Phase 1

Before starting any migration phase, three foundational assessments must be complete:

Cryptographic inventory

Identify every system that performs or relies on asymmetric cryptographic operations in the payment stack. This includes HSMs, software cryptographic modules (PKCS#11 applications), TLS termination points, certificate authorities, and key distribution systems. Map each to: the algorithm in use (RSA-2048, RSA-4096, ECDH P-256, ECDSA P-256/P-384), the key type (session key, master key, PIN encryption key, MAC key, CA signing key), and the classification life of the data it protects, meaning how long that data must remain confidential.

This inventory is the prioritization input. Keys with longer classification lives and higher sensitivity scores migrate first. A master key encryption key (MKEK) protecting cardholder data with a 10-year retention requirement is higher priority than a TLS session key for an internal monitoring system.
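The prioritization rule above can be sketched as a simple sort. This is a minimal illustration, not a tool we ship: the CryptoAsset fields, the 1 to 5 sensitivity scale, and the sample systems are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CryptoAsset:
    system: str            # where the operation runs
    algorithm: str         # e.g. "RSA-2048", "ECDH P-256"
    key_type: str          # e.g. "MKEK", "TLS session key"
    protection_years: int  # how long the protected data must stay confidential
    sensitivity: int       # 1 (low) to 5 (high); hypothetical scale


def migration_order(assets):
    """Sort so that long-lived, high-sensitivity keys come first."""
    return sorted(
        assets,
        key=lambda a: (a.protection_years, a.sensitivity),
        reverse=True,
    )


# Illustrative inventory entries, not real systems.
inventory = [
    CryptoAsset("monitoring-tls", "ECDH P-256", "TLS session key", 0, 1),
    CryptoAsset("card-vault-hsm", "RSA-2048", "MKEK", 10, 5),
    CryptoAsset("internal-ca", "ECDSA P-384", "CA signing key", 5, 4),
]

ordered = migration_order(inventory)
for asset in ordered:
    print(asset.system, asset.key_type, asset.protection_years, asset.sensitivity)
```

Sorting on classification life first and sensitivity second is one reasonable reading of the rule; an institution with a formal risk scoring model would substitute its own key function.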

Counterparty readiness assessment

Payment flows are multi-party. A Kyber-1024 key exchange between your acquiring HSM and your issuer partner requires both parties to support the same algorithm. SWIFT messaging authentication requires SWIFT Alliance to support the post-quantum primitive on their side. Card network zone key distribution requires the scheme's key injection infrastructure to support Kyber.

Map your top 20 counterparties by transaction volume and assess their current PQC roadmap. This is not a blocking step — hybrid mode handles interoperability — but it determines which migration phases can be completed end-to-end versus which remain hybrid-constrained by counterparty capability.

Regulatory alignment

Engage your primary regulatory examiner before beginning migration. For US banks, this is the OCC, Federal Reserve, or FDIC depending on charter. Document the migration plan and timeline. Examiners are increasingly familiar with PQC migration requirements — NIST SP 800-131A Rev 3 and OMB M-23-02 have established the federal expectation. Institutions with documented plans are in a better examination position than those without.

Phase 1: Classical HSM replacement with PQC-capable hardware (Months 1–18)

The first phase replaces classical HSMs in the payment stack with PQC-capable hardware running in classical-only mode. The goal of Phase 1 is operational familiarity with the new hardware, not PQC deployment. Running CQ1, Cryptrig's PQC-capable HSM, in classical-only mode means it handles RSA, ECDSA, and AES operations via its PKCS#11 interface exactly as a Thales Luna or Entrust nShield would. No counterparty coordination is required. No application changes are needed if the PKCS#11 interface is used correctly.

Phase 1 activities:

Integration testing: Connect CQ1 to payment middleware via PKCS#11. Validate key generation, key wrap/unwrap, PIN block encryption/decryption, MAC generation, and signature operations against existing test suites. Identify any application code that makes assumptions about key sizes, buffer sizes, or timing that will need modification before PQC is enabled.
Performance baseline: Establish P50, P95, and P99 latency baselines for all critical operations at production transaction volumes. This baseline confirms that CQ1 in classical mode meets existing SLAs and establishes the comparison point for PQC mode.
Key ceremony preparation: For keys that will be migrated to CQ1 custody, plan the key ceremony — dual-control requirements, smartcard initialization, key loading procedures. FIPS 140-3 Level 3 key ceremonies are more procedurally intensive than Level 2. Plan auditor attendance and regulatory notification.
Runbook development: Document operational procedures for CQ1 management, including firmware update procedures, administrator authentication recovery, tamper event response, and hardware failure failover.
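For the performance baseline activity, a percentile calculation along these lines is usually enough to start. This is a sketch using only the Python standard library; the SLA limits and the synthetic latency samples are placeholders, not measurements from any HSM.

```python
import random
import statistics


def latency_percentiles(samples_ms):
    """Return P50, P95 and P99 of a list of latency samples (milliseconds)."""
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def meets_sla(percentiles, sla_ms):
    """True when every measured percentile is at or under its SLA limit."""
    return all(percentiles[name] <= limit for name, limit in sla_ms.items())


# Synthetic samples for illustration only; replace with captured timings.
random.seed(7)
samples = [random.lognormvariate(-1.5, 0.4) for _ in range(10_000)]

baseline = latency_percentiles(samples)
sla = {"p50": 0.5, "p95": 1.0, "p99": 2.0}  # hypothetical SLA limits in ms
print(baseline, meets_sla(baseline, sla))
```

Storing the classical-mode result in this form gives a like-for-like comparison point when the same operations are re-measured in PQC mode.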

Phase 1 is complete when CQ1 is handling all production key operations in classical mode with documented performance meeting SLAs and operational runbooks approved by the security and operations teams.

Phase 2: Hybrid mode deployment for TLS and external sessions (Months 12–24)

With CQ1 established in production for classical operations, Phase 2 enables Kyber-1024 hybrid mode for TLS sessions with external counterparties — starting with those who have confirmed hybrid TLS support on their end.

The hybrid mode uses X25519+Kyber-1024 key agreement for TLS 1.3, following the hybrid key exchange construction described in IETF draft-ietf-tls-hybrid-design: both key exchanges run in the same handshake and their shared secrets are combined into the session key schedule. A TLS session that negotiates hybrid key exchange is protected by both classical ECDH security and Kyber quantum resistance. Sessions with classical-only endpoints automatically fall back to X25519-only, with no configuration change required on the classical endpoint's side.
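The two behaviors described here, fallback during negotiation and combination of the two shared secrets, can be sketched as follows. This is a simplified model, not a TLS implementation: the group labels are illustrative rather than registered TLS names, the secrets are random stand-ins with no X25519 or Kyber math behind them, and a single HKDF-Extract step (RFC 5869) stands in for the full TLS 1.3 key schedule.

```python
import hashlib
import hmac
import os

HYBRID = "x25519+kyber1024"  # illustrative label, not a registered TLS group name
CLASSICAL = "x25519"


def negotiate(client_groups, server_groups):
    """Pick the first client-preferred group that the server also supports."""
    for group in client_groups:
        if group in server_groups:
            return group
    raise ValueError("no common key exchange group")


def combine_secrets(ecdh_secret, kem_secret=b""):
    """Concatenate the component secrets and run HKDF-Extract over the result.

    Reconstructing the output requires knowing both inputs.
    """
    ikm = ecdh_secret + kem_secret
    zero_salt = bytes(hashlib.sha256().digest_size)
    return hmac.new(zero_salt, ikm, hashlib.sha256).digest()


# Random stand-ins for the real shared secrets.
ecdh_secret = os.urandom(32)
kem_secret = os.urandom(32)

# A classical-only server causes fallback without any change on its side.
print(negotiate([HYBRID, CLASSICAL], {CLASSICAL}))
print(negotiate([HYBRID, CLASSICAL], {HYBRID, CLASSICAL}))
print(combine_secrets(ecdh_secret, kem_secret).hex())
```

Listing the hybrid group first in the client preference order is what makes hybrid the default whenever the peer supports it.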

Phase 2 activities:

JCA/JCE provider configuration: Update TLS configuration on application servers and payment gateways to load CQ1's JCA/JCE provider and advertise the hybrid key agreement extensions in TLS ClientHello. Test with both hybrid-capable and classical-only counterparties to verify backward compatibility.
Certificate chain evaluation: Assess whether certificate chain sizes with PQC-hybrid components exceed any fixed buffer sizes in your TLS stack. Some older TLS stacks have hardcoded limits on certificate message sizes that will reject valid hybrid certificates.
Monitoring for hybrid negotiation rates: Instrument TLS sessions to track the fraction of sessions that successfully negotiate hybrid versus classical-only. This data identifies which counterparties are hybrid-capable and tracks migration progress over time.
SWIFT hybrid assessment: Engage SWIFT's technical team regarding their timeline for supporting hybrid TLS on Alliance gateways. SWIFT has published PQC transition guidance but implementation timelines are subject to their release cycle.
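The hybrid negotiation monitoring described in the activities above reduces to a per-counterparty ratio. A minimal sketch, assuming each TLS session log record can be reduced to a (counterparty, negotiated group) pair; the record format and names are hypothetical.

```python
from collections import defaultdict


def hybrid_rates(sessions):
    """Return {counterparty: fraction of sessions that negotiated hybrid}."""
    totals = defaultdict(int)
    hybrid = defaultdict(int)
    for counterparty, group in sessions:
        totals[counterparty] += 1
        if "kyber" in group:  # hybrid groups carry a Kyber component
            hybrid[counterparty] += 1
    return {cp: hybrid[cp] / totals[cp] for cp in totals}


# Illustrative session records, not real traffic.
session_log = [
    ("issuer-a", "x25519+kyber1024"),
    ("issuer-a", "x25519+kyber1024"),
    ("issuer-a", "x25519"),
    ("processor-b", "x25519"),
]

rates = hybrid_rates(session_log)
for counterparty, rate in sorted(rates.items()):
    print(f"{counterparty}: {rate:.0%} hybrid")
```

A counterparty stuck at zero is a candidate for the readiness conversation; one that mixes hybrid and classical sessions may have a partially upgraded endpoint pool.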

HYBRID MODE IS NOT THE FINAL STATE

Hybrid mode protects against harvest-now-decrypt-later because an adversary recording hybrid TLS sessions must break both components to recover the session keys: X25519 holds against a classical computer, and Kyber-1024 is designed to hold against a quantum computer. But it is a transition mechanism, not the destination. The operational overhead of managing hybrid certificate chains, monitoring dual-algorithm negotiation, and handling fallback cases should be planned for a 2–4 year window, not indefinitely.

Phase 3: PQC-native key ceremonies and zone key migration (Months 18–36)

Phase 3 is the operationally intensive phase: migrating key hierarchies from classical to post-quantum. This means generating new master keys using Kyber-1024 key encapsulation inside CQ1's FIPS 140-3 boundary, executing new key ceremonies with all the regulatory and procedural requirements that entails, and re-keying the zone keys distributed to acquirers, switches, and issuers across the payment network.

Zone key migration sequencing:

Zone key distribution in payment networks flows from the top of the hierarchy down:

1. Scheme-level key ceremony (card network generates new PQC-native ZMKs)
2. Acquirer HSM key injection (acquirer key management system receives new ZMKs, distributes ZPKs to HSM clusters)
3. Network switch key update (switch HSMs receive new zone working keys via Kyber-secured key exchange)
4. Issuer HSM key update (issuer authorization HSMs receive new zone keys, update card-level key derivation)
5. EMV chip key re-derivation (for EMV-based card programs, chip master keys must be re-derived from the new PQC-protected zone keys)

The EMV chip key re-derivation step is the longest lead-time item in Phase 3. Physical card replacement is typically not required — card-level key derivation can often be updated at the next card personalization cycle without card replacement, but this depends on card profile configuration and scheme requirements. Acquirers and issuers need to coordinate on transition timing to maintain authorization continuity.
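One way to keep the sequencing honest in planning tooling is to encode the hierarchy as explicit dependencies, so that no step can be scheduled before its upstream step completes. The sketch below uses Python's standard graphlib module; the step names are illustrative labels for the five stages listed above.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
DEPENDS_ON = {
    "scheme_key_ceremony": set(),
    "acquirer_key_injection": {"scheme_key_ceremony"},
    "switch_key_update": {"acquirer_key_injection"},
    "issuer_key_update": {"switch_key_update"},
    "emv_key_rederivation": {"issuer_key_update"},
}


def migration_sequence():
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def can_start(step, completed):
    """True when every upstream step is already in the completed set."""
    return DEPENDS_ON[step] <= set(completed)


sequence = migration_sequence()
print(sequence)
print(can_start("switch_key_update", ["scheme_key_ceremony"]))
```

The chain here is strictly linear; the same structure extends to a real program where, for example, several acquirers depend on one scheme ceremony.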

ACH and interbank settlement key migration:

ACH batch processing uses symmetric keys for file encryption and MAC verification, but the key exchange sessions that establish those symmetric keys use classical asymmetric cryptography. Migrating ACH settlement key exchange to Kyber-1024 requires coordinated updates with your ACH processor and originating depository financial institution (ODFI) counterparties. The Nacha operating rules are silent on PQC requirements as of 2025, but the key exchange layer migration follows the same hybrid-then-pure-PQC pattern as TLS.

Phase 4: PKI and certificate infrastructure migration (Months 24–48)

The final phase is the longest lead-time item in the entire migration: retiring classical PKI and replacing it with Dilithium-3 certificates throughout the internal and external certificate hierarchy.

Why does this take longest? Certificate chains have trust anchors. Internal CA root keys with 20-year lifetimes signed hundreds of intermediate certificates, which signed thousands of end-entity certificates. Replacing the root requires re-issuing every certificate in the chain. For a large bank with internal PKI serving TLS, code signing, VPN authentication, and regulatory reporting, this is a multi-year effort independent of the cryptographic algorithm change.

Dual-certificate strategy:

The practical approach during Phase 4 is dual-certificate issuance — PKI infrastructure issues both classical (ECDSA P-384) and post-quantum (Dilithium-3) certificates for each subject. Systems that support Dilithium-3 certificate validation use the PQC certificate; legacy systems unable to validate Dilithium-3 signatures use the classical certificate. This approach avoids forced migration of legacy relying parties but doubles certificate management overhead.

CQ1 supports generating both ECDSA P-384 and Dilithium-3 key pairs in the same hardware boundary, allowing CA operations to issue both certificate types from a single HSM. The key ceremony for the Dilithium-3 root uses Kyber-1024 key encapsulation for the ceremony key exchange — making the entire root key protection post-quantum from day one.
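The dual-certificate selection rule can be expressed in a few lines. This is a sketch of the decision only, assuming the serving side knows which signature algorithms a relying party can validate; the certificate file names are hypothetical.

```python
# Post-quantum first, classical as the fallback for legacy relying parties.
PREFERENCE = ("Dilithium-3", "ECDSA P-384")


def select_certificate(issued, supported):
    """Pick the certificate to present.

    issued maps algorithm -> certificate; supported is the set of signature
    algorithms the relying party is able to validate.
    """
    for algorithm in PREFERENCE:
        if algorithm in issued and algorithm in supported:
            return issued[algorithm]
    raise LookupError("relying party supports none of the issued certificate types")


# Hypothetical certificate pair issued for one subject.
issued = {
    "Dilithium-3": "gateway-pqc.pem",
    "ECDSA P-384": "gateway-classical.pem",
}

print(select_certificate(issued, {"Dilithium-3", "ECDSA P-384"}))
print(select_certificate(issued, {"ECDSA P-384"}))
```

Retiring the classical certificate for a subject then becomes a data change (removing one entry) rather than a code change.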

What success looks like: target state

A fully migrated payment infrastructure has the following characteristics:

All HSM key operations (PIN encryption, MAC generation, transaction signing) running under PQC-native key management, using CRYSTALS-Dilithium-3 for signing and CRYSTALS-Kyber-1024 to encapsulate the symmetric keys that PIN and MAC operations rely on
All internode TLS sessions negotiating Kyber-1024 hybrid or pure PQC key exchange
Zone key hierarchies rooted in Kyber-1024 key encapsulation, distributed via quantum-resistant key distribution sessions
PKI certificate chains with Dilithium-3 intermediate and end-entity certificates (root certificate may remain classical during the extended root transition window)
FIPS 140-3 Level 3 certified hardware at all points in the key hierarchy where certification is required for regulatory compliance

This state is achievable by 2028–2029 for institutions that begin Phase 1 hardware integration in 2025. It is not achievable by 2028 for institutions that wait until 2027 to begin procurement evaluation.

The one thing migration planners consistently underestimate

Every payment infrastructure migration project I have been involved with — SWIFT Alliance upgrades, EMV chip rollouts, TLS 1.2 to 1.3 migrations — has underestimated the time required to coordinate counterparty readiness. Your internal timeline can be perfect. Your vendor can deliver hardware on schedule. Your test environment can clear in six weeks. And then you discover that your top-10 ACH partner by volume has a 14-month change management cycle for HSM interface updates, or that your card scheme certification process for new cryptographic mechanisms requires 18 months of pre-production testing.

Start counterparty conversations in Phase 1, while you are doing internal integration testing. The question "what is your PQC migration timeline?" is not a premature question — it is a risk identification question. Counterparties who don't have an answer in 2025 are a dependency risk for your 2028 target.

Cryptrig is not a migration consulting firm. We build hardware. But hardware-only thinking misses the most significant bottleneck in payment PQC migration, which is organizational and multi-party coordination, not cryptographic algorithm performance. The technical work of implementing Kyber and Dilithium in silicon is complete. The operational work of deploying them across a payment network is the constraint that determines whether you reach the target state before the harvest window closes.

FedNow and real-time gross settlement: a special case

The Federal Reserve's FedNow instant payment service, launched in 2023, represents a different migration dynamic from legacy payment rails. FedNow's endpoint set is smaller than card networks (more than a thousand participating financial institutions rather than millions of merchants), and the service was designed with modern cryptographic flexibility in mind. The Fed has published guidance indicating that FedNow's cryptographic agility design supports future algorithm migration without a protocol version change.

For institutions participating in FedNow, the TLS sessions between their core banking systems and the Federal Reserve's FedNow service are the primary cryptographic surface to address. FedNow transactions themselves use AES-256 for payload encryption with RSA or ECDH key establishment at the session layer. Migrating FedNow TLS to Kyber-1024 hybrid mode requires coordination with the Federal Reserve's FedNow network team on timeline and client TLS stack requirements — but this is a bilateral conversation with a single counterparty rather than the multi-stakeholder coordination required for card schemes.

FedNow's real-time settlement model does impose strict latency requirements. A FedNow credit transfer must complete irrevocably within seconds. Any cryptographic operation that adds latency to the message flow — including the HSM key derivation that protects the message authentication — must stay within the FedNow settlement window. Hardware-accelerated Kyber-1024 at sub-millisecond P99 latency comfortably fits within FedNow timing constraints. Software PQC at 4–18ms P99 under burst load does not, and burst load during peak retail settlement windows is precisely when FedNow volume is highest.

Application-layer migration: what PKCS#11 callers must change

One of the underappreciated costs of PQC migration is application-layer code changes. The PKCS#11 interface that applications use to call HSM operations does not automatically abstract algorithm changes. Applications must be updated to use the new CKM (Cryptographic Mechanism) identifiers for ML-KEM and ML-DSA operations.

The most common application code issues we encounter in integration testing:

Fixed-size signature buffers: Code that allocates 64 bytes for an ECDSA-P256 signature and passes that buffer to C_Sign will fail when the mechanism is changed to CKM_ML_DSA and the signature grows to 3,309 bytes (the ML-DSA-65 size in FIPS 204; the pre-standard Dilithium-3 figure was 3,293). The fix is to query the signature length with a first C_Sign call (passing NULL output buffer), allocate the returned size, then call again. This is correct PKCS#11 usage, but many legacy applications skip the query step because classical signature sizes were predictable.
Hardcoded key object attributes: Applications that set CKA_VALUE_LEN on key generation calls with RSA or ECDH-sized values will generate errors when the mechanism changes to ML-KEM. Key size is algorithm-determined in ML-KEM; the CKA_VALUE_LEN attribute is not meaningful for Kyber keys.
Session context assumptions: ML-KEM key establishment is a two-party operation (encapsulate on the sender's side, decapsulate on the receiver's). PKCS#11 applications designed for ECDH key agreement, where a single C_DeriveKey call derives the shared secret, require refactoring to the KEM pattern, where the encapsulation call produces an encapsulated ciphertext that must be transmitted to the decapsulating party before both can derive the session key.
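The fixed-buffer failure and the query-then-allocate fix can be demonstrated without an HSM. The sketch below models the C_Sign length-query convention in Python with a stand-in token class: FakeToken, BufferTooSmall, and the helper functions are hypothetical names for illustration, the "signatures" are dummy bytes, and the sizes are 64 bytes for raw ECDSA P-256 and 3,309 bytes for ML-DSA-65 as published in FIPS 204.

```python
SIGNATURE_SIZES = {"ECDSA-P256": 64, "ML-DSA-65": 3309}


class BufferTooSmall(Exception):
    """Stands in for the CKR_BUFFER_TOO_SMALL return code."""


class FakeToken:
    """Hypothetical stand-in for an HSM session; emits dummy bytes, not real signatures."""

    def __init__(self, mechanism):
        self.sig_len = SIGNATURE_SIZES[mechanism]

    def sign(self, data, out):
        if out is None:  # length query, like passing a NULL output buffer
            return self.sig_len
        if len(out) < self.sig_len:
            raise BufferTooSmall(f"need {self.sig_len} bytes, got {len(out)}")
        out[: self.sig_len] = bytes(self.sig_len)  # dummy signature bytes
        return self.sig_len


def sign_fixed_buffer(token, data):
    """Legacy pattern: assumes every signature fits in 64 bytes."""
    buf = bytearray(64)
    written = token.sign(data, buf)
    return bytes(buf[:written])


def sign_two_call(token, data):
    """Correct pattern: ask for the length, allocate, then sign."""
    needed = token.sign(data, None)
    buf = bytearray(needed)
    written = token.sign(data, buf)
    return bytes(buf[:written])


print(len(sign_fixed_buffer(FakeToken("ECDSA-P256"), b"auth-request")))
print(len(sign_two_call(FakeToken("ML-DSA-65"), b"auth-request")))
```

The two-call version works unchanged for both mechanisms, which is why fixing it during Phase 1 costs nothing in classical mode and removes a failure mode before PQC is enabled.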

None of these issues are blockers — they are straightforward code changes. But identifying them requires running actual application code against a PQC-capable HSM in a test environment. This is precisely why Phase 1 integration testing starts with classical-mode CQ1 (to establish baseline behavior) and Phase 2 enables hybrid mode for selected application paths (to surface the algorithm-specific code issues before they affect production flows).

For institutions using JCA/JCE in Java-based payment middleware, CQ1's JCA provider exposes ML-KEM and ML-DSA through standard JCE KeyAgreement and Signature SPIs. Java applications using standard KeyAgreement.getInstance("ECDH") calls can switch to KeyAgreement.getInstance("Kyber1024", "CQ1Provider") with a provider configuration change, on one important condition: their key management code must not hardcode ECDH's session semantics, in which both sides derive the secret from exchanged public keys and no ciphertext is transmitted. Most well-structured JCE applications can make this transition with provider substitution and buffer-size updates alone.
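The difference in call pattern is easier to see in code than in prose. The sketch below is a toy KEM built on small-modulus Diffie-Hellman using only the Python standard library. It is not Kyber, it is not quantum-resistant, and the modulus is far too small for real use; its only purpose is to show the shape of the flow that application code has to accommodate: encapsulate produces a ciphertext plus a secret, and the ciphertext must reach the receiver before it can derive the same secret.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy modulus for illustration; insecure at this size
G = 3


def generate_keypair():
    """Receiver side: long-term private and public key."""
    private_key = secrets.randbelow(P - 2) + 1
    return private_key, pow(G, private_key, P)


def encapsulate(receiver_public_key):
    """Sender side: returns (ciphertext to transmit, locally derived secret)."""
    ephemeral = secrets.randbelow(P - 2) + 1
    ciphertext = pow(G, ephemeral, P)
    raw = pow(receiver_public_key, ephemeral, P)
    return ciphertext, hashlib.sha256(raw.to_bytes(16, "big")).digest()


def decapsulate(receiver_private_key, ciphertext):
    """Receiver side: recovers the same secret from the transmitted ciphertext."""
    raw = pow(ciphertext, receiver_private_key, P)
    return hashlib.sha256(raw.to_bytes(16, "big")).digest()


receiver_sk, receiver_pk = generate_keypair()
ciphertext, sender_secret = encapsulate(receiver_pk)   # step 1, on the sender
receiver_secret = decapsulate(receiver_sk, ciphertext)  # step 2, after transmission
print(sender_secret == receiver_secret)
```

Code written around a single derive call has no place to carry the ciphertext between the two steps, and that missing transport is the refactoring work.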

Regulatory documentation: what examiners will ask for

Payment institutions subject to OCC, Federal Reserve, or FDIC examination should expect cryptographic migration to appear on examination schedules beginning in 2025–2026. Examiners are not yet requiring PQC deployment; they are requiring documented migration plans. What counts as a documented migration plan in the context of PQC?

Based on FFIEC's existing technology guidance framework and how examiners have approached previous cryptographic transitions (TLS 1.0 deprecation, SHA-1 end-of-life), the documentation elements that satisfy examination expectations are:

Risk assessment for HNDL exposure: Identification of data types and classifications with retention periods that overlap with CRQC timeline projections, showing which assets are at risk under a 7–10 year harvest window.
Cryptographic inventory: A complete list of asymmetric cryptographic operations, the hardware or software performing them, and the algorithm in use. This is the same inventory produced during the prerequisite assessments.
Migration timeline with milestones: Phase-by-phase plan with target dates and dependencies. Regulators understand that full deployment is a multi-year effort; they want to see a plan with milestone commitments, not just an intent to migrate "eventually."
Vendor qualification status: Evidence that hardware vendors being evaluated are engaged in FIPS 140-3 validation processes. This is where asking vendors for NVLAP lab engagement confirmation and Security Policy drafts (as described in the procurement section) becomes formally useful.
Testing results: Integration testing logs from Phase 1 evaluation environments. Evidence that the institution has actually run the hardware against their applications, not just reviewed specifications.

Institutions that completed Phase 1 integration testing in 2025 can walk into a 2026 examination with all five elements. Institutions that have not engaged hardware vendors or begun integration testing cannot credibly claim a migration plan — they have a migration aspiration.
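For the HNDL risk assessment element, a commonly used framing is Mosca's inequality: data is exposed when its required confidentiality period plus the time needed to migrate exceeds the time until a CRQC exists. That framing is brought in here as an illustration rather than taken from examiner guidance, and every number below is an assumption to be replaced with the institution's own estimates.

```python
def hndl_exposed(retention_years, migration_years, years_to_crqc):
    """Mosca's inequality: exposed when retention + migration > time to CRQC."""
    return retention_years + migration_years > years_to_crqc


MIGRATION_YEARS = 4  # assumption: the four phases run up to 48 months
YEARS_TO_CRQC = 7    # assumption: low end of a 7-10 year window, not a forecast

# Illustrative retention periods in years.
assets = {
    "cardholder data under MKEK": 10,
    "internal monitoring TLS sessions": 0,
}

exposure = {
    name: hndl_exposed(retention, MIGRATION_YEARS, YEARS_TO_CRQC)
    for name, retention in assets.items()
}
for name, exposed in exposure.items():
    print(f"{name}: {'exposed' if exposed else 'not exposed'}")
```

Running the same check at both ends of the projected window shows which assets are exposed under every assumption and which only under the pessimistic one.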

Migration governance: who owns the program

A persistent pattern in failed cryptographic migration programs is the governance gap: the work spans multiple organizational boundaries (CISO, infrastructure, application development, vendor management, regulatory affairs) and no single owner has the authority to drive decisions across all of them. TLS 1.0 deprecation programs at large financial institutions ran 3–5 years even at organizations where the scope was clear and the technology was mature. Post-quantum migration is more complex and less mature.

The migration program should be owned at the CISO level with a dedicated program manager, not a working group that meets quarterly. The program manager needs authority to mandate application code changes across business lines, engage vendor management for HSM procurement and FIPS certification tracking, and represent the program to regulatory examiners. Without executive sponsorship and a named owner, PQC migration joins the queue of well-intentioned technology programs that accumulate documentation and produce limited deployment.

The board-level framing that cuts through this governance gap: PQC migration is not an IT project. It is a risk management program addressing a quantifiable threat (HNDL exposure against classified financial data) with a calculable window of action (the period before data currently being generated ages into its classification window and a CRQC becomes operational). The risk is manageable and the migration is achievable — if it starts now and is governed as a business priority rather than a technology project.

The payment networks, clearing institutions, and banks that complete Phase 1 and Phase 2 before 2027 will have production deployment windows that close before the HNDL threat window opens. Those that begin in 2028 will be migrating into an environment where the threat is measurably closer and regulatory patience is measurably shorter. The technical path is well-defined. The constraint is organizational urgency.

Handling legacy HSM clusters during the transition

Large financial institutions do not operate single HSMs. A typical card authorization environment runs HSM clusters — often 4–8 units in redundant configurations across primary and disaster recovery sites. Migrating a cluster from classical to PQC-capable hardware requires planning that avoids authorization service interruption during the transition.

The recommended migration pattern for HSM clusters is rolling replacement: replace one HSM in the cluster with CQ1 (in classical-only mode), validate it handles authorization traffic identically to the replaced unit across a full business cycle (a week minimum, including a weekend peak), then replace the next unit. This rolling approach means the cluster is never in a state where more than one unit is being commissioned simultaneously, and fallback is straightforward — any individual CQ1 failure falls back to the remaining classical units without service interruption.

Key state consistency during rolling replacement requires attention. Symmetric key material shared across the cluster (zone working keys, PIN encryption keys) must be synchronized to CQ1 before it begins handling live traffic. The key injection procedure for CQ1 uses the same dual-control smartcard ceremony as classical HSMs — the difference is the key wrapping mechanism (classical RSA wrapping for existing key material, Kyber-1024 wrapping once all cluster members support it). During the hybrid period when some cluster members are classical and some are CQ1, key wrapping must use the classical mechanism to remain interoperable within the cluster.

This constraint lifts when all cluster members have been replaced with CQ1. At that point, intra-cluster key synchronization can switch to Kyber-1024 key wrapping, providing end-to-end PQC protection within the HSM cluster layer. The transition from classical-wrapped to Kyber-wrapped internal key synchronization is Phase 3's HSM cluster milestone.
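The rolling replacement pattern and the wrapping constraint fit together as a small state machine. The sketch below is a planning aid, not HSM management code: the unit names and mechanism labels are illustrative, and validate is a stand-in for observing a newly installed unit across a full business cycle.

```python
def wrapping_mechanism(is_pqc_capable):
    """Pick the intra-cluster key wrapping mechanism.

    is_pqc_capable maps unit name -> True once that unit has been replaced
    with PQC-capable hardware. Classical wrapping is required while any
    classical unit remains in the cluster.
    """
    if all(is_pqc_capable.values()):
        return "kyber-1024"
    return "classical-rsa"


def rolling_replace(units, validate):
    """Replace units one at a time, yielding (unit, wrapping mechanism) per step.

    The rollout halts at the first unit that fails validation, leaving the
    remaining classical units in service.
    """
    state = {unit: False for unit in units}
    for unit in units:
        if not validate(unit):
            break
        state[unit] = True
        yield unit, wrapping_mechanism(state)


cluster = ["hsm-1", "hsm-2", "hsm-3", "hsm-4"]
steps = list(rolling_replace(cluster, lambda unit: True))
for unit, mechanism in steps:
    print(unit, mechanism)
```

Only the final step flips the mechanism, which matches the milestone described above: Kyber-1024 wrapping becomes available when the last classical unit leaves the cluster.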

Testing strategy across payment rail types

Each payment rail in the migration scope has a different test environment structure and different risk tolerance for in-flight testing. The testing strategy must be tailored per rail:

SWIFT Alliance: SWIFT provides a test environment (the SWIFT Alliance TestLab) that mirrors the production network's message processing. PQC-related changes to SWIFT gateway TLS configuration and message authentication mechanisms should be validated in the TestLab before any production changes. Where SWIFT's test environment supports hybrid TLS, X25519+Kyber-1024 handshakes can be tested end-to-end there before they are enabled on production gateways.

Card network authorization: Card schemes (Visa, Mastercard, and domestic schemes) each maintain certification environments where acquirer and issuer HSM configurations are tested before production deployment. Cryptographic changes to HSM configuration require recertification with the scheme. Plan for 3–6 month scheme certification cycles when scheduling Phase 2 enablement of hybrid mode on card authorization HSMs.

ACH processing: ACH batch files use symmetric encryption for file transport, with the key exchange happening at the network session layer. Migration of the session key exchange to Kyber-1024 hybrid can be tested against your ACH processor's test instance before production cutover. ACH's batch processing model (end-of-day settlement rather than real-time) means there is a natural test window every night — a batch processed successfully in test mode before the production batch run confirms readiness without affecting settlement.

FedNow: The Federal Reserve's FedNow pilot testing program for new participant configurations provides a structured path for testing TLS changes before going live on the production FedNow network. FedNow's real-time settlement model requires that test validation include latency testing under representative load — not just functional correctness — to confirm that PQC key exchange operations fit within FedNow's settlement timing requirements.

The common thread across all rail testing is: never enable PQC operations on production payment flows until the corresponding test environment has validated both correctness (the right cryptographic output) and performance (latency within SLA). The risk of a misconfigured cipher suite or buffer-size error in production payment flows is authorization failures that cannot be retroactively corrected. Test environment validation is the gate, not the preview.
