NIST PQC finalists explained: CRYSTALS-Kyber (ML-KEM) and Dilithium (ML-DSA) for security engineers

March 17, 2025 · Maya Osei, CTO & Co-Founder · 11 min read

Figure: side-by-side comparison of the Kyber key encapsulation and Dilithium signature mechanisms.

In August 2024, NIST finalized the first three post-quantum cryptographic standards: FIPS 203 (ML-KEM, formerly CRYSTALS-Kyber), FIPS 204 (ML-DSA, formerly CRYSTALS-Dilithium), and FIPS 205 (SLH-DSA, formerly SPHINCS+). For financial sector security engineers, these standards answer the question that has been open since 2016: which algorithms should we build hardware and infrastructure around?

This article focuses on FIPS 203 and FIPS 204 — the two algorithms most directly relevant to HSM-based financial infrastructure. Both are built on the Module Learning With Errors (MLWE) problem, which means they share a mathematical foundation and can be implemented efficiently in the same hardware fabric. Understanding why these algorithms were selected, how they work at the construction level, and what their operational parameters mean for financial deployment is the basis for making sound procurement and migration decisions.

The selection process and what finalization means

NIST's Post-Quantum Cryptography standardization process ran from 2016 to 2024. Sixty-nine candidate algorithms entered Round 1. Kyber and Dilithium survived every round and were selected for standardization. The multi-round process included open global cryptanalysis: academic teams from dozens of institutions attempted to break each candidate. Neither Kyber nor Dilithium has been broken. At their recommended parameter levels, the best known attacks on both algorithms, classical or quantum, remain computationally infeasible.

Finalization means the algorithm specifications are frozen. The test vectors are published. The standard documentation (FIPS 203 at 124 pages; FIPS 204 at 90 pages) defines every arithmetic operation, encoding format, and parameter set. For hardware implementers, finalization is the signal to build production silicon — not pre-production or evaluation silicon, but hardware that will be certified and deployed.

The distinction between "finalist" and "finalized standard" matters in procurement. An HSM implementing a finalist algorithm in 2022 was betting on that algorithm surviving the final standardization round. An HSM implementing FIPS 203/204 in 2024 is implementing a frozen standard that will not change. Migration to finalized standards is the right posture for regulated financial infrastructure.

CRYSTALS-Kyber (FIPS 203 / ML-KEM): the key encapsulation mechanism

Kyber is a key encapsulation mechanism (KEM), not a key exchange protocol. The distinction matters. In a KEM, one party uses the other's public key to generate a shared secret together with a ciphertext that encapsulates it; the other party decapsulates that ciphertext with its private key to recover the same secret. This is a different flow from ECDH, where both parties contribute ephemeral key material to derive a shared secret. KEMs are often simpler to integrate correctly because there is no protocol-level state machine around the key agreement: a single ciphertext, sent in one message, carries everything the recipient needs to recover the shared secret.

The mathematical foundation is Module-LWE. A matrix A is expanded from a public seed. The secret vector s and the error vector e have small coefficients sampled from a centered binomial distribution. The public key is the pair (A, b = As + e), with arithmetic over the polynomial ring Z_q[x]/(x^n + 1) where n=256 and q=3329 for Kyber; in practice the seed is transmitted in place of A. The core security assumption is that given A and b, recovering s is hard even for a quantum adversary.
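The b = As + e relation can be made concrete with a toy sketch. The code below is illustrative only and is not the FIPS 203 key generation procedure: the names (keygen, cbd_poly), the module rank K=2, the single noise parameter ETA=2, and the use of Python's non-cryptographic random module are all assumptions chosen for readability. It uses schoolbook multiplication rather than the NTT.

```python
import random

N, Q = 256, 3329      # ring parameters quoted in the text
K, ETA = 2, 2         # assumed toy values: module rank and noise width

def poly_mul(a, b):
    """Schoolbook multiplication in Z_Q[x]/(x^N + 1)."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                c[k] = (c[k] + ai * bj) % Q
            else:
                c[k - N] = (c[k - N] - ai * bj) % Q   # wrap with x^N = -1
    return c

def poly_add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def cbd_poly(rng):
    """Polynomial with small coefficients in [-ETA, ETA] (centered binomial)."""
    return [sum(rng.getrandbits(1) for _ in range(ETA))
            - sum(rng.getrandbits(1) for _ in range(ETA))
            for _ in range(N)]

def keygen(seed):
    """Return (A, b, s, e) with b = A*s + e. Toy only: rng is NOT secure."""
    rng = random.Random(seed)
    A = [[[rng.randrange(Q) for _ in range(N)] for _ in range(K)] for _ in range(K)]
    s = [cbd_poly(rng) for _ in range(K)]
    e = [cbd_poly(rng) for _ in range(K)]
    b = []
    for i in range(K):
        acc = [x % Q for x in e[i]]
        for j in range(K):
            acc = poly_add(acc, poly_mul(A[i][j], s[j]))
        b.append(acc)
    return A, b, s, e
```

An observer who sees only A and b faces the Module-LWE problem; anyone holding s can subtract As from b and is left with the small error vector e.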

The three parameter sets in FIPS 203:

ML-KEM-512 (Kyber-512): NIST Security Level 1. Classical equivalent: ~128-bit security. Public key: 800 bytes. Ciphertext: 768 bytes. Intended for applications where classical 128-bit AES equivalence is sufficient.
ML-KEM-768 (Kyber-768): NIST Security Level 3. Classical equivalent: ~192-bit security. Public key: 1,184 bytes. Ciphertext: 1,088 bytes. Balanced performance and security for general TLS applications.
ML-KEM-1024 (Kyber-1024): NIST Security Level 5. Classical equivalent: ~256-bit security. Public key: 1,568 bytes. Ciphertext: 1,568 bytes. Maximum security level — appropriate for long-lived financial key exchange and HSM-to-HSM communications.

For financial infrastructure, the choice between Kyber-768 and Kyber-1024 depends on the classification life of the protected data. Transaction keys with a 90-day rotation cycle can use Kyber-768. Long-lived key wrapping keys, CA root operations, and SWIFT correspondent banking sessions whose records are retained for 7+ years should use Kyber-1024 — the 256-bit classical equivalent provides the necessary security margin against cryptanalytic improvements over the retention window.

The NTT operation: why hardware matters for Kyber

The core computational operation in Kyber is polynomial multiplication in the ring Z_q[x]/(x^n + 1). Efficient polynomial multiplication uses the Number Theoretic Transform (NTT), the integer-domain analogue of the Fast Fourier Transform. NTT converts polynomials to their NTT representations, performs pointwise multiplication in NTT domain, and converts back via inverse NTT (INTT).
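The transform, pointwise multiply, inverse transform pattern can be checked on a toy ring. The sketch below assumes tiny parameters (n=8, q=17, with 3 as a primitive 16th root of unity mod 17) and evaluates the transform directly in O(n²) rather than with butterflies, so it shows the algebra, not the speed. Kyber itself uses n=256 and q=3329 with a slightly different, incomplete NTT; the function names here are illustrative.

```python
Q, N, PSI = 17, 8, 3   # toy values: PSI is a primitive 2N-th root of unity mod Q

def ntt(a):
    """Evaluate a(x) at the N odd powers of PSI, which are the roots of x^N + 1."""
    return [sum(a[i] * pow(PSI, i * (2 * j + 1), Q) for i in range(N)) % Q
            for j in range(N)]

def intt(a_hat):
    """Inverse transform: interpolate back to coefficient form."""
    n_inv, psi_inv = pow(N, -1, Q), pow(PSI, -1, Q)
    return [n_inv * sum(a_hat[j] * pow(psi_inv, i * (2 * j + 1), Q)
                        for j in range(N)) % Q
            for i in range(N)]

def mul_ntt(a, b):
    """Multiply in Z_Q[x]/(x^N + 1) via pointwise products in the NTT domain."""
    return intt([x * y % Q for x, y in zip(ntt(a), ntt(b))])

def mul_schoolbook(a, b):
    """Reference O(N^2) negacyclic multiplication for comparison."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q   # x^N = -1
    return c
```

Both multiplication routines return the same coefficients. A production NTT replaces the direct sums with log2(n) layers of butterfly operations, which is the part that parallelizes well in hardware.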

A single Kyber-1024 key encapsulation requires approximately:

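Matrix-vector multiplication: 16 pointwise polynomial multiplications in the NTT domain (k² for the k=4 parameter set), with 4 forward NTTs on the ephemeral secret vector and 5 inverse NTTs to produce the ciphertext components
Compression and encoding operations on the ciphertext components
Hashing operations using SHA3-256, SHA3-512, SHAKE-128, and SHAKE-256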

On a general-purpose CPU, a Kyber-1024 encapsulation takes approximately 55–80 microseconds on a modern x86 core using AVX2-optimized code (NIST reference implementation benchmarks). At 10,000 transactions per second — not unusual for a busy payment authorization environment — this imposes 550–800ms of CPU time per second just for key encapsulation, before any other cryptographic operations. On a shared server running authorization middleware, this creates measurable latency spikes under burst load.
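The CPU-time figure follows from simple arithmetic, sketched here with the latency range and transaction rate quoted above. The function name is illustrative, and the model assumes every operation runs serially on a single core with no batching.

```python
def cpu_ms_per_second(ops_per_second, latency_us):
    """CPU time (ms) consumed per wall-clock second on one core."""
    return ops_per_second * latency_us / 1000

low = cpu_ms_per_second(10_000, 55)    # 550.0 ms of CPU time per second
high = cpu_ms_per_second(10_000, 80)   # 800.0 ms of CPU time per second
```

Anything above 1,000 ms per second would mean a single core can no longer keep up, so the quoted range already consumes between half and four-fifths of one core.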

An FPGA implementation of NTT can execute the butterfly operations in parallel across the 256-element polynomial, reducing a full NTT to a fixed number of pipeline stages rather than sequential iterations. CQ1's FPGA pipeline executes Kyber-1024 encapsulation in under 55 microseconds at maximum clock rate, with deterministic timing — no branch-prediction or cache-miss variance. At 18,000 ops/sec sustained throughput, the NTT pipeline is the dominant path and it is not shared with any other workload.

CRYSTALS-Dilithium (FIPS 204 / ML-DSA): the digital signature scheme

Dilithium is a digital signature scheme based on Module-LWE and Module-SIS (Module Short Integer Solution). It was standardized simultaneously with Kyber and shares similar algebraic structure. The signature scheme uses the "Fiat-Shamir with aborts" construction — the signer uses a rejection-sampling technique to ensure signature vectors do not leak information about the secret key.

The three parameter sets in FIPS 204:

ML-DSA-44 (Dilithium-2): NIST Security Level 2. Public key: 1,312 bytes. Signature: 2,420 bytes.
ML-DSA-65 (Dilithium-3): NIST Security Level 3. Public key: 1,952 bytes. Signature: 3,309 bytes (the Round 3 Dilithium-3 submission used 3,293 bytes). Appropriate for financial transaction signing and certificate issuance.
ML-DSA-87 (Dilithium-5): NIST Security Level 5. Public key: 2,592 bytes. Signature: 4,627 bytes (the Round 3 Dilithium-5 submission used 4,595 bytes).

For financial institutions, Dilithium-3 is the appropriate baseline for payment authorization signing, SWIFT message authentication, and PKI certificate issuance. The rejection-sampling in signing means signing is somewhat slower than verification — CQ1 benchmarks Dilithium-3 signing at greater than 12,000 ops/sec and verification at greater than 22,000 ops/sec, which reflects the asymmetry in the algorithm.

Key size implications for financial infrastructure

The most operationally significant difference between PQC and classical algorithms is key and signature size. A Kyber-1024 public key (1,568 bytes) is about 24 times the size of an uncompressed P-256 ECDH public key (65 bytes). A Dilithium-3 signature (3,309 bytes in FIPS 204) is about 52 times the size of a raw ECDSA P-256 signature (64 bytes).
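The size ratios can be reproduced directly. The byte counts below are the FIPS 203 and FIPS 204 sizes as I understand them, plus the usual uncompressed P-256 point and raw r||s ECDSA encodings; the dictionary keys are illustrative labels, not API identifiers.

```python
SIZES_BYTES = {
    "p256_public_key": 65,        # uncompressed point
    "ecdsa_p256_signature": 64,   # raw r || s; DER encoding is slightly larger
    "ml_kem_1024_public_key": 1568,
    "ml_dsa_65_signature": 3309,
}

def size_ratio(pq_name, classical_name):
    """How many times larger the post-quantum object is than the classical one."""
    return SIZES_BYTES[pq_name] / SIZES_BYTES[classical_name]

key_ratio = size_ratio("ml_kem_1024_public_key", "p256_public_key")
sig_ratio = size_ratio("ml_dsa_65_signature", "ecdsa_p256_signature")
```

Rounding key_ratio and sig_ratio to whole numbers gives the multipliers used for capacity planning.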

For financial applications, this matters in several places:

TLS handshake size: Hybrid X25519+Kyber-1024 adds approximately 1.5 KB to each of the ClientHello (which carries the 1,568-byte Kyber public key) and the ServerHello (which carries the 1,568-byte ciphertext). At high session establishment rates (new TLS connections for API calls), this increases network overhead but does not meaningfully affect throughput on gigabit links.
PKCS#11 buffer sizes: Applications that assume fixed-size key blobs or signature buffers need testing. A common failure mode in early PQC HSM integrations is buffer overflows in application code that allocated 64-byte signature buffers for ECDSA and wasn't modified before connecting to a Dilithium-capable HSM.
Certificate chain size: PKI chains that include Dilithium-3 certificates become larger. TLS certificate chain delivery, OCSP response sizes, and CRL distribution point payloads all increase. Capacity planning for certificate infrastructure should account for 10–15x increases in certificate payload sizes.
ISO 8583 message extensions: Payment messages that carry cryptographic material (PIN blocks, MACs, certificate data) will require field length extensions. This is a standards-level question that payment networks and schemes are actively addressing.

Hybrid modes: the practical migration path

IETF draft-ietf-tls-hybrid-design describes how to build a hybrid key exchange for TLS 1.3 by combining a classical ECDH group such as X25519 with a post-quantum KEM such as Kyber-768 or Kyber-1024. The combined key agreement produces a shared secret that stays secure unless both algorithms are broken, providing classical security guarantees today and quantum resistance once a cryptographically relevant quantum computer (CRQC) arrives.

The hybrid approach is appropriate for the transition period for two reasons:

It maintains interoperability with classical-only endpoints. A TLS endpoint that does not support Kyber can still complete a classical ECDH handshake with the hybrid-capable endpoint.
It provides defense-in-depth against unforeseen cryptanalytic weaknesses in either algorithm. If Kyber is broken by a classical attack (unlikely, but the security community is appropriately cautious about newly deployed algorithms), the X25519 component still provides classical security.
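The "both must be broken" property comes from feeding both shared secrets into one key derivation step. The sketch below is a simplified illustration, assuming two 32-byte secrets that are concatenated and passed through an HKDF-Extract-style HMAC-SHA256; random bytes stand in for the real X25519 and Kyber outputs, and the function names are hypothetical. A real TLS 1.3 stack feeds the concatenated secret into its own key schedule, with the component order fixed by the negotiated group.

```python
import hashlib
import hmac
import os

def hkdf_extract(salt, ikm):
    """HKDF-Extract (RFC 5869) instantiated with SHA-256."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def combine_shared_secrets(ss_classical, ss_pq, salt=b"\x00" * 32):
    """Derive one key from both secrets: concatenate, then extract."""
    if len(ss_classical) != 32 or len(ss_pq) != 32:
        raise ValueError("this sketch assumes two fixed-length 32-byte secrets")
    return hkdf_extract(salt, ss_classical + ss_pq)

# Placeholders for the outputs of the X25519 exchange and the KEM decapsulation.
ss_x25519 = os.urandom(32)
ss_kyber = os.urandom(32)
session_secret = combine_shared_secrets(ss_x25519, ss_kyber)
```

Because the output depends on every input byte, an attacker who recovers only one of the two secrets still cannot compute session_secret.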

CQ1 supports X25519+Kyber-1024 hybrid mode natively via its PKCS#11 CKM_HYBRID_KEM mechanism extension and via the JCA/JCE provider's KeyAgreement SPI. Applications that use standard JCE KeyAgreement calls can switch to hybrid mode via provider configuration without code changes.

What finalized standards mean for procurement timelines

Financial institutions running multi-year technology procurement cycles need to understand that August 2024 finalization is the starting gun for FIPS 140-3 CMVP testing of PQC-native HSMs, not the arrival of certified products. NIST's CMVP process for hardware modules at Level 3 takes 12–24 months from laboratory submission. Hardware vendors who began designing around the finalist algorithms in 2022 are entering the CMVP queue now. The first certificates for PQC-native HSMs at FIPS 140-3 Level 3 are expected 2026–2027.

Cryptrig is not a software library. We build physical hardware that executes these algorithms in FPGA fabric inside a tamper-evident boundary. The specifications in FIPS 203 and FIPS 204 are what CQ1 implements — not a draft, not a finalist, the final published standard with published test vectors that our implementation must pass. For institutions planning procurement, the timeline question is not "when will the standard be ready" — it was ready in August 2024. The question is "when will validated hardware be available and how do we manage the gap between now and certification."

Performance characteristics that financial teams need to plan around

Both ML-KEM and ML-DSA have well-characterized performance profiles that differ enough from classical algorithms to require deliberate capacity planning. The performance picture is more nuanced than most vendor marketing suggests.

For ML-KEM-768 (Kyber-768), published liboqs benchmarks on a modern x86-64 core show encapsulation at approximately 40–60 microseconds and decapsulation at a similar range — faster than RSA-2048 operations, which run 150–400 microseconds per operation on the same hardware. At first glance, this suggests PQC is a straightforward upgrade in software. The nuance is throughput under concurrency. RSA-2048 benefits from decades of hardware and software optimization; the AVX2 and AVX-512 acceleration paths for Kyber NTT are newer, and the memory access patterns of NTT butterfly operations create cache pressure that is not present in RSA's modular exponentiation. At sustained 5,000 concurrent operations per second on a shared authorization server, Kyber's P99 latency diverges from its P50 latency more sharply than RSA does — not because Kyber is slower, but because NTT operations interact with CPU cache state in ways that authorization middleware also stresses.

For ML-DSA-65 (Dilithium-3), the performance split between signing and verification is operationally important. Signing involves rejection sampling: the algorithm may need to attempt the signing operation multiple times before producing a valid signature that doesn't leak information about the secret key. The expected number of signing attempts is well-characterized (approximately 5.1 for ML-DSA-65, per FIPS 204), but it introduces latency variance that ECDSA P-256, which effectively signs in a single pass, does not have. Verification, by contrast, always completes in a single pass and is fast. For payment systems where HSMs sign transaction records that are verified by many relying parties, this asymmetry is favorable: the HSM's signing path is the one that needs hardware acceleration, and verification can run in software on relying party systems.
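The latency variance from rejection sampling can be estimated with a simple model. The sketch below assumes each signing attempt succeeds independently with the same probability, so the attempt count is geometric, and it takes the expected attempt counts as I read them from FIPS 204 (4.25, 5.1, and 3.85 for the three parameter sets); verify those against the standard before relying on them. The function name is illustrative.

```python
import math

# Expected signing-loop iterations per parameter set (assumed from FIPS 204).
EXPECTED_ATTEMPTS = {"ML-DSA-44": 4.25, "ML-DSA-65": 5.1, "ML-DSA-87": 3.85}

def attempts_at_quantile(expected_attempts, quantile):
    """Smallest k such that P(signing finishes within k attempts) >= quantile."""
    p_accept = 1.0 / expected_attempts
    return math.ceil(math.log(1.0 - quantile) / math.log(1.0 - p_accept))

median_attempts = attempts_at_quantile(EXPECTED_ATTEMPTS["ML-DSA-65"], 0.50)
p99_attempts = attempts_at_quantile(EXPECTED_ATTEMPTS["ML-DSA-65"], 0.99)
```

Under this model the 99th-percentile attempt count is several times the median, which is why tail latency, not average latency, is the number to plan signing capacity around.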

The key size implications require explicit application testing. Signature buffer sizes hardcoded for ECDSA-P256 (64 bytes) will overflow when a Dilithium-3 signature (3,309 bytes) is returned. This is not hypothetical: integration teams doing first-run PKCS#11 tests against PQC-capable HSMs frequently encounter buffer overflow errors in application code that was never written to accommodate larger, algorithm-dependent signatures. The fix is straightforward but depends on identifying every such hardcoded value across the application stack, which means a complete audit of authorization middleware, logging systems, and certificate processing code.
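One way to start that audit is to compare an inventory of buffer capacities against a table of signature sizes. The sketch below is a hypothetical helper, not part of any PKCS#11 library: the buffer names are invented for illustration, and the ML-DSA sizes are the FIPS 204 values as I understand them.

```python
SIGNATURE_BYTES = {
    "ECDSA-P256": 64,     # raw r || s
    "ML-DSA-44": 2420,
    "ML-DSA-65": 3309,
    "ML-DSA-87": 4627,
}

def undersized_buffers(buffer_capacities, algorithm):
    """Return the names of buffers too small to hold one signature."""
    needed = SIGNATURE_BYTES[algorithm]
    return sorted(name for name, capacity in buffer_capacities.items()
                  if capacity < needed)

# Hypothetical inventory gathered from a code audit.
inventory = {"auth_signature": 64, "audit_log_field": 512, "cert_scratch": 8192}
flagged = undersized_buffers(inventory, "ML-DSA-65")
```

Here flagged lists every buffer that would need to grow before the application can accept ML-DSA-65 signatures. Sizing buffers from a lookup like this, instead of a constant, also covers the next algorithm change.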

What FIPS 205 (SLH-DSA / SPHINCS+) means for financial infrastructure

The third algorithm finalized in August 2024 was FIPS 205 (SLH-DSA, formerly SPHINCS+). Unlike ML-KEM and ML-DSA, SLH-DSA does not rely on lattice problems — it is a hash-based signature scheme. Its security rests entirely on the properties of its underlying hash function (SHA-256 or SHAKE variants), which are well-understood and quantum-resistant to the extent that Grover's algorithm only halves the effective security level.

SLH-DSA is not irrelevant to financial infrastructure; it has a role in specific contexts. For very long-lived signing keys (CA root keys with 20-year lifetimes) where mathematical assumptions about lattice hardness carry uncertainty over decade-scale time horizons, a hash-based scheme whose security depends only on the strength of its underlying hash function is a compelling choice. Several national security agencies and standards bodies have flagged hash-based signatures as the most conservative choice for extremely high-value, long-lived keys precisely because the security assumptions are simpler.

The tradeoff is signature size. SLH-DSA-SHA2-128s produces 7,856-byte signatures. SLH-DSA-SHA2-256s produces 29,792-byte signatures. For systems that sign large numbers of transactions and transmit signatures in messages, this overhead is operationally prohibitive. SWIFT MT-series messages have strict length constraints. ISO 8583 card authorization messages cannot accommodate 30KB signature appends. SLH-DSA is appropriate for the root CA layer — where signatures are infrequent and long-lived — but ML-DSA (Dilithium) is the right choice for transaction-layer signing where throughput and message size matter.

CQ1 implements ML-KEM and ML-DSA. SLH-DSA support is on the roadmap for CA root key operations but is not part of the current hardware specification. For institutions designing their PQC architecture, the practical recommendation is: ML-KEM-1024 for all key exchange, ML-DSA-65 for transaction and certificate signing, SLH-DSA consideration only for root CA keys where the lattice assumption carries a very long time horizon risk premium.

Algorithm agility: what it means and what it requires

A principle that surfaces frequently in post-quantum migration guidance is "crypto agility" — the ability to swap cryptographic algorithms without major infrastructure changes. The NSA CNSA 2.0 guidance, NIST SP 800-131A Rev 3, and FFIEC examination guidance all emphasize crypto agility as a property that institutions should be building toward.

In practice, crypto agility means three things for payment infrastructure:

Algorithm negotiation in protocols: TLS 1.3 already negotiates key exchange through named groups in the supported_groups extension, so adding hybrid Kyber groups is a configuration change, not a protocol change. PKCS#11 CKM (Cryptographic Mechanism) identifiers for new algorithms require application code that checks the mechanism list rather than hardcoding specific algorithm constants.
Key size abstraction in applications: Applications must treat key sizes and signature sizes as variable rather than fixed. This requires audit and likely refactoring of any code that preallocates fixed-size buffers for cryptographic output.
HSM firmware upgradability: Hardware that can receive firmware updates to add new algorithms within its validated boundary (or via a delta CMVP submission) provides significantly more flexibility than hardware with fixed algorithm support baked into silicon. CQ1's FPGA architecture allows algorithm updates via firmware without hardware replacement, within the scope of the validated boundary.
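The first point, checking the mechanism list instead of hardcoding a constant, can be sketched as a preference-ordered selection. The mechanism labels below are illustrative strings, not real PKCS#11 CKM constants, and the supported set stands in for whatever a token reports when queried.

```python
# Ordered from most to least preferred; labels are hypothetical.
KEM_PREFERENCE = ["X25519+ML-KEM-1024", "X25519+ML-KEM-768", "X25519"]

def select_mechanism(supported, preference=KEM_PREFERENCE):
    """Pick the most preferred mechanism that the device actually offers."""
    for mechanism in preference:
        if mechanism in supported:
            return mechanism
    raise ValueError("no acceptable key agreement mechanism is available")

# A PQC-capable device and a classical-only device, as examples.
chosen_pqc = select_mechanism({"X25519", "X25519+ML-KEM-1024"})
chosen_classical = select_mechanism({"X25519"})
```

With this shape, adding a future algorithm means editing the preference list, not the call sites.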

The limitation of crypto agility as a concept is that it is primarily useful for future-proofing against algorithms that don't exist yet. For the transition from classical RSA/ECDH to ML-KEM/ML-DSA — which is the immediate task — agility doesn't reduce the integration work required. Every TLS endpoint, HSM, CA, and PKCS#11 application still needs to be updated. Agility makes the next transition easier, not the current one.

Decision framework for financial security teams

For a financial institution security team translating these algorithm specifications into procurement and migration decisions, the key questions are:

Which security level for key exchange? For session keys with rotation periods under 90 days protecting data with retention lives under 5 years: ML-KEM-768 is appropriate. For long-lived keys, CA operations, and SWIFT correspondent banking sessions: ML-KEM-1024. When in doubt, use ML-KEM-1024 — the performance penalty versus ML-KEM-768 is modest in hardware, and the security margin for data that might be within a classification window in 10+ years is material.
Which security level for signatures? ML-DSA-65 (Dilithium-3) is the appropriate baseline for payment transaction signing and PKI certificate issuance. ML-DSA-87 provides additional margin for root CA operations but at the cost of larger signatures (4,627 bytes vs. 3,309 bytes).
FIPS 203/204 or pre-standard implementations? Any hardware or software evaluated today should implement FIPS 203 and FIPS 204 — the finalized standards with frozen test vectors. Pre-finalization implementations of the finalist algorithms are not equivalent, and the test vector sets differ. ACVTS test evidence should reference FIPS 203/204, not earlier NIST submissions.
Timeline for hybrid vs. pure PQC? Hybrid X25519+Kyber-1024 is the transition posture until full counterparty coverage in the payment network supports pure ML-KEM. For institutions beginning now, planning a 3–5 year hybrid window before moving to pure PQC is realistic given counterparty migration timelines in payment networks.
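The first two questions reduce to rules simple enough to encode. The sketch below is one possible reading of the thresholds above (rotation under 90 days, retention under 5 years), with hypothetical function and parameter names; it is a starting point for a policy check, not a substitute for a risk assessment.

```python
def choose_kem(rotation_days, retention_years, long_lived_key=False):
    """ML-KEM-768 only for short-lived keys guarding short-retention data."""
    if not long_lived_key and rotation_days < 90 and retention_years < 5:
        return "ML-KEM-768"
    return "ML-KEM-1024"   # the default whenever there is any doubt

def choose_signature(root_ca=False):
    """ML-DSA-65 as the baseline; ML-DSA-87 as the optional root CA margin."""
    return "ML-DSA-87" if root_ca else "ML-DSA-65"

session_kem = choose_kem(rotation_days=30, retention_years=2)
swift_kem = choose_kem(rotation_days=30, retention_years=7)
```

Treating ML-KEM-1024 as the fall-through case mirrors the "when in doubt" guidance: a missing or borderline input selects the stronger parameter set.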

The algorithm standards are settled. The engineering decisions required to implement them in financial infrastructure are not complex in principle — the math is well-specified, the test vectors are published, the security levels are defined. What remains is the operational and procurement work: hardware that implements the standards correctly, testing environments that validate those implementations, and migration sequencing that accounts for the multi-party dependencies of payment networks. That work is what the coming 3–5 years require.
