Qubits are inherently sensitive to noise, and even the most robust qubits are expected to exhibit noise levels orders of magnitude above what’s required for practical quantum applications.
This noise problem is addressed with quantum error correction (QEC), a collection of techniques that can identify and eliminate errors in a controlled way, so long as qubits can be designed with noise levels below some more achievable threshold level. QEC codes encode the information of each logical qubit redundantly across many physical qubits, and the resulting logical qubits remain robust against errors.
In this approach, errors are corrected by repeatedly measuring select groups of the many physical qubits making up a logical qubit and then using the measurement results in conventional algorithms that infer where errors occurred, a process called decoding. Decoding is computationally challenging and is one of the primary bottlenecks of QEC techniques.
Building decoders that are fast, accurate, and scalable is essential to realize useful quantum computers. This is a prime example of the many cases where AI enables quantum computing by addressing challenges related to QEC, compilation, algorithm development, and more.
At GTC 25, NVIDIA announced a transformer-based AI decoder developed using the NVIDIA CUDA-Q platform in collaboration with QuEra. Not only does the decoder beat a state-of-the-art decoder but it also provides a promising path towards scalable decoding in the future.
The results of this work demonstrate how critical AI supercomputers such as the recently announced NVIDIA Accelerated Quantum Research Center (NVAQC) are to both the development and deployment of quantum error correction techniques.
Decoding decoders
QEC codes are usually characterized with the [[n,k,d]] nomenclature, where n is the number of physical qubits, k the number of logical qubits, and d the distance. Higher distance codes are capable of correcting more errors but generally require more complex encoding schemes and larger physical qubit overheads.
The first step to correct errors acting on a set of physical qubits (Figure 1) is to perform a select set of measurements on subsets of them, which collectively produce a so-called error syndrome.
Syndrome data is then transferred to a classical processor where the decoding takes place. The goal of the decoder is to infer from the error syndrome if and where any errors occurred. The decoder then outputs best guesses for error locations, which can be tracked and eventually used to determine corrective operations to be sent back to the QPU. This cycle is continually repeated for the full duration of a quantum algorithm.
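As a concrete, simplified picture of this cycle, the sketch below samples error syndromes from a standard distance-3 surface-code memory circuit using stim (the simulator mentioned later in this post) and decodes them with PyMatching, a common open-source matching decoder. This is not the MLE or AI decoder discussed below, and the circuit and noise level are placeholder choices; it only illustrates the syndrome-to-decode-to-correction loop.

```python
# Illustrative decode loop using stim (circuit-level noise sampling) and
# PyMatching (an open-source matching decoder). This is NOT the MLE or AI
# decoder discussed in this post; it only sketches the syndrome -> decode ->
# correction cycle on a standard surface-code memory circuit.
import stim
import pymatching

# Distance-3 memory experiment with a simple depolarizing noise model.
circuit = stim.Circuit.generated(
    "surface_code:rotated_memory_z",
    distance=3,
    rounds=3,
    after_clifford_depolarization=0.001,
)

# Build a decoder from the circuit's detector error model.
dem = circuit.detector_error_model(decompose_errors=True)
matcher = pymatching.Matching.from_detector_error_model(dem)

# Sample error syndromes (detection events) and the true logical outcomes.
sampler = circuit.compile_detector_sampler()
syndromes, observables = sampler.sample(shots=10_000, separate_observables=True)

# Decode: infer which logical corrections to apply, then score accuracy.
predictions = matcher.decode_batch(syndromes)
logical_errors = (predictions != observables).any(axis=1).sum()
print(f"logical error rate ~ {logical_errors / 10_000:.4f}")
```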

It’s important for a decoder to be accurate: if the decoder makes a mistake, errors are either missed or introduced by inappropriate corrections, potentially corrupting the encoded information and ruining the algorithm.
A high-accuracy decoder reduces the logical error rate, so the same target error rate can be achieved with lower-distance codes than other decoders might require, reducing the number of physical qubits needed.
In addition to being accurate, decoders must be fast and scalable. If a decoder cannot keep up with the incoming syndrome data, a backlog occurs, causing errors to snowball and making error correction impossible. This also places strict latency requirements on how quickly a decoder must transfer data to and from the QPU.
AI’s aptitude for complex pattern recognition and its natural scalability make it one of the most promising tools for building decoders that are fast, accurate, and able to scale to handle the millions of physical qubits believed to be needed for useful quantum computations.
Making magic with data from QuEra’s QPU
Implementing quantum algorithms on QPUs requires a fault-tolerant, universal gate set, from which any algorithm can be programmed. In most fault-tolerant approaches to quantum computing, the strategy for fault-tolerantly performing operations rests on being able to prepare a so-called magic state. These special states are a resource consumed during the computation to enable arbitrary, or universal, quantum computation.
In fact, a large part of the work done in a fault-tolerant quantum computer is producing magic states. But there seems to be a Catch-22. How can magic states be reliably produced without access to the fault-tolerant operations that they themselves promise to deliver?
One solution is magic state distillation (MSD). The MSD protocol takes a number of noisy magic states as inputs and “distills” them to a single, higher fidelity magic state using a sequence of operations that are essentially a simple quantum error correcting code, ensuring that a high quality magic state is produced.
But MSD is costly. Multiple rounds of MSD are often necessary to produce magic states of sufficient fidelity to be used in an algorithm, and the resources required grow exponentially with the number of rounds needed to reach a sufficiently noise-free magic state. This means that any way to boost the efficiency of each round of MSD, or the fidelity of its outputs, has the potential to dramatically reduce the overheads of fault-tolerant quantum computations.
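As a rough illustration of that exponential growth (generic distillation scaling, not the specific overheads of QuEra’s protocol): if one round of an $n$-to-1 protocol maps an input error rate $p$ to roughly $c\,p^{m}$ with $m > 1$, then after $k$ nested rounds the output error is suppressed to

$$p_k \approx c^{\frac{m^k - 1}{m - 1}}\, p^{\,m^k},$$

while the number of noisy input magic states consumed grows exponentially as $n^k$.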
QuEra’s recent paper, Experimental Demonstration of Logical Magic State Distillation, showcased an experiment performing magic state distillation with logical qubits on their neutral atom QPU. They began by encoding 35 neutral atom qubits into five logical magic states, then used a 5-to-1 protocol (Figure 2) that distills a single, higher fidelity magic state.
![A diagram shows the [[7,1,3]] color code which is used to prepare the five logical qubits for the 5-to-1 magic state distillation circuit.](https://developer-blogs.nvidia.com/wp-content/uploads/2025/03/five-logical-magical-states-1024x343.png)
(source: Experimental Demonstration of Logical Magic State Distillation)
QuEra used the [[7,1,3]] color code (Figure 2) to encode each logical qubit. The MSD procedure contains two-qubit logical gates, which can also propagate errors between the logical qubits.
To boost the fidelity of the output, QuEra used a method called correlated decoding, in which syndromes from all logical qubits are interpreted simultaneously to infer errors, rather than decoding each logical qubit individually. The benefit is that correlated errors caused by the logical two-qubit gates can be decoded, improving decoder accuracy. This requires a powerful decoder that can interpret syndromes from all 35 physical qubits.
QuEra’s approach was to solve the correlated decoding problem using a most-likely error (MLE) decoder. MLE is a high-accuracy decoding algorithm, but it requires solving an NP-hard problem, so its state-of-the-art performance comes at the cost of a runtime that grows exponentially with the size of the code.
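To make that scaling concrete, here is a minimal toy sketch of exact MLE decoding by brute force. The check matrix and error priors below are hypothetical placeholders, not QuEra’s [[7,1,3]] color-code circuit or its noise model; the point is only that exhaustively searching error patterns grows exponentially with the number of error mechanisms.

```python
# Toy most-likely-error (MLE) decoder by brute force, to illustrate why exact
# MLE decoding scales exponentially. H and the priors are placeholders.
import itertools
import numpy as np

H = np.array([[1, 1, 0, 1, 0],     # each row: one parity check (syndrome bit)
              [0, 1, 1, 0, 1],
              [1, 0, 1, 1, 1]], dtype=np.uint8)
priors = np.array([0.01, 0.02, 0.01, 0.03, 0.02])  # per-mechanism error probabilities

def mle_decode(syndrome: np.ndarray) -> np.ndarray:
    """Return the most likely error pattern consistent with the syndrome."""
    best_err, best_logp = None, -np.inf
    n = H.shape[1]
    # Exhaustive search over all 2^n error patterns: exponential in n.
    for bits in itertools.product([0, 1], repeat=n):
        err = np.array(bits, dtype=np.uint8)
        if not np.array_equal(H @ err % 2, syndrome):
            continue  # inconsistent with the observed syndrome
        logp = np.sum(np.where(err == 1, np.log(priors), np.log(1 - priors)))
        if logp > best_logp:
            best_err, best_logp = err, logp
    return best_err

print(mle_decode(np.array([1, 0, 1], dtype=np.uint8)))
```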
This is a crippling problem for MLE decoders: they don’t scale beyond the smallest code distances. For MSD with the [[85,1,5]] distance-5 color code, MLE takes over 100 ms, which is well outside the time requirements for any practical use.
NVIDIA and QuEra have developed a transformer-based AI decoder, trained with NVIDIA PhysicsNeMo, to solve this problem. For QuEra’s distance-3 MSD circuit, the NVIDIA decoder outperformed the high-performing but poorly scaling MLE decoder.

Figure 3 plots the MLE and NVIDIA decoder results as a function of acceptance ratio: the fraction of successful experimental runs where the confidence of the syndrome is above a certain threshold. A larger acceptance ratio results in higher production rates of magic states. Figure 3 shows that, for a given target magic state fidelity, the NVIDIA decoder can operate more efficiently and produce more magic states than MLE in the high-acceptance-ratio region.
Even a sub-optimal but scalable decoder would be an improvement on MLE for practical applications, so the NVIDIA decoder’s potential to scale while also outperforming MLE is especially promising. It provides a powerful new tool that might scale to the code distances researchers need to explore much more powerful error correction codes.
One of the reasons the NVIDIA decoder can outperform MLE is its carefully designed architecture. The attention mechanism of a transformer dynamically models dependencies between different inputs, making it highly effective at capturing complex interactions. A graph neural network (GNN) is useful for aggregating information from neighboring nodes in the graph structure that represents the relationships between syndromes and logical qubits.
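The sketch below (plain PyTorch) illustrates these two ingredients, a round of graph-based neighbor aggregation followed by transformer self-attention over the syndrome bits. It is not the actual NVIDIA/QuEra decoder architecture; the adjacency, layer sizes, and output head are placeholder choices.

```python
# Illustrative graph + transformer decoder skeleton. NOT the actual
# NVIDIA/QuEra decoder; shapes and hyperparameters are placeholders.
import torch
import torch.nn as nn

class ToyGraphTransformerDecoder(nn.Module):
    def __init__(self, num_syndrome_bits: int, adjacency: torch.Tensor,
                 dim: int = 64, num_logical: int = 5):
        super().__init__()
        # Fixed adjacency between syndrome bits (e.g., checks sharing qubits).
        self.register_buffer("adj", adjacency.float())
        self.embed = nn.Embedding(2, dim)          # embed each syndrome bit (0/1)
        self.gnn_lin = nn.Linear(dim, dim)         # one round of neighbor aggregation
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim * num_syndrome_bits, num_logical)

    def forward(self, syndromes: torch.Tensor) -> torch.Tensor:
        # syndromes: (batch, num_syndrome_bits) of 0/1 integers
        x = self.embed(syndromes)                       # (B, S, dim)
        x = x + torch.relu(self.gnn_lin(self.adj @ x))  # aggregate neighbor features
        x = self.attn(x)                                # model long-range correlations
        return self.head(x.flatten(1))                  # logits: one flip per logical qubit

# Usage: 12 hypothetical syndrome bits on a ring graph, 5 logical qubits.
S = 12
adj = torch.zeros(S, S)
for i in range(S):
    adj[i, (i + 1) % S] = adj[i, (i - 1) % S] = 1.0
model = ToyGraphTransformerDecoder(S, adj)
logits = model(torch.randint(0, 2, (8, S)))
print(logits.shape)  # torch.Size([8, 5])
```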
Another major benefit of the NVIDIA decoder is that it can be trained primarily with synthetic data generated by simulations, requiring less experimental data drawn from actual quantum hardware. This avoids the need to perform many costly runs on limited QPU resources.
The QuEra hardware team generated valuable data to validate, and in the future fine-tune, the performance of the NVIDIA decoder, but could otherwise focus their machine time on other work while the model’s training data was generated with stim, a stabilizer circuit simulator developed by Google and integrated with the CUDA-Q platform.
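The snippet below sketches what such synthetic training data can look like: stim samples syndrome measurements together with the true logical outcomes, which serve as labeled (input, target) pairs for supervised training. The circuit and noise level are placeholders; the actual workflow described above also used CUDA-Q simulators for richer noise and QuEra’s hardware data for validation.

```python
# Sketch of producing synthetic (syndrome, label) training pairs for a learned
# decoder with stim's Clifford sampler. Circuit and noise are placeholders.
import stim
import torch
from torch.utils.data import DataLoader, TensorDataset

circuit = stim.Circuit.generated(
    "surface_code:rotated_memory_z",
    distance=3, rounds=3,
    after_clifford_depolarization=0.002,
)

sampler = circuit.compile_detector_sampler()
detection_events, observable_flips = sampler.sample(
    shots=1_000_000, separate_observables=True)

# Inputs: detector outcomes; targets: whether each logical observable flipped.
X = torch.from_numpy(detection_events).float()
y = torch.from_numpy(observable_flips).float()
loader = DataLoader(TensorDataset(X, y), batch_size=4096, shuffle=True)
print(X.shape, y.shape)
```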
Scaling the NVIDIA decoder with AI supercomputers
Higher-distance codes are necessary to produce sufficiently low logical error rates (Figure 4). MLE can’t scale to this point, so NVIDIA and QuEra are working to scale the NVIDIA decoder by leveraging AI supercomputing to generate the training data needed for higher code distances, as well as to provide the AI architecture for parallelizing model training and inference. Even for the smallest (d=3) case, the MLE decoder takes tens of milliseconds, while the NVIDIA decoder completes the task in under a millisecond.

The burden of decoding in MLE is replaced with the challenge of training: the amount of data necessary to train the NVIDIA decoder grows exponentially with code distance. The team is working to refine and scale the decoder using data generated by CUDA-Q’s GPU-accelerated, trajectory-based simulators. NVIDIA researchers have developed novel sampling algorithms with CUDA-Q to efficiently generate massive, high-quality data sets that include more realistic non-Clifford noise, a million times faster than previously possible.
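The specialized sampling algorithms referenced above aren’t shown here, but as a minimal sketch of the baseline noisy-sampling interface they build on, here is trajectory-style noisy sampling in CUDA-Q Python. The kernel, noise channel, and shot count are placeholder choices; recent CUDA-Q releases apply noise models on the GPU "nvidia" target via trajectory sampling, with "density-matrix-cpu" available as a CPU fallback.

```python
# Minimal sketch of noisy sampling in CUDA-Q Python. The kernel and noise
# model are placeholders, not the 35-qubit MSD circuit or its calibrated noise.
import cudaq

# GPU statevector target; noise models are applied via trajectory sampling in
# recent CUDA-Q releases. Use "density-matrix-cpu" as a CPU fallback.
cudaq.set_target("nvidia")

noise = cudaq.NoiseModel()
# Attach a 1% depolarizing channel after every Hadamard on qubit 0 (placeholder).
noise.add_channel("h", [0], cudaq.DepolarizationChannel(0.01))

@cudaq.kernel
def bell():
    q = cudaq.qvector(2)
    h(q[0])
    x.ctrl(q[0], q[1])
    mz(q)

# Each shot samples one noise trajectory of the circuit.
counts = cudaq.sample(bell, shots_count=100_000, noise_model=noise)
print(counts)
```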
With these algorithms, a single NVIDIA DGX H100 node can generate just over a billion shots per hour for statevector simulation of the 35-qubit MSD circuit. Data generation can be pushed to the limits with entire supercomputers such as the NVIDIA Eos supercomputer, which can generate data at the impressive rate of half a trillion shots per hour.
CUDA-Q’s noisy tensor network backends can also scale circuit simulations to the qubit counts necessary to produce data for training larger code distances. Training the NVIDIA decoder for the distance-3 code completed in one hour on 42 H100 GPUs. Larger distances will be much more challenging to train and will require the power of AI supercomputing, in addition to fine-tuning with experimental data from QuEra’s QPU.
The NVAQC will be an indispensable resource for NVIDIA and QuEra’s ongoing efforts to scale the NVIDIA decoder by enabling data generation and training at massive scale with state-of-the-art NVIDIA Blackwell GPUs.
Learn more about the NVIDIA decoder and quantum computing
NVIDIA is working with partners like QuEra to deliver meaningful quantum error correction and, more broadly, AI-for-quantum breakthroughs that reduce the timeline to useful quantum computing.
For more information about other NVIDIA tools to enable your quantum error correction research, see CUDA-Q QEC.
For more information about all of the other work NVIDIA is doing to accelerate quantum computing development, see NVIDIA Quantum Computing.