
NVIDIA researchers have introduced a compression technique called KVTC (KV Cache Transform Coding), designed to dramatically reduce the memory footprint of large language models (LLMs) during long conversations. Industry experts believe KVTC could become as standard as video compression, enabling AI systems to handle ever-longer conversations efficiently and at scale.
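To see why compressing the KV cache matters for long conversations, a quick back-of-envelope calculation helps. The sketch below uses the standard KV-cache sizing formula (keys plus values, per layer, per attention head, per token); the model dimensions are illustrative, roughly those of a 7B-class transformer, and are not taken from the article.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV-cache size for one sequence: 2 (K and V) x layers x heads x head_dim x tokens."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative config: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes/element)
per_token = kv_cache_bytes(32, 32, 128, 1)         # bytes added per generated token
ctx_128k = kv_cache_bytes(32, 32, 128, 128_000)    # a long, 128k-token conversation

print(f"{per_token / 1024:.0f} KiB per token")     # 512 KiB per token
print(f"{ctx_128k / 2**30:.1f} GiB at 128k tokens")  # 62.5 GiB at 128k tokens
```

At these (assumed) dimensions, a single 128k-token conversation already consumes tens of gigabytes of GPU memory, which is why a transform-coding scheme that shrinks the cache by even a modest factor directly translates into longer contexts or more concurrent users per GPU.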

In recent years, the focus of AI has shifted from training to inference, and NVIDIA is aiming to reshape this space with the LPU (Language Processing Unit) chips it announced at last week's GTC conference. During the event, NVIDIA's Chief Scientist Bill Dally sat down with Google's Chief Scientist Jeff Dean for a deep technical discussion. Dally highlighted that the real bottleneck in AI inference today isn't raw compute power but communication overhead; removing that overhead would represent a massive acceleration in AI responsiveness, making real-time, high-throughput inference practical for everyday use.
NVIDIA is shifting its CPU strategy from internal use only to a dual approach, serving both its own needs and external customers. This transformation begins with the Vera CPU, based on the Arm instruction set and NVIDIA's custom architecture. It marks NVIDIA's direct challenge to Intel and AMD, while also competing against other Arm-based custom processors.

Rosa CPU – Coming in 2028

According to NVIDIA's roadmap, the next step after Vera is the Rosa CPU, expected in 2028. The name "Rosa" is short for Rosalyn, honoring Nobel Prize–winning medical physicist Rosalyn Sussman Yalow (1921–2011). She pioneered radioimmunoassay technology, enabling detection of viruses, hormones, and drug levels in blood without biological assays. Remarkably, she waived patent rights, allowing free adoption worldwide. Despite...
As GDC 2026 approaches, NVIDIA has not announced any new gaming GPUs (the RTX 50 SUPER series is officially dead), but it did share an interesting slide. NVIDIA claims that the path-traced rendering performance of the Blackwell RTX 50 series is already 10,000 times greater than its chosen baseline, and that this will eventually rise to a staggering 1,000,000 times. However, NVIDIA did not specify when this would be achieved, and the claim comes with caveats. First, the baseline for these multipliers is not the first generation of hardware ray tracing cards (the Turing RTX 20 series), but the Pascal GTX 10 series, which had no ray tracing engine and could only run ray tracing in software. NVIDIA...

The United States is poised to approve exports of NVIDIA's H200 AI chip to China, which would make it the most powerful chip NVIDIA can offer the Chinese market. According to the latest reports, the key factor behind this decision is the U.S. government's assessment that the national security risks are relatively low, since Huawei, NVIDIA's main competitor in China, already offers AI systems with comparable performance. Sources revealed that in evaluating whether to approve H200 exports, the U.S. considered multiple scenarios, ranging from a complete ban on AI chip sales to China to allowing all products to flood the Chinese market and overwhelm Huawei. Ultimately, the policy backed by President Trump is to approve H200 sales to China, while reserving NVIDIA's...