A curated list of the best CUDA programming books
Updated May 19, 2026
Real-time stream editing pipeline powered by the FLUX.2-klein-4B model, optimized for consumer GPUs
A GPU-Accelerated First-Order LP Solver
The intelligent OptiScaler installer Linux gamers needed. Automates FSR4, XeSS & DLSS configuration with GPU-optimized profiles for RDNA3/4, Arc & RTX cards.
GVProf: A Value Profiler for GPU-based Clusters
Boost Valheim's FPS to forge a smoother Viking journey!
Fast waifu2x converter with GPU optimization
AI Infrastructure Performance Engineer Learning Track - GPU optimization, inference optimization, and cost reduction
Handwritten Flash Attention 2 CUDA kernel for Blackwell (SM120) with TMA, swizzle, double buffering & warp specialization
KeSSie HUGE Context Semantic recall for Large Language Models
The GPU Optimizer for ML Models enhances GPU performance for machine learning. It offers advanced scheduling, real-time monitoring, and efficient resource management through a user-friendly web interface and robust API, integrating big data technologies for seamless data processing and model optimization. @NVIDIA
Production-ready checklists and frameworks for deploying LLMs, GenAI models, and AI infrastructure. Covers vLLM, Kubernetes, GPU optimization, observability, compliance, and Day-0 to Day-2 operations.
First open-source real-time face filter app using MediaPipe FaceMesh for high-performance, GPU-accelerated effects.
Bilingual CUDA SGEMM optimization tutorial and reference implementation, from naive kernels to Tensor Core WMMA
This is a short course covering GPU optimization techniques for LLM inference
Physics-based computation at scale — Hamiltonian dynamics, spectral theory, and statistical mechanics powering optimization, drug discovery, genomics, molecular proof, and agentic commerce.
Reproduces and optimizes common deep learning operators, with implementations in both CUDA and Triton, for learning and reference
AI Infrastructure Senior Engineer Learning Track - Advanced ML infrastructure and technical leadership
Text Embeddings Inference optimized for NVIDIA Jetson Orin (SM87) and L4 GPU (SM89) with community PRs integration