A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support
Updated May 21, 2026 - Python
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Computations and statistics on manifolds with geometric structures.
Nabla: High-Performance Scientific Computing
An implementation of a Transformer written entirely in Triton
A fast, deterministic, all-Python Lennard-Jones particle simulator that uses Numba for GPU-accelerated computation.
A Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent itemsets in a set of transactions.
Boilerplate for GPU-accelerated TensorFlow and PyTorch code on M1 MacBooks
🌟 Compiler for vertex-centric programming of GNNs/TGNNs
A PyCUDA implementation of forward propagation for convolutional neural networks
Fundamentals of heterogeneous parallel programming with CUDA C/C++ at the beginner level.
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Python Edition)"
Learn Triton by building FlashAttention from scratch — V2 kernels, persistent threads, mask DSL, profiling toolkit, bilingual docs
VGG16 inference implementation using TensorFlow, NumPy, and PyCUDA
A package to run commands when GPU resources are available
A helper package to easily time Numba CUDA GPU events ⌛
Real-time object detection app using YOLOv5/YOLOv8 with a custom UI built from scratch using Pyglet and OpenGL. UI animations were made in Adobe After Effects, rendered as GIFs, and integrated via uxElements.py. Multi-core processing enables live capture, detection, and display with low latency. Uses the Open Images v7 dataset. Train mode is a work in progress.
Simplify GPU Setup: Drivers, CUDA, Frameworks, and more!
An opinionated, end‑to‑end tutorial project for learning Reinforcement Learning (RL) from first principles to deployment. No notebooks. Everything is an explicit, inspectable Python script you can diff, profile, containerize, and ship.
A Taichi component for automatically compiling and launching compute graphs.