Focused on AI infrastructure, CUDA kernels, and high-performance systems engineering
🔬 Focus: AI Infrastructure · CUDA Kernels · LLM Inference · HPC Systems
🌱 Currently: Building high-throughput inference pipelines and GPU-first systems
🤝 Open to: AI infrastructure, performance engineering, research collaboration, and open-source collaboration
I build AI infrastructure and GPU-first high-performance systems with C++/CUDA, Python, and Go, with a focus on GPU kernel optimization and high-performance systems engineering in practice.
- 🔥 GPU Kernel Engineering: CUDA/Triton kernels for FlashAttention, GEMM, quantization, and memory-aware operator design
- 🧠 AI Inference Systems: lightweight LLM runtimes, KV Cache, W8A16/FP8 quantization, and inference path optimization
- ⚡ High-Performance Computing: simulation, rendering, and image-processing pipelines tuned for throughput and scalability
- 🌐 Real-time Systems: RTC signaling, streaming applications, and digital human platforms with system-level integration
Current focus: inference acceleration, kernel fusion, and end-to-end GPU system design.
Featured Projects: Start here for the quickest overview of my work in bioinformatics, HPC, AI inference, and developer tooling.
To quickly gauge my technical focus and representative work, start with the four projects below.
Best entry points for collaboration, hiring conversations, and technical review.
|
Curated collection of Cursor AI coding rules | 132+ rules covering 32 domains, including frontend, backend, AI, and DevOps |
Browser-native 3D digital human engine with voice, vision & dialogue. Zero-config, offline-ready. |
|
End-to-end Metagenomic Intelligence and Comprehensive Omics Suite (Mammoth Cup 2024) |
High-performance FASTQ compression with 3.97x ratio and O(1) random access. C++23, ABC+SCM algorithms. |
|
Systematic knowledge base for bioinformatics (Chinese community) |
End-to-end Metagenomic Intelligence and Comprehensive Omics Suite |
|
High-performance FASTQ compression with 3.97x ratio and O(1) random access. C++23, ABC+SCM. |
High-performance FASTQ QC toolkit (stat/filter/trim); zero-copy I/O, TBB pipeline, C++23. |
|
Curated bioinformatics algorithms knowledge base with complexity analysis, CLI tools, and bilingual docs. |
|
Bilingual CUDA SGEMM optimization tutorial, from naive kernels to Tensor Core WMMA. |
High-performance C++ optimization guide with lock-free data structures, SIMD, and memory optimization. |
|
Header-only C++23 bit manipulation library with SIMD acceleration (SSE2/AVX2/AVX-512/NEON). |
Classic lossless compression algorithms in C++17, Go, and Rust with cross-language binary verification. |
|
HPC textbooks covering MPI, OpenMP, CUDA, and Scientific Computing (CC-BY 4.0) |
Compression Knowledge Base: Algorithm Theory, Performance Benchmarks & C++ Examples |
|
Curated collection of Cursor AI coding rules | 132+ rules covering 32 domains, including frontend, backend, AI, and DevOps |
Archive-grade .mdc rule library for Cursor AI — 26 production-ready rules |
|
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows |
Offline-first bookmark cleaner: rules-first, ML-assisted, LLM-optional |
|
Multi-Model Real-Time Visual Recognition System with REST API and WebSocket Streaming |
Privacy-first diagram editor with local WASM rendering, Kroki full mode, sharing, and export. |
|
Browser-native 3D digital human engine with voice, vision & dialogue. Zero-config, offline-ready. |
Lightweight WebRTC Demo: Go Signaling Server + Vanilla JavaScript Client, OpenSpec-Driven |
|
Browser-based memory training PWA with FSRS-4.5 spaced repetition, N-back training, and adaptive difficulty |
|
Background in communications and information engineering. |
Engineering experience across medical imaging, real-time audio/video (RTC) systems, and genomic-scale data workflows. |
Reach out if you're building AI infrastructure, inference acceleration, GPU systems, or performance-critical tooling.
Open to technical collaboration, engineering roles, research discussions, and thoughtful open-source work.



