LessUp

Focused on AI infrastructure, CUDA kernels, and high-performance systems engineering

🔬 Focus: AI Infrastructure · CUDA Kernels · LLM Inference · HPC Systems
🌱 Currently: Building high-throughput inference pipelines and GPU-first systems
🤝 Open to: AI infrastructure, performance engineering, research, and open-source collaboration

👨‍💻 About Me

I build AI infrastructure and GPU-first high-performance systems with C++/CUDA, Python, and Go, with a focus on GPU operator optimization and hands-on high-performance systems engineering.

🔥 GPU Kernel Engineering: CUDA/Triton kernels for FlashAttention, GEMM, quantization, and memory-aware operator design
🧠 AI Inference Systems: lightweight LLM runtimes, KV Cache, W8A16/FP8 quantization, and inference path optimization
⚡ High-Performance Computing: simulation, rendering, and image-processing pipelines tuned for throughput and scalability
🌐 Real-time Systems: RTC signaling, streaming applications, and digital human platforms with system-level integration

Currently: inference acceleration, kernel fusion, and end-to-end GPU system design.

🚀 Selected Work

Featured Projects: Start here for the quickest overview of my work in bioinformatics, HPC, AI inference, and developer tooling. The four projects below are the fastest way to gauge my technical focus and representative work.
Best entry points for collaboration, hiring conversations, and technical review.

⭐ Awesome CursorRules 中文

A curated collection of Cursor AI coding rules | 132+ rules covering 32 domains, including frontend, backend, AI, and DevOps

⭐ Meta Human

Browser-native 3D digital human engine with voice, vision, and dialogue. Zero-config, offline-ready.

⭐ MICOS-2024

End-to-end Metagenomic Intelligence and Comprehensive Omics Suite (Mammoth Cup 2024 competition entry)

⭐ FASTQ Compressor

High-performance FASTQ compression with a 3.97x compression ratio and O(1) random access. C++23, ABC+SCM algorithms.

🧬 Bioinformatics & Genomics

🟢 Wiki-Bioinfo

Systematic bioinformatics knowledge base for the Chinese-speaking community

🟢 MICOS-2024

End-to-end Metagenomic Intelligence and Comprehensive Omics Suite (Mammoth Cup 2024)

🟢 FASTQ Compressor

High-performance FASTQ compression with a 3.97x compression ratio and O(1) random access. C++23, ABC+SCM.

🟢 FASTQ Tools

High-performance FASTQ QC toolkit (stat/filter/trim); zero-copy I/O, TBB pipeline, C++23.

🟢 Awesome Bioinfo Algorithms

Curated bioinformatics algorithms knowledge base with complexity analysis, CLI maintenance tools, and bilingual docs.

⚡ CUDA & HPC

🔷 SGEMM Optimization

Bilingual CUDA SGEMM optimization tutorial and reference implementation, from naive kernels to Tensor Core WMMA.

🔷 C++ High Performance Guide

High-performance C++ optimization guide with examples of lock-free data structures, SIMD, and memory optimization.

🔷 BitCal

Header-only C++23 bit manipulation library with SIMD acceleration (SSE2/AVX2/AVX-512/NEON).

🔷 Compress Kit

Classic lossless compression algorithms in C++17, Go, and Rust with cross-language binary verification.

🔷 The Art of HPC 中文翻译

Chinese translation of the "The Art of HPC" textbook series, covering MPI, OpenMP, CUDA, and scientific computing (CC-BY 4.0)

🔷 Awesome Compression

Compression knowledge base: algorithm theory, performance benchmarks, and C++ examples

🤖 AI & Developer Tooling

🟣 Awesome CursorRules 中文

A curated collection of Cursor AI coding rules | 132+ rules covering 32 domains, including frontend, backend, AI, and DevOps

🟣 Cursor Rules

Archive-grade .mdc rule library for Cursor AI: 26 production-ready rules with a low-drift design

🟣 Awesome Claude Skills 中文

A curated list of Claude Skills, resources, and tools for customizing Claude AI workflows

🟣 Bookmarks Cleaner

Offline-first bookmark cleaner and classifier: rules-first, ML-assisted, LLM-optional

🟣 YOLO-Toys

Multi-model real-time visual recognition system with a REST API and WebSocket streaming inference

🟣 Graph Viewer

Privacy-first diagram editor with local WASM rendering, Kroki full mode, sharing, and export.

🎓 Background & Experience

🎓 Education

Xidian University

Background in communication and information engineering.

💼 Experience

Mindray · ZEGO · BGI

Engineering across medical imaging, real-time audio/video (RTC) systems, and genomic-scale data workflows.

🛠️ Tech Stack

Category	Technologies
Languages
AI & HPC	CUDA · Triton · cuBLAS · Tensor Core · WebGPU · Quantization
System & DevOps	Inference pipelines · Performance tuning
Web & Frontend	Real-time apps · Visualization

📊 Signals & Activity

📫 Collaboration & Contact

Reach out if you're building AI infrastructure, inference acceleration, GPU systems, or performance-critical tooling.
Open to technical collaboration, engineering roles, research discussions, and thoughtful open-source work.