
LessUp/README.md

Focused on AI infrastructure, CUDA kernels, and high-performance systems engineering

🔬 Focus: AI Infrastructure · CUDA Kernels · LLM Inference · HPC Systems
🌱 Currently: Building high-throughput inference pipelines and GPU-first systems
🤝 Open to: AI infrastructure, performance engineering, research collaboration, and open-source collaboration








👨‍💻 About Me


I build AI infrastructure and GPU-first high-performance systems with C++/CUDA, Python, and Go. My main focus is GPU operator optimization and high-performance systems engineering in practice.

  • 🔥 GPU Kernel Engineering: CUDA/Triton kernels for FlashAttention, GEMM, quantization, and memory-aware operator design
  • 🧠 AI Inference Systems: lightweight LLM runtimes, KV Cache, W8A16/FP8 quantization, and inference path optimization
  • High-Performance Computing: simulation, rendering, and image-processing pipelines tuned for throughput and scalability
  • 🌐 Real-time Systems: RTC signaling, streaming applications, and digital human platforms with system-level integration

Currently: inference acceleration, kernel fusion, and end-to-end GPU system design.



🚀 Selected Work

Featured Projects: Start with the four projects below for the quickest overview of my technical focus and representative work in bioinformatics, HPC, AI inference, and developer tooling.
They are the best entry points for collaboration, hiring conversations, and technical review.

Curated collection of Cursor AI coding rules: 132+ rules covering 32 domains, including frontend, backend, AI, and DevOps.

Stars JavaScript

Browser-native 3D digital human engine with voice, vision & dialogue. Zero-config, offline-ready.

Stars TypeScript

End-to-end Metagenomic Intelligence and Comprehensive Omics Suite (Mammoth Cup 2024 entry)

Stars R

High-performance FASTQ compression with a 3.97x compression ratio and O(1) random access. C++23, ABC+SCM algorithms.

Stars C++23

🧬 Bioinformatics & Genomics

Systematic bioinformatics knowledge base for the Chinese-speaking community

MDX Bioinformatics

End-to-end Metagenomic Intelligence and Comprehensive Omics Suite (Mammoth Cup 2024)

R Metagenomics

High-performance FASTQ compression with a 3.97x compression ratio and O(1) random access. C++23, ABC+SCM.

C++23 oneTBB

High-performance FASTQ QC toolkit (stat/filter/trim); zero-copy I/O, TBB pipeline, C++23.

C++23 Zero-Copy

Curated bioinformatics algorithms knowledge base with complexity analysis, CLI maintenance tools, and bilingual docs.

Python Algorithms

⚡ CUDA & HPC

Bilingual CUDA SGEMM optimization tutorial and reference implementation, from naive kernels to Tensor Core WMMA.

CUDA Tensor Core

High-performance C++ optimization guide with lock-free data structures, SIMD, and memory optimization examples.

C++17 SIMD

Header-only C++23 bit manipulation library with SIMD acceleration (SSE2/AVX2/AVX-512/NEON).

C++23 SIMD

Classic lossless compression algorithms in C++17, Go, and Rust with cross-language binary verification.

Go Rust

Chinese translation of the "Art of High-Performance Computing" textbook series, covering MPI, OpenMP, CUDA, and scientific computing (CC-BY 4.0)

MPI OpenMP

Compression knowledge base: algorithm theory, performance benchmarks, and C++ examples

C++17 Algorithms

🤖 AI & Developer Tooling

Curated collection of Cursor AI coding rules: 132+ rules covering 32 domains, including frontend, backend, AI, and DevOps.

Stars JavaScript

Archive-grade .mdc rule library for Cursor AI: 26 production-ready rules with a low-drift design

Stars JavaScript

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

Python Claude

Offline-first bookmark cleaner and classifier: rules-first, ML-assisted, LLM-optional

Python ML

Multi-model real-time visual recognition system with a REST API and WebSocket streaming inference

Python YOLOv8

Privacy-first diagram editor with local WASM rendering, Kroki full mode, sharing, and export.

TypeScript WASM

🌐 Applications

Browser-native 3D digital human engine with voice, vision & dialogue. Zero-config, offline-ready.

Stars TypeScript

Lightweight WebRTC demo: Go signaling server + vanilla JavaScript client, OpenSpec-driven development

Go WebRTC

Browser-based memory training PWA with FSRS-4.5 spaced repetition, N-back training, and adaptive difficulty

JavaScript PWA


🎓 Background & Experience

🎓 Education

Xidian University

Background in communications and information engineering.

💼 Experience

Mindray · ZEGO · BGI

Engineering across medical imaging, RTC systems, and genomic-scale data workflows.


🛠️ Tech Stack

Category Technologies
Languages Languages
AI & HPC   CUDA · Triton · cuBLAS · Tensor Core · WebGPU · Quantization
System & DevOps   Inference pipelines · Performance tuning
Web & Frontend   Real-time apps · Visualization

📊 Signals & Activity

LessUp's GitHub stats   GitHub Streak

📫 Collaboration & Contact

Reach out if you're building AI infrastructure, inference acceleration, GPU systems, or performance-critical tooling.
Open to technical collaboration, engineering roles, research discussions, and thoughtful open-source work.
Email   GitHub


Pinned

  1. awesome-cursorrules-zh Public

    Curated collection of Cursor AI coding rules: 132+ rules covering 32 domains, including frontend, backend, AI, and DevOps.

    JavaScript · 191 stars · 27 forks

  2. meta-human Public

    Browser-native 3D digital human engine with voice, vision & dialogue. Zero-config, offline-ready AI avatar platform.

    TypeScript · 18 stars · 6 forks

  3. ⚡ GLM Coding Rush: one-click auto-purchase userscript for GLM Coding, the Zhipu coding assistant | Auto-unlock sold-out · High-speed retry · Scheduled trigger · Payment guard · Bilingual (Chinese/English) panel | Tampermonkey/Violentmonkey | Click Raw to install

    // ==UserScript==
    // @name         GLM Coding Rush - 智谱编程助手抢购脚本
    // @namespace    https://gist.github.com/LessUp
    // @version      1.1.0
    // @description  智谱 GLM Coding 一键抢购脚本 — 自动解锁售罄按钮 / 高速重试引擎 / bizId 双重校验 / 错误弹窗自动恢复 / 支付弹窗保护 / 秒级定时触发 / 可拖拽浮动面板

  4. micos-2024 Public

    End-to-end Metagenomic Intelligence and Comprehensive Omics Suite (Mammoth Cup 2024 entry)

    R · 11 stars · 3 forks

  5. cpp-high-performance-guide Public

    High-performance C++ optimization guide with lock-free data structures, SIMD, and memory optimization examples

    C++ · 6 stars · 1 fork

  6. wiki-bioinfo Public

    Systematic bioinformatics knowledge base for the Chinese-speaking community

    MDX · 5 stars · 1 fork