llama-cpp

Here are 395 public repositories matching this topic...

antoinezambelli / forge

A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows

python self-hosted agents llm llama-cpp function-calling ollama llamafile agentic-workflow agentic-ai tool-calling

Updated May 31, 2026
Python

the-crypt-keeper / can-ai-code

Self-evaluating interview for AI coders

ai transformers humaneval llm langchain llama-cpp ggml

Updated Jun 21, 2025
Python

dipampaul17 / KVSplit

Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys and 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.

metal optimization quantization m2 m3 m1 memory-optimization kv-cache apple-silicon llm generative-ai llama-cpp

Updated May 21, 2025
Python
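
The 59% figure can be sanity-checked with a little arithmetic. The sketch below is a minimal illustration, assuming that KVSplit's "8-bit keys" and "4-bit values" correspond to llama.cpp's `q8_0` and `q4_0` block formats (32 values per block sharing one fp16 scale) and that the baseline is an fp16 KV cache. The helper names and the model shape are hypothetical and are not taken from the KVSplit repository.

```python
# Assumption: "8-bit keys" = q8_0 and "4-bit values" = q4_0, i.e. blocks of
# 32 quantized values plus one fp16 scale per block; baseline is fp16 K and V.
BLOCK = 32   # elements per quantization block
F16 = 16.0   # bits per element in an unquantized fp16 cache


def bits_per_element(quant_bits: int, block: int = BLOCK, scale_bits: int = 16) -> float:
    """Quantized payload plus the block's shared fp16 scale, per element."""
    return (quant_bits * block + scale_bits) / block


K8 = bits_per_element(8)  # q8_0: 34 bytes per 32 values -> 8.5 bits
V4 = bits_per_element(4)  # q4_0: 18 bytes per 32 values -> 4.5 bits


def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, k_bits, v_bits):
    """Total K + V cache size in bytes for a single sequence."""
    elems = n_layers * n_ctx * n_kv_heads * head_dim  # elements in K (same count in V)
    return elems * (k_bits + v_bits) / 8


def reduction(k_bits: float, v_bits: float) -> float:
    """Fraction of memory saved relative to an fp16 K and V cache."""
    return 1 - (k_bits + v_bits) / (2 * F16)


print(f"K8/V4 saves {reduction(K8, V4):.1%}")  # 59.4%
print(f"K8/V8 saves {reduction(K8, K8):.1%}")  # 46.9%

# Hypothetical model shape: 32 layers, 8192-token context, 8 KV heads, head_dim 128.
shape = (32, 8192, 8, 128)
print(kv_cache_bytes(*shape, F16, F16) / 2**20)  # 1024.0 (MiB)
print(kv_cache_bytes(*shape, K8, V4) / 2**20)    # 416.0 (MiB)
```

Under these assumptions, the ideal saving works out to about 59.4%, in line with the quoted 59%. Counting the per-block scale explains why the figure is not the naive 62.5% that 12 of 32 bits would suggest.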

jlonge4 / local_llama

This repo showcases how to run a model locally and offline, free of OpenAI dependencies.

python offline artificial-intelligence machinelearning langchain llama-cpp llamaindex

Updated Jul 12, 2024
Python

spark-arena / sparkrun

sparkrun - launch, manage, and stop LLM inference workloads on NVIDIA DGX Spark systems

inference llama-cpp vllm sglang dgx-spark

Updated May 31, 2026
Python

1038lab / ComfyUI-MiniCPM

A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.

custom-nodes stable-diffusion muti-models llama-cpp comfyui gguf minicpm minicpm-v

Updated Aug 28, 2025
Python

nuance1979 / llama-server

LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.

llama chatbot-ui llamacpp llama-cpp

Updated Jun 10, 2023
Python

BjornMelin / docmind-ai-llm

DocMind AI is a powerful, open-source Streamlit application leveraging LlamaIndex, LangGraph, and local Large Language Models (LLMs) via Ollama, LMStudio, llama.cpp, or vLLM for advanced document analysis. Analyze, summarize, and extract insights from a wide array of file formats, securely and privately, all offline.

python transformers torch document-analysis ai-agents streamlit sentence-transformers hybrid-search qdrant langchain llamacpp llama-cpp vllm local-llm ollama lmstudio multimodal-embeddings private-ai-agents langgraph-supervisor-py

Updated May 27, 2026
Python

Abhi5h3k / PrivateDocBot

📚 Local PDF-Integrated Chat Bot: Secure Conversations and Document Assistance with LLM-Powered Privacy

Updated Mar 24, 2025
Python

OpenCSGs / llm-inference

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.

transformer ray deepspeed llama-cpp vllm llm-inference

Updated May 17, 2024
Python

vtuber-plan / langport

Langport is a language model inference service

api openai llama language-model tabby llm fauxpilot chatgpt langchain chatgpt-api llama-cpp

Updated Sep 9, 2024
Python

robiwan303 / babyagi

BabyAGI-🦙: Enhanced for Llama models (running 100% local) and persistent memory, with smart internet search based on BabyCatAGI and document embedding in langchain based on privateGPT

python agi artificial-intelligence artificial-general-intelligence llama reasoning task-based ai-agents serpapi openai-api autonomous-agent google-search-api llm chatgpt langchain llama-cpp babyagi

Updated Jun 4, 2023
Python

asierarranz / Cool_Demos

Demos of Google's Gemma models running locally on NVIDIA Jetson Orin Nano, from the Tokyo Dev Day (Gemma 2) to the latest Gemma 4 VLA agent with voice + vision.

nvidia gemma vla jetson voice-assistant multimodal edge-ai on-device-ai llama-cpp jetson-orin-nano

Updated Apr 17, 2026
Python

fengzhizi715 / OpenVitamin

OpenVitamin is a local-first AI execution platform that unifies Agents, Workflows, and multi-model inference into a single programmable system — designed for building real, production-grade AI applications.

agent workflow ai multi-model execution-engine ai-agents rag fastapi ai-platform local-first onnxruntime llm llama-cpp local-ai ai-platforms agent-orchestration openai-compatible agent-runtime model-routing

Updated May 15, 2026
Python

hogeheer499-commits / strix-halo-guide

Reproducible local LLM setup and benchmark evidence for AMD Strix Halo / Ryzen AI MAX+ 395: 63-98.5 t/s direct Qwen MoE, 101.1 t/s MTP.

Updated May 31, 2026
Python

spacehendrix / universal-intelligence

◉ Universal Intelligence: AI made simple.

Updated Apr 16, 2026
Python

joeynyc / spark-doctor

Local diagnostic CLI for NVIDIA DGX Spark (GB10). Detects power caps, unified memory pressure, thermal risk, Docker/runtime issues, and validates vLLM/Ollama/llama.cpp/SGLang recipes.

cli nvidia diagnostics dgx llama-cpp vllm local-llm ollama sglang gb10 dgx-spark grace-blackwell nvidia-dgx-spark

Updated May 15, 2026
Python

gyunggyung / Tiny-MoA

Running Mixture of Agents on CPU: LFM2.5 Brain (1.2B) + Falcon-R Reasoner (600M) + Tool Caller (90M). CPU-only, 16GB RAM. Lightweight AI Legion.

multilingual lightweight falcon agents moa uv on-device-ai cpu-inference llm llama-cpp mixture-of-agents tool-calling lfm2

Updated Feb 7, 2026
Python

Talnz007 / VulkanIlm

GPU-accelerated LLaMA inference wrapper for legacy Vulkan-capable systems. A Pythonic way to run AI with knowledge (Ilm) on fire (Vulkan).

machine-learning vulkan python-wrapper fastai amd-gpu intel-gpu llama-cpp gpu-inference llm-inference localllm local-ai open-source-llm llama-cpp-python gguf legacy-gpus

Updated Oct 14, 2025
Python

ossirytk / llama-cpp-chat-memory

Local character AI chatbot with Chroma vector store memory and scripts to process documents for Chroma.

chatbot spacy ner llama-cpp langchain-python chromadb chainlit llama2 llama-cpp-python gguf

Updated Oct 7, 2024
Python
