Open Source · MIT License

The open source AI engine.

A single open source binary that speaks the OpenAI, Anthropic, and ElevenLabs APIs. It runs any model on any hardware, no GPU required.

46k+ stars · MIT · 50+ backends · 25+ endpoints · 3 drop-in APIs · 100+ releases

Star on GitHub

Point your SDK at localhost

Same SDK. Three lines change.

Pick your client. The boilerplate stays the same; only the base_url changes. Your models stay on your hardware.

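Here is a minimal sketch of what "only the base_url changes" means in practice. It uses Python's standard library instead of a vendor SDK so that it stays dependency-free. It builds an OpenAI-style chat completion request but does not send it. The hosted URL, the model name and the `chat_request` helper are illustrative assumptions, not part of LocalAI.

```python
import json
import urllib.request

# Assumed endpoints: LocalAI's default port from this page vs. a hosted OpenAI-style API.
LOCAL_BASE_URL = "http://localhost:8080/v1"
HOSTED_BASE_URL = "https://api.openai.com/v1"


def chat_request(base_url: str, model: str, prompt: str,
                 api_key: str = "") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Hypothetical helper: the body follows the OpenAI Chat Completions
    format; only base_url decides which server receives it.
    """
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # OpenAI-style clients send the key as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Identical call site; swap LOCAL_BASE_URL for HOSTED_BASE_URL to go remote.
    req = chat_request(LOCAL_BASE_URL, "llama-3.2-1b-instruct:q4_k_m", "hello")
    print(req.get_method(), req.full_url)
    # Sending it requires a running server: urllib.request.urlopen(req)
```

With a vendor SDK the idea is the same: the client is constructed with a different base URL, and the rest of the code is untouched.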

One engine, any API

Drop-in compatible with what you already use.

OpenAI-compatible

Chat · embeddings · audio · images · tools · realtime · responses.

http://localhost:8080/v1

Anthropic-compatible

Messages API with streaming and tool use.

http://localhost:8080/anthropic

ElevenLabs-compatible

Text-to-speech, voices, audio streaming.

http://localhost:8080/elevenlabs

Ollama · MCP

Ollama API, plus the Model Context Protocol as both client and server.

http://localhost:8080/api
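The Ollama surface follows the same pattern. This is a minimal sketch that assumes LocalAI mirrors Ollama's own path layout under the `/api` base URL above, so chat would live at `/api/chat`. The helper name and model name are illustrative. Setting `stream` to false asks for a single JSON reply instead of Ollama's default newline-delimited stream.

```python
import json
import urllib.request

# Ollama-compatible base URL, as listed on this page.
OLLAMA_BASE_URL = "http://localhost:8080/api"


def ollama_chat_request(model: str, prompt: str,
                        base_url: str = OLLAMA_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an Ollama-style chat request.

    Assumption: LocalAI exposes Ollama's /api/chat path under its /api prefix,
    so an existing Ollama client only needs its host changed.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of a newline-delimited stream
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = ollama_chat_request("llama-3.2-1b-instruct:q4_k_m", "hello")
    print(req.get_method(), req.full_url)
```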

Six layers, one binary

From your SDK down to the metal, orchestrated by a single Go process.

L6 · Clients: apps · IDEs · CLIs · agents · existing OpenAI/Anthropic SDKs, unmodified

L5 · API layer: OpenAI · Anthropic · ElevenLabs · Ollama · MCP · Realtime, over HTTP · WS · SSE

L4 · Core: auth + quotas · router · gallery · agents · P2P · scheduler, in Go, single binary

L3 · Backend bus: gRPC · hot-swappable · process-isolated · distributed via OCI since v3.2

L2 · Backends: llama.cpp · vLLM · transformers · diffusers · whisper · MLX · 30+ more, 50+ today

L1 · Hardware: NVIDIA · AMD · Intel · Apple Silicon · Vulkan · CPU-only, auto-detected
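The API layer lists SSE as a transport. This sketch assumes that LocalAI's OpenAI-compatible chat endpoint uses OpenAI's streaming framing, since it is advertised as drop-in. Under that framing, a request with `"stream": true` returns `data:` lines that carry JSON chunks, and the stream ends with `data: [DONE]`. The code reassembles the text using only the standard library. `sse_data` and `collect_chat_stream` are hypothetical names, and the parser handles only single-line events and the first choice.

```python
import json
from typing import Iterable, Iterator


def sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the value of each `data:` field in a Server-Sent Events stream.

    Simplified: treats each data line as its own event, which matches
    OpenAI-style chat streams; comments and other fields are skipped.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # In SSE, a single space after the colon is not part of the value.
            yield value[1:] if value.startswith(" ") else value


def collect_chat_stream(lines: Iterable[str]) -> str:
    """Concatenate the text deltas of an OpenAI-style streamed chat completion."""
    parts = []
    for payload in sse_data(lines):
        if payload == "[DONE]":  # sentinel that ends OpenAI-style streams
            break
        choices = json.loads(payload).get("choices") or []
        if choices:
            # Assumes a single choice (n=1); the first delta often carries only the role.
            parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
        "",
        'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
        "",
        'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
        "",
        "data: [DONE]",
    ]
    print(collect_chat_stream(sample))  # Hello
```

On a live response from `urllib.request.urlopen`, the lines arrive as bytes, so decode each one before passing it in.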

One bus, many runtimes

50+ pluggable backends.

gRPC bus, OCI distribution. Install, remove, and update backends on the fly, and hot-swap between runtimes without restarting the engine.

Runs on what you have

Auto-detected at startup.

NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU only: all supported as first-class citizens.


NVIDIA

CUDA 12 · 13 · Jetson L4T

AMD

ROCm · HIP

Intel

oneAPI · SYCL

Apple

Metal · MLX

Vulkan

Cross-vendor GPUs

CPU

x86_64 · ARM64 first-class

25+ endpoints, four families

The full capability surface.

Generative

Chat, vision, image, video, TTS, sound, Anthropic messages.

POST /v1/chat/completions

POST /v1/images/generations

POST /v1/images/inpainting

POST /v1/audio/speech

POST /v1/messages (Anthropic)

POST /elevenlabs/sound-generation
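A request against the Anthropic-style `POST /v1/messages` endpoint listed above can be sketched with the standard library. The body fields (`model`, `max_tokens`, `messages`) and headers (`x-api-key`, `anthropic-version`) follow Anthropic's public Messages API. Two things are assumptions: that LocalAI checks the key or version header at all, and the placeholder key itself. `message_text` is a hypothetical helper that pulls the text out of the `content` blocks an Anthropic-style response returns.

```python
import json
import urllib.request

# Assumed: LocalAI on its default port, serving the path listed above.
BASE_URL = "http://localhost:8080"


def anthropic_messages_request(model: str, prompt: str, max_tokens: int = 256,
                               api_key: str = "placeholder-key",
                               base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an Anthropic Messages API request."""
    body = {
        "model": model,
        "max_tokens": max_tokens,  # required field in the Messages API
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,               # Anthropic-style auth header
        "anthropic-version": "2023-06-01",  # version header Anthropic clients send
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def message_text(response: dict) -> str:
    """Join the text blocks of an Anthropic-style response, skipping tool_use and others."""
    return "".join(block.get("text", "") for block in response.get("content", [])
                   if block.get("type") == "text")
```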

Understanding & search

STT, diarization, VAD, object detection, embeddings, rerank, tokenize.

POST /v1/audio/transcriptions

POST /v1/audio/diarization

POST /v1/vad

POST /v1/detection

POST /v1/embeddings

POST /v1/rerank

POST /v1/tokenize
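A common way to use `/v1/embeddings` for search is to embed the query and the documents, then rank the documents by cosine similarity. This sketch assumes the OpenAI embeddings request and response shapes (`input`, plus `data[].embedding` with an `index`), on the grounds that the surface is OpenAI-compatible. The helper names and the model are illustrative. `/v1/rerank` is the server-side alternative when a reranker model is installed.

```python
import json
import math
import urllib.request

# Assumed: OpenAI-compatible embeddings endpoint on LocalAI's default port.
EMBEDDINGS_URL = "http://localhost:8080/v1/embeddings"


def embeddings_request(model: str, texts: list[str],
                       url: str = EMBEDDINGS_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    body = {"model": model, "input": texts}
    return urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def vectors_from_response(response: dict) -> list[list[float]]:
    """Return embeddings in input order (each item carries its input index)."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: list[float], docs: list[list[float]]) -> list[int]:
    """Indices of docs, most similar to the query first."""
    scores = [cosine(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In a search flow you would send one request with the query followed by the documents, then call `rank(vectors[0], vectors[1:])`.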

Realtime & agentic

Realtime voice (WS/WebRTC), agents with jobs and tasks, MCP client + server, Responses API.

WS /v1/realtime

POST /v1/responses

POST /v1/mcp/chat/completions

POST /api/agents/*

POST /api/agent/jobs

POST /api/agent/tasks

Biometrics · identity

Face and voice: register, identify, verify, embed, analyze, forget.

POST /v1/face/register

POST /v1/face/identify

POST /v1/face/verify

POST /v1/voice/register

POST /v1/voice/identify

POST /v1/voice/verify

Production-ready

What else is inside.

Distributed · federated

Many full nodes, one entry point. Bootstrap over libp2p + EdgeVPN; new nodes join with a shared token.

Distributed · sharded

One model, split across machines. Weights are distributed according to each node's memory, and every node takes part in generating every token.
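The phrase "distributed according to memory" can be illustrated with a proportional layer split. This is a worked illustration under stated assumptions: layers are the unit of placement and all layers are the same size. It is not LocalAI's actual placement code, and `split_layers` is a hypothetical name. Each token's forward pass goes through every layer, so every node that holds layers takes part in every token.

```python
import math


def split_layers(n_layers: int, free_mem_gb: list[float]) -> list[int]:
    """Split n_layers across nodes in proportion to each node's free memory.

    Hypothetical helper; uses the largest-remainder method so the
    per-node counts always add up to n_layers.
    """
    total = sum(free_mem_gb)
    if total <= 0:
        raise ValueError("need at least one node with free memory")
    exact = [n_layers * mem / total for mem in free_mem_gb]
    counts = [math.floor(x) for x in exact]
    leftover = n_layers - sum(counts)
    # Give the remaining layers to the nodes with the largest fractional shares.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # 32 layers over nodes with 16 GB, 16 GB and 8 GB free.
    print(split_layers(32, [16, 16, 8]))  # [13, 13, 6]
```

The exact shares here are 12.8, 12.8 and 6.4 layers. Flooring gives 30 layers, and the two leftover layers go to the nodes with the largest fractional parts.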

Agents + MCP

Loop scheduler, planner, memory, SSE streaming. MCP as both client and server. Prebuilt agents from the Agent Hub.

Auth + RBAC

API keys, OIDC, GitHub OAuth. Admin / user / read-only roles, per endpoint.

Quotas + usage

Token budgets, rate limits, per-user attribution. Metrics export into the stack you already run.
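Rate limits like these are commonly built on a token bucket. The class below is a generic sketch of that technique, not LocalAI's implementation. The names, the per-user limits and the injectable clock are assumptions for illustration and testing. If you pass a request's LLM token count as `cost`, the same bucket acts as a refilling token budget.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Generic token-bucket limiter (illustrative, not LocalAI's code).

    capacity: largest burst allowed; rate: units refilled per second.
    """

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` units if available; return False (reject) otherwise."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per user or API key gives per-user limits and attribution.
buckets: Dict[str, TokenBucket] = {}


def check(user: str, cost: float = 1.0) -> bool:
    if user not in buckets:
        buckets[user] = TokenBucket(capacity=5, rate=1.0)  # assumed limits
    return buckets[user].allow(cost)
```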

Air-gapped

Backends and models are downloaded once via OCI; from then on, the network is optional.

Five minutes

Try it.

# 1. Pull and run (CPU-only example)
$ docker run -ti --name local-ai \
  -p 8080:8080 localai/localai:latest

# 2. Pull a model from the gallery
$ local-ai run llama-3.2-1b-instruct:q4_k_m

# 3. Call it like OpenAI
$ curl localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.2-1b-instruct:q4_k_m",
       "messages":[{"role":"user",
       "content":"hello"}]}'
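Here is the same step 3 request from Python, using only the standard library. The model name assumes that the installed model is addressed by its gallery name from step 2. `reply_text` and `ask` are hypothetical helpers. The response parsing follows the OpenAI chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"
MODEL = "llama-3.2-1b-instruct:q4_k_m"  # assumed: the gallery name from step 2


def reply_text(raw: bytes) -> str:
    """Extract the assistant's text from an OpenAI-style chat completion body."""
    payload = json.loads(raw)
    return payload["choices"][0]["message"]["content"]


def ask(prompt: str, url: str = URL, model: str = MODEL, timeout: float = 60.0) -> str:
    """POST one user message and return the reply (needs a running server)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return reply_text(resp.read())
```

Once the container from step 1 is running and the model is installed, `print(ask("hello"))` should print the model's reply.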

macOS · LocalAI.dmg: download and double-click

NVIDIA · CUDA 13: localai/localai:latest-gpu-nvidia-cuda-13

AMD · HIP/ROCm: localai/localai:latest-gpu-hipblas

Apple · Metal (auto): localai/localai:latest

Kubernetes · Helm: helm install local-ai go-skynet/local-ai
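You can script the choice between the images above. This sketch maps a vendor to the tags listed on this page and adds the usual Docker device flags. `--gpus all` needs the NVIDIA Container Toolkit, and `/dev/kfd` plus `/dev/dri` are the device nodes ROCm containers normally need. Those flags and the `nvidia-smi`/`rocminfo` PATH check are assumptions, not taken from this page. The host-side guess is also separate from the auto-detection LocalAI performs at startup.

```python
import shlex
import shutil

# Image tags from the list above; any other vendor falls back to the default tag.
IMAGES = {
    "nvidia": "localai/localai:latest-gpu-nvidia-cuda-13",
    "amd": "localai/localai:latest-gpu-hipblas",
    "default": "localai/localai:latest",
}


def detect_vendor() -> str:
    """Rough host-side guess (assumption): look for vendor tools on PATH."""
    if shutil.which("nvidia-smi"):
        return "nvidia"
    if shutil.which("rocminfo"):
        return "amd"
    return "default"


def docker_run_command(vendor: str) -> list[str]:
    """Build the quickstart's docker run argv, plus GPU passthrough flags."""
    cmd = ["docker", "run", "-ti", "--name", "local-ai", "-p", "8080:8080"]
    if vendor == "nvidia":
        cmd += ["--gpus", "all"]  # requires the NVIDIA Container Toolkit
    elif vendor == "amd":
        cmd += ["--device", "/dev/kfd", "--device", "/dev/dri"]  # ROCm device nodes
    return cmd + [IMAGES.get(vendor, IMAGES["default"])]


if __name__ == "__main__":
    print(shlex.join(docker_run_command(detect_vendor())))
```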

localai.io

Let's talk.

Open source up top. Down here, the team that built it.
