Vanna AI is an open-source, Python-based Text-to-SQL framework that uses Retrieval-Augmented Generation (RAG) + LLMs to let anyone query a database in plain English — no SQL expertise required.
```python
import vanna
from vanna.openai import OpenAI_Chat
from vanna.chromadb import ChromaDB_VectorStore

# 1. Initialize with your LLM + vector store
class MyVanna(ChromaDB_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)

vn = MyVanna(config={'api_key': 'sk-...', 'model': 'gpt-4'})

# 2. Connect to your database
vn.connect_to_postgres(host='localhost', dbname='mydb', ...)

# 3. Train on your schema (run once)
vn.train(ddl="CREATE TABLE orders (id INT, amount DECIMAL, user_id INT, ...)")
vn.train(documentation="OTIF = % of orders delivered on time and in full")
vn.train(sql="SELECT user_id, SUM(amount) FROM orders GROUP BY user_id")

# 4. Ask questions in plain English
sql = vn.generate_sql("What were our top 10 products by revenue last month?")
df = vn.run_sql(sql)
fig = vn.get_plotly_figure(df=df, question=...)
```
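The split between `vn.train(...)` and `vn.generate_sql(...)` is the RAG loop: training items are stored, the ones most relevant to a question are retrieved, and they are placed in the prompt sent to the LLM. The sketch below illustrates that idea only. It is not Vanna's implementation: `ToyTrainingStore`, `build_prompt`, and the bag-of-words similarity are hypothetical stand-ins, and a real vector store would rank by learned embeddings instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyTrainingStore:
    """Hypothetical stand-in for a vector store (word counts, not embeddings)."""

    def __init__(self):
        self.items = []  # each entry: (kind, text, token counts)

    def train(self, kind, text):
        self.items.append((kind, text, Counter(tokenize(text))))

    def retrieve(self, question, k=2):
        """Return the k stored items most similar to the question."""
        q = Counter(tokenize(question))
        ranked = sorted(self.items, key=lambda item: cosine(q, item[2]), reverse=True)
        return [(kind, text) for kind, text, _ in ranked[:k]]


def build_prompt(store, question, k=2):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"[{kind}] {text}" for kind, text in store.retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nSQL:"


store = ToyTrainingStore()
store.train("ddl", "CREATE TABLE orders (id INT, amount DECIMAL, user_id INT)")
store.train("documentation", "OTIF = % of orders delivered on time and in full")
store.train("sql", "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id")

question = "What share of orders were delivered on time?"
top_hit = store.retrieve(question, k=1)[0]
prompt = build_prompt(store, question, k=1)
print(prompt)
```

With `k=1`, the OTIF definition outranks the DDL and the example query because it shares the most words with the question. This is why richer training data tends to produce better-grounded prompts.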
| Category | Supported Options | Notes |
|---|---|---|
| LLMs | GPT-4o, Claude, Gemini, Ollama, LLaMA, Mistral | Any OpenAI-compatible API or local model |
| Vector Stores | ChromaDB, Qdrant, Pinecone, Weaviate, PgVector | Bring your own or use hosted Vanna metadata |
| Databases | PostgreSQL, MySQL, Snowflake, BigQuery, SQLite, DuckDB, Redshift, MSSQL | Any SQL-speaking DB via connectors |
| Frontends | Jupyter, Streamlit, Flask, FastAPI, Slack Bot, React | `<vanna-chat>` web component available |
| Auth / Security | JWT, OAuth, Row-Level Security, SOC 2 | Vanna 2.0: user-scoped execution built-in |
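The LLM and vector store rows can be mixed freely because of the composition pattern in the setup code: one class inherits from a store and from an LLM, and a shared base drives the workflow. The sketch below mimics that pattern with stub classes so it runs without any dependencies. `AssistantBase`, `InMemoryStore`, `CannedLLM`, and `get_context` are hypothetical names chosen for illustration, not Vanna classes or methods.

```python
class AssistantBase:
    """Hypothetical base: owns the workflow, delegates storage and generation."""

    def generate_sql(self, question):
        context = self.get_context(question)    # provided by a store mixin
        prompt = f"{context}\n-- Question: {question}"
        return self.submit_prompt(prompt)       # provided by an LLM mixin


class InMemoryStore(AssistantBase):
    """Stub vector store: keeps training text in a list and returns all of it."""

    def __init__(self, config=None):
        self.docs = []

    def train(self, ddl):
        self.docs.append(ddl)

    def get_context(self, question):
        return "\n".join(self.docs)


class CannedLLM(AssistantBase):
    """Stub LLM: records the prompt and returns a fixed reply from the config."""

    def __init__(self, config=None):
        self.reply = (config or {}).get("reply", "SELECT 1")
        self.last_prompt = None

    def submit_prompt(self, prompt):
        self.last_prompt = prompt
        return self.reply


class MyAssistant(InMemoryStore, CannedLLM):
    """Same shape as the MyVanna class: initialize each parent explicitly."""

    def __init__(self, config=None):
        InMemoryStore.__init__(self, config=config)
        CannedLLM.__init__(self, config=config)


assistant = MyAssistant(config={"reply": "SELECT COUNT(*) FROM orders"})
assistant.train(ddl="CREATE TABLE orders (id INT, amount DECIMAL, user_id INT)")
sql = assistant.generate_sql("How many orders are there?")
print(sql)
```

Under this design, swapping one backend for another means changing a single base class in the class definition. The workflow code in the base stays untouched.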
Without `vn.train(documentation=...)`, the model will guess at business terms, and guess differently each time. Invest in a business glossary as training data.
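One way to keep a business glossary maintainable is to store it as plain data and load it in a loop. The sketch below assumes only the `vn.train(documentation=...)` call shown earlier. `glossary_to_docs`, `train_glossary`, and the `RecordingVanna` stub are hypothetical helpers, and the "Revenue" definition is an invented example. The stub stands in for a real Vanna instance so the code runs without Vanna installed.

```python
def glossary_to_docs(glossary):
    """Turn {term: definition} into one documentation string per term."""
    return [f"{term} = {definition}" for term, definition in sorted(glossary.items())]


def train_glossary(vn, glossary):
    """Pass every glossary entry to vn.train(documentation=...); return the count."""
    docs = glossary_to_docs(glossary)
    for doc in docs:
        vn.train(documentation=doc)
    return len(docs)


class RecordingVanna:
    """Stub that records train() calls; replace with a real Vanna instance."""

    def __init__(self):
        self.documentation = []

    def train(self, documentation=None):
        self.documentation.append(documentation)


glossary = {
    "OTIF": "% of orders delivered on time and in full",
    "Revenue": "SUM(amount) over orders",  # hypothetical definition, for illustration
}

vn = RecordingVanna()
count = train_glossary(vn, glossary)
print(count, vn.documentation)
```

Keeping the glossary in a dictionary (or a file loaded into one) means definitions can be reviewed and versioned like any other data before they reach the model.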
`<vanna-chat>` web component for business users.
Vanna AI is one of the most compelling Text-to-SQL solutions in production today — not because it's magic, but because it's honest about being a learnable system. The RAG architecture means it improves with investment, security is taken seriously from day one, and the open-source model means you're never locked in. The real tradeoff: it rewards teams willing to invest in training data, and punishes those who don't.