Taris: The AI Assistant That Keeps Data at the Client's Side
TL;DR. Taris is an AI assistant in which client data stays with the client instead of going to the vendor. The core principle: the model is a plugin behind a stable interface, not the centre of the architecture. Inside: a vendor-neutral LLM dispatcher, hybrid RAG (BM25 + dense + RRF + cross-encoder rerank), multi-tenant Postgres with pgvector, and optional local models via Ollama. This article covers how Taris is built, why it is built that way, and where it delivers value for SMBs in the EU and CIS.
1. The Conflict: "Let's Get GPT-4 and Forget About It"
When a small business owner asks "which AI assistant should we install?", they usually get one of two extremes:
- "Get ChatGPT / Copilot Studio / Microsoft 365: it's all there." Convenient, but data goes to the vendor, customisation is limited, pricing grows opaquely, and moving to another platform means starting over.
- "We'll build it for you from scratch with LangChain." Slow, expensive, and a year later 60% of the code turns out to be model-migration glue that nobody likes.
Taris is the third path: a productised base (model dispatcher, hybrid RAG, multi-tenant Postgres, channel adapters) that we deploy for the client and leave with the client. Not SaaS. Not "build from scratch." A semi-finished product that is straightforward to adapt.
2. Who This Concerns
- SMB teams of 10–500 people who have accumulated documents, regulations, and client histories — but no AI specialists.
- Regulated niches (healthcare, occupational safety, legal consulting), where "data goes to OpenAI" means "regulatory fine."
- CIS companies with EU presence: they need one solution for two jurisdictions.
- Startups that need a shared assistant base for several internal use cases.
3. The Common Wrong Approach
What we see in 70% of the "pilots" started before we were brought in:
- Integrated a single API. When the vendor raises prices, nobody is ready to migrate.
- Fed everything into one big prompt. The lost-in-the-middle effect kicks in (Liu et al., 2023): the model "forgets" what sits in the middle of a long context.
- Built RAG on embeddings only, so exact codes and identifiers are not found.
- No eval set. Impossible to say whether quality improved after the latest "prompt improvement."
- Client documents sit in someone else's cloud without a DPA. Legal review postponed.
4. The Engineering Approach: What's Inside Taris
Architecture — four independent layers:
```mermaid
flowchart LR
    subgraph Channels
        TG[Telegram Bot]
        WEB[Web UI / PWA]
        VOICE[Voice]
        API[REST API]
    end
    subgraph Core
        GW[FastAPI Gateway]
        ORCH[Agent Orchestrator]
        DISP[LLM Dispatcher]
        KB[KB Service]
        AUTH[Auth + RBAC]
    end
    subgraph Storage
        PG[(Postgres + pgvector)]
        OBJ[(MinIO / S3)]
        LOG[(Audit log)]
    end
    subgraph Models
        LOCAL[Ollama / llama.cpp]
        CLOUD[OpenAI / Anthropic / Gemini / YandexGPT]
    end
    TG --> GW
    WEB --> GW
    VOICE --> GW
    API --> GW
    GW --> AUTH --> ORCH
    ORCH --> KB --> PG
    ORCH --> DISP
    DISP --> LOCAL
    DISP --> CLOUD
    ORCH --> LOG
```
Each layer is replaceable, and that is the key point. Channels are adapters. The model is a plugin. Storage is a backend. The orchestrator is the only place where business logic lives. If OpenAI triples its prices tomorrow, a Taris installation switches providers by editing a single config file.
4.1. LLM Dispatcher
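```python
from __future__ import annotations

from typing import Protocol

# ChatMessage, Tool and ChatCompletion are Taris' own message types (not shown here).


class LLMProvider(Protocol):
    """The only contract the dispatcher knows: every model backend implements it."""

    async def complete(
        self,
        messages: list[ChatMessage],
        *,
        max_tokens: int,
        temperature: float,
        tools: list[Tool] | None = None,
    ) -> ChatCompletion: ...
```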
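Because business code depends only on this Protocol, any class with a matching `complete` method can be plugged in, with no inheritance from a Taris base class. A minimal sketch, assuming simplified stand-ins for `ChatMessage`, `Tool` and `ChatCompletion` (the real types are not shown in this article) and a hypothetical `EchoProvider` test double in place of a real vendor client:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Protocol


# Hypothetical, simplified stand-ins for Taris' own message types.
@dataclass
class ChatMessage:
    role: str
    content: str


@dataclass
class Tool:
    name: str


@dataclass
class ChatCompletion:
    text: str
    model: str


class LLMProvider(Protocol):
    async def complete(
        self,
        messages: list[ChatMessage],
        *,
        max_tokens: int,
        temperature: float,
        tools: list[Tool] | None = None,
    ) -> ChatCompletion: ...


class EchoProvider:
    """Hypothetical test double: it matches LLMProvider structurally, no base class needed."""

    def __init__(self, model: str = "echo") -> None:
        self.model = model

    async def complete(
        self,
        messages: list[ChatMessage],
        *,
        max_tokens: int,
        temperature: float,
        tools: list[Tool] | None = None,
    ) -> ChatCompletion:
        # Echo the last message, cut to max_tokens whitespace-separated words.
        words = messages[-1].content.split()
        return ChatCompletion(text=" ".join(words[:max_tokens]), model=self.model)


async def ask(provider: LLMProvider, question: str) -> str:
    # Business code sees only the Protocol, never a concrete vendor SDK.
    reply = await provider.complete(
        [ChatMessage(role="user", content=question)], max_tokens=256, temperature=0.0
    )
    return reply.text


if __name__ == "__main__":
    print(asyncio.run(ask(EchoProvider(), "Where is the fire exit?")))
```

Swapping vendors then means passing a different object to `ask`; nothing in the calling code changes.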
There are seven concrete providers: OpenAI, Anthropic, Gemini, YandexGPT, OpenRouter, Ollama and llama.cpp. Routing is configured in YAML:
```yaml
default: openrouter:openai/gpt-4o-mini
routes:
  - match: { task: rerank }
    use: ollama:bge-reranker-base
  - match: { task: summary, locale: ru }
    use: yandexgpt:latest
  - match: { sensitive: true }
    use: ollama:llama3.1:8b
fallback:
  - openrouter:anthropic/claude-3-5-sonnet
  - ollama:llama3.1:8b
```
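The article does not spell out how routes are evaluated, so the following is a sketch under stated assumptions: routes are checked top to bottom, the first route whose `match` keys all equal the request's attributes wins, otherwise `default` applies, and the `fallback` list is tried in order when a provider call fails. The `LOCAL_PREFIXES` guard, which keeps fallbacks for `sensitive` requests on local models only, is a hypothetical addition; without something like it, a failing local model would hand sensitive data to the first cloud fallback. The config is written as the dict a YAML loader such as PyYAML's `yaml.safe_load` would return, and `call` stands in for whatever actually invokes a provider.

```python
from __future__ import annotations

from typing import Any, Awaitable, Callable

# The YAML above, as the dict a YAML loader would return.
CONFIG: dict[str, Any] = {
    "default": "openrouter:openai/gpt-4o-mini",
    "routes": [
        {"match": {"task": "rerank"}, "use": "ollama:bge-reranker-base"},
        {"match": {"task": "summary", "locale": "ru"}, "use": "yandexgpt:latest"},
        {"match": {"sensitive": True}, "use": "ollama:llama3.1:8b"},
    ],
    "fallback": ["openrouter:anthropic/claude-3-5-sonnet", "ollama:llama3.1:8b"],
}

# Assumption: targets with these prefixes run on the client's own hardware.
LOCAL_PREFIXES = ("ollama:", "llamacpp:")


def resolve(request: dict[str, Any], config: dict[str, Any]) -> list[str]:
    """Return the ordered list of provider targets to try for one request."""
    primary = config["default"]
    for route in config.get("routes", []):
        # Assumed semantics: every key in `match` must equal the request's value;
        # routes are checked top to bottom and the first match wins.
        if all(request.get(k) == v for k, v in route["match"].items()):
            primary = route["use"]
            break
    chain = [primary] + [t for t in config.get("fallback", []) if t != primary]
    if request.get("sensitive"):
        # Hypothetical guard: sensitive data never falls back to a cloud provider.
        chain = [t for t in chain if t.startswith(LOCAL_PREFIXES)]
    return chain


async def dispatch(
    request: dict[str, Any],
    call: Callable[[str], Awaitable[str]],
    config: dict[str, Any] = CONFIG,
) -> str:
    """Try each target in order until one call succeeds."""
    last_error: Exception | None = None
    for target in resolve(request, config):
        try:
            return await call(target)
        except Exception as exc:  # provider down, rate-limited, timed out...
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

For example, `resolve({"sensitive": True}, CONFIG)` yields only `["ollama:llama3.1:8b"]`, while an ordinary request tries `gpt-4o-mini`, then Claude, then the local Llama.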
4.2. Hybrid RAG with RRF
Retrieval runs in three passes:
- Lexical (BM25): Postgres FTS with language-aware analysers for RU/EN/DE/SL.
- Dense: pgvector cosine similarity; `text-embedding-3-small` by default, `bge-m3` for on-prem.
- Metadata boost: exact match on tags (`product`, `section`, `last_updated`).
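The lexical pass is what catches exact codes and identifiers that a dense-only setup misses. To illustrate the scoring idea, here is a textbook Okapi BM25 scorer in plain Python. It is not Taris' Postgres FTS configuration: the tokenizer is deliberately naive (lowercased word characters, no stemming), and `k1=1.5`, `b=0.75` are common defaults assumed here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase runs of word characters; real FTS also stems per language.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms (such as an exact standard number) get a high IDF weight.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores
```

With an illustrative query such as `"ISO 7010"`, only documents containing those exact tokens score above zero, which is exactly the guarantee embeddings alone do not give.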
Fusion — Reciprocal Rank Fusion:
$$ \text{score}(d) = \sum_{i \in \text{retrievers}} \frac{1}{k + \text{rank}_i(d)}, \quad k = 60 $$
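The RRF formula translates almost line by line into code. A minimal sketch over ranked lists of document ids, with ranks counted from 1 and k = 60 as in the formula; a document missing from one retriever's list simply receives no contribution from it. Passing the metadata pass in as an extra ranked list is an assumption, since the article does not say how the boost is combined.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: each retriever's top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For instance, with a lexical ranking `["d3", "d1", "d7"]` and a dense ranking `["d1", "d4", "d3"]`, `d1` (1/62 + 1/61) comes out just ahead of `d3` (1/61 + 1/63): documents that both retrievers rank high win, and only rank positions matter, so BM25 and cosine scores never have to be put on a common scale.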
A cross-encoder (`bge-reranker-base`) then reranks the fused list down to the top 5. On our internal occupational-safety golden set, RRF raised recall@5 from 0.71 (pure dense) to 0.88, and reranking raised the grounding rate by a further 0.07. That is not "slightly better": it is the difference between "usable" and "give the client their money back."
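Recall@5 figures like these only mean something when they are computed the same way after every change. The article does not state which variant of recall@k it uses; the sketch below assumes the common per-question definition (share of reference documents found in the top k), macro-averaged over the golden set, with `runs` mapping each question id to the pipeline's ranked document ids.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Share of a question's reference documents that appear in its top-k results."""
    if not relevant:
        raise ValueError("each golden question needs at least one reference document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(
    runs: dict[str, list[str]], golden: dict[str, set[str]], k: int = 5
) -> float:
    """Macro-average recall@k over all golden-set questions; missing runs count as 0."""
    total = sum(recall_at_k(runs.get(q, []), refs, k) for q, refs in golden.items())
    return total / len(golden)
```

Running it on the dense-only and the RRF pipeline against the same golden set gives a before/after comparison of the kind quoted above.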
4.3. Multi-Tenant Postgres with RLS
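```sql
ALTER TABLE chunks ENABLE ROW LEVEL SECURITY;
ALTER TABLE chunks FORCE ROW LEVEL SECURITY;  -- apply the policy to the table owner too
CREATE POLICY tenant_isolation ON chunks
    USING (tenant_id = current_setting('app.tenant_id')::int);
```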
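Before querying, each request binds its transaction to a tenant with `SELECT set_config('app.tenant_id', $1, true)`; a plain `SET app.tenant_id = $1` does not work, because `SET` cannot take bind parameters. The database itself then filters out other clients' rows, so a forgotten `WHERE` clause in application code cannot leak data, provided the application connects as a role that is neither a superuser nor granted `BYPASSRLS`.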
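A minimal sketch of the per-request tenant binding, assuming psycopg 3 (`Connection.transaction()` and `Connection.execute()`); the helper name `tenant_scope` is hypothetical. Passing `true` to `set_config` makes the setting transaction-local, so a pooled connection cannot carry one tenant's id into the next request, and the id is sent as text because `set_config` expects a text value.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def tenant_scope(conn: Any, tenant_id: int) -> Iterator[Any]:
    """Bind every query inside the block to one tenant (conn: assumed psycopg 3 Connection)."""
    with conn.transaction():
        # Transaction-local setting: it is reset when this transaction ends.
        conn.execute("SELECT set_config('app.tenant_id', %s, true)", (str(tenant_id),))
        yield conn
```

Usage, with a hypothetical DSN: `with psycopg.connect(dsn) as conn, tenant_scope(conn, 42): conn.execute("SELECT count(*) FROM chunks")` counts only tenant 42's rows. If the binding is forgotten, `current_setting('app.tenant_id')` is either undefined or empty, and the policy's `::int` cast raises an error instead of returning another tenant's data.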
5. Table: Which Components Are Replaceable
| Layer | Default | Alternative | Switching cost |
|---|---|---|---|
| Embedding | `text-embedding-3-small` | `bge-m3` | config + re-index |
| Reranker | `bge-reranker-base` | `mxbai-rerank` | config |
| Vector store | pgvector | Qdrant | docker-compose + migration |
| LLM | `gpt-4o-mini` | `claude-3-5-sonnet`, `llama3.1:8b` | config |
| Channel | Telegram | Web / VK / Slack / WhatsApp | adapter, ~200 lines |
| File storage | MinIO | S3 / Nextcloud | config |
| Deployment | Docker Compose | Kubernetes / Nomad | manifests |
6. Sintaris Mini-Case
The Worksafety Superassistant product is an example of Taris in a real deployment. Task:
- Internal assistant on workplace safety regulations (~3000 PDF pages).
- Telegram chatbot for production workers (RU + DE).
- Full on-prem (DPO requirement).
Technical implementation:
- Taris in OpenClaw configuration, two servers: primary (Ubuntu 24.04, 64 GB RAM, RTX 4090) + backup.
- Embedding: `bge-m3`; generation: `llama3.1:8b-instruct` (sufficient for RU/DE); fallback: `qwen2.5:14b`.
- Telegram bot runs on the local servers behind NAT; nothing is hosted in the cloud.
- Golden set: 120 questions with reference citations.
Metrics after 90 days:
- Recall@5: 0.86 on golden set.
- Citation accuracy: 92%.
- Median latency: 1.9 sec.
- LLM cost: €0 (all local).
- Infrastructure cost: ~€120/month for electricity + amortisation.
For details, see Worksafety § 6 (RAG pipeline) and OpenClaw § 8 (AI dispatch).
7. Checklist (15 Points) When Choosing an AI Assistant for SMB
- Vendor lock-in verified: can you switch the LLM provider in a week?
- Data — where are client documents physically stored?
- Embeddings — where are they stored? (often forgotten: they are also PII-derived)
- DPA signed with every LLM provider you use.
- Eval set — do you have one, and how many questions are in it?
- Citation — does the system generate source references?
- Grounding rate — is it measured? (if not — nobody knows whether the model is lying)
- Retrieval regression tested after every prompt change?
- Multi-tenant security — RLS at DB level, not "agreed in code"?
- Local models available — is there a Plan B if the cloud is down?
- Cost per token — monitored in real time?
- DSAR + erasure — implemented as code, not a manual procedure?
- Audit log — present, immutable, with the required retention period?
- Channels — is adding a new channel < 500 lines or a core rewrite?
- Documentation — in what language, for whom, how often updated?
8. Risks
- Frontier model temptation. The team gets used to Sonnet, then the client requires on-prem, and the whole system has to be reworked for an 8B model. Solution: test on 8B from day one and use frontier models only where you have proven the gain.
- Dispatcher complexity. The more providers you support, the more edge cases. Our rule: a new provider only appears when a client requires it.
- Embeddings ≠ portable. When switching embedding models you need to re-index. Budget the time.
- Multilingual quality. `bge-m3` works well for European languages and Russian, but its quality in DE/SL is lower than in EN. Check against your golden set.
9. What to Do Next
If you already have an AI assistant and it's time to replace it, we run an AI Audit for €900–4500. If you want to try Taris, there is a fixed-scope AI Pilot over 4–8 weeks for €3000–12000. Slovenian companies get 25% off from 1 to 30 June 2026; see packages.
If you'd rather read first — see the KB chapters Taris (full description) and OpenClaw (on-prem topology).
10. References
- Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401.
- Karpukhin, V. et al. (2020). Dense Passage Retrieval for Open-Domain Question Answering. arXiv:2004.04906.
- Cormack, G., Clarke, C., Buettcher, S. (2009). Reciprocal Rank Fusion outperforms Condorcet and individual rank learning methods. SIGIR '09.
- Liu, N. F. et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. arXiv:2307.03172.
- BAAI (2024). BGE-M3: One-stop multi-lingual, multi-functionality, multi-granularity text embeddings.
- pgvector. https://github.com/pgvector/pgvector
- Ollama. https://ollama.com
Sintaris runs AI process audits, AI pilots and Taris deployments for SMBs in the EU and CIS. Discovery call — free, 30 minutes.