Taris: The AI Assistant That Keeps Data on the Client's Side

2026-06-26 · Sintaris · taris, rag, on-prem, ollama, vendor-neutral, smb, ai-platform


TL;DR. Taris is an AI assistant in which client data never goes to the vendor. The core principle: the model is a plugin behind a stable interface, not the centre of the architecture. Inside: a vendor-neutral LLM dispatcher, hybrid RAG (BM25 + dense + RRF + cross-encoder rerank), multi-tenant Postgres with pgvector, and optional local models via Ollama. This article covers how Taris is built, why it is built that way, and where it delivers value for SMBs in the EU and CIS.

1. The Conflict: "Let's Get GPT-4 and Forget About It"

When a small business owner asks "which AI assistant should we install?", they usually get one of two extremes:

"Get ChatGPT / Copilot Studio / Microsoft 365, it's all there." Convenient, but data goes to the vendor, customisation is limited, pricing grows opaquely, and switching to another platform means starting over.
"We'll build it for you from scratch with LangChain." Slow, expensive, and a year later 60% of the code turns out to be model-migration glue that nobody likes.

Taris is the third path: a productised base (model dispatcher, hybrid RAG, multi-tenant Postgres, channel adapters) that we deploy for the client and leave with the client. Not SaaS. Not "build from scratch." A semi-finished product that is straightforward to adapt.

2. Who This Concerns

SMB teams of 10–500 people who have accumulated documents, regulations, and client histories — but no AI specialists.
Regulated niches (healthcare, occupational safety, legal consulting), where "data goes to OpenAI" can mean a regulatory fine.
CIS companies with EU presence: they need one solution for two jurisdictions.
Startups that need a shared assistant base for several internal use cases.

3. The Common Wrong Approach

What we see in 70% of the "pilots" that were started before we came in:

Connected a single API. When the vendor raises prices, nobody is ready to migrate.
Fed everything into one big prompt. The lost-in-the-middle effect kicks in (Liu et al., 2023): the model "forgets" what sits in the middle of a long context.
Built RAG on embeddings alone, so exact codes and identifiers are not found.
No eval set. Impossible to say whether quality improved after the latest "prompt improvement."
Client documents sit in someone else's cloud without a DPA. Legal review postponed.

4. The Engineering Approach: What's Inside Taris

Architecture — four independent layers:

flowchart LR
  subgraph Channels
    TG[Telegram Bot]
    WEB[Web UI / PWA]
    VOICE[Voice]
    API[REST API]
  end
  subgraph Core
    GW[FastAPI Gateway]
    ORCH[Agent Orchestrator]
    DISP[LLM Dispatcher]
    KB[KB Service]
    AUTH[Auth + RBAC]
  end
  subgraph Storage
    PG[(Postgres + pgvector)]
    OBJ[(MinIO / S3)]
    LOG[(Audit log)]
  end
  subgraph Models
    LOCAL[Ollama / llama.cpp]
    CLOUD[OpenAI / Anthropic / Gemini / YandexGPT]
  end
  TG --> GW
  WEB --> GW
  VOICE --> GW
  API --> GW
  GW --> AUTH --> ORCH
  ORCH --> KB --> PG
  ORCH --> DISP
  DISP --> LOCAL
  DISP --> CLOUD
  ORCH --> LOG

Each layer is replaceable, and that is the key point. Channels are adapters. The model is a plugin. Storage is a backend. The orchestrator is the only place where business logic lives. If OpenAI triples its prices tomorrow, a Taris installation switches providers by editing a single config file.

4.1. LLM Dispatcher

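from __future__ import annotations  # keep annotations lazy: the message types live elsewhere

from typing import Protocol

# ChatMessage, Tool and ChatCompletion are Taris message types, assumed defined elsewhere.
class LLMProvider(Protocol):
    """The only interface the dispatcher depends on; every model backend implements it."""
    async def complete(
        self,
        messages: list[ChatMessage],
        *,
        max_tokens: int,
        temperature: float,
        tools: list[Tool] | None = None,
    ) -> ChatCompletion: ...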

There are seven concrete providers: OpenAI, Anthropic, Gemini, YandexGPT, OpenRouter, Ollama, llama.cpp. Routing is configured in YAML:

default: openrouter:openai/gpt-4o-mini
routes:
  - match: { task: rerank }
    use:   ollama:bge-reranker-base
  - match: { task: summary, locale: ru }
    use:   yandexgpt:latest
  - match: { sensitive: true }
    use:   ollama:llama3.1:8b
fallback:
  - openrouter:anthropic/claude-3-5-sonnet
  - ollama:llama3.1:8b
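One plausible reading of this config, sketched in Python: routes are checked top to bottom, a route matches when every key in `match` equals the request's attribute, the first match wins, and the `fallback` list is tried after the primary model. These semantics, the `resolve_chain` and `complete_with_fallback` helpers, and the `LOCAL_PROVIDERS` set are assumptions for illustration. The config is shown already parsed (as `yaml.safe_load` would return it) to keep the sketch dependency-free. One deliberate choice: a `sensitive: true` request only falls back to local providers, since a naive fallback to `openrouter:anthropic/claude-3-5-sonnet` would send exactly the data the route was meant to keep on-prem.

```python
import asyncio
from collections.abc import Awaitable, Callable

# The routing YAML above, as yaml.safe_load would return it.
CONFIG = {
    "default": "openrouter:openai/gpt-4o-mini",
    "routes": [
        {"match": {"task": "rerank"}, "use": "ollama:bge-reranker-base"},
        {"match": {"task": "summary", "locale": "ru"}, "use": "yandexgpt:latest"},
        {"match": {"sensitive": True}, "use": "ollama:llama3.1:8b"},
    ],
    "fallback": ["openrouter:anthropic/claude-3-5-sonnet", "ollama:llama3.1:8b"],
}

# Assumption: provider prefixes that run on the client's own hardware.
LOCAL_PROVIDERS = {"ollama", "llama.cpp"}


def split_spec(spec: str) -> tuple[str, str]:
    """'ollama:llama3.1:8b' -> ('ollama', 'llama3.1:8b'); only the first colon separates provider and model."""
    provider, _, model = spec.partition(":")
    return provider, model


def resolve_chain(config: dict, request: dict) -> list[str]:
    """Return the primary model spec for a request followed by its fallbacks, without duplicates."""
    primary = config["default"]
    for route in config.get("routes", []):
        if all(request.get(key) == value for key, value in route["match"].items()):
            primary = route["use"]  # first matching route wins
            break
    chain = [primary]
    for spec in config.get("fallback", []):
        if spec in chain:
            continue
        if request.get("sensitive") and split_spec(spec)[0] not in LOCAL_PROVIDERS:
            continue  # never fall back to a cloud model with sensitive data
        chain.append(spec)
    return chain


async def complete_with_fallback(
    chain: list[str], call: Callable[[str], Awaitable[str]]
) -> tuple[str, str]:
    """Try each model spec in order; return (spec, answer) from the first call that succeeds."""
    errors: list[Exception] = []
    for spec in chain:
        try:
            return spec, await call(spec)
        except Exception as exc:  # a real dispatcher would catch only timeouts, 5xx, rate limits
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors!r}")


if __name__ == "__main__":
    async def demo_call(spec: str) -> str:
        # Stand-in for a provider call: simulate an outage of the default model.
        if spec == CONFIG["default"]:
            raise TimeoutError("simulated outage")
        return f"answer from {spec}"

    chain = resolve_chain(CONFIG, {"task": "chat", "locale": "de"})
    print(asyncio.run(complete_with_fallback(chain, demo_call)))
```

Under first-match semantics, route order matters: a request with `task: summary, locale: ru, sensitive: true` would hit the YandexGPT route before the `sensitive` one, so with this reading the `sensitive` route belongs at the top of the list.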

4.2. Hybrid RAG with RRF

Retrieval runs in three passes:

Lexical (BM25) — Postgres FTS with language-aware analyser for RU/EN/DE/SL.
Dense — pgvector cosine similarity; the default embedding model is text-embedding-3-small, with bge-m3 for on-prem.
Metadata boost — exact match on tags (product, section, last_updated).

Fusion — Reciprocal Rank Fusion:

$$ \text{score}(d) = \sum_{i \in \text{retrievers}} \frac{1}{k + \text{rank}_i(d)}, \quad k = 60 $$
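The formula translates almost line for line into code. A minimal sketch: each retriever contributes a ranked list of chunk ids, with ranks starting at 1 as in Cormack et al. Treating the metadata boost as its own ranked list is an assumption here, not a documented Taris detail.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, fusing `["d3", "d1", "d7"]` (lexical), `["d1", "d2", "d3"]` (dense) and `["d1"]` (metadata) ranks `d1` first: it appears in all three lists, so its score 1/62 + 1/61 + 1/61 ≈ 0.0489 beats `d3` at 1/61 + 1/63 ≈ 0.0323. Because RRF uses only ranks, BM25 scores and cosine similarities never have to be normalised onto a common scale.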

The fused list is then reranked with a cross-encoder (bge-reranker-base) and cut to the top 5. Measured gain on our internal occupational-safety golden set: recall@5 rose from 0.71 to 0.88 (RRF vs. pure dense retrieval), and grounding rate rose by 0.07 after reranking. That is not "slightly better": it is the difference between "usable" and "give the client their money back."
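The rerank step and the recall@5 metric can be sketched as follows. Assumptions: `score` is any callable returning a relevance score for a (query, passage) pair; in production it would wrap a cross-encoder such as bge-reranker-base (for example through `CrossEncoder.predict` in sentence-transformers), but a toy scorer keeps the sketch runnable. recall@k is taken here as the share of a question's gold chunks found in the top k; some teams report hit rate (at least one gold chunk in the top k) instead, and the figures above do not say which definition was used.

```python
from collections.abc import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
    top_k: int = 5,
) -> list[str]:
    """Re-order fused candidates by a pairwise (query, passage) score and keep the best top_k."""
    scored = [(score(query, passage), pos, passage) for pos, passage in enumerate(candidates)]
    # Highest score first; ties keep the incoming (RRF) order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [passage for _, _, passage in scored[:top_k]]


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int = 5) -> float:
    """Share of a question's gold chunks that appear among the first k retrieved chunks."""
    if not relevant:
        raise ValueError("every golden-set question needs at least one gold chunk")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: Sequence[tuple[Sequence[str], set[str]]], k: int = 5) -> float:
    """Average recall@k over a golden set of (retrieved ids, gold ids) pairs."""
    return sum(recall_at_k(retrieved, gold, k) for retrieved, gold in runs) / len(runs)
```

Running `mean_recall_at_k` over the same golden set before and after every prompt or retrieval change is what turns "it feels better" into a number.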

4.3. Multi-Tenant Postgres with RLS

ALTER TABLE chunks ENABLE ROW LEVEL SECURITY;
ALTER TABLE chunks FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON chunks
  USING (tenant_id = current_setting('app.tenant_id')::int);

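Every request first sets the tenant inside its transaction, e.g. SELECT set_config('app.tenant_id', $1, true); a plain SET cannot take a bind parameter such as $1. As long as the application connects as a role that is neither a superuser nor granted BYPASSRLS, it cannot accidentally read another client's data: the database itself enforces the isolation.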
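A minimal sketch of that per-request step, assuming a psycopg 3 style connection (`conn.transaction()` as a context manager, `conn.execute(...)` returning a cursor). It passes `true` as the third argument of `set_config`, so the tenant id lasts only until the end of the current transaction and cannot leak into the next request that reuses a pooled connection. `query_as_tenant` and `SET_TENANT_SQL` are hypothetical names.

```python
from typing import Any

# set_config(name, value, is_local): with is_local = true the value is reset when
# the transaction ends, so a pooled connection never carries a stale tenant id.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"


def query_as_tenant(conn: Any, tenant_id: int, query: str, params: tuple = ()) -> list[tuple]:
    """Run one query with RLS scoped to tenant_id (psycopg 3 style connection assumed)."""
    with conn.transaction():
        # Passed as a bind parameter, never string-formatted; set_config expects text.
        conn.execute(SET_TENANT_SQL, (str(tenant_id),))
        return conn.execute(query, params).fetchall()
```

With psycopg, `conn` would come from `psycopg.connect(...)`; the role it connects as must not be a superuser or have BYPASSRLS, otherwise the policy is skipped.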

5. Table: Which Components Are Replaceable

Layer	Default	Alternative	Switching cost
Embedding	`text-embedding-3-small`	`bge-m3`	config + re-index
Reranker	`bge-reranker-base`	`mxbai-rerank`	config
Vector store	pgvector	Qdrant	docker-compose + migration
LLM	`gpt-4o-mini`	`claude-3-5-sonnet`, `llama3.1:8b`	config
Channel	Telegram	Web / VK / Slack / WhatsApp	adapter ~200 lines
File storage	MinIO	S3 / Nextcloud	config
Deployment	Docker Compose	Kubernetes / Nomad	manifests

6. Sintaris Mini-Case

The Worksafety Superassistant product is an example of Taris in a real deployment. Task:

Internal assistant on workplace safety regulations (~3000 PDF pages).
Telegram chatbot for production workers (RU + DE).
Full on-prem (DPO requirement).

Technical implementation:

Taris in OpenClaw configuration, two servers: primary (Ubuntu 24.04, 64 GB RAM, RTX 4090) + backup.
Embedding: bge-m3, generation: llama3.1:8b-instruct (sufficient for RU/DE), fallback qwen2.5:14b.
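Telegram bot runs on-prem behind local NAT: no cloud hosting and no cloud LLM calls (messages still transit Telegram's servers, as with any Telegram bot).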
Golden set: 120 questions with reference citations.

Metrics after 90 days:

Recall@5: 0.86 on golden set.
Citation accuracy: 92%.
Median latency: 1.9 sec.
LLM cost: €0 (all local).
Infrastructure cost: ~€120/month for electricity + amortisation.

Details: Worksafety § 6 RAG pipeline and OpenClaw § 8 AI dispatch.

7. Checklist (15 Points) When Choosing an AI Assistant for SMB

Vendor lock-in verified: can you switch the LLM provider in a week?
Data — where are client documents physically stored?
Embeddings — where are they stored? (often forgotten: they are also PII-derived)
DPA signed with every LLM provider you use.
Eval set — do you have one, and how many questions are in it?
Citation — does the system generate source references?
Grounding rate — is it measured? (if not — nobody knows whether the model is lying)
Is retrieval regression-tested after every prompt change?
Multi-tenant security — RLS at DB level, not "agreed in code"?
Local models available — is there a Plan B if the cloud is down?
Cost per token — monitored in real time?
DSAR + erasure — implemented as code, not a manual procedure?
Audit log — present, immutable, with the required retention period?
Channels — is adding a new channel < 500 lines or a core rewrite?
Documentation — in what language, for whom, how often updated?

8. Risks

Frontier model temptation. The team gets used to Sonnet, then the client requires on-prem, and prompts and the pipeline have to be re-tuned for an 8B model. Solution: test on 8B from day one and use frontier models only where you have proven the gain.
Dispatcher complexity. The more providers you support, the more edge cases. Our rule: a new provider only appears when a client requires it.
Embeddings ≠ portable. When switching embedding models you need to re-index. Budget the time.
Multilingual quality. bge-m3 handles EU languages and RU well, but quality in DE/SL is lower than in EN. Check against your golden set.

9. What to Do Next

If you already have an AI assistant and it's time to replace it — we run an AI Audit for €900–4500. If you want to try Taris — there is an AI Pilot over 4–8 weeks for €3000–12000 with a fixed scope. −25% for Slovenian companies from 1 to 30 June 2026 — see packages.

If you'd rather read first — see the KB chapters Taris (full description) and OpenClaw (on-prem topology).

10. References

Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401.
Karpukhin, V. et al. (2020). Dense Passage Retrieval for Open-Domain Question Answering. arXiv:2004.04906.
Cormack, G., Clarke, C., Buettcher, S. (2009). Reciprocal Rank Fusion outperforms Condorcet and individual rank learning methods. SIGIR '09.
Liu, N. F. et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. arXiv:2307.03172.
BAAI (2024). BGE-M3: One-stop multi-lingual, multi-functionality, multi-granularity text embeddings.
pgvector — https://github.com/pgvector/pgvector
Ollama — https://ollama.com

Sintaris runs AI process audits, AI pilots and Taris deployments for SMBs in the EU and CIS. Discovery call — free, 30 minutes.