OPEN TO WORK · RELOCATION / REMOTE

CASE FILE · NLP-2026 YEREVAN · AM

Albert
Hakobyan

AI ENGINEER · AGENTIC SYSTEMS · NLP · BSDS '26

AI engineer building agentic systems, retrieval-augmented generation, and efficient models for low-resource languages. Capstone author of the first neural system for Armenian participle punctuation: I distill a frontier LLM's judgment into a 48.5M-parameter BiLSTM and mBERT ensemble that runs 1000+ sentences/sec on a laptop CPU, within 2.5 macro-F1 points of the teacher (0.675 vs 0.700), at zero marginal cost.

SKILLS MAPPED

KNOWLEDGE DOMAINS

1ST

NEURAL HY PUNCTUATION SYSTEM

🏆

AUA ACSE RESEARCH POSTER SHOWCASE
WINNER · MAY 8

CASE FILE · NLP-2026

[ CAPSTONE · BSDS 2026 ]

PROTOCOL · DISTILLATION

KNOWLEDGE DISTILLATION FOR ARMENIAN PARTICIPLE
PHRASE PUNCTUATION RESTORATION

From LLM Teacher to Neural Student Models · Akian College of Science and Engineering, AUA

MACRO-F1

0.675

ENSEMBLE

VS TEACHER

−2.5%

p=0.11 N.S.

INFERENCE

1000+

SENT/SEC · CPU

TRAINING DATA

112K

ANNOT. PAIRS

▌ PIPELINE

OSCAR 23.01

22M sent.

→

AFFIX FILTER

4.45M cand.

→

STANZA POS

120K filtered

→

GEMINI 2.5 ★

CoT teacher

→

DISTILL

BiLSTM + mBERT

→

ENSEMBLE

α=0.45/0.55
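The final ENSEMBLE stage can be read as a fixed-weight blend of the two students' class probabilities. The sketch below is only an illustration of that idea: the source gives α = 0.45/0.55 but not which model receives which weight, so the assignment here, the function names, and the label names other than COMMA_AFTER are assumptions.

```python
from typing import Dict

# Assumption: the source states alpha = 0.45/0.55 without saying which model
# gets which weight. This assignment is illustrative only.
ALPHA_BILSTM = 0.45
ALPHA_MBERT = 0.55


def ensemble_probs(p_bilstm: Dict[str, float], p_mbert: Dict[str, float]) -> Dict[str, float]:
    """Blend two per-class probability distributions with fixed weights."""
    labels = p_bilstm.keys() | p_mbert.keys()
    return {
        label: ALPHA_BILSTM * p_bilstm.get(label, 0.0)
        + ALPHA_MBERT * p_mbert.get(label, 0.0)
        for label in labels
    }


def predict(p_bilstm: Dict[str, float], p_mbert: Dict[str, float]) -> str:
    """Return the label with the highest blended probability."""
    blended = ensemble_probs(p_bilstm, p_mbert)
    return max(blended, key=blended.get)


if __name__ == "__main__":
    # "NONE" is a hypothetical label name used for the example.
    bilstm = {"NONE": 0.6, "COMMA_AFTER": 0.4}
    mbert = {"NONE": 0.3, "COMMA_AFTER": 0.7}
    print(predict(bilstm, mbert))
```

Because the weights sum to 1, the blend of two valid distributions is itself a valid distribution, so no renormalisation step is needed.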

▌ KEY FINDINGS

Ensemble matches teacher. Macro-F1 0.675 vs Gemini 0.700 (Δ = 0.025, i.e. 2.5 points). McNemar p = 0.11, not statistically significant.

Student beats teacher on COMMA_AFTER. F1 = 0.495 vs 0.450, which suggests the student filtered out some of the teacher's label noise.

Depth > language specificity. 12-layer multilingual mBERT (0.519) outperforms 6-layer Armenian-specific HyeBERT (0.326).

Zero inference cost. BiLSTM runs 1000+ sentences/sec on a laptop CPU. Teacher costs ≈ $0.003 per sentence.

Teacher-agnostic. Swapping to gemini-3-flash-preview (+5.3pts) requires zero architectural changes.
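The significance claim above rests on McNemar's test, which compares two systems on the same items using only the cases where they disagree. A minimal sketch of the exact two-sided variant follows. The source reports only p = 0.11 and does not say which variant was used or what the disagreement counts were, so the function name and any counts passed to it are illustrative.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: items system A labelled correctly and system B labelled wrongly.
    c: items system A labelled wrongly and system B labelled correctly.
    Under the null hypothesis each disagreement is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, nothing to distinguish
    k = min(b, c)
    # Probability of a split at least as lopsided as the observed one, one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Items that both systems get right, or both get wrong, do not enter the statistic, which is why a 2.5-point gap on a small benchmark can fail to reach significance.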

▌ THE PROBLEM

Armenian participle phrases ending in -ելով / -ալով / -ած require position-dependent punctuation. Errors change meaning.

Նա տեսնելով Արմենին տխրեց
↓
Նա, տեսնելով Արմենին, տխրեց:
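The AFFIX FILTER stage of the pipeline can be approximated with a surface-form match on the three suffixes named above. This is a rough sketch, not the project's actual filter: the regular expression and the function name are assumptions, and a pure suffix match will also catch non-participle words that happen to end the same way, which is presumably why a POS-tagging stage follows it.

```python
import re
from typing import List

# Suffixes named in the text: -ելով / -ալով / -ած.
# \w matches Armenian letters because Python 3 patterns are Unicode-aware.
PARTICIPLE_RE = re.compile(r"\w+(?:ելով|ալով|ած)\b")


def participle_candidates(sentence: str) -> List[str]:
    """Return every word in the sentence that ends in one of the suffixes."""
    return PARTICIPLE_RE.findall(sentence)


if __name__ == "__main__":
    print(participle_candidates("Նա տեսնելով Արմենին տխրեց"))
```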

▌ BENCHMARK · SHTEMARAN 292

MODEL	MACRO-F1
BiLSTM	0.366
HyeBERT	0.326
mBERT	0.519
Ensemble ★	0.675
Gemini	0.700
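Macro-F1 is the unweighted mean of the per-class F1 scores, so a rare punctuation class counts as much as a frequent one. The sketch below shows the computation in plain Python. The function name is illustrative, and it assumes single-label predictions with at least one example.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```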

▌ RULES R1–R5

POSITION	MARK
Intraposition	, p.phrase ,
Pre-position	p.phrase ՝ V
Post-position	V ՝ p.phrase
Adverbial	adv , R1
Relative	, rp , R1

EXPERIMENT · TOKENIZER SURGERY

[ NLP GROUP PROJECT · QWEN2.5-0.5B ]

Grafted 30,766 Armenian tokens onto Qwen2.5-0.5B. Trained custom SentencePiece tokenizers, initialized new embeddings three different ways, then recovered the model with LoRA rank-16 on 500K Armenian lines. Final perplexity 8.33 · token count reduced 78.3%.

Analysis

TOKENIZER
FERTILITY

Benchmarked 9 tokenizers on 25,621 lines / 516,860 words. Spread 6.5×.

Best

2.18

Worst

14.26

Training

CUSTOM
TOKENIZER

Trained 6 SentencePiece variants on 5M sentences. All beat XLM-R.

Fertility

1.67

UNK

Surgery

VOCAB
GRAFTING

Extended Qwen2.5 vocab 151k → 182k. 78.3% token reduction.

New tok.

30,766

Best PPL

24.4K
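When a vocabulary is extended, each new token needs a starting embedding. One widely used baseline is to average the embeddings of the pieces the original tokenizer would have split the new token into. The sketch below illustrates that baseline only. The source mentions three initialisation strategies (including "Heuristic / FOCUS") without defining them, so this is not claimed to be any of them, and all names here are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def mean_init(
    new_token: str,
    old_tokenize: Callable[[str], Sequence[int]],
    old_embeddings: Dict[int, Vector],
) -> Vector:
    """Initialise a new token as the mean of its old-tokenizer piece embeddings."""
    piece_ids = list(old_tokenize(new_token))
    if not piece_ids:
        raise ValueError("old tokenizer produced no pieces for the new token")
    dim = len(old_embeddings[piece_ids[0]])
    return [
        sum(old_embeddings[i][d] for i in piece_ids) / len(piece_ids)
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Toy character-level "old tokenizer" and 2-d embeddings, for illustration.
    vocab = {"a": 0, "b": 1}
    embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    print(mean_init("ab", lambda s: [vocab[ch] for ch in s], embeddings))
```

Starting from a point inside the existing embedding space tends to keep the model's initial outputs closer to sensible than random initialisation would, which leaves less for the recovery fine-tune to repair.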

Recovery

LoRA
FINE-TUNE

Rank-16 adapters across 24 layers, 500K lines, cosine schedule. PPL 8.33.

Rank

16

Adapter

~9MB
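The adapter size can be sanity-checked from figures stated elsewhere on this page: rank 16, 24 layers, hidden size 896, GQA with 14 query and 2 key/value heads, and adapters on Q, K, V and O. A rank-r adapter on a d_in × d_out projection adds r · (d_in + d_out) parameters. The sketch below works through that arithmetic. It assumes the head dimension is hidden / query heads and that the adapter is stored in 32-bit floats, neither of which the source states.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A rank-r adapter adds A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_params(hidden: int = 896, q_heads: int = 14, kv_heads: int = 2,
                   layers: int = 24, rank: int = 16) -> int:
    """Total LoRA parameters for Q, K, V, O adapters in every layer."""
    head_dim = hidden // q_heads      # assumption: 896 / 14 = 64
    kv_dim = kv_heads * head_dim      # 2 * 64 = 128 under GQA
    per_layer = (
        lora_params(hidden, hidden, rank)    # Q
        + lora_params(hidden, kv_dim, rank)  # K
        + lora_params(hidden, kv_dim, rank)  # V
        + lora_params(hidden, hidden, rank)  # O
    )
    return per_layer * layers


def adapter_megabytes(n_params: int, bytes_per_param: int = 4) -> float:
    """Size in MB, assuming fp32 storage unless told otherwise."""
    return n_params * bytes_per_param / 1e6


if __name__ == "__main__":
    n = adapter_params()
    print(n, adapter_megabytes(n))
```

Under these assumptions the adapters hold about 2.16M parameters, roughly 8.65 MB in fp32, which is in line with the ~9MB figure.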

FERTILITY · TOKENS / WORD

Lower is better. We measured how aggressively each tokenizer fragments Armenian text against 516,860 words from CC-100. Our trained SentencePiece BPE-32k beats every baseline by > 23%.

TOKENIZER	TOKENS / WORD
bpe_32k (ours) ★	1.67
XLM-R	2.18
mBERT	2.41
LLaMA-2	3.65
LLaMA-3	4.92
Qwen2.5 (base)	7.81
GPT-2	14.26
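Fertility is the number of tokens a tokenizer emits divided by the number of words in the text. The sketch below computes it for any tokenizer passed in as a callable, and also reproduces the relative-improvement arithmetic behind the "> 23%" claim. Counting words by whitespace splitting is an assumption, since the source does not say how words were counted, and the function names are illustrative.

```python
from typing import Callable, Iterable, Sequence


def fertility(lines: Iterable[str], tokenize: Callable[[str], Sequence]) -> float:
    """Average tokens per whitespace-delimited word over a corpus."""
    n_tokens = 0
    n_words = 0
    for line in lines:
        n_words += len(line.split())
        n_tokens += len(tokenize(line))
    if n_words == 0:
        raise ValueError("corpus contains no words")
    return n_tokens / n_words


def relative_reduction(ours: float, baseline: float) -> float:
    """Fraction by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline


if __name__ == "__main__":
    # Scores from the table: bpe_32k 1.67 vs the best baseline, XLM-R 2.18.
    print(round(relative_reduction(1.67, 2.18), 3))
```

With 1.67 against 2.18 the reduction is about 23.4%, and every other baseline in the table is further away.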

▌ QWEN2.5-0.5B — SURGERY REPORT

Base model	Qwen2.5-0.5B
Parameters	494M
Layers / Hidden	24 / 896
Attention	GQA (14Q / 2KV)
Vocab in → out	151,665 → 182,431
New tokens	30,766
Init strategy	Heuristic / FOCUS
LoRA target	Q · K · V · O, ×24
Train data	500K Armenian lines
Final PPL (HY)	8.33

▌ TEAM

Albert Hakobyan · Levon Gevorgyan · Robert Gadukyan · Silva Vardanyan

ARCHIVE · THE VAULT

[ HYBRID RAG ENGINE · NOW OPEN SOURCE · FREE / LOCAL ]

IN SERVICE · DAILY DRIVER ◈ NOW OPEN SOURCE

HYBRID-RETRIEVAL RAG OVER
MY ENTIRE EDUCATION

Grounded, cited answers drawn from my own Obsidian vault, not the open internet. My private instance indexes a 100GB knowledge base of everything I have studied, with citations back to the exact page or heading. The engine behind it is now public: a config-driven hybrid-retrieval stack anyone can point at their own corpus. Zero running cost.

OPEN SOURCE · Advanced-Obsidian-RAG
PRIVATE INSTANCE · 175K+ chunks · never shipped

CHUNKS INDEXED

4,500+ DOCS · MINE

VAULT SIZE

100GB

OBSIDIAN KNOWLEDGE BASE

KNOWLEDGE DOMAINS

NLP → DEVOPS

RUNNING COST

FREE / LOCAL STACK

▌ RETRIEVAL PIPELINE

HyDE

query expansion

→

DENSE

ChromaDB · top-20

→

SPARSE

BM25 · top-20

→

RRF FUSION

+ domain boost

→

RERANK

cross-encoder · top-5

→

GROUNDED GEN

[n] cites · confidence
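The RRF FUSION stage merges the dense and sparse result lists by rank rather than by raw score, so the two retrievers do not need comparable scoring scales. A minimal sketch of reciprocal rank fusion follows. The constant k = 60 is the conventional default and an assumption here, and the domain boost and reranking stages are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 5) -> List[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


if __name__ == "__main__":
    dense = ["a", "b", "c"]   # hypothetical ids from the dense retriever
    sparse = ["b", "d", "a"]  # hypothetical ids from BM25
    print(rrf_fuse([dense, sparse], top_n=4))
```

A document that appears in both lists outranks one that tops only a single list, which is the behaviour a hybrid retriever usually wants.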

▌ ENGINEERING LOG

Serve agents JSON, serve humans HTML. The moment an AI agent tried to "click" the web UI, the vault grew a warm JSON API — one HTTP call now replaces an entire browser session.

Memory-bounded indexing. The sparse index was rebuilt on sparse-matrix foundations, cutting its memory footprint several-fold with zero change in retrieval behavior.

Idempotent ingestion. Content-hashed document IDs make every ingest pass safely re-runnable — new material appends, nothing duplicates, and both indexes rebuild from one source of truth.

The cheapest lever wins. Large-scale corrections are applied as in-place metadata updates rather than re-embedding the corpus — maintenance costs minutes, not compute.
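The idempotent-ingestion entry above can be illustrated with content-derived IDs: if the ID is a hash of the content, re-ingesting the same material maps to the same key and is skipped. This is a minimal sketch with an in-memory dictionary standing in for the real indexes. The choice of SHA-256 and the function names are assumptions.

```python
import hashlib
from typing import Dict


def doc_id(content: str) -> str:
    """Derive a stable ID from the content itself (SHA-256 is an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def ingest(store: Dict[str, str], content: str) -> bool:
    """Add content if unseen. Returns True when something new was stored."""
    key = doc_id(content)
    if key in store:
        return False  # same content already indexed, so re-running is a no-op
    store[key] = content
    return True


if __name__ == "__main__":
    store: Dict[str, str] = {}
    print(ingest(store, "lecture 1 notes"))  # first pass stores it
    print(ingest(store, "lecture 1 notes"))  # second pass changes nothing
```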

▌ THE MISSION

"What do I know about X? How did I implement Y?" The vault answers from my own materials — with citations back to the exact chapter, page, or heading.

Every answer carries its source.

▌ INSIDE THE VAULT

Lecture notes · 280+ textbooks · passed coursework · current-course slides · code notebooks · OCR'd scans · a self-study software-engineering library — every format parsed into one unified chunk schema.

▌ ONE ENGINE · MANY DOORS

CLI · a Corpus console for ingest, retag and live querying · a warm FastAPI JSON API for agents with schema discovery. Every retrieval knob is a config preset, and a 3-tier eval suite scores retrieval, answers, and confidence calibration.

▌ STACK

ChromaDB · bge-small-en-v1.5 (local CPU embeddings) · bm25s · RRF fusion · cross-encoder reranking · HyDE · Tesseract / DeepSeek-OCR · FastAPI · Docker Compose · versioned YAML prompts

Repo	Advanced-Obsidian-RAG
License	LGPL-2.1
Deploy	docker compose up
Corpus	4,500+ docs · 17 code langs

AGENT · ARGUS

[ LANGGRAPH RESEARCH ENGINE · TELEGRAM · CITED REPORTS ]

PROTOCOL · DEEP RESEARCH

ONE TELEGRAM COMMAND IN —
A CITED RESEARCH REPORT OUT

A multi-agent research engine that turns a single question into a verified, cited report in Markdown and PDF. Live multi-source discovery, evidence-first synthesis, a three-judge review panel, and two human-in-the-loop gates — resumable across restarts, running on free-tier models.

GITHUB argus-research-bot

GRAPH NODES

10+

LANGGRAPH ENGINE

REVIEW PANEL

JUDGES · MERGE

HUMAN GATES

PLAN · REPORT

CODEBASE

21.5K

LOC · 39 TEST FILES

▌ RESEARCH GRAPH

INTAKE

quick / deep

→

SCOUT

multi-source

→

⟨ PLAN GATE ⟩

human approve

→

RESEARCH

fetch · digest

→

COMPOSE

parallel writers

→

3-JUDGE PANEL

revise loop

→

⟨ REPORT GATE ⟩

preview

→

MD + PDF

delivered

▌ SYSTEM DESIGN

Evidence-first synthesis. A cheap-tier model reads every fetched document into structured evidence notes — claims, supporting quotes, a 0–5 relevance score, a stance — and section writers compose only from those notes using [n] source IDs. Nothing reaches the report unread.

A plan gate you can trust. The graph pauses after live discovery, so the plan preview shows the real sources that search actually found. The brief never contains URLs, so there is nothing for the model to hallucinate.

Three judges, one verdict. Grounding, coverage, and precision judges score every draft with family-diverse model routing; a deterministic merge forces section-targeted revision on grounding failures, capped to stay bounded.

Discovery past text. Fetches and transcribes video from YouTube, X, Reddit and Instagram (async yt-dlp, captions or local faster-whisper ASR) and folds transcripts into the same evidence base as crawled articles.

Citation integrity + resumability. A 5-strategy URL verification cascade with domain-trust scoring guards every claim, and each run is checkpointed in SQLite (async) so it survives bot restarts.
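The "three judges, one verdict" step can be pictured as a pure function over per-section scores: grounding failures trigger a revision aimed at the failing sections, and a revision cap keeps the loop bounded. The sketch below is one possible shape for such a merge, not the project's code. The threshold, the cap, the score scale, and every name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

GROUNDING_MIN = 0.7  # hypothetical pass threshold on a 0-1 scale
MAX_REVISIONS = 2    # hypothetical cap that keeps the revise loop bounded


@dataclass(frozen=True)
class SectionScores:
    grounding: float
    coverage: float   # recorded, but does not force a revision in this sketch
    precision: float  # recorded, but does not force a revision in this sketch


def merge(scores: Dict[str, SectionScores], revisions_done: int) -> Dict[str, object]:
    """Deterministic verdict: revise sections that fail grounding, up to the cap."""
    failing: List[str] = sorted(
        name for name, s in scores.items() if s.grounding < GROUNDING_MIN
    )
    if failing and revisions_done < MAX_REVISIONS:
        return {"action": "revise", "sections": failing}
    return {"action": "accept", "sections": []}
```

Because the merge uses no model call and no randomness, the same scores always produce the same verdict, which makes the review step straightforward to test.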

▌ THE FLOW

Send one command. Argus scouts, plans, researches, writes, reviews, and delivers, pausing only twice: once so you can edit and approve the plan, and once so you can preview the report before it ships. Outputs can be extended or revised afterwards. Downloaded media can also be transcribed (from existing captions, or via faster-whisper when none are available), and the resulting text is appended to the research materials.

▌ MODEL-TIER ROUTING

Three tiers — cheap · strong · judge — map onto whatever models the proxy exposes. Every call is logged as requested vs. served, with graceful fallback to auto. Swap the whole roster from config.
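The tier routing described above amounts to a lookup from tier to model, a fallback when the requested model is not available, and a log line recording both. The sketch below shows a client-side version of that idea. The model names are placeholders, and whether the fallback is decided by the client or by the proxy is not stated in the source.

```python
import logging
from typing import Dict, Optional, Set, Tuple

logger = logging.getLogger("routing")

# Hypothetical roster; in practice this would be loaded from config.
ROSTER: Dict[str, str] = {
    "cheap": "small-model",
    "strong": "large-model",
    "judge": "judge-model",
}
FALLBACK = "auto"


def route(tier: str, available: Set[str],
          roster: Optional[Dict[str, str]] = None) -> Tuple[str, str]:
    """Return (requested, served), falling back to 'auto' when unavailable."""
    requested = (roster or ROSTER)[tier]
    served = requested if requested in available else FALLBACK
    logger.info("tier=%s requested=%s served=%s", tier, requested, served)
    return requested, served
```

Swapping the whole roster then means passing a different mapping, with no change to the calling code.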

▌ COMMAND DECK

COMMAND	ACTION
/research	deep cited report
/ask	quick answer
/status	run progress
/fetch url..	download links (media from YouTube / X / Instagram / Reddit)
/transcript	captions for /find picks
/quality	change download quality
/cancel	abort run

▌ STACK

Python · LangGraph · python-telegram-bot (async) · Exa · DDGS · arXiv · GitHub · crawl4ai · yt-dlp · faster-whisper · ReportLab · SQLite (AsyncSqliteSaver)

Repo	argus-research-bot
Scale	~21.5K LOC · 39 test files
CI	on every push
Discovery	Exa · DDGS · arXiv · GitHub

AGENT · TEACHING ASSISTANT

[ TELEGRAM BOT · AUA NLP PROJECT · LOCAL LLM · MIT ]

APPLIED · DEPLOYED

LECTURE SLIDES IN —
STUDY SESSION OUT

An agentic Telegram bot: upload a lecture PDF, receive a personalized, reviewer-approved deep-work study plan in your inbox · powered by a local Mistral-Nemo 12B — zero API cost, private by default · works on slides from any subject

▌ SIX-AGENT PIPELINE

SLIDE PARSER

PyMuPDF · summary

→

CONCEPT MAP

5–7 core concepts

→

WEB RESEARCH

DuckDuckGo · 3–5 links

→

PLANNER

timed session

→

REVIEWER

QA · pass / revise

→

SMTP · on approval

▌ CAPABILITIES

Human-in-the-loop control. Inline Send / Adjust / Apply-Fixes buttons: free-text feedback revises the plan, or the reviewer's own suggestions are auto-fed back to produce a corrected V2 — looping until the user is satisfied. Email is blocked until the QA verdict is PASS.

Self-reviewing agent. A dedicated reviewer agent audits every plan for timing, grounding in the slides, realism, and completeness before anything ships.

Concept-aware research. Web queries are derived from the LLM-built concept map, so external resources target the lecture's actual ideas — each link delivered with a justification.

Bilingual output. Session plans generated in English or Armenian, chosen during the /plan flow alongside duration, audience, and delivery email.

Local-first serving. Mistral-Nemo Instruct 12B (Q4_K_M GGUF · 32K context) behind an OpenAI-compatible KoboldCpp endpoint — swap models or servers by changing one env var.
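The review gate described above, where email stays blocked until the QA verdict is PASS, can be sketched as a small loop that alternates review and revision. This is an illustration only: in the bot the user drives each round through buttons, whereas this sketch automates the loop and adds a round cap. The reviewer and reviser are injected callables, and all names are hypothetical.

```python
from typing import Callable, Tuple

MAX_ROUNDS = 3  # hypothetical cap; the source describes a user-driven loop


def run_until_pass(
    draft: str,
    review: Callable[[str], Tuple[str, str]],  # returns (verdict, feedback)
    revise: Callable[[str, str], str],         # returns a corrected plan
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[str, bool]:
    """Return (plan, may_send). Sending is allowed only after a PASS verdict."""
    plan = draft
    for _ in range(max_rounds):
        verdict, feedback = review(plan)
        if verdict == "PASS":
            return plan, True
        plan = revise(plan, feedback)
    return plan, False  # never passed review, so delivery stays blocked


if __name__ == "__main__":
    # Toy reviewer: passes any plan that already contains the word "fixed".
    toy_review = lambda p: ("PASS", "") if "fixed" in p else ("REVISE", "add fix")
    toy_revise = lambda p, fb: p + " fixed"
    print(run_until_pass("v1", toy_review, toy_revise))
```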

▌ UTILITY

Turns any raw lecture deck into a ready-to-run deep-work session — objectives, timed exercises, and vetted external resources — in one Telegram conversation. The study plan a TA would write, generated and quality-checked on a laptop.

▌ COMMAND DECK

COMMAND	ACTION
/plan	full pipeline run
/conceptmap	map slide concepts
/research	standalone web search
/status	pipeline progress
/send	re-send approved plan

▌ STACK

Python · python-telegram-bot (async) · PyMuPDF · DuckDuckGo search · SMTP delivery · KoboldCpp / llama.cpp serving — MIT-licensed and fully reproducible.

INSTRUMENTS

[ STACK · TOOLS · ENVIRONMENT ]

▌ LANGUAGES

Python · R · SQL · T-SQL · DAX

▌ ML / DL

PyTorch · TensorFlow · scikit-learn · Hugging Face · Gymnasium

▌ NLP

NLTK · spaCy · Transformers · mBERT · Trax · Stanza · SentencePiece

▌ DATA

Pandas · NumPy · dplyr · tidyr · Stanza

▌ VISUALIZATION

Matplotlib · Seaborn · ggplot2 · Plotly · Power BI · Tableau

▌ DATABASES

PostgreSQL · MySQL · MongoDB · SQL Server · SQLAlchemy

▌ BACKEND

FastAPI · Pydantic · Docker · Docker Compose

▌ DEV TOOLS

Git · GitHub · Obsidian · Streamlit · MkDocs

▌ APPLIED

Open-source RAG engine (hybrid retrieval · 175K+ chunks) · Agentic Telegram bots (LangGraph · local LLM) · Web scraping · API integration

▌ EDUCATION

AUA

B.S. IN DATA SCIENCE

AMERICAN UNIVERSITY OF ARMENIA · 2022 — 2026 · TRACK: BUSINESS ANALYTICS

24 AUA courses spanning statistics, ML/AI, NLP, RL, time series, BI, marketing analytics, databases, visualization, and mathematical foundations. Capstone research in low-resource NLP.

▌ CERTIFICATIONS

TRANSMIT A SIGNAL.

REPLY TIME · YEREVAN HOURS · LAB ONLINE

EMAIL THE LAB · LINKEDIN · GITHUB · HUGGING FACE · CV
