Capstone author of the first neural system for Armenian participle phrase punctuation. I run an LLM teacher (Gemini) over 22M scraped sentences and distill its judgment into a 48.5M-parameter BiLSTM and a fine-tuned mBERT. The ensemble runs at 1000+ sentences/sec on a laptop CPU, within 2.5% of the teacher, at zero marginal cost.
Armenian participle phrases ending in -ելով / -ալով / -ած require position-dependent punctuation. Errors change meaning.
Նա տեսնելով Արմենին տխրեց
↓
Նա, տեսնելով Արմենին, տխրեց:
("He, seeing Armen, became sad.")
| POSITION | MARK |
|---|---|
| Intraposition | , p.phrase , |
| Pre-position | p.phrase ՝ V |
| Post-position | V ՝ p.phrase |
| Adverbial | adv , R1 |
| Relative | , rp , R1 |
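One way to frame the task for a sequence model is to tag every token with the mark that follows it. The sketch below is an illustration only: the label scheme, the `MARKS` set, and both function names are assumptions, not the capstone's actual preprocessing.

```python
# Hypothetical label scheme: each token is tagged with the mark that follows it.
# Marks covered: comma, Armenian but (՝), and the sentence-final mark (: or ։).
MARKS = {",", "՝", ":", "։"}

def to_labels(sentence):
    """Split a punctuated sentence into (token, following_mark) pairs."""
    pairs = []
    for raw in sentence.split():
        mark = ""
        # Peel one trailing punctuation mark off the token, if present.
        if raw[-1] in MARKS:
            raw, mark = raw[:-1], raw[-1]
        if raw:
            pairs.append((raw, mark))
    return pairs

def strip_marks(sentence):
    """Recover the unpunctuated input a model would see at inference time."""
    return " ".join(token for token, _ in to_labels(sentence))

punctuated = "Նա, տեսնելով Արմենին, տխրեց:"
pairs = to_labels(punctuated)
labels = [mark for _, mark in pairs]
plain = strip_marks(punctuated)
```

Here `plain` is the unpunctuated sentence from the example above, and `labels` holds one target per token, with an empty string meaning "no mark".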
Grafted 30,766 Armenian tokens onto Qwen2.5-0.5B. Trained custom SentencePiece tokenizers, initialized new embeddings three different ways, then recovered the model with LoRA rank-16 on 500K Armenian lines. Final perplexity 8.33 · token count reduced 78.3%.
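A common way to initialize an embedding for a newly added token is to average the vectors of the pieces the old tokenizer would have split it into. The sketch below shows that idea in plain Python. Whether this is one of the three strategies used here is an assumption, and `mean_init`, `old_tokenize`, and `old_embeddings` are hypothetical names with toy data.

```python
def mean_init(new_token, old_tokenize, old_embeddings):
    """Initialize a new token's vector as the mean of its old sub-token vectors."""
    pieces = old_tokenize(new_token)
    vectors = [old_embeddings[piece] for piece in pieces]
    dim = len(vectors[0])
    # Average each dimension across all sub-token vectors.
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

# Toy stand-ins: a character-level "old tokenizer" and 2-dimensional embeddings.
old_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
old_tokenize = list  # splits a string into its characters

new_vec = mean_init("ab", old_tokenize, old_embeddings)
```

Starting new rows near the existing embedding distribution tends to leave less for the recovery fine-tune to repair than random initialization does.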
We measured how aggressively each tokenizer fragments Armenian text on 516,860 words from CC-100 (lower is better). Our trained SentencePiece BPE-32k beats every baseline by more than 23%.
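Fragmentation of this kind is often reported as tokens per word (sometimes called fertility). The sketch below assumes that metric. The function names and the toy tokenizers are illustrative, not the benchmark code.

```python
def fertility(words, tokenize):
    """Average number of tokens a tokenizer produces per word (lower is better)."""
    total_tokens = sum(len(tokenize(word)) for word in words)
    return total_tokens / len(words)

def relative_reduction(baseline, candidate):
    """Fraction by which the candidate improves on the baseline score."""
    return (baseline - candidate) / baseline

# Toy comparison: a character-level tokenizer vs. one that keeps words whole.
words = ["ab", "abcd"]
char_level = fertility(words, list)               # every character is a token
whole_word = fertility(words, lambda w: [w])      # every word is one token
gain = relative_reduction(char_level, whole_word)
```

With a real tokenizer, `tokenize` would wrap its encode call, for example a function that returns the list of pieces for one word.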
"What do I know about X? How did I implement Y?" The vault answers from my own materials — with citations back to the exact chapter, page, or heading.
Every answer carries its source.
Lecture notes · 280+ textbooks · passed coursework · current-course slides · code notebooks · OCR'd scans · a self-study software-engineering library — every format parsed into one unified chunk schema.
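A unified chunk schema could look roughly like the dataclass below. The field names and the `citation` format are assumptions chosen to mirror the "chapter, page, or heading" citations described above, not the vault's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Chunk:
    """One retrievable passage plus the metadata needed to cite it (hypothetical fields)."""
    text: str
    source: str                    # file or book title
    source_type: str               # e.g. "textbook", "slides", "notebook"
    chapter: Optional[str] = None
    page: Optional[int] = None
    heading: Optional[str] = None

    def citation(self) -> str:
        """Build a citation from whichever locators this chunk has."""
        parts = [self.source]
        if self.chapter:
            parts.append(f"ch. {self.chapter}")
        if self.page is not None:
            parts.append(f"p. {self.page}")
        if self.heading:
            parts.append(f'"{self.heading}"')
        return ", ".join(parts)

chunk = Chunk(text="Bias-variance tradeoff ...", source="notes.pdf",
              source_type="lecture notes", page=12)
cite = chunk.citation()
```

Keeping locators optional lets slides (heading only), scans (page only), and textbooks (chapter and page) share one type.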
CLI · Streamlit study cockpit · warm FastAPI JSON endpoint for agents · Telegram access on the roadmap. The pipeline loads once and stays warm behind every front-end.
Turns any raw lecture deck into a ready-to-run deep-work session — objectives, timed exercises, and vetted external resources — in one Telegram conversation. The study plan a TA would write, generated and quality-checked on a laptop.
| COMMAND | ACTION |
|---|---|
| /plan | full pipeline run |
| /conceptmap | map slide concepts |
| /research | standalone web search |
| /status | pipeline progress |
| /send | re-send approved plan |
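The command table maps naturally onto a dispatch dictionary. The sketch below uses only the standard library and does not reproduce the python-telegram-bot API. `parse_command` and `describe` are hypothetical helpers.

```python
# Command table copied from above; values are the documented actions.
COMMANDS = {
    "/plan": "full pipeline run",
    "/conceptmap": "map slide concepts",
    "/research": "standalone web search",
    "/status": "pipeline progress",
    "/send": "re-send approved plan",
}

def parse_command(message):
    """Return (command, argument_text), or (None, text) for non-command messages."""
    text = message.strip()
    if not text.startswith("/"):
        return None, text
    head, _, rest = text.partition(" ")
    # In group chats Telegram appends "@botname" to commands; drop it.
    command = head.split("@", 1)[0].lower()
    return command, rest.strip()

def describe(message):
    """Look up the action a message would trigger."""
    command, _ = parse_command(message)
    if command is None:
        return "not a command"
    return COMMANDS.get(command, "unknown command")

action = describe("/plan lecture3.pdf")
```

In a real bot each dictionary value would be a handler function rather than a description string.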
Python · python-telegram-bot (async) · PyMuPDF · DuckDuckGo search · SMTP delivery · KoboldCpp / llama.cpp serving — MIT-licensed and fully reproducible.
24 AUA courses spanning statistics, ML/AI, NLP, RL, time series, BI, marketing analytics, databases, visualization, and mathematical foundations. Capstone research in low-resource NLP.