HO - AI Engineer Expert
🚀 Key Responsibilities
Architecture & Technical Design
- Own end-to-end architecture for the team’s RAG, Agentic and Multi-Agent systems - retrieval strategy, agent orchestration patterns (Planner–Executor, Router, Verifier, Reviewer), tool-calling layer and state management.
- Make and defend technology choices (vector DB, embedding model, orchestration framework, observability stack) with explicit trade-off analysis on capability, cost, latency and compliance.
- Set engineering standards (coding conventions, inter-agent contracts, MCP tool schemas, minimum observability bar) and write ADRs for high-impact decisions.
- Design reusable components (retrieval templates, MCP server skeletons, guardrail middleware, evaluation harnesses) so engineers don’t rebuild foundations for each product.
Hands-on Engineering
- Personally write code for the hardest components: agent orchestration core, tool-calling middleware, complex retrieval, guardrail engine and tricky integrations with bank legacy systems.
- Prototype new architectural ideas and own the hard production incidents the team can’t crack - hallucination edge cases, retrieval regressions, vLLM latency spikes, schema-breaking tool-call failures.
- Stay hands-on: expect to spend 40–50% of your time on direct coding. This is not a technical PM seat.
Mentorship & Code Review
- Run high-quality code reviews that teach failure modes, cost awareness, observability and maintainability, rather than only checking logic.
- Mentor engineers 1-on-1 on technical growth and design decisions.
Stakeholder Representation, R&D and Reliability
- Be the team’s technical face with Risk, Compliance, IT Infrastructure and Business - translating regulatory and business constraints into design decisions, and explaining technical trade-offs back to stakeholders.
- Drive technical R&D: evaluate emerging techniques, run benchmarks, separate hype from value, and contribute to the division’s technical roadmap.
- Own production quality KPIs (hallucination rate, retrieval recall@k, tool success rate, latency, uptime); maintain evaluation frameworks that gate deployments; lead incident response and post-mortems.
- Enforce banking-grade non-functional requirements: auditability, explainability and end-to-end traceability for internal audit and legal.
💼 Core Requirements
Must-Have
- Bachelor’s degree or higher in Computer Science, AI, Data Science or a related field.
- At least 10 years of professional software engineering experience, including at least 3 years of hands-on production work on LLM systems / RAG / Agentic AI. Shipped at least 2 AI systems to production with real users - not POCs or demos.
- Production-grade Python: async-first (asyncio, aiohttp), Pydantic data modeling, clean modular design.
- Real RAG / Agentic experience: has debugged retrieval-quality issues in production and understands the difference between cosine similarity and dot product.
- LangChain / LangGraph: fluent with the state-machine model, custom nodes and practical edge cases.
- Vector databases & LLM serving: non-trivial production pipelines on Qdrant, Milvus or Pinecone (index schema, namespaces, metadata filtering); hands-on deployment of open-source LLMs via vLLM with concurrency, batching and quantization (AWQ/GPTQ) tuned to cost-vs-latency targets.
- MCP & containerization: designed or implemented MCP servers / tool schemas for agent consumption in production; comfortable with Docker, Compose and Kubernetes basics - can package and ship a service end-to-end.
- LLM observability: designed observability for new systems with Langfuse, W&B or LangSmith - not just used existing setups.
- Systems thinking & technical leadership: thinks in failure modes, blast radius and tail latency; writes clear design docs/ADRs; defends decisions with data; has mentored engineers with concrete impact; works effectively with non-technical stakeholders.
Nice-to-Have
- Experience in banking, financial services or other regulated industries.
- Hands-on fine-tuning (LoRA, QLoRA) for domain-specific tasks, especially Vietnamese or financial terminology.
- Experience with the Qwen series in on-premise or air-gapped environments; or production work with Voicebot, OCR/VLM or other multimodal AI.
- Open-source contributions to AI tooling (LangChain, LangGraph, vLLM, Qdrant or similar).