HO - AI Engineer Expert

🚀 Key Responsibilities

Architecture & Technical Design

  • Own end-to-end architecture for the team's RAG, Agentic and Multi-Agent systems - retrieval strategy, agent orchestration patterns (Planner–Executor, Router, Verifier, Reviewer), tool-calling layer and state management.
  • Make and defend technology choices (vector DB, embedding model, orchestration framework, observability stack) with explicit trade-off analysis on capability, cost, latency and compliance.
  • Set engineering standards (coding conventions, inter-agent contracts, MCP tool schemas, minimum observability bar) and write ADRs for high-impact decisions.
  • Design reusable components (retrieval templates, MCP server skeletons, guardrail middleware, evaluation harnesses) so engineers don't rebuild foundations for each product.

Hands-on Engineering

  • Personally write code for the hardest components: agent orchestration core, tool-calling middleware, complex retrieval, guardrail engine and tricky integrations with the bank's legacy systems.
  • Prototype new architectural ideas and own the hard production incidents the team can't crack: hallucination edge cases, retrieval regressions, vLLM latency spikes, schema-breaking tool-call failures.
  • Stay hands-on: expect 40–50% of your time on direct coding. This is not a technical PM seat.

Mentorship & Code Review

  • Run high-quality code reviews that teach failure modes, cost awareness, observability and maintainability, rather than only checking logic.
  • Mentor engineers 1-on-1 on technical growth and design decisions.

Stakeholder Representation, R&D and Reliability

  • Be the team's technical face with Risk, Compliance, IT Infrastructure and Business - translating regulatory and business constraints into design decisions, and explaining technical trade-offs back to stakeholders.
  • Drive technical R&D: evaluate emerging techniques, run benchmarks, separate hype from value, and contribute to the division's technical roadmap.
  • Own production quality KPIs (hallucination rate, retrieval recall@k, tool success rate, latency, uptime); maintain evaluation frameworks that gate deployments; lead incident response and post-mortems.
  • Enforce banking-grade non-functional requirements: auditability, explainability and end-to-end traceability for internal audit and legal.

💼 Core Requirements

Must-Have

  • Bachelor's degree or higher in Computer Science, AI, Data Science or a related field.
  • At least 10 years of professional software engineering, including at least 3 years of hands-on production work on LLM systems / RAG / Agentic AI. Has shipped at least 2 AI systems to production with real users - not POCs or demos.
  • Production-grade Python: async-first (asyncio, aiohttp), Pydantic data modeling, clean modular design.
  • Real RAG / Agentic experience: has debugged retrieval quality issues in production and understands the difference between cosine similarity and dot-product scoring.
  • LangChain / LangGraph: fluent with the state-machine model, custom nodes and practical edge cases.
  • Vector databases & LLM serving: non-trivial production pipelines on Qdrant, Milvus or Pinecone (index schema, namespaces, metadata filtering); hands-on deployment of open-source LLMs via vLLM with concurrency, batching and quantization (AWQ/GPTQ) tuned to cost-vs-latency targets.
  • MCP & containerization: designed or implemented MCP servers / tool schemas for agent consumption in production; comfortable with Docker, Compose and Kubernetes basics - can package and ship a service end-to-end.
  • LLM observability: has designed observability for new systems with Langfuse, W&B or LangSmith - not just used existing setups.
  • Systems thinking & technical leadership: thinks in failure modes, blast radius and tail latency; writes clear design docs/ADRs; defends decisions with data; has mentored engineers with concrete impact; works effectively with non-technical stakeholders.

Nice-to-Have

  • Experience in banking, financial services or other regulated industries.
  • Hands-on fine-tuning (LoRA, QLoRA) for domain-specific tasks, especially Vietnamese or financial terminology.
  • Experience with the Qwen series in on-premise or air-gapped environments; or production work with Voicebot, OCR/VLM or other multimodal AI.
  • Open-source contributions to AI tooling (LangChain, LangGraph, vLLM, Qdrant or similar).
