Machine Learning Engineer
NeoITO
Job Description
AI / ML Engineer – SLM & RAG Specialist
Location: Trivandrum (Kerala)
Company: NeoITO
Experience: 5+ Years
About the Role
NeoITO is hiring an AI / ML Engineer to build and own an AI-powered Proposal & RFP generation system designed to transform meeting notes into structured, client-ready proposals within minutes.
You will be responsible for designing and managing the core AI layer, including the inference engine, RAG pipeline, embedding models, and compliance validation system.
You will collaborate closely with backend (Node.js) and frontend (React) engineers to deliver a production-ready AI system within a defined delivery timeline.
Key Responsibilities
Model Deployment & Inference
- Deploy and manage Small Language Models (SLMs) on on-premise GPU infrastructure.
- Configure and optimize LLM inference pipelines using frameworks such as vLLM or HuggingFace Transformers.
- Implement token streaming, continuous batching, and optimized sampling strategies for reliable text generation.
- Apply quantization techniques (GPTQ/AWQ) to reduce GPU memory footprint while maintaining inference performance.
- Monitor GPU health and performance metrics, including VRAM usage, latency, and throughput.
Retrieval-Augmented Generation (RAG)
- Design and implement RAG pipelines to enable context-aware proposal generation.
- Build text chunking pipelines and generate embeddings using sentence-transformer models.
- Store and retrieve vector embeddings using PostgreSQL with pgvector.
- Implement semantic similarity search to retrieve relevant historical proposal data.
- Continuously evaluate and optimize retrieval quality and performance.
AI-Driven Proposal Generation
- Design structured pipelines to generate multi-section proposals, including:
  - Executive Summary
  - Project Scope
  - Technical Approach
  - Implementation Timeline
  - Investment Summary
  - Risk Mitigation
- Create section-specific prompts and templates for high-quality generation.
- Implement real-time streaming of generated responses to backend services.
- Support partial regeneration of sections for iterative proposal refinement.
AI Quality, Validation & Compliance
- Develop a validation engine to ensure generated content meets compliance and quality standards.
- Implement rule-based checks, including:
  - Client name verification
  - Budget reference validation
  - Section completeness
  - Sensitive data detection
- Support an optional AI-based review layer for deeper quality checks.
- Deliver structured feedback and annotations for use within editing workflows.
Prompt Engineering & Model Optimization
- Design and maintain structured prompts for classification, generation, and validation tasks.
- Conduct iterative prompt optimization to improve accuracy, tone, and consistency.
- Maintain prompt versioning and regression testing frameworks.
- Evaluate output quality through structured human evaluation metrics.
Fine-Tuning & Model Improvement
- Lead fine-tuning initiatives to improve model performance over time.
- Prepare and curate training datasets from finalized proposals.
- Implement LoRA / QLoRA fine-tuning strategies for efficient model updates.
- Track experiments and model versions using tools such as MLflow.
Collaboration & Engineering Practices
- Expose AI capabilities via FastAPI services consumed by backend applications.
- Collaborate with backend teams on job orchestration, queue processing, and event streaming.
- Implement unit tests and quality checks for ML pipelines.
- Contribute to containerized deployment environments using Docker.
- Support CI/CD pipelines with automated testing and linting workflows.
Required Skills & Experience
Large Language Models & AI Systems
- Hands-on experience with LLMs or SLMs
- Experience deploying models using vLLM, HuggingFace Transformers, or similar frameworks
- Knowledge of quantization techniques and inference optimization
RAG & Vector Search
- Experience building Retrieval-Augmented Generation pipelines
- Knowledge of vector search tools such as pgvector, FAISS, or similar
- Familiarity with embedding models and semantic search
Programming & Frameworks
- Strong Python development experience
- Experience with FastAPI, Pydantic, and PyTorch
- Knowledge of libraries such as sentence-transformers, LangChain, or LlamaIndex
Infrastructure & GPU Systems
- Experience working with GPU-based model deployment
- Familiarity with CUDA environments and GPU monitoring
- Experience deploying applications with Docker on Linux environments
Databases & Storage
- Experience with PostgreSQL
- Familiarity with vector extensions or vector search databases
- Knowledge of object storage solutions such as S3 or MinIO
MLOps & Model Lifecycle
- Experience with LoRA / QLoRA fine-tuning
- Familiarity with experiment tracking tools
- Knowledge of dataset preparation and model evaluation
Nice to Have
- Experience working with Meta Llama models
- Familiarity with document generation systems
- Experience with queue-based ML pipelines
- Exposure to secure enterprise environments requiring strict data governance
- Knowledge of observability tools such as Prometheus
In this role, you will:
- Deliver a fully functional AI proposal generation system running entirely on-premise
- Achieve high-quality, structured proposal outputs
- Ensure stable performance under concurrent usage
- Establish a foundation for continuous model improvement through fine-tuning
Tech Stack
Primary Language: Python
API Framework: FastAPI
LLM Inference: vLLM / Transformers
Embedding Models: Sentence Transformers
Vector Database: PostgreSQL + pgvector
GPU Infrastructure: NVIDIA GPU environments
Containerization: Docker
Monitoring: Prometheus
Testing: Pytest