**Company** is pleased to announce an opening for the position of **Applied AI Developer (LoRA/QLoRA Fine-Tuning)**.
Location: Remote (anywhere)
Type: Contract/Full-Time
Hardware: 1× A100 GPU provided (40 GB or 80 GB)
Role Overview
We are looking for an Applied AI Developer experienced in parameter-efficient fine-tuning (LoRA/QLoRA) of LLMs. You will design, run, and optimize training pipelines on a single NVIDIA A100, with an emphasis on VRAM efficiency, reproducibility, and production-ready artifacts.
Your work will directly improve the adaptability of large models to domain-specific data while keeping costs and hardware requirements manageable.
Responsibilities:
- Fine-tune 7B-13B class models (LLaMA, Mistral, Gemma, etc.) with LoRA/QLoRA (a minimal setup sketch follows this list).
- Configure quantization-aware training (nf4/int4) and paged optimizers to minimize VRAM use.
- Apply gradient checkpointing, sequence packing, and FlashAttention to maximize throughput.
- Design reproducible training pipelines (PyTorch, Hugging Face Transformers, PEFT, bitsandbytes).
- Run experiments and ablations (different ranks, α values, sequence lengths) and document trade-offs.
- Export fine-tuned checkpoints for inference with vLLM/ExLlama2.
- Build lightweight FastAPI/Flask endpoints to serve models in production.
- Provide evaluation reports on domain performance (loss curves, F1, ROUGE, EM, etc.).
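For context, a minimal QLoRA setup along these lines could look like the sketch below. The base model name, rank, alpha, and target modules are illustrative assumptions, not a prescribed configuration:

```python
# Minimal QLoRA setup sketch (illustrative only; model and hyperparameters are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "mistralai/Mistral-7B-v0.1"  # assumption: any 7B-13B causal LM would do

# 4-bit NF4 quantization via bitsandbytes keeps the frozen base weights small.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Enable gradient checkpointing and prepare norms/embeddings for stable k-bit training.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

# LoRA adapters: rank/alpha here are common starting points, not required values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights should be trainable
```

In practice, a trainer (e.g. Hugging Face Trainer or TRL's SFTTrainer), a paged optimizer, and sequence packing would be layered on top of this skeleton.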
Requirements:
- Strong experience with PyTorch + Hugging Face (Transformers, PEFT, Accelerate).
- Hands-on experience with LoRA/QLoRA (bitsandbytes, nf4/int4 quantization).
- Deep understanding of GPU memory optimization: optimizer offload, gradient accumulation, ZeRO/FSDP basics.
- Practical knowledge of attention efficiency (FlashAttention, xFormers).
- Ability to explain and control the VRAM budget during training (a back-of-envelope sketch follows this list).
- Comfort with Dockerized pipelines and the Linux CLI (CUDA, NCCL, drivers).
- Familiarity with model serving (vLLM, TGI, ExLlama2) for low-latency inference.
- Solid grasp of evaluation methods for instruction-tuned models.
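As a rough illustration of the kind of VRAM reasoning expected, the numbers below are ballpark assumptions for a 7B QLoRA run, not measurements:

```python
# Back-of-envelope VRAM estimate for QLoRA on a 7B model (illustrative, not measured).
params_b = 7.0                    # base model parameters, in billions
base_weights_gb = params_b * 0.5  # ~0.5 bytes/param with 4-bit NF4 (plus small quant overhead)

lora_params_m = 40                # adapter size depends on rank and target modules; assumption
lora_gb = lora_params_m * 1e6 * 2 / 1e9        # bf16 adapter weights
optimizer_gb = lora_params_m * 1e6 * 8 / 1e9   # Adam m/v states (fp32) for the adapters only

activations_gb = 6.0              # dominated by batch size, sequence length, checkpointing

total_gb = base_weights_gb + lora_gb + optimizer_gb + activations_gb
print(f"~{total_gb:.1f} GB estimated vs. 40/80 GB available on the A100")
```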
Nice to Have:
- Experience with RAG systems (LangChain, LlamaIndex).
- Knowledge of RLHF/DPO/ORPO pipelines.
- MLOps: CI/CD, Weights & Biases, MLflow tracking.
- Prior work in domain-specific instruction tuning (health, finance, legal, etc.).
Trial Project (Paid)
As part of the hiring process, candidates will:
- Fine-tune a 7B-9B model with QLoRA on a provided dataset.
- Deliver a LoRA checkpoint, inference API, and VRAM usage report (a minimal serving sketch follows this list).
- Show reproducibility with documented steps and logs.
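For reference, a minimal inference endpoint in the spirit of this deliverable might look like the sketch below. Paths, the base model, and the request schema are placeholder assumptions:

```python
# Minimal serving sketch: load a LoRA checkpoint and expose one /generate route.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "mistralai/Mistral-7B-v0.1"      # assumption: same base model used for fine-tuning
ADAPTER = "./checkpoints/lora-adapter"  # assumption: PEFT adapter directory from training

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)  # attach LoRA weights on top of the base
model.eval()

app = FastAPI()

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: Prompt):
    inputs = tokenizer(req.text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    return {"completion": tokenizer.decode(out[0], skip_special_tokens=True)}
```

For vLLM/ExLlama2 export, the adapter would typically be merged into the base weights first (e.g. merge_and_unload() followed by save_pretrained) rather than attached at load time.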
Compensation:
- Competitive contract rate based on experience.
- Opportunity for ongoing projects if successful.
How to apply:
Send your resume and cover letter to hr@makyo.co