**Job Title:** Lead AI Inference Engineer (100% Remote)
**Company:** Tether.io
**Location:** Georgia (100% Remote)
**About the Job:**
The role owns the inference backbone for QVAC’s local AI stack, focusing on the C++ systems layer that keeps model execution fast, reliable, and predictable on user hardware. Key responsibilities center on engineering quality at the runtime level: startup behavior, memory pressure, the throughput/latency balance, and long-session stability. The position requires defining and evolving the core abstractions for inference features so that new capabilities can be added without sacrificing performance or maintainability. This role suits individuals who enjoy low-level problem-solving, clear technical ownership, and building production infrastructure that other teams can trust. The work directly supports private, on-device AI experiences and contributes to the technical foundation for QVAC’s next generation of peer-to-peer AI products.
**Responsibilities:**
* Deploy machine learning models to edge devices using frameworks like llama.cpp, ggml, and ONNX.
* Collaborate with researchers to code, train, and transition models from research to production.
* Integrate AI features into existing products.
* Manage a cross-functional team (pod) comprising middleware (JS), foundation (C++), QA, and documentation engineers to deliver high-quality results.
* Regularly assess market positioning relative to similar products or platforms.
* Apply architect-level technical expertise to make robust architectural choices and maintain code quality.
* Ensure stable releases through precise internal release processes.
**Qualifications:**
* Excellent programming skills in C++.
* Strong experience with the llama.cpp and ggml inference engines.
* Good understanding of deep learning concepts and model architectures.
* Experience with transformers, LLMs, and diffusion models.
* Demonstrated ability to quickly learn and apply new technologies.
* Experience managing a small, specialized, cross-functional team (pod) of 3-5 people.
* Passion for building products that improve people’s lives.
* A degree in Computer Science, AI, Machine Learning, or a related field, with a proven track record in AI R&D.
**Bonus Points:**
* Extensive experience with JavaScript/TypeScript.
* Understanding of peer-to-peer (P2P) technology.
* Experience with Vulkan, Metal, or OpenCL.
* Experience with productionizing models.
