CozenTech is pleased to present an urgent job opportunity: AI NVIDIA Infrastructure Architect (Remote)
Why Apply Now?
Due to an immediate hiring need, qualified candidates who apply early will be fast-tracked through the hiring process. If your background and experience align with the role, we strongly encourage you to submit your resume promptly.
Job Title: AI Nvidia Infrastructure Architect
Duration: 6 Months
Location: Remote, USA
Job Description:
This role focuses on managing and optimizing our AI infrastructure, ensuring seamless operations, and providing guidance and training to our team members.
The ideal candidate will have hands-on experience with AI operations, infrastructure management, and a strong understanding of high-performance computing (HPC) environments.
This position emphasizes operational excellence and team education rather than strategic development or workload definition.
Key Responsibilities:
- Manage and maintain AI infrastructure, ensuring high availability and performance.
- Implement and optimize AI operations using tools like NVIDIA Mission Control and Run:ai.
  - NVIDIA Mission Control manages and monitors AI workloads running on NVIDIA systems, acting as a control center for AI projects.
  - Run:ai organizes and shares GPU resources efficiently so multiple users or teams can run AI jobs smoothly.
  - Together, they make running, scaling, and managing AI workloads easier and more automated.
- Collaborate with cross-functional teams to support AI workloads and ensure efficient resource utilization.
- Provide training and mentorship to team members on AI infrastructure tools and best practices.
- Monitor system performance and troubleshoot issues to minimize downtime and optimize resource allocation.
- Assist in the deployment and scaling of AI models and applications.
- Stay updated with the latest advancements in AI infrastructure technologies and recommend improvements.
- Document processes, configurations, and best practices for AI infrastructure management.
Required Skills and Qualifications:
- Proven experience in managing AI infrastructure and operations.
- Proficiency with NVIDIA Mission Control/Bright Cluster Manager and Run:ai.
- Proficiency with Linux operating systems such as Ubuntu and RHEL.
- Strong understanding of high-performance computing (HPC) environments.
- Experience with cloud platforms and on-premises infrastructure.
- Excellent problem-solving skills and attention to detail.
- Ability to work collaboratively in a team environment and communicate effectively.
- Experience in training and mentoring technical teams.
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
Preferred Qualifications:
- Experience with containerization technologies such as Docker and Kubernetes.
- Familiarity with AI frameworks and libraries (e.g., TensorFlow, PyTorch).
- Knowledge of network and storage solutions for AI workloads.
- Familiarity with job schedulers such as SLURM.
To apply, please send your resume to vignesh@cozentech.com.
Best regards



