Machine Learning Stack Integration Engineer - 115801

Location: Markham, Ontario, CA

Company: Advanced Micro Devices


What you do at AMD changes everything 

At AMD, we push the boundaries of what is possible.  We believe in changing the world for the better by driving innovation in high-performance computing, graphics, and visualization technologies – building blocks for gaming, immersive platforms, and the data center. 

Developing great technology takes more than talent: it takes amazing people who understand collaboration and respect, and who will go the “extra mile” to achieve unthinkable results. It takes people who have the passion and desire to disrupt the status quo, push boundaries, deliver innovation, and change the world. If you have this type of passion, we invite you to look at the available opportunities and join our team.



Machine Learning Stack Integration Engineer

The Role:

AMD is looking for people to work with Solution Validation & Debug within the Machine Learning Software Engineering group. As a team member, your responsibilities will include working closely on the debug and triage of Machine Learning and High-Performance Computing issues as part of the Solution Validation of the ROCm stack.

The Person:

The ideal candidate will bring broad experience in dealing with sophisticated software-level issues related to Machine Learning and High-Performance Computing. They are self-motivated, have outstanding problem-solving skills, thrive in a fast-paced environment, and have a proven ability to collaborate within and across diverse teams.

Key Responsibilities:

  • Work on ROCm stack packaging solutions for individual and enterprise-level deployment on distributed cloud infrastructure
  • Debug Machine Learning and High-Performance Computing issues on the Radeon Open Compute (ROCm) stack
  • Develop test content for sophisticated Machine Learning algorithms on distributed nodes
  • Port High-Performance Computing applications to ROCm
  • Reproduce field defects and develop appropriate tests to prevent future issues
  • Design, develop, and deploy the testing tools and automation libraries needed to perform testing
  • Be responsible for the adoption of tooling and industry-standard methodologies, using advocacy and outreach to help our development communities level up

Preferred Experience:

  • Languages: Python, C, C++, Linux shell scripting
  • Frameworks/Libraries: TensorFlow, PyTorch, ONNX Runtime (ONNXRT)
  • Tools: Prior experience with Linux, Docker, LLVM compilers, GNU Make/CMake, Jenkins, Git/Gerrit
  • Understanding of High-Performance Computing applications, Machine Learning, GPU programming, and MPI parallel programming

Academic Credentials:

Bachelor's Degree in Computer Science or related quantitative field. An advanced degree or equivalent practical work experience is a plus.





Requisition Number: 115801
Country: Canada
Province: Ontario
City: Markham
Job Function: Design


AMD is an inclusive employer dedicated to building a diverse workforce. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective provincial human rights codes throughout all stages of the recruitment and selection process. Any applicant who requires accommodation should contact

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services.

