Machine Learning System Architect

Location: Austin, Texas, US

Company: Advanced Micro Devices



What you do at AMD changes everything 
 

At AMD, we push the boundaries of what is possible.  We believe in changing the world for the better by driving innovation in high-performance computing, graphics, and visualization technologies – building blocks for gaming, immersive platforms, and the data center. 
 

Developing great technology takes more than talent: it takes amazing people who understand collaboration, respect, and who will go the “extra mile” to achieve unthinkable results.  It takes people who have the passion and desire to disrupt the status quo, push boundaries, deliver innovation, and change the world.   If you have this type of passion, we invite you to take a look at the opportunities available to come join our team.
 

Machine Learning System Architect

 

THE ROLE:

AMD is searching for Fellow-level architects to join the team driving the definition of future AMD Data Center ML Training and Inference solutions.  These technical roles sit within AMD's Data Center GPU and Accelerated Computing business unit.  These individuals will join a team of seasoned architects and engineers focused on all aspects of ML Training and Inference solutions, both hardware and software, from the SOC to training clusters of tens of thousands of GPUs, with an emphasis on future machine learning networks.  One goal of this architecture team is to define solutions that make AMD the preferred at-scale Training and Inference solution provider for AMD's Hyperscale customers.

 

 

THE PERSON:

Ideal candidates will have broad and deep computer system architecture backgrounds, with an emphasis on Machine Learning Training and Inference networks, and the ability to translate state-of-the-art Machine Learning network advancements and bottlenecks into optimized definitions for SOC-to-cluster-level hardware and software Training and Inference solutions.

 

  • Knowledge of and experience with state-of-the-art training and inference network software
    • Networks: NLG, Recommendation, etc.
    • Communication collectives
  • Knowledge of and experience with state-of-the-art training and inference hardware, from the SOC to the cluster level.
  • Exceptional communication skills to address multiple internal and external stakeholders.
  • Skill in analyzing tradeoffs and making analytical, fact-based recommendations that have substantial technical, business, and financial impacts.
  • Ability to guide product definition using data and credible insights developed by expert knowledge, interacting with internal experts, end customers, and strategic partners.
  • Ability to build relationships with strategic customers and partners to ensure that AMD defines and develops leadership Machine Learning solutions that meet the needs of the market and our customers.
  • Collaboration skills to engage a team of engineering, product management and business development professionals to achieve outstanding business results.

 

KEY RESPONSIBILITIES:

  • Partner with SoC architects, software architects, platform architects, product management, business development, and customers to drive the definition of leadership AMD Machine Learning Training and Inference product roadmaps at the SoC, platform, node, and cluster levels.
  • Document how trends in ML networks will affect, and benefit from, future ML Training and Inference hardware (SOC to cluster) and software solutions.
  • Drive competitive analysis in partnership with SOC and other architects.
  • Drive product/solution architecture discussions with one or more Hyperscale customers.

 

PREFERRED EXPERIENCE:

  • More than 10 years of experience spanning multiple product development cycles
  • Desired candidates will have experience in most of the areas below:
    • Machine Learning Training and Inference solutions for NLG and/or Recommendation at the cluster level
      • ML network / algorithm development / performance analysis a plus
      • HW and SW
    • SOC & Node development
      • SOC
      • Package
      • Module/Board
      • Platform / system including physical considerations
      • Power/Thermal management
    • Cloud node & cluster architecture & Interconnect
    • Virtualization and Security
    • Performance modeling / analysis
    • TCO analysis
    • Deep customer/partner technical engagements

 

ACADEMIC CREDENTIALS:

  • BS/MS/PhD in Electrical Engineering, Computer Science, or Computer Engineering; an AI / Machine Learning focus preferred

 

LOCATION:

Santa Clara, California or Austin, Texas.  

#LI-DKDAMD1


Requisition Number: 150624 
Country: United States State: Texas City: Austin 
Job Function: Design
  

Benefits offered are described here.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies or fee based recruitment services. AMD and its subsidiaries are equal opportunity employers. We consider candidates regardless of age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status. Please click here for more information.
