On-site
USA
Posted 3 months ago

Why Here?
Apple Services Engineering builds the scientific foundation for evaluating AI systems across products and services. The team applies psychometric theory, validity frameworks, and statistical methods to create trustworthy measurement standards. It connects measurement science to AI evaluation challenges, working alongside ML researchers and engineers.

What Will You Do?
As a Measurement Scientist, AI Evaluation Platform at Apple, you will design validity frameworks for AI evaluators such as LLM-as-judge systems and benchmarks. You will develop psychometric methods based on item response theory to assess benchmark quality. You will also create statistical tools for sample-size planning and bias detection in automated evaluations.

To apply for this job, please visit jobs.apple.com.