We are seeking a Prompt Engineer to own the end-to-end technical workflow for migrating templates to LLM autoraters. The role uses the client's internal tools and applies prompt engineering techniques to maximize model performance.
## Responsibilities:
- Utilize Automatic Prompt Generation (APG) tools to create baseline prompts for complex parent-child template clusters.
- Run and supervise the Automated Prompt Optimization (APO) tool, review its outputs, and flag when optimization stagnates or plateaus.
- Manually draft, test, and refine prompts to navigate complex template architectures, overcome anti-patterns, and address edge cases where tooling is insufficient or broken.
- Monitor shadowbot runs to ensure that enough disagreements between human and LLM ratings are generated, registered, and tracked.
- Evaluate prompt versions on established gold data to continuously measure autorater quality against the human crowd baseline, using metrics such as precision, recall, and F1 score.
- Draft technical launch readiness justifications (Launch Certification Documentation) for final approval.
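
For candidates unfamiliar with the gold-data evaluation mentioned above, the sketch below shows one way precision, recall, and F1 could be computed for a binary rating task. It is illustrative only: the function name `binary_metrics`, the 0/1 label encoding, and the sample data are assumptions made for this example and do not describe the client's internal tooling.

```python
from typing import Dict, Sequence


def binary_metrics(gold: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, and F1 for binary labels (1 = positive).

    `gold` holds the reference (gold) labels and `predicted` holds the
    autorater's labels for the same items, in the same order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical sample: six items with gold labels and autorater labels.
gold_labels = [1, 0, 1, 1, 0, 1]
autorater_labels = [1, 0, 0, 1, 1, 1]
metrics = binary_metrics(gold_labels, autorater_labels)
print(metrics)
```

In practice, the same metrics would typically be computed for both the autorater and the human crowd on the same gold set, so the two can be compared directly.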