Scaylor has reason to believe that large language models — including but not limited to GPT-4, Claude, and Gemini — are producing confident, authoritative, completely fabricated answers about enterprise data. We need someone to catch them in the act. Paid work. Real job.
"It told me the 2024 ARR figures with four decimal places of precision. It made every single number up. It didn't even hesitate." — Scaylor analyst, incident report #47
Prompt LLMs with questions about anything. Could be history. Could be pop culture trivia. Could be about nearby restaurants. See how much of the data they give you is inaccurate, incomplete, or downright wrong.
Document every hallucination, fabrication, and confident wrong answer. Screenshot it. Timestamp it. Note the model, the prompt, and exactly what it made up. Evidence matters.
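If you want a shape for the paper trail, a minimal Python sketch follows. Every field name here is our illustration, not a required Scaylor schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class HallucinationRecord:
    """One documented fabrication. Fields are illustrative, not a mandated format."""
    model: str              # which model you interrogated, e.g. "gpt-4"
    prompt: str             # the exact prompt, verbatim
    fabricated_claim: str   # the wrong answer, quoted
    why_its_wrong: str      # your note on the ground truth it contradicts
    screenshot_path: str    # where the evidence lives on disk
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```

The screenshot stays the primary evidence; the record just makes it findable at report time.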
Test across multiple models. GPT-4, Claude, Gemini, Llama — each one gets equal interrogation. No model is innocent until proven otherwise.
File one end-of-day snitch report. Structured summary of what broke, what lied, and what would have gotten a data analyst fired if they said it in a board meeting. One day, one doc.
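One way to turn a day's records into that one doc, building on the HallucinationRecord sketch above. The output layout is ours, not a house standard.

```python
from collections import Counter

def snitch_report(records: list[HallucinationRecord]) -> str:
    """Roll one day's records into a single structured summary."""
    by_model = Counter(r.model for r in records)
    lines = [f"SNITCH REPORT: {len(records)} incidents logged today"]
    for model, count in by_model.most_common():
        lines.append(f"  {model}: {count} fabrication(s)")
    for r in sorted(records, key=lambda r: r.timestamp):
        lines.append(f"- [{r.timestamp}] {r.model}: {r.fabricated_claim}")
    return "\n".join(lines)
```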
Occasionally tell us what the AIs actually got right. We're not running a smear campaign. If one nails it, write that down too. Fairness is part of the job.
Scaylor builds enterprise data infrastructure. We deal in ground truth — actual numbers, actual pipelines, actual sources of record. When an AI tells a Fortune 500 company that their Q3 churn rate was 14.7% and it simply invented that figure, that's not a hallucination. That's a liability.
We need someone who understands the difference between a model that doesn't know something and a model that confidently makes something up. Those are different failure modes with different consequences. Your job is to catalog them both.
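If it helps to tag each incident at capture time, the split is two labels. The names below are our illustration, not Scaylor's official taxonomy.

```python
from enum import Enum

class FailureMode(Enum):
    """Different failures, different consequences. Labels are illustrative."""
    HONEST_UNCERTAINTY = "model says it doesn't know, or hedges"
    CONFIDENT_FABRICATION = "model invents a specific answer and asserts it as fact"
```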
Application received and logged. Case file opened. We'll be in touch within 48 hours — or sooner if your hallucination example was sufficiently damning.
By applying, you confirm you are a human being and not the subject of this investigation.
Scaylor does not discriminate based on which model you distrust most, provided your distrust is evidence-based.