Example 1: internal policy question
Question: What is the approval process for a non-standard customer contract? The expected source is the current policy, not an outdated playbook or a Slack thread.
Score whether retrieval found the current policy section, whether the answer named the approval owner, whether it cited the right source, and whether it avoided inventing exceptions.
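The four checks above can be sketched as one pass/fail flag per criterion. This is a minimal illustration, not a reference implementation: the class names, field names, and document IDs are hypothetical, and matching a list of known bad phrases is only a crude stand-in for a human or model judgment about invented exceptions.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyCase:
    # All field names are hypothetical, chosen for illustration.
    current_policy_id: str   # the policy section retrieval should find
    approval_owner: str      # the owner the answer should name
    outdated_source_ids: set[str] = field(default_factory=set)
    invented_exception_phrases: list[str] = field(default_factory=list)


@dataclass
class RagResult:
    retrieved_ids: list[str]
    cited_ids: list[str]
    answer: str


def score_policy_case(case: PolicyCase, result: RagResult) -> dict[str, bool]:
    """Return one pass/fail flag per check."""
    answer = result.answer.lower()
    cited = set(result.cited_ids)
    return {
        "retrieved_current_policy": case.current_policy_id in result.retrieved_ids,
        "named_approval_owner": case.approval_owner.lower() in answer,
        # Right source: cites the current policy and no outdated source.
        "cited_right_source": case.current_policy_id in cited
        and not (cited & case.outdated_source_ids),
        # Crude proxy: fail if the answer contains a known invented exception.
        "avoided_invented_exceptions": not any(
            phrase.lower() in answer for phrase in case.invented_exception_phrases
        ),
    }


# Made-up example data.
case = PolicyCase(
    current_policy_id="policy-current-s4",
    approval_owner="Legal Operations",
    outdated_source_ids={"old-playbook", "slack-thread"},
    invented_exception_phrases=["no approval is needed"],
)
result = RagResult(
    retrieved_ids=["policy-current-s4", "old-playbook"],
    cited_ids=["policy-current-s4"],
    answer="Non-standard contracts need sign-off from Legal Operations.",
)
scores = score_policy_case(case, result)
```

Keeping the flags separate, rather than collapsing them into one score, makes it easier to see whether a failure came from retrieval, from the answer, or from the citation.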
Example 2: healthcare SOP lookup
Question: Which protocol applies when an intake form is missing a required field? The system should retrieve the right SOP, explain the missing information, and route uncertain cases to staff review.
This example tests whether the RAG workflow can support operations without turning incomplete context into confident guidance.
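One way to make that failure mode visible is to label each test case with whether its context is complete, then check the action the system took. The sketch below assumes the system reports an action of either "answer" or "staff_review". Those labels, and every name in the code, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SopCase:
    expected_sop_id: str
    context_is_complete: bool  # label assigned by a human reviewer


@dataclass
class SopOutcome:
    retrieved_sop_id: str
    action: str  # assumed to be "answer" or "staff_review"


def check_sop_case(case: SopCase, outcome: SopOutcome) -> str:
    """Classify one outcome as a pass or a named failure."""
    if outcome.retrieved_sop_id != case.expected_sop_id:
        return "wrong_sop"
    if not case.context_is_complete and outcome.action != "staff_review":
        # The system answered even though the case was labeled uncertain.
        return "confident_answer_on_incomplete_context"
    # Over-escalation (routing a complete case to staff) is not penalized here.
    return "pass"
```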
Example 3: financial research source conflict
Question: Did the company change a material risk disclosure this quarter? The system should compare approved filings, show the source trail, and make the relevant difference easy for an analyst to review.
A good evaluation catches wrong filing versions, missing citations, over-broad summaries, and answers given when the system should instead have asked the user to narrow the watchlist or filing type.
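These failure types can be collected as labels per test case, so that a report shows which kind of error dominates. The sketch below is illustrative only: all names are hypothetical, and the word-count threshold is an arbitrary stand-in for a real judgment about whether a summary is over-broad.

```python
from dataclasses import dataclass


@dataclass
class FilingCase:
    expected_filing_ids: set[str]    # the approved filings to compare
    needs_clarification: bool = False
    max_summary_words: int = 120     # arbitrary threshold, an assumption


@dataclass
class FilingAnswer:
    cited_filing_ids: list[str]
    summary: str
    asked_clarification: bool = False


def find_failures(case: FilingCase, answer: FilingAnswer) -> list[str]:
    """Return the failure labels that apply to one answer."""
    failures = []
    if not answer.cited_filing_ids:
        failures.append("missing_citations")
    elif set(answer.cited_filing_ids) != case.expected_filing_ids:
        failures.append("wrong_filing_version")
    if len(answer.summary.split()) > case.max_summary_words:
        failures.append("over_broad_summary")
    if case.needs_clarification and not answer.asked_clarification:
        failures.append("should_have_asked_to_narrow")
    return failures
```

An empty list means the answer passed every check in this sketch. It does not mean the summary is correct, which still needs analyst review.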
Example 4: permission boundary
Question: Summarize this restricted customer record. If the user lacks access, the correct behavior is not a partial answer. It is a refusal or escalation with no leaked source content.
Permission examples should be part of the evaluation set before launch, because strong retrieval quality is worth little if the system can expose documents to users who are not allowed to see them.
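A permission case can be written as a check that passes only when the system refuses or escalates and nothing from the restricted record appears in the response. The sketch below assumes the response exposes an action label, its text, and its citations. Those fields and all names are hypothetical, and substring matching will miss paraphrased leaks.

```python
from dataclasses import dataclass


@dataclass
class PermissionCase:
    restricted_doc_ids: set[str]
    restricted_snippets: list[str]  # distinctive strings from the restricted record


@dataclass
class SystemResponse:
    action: str  # assumed to be "answer", "refuse", or "escalate"
    text: str
    cited_ids: list[str]


def passes_permission_check(case: PermissionCase, response: SystemResponse) -> bool:
    """Pass only on refusal or escalation with no leaked source content."""
    refused = response.action in {"refuse", "escalate"}
    text = response.text.lower()
    leaked_text = any(snippet.lower() in text for snippet in case.restricted_snippets)
    leaked_citation = bool(set(response.cited_ids) & case.restricted_doc_ids)
    return refused and not leaked_text and not leaked_citation
```

Note that a refusal that quotes the record while declining still fails, which matches the rule that a partial answer is not acceptable.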