
How to measure the real ROI of an AI engagement — no BS

The ROI of an AI engagement isn't a percentage improvement estimated at the end. It's a number set before, measured during, and validated after. Here's how to do it.

The ShiftLab team

"The engagement brought a significant improvement in our teams' productivity."

That sentence means nothing. "Significant" is an opinion. "Improvement" is vague. "Productivity" is an abstraction.

We hear phrasings like that in almost every AI engagement report we analyze during our diagnostics. And they all share the same problem: they don't let you decide whether the engagement was worth the investment, or whether it should be repeated or expanded.

The real ROI of an AI engagement is measured. Not roughly, not "at the end" — but before, during, and after. Here's our method.

Why measuring is hard (and why it's still essential)

There are three reasons companies avoid precisely measuring the ROI of their AI engagements:

Reason 1: The baseline doesn't exist. How do you measure improvement if you never measured the situation before? Most companies don't know how much time their teams spend on specific tasks. They have a vague impression — "a lot of time" — but no number.

Reason 2: Attribution is complex. When a team's productivity rises, is it thanks to the deployed AI tool? A new hire who joined the team? The season? A parallel process change? Untangling causality is hard.

Reason 3: There's a fear of bad news. If the measurement shows the engagement didn't produce the expected results, it has to be explained. That's uncomfortable. It's easier not to measure.

These reasons are real. They don't justify giving up on measurement — they justify being rigorous about the method.

The 4 types of gains to measure

Before measuring, you have to define what you're measuring. An AI rollout typically produces four types of gains:

1. Direct time savings

This is the easiest gain to measure and the most often underestimated. How long does a task take before the rollout? How long after?

Concrete examples:

  • Writing a sales email: 25 minutes → 5 minutes
  • Preparing a weekly report: 3 hours → 45 minutes
  • Answering a typical customer request: 12 minutes → 2 minutes

On a 10-person team where each member saves 8 hours a week, that's 80 hours recovered weekly. That is 2 FTEs (full-time equivalents, assuming a 40-hour week) freed up for higher-value tasks.
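A minimal sketch of this arithmetic in Python. It assumes each task is logged with its before/after duration and a per-person weekly frequency. The frequencies below are hypothetical and chosen only for illustration. The 40-hour FTE week is also an assumption; replace it with your own contractual hours.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    minutes_before: float
    minutes_after: float
    times_per_week: float  # hypothetical: how often one person does the task per week


def hours_saved_per_person(tasks: list[Task]) -> float:
    """Weekly hours saved by one person across all listed tasks."""
    minutes = sum((t.minutes_before - t.minutes_after) * t.times_per_week for t in tasks)
    return minutes / 60


def fte_freed(hours_per_person: float, team_size: int, hours_per_fte: float = 40.0) -> float:
    """Convert team-wide weekly savings into full-time equivalents."""
    return hours_per_person * team_size / hours_per_fte


tasks = [
    Task("Sales email", 25, 5, times_per_week=10),
    Task("Weekly report", 180, 45, times_per_week=1),
    Task("Customer request", 12, 2, times_per_week=15),
]

print(f"{hours_saved_per_person(tasks):.1f} h saved per person per week")
print(f"{fte_freed(8, team_size=10):.1f} FTE freed")  # the 10 people x 8 h case
```

With these made-up frequencies, savings come to roughly 8 hours per person per week. If your team's contractual week is longer than 40 hours, pass it as `hours_per_fte` and the FTE figure drops accordingly.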

2. Quality gains

Harder to measure, but quantifiable with the right metrics.

Examples:

  • Error rate in reports: before/after
  • Quote signing rate: before/after
  • Customer response time: before/after
  • Customer satisfaction on interactions: before/after

These metrics must be defined and tracked before the rollout. Otherwise there is nothing to compare against.
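To keep the before/after comparison honest, it helps to fix each metric's direction when you define it. A falling error rate is good news; a falling signing rate is not. Here is a minimal sketch. The baseline and post-rollout values are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class QualityMetric:
    name: str
    before: float
    after: float
    lower_is_better: bool  # decided when the metric is defined, before the rollout


def improvement(metric: QualityMetric) -> float:
    """Relative change versus baseline, signed so that positive always means better."""
    if metric.before == 0:
        raise ValueError(f"{metric.name}: baseline must be non-zero")
    change = (metric.after - metric.before) / metric.before
    return -change if metric.lower_is_better else change


metrics = [  # hypothetical values
    QualityMetric("Report error rate", 0.08, 0.05, lower_is_better=True),
    QualityMetric("Quote signing rate", 0.30, 0.36, lower_is_better=False),
    QualityMetric("Customer response time (h)", 24, 6, lower_is_better=True),
]

for m in metrics:
    print(f"{m.name}: {improvement(m):+.0%}")
```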

3. Capacity gains

When teams save time, what do they do with it? If that time is reinvested in value-adding activities (selling, client development, innovation), the ROI is amplified.

Example: a sales team saves 8 hours a week and reinvests them in prospecting. Suppose one hour of prospecting produces one qualified opportunity, the usual conversion rate is 20%, and the average deal is MAD 50,000. Each extra hour of prospecting is then theoretically worth 20% × MAD 50,000 = MAD 10,000 in expected revenue.
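The per-hour value depends on how many qualified opportunities one hour of prospecting actually produces. The short sketch below makes that parameter explicit. Its default of one opportunity per hour is an assumption, not a benchmark.

```python
def expected_value_per_hour(conversion_rate: float, avg_deal_mad: float,
                            opportunities_per_hour: float = 1.0) -> float:
    """Theoretical revenue expected from one extra hour of prospecting."""
    return opportunities_per_hour * conversion_rate * avg_deal_mad


per_hour = expected_value_per_hour(conversion_rate=0.20, avg_deal_mad=50_000)
weekly = 8 * per_hour  # 8 reinvested hours per week, as in the example
print(f"MAD {per_hour:,.0f} per hour, MAD {weekly:,.0f} per week (theoretical)")
```

If an hour yields only half an opportunity, `opportunities_per_hour=0.5` halves the value. Treat the output as an upper bound, since it is expected revenue, not margin.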

4. Risk reduction

Less visible, but real: fewer errors in contracts, better process compliance, and fewer of the delays that used to create tension with clients.

These gains are harder to quantify but can be translated into avoided cost (disputes, credit notes, lost customers).
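One way to translate these gains is to count incidents per category over comparable periods before and after. You then multiply the reduction by an average cost per incident. The categories come from the text above. Every count and unit cost below is hypothetical.

```python
# category: (incidents in the year before, incidents in the year after, avg cost per incident, MAD)
INCIDENTS = {  # hypothetical figures for illustration only
    "disputes": (6, 2, 15_000),
    "credit notes": (20, 8, 2_000),
    "lost customers": (3, 1, 40_000),
}


def avoided_cost(incidents: dict[str, tuple[int, int, int]]) -> int:
    """Sum of (before - after) x unit cost; a category that got worse contributes a negative amount."""
    total = 0
    for before, after, unit_cost in incidents.values():
        total += (before - after) * unit_cost
    return total


print(f"Avoided cost: MAD {avoided_cost(INCIDENTS):,}")
```

Not clamping at zero is deliberate. If disputes went up after the rollout, the report should show it.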

The 3-phase measurement method

Phase 1 — The baseline (before the rollout)

The week before the tool's rollout, we ask teams to precisely measure the time spent on the targeted tasks. Not an estimate — a real measurement, over 5 working days.

Simple tools to capture this baseline:

  • A shared Google Sheet with a list of tasks and a "time spent today" column
  • Toggl or Clockify for automatic tracking
  • A daily note at the end of each day (5 minutes max)
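If the baseline lives in a shared sheet, you can export it to CSV and total it in a few lines. This sketch assumes hypothetical columns `day`, `person`, `task`, and `minutes`. Adapt the names to whatever your sheet actually uses. Task names are normalized so that "Weekly Report" and "weekly report " count as the same task.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export of the shared baseline sheet (a real one covers 5 working days).
SAMPLE = """day,person,task,minutes
Mon,amina,weekly report,90
Tue,amina,weekly report,90
Mon,youssef,sales email,50
Wed,youssef,sales email,70
"""


def baseline_hours(csv_text: str) -> dict[str, float]:
    """Total hours logged per task over the measurement window."""
    minutes_by_task: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        minutes_by_task[row["task"].strip().lower()] += float(row["minutes"])
    return {task: minutes / 60 for task, minutes in minutes_by_task.items()}


print(baseline_hours(SAMPLE))  # {'weekly report': 3.0, 'sales email': 2.0}
```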

This baseline is uncomfortable to collect because it reveals truths no one wants to see. The finance team that thought it spent "a few hours" on reports discovers it spends 40% of its time there. It's awkward. It's also exactly the information you need.

Phase 2 — Measurement during the engagement

At mid-engagement (after 2 to 4 weeks of using the AI tools), we run an intermediate measurement: the same tracking protocol over 5 days.

This intermediate measurement has two functions:

  • Validate that the expected gains are materializing (if not, adjust)
  • Identify adoption barriers (some team members aren't using the tool — why?)
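You can automate the first check once baseline and mid-engagement hours share the same shape (hours per task per week). Here is a minimal sketch. The per-task savings targets are hypothetical inputs. In practice they would come from the objectives agreed at the start of the engagement.

```python
def shortfalls(baseline_h: dict[str, float], mid_h: dict[str, float],
               target_saving_h: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Tasks whose realized weekly savings are below target, as (realized, target)."""
    gaps = {}
    for task, before in baseline_h.items():
        realized = before - mid_h.get(task, before)  # task not re-measured: no saving assumed
        target = target_saving_h.get(task, 0.0)
        if realized < target:
            gaps[task] = (realized, target)
    return gaps


baseline = {"weekly report": 3.0, "sales email": 2.0}
mid = {"weekly report": 1.0, "sales email": 1.8}
targets = {"weekly report": 1.5, "sales email": 1.0}  # hypothetical objectives

for task, (realized, target) in shortfalls(baseline, mid, targets).items():
    print(f"{task}: {realized:.1f} h saved vs {target:.1f} h target -> check adoption")
```

A flagged task is a prompt for the second question: who isn't using the tool for it, and why.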

Phase 3 — Final measurement and ROI report

30 days after the engagement ends, we run the final measurement. 5 days of tracking, then an ROI report comparing baseline, mid-engagement, and 30 days post-engagement.

The ROI report we produce contains:

1. The measured-gains table: by task and by team, in hours recovered per week and in equivalent cost (based on the average hourly cost of the people involved).

2. The direct ROI calculation: ROI = (Annualized value of gains - Engagement cost) / Engagement cost × 100

Concrete example: a sales workflow modernization engagement costing MAD 80,000 saves 10 hours a week for each of the 5 reps on a team (50 hours/week in total). At an average hourly cost of MAD 150, the annual gains are 50 hours × 52 weeks × MAD 150 = MAD 390,000. ROI = (390,000 - 80,000) / 80,000 × 100 = 387.5%.

3. Documented qualitative gains: team testimonials, before/after quality metrics, incidents avoided.

4. Recommendations for what comes next: which additional automations could be deployed? Which teams would benefit most from an extension?
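The gains table and the direct ROI formula fit in a few lines of code. This sketch reproduces the sales example. The `weeks_per_year` parameter is an addition of ours. It lets you replace the 52 calendar weeks with the weeks people actually work.

```python
from dataclasses import dataclass


@dataclass
class GainRow:
    team: str
    task: str
    hours_per_week: float   # measured hours recovered per week, whole team
    hourly_cost_mad: float  # average hourly cost of the people involved

    def annual_value(self, weeks_per_year: int = 52) -> float:
        return self.hours_per_week * weeks_per_year * self.hourly_cost_mad


def roi_percent(annual_gains: float, engagement_cost: float) -> float:
    """ROI = (annualized value of gains - engagement cost) / engagement cost x 100."""
    return (annual_gains - engagement_cost) / engagement_cost * 100


table = [GainRow("Sales", "sales workflow", hours_per_week=5 * 10, hourly_cost_mad=150)]
gains = sum(row.annual_value() for row in table)
print(f"Annual gains: MAD {gains:,.0f}")          # Annual gains: MAD 390,000
print(f"ROI: {roi_percent(gains, 80_000):.1f}%")  # ROI: 387.5%
```

For instance, counting 46 working weeks instead of 52 gives MAD 345,000 in gains and an ROI of 331.25%. That is a more conservative figure to put in front of a finance team.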

The traps to avoid

Trap 1: Announcing an ROI before measuring it. We refuse to promise an ROI before running the baseline. What we can promise is a rigorous measurement method and an engagement that's adjusted if the gains don't materialize as expected.

Trap 2: Measuring without comparing. An isolated number means nothing. "The teams spend 5 hours a week on reporting" is only informative compared to the situation before: was it 5 hours or 15 hours?

Trap 3: Stopping measurement after the engagement. AI tool adoption can decline over time if it isn't maintained. A measurement at 60 days post-engagement helps detect an adoption decline and intervene before it's too late.

Trap 4: Confusing access with adoption. "100% of the team has access to the tool" is not "100% of the team uses it regularly." Measure real usage (frequency, volume, prompt quality), not access.
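A single check can cover both of the last two traps: measure active usage rather than access, then compare it at 30 and 60 days. This sketch assumes you can count, for each person, the days they used the tool during each 5-day tracking window. Two thresholds in it are hypothetical choices, not standards: "active" means at least 3 of 5 days, and the alert triggers on a drop of more than 10 points.

```python
def active_share(days_used: dict[str, int], min_days: int = 3) -> float:
    """Share of people with access who used the tool on at least `min_days` tracked days."""
    if not days_used:
        return 0.0
    active = sum(1 for days in days_used.values() if days >= min_days)
    return active / len(days_used)


def adoption_declined(share_before: float, share_after: float, max_drop: float = 0.10) -> bool:
    """True if the active share fell by more than `max_drop` (absolute) between measurements."""
    return share_before - share_after > max_drop


# Hypothetical team of 10, everyone with access.
day_30 = {"p1": 5, "p2": 5, "p3": 4, "p4": 4, "p5": 3, "p6": 3, "p7": 2, "p8": 1, "p9": 0, "p10": 0}
day_60 = {"p1": 5, "p2": 4, "p3": 3, "p4": 2, "p5": 1, "p6": 1, "p7": 0, "p8": 0, "p9": 0, "p10": 0}

s30, s60 = active_share(day_30), active_share(day_60)
print(f"Active at 30 days: {s30:.0%}, at 60 days: {s60:.0%}")  # 60%, 30%
if adoption_declined(s30, s60):
    print("Adoption is declining: intervene before it's too late")
```

In this illustrative case, access stays at 100% throughout while real adoption halves. That is exactly the gap Trap 4 describes.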

What we guarantee

At ShiftLab, we commit to measuring with you:

  • The baseline before every engagement
  • The gains at mid-engagement
  • The ROI 30 days after delivery

If the measured gains don't match the objectives we defined together at the start of the engagement, we keep working at no extra charge until the goal is reached, or until we have a clear explanation of why it can't be.

This commitment to measurable results is what sets our approach apart. It is also why we insist so much on the baseline: without it, there is neither commitment nor proof.


Our post-engagement ROI report is included in all our workflow modernization engagements. If you want to understand how we measure before committing, our Operational Diagnostic always starts by establishing the baseline.

Ready to go from reading to action?

A 3-to-5-day diagnostic to identify your company's operational priorities.