Multi-Touch Attribution: Moving Beyond Last Click

Published on: July 14, 2025

Last-click attribution is the analytics equivalent of giving the goal scorer all the credit and ignoring the build-up play. In many real customer journeys, users interact with several touchpoints across multiple channels before converting, and the channel that happens to be last is rarely the one that did the most work. Multi-touch attribution (MTA) tries to distribute credit more fairly across the full journey.

Why Last-Click Fails

Consider a user who sees a brand video on YouTube, clicks a display ad three days later, searches for your brand name on Google and clicks a paid search ad, and finally converts. Last-click gives 100% of the credit to paid brand search — the easiest, cheapest touch that required the least work. The YouTube campaign that introduced the brand gets nothing.

Over time, this creates a feedback loop: brand campaigns are defunded because they show zero attributed revenue, awareness drops, and eventually paid search becomes less effective because fewer people know the brand well enough to search for it. This is the death spiral of last-click-optimised budgets.

Rule-Based Models

Rule-based models are transparent and easy to implement. The trade-off is that the rules are arbitrary:

Linear: Splits credit equally across all touchpoints. Simple and neutral, but it ignores that not all touches contribute equally.
Time-decay: Gives more credit to touchpoints closer to conversion. Logical for short purchase cycles, but it undervalues early awareness channels.
Position-based (U-shaped): 40% to first touch, 40% to last touch, 20% split across the middle. Assumes that the first and last touches are the most strategically important.
W-shaped: Extends the U-shape by also giving heavy credit to the mid-funnel "lead creation" touch (typically 30% to each of three key touches, with the remaining 10% shared by the others). Useful for B2B journeys with a distinct qualification event.
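
To make the rules concrete, here is a minimal sketch of the linear, time-decay, and position-based splits applied to a single journey. The function names, the step-based half_life parameter, and the sample journey are illustrative assumptions; production time-decay models usually work on actual timestamps rather than step counts.

```python
def linear(path):
    """Equal credit to every touchpoint in the path."""
    share = 1.0 / len(path)
    return [share] * len(path)

def time_decay(path, half_life=2.0):
    """Credit halves for every `half_life` steps away from the conversion.

    Assumption: distance is measured in steps, not in elapsed time.
    """
    n = len(path)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def position_based(path, first=0.4, last=0.4):
    """U-shaped split: fixed shares for the ends, the rest spread over the middle."""
    n = len(path)
    if n == 1:
        return [1.0]
    if n == 2:
        # No middle touches: rescale the two end shares so they sum to 1
        return [first / (first + last), last / (first + last)]
    middle = (1.0 - first - last) / (n - 2)
    return [first] + [middle] * (n - 2) + [last]

def credit_by_channel(path, model):
    """Sum per-touch shares by channel (a channel may appear more than once)."""
    credit = {}
    for channel, share in zip(path, model(path)):
        credit[channel] = credit.get(channel, 0.0) + share
    return credit

# Hypothetical journey, loosely based on the YouTube example above
journey = ['youtube', 'display', 'paid_search']
for model in (linear, time_decay, position_based):
    print(model.__name__, credit_by_channel(journey, model))
```

On this three-touch journey the position-based model gives YouTube 40%, display 20%, and paid search 40%, where last-click would have given paid search everything.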

Data-Driven Attribution: Markov Chains

Data-driven MTA doesn't assume which touches matter — it learns from your actual conversion data. The Markov chain approach models the customer journey as a sequence of states (channels), estimates transition probabilities between states, and computes each channel's "removal effect" — how much conversion probability drops if that channel is removed from all paths.

from collections import defaultdict

def build_transition_matrix(paths, conversions):
    """
    paths: list of lists, e.g. [['paid_search','email','direct'],...]
    conversions: list of 0/1 matching each path
    """
    transitions = defaultdict(lambda: defaultdict(int))

    for path, converted in zip(paths, conversions):
        journey = ['start'] + path + (['conversion'] if converted else ['null'])
        for i in range(len(journey) - 1):
            transitions[journey[i]][journey[i + 1]] += 1

    # Normalise to probabilities
    matrix = {}
    for state, nexts in transitions.items():
        total = sum(nexts.values())
        matrix[state] = {k: v / total for k, v in nexts.items()}

    return matrix

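def conversion_probability(matrix, removed=None, tol=1e-10, max_iter=10_000):
    """
    Probability of reaching 'conversion' from 'start'.
    If 'removed' is given, any transition into that channel is treated as a
    lost journey (equivalent to redirecting it to 'null').
    """
    prob = {state: 0.0 for state in matrix}
    prob['conversion'] = 1.0  # absorbing success state; 'null' counts as 0

    # Fixed-point iteration instead of an explicit matrix inversion
    for _ in range(max_iter):
        delta = 0.0
        for state, nexts in matrix.items():
            if state == removed:
                continue
            new = sum(p * prob.get(t, 0.0) for t, p in nexts.items() if t != removed)
            delta = max(delta, abs(new - prob[state]))
            prob[state] = new
        if delta < tol:
            break

    return prob.get('start', 0.0)

def removal_effect(matrix, channel):
    """Fraction of conversion probability lost when 'channel' is removed."""
    base = conversion_probability(matrix)
    if base == 0:
        return 0.0
    return 1 - conversion_probability(matrix, removed=channel) / base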
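
Removal effects on their own do not sum to one, so a common final step is to normalise them into shares and scale by the total number of conversions. The sketch below assumes the effects have already been computed; the function name allocate_credit and the sample numbers are hypothetical and not taken from a real dataset.

```python
def allocate_credit(removal_effects, total_conversions):
    """Turn per-channel removal effects into attributed conversions.

    removal_effects: dict of channel -> removal effect (between 0 and 1)
    total_conversions: number of conversions to distribute
    """
    total_effect = sum(removal_effects.values())
    if total_effect == 0:
        # No channel shows any effect, so there is nothing to distribute
        return {channel: 0.0 for channel in removal_effects}
    return {
        channel: total_conversions * effect / total_effect
        for channel, effect in removal_effects.items()
    }

# Hypothetical removal effects for three channels
effects = {'paid_search': 0.6, 'email': 0.3, 'youtube': 0.3}
credit = allocate_credit(effects, total_conversions=120)
print(credit)
```

Here paid search receives half of the 120 conversions because its removal effect is half of the combined total. In a real pipeline the effects would come from the removal-effect calculation for each channel.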

Shapley Value Attribution

An alternative data-driven approach uses Shapley values from cooperative game theory. Each channel is a "player" and the conversion is the "prize." The Shapley value computes each channel's fair share of credit by averaging its marginal contribution across all possible orderings in which the channels could join the coalition. It's theoretically sound but computationally expensive when many channels are involved, because the number of channel combinations to evaluate grows exponentially with the number of channels.
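
A minimal sketch of the exact calculation, written in the equivalent "weighted subsets" form rather than by enumerating orderings. The value function is an assumption: here it is a lookup of hypothetical conversion rates per channel combination, and how you estimate those rates from real journeys is a separate modelling choice.

```python
from itertools import combinations
from math import factorial

def shapley_values(channels, value):
    """Exact Shapley values.

    channels: list of channel names (the "players")
    value: function mapping a frozenset of channels to its worth,
           e.g. the conversion rate of journeys using that set
    """
    n = len(channels)
    result = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that ch joins next
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {ch}) - value(s))
        result[ch] = total
    return result

# Hypothetical conversion rates per channel combination
rates = {
    frozenset(): 0.0,
    frozenset({'search'}): 0.04,
    frozenset({'email'}): 0.02,
    frozenset({'search', 'email'}): 0.10,
}
phi = shapley_values(['search', 'email'], lambda s: rates[s])
print(phi)
```

The two values add up to the worth of the full set (0.10), which is the property that makes Shapley credit "fair". The inner loop visits every subset of the other channels, which is where the exponential cost comes from.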

In practice: use Markov chains for speed and interpretability at scale; use Shapley for high-value B2B journeys where precision matters more than throughput.

Practical Constraints

MTA has real limitations to be honest about:

Cookie/ID fragmentation: Cross-device journeys are largely invisible without a unified ID. A user who sees your video ad on their phone and converts on their laptop looks like two separate users, so MTA will undercount the video channel's contribution.
Offline channels: TV, out-of-home (OOH), and events are invisible to clickstream-based MTA. This is partly why marketing mix modeling (MMM), which works on aggregate spend data rather than user-level clicks, complements rather than replaces MTA.
Correlation ≠ causation: MTA measures association, not causation. A channel that appears in many converting paths might be there because high-intent users seek it out, not because it drove intent. Uplift modeling, which compares exposed users against a randomised control group, is designed to address this.
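
The causal question can be illustrated with the simplest possible incrementality check: compare conversion rates between users who were exposed to a campaign and a randomly held-out control group. This is a sketch under the assumption that assignment was truly random; the function name and the figures are hypothetical, and a real analysis would also test whether the difference is statistically significant.

```python
def incremental_lift(treated_users, treated_conversions,
                     control_users, control_conversions):
    """Compare conversion rates between exposed and held-out groups."""
    treated_rate = treated_conversions / treated_users
    control_rate = control_conversions / control_users
    absolute = treated_rate - control_rate
    # Relative lift is undefined when the control group never converts
    relative = absolute / control_rate if control_rate else float('inf')
    return {
        'absolute_lift': absolute,
        'relative_lift': relative,
        # Conversions that would not have happened without the campaign
        'incremental_conversions': absolute * treated_users,
    }

# Hypothetical holdout test: 3.0% vs 2.5% conversion rate
result = incremental_lift(
    treated_users=10_000, treated_conversions=300,
    control_users=10_000, control_conversions=250,
)
print(result)
```

In this made-up example only about 50 of the 300 treated conversions are incremental, even though an attribution model could assign the campaign credit for many more of them.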

The Right Stack

In practice, mature analytics teams use all three approaches (MTA, MMM, and uplift modeling) in combination:

MTA for digital, lower-funnel channel optimisation (weekly/daily cadence).
MMM for strategic budget allocation across all channels including offline (quarterly).
Uplift modeling for campaign-level incrementality testing (per campaign).

Each addresses a different question. The mistake is using any one of them as the single source of truth.

Conclusion

Multi-touch attribution is better than last-click in almost every situation, but it's not a solved problem. The transition from last-click to a more sophisticated model is less about picking the perfect algorithm and more about building the data infrastructure (unified IDs, clean journey tables) and the organisational trust to act on numbers that don't always flatter the last channel in the funnel.