The Good Tech Companies - Measuring Non-Linear User Journeys: Rethinking Funnels Metrics in A/B Testing
Episode Date: December 1, 2025. This story was originally published on HackerNoon at: https://hackernoon.com/measuring-non-linear-user-journeys-rethinking-funnels-metrics-in-ab-testing. A deep dive into user reorders, hidden behavioral patterns, and how aggregated funnels improve A/B test accuracy in non-linear user journeys. Check more stories related to product-management at: https://hackernoon.com/c/product-management. You can also check exclusive content about #user-journey, #ab-testing, #data-analysis, #funnel-analysis, #data-driven-decision-making, #user-behavior-analytics, #good-company, #business-metrics, and more. This story was written by: @indrivetech. Learn more about this writer by checking @indrivetech's about page, and for more stories, please visit hackernoon.com. InDrive users create an order, receive bids from drivers, choose a suitable one, wait for the driver to arrive, and then start and complete the trip. In some tests, conversions at the stages that precede the implemented changes can change in a statistically significant way. In this article, we explain how we investigated these behavioral patterns and, based on them, introduced new metrics that helped make test results more interpretable.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Measuring Non-Linear User Journeys: Rethinking Funnels Metrics in A/B Testing, by InDrive.Tech.
Introduction. In a mature product, it is often difficult to achieve a statistically significant
impact on key business metrics such as revenue per user or the number of orders.
Most changes are aimed at point improvements in the funnel or individual stages of the user journey,
and the impact of such changes on business metrics is usually lost in the noise.
Therefore, product teams quite often choose a corresponding conversion as the target
metric and design experiments in a way that achieves the required statistical power. However,
from time to time, we notice that funnel metrics do not move in line with the dynamics of top-level
indicators. Moreover, in some tests, conversions at the stages that precede the implemented
changes can change in a statistically significant way. As a result, interpreting such experiments
becomes difficult and the risk of making wrong decisions increases. As an example, consider a service
where a user creates an order, receives offers from different performers, chooses a suitable one,
and waits for the task to be completed. Suppose we have developed a new feature that highlights
the best offer and is expected to increase the share of orders where a match between the customer
and the performer occurs. During the experiment, we may observe that the share of successful
orders decreases. The total number of orders and completed orders increases. The share of orders that
received at least one offer decreases. Such a pattern may occur if the user has the ability to return
to previous stages and, for example, repost the order. We discovered similar patterns in our own
experiments. At inDrive, passengers can propose their own price, after which they receive offers from
drivers and choose the most suitable one. Many users actively use the bargaining features and,
trying to get a better price, may change the order conditions and create it again. This leads to a
series of orders before a trip actually takes place. Our passenger fulfillment team is responsible for
the user journey from the moment the order is created to the completion of the trip. In this article,
we will explain how we investigated these behavioral patterns and, based on them,
introduced new metrics that helped make test results more interpretable. This article will be
useful for product analysts and product managers who work with products that have a complex
nonlinear user journey, where metric interpretation requires taking behavioral patterns and repeated
user actions into account. How do key metrics and funnel metrics behave? In our product,
the funnel roughly looks as follows. A passenger creates an order, receives bids from drivers,
selects a suitable one, waits for the driver to arrive, and then starts and completes the trip.
Imagine that we launch a small UI change. We show the user a progress bar while searching for
a driver in order to reduce uncertainty. We expect that with the progress bar, users will more often
wait for driver offers and, as a result, make more trips. It is logical to choose the conversion
from order creation to receiving a bid as the target metric for such a test. As a result of the test,
we see the following:
Rides count: ↑, not statistically significant increase.
Orders count: ↑↑, statistically significant increase.
CR from order to bid: ↓↓, statistically significant decrease.
Done rate: ↓↓, statistically significant decrease.
We see a slight increase in the number of rides, a statistically significant increase in
the number of orders, but at the same time, a drop in conversion from order creation to
receiving a bid and a decrease in the share of successful trips.
The user interacts with the feature only after creating the order, so at first glance, it seems
that we could not influence the number of created orders. If the test group happened to include
users who tend to create orders more often, the increase in the number of orders could distort
the funnel indicators and explain the positive dynamics in rides. However, a deeper analysis
showed that this was not a randomization issue. After the progress bar appeared, some users who
tended to wait a long time for driver offers began to cancel the order earlier and make another
attempt to take a trip. As a result, the number of reorders increased the most (statistically significant growth).
How do reorders affect key and funnel metrics?
After creating an order, a user can drop off at different stages: if they did not receive offers
from drivers, if the offer price was not suitable, or, later, if the driver took too long to arrive.
In such cases, some users do not stop trying, but create a new order to eventually get a ride.
We call such repeated attempts reorders.
Instead of the expected linear user flow, we observe repeating cycles: users go through
the same scenario several times. When analyzing the efficiency of repeat attempts, we notice that their
success rate is often significantly lower. If users start reordering more often, this affects
all stages of the funnel, including those that precede the actual change. At the same time,
in a number of scenarios, for example, when we encourage users to try again instead of leaving,
we may observe a positive effect on top-level business metrics.
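To make the funnel distortion concrete, here is a minimal sketch with made-up numbers (illustrative only, not taken from the experiment): repeat attempts receive bids at a lower rate, so the order-to-bid conversion falls even though every intention behaves exactly as before.

```python
# Toy numbers, purely illustrative (not from the experiment).
intentions = 100                 # trip intentions in a group
first_attempts_with_bid = 80     # 80% of first orders receive a bid

# Baseline: every intention produces exactly one order.
cr_baseline = first_attempts_with_bid / intentions

# Variant: 20 intentions add one reorder each, and repeat attempts
# receive bids only at a 40% rate.
reorders = 20
reorders_with_bid = 8
cr_variant = (first_attempts_with_bid + reorders_with_bid) / (intentions + reorders)

print(cr_baseline)               # 0.8
print(round(cr_variant, 3))      # 0.733 -- order-to-bid CR drops on reorders alone
```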
Collapsing reorders. Our goal is to understand whether users' intentions, not individual attempts, have started to end in trips more often. To do this, we needed to give a stricter definition of a trip intention that would allow us to collapse multiple reorders of one user. After discussions with the teams, we concluded that two orders should have the following properties in order to be considered one intention to take a trip. The pickup and drop-off points of both orders should not differ significantly. The times of order creation should be close, that is, the orders were placed within a short interval. The previous order must not have been completed with a trip.
The remaining task was to define the threshold values: what should be considered close in time, and what counts as a small route change. Initially, these thresholds were defined based on business needs, so the first thing we decided to do was to check how well these values correspond to real user behavior. We found that, in the case of reordering, users rarely change the destination point, point B. The pickup point, point A, shifts more often, but in most cases insignificantly, by about 50 meters from the original position. Most reorders happen within the first 10 to 20 minutes. We then fixed points A and B within 500 meters and looked at what share of reorders are made no later than X minutes after the previous order. The initial cutoffs suited us well: they cover more than 90% of reorders, and further increasing the thresholds barely affects the coverage share. A quick sanity check of such a time cutoff is sketched below.
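As a hypothetical sketch of that check (toy gap values, not real data), one can compute the share of reorders created within a given number of minutes of the previous order and pick the cutoff past which coverage barely grows:

```python
# Toy time gaps (minutes) between a cancelled order and the next attempt.
gaps_minutes = [2, 4, 6, 9, 11, 14, 18, 19, 27, 45]

def coverage(gaps: list[float], cutoff_min: float) -> float:
    """Share of reorders created no later than cutoff_min after the previous order."""
    return sum(g <= cutoff_min for g in gaps) / len(gaps)

for cutoff in (10, 20, 30, 60):
    print(cutoff, coverage(gaps_minutes, cutoff))
# 10 -> 0.4, 20 -> 0.8, 30 -> 0.9, 60 -> 1.0: choose the cutoff
# beyond which coverage barely grows.
```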
In cases where a user creates three or more orders in a row, collapsing is performed sequentially: first, the first and second orders are checked and merged, then the second and third, and so on, as long as the conditions of time and location proximity are met. A minimal sketch of this collapsing logic follows.
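This sketch assumes a hypothetical Order schema; the 500-meter radius matches the threshold above, and the 20-minute window is shown as an example cutoff.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_POINT_SHIFT_M = 500          # radius from the thresholds above
MAX_GAP = timedelta(minutes=20)  # example time cutoff

@dataclass
class Order:                       # hypothetical schema
    created_at: datetime
    pickup: tuple[float, float]    # (lat, lon)
    dropoff: tuple[float, float]
    completed: bool                # the order ended with a trip

def haversine_m(a, b):
    """Great-circle distance between two (lat, lon) points, in meters."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371000 * 2 * math.asin(math.sqrt(h))

def same_intention(prev: Order, nxt: Order) -> bool:
    """Two consecutive orders belong to one trip intention."""
    return (not prev.completed
            and nxt.created_at - prev.created_at <= MAX_GAP
            and haversine_m(prev.pickup, nxt.pickup) <= MAX_POINT_SHIFT_M
            and haversine_m(prev.dropoff, nxt.dropoff) <= MAX_POINT_SHIFT_M)

def collapse(orders: list[Order]) -> list[list[Order]]:
    """Sequentially merge a user's time-ordered orders into intentions."""
    intentions: list[list[Order]] = []
    for order in sorted(orders, key=lambda o: o.created_at):
        if intentions and same_intention(intentions[-1][-1], order):
            intentions[-1].append(order)   # reorder: same intention
        else:
            intentions.append([order])     # new intention
    return intentions
```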
Alternatives. As an alternative approach, we considered using a mobile session identifier
to group orders within a single intention. However, this option turned out to be less reliable
for two reasons. First, a session can be interrupted, or it can stretch across several intentions,
for example, when a user places an order, then takes a trip, and soon creates and completes a new one.
In such cases, session boundaries do not match real behavior. Second, mobile analytics data is less
accurate than backend data: event times and their order can be recorded with delays or lost.
As a result, we decided not to use the session
identifier as the basis for defining a trip intention. New metrics. As a result, we created
a new entity and defined a rule for forming its unique identifier. The final adopted name
is aggregated order. Based on this entity, we built several derived metrics. The aggregated
funnel allows us to evaluate conversions without distortions related to reorders and makes
test results more interpretable. Funnels of the first, second, and subsequent attempts help
us understand which actions stimulate users to make a repeat attempt and increase the probability
of its success. A small sketch below shows how such metrics can be computed on top of collapsed intentions.
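As a sketch building on the hypothetical collapse() example above, the classic per-order done rate and the aggregated per-intention done rate could be computed like this, where an intention counts as done if any of its attempts ended with a trip:

```python
def done_rates(intentions: list[list[Order]]) -> tuple[float, float]:
    """Classic (per-order) vs aggregated (per-intention) done rate."""
    orders = [o for intent in intentions for o in intent]
    classic = sum(o.completed for o in orders) / len(orders)
    aggregated = sum(any(o.completed for o in intent)
                     for intent in intentions) / len(intentions)
    return classic, aggregated
```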
Now, let's return to the test we discussed earlier and compare the values obtained under the two approaches.

Metric | Classic funnel | Aggregated funnel | Interpretation
Rides | ↑, not statistically significant growth | same, counted identically | no change
Orders | ↑↑, statistically significant growth | ≈0, not statistically significant | the number of intentions hardly changed; the growth in orders is explained by reorders
Done rate | ↓↓, statistically significant drop | ↑, not statistically significant growth | the shares of successful orders and successful intentions move in different directions
Order → bid | ↓↓, statistically significant drop | ↓, not statistically significant drop | within an intention, users began to receive bids less often; the effect is close to statistical significance

To explain why the aggregated done rate is growing while the order → bid conversion is falling,
we looked at how exactly users perform reorders.
It turned out that the behavior split into two patterns.
Some users began to stop searching faster without waiting for a bid.
Another group, on the contrary, began to raise the price more often when reordering, and such orders were less often cancelled after acceptance.
Additional observations.
CR to price increase after reorder: ↑↑, statistically significant growth.
Aggregated bid → done: ↑↑, statistically significant growth.
Conclusion.
Sometimes, user interaction with a product cannot be fully described by classic funnel metrics.
The observed results may seem contradictory, and in such cases, it is important to use metrics that reflect customers' behavioral patterns or, as in our case, to create new entities that describe reality more accurately.
Thank you for listening to this Hackernoon story, read by artificial intelligence.
Visit hackernoon.com to read, write, learn and publish.
