The Good Tech Companies - Measuring Non-Linear User Journeys: Rethinking Funnels Metrics in A/B Testing
Episode Date: December 1, 2025. This story was originally published on HackerNoon at: https://hackernoon.com/measuring-non-linear-user-journeys-rethinking-funnels-metrics-in-ab-testing. A deep dive into user reorders, hidden behavioral patterns, and how aggregated funnels improve A/B test accuracy in non-linear user journeys. Check more stories related to product-management at: https://hackernoon.com/c/product-management. You can also check exclusive content about #user-journey, #ab-testing, #data-analysis, #funnel-analysis, #data-driven-decision-making, #user-behavior-analytics, #good-company, #business-metrics, and more. This story was written by: @indrivetech. Learn more about this writer by checking @indrivetech's about page, and for more stories, please visit hackernoon.com. InDrive users create an order, receive bids from drivers, choose a suitable one, wait for the driver to arrive, and then start and complete the trip. In some tests, conversions at the stages that precede the implemented changes can change in a statistically significant way. In this article, we explain how we investigated these behavioral patterns and, based on them, introduced new metrics that helped make test results more interpretable.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Measuring Non-Linear User Journeys: Rethinking Funnels Metrics in A/B Testing, by InDrive.Tech.
Introduction. In a mature product, it is often difficult to achieve a statistically significant
impact on key business metrics such as revenue per user or the number of orders.
Most changes are aimed at point improvements in the funnel or individual stages of the user journey,
and the impact of such changes on business metrics is usually lost in the noise.
Therefore, product teams quite often choose a corresponding conversion as the target
metric and design experiments in a way that achieves the required statistical power. However,
from time to time, we notice that funnel metrics do not move in line with the dynamics of top-level
indicators. Moreover, in some tests, conversions at the stages that precede the implemented
changes can change in a statistically significant way. As a result, interpreting such experiments
becomes difficult and the risk of making wrong decisions increases. As an example, consider a service
where a user creates an order, receives offers from different performers, chooses a suitable one,
and waits for the task to be completed. Suppose we have developed a new feature that highlights
the best offer and is expected to increase the share of orders where a match between the customer
and the performer occurs. During the experiment, we may observe that the share of successful
orders decreases. The total number of orders and completed orders increases. The share of orders that
received at least one offer decreases. Such a pattern may occur if the user has the ability to return
to previous stages and, for example, repost the order. We discovered similar patterns in our own
experiments. At inDrive, passengers can propose their own price, after which they receive offers from
drivers and choose the most suitable one. Many users actively use the bargaining features and,
trying to get a better price, may change the order conditions and create it again. This leads to a
series of orders before a trip actually takes place. Our passenger fulfillment team is responsible for
the user journey from the moment the order is created to the completion of the trip. In this article,
we will explain how we investigated these behavioral patterns and, based on them,
introduced new metrics that helped make test results more interpretable. This article will be
useful for product analysts and product managers who work with products that have a complex
nonlinear user journey, where metric interpretation requires taking behavioral patterns and repeated
user actions into account. How do key metrics and funnel metrics behave? In our product,
the funnel roughly looks as follows. A passenger creates an order, receives bids from drivers,
selects a suitable one, waits for the driver to arrive, and then starts and completes the trip.
Imagine that we launch a small UI change. We show the user a progress bar while searching for
a driver in order to reduce uncertainty. We expect that with the progress bar, users will more often
wait for driver offers and, as a result, make more trips. It is logical to choose the conversion
from order creation to receiving a bid as the target metric for such a test. As a result of the test,
we see the following:
Rides count: ↑, not statistically significant increase.
Orders count: ↑↑, statistically significant increase.
CR from order to bid: ↓↓, statistically significant decrease.
Done rate: ↓↓, statistically significant decrease.
We see a slight increase in the number of rides, a statistically significant increase in
the number of orders, but at the same time, a drop in conversion from order creation to
receiving a bid and a decrease in the share of successful trips.
The user interacts with the feature only after creating the order, so at first glance, it seems
that we could not influence the number of created orders. If the test group happened to include
users who tend to create orders more often, the increase in the number of orders could distort
the funnel indicators and explain the positive dynamics in rides. However, a deeper analysis
showed that this was not a randomization issue. After the progress bar appeared, some users who
tended to wait a long time for driver offers began to cancel the order earlier and make another
attempt to take a trip. As a result, the number of reorders increased the most (statistically significant growth).
How do reorders affect key and funnel metrics?
After creating an order, a user can drop off at different stages: if they did not receive offers
from drivers, if the offer price was not suitable, or, later, if the driver took too long to arrive.
In such cases, some users do not stop trying, but create a new order to eventually get a ride.
We call such repeated attempts reorders.
Instead of the expected linear user flow, we observe repeating cycles: users go through
the same scenario several times. When analyzing the efficiency of repeat attempts, we notice that their
success rate is often significantly lower. If users start reordering more often, this affects
all stages of the funnel, including those that precede the actual change. At the same time,
in a number of scenarios, for example, when we encourage users to try again instead of leaving,
we may observe a positive effect on top-level business metrics.
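To make the funnel distortion concrete, here is a minimal sketch with made-up numbers (illustrative only, not taken from the experiment): repeat attempts receive bids at a lower rate, so the order-to-bid conversion falls even though every intention behaves exactly as before.

```python
# Toy numbers, purely illustrative (not from the experiment).
intentions = 100                 # trip intentions in a group
first_attempts_with_bid = 80     # 80% of first orders receive a bid

# Baseline: every intention produces exactly one order.
cr_baseline = first_attempts_with_bid / intentions

# Variant: 20 intentions add one reorder each, and repeat attempts
# receive bids only at a 40% rate.
reorders = 20
reorders_with_bid = 8
cr_variant = (first_attempts_with_bid + reorders_with_bid) / (intentions + reorders)

print(cr_baseline)               # 0.8
print(round(cr_variant, 3))      # 0.733 -- order-to-bid CR drops on reorders alone
```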
Collapsing reorders. Our goal is to understand whether users' intentions, not individual attempts, have started to end in trips more often. To do this, we needed to give a stricter definition of a trip intention that would allow us to collapse multiple reorders of one user. After discussions with the teams, we concluded that two orders should have the following properties in order to be considered one intention to take a trip. The pickup and drop-off points of both orders should not differ significantly. The times of order creation should be close, that is, the orders were placed within a short interval. The previous order must not have been completed with a trip.
The remaining task was to define the threshold values: what should be considered close in time, and what counts as a small route change. Initially, these thresholds were defined based on business needs, so the first thing we decided to do was to check how well these values correspond to real user behavior. We found that, in the case of reordering, users rarely change the destination point, point B. The pickup point, point A, shifts more often, but in most cases insignificantly, by about 50 meters from the original position. Most reorders happen within the first 10 to 20 minutes. We then fixed points A and B within 500 meters and looked at what share of reorders are made no later than X minutes after the previous order. The initial cutoffs suited us well: they cover more than 90% of reorders, and further increasing the thresholds barely affects the coverage share. A quick sanity check of such a time cutoff is sketched below.
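As a hypothetical sketch of that check (toy gap values, not real data), one can compute the share of reorders created within a given number of minutes of the previous order and pick the cutoff past which coverage barely grows:

```python
# Toy time gaps (minutes) between a cancelled order and the next attempt.
gaps_minutes = [2, 4, 6, 9, 11, 14, 18, 19, 27, 45]

def coverage(gaps: list[float], cutoff_min: float) -> float:
    """Share of reorders created no later than cutoff_min after the previous order."""
    return sum(g <= cutoff_min for g in gaps) / len(gaps)

for cutoff in (10, 20, 30, 60):
    print(cutoff, coverage(gaps_minutes, cutoff))
# 10 -> 0.4, 20 -> 0.8, 30 -> 0.9, 60 -> 1.0: choose the cutoff
# beyond which coverage barely grows.
```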
In cases where a user creates three or more orders in a row, collapsing is performed sequentially: first, the first and second orders are checked and merged, then the second and third, and so on, as long as the conditions of time and location proximity are met. A minimal sketch of this collapsing logic follows.
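This sketch assumes a hypothetical Order schema; the 500-meter radius matches the threshold above, and the 20-minute window is shown as an example cutoff.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_POINT_SHIFT_M = 500          # radius from the thresholds above
MAX_GAP = timedelta(minutes=20)  # example time cutoff

@dataclass
class Order:                       # hypothetical schema
    created_at: datetime
    pickup: tuple[float, float]    # (lat, lon)
    dropoff: tuple[float, float]
    completed: bool                # the order ended with a trip

def haversine_m(a, b):
    """Great-circle distance between two (lat, lon) points, in meters."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371000 * 2 * math.asin(math.sqrt(h))

def same_intention(prev: Order, nxt: Order) -> bool:
    """Two consecutive orders belong to one trip intention."""
    return (not prev.completed
            and nxt.created_at - prev.created_at <= MAX_GAP
            and haversine_m(prev.pickup, nxt.pickup) <= MAX_POINT_SHIFT_M
            and haversine_m(prev.dropoff, nxt.dropoff) <= MAX_POINT_SHIFT_M)

def collapse(orders: list[Order]) -> list[list[Order]]:
    """Sequentially merge a user's time-ordered orders into intentions."""
    intentions: list[list[Order]] = []
    for order in sorted(orders, key=lambda o: o.created_at):
        if intentions and same_intention(intentions[-1][-1], order):
            intentions[-1].append(order)   # reorder: same intention
        else:
            intentions.append([order])     # new intention
    return intentions
```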
Alternatives. As an alternative approach, we considered using a mobile session identifier
to group orders within a single intention. However, this option turned out to be less reliable
for two reasons. First, a session can be interrupted, or it can stretch across several intentions,
for example, when a user places an order, then takes a trip, and soon creates and completes a new one.
In such cases, session boundaries do not match real behavior. Second, mobile analytics data is less
accurate than backend data: event times and their order can be recorded with delays or lost.
As a result, we decided not to use the session
identifier as the basis for defining a trip intention. New metrics. As a result, we created
a new entity and defined a rule for forming its unique identifier. The final adopted name
is aggregated order. Based on this entity, we built several derived metrics. The aggregated
funnel allows us to evaluate conversions without distortions related to reorders and makes
test results more interpretable. Funnels of the first, second, and subsequent attempts help
us understand which actions stimulate users to make a repeat attempt and increase the probability
of its success. A small sketch below shows how such metrics can be computed on top of collapsed intentions.
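As a sketch building on the hypothetical collapse() example above, the classic per-order done rate and the aggregated per-intention done rate could be computed like this, where an intention counts as done if any of its attempts ended with a trip:

```python
def done_rates(intentions: list[list[Order]]) -> tuple[float, float]:
    """Classic (per-order) vs aggregated (per-intention) done rate."""
    orders = [o for intent in intentions for o in intent]
    classic = sum(o.completed for o in orders) / len(orders)
    aggregated = sum(any(o.completed for o in intent)
                     for intent in intentions) / len(intentions)
    return classic, aggregated
```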
Now, let's return to the test we discussed earlier and compare the values obtained under the two approaches.

Metric | Classic funnel | Aggregated funnel | Interpretation
Rides | ↑, not statistically significant growth | same, counted identically | no change
Orders | ↑↑, statistically significant growth | ≈0, not statistically significant | the number of intentions hardly changed; the growth in orders is explained by reorders
Done rate | ↓↓, statistically significant drop | ↑, not statistically significant growth | the shares of successful orders and successful intentions move in different directions
Order → bid | ↓↓, statistically significant drop | ↓, not statistically significant drop | within an intention, users began to receive bids less often; the effect is close to statistical significance

To explain why the aggregated done rate is growing while the order → bid conversion is falling,
we looked at how exactly users perform reorders.
It turned out that the behavior split into two patterns.
Some users began to stop searching faster without waiting for a bid.
Another group, on the contrary, began to raise the price more often when reordering, and such orders were less often cancelled after acceptance.
Additional observations.
CR to price increase after reorder: ↑↑, statistically significant growth.
Aggregated bid → done: ↑↑, statistically significant growth.
Conclusion.
Sometimes, user interaction with a product cannot be fully described by classic funnel metrics.
The observed results may seem contradictory, and in such cases, it is important to use metrics that reflect customers' behavioral patterns or, as in our case, to create new entities that describe reality more accurately.
Thank you for listening to this Hackernoon story, read by artificial intelligence.
Visit hackernoon.com to read, write, learn and publish.
