Part 1.5 – Modeling User Journeys (User Flows)

The next step in this sequence is to model the end user's journey, also called the user flow: the sequence of events that occur at the different components a request passes through. To model user journeys, we need to know which features of the e-commerce application are available to users. Some examples are:

  • Searching for items and saving them to a wish list
  • Comparing items
  • Looking up item reviews
  • Adding an item to the cart
  • Checking out items from the cart
  • Completing payment (successfully or otherwise)

In each of these journeys, multiple services are involved (assuming the application is architected using microservices design principles). As stated in the earlier articles, SLIs and SLOs should be defined based on the end user’s experience. This means there should be clearly defined indicators of the app's availability and latency as the user goes through any of the flows listed above.

The following image shows how Kiali, a component of an Istio service mesh, helps visualize the request flow (i.e., the user flow in a specific scenario). The example is provided to help the reader visualize user flows. Such a service mesh graph is available only after the services are deployed to the cloud and instrumented, and the monitoring tool has collected enough data to derive the dependency and call graphs. However, even before the services are deployed, the technical team should be able to draw up a similar graph with the help of the product team.

Image source: https://istio.io/v1.6/docs/tasks/observability/kiali/
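Before any telemetry exists, the graph agreed with the product team can be captured as a simple adjacency list. The following is a minimal sketch, assuming hypothetical service names (`web-frontend`, `cart-service`, `payment-service`, and so on) and a hypothetical helper `services_in_journey`; it walks the graph breadth-first from a journey's entry point to list every service whose availability and latency could affect that journey.

```python
from collections import deque

# Hypothetical call graph for a checkout journey, sketched with the product
# team before deployment. Service names are illustrative, not from a real system.
CALL_GRAPH: dict[str, list[str]] = {
    "web-frontend": ["cart-service", "checkout-service"],
    "cart-service": ["inventory-service"],
    "checkout-service": ["cart-service", "payment-service"],
    "payment-service": ["payment-gateway"],
    "inventory-service": [],
    "payment-gateway": [],
}


def services_in_journey(graph: dict[str, list[str]], entry: str) -> list[str]:
    """Return every service reachable from the journey's entry point, in BFS order."""
    seen = {entry}
    order = [entry]
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, []):
            # Visit each service once, even if several services call it.
            if callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append(callee)
    return order


if __name__ == "__main__":
    print(services_in_journey(CALL_GRAPH, "web-frontend"))
```

Each service in the returned list is a candidate contributor to the journey's SLIs, which makes it a starting point for deciding what must be instrumented.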

Configuring Observability for User Flows

For Kiali or Azure Application Insights to provide the data needed to calculate the SLIs (availability and latency being the major ones for a customer-facing website), every service in the user flow must be instrumented, and the logs, custom events, and traces collected from these services must be centralized so that these time-series events can be correlated.
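To illustrate what centralizing and correlating the telemetry makes possible, here is a minimal sketch that groups span records by a shared trace ID and derives an availability SLI and a latency SLI for a journey. The `Span` record, its field names, and the `journey_slis` function are hypothetical; real backends such as Application Insights or an OpenTelemetry collector use their own schemas and query languages. In this sketch, a trace counts as available only if every span in it succeeded, and its latency runs from the earliest span start to the latest span end.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical record shape; real exporters use their own field names.
    trace_id: str
    service: str
    start_ms: float
    end_ms: float
    ok: bool


def journey_slis(spans: list[Span], latency_threshold_ms: float) -> tuple[float, float]:
    """Correlate spans by trace_id and return (availability SLI, latency SLI).

    availability: share of traces in which every span succeeded.
    latency: share of traces whose end-to-end duration is within the threshold.
    """
    traces: dict[str, list[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    if not traces:
        raise ValueError("no traces to evaluate")

    good_availability = 0
    good_latency = 0
    for trace_spans in traces.values():
        if all(s.ok for s in trace_spans):
            good_availability += 1
        # End-to-end latency: first span start to last span end.
        duration = max(s.end_ms for s in trace_spans) - min(s.start_ms for s in trace_spans)
        if duration <= latency_threshold_ms:
            good_latency += 1

    total = len(traces)
    return good_availability / total, good_latency / total


if __name__ == "__main__":
    sample = [
        Span("t1", "web-frontend", 0, 180, True),
        Span("t1", "payment-service", 20, 150, True),
        Span("t2", "web-frontend", 0, 900, True),
        Span("t2", "payment-service", 30, 850, False),
        Span("t3", "web-frontend", 0, 120, True),
        Span("t4", "web-frontend", 0, 400, True),
    ]
    print(journey_slis(sample, latency_threshold_ms=300))  # (0.75, 0.5)
```

With the sample data, three of the four traces succeed and two finish within 300 ms. Without a shared trace ID, the payment failure in `t2` could not be attributed to the user's journey at all.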

Other Key Benefits of User Flows

Two of the many useful features in the Application Insights service are Usage Impact and User Flows. They help in modeling user journeys and show how the application is being used, including which areas are possibly breaching their SLO targets and impacting the end-user experience.

Explanations (from the Microsoft documentation)

Usage Impact: Impact analyzes how load times and other properties influence conversion rates for various parts of your app. To put it more precisely, it discovers how any dimension of a page view, custom event, or request affects the usage of a different page view or custom event.
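The idea behind Usage Impact can be approximated offline: bucket sessions by a dimension such as page load time and compare conversion rates across the buckets. The following is a rough sketch under that assumption, not how Application Insights implements Impact; the session tuples, bucket edges, and function names are hypothetical.

```python
from collections import defaultdict


def _bucket_label(load_time_s: float, edges_s: list[float]) -> str:
    """Map a load time to the first bucket whose upper bound it does not exceed."""
    for edge in edges_s:
        if load_time_s <= edge:
            return f"<={edge:g}s"
    return f">{edges_s[-1]:g}s"


def conversion_by_load_time(
    sessions: list[tuple[float, bool]], edges_s: list[float]
) -> dict[str, float]:
    """Return the conversion rate per page-load-time bucket.

    sessions: (load_time_seconds, converted) pairs.
    edges_s: ascending, non-empty upper bounds, e.g. [1, 3] -> "<=1s", "<=3s", ">3s".
    """
    if not edges_s:
        raise ValueError("edges_s must contain at least one bucket boundary")
    totals: dict[str, int] = defaultdict(int)
    conversions: dict[str, int] = defaultdict(int)
    for load_time, converted in sessions:
        label = _bucket_label(load_time, edges_s)
        totals[label] += 1
        conversions[label] += int(converted)
    return {label: conversions[label] / totals[label] for label in totals}
```

If the slower buckets convert noticeably worse, load time is a candidate dimension for a latency SLI on that page. Correlation alone does not prove causation, which is why the results still need to be checked against the defined SLIs.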

User Flows:

The User Flows tool visualizes how users navigate between the pages and features of your site. It’s great for answering questions like:

  • How do users navigate away from a page on your site?
  • What do users click on a page on your site?
  • Where are the places that users churn most from your site?
  • Are there places where users repeat the same action over and over?
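Questions like these reduce to counting transitions between steps within each session. The sketch below assumes sessions are already available as ordered lists of hypothetical step names. It counts where users go after a given step, with `(exit)` marking a drop-off, and how often a step is repeated back to back. It is an illustration only, not the User Flows tool's algorithm.

```python
from collections import Counter

EXIT = "(exit)"  # marker for sessions that end on the step being examined


def next_step_counts(sessions: list[list[str]], step: str) -> Counter[str]:
    """Count where users go right after `step`; EXIT means the session ended there."""
    counts: Counter[str] = Counter()
    for steps in sessions:
        for i, current in enumerate(steps):
            if current == step:
                following = steps[i + 1] if i + 1 < len(steps) else EXIT
                counts[following] += 1
    return counts


def repeated_steps(sessions: list[list[str]]) -> Counter[str]:
    """Count back-to-back repeats of the same step (e.g. clicking "Pay" again and again)."""
    counts: Counter[str] = Counter()
    for steps in sessions:
        for previous, current in zip(steps, steps[1:]):
            if previous == current:
                counts[current] += 1
    return counts


if __name__ == "__main__":
    sessions = [
        ["Search", "Item", "Add to Cart", "Checkout", "Pay"],
        ["Search", "Item", "Add to Cart"],
        ["Search", "Item", "Add to Cart", "Checkout", "Pay", "Pay", "Pay"],
    ]
    print(next_step_counts(sessions, "Add to Cart"))
    print(repeated_steps(sessions))
```

A large `(exit)` share after a step points to where users churn, and repeated steps such as several consecutive "Pay" clicks can indicate errors or slow responses at that point.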

How APMs Can Help Identify SLO Failures

  • If the APM indicates a pattern of users dropping off at a specific point in the site, that point may need immediate attention. According to the SRE book, this drop-off becomes an SLI in itself. For example, if the user journeys show that users do not complete the purchase process and drop off at the “Add to Cart” or “Choose a Payment method” step, then the defined SLIs need to be examined to identify the root cause of the behavior.
    • If the application has errored out at this point, the infrastructure or application logs will most likely indicate the issue. The outage could be short, say a few minutes, or could go on for hours. In either case, because the site is partially or completely down, it consumes the “Availability Error Budget”.
    • If, on the other hand, users abandon the flow because of a poor experience, such as a slow site, then the issue needs to be handled by fixing the Latency SLI failures.
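Consuming the availability error budget is simple arithmetic. A 99.9% availability SLO over a 30-day window (43,200 minutes) allows 0.1% of that time, about 43.2 minutes, to be bad. The sketch below is a time-based approximation with hypothetical function names. It weights each outage by the fraction of requests it affected, so a partial outage consumes proportionally less budget than a complete one. A request-based SLI, which counts failed requests directly, is a common alternative.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Total 'bad' minutes an availability SLO (e.g. 0.999) allows over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1 - slo) * window_minutes


def budget_remaining(
    slo: float, outages: list[tuple[float, float]], window_days: int = 30
) -> float:
    """Minutes of availability error budget left after the given outages.

    outages: (duration_minutes, impacted_fraction) pairs, where impacted_fraction
    is 1.0 for a complete outage and e.g. 0.5 if half of the requests failed.
    """
    consumed = sum(minutes * fraction for minutes, fraction in outages)
    return error_budget_minutes(slo, window_days) - consumed
```

For example, a complete 5-minute outage plus an hour in which half of the requests failed consumes about 35 minutes, leaving roughly 8.2 of the 43.2 minutes for the rest of the window. A negative result means the SLO has already been missed for that window.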

References

  1. Modeling User Journeys – https://sre.google/workbook/implementing-slos/#none
  2. Usage Impact feature in Application Insights – https://docs.microsoft.com/en-us/azure/azure-monitor/app/usage-impact
  3. User Flows feature in Application Insights – https://docs.microsoft.com/en-us/azure/azure-monitor/app/usage-flows