The Good Tech Companies - Building a Crash Report Automation for iOS and Android

Episode Date: August 8, 2025

This story was originally published on HackerNoon at: https://hackernoon.com/building-a-crash-report-automation-for-ios-and-android. This story was written by: @indrivetech. Vilian Iaumbaev has built a system that automatically handles new crash reports for both iOS and Android. It makes tracking and fixing issues a whole lot easier. To get started, you'll need Google tools like Crashlytics and Jira.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Building a crash report automation for iOS and Android, by inDrive Tech. Hi, I'm Vilian Iaumbaev. I recently built a system that automatically handles new crash reports for both iOS and Android. It makes tracking and fixing issues a whole lot easier. Why we did this: manually assigning crashes to the right developers quickly became a tedious and unreliable task.
Starting point is 00:00:28 It was easy to forget something, overlook edge cases, or skip over complex issues. We wanted to make the process more predictable and systematic, so that no one on the team had to waste time triaging crashes and no critical issue would fall through the cracks. Summary: to get started, you'll need Google tools like Crashlytics, Google Cloud Platform, and Jira. Once your projects are set up in Google services, configure data transfer from Crashlytics to GCP using the integration page. After that, all crash reports will appear in BigQuery tables. The crash data structures for iOS and Android are nearly identical,
Starting point is 00:01:05 with just a few small differences, which means we can use a single script to process both. So now you have your crashes in BigQuery, which means you can run some work on this data. You can request all crash data and analyze it however you want on your side. I have chosen Python, and will explain using that as the example. Firstly, we need to get all the crash data to be analyzed; but if you have a large amount of data, say from over a million users, you'd better pre-process
Starting point is 00:01:30 the data on the Google side and make some aggregations first. Plan: one, learn some basic SQL to get crash data from BigQuery. Two, query crash data using Python. Three, get all committers from the repository and merge duplicates. Four, map each issue to a repo file and its owner. Five, create a Jira task for the file owner if a task doesn't already exist. Learn some SQL basics to get data from BigQuery. BigQuery uses its own SQL dialect, which is similar to standard SQL but offers additional convenience for data analysis. For our integration, we needed to work with the complete crash dataset, but in an aggregated
Starting point is 00:02:09 form. Specifically, we grouped individual crash reports into unique crash signatures and then aggregated relevant data within each group, such as the number of occurrences, affected user count, version breakdown, and more. You can find the SQL script below and test it in your own environment via the following link: https://console.cloud.google.com/bigquery. As a result, you will get one row per unique issue_id, along with the following aggregated fields. issue_titles: a list of all crash titles. This is an array to account for cases
Starting point is 00:02:46 where multiple unique titles exist for the same issue. In the scripting part, we'll select the most frequent one. blame_files: a list of top stack-trace files blamed for the crash. This will be non-empty if the crash occurred in your code base rather than in system libraries. blame_libraries: a list of libraries associated with the crash. This is also an array, constructed for reasons similar to issue_titles. blame_symbols: a list of code symbols (functions, methods) where the crash occurred. Like the other fields above, it's an array. total_events: the total number of crash occurrences during the selected time period. total_users: the number of unique users affected. Sometimes a crash may occur only
Starting point is 00:03:33 for a specific group of users. events_info: a JSON array (as a string) containing total_events and total_users broken down by app version. See the example below. Request crash data from BigQuery using Python. To get started, install the BigQuery Python client library from PyPI. After the installation, create a bigquery_executor.py file. This module will handle all communication with Google Cloud BigQuery. To start using the script, you'll need just two things: a Google service account JSON credentials file, and the name, or ID, of your BigQuery project. Once you have these, you can authenticate and start executing queries through the script. Google service account JSON credentials. To create a service account, go to the Google Cloud Console and assign
Starting point is 00:04:22 it the BigQuery Data Editor role. Once the account is created, open it, navigate to the Keys tab, click Add Key, and choose JSON. This will generate and download a JSON credentials file for the service account. A service account JSON typically looks like this. For testing purposes, you can convert the JSON credentials into a single-line string and embed it directly into your script. However, this approach is not recommended for production; use a secrets manager to securely store and manage your credentials instead. You can also extract your BQ project ID from the project_id field inside the credentials JSON. Models. To work with BigQuery data in a type-safe manner, it's useful to define data models that reflect the structure of the query results. This allows
Starting point is 00:05:09 you to write cleaner, safer, and more maintainable code. Below is an example of such model classes and a get_crashlytics_issues function. And finally, we can fetch data from BigQuery. Add the following method to your existing BigQuery executor class; it will execute the SQL query described earlier in the BigQuery SQL section and return the results parsed into model instances. Now we can execute our SQL request to BigQuery directly from Python. Here's a full example of how to run the query and work with the results. Hooray! 🎉 Now that we're able to fetch crash data from BigQuery, we can move on to the next step: taking the top five most frequent crashes and automatically creating Jira tasks for them. Get all committers of the repository and merge them.
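The fetch-and-parse flow just described could be sketched roughly as follows. This is a minimal illustration, not the author's exact script: the model is trimmed to a few fields, the table name is a placeholder, and the query columns (issue_id, issue_title, blame_frame.file, installation_uuid, event_timestamp) are assumptions based on the Crashlytics BigQuery export schema; the helper names (CrashIssue, issue_from_row, get_crashlytics_issues) are likewise illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CrashIssue:
    """One aggregated Crashlytics issue, as returned by the BigQuery query."""
    issue_id: str
    issue_titles: list
    blame_files: list
    total_events: int
    total_users: int

    @property
    def title(self) -> str:
        # Pick the most frequent title when several exist for one issue.
        return Counter(self.issue_titles).most_common(1)[0][0] if self.issue_titles else ""


# Trimmed-down version of the aggregation query. The real table name and the
# extra fields from the article (blame_libraries, blame_symbols, events_info)
# depend on your project and export settings.
CRASH_QUERY = """
SELECT
  issue_id,
  ARRAY_AGG(DISTINCT issue_title) AS issue_titles,
  ARRAY_AGG(DISTINCT blame_frame.file IGNORE NULLS) AS blame_files,
  COUNT(*) AS total_events,
  COUNT(DISTINCT installation_uuid) AS total_users
FROM `your-project.firebase_crashlytics.your_table`
WHERE event_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY issue_id
ORDER BY total_users DESC
"""


def issue_from_row(row: dict) -> CrashIssue:
    """Convert one result row (as a plain mapping) into a model instance."""
    return CrashIssue(
        issue_id=row["issue_id"],
        issue_titles=list(row.get("issue_titles") or []),
        blame_files=list(row.get("blame_files") or []),
        total_events=int(row["total_events"]),
        total_users=int(row["total_users"]),
    )


def get_crashlytics_issues(credentials_path: str, project_id: str) -> list:
    """Run the query through the official client (needs network access)."""
    # Imported here so the parsing helpers above stay usable even without
    # the google-cloud-bigquery package installed.
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(credentials_path, project=project_id)
    return [issue_from_row(dict(row.items())) for row in client.query(CRASH_QUERY).result()]
```

In a real setup you would pass the path to the downloaded service account JSON and the project_id extracted from it, exactly as the transcript describes.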
Starting point is 00:05:55 Before assigning crash issues to developers, we first need to identify potential owners for each crash. To do that, we'll start by gathering all commit authors from the repository. Since we're using GitHub, we should be aware of a few specific details. Some developers may use multiple email addresses across commits, so we'll need to merge identities where applicable. GitHub often uses noreply emails, e.g. username@users.noreply.github.com, so we'll handle those cases accordingly. The main goal at this step is to extract and normalize the list of Git authors with their names and emails, using the following command: git log | grep "^Author" | sort | uniq. In the code below, we attempt to match
Starting point is 00:06:40 different commit identities that likely belong to the same person, for example user@gmail.com and user@users.noreply.github.com. We also extract and group their names and GitHub usernames, where available, for convenience. With the script below, you can launch this process and get a cleaned, de-duplicated list of all committers in the repository. Map each issue to a file of the repository and its file owner. At this point, we have detailed information about our crashes and the users affected by them. This allows us to associate a specific crash with a specific user and automatically create a corresponding Jira task. Before implementing the crash-to-user mapping logic, we separated the workflows for iOS and Android.
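The committer-merging step described above can be sketched as follows. It is a sketch under a simple assumption: two identities are treated as the same person when a GitHub noreply address shares its username with another address's local part (the user@gmail.com vs user@users.noreply.github.com case from the transcript). The function names (identity_key, merge_committers, repo_committers) are illustrative, not the author's.

```python
import re
import subprocess
from collections import defaultdict

# GitHub noreply addresses look like "user@..." or "12345+user@...".
NOREPLY_RE = re.compile(r"^(?:\d+\+)?(?P<user>[^@]+)@users\.noreply\.github\.com$")


def identity_key(name: str, email: str) -> str:
    """Heuristic key under which different commit identities are merged."""
    email = email.strip().lower()
    m = NOREPLY_RE.match(email)
    if m:
        return m.group("user")          # collapse noreply address to username
    return email.split("@", 1)[0]       # otherwise key by the local part


def merge_committers(authors):
    """Group (name, email) pairs that likely belong to the same person."""
    merged = defaultdict(lambda: {"names": set(), "emails": set()})
    for name, email in authors:
        key = identity_key(name, email)
        merged[key]["names"].add(name)
        merged[key]["emails"].add(email.lower())
    return dict(merged)


def repo_committers(repo_path="."):
    """Collect all commit authors via `git log` (requires a git checkout)."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an <%ae>"],
        capture_output=True, text=True, check=True,
    ).stdout
    pairs = []
    for line in sorted(set(out.splitlines())):
        name, _, email = line.rpartition(" <")
        pairs.append((name, email.rstrip(">")))
    return merge_committers(pairs)
```

Real-world identity merging usually needs more signals (matching display names, a manual alias map), but this captures the shape of the step.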
Starting point is 00:07:24 These platforms use different symbol formats, and the criteria for linking crash files to issues also differ. To handle this cleanly, we introduced an abstract class with platform-specific implementations, enabling us to encapsulate the differences and solve the problem in a structured way. The specific implementation may vary depending on your project, but the main responsibility of this class is to determine whether a given crash occurred in a particular file. Once this logic is in place, we can proceed to map files to issues and assign them to the corresponding file owners.
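A rough sketch of such an abstract class and the mapping step might look like this. The matching criteria shown (comparing stack-trace file names against repository paths) are deliberately simplified assumptions, and the names (CrashFileMatcher, IosMatcher, AndroidMatcher, map_issues_to_owners) are illustrative rather than the author's actual implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Issue:
    issue_id: str
    blame_files: list   # file names reported in the crash stack trace
    total_users: int


class CrashFileMatcher(ABC):
    """Decides whether a crash occurred in a given repository file."""

    @abstractmethod
    def matches(self, blame_file: str, repo_path: str) -> bool: ...


class IosMatcher(CrashFileMatcher):
    # Assumes iOS crash frames carry bare file names like "Checkout.swift".
    def matches(self, blame_file, repo_path):
        return repo_path == blame_file or repo_path.endswith("/" + blame_file)


class AndroidMatcher(CrashFileMatcher):
    # Assumes Android frames carry file names like "CheckoutActivity.kt".
    def matches(self, blame_file, repo_path):
        return repo_path.rsplit("/", 1)[-1] == blame_file


def map_issues_to_owners(issues, file_owners, matcher):
    """Pair each issue with the owner of the first matching repo file and
    return the list sorted by total_users in descending order."""
    result = []
    for issue in issues:
        owner = None
        for blame_file in issue.blame_files:
            for repo_path, candidate in file_owners.items():
                if matcher.matches(blame_file, repo_path):
                    owner = candidate
                    break
            if owner:
                break
        result.append((issue, owner))
    return sorted(result, key=lambda pair: pair[0].total_users, reverse=True)
```

Here file_owners would come from the committer list built in the previous step, e.g. by blaming each file's most frequent recent author.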
Starting point is 00:07:57 All you need to do at this point is update the GitHub main branch property with the link to your own repository. Next, we gather issues and file owners, map files accordingly using the code below, and get a final result: a list of issues sorted by total_users in descending order. Create a Jira task for the file owner of the crash. At this point, we have everything we need to start creating Jira tasks for crash owners. However, keep in mind that Jira configurations often vary between companies; custom fields, workflows, and permissions may differ. I recommend referring to the official Jira API documentation and using the official Python client to ensure compatibility with
Starting point is 00:08:36 your setup. Here are some practical tips based on our experience. One, don't create tasks for every issue; focus on the top five to ten issues based on the number of affected users, or on a certain impact threshold. Two, persist task metadata: store information about created tasks in persistent storage. I use BigQuery, saving the data in a separate table and updating it on each script run. Three, recreate closed tasks if the issue reappears in newer versions of the app; this ensures that regressions aren't ignored. Four, link tasks for the same issue to simplify future investigation and avoid duplication. Five, include as much detail as possible in the task description: add crash aggregations, affected user counts, versions, etc. Six, link related crashes if they originate from
Starting point is 00:09:24 the same file; this provides additional context. Seven, notify your team in Slack or another messaging system when new tasks are created or existing ones need attention; include helpful links to the crash report, the task, relevant GitHub files, etc. Eight, add error handling to your script: use try/except blocks and send Slack alerts when something fails. Nine, cache slow operations during development; for example, cache BigQuery crash retrievals locally to speed up iteration. Ten, some crashes may involve shared or core libraries; in those cases, you'll likely need to assign the task manually, but it's still helpful to create the Jira issue automatically with full crash context. Conclusion. This system allows us to process thousands of crash reports daily,
Starting point is 00:10:13 and route them to the right developer in just a few minutes, without any manual work. If your team is drowning in uncategorized crash issues, automate it. Hooray! Thank you for listening to this HackerNoon story, read by artificial intelligence. Visit hackernoon.com to read, write, learn, and publish.
