The Good Tech Companies - How Evergen Uses TigerData to Scale Its Renewable Energy Monitoring Architecture

Episode Date: September 11, 2025

This story was originally published on HackerNoon at: https://hackernoon.com/how-evergen-uses-tigerdata-to-scale-its-renewable-energy-monitoring-architecture. How Evergen scaled renewable monitoring by moving from MongoDB to TigerData (TimescaleDB)—cutting infra use >50%, speeding queries <500 ms, & centralizing data. Check more stories related to cloud at: https://hackernoon.com/c/cloud. You can also check exclusive content about #evergen, #tigerdata, #timescaledb, #time-series-database, #renewable-energy-analytics, #postgresql, #continuous-aggregates, #good-company, and more. This story was written by: @tigerdata. Learn more about this writer by checking @tigerdata's about page, and for more stories, please visit hackernoon.com. Australia's Evergen swapped a costly, complex MongoDB+Kafka setup for TigerData (TimescaleDB) to handle massive time-series from hundreds of thousands of devices. With SQL, continuous aggregates, compression, and tiered storage, it cut Kubernetes usage by 50%+, enabled 2-year retention, sub-500 ms dashboards, raw-data access, and simpler ops—all on familiar Postgres.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. How Evergen uses Tiger Data to scale its renewable energy monitoring architecture, by Tiger Data, creators of TimescaleDB. Evergen, headquartered in Australia, builds world-class software platforms and products that enable monitoring, control, optimization, and orchestration of a broad range of distributed energy resources and utility-scale assets. Evergen's mission is to decarbonize the energy system by facilitating the transition to resilient and renewable energy systems. Its cloud-native approach ensures all stakeholders across the renewable energy chain have access to the information they need to make informed decisions
Starting point is 00:00:40 about their energy usage and production. As Evergen's infrastructure scaled, it sought a time series and real-time analytics platform that could scale with its needs. Here's how Evergen discovered Tiger Data, creators of TimescaleDB, and how Tiger Data transformed its operations, based on an interview with Evergen's lead software engineer Jose Luis Ordiales Coscia. Why Evergen needed a time series database. Renewable energy optimization, says Jose, is a big part of Evergen's business, and time series data is essential for that. It gets used for reporting and mobile apps that show energy usage over time. Evergen had originally resorted to MongoDB Atlas for time series because it had been their
Starting point is 00:01:21 default database for regular non-time series data. Yet Evergen's time series setup using MongoDB turned out to be cost-prohibitive and technically restrictive. Our team did create their own schema on top of MongoDB by storing data in buckets, where each bucket is one day. So they tried to replicate what TimescaleDB does behind the scenes, in MongoDB, which sort of worked, but it also didn't work. The breaking point for MongoDB was the number of devices onboarded into the system and the minute-frequency data they generated. We saw that as the data grew, MongoDB became heavier to use because we were fetching more data and storing more data at the
Starting point is 00:02:00 same time, says Jose. Technical complexity and scalability constraints. The technical challenge began with raw data storage limits in MongoDB. Evergen has hundreds of integrations to ensure compatibility across device manufacturers and, at regular intervals, pulls data from each of those integrations and sends it to a Kafka topic where it publishes all the raw samples. Kafka Streams, a streaming library for Kafka, read those raw samples and did pre-aggregations in memory for five-minute and 30-minute data. Then all that data was stored in MongoDB. We did pre-aggregations in memory, says Jose, because it was too expensive for us to store all the raw observations in MongoDB. MongoDB doesn't really support aggregations on the fly, as TimescaleDB does with continuous
Starting point is 00:02:47 aggregates. If we wanted to have both the raw data and the aggregations, we had to do all of that manually for each one. Due to the high data volume, their Kafka Streams processing service was huge. Evergen ran some 30 instances of that service, which accounted for a large percentage of their usage in Kubernetes. Doing pre-aggregations in memory had even more shortcomings. They had to manually backfill data every day, which was painful. Late-arriving data, more than 15 minutes late for the five-minute aggregation and one hour late for the 30-minute aggregation, got lost because they didn't have the pre-aggregated data in memory anymore. Not having the raw data created a lack of transparency for audit trails and debugging. MongoDB performance for time series,
Starting point is 00:03:30 according to Jose, was subpar. Everyone sort of knew that it wasn't ideal, but it was one of those things that you think, at some point we'll fix it, but no one actually did. When we hit scaling issues with MongoDB, this became more important, and we started looking at alternatives. That's when they started testing databases designed to handle time series data at scale. The database evaluation process. Evergen looked briefly at InfluxDB, but since it was dropping support for the Australia region at the time, that was a no-go from the start. Because all their infrastructure is in AWS and they use other AWS services, Evergen first evaluated Amazon Timestream. Yet it turned out to be very limited when they tried using it, according to Jose,
Starting point is 00:04:15 presenting several issues at the time. Lacking performance: even fetching data for a day or a week was in the order of two to five seconds, with no options for query or performance tuning. Inability to run a Timestream database locally with Docker, while running Postgres and TimescaleDB locally was easy; running local tests with Timestream required connecting to AWS to create an actual database, then removing it at the end. Unusable for renewable energy forecasting, since Timestream allowed storing data only up to 15 minutes in the future.
Starting point is 00:04:47 Evergen then tried using Timestream for historical time series data and RDS for forecasted time series data. They had jobs and logic to move data from RDS into Timestream over time, yet data related to devices and sites was still in MongoDB. We had three databases with different types of data, and every time we needed to join those three sources, it was just painful. That was one of the key things we were looking at when we started this evaluation process: to be able to have one single database for all our data, time series and non-time series. That was a big selling point for us when it came to TimescaleDB, because it's just Postgres underneath. You can use it as a regular relational database. In the end
Starting point is 00:05:28 you have all these cool features for time series data, notes Jose. Evergen also tried the time series support in MongoDB. They have this newish type of collection in MongoDB that handles time series data. But they were missing a bunch of features that Tiger Data has, like continuous aggregates, retention policies, and compression. The performance also was not as good as what we get with Tiger Data. Discovering and testing Tiger Data. Jose initially found out about Tiger Data through a friend who worked there as a back-end engineer. Upon learning about the product, Jose had applied to work at Tiger Data when he was looking to switch roles from a previous company. To prepare for applying, he deep-dove into TimescaleDB features. Then, when he joined Evergen, he became a big
Starting point is 00:06:13 advocate for using TimescaleDB as his team began considering it. He already knew its features and that it was built on battle-tested, boring old technology, which is great for a database. You don't want any surprises there. While evaluating TimescaleDB, Evergen leveraged Tiger Data resources. The official documentation was very comprehensive and easy to understand and follow. The Tiger Data blog also has some really nice discussions about trade-offs of different approaches. That was particularly helpful during the POC. The community Slack channel has also been great. Evergen's proof of concept involved setting up dual writes and dual reads by running MongoDB and
Starting point is 00:06:53 TimescaleDB in parallel. This enabled them to test TimescaleDB without disrupting operations or impacting customers. Once they had a few months of data stored in TimescaleDB, they made the switch. From a scalability, operational, and developer experience perspective, TimescaleDB checked all the boxes, delivering ingestion, query performance, flexibility, and ease of use. Because we have hundreds of thousands of devices, and potentially looking at millions of devices in the future, we need to make sure the ingestion rate is smooth. So far it's been amazing with Tiger Data, says Jose Luis Ordiales Coscia, lead software engineer at Evergen.
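The continuous aggregates Jose mentions, which replaced Evergen's in-memory Kafka Streams pre-aggregations, look roughly like this in TimescaleDB. This is a generic sketch: the table and column names are hypothetical, not Evergen's actual schema.

```sql
-- Hypothetical raw-samples table, turned into a time-partitioned hypertable.
CREATE TABLE device_readings (
    time      TIMESTAMPTZ NOT NULL,
    device_id BIGINT      NOT NULL,
    power_w   DOUBLE PRECISION
);
SELECT create_hypertable('device_readings', 'time');

-- A five-minute rollup the database maintains itself: no separate
-- stream-processing service, and the raw rows are kept.
CREATE MATERIALIZED VIEW device_readings_5m
WITH (timescaledb.continuous) AS
SELECT time_bucket('5 minutes', time) AS bucket,
       device_id,
       avg(power_w) AS avg_power_w
FROM device_readings
GROUP BY bucket, device_id;

-- Refresh every five minutes, revisiting the last day so
-- late-arriving data still lands in the right bucket.
SELECT add_continuous_aggregate_policy('device_readings_5m',
    start_offset      => INTERVAL '1 day',
    end_offset        => INTERVAL '5 minutes',
    schedule_interval => INTERVAL '5 minutes');

-- A brand-new aggregate can be backfilled over all history in one call.
CALL refresh_continuous_aggregate('device_readings_5m', NULL, NULL);
```

A matching 30-minute view would be defined the same way over the same raw table.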
Starting point is 00:07:30 Evergen also appreciated TimescaleDB's query performance, because that data powers real-time customer-facing dashboards. You don't want your users to have to wait five to ten seconds just to see those graphs. Developer tooling availability was a deciding factor as well: because Postgres is such an established player in the market, there are thousands of libraries and tools to work with. So was familiarity with how to query the data. Everyone knows
Starting point is 00:08:12 SQL at some level, even non-technical people. We can give someone in the data science team or customer support team access to TimescaleDB, and they'll figure out how to query the data, which wasn't the case with MongoDB, says Jose. Evergen's security was also a consideration, and Tiger Data's security features met Evergen's requirements. How Evergen uses Tiger Data. Replacing MongoDB with TimescaleDB for time series data and
Starting point is 00:08:57 setting up a telemetry service achieved data centralization that wasn't previously possible. We try to encapsulate and isolate access to the one service that is handling time series data. If you are doing transformations with that data, for example, converting power to energy values, you want to centralize that in a single place so that you don't have every client doing it slightly differently. A goal of this migration was to consolidate all reads and writes to a single place, the telemetry service. It handles time series and relational data, and has near-exclusive access to the Tiger Cloud (managed TimescaleDB) instance at Evergen. One exception is the access they provide to their data science team for exploratory querying of the data. This is where having a
Starting point is 00:09:41 read replica of the main database, just for that team, helps, as it gives access without impacting performance. Evergen also heavily uses Tiger Data's integrated PopSQL IDE, says Jose. It's a great tool to explore databases and share queries. We didn't have that with MongoDB. We had to log into the MongoDB Atlas page every time and manually write down our queries. How Tiger Cloud benefits end customers. With MongoDB, Evergen could only store three months of data due to cost constraints. Now with Tiger Data, shares Jose, it's up to us to define those retention policies.
Starting point is 00:10:18 Right now we have that set at two years, and we have compression and tiered storage. So it's cheaper to store more data. That's definitely a selling point for our product team, enabling a home energy report for the past year versus one for only the last three months. Query performance was another big win. Before, you had to wait a few seconds when accessing your web app or mobile app to see those graphs. Now it's just under 500 milliseconds, which is really, really good. The data itself is critical for the organization because everything we do has to deal with time series data.
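The retention and compression setup Jose describes comes down to a few SQL calls in TimescaleDB. Again a sketch, assuming the hypothetical hypertable name `device_readings`:

```sql
-- Keep raw data for two years, then drop old chunks automatically.
SELECT add_retention_policy('device_readings', INTERVAL '2 years');

-- Compress chunks once they are a week old; segmenting by device
-- keeps per-device queries fast on compressed data.
ALTER TABLE device_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);
SELECT add_compression_policy('device_readings', INTERVAL '7 days');
```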
Starting point is 00:10:51 Customers definitely need to have that information available. The time series data now handled by Tiger Data is also critical for Evergen's energy optimization service. That service includes behind-the-meter optimization, machine learning optimization that delivers advanced, individualized electricity cost minimization, and front-of-meter optimization, which enables dispatchable assets to be monetized. What adopting Tiger Cloud meant for Evergen. For Evergen, replacing MongoDB with Tiger Cloud cut Kubernetes cluster resource usage by more than 50%. Cost savings, efficient compression, tiered storage, and constant access to historical data meant newfound technical and business agility.
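The tiered storage mentioned above is likewise policy-driven. A sketch, assuming the same hypothetical hypertable; note that tiering to object storage is a Tiger Cloud feature rather than part of open-source TimescaleDB:

```sql
-- Move chunks older than six months to low-cost object storage (S3).
-- Tiered data stays queryable, just with higher access latency.
SELECT add_tiering_policy('device_readings', INTERVAL '6 months');
```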
Starting point is 00:11:33 Having that freedom to decide how much data to keep, what's useful, and what's not, says Jose, has been a huge win, because before there was a hard limit on how much data we could store. Now we have that flexibility to say: this old data, we want to keep it in S3. It will be slower to access, but that's fine. And we want to keep this much data in high-performance storage. Definitely that has been a huge win for the organization, because Tiger Cloud is essentially Postgres enhanced. Tiger Cloud also simplified Evergen's stack. Simplifying our architecture, being able to replace all these different specific databases, that's a huge thing for the overall complexity of the architecture, which makes extending this in the future easier as well. Accessing all the raw data means they can reference it for
Starting point is 00:12:18 debugging and trace it to particular devices, which is much easier to do with raw data than with pre-aggregated data. With unconstrained raw data storage, Evergen also gained the benefit of real-time analytics insights. The ability to store all the raw data, explains Jose, means we can create new aggregations that we haven't even considered, on the fly. We couldn't do that before. With Tiger Data, if tomorrow we decide we need new data derived from the raw data, we can just create a new aggregate, run the whole backfill process, and that's it, we have it. That's huge for flexibility, where we don't need to predict what we'll need a month or a year from now. Built on familiar Postgres, Tiger Cloud had a positive impact on team onboarding. Because everyone knows SQL
Starting point is 00:13:03 at some level, and everyone has worked with SQL at some point in their careers, it was straightforward for newly hired engineers to just jump into the storage code and figure out what was happening. Future plans using Tiger Data. Choosing Tiger Data helped Evergen future-proof their architecture as they scaled. It is true, you can use Postgres for everything these days. There's an extension for absolutely everything. That was another big factor for the decision, thinking: what if we need to store this type of data in the future? Oh yeah, there's this extension already available. Having that possibility, I think, was a big thing for the company, says Jose. With Tiger Data, Evergen built a data foundation for scalability to support their plans to reach new markets and increase
Starting point is 00:13:47 the number of devices they have by an order of magnitude. Evergen is also planning to use a feature Tiger Data is working on that would make the data Evergen has in Tiger Cloud available to third-party teams that might be using different tooling. Jose's advice to engineers considering the switch: when creating abstractions in your code base, keep everything related to one particular technology isolated from the rest of the code, as that made it a lot easier for us to run our dual reads and dual writes experiment, because there was a single place in the codebase that we had to go and change for that to happen. I know this is one of those things that you think you'll never need to do, like switch databases, but sometimes it happens. Thank you for listening to this Hackernoon
Starting point is 00:14:27 story, read by artificial intelligence. Visit hackernoon.com to read, write, learn and publish.
