Disseminate: The Computer Science Research Podcast - Chuzhe Tang | Ad Hoc Transactions in Web Applications: The Good, the Bad, and the Ugly | #8

Starting point is 00:00:00 Hello and welcome to Disseminate, the podcast bringing you the latest computer science research. I'm your host, Jack Wardby. I'm delighted to say I'm joined today by Chuja Tang, who will be talking about his ACN SIGMOD paper, Ad-Hoc Transactions in Web Applications, the Good, the Bad and the Ugly. Chuja is a PhD student at the Shanghai Jiao Tong University. He is interested in database systems and distributed systems, and his research focuses primarily on concurrency control. Zhu Jie, thank you for joining us on the show. Thank you for having me. Can you start off by introducing your work and telling the listeners

Starting point is 00:01:01 what is concurrency control and why is it important for web applications? So we have web applications and we know that web server will typically and most of the time will serve concurrent requests and those requests, when handling those requests, the application will need to access data resource, most likely it is the data in the database systems concurrently. So in order to ensure that this application is serving the clients the right data and also maintaining the consistency among the data, they need to control how the control is concurrent access to the database system. So the mechanism used by the application is called concurrency control.

Starting point is 00:01:52 And most of the time when we talk about concurrency control in the database system community, we're actually talking about the mechanism for coordinating the concurrent access made by database transactions. So speaking of database transactions, it is actually a concept or an abstraction that encapsulates multiple requests to different data items in the database systems. When the client or the application could write a handler for web requests as a database transaction. So the access to the database system will be treated as a single unit of work to the database system. And the concurrent control mechanism by the database system can coordinate this request as if those requests are executed

Starting point is 00:02:41 sequentially. While internally they are executed concurrently in parallel, so that the reasoning of this transaction and, of course, the web application handlers will be much simpler to the developers, while the performance will be improved by the parallelism. My next question is, how do applications today use concurrency control? Do they use these classical database transactions or do they use other approaches?

Starting point is 00:03:11 Well, before I started this project and this paper, I think that my assumption is that this application would just to wrap its business logic into a database transaction and just use them to coordinate the concurrent access. But it turns out that although the applications are using database transaction, but they are also using some add-on construct to coordinate the concurrent access. That's what we call the ad hoc transactions by the title of the paper. And those ad hoc transactions are actually, you can think of ad hoc transactions that are the concurrency control mechanism written and designed and developed by the application developers. And their logic, the code,

Starting point is 00:04:00 sits among the application business logic. So that's why we call them ad hoc. Awesome. So how are these ad hoc transactions used in practice? And what was the primary focus of your research and of the paper? The first question is actually the main focus of the paper. We started the paper because we noticed that there are ad-hoc transactions. So we started to survey these applications to know how the ad-hoc transactions are used in the real applications. For example, how are they constructed? Are they correct? How are their performance compared to database transactions?

Starting point is 00:04:47 So in terms of how are they used? Most of the time they just look like a simply simplified form of database transaction mechanism, but implemented in the application code. Developers could just explicitly write some locking operations to coordinate this access, which is very similar to the classic two-phase locking protocol. Sure. So you mentioned that you studied various different open source applications to investigate

Starting point is 00:05:18 the usage of ad hoc transactions. Which open source applications did you look at? So we actually looked at eight different applications some some of them are maybe familiar with with the listeners for example the discourse forum we select select them according to the github stars so they are the most stars applications in the in different categories including for for example forum and social network and we have we have seen many e-commerce applications actually it's quite surprising we found that uh we have a ruby application java application and python

Starting point is 00:05:56 application and in each category we have a uh e-commerce application that is the most stars one. It's quite interesting, I think. Yeah, definitely. So you covered a wide range of different languages and frameworks and object relational mapping and frameworks. Fantastic. So what did you find when you were studying these different web applications and what do these ad hoc transactions look like in the wild? Can you elaborate on how they differ from the traditional database transaction? Okay, so the first question is the main findings.

Starting point is 00:06:37 So the first one, maybe the most uninteresting one, is that they are commonly used. So we have eight applications and in total we have found 91 adult transactions among these applications. In order to show that these applications use them for some critical functionality, we actually categorize those APIs into critical and non-critical. It's a subjective criterion. But anyway, we find that many of them, like 70-something of these ATO transactions are using APIs that we consider critical. For example, posting API, the upload API, and notification API of some content-centric

Starting point is 00:07:22 applications like forum and e-commerce. And of course, the checkout and cart API for those e-commerce applications. The other findings is that the construction of those adult transactions are, comparing with database transactions, they are quite more flexible. In most of the cases, I think it's simplified. And then we find that those adult transactions are pretty error-prone. Like almost half of them, more than half of them, are actually incorrect. And then we have to find some correcting issues that you can trigger by just deliberately, for example, to add some sleep invocation in the application code, and you just have to browser tab to access this application and you trigger some error.

Starting point is 00:08:10 Finally, we also try to understand the performance of those ad-hoc transactions. So we selected some API, actually four APIs from different applications and modify them to use database transactions instead of the adult transaction and evaluate them in two different settings. One is that we have contention and the other is there are no contention. The point of this evaluation is that to test whether adult transaction might be beneficial in performance to the application. So of course, so that for APIs show that we can have better performance in contentious workload. That doesn't mean that we can always improve the performance, but it does prove that it potentially can be can improve the performance. So if I'm understanding this correctly, when you did this performance evaluation and you

Starting point is 00:09:02 replaced the ad hoc transactions with the traditional database transactions and evaluated the performance difference between the two, are you comparing a correct transaction against a potentially incorrect ad hoc transaction? And is the performance gain not just the benefit of seeing how it might be potentially incorrect at some point? Of course, we have.. Unless we fix all the bugs that we are aware of when comparing the performance. I think definitely if we look deeper, there might be

Starting point is 00:09:33 some more bugs in the adult transaction version, but we have fixed all the ones that we know that will trigger bugs. Actually, for example, when we are trying to build a database transaction version, we remove those, for example, the locking call. And instead, we put a transaction begin through the ORM to wrap the whole business procedure. And in that case, when we are looking

Starting point is 00:09:58 at why the database transaction would abort in this case, but not in the adult transaction, we find that, oh, there is a new bug. Oh, wow. So actually, this will be a very interesting debug technique that potentially could be used in other places. But I can imagine the open source developers were extremely happy when you, or maybe unhappy when you found all of these bugs. And I can imagine they were very grateful for that. So we spoke about how some of these ad hoc transactions are incorrect.

Starting point is 00:10:28 Can we talk about what the root causes of these issues were? Like what were the common mistakes people were making when using ad hoc transactions? Yeah, sure. So if I were to summarize the root cause, I would say that it is caused by developer either not knowing the principles or not following the principles of classical database concurrency control

Starting point is 00:10:52 mechanisms. For example, many of the listeners probably know about two-phase knocking. It actually requires the mechanism to have a growing phase of acquiring locks and a shrinking phase of releasing lock and a shrinking phase you could no longer acquire more locks and of course all the access to the data should happen within the critical section of the lock so that is basically the principles of two-phase knocking and when when developers are trying to, I think, they are trying to mimic the behavior of two-faced knocking, they are not necessarily following these principles. We might, for example, first read the data, and because they need to typically in web applications, you access data by giving the ORM an ID, and the ORM will fetch the whole row into the application.

Starting point is 00:11:47 And then the application developer, they might decide that, oh, well, I can lock the row with some fields by a lock that is identifiable by the content of the field. And in this case, read access actually happened outside of the critical section and it would cause the bug and similarly the this issue could happen with those optimistic ad hoc transactions nice so we we said earlier on that when you did the performance evaluation ad hoc transactions under a high contention workload outperformed the classical database transactions. What was the performance gain? Can we have some numbers to illustrate the gains? We said that earlier that there are four APIs that we tested. So I think the geometric mean of the improvement is like 1.3 times.

Starting point is 00:12:47 And so in terms of why there are performance gains, I think it is because adult transaction, when we're in the corrected version, could actually sometimes, for example, they could perform some final green concurrency other than the database transaction. We know that the database systems typically have, for example, MySQL, Postgres, they will lock the data at the row level.

Starting point is 00:13:19 They can use some row ID or just put a log alongside the data. And inside of transactions, we find that, for example, they actually log the data at the column level. So in one case, in a simplified version, let's say, a row have two columns, C1 and C2. And when the developer just decide that, okay, we have two, we can identify, we have a log that is identified by a string called C1 concatenated by the row ID, and another log is identified by a string C2 and concatenated by also the row ID. So when developer writing Apple transaction access accessing different columns of

Starting point is 00:14:06 the same row, they could actually use those different set of logs. So those access actually don't block each other. And this could potentially improve the parallelism of the whole application. That is one of the reasons, I think. I think the other cases, they are actually, ATO transactions, they are actually, I think, when they are performed in a causal granularity,

Starting point is 00:14:36 they can, for example, database could just lock every row that is being accessed, but the ATO transaction could just use fewer logs to provide a mutual exclusion between different ATO transactions because they know the access pattern. Some rows are definitely accessed before the other rows, so we can just log on the first row using this ID. So the same mutual exclusion is achieved, but fewer logs are required.

Starting point is 00:15:06 And in that case, that database system just aborts because the data logs are acquired in some random order, or there might be some read-multiply-write operation, and some later write, when database transactions try to upgrade the log into a write log, and those concurrent transactions

Starting point is 00:15:40 could just block each other and deadlock, and that would degrade the performance. Has there been any other studies on the use of transactions out in the wild or I mean I believe this is the first survey of ad hoc transactions but are there other studies out there on transactions and how do they find if they exist compared to yours? So, yeah, actually that I think two papers that are quite relevant to this study. The first one is called Federal Concurrency Control. It is published in also in SigMod, but I think it's 2015. Of course, you can tell by the name that Federal Concurrency Control adult transaction,

Starting point is 00:16:24 they're actually pretty much looking at the same, on a higher level, they're looking at the same thing that how application in their own ad hoc or federal way to coordinate the concurrent access. And the federal concurrency control paper actually is looking more specifically into the Ruby on Rails framework and its mechanism in their validation mechanism in ensuring the application integrity. So that paper, that is actually a very interesting paper. I recommend all the listeners, if you are interested, please go read that paper.

Starting point is 00:16:59 The validation, it is actually different, I think, kind of different philosophy in coordinating the concurrent access. In our paper, the IADOT transactions that we studied, they are very similar to database transactions. Like I said, they are the simplified version of the database concurrency control mechanism implemented in the concurrent application level. So that means the application developers are actually explicitly coordinating by, for example, using knocking or using version-based validation to coordinate those access. And in the concurrency control paper, the authors, they are actually, the mechanism is called the validation mechanism. It's not the OCC validation, but actually they are validating on the invariants over the data. That mechanism is like, for example, you specify that the data, there should be no duplicate value on this column.

Starting point is 00:17:53 So the ORM does nothing when into your invariant. Instead, each time you update something, the mechanism would try to ensure that if this value, for example, pre-existing, so if that is the case, I won't update to the new value. So that will cause the duplicate. So that's kind of a different method in coordinating concurrent access in the application level. But I think the core finding are quite at a high level finding i think is that of course they're common and that usually prone to error they are not always correct and we should do something about it so yeah it's very complimentary to your work then awesome so what does all this mean in practice then so why don't people just use database transactions what puts people off from using them?

Starting point is 00:18:47 For the majority case, the developer did not use adult transactions because they want better performance. Because actually, when we are trying to evaluate this application, we actually found many performance issues. We couldn't scale this application to a higher number of concurrent threads. So there are a lot of performance improvements that could come up from other places of the application. So why do they use this? We don't have a very concrete answer, but I have some guesses. Sometimes developers or the application just couldn't use database transaction because this data is spread across different systems.

Starting point is 00:19:22 Sometimes the data is even, it's not just the data, but for example, the more like the abstract resource could be allocated in some external service. You just couldn't use a database transaction to coordinate all this access. We have found some cases that use Redis to store some other data.

Starting point is 00:19:41 So in that case, you have to write a log at the application level. So that's it, it will be the first reason. And the second one is that I would think using database transaction, especially in web applications, might really just degrade the performance and making the application very hard to debug the performance issue.

Starting point is 00:20:01 This is kind of against the claim that I just said that it didn't win for the performance. But actually, the case is that when a developer writes the application, you know, they use ORM, so they don't write directly the SQL. So all the SQLs are actually generated by the ORM frameworks. So we find that the ORM framework just generates too many redundant and inefficient queries. And if you put all those queries into a database transaction, that would just make the database just acquire tons of logs and deadlocks and then degrade the performance. And actually, since you are using anURM, you couldn't just possibly tell whether or how those queries are generated

Starting point is 00:20:48 and where the deadlock causing one are located. So that would be another reason, I think. And for a very minority cases, I think developers are just not aware of that, aware of the existence of database transaction that could just help solve their problem at hand. So where do you see this research, these findings, having the biggest impact? Who do you think this result or these findings are most relevant for?

Starting point is 00:21:17 Who should be listening the closest here? Several parties that may be interested in this work. First of all, I think those people are researching in database concurrency control. I read a slide a few months ago. I think it's from some people from the CMU. It said that database concurrency control research is dead because no one cares about those research. I think he might have some solid point for them. He had pointed out that many private research are researching or optimizing the stop procedures, but the applications are

Starting point is 00:21:56 not using stop procedures. And our finding, of course, we find that that statement is true. But we think of, at least I think that since the application are using adult transactions to coordinate the access to the database transaction, I think there is a lot that we can do. For example, like how can we get those applications to use database transactions? Should we, for example, do some to improve our protocol so that the problem that we have mentioned could be solved, for example, the performance problem and maybe some software engineering research that we could help developers to apply database transactions and diagnose those, for example, data errors. I'm not sure about

Starting point is 00:22:45 software engineering research because I think we have found some many bugs and we have shown some I think it's clear patterns. I have read some paper that just

Starting point is 00:23:02 introduced some bug detectors according to some pattern. Since we have some paper that just introduced some bug detectors according to some pattern. So since we have some pattern here, maybe some researchers could follow on these patterns to develop some new, I don't know, bug detector for web applications. One of those patterns is that we just mentioned, you acquire a log after you read the database rules. So that would definitely be, I think, easy to detect. I really think developers should really consider more about using database transactions so you don't have performance problems. And even though we have measured the performance difference of adult transaction and database transaction,

Starting point is 00:23:49 I have to admit that the scenario, the contentious scenario where adult transaction performs well is actually very kind of extreme, I think, when compared to real-world cases, for example. For example, we have some testing the e-commerce application. To simulate contention, we just have some tens of hundreds of clients buying the same item.

Starting point is 00:24:15 I don't know if, at the same time, I don't really think that is the normal case for the application. So in the normal case, I think we don't have that much contention. And using database transaction does not perform very differently application so in the normal case we i think we don't have that much contention and using database transaction just do not perform very differently from using adult transactions so at least developers shouldn't worry about performance that much i think okay so yeah the real world's not that contentious is what you're saying that's only the extremes where you see these benefits okay nice so leading on from what you were saying there,

Starting point is 00:24:45 what do you have planned for future research? Where do you go next with this? We have one discussion section in the paper that I'm not actually quite satisfied with. I just think that I'm not sure. In general, I think, of course, we can build systems to, for example, either provide new abstraction for using a transaction or using some more adult transaction. But I don't know, better transaction, but not database transaction in the application.

Starting point is 00:25:18 We might be able to build some new abstractions, but that would be a very difficult research. Also, I think I'm not planning to do the bug detecting stuff because that's not what my lab is doing. And I'm actually very interested about the abstraction that could coordinate access to different systems, data access to data in different systems. I think it is actually quite the norm today because you have relational data, and many applications use some semi-relational data, for example, MongoDB, and they have some cache in Redis. And they also have some object storage, for example, MongoDB, and they have some cache in Redis. And they also have some object storage, for example, for video,

Starting point is 00:26:07 for images in object store. And all these storage systems provide different mechanisms for coding the concurrent access, and they don't necessarily provide serializable data transaction, and sometimes they don't have transaction at all. And since the application are storing data in different systems, there must be some way to, I think, better coordinate those concurrent access so the developer don't have to write all the log and unlock operation and of course also eliminated those bugs so my next question is

Starting point is 00:26:48 what was the most i think i kind of i think i know the answer to this question already but i'm going to ask you and see if i'm correct so what was the most interesting but maybe unexpected lesson that you learned while working on ad hoc transactions? Unexpected ones are those, for example, that you can have the rewriter application code to use database transaction, and then you discover new bugs. That definitely increase our bug reports. But I don't really think that is,

Starting point is 00:27:24 I don't consider that the most interesting one because today, after the project and after the paper is finished, I can't really it's hard for me to tell what is the

Starting point is 00:27:40 most interesting part. Looking back at the applications that were super poorly constructed, even those most popular ones, everyone used them, for example. I'm not saying that this course is poorly constructed, but it definitely could be improved. And so that is, although I didn't have any expectation of how the code quality of those applications are. But I can tell that some of those applications that is, for example, some e-commerce application, their code is really not that clean, I think.

Starting point is 00:28:16 Right, yeah, yeah. Okay. But that doesn't count for interesting, though. No, true. But that's kind of what I was expecting. I was expecting you to say something along those lines. Yeah, I was surprised by how prevalent these ad hoc transactions are and how often in these big major applications, so many of them are actually

Starting point is 00:28:37 out of correctness violations, which is always a surprise when you hear that sort of thing. Maybe in the initial month of the project, I'm really surprised by using the add-on transaction. But in the end, it's kind of reasonable because after we find that they're using different storage systems, and for example, the transaction should spend multiple requests, and sometimes the inefficiency of the ORM, it's kind of like it became non-surprising to me now

Starting point is 00:29:09 that they're using AdHawk transactions. Yeah. So I've got two more questions for you. This one, the penultimate one is, what attracted you to research, database, concurrency control in the first place? What attracted you to this area? In fact, I didn't choose to actually choose to decide to study database systems

Starting point is 00:29:34 because that is what my advisor is good at researching. But in terms of concurrency control, I started my first project actually building a concurrent index structure. concurrency control is actually I started my first project with actually building a concurrent index structure and in that process I learned a lot about concurrent programming and proving your concurrent

Starting point is 00:29:56 algorithm is mineralizable. And in that process I really find the I think the reasoning with concurrency actually fascinating to me. And later I read a paper, it's not a paper, it's an article published by I think Leslie Lamport on the CACM. I think it's called Brief History of Concurrency Control or something like that. And in that case, he really explained how he modeled and think of concurrency in terms of some distributed process

Starting point is 00:30:33 and communication and the happen-before relation. And after that, I found, okay, I might continue doing concurrency control research for my PhD. What is the one key thing you want listeners to take away from your research? If the listener goes away and takes one key point away, what would that be? For researchers, we really need to look deep into the actual application that our research is targeting on, instead of those pseudo-benchmark TPCC, YCSB, and trying to improve their performance. We need to really look at the application and how we can improve the application.

Starting point is 00:31:21 And for application developers, conduct consider using database transactions that would be that would save you a lot of trouble and and that and i think we'll end it there thank you so much for for listening and thank you so much juja for coming on the show and it's been a fascinating talk and if you're interested in knowing more about Jusha's work, the links to his paper and all of the relevant materials we've discussed in the show will be placed in the show notes thank you very much for listening and we'll

Starting point is 00:31:54 see you next time Thank you.

Disseminate: The Computer Science Research Podcast - Chuzhe Tang | Ad Hoc Transactions in Web Applications: The Good, the Bad, and the Ugly | #8

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.