CppCast - AI Infrastructure

Episode Date: April 28, 2023

Ashot Vardanian joins Timur and Phil. After some conference updates, news about the ISO C++ Developer Survey, Sonar integration in Compiler Explorer and some posts on modules and performance, we chat ...with Ashot Vardanian about AI and the infrastructure that underpins it. Ashot talks to us about some of the libraries and tools he and his startup have been developing over the past few years with some impressive numbers behind them. News Core C++ (Israel) - 5th-8th June Meeting C++ 2023 - 12th-14th November C++ Now 2023 - 7th-12th May C++ on Sea 2023 schedule announced - 27th-30th June CppNorth - 17th-19th July 2023 ISO C++ Developer Survey "Sonar ❤️ Compiler Explorer: Write clean C++ code inside your browser" "C++20 modules in clang" "Horrible Code, Clean Performance" Links Unum (Ash's AI research company) Unum repos on GitHub Cpp Armenia User Group

Transcript
Discussion (0)
Starting point is 00:00:00 Episode 359 of CppCast with guest Ashot Vardanyan recorded 24th of April 2023. This episode is sponsored by JetBrains, smart IDEs to help with C++, and Sonar, the home of clean code. In this episode, we talk about some new blog posts, the annual developer survey, and Compiler Explorer support in Sonar. Then, we are joined by Ashot Vardanyan. Ashot talks to us about AI and improving the infrastructure that it runs on. Welcome to episode 359 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Timo Dummler, joined by my co-host, Phil Nash. Phil, how are you doing today?
Starting point is 00:01:13 I'm all right, Timo. Just back from the ACCU conference last week, which I know you were at too, because I saw you there. So how about you? Back at home in Finland now? Yeah, exactly. I arrived last night. I'm still quite tired from the whole thing, but kind of recovering. Yeah, it was an awesome conference. Yeah, it seems to have been back to sort of almost normal levels. Last year was definitely way down on attendance.
Starting point is 00:01:36 So it's interesting to see conferences getting more back to normal. So quite hopeful for C++ on C this year. All right. At the top of every episode, I'd like to read a piece of feedback. This time, we got an email from Peter, who was commenting about episode 356 with Andreas Weiss about safety-critical C++. Andreas had said that V model implies waterfall. Peter writes, actually, it's not a very strong implies. Waterfall is a way to organize activities that implies a strict
Starting point is 00:02:05 ordering and stage gates, etc. The V model talks about appropriate levels of system decomposition and having tests that correspond to the elaborated requirements at each level. The two concepts are orthogonal to each other, although many organizations using the V model do follow waterfall. As a counter example, there is a standard called AAMI-TIR45, called Guidance on the Use of Agile Practices in the Development of Medical Device Software, that describes how to practice agile development in a way that the FDA will accept. This features the V model, but no waterfall. Well, thank you very much, Peter, for the clarification. It's much appreciated, and we will put the link to AAMI-TIR45 in the show notes.
Starting point is 00:02:46 I'm sure everybody's familiar with that document already, but yeah, we'll put it in the show notes. Talking of feedback, we actually got quite a bit of feedback, still more feedback about the ongoing RSS feed saga. It does seem that it wasn't quite as fixed
Starting point is 00:03:01 as I thought it was last time. And the weird thing is that different people were seeing different behaviors. So some weren't seeing the last episode with Matthew Benson. Some weren't seeing the episode before that with Herb Satter. Some were seeing that one twice, and some weren't seeing either of them. So after a bit of digging, I discovered that both of those episodes were missing a GUID, which is actually not strictly
Starting point is 00:03:25 required by the rss spec but many clients do rely on it and you can see why that may have you know different uh different behaviors and different clients so hopefully that explains it all now i've added those guides back in as well as an extra check just to to make sure that doesn't happen again hopefully and i've checked with all of the clients that i know about and they all now seem to be complete and up to date everywhere so again sorry about that but um if you do still see anything not quite right with the feed uh do please continue to let us know we want to make sure that we get it right that also means that some people may now have one or two older episodes added back in their feed so um if you don't recognize the episode with herb sutter or matthew benson or you think you may have missed an episode somewhere do have a look
Starting point is 00:04:11 back in your podcast player to see if it's back in your history unplayed you may have a bonus episode so maybe there's a there's a plus side to this as well but hopefully that's all sorted now and thank you phil for fixing all of this I don't really know how any of this works. Me neither, apparently. Really appreciate you sorting that out. We'd like to hear your thoughts about the show. You can always reach out to us on Twitter or Mastodon or email us at feedback at cppcast.com.
Starting point is 00:04:39 Joining us today is Ashot Vardanyan, also called Ash. Ash is the founder of Unum and the organizer of Armenia's C++ user group. His work lies in the intersection of theoretical computer science, high-performance computing, and systems design, including everything from GPU algorithms and SIMD assembly for x86 ARM to drivers and Linux kernel bypass for storage and networking I.O. Ash, welcome to the show. Hello, guys.
Starting point is 00:05:05 Happy to be here. Good to have you here. So your bio is really quite interesting, but I did actually found another bio from you with most of the same information on your GitHub page. It also says there that you're an artificial intelligence and computer science researcher. And it also says, and that really caught my attention, you have a background in astrophysics. Now I have a background in astrophysics too. So I'm very curious what your astrophysics background is all about. Well, if only I remembered much. So for some time, I was really curious about theoretical physics, but then I didn't feel like I'm smart enough to
Starting point is 00:05:40 do it for a lifetime. I didn't feel like I can contribute this much. For some time, almost in the free time, I was building up some simulation software. And back then, people were just using packages like Root and many others that were written at CERN for physics simulations and stuff. And I always kind of went the opposite direction. So I approached all of my researchers, advisors, whatever. I convinced them that instead of using an existing package for all kinds of simulations, I'll just rewrite everything for GPUs. And then when I came back to my advisors and showed them that I can run their simulation on a laptop faster than they do it on the cluster, they were all kind
Starting point is 00:06:24 of shocked. And then I kind of started combining all of this with some of my theoretical computer science research and decided it's time to leave the university and just focus on AI for the rest of my life. That is such a cool story. So I also did do some astrophysical simulations in my time. I remember there was lots of horrible Fortran code written by Soviet professors in the 70s. But I never had any brilliant ideas like that. But yeah, so that's quite fascinating what you're saying about the GPU stuff. So you will get more into your bio and your work in just a few minutes.
Starting point is 00:06:58 But we do have a couple of news articles to talk about. So feel free to comment on any of these, okay? Sure, sure. All right. So the first thing, upcoming conferences. to talk about. So feel free to comment on any of these. Okay. Sure. Sure. All right. So, so the first thing, upcoming conferences. So you received an email from Inbar Levy, who is one of the organizers of Core C++. And she writes, our conference Core C++ is approaching fast and we're really excited about it. I was wondering if you could mention it in your podcast. Well,
Starting point is 00:07:23 dear Inbar, yes, we can. Core C++ is taking place in Tel Aviv, Israel. It's actually in a new venue, although the old venue was also pretty awesome. So I'm curious what the new venue is like. From June 6th until 7th with workshops on June 5th and 8th. They have amazing speakers. The keynotes will be given by Bjarne Strohstrup and Daisy Holman. And tickets are available at corecpp.org. And I received another email from Meeting C++ that was not specifically for CPPcast.
Starting point is 00:07:53 It was kind of just a regular mailing that they sent out. But it was also very exciting, actually, because they have announced the dates for Meeting C++ 2023. So that's a big conference in Berlin. It's going to take place from the 12th to the 14th of a big conference in Berlin. It's going to take place from the 12th to the 14th of November this year in Berlin. And be aware, this is Sunday to Tuesday.
Starting point is 00:08:11 It's not Thursday to Saturday as it has always been in the past. It will be a hybrid conference with three tracks on site, one pre-recorded online track. And they have announced two keynotes already, one by Kevin Henney
Starting point is 00:08:22 and another by Lydia Pinscher. And the third keynote, the closing keynote, will be announced later. So that's good to hear that that conference is also coming back. Actually, both of them are coming back this year. Been to both of them. They're pretty awesome, both of them. I think, Phil, you've also been to...
Starting point is 00:08:39 Have you been to Core C++? I went to the first one, yes. Not been to one since, unfortunately. Yeah, I was at the one last year as well. It was also pretty awesome. So just for completeness sake, there are a few more upcoming C++ conferences in the next couple of months. I just want to briefly mention
Starting point is 00:08:53 them as well to kind of remind everyone that this is happening. So there's C++ Now in Aspen, Colorado, which is just a couple of weeks away at this point. It's from the 7th through the 12th of May. And it's capped at 140 participants, And they sent out a mail blast as well this week saying that they still have 20 slots left. So if you want to grab one of the last slots to go to C++ Now in beautiful Aspen, Colorado, buy your ticket now. And there's obviously CBP on C,
Starting point is 00:09:22 which is your conference, Phil. Do you want to talk about that one? Of course, yeah. So the full schedule, by the time this airs, that should be available. We'll be going live shortly after we've recorded this. And we'll also announce our third keynote speaker. So like, again, meetings, we hold one of them back until a bit later. So a little bit of news there as well. But the rest of the schedule will all be online by the time you hear this.
Starting point is 00:09:47 Right. And then there is the one in Madrid coming up actually this week. So by the time you hear it, the conference is already going to be happening. So it's probably too late to direct people to that one. But there's also the Italian TPLOS conference in Rome on the 10th of June. That's also not very far away. And finally, there's also CPP North in Toronto, Canada, coming up 17th to 19th of July. So that's another one I'm really looking forward to.
Starting point is 00:10:17 Okay, enough conferences. There's another thing. There is the 2023 Annual C++ Developer Survey, which is now out. It is an annual survey by the ISO C++ Standards Committee and the annual C++ Developer Survey, which is now out. It is an annual survey by the ISO C++ Standards Committee and the Standard C++ Foundation. And they would really appreciate your feedback to share your experiences as a C++ developer.
Starting point is 00:10:35 It only takes 10 minutes to complete. So please participate if you have 10 minutes to spare. You can do so on surveymonkey.com slash r slash isocpp-2023. This link will be in the show notes on tppcast.com. And a summary of the survey results will be posted publicly on isocpp.org.
Starting point is 00:10:55 This is one of the three big C++ community surveys that go out every year. This one, JetBrains do their own survey and Meeting C++ has that ongoing survey, sometimes add new questions as well. But between the three of
Starting point is 00:11:12 those, I know from having worked with our two tool vendors now that we do watch those closely to see what the trends are and who's using what. So please do fill those out because it helps the whole community to see what's going on. We have one news item from the tooling
Starting point is 00:11:28 world that you already mentioned, so it's a nice segue. Thank you, Phil. Actually, back to you because this is about the company where you work. So Sonar has announced that you can now run Sonar static analysis inside Compiler Explorer. Yeah, really excited about this because it really makes a
Starting point is 00:11:43 huge difference being able to just enable sonar analysis on some code you already have running in compiler explorer and you'll get a much more detailed set of warnings or rules that could really break down what might be wrong with your code even if it compiles you might still get some more insight to help you to clean it up and i've actually been working on some videos to go along with that. So if they're ready by the time this airs, I'll put some links to those in the show notes as well,
Starting point is 00:12:10 just to show you some use cases. Right. And so finally, there were two blog posts the last couple of weeks that caught my attention. The first one was by Viktor Zverevich, the guy who wrote the format and the FMT library. It's called C++20 Modules in Clang. So Clang 16 that we already discussed a little bit, I think,
Starting point is 00:12:33 a few episodes back, actually has pretty good support for C++20 modules, kind of out of the box. So Viktor actually went ahead and compiled his FMT library with modules. It requires a bit of manual work, but you can make it work. So this has been done before by Daniela Engert. She has a talk about that called Short Tool of C++ Modules that she did a while back. But yeah, so Victor now has repeated this exercise, I think, with Client 16. It's quite a lot simpler now to do. But surprisingly, Victor found that there wasn't actually an immeasurable speedup in compile times, which is kind of one of the things that Modules was promising to give us. So he was digging a bit deeper in that blog post and he traced the issue down to the fact that Clang is actually ignoring external templates.
Starting point is 00:13:15 So it's kind of recompiling the template's instantiations all over again. So this is kind of exactly the thing that Modules was supposed to to get rid of so yeah it's kind of interesting to see if if and when clang is gonna fix that or improve that or what the underlying issue is i'm not an expert there but definitely interesting to see that there is still some work there to do but it's kind of kind of works but kind of also doesn't quite give you all the benefits yet yeah that seemed to be a quality of implementation issue hopefully which means that we can move past that and we might still get the benefits of it down the line uh yeah i did like the fact that uh victor actually starts off the the post saying
Starting point is 00:13:58 the the three headlines c plus plus 20 features modules co coroutines, and the third one. So it's up to you to fill in the blank there. Well, I mean, if he means core language features, I guess he means concepts. You decide. I mean, it could also be ranges. I mean, I don't know.
Starting point is 00:14:20 Interesting. Very interesting. Okay, and one last interesting blog post that I want to mention on the show that caught my attention was called Horrible Code Clean Performance by Ivica Bogosavljevic. I hope I pronounced this name not too completely wrong. an homage to another blog post and video that came out a couple of months ago clean code horrible performance which is also super interesting and kind of controversial and we haven't really covered it in the show but like yeah look up that one as well that's super interesting but yeah this one is called horrible code clean performance and basically what he's doing there is that ivica is implementing a simple substring search algorithm so you kind of of have a bigger string, and then you have a smaller string,
Starting point is 00:15:06 and you search for the first occurrence of the smaller and the bigger string. And he's kind of implementing it naively first. So he has like a kind of a char pointer and a size basically for both of them. And then he's implementing like the naive loop that you would do. And then he does like an optimization where he does it in a much more ugly way, but then it kind of stores the first bit of the string that you're searching for in a really clever way.
Starting point is 00:15:30 So he ends up with this like much more convoluted code, but he finds that that actually runs 20% faster. So I thought that was really interesting. There was also quite an interesting Reddit discussion on that, about that too. So I don't actually see an interesting backwards and forwards in the comment with someone called uh timo dumla who said he wasn't able to reproduce evica's results on his own machine so looks like that's actually still ongoing so that may have
Starting point is 00:15:57 even changed by the time this airs but uh did you get anywhere with that timo yeah so the first thing i i thought you know when I was reading that blog post, I was like, that can't be. Compilers are more clever than this. So I kind of did the benchmark myself and it came out as there is no benefit. And then I told Ivica about this and he was like, oh yeah,
Starting point is 00:16:16 but like you need to make sure that like the function isn't inline. Then you need to make sure that the compiler doesn't actually see the string you're looking for because it doesn't see its size. Otherwise it can do like clever compile time optimizations. And I was like, oh yeah, obviously this is what's going on in my benchmark. So that's all wrong. So I need to redo it. But we did agree on the fact that if you do the same benchmark just with std string find, then it's actually faster than either of these
Starting point is 00:16:43 versions on either of those machines it's interesting that you're mentioning this uh std string find is a bit weird so i always feel like some of the libraries can have a few more specialized versions for different substring search so i was doing a few uh like measurements and benchmarks in the previous years. And substring search is one of my favorite problems, like tiny ones to tackle. So you can always take a smart algorithm and then try to optimize it in the CSC way and expect some performance improvements.
Starting point is 00:17:17 But the best thing that worked for me in substring search is very trivial heuristics. And especially combining them with some single instruction, multiple data intrinsics, you kind of get the biggest benefits. So if you take, let's say, substring search, in the case where you have the needle that is at least four characters long, and you essentially cast it to 32-bit unsigned integer, and then go through the haystack comparing at every byte step the following
Starting point is 00:17:46 four bytes casted to un32 to your un32 you kind of get like an improvement over std string find and over like a few other things which is a bit surprising because this is such an like obvious thing of course it doesn't work in the rare cases when the four characters are matching quite often. But let's say the fifth one is a different one. And then the coolest thing that worked out was actually taking the AVX registers, the AVX2 registers, and they can fit 256 bits. So that's how many bytes? That's 32 bytes. And within 32 bytes,
Starting point is 00:18:25 you will be able to do how many of such comparisons? Eight such comparisons. So what you can do, you can prefetch four AVX2 registers at one byte offsets. And this way, with a 35
Starting point is 00:18:42 byte step, you can actually compare 35 characters at a time, checking if any one of those offsets matches to your four byte thing. So it sounds a bit convoluted and this is exactly the point, I guess, of the article, like convoluted code, good performance versus the opposite. But the difference is staggering.
Starting point is 00:19:01 So when you take std string and you call find, most of the time you're getting 1.5 gigabytes per second worth of throughput can give or take well i guess me and you uh tim are like with astrophysics background anything within the 10x order of magnitude difference is like accurate enough but yeah exactly but what i was able to show on some of the conferences was that with this basic uh intrinsic thing you can actually get to 12 gigabytes per second per core so like it's exceptionally efficient it's much faster than the libc implementations and the std string and you can literally fit it in so many places so i was was doing benchmarks for both AVX instructions, AVX 5.12, and ARM Neon.
Starting point is 00:19:49 And back then, SVE, scalable vector extensions, were not available on ARM. So I couldn't do those. But even on ARM, the performance benefits were huge, and the energy efficiency was absurd compared to any code that even a super smart compiler like Intel's compiler can optimize. Huh. Yeah, that is such a cool trick compiler like Intel's compiler can optimize. Huh. Yeah, that is such a cool trick. That is very, very interesting.
Starting point is 00:20:10 Do you, let me just, because this is like my kind of topic, I love this kind of stuff. So I just need to follow up on this very briefly. Do you actually have to write like the SIMD instructions by hand or use a SIMD library? Or do you just write the algorithm in a way that lends itself to auto vectorization i don't know uh so actually part of this talk was specifically about measuring auto vectorization capabilities versus naive code versus handwritten intrinsics and there were people from intel's teams validating my numbers and checking if their compilers can actually like reproduce some of the vectorization that is quite easy to do by hand. And maybe since that point,
Starting point is 00:20:47 like five-ish years ago, six-ish years ago, I almost completely abandoned the idea of writing a library for SIMD instructions. Whenever I need top-tier performance, I just manually implement a few different versions. Generally, I don't go down to the level of assembly, but I almost always use the intrinsics. And I would generally have, let's say, a function object that is templated that will have, let's say,
Starting point is 00:21:13 a few different instantiations. One of them will be, let's say, with linear code, with serial code. Another one will be with ARM Neon, another one with ARM SVE, another one with AVX, another one will be with arm neon another one with arm sve another one with avx another one with sse and potentially the last one would be avx 512 whenever necessary that's interesting so actually in audio there's a very very similar problem very like you you often have to do convolution and you know you can do convolution in fourier space and this is kind of the proper way to do it but sometimes you want to do convolution in time domain, and then it's essentially the same thing, except you don't compare, but you multiply. Also, you have a big array and a small array,
Starting point is 00:21:54 and you multiply it, and then you move it by one byte, you multiply it, you move it by one frame, and multiply it, and so on and so on. And so this is why I found that this is one of the things that the compiler just can't figure out for you if you don't write it in a particular way that like kind of lends itself to auto-vectorization or like typically i think i've seen that you kind of have to use a simd library to to get this right and i've never tried it with just raw intrinsics because
Starting point is 00:22:19 that's kind of non-portable and you have to do it multiple times so i i clearly haven't dug into this as deeply as you have but um i should i should um look into this again because i think it's kind of a really interesting problem space just for sake of completeness what talk is that where can people see that because we should totally put that in the show notes i think uh that's actually interesting i'm pretty sure there's a github repository that implements this i think the talk specifically that one was in russian for cpp russia i think the ai tools should advance a little bit and we'll have automatic translation or maybe i can just repeat this talk somewhere but it also touches on a few topics that almost no other talk I've ever seen kind of covers. So there's like
Starting point is 00:23:06 this rumor that AVX 512 and a few other different instruction subsets kind of affect the frequency of the CPU. So that like once you truly load all the cores with AVX 512, you kind of lose all the CPU boost. The frequency really drops. And people just kind of understand it, that it's the case, but no one quantified it. Or like no one really goes into the documentation to describe what CPU licensing levels are. So same way as you have like CPU cache levels, like level zero, sorry, level one, two, three,
Starting point is 00:23:42 there's also like CPU frequency licenses. And this is like a weird completely separate topic that's almost completely uh undocumented and i was doing a lot of research kind of trying to understand like how many lines of avx 512 or like how many intrinsics can i actually fire so that the cpu doesn't turn down the frequency or Or let's say, how many more should I put for all the CPU frequencies across all the cores to be downgraded, even if the remaining cores are not doing any AVX-512 and they're just doing serial code. So it's kind of an interesting thing.
Starting point is 00:24:16 There's a repository that you can run. It's on my GitHub, and my handle everywhere is identical. It's ash vardanyan ash vardanyan so i think the repository is called substring search benchmarks or something like that i'll share their link yeah we'll put that in the show notes thank you and it sounds like we're sort of transitioning into the into the main content of this episode already so great segue there thank you but actually before we get to the low level stuff i wanted to start at least uh maybe at the higher level if that's okay because you you founded your own company and uh consists of a whole set of projects which on first glance
Starting point is 00:24:56 seem to be almost unrelated um they're mostly open source there's a github repo that we'll put in the in the show notes as well but the tagline there is rebuilding infrastructure for the age of AI. So the AI really stands out there. So what sort of AI is that and how do all these libraries relate to it? So you're right in the fact that we have too many things that to most people would seem unrelated. And to be honest, this is just the tip of the iceberg. So I started
Starting point is 00:25:25 this company seven and a half years ago. I've been working on it essentially every single day since, day and night. And when I left the university and focused exclusively on this, this is when the true science began. I was reading like a thousand papers a year, almost everything that was published on the computer science part of archive or the AI and ML parts, I at least glanced through it and a lot of things were implemented. So our open source libraries include Ustore, which is an open source multimodal database that obstructs away the layer of a key value store. So kind of, you can take any key value store and you add the database logic on top of it, such as being able to store different forms of data, query them, add wrappers and drivers for
Starting point is 00:26:10 different programming languages like Python. Then there is a library called Ucall, which is essentially a kernel bypass thing, a tiny single or like two file project that took me like a couple of days to write. And one of my junior developers maintains it now, adding TLS support, which seems to be like one of the fastest networking libraries built, or at least like within the C++ open source domain. It's an RPC, JSON RPC library that uses the most recent IOURank 5.19 features.
Starting point is 00:26:42 I mean, kernel 5.19 and higher. And it also uses SimJayson and a bunch of other SimDy accelerated libraries for processing the packets. There is also a project called Uform, which has nothing to do with C or C++. It's a pure Python thing that kind of uses PyTorch in a nice way to be able to run multimodal AI models with Mute Fusion. It's essentially the kind of setup when you have multiple transformers and the signal between them is kind of exchanged before it reaches the top output of a neural network.
Starting point is 00:27:14 There's a bunch of other libraries as well, but this is just the open source stuff. Almost every one of those has like a proprietary counterpart that is far, far more advanced. It took years to build, has tons of assembly in it. A lot of GPU accelerated code includes such remote things as Regex parsing libraries, probably the second fastest in the world after Intel Hyperscan. And the only one that also exceeds 10 gigabytes per second per core.
Starting point is 00:27:41 Then there is a graph library, one of the largest graph algorithm collections. There is a BLAST library, so basically new algebra subroutines. But unlike classical BLAST, we don't just target dense-dense matrix multiplications. We also do sparse-sparse, and we do it both on the CPU side and the GPU side. And we also do it in invariant to the ring manner. So let's say there's this notion of algebraic graph theory, where you can replace a lot of graph processing algorithms with matrix multiplications. If you know how to parametrize the matrix multiplication kernel,
Starting point is 00:28:20 replacing the dot product, essentially the plus and the multiply operations, with something else, a different ring or a different semi-ring. So a cool thing that most CS people often don't realize, even though they are familiar with the subject, is that if you look at the Floyd-Warshall algorithm on the Wikipedia page, it's just three nested for loops over i, j, and k. And then within it, it's almost exactly the same as matrix multiplication, but instead of addition and multiplication on scalars, it's doing the minimum and maximum
Starting point is 00:28:51 operation. So if you take the matrix multiplication kernel, design it as a template, and then pass a different operation into it rather than plus and multiply, your matrix multiplication kernel immediately becomes a graph processing algorithm. So there's a lot of such seemingly unrelated things, but my vision from the very beginning was that those can compose into AI of scale that we've never seen before. So all of the modern AI is almost exclusively built on dense matrix multiplications and very simple feed forward layers or very basic attention and then another part is that it almost exclusively works on the stuff that fits in memory or fits within vram so the memory attached to your gpu and those volumes are tiny so in our case i was always curious like how can i optimize and vertically integrate the whole stack
Starting point is 00:29:45 so that even like external storage, such as the modern high bandwidth SSDs, can actually become part of your AI pipeline, streaming the data, reorganizing the data, let's say stored on SSDs with a participation of AI, or let's say helping to train AI by having a much faster data lake. So the idea there is that modern CPUs can have, what, like one, two terabytes of RAM per socket, but they can have also like 400 terabytes of NVMe storage attached to that same socket. So if you're not able to address and properly use external memory, you're really limiting yourself to like very small
Starting point is 00:30:25 part of what's accessible. And the additional part that kind of adds up here is that, yes, you can build up a good data lake to help with AI and the AI industry, but you can also use AI to improve the data lake itself. It's like very reminiscent of the Silicon Valley series. The guys were building compression to build AI and then ended up building AI to build compression or vice versa. I think it had the different order. In our case, if you look at the databases like Postgres, MongoDB, and many others, they focus almost exclusively on deterministic indexing, such as inverted indexes or something like that, where you just explicitly search by a specific key or a specific string,
Starting point is 00:31:14 and you only search for exact matches or even at best fuzzy string matches. But with AI, we can actually search unstructured data. So by combining vector search, by combining a database and a multi-model pre-trained AI, what we can do, we can actually embed some media documents into a vector space and then just search through those vectors, finding all forms of potentially unrelated content or hopefully related content, but across different modalities. So being able to search videos with a textual query, being able to search images with a video. So being able to search videos with a textual query, being able to search images with a video query,
Starting point is 00:31:51 being able to search JSON documents with a video query, and so on. So I guess this kind of gives you a glimpse of how everything connects together and hopefully makes the list a little bit of sense. Yeah, that sounds very interesting. Thank you. We're going to dig into a couple of those libraries in a minute, I think. But just taking a step back, you mentioned Python a couple of times there. Seems like you're continuing the AI ML tradition of having a Python front end with C++ doing the heavy lifting. Is that fair to say?
Starting point is 00:32:17 Yeah, I guess everyone just converged to this idea that this is the way to go. For many years, I'd confess I wasn't a super big fan of Python. I was too obsessed with performance to touch a tool that kind of almost entirely abandons the concept of performance. But then I realized the value that it brings to my life and my developer experience. So I thought we should bridge the two worlds and we are not the only company doing this. So famously all AI and ML frameworks are written in C++, but at the front end, people kind of use Python exclusively, almost exclusively. And this kind of spills outside of AI in all the data science and data analytics tooling as well. So NVIDIA is famously one of those companies that builds a lot of obviously GPUs
Starting point is 00:33:07 and they have CUDA as a language. They have a compiler for CUDA, a lot of like low level stuff. But I would also say they have by far the best tooling on the Python level to actually leverage those GPUs. So you can take libraries like Pandas, NetworkX or NumPy, which are all targeting only CPUs and are written purely in Python. And you can replace those with libraries like QDF as a
Starting point is 00:33:32 replacement of Pandas, QGraph as a replacement of NetworkX, and QPy as a replacement of NumPy. And genuinely, this is some of the best software I've ever used and kind of a really good benchmark for us to compete with in a sense that they do it for parallel programming and we do it for external memory right so yeah so i can see how that sort of fits into the infrastructure story that that enables better ai implementations so we'll dig into some of those libraries in just a moment, but just go have a little break because this episode is supported by JetBrains. And JetBrains has a range of C++ IDEs to help you avoid the typical pitfalls and headaches that are often associated with coding in C++. And exclusively for CppCast, JetBrains is offering a 25% discount for purchasing or renewing the yearly individual license on the C++ tool of your choice.
Starting point is 00:34:28 That's CLion, ReSharper and ReSharper C++ or Rider. Use the coupon code JetBrains for CppCast, or one word, during checkout at JetBrains.com. So there were a couple of projects on the Unum repo that jumped out at me that i just wanted to to bring up and the first one was a ucall which i think you did mention earlier which it claims to be you mentioned yourself a adjacent rpc library that is up to and i don't know how much work up to is doing here but a hundred percent faster than fast api 100x not 100 so this is important yeah no 100 times yeah i know a little bit about fast api although i haven't actually used it myself but i'm gonna have a few web servers that i've written and maintain uh some of them just serving json uh built on
Starting point is 00:35:19 python web frameworks and i usually use flask and know that FastAPI, which is also a Python web framework, is meant to be significantly faster than Flask. And I've not heard people saying that Flask is a particularly slow framework on its own. So if you're saying you're 100% faster, sorry, said it again, 100 times faster than FastAPI, that sounds like quite a big claim. So how do you actually achieve that? Sure, I'll be happy to explain. So at first, people may think that if a project is popular, then it kind of optimizes something and it's really good at something. Even though FastAPI has fast in its name,
Starting point is 00:35:58 it's not particularly fast, if I'm honest. So one of the things that they do really well is they're very simple to use. They're very developer-friendly. So you just put a Python decorator on top of your Python function, and all of a sudden, this is a RESTful web server. So I guess maybe by fast, they meant that the developer experience is fast, but not maybe the runtime itself. So the story is more or less the following. I was playing with our neural networks and they're very lightweight. So we looked at neural networks like OpenAE Clip
Starting point is 00:36:32 and we wanted to replace those multimodal encoders with something that would work much faster and can be deployed on edge, maybe like even IoT devices. So we really squeezed those transformers, made them a lot faster. And if you take a server, such as like DJX-A100 by NVIDIA, you will end up serving 300,000 or like 200,000 inferences per second across the eight GPUs of that machine.
Starting point is 00:36:58 So this is a very high mark for AI inference. And the question is like, how do you serve it? Because the first idea is, let's take the most commonly used Python library for web servers, let's connect it to PyTorch or something else, and let's just serve the embeddings. So when I tried to do this, I wasn't actually even on the DJX, I just took a MacBook. And when I built up a server and just ran it on my machine, it was an Intel Core i9. I think my response latency was close to six milliseconds. So just the client and the server on the same machine, and I'm waiting for six milliseconds to get the response.
Starting point is 00:37:37 I was just shocked by the result. So obviously there was a lot to optimize. And then I thought like, how far can I go? I haven't done much networking development in the last couple of years, but I've done a lot of storage-related stuff. And I loved IOU-Rank for all of its, like, new advances and the performance that it brings.
Starting point is 00:38:01 Of course, sometimes we have to go beyond that. So we also work with SPDK and DPDK as pure user space drivers for kernel bypass. But IOU-Ring by itself is also pretty good. So if you take a very recent Linux kernel like 5.19, it adds up a lot of really cool features for stateful networking. So essentially, the idea is the following. Whenever you have a TCP connection on the socket, you listen for new requests and queries. And whenever they come, stateful networking. So essentially the idea is the following. Whenever you have a TCP connection on the socket, you listen for new requests and queries. And whenever they come, you kind of create a new connection for every one of the incoming clients or a new client. So one of the
Starting point is 00:38:38 system calls that you would oftentimes do in this case, you kind of get a new file descriptor for the communication over a channel to a specific client. And one of the things that IOUring in 5.19 brings is a managed pool of those file descriptors that can also be taken using the IOUring interface without any system calls. So with this out of the way, almost every system call that we could have done, that would have caused interrupt and a contact switch on the CPU side is now gone. And even with a single server thread, we managed to get to 230,000 requests per second,
Starting point is 00:39:17 even on our machine, which is generally considered like efficient cores rather than high performance cores. While FastAPI was only serving 3,000 responses per second or requests per second. So 3,000 to 230,000 is a huge gap. But at this point, we're kind of comparing an implementation in Python and implementation in C. So we wrote a pure CPython layer as a wrapper on top of our C library. The result was that we kind of dropped from 230,000 to 210,000,
Starting point is 00:39:52 still a major improvement over FastAPI. And aside from FastAPI, it's also seemingly faster than most of the other networking libraries, including gRPC, which many people use as the go-to high-performance RPC implementation. But gRPC doesn't implement such level of kernel bypass, let alone the fact that parsing product buffers is actually oftentimes slower than parsing JSON with SimJSON. So we kind of win on both fronts, the packet processing speed and also the way we interact with the socket.
Starting point is 00:40:25 So here you go, 100x faster. Everything you said there all sounded reasonable, but the numbers still sound too big. So I'm definitely going to be playing with U-Call on one of my little projects and see whether that makes a difference. Because I'm looking to step up the performance. Sure, let us know. We would really love feedback. Yeah, I will report back. So thank you for that. You mentioned the term motu-model a few times there. What is that exactly? Because I think that's like a term of art in AI, isn't it? Yeah, so AI people use it a lot, especially these days with what they call foundation models or like the next step of LLMs,
Starting point is 00:41:02 large language models. So just doing language is not enough these days. People want multimodality, which means essentially being able to work with multiple forms of data at once, like images, video content, audio content, anything actually. So an example of like multimodal AI would be something like a text to image generation pipeline.
Starting point is 00:41:24 Like you put in text and you get an image. Another example would be like an encoder that kind of understands both forms of data and it produces embeddings of vectors that can be compared with each other. So you can say if an image is semantically similar to a textual description that sits beneath it, for example, like on the web page. And in the context of, let's say, databases or anything else, we also started to use this term to kind of make the vocabulary a little bit more universal across different parts of our repositories. So a multimodal database for us would be a database
Starting point is 00:42:01 that across different collections of the same store can keep different forms of data without sacrificing the remaining properties and the most important property for us in a database would be transactions and support for like asset guarantees like atomicity consistency isolation and durability so if you can do a transaction where within one transaction you're updating multiple collections and what in one of them you're storing a metadata of an object and another one you're storing maybe like a poster or a photo of a specific document or something like that. And if you can do it in one transaction with all the guarantees included, this is multimodal for us. Right. Yeah, I think I'll follow that. And it's interesting you started talking
Starting point is 00:42:45 about databases there i think you've done my transition for me again because i was going to ask about another one of your projects which is a u store which at the time of writing i think on your your site is still somewhere called ukv looks like you're in the middle of naming that so just in case people go looking for it and they find ukV. It's the same thing, I believe. Sure. The readme describes it as a build-your-own-database toolkit, but also that it's four to five times faster, at least in your benchmarks, than RocksDB.
Starting point is 00:43:20 And I hadn't heard of that, so I had to go and look it up. But it sounds like RocksDB is meant to be at least 34 faster than mongo db so i'm sure that's something people have heard of to to get a uh an idea now so we're talking about like almost an order of magnitude faster than something like mongo db so that again is very impressive how do you achieve that so uh there are a couple of stories here and uh i've done a really bad job naming some of the projects and it really seems like a bit convoluted like too much is happening so let me just give you a bird's eye view of how the storage is built today so let's say if you use something like a distributed database you have like the distributed layer at the very top, which is responsible for consensus and the ordering of the transactions. Then whenever you choose the lead and the master node, we can dive deep into that specific node. And on that node, you have
Starting point is 00:44:15 essentially an isolated single instance solution. Within the single instance solution, what you have is a database layer, a key value store layer, and a file system layer. And beneath it is the operating system and the block storage. So we kind of haven't reached the distributed layer so far. We almost exclusively focused on vertical scaling in most of our projects. Even though, as we've just mentioned, you call networking is also important for us. It's just that we take certain steps in specific order. For now, distributed hasn't been part of the agenda. It will be this year. So what we've done, we've built up something that remotely resembles the strategy of Redis. So I guess everyone is
Starting point is 00:44:59 familiar with Redis. It's essentially like a hash table on steroids. What they've done, they kind of focused on building a key value store, and they allow a lot of different additional features, essentially adding multi-modality to the underlying binary key value store. So now let's kind of disassemble this into parts. A key value store is just an associative container, like a hash table or a binary tree, B-tree, log-structured merge tree, anything actually. And Redis added pieces such as Redis JSON, RediSearch, and Redis Graph
Starting point is 00:45:38 as essentially forms of converting different modalities of data and kind of serializing them down into a key value store. So every modality is just like a feature of the underlying storage engine. So what has been happening on our side, we thought, oh, cool, let's take a key value store that we love building. And in our case, it's called Udisc. Let's take other key value stores and let's create a shared abstraction. So that's why it was briefly mentioned as
Starting point is 00:46:10 build your database toolkit. So we thought if Redis knows how to abstract away the key value store and kind of add a lot of features on top of it, we can actually do something similar and just give it out to the world for everyone to use it. So essentially you can take any key value store that you like. And if by any chance you love designing associative containers and you code in C++, it's very easy for you to actually build up your own hash table, take this project, which is now called use store and use it as an intermediate representation layer, essentially, or just like a C interface. If you add this C interface on top of your hash table or associative container that would be ordered, you're getting a lot of support for different
Starting point is 00:46:55 forms of data on top of it. And you also get bindings and SDKs for languages like C, C++, Python, as well as Golang and Java that have partial support. And we also had some contributions from the community, people trying to implement Rust bindings around it. So you've briefly mentioned some of the benchmarks and the performance numbers, and I can elaborate on them as well. So in our case,
Starting point is 00:47:22 one thing that really surprised me a few years ago was that I was so focused on AI and compute and high-performance computing, I didn't really think much about storage. And when I just tried to bring up the systems together and kind of compose them into one product or one solution, I still needed some storage. So I took RocksDB, which is an open source key value store by Facebook, which seemingly is the most commonly used database engine today. And whenever there's a new database company, there is a very known zero chance that they're using RocksDB as their underlying engine. So essentially what's happening is the database
Starting point is 00:48:03 is adding its own logic for specific workloads, such as processing graphs in the case of Neo4j. And beneath it, what's happening is that this is all converted into like binary data and is stored at RocksDB or some other key value store. So in reality, at least from my perspective, the absolute majority of work that has to be done is in the key value store layer. This specific example, Neo4j, is like one of my biggest pains because I've been always fond of graphs and graph theory. And Neo4j is kind of synonymous with graph and graph databases by like today. This company has raised $755 million. So their product must be as polished as possible. And they've been around for over a decade. But every single time that I try to run
Starting point is 00:48:54 this database, it crashes with classical Java errors. And as a C++ community, it's almost our obligation to kind of make jokes about the Java runtime and all the garbage collection issues that people face in that land. So I was facing them all the time. There wasn't a case where I would try to put a graph even remotely interesting to me in terms of size and Neo4j wouldn't crash. Either I, after 20 years of programming, am so bad that I cannot even start up a database or something is really wrong on their infrastructure level. And something was actually wrong. So until a couple of years ago, they were not using RocksDB.
Starting point is 00:49:34 They had like internal T-value store. And in 2019, they decided that they're kind of switching to RocksDB as a new faster engine. But even before they switched to RocksDB, similar to companies like CockroachDB, YugaByte, and countless other companies, half of which have this premise of let's take Postgres and put Postgres' engine for query execution on top of RocksDB. Even before they started doing this, we realized RocksDB is way too slow for us. So our ambitions were much higher than even the best expectations that other databases had for their future a few years down the road.
Starting point is 00:50:12 So we kind of went into the lab. I moved to Armenia with all places. We ordered a bunch of super, super high-end equipment. So we run on the fastest SSDs on earth, 64-core liquid-cooled CPUs, Ampere GPUs. For the last couple of years, we ran on 200-gigabit InfiniBand networking. And we used all the
Starting point is 00:50:34 state-of-the-art hardware to actually push the limits of what software can do. Because when your hardware is so freaking fast, every single bottleneck that remains is on your side, the developer, software developer side.
Starting point is 00:50:47 And what we've done, we've created the key value store that is faster than RocksDB today in almost every single workload. So today, the only workload in which we're just a little bit slower is range scans, but this is relatively easy to fix in the upcoming versions. But in some crucial forms of workloads, such as batch insertions and batch read operations, when you random gather or random scatter tons of information onto persistent memory or from persistent memory, we have five to seven times faster than ROCKSDB, which is a number so absurd. Most companies, especially like smaller startups, didn't believe this is possible. The only companies that kind of realized and were familiar with my prior work in the previous years are generally like super large trillion
Starting point is 00:51:37 dollar plus American tech companies. And they kind of knew some of my proprietary work before that. And when they started testing it last year, they were just shocked that this is even possible. So our database engine can be faster than a file system. And the only company that has ever shown that these numbers are possible was Intel a couple of years ago on their Optane SSDs. They did it using SPDK,
Starting point is 00:52:02 which is a user space driver that they design and maintain. And they reached 10 million operations per second, most likely with 24 SSDs. But this is purely synthetic workload. We've managed to reach 9.5 million operations per second batch reads on our lab setup with Intel people present and validating those numbers on a setup with three times less SSDs and not synthetic read and write operations, but actual database operations. So this was like an incredible milestone last year, a culmination of seven years of my work investments and teaching experience,
Starting point is 00:52:41 I guess. They're very impressive numbers, of course, and I'm definitely going to be trying them out myself to see how they stack up. But I'm sure a lot of people listening will be fascinated by this just as we've been discussing it. But since we are a C++ podcast,
Starting point is 00:52:56 I know you've mentioned the use of C++ in many cases, but is there anything you can say about how you've used C++ to achieve these results? Yeah, sure. So C++ is essentially the only language where I can do this. There is no way around it. I tried other languages. C++ wasn't the first language I used, wasn't the last one that I adopted or tried.
Starting point is 00:53:18 So almost every one of those projects is implemented in C++. Ustore is implemented in C++. Every single one of our internal libraries is implemented in C++. Ustore is implemented in C++. Every single one of our internal libraries is implemented in C++. But as a person who's been doing C++ for well over 10 years now, I think it's not a single language. It's just like a pile of languages mixed together. And every more or less like senior person kind of picks his own subset of what he kind of allows within the
Starting point is 00:53:45 code base and uh i guess most of the people kind of stay in the profession for this long they develop a taste and a lot of strong opinions about stuff they like and dislike so i am this uh kind of code nazi within uh my team who is like super aggressive in terms of like not allowing some of the features of the language to be used while kind of pushing everyone to adopt other features that they may have not been familiar with from school. So in our case, things that I don't like and don't use oftentimes would be related to dynamic polymorphism, exceptions, related stuff. I guess you can understand,
Starting point is 00:54:27 especially like in the low latency environment. We really hate memory allocations. We don't use new or delete. It's very important for us to have like full control of the memory system. We use Numerware allocators. We design some of them. But then on the other side,
Starting point is 00:54:45 there are features that we can't live without. So essentially being able to compose very low-level abstractions with super high-level abstractions is the kind of special thing about C++. So as I've mentioned, we oftentimes build function objects that would be like essentially a
Starting point is 00:55:05 templated structure with overloaded call operator. So open brackets, close brackets operator. What we then do, we essentially instantiate this template in a few different forms and specialize it for all kinds of different assembly targets. So we would have an implementation for x86 and for ARM. And within x86 and ARM, we'll also target a few different generations of CPUs. So I guess this is one of the things that we really love. Another thing is that we always stick to the newest compiler. In our case, this is mostly GCC and LLVM.
Starting point is 00:55:44 Another thing is that we always stick to the newest compilers, in our case mostly GCC and LLVM. We also use NVC++ and NVCC, and we occasionally use the compilers from Intel's oneAPI toolkit. I guess they are as bad at naming as we are, renaming them almost every year, so I don't know which name, ICC or ICX, they go by now. Those things are crucial. Using a recent C++ standard is also important, because when you do a lot of templates and metaprogramming in C++11,
Starting point is 00:56:18 it's constant std::enable_if. Once I start remembering all those horrors of 2011 and 2012, I almost lose consciousness. Then, when if constexpr appeared with C++17, we
Starting point is 00:56:38 immediately adopted it. We now use C++20 where we can.
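To make that contrast concrete, here is a small before-and-after sketch. It is our own example rather than anything from the Unum code base: the same trivial function written first with C++11-style std::enable_if overloads and then with C++17's if constexpr.

#include <type_traits>

// C++11 style: two overloads selected through std::enable_if (SFINAE).
template <typename scalar_at>
typename std::enable_if<std::is_signed<scalar_at>::value, scalar_at>::type
magnitude(scalar_at value) {
    return value < 0 ? -value : value;
}

template <typename scalar_at>
typename std::enable_if<std::is_unsigned<scalar_at>::value, scalar_at>::type
magnitude(scalar_at value) {
    return value;
}

// C++17 style: one template, with the untaken branch discarded at compile time.
template <typename scalar_at>
scalar_at magnitude_17(scalar_at value) {
    if constexpr (std::is_signed_v<scalar_at>)
        return value < 0 ? -value : value;
    else
        return value;
}

The if constexpr version keeps everything in a single template, which is what makes heavy metaprogramming far less painful than in the enable_if era.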
Starting point is 00:57:01 Some features unfortunately don't work for us yet. Coroutines still allocate, and when you reach 10 million IOPS on eight SSDs, or aim for, let's say, 20 million IOPS, heap allocations are not acceptable. So we cannot use coroutines there; we have to rely on pure C interfaces. But overall, I would say one more time that C++ is essentially the only language where we can achieve this. Yeah, I love that. What you said sounds quite familiar to me, coming from an audio processing background: not wanting to do allocations, avoiding branches and runtime polymorphism, writing your own allocators, all this low-latency stuff sounds familiar. I guess the big difference is that in music software,
Starting point is 00:57:31 typically you can't use the latest and greatest compilers because, A, you usually have to ship on macOS and Apple Clang is a bit behind, and B, if you're shipping an audio plugin, you typically need to support older versions of macOS. So you're also constrained on what standard library versions you can use,
Starting point is 00:57:49 because the features might just not be there on an older macOS version. So I guess you don't have any of those problems, and you can really take full advantage of the latest and greatest compilers and standards, which is, I think, a really great place to be as a C++ developer. More or less, until you start packaging. As we started packaging and shipping some pre-compiled binaries, especially for higher-level runtimes,
Starting point is 00:58:14 it became such a nightmare. With this project, UStore, without a doubt, we spent a lot more time just trying to fix compilation than writing the project, and that is absurd. As a build-your-own-database toolkit, as Phil mentioned, it composes multiple builds into one. It's entirely CMake based and we have to build from sources, so we use CMake's FetchContent and ExternalProject facilities to combine everything into one build pipeline. But then all of a sudden you end up building RocksDB, you end up building OpenSSL, gRPC, and every one of them has its own compression library dependencies that collide in terms of versions.
Starting point is 00:58:59 Then you have all kinds of linker problems. And at some point, once you start packaging it for Python with manylinux, it brings in some obsolete, insanely old version of libc. Then you try to build, let's say, a matrix of Python bindings for different versions of Python, and this build, on a 64-core, liquid-cooled, overclocked CPU, takes many, many hours. So when we try to do it in GitHub Actions, I think a certain availability zone
Starting point is 00:59:29 on Microsoft Azure just dies. I think a lot of people know this pain. It is certainly a problem. Unlike some of the problems that you've solved, apparently in much better ways than other people, this one, I think, still remains to be solved in a way that makes everybody happy. So I'm not sure we'll ever get there, but I can definitely share the pain.
Starting point is 00:59:52 So we've talked quite a lot about your work and your projects. Now I want to zoom out a little bit, shift gears, and ask a completely different question, if you don't mind. We talked a little bit about AI, and obviously you're working on a lot of the plumbing that makes all of this work. But if you zoom out really far, something that I found really striking over the last year or so
Starting point is 01:00:17 is how AI systems like ChatGPT or DALL-E or Midjourney have transformed how people do things. And I wonder what your thoughts are on this latest generation of AI. Are they going to wipe us all out as humanity within the next few years, are you guys going to rule the planet, or do you have any thoughts on that? Well, I wouldn't be that pessimistic. I just had to say that because I'm a massive science fiction nerd, and it's a thing that people have been worrying about
Starting point is 01:00:51 for quite a long time. One part we have to take seriously is the fact that work is changing and jobs will be replaced. Obviously some people are very frightened by this, and it's understandable, change is always frightening. But on the other side, we can now get much more efficient with AI, and people can unlock a lot more of their creativity. So there's a lot of opportunity for people who may get replaced now to actually adopt a new skill, which will not be easy, obviously. And we have to be compassionate with them and help them adopt AI to get into a new labor market. But in general, I was always optimistic about AI. I think people are perfectly fine finding ways to kill
Starting point is 01:01:42 each other, even without AI. So if there's something coming to kill us, I think it's more likely ourselves rather than an artificial form of intelligence. So I would definitely bet on humans any time of the day if this question is asked. But on the opposite side, just looking back on the last couple of months on the ChatGPT release,
Starting point is 01:02:10 I would say, for people who are inside the industry and have been pre-training actively, and there are not too many teams like that: there's one major cluster, that is the US and maybe the UK with DeepMind; another cluster is maybe Russia and now the South Caucasus, where a lot of this talent has moved, ourselves included. And another major cluster would be China, where people actually have the resources to pre-train those models, because this is not cheap. You need thousands of GPUs.
Starting point is 01:02:35 This puts your starting budget at $100 million and above, so small labs cannot really compete in this modern heavyweight category. So there are not many teams, but the people inside those teams have been familiar with the incremental steps and the incremental progress that was happening. So I don't think the ChatGPT release was a shocker for many of them. Many of them have been working on similar technology and have seen every preceding paper that came before it. Still, it's lovely to see the attention on the industry. I've seen a
Starting point is 01:03:11 lot of hype cycles in the last couple of decades, crypto being the most recent one, and I think people are still confused about any application or any effect crypto can have on our everyday lives, while with AI we have such an insane level of adoption already. I think there's been something like a billion people who have interacted with AI over the course of the last few months who had never touched AI or AI-related tools before that. So I'm very passionate about it. Right. Yeah.
Starting point is 01:03:40 I mean, what you say about the job market, I've actually been thinking about that too, because I'm a developer advocate. So things that I do include writing blog posts or recording videos about how to do something. And basically now I can let ChatGPT write the script for the video, and then I can train an AI to read it out in my voice. So basically I don't have to do anything anymore. That's kind of an interesting thought. Okay, shifting gears again, completely, because you mentioned the South
Starting point is 01:04:12 Caucasus and how there's a lot of talent there. Your bio says that you founded a C++ meetup in Armenia, and that's obviously something I'm very excited about. So can you tell us just a little bit about the C++ scene in Armenia, the meetup that you've started, and how that's going? Yeah, sure. It's actually absolutely wonderful. When I moved to Armenia a couple of years ago, as a person who wasn't born or raised there, I just had some ancestry, most people thought I was a bit insane to do this, because I had a few other opportunities in other countries where I could go. But now they tend to realize how much of an undervalued gem Armenia is within the local region. Essentially, in Armenia we have a ton of hardware and
Starting point is 01:04:58 chip design companies. Armenia has one of the largest offices of Synopsys, which is a chip design EDA company based in the United States. Their second largest office, as far as I know, is in Armenia, and the third largest is in India, and India has a 750 times larger population than Armenia. Now NVIDIA and Mellanox also have a presence in Armenia; their office is only about five minutes' walking distance from mine. There's AMD and Xilinx, there's Siemens EDA. And whenever you hear chip design and hardware, you immediately get that these are low-level people. It's likely that they use low-level
Starting point is 01:05:36 languages. So obviously C++ is a major part of their professional life, and sometimes now part of their after-work interactions too. The problem I saw when I arrived was that there were a lot of professionals who use C++ daily, but they are not always familiar with the newest standards. They don't meet together too often to discuss how people tackle different problems, and the overall exchange of ideas between junior and senior developers is not as rapid as, let's say, within the United States or within Russia, places where the developer ecosystem is much more developed. So I thought it made sense to help ignite this activity a little bit. We've had maybe six live meetings, and we grew from just 10 attendees to maybe 750 members who have come through those meetups and our groups
Starting point is 01:06:30 and chat and discuss in our vicinity. But there are definitely a lot more developers who do C++ but haven't been part of a community so far, so I guess there are a few thousand more. And overall, we now have a decacorn in Armenia, a unicorn, and five to seven other companies that are about to become unicorns. This density, in a city with a population of less than a million, dwarfs, or at least competes with, maybe half of Scandinavia, if not all of Scandinavia combined. So I'm very excited.
Starting point is 01:07:06 I really want to invite everyone to come visit us in our country in the South Caucasus, and I'll try to do my part in this. And hopefully next year, just like Phil is organizing C++ on Sea: we were not that lucky to get access to a sea, but we have beautiful mountains. Maybe we should call our conference next year C++ in the Mountains. The capital is already a kilometer above sea level, but we can go even higher than that, somewhere in a beautiful place with a beautiful view,
Starting point is 01:07:35 and just chat about C++ with the brightest from all over the world. So come visit us. So Ash, that sounds absolutely amazing. I've never actually been to Armenia, but I've always wanted to go. So if something like that happened there, I would totally show up. I think that sounds really exciting. And we'll be very excited to have you.
Starting point is 01:07:55 And if we have listeners in Armenia who don't know about your meetup, I'm sure there's a link you can give us that we'll put in the show notes, so they can tie up with you and spread the word in Armenia, because there are obviously a lot of technical people there who may enjoy the show. Thank you very much for sharing all this with us. We've run way over time once again, but is there anything else you want to tell us before we let you go? Just that both of you are amazing hosts. I'm happy to be here and would be happy to chat about any one of our projects, on any podcast or within our Discord groups. A lot of the projects are open source: come try them, share your experience, and don't hesitate to ping us whenever
Starting point is 01:08:42 you see bugs, because there are a lot of them, I believe. And especially with the build and compilation times, sometimes the packages are a bit outdated, but we're doing the best we can to keep the best, fastest software always available to our users. So where can people reach you, Ash? I have accounts that I often check and use on the obvious places like LinkedIn, GitHub,
Starting point is 01:09:09 Facebook, and Twitter. I had essentially a read-only account on Twitter for a few years, but I guess, considering how many tech people are on Twitter, I have to change my policy in that regard and start becoming more active. My nick is the same everywhere, ashvardanian. But aside from this, there's also a Discord channel that you can find by opening any one of our open source repositories. There are a few icons at the top, and one of those is a Discord link you can click to connect not just with me, but with every one of the engineers on my
Starting point is 01:09:42 teams in Armenia and abroad. Well, thanks, Ash. We'll put some of those links into the show notes as well. Lovely, guys. Pleasure talking to you. Thank you so much, Ash, for being a guest today. Thanks so much for listening in as we chat about C++. We'd love to hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested in,
Starting point is 01:10:04 or if you have a suggestion for a guest or topic, we'd love to hear about that too. You can email all your thoughts to feedback at cppcast.com. We'd also appreciate it if you can follow CppCast on Twitter or Mastodon.
Starting point is 01:10:17 You can also follow me and Phil individually on Twitter or Mastodon. All those links, as well as the show notes, can be found on the podcast website at cppcast.com. The theme music for this episode Mastodon. All those links, as well as the show notes, can be found on the podcast website at cppcast.com. The theme music for this episode was provided by podcastthemes.com.
