Algorithms + Data Structures = Programs - Episode 230: Hoogle Translate

Starting point is 00:00:00 I mean, it kind of hurts my soul a little bit, though, when you when you see the the rainbow end after like seven languages and then everything else is white because there's zero agreement on the naming of one of them is is prior adjacent transform, which we now have in C++ 23. It's called prior in Q zip with next in scholar Kotlin. I can't remember which one. And but like every language has a different name for that. Welcome to ADSP the podcast episode 230 recorded on April 9th, 2025.

Starting point is 00:00:40 My name is Connor and today with my co-host Ben we chat about a new website, Hoogl Translate. Although I probably I'm not sure also because you don't spend much time on the socials anymore or maybe this can be Part two now. We're now on episode 230. I have spun up my long-awaited Hoogl Translate website. Oh, right. In fact, maybe I'll I mean We try not to share screens here, but maybe I'll do it for a sec. And honestly, say what you will about vibe coding folks. Vibe coding is, I think the number one best thing

Starting point is 00:01:15 about these AI assisted stuff, it is a barrier remover. You know, the reason you actually commented, I think it back on Twitter years ago, when I had, I think you commented something about like, oh, this like a graphic that I had posted is kind of like a Hoogle Translate, referencing the Haskell website. Potentially. And then I replied to that. It was so many years ago, yeah, you probably, it sounds vaguely familiar. That may have been the first time you'd heard of Ho go is that the case I don't know it had not I think I had heard of it before definitely I'd heard of it before because I had I you said who will translate and I replied I've owned the domain for like 12 days who will

Starting point is 00:01:56 translate calm okay and so like I was like floored when you had suggested I guess it's it's a pretty common like if you know of who goal this is basically that's why I called it that right it's a pretty common, like if you know of Hoogle, this is basically, that's why I called it that, right? It's a Google Translate for Hoogle. You type in one thing and you get it cross-language. And this was a site that I built in ClosureScript, like in 2023, but the reason I didn't publish it was because I already had a website spun up,

Starting point is 00:02:23 which was my program language rankings, plrank.com. But the way that I would deploy that, I would build the ClosureScript website. It generates a main.js, and then I would always manually push that main.js, and that's what basically the website was, which is, you should be able to, with CI, recognize that there's some change, and then get like a CI to rebuild that JavaScript file and then push it for you.

Starting point is 00:02:50 I spent half a day or a day or a couple days at one point trying to figure that out, but I'm not a CI engineer and I'm also not like a front end web developer. So like I'm already out of my depths dealing with JavaScript and stuff. and stuff. And so I just gave up and I put it on my list of things to do. And I wanted to get the automatic deployment of this main.js working before I built my second ClosureScript website so that I didn't have two different, you know, bad, poorly deployed websites. It's funny when you say, and I totally echo the sentiment,

Starting point is 00:03:22 like, you know, you're out of your depth with JavaScript. You know, my wife is a front end developer. And, you know, she thinks what I do is arcane magic and the same the other way around, right? It's like, And that's the thing is it's, it's true, like, how like how you know how you can be an expert in one thing and then you move just slightly to the right. You know, it's just a different programming language. It's just a different stack. But you don't know the idioms, you don't know the libraries to reach for. And you may know how to spell it algorithmically in your head.

Starting point is 00:04:00 But like potentially those algorithms. You need that primer to translate. You need to Google Translate. Yes, exactly. But like right potentially those algorithms you need that primer to translate you need who will translate Yes, exactly and Long story long is that I I vibe coded using Claude 3.7 got like Did like all of the plumbing of that first website got the github actions You know automatically deploying and then I did that like one day and then the next day I was like, well,

Starting point is 00:04:25 now that I have this, I can just go spin up who will translate, even though it's been sitting there for two years. And initially like it was just this, you know, bare-bones website. And now I've got like a bunch of settings and light mode, et cetera. Let me now interview you about this website

Starting point is 00:04:41 for our listeners. So this is huggletranslate.com, H-O-O-G-L-E, translate.com. And what does it do, Connor? So the front page here looks very much like just the sort of Google front page where you have just a search bar and a few settings off to the side.

Starting point is 00:04:59 But we're expected to type something in. What do we type in here? And I'll start off by, I've got a lot of comments because I released a YouTube video on this saying it's a terrible name, change the name. And then a couple of people being like, you realize there's already a tool out there called Hoogle. And I'm like, yes, that's where I borrowed the name from. There is an existing website. If you Google Haskell, Hoogle, it'll come up. And Hoogle is is a type signature, aka function declaration search engine. You type in the type signature of a function and it will return you all of the functions across Haskell libraries that match that type signature. So it's a great way for finding a function that you don't know the name of.

Starting point is 00:05:38 For instance, I did it one time when trying to find the Python in function. You're trying to check is there an element that exists in a list of elements. That's called in in Python, it's called contains in many functions. In fact, if I now type in here, I'll get the Python in. And if I click on the in, it's called LM in Haskell, right? One version of it is called LM. And now you get a list across languages of the different names for algorithms that share the semantics. So behind the scenes there's a couple different searching methods.

Starting point is 00:06:11 When you just type in an algorithm name that's called the by name search. But then when you click on a specific languages implementation that switches to a by algo ID which is basically an ID that maps to a specific semantics of an algorithm. So sometimes there's algorithms across languages that share the same name, but have different semantics. A great example of that is partition. So partition, when you type it in, has, I don't know what this 15 languages, and there's four different colors. And the colors in this by algo ID or like by semantic search means that partition across languages has four different implementations. Some of the times it's the

Starting point is 00:06:50 C++ partition where you are basically given a unary predicate and you're putting the ones that pass it at the front and the ones that fail it at the end and the two other languages that share the same color are D and then CUDA, which makes sense because CUDA is just basically a parallel version of the C++1. And everything else is basically different implementation. So I think partition for Haskell and friends, I mean, I actually have to go to the docs, but you can see that when I click on partition for Haskell, I now get partition copy and split with an elixir.

Starting point is 00:07:26 And my guess is that you're writing to two separate sequences here. So you're returning a tuple of two sequences. Yes, exactly. Instead of like mutating in place. And right now I play a little bit fast and loose. I'm going to be adding something called signature search, which is the exact same as who go where you type in the signature. And like the worst offender right now

Starting point is 00:07:47 is if I type in fold and then click on one of these, I get all the reduces, folds, inserts, aggregates. And the semantics of some of these are different and the best way to see that is if you compare the type signatures. A lot of these, there's a difference between one that takes an initial value and one that doesn't take an initial value.

Starting point is 00:08:09 And that's typically like you'll see Kotlin shows up both for reduce and fold. Rust is the same thing. Many languages are like this. And the difference is whether it takes an initial value or not. This is going to get split. And there's currently an issue open to say, you know, build in signature search and then while you do that, be more clear about what do I mean by, you know, same semantics. I've been playing a little bit fast and loose because technically the partition copy in C++, it doesn't return a tuple of lists, right? It writes the two different sequences and there's always gonna be...

Starting point is 00:08:43 It is the only standard algorithm that does that in fact it is indeed yeah, and There's some other cool things that like if you this was added after the fact someone requested it if you click on a language It'll give you all the Algorithm so if we go to C++ which there are a lot of languages represented here Many as as we might expect, of different paradigms, you know, we've got the C style,

Starting point is 00:09:12 the Algo family, we might say, C sharp, C, C plus plus, go, Python, Rust, we've got the array languages, APL, we were, we've got Excel at languages APL we were We've got Excel at the bottom Excel even yeah, and we've got functional languages. There was one language. I didn't know how to pronounce in there I didn't recognize it's the Crescent Yeah, this is I think I'm gonna I'm gonna change this to moon You can see that half of these characters aren't recognized. This was a

Starting point is 00:09:42 Watcher of one of the live streams. I think their personal language. This is an array language I take it. It's a I'm not sure if it's an array language. It is a symbolic glyph language that it does have a lot of the same primitives transposed so I wouldn't be surprised if it's an array language but okay the thing about this site is is this whole site operates basically on a markdown file that is just a table. And then thanks to Claude 3.7 again, I basically built up a CI pipeline that whenever the markdown file gets changed, it kicks off a GitHub action that runs a Python script, parses the markdown file, and then generates a Clojure script dictionary. And that's what basically runs the site.

Starting point is 00:10:23 So anyone that wants to can basically just go and open a PR and add any language or any missing algorithm. And even it's it's now that if you want to add like third party like technically range V3 is here but that's hidden by default. And then also in array languages a lot of the times they don't have explicit glyphs because it's just a combination of two or three to get that. So you can click show expressions and for certain array languages it'll show you that you know like any is just an and reduction in all the array languages. Currently right now we don't actually have those added but in the future you'll see stuff like that. And one of the coolest things that I've added because one of the like the motivation for this is one it's a useful tool for finding algorithms

Starting point is 00:11:05 You know who goes very useful a who will translate is is also super useful I've gotten a bunch of comments saying like I was working in Python and I I knew this thing should exist But didn't know to check in more inner tools, which is like a inner tool library extension But the other motivation is I I love putting these graphics Especially if I'm at a non C++ conference, in talks where I'm talking about Python and I'll say, I know this as this in C++, but it's also known, if I can just show this thing, but manually creating these is a pain. It takes 30 minutes to go and handle this.

Starting point is 00:11:39 So now that it's online, people can submit PRs, but I've also wanted to be able to show the differences across multiple languages in like a nice table. And so my favorite one these days is if you type the name of a language, and then you can type a bunch of, you know, multiple of these. So swap is the the C combinator known as flipping Haskell, which I guess has should be here as well. But then if you continue to type, it'll build up. Oh, a multi-column table showing each algorithm in a different column.

Starting point is 00:12:09 Exactly, and adds a little sad face. Sad face for when the algorithm's missing in that language. Sometimes this is just because it hasn't been added, but specifically for what we're looking at, we're looking at the combinators in that match to BQN's swap over before, I can I can add after and then it's still and I guess we haven't explained this is that I have I don't actually

Starting point is 00:12:30 have a name for the algorithm it's it's kind of basically it creates a frequency map enclosure it's actually an algorithm called frequencies where it's the equivalent of the counter data structure in fact if we copy and paste this and we type in frequencies, you'll see that there's two languages that have a function called frequencies, racket and closure. And when we click on this, this is this is histogram. Exactly. This is creating a histogram. Yeah. And in Python, the idiomatic way to do this is actually not to use an algorithm. It's to use a container from collections called counter, which is why there's a little asterisk beside this because once again, yeah, it's it's it's playing a little fast and loose,

Starting point is 00:13:09 but it uses frequencies to basically figure out what's the most common. And then I've stolen the colors from Excel 2003, which has got one of the best color palettes out there, folks. And then it describes purple to the most common name. And then I guess blue is the next most common and it goes down the rainbow and so across this you can see that you know over aka the psi combinator is the most consistently named across these array languages but then some of them like every single language chooses a

Starting point is 00:13:36 different name for it and some of them don't actually have the glyphs and so you can do this for anything. You can go C++, any, all. And then technically, it should be any of and all of. But then this shows that by a landslide, the most common names for this are any and all. And if you're actually using this to build up graphics for a presentation, I've

Starting point is 00:14:02 added the feature that you can control and click on any row. So if you don't want, you know, C sharp, Elm, Julia, common list, whatever, get rid of once you don't want to put in your graphic. You can build the graphic that you want. So anyways, it's, there's still lots of stuff that I'll be adding over time, but it's a it's a neat tool. Now the obvious thing for me to ask is for you to type in but last. tool. Now the obvious thing for me to ask is for you to type in but last. Oh yes that's why we came here. So I don't think it'll be here but when I was doing my masters I was doing a lot of small talk work and I know that small talk the ferro implementation specifically does have the equivalent of but last and so I guess init would be the one that so and it's not even in here from Haskell and but last won't show up either but I shall after

Starting point is 00:14:49 the fact go and get these and if we go to ferro small tap talk but last algorithm it's very hard to search fair and actually I don't why are we doing this let's go to chat GBT. Hello, good friend. I don't even know, I got into the trap of Googling when I would never do that if I wasn't on this podcast. You were about to say something though. Well, I was gonna say we want the right answer. So if chat GPT will give us the right answer.

Starting point is 00:15:18 Oh, that's the thing is if you give it a lot of context, if you say, I know in Pharaoh small talk there is an algorithm identical to Haskell's a knit and is it common Lisp? Yeah. Common Lisp calls it but last. Do you know the name? It nails stuff like this and nails stuff like this. All but last.

Starting point is 00:15:44 All but last implemented on sequenceable collection. OK. And they even give us a little snippet using the hash paren for a list. And so, yes, I should add this. And you can note that, you know, on the Hoogl Translate, the ferro, there's only four algorithms from ferro small talk that I have here and that's because the stuff that mostly or half the stuff that exists in here was what I needed when I was interested in looking into something like two or two and a half years ago and I think that was like roughly 300 but we've I think since a week or week and a half ago it's doubled to like 600

Starting point is 00:16:22 algorithms across languages because people have been open in pr's and so I imagine at some point it'll it'll get to like a thousand or two thousand but yes init butlast all but last so what is the correct name for well in common list but last can take an optional integral parameter so you can do but last n right but last is is but last one. But if you give it an integer you can actually chop off the last end. Right. And this is this is something that will also be built in because right now for the most part things just map to one algorithm ID. But over time things with optional parameters or default parameters like closure a lot of the times their partition algorithm I think technically maps to a couple different algorithms and with the type signature

Starting point is 00:17:12 work you'll be able to basically map a single algorithm to multiple semantics so it'll show up in every place. Common Lisp also somewhat confuses you confusingly. Well, it has n but last because it uses n in the name of an algorithm to mean a destructive algorithm. N reduce is another one. I think n reduce or n, I can't remember, but n append, I think, n concat. These are algorithms that change their data structures in place rather than returning a new one so but last return to a new list with everything but the last element and but last does not as you might think have anything to do with selecting n elements but it changes the

Starting point is 00:17:58 list to chop off the last element interesting that's what the prefix n stands for in common list that is, that is the conventional, that is the convention, yes. And I think n-concat is another example. So you give it two lists. Concat just concatenates the list, returns the concatenation. n-concat sets the tail of the first list to be the second list. So it effectively destroys the first list or changes it in place. Do you know why what the history but behind choosing prefix N?

Starting point is 00:18:34 I don't know why N is used in that way. Certainly those variations exist for efficiency. Right. Right. Yeah, that totally makes sense. And that is, I mean, that is the part of the issue of a site like this is that you want there to be like a clear, you know, almost strict how semantic, how you group algorithms by semantics.

Starting point is 00:18:58 But when you get to a language like C++ where we have a smattering of in-place versus non-in-place algorithms that are inconsistently named with the underscore copy suffix is at a certain level, semantically they are the same. But, and it's also like, I've been thinking, I haven't added, I think someone actually the other day added reverse and reversed from Python for the reverse algorithm. But those are an in place versus one

Starting point is 00:19:26 that's returning a copy. And you know, how do you how do you deal with that? Do you split that because the type signature is different, right? One returns void and one returns a new copy list, the new collection. Yeah. And or do you come up with some way of I type sort and it shows you two different columns of like in place or out of place. I haven't really figured that out yet. But I mean, that's one of my, I mean, I've complained about that in the past too, is that, you know, sort is in place and transform is out of place. But like, why did we not consistently like have a naming convention where it was either underscore copy for anything. So it should really be transform copy.

Starting point is 00:20:05 And we're kind of just inconsistent with that, because you might think that like, oh, right, this one doesn't have a copy. Therefore, it must be in place. And that's not always the case, right? Yeah, interesting, though. Commonless calls it n. And array languages don't care.

Starting point is 00:20:21 I mean, I think sometimes there's some internal optimizations based on they can reason that there's only one thing happening. So they show everything in place, but that's not something the user necessarily necessarily is aware of. Yeah, and concat is the interesting case, or an interesting case in common list, because so regular concat. So both and concat and concat don't have to change the second argument, right?

Starting point is 00:20:49 But in the case of concat, you need to create as many cons cells as are in the first list. You basically create a new first list with the second list as its tail. The second list can be the tail of something and not know about it. So the second list, because a list is really just a pointer to a con cell, which is designated ahead of a list, right? And the rest just flow after it. So the non-destructive version of concat allocates N new concels for the first list and then appends the second list just with a single operation by sending it to the tail. But the n concat doesn't need to allocate.

Starting point is 00:21:37 It steps down the first list and when it hits the end it just sets the tail to the second list. So it turns the first list into the longer list while still leaving the second list alone. And just to clarify, because concat can mean different things. I mean, sure enough, I'm sure if I type concat here, you can see there's multiple colors. It sounded like concat is just for two different lists? It's appending two lists.

Starting point is 00:22:01 Gotcha. Okay. And returning a third list. Yeah. Whereas my guess is across these languages, one of them is like a flatten where it can take any list of lists and flatten them all. Interesting. Yeah. So that makes complete sense. Yeah. Can cat might be able to take variadic arguments in some lists. Well, you know, for know for simplicity sake let's just consider the two argument version. Yeah Interesting and I'm trying to think like in C++ basically Well, there is no really equivalent. I mean in ranges land they do have join now, but if you're pre ranges

Starting point is 00:22:43 It might be that right Like if you have like a vector effect, like that stuff is statically typed, right? So you can't turn that into like a one dimensional vector. So you're always going to be doing some kind of copy, out of place copy. Well, it's like a splice. Splice is in unordered map or map. I'm going to say one of those. There's an operation or is it called merge? I've heard of merge. I definitely haven't heard of splice. Oh, splice is an operation on std list.

Starting point is 00:23:18 Yeah. Okay. So splice is the list version. Transfer elements from one list to another. No elements are copied or moved. Only the internal pointers of the list nodes. Transfer elements from one list to another. No elements are copied or moved. Only the internal pointers of the list nodes are repointed. Yeah, that's the same. OK. Which makes sense, right? Because that's common LISPs.

Starting point is 00:23:36 I assume implementation is some kind of pointer-chasing thing. Interesting. All right. And merge actually also exists. And I think merge exists on the list and also on unordered set, unordered map, forward list. So it's a very similar sort of idea. And there's a freestanding, I think, merge algorithm as well, I think.

Starting point is 00:23:57 Well, yeah, but that's something different. Right, the merge algorithm is taking two sorted sequences and outputting them and zipping them together into a third. Again, zip is an overloaded term. But yeah. Right. Anyway, so yeah, this I guess this spun out of what to call the for each but last and then that made me think, oh, I have a tool for this.

Starting point is 00:24:22 Not fully not fully built out because it doesn't have any of the names. You have a tool that doesn't quite know this yet, but they will. Yeah, yeah, over time. And that's the thing is that my hope over time is that languages that I have zero familiarity with that potentially have extents of libraries, like D is one of those languages that has a lot of interesting names. And Swift, actually, the other day, I think I knew this

Starting point is 00:24:47 and I had forgotten it, that Clojure calls their scan, which is overwhelmingly the name chosen for the incremental fold that's opening all the running results, Cotton calls it running reduce and running fold. Clojure calls it reduce for the reduction and reductions for the scan. Oh, okay. Okay. Which is kind of nice. It's one of the few algorithms or few languages that shows that

Starting point is 00:25:12 they're related in the naming of it. And Swift also, I don't know if they borrowed it from Closure, but Swift in their algorithm library, which is an extension of what's provided in the standard library, they call it reductions as well, which I thought closure was the only one. But yeah, it's neat when you stumble on different names but that are arguably better names. I mean, it's- Right, right.

Starting point is 00:25:40 I'm reminded of Tails. I think in Haskell it's called Tails. It might also be called tails in Lisp, although maybe not. Cause that's like- Which is like prefixes. Yeah, they have a knit and a knit, and tail and tails.

Starting point is 00:25:53 Yeah. That's like- Yeah. I forget exactly what it is, but it's something like all of the, so the tail is the Lisp minus the head, and the, or the cuder in lisp and the cdr and the cdr.

Starting point is 00:26:09 Yeah. The lisp folks understand what you just said. Or anybody that's read SICP probably understood as well. Yeah, I think actually BQN, you know, my favorite language of the day, they have two versions of that. And I believe they're called prefixes and suffixes. Yeah, prefixes and suffixes, which arguably is a kind of nicer name in my head

Starting point is 00:26:36 because scan, a lot of the times, they're known as prefix scans, or they involve the word prefix because it is, different prefixes that you're computing the reductions of the prefixes yeah yeah I mean it is very cool I love when people add stuff to the site because then it shows that you know sometimes like filter and even arguably filter you know there's there's been conversations about are you filtering in are you filtering out is that the the best like but conversations about are you filtering in, are you filtering out?

Starting point is 00:27:10 Is that the best? But some of them, you know, filter, map are almost unanimously, unfortunately not for C++, but almost unanimously are agreed upon. And then there's other ones where it's like there's maybe three languages and two of them are like C++ and CUDA C++, which really should count as one. And then every language has a different name for it, which is this is like the vocabulary that we use to write. It's a great, the site is good for naming, you know, as well as, as well as exploring languages and by exploring languages, we get the idea of what, what is the common name if there is one or what are some of the names with some of the ideas around this you know also because things school different names have slightly different signatures different languages right so you get to explore the design space around the algorithm. Yeah I mean it's kind of hurts my soul a little bit though when you when you see the.

Starting point is 00:28:04 when you when you see the the rainbow end after like seven languages and then everything else is white because there's zero agreement on the naming of it. One of them is is prior adjacent transform which we now have in C++23. It's called prior in Q, zip with next in scholar or Kotlin I can't remember which one. But like every language has a different name for that. And to me, it's a fundamental algorithm. I don't know how many times, I think in your talks, in Ivan Kuchik's, I'm going to pronounce that wrong. It's been a while since I've met him. But anyways, multiple talks, I've seen the zip-tail trick.

Starting point is 00:28:39 I think I've seen it in Odin Holmes, maybe a given talk as well. That's this pattern is a, you know, adjacent transform. And it's a real shame in my opinion that it's not as like popular as map and filter and reduce like it should be up there. Anyways, we'll wrap up. Who will translate? Be sure to check the show notes either in your podcast app or at adspthepodcast.com for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions.

Starting point is 00:29:08 Thanks for listening, we hope you enjoyed, and have a great day. I am the Antiprase. Um. Um. Um. Um.

Your Ad Here

Algorithms + Data Structures = Programs - Episode 230: Hoogle Translate

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.