Signals and Threads - Building a UI Framework with Ty Overby

Episode Date: October 6, 2021

Ty Overby is a programmer in Jane Street’s web platform group where he works on Bonsai, our OCaml library for building interactive browser-based UI. In this episode, Ty and Ron consider the functional approach to building user interfaces. They also discuss Ty’s programming roots in Neopets, what development features they crave on the web, the unfairly maligned CSS, and why Excel is “arguably the greatest programming language ever developed.”

You can find the transcript for this episode on our website.

Some links to topics that came up in the discussion:
Jane Street’s Bonsai library
The 3D design system OpenSCAD
Matt Keeter’s libfive design tools
Try .NET in-browser repl
Jane Street’s Incr_dom library
The Elm Architecture “pattern for architecting interactive programs”
React JavaScript library
The Houdini proposal
Svelte UI toolkit

Transcript
Starting point is 00:00:00 Welcome to Signals and Threads, in-depth conversations about every layer of the tech stack from Jane Street. I'm Ron Minsky. It's my pleasure to be talking today with Ty Overby. Ty is a software engineer who's been at Jane Street since 2018, and his work here has been focused on our tools for building web applications, and specifically on a library he designed called Bonsai. We're going to talk a lot about the ideas behind Bonsai and about UI development more generally. But first, Ty, can you tell us a bit more about your background
Starting point is 00:00:33 before Jane Street? Yeah, so I think my first experience with web development was probably when I was in middle school writing pet pages for my Neopets. After that, I've been doing hobby programming ever since I was a kid, went to the University of Washington, and worked on compilers for a while at Microsoft. I decided at some point to move to New York, so Jane Street seemed like a great place to continue my love for programming languages, and somehow managed to get on a team doing some programming languages and web development stuff, which I think is pretty exciting. So you got to combine two different loves. Yeah. You talked a little bit about your hobby programming. I know that you're someone who
Starting point is 00:01:15 really spends time and really enjoys digging into all sorts of things in your spare time. Can you tell us a little bit more about what kind of things you've done on the side? Yeah. So I guess a running theme of mine is that I wind up taking the hammer that is compiler development and bashing everything that I can get my hands on with it. I think my biggest, longest running project is probably a computer-aided design library suite of software where in order to describe the shapes for your object,
Starting point is 00:01:41 you write programs, little functions that go from some point in space to the distance to the outline of your shape. Then there's all sorts of fun combinators that you can write on top of this and all sorts of fun optimizations that you can do if you somewhat limit the capabilities of the person using it, taking compiler optimizations to the computer-aided design workflow. So you're diving right into the abstractions in the math, but to step back for a second, the thing you're talking about, as I understand it, is what's called programmatic CAD,
Starting point is 00:02:13 which is instead of having something that's the moral descendant of MacPaint or something, where you go in and interactively construct some 3D object that's going to then be printed in a 3D printer or constructed in some other way, you write a program to do that. And there's actually lots of stuff out there that lets you do this already. One of them that I've played around with myself and with my kids is called OpenSCAD. Yeah, so OpenSCAD is programmatic CAD. I believe it operates on a boundary representation. So this is very similar to what you would get in CAD software or Blender or any of the other 3D modeling softwares, which is that the fundamental primitive is the points or a face, whereas the style that I'm particularly interested in as a functional programmer is called
Starting point is 00:02:58 functional representation or F-rep. And these two different styles of representing shapes lead to some really interesting trade-offs. Some things that are really easy in B-reps, or boundary representations, are very hard in F-reps, and vice versa. Right. And just to see if I have the description right, an F-rep is something where instead of describing explicitly, through a collection of polygons or whatever, the boundary of some surface, you write down a function, and then the boundary is something like the zeros of that function. So implicitly, you're describing the boundary by just writing down what the function is. They are also called implicit surfaces.
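As a rough illustration of the idea of a shape as a function from a point to a distance, here is a minimal 2D sketch in Python. It is not taken from libfive, OpenSCAD, or Ty's own library; the names `circle`, `union`, `intersection`, `negate`, `cut`, and `inside` are hypothetical. It assumes the sign convention in which the function is negative inside the shape, zero on the boundary, and positive outside. Under that convention union is the pointwise minimum and intersection the pointwise maximum; with the opposite sign convention the two swap.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]
# A shape is just a function from a point to a signed distance.
# Assumed convention: < 0 inside, 0 on the boundary, > 0 outside.
Shape = Callable[[Point], float]


def circle(cx: float, cy: float, r: float) -> Shape:
    """Signed distance to a circle of radius r centred at (cx, cy)."""
    return lambda p: math.hypot(p[0] - cx, p[1] - cy) - r


def union(a: Shape, b: Shape) -> Shape:
    # Inside the union means inside either shape: pointwise minimum.
    return lambda p: min(a(p), b(p))


def intersection(a: Shape, b: Shape) -> Shape:
    # Inside the intersection means inside both shapes: pointwise maximum.
    return lambda p: max(a(p), b(p))


def negate(a: Shape) -> Shape:
    # Swap inside and outside by flipping the sign.
    return lambda p: -a(p)


def cut(a: Shape, b: Shape) -> Shape:
    # "a with b removed" is a intersected with the complement of b.
    return intersection(a, negate(b))


def inside(shape: Shape, p: Point) -> bool:
    return shape(p) < 0


left = circle(0.0, 0.0, 1.0)
right = circle(1.5, 0.0, 1.0)
both = union(left, right)
lens = intersection(left, right)
crescent = cut(left, right)
```

Nothing here fixes a resolution: the combined shapes are still plain functions, and sampling them on a grid (or meshing them) is a separate, later step.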
Starting point is 00:03:39 Why is it better to use implicit surfaces versus a boundary representation? You said there are some things that are easier and some things that are harder. Can you give me a sense, as someone who might use this, why I might prefer the implicit representation? Oh, I think a lot of the combinators are trivial to express. So things like taking two shapes and finding their union is, well, the two shapes, they're two functions. So you apply them both, and then you take the maximum of whichever of the two points. And the function that combines these now describes the shape that is the union of those two shapes. These functions return numbers that represent the distance to the edge. So zero is on the edge. Negative numbers are inside and positive numbers are outside. So union would
Starting point is 00:04:20 be max, intersection would be min, and neg, or taking a shape and making the outside inside and the inside outside is just literally negation. And you can use negation and union to get cuts. And also something that you can't do in boundary representation stuff is just write some code. And I really enjoy just being able to think of a function and see what kind of shape it would create. Right. Although, I mean, you can write code. OpenSCAD lets you write code that generates
Starting point is 00:04:48 boundary representations and manipulates those boundary representations. So can't you do it programmatically in both styles? Probably. I think some of the shapes that I'm interested in are things like the interference pattern between two sine waves, each of them describing distances to the outside of a shape and cutting that off via a circle. And the cutting off via a circle, OpenSCAD can do just fine. But describing a shape implicitly is something that implicit surfaces do pretty well. And the type of programming that you do is very different. And I can immediately see how your functional programming background
Starting point is 00:05:22 leaks into the way you talk about CAD, right? I think people who are doing CAD don't normally talk about combinators. They might talk about the operations you use for combining shapes. And I guess those two are one and the same in the way that you're describing it. Yeah. As I understand it, there are other things that are supposed to be good about implicit surface style representations. Like I've used OpenSCAD a lot. And one thing that's really annoying about it is you have to decide early on when you're constructing the boundaries, how detailed the boundary representation is going to be. So there's essentially almost a tessellation of the surface or an approximation of the
Starting point is 00:05:55 surface by a bunch of little tiny polygons. And then when you do things like merge things together, you can get all sorts of weird artifacts that come from exactly the kind of beats in the frequencies of the different polygonalizations of the two shapes and get all sorts of weird things that happen on the boundaries. And I think the implicit surface approach gives you something where while you're doing all of your transformations, you have everything in some sense at full resolution. And then finally, when you render it, you get to decide how you want to do the final display and how detailed that needs to be. Yeah, that's exactly right. Sadly, one of the downsides is that doing that
Starting point is 00:06:30 final transformation from F-rep to B-rep, which is useful if you want to 3D print something or pull it into a game engine. That's really hard. As far as I can tell, no one has done it perfectly yet. There's an amazing project by Matt Keeter called libfive, which I've built a ton of stuff on top of, and I've loved using that, and it's very good, but it still has issues with weird edge cases. In this case, literally edge cases. You can imagine interesting things happening to a shape
Starting point is 00:07:02 as points go to infinity or negative infinity. And unless you're very careful, you can easily create topologies that are hard or impossible to render into a regular 3D file. Right. And I guess if you have points that go off to infinity, it's probably hard to find a 3D printer big enough to actually render the resulting surface. It's true. Okay. So that's a fun kind of hobby. You said you did a bunch of web development
Starting point is 00:07:27 on the side as part of your personal work. Have you done web development in a professional context as well before your work here? Yeah, so I helped out on a project at Microsoft building an in-browser REPL. I don't know if any of my code is still alive in there, but I think that's it, actually. I think that was the only brush that I had with web dev professionally before Jane Street.
Starting point is 00:07:50 Got it. And then somehow, despite the lack of any professional background, nonetheless, you ended up building the library that is now Jane Street's default library for building user interfaces, which is called Bonsai. So let's talk about that a little bit. So when you arrived here, we already had some foundation for doing web development. There was a library called Incr_dom, and this was based on a compiler component, which we still use, called Js_of_ocaml. So Js_of_ocaml is a backend that takes OCaml and converts it into JavaScript, doing actually a surprisingly good job of respecting OCaml semantics. And we had successfully built a number of user interfaces using Incr_dom and Js_of_ocaml, and also run into some trouble with it. Maybe you can tell
Starting point is 00:08:33 us a little bit about what Incr_dom looked like to you when you arrived and explain how it works for someone who's never touched it before. For people that are familiar with the Elm architecture, Incr_dom would seem quite natural. Let's back up and talk about the Elm architecture for a little bit. Elm is a programming language built entirely for developing user interfaces in the browser. Certainly its standard library is very focused on this. And it also comes with a design pattern. And that design pattern is separating the logic of your application into a view function that takes inputs and the model for an application and produces a pure functional representation of its view. And then also an apply action function, which can take an existing model for the application and
Starting point is 00:09:27 an action that is triggered either by the user interacting with the page in some way or by interactions with a WebSocket connection or other I/O. And it'll take this action and the current model and produce a new model. And then the render function gets called again with the new model. And hopefully the changes that you want to see represented on the screen make their way there. Great. And so this is in some sense, just another spin on the classic model view controller pattern and model is your representation of data view is what you're going to display. And the controller is somehow how you interact with the outside world and do transformations
Starting point is 00:10:06 to your model, right? And I guess this whole thing will sound very familiar to anyone who's used React, where React is also based on a notion of a pure functional representation of the DOM. You have some just pure data structure that represents the DOM you wish you had, and it gets slammed into the actual messy, complicated tree of objects that is the actual DOM in the browser in a hopefully efficient way. Can you say maybe a word about how Elm architecture differs from what a React programmer might be used to? On the surface, they appear to be very similar. The one thing that React users typically reach for is a state management library.
Starting point is 00:10:43 And this is changing over time. I think it's a bit less common now that React has adopted some of this into the library itself. But there were some very popular libraries like Redux or MobX that did the pure functional model updating that Elm has kind of built into its framework. Prior to this, the way that you did state in React was by putting state in your components. And this is handy. It's private. You don't need the rest of your application to know about this state. But it also has some problems, like that if a component is removed from the component tree and then added back in again, the state is gone. So it's a bit more like having a mutable variable inside of your
Starting point is 00:11:26 component tree. And so people reached for state management solutions that provided the model to action to model transformation that Elm has. And just to maybe say it in a slightly different way, the Elm approach is to some degree, like a good first order approximation is your UI is a function and the key function is a function from model to view. And it really is just a pure function where the model is just simple data and the view is just simple data and in fact, immutable data at that. And then there's this extra action step where inside of the view, there's some essentially description of how actions can be used to update the model, say when you click a button. And that leads to an event which asynchronously will later then cause the model to be updated. So you don't get this whole encapsulation of hiding state inside of components.
Starting point is 00:12:14 Instead, you have this pure and in some sense more totalistic, like, no, just the whole UI is a function. And then we compose functions in the ordinary way that we compose functions in the context of functional programming. Yeah, exactly. And I think a major selling point for libraries like this is your application as a function, where you take the state and you transform it into a view. But really, I think that that's the least interesting part.
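To make the model, action, and view split concrete, here is a minimal sketch of the pattern in Python. It is not Elm, Incr_dom, or Bonsai code; the `Model`, `Increment`, `Reset`, `apply_action`, `view`, and `run` names are hypothetical, and a plain dictionary stands in for a virtual DOM node.

```python
from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class Model:
    # The whole application state, as immutable data.
    count: int = 0


@dataclass(frozen=True)
class Increment:
    by: int = 1


@dataclass(frozen=True)
class Reset:
    pass


Action = Union[Increment, Reset]


def apply_action(model: Model, action: Action) -> Model:
    """Pure state transition: (model, action) -> new model."""
    if isinstance(action, Increment):
        return replace(model, count=model.count + action.by)
    if isinstance(action, Reset):
        return Model()
    raise ValueError(f"unknown action: {action!r}")


def view(model: Model) -> dict:
    """Pure description of the screen; handlers are just actions to raise."""
    return {
        "tag": "div",
        "children": [
            {"tag": "span", "text": f"Count: {model.count}"},
            {"tag": "button", "text": "+1", "on_click": Increment(1)},
            {"tag": "button", "text": "reset", "on_click": Reset()},
        ],
    }


def run(model: Model, actions: list) -> tuple:
    """Feed a sequence of actions through the loop; return final model and view."""
    for action in actions:
        model = apply_action(model, action)
    return model, view(model)


final_model, screen = run(Model(), [Increment(1), Increment(2), Reset(), Increment(5)])
```

Because `view` never mutates anything and handlers only name an action, every change to the state has to pass through `apply_action`.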
Starting point is 00:12:37 I think the way that you get applications that people want to use is by having really well-defined and edge case-free interactions. And that means transitioning from one model to the next in a way that makes sense to your users. And I think that modeling things as a state machine like Elm does and like some of these React state managers is a great way to do that. So that's like a new piece of jargon, a state machine. When you say it's modeled as a state machine, what does that concretely mean? Yeah, so I guess the abstract notion of a state machine is you have some state,
Starting point is 00:13:11 and then you have actions that express a desire to change the state, and another function that takes an action and state and produces a new state. This separation allows programs to more accurately read when interesting things happen, as opposed to direct mutation. So you might notice a few actions coming in all together, and you could debounce them to collapse multiple actions into just a single one. You could imagine a user accidentally hitting a button twice or typing something a bit too fast, and you might want to put some guardrails on that and make sure that they aren't doing anything accidentally. This is the type of thing where if in an action handler
Starting point is 00:13:50 you were directly mutating some state, it might be a bit harder to do this. But if you're representing everything as transitions between states that are described by actions that are raised by events, then this kind of intentional transition becomes a lot more possible. Which is to say, you might think about something like this as being primarily about two different types of data, the model that represents the state of your application and the view, which is how you present it to the user. But really there's three, there's the model and
Starting point is 00:14:19 there's the view. And there's also the action, which is the thing that represents transitions, ways in which you change your model. And you have some kind of explicit way of talking about the way in which transitions cause the state to update. And it's that structure that you're referring to as a state machine. Yeah, exactly. Great. So how does this Elm model that you described, how does that relate to what Incr_dom was like? Incr_dom took the pattern that Elm popularized and left it relatively unchanged, with the exception of a single addition, which was combining it with a library that we have developed here internally
Starting point is 00:14:57 called Incremental. And the Incremental library is effectively a smart memoization technique, where if you write a function using the incremental library and you run it multiple times where the inputs are changing only slightly between runs, then you could expect some better performance if the output that you're producing is changing slightly in response to inputs changing slightly. Right. and this goes back to a general fact I've long noted about UI frameworks, which is that hidden inside of every UI framework is some kind of incrementalization framework as well. Because you basically need this incrementalization for performance reasons everywhere. UIs in general, what are they doing? They are computing something to display to the user and those things need to
Starting point is 00:15:46 change quickly. And one of the things that can help them change quickly is they don't change all at once. Little bits of them change. You go and click somewhere and some small part of what you're seeing changes, not everything in the entire view being transformed all at once. And so you need some way of computationally taking advantage of this. Maybe we should mostly leave off those gory details here as well. In some sense, the difference between Elm and Incr_dom is that Elm has a way of doing incrementalization and Incr_dom has Incremental, which is a more powerful way of incrementalizing, meaning you can express more complicated algorithms incrementally. But I think that's
Starting point is 00:16:19 kind of mostly not the thing that's interesting about the gap between Incr_dom and Bonsai. So maybe we shouldn't talk too much about Incremental, but I'm curious what you felt was problematic about Incr_dom. The story you've told so far is Elm is this nice way of describing things. And then Incr_dom takes the Elm model and adds this more powerful incrementalization framework. That sounds great. What doesn't work that well about the model? One of the things that I think Incr_dom does differently than your average React application is that it also incrementally computes the state machine transition functions, these apply action. And this is really nice. It means that you can thread values
Starting point is 00:16:57 throughout your application and make use of them in the state machine transition functions, just as you would threading values through your view function. But it does mean that combining components becomes a lot harder, because you've not only got to take the views from multiple components and stitch them together into one parent view, but also you need to compose the apply actions for child components into this super component. And the way that this is done with Incr_dom is very boilerplate heavy. The downside of having this library that expresses a lot more of the power of incremental computing
Starting point is 00:17:40 means that these compositions were quite verbose. So this whole point about the action application function itself being computed incrementally, on the face of it seems very abstract. Can you give an example of where that gives you something useful? We have this library that implements a UI component for very large tables. And one of the things about this table that's quite unique is that it can sort and filter and display an absolutely enormous amount of data, like hundreds of thousands of rows. And really a lot of the magic there is in the view where we only actually show a very small subset to the DOM. But during that sorting and
Starting point is 00:18:18 filtering, we produce a lot of intermediate data structures that could be very useful in downstream components. You might imagine that the sorted and filtered table might expose internal structures that allow you to very efficiently implement row selection, where you want to be able to navigate through the table using a keyboard. In a naive implementation, they might have to sort and filter all of that data again. But because we've done it incrementally, and because the apply action function is also computed incrementally, we can share that data a lot easier. It sounds like the problem you see that comes out of all of this is that it makes the system,
Starting point is 00:18:57 in some sense, less composable. It's harder, or at least requires more boilerplate, to take little pieces of a UI and combine them together to make larger UIs. Do you think that problem is worse for Incr_dom than it is for Elm? Or do you see the same problem in both approaches? I see the same problem in both approaches, but I think that the boilerplate that you get from Elm is quite minimal. And for Incr_dom, it was absolutely not.
Starting point is 00:19:21 I believe that when I made the simplest possible combinator for two Incr_dom components, which was just combine their models, or rather take a supermodel that splits into submodels and an apply action that would dispatch to one or the other, and a view that combined both subviews. This was something like 50 to 70 lines of code. And what that meant was that people were making monoliths where they didn't want to break their code up into multiple pieces because they knew that they would have to do this recombining at some level. And that recombining was error prone and it was tedious.
Starting point is 00:19:59 Maybe in the opposite order, it was tedious and therefore error prone. Great. So this is a problem that you wanted to solve, right? I think it seemed clear to you that it's not great to not give people a way of building little reusable components. And if everyone's building monolithic things, they're not going to throw off small, useful, reusable bits. And you built this library called Bonsai. How does Bonsai help? Bonsai started out basically as a single type definition that encompassed what I saw as the right way to build an Incr_dom component. This type signature was almost a reification of a programming pattern. that can incrementally compute a view, arbitrary extra data that might be useful
Starting point is 00:20:45 to downstream components, like we talked about earlier, and an apply action. And this function is given a model to act on and a function for raising events so that it can perform the state machine transition. And so Bonsai really started out as that function type. So just a bundle, you just take together the function for computing the view from the model, the function for updating the action, the way of throwing actions out into the world,
Starting point is 00:21:15 just take all of those things and pack them together. And that's what a Bonsai component was. Exactly. Yeah. And I was able to write some pretty nice combinators on top of this. So things like given two components, give me a new one that combines their models, combines their apply actions in the trivial manner. And then you could specify how you wanted the views combined as well. But fairly early on, I recognized that the fact that UI components were computing, well, the fact that we called them UI components at all was like a bit of a misnomer. Really, they're something producing. And the fact that they happened to be views most of the time was a bit of an accident, really. So among the first things that I did was remove the hard-coded view from that type.
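The "bundle plus combinator" idea can be sketched as follows. This is a non-incremental Python illustration, not the Bonsai or Incr_dom type: `Component`, `combine`, `counter`, and the `"left"`/`"right"` action tags are all hypothetical names, and `inject` is assumed to be a function that wraps a component's own action into whatever the top-level action type is.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Component:
    initial_model: Any
    # (model, action) -> new model
    apply_action: Callable[[Any, Any], Any]
    # (model, inject) -> result, where inject wraps a local action
    view: Callable[[Any, Callable[[Any], Any]], Any]


def combine(left: Component, right: Component,
            combine_views: Callable[[Any, Any], Any]) -> Component:
    """Pair two components: pair the models, tag the actions, merge the views."""

    def apply_action(model, action):
        side, inner = action
        left_model, right_model = model
        if side == "left":
            return (left.apply_action(left_model, inner), right_model)
        if side == "right":
            return (left_model, right.apply_action(right_model, inner))
        raise ValueError(f"unknown side: {side!r}")

    def view(model, inject):
        left_model, right_model = model
        # Each child sees an inject that tags its actions before passing them up.
        left_view = left.view(left_model, lambda a: inject(("left", a)))
        right_view = right.view(right_model, lambda a: inject(("right", a)))
        return combine_views(left_view, right_view)

    return Component((left.initial_model, right.initial_model), apply_action, view)


def counter(label: str) -> Component:
    """A tiny example component: the model is an int, the action is a delta."""
    return Component(
        initial_model=0,
        apply_action=lambda model, delta: model + delta,
        view=lambda model, inject: {"label": f"{label}: {model}", "on_click": inject(1)},
    )


pair = combine(counter("a"), counter("b"), lambda l, r: [l, r])
```

At the top level, `inject` can simply be the identity function, so the value stored in `on_click` is exactly the action to feed back into `pair.apply_action`.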
Starting point is 00:22:11 And so now instead of a UI component that produces a view and maybe some extra data, it was just, here is a component, not necessarily a UI component, that produced some result incrementally. And frequently that result type would be a view, but it doesn't have to be. I think more than half of the components that I write for a web application won't be manipulating views at all. They will be state machines that produce some results that I think might be interesting. And maybe they get transformed into a view later on the line, or maybe they get passed in as inputs to other components. Right. And this is a real break with the style of components that you see in systems like React, right? Every React component, by its nature, is producing some chunk of DOM. And here, Bonsai is a kind and its compositions in the form of a directed acyclic graph rather than as a tree.
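A small sketch may help show what a component that produces a result other than a view looks like, and how one result can feed several consumers. This is plain Python with hypothetical names (`selection_step`, `table_view`, `status_view`, `app`), not Bonsai code, and it does nothing incrementally. It assumes a non-empty list of rows.

```python
from typing import List


def selection_step(selected: int, action: str, row_count: int) -> int:
    """A state machine with no view: it only tracks which row is selected."""
    if action == "down":
        return min(selected + 1, row_count - 1)
    if action == "up":
        return max(selected - 1, 0)
    return selected


def table_view(rows: List[str], selected: int) -> List[str]:
    # One consumer of the selection: mark the selected row.
    return [("> " if i == selected else "  ") + row for i, row in enumerate(rows)]


def status_view(rows: List[str], selected: int) -> str:
    # A second, sibling consumer of the same value.
    return f"row {selected + 1} of {len(rows)}: {rows[selected]}"


def app(rows: List[str], selected: int) -> dict:
    # The selection is computed once and flows into both consumers directly,
    # so the dependency structure is a graph rather than a tree of views.
    return {"table": table_view(rows, selected), "status": status_view(rows, selected)}


rows = ["a", "b", "c"]
selected = 0
for action in ["down", "down", "down", "up"]:
    selected = selection_step(selected, action, len(rows))
screen = app(rows, selected)
```

Neither view owns the selection state or needs to be the parent of the other; each only depends on the value.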
Starting point is 00:23:12 And this is the thing that allows you to get components that produce values to thread them in to inputs for other components. Whereas in a React-style UI, the value that you're computing incrementally must be a view. So if you want to communicate with other components, you're almost forced to walk up the tree, pass something on to its parent, and then that parent is responsible for shuffling down whatever value you've computed. So that means that in React, when you break down your components, you essentially have to do it by slicing it along the structure of the generated DOM. That's right. The structure of these components wind up being very closely related to the structure of the view that you
Starting point is 00:23:57 want to compute in the end. There are workarounds for this. One of the ones that I'm thinking of now is that a parent component might produce a function that gets passed on to its children, and that function produces some more view. And this is a good way of kind of doing a bit of a decoupling, but, well, it doesn't really match my aesthetics for how I want to be programming. The tree style of building components in React also doesn't let you move values from one child's component of a node to another sibling. What's the case where you'd want to do that? One motivating example might be a text box and a button. And you want the text box to have some text in it before the button is clickable. Now, in a tree-motivated UI component, you might have a component for your text box, a component for your button, a callback that you pass to the text box to be notified when
Starting point is 00:24:52 there's text in it, and then a value passed into the button that says if it's disabled or not. Whereas with a graph-oriented structure, you could have the value of the text for the text box flow into the button, and the button could check to see if its invariants were upheld. And the button doesn't need to be a child of the text box. It just needs to depend on the value that it has produced. So one thing that's changed from the original view is that the components aren't specifically view or virtual DOM producing components. They're more general components. How else does a Bonsai type differ from what you might find in some other UI component framework? Well, when you're comparing it to Incr_dom or Elm, one of the major differences is that
Starting point is 00:25:41 components don't actually expose what model they have, or really even if they have a model, and they don't expose what their action type is either. So really all that a Bonsai.t has, which in Bonsai, the primary type is called a computation, it's only parameterized over what value it produces. And any state machine and transitions that it has is local to it. Part of what's going on here is you're basically giving a good way of compartmentalizing the model and the action. So essentially, there's a part of this which is like a function, which takes some input and produces some output, and part of this which is a state machine. And it gives you a way of composing together functions in a way that lets you combine state machines as well. Is that a fair
Starting point is 00:26:30 description? Yeah, depending on who I'm talking to, I'll describe Bonsai as a framework for building incremental state machines. So given that description, how general do you think Bonsai is? Is it useful outside of building web UIs? Is it useful outside of UI contexts entirely? I want to say yes, but I've been looking for an alternative example for a very long time and haven't been able to really come up with anything. Is that true for both the question of user interfaces that are not web user interfaces? Oh, no. I mean, I think Bonsai is broadly applicable to user interfaces, as long as you have some
Starting point is 00:27:06 abstraction that exposes the desired view for your UI as a pure value. That is one of the downsides of Bonsai, is that effectively everything inside of it has to be pure. It becomes very hard to reason about when you have mutable values. So it might be hard to build a Bonsai UI on top of platform du jour UI component library, which is primarily mutation-based. But it should be possible if someone wrote a virtual DOM style abstraction on top of it. Right. And this restriction, I guess, is essentially the same restriction that systems like React
Starting point is 00:27:44 and Elm have. And Elm, I guess, at the language level enforces purity, and React, being in JavaScript, doesn't. And similarly with Bonsai, because it's in OCaml, which is a language that supports mutation without restricting it, like at the level of types. Also, it's the case that this is just a mistake you can make. If you use Bonsai and you use side effects in the middle, it's going to be confusing, and that's kind of that. So early on, you said that a repeated pattern in your way of thinking about the work you do is ways to turn the problems you solve into compiler problems, into programming language problems. And we haven't really talked about how that affects Bonsai. What is the way in which the
Starting point is 00:28:21 Bonsai work also looks like a compiler project? I've been describing Bonsai as an incrementalized directed acyclic graph. And one of the invariants that we enforce about this directed acyclic graph at the type level is that it can't be dynamic. That is to say, once you've constructed the super component that represents your entire application, all possible combinations of components are known. And this lets us effectively do a number of compiler optimizations. If you look at this DAG as a program, instead of as a forest of nodes communicating with one another, then you can employ some fairly rudimentary compiler optimizations
Starting point is 00:29:07 like constant folding or collapsing multiple nodes into one if you detect that one of them is going to run really fast and you would rather not have the overhead of incrementalization. In some cases, we can even transform the algorithms that are being used on a node by node basis. So we talked earlier a bit about how the library Incremental allows you to express a lot of powerful algorithms incrementally. And one of the places where it really shines is when it deals with map-like data structures. And we are able to pick and choose from a number of these different algorithms
Starting point is 00:29:45 at the point where a Bonsai application is first evaluated to pick the one that is the most optimal for that particular node. Right. And you're talking about this as being compiler optimizations, but it's not like you're hacking the OCaml compiler to do something. Where in the process of doing this does the compilation-like phase fit in? Yeah, so it happens exactly once, and that is when you start the Bonsai application. You give a Bonsai.t to the functional equivalent of a main method, and it'll take that, it'll crunch it, it'll optimize it, and then it'll start running. After it starts running, we don't touch it anymore. But at that point where the user hands off
Starting point is 00:30:28 the representation of their application, we're able to fully traverse the entire component graph and see where we want to make changes. Right, so when someone is building up a Bonsai computation, the thing they're immediately building is something that's closer to the abstract syntax tree of an expression that represents that computation. And then you're running a compilation step to convert that into the real live running
Starting point is 00:30:50 computation. Exactly. That's why we call the thing computation. It is an unevaluated expression, and that's what they give to us. And we can look through the entire thing and then start running it. Right. So that's a great approach, but it also limits you in some ways. Yeah. The way I think of an application written in Bonsai is really an application written in
Starting point is 00:31:09 three different programming languages. The first one is a metalanguage. This language happens to be OCaml, and it manipulates computation.t and composes the structure of the application. But once this metaprogramming is done, it's done. You don't get to add new possibilities for components at runtime. The second language is Bonsai itself, which when you look at code written in Bonsai, you can identify things that really look like programming language constructs. So we have our own if, we have our own way to do recursion, we have functions and ways to apply values to these functions. The third programming language is the one that you alluded to, which was that the actual
Starting point is 00:31:52 business logic, the code that's running inside of these abstract notions of components is again, OCaml. At this level, you cannot add to the component graph. It's restricted to performing calculations on values that are being passed through it and applying actions to transition state machines. But yeah, after stage one, you're kind of set in stone. And maybe I should go back a little bit and say that when I say that the graph itself is static after this first stage is finished, I don't mean that the application can't change
Starting point is 00:32:26 itself dynamically. So in Gmail, for example, you've got your inbox view and your compose view and your reading someone else's email view. And in Bonsai, you can absolutely switch between all of these at runtime. But at compile time, well, what I'm calling compile time, it's known statically that there are those many views possible. And it can traverse all of those separate subgraphs, even though there isn't any data actually flowing through it, even though those components themselves aren't active right then and there. And this is a kind of classic trade-off here where you're taking away some real freedom from someone who is constructing the UI. There's some kind of
Starting point is 00:33:11 highly dynamic things that are maybe impossible or at least awkward to express in the context of Bonsai. But in exchange, what you have is the ability to do more things like these kind of optimizations that you're talking about within the framework. So in some ways, you take away power from the user of a library, and you give power to the implementer of the library to provide better services to the user. Exactly. Yeah. And I would say that the services that we provide go deeper than just compiler optimizations. I've been saying that I will build a debugger for a very long time, and I haven't yet, but it would be possible to build a very useful debugger that, up ahead of time, knows the exact structure of your UI and allows you to navigate the DAG and poke values and see
Starting point is 00:34:02 what the states of various components are in a way that I think would be a lot harder if we didn't have this upfront solidified representation of the application. So one of the things that strikes me about the way that you describe all of this, it's in some sense quite abstract, right? There's all these fairly fancy sounding ideas about how we build computations and compose them together and how we're going to have our own language inside of the language and our own version of if and our own version of recursion. It kind of sounds like a lot.
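The one-time optimization pass over a static component graph, as described above, can be sketched in a few lines. This is an illustrative toy in Python rather than OCaml, and none of it is Bonsai's actual API: the node types `Const`, `Input`, and `Map2` and the functions `optimize` and `evaluate` are hypothetical names chosen for this example. It shows only the constant-folding idea: because the whole graph is known before any data flows through it, subgraphs that depend solely on constants can be collapsed once, up front.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Const:
    """A node whose value is known before the application starts."""
    value: int


@dataclass(frozen=True)
class Input:
    """A node whose value only arrives at runtime."""
    name: str


@dataclass(frozen=True)
class Map2:
    """A node that combines two other nodes with a pure function."""
    fn: Callable[[int, int], int]
    left: "Node"
    right: "Node"


Node = Union[Const, Input, Map2]


def optimize(node: Node) -> Node:
    """Run once at startup: fold any Map2 whose inputs are both constants."""
    if isinstance(node, Map2):
        left, right = optimize(node.left), optimize(node.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(node.fn(left.value, right.value))
        return Map2(node.fn, left, right)
    return node


def evaluate(node: Node, env: Dict[str, int]) -> int:
    """Evaluate the (possibly optimized) graph against runtime inputs."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Input):
        return env[node.name]
    return node.fn(evaluate(node.left, env), evaluate(node.right, env))


# x + (2 * 3): the right-hand subgraph never depends on runtime data.
graph = Map2(operator.add, Input("x"), Map2(operator.mul, Const(2), Const(3)))
optimized = optimize(graph)
```

After `optimize`, the multiplication node has been replaced by a single constant, and evaluating either graph with the same inputs gives the same answer. The sketch relies on the node functions being pure, which matches the earlier point that side effects in the middle of the graph lead to confusing behavior.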
Starting point is 00:34:34 I wonder, how does that affect the ability to construct a tool that you can give to people and they can easily use? What's the learning curve like with Bonsai and how hard has it been for people and teams to pick it up and use it? Yeah. So I think it's getting a lot better. It's still not great. I learned a whole lot trying to teach very early versions of Bonsai to people. And in fact, I think we ran about 50 people through a class that included Bonsai in that class when it was still very young. And their frustrated expressions still motivate me today. I think that one of the things that we've been leaning on quite heavily is the ability to extend OCaml's syntax itself to make
Starting point is 00:35:18 expressions in Bonsai look a lot like expressions in OCaml. So here I'm talking about the PPX system, which lets me write little extensions to OCaml's syntax like let for binding variables, or if, or match for evaluating different possibilities of data that's flowing through the Bonsai graph. So in a way, the structure of the Bonsai code looks an awful lot like OCaml, but still has all of the restrictions that we were talking about earlier. So this is another way of turning this into a programming languages problem, right? Not just inside of Bonsai itself do you do programming language style transformations, but you've also moved up into the OCaml syntax itself and added essentially new language features to OCaml
Starting point is 00:36:11 to make it easier and more convenient to express Bonsai computations in a way that feels normal to people. Yeah. And I think that it's one of the things that I really enjoy about a number of our other libraries that operate monadically, in that you don't really need to know what a monad is in order to use them effectively. But if they have radically different syntax than what you're used to, it can still be an enormous pain. And so by extending the syntax for monads, I think that a lot of people that start at Jane Street don't know what it is, don't really need to care.
Starting point is 00:36:44 They'll learn eventually as they see numerous examples of monads in the code. But it looks so much like regular OCaml that you can think of it as though it were regular OCaml. Yeah, this kind of just reminds me of the general fact that the ability to transform the syntax of your language is such a great feature. And it's always sad when you run into languages that don't have it. Just having some kind of syntactic abstractions where you can just write some piece of code that transforms it to simplify things. There are downsides to various kinds of macro capabilities in that the macros themselves can be hard to reason about and
Starting point is 00:37:22 can sometimes give you awkward error messages and all of that. But when done right, it's just enormously powerful. Part of the world and large parts of the languages out there just never adopted it. I think they're moving in that direction though. It's really hard to think of a programming language that has been designed in the past 10 years that hasn't at least acknowledged the power of metaprogramming and good metaprogramming in some way. I'm thinking here specifically about like Zig and Rust as examples of languages that are very static and very serious, and they still have excellent metaprogramming capabilities. I guess Java just recently got closures, so maybe it'll get metaprogramming next?
Starting point is 00:38:08 Yeah, one would hope. I think C Sharp got source code generators, which don't let you extend existing files, but let you produce new ones based off of the contents of a syntax tree. Yeah, and it's maybe worth saying, I think OCaml's metaprogramming facilities are okay. They're not great. It's great that they're there and they're extremely useful. But I think people who work in Lisp or Scheme or Racket or something in that vein would mostly point and laugh at OCaml's macro system. Yeah, I think that's right. It's simultaneously too powerful in that you can do basically anything and inject arbitrary code. But it's also because of that, it's really hard to just make a tiny one-off extension to OCaml
Starting point is 00:38:53 that you're using perhaps only within lexical scope of the definition of this macro. It would be great if we had some kind of macro system in addition to compiler extensions. Yeah, I agree. And there's some exciting work in the direction of strengthening OCaml's support for various kinds of macros. And also, I think a thing that would be great would be to have macros exposed in a modular way. Like right now, what macros you're using is a thing that's kind of controlled entirely
Starting point is 00:39:19 from the build system and is not integrated in a real way in the language. And I think that's another thing that would make the system more usable and easier to apply in more cases. So a big part of landing a new library like Bonsai is evangelism, right? You can write the nicest library you want, and if no one uses it, nothing good happens. How have you approached the problem of evangelizing Bonsai and getting people excited and getting people using it inside of Jane Street? I think one of the things that I'm still very happy that I did was backwards compatibility with Incr_dom in that you can kind of spread throughout an Incr_dom program, providing value to people that have existing applications. And once they see the difference between the code that they're maintaining and the code that they're adding, there were a lot of people that were very motivated to convert wholesale and start new applications using just Bonsai. And how has that gone in practice? How
Starting point is 00:40:17 popular now is Bonsai compared to Incr_dom for people who are building new things? I'm not aware of any new serious projects that aren't using it. But I'll also say on the point of evangelism, I really enjoy it. Well, the main goal for Bonsai was to make people want to modularize their applications and break off bits and pieces that other people might find useful. So whenever someone does this, I will be their evangelist. And I maintain an enormous collection of examples using components that I've written, but also a lot of the other people have written. And I will make sure that any new application knows exactly what powerful features other people at the company have contributed. What are you dissatisfied about with respect to Bonsai?
Starting point is 00:41:02 What are the remaining things about it that are problematic that you feel need to be much better than they are now? One of the things that I'm a bit disappointed by that isn't directly related to Bonsai, but we still get flak for it, is composing components can lead to issues when you're styling things with CSS. Because if a component author wants to ship a CSS file alongside their component, there could be issues with name clashes, where two people pick the same name for a thing, and now they're overriding each other's styles. This is largely solved by doing
Starting point is 00:41:40 component styling in our virtual DOM library. So styles added directly to the virtual DOM nodes. But I think that CSS is a much maligned pattern matching language. And the things that you can do in it are incredibly useful, especially for expressing things that are possible in OCaml, like highlighting every other row in a table a different color. But also there are just things in CSS that you can do that you couldn't do
Starting point is 00:42:10 with inline styles, like changing the appearance of an item based on whether or not it's being hovered over, for example. So this is a pretty weak point when you're writing something using Bonsai Web. And I have a project about to get merged that I hope will fix this, which also uses OCaml's extension language. And the way that it works is by writing CSS directly in your OCaml file inside of an invocation to this compiler extension, it'll pull out all of the CSS, look through the names, give them all unique names, and then expose those names to your OCaml program so that you can use these uniquified hashed name values that are coming out of a very crisp and clean CSS string. So the goal here is to specifically get rid of this case where you essentially pun your way into the problem, where you had two different classes for nodes and you happened to name them
Starting point is 00:43:13 the same thing, and you're just going through and making sure that you have no unintended name clashes. Yes. Well, in fact, we do that by just hashing the contents of your string and also the file name that you're currently in. And appending that hash will generate something that hopefully doesn't wind up clashing with anything else. I'll also say that the CSS language itself is not what people complain about when they complain about CSS. What they're complaining about is the implementation of layout in the browser, which can be very confusing, and the complex interplay between various styling rules. So these things aren't going away with a PPX.
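The name-hashing scheme just described (hash the CSS string together with the name of the file it lives in, then append that hash to each class name) can be illustrated with a short sketch. This is written in Python for brevity and is not the PPX itself: the function name `scope_css`, the choice of SHA-256, the 8-character truncation, and the `_hash_` separator are all assumptions made for this example, and the regular expression only handles simple `.class` selectors.

```python
import hashlib
import re
from typing import Dict, Tuple

# Matches simple class selectors such as ".row". This is deliberately naive:
# it would also rewrite things like file extensions inside url(...).
_CLASS_SELECTOR = re.compile(r"\.([A-Za-z_][A-Za-z0-9_-]*)")


def scope_css(css: str, filename: str) -> Tuple[str, Dict[str, str]]:
    """Rewrite class names in `css` so they are unique to this string and file.

    Returns the rewritten CSS plus a mapping from each original class name
    to its uniquified name, which the host program would use in its markup.
    """
    # Hash both the file name and the CSS contents, as described above.
    digest = hashlib.sha256((filename + "\0" + css).encode("utf-8")).hexdigest()[:8]
    names: Dict[str, str] = {}

    def rename(match: "re.Match[str]") -> str:
        original = match.group(1)
        names[original] = f"{original}_hash_{digest}"
        return "." + names[original]

    return _CLASS_SELECTOR.sub(rename, css), names


css = ".row { color: red; } .row:hover { color: blue; }"
table_css, table_names = scope_css(css, "table.ml")
list_css, list_names = scope_css(css, "list.ml")
```

Here two components that both chose the class name `row` end up with different generated names because they live in different files, so neither overrides the other's styles, while selectors such as `:hover` keep working because real CSS is still being emitted.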
Starting point is 00:43:56 They might go away with some other abstractions where we hide the details of the algorithms that are used to display and lay out content. But CSS itself, no, it's quite nice, I think. Got it. So you think this does solve the primary problem here, which is this kind of name clash issue? I think I can come up with other things that I don't like about Bonsai. Right now, you need to repeat yourself a lot. Moving from one language to another, in this case, the Bonsai component language into the actual OCaml code that manipulates values at runtime is quite verbose. I would really like to get rid of a lot of that repetition. Right now,
Starting point is 00:44:36 it's not repetition that I think would cause people to inadvertently write bugs, but you do wind up repeating a bunch of names in multiple places, which is a bit unfortunate. And it's something that a very recent addition to OCaml by our own Stephen Dolan is going to hopefully make better, and that is let punning. This verbosity, the repeating of these names is going to be fixed in a very new version of OCaml that lets you ignore cases where the variable that you're binding to has the same name as the value that you're binding. So that leads to another question.
Starting point is 00:45:14 I'm curious what you think the ups and downs are of OCaml as a language for writing web applications. The normal language to do it is in JavaScript and there's lots of other, you could write in Elm, you can write in TypeScript, you can write in Dart. How do you think OCaml measures up as a language for doing this kind of work? I have to say that web development at Jane Street is very different from web development anywhere else in that we have some of the fastest networks that money can buy. And because of that,
Starting point is 00:45:42 a lot of the things that would kill Bonsai on arrival at any other company, like somewhat enormous JavaScript bundle size and not being able to do server-side rendering in order to cut down on initial load times, or being able to split an application into many files so that you're only loading the ones that are needed on the initial display. Things like this that are commonplace in other languages and frameworks are simply not present. And that's because here we primarily use our web browsers as a 2D vector graphics library where like we don't care too much about web pages. Almost all of them are a single page that loads a whole bunch of JavaScript, connects back to the server with
Starting point is 00:46:30 a WebSocket connection, and sits there chugging along all day. And the constraints that most web developers have to face are just simply non-existent. Right. So for us, the web browser is just like another place to run an OCaml program, an OCaml program that happens to generate a UI. [...] make this very easy in a way that JavaScript, for example, does not. Is there anything about the language itself that you feel like gets in the way from a kind of expressiveness and convenience point of view? Are you mostly pretty happy with OCaml as a language for this kind of work? Yeah. Yeah, absolutely. I think if I wasn't happy with it, we wouldn't be having this conversation. Well, you could be like happy,
Starting point is 00:47:29 but have some axes to grind about things that you'd like to make better. I feel like it's rare to be so happy about a programming language that you don't have complaints. Oh, I definitely have complaints, but they're mostly from a library author's perspective. I think there's not a ton that I think I desperately want from my user's perspective. I mean, I'm going to regret saying this. As soon as we're done,
Starting point is 00:47:53 I'm going to come up with like five things that are like actually deal breakers. So to ask a grander question still, in what ways are you dissatisfied about the web as a platform for writing user interfaces? I think that the one thing that every UI author has to contend with is the only options we really have are JavaScript and the DOM for programming language and UI runtime. And I think that this may change as WebAssembly picks up steam and as they add features that allow OCaml to compile down to it, like access to the garbage collector, or I guess ability for us to implement our own garbage collector. I think we would be happy getting the OCaml GC to run on WebAssembly if that was also possible. But I think that the DOM is going to be here for a while. And it does a whole bunch of really amazing things. The fact that anyone really can create
Starting point is 00:48:54 an accessible, stylable user interface with very little thought is unbelievably powerful. And I think that the prevalence of the web goes to show just how valuable this is. But for people building user interface components that lie outside of what the styling and layout engines of browsers consider normal, it would be nice to be able to tell them,
Starting point is 00:49:23 hands off, we're going to be managing what we consider to be the DOM tree from here on out, and be able to forward on some of the invariants that we maintain to the web browser in order for them to make it as performant as possible. So how different is this from just drawing your own widgets on a canvas inside of the browser? In fact, I think I heard a week ago that Google is planning on doing this for various things in the Google Docs suite, right? Where they have done a lot of work over time to wrestle with the DOM and get it to do all sorts of things that the DOM was never designed to do.
Starting point is 00:49:58 Display hundreds of thousands of lines of a spreadsheet. And it sounds like they are giving up and they're like, now we're just going to render to the canvas. Does that give you the flexibility that you need? Is that enough? So yes, but at great cost, the great cost being re-implementing text boxes. Clearly there's more than just text boxes, but I think text boxes are an incredible example of complexity that you really don't notice until it's wrong. Things like selection not wrapping around lines correctly, or maintaining cursor position and moving it correctly through strings that aren't necessarily ASCII, doing text layout, doing text shaping.
Starting point is 00:50:46 These are things that, yeah, Google's going to have to do, and they have the manpower to do it. But if you told me that I needed to implement a text box on top of Canvas, well, man, we wouldn't. That's maybe best measured in years. So if what you want is not to like have the, just the raw ability to kind of draw stuff to the canvas, you want something that preserves more of the power of the DOM that exists,
Starting point is 00:51:10 but it still gives people more control. Like what would you do? What's the magic feature you would ask for from the web? I guess direct integration into the layout engine and maybe APIs that allow us to tell the Chrome renderer what things to avoid drawing, direct access to the layout engine, maybe direct access to their underlying 2D graphics renderer, and I think a more powerful integration with incoming user events, things like keyboard and mouse,
Starting point is 00:51:47 that allow us to take a bit more power. Is this a kind of thing that people are talking about, or is this just stuff you think would be awesome, but it's not on anyone's radar? I think that there's a proposal called Houdini that includes a lot of this. It's not totally clear whether or not it will be adopted as a standard and when those features will arrive or how powerful they will be. But I think that certainly there are people, probably the same people that are working on Google Docs, that have been bitten by these problems and are eager to solve them. So you spent all this time on Bonsai, which is a UI toolkit.
Starting point is 00:52:26 Are there other tools for building UIs that you are especially jealous of or have a lot of admiration for? Like, what are other great systems that you've encountered for building user interfaces? Yeah, so I think that the Svelte project is a recent UI toolkit, but also programming language that has a lot of features that I think make it very desirable for outside of Jane Street UI development. It fundamentally operates on the
Starting point is 00:52:58 basis that UI components are specified ahead of time. They are composed dynamically, but the UI components are passed through a compiler first, and it can do optimizations. It can find places where bugs might be introduced by composing things incorrectly. And it just feels to me like, I also haven't done any programming in Svelte, but reading over the tutorials, it feels like the type of thing that might have been developed for the web with hindsight. Are there any approaches to building UIs that you've seen outside of the web context that you're excited about?
Starting point is 00:53:34 Like there's a whole world, like, I don't know, there are wizards for like composing UIs together and all sorts of systems for building UIs in different contexts. Is there anything in that world that you're excited about that you think does something impressive and good? No.
Starting point is 00:53:49 I think that GUI-based UI design will almost always fall over. It will not expose enough power to the people that want it. And at that point, you need either very powerful hooks so that developers can attach things into a UI that was built in a GUI, or you need to transition that GUI into code and then hand it off to the dev. And I haven't seen any of these work particularly well. Arguably, one of the most successful systems for building user interfaces is Excel.
Starting point is 00:54:21 I wonder how you think about that fitting in. Excel is arguably the greatest programming language ever developed. Certainly, its ability to convince non-programmers that they aren't doing programming is unbelievable. It builds in an entire incremental framework for reducing recomputation of values. And it makes people integrate the data input that they have to do with the viewing and analysis that they want to do in a way that I just really haven't seen replicated anywhere else. Right. I mean, Excel is another case where it does the thing where it's very limiting. You get the best behavior out of Excel when you do the maximum amount possible in the equation language.
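The idea of a spreadsheet as an incremental framework that avoids recomputing values can be sketched as a toy. This is not how Excel is implemented; it is a minimal Python illustration under simple assumptions, and the class `Sheet` with its methods `set_value`, `set_formula`, and `get` are hypothetical names invented for this example. Each formula cell caches its result, and changing an input throws away only the cached results that depend on it.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


class Sheet:
    """A toy spreadsheet: formula results are cached until an input changes."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}  # cell name -> current or cached value
        self.formulas: Dict[str, Tuple[Callable[..., Any], List[str]]] = {}
        self.dependents: Dict[str, Set[str]] = {}  # cell -> cells that read it
        self.recomputed: List[str] = []  # log of formula cells actually recomputed

    def set_value(self, name: str, value: Any) -> None:
        """Set a plain input cell and invalidate everything downstream of it."""
        self.values[name] = value
        self._invalidate_dependents(name)

    def set_formula(self, name: str, fn: Callable[..., Any], inputs: List[str]) -> None:
        """Define `name` as fn applied to the values of the `inputs` cells."""
        self.formulas[name] = (fn, inputs)
        for dep in inputs:
            self.dependents.setdefault(dep, set()).add(name)
        self.values.pop(name, None)
        self._invalidate_dependents(name)

    def _invalidate_dependents(self, name: str) -> None:
        # A cached formula always has cached inputs, so once we reach a cell
        # with no cached value, nothing downstream of it can be cached either.
        for child in self.dependents.get(name, ()):
            if child in self.values:
                del self.values[child]
                self._invalidate_dependents(child)

    def get(self, name: str) -> Any:
        """Return a cell's value, recomputing it only if its cache was dropped."""
        if name not in self.values:
            fn, inputs = self.formulas[name]
            self.values[name] = fn(*(self.get(i) for i in inputs))
            self.recomputed.append(name)
        return self.values[name]


sheet = Sheet()
sheet.set_value("A1", 2)
sheet.set_value("A2", 3)
sheet.set_value("B1", 10)
sheet.set_formula("sum", lambda a, b: a + b, ["A1", "A2"])
sheet.set_formula("scaled", lambda b: b * 2, ["B1"])
```

Reading `sum` and `scaled` computes each formula once. If `A1` is then changed, only `sum` is recomputed on the next read, and `scaled` is served from its cache because none of its inputs changed.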
Starting point is 00:55:07 And the equation language is very limited. Although excitingly, they recently got let expressions and closures. But still, it's a very limited language. But once you accept those limitations, boy, oh boy, is it a nice system to work in. I don't want to build everything inside of Excel. I mean, I feel like there's like a whole separate long podcast one could have about all the things that go wrong when you use Excel. Excel is far from perfect. But as a way of constructing easy-to-use and easy-to-understand interfaces for people doing data analysis, it's pretty impressive.
Starting point is 00:55:52 Absolutely. And I think that one of the reasons for that is that there's this very tiny barrier between seeing something on your screen, seeing a visualization or some results for a formula, and editing the formula that produced that result. You don't need to dig through source code and find out where that value is being produced. You click on the thing and it'll show you exactly how and why and where those values and those computations are coming from. And it's unbelievably powerful. And I've been experiencing a different kind of visual-based programming in the Blender program, which is a 3D modeling, animation, and rendering tool. But I'm using it almost entirely for shading, in which a typical shader is source code in a language that looks a lot like C,
Starting point is 00:56:46 and you write a program that operates on usually a single pixel, and it produces a color at the end. Blender has a wonderful setup in the form of what they call nodes, which is this graph construction UI where you create nodes that represent different combinators over numbers or colors or textures. And you can visually see the output of an individual node in the editor. And as you combine and compose all of these different nodes, build up something that is practically, well, for me at least, literally impossible to do in code without seeing all these intermediate steps represented so easily. When you say seeing the intermediate steps, do you mean seeing essentially the instructions
Starting point is 00:57:35 in the intermediate steps? Or do you mean actually being able to interrogate and see the results in the middle of your image transformation pipeline? The latter. There's a single key binding that if you have a node selected, it'll change the output for the entire shader to just be that node. And of course, well, all of the things that it depends on. But there's also an incredible plugin that'll actually inline previews above every node that you care about. And so you can see just what does this node contribute to the rest of the program. So part of the value here is just like a really
Starting point is 00:58:11 good debugger. Yeah. Yeah. Oh, absolutely. Right. And it makes sense. It's for a kind of programming where a lot of the ordinary things you would do for understanding whether you're doing the right thing just kind of don't make sense. When you're doing this numerical transformation of data, when you make mistakes, your program doesn't do something like enormously wrong. It just looks wrong or something. And it's hard to know what's going on if you can't visually inspect the intermediate components. Right. And a lot of the tools that we bring to bear on programs written out in source code just can't apply here. How do you write a test that passes if your shader looks good? You can't. You need a human to look at it and tweak the
Starting point is 00:58:52 parameters and make it appease some art director somewhere in a movie studio. The debuggability and inspectability is the feature. It would not be what it is if it were merely a projection into source code or out of source code. And like with Excel, it has managed to convince a whole bunch of artists to do programming without knowing it. And I think that's really empowering as well. Well, I look forward to you tricking a bunch of Jane Street people into thinking that they're not programming while they develop UIs inside of our walls and also provide them with excellent debugging tools. That's right. Yeah. Get back to me on that one.
Starting point is 00:59:33 Will do. Okay. Well, Ty, thanks for joining me. This has been a lot of fun. The world of UIs is enormously complicated, both the stuff we're doing and just the whole world. And it was fun to get your view on it all. Thank you for having me. This has been a blast. You can find a full transcript of the episode along with more information about some of the topics we discussed,
Starting point is 00:59:53 including Ty's Bonsai Library at signalsandthreads.com. Thanks for joining us and see you next time.
