Drill to Detail - Drill to Detail Ep.91 'GraphQL, Data Platform Engineers and Dagster' With Special Guest Nick Schrock

Episode Date: June 1, 2021

Mark Rittman is joined in this episode by Nick Schrock, ex-Facebook Engineer and now Founder of Elementl, to talk about GraphQL, data platform engineers and @dagsterio, a data orchestrator for machine learning, analytics, and ETL.

Links mentioned in the show:
Introducing Dagster
Dagster homepage
Elementl homepage
Orchestrating dbt with Dagster
Moving past Airflow: Why Dagster is the next-generation data orchestrator

Transcript
Starting point is 00:00:00 So welcome to Drill to Detail, and I'm your host, Mark Rittman. So for today's episode I'm joined by Nick Schrock, founder of Elementl, before that one of the co-creators of GraphQL, and now working on a new open source project called Dagster. So welcome, Nick, and thanks for taking the time to come on the show. Thanks for having me, Mark. Let's start off first of all with Facebook and what you were doing there with GraphQL. Sure. Yeah, we can do the journey from Facebook to GraphQL to Dagster. I'll do the short version. So I was an engineer at Facebook from 2009 to 2017.
Starting point is 00:00:51 And the beginning of my career there was really defined by this group called Product Infrastructure. And our mission was to make our product developers more efficient and productive by building frameworks. So I built, and participated in building, a bunch of internal frameworks, which kind of culminated with GraphQL. So I wrote the first prototype of GraphQL and was the tech lead of it for a couple of years internally.
Starting point is 00:01:18 And another associated project that I didn't have anything to do with was React. They were kind of a sibling team, so to speak. But the project shared DNA at a minimum. So I like to say I was present at the creation of a full hipster stack. And then we ended up open sourcing that, co-wrote the spec, and that has gone on to pretty broad adoption in the industry, which has been pretty gratifying to see. I left Facebook 2017, and then I was figuring out what to do next. And so one of the interesting properties of GraphQL was I was pleasantly surprised by its adoption in more traditional enterprises like the KLMs and Walmarts of the world, not just cool kids in Soma.
Starting point is 00:02:07 And I thought that was really cool. And I thought it was an opportunity. The fact that you could get these bottom-up adoption stories in traditional enterprises is kind of an opportunity. So that's what kind of started me on this path. So I just started talking to lots of people and asking what their biggest technical liabilities were. And data infrastructure and ML infrastructure and related stuff just kept
Starting point is 00:02:28 on coming up over and over again. And so I started to look around, and I found what I like to describe as a developer experience dumpster fire. And I'm attracted to said dumpster fires like a moth to a flame. I just can't help myself. When I see those types of inefficiencies in the world, I'm just compelled to work on them. And then, so I was talking to traditional companies, and I actually started talking to the aforementioned engineers in SoMa, and they were saying the same exact thing. And so when I started looking into this,
Starting point is 00:03:00 what it really reminded me of was development in kind of the web stack prior to React, TypeScript, and GraphQL, both culturally and technologically. Meaning that people would kind of dip into it, modify some scripts, and try to get out, and the development tools were not good, and people felt like they were wasting all their time. Like, when I heard data people say they feel like they spend 90% of their time data cleaning and 10% of their time doing their job, it felt to me like someone telling me 10 years ago that they spent all their time fighting the browser.
Starting point is 00:03:37 And I believe it's a fundamental software abstraction problem. And then I started to poke around a bit. And to me, these DAGs are effectively at the core of all of these pipelines, whether they're ETL, ML, et cetera. To me, it all boiled down to the same problem: they're graphs of computations that consume and produce data assets. And I thought, one, developing in these tools was typically really brutal. And then second of all, I thought these DAGs were a huge point of leverage in the data systems,
Starting point is 00:04:14 because they touch everything. And all data must come from somewhere and go somewhere. And then I started prototyping around and exploring things, and one thing led to another, and here we are with Dagster. Interesting, interesting. So we actually corresponded a few years ago, when you were working on GraphQL. And actually my interest was spurred at the time because I was working as part of the
Starting point is 00:04:42 engineering team at Qubit, a tech startup in London which was using React and GraphQL everywhere and was really enthusing about it. And you and I were sort of chatting, and I was saying I was trying to think about getting you on the show, and trying to think of an angle really with GraphQL that would be appropriate for this audience. But then, you know, you've now ended up, in a way, in our world, or certainly as part of the ecosystem. And I suppose, what interested you in, what was the link between what you're doing now?
Starting point is 00:05:14 And we'll get on to what, you know, Dagster is and maybe the origin story of that. But what was the link between what you're doing now and GraphQL and those things? You know, you talk about sort of toolkits and so on, but how did you make the jump from GraphQL to what you're doing now? Yeah, the linkage to it is more about the focus
Starting point is 00:05:31 on developer experience, and the cross product of finding where to improve a developer experience while simultaneously finding a point of leverage in the application stack where you can improve everything systematically. And so those are the types of projects I like to work on. And I think you can kind of kill three birds with one stone, in effect, in a generalized sense across both these stacks. You can dramatically improve the developer experience for
Starting point is 00:06:04 the person it directly affects. You can improve the core architecture of the system by providing these abstraction seams, which provide points of leverage. And then you can, in fact, improve the lives of other stakeholders, too, by providing tools on top of these abstractions. So I think that is, the linkage is the common approach,
Starting point is 00:06:25 the identification of an important problem to solve, and kind of these instincts around the cross product of developer experience, architecture, and stakeholder inclusion. Okay, okay. So tell us what Dagster is then. So Dagster is a data orchestrator. So if you're building an analytics pipeline, an ML pipeline, or broadly any data infrastructure that involves data movement, you are building dependency graphs.
Starting point is 00:07:00 And they're dependency graphs that consume and produce data assets. So at its core, Dagster is a data orchestrator, which schedules and orders data computations in production, similar to other workflow engines, which I'm sure we're going to talk about. But I think what really distinguishes Dagster from those systems and that approach is that we really focus on the DAG as the focus of a full developer lifecycle, so that you should be able to develop it very quickly. You should be able to test these things. You should be able to parameterize them. You should be able to execute subsets of them. So that was really the initial instinct around the project: the DAG is the application itself. You shouldn't just think of it as a completely disconnected graph where you're just kind of invoking scripts.
Starting point is 00:07:56 So it's this API for developing what we like to call data applications. And then it's also infrastructure for executing those things. And lastly, tools to monitor, track, and observe both the computations and the produced assets. So who is the user persona that you see this aimed at, really? And what would they be wanting to do? So I suppose, what is the use case, or a user story, around the product, really? Yeah, that's a great question. So there's two primary audiences. Well, let me step back for a second. The interesting thing about these orchestration tools and the DAGs is that they inevitably involve everyone. So everyone typically needs to interact with these systems in order to
Starting point is 00:08:43 put the data products they're developing into production. So subsetting the target for an orchestration framework is actually an interesting positioning challenge, which I don't think we've totally mastered. But if we subset that by kind of the nature of the user and their values and inclinations, there's two primary audiences that have been drawn to the system. One are what I'll call full-stack data practitioners, who are responsible for both data products as well as rolling their own infrastructure. But they're often people who have hopped from adjacent domains, or they're kind of classically trained engineers, where the separation of concerns between infrastructure and their business logic just makes a ton of sense to
Starting point is 00:09:37 them. And the second cohort of folks who are super interested in this are what I would consider an emerging persona called data platform engineers who are responsible for supporting data practitioners. And I think the cleanest definition of a data platform engineer is their job is to enable end-to-end ownership of data products by their practitioners and then the relationships between those practitioners. So a data platform engineer that's really good at their job will enable, say, their team managing their ML pipeline to completely be able to develop, test, deploy, and monitor that pipeline with zero intervention from the data platform engineer. That's kind of their primary activity. And the data platform engineers often see what we're trying to do very clearly, and it speaks to them.
Starting point is 00:10:33 Because we talk about mirroring software abstractions with organizational boundaries. And so those are the two primary audiences. But what's interesting about it is that because of the nature of orchestrators, you naturally bring in many, many stakeholders quite quickly into the system. So you often go from, say, a data platform engineer bringing it in, to data practitioners and data scientists and data engineers programming to the API, and then expanding it to have oftentimes non-technical stakeholders self-serving the operations of these pipelines. So it's been incredibly interesting to watch it unfold at organizations. Okay. So you mentioned data platform engineers there.
Starting point is 00:11:20 And so how would they differ then from a data engineer? Because I think people understand the concept of data analyst. They're just getting the idea of an analytics engineer. But how would you differentiate a data platform engineer from a data engineer? It's a great question. And what's really confusing about this is that the terms have different meanings at different organizations. And this is why we break down the world into three kind of meta personas. One are practitioners, and they are
Starting point is 00:11:51 responsible for data products. Then there are stakeholders. These are people who are interested in data products. And then lastly, you have platform, which are people who support the practitioners and the stakeholders. And what we find is that some people self-title data platform engineers, and some data engineers act as data platform engineers. Because if you ask any particular person who is a data engineer, maybe half of them are responsible for actually curating and maintaining data assets in production, and maybe half of them are pure infrastructure players. So I think there's a lot of confusion with the titles. So we target people who self-title data
Starting point is 00:12:32 platform engineers, and data engineers that are data platform engineers, if that makes sense. Okay. And just to be clear, what do you define as a data platform, just so it's really clear to people when they're listening? Yeah. So inside of every company, there is a data platform. It's a question of whether it's acknowledged and staffed or not. And a data platform exists, I believe, the moment that you start to integrate more than one technology. So the rule of the day in the modern data world is that these systems are incredibly complex and every business has their own needs.
Starting point is 00:13:16 They also have their own organizational history, where maybe one team used Spark, and another team is newer and they're using Snowflake and dbt or something. And these systems are just incredibly heterogeneous. So the data platform is where all these tools end up integrating. I'm coming up with the definition on the fly here, which is probably something we should work on. But to me, the data platform is the central substrate in an organization where all the tools have to come and play together. And sometimes it's unacknowledged, and therefore it's managed with lots of manual processes or unreliable processes, where people just know that some data is updated at some point, they do like a checklist in a wiki, and they're kicking off various things. And when it is acknowledged and staffed, all these tools work together really well in order to produce all the data assets that are needed by all the
Starting point is 00:14:26 business functions, which is everyone these days. Okay, okay. So I've listened to a few of your presentations and talks, and I heard you say in the past that orchestration is the beating heart of the data platform. And you've talked about orchestration, you know, so far. Now, some people would think of orchestration as being a cron job, whereas obviously in more sophisticated environments it's a bit more than that. I mean, what do you mean by this, orchestration is the beating heart of a data platform? And, I suppose, why is it a problem that needs to be solved, really, by your product? Yeah, so the reason why I say that is that, you know, I think maybe the DAGs are the beating heart of the platform,
Starting point is 00:15:11 and orchestration manages those DAGs. And a lot of what Dagster is trying to do is move beyond cron jobs and move beyond just opaque scripts, and make programming the DAGs feel much more like what I would consider proper application programming, with a full lifecycle and much more agile development, and so on and so forth. But the reason I say that the graphs are the beating heart of the data platform is that I like to call the orchestrator the town square. Everyone has to show up there. If you're putting any data product in production, the person is bringing their own toolkit, right? Data science people use very different tools from data engineers, who use different tools from analytics engineers. And it all has to work together, because all data has to come from somewhere and go somewhere. So the orchestrator ends up being, and has the opportunity to be, this incredibly powerful central point of leverage and the kind of common denominator between all the different roles. And then furthermore, from the core DAG, if that DAG is data-aware, then you can also end up doing the gold standard, which is eventually constructing a global asset lineage graph across the organization, which I think is the holy grail of this future data platform.
Starting point is 00:16:43 And I believe that tying that to the orchestrator in a pretty deep way is the only way to make that reliable. So I would say it's that the DAGs are the heart of the application structure in the data platform. And then, just by nature of what it does, it is this interconnective tissue that connects all the different teams in your org, if the dependencies are modeled. Okay. And so I guess you mentioned DAGs and you mentioned graphs.
Starting point is 00:17:16 And I guess this is probably the link back, isn't it, a little bit, to your work before with GraphQL. And certainly graphs and DAGs and so on seem to be a sort of theme here. Maybe just explain, for anybody that doesn't get what a DAG is, what is a DAG, just to be really kind of clear on that for people? Yeah, so a DAG, it stands for directed acyclic graph, and all it determines is an ordering. And you can think of it like, if you look at a cookbook, right, a recipe is a DAG, where you have to start cooking something. And then once you produce an intermediate, then you can start the next step. But if there's another process that doesn't depend on the first one, you can do that in parallel. So for any recipe, you can actually construct a DAG. And it's similar in a data organization.
Starting point is 00:18:11 And the best analogy I've heard for all these data pipelines and the way they interconnect is a factory, actually. And I got that from Christopher Bergh, who pushes DataOps a lot. And so you can think of a data pipeline or a DAG as like a factory. And each node in the DAG is a station in the factory, and it's ingesting stuff, doing some sort of transformation, and pushing out stuff. So imagine a factory where you hit a certain machine, and maybe it has two outputs, and then feeds raw materials to two downstream machines, and so on. So it's this ordering of nodes, and then acyclic, meaning it can't go back on itself. You can't go from A to B to C and then back to A.
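To make the recipe analogy concrete, here is a minimal sketch in plain Python (the steps are invented for illustration): each step maps to the steps that must finish before it, and a topological sort recovers one valid order, with independent steps free to run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must complete first.
recipe = {
    "chop vegetables": set(),
    "boil water": set(),
    "cook pasta": {"boil water"},
    "make sauce": {"chop vegetables"},
    "plate dish": {"cook pasta", "make sauce"},
}

# static_order() yields one valid execution order; "chop vegetables" and
# "boil water" have no dependency between them, so an orchestrator could
# run them in parallel.
print(list(TopologicalSorter(recipe).static_order()))
```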
Starting point is 00:19:05 I think people listening to this who know dbt, who have maybe heard of Airflow and so on, are thinking: I know DAGs, we've got DAGs in those products. I mean, it sounds like you've taken the concept of a DAG, or certainly the programmability of it, sort of a whole stage further. I mean, maybe just tell us, how does this relate to the DAGs people do understand about now? And how has Dagster taken this idea further? What are you really adding to the conversation around this? Totally.
Starting point is 00:19:27 So we can start with dbt, because you mentioned dbt. And so dbt encodes a dependency graph as well. And I consider dbt a highly affiliated project philosophically that proves the approach. Meaning that, you know, dbt conceptualizes the world as a graph of functions, but dbt is specialized for SQL. So every node is a select statement. It recursively invokes another select statement through their templating
Starting point is 00:19:58 language. And then they decide how to materialize that in the data warehouse on a different dimension. So you can decide like, Oh, this one's a view, and this one's a create table, and this one's a common table expression or whatever. And dbt has a bunch of similar properties to Dagster in that way, meaning we structure the same thing, but in Python. And the other thing that dbt does, which is similar, is that you can preload the DAG in your local development environment without deploying anything. And without having any heavyweight infrastructure, we have a very similar approach.
Starting point is 00:20:36 So, you know, one of our users said: what dbt did for our SQL, Dagster did for our Python. And so that's very similar in that respect. We consider them highly affiliated systems. And we have a first-class integration with dbt, and we consider them friends of the company, and I hope they feel the same. And Airflow is more of a direct competitor, and they encode DAGs as well, but I believe they encode DAGs in kind of the old-school way, shall I say, where they don't really focus on the full end-to-end development lifecycle, meaning that the model is more of a DAG of bash scripts.
Starting point is 00:21:18 And, you know, that's because it's a natural evolution, right? There used to be kind of a bunch of different cron jobs. And you would say, like, this one starts at 12am, and it usually takes under three hours, so we're going to start the other cron job at 3am. And that was terrible. And so, you know, Airflow solved this problem by kind of having a web UI on cron and then specifically ordering the dependency between those two scripts. But in Airflow, the DAGs are typically very coupled to their deployed environment. So, for example, let's take Spark.
Starting point is 00:21:53 With Spark in Airflow, there is a Spark submit operator that typically is directly targeting, like, EMR or something. So there's no way to run the DAG without targeting this EMR cluster. Whereas in Dagster, what you write is a function that takes a data frame and spits out a data frame. So it's this unit of business logic. It's testable. And then you can alter where it targets based on what we call modes, or what resources we apply to it, which means that you can have a local development flow where you can test your business logic, say, against a local Spark cluster on your laptop, or just in a different computational
Starting point is 00:22:37 environment than production. And so it's just a very different approach, where instead of just using this DAG as a purely operational artifact, we view it as the structure of the application itself. And then you should be able to locally develop it, test it, execute subsets of it, execute it in different computational environments. And then there are also other differences as well. Like, Dagster encodes data dependencies. So you write functions which ingest data and spit out data.
Starting point is 00:23:10 Whereas Airflow exclusively models execution dependencies. So it just specifically models the ordering of computations. And we think that's a huge missed opportunity, because with data dependencies come parameters, which makes things more testable. You can easily compute asset lineage if you have the data dependencies between nodes. And it also makes these graphs slash DAGs much more self-describing, where in our tool you can load it up and it tells you what its inputs and outputs are. And you can do that without loading any infrastructure. And that's an incredibly empowering thing for getting context on what's going on, and understanding everything, or handoffs between teams and whatnot.
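As a rough illustration of that data-dependency idea, here is a sketch using the solid/pipeline API Dagster had around the time of this episode (the data and the transform are invented for the example): each node declares what it consumes and produces, and the execution order falls out of those connections rather than being declared separately.

```python
import pandas as pd
from dagster import pipeline, solid

@solid
def load_orders(context):
    # Stand-in for a real ingest step (warehouse query, file load, etc.).
    return pd.DataFrame({"price": [10.0, 20.0], "quantity": [1, 3]})

@solid
def add_totals(context, orders):
    # A pure transform: data frame in, data frame out.
    return orders.assign(total=orders["price"] * orders["quantity"])

@pipeline
def orders_pipeline():
    # No explicit "run B after A": add_totals runs after load_orders
    # because it consumes its output, and lineage can be derived from that.
    add_totals(load_orders())
```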
Starting point is 00:23:45 Okay, okay. I've also heard you refer to trusted data assets, and we had Barr Moses from Monte Carlo on the show a little while ago talking about data observability. What's the story around testing, trusted data assets, and data observability with the product? Yeah. So the trusted data asset component of this is acknowledging that there's a goal. The goal of these systems is to produce trusted data assets for your downstream stakeholders. That's why we show up to work. That is the end goal.
Starting point is 00:24:31 And so we wanted to really make it clear that all of the work we do is in service of that goal. Improving testability, making the developer experience better, improving that, you know, dev lifecycle is all in service of that end goal. And, you know, so we imagine, you know, integrating with tools like Monte Carlo or Great Expectations or the data quality tools. But orchestration is a really important part of that process because you have to decide when you check data quality, and it generally has to be integrated into some of your production processes. So you do a data quality check, and then you decide whether or not to halt the computation or emit a warning or whatnot.
Starting point is 00:25:20 All that has to be solved in the orchestration layer. So the orchestration layer has to play a part in data observability. And then your last question was about our out-of-the-box data observability. So yes, because our system is data-aware, it's quite straightforward for us to produce an index of assets, where the orchestrator doesn't just track the pipelines and the computations, but also the produced assets that come out of them. And we consider our asset catalog what we call an operational asset catalog. So its primary role, or one of its primary roles, is to specifically link assets to computation. And this is super useful for operational context. So imagine that you are a stakeholder in a data platform and you see that some table called stories looks weird and you don't understand what's going on.
Starting point is 00:26:14 Well, using our tool, you can just look up the table stories, and you get this kind of base-level information about it. You can see when it was last touched and by what pipeline. You can then navigate to what pipeline produced it, and then find the owner of that easily in the tool, rather than spamming a Slack channel or something. And this insight really comes from, we have a saying around the office, well, the virtual office these days, which is: your stakeholders do not care about your pipelines. They do not care. They only care about your assets.
Starting point is 00:27:01 That's the world that they live in. And so we think, really, the assets for a large set of stakeholders should be the primary interface into a bunch of different use cases. So that's what we mean by out-of-the-box data observability: this kind of integrated asset catalog that can enable operational context. Okay. So, Nick, you talked earlier on about improving the developer experience, and that's a kind of key motivation of yours. So maybe paint a picture of what the development experience looks like. You know, what kind of language do people use? How do they run things, test things? How does the project get deployed? I mean,
Starting point is 00:27:44 talk us through that kind of developer experience. Totally. So it's an open source project, and it's a Python project. So you pip install Dagster, and then, in plain vanilla Python, you can write a node of computation, just like a function, which does something.
Starting point is 00:28:04 We call that a solid. And then that is assembled into a pipeline via kind of this elegant Python-based DSL, based on what looks like function calls, but it actually constructs the graph. So you write this in plain old Python. And then the moment you do that, you have access to all this tooling. So you can write maybe six lines of code, and then with our graphical tool, which we call Dagit, you can just point Dagit at this thing and, boom, with no infrastructure, no nothing, you can view this thing graphically. And then you can execute it locally, and you get this live, reactive Gantt chart. And we have a thing called the structured event log, and all these tools that
Starting point is 00:28:45 assist with debugging and whatnot. So that's kind of the spin-up experience. So we really want to make it easy just to kind of take this vanilla Python and then model the DAG and have tons of tooling around it. You can also invoke the DAGs, or subsets of them, or individual nodes, in just plain vanilla Python. And so you can set up a unit testing loop and do all this stuff. So that's kind of the core developer experience. And as you get deeper into the system, you can have typed inputs to these functions and connect those things in a DAG.
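For a flavor of that spin-up experience, here is a minimal sketch of the kind of six-line program Nick describes, written against the solid/pipeline API Dagster had at the time (the file and function names are ours, not from the episode):

```python
# hello.py
from dagster import execute_pipeline, pipeline, solid

@solid
def get_name(context):
    return "Dagster"

@solid
def say_hello(context, name: str):
    context.log.info(f"Hello, {name}!")

@pipeline
def hello_pipeline():
    # Looks like a function call, but it wires the graph:
    # say_hello consumes get_name's output.
    say_hello(get_name())

if __name__ == "__main__":
    execute_pipeline(hello_pipeline)
```

Pointing the graphical tool at the same file, with something like `dagit -f hello.py`, is what gives you the graph view, local execution, and the live Gantt chart he mentions, with no other infrastructure involved.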
Starting point is 00:29:27 And then furthermore, we provide kind of a system that allows you to separate your business logic from your environmental concerns. And that's what we call resources. And so the idea is that this provides you a structure for going all the way from hello world and then progressively growing with you as the demands of your application grow. So kind of in the idealized state, you're running this maybe on your laptop, or in a development mode where you're targeting ephemeral resources in the cloud. The other thing about data is that people have a ton of varied requirements and preferences in terms of the developer environment. So you need to provide a very flexible system to handle the use cases.
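A rough sketch of that resources idea, again against the era's API and with invented names: the solid depends only on an abstract store resource, and modes swap in a development or production implementation without touching the business logic.

```python
from dagster import ModeDefinition, execute_pipeline, pipeline, resource, solid

@resource
def dev_store(init_context):
    # Dev implementation: just log what would be written.
    return lambda name, data: print(f"[dev] would write {name}: {data!r}")

@resource
def prod_store(init_context):
    # Prod implementation: same interface, but in a real deployment this
    # would target actual infrastructure (S3, a warehouse, etc.).
    return lambda name, data: print(f"[prod] writing {name} to object storage")

@solid(required_resource_keys={"store"})
def build_report(context):
    # The business logic only sees the abstract "store" resource.
    context.resources.store("report.csv", "a,b\n1,2\n")

@pipeline(mode_defs=[
    ModeDefinition(name="dev", resource_defs={"store": dev_store}),
    ModeDefinition(name="prod", resource_defs={"store": prod_store}),
])
def report_pipeline():
    build_report()

if __name__ == "__main__":
    # Choose the environment at execution time; the graph is unchanged.
    execute_pipeline(report_pipeline, mode="dev")
```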
Starting point is 00:30:11 And then, primarily, Dagster is a software abstraction, and it strives to be infrastructure-agnostic. So when you go and deploy this thing, you can run it on your laptop, you can run it on an EC2 node. We also have a really industrial-strength, out-of-the-box Kubernetes experience, where you can execute these computations in a Kubernetes cluster. But we also have users who run on their own custom infrastructure, or ECS, or on Google products, et cetera, et cetera. So, you know, deploying this typically means you integrate it with your own infrastructure. We have a number of different integrations to do that. And then you have a long-running service, which schedules jobs and then monitors them.
Starting point is 00:30:56 And so we really think about this end-to-end experience where you can kind of go all the way from your laptop to a mature data platform in one continuous continuum. That was kind of a tautology there. In one continuum. And the common thread between all of them is that you're developing these graphs, you're developing these DAGs, and the DAG is the structure of your application itself. So just like, you know, in other styles of programming, you execute your application in various contexts to get testing going in a fast developer loop, you know, data shouldn't be any different.
Starting point is 00:31:38 Data should have the same maturity and ease and delight of developer experience as all the other domains of software engineering. And I don't think that's true today. Okay. So in practical terms, let's imagine you're a dbt developer and you've got some transformations written in dbt, you've got some kind of DAGs there, and you need to go beyond that,
Starting point is 00:32:02 maybe to try and take the same approach with DAGs and so on, but do it around the custom needs of your organization. You've got maybe a Spark job, as you said, you've got some files to move around, you've got things that go beyond what you can do in dbt. How would that person, I suppose, jump from understanding dbt to taking their code and working with Dagster? And how would they get started, I suppose, really, to get a flavor of Dagster and go beyond what they can do now? Right.
Starting point is 00:32:30 So, you know, I mean, obviously the obvious answer is you would try it. But in terms of how, you know, I guess I'm not sure how deep to get into dbt-speak. Yeah. Is there any integration, for example, between the products? Yeah. So we have an out-of-the-box integration. And, from our perspective, dbt provides the local developer experience and API that we think is philosophically correct, but is more narrowly targeted to SQL only. So in the dbt context,
Starting point is 00:33:06 we have no interest, if you're just working in dbt land, in replacing that developer experience. They've got it. They've got it nailed. That's why people love the product. You know, from a dbt developer standpoint,
Starting point is 00:33:21 one, we want to make it easy to integrate with our orchestration layer. But second of all, once they are moving out of dbt into Python land, it should feel philosophically very similar. So imagine you're using dbt, and then you need to hop back out because you need to do some ML, and you're writing stuff in pandas, say, and you're taking data out of the data warehouse, which has all the assets you produced in dbt. Now you're like, okay, now I need to write some transforms in, you know, pandas and scikit-learn. Well, we feel that for that person it's philosophically very similar. All you start doing is you write functions which ingest data frames and spit them out. So they're pure functions, and then you handle
Starting point is 00:34:11 where they're persisted on a different dimension, which means that you can, say, test these on your laptop, write to your file system, and then use the same business logic when you deploy them. And instead of writing to a file system, you could swap out that component and write to your S3 buckets to shuffle data between nodes. And, you know, for a dbt developer, that's very similar to using what they call different profiles. So often a dbt developer points at, like, a dev database and then a prod database, but all of their graph of SQL statements stays the same. Like I said before, one way to frame this from a dbt developer's perspective is: if you're going to move into Python land and do computations and orchestration there, it should feel quite similar. Yeah, exactly. We have to also
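To illustrate that local test loop, a hedged sketch with the same era's API (the transform, column names, and numbers are made up): because the function is a pure data-frame-in, data-frame-out transform, it can be exercised in an ordinary unit test, with persistence handled separately by resources.

```python
import pandas as pd
from dagster import execute_solid, solid

@solid
def add_margin(context, orders):
    # Pure transform: no knowledge of where the frame came from
    # or where it will be persisted.
    return orders.assign(margin=orders["revenue"] - orders["cost"])

def test_add_margin():
    df = pd.DataFrame({"revenue": [100.0], "cost": [60.0]})
    result = execute_solid(add_margin, input_values={"orders": df})
    assert result.success
    assert result.output_value()["margin"].iloc[0] == 40.0
```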
Starting point is 00:35:04 remember that the world isn't just dbt. You know, organizations have a much, much richer and more custom, I suppose, environment and set of needs than just SQL transformations. And you mentioned there sort of Python. I mean, how do I run Python code is the most common question I get asked by clients, you know, in that environment there. So maybe just one last thing here. I'm conscious of time for you,
Starting point is 00:35:25 but where does Elementl come into this? And what does Elementl add to this, really? Because I mentioned at the start that you're the kind of co-founder of that company. So how do you see that? And what's, I suppose, Elementl's business model, really, around this? Right.
Starting point is 00:35:38 So currently we're an open source project. So Elementl is kind of the corporate host of Dagster. We're a venture-backed company. And so it's kind of mimicking the process of developing, you know, technology inside of companies. And we're doing this by having our design partners, and iterating the technology and working with them and partnering with them to kind of get the APIs and the system right. And then, you know, we are a for-profit company.
Starting point is 00:36:10 So we will be launching a commercial product that will both host the infrastructure for folks, as well as provide value-added features on top of that, that make sense to centralize and put in a proprietary context. So really, Elementl's purpose in the short term is to host Dagster, make it successful, and then build an awesome business on top of it. Fantastic, fantastic. So, Nick, people will be hearing this and thinking, this is really interesting.
Starting point is 00:36:39 How do people find out more about Dagster? How do they get to play around with it? How do they get their feet wet, really, in understanding how the product works? Totally. So we're an open source project, so we use very familiar tools. So you can go and check out our project on GitHub. We are very active on our Slack. So if you go to Dagster.io, you can link to the GitHub. You can go to our Slack and ask questions. And then, in order to start playing with it, I would recommend looking at our docs and our tutorial. And, you know, once you get your Python environment up and running, which is everyone's favorite thing, you can, within just a minute, I would say, get this thing installed and kind of write your
Starting point is 00:37:21 first line of code in it. So yeah, GitHub, Dagster.io, and you're off to the races. Oh, it sounds like you've had that experience as well.
Starting point is 00:37:39 Yeah, yeah, exactly. In a minute, yeah. I looked at the Kubernetes one, and I had an idle play around with that first of all, and I thought that is probably slightly overkill for what I was trying to do. But no, I just pip installed it and got it running, like you say. And I was also impressed, as much as this could come across as a slightly technical conversation, an engineering conversation, with the user experience, the actual UI of Dagit. I was really impressed by it, so, you know, well done for that, really. I think as well as being a very strong product under the covers, it's a very nice product to work with as well, from a developer perspective.
Starting point is 00:38:08 Yeah, thanks for that. It's something we really focus on. And just to dig into that for a second, we really think it's important to bring a design orientation to developer tools. And not just to make them beautiful and pretty, which we hope Dagit is, but also to bring the thought process to it. Because in my experience, if you cannot elegantly visualize or manipulate the software abstraction, or interact with it via a tool, it often indicates an
Starting point is 00:38:38 underlying problem. So it's important to have an end-to-end design process, where you can kind of theorize about what the right software structure is. But then, if you go to build a tool that leverages it and something doesn't work, you go back and say, like, maybe there's something fundamentally wrong. The other thing is, we've really focused on making Dagit approachable and fun to use, for I would say two primary reasons. One is that people end up spending a lot of time in these orchestration tools, because it's where errors happen, and it's where you have to debug stuff, and it's where you monitor stuff. So you just kind of want to make it fun to be in, and that's really great. And then second of all, we also want to expand the orchestration platform to more and more stakeholders. Like, for example, we have several
Starting point is 00:39:32 users. My favorite example is probably Good Eggs, where they power their data platform on Dagster. And they have actually trained people in their physical warehouse, on the floor, to use our web tools to manage their pipelines operationally, because they have contractors manually inserting stuff into Google Sheets, and then that gets ingested. And that's how they calculate comp and important aspects. And if something goes wrong in that, the manager can go and talk to the contractor and be like, oh, you need to fix this field, and then they restart the job. And they can do that without bringing any engineers into the loop. So that sort of empowerment of end-to-end ownership is really important and empowering, and you need approachable, fun tools in order to enable that. Yeah, fantastic. Well, look,
Starting point is 00:40:28 Nick, it's been great having you on the show. Really appreciate you taking the time to speak to us, and yeah, fantastic product. I thoroughly recommend using it, and we'll put links in the show notes to downloads and also to the various white papers you've written as well, and so on. So Nick, thanks very much for coming on the show. It's been great to have you. Yeah, awesome. It's a pleasure. Glad we finally got to connect. Thank you.
