Software Huddle - Generative AI and LLMs with Dash Desai from Snowflake
Episode Date: September 26, 2023

If you've been involved with the Snowflake world, today's guest probably doesn't need an introduction, as he is the demo king from the Snowflake Summit and well known within the Snowflake builder community. We're talking about Dash Desai, Developer Advocate at Snowflake. The background on this episode is that Sean's been part of the Snowflake Data Superhero Program, has been involved in the community for a few years, and has spoken at the last two Snowflake Summits. After the past event in June, he got the idea that it might make for a fun podcast to go back through some of the announcements from the event and discuss what they might mean for those building with Snowflake, and maybe even get some of those who aren't building with Snowflake excited about it. So even if you're not working with Snowflake, we keep things pretty high level during this interview, and we think there's probably something for everyone. Snowflake is clearly making a big push around supporting LLMs and generative AI workloads with things like Snowpark Container Services, Document AI, native support for NVIDIA NeMo, and a bunch of other things that we get into today. There's been a ton of announcements in this space even since the Summit, and we'll cover some of that stuff down the road. Anyway, we hope you enjoy the show.
Transcript
Hey everyone, this is Sean Faulkner, one of the hosts of Software Huddle.
So if you've been involved at all with the Snowflake world, the world of Snowflake,
you know, today's guest probably doesn't need an introduction as he is the demo king from the Snowflake Summit
and well known within the Snowflake builder community.
I'm talking about Dash Desai, Developer Advocate at Snowflake.
The background on this episode is that I've been part of the Snowflake Data Superhero Program
and also involved in the community for a few years.
And I've spoken at the last two Snowflake summits.
And after the past event in June,
I got this idea that it might make for a fun podcast
to go back through some of the announcements
from the events and discuss what they might mean
for those building with Snowflake
and maybe even get some of those people who aren't building with Snowflake excited about it. So even if you're
not working with Snowflake, we keep things pretty high level during this interview. And I think
there's probably something for everyone. Snowflake is clearly making this big push around supporting
LLMs and generative AI workloads with things like Snowpark containers, Document AI, and native support for NVIDIA Nemo
and a bunch of other things that we get into today.
There's been a ton of announcements even since the summit in this space
with things that Snowflake is coming out with,
and I'll probably cover some of that stuff down the road.
Anyway, I hope you enjoy the show.
Please remember to subscribe or hit me up over email or Twitter
if you have questions or suggestions.
All right, now let's get to the interview. Dash, welcome to the show. Hey, thank you, Sean. Really appreciate
you having me today. Let's have you first start off by introducing yourself. Who are you? What
do you do? And how'd you get to where you are today? Absolutely. So hello, everyone. My name
is Dash. I'm part of the DevRel team here at Snowflake.
And for the last 15 or 20 years, I've been spending a lot of time on data engineering, software engineering, data science, and machine learning.
And the reason why I'm here today in this position
or in this place, if you will, as a DevRel or developer advocate
is because even when I was working as an engineer,
I really loved helping other developers,
writing technical content,
creating demos and things like that.
So the transition itself was very easy,
but also in terms of like how passionate I was,
it was really beneficial to make the move.
And yeah.
Awesome.
And then how big is the developer relations organization at Snowflake?
I'd say around 10 people right now, which includes our community managers and folks that manage user groups and things like that.
So definitely not more than 15, I would say.
Yeah, so pretty small relative to the size of Snowflake
as an entire company and organization.
Absolutely, yeah.
And it's amazing what the team was able to accomplish.
Yeah.
What is sort of the breadth and scope of work that,
you know, yourself and other people within DevRel
are in charge of and managing?
A lot of things that we do as advocates
is create technical content,
not only for internal enablement,
but also for the communities.
And a lot of times it's also about getting the feedback from community members, getting it into
the product lifecycle and things like that. So most of our time is spent on content, technical
content, and trying to see like what everyone's kind of looking for and also what issues people are running into,
things like that,
and how we can help make our product better
from that standpoint.
Okay, great.
So sort of owning both the external engagement
and also the feedback loops to the internal teams.
Absolutely, yeah.
So we're talking about the Snowflake Summit today
and all the big recent announcements that took place there.
That was about a month ago at the time of this recording.
So first of all, how was the summit for you?
Was it everything you expected?
This was your second summit?
Or how many times have you been there?
This was my second summit, yes.
For me, it was amazing.
A lot of the wishes came true.
For example, last summit, I had the opportunity to create the demo for the keynote.
And this year, my goal was to not only create the demos or help create the demos,
but also present beyond the stage. And that's what I got to do alongside a lot of amazing product managers that helped with
the whole process and also the presentation part of it.
But yeah, it was an amazing summit for me personally.
And overall, it was great because of all the announcements that we made during the summit. Yeah. So how was that experience, you know, being on stage, you know,
just there's several thousand people watching you, plus all the thousands who are gonna be watching the
video later, standing up there having to do a live demo. Absolutely amazing. I actually loved
it a lot. Everything, everything went flawless. Nothing broke,
especially during the main keynote. I was wearing
a ski jacket and ski goggles
and I was helping presenters
do the live demos. Everything went amazing. So from that standpoint, it was
awesome.
I usually get nervous when doing live demos, like most people do.
But something happens when you are on stage and everything just flows. It's really hard to describe, but I was really glad that everything went flawless.
And then on the second day, we had other team members join us
on stage. It was like a play, if you will.
And we got to act with
colleagues and peers. We had an amazing script written by one of
our awesome PMMs, Julian.
And kudos to him for coming up with this idea and he wrote
the script and we all played parts.
And along the way, the story was told, and we did different live demos during the entire
keynote.
So that was amazing.
We had a glitch in the middle of the Builder keynote.
Internet connection went down, obviously.
And the great thing about that was that we were able to engage with the audience.
On the fly, we told jokes and people loved it.
It was like that thing stole the show.
Yeah, I think I was there, and I definitely think the jokes landed really well. I think when you get in those situations where something goes wrong, one, everybody has empathy for you in that situation anyway, and many of us have had that happen to them, so it's not like, oh my god, how could these people let this happen? But you handled it in such a fun, natural way, which is the only way to sort of do it, and it comes across as very authentic. So I think that really engaged people. Do you remember any of the jokes that were told during that session?
Absolutely. You want me to tell a couple?
Yeah, go for one.
Yeah, so a DBA walks into a bar and says, can I join these two tables?
That's a classic one.
Yeah.
The one that I remember was,
how come you can't see elephants hiding in trees?
And the answer being, because they're really good at it,
which tickled me.
Yes, I love a good dad joke.
Yeah.
They're all dad jokes. Yeah, they're all dad jokes.
Yeah, yeah, that's good.
So this was also my second summit and I thought it was fantastic.
One of the things that really stood out to me this time was,
besides, of course, all the LLM talk,
but was the increased focus
on sort of like builders and practitioners
and those actually building on Snowflake.
The size of the sort of the community area and the Data Hero stage and so forth was like,
you know, three times the size of what it was the prior year.
Yeah, I totally agree.
Last summit, in the Builders Theater, if you remember, we had sessions just like this summit, but then to present the slides and do the demos, we had one small, maybe 40-inch LCD for people to follow along with the workshops.
All that was a hundred times better this summit.
The screen was huge.
It was like the thing that you could see from anywhere.
It was high up, huge screen, I don't even remember the size.
And then we all had, well, the audience had a place to sit and plug in their laptops, follow along the workshops, tables, chairs.
Yeah, it was at least 10 times better, if not more.
Yeah, yeah, it was awesome.
So I want to start to talk about some of the big announcements and the features and what that might mean to people in the engineering community.
So last year, one of the big announcements was about native apps, which at the time were in private
beta. And the idea of native apps, as I understand it, is that instead of essentially bringing the
data to the app, let's bring the app to the data. We have all this data. It's expensive to move
around. Let's just run the app closer to the data. You know, it makes a ton of sense. Is that a fair
description in your perspective of a native app? It is. And just so that
people are a little bit more aware and have the context,
Snowflake native applications, as you said, were announced
in public preview at this summit. It basically provides developers a way
to package applications so that anyone can
actually consume those apps.
And Snowflake Marketplace is such a central place for Snowflake users
to discover and install these applications.
It also allows vendors to run applications directly in consumers' accounts.
So what does that entail?
Basically, there's better data security and governance
because the data never leaves your Snowflake account.
And no one outside the org can access it.
And everything leverages basically the same role-based access control
as the rest of your account or consumer's account.
It also simplifies the procurement process in a lot of ways
because vendors, a lot of times on other platforms,
they may need to go through a lot of review processes
depending on how many apps they have.
But in Snowflake Marketplace,
they would have to do it probably just once
for all the apps and not every single time.
I see.
Yeah, because you're basically just doing
some sort of procurement process for the marketplace in general rather than a specific application.
Yeah. And then the other big advantage from my point of view for developers is the flexible pricing model.
You can have custom billing models that you come up with on your own.
You can also set free trials, if you like.
And the consumers of the apps
also have the opportunity to pay
using their existing contract capacity.
So lots of different options and yeah.
Oh, that's cool.
Like essentially, if you have
like budget available on Snowflake,
you can repurpose a budget into purchase of apps.
Right.
And then as far as building apps,
you can use any IDE of your choice.
You can write code in Python and package all that up.
And there's several ways you can package these apps.
You can use the UI within Snowflake, Snowsight.
You can also use extensions that we have,
for example, for Visual Studio Code.
You can log into your account and then push all the code right from your editor, too.
So if I want to build a native app, so I understand I can use my IDE, I can write that code, say in Python,
I can package it up and start distributing it. But like, what kind of like, how do I start? Like,
what is, can you kind of walk me through what's basically the process of building like the
equivalent of a Hello World application for native apps? Yeah, so you start off by writing your own code.
Let's say Python code, hello world.py.
What you can then do is upload this code onto a Snowflake stage.
So stage is just a way to have your, you know, not only just data files and things like that, but also your code.
So everything's inside your account.
Once you upload it, you create an application package.
So application package allows you to have different versions of the application,
and you can also have different patches for each version.
Once you do that, then it's a matter of how you want to distribute the application.
So there's two different ways you can distribute it. One is generally, into the Snowflake Marketplace, so anybody can see it and anybody can install it. And there's also a way
for you to directly share it with only specific accounts. Does that help?
And then how's it work for like testing, essentially? I'm assuming I have like a published version of my application,
but then I'm going to have, you know,
the version of the application that's in active development.
Right. So there's a way you can install the app
and test it locally before you share it.
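To make that flow concrete, here is a minimal sketch in Python with Snowpark, pieced together from the Snowflake native apps docs. All object names (hello_world_pkg, app_stage, hello_world_app) are hypothetical, the connection parameters are placeholders, and a real package would also need a manifest.yml and a setup script on the stage:

```python
from snowflake.snowpark import Session

# Hypothetical connection parameters; fill in your own account details.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "role": "<role>",
    "warehouse": "<warehouse>",
}).create()

# 1. An application package holds versions (and patches) of your app.
session.sql("CREATE APPLICATION PACKAGE IF NOT EXISTS hello_world_pkg").collect()

# 2. A stage inside the package holds the app code and artifacts.
session.sql("CREATE SCHEMA IF NOT EXISTS hello_world_pkg.code").collect()
session.sql("CREATE STAGE IF NOT EXISTS hello_world_pkg.code.app_stage").collect()
session.file.put("hello_world.py", "@hello_world_pkg.code.app_stage", auto_compress=False)
# (A real package also needs a manifest.yml and a setup script on the stage.)

# 3. Register a version of the package from the staged files.
session.sql("""
    ALTER APPLICATION PACKAGE hello_world_pkg
    ADD VERSION v1 USING '@hello_world_pkg.code.app_stage'
""").collect()

# 4. Install the app in your own account to test it before sharing or listing it.
session.sql("""
    CREATE APPLICATION hello_world_app
    FROM APPLICATION PACKAGE hello_world_pkg
    USING VERSION v1
""").collect()
```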
And then can I also choose to use apps
just like as a private application for my own company?
Absolutely. Yeah. That's where the private share comes into picture.
Then you can just share it with an account in the same org or a different org.
But it's basically a direct share. So it wouldn't get listed
in the marketplace. It would just be listed under the
account that you've shared the app directly with.
And then within a native application,
can I call out to third-party APIs as part of that?
I believe so, yes.
Okay, so if I wanted to do something like,
I don't know,
language translation or something like that,
then I could essentially use a third-party API
to do the translation over the data that I have
and still create, I don't know,
go from the English to French version
or something like that.
Yeah, and then that's a good point.
A lot of these things,
when you're releasing
or trying to list the application
into a marketplace for anybody to use,
it actually also goes through
rigorous security scans
before it gets listed.
So as long as you comply,
it will be okay to list it.
Yeah, that makes sense.
You don't want to be installing an app
that basically blows away
all your stuff, like your data.
Right.
Yeah.
And then what about
essentially distribution?
Is this something that I can connect into, like, a CI/CD pipeline?
Like, how do I go about actually taking my local application that I've tested
and taking it, you know, live and distributed on Marketplace?
Is that mostly using the tooling available in Snowflake today?
Correct.
Okay.
And then what are some of the native applications that exist right now
that might be interesting for people to explore?
Ooh, off the top of my head, I'm not sure which ones there are, but there are a lot.
The one that comes to mind is Capital One's.
I'd have to look it up.
But if you go to the marketplace, there's quite a few.
Okay, cool. And then one of the other big announcements that I was excited about is around Snowpark Containers, which is bringing essentially Kubernetes to Snowflake.
Just for people who are completely unfamiliar with this idea, what problem does this solve for people that were using Snowpark before this support came into play?
Absolutely. So Snowpark Container Services,
I'm actually really excited about
this too. I can't wait
to build all kinds of apps.
Because what it allows
you to do, or anybody, is
run a container, including
a GPU-powered container, in
Snowflake. And behind the scenes,
it's running on the industry standard
Kubernetes container orchestrator.
But the good news there is that
you don't have to learn Kubernetes to use it.
Kubernetes can be a heavy lift.
It's pretty complex.
There's like certifications out there.
I'm not certified.
But the idea is that Snowflake takes care of all of this
and it'll scale your container with your data, right?
So it takes a lot of guessing
and a lot of the admin stuff out of your or developers' hands.
And with containers,
you can basically create any application that you can think of.
That's the beauty of it also.
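For a rough sense of what that looks like in practice, here is a simplified sketch based on the Snowpark Container Services preview docs. The compute pool, service, and image names are hypothetical, it assumes a container image has already been pushed to a Snowflake image repository, and it reuses a Snowpark session:

```python
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()  # assumes a default connection is configured

# A compute pool provides the nodes the service runs on; the instance
# family value comes from the docs, and the names here are made up.
session.sql("""
    CREATE COMPUTE POOL IF NOT EXISTS my_pool
      MIN_NODES = 1
      MAX_NODES = 1
      INSTANCE_FAMILY = CPU_X64_XS
""").collect()

# The service is defined by a Kubernetes-like YAML spec, but Snowflake
# handles the orchestration; you never touch Kubernetes directly.
session.sql("""
    CREATE SERVICE my_service
      IN COMPUTE POOL my_pool
      FROM SPECIFICATION $$
        spec:
          containers:
          - name: app
            image: /my_db/my_schema/my_repo/my_image:latest
      $$
""").collect()
```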
So how should I think about the difference between native apps
and Snowpark containers when it comes to building?
What would I do with the Snowpark container
that maybe I wouldn't do with native apps or vice versa?
That's actually a great question,
because I don't know if you recall, during Summit
there was an application
that I demoed with
an image of ski goggles. We were
trying to ask questions off of an image,
and it was giving responses based on
what it saw:
whether there was damage to the
ski goggles, what the model number was, things like that.
So that application was actually packaged
as a native app in a container.
So container is like the umbrella
and you can basically containerize anything,
including native apps.
So I don't know if that answers the question,
but I just feel like, yeah,
container services is amazing
and you can build native apps in a container if you choose to,
but also you don't have to.
Okay.
I've also, in my own work with Snowflake,
used Snowpipe a bunch for batch loading of data into Snowflake.
And it's something that's been around for a while,
but this year Snowpipe Streaming was announced.
So what problem is this addressing that essentially the batch mode for Snowpipe wasn't able to solve?
So the biggest difference between Snowpipe and Snowpipe Streaming is that Snowpipe was designed to ingest files, and Snowpipe Streaming is designed to
ingest records in an incremental fashion.
It is.
Yeah.
So that could be from something like a Kafka stream?
Yes.
Or even some other like sort of bespoke solution for, you know, WAD files or something like
that.
Yeah.
So Snowpipe Streaming right now
allows you to load data in Snowflake
directly from either Apache Kafka
or a custom Java application.
Okay, so essentially it provides like an SDK
that I can use as like a way to abstract
my streaming process directly into Snowflake.
Yes, yeah.
And it's a very effective way to load data.
And then, like I mentioned,
the other strategies are using Snowpipe for files,
or COPY INTO for self-managed warehouses,
where they both involve loading your files
from a stage into Snowflake,
versus ingesting data almost in real time
using Kafka or a Java application.
Yeah, so it seems like there's been a lot of work in this space, in particular with
Snowflake around, I guess, making things more real-time or dynamic, less batch load.
Related to that is this concept of dynamic tables, which I think was announced last year as private beta and is now in public beta.
So can you talk a little bit about the problem that dynamic tables solve?
What is it that data engineers are getting out of the value of dynamic tables?
Right. So dynamic tables allow you to create data pipelines using just the SQL that you use on a daily basis, right?
You can use joins, unions, aggregations, window functions, and what have you.
And another cool thing about that is the latency is user-defined.
So you can control that by just a single parameter.
And the data refreshes as low as a minute as of today.
And then hopefully that will also change in the future.
But the biggest differentiator is
the automatic incremental refreshes:
they refresh only what's changed,
even for really, really complex queries, automatically,
including updates and deletes.
And then all dynamic tables in a DAG, for example,
are refreshed consistently from aligned snapshots.
So if I understand this correctly, I can essentially create a query that is going to
generate the input to the dynamic table. And then whenever the underlying tables that are used by the query get updated, they'll update the data in the dynamic table. Is that right?
Right. Yeah.
What is the difference between this and, say, like a view?
Materialized views get very expensive if the upstream table is being frequently updated.
But with dynamic tables, Snowflake will update them based on the latency that you as a user define,
which gives you more control over how often these tables are updated
and what the associated compute costs are.
I see.
So if I'm doing a bunch of analytics, but maybe it's not that time sensitive, I could
choose to have the dynamic table like update like once a week or something like that rather
than every minute.
And that'll save me some money.
Yeah.
And then the other big difference is dynamic tables are also like a first class UI citizen
of Snowflake. So you can easily visualize the entire DAG, their dependencies, and you can also see all
the historical runs, including the status, how many rows were changed, updated, and things
like that.
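As a concrete example, a dynamic table is defined with ordinary SQL plus the user-defined lag Dash mentions. A minimal sketch, with hypothetical table, column, and warehouse names:

```python
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()  # assumes a default connection is configured

# A dynamic table defined over an ordinary query; Snowflake keeps it
# incrementally refreshed within the target lag you choose. All of the
# table, column, and warehouse names here are hypothetical.
session.sql("""
    CREATE OR REPLACE DYNAMIC TABLE daily_region_totals
      TARGET_LAG = '1 minute'   -- the user-defined latency parameter
      WAREHOUSE = my_wh         -- compute used for the refreshes
    AS
      SELECT o.order_date,
             c.region,
             SUM(o.amount) AS total_amount
      FROM orders o
      JOIN customers c ON c.customer_id = o.customer_id
      GROUP BY o.order_date, c.region
""").collect()
```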
Okay.
All right.
So let's talk AI and LLMs, but maybe before we get into some of the announcements that
were made at the summit, what are your thoughts on how generative AI might
impact the Snowflake builder community? Well, I think just generally
speaking, it's going to be, for developers,
the sky's going to be the limit because, especially with the
Snowpark container services, you're going to be able to bring
in the models into your own environment
so you don't have to send the data out
for training these LLMs
and bringing whatever back, right?
So the data is going to be
within your existing data governance
and security boundaries.
And I think that's going to be
the biggest game changer.
And then I guess, I think too, from like an application standpoint, you know, some of the things I've seen where people
are leveraging LLMs to be where you can essentially, you know, just speak in or write in English,
essentially, or whatever language of your choice, what you need from the data, and then behind the scenes is able to translate that into queries.
It kind of democratizes access to data to some degree.
Like essentially things that, you know,
a business person within a company
might have had to go to the analyst in their company beforehand
and have that analyst run or like write a custom query to pull some data.
Now the business person might
actually be able to do that directly through some sort of LLM-type interface directly into the data.
Yes, absolutely.
So in some ways, I would imagine it lowers the barrier of
entry for analysts, but then the actual analysts can spend more time on sort of more complex use cases
and not have to deal with one-off requests
from people within their company
asking them to write a bespoke SQL query
to pull data for them.
Right, and then also focus on the insights
and what else can they gather out of the data
based on insights that they were given
just based on some comment
that they wrote, right? Not the actual
SQL. Yeah, so it's more
about doing the analytics
or the analysis part of the job
rather than pulling the data.
So the big announcement
the first night at the summit
was that NVIDIA NeMo would
be running natively on Snowflake.
What does this mean for people and companies investing in LLM applications
when it comes to Snowflake?
Right. So at Summit we announced a partnership with NVIDIA,
which will enable Snowflake to integrate NVIDIA's NeMo LLM framework
into Snowflake.
So that's going to allow ML engineers and data scientists
to build LLMs directly in Snowflake
using their own data.
I think that's huge.
We already demoed, like you said,
during keynote, how you
can not only
bring LLMs, but also
natively
train models
on NVIDIA GPUs.
Right?
And then what about
LLMs that are outside of the
NeMo framework? Is there
an opportunity to be able to run those
within Snowflake or is that something that
maybe is not there today?
So as part of
Snowpark Container Services, you will be
able to bring in any model into Snowflake, not just NeMo.
There's actually already a blog that we published maybe a couple of weeks ago where we've taken Llama 2, Meta's latest LLM, and used that within Snowpark Container Services.
Okay.
What are some of the other LLM-related announcements
from Snowflake that people should know about?
Yeah, so we announced Document AI.
Snowflake acquired a company called Applica,
which specializes in auto-analyzing
unstructured data.
And with that, what you can do is leverage the technology. It's actually also Snowflake's
first LLM, and it enables you to ask questions about PDF documents.
Like, for example, during the summit, when we demoed, we had 15-plus PDF documents
that we were asking questions about. Not only that, there's also a way for users to give feedback,
like make corrections to improve the model accuracy, and then retrain the model, which you
can then republish or deploy. And another cool thing is these models, you can use them in SQL queries. So it's
not only spanning across different
technologies within Snowflake,
but also across teams
where they can collaborate based on
their requirements.
And then how do I go about using that?
Is that something that I use
via an API
so I can integrate it into an existing application?
So that actually is going to be part of the Snowflake UI, Snowsight.
That's going to be a new option like we demoed during keynote.
The name could change, so I can't speak to that.
But basically, it's going to be embedded within the UI.
And all you have to do is basically follow instructions to upload documents.
And then there's a UI for you to ask questions.
You get a score along with the answer.
If you're not satisfied, you can change the answer and also hit train to retrain the model.
And then once the model is trained, you can use that in a SQL worksheet, for example.
The SQL queries.
Okay. Okay.
Yeah.
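The exact interface was still in preview at the time of this recording, so take this as a sketch only: in current Snowflake documentation, querying a trained Document AI model from SQL looks roughly like the following, where the model name (invoice_model) and stage (@doc_stage) are hypothetical:

```python
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()  # assumes a default connection is configured

# Running a trained Document AI model over every PDF in a stage from SQL.
# invoice_model and @doc_stage are hypothetical; the <model>!PREDICT
# pattern follows the current Document AI documentation.
rows = session.sql("""
    SELECT RELATIVE_PATH,
           invoice_model!PREDICT(
               GET_PRESIGNED_URL(@doc_stage, RELATIVE_PATH), 1
           ) AS extracted_answers
    FROM DIRECTORY(@doc_stage)
""").collect()

for row in rows:
    print(row["RELATIVE_PATH"], row["EXTRACTED_ANSWERS"])
```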
And it also seems like a lot of the major sort of like products across Snowflake, from Snowpipe to Snowpark
to native apps are kind of incorporating features
that allow you to either build with generative AI or
essentially leverage generative AI functionality directly within those tools to make Snowflake
easier to use.
And one other one that I personally played around with is with Streamlit.
Streamlit now supports an easy way to build a ChatGPT-style chat
UI directly
within it. So there's
this heavy investment
in this direction.
I guess, what do you sort of see
as the
future in this space? Where do you think
some of this stuff is leading for
people who
use Snowflake and are looking
to build on it in the future?
I think
the way I look at it is one platform
for everything.
You can bring everything
in closer to your data.
So I think the key,
to me, if I was a developer,
is not only not having to incur costs for moving the data around, but also how many technologies, how many different things I would have to have in my tech stack to move data around and to process the data, versus having everything within the same governance and security boundaries.
And everything else that I need, I can actually
bring in without any issues, right?
That to me is huge in terms of like how I can build, how I can package, how I can distribute.
Yeah, you're basically simplifying your tool chain if you're comfortable kind of living
within the Snowflake ecosystem, both from your traditional data pipeline, data analytics standpoint, and
also now in the world of
generative AI. Because if you look at some
of the tool chains
that are out there now in the
AI space, there's a lot of
tooling that you have to learn
to stitch together, essentially,
just to do something like
customize a
pre-trained model.
Right.
And another good example is Streamlit,
like you mentioned earlier.
You can create these amazing applications
without having to know HTML, CSS, and things like that.
Not only that, the Streamlit in Snowflake,
it's a first-class citizen,
first-class object in Snowflake,
just like a table is.
So that means you can apply
role-based access on these
applications. You can share them
just like you would table and
databases and things like that.
Yeah, that's pretty cool.
I'm someone who's
been using HTML,
CSS, JavaScript for a very long time, so
I'm very comfortable with it. But even despite that,
I found working with Streamlit quite lovely, I guess, is the best way to describe it.
It was very, very easy to go from essentially nothing to having a little LLM ChatGPT-type UI going in like an hour.
Yeah, and then how many lines of code, right? Maybe less than 100, probably?
Yeah, definitely. It was like less than 100 lines of code.
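For reference, the chat primitives being described here are Streamlit's st.chat_input and st.chat_message. A minimal sketch with a stubbed-out model response (you'd swap in a real LLM call) does indeed come in well under 100 lines:

```python
import streamlit as st

st.title("Chat demo")

# Keep the conversation across Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

# st.chat_input renders the chat box and returns None until the user submits.
if prompt := st.chat_input("Ask me anything"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

    # Stub: swap in a real LLM call here (OpenAI, a model in Snowflake, etc.).
    reply = f"You said: {prompt}"

    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)
```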
Yeah, I used to write HTML, CSS, and JavaScript too.
And just thinking back how many JavaScript libraries you had to pull in,
make sure they load.
They're not taking too much time to load,
so you have to put them at the bottom instead of the top.
Minify them, all kinds of stuff.
Yeah, it becomes an entire ordeal just to do something simple.
Yeah.
And then you need a web server to run those.
Yeah, absolutely.
So as we start to wrap up, uh, Dash, is there anything else you'd like to share?
Uh, you know, thoughts on the summit, thoughts on anything related to AI and
some of the moves that Snowflake's making?
Uh, no, just a couple of things I'd like to mention along the same lines. We demoed, and it's not an official name by any means, comment-to-text, or comments-to-SQL
to be more specific, where you'll be able to just write out a comment and then have a really, really complex SQL query
generated for you.
And the comment is basically
plain English, right? So
that was the other cool thing that I
kind of wanted to mention. And everything's running
in Snowflake, using your data
that the model learns from.
Yeah, so that comments-to-code, that's kind of like
being able to translate
like pseudocode or even
less formal than pseudocode
into like
a complex SQL query. Again, kind of
lowering the barrier to entry.
Yeah, including creating dynamic tables
and multiple joins and aggregations
and what have you.
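To make the comment-to-SQL idea concrete, here is a purely invented before-and-after. The comment is the only thing a user would write; the SQL below is a hypothetical example of what might be generated, not actual output from the demo:

```python
# Purely hypothetical illustration of the comment-to-SQL idea: the comment is
# all the user writes; the SQL is an invented example of what might come back,
# not actual output from the demo.
comment = "-- total sales by region for the last 7 days, top 5 regions"

generated_sql = """
SELECT c.region,
       SUM(o.amount) AS total_sales
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY c.region
ORDER BY total_sales DESC
LIMIT 5;
"""

print(comment)
print(generated_sql)
```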
Alright, well, Dash, thank you so much
for being here. It was
great witnessing all your live demos at Snowflake Summit,
and it was really fun to be there.
For anybody who's listening that is interested in the data space,
the Snowflake space, I highly recommend the conference.
I think it's really, really well done, and there's a lot of people there.
You'll feel like you're in good company,
and most likely I'll be there, so you can come say hi.
Yeah, thank you so much, Sean. Really appreciate you having me.
Really enjoyed our conversation. Yeah.
Thank you so much. Cheers.
Yeah. Thanks.