Software Huddle - Generative AI and LLMs with Dash Desai from Snowflake
Episode Date: September 26, 2023

If you've been involved with the Snowflake world, today's guest probably doesn't need an introduction, as he is the demo king from the Snowflake Summit and well known within the Snowflake builder community. We're talking about Dash Desai, Developer Advocate at Snowflake. The background on this episode is that Sean's been part of the Snowflake Data Superhero Program, has been involved in the community for a few years, and has spoken at the last two Snowflake Summits. After the past event in June, he got the idea that it might make for a fun podcast to go back through some of the announcements from the event and discuss what they might mean for those building with Snowflake, and maybe even get some of those who aren't building with Snowflake excited about it. So even if you're not working with Snowflake, we keep things pretty high level during this interview, and we think there's probably something for everyone. Snowflake is clearly making a big push around supporting LLMs and generative AI workloads with things like Snowpark Container Services, Document AI, native support for NVIDIA NeMo, and a bunch of other things that we get into today. There's been a ton of announcements in this space even since the Summit, and we'll cover some of that stuff down the road. Anyway, we hope you enjoy the show.
Transcript
Hey everyone, this is Sean Faulkner, one of the hosts of Software Huddle.
So if you've been involved at all with the Snowflake world, the world of Snowflake,
you know, today's guest probably doesn't need an introduction as he is the demo king from the Snowflake Summit
and well known within the Snowflake builder community.
I'm talking about Dash Desai, Developer Advocate at Snowflake.
The background on this episode is that I've been part of the Snowflake Data Superhero Program
and also involved in the community for a few years.
And I've spoken at the last two Snowflake summits.
And after the past event in June,
I got this idea that it might make for a fun podcast
to go back through some of the announcements
from the events and discuss what they might mean
for those building with Snowflake
and maybe even get some of those people who aren't building with Snowflake excited about it. So even if you're
not working with Snowflake, we keep things pretty high level during this interview. And I think
there's probably something for everyone. Snowflake is clearly making this big push around supporting
LLMs and generative AI workloads with things like Snowpark containers, Document AI, and native support for NVIDIA Nemo
and a bunch of other things that we get into today.
There's been a ton of announcements even since the summit in this space
with things that Snowflake is coming out with,
and I'll probably cover some of that stuff down the road.
Anyway, I hope you enjoy the show.
Please remember to subscribe or hit me up over email or Twitter
if you have questions or suggestions.
All right, now let's get to the interview. Dash, welcome to the show. Hey, thank you, Sean. Really appreciate
you having me today. Let's have you first start off by introducing yourself. Who are you? What
do you do? And how'd you get to where you are today? Absolutely. So hello, everyone. My name
is Dash. I'm part of the DevRel team here at Snowflake.
And for the last 15 or 20 years, I've been spending a lot of time on data engineering, software engineering, data science, and machine learning.
And the reason why I'm here today in this position
or in this place, if you will, as a DevRel or developer advocate
is because even when I was working as an engineer,
I really loved helping other developers,
writing technical content,
creating demos and things like that.
So the transition itself was very easy,
but also in terms of like how passionate I was,
it was really beneficial to make the move.
And yeah.
Awesome.
And then how big is the developer relations organization at Snowflake?
I'd say around 10 people right now, which includes our community managers and folks that manage user groups and things like that.
So definitely not more than 15, I would say.
Yeah, so pretty small relative to the size of Snowflake
as an entire company and organization.
Absolutely, yeah.
And it's amazing what the team was able to accomplish.
Yeah.
What is sort of the breadth and scope of work that,
you know, yourself and other people within DevRel
are in charge of and managing?
A lot of things that we do as advocates
is create technical content,
not only for internal enablement,
but also for the communities.
And a lot of times it's also about getting the feedback from community members, getting it into
the product lifecycle and things like that. So most of our time is spent on content, technical
content, and trying to see like what everyone's kind of looking for and also what issues people are running into,
things like that,
and how we can help make our product better
from that standpoint.
Okay, great.
So sort of owning both the external engagement
and also the feedback loops to the internal teams.
Absolutely, yeah.
So we're talking about the Snowflake Summit today
and all the big recent announcements that took place there.
That was about a month ago at the time of this recording.
So first of all, how was the summit for you?
Was it everything you expected?
This was your second summit?
Or how many times have you been there?
This was my second summit, yes.
For me, it was amazing.
A lot of the wishes came true.
For example, last summit, I had the opportunity to create the demo for the keynote.
And this year, my goal was to not only create the demos or help create the demos,
but also present beyond the stage. And that's what I got to do alongside a lot of amazing product managers that helped with
the whole process and also the presentation part of it.
But yeah, it was an amazing summit for me personally.
And overall, it was great because of all the announcements that we made during the summit. Yeah. So how was that experience, you know, being on stage, you know,
just there's several thousand people watching you, plus all the thousands who are gonna be watching the
video later, standing up there having to do a live demo. Absolutely amazing. I actually loved
it a lot. Everything, everything went flawless. Nothing broke,
especially during the main keynote. I was wearing
a ski jacket and ski goggles
and I was helping presenters
do the live demos. Everything went amazing. So from that standpoint, it was
awesome.
I usually get nervous when doing live demos, like most people do.
But something happens when you are on stage and everything just flows. It's really hard to describe, but I was really glad that everything went flawless.
And then on the second day, we had other team members join us
on stage. It was like a play, if you will.
And we got to act with
colleagues and peers. We had an amazing script written by one of
our awesome PMMs, Julian.
And kudos to him for coming up with this idea and he wrote
the script and we all played parts.
And along the way, the story was told, and we did different live demos during the entire
keynote.
So that was amazing.
We had a glitch in the middle of the Builder keynote.
Internet connection went down, obviously.
And the great thing about that was that we were able to engage with the audience.
On the fly, we told jokes and people loved it.
It was like that thing stole the show.
Yeah, I think I was there, and I definitely think the jokes landed really well. I think when you get in those situations where something goes wrong, one, everybody has empathy for you in that situation anyway, and many of us have had that happen to them, so it's not like, oh my god, how could these people let this happen? But you handled it in such a fun, natural way, which is the only way to sort of do it, and it comes across as very authentic. So I think that really engaged people. Do you remember any of the jokes that were told during that session?
Absolutely. You want me to tell a couple?
Yeah, go for one.
Yeah, so a DBA walks into a bar and says, can I join these two tables?
That's a classic one.
Yeah.
The one that I remember was,
how come you can't see elephants hiding in trees?
And the answer being, because they're really good at it,
which tickled me.
Yes, I love a good dad joke.
Yeah.
They're all dad jokes. Yeah, they're all dad jokes.
Yeah, yeah, that's good.
So this was also my second summit and I thought it was fantastic.
One of the things that really stood out to me this time was,
besides, of course, all the LLM talk,
but was the increased focus
on sort of like builders and practitioners
and those actually building on Snowflake.
The size of the sort of the community area and the Data Hero stage and so forth was like,
you know, three times the size of what it was the prior year.
Yeah, I totally agree.
Last summit, in the Builders Theater, if you remember, we had sessions just like this summit, but then to present the slides and do the demos, we had one small, maybe 40-inch LCD for people to follow along with the workshops.
All that was a hundred times better this summit.
The screen was huge.
It was like the thing that you could see from anywhere.
It was high up, huge screen, I don't even remember the size.
And then we all had, well, the audience had a place to sit and plug in their laptops, follow along the workshops, tables, chairs.
Yeah, it was at least 10 times better, if not more.
Yeah, yeah, it was awesome.
So I want to start to talk about some of the big announcements and the features and what that might mean to people in the engineering community.
So last year, one of the big announcements was about native apps, which at the time were in private
beta. And the idea of native apps, as I understand it, is that instead of essentially bringing the
data to the app, let's bring the app to the data. We have all this data. It's expensive to move
around. Let's just run the app closer to the data. You know, it makes a ton of sense. Is that a fair
description in your perspective of a native app? It is. And just so that
people are a little bit more aware and have the context,
Snowflake native applications, as you said, were announced
in public preview at this summit. It basically provides developers a way
to package applications so that anyone can
actually consume those apps.
And Snowflake Marketplace is such a central place for Snowflake users
to discover and install these applications.
It also allows vendors to run applications directly in consumers' accounts.
So what does that entail?
Basically, there's better data security and governance
because the data never leaves your Snowflake account.
And no one outside the org can access it.
And everything leverages basically the same role-based access control
as the rest of your account or consumer's account.
It also simplifies the procurement process in a lot of ways
because vendors, a lot of times on other platforms,
they may need to go through a lot of review processes
depending on how many apps they have.
But in Snowflake Marketplace,
they would have to do it probably just once
for all the apps and not every single time.
I see.
Yeah, because you're basically just doing
some sort of procurement process for the marketplace in general rather than a specific application.
Yeah. And then the other big advantage from my point of view for developers is the flexible pricing model.
You can have custom billing models that you come up with on your own.
You can also set free trials, if you like.
And the consumers of the apps
also have the opportunity to pay
using their existing contract capacity.
So lots of different options and yeah.
Oh, that's cool.
Like essentially, if you have
like budget available on Snowflake,
you can repurpose a budget into purchase of apps.
Right.
And then as far as building apps,
you can use any IDE of your choice.
You can write code in Python and package all that up.
And there's several ways you can package these apps.
You can use the UI within Snowflake, Snowsight.
You can also use extensions that we have,
for example, for Visual Studio Code.
You can log into your account and then push all the code right from your editor, too.
So if I want to build a native app, so I understand I can use my IDE, I can write that code, say in Python,
I can package it up and start distributing it. But like, what kind of like, how do I start? Like,
what is, can you kind of walk me through what's basically the process of building like the
equivalent of a Hello World application for native apps? Yeah, so you start off by writing your own code.
Let's say Python code, hello world.py.
What you can then do is upload this code onto a Snowflake stage.
So stage is just a way to have your, you know, not only just data files and things like that, but also your code.
So everything's inside your account.
Once you upload it, you create an application package.
So application package allows you to have different versions of the application,
and you can also have different patches for each version.
Once you do that, then it's a matter of how you want to distribute the application.
So there's two different ways you can distribute it. One is generally, into the Snowflake Marketplace, so anybody can see it and anybody can install it. And there's also a way
for you to directly share it with only specific accounts. Does that help?
And then how's it work for like testing, essentially? I'm assuming I have like a published version of my application,
but then I'm going to have, you know,
the version of the application that's in active development.
Right. So there's a way you can install the app
and test it locally before you share it.
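To make that flow concrete, here is a minimal sketch in Python with Snowpark, pieced together from the Snowflake native apps docs. All object names (hello_world_pkg, app_stage, hello_world_app) are hypothetical, the connection parameters are placeholders, and a real package would also need a manifest.yml and a setup script on the stage:

```python
from snowflake.snowpark import Session

# Hypothetical connection parameters; fill in your own account details.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "role": "<role>",
    "warehouse": "<warehouse>",
}).create()

# 1. An application package holds versions (and patches) of your app.
session.sql("CREATE APPLICATION PACKAGE IF NOT EXISTS hello_world_pkg").collect()

# 2. A stage inside the package holds the app code and artifacts.
session.sql("CREATE SCHEMA IF NOT EXISTS hello_world_pkg.code").collect()
session.sql("CREATE STAGE IF NOT EXISTS hello_world_pkg.code.app_stage").collect()
session.file.put("hello_world.py", "@hello_world_pkg.code.app_stage", auto_compress=False)
# (A real package also needs a manifest.yml and a setup script on the stage.)

# 3. Register a version of the package from the staged files.
session.sql("""
    ALTER APPLICATION PACKAGE hello_world_pkg
    ADD VERSION v1 USING '@hello_world_pkg.code.app_stage'
""").collect()

# 4. Install the app in your own account to test it before sharing or listing it.
session.sql("""
    CREATE APPLICATION hello_world_app
    FROM APPLICATION PACKAGE hello_world_pkg
    USING VERSION v1
""").collect()
```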
And then can I also choose to use apps
just like as a private application for my own company?
Absolutely. Yeah. That's where the private share comes into picture.
Then you can just share it with an account in the same org or a different org.
But it's basically a direct share. So it wouldn't get listed
in the marketplace. It would just be listed under the
account that you've shared the app directly with.
And then within a native application,
can I call out to third-party APIs as part of that?
I believe so, yes.
Okay, so if I wanted to do something like,
I don't know,
language translation or something like that,
then I could essentially use a third-party API
to do the translation over the data that I have
and still create, I don't know,
go from the English to French version
or something like that.
Yeah, and then that's a good point.
A lot of these things,
when you're releasing
or trying to list the application
into a marketplace for anybody to use,
it actually also goes through
rigorous security scans
before it gets listed.
So as long as you comply,
it will be okay to list it.
Yeah, that makes sense.
You don't want to be installing an app
that basically blows away
all your stuff, like your data.
Right.
Yeah.
And then what about
essentially distribution?
Is this something that I can connect into, like, a CI/CD pipeline?
Like, how do I go about actually taking my local application that I've tested
and taking it, you know, live and distributed on Marketplace?
Is that mostly using the tooling available in Snowflake today?
Correct.
Okay.
And then what are some of the native applications that exist right now
that might be interesting for people to explore?
Ooh, off the top of my head, I'm not sure which ones there are, but there are a lot.
The one that comes to mind is Capital One's.
I'd have to look it up.
But if you go to the marketplace, there's quite a few.
Okay, cool. And then one of the other big announcements that I was excited about is around Snowpark Containers, which is bringing essentially Kubernetes to Snowflake.
Just for people who are completely unfamiliar with this idea, what problem does this solve for people that were using Snowpark before this support came into play?
Absolutely. So Snowpark Container Services,
I'm actually really excited about
this too. I can't wait
to build all kinds of apps.
Because what it allows
you to do, or anybody, is
run a container, including
a GPU-powered container, in
Snowflake. And behind the scenes,
it's running on the industry standard
Kubernetes container orchestrator.
But the good news there is that
you don't have to learn Kubernetes to use it.
Kubernetes can be a heavy lift.
It's pretty complex.
There's like certifications out there.
I'm not certified.
But the idea is that Snowflake takes care of all of this
and it'll scale your container with your data, right?
So it takes a lot of guessing
and a lot of the admin stuff out of your or developers' hands.
And with containers,
you can basically create any application that you can think of.
That's the beauty of it also.
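For a rough sense of what that looks like in practice, here is a simplified sketch based on the Snowpark Container Services preview docs. The compute pool, service, and image names are hypothetical, it assumes a container image has already been pushed to a Snowflake image repository, and it reuses a Snowpark session:

```python
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()  # assumes a default connection is configured

# A compute pool provides the nodes the service runs on; the instance
# family value comes from the docs, and the names here are made up.
session.sql("""
    CREATE COMPUTE POOL IF NOT EXISTS my_pool
      MIN_NODES = 1
      MAX_NODES = 1
      INSTANCE_FAMILY = CPU_X64_XS
""").collect()

# The service is defined by a Kubernetes-like YAML spec, but Snowflake
# handles the orchestration; you never touch Kubernetes directly.
session.sql("""
    CREATE SERVICE my_service
      IN COMPUTE POOL my_pool
      FROM SPECIFICATION $$
        spec:
          containers:
          - name: app
            image: /my_db/my_schema/my_repo/my_image:latest
      $$
""").collect()
```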
So how should I think about the difference between native apps
and Snowpark containers when it comes to building?
What would I do with the Snowpark container
that maybe I wouldn't do with native apps or vice versa?
That's actually a great question,
because I don't know if you recall, during Summit
there was an application
that I demoed with
an image of ski goggles. We were
trying to ask questions off of an image,
and it was giving responses based on
what it saw:
whether there was damage to the
ski goggles, what the model number was, things like that.
So that application was actually packaged
as a native app in a container.
So container is like the umbrella
and you can basically containerize anything,
including native apps.
So I don't know if that answers the question,
but I just feel like, yeah,
container services is amazing
and you can build native apps in a container if you choose to,
but also you don't have to.
Okay.
I've also, in my own work with Snowflake,
used Snowpipe a bunch for batch loading of data into Snowflake.
And it's something that's been around for a while,
but this year Snowpipe Streaming was announced.
So what problem is this addressing that essentially the batch mode for Snowpipe wasn't able to solve?
So the biggest difference between Snowpipe and Snowpipe Streaming is that Snowpipe was designed to ingest files, and Snowpipe Streaming is designed to
ingest records in an incremental fashion.
It is.
Yeah.
So that could be from something like a Kafka stream?
Yes.
Or even some other like sort of bespoke solution for, you know, WAD files or something like
that.
Yeah.
So Snowpipe Streaming right now
allows you to load data in Snowflake
directly from either Apache Kafka
or a custom Java application.
Okay, so essentially it provides like an SDK
that I can use as like a way to abstract
my streaming process directly into Snowflake.
Yes, yeah.
And it's a very effective way to load data.
And then, like I mentioned,
the other strategies are using Snowpipe for files,
or COPY INTO for self-managed warehouses,
where they both involve loading your files
from a stage into Snowflake,
versus ingesting data almost in real time
using Kafka or a Java application.
Yeah, so it seems like there's been a lot of work in this space, in particular with
Snowflake around, I guess, making things more real-time or dynamic, less batch load.
Related to that is this concept of dynamic tables, which I think was announced last year as private beta and is now in public beta.
So can you talk a little bit about the problem that dynamic tables solve?
What is it that data engineers are getting out of the value of dynamic tables?
Right. So dynamic tables allow you to create data pipelines using just the SQL that you use on a daily basis, right?
You can use joins, unions, aggregations, window functions, and what have you.
And another cool thing about that is the latency is user-defined.
So you can control that by just a single parameter.
And the data refreshes as low as a minute as of today.
And then hopefully that will also change in the future.
But the biggest differentiator is
the automatic incremental refreshes:
they refresh only what's changed,
even for really, really complex queries, automatically,
including updates and deletes.
And then all dynamic tables in a DAG, for example,
are refreshed consistently from aligned snapshots.
So if I understand this correctly, I can essentially create a query that is going to
generate the input to the dynamic table. And then whenever the underlying tables that are used by the query get updated, they'll update the data in the dynamic table. Is that right?
Right. Yeah.
What is the difference between this and, say, like a view?
Materialized views get very expensive if the upstream table is being frequently updated.
But with dynamic tables, Snowflake will update them based on the latency that you as a user define,
which gives you more control over how often these tables are updated
and what the associated compute costs are.
I see.
So if I'm doing a bunch of analytics, but maybe it's not that time sensitive, I could
choose to have the dynamic table like update like once a week or something like that rather
than every minute.
And that'll save me some money.
Yeah.
And then the other big difference is dynamic tables are also like a first class UI citizen
of Snowflake. So you can easily visualize the entire DAG, their dependencies, and you can also see all
the historical runs, including the status, how many rows were changed, updated, and things
like that.
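As a concrete example, a dynamic table is defined with ordinary SQL plus the user-defined lag Dash mentions. A minimal sketch, with hypothetical table, column, and warehouse names:

```python
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()  # assumes a default connection is configured

# A dynamic table defined over an ordinary query; Snowflake keeps it
# incrementally refreshed within the target lag you choose. All of the
# table, column, and warehouse names here are hypothetical.
session.sql("""
    CREATE OR REPLACE DYNAMIC TABLE daily_region_totals
      TARGET_LAG = '1 minute'   -- the user-defined latency parameter
      WAREHOUSE = my_wh         -- compute used for the refreshes
    AS
      SELECT o.order_date,
             c.region,
             SUM(o.amount) AS total_amount
      FROM orders o
      JOIN customers c ON c.customer_id = o.customer_id
      GROUP BY o.order_date, c.region
""").collect()
```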
Okay.
All right.
So let's talk AI and LLMs, but maybe before we get into some of the announcements that
were made at the summit, what are your thoughts on how generative AI might
impact the Snowflake builder community? Well, I think just generally
speaking, it's going to be, for developers,
the sky's going to be the limit because, especially with the
Snowpark container services, you're going to be able to bring
in the models into your own environment
so you don't have to send the data out
for training these LLMs
and bringing whatever back, right?
So the data is going to be
within your existing data governance
and security boundaries.
And I think that's going to be
the biggest game changer.
And then I guess, I think too, from like an application standpoint, you know, some of the things I've seen where people
are leveraging LLMs to be where you can essentially, you know, just speak in or write in English,
essentially, or whatever language of your choice, what you need from the data, and then behind the scenes is able to translate that into queries.
It kind of democratizes access to data to some degree.
Like essentially things that, you know,
a business person within a company
might have had to go to the analyst in their company beforehand
and have that analyst run or like write a custom query to pull some data.
Now the business person might
actually be able to do that directly through some sort of LLM-type interface directly into the data.
Yes, absolutely.
So in some ways, I would imagine it lowers the barrier of
entry for analysts, but then the actual analysts can spend more time on sort of more complex use cases
and not have to deal with one-off requests
from people within their company
asking them to write a bespoke SQL query
to pull data for them.
Right, and then also focus on the insights
and what else can they gather out of the data
based on insights that they were given
just based on some comment
that they wrote, right? Not the actual
SQL. Yeah, so it's more
about doing the analytics
or the analysis part of the job
rather than pulling the data.
So the big announcement
the first night at the summit
was that NVIDIA NeMo would
be running natively on Snowflake.
What does this mean for people and companies investing in LLM applications
when it comes to Snowflake?
Right. So at Summit we announced a partnership with NVIDIA,
which will enable Snowflake to integrate NVIDIA's NeMo LLM framework
into Snowflake.
So that's going to allow ML engineers and data scientists
to build LLMs directly in Snowflake
using their own data.
I think that's huge.
We already demoed, like you said,
during keynote, how you
can not only
bring LLMs, but also
natively
train models
on NVIDIA GPUs.
Right?
And then what about
LLMs that are outside of the
NeMo framework? Is there
an opportunity to be able to run those
within Snowflake or is that something that
maybe is not there today?
So as part of
Snowpark Container Services, you will be
able to bring in any model into Snowflake, not just NeMo.
There's actually already a blog that we published maybe a couple of weeks ago where we've taken Llama 2, Meta's latest LLM, and used that within Snowpark Container Services.
Okay.
What are some of the other LLM-related announcements
from Snowflake that people should know about?
Yeah, so we announced Document AI.
Snowflake acquired a company called Applica,
which specializes in auto-analyzing
unstructured data.
And with that, what you can do is leverage the technology. It's actually also Snowflake's
first LLM, and it enables you to ask questions about PDF documents.
Like, for example, during the summit, when we demoed, we had 15-plus PDF documents
that we were asking questions about. Not only that, there's also a way for users to give feedback,
like make corrections to improve the model accuracy, and then retrain the model, which you
can then republish or deploy. And another cool thing is these models, you can use them in SQL queries. So it's
not only spanning across different
technologies within Snowflake,
but also across teams
where they can collaborate based on
their requirements.
And then how do I go about using that?
Is that something that I use
via an API
so I can integrate it into an existing application?
So that actually is going to be part of the Snowflake UI, Snowsight.
That's going to be a new option like we demoed during keynote.
The name could change, so I can't speak to that.
But basically, it's going to be embedded within the UI.
And all you have to do is basically follow instructions to upload documents.
And then there's a UI for you to ask questions.
You get a score along with the answer.
If you're not satisfied, you can change the answer and also hit train to retrain the model.
And then once the model is trained, you can use that in a SQL worksheet, for example.
The SQL queries.
Okay. Okay.
Yeah.
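The exact interface was still in preview at the time of this recording, so take this as a sketch only: in current Snowflake documentation, querying a trained Document AI model from SQL looks roughly like the following, where the model name (invoice_model) and stage (@doc_stage) are hypothetical:

```python
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()  # assumes a default connection is configured

# Running a trained Document AI model over every PDF in a stage from SQL.
# invoice_model and @doc_stage are hypothetical; the <model>!PREDICT
# pattern follows the current Document AI documentation.
rows = session.sql("""
    SELECT RELATIVE_PATH,
           invoice_model!PREDICT(
               GET_PRESIGNED_URL(@doc_stage, RELATIVE_PATH), 1
           ) AS extracted_answers
    FROM DIRECTORY(@doc_stage)
""").collect()

for row in rows:
    print(row["RELATIVE_PATH"], row["EXTRACTED_ANSWERS"])
```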
And it also seems like a lot of the major sort of like products across Snowflake, from Snowpipe to Snowpark
to native apps are kind of incorporating features
that allow you to either build with generative AI or
essentially leverage generative AI functionality directly within those tools to make Snowflake
easier to use.
And one other one that I personally played around with is with Streamlit.
Streamlit now supports an easy way to build a ChatGPT-style chat
UI directly
within it. So there's
this heavy investment
in this direction.
I guess, what do you sort of see
as the
future in this space? Where do you think
some of this stuff is leading for
people who
use Snowflake and are looking
to build on it in the future?
I think
the way I look at it is one platform
for everything.
You can bring everything
in closer to your data.
So I think the key,
to me, if I was a developer,
is not only not having to incur costs for moving the data around, but also how many technologies, how many different things I would have to have in my tech stack to move data around and to process the data, versus having everything within the same governance and security boundaries.
And everything else that I need, I can actually
bring in without any issues, right?
That to me is huge in terms of like how I can build, how I can package, how I can distribute.
Yeah, you're basically simplifying your tool chain if you're comfortable kind of living
within the Snowflake ecosystem, both from your traditional data pipeline, data analytics standpoint, and
also now in the world of
generative AI. Because if you look at some
of the tool chains
that are out there now in the
AI space, there's a lot of
tooling that you have to learn
to stitch together, essentially,
just to do something like
customize a
pre-trained model.
Right.
And another good example is Streamlit,
like you mentioned earlier.
You can create these amazing applications
without having to know HTML, CSS, and things like that.
Not only that, the Streamlit in Snowflake,
it's a first-class citizen,
first-class object in Snowflake,
just like a table is.
So that means you can apply
role-based access on these
applications. You can share them
just like you would table and
databases and things like that.
Yeah, that's pretty cool.
I'm someone who's
been using HTML,
CSS, JavaScript for a very long time, so
I'm very comfortable with it. But even despite that,
I found working with Streamlit quite lovely, I guess, is the best way to describe it.
It was very, very easy to go from essentially nothing to having a little LLM ChatGPT-type UI going in like an hour.
Yeah, and then how many lines of code, right? Maybe less than 100, probably?
Yeah, definitely. It was like less than 100 lines of code.
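For reference, the chat primitives being described here are Streamlit's st.chat_input and st.chat_message. A minimal sketch with a stubbed-out model response (you'd swap in a real LLM call) does indeed come in well under 100 lines:

```python
import streamlit as st

st.title("Chat demo")

# Keep the conversation across Streamlit reruns.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

# st.chat_input renders the chat box and returns None until the user submits.
if prompt := st.chat_input("Ask me anything"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

    # Stub: swap in a real LLM call here (OpenAI, a model in Snowflake, etc.).
    reply = f"You said: {prompt}"

    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)
```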
Yeah, I used to write HTML, CSS, and JavaScript too.
And just thinking back how many JavaScript libraries you had to pull in,
make sure they load.
They're not taking too much time to load,
so you have to put them at the bottom instead of the top.
Minify them, all kinds of stuff.
Yeah, it becomes an entire ordeal just to do something simple.
Yeah.
And then you need a web server to run those.
Yeah, absolutely.
So as we start to wrap up, uh, Dash, is there anything else you'd like to share?
Uh, you know, thoughts on the summit, thoughts on anything related to AI and
some of the moves that Snowflake's making?
Uh, no, just a couple of things I'd like to mention along the same lines. We demoed, and it's not an official name by any means, comment-to-text, or comments-to-SQL
to be more specific, where you'll be able to just write out a comment and then have a really, really complex SQL query
generated for you.
And the comment is basically
plain English, right? So
that was the other cool thing that I
kind of wanted to mention. And everything's running
in Snowflake, using your data
that the model learns from.
Yeah, so that comments-to-code, that's kind of like
being able to translate
like pseudocode or even
less formal than pseudocode
into like
a complex SQL query. Again, kind of
lowering the barrier to entry.
Yeah, including creating dynamic tables
and multiple joins and aggregations
and what have you.
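To make the comment-to-SQL idea concrete, here is a purely invented before-and-after. The comment is the only thing a user would write; the SQL below is a hypothetical example of what might be generated, not actual output from the demo:

```python
# Purely hypothetical illustration of the comment-to-SQL idea: the comment is
# all the user writes; the SQL is an invented example of what might come back,
# not actual output from the demo.
comment = "-- total sales by region for the last 7 days, top 5 regions"

generated_sql = """
SELECT c.region,
       SUM(o.amount) AS total_sales
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date >= DATEADD('day', -7, CURRENT_DATE())
GROUP BY c.region
ORDER BY total_sales DESC
LIMIT 5;
"""

print(comment)
print(generated_sql)
```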
Alright, well, Dash, thank you so much
for being here. It was
great witnessing all your live demos at Snowflake Summit,
and it was really fun to be there.
For anybody who's listening that is interested in the data space,
the Snowflake space, I highly recommend the conference.
I think it's really, really well done, and there's a lot of people there.
You'll feel like you're in good company,
and most likely I'll be there, so you can come say hi.
Yeah, thank you so much, Sean. Really appreciate you having me.
Really enjoyed our conversation. Yeah.
Thank you so much. Cheers.
Yeah. Thanks.