The Changelog: Software Development, Open Source - Elasticsearch and doubling down on "open" (Interview)
Episode Date: April 11, 2018
Philipp Krenn joined the show to talk with us about Elasticsearch, the problem it solves, where it came from, and where it's at today. We discussed the query language, what it can be compared to, whether or not it's a database replacement or a database complement, and Elasticsearch vs. Elastic the company. We also talked about the details behind Elastic's plan of "doubling down on open" to open up X-Pack, which is open code paid add-on features to Elasticsearch. We discussed the implications of this on their business model, and what changes will take place at the code and license level on GitHub.
Transcript
Bandwidth for Changelog is provided by Fastly. Learn more at fastly.com.
We move fast and fix things here at Changelog because of Rollbar.
Check them out at rollbar.com. And we're hosted on Linode servers.
Head to linode.com slash changelog.
This episode is brought to you by Rollbar.
Move fast and fix things. Resolve errors in minutes and deploy with confidence.
Head to rollbar.com slash changelog.
Request a demo.
Get started today.
It's loved by developers, trusted by enterprises, and most of all, we use it here at Changelog.
Move fast and fix things with Rollbar.
Once again, rollbar.com slash changelog. Thank you.
Today we're talking with Philipp Krenn about Elasticsearch, the problem it solves, where it came from, and where it's at today.
We discussed the query language, what it can be compared to, whether or not it's a database replacement or a database complement.
Elasticsearch versus Elastic, the company, which is a very big company, by the way.
We also talked about the details behind Elastic's plan of "doubling down on open" to open up X-Pack, which is their open code, paid add-on features for Elasticsearch,
the implications of this on their business model, and what changes will take place at the code and the license level on GitHub.
So we're here to talk about Elasticsearch.
And I don't know about you, Adam, but I got to claim a little bit of ignorance on Elasticsearch.
And I'm guessing you as well, because I've never touched the thing.
I've heard some hand waving on the Internet.
I'm very conservative in my data stores and my search engines.
So I haven't actually played with it, but I'm excited to learn about it.
We have Philipp Krenn here to talk to us all about it.
So Philipp, let's start with Elasticsearch: what it is, where it came from, what problems it's solving, and then we'll get into where it's at today and where it's going.
Right, so the base library we're using is Apache Lucene, but that's not really the story we normally try
to tell. There is a kind of cute or interesting story around it. Our current CEO, Shay,
started Elasticsearch back in the day, and actually the first iteration
wasn't even called Elasticsearch. It was called Compass, and Compass was kind of
like the tool for his wife to search her recipes, because she wanted to be a chef and she had a ton of recipes she needed to search.
And he started building a system to make that possible.
She is, by the way, still waiting for that recipe search solution because he kind of over-engineered that.
And so he built Compass 1, and then he found out, well, that's kind of a dead end.
And then he redid the entire thing, and it was Compass 2.
And then a third iteration, which is kind of the lucky number, obviously.
It wasn't called Compass 3, though; that one he called Elasticsearch.
And that was back in 2010 when he first released that.
That's kind of how it all got started.
His idea was that search should be a ubiquitous solution,
that it needs to scale, that it should be simple to use.
And that's kind of like where Elasticsearch started from.
It was scalable right from the beginning,
and it had an easy-to-use REST API, and it should just work.
And that was kind of like the promise or the start where it all began.
Gotcha.
So Apache Lucene, I think it's a Java project.
Started all the way back in like the 90s, right?
Like late 90s, early 2000s.
How does that fit into the Elasticsearch story?
Lucene is kind of an incredible piece of work.
A lot of work has gone into it already,
and it's very mature.
Saying it is the
de facto search solution that everybody is using, or the standard, is maybe a bit of an overstatement,
but it is the most commonly used base library for full-text search.
The problem is it's really just a library. Yes, it's written in Java and you could include it
in your own Java application, but you have to call it very explicitly, and the API is not
the most user-friendly or nice to get started with. So that's not really what you want to do.
It's a bit bare-bones, but it has all the necessary pieces. What Elasticsearch then did is build
around that: it does the distribution and the replication of your data, and it provides a query DSL and a nice REST API to the outside.
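To make the REST point concrete, here is a minimal Python sketch that builds, but does not send, a search request against Elasticsearch's `_search` endpoint using only the standard library. The host `localhost:9200` (Elasticsearch's default HTTP port), the index name `recipes`, and the field `title` are assumptions for illustration; any HTTP client in any language could send the same JSON.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port, plus a hypothetical
# "recipes" index whose documents have a "title" field.
ES_URL = "http://localhost:9200"


def build_search_request(index, text):
    """Build (but do not send) a POST to /<index>/_search with a match query."""
    body = {"query": {"match": {"title": text}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("recipes", "chocolate cake")
print(req.get_method(), req.full_url)
# With a node running, urllib.request.urlopen(req) would send it.
```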
Yeah, so as somebody who's not a Java developer: Elasticsearch is also Java,
but you don't have to care about that, because it's a REST-based API
that any client library can speak to without having to embed Java into your application.
Totally. So
yes, since Lucene is based on Java, Elasticsearch is Java as well. But Shay already saw
that initially he had the entire system, Compass, very tightly coupled to the Java ecosystem,
and he saw that that is not really what people want. If you just bind yourself to one ecosystem,
it's very limiting in the long run. So with a nice REST API, plus drivers or clients for all the major programming
languages, it's much easier to get started and have that base system that everybody can
use, and then everybody can just build whatever they want. We really don't care what your
programming language is; whatever makes sense for your product or project
is fine by us. We're just trying to provide the right client, and then you build awesome stuff with
it.
So he set out to build a recipe search and he ended up building quite a large company called
Elastic, which is where you work. Tell us about Elastic versus Elasticsearch. Give us
the lines between the open source project, the company, and how all that shakes
out.
Right.
So initially it was Shay, and I always imagined him sitting in his bedroom coding day and
night.
And at my job before Elastic, we were already using Elasticsearch, and we were always like
curious like how that one guy
could produce so much code and he was like answering all the issues and writing the documentation and
still coding so much every day. And at some point later on in 2012, he joined forces with three
other guys to start a company. And back then, since the product was Elasticsearch, the company
was also called Elasticsearch. Since we have then added a few more products along the way, we had to rename the company at some point since, well, it was not only about Elasticsearch anymore.
Even though Elasticsearch is still kind of the core of everything and everything else is built around that and around search kind of.
But the company is now, yeah, I think we were about 820 or something like that,
though it's changing pretty much every day by now.
And we've kind of built the various other tools around it.
So people might be familiar with the ELK stack:
Elasticsearch, Logstash, Kibana.
Logstash is the thing to get data and transform it
and then put it either into Elasticsearch or some other system.
And then Kibana for the visualization part.
We always say we want to democratize data.
Basically, you have a nice browser-based tool where you can just explore your data, build dashboards, and just see what you have there. Later on we even added the Beats, which are lightweight agents, forwarders,
shippers, whatever you want to call them, written in Go to collect log files or system metrics or
ping systems. And that's when we renamed the entire thing again, away from the ELK stack, and we're
now trying to call it the Elastic Stack. Our products are always about being
scalable, and first we tried to call it BELK or ELKB, because, you
know, Elasticsearch, Logstash, Kibana plus Beats. Out of those four letters the only thing we could
make up was BELK or ELKB, and we even had a logo for that. There was the B with the
elk horns, which was a cute idea.
But since we're always about scaling, we figured out this is not really that scalable,
because if we add any other open source products,
we would need to redo the entire branding again and make up new animals,
and whatever letters we got afterwards, it would not get any easier.
Typical naming.
So yeah, now we're trying to call it the Elastic Stack.
And internally, every time we see somebody doing a meetup
or some other event and calling it ELK,
we raise the internal ELK alert.
And somebody will reach out and say, hey, this is super cool,
but we're now trying to call this thing the Elastic Stack.
But the ELK alert is pretty interesting,
because we always get called Change Log or Change capital L Log,
all sorts of variations of it. We need a Changelog thing, Jerod. We need to do this.
Like an actual log. Yeah, something that logs the fact that people are saying it incorrectly. I love
that. But naming, geez.
So for Logstash, the original logo was actually a wooden log,
and people found it super cute, though now everything is just letters. And
yeah, at some point as you grow as a company, the cuteness has to take a step back,
I guess, and you need to grow up a little and try to be more professional.
Elastic, the company, supports Elasticsearch and these other services as well.
Is the model basically you're hosting around the infrastructure,
or is there also like an open core thing?
How does it break out in terms of the open source projects?
I realize they're plural now, versus the proprietary stuff.
Building a sizable company is kind of a challenge if you're an open source company.
We're actually trying to do kind of a bit of everything.
So we provide Elasticsearch and Kibana as a service, which we call Elastic Cloud.
But we also have this open core model where you get the core features as
open source and you can just do whatever you want. It's Apache 2 licensed, go crazy, do whatever you
want. But we do have some commercial plugins around that. We don't have a special commercial
version like some of our competitors or other vendors in the database space have, where they
have a community version and an enterprise version. We don't really believe in that model.
It's really plugins that you plug into that core system. So even the paying customers,
they're using the same open source base, but you just add some functionality on top of that.
One thing: I've read some hand-waving. Most
of the time when I see Elasticsearch come up, it'll be somewhere along the lines of "hey, try Elasticsearch,"
and then the person will say, "well, I don't really need advanced search," or "I don't need that
much for my search," which maybe Adam's heard me say to him sometimes. And then they'll say,
"Elasticsearch is not just for search," and then they go into... that's where I say the hand-waving starts.
And I'm sure they provide ample evidence for that, but I usually close the tab.
Does that ring true for you?
Is Elasticsearch supposed to...
They begin evangelizing and I duck out.
Is Elasticsearch more than just for search?
Is it a full-on database?
What's the core use case that it really slays at?
Yeah, I'm very careful about the term database because people have a very specific expectation of what a database does.
And I'm not sure we're 100% that, since first and foremost, we're a search platform.
But we kind of want to be the data platform for lots of different use cases.
So we started off with the full text search use case, but then we found these other use cases.
And we always think about it that everything else that we add around it is also a search problem.
So for example, logs, which is kind of one of the most common use cases for us, storing the logs
itself is not that helpful. What you actually want to do is you want to search them in the end again
and find what is going on.
And we are extending that further, and we are doing metrics by now,
and we are doing more and more in the security space,
and we are also adding, or we always say we add to the family.
We are adding more companies and features and products to the family.
So we have a machine
learning component now, and we're trying to do the application performance monitoring (APM)
space as well and add that to the platform. So we're trying to broaden out. We're also doing
search as a service now, so we have been adding more and more companies around that and trying
to get from these core functionalities more into the solution space.
Because some people are a bit overwhelmed when you just give them the options and say, here you have this building block, and you can build pretty much anything you want with it.
But some are more like, okay, I need a solution for this exact problem.
And we're also adding that,
or going more and more in the direction of adding these solutions. So if you just need search for
your website, for example, we want to provide you a solution to do that. You can totally build it
yourself with the open source tools, but we also try to give you more of a solution, just to get
to the result quicker. Or you want to build a logging platform: you can totally build that yourself, but we're trying to get you started in a quicker way.
So we always have these building blocks, and Elasticsearch is, I would still say, the
centerpiece that everything else is built around, but we're trying to give you more
solutions now, where we try to help you with the heavy lifting.
It actually
reminds me of something, and I'm not sure if you remember this conversation, but back on Go Time 48,
Alexander Neumann (I can't recall how his name is pronounced) was
talking about restic, which is his backup solution, and he said something really poignant during that
episode. He said nobody wants backups.
Everybody wants restore.
And he got some pushback on that, but I thought it was so insightful, because backups are actually a pain in the butt.
They're not the end game, right?
They're just an artifact that you have to deal with.
And if they can't restore, they're worthless.
So what you really are after is the restore.
And you said something there,
Philipp, which made me think of that with regard to logging, like collecting the logs
and having them and storing them. Nobody really wants logs, right? Nobody wants
this stuff. What we want is answers. Even with search, search is a means to
an end. We're looking for insights, we're looking to find that thing that, you know,
we're looking for that piece of data that we remember.
And so it seems like what you're trying to do
is build around that, like you said, these solutions, right?
Give us the solutions, not necessarily the tools.
We're happy to cater for both,
because we have people in the open source space
who say, oh, it's awesome,
I want just this building block and then I can take it wherever I want.
And then there are others who are like, oh, I have this business need and I just want to get to the solution quickly.
And we're happy to help both of them.
Because, well, we are an open source company, and we are doing our open source work, and you can just build anything you want around that.
But then again, we try to broaden that out into the solution space.
It makes sense. Going back to what you said about the fact that you're growing,
which we haven't really talked much about, the company size.
Not that we have to go too deep on it, but from what I understand,
you've got a pretty large company, and your model is: build open source tools,
or at least it seems, and you can tell me if this
is true or not, build open source tools that you give freely out there. But at the same time
you're about solutions. So you take these open source tools that Jerod or I or anybody else can
freely grab, contribute to, and use to build our own solutions, but you've gone ahead and,
as a company, as a mission, as a business model, built solutions around your open source
as paid-for services to sustain yourselves and grow?
Well, not only paid services.
Some of these solutions are also in the open source space.
Really?
So you can run them yourself.
So for example, the APM company that we acquired,
the base components for that are all in the open source space.
Also because we kind of saw an opportunity there,
in that in the APM space
there is not that much open source.
There are not that many open source solutions
that you can use today.
But we think for us as a data platform,
it makes a lot of sense to not only have logs and metrics,
but also cover more things like the tracing
or APM functionality there.
So we're trying to extend that.
But of course, if you don't want to host it yourself,
we're happy to host it for you and provide it as a service.
Or we have some more features around the entire thing
that you might be interested in as an enterprise
and you want to get our open core features or you also want support.
But we're always packaging support
and the plugins that we have together.
This episode is brought to you by DigitalOcean.
DigitalOcean is a cloud computing platform built with simplicity at the forefront. So managing infrastructure is easy.
Whether you're a business running one single virtual machine or 10,000,
DigitalOcean gets out of your way so teams can build, deploy, and scale cloud apps faster and more efficiently.
Join the ranks of Docker, GitLab, Slack, HashiCorp, WeWork, Fastly, and more.
Enjoy simple, predictable pricing.
Sign up, deploy your app in seconds.
Head to do.co/changelog.
And our listeners get a free $100 credit to spend in your first 60 days.
Try it free.
Once again, head to do.co/changelog.
So, Philipp, when I said database, you were very careful around that word,
and you said that it's very much a search platform.
Perhaps you could say it's a better complement to a data store
or an additional data store that you have in your application.
I'd like to take a small look at Elasticsearch from a micro perspective of an application,
maybe something similar to changelog.com,
which is a relational database on Postgres
that has some search functionality
that's just using Postgres' full-text search,
and how Elasticsearch
would fit into that equation and really be a good complement: how it would do better
at the search side than Postgres, but then do worse at maybe the ACID side or
the relational side of Postgres.
So, with Postgres and the full-text search features
in Postgres, it's an interesting approach, because Postgres is first and foremost a relational database.
And then they have kind of added more and more full-text features around that
just because they saw that, well, people need to search at some point.
And that's fine.
It's just like at the core of Postgres,
there is still kind of the relational database,
whereas Elasticsearch for the search use case
is really built on having as many features
and being as scalable around search as possible.
And it's not just an afterthought as with other products
where they have like some full-text search capabilities,
which is often like,
I'm not saying this is Postgres in specific,
but like on some products,
we have the feeling that it's kind of like this checkbox where you say, oh, we do full text search as well.
And then when you press further, it's like, ah, yeah, we're doing this one or two things.
But if you really want to take advantage of it, then it's not going to help you that much.
What Elasticsearch basically does is, whenever you store some text, it runs it through an analysis pipeline.
So, for example, we know something is an English text.
And for searching English text, you have some rules about what makes sense and what doesn't.
For example, you do something like stemming.
Stemming basically means you cut off the ending of a lot of words.
English is a very simple language in that regard.
You cut off the ending because you don't really care if something is singular or plural. You're just interested
in the concept, or you're not concerned with the specific form of a verb. You're just
really interested in the concept that you're looking for. Then you normally kick out
stop words, which are very common words that appear in nearly every sentence or text,
but they add very little meaning: an "and" or an article would be in nearly every sentence,
and they don't add any value. So that is what full-text search does, and Elasticsearch is quite
elaborate in that area. We support a lot of languages, and we support a lot of features to
refine your search. That is where the benefit of full-text search would normally come in.
Yeah, I think that's what I'm driving at.
Can you enumerate those additional features that you're going to get
by complementing your relational database with an Elasticsearch platform?
Like, what additional things is it going to give you in terms of
search relevance?
What search is generally giving you... I'm always comparing it like this: databases are
very much black or white. You're searching for something and then you get a hit or you don't
get a hit. Whereas search is much more shades of gray. It's more like, how relevant is that to what
I have entered? And that is normally a number that is being calculated in the background.
I'm not sure how deep you want to dive into that,
but there are multiple factors that play into calculating that relevancy.
For example, the one sentence I'm always using is from Star Wars:
"These are not the droids you're looking for."
Let me see your identification. You don't need to see his identification.
We don't need to see his identification. These aren't the droids you're looking for.
These aren't the droids we're looking for. Move along. Move along.
So if you store in Elasticsearch the sentence "these are not the droids you're looking for," after removing the
stop words and stemming, what remains is droid, you, look, because these are the three main
concepts that might stick out or that people might be searching for. "These," "are," "not" are all
irrelevant. Even the "not": full-text search doesn't generally understand what you're saying,
like whether it's positive or negative.
It's just matching on these terms.
And droid, you, look are the three terms that would remain when you do the search.
Depending on the sentence, you will have more or fewer stop words,
and we will extract these base concepts.
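The analysis steps described here can be approximated with a toy analyzer. This is a minimal sketch, not Elasticsearch's English analyzer: the stop-word list is a tiny hypothetical subset and `stem` is a naive suffix stripper standing in for a real stemmer, but it reproduces the droid, you, look result for the example sentence.

```python
import re

# Tiny illustrative stop-word list; real analyzers ship larger,
# language-specific lists.
STOP_WORDS = {"a", "an", "and", "are", "for", "is", "not", "of", "the", "these", "to"}


def stem(token):
    """Naive suffix stripping, a stand-in for a real stemmer such as Porter's."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text):
    """Lowercase, tokenize, drop contraction suffixes and stop words, then stem."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    tokens = [t.split("'")[0] for t in tokens]  # "you're" -> "you"
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("These are not the droids you're looking for."))
# ['droid', 'you', 'look']
```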
And then, since we're just storing the
stemmed version of the concepts that you have, the lookup afterwards is very fast, because
whatever you're searching for, whether droid or droids, it doesn't really matter:
the term you're searching for runs through the same pipeline, so the stop words are removed,
we're doing the stemming, and then we can just go for the direct matches.
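Running the query through the same pipeline is what makes lookups cheap: analyzed terms go into an inverted index that maps each term to the documents containing it. A minimal sketch with a hypothetical three-document corpus and a deliberately crude analyzer (lowercasing, a tiny stop-word list, plural stripping); the OR-style union of matches is an assumption chosen for simplicity.

```python
import re
from collections import defaultdict

# Tiny, illustrative stop-word list (not Elasticsearch's).
STOP_WORDS = {"a", "are", "for", "not", "the", "these"}


def analyze(text):
    """Lowercase, split into words, drop stop words, strip a plural 's'."""
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
            token = token[:-1]
        terms.append(token)
    return terms


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Analyze the query the same way, then union the matching postings."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return sorted(hits)


docs = {
    1: "These are not the droids",
    2: "A droid walks into a cantina",
    3: "Looking for the Millennium Falcon",
}
index = build_index(docs)
print(search(index, "Droids"))  # [1, 2]
```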
And then you can see, oh, we are searching for droid,
and this sentence contains droid.
Then we're doing the calculation of how relevant the specific text is.
For example, if a text contains droid multiple times,
that is probably more relevant for your droid search
than if the droid term only appears once in the sentence.
And then we assume, okay, droid is a relevant concept.
We give a specific weight to that.
And then we also take into consideration
how long a specific element is.
So, for example, if your search term appears in a title,
and titles are normally very short,
that is much more relevant than if it just appears
in the text body, because that is much longer.
The base concept that is being applied there in the background, which I've tried
to describe here, is called TF-IDF, term frequency-inverse document frequency, which
calculates this relevancy.
The algorithm has been slightly refined by now. It's called
Best Match 25, BM25. So it's the 25th iteration of a best match algorithm, and this one is slightly
better. This is what is doing the heavy lifting behind the scenes for your search.
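A sketch of the per-term BM25 score, using the classic formula with an IDF term of the form Lucene uses and the `k1 = 1.2`, `b = 0.75` defaults that Elasticsearch documents. It shows the factors described above: more occurrences raise the score with diminishing returns, a match in shorter text scores higher than the same match in longer text (BM25 measures length against that field's average), and terms that appear in fewer documents weigh more. Real Lucene scoring differs in details such as how field lengths are encoded, so treat this as illustrative.

```python
import math

K1 = 1.2   # term-frequency saturation (Elasticsearch's documented default)
B = 0.75   # strength of field-length normalization (also the default)


def idf(n_docs, doc_freq):
    """Inverse document frequency: terms found in fewer documents weigh more."""
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25(tf, field_len, avg_field_len, n_docs, doc_freq):
    """Score contribution of one query term for one field of one document."""
    length_norm = 1 - B + B * field_len / avg_field_len
    return idf(n_docs, doc_freq) * tf * (K1 + 1) / (tf + K1 * length_norm)


# Same 20-term field, term appears once vs. twice: higher, but not double.
once = bm25(tf=1, field_len=20, avg_field_len=20, n_docs=100, doc_freq=10)
twice = bm25(tf=2, field_len=20, avg_field_len=20, n_docs=100, doc_freq=10)
# One match in a short field vs. a long field with the same average length.
short = bm25(tf=1, field_len=3, avg_field_len=20, n_docs=100, doc_freq=10)
long_ = bm25(tf=1, field_len=50, avg_field_len=20, n_docs=100, doc_freq=10)
print(once < twice < 2 * once, short > long_)  # True True
```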
And compare that to the classic LIKE search a lot of people are probably still doing in the relational database.
A, you will have a hard time, because it doesn't support anything like stemming.
It also doesn't support anything like fuzzy search.
It doesn't support synonyms and lots of other concepts.
And if you have the wildcard at the beginning, so if you're doing LIKE with a percent sign,
then whatever term you have, then a percent sign,
you cannot even use an index,
so your search will always be very slow,
because you're basically going through all the entries.
Since you have the wildcard at the beginning,
you cannot use the index,
because you don't even know where to start.
Whereas full-text search just extracts the right terms,
and then you basically check where these terms are:
in which documents do I have appearances of these terms
that I'm trying to find?
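Why a leading wildcard defeats an index can be shown with a sorted list standing in for a B-tree index (an assumption for illustration; real database indexes are more involved). A prefix pattern like `LIKE 'droid%'` can binary-search to a starting point and stop once values stop matching, while `LIKE '%droid%'` has no starting point and must inspect every value.

```python
import bisect

# A sorted list stands in for a B-tree index on a text column (illustrative).
titles = sorted(["droid army", "wookiee", "droids", "android", "r2-d2", "droid"])


def prefix_lookup(sorted_values, prefix):
    """LIKE 'prefix%': binary-search to the first candidate, then walk forward."""
    hits = []
    i = bisect.bisect_left(sorted_values, prefix)
    while i < len(sorted_values) and sorted_values[i].startswith(prefix):
        hits.append(sorted_values[i])
        i += 1  # sorted order: the first non-match ends the range
    return hits


def contains_scan(values, term):
    """LIKE '%term%': no starting point in sort order, so check every value."""
    return [v for v in values if term in v]


print(prefix_lookup(titles, "droid"))  # ['droid', 'droid army', 'droids']
print(contains_scan(titles, "droid"))  # ['android', 'droid', 'droid army', 'droids']
```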
And these different facets that I'm just thinking of,
like an equation, this factor plus that factor
equals relevance, rank, or some sort of scoring:
is all that stuff tweakable, customizable,
either at Elasticsearch configuration time or maybe even at query time, with regard to
how you get your results back?
There are a lot of tweaks that you can apply. For one, you can tweak some
parameters in the search, but a lot of the functionality is also in the way you store
the data. For example, resolving synonyms at index time is
an index-time feature, or you could also do that at query time, where you say these five terms are
equal, and if the user uses any one of them, I want to find the other four as well, all the places where these synonyms appear. And you can build quite complex queries.
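Query-time synonym handling, as described above, can be sketched as expanding each query term to its whole synonym group before looking anything up. The synonym group here is hypothetical; in Elasticsearch this job is done by a synonym token filter in the analyzer.

```python
# Hypothetical synonym group: five terms treated as equal.
SYNONYM_GROUPS = [{"droid", "robot", "android", "automaton", "bot"}]


def expand_terms(terms):
    """Query-time expansion: each term pulls in every synonym from its group."""
    expanded = set()
    for term in terms:
        expanded.add(term)
        for group in SYNONYM_GROUPS:
            if term in group:
                expanded |= group
    return expanded


print(sorted(expand_terms(["robot", "falcon"])))
# ['android', 'automaton', 'bot', 'droid', 'falcon', 'robot']
```

Doing the expansion at index time instead stores the extra terms in the index, which keeps queries simpler but means a changed synonym list only takes effect after reindexing.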
We have a proper query DSL that gives you lots of power,
where you can say this must appear, this must not appear,
this term should appear,
or at least two of these three "should" terms should appear.
Or you can say, I'm looking for either one of these terms,
or, if you have them as a phrase or
in combination, like first one of them followed by the other, then it should be ranked higher. So
you have a lot of ways to actually tweak that.
I suppose the underlying BM25 algorithm itself is not tweakable, because, you know, after 25 tries, they're probably doing better than I could if I went in there.
You can still slightly tweak it.
Whether that improves your search a lot is very much up to you,
or up to your use case.
We always like to say it depends.
Whatever you're doing there, it depends on what exactly you want to achieve.
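The query DSL options mentioned earlier (must, must not, should, at least two of three, and ranking a phrase higher) map onto Elasticsearch's `bool` query. A sketch that builds the request body as a Python dict; the field name `body` and all search terms are hypothetical. When a `bool` query has a `must` clause, its `should` clauses are optional and only add to the score, which is why the phrase sits there as a ranking boost while the two-of-three requirement lives in a nested `bool` with `minimum_should_match`.

```python
import json


def build_droid_query():
    """Build a bool query body; the field name and terms are hypothetical."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Every hit must match this term.
                    {"match": {"body": "droid"}},
                    # ...and at least two of these three terms.
                    {
                        "bool": {
                            "should": [
                                {"match": {"body": "tatooine"}},
                                {"match": {"body": "stormtrooper"}},
                                {"match": {"body": "identification"}},
                            ],
                            "minimum_should_match": 2,
                        }
                    },
                ],
                # Documents matching this are excluded.
                "must_not": [{"match": {"body": "jedi"}}],
                # Optional: documents containing the exact phrase rank higher.
                "should": [{"match_phrase": {"body": "looking for droids"}}],
            }
        }
    }


print(json.dumps(build_droid_query(), indent=2))
```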
I would just start with the basics and try to expand from there, and not overthink it from the start. Otherwise, it can
get a bit complicated.
How good is full-text search in Postgres, Jerod? Since we're
asking him on the Elastic side how it compares, what are some of the things that you know about
Postgres and its full-text search that we like or dislike, in terms of indexing or
being able to query at index time, or being able to create
indexes and all that stuff?
Yeah, so you can do full-text-search-specific indexes in Postgres
that allow it to not do full scans, and it does fuzzy search and stuff like that. But
I don't know, maybe you can do more than just that, but you can't do all of
these different relevance facets and stuff that he's talking about, as far as I know.
It's a specialized thing.
Yeah. And, you know, Postgres' full-text search is better than other RDBMSs', reputationally, as being slightly better than a LIKE query.
And so it gets you a little further.
And in many cases, you know, for small data sets and small uses, like if you're not searching very often, it's fine.
But in many cases, like you said, you kind of know when you outgrow it, I think.
And probably we're at a point now, Adam, where we're just getting to the edge.
I know we have a user story in our Trello board about search and some different ways that it should be matching, which it's not.
And maybe I could stretch our current implementation to work that way.
But at a certain point, it's going to become, especially as our data set grows, it's just going to become less relevant over time.
And we'll probably end up reaching for something like Elasticsearch when that makes sense.
Yeah, because it seems like things like plurals, which, Philipp, it sounded like that's something that's just baked right into Elasticsearch,
where pluralization of nouns or different things, different terms, that comes for free.
You don't have to be an exact match.
I find that a lot of times I don't find something because I haven't searched precisely enough
where it should be a little bit more forgiving to the user.
Yeah, and once you start growing, you probably need to scale past what Postgres can give
you.
For example, if you're searching on Wikipedia, Stack Overflow, or GitHub, behind that search box
there is always Elasticsearch doing the hard work for you, well hidden behind the scenes.
I was just trying to quickly Google the actual feature list on Postgres, and we're just picking
on it because it's what we use. Postgres is actually pretty feature-rich, pretty good for an RDBMS,
but it does do stemming. It does do ranking.
Supports multiple languages.
Has fuzzy search.
So, I mean, it can take you a ways.
And like I said, I've never used Elasticsearch.
I've never used a search engine, like a thing that's built for search for any of my client work or for Chainsaw.com because my data sets are small and my search needs are usually very trivial and so
that's why i said i'm kind of claiming ignorance on this because this is an area that i've never
had to move into i'm examining it yeah i very much feel like you know it when you need it
and once you hit the wall yeah you will feel yeah you're kind of like okay we need something that
that these you know these results are getting less and less relevant all the time and the other thing is that once you have elastic
search for one use case there are all these other use cases where it's coming in handy
so we are trying to give you a broader tool to cover kind of a lot of base for that can you get
some examples of like once you're using it it can also do x y or z well um so once you're using it for
search um then probably some analytics use cases come along like you have whatever kind of data
your company is having or what you're trying to do um especially in combination with kibana um you
can then just store all of the data and build fancy dashboards by just clicking a few buttons
basically um or you have logs for, who is visiting your website?
You have, I don't know what your architecture is in the background,
but if you have like an Apache or NGINX or something,
you might want to collect those log files
and just see like who is visiting our site,
which IP addresses, which we can then translate to a region
by doing a GeoIP lookup.
Or what errors do we have? How many 404s, how many
500s? If we change anything on the website, who has changed what, and why are we suddenly getting
more 404s? What is up with our system? And you could add metrics, for example, either business metrics,
like how many people are coming to our website and how much time they are spending, but it could also
be metrics like CPU and memory usage,
or if you're using Docker or Kubernetes or whatever system basically you have.
We're very good at collecting a lot of metrics for that.
And then you can bring all of that together in some dashboards,
and then you get the overall view, both of your business data,
but also like on the IT system side, what is my infrastructure doing?
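Counting 404s and 500s from web server access logs, as described above, comes down to parsing each line and tallying the status field. Below is a minimal Python sketch that does this by hand, purely to illustrate the idea. It assumes the common Nginx/Apache "combined" log format; the regex, the function names `count_statuses` and `paths_with_status`, and the sample lines are all hypothetical, and this is not how Elasticsearch or Kibana implement it (in that stack, lines would typically be parsed at ingest time and counted with aggregations).

```python
import re
from collections import Counter

# Assumption: Nginx/Apache "combined" access-log format, e.g.
# 203.0.113.7 - - [10/Apr/2018:13:55:36 +0000] "GET /about HTTP/1.1" 404 512 "-" "Mozilla/5.0"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_statuses(lines):
    """Count HTTP status codes across log lines, skipping lines that don't parse."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[int(match.group("status"))] += 1
    return counts


def paths_with_status(lines, status):
    """Count how often each request path produced the given status (e.g. 404)."""
    paths = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and int(match.group("status")) == status:
            paths[match.group("path")] += 1
    return paths


# Hypothetical sample lines (documentation IP ranges), including one unparseable line.
sample = [
    '203.0.113.7 - - [10/Apr/2018:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "curl/7.58"',
    '203.0.113.7 - - [10/Apr/2018:13:55:40 +0000] "GET /old-post HTTP/1.1" 404 162 "-" "curl/7.58"',
    '198.51.100.2 - - [10/Apr/2018:13:56:02 +0000] "POST /api/search HTTP/1.1" 500 0 "-" "Mozilla/5.0"',
    '198.51.100.2 - - [10/Apr/2018:13:56:09 +0000] "GET /old-post HTTP/1.1" 404 162 "-" "Mozilla/5.0"',
    'not a log line',
]

print(count_statuses(sample))
print(paths_with_status(sample, 404))
```

Grouping by path is what helps answer the "why are we suddenly getting more 404s" question: a single stale link shows up as one path dominating the 404 count. At real traffic volumes, this kind of counting is what you would hand off to a search and analytics engine rather than a script.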
I was just thinking about the logging aspect. And so, you know, you said you don't know what
our infrastructure is like. Well, we just push everything off to Papertrail,
which is a service that we use. And they probably have Elasticsearch on the back end or some sort
of search tool allowing us to then, you know, run our searches through them.
And so that got me thinking about Algolia and some of these other searches as a service.
And I'm just curious how either Elasticsearch self-hosted infrastructure or even Elasticsearch offerings, how they differ and measure up to other search options that are out there
for developers to pick and choose from.
So we're getting into two different areas here. I'll start with the search use case.
We have recently acquired a company called Swiftype, which is basically in exactly that
area. Their product was already based on Elasticsearch; they were doing the crawling
for you and automating that search process, basically. And that is one of the solutions.
I've talked about solutions before.
This is one of the solutions we want to add.
It's still built on the open source search platform that we have,
but it's more of a solution that you probably don't want to build yourself,
even though you totally could.
And if you want to jump into that for a weekend project,
you can totally do that.
But maybe you just say,
oh, I just want to have a site that is easily searchable. And I just want a solution. And I
want my page to be crawled automatically. And maybe I want to fine tune some searches.
For example, if I enter this term, this should be the order that I want to have,
or I want to have some features where you need some fine tuning, you can totally do that.
But generally, it's just a solution that you can get started with.
Swiftype.
I think I actually run that on my blog, because it's a static site.
Nice.
To add search.
Haven't they provided it free for personal use for a long time?
So I think maybe I've got Elasticsearch powering
my blog search, and you didn't even know it.
Well hidden behind the scenes.
This is awesome. I love it.
Well, you said we're getting into different territories. When you
talked about logs versus search for a database or content, can you go into that more?
Does it end with Swiftype?
For the log use case, you can totally use one of
these smaller solution providers, but then again it's one more island, because
your search results basically sit on their solution or their site, and if you want to access
anything, you have to go there. And then for any other data, like business analytics, you might
have another island. So it's just lots of different islands, which you then need to visit individually to get the bigger picture.
Our vision is more to have like one dashboard where you can show different things.
Where you can have both like, okay, my website did that much revenue today.
But also how did the latency of my website or how did the number of errors affect that?
And it's just like one tool where you have the overall and bigger picture for that.
Maybe you can go deeper into that, because I see the user types caring about those interfaces
as one team, but with different concerns.
Meaning, maybe as a marketer I care about search: I care about
terms, relevancy, people actually finding certain things, or
the content that's getting searched. But if I'm a developer, I care about logs, or if I care about
performance, maybe I'm in a different sector. It seems like those are three different user types with three different things in one dashboard.
Why one dashboard?
Well, obviously you don't have to. Probably everybody will have the one
big TV screen in their office with the custom metrics that they are mostly interested in.
But maybe you want to have the bigger picture: how did one influence the others?
Which right now, if you have
different solutions for that, might not be all that easy. And maybe this siloed
approach is partly because you had the different tools, and everybody was
using their own view, and there was no easy way to bridge those different views. And I think that is part of our vision: to get the
bigger picture and have better integration between all of these different departments.
I hate the term DevOps, but I think this is partly that idea: you break down those
silos where everybody keeps doing the thing that they have been doing in the past.
You want to get beyond that and get to the inherent value.
Where is the value in your company?
It's not doing one of these things; it's getting the bigger picture
and seeing how you can strive and what you …
This episode is brought to you by our friends at GoCD.
GoCD is an open source continuous delivery server built by ThoughtWorks.
Check them out at GoCD.org or on GitHub at github.com slash gocd. GoCD provides continuous delivery out of the
box with its built-in pipelines, advanced traceability, and value stream visualization.
With GoCD you can easily model, orchestrate, and visualize complex workflows from end to end with
no problem. They support Kubernetes and modern infrastructure with elastic on-demand agents
and cloud deployments. To learn more about GoCD, visit gocd.org slash changelog.
It's free to use, and they have professional support
and enterprise add-ons available from ThoughtWorks.
Once again, gocd.org slash changelog.
So Philipp, Elastic recently published an article called "Doubling down on open." In fact, Shay
wrote this February 27th of 2018, and I misread it. I thought
it said "doubling down on open source," and so we were going to talk about that, but it stops short.
It says "doubling down on open," and it kicks off with him saying he's excited to announce that
y'all will be opening the code for your X-Pack features: security, monitoring, alerting, graph, reporting, and so
forth. But this is not open source; this is opening the code. Can you give us the distinction and tell
us what's going on here?
This is very much a definition problem, but I think the OSI has
a definition of open source which says something like: you can see the code, you can modify it, and it's freely available.
And the freely available part is what we're not doing there. Since, well, we're a large
company, our salaries need to be paid somehow. So what we're doing with these features is, you
can get the source code on GitHub, and there will be a directory
or a folder with these non-open-source parts. What is Apache 2 licensed right
now will stay Apache 2 licensed, but we will add the code for the commercial
features to GitHub, so you will be able to see everything that is going on there.
But to use it in a
production environment, you will still need a commercial license. So it's not open source,
but I always say it's open code, because you can see the code, you can totally open issues for
that, and you can even contribute patches back. We don't really expect anybody to contribute major features to
the features that we will sell afterwards, but you can totally see what is going on.
And that has multiple reasons. Firstly, especially around security, people always want to see what
they're getting. And with bigger customers, sometimes they wanted to have an audit of the source code behind that.
Well, it's much easier to tell them, well, the code is open.
You just have a look there and you can really see what you're getting.
Secondly, for us internally, it was kind of a problem because we always had the open source GitHub projects.
And then we had the X-Pack ones, where the commercial
code was living.
And then you always had the problem of how to work efficiently with that.
You cannot do atomic commits, because part of the functionality might be in the open
source part, and part of the fix that you're contributing is on the commercial side.
How do you communicate the issues to the outside world?
Because, well,
the issue for the commercial part is in the private repository, so nobody can really see
what is going on. Opening the code will make both the communication and our process
internally much easier. And we just think it's the right thing to do. Everybody can see what
they're getting, though you will still need to pay for some or most of the features.
You can see that in the feature matrix, what is commercial
and what is actually free to use but not available under an open source license.
So there might be some minor restrictions,
like you cannot provide it as a service for customers,
but you can totally run it for your own projects on premises. So this is what we're trying to achieve there: to find a way to be an open company
and build on open source, but still survive as a company and not end up like, I don't know,
RethinkDB, for example. I think that was one of the products that was really widely loved, but there was just not enough commercial in there, and the company didn't make the cut in the end.
And I don't think that is benefiting anybody.
So it is a fine line to walk, but we are doing our best to be open and make users happy, but also have a sustainable business model and be around for a long time and build good products for a long time.
Are you guys following in somebody else's footsteps on this?
Or is this paving a new path, with this particular layout
that you came up with, with the X-Pack features in a separate folder
and the license standing in the way of it being completely
open source?
It's definitely not very common. I think one or two other companies have looked
into similar things. I think CockroachDB is one of them, though they're much smaller and much
younger as a company. I'm not aware of any other more established or larger company doing that.
And from the legal perspective, it is also very interesting. On the one hand,
we really want to keep the legal text there to a minimum and not scare anybody away.
On the other hand, it needs to be watertight, so that nobody can find a loophole to legally use our commercial intellectual property to make money themselves, or just use it for free and work around that.
Some people have had the concern that, well, you can just take the code, modify it, and comment out all the
licensing restrictions. Though we don't assume that this is an issue for any established
company, anybody who is capable of paying, at least in the Western world. I'm not sure
how it is in the rest of the world, especially with the legal systems there. But we don't see it as a major risk that somebody could just easily modify the source code and run everything because it's open.
We have thought about that. We're not afraid of that.
We're still in the process of drafting that legal document or that license that we will add.
And right now we're also cleaning up the code for the opening,
because, well, with what was closed-source code,
you need to make sure
there are absolutely no credentials.
There cannot be any references to customers.
You don't want to have anything else
that might be embarrassing.
So there is kind of a cleanup process right now
that the colleagues are going through.
The legal document may be in process,
but what I can say for sure is that between this blog post, "Doubling down on open," and the "We're opening X-Pack" document, it's well documented. You're definitely doing a good job of communicating
your intentions, which I think is probably the hardest hurdle to get over when making this kind
of shift, especially something that can be this controversial, or where people may feel misled if it's not described carefully.
You're saying why you're doing it, what's changing, when it's going to change, how things will be affected.
These two documents, which will be in the show notes, greatly communicate your intentions here.
We are really trying, because even internally people were confused at first. After the announcement, somebody from within the company, on their private account, accidentally wrote something like, oh, we're open sourcing X-Pack.
And it's like, no, that's not what we're doing.
And we have, it's an ongoing fight.
And obviously, once it's being posted on Hacker News, everybody goes crazy and posts whatever they think it means or doesn't mean.
And everybody has great fears.
And we understand that people are a bit surprised at first, because it's not a common model.
But we are really trying to do the right thing here.
And we think this is a model that might have a lot of benefits for companies as well.
So we kind of hope that this will be more common in the future,
or at least we're risking it and seeing where we can take this.
I'm curious what you mean by doubling down. It could be the risk portion of it, or just the fact that
something indicated that you should have such a belief in this direction that you're doing it.
I think it's both. We really see open source as the driving force in
how to get software out there, and also as what is
making us successful. We always see it like that: every paying customer has been an open
source user in the beginning. That is really where everything starts, and even the salespeople
understand that, even though of course the salespeople never want anything in the open source
space. They would love to have everything closed source and commercial.
But they're kind of understanding that model.
Like, how do you get where you are right now,
and how can you take it further?
Got to get paid, you know?
And they have like 50% of their salary based on what they're selling.
Yeah, as a salesperson you want no ceiling on your revenue
opportunity, on how much money you can make, because when you're in sales, you usually
risk what is often a salary. You usually get some sort of stipend, a
base, or a draw, which is the common term.
And it's usually very small,
nothing you can actually rely upon.
So in that position, you're like,
I don't want any restrictions. If I can sell a lot, don't restrict me.
I'll sell a lot.
If I can sell very little, well, then you fire me
or I will starve, one of the two.
Totally.
Believe me, we commonly have these discussions,
and engineering would, of course,
want to make everything open source,
because, well, who doesn't? And sales obviously wouldn't want to make anything open source. We need
to strike the right balance, and it's of course an ongoing discussion, but I think we're
doing the right thing here. We'll see how that develops over time.
When it comes to
security, you mentioned it earlier when you first started to share the details here, but I think it's so crucial.
You so often hear of tooling in the security space where you can't get access to the source code. …which is totally opposite of this, but third-party CSS not being safe,
where Jake Archibald said the real problem is thinking that third-party content is safe.
In this case, it's third-party code or dependencies.
And so many issues stem from a dependency that becomes – what's the term for it?
Not safe anymore.
Unsafe.
That's not what I was looking for,
but that works in this case here.
But you can't trust it anymore.
It becomes compromised.
That's the word.
And you've got that in your code base.
You don't even know it.
But the point is that you can see these because you have opened them up.
And it sounds like you also have issues open.
You're not necessarily looking for people to contribute,
but you want people to be able to see the code, scrutinize the code,
maybe even file bug reports and/or patches that may be security related.
Is that correct?
Oh, totally.
And especially if you're a more advanced user and you run into an issue,
the first thing you might want to do is just check out the source code and see like,
okay, this is what it's doing, and this is what it's supposed to do. And then you can say, oh, either I'm using this wrong, or no, there is a bug. And I can report that bug, and then I can see the progress and I can be part of that discussion.
And it's all on GitHub, which is much more inclusive, like the regular process you
have around everything you do in the open source space. And we want to give people the opportunity
to participate in that as well
and just be able to show,
hey, this is what we are doing
and this is when this release is coming out.
Otherwise, that communication was very complicated,
because then you would have needed somebody
to always communicate,
oh, we have fixed the bug
and it will be in that patch-level release.
And then you shouldn't forget anybody.
Otherwise, people are surprised, like,
oh, is my issue now fixed in that release or not?
And it's just like creating an unnecessary barrier
that we're trying to get rid of.
Well, for the developers out there that are like thinking,
okay, so how big is Elastic?
You know, great, you got to make money, but how much?
Why don't we share with them how many people you got in your company
so they can quantify that, so to speak.
It's changing every day. We're, I think,
820, or maybe we're already 830 today. Right now we are growing by 50 a month, which is an insane
number. If anybody is looking for a job, by the way, just shoot me a message. I'm happy
to connect you. We have positions for pretty much any technology that you can imagine. We're not just
Java. We have lots of other stuff as well and lots of open positions.
What's driving that growth?
Obviously, we have more and more products and we're getting more into that solution space.
So that is the engineering side.
But of course, since we have all these solutions, you always also need to sell them.
So we have also a lot of sales and marketing people there.
How has your community responded to this new direction?
You have your customers, you have lots of users of the open source project, even just on the
Elasticsearch repo on GitHub, there's 983 contributors over time. Now, maybe with 820,
you know, you've got a lot of those be your employees. But surely there's other companies
using this other individuals. And now this change for this direction of open, but not open source,
proprietary open code things that are going to be in the repos and this vision that's been laid out.
I know there's been some confusion, but has there been a backlash? Have people received it pretty
well? What's the response been?
Partially confusion, and partially people are waiting,
since the final license is not out there and they don't really know what it means.
So yeah, I guess we will get the final verdict once that is done. On the other hand, if you're an
existing user, nothing is changing. What has been out in the open source space is staying out in the open source space.
We're just adding more viewable source code.
So if you want to take a look behind the scenes
for those features,
that is totally possible in the future.
So we're not taking anything away.
We're just adding more features.
And I think a lot of people care more about the free part
than the open source part, to be honest.
For those, not too much will change in that area.
That's an interesting question, Jerod, to consider the response, obviously.
I didn't think to ask that.
That seems like the obvious thing to ask, which is like, okay, you've got this many employees.
You must have a large customer base.
What's the response?
And it looks like this announcement was made
at Elastic{ON}. Is that right? Is that how you say it?
We normally say Elastic{ON}, right. It's our annual conference.
Okay. And maybe it was just timing, but
are you aware of why you would announce this change prior to the
end user license agreement being available?
You know, because, I mean, you said confusion.
It seems to me that maybe some of the confusion could be avoided, I guess,
or just not be there at all, if the whole deal is clear, and that's the missing piece.
Well, you want to announce something at your annual conference, and we really wanted to put that out there and show our commitment to openness.
On the other hand, since there is not that much prior art there, just finding the
right legal text is a lot of work, and we were not there yet on the legal side for having the text.
We were aware that, well, it might have been better
or probably would have been better if we had the final text there.
But on the other hand, speed in that regard could really kill
if you just put out something that is not foolproof
or that has some loopholes that would seriously hurt the company.
So we really want to draft something that is substantial there
and is doing the right thing.
And it's kind of like the engineering discussion is very interesting.
It's like, oh, so since you have the part of the source code
that is Apache 2 license,
maybe you could just modify the Apache 2 license code
to circumvent
the license check for the commercial part. Maybe you could do stuff like that. And this needs a lot
of kind of discussion, both between engineering and the legal side. On the other hand, we don't
want to make this too restrictive to scare anybody off. So we are really trying to walk a fine line
of doing the right thing. And
unfortunately, that takes some time. And it's really a back and forth. I think it's important
maybe to put in perspective the reasons why. You know, there's a lot of confusion on the details,
but the why usually helps everyone understand the direction and maybe even gain some trust,
right? The why is because you need to be a profitable company
and survive and continue to have the necessary employees
to innovate and to deliver services, right?
I mean, that's the why, right?
I mean, that's not the why for opening it, yes,
but it's why we need to have commercial features,
so that you can continue to get cool features
and we can
innovate on other products.
But we're also,
like we said,
committed to this openness and it's just like finding the right balance.
And we would love to see that we are not the last ones to do something like
that,
where you have a commercial offering because,
well,
once you have a company,
you need that.
But also having this open part and not be like, I don't know, Oracle,
where you just cannot see anything in the source code,
and then something doesn't work out, you write to support,
and then you wait for some answer from support.
And maybe it's not giving you the right answer of how something is supposed to work.
Whereas once you have the open code approach, if you're knowledgeable enough,
you just look up how is this working behind the scenes.
I can just figure it out like in 10 minutes myself and see what is going on.
And I think there is kind of a tremendous value in that as well.
You referenced RethinkDB earlier, so it sounds like you're familiar with that story: if you don't go this route and it's not sustainable, Elastic could see a downturn in employment.
That means lost jobs.
That means, heck, that could potentially mean we see you on Patreon at some point, rather than you finding ways to sustain yourselves that fit your own business model.
Not that that's going to happen, but that's an extreme case. What I'm trying to say is that we see open source projects and/or products attempting, and in a lot of cases succeeding, to sustain themselves through Open Collective, Patreon, or direct support.
Obviously you're a company, so that may be slightly different. If you don't find a way to deliver these things you want to in a commercially viable way,
then it means lack of success.
It means, you know, company failure, potentially.
Yeah, and it's in nobody's interest to shut down a project like it happened to RethinkDB.
I mean, yeah, the code is available on GitHub, but I checked just a month ago or so,
and I think, like, pretty much nothing is happening there.
So this is, I guess,
pretty much the end of it, and nobody is benefiting from that, because it was a great product,
and it was also widely loved, from what I understood. So that's not what you want to do.
We'll put it in the show notes. We did, was it "The Future of RethinkDB," Jerod? Was that the last
episode we did? It's a great show. It kind of capped off this podcast's chronicling of RethinkDB, which included two episodes with Slava.
And I can't recall the person we spoke with right now.
That's right.
Mike Glukhovsky, episode 266, The Future of RethinkDB.
I got the title, but the person I forgot.
Mike was great to have on. He shared the backstory really well, the founding portion of this,
and then ultimately how the IP was,
you know,
bought by the Linux Foundation and what that meant.
And yeah,
so we'll put that in the show notes.
Philipp,
anything else we can cover here?
What,
maybe what's,
what's next?
So this is probably the hottest topic in your company and in your projects.
Where can we go from here? What's best to cover to close out the show?
So, continuing the open theme, we're
doing Google Summer of Code this year for the first time. It's sponsored and organized
by Google. Basically, open source organizations can apply to run student projects, and a student will then
implement some feature for the project in three months, and Google is paying the
student for that. That has been going on for, I don't even know
which year we're in, about 10-plus years from what I remember, because I think I
was a Google Summer of Code student like nine or ten years ago and participated in the project.
And now we are trying to be there as,
or we are there as an organization as well.
And we are currently selecting a student,
so unfortunately it's over for this year.
But if you're a student and you want to work in open source
during the summer and don't, I don't know,
serve drinks or anything like that, then it's
a great opportunity. Keep your eyes open in February for the call for that, and then you
can see more than 100 open source projects, where you can apply either for ideas they are
putting out, or you can come with your own project ideas. And if you're selected, you can work
on that code for three months during
summer and be paid by Google. So that's a very nice thing for students to do. I can
highly recommend it. And we see participating in initiatives like that as part of the open source ecosystem and of
our openness: trying to bring students into the
projects and help the new generation
get started in open source.
You were a student in Google Summer of Code?
I was a student in Google Summer of Code. I worked on a PHP-based CMS
called SilverStripe, which nobody knows because it's from New Zealand. That was my
start into the open source world. I worked on a project, and then I kept ties with the project, and
two or three years later that organization was a mentor organization, and
I was a mentor with them as well. That's a common pattern: you bring on people
as students,
and then they continue as the mentors,
or, as we now do, on the organizational level,
driving that to help the next generation
of the thriving open source ecosystem.
I'm looking at their homepage.
13,000 plus students, 108 countries,
13 years, 608 open source organizations, and 33 million plus lines of code over Google Summer of Code's history.
Pretty impressive statistics and what an impact it's had over time.
Well, Philipp, thank you so much for schooling us on the use cases of Elasticsearch, how a relational database like Postgres can leverage it,
potentially even how you can bridge the gaps
across various different vectors.
But yeah, thank you so much for sharing that backstory with us,
because that certainly educated me quite a bit.
And the fact that this is open source,
it began as open source,
and the direction of your company is so great.
So thank you for sharing that.
And thank you for being a fan of the show.
Thank you for coming on.
We appreciate it.
Thanks for having me.
And I hope you can fix all your search problems.
Let me know if you need a hand.
We need a hand.
All right.
Thank you for tuning into this week's episode of The Changelog.
If you enjoyed this show, do us a favor.
Share it with a friend.
Hit that favorite button.
Add it to a list.
Tell somebody about this show.
And of course, thank you to our sponsors, Rollbar, DigitalOcean, and GoCD. Also, thanks to Fastly, our bandwidth partner.
Head to fastly.com to learn more. And we move fast and fix things around here at Changelog because
of Rollbar. Check them out at rollbar.com. And we're hosted on Linode cloud servers. Head to
linode.com slash changelog. Check them out and support this show.
The Changelog is hosted by myself, Adam Stacoviak, and Jerod Santo.
Editing is done by Tim Smith.
Our music is produced by Breakmaster Cylinder.
And you can find more shows just like this at changelog.com
or on Overcast or Apple Podcasts or wherever you subscribe to podcasts.
Search for us. You'll find us.
That's it. It's done.
We'll see you next week.