The Data Stack Show - 08: When data alone is not enough - Reinventing book shopping at Bookshop.org with Mason Stewart
Episode Date: September 30, 2020
In this week’s episode of The Data Stack Show, Kostas Pardalis and Eric Dodds chat with Mason Stewart, the lead engineer at Bookshop.org. Bookshop is an online bookstore with a mission to financially support local, independent bookstores. Their hope is to help strengthen the fragile ecosystem and margins around bookselling and keep local bookstores an integral part of our culture and communities.
Among other topics, today’s conversation covered making what some might call boring decisions with the data stack that are better described as mature decisions, and the intertwining of human interaction with data for problem solving and recommendations.
Background on Mason and Bookshop.org (3:28)
Technical challenges of keeping up with a rapidly expanding business (10:00)
Interacting with data from fulfillment partners (14:36)
Data schema for books and dealing with Elasticsearch (24:46)
Human intervention in recognizing problems and exceptions (31:38)
In-depth look at Bookshop’s data stack (37:06)
Using curated lists from bookstores instead of algorithmic recommendations (43:50)
The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome back to the Data Stack Show with Eric Dodds and Kostas Pardalis.
Today we have a really interesting guest, someone who is near and dear to me because we've started several businesses together.
Mason Stewart is the lead engineer at Bookshop.org, which is a really cool company he'll tell us about.
Because I know Mason so well, I know that they do some pretty interesting things with data, specifically book-related data, because they sell books.
Kostas, as an engineer, what are you most interested in?
I mean, on the surface, it seems like a pretty straightforward e-commerce concept,
but you're always good at sort of looking below the surface
and thinking through what data issues they might be facing.
Yeah.
Actually, I'm very excited to hear about all the details of implementing
such a simple, in a way, system
as an e-commerce site.
Because I think that both us and our listeners will figure out that working with even the
simplest things and stuff that we take for granted, like a book, it can be quite complex
from a data perspective.
And especially when you do that on scale,
and especially when we are talking about scaling in terms of transactions
where people, they pay their money to get a specific product.
So you have to do, I mean, you're dealing with the money
and the desires of people, and you need to keep them satisfied.
So I think that in general, we know pretty much how to catalog books and archives and all that
stuff, which is more of the academic, let's say, way of dealing with this data. I know that this
can become quite complicated and tricky and very dirty also. Like this data usually are not,
let's say, the cleanest data to work with. But every piece of data that is not clean
can affect the purchase of a person.
So it's quite important to have
the best possible quality on your data.
So yeah, I think that people will be surprised
both on the complexity and the technical solutions
around dealing with an e-commerce site that works with books.
So, yeah, let's see what we can learn today.
Great. Let's dive in and talk with Mason.
All right. Mason, welcome to the Data Stack Show.
I've worked with you for many years in the past, so excited to have you on the show.
Welcome.
Thanks.
Thanks for having me.
Why don't we start out by, I would love to just hear a little bit about what you do at
Bookshop and then what Bookshop does. It's a pretty unique model,
and I would love to learn a little bit more about the company.
Yeah, totally. So I am the lead engineer at bookshop.org.
I have a lot of very typical engineering duties.
I do a lot of production work.
I deal with a lot of DevOps and developing features.
Also just kind of trying to build some sort of a long-term roadmap for how this app is going to function,
especially as it continues to grow and gain more and more traction is, you know,
how do we make sure that we've got a really robust app that can withstand the amount of books being purchased and the amount of interest from the community,
which has been really cool.
But it's also, you know, it also is a lot of books being sold on Bookshop.
And so, you know, our responsibility to keep the systems functioning
and be, you know, really a great experience for booksellers,
for readers, customers,
and communities that we're trying to serve.
It's a big responsibility, but so far we're having quite a bit of fun.
As far as what bookshop.org is, bookshop is, on the surface, a way to buy books online.
But there are quite a few twists in the way that we do this
that are pretty different from really anybody else, especially when you think of like buying
books from Amazon. So with Bookshop, we are specifically designed to support
local indie bookstores. That's a huge part of what we do. And also to support
our affiliates who may not necessarily be indie bookstores, but they are still promoting books,
promoting authors. The way that we do that is we take a really big chunk of the cover price of a
book, essentially all the profit that we would make from a book.
And if you have set a local bookstore as your bookstore,
you went and selected, you know,
looked up your bookstore by zip code or, you know, address, geolocation,
we'll share the profit, all the profit from that book
with that bookstore that you've selected.
The interesting thing there is that that bookstore doesn't have to have the book in stock.
They don't have to ship it.
They don't really even have to know that the book exists.
We send that request to our fulfillment partner, who drop ships the book to the user.
The bookstore doesn't have to have
touched any of this and we'll still give them that commission on the sale. And then using Stripe
Connect, they can cash out. And we really are basically designed as a company to like send
the vast majority of our profits back into the indie bookstore ecosystem.
The same thing happens with affiliates. You know, the cut is a little bit less there,
but we'll still take pretty generous commissions on any books, and those go to affiliates who
have shared the link on social media or, you know, with a book club. And we also take a part of that sale and put it in the
global, I guess you would call it for lack of a better term, global pot of
funds raised for bookstores that we take every six months and distribute to indie bookstores
and really just write them a check every six months.
I mean, it's technically an ACH transfer,
but we send that money out every six months.
And if you don't select a bookstore
or don't use an affiliate link to buy a book,
we still take a big chunk of that
and put that into the global pot for all bookstores.
So there's really only some transactions that we actually make money on, but that's kind of by design. So it's designed to
take as much money as we can and infuse that back into local bookstores. We believe that
you can look at the about page on the site and see it written better than I'll say it here,
but local bookstores are a really big part of communities. They're anchors in our downtown areas. And I think that's a really good way of
putting it. That's where kids are going to learn to love books. It's where people are going to get
engaged with the community. They're going to meet authors and go to signings and readings. It's a
really important thing. And personally, I have really, really fond memories of my local bookstore
growing up being exactly that for me. I was able to take my son there for the first time
when I went back to visit last year.
And it's like the bookstore I grew up going to
and I was able to take him.
So that's something that I think
everybody on staff cares about.
And so far, we've really exceeded anyone's expectations
as far as the popularity.
So we are selling enormous
amounts of books every day. And that's, you know, it's been really exciting to see that,
see that happening. I joined a little bit after the launch, full time. So I wasn't there for the
very beginning and some of the early development stages, but even then coming into it where, you
know, we've got a, on average, a book selling every, on average,
maybe 15 seconds and orders being placed. So, you know,
just kind of stepping directly into that scale and the company,
I think only having really started opening its doors to customers,
I think maybe in March. So it's,
it's already grown to a pretty significant scale and that's,
that's been really exciting to see the community so excited about that.
That's crazy. Every 15 seconds.
Yeah, that's some rough average math off the top of my head, but we get about 8 million, 9 million requests a day.
If you average that out, you know, it's about a hundred requests a second. Only about half of that's cached, so,
you know, our application on its own is handling about 50 requests a second, all
in. And so we've just got a lot of volume. We've got a lot of people on the site doing a lot of things and
buying a lot of books. So yeah, it's quite a bit of volume.
Well, I know Kostas will have a bunch
of technical questions, but I'm interested to know, you know, when you go from launching to selling a product every 15 seconds,
I mean, from a business standpoint, that's incredible. But from a technical standpoint,
that's a lot of infrastructure to manage to make sure that the site doesn't go down, that,
you know, there aren't latency issues when people try to check out.
Could you just explain a couple of the sort of technical challenges that you faced making sure
that the app can actually still function when the pace of expansion is so high?
Yeah, definitely. So, you know, I definitely want to say thanks to
Happy Fun Corp in New York. They are the agency that kind of built the initial version
of Bookshop. And I joined, you know, after they had launched the app and they, you know, they
were the initial muscle behind the app. And, you know, I think it's a testament to their ability
that they were able to get the app off the ground in a relatively short amount of time and keep the app running.
But, you know, as far as the long-term maintenance and growth and success of this,
luckily the app itself is fairly simple. So it's a Rails application, which, you know,
there may be some people who are listening who find that immediately boring. But I actually
think that's a good thing, right?
We're not doing anything wildly exciting here.
We're doing fairly straightforward e-commerce
and we're using a really battle-tested and very normal framework.
And really, Ruby, you know, while Ruby is a slower language
as far as CPU optimization,
I don't think Ruby is our bottleneck at this point.
But, you know, we, the way that book data is stored is a little bit complex because it really
isn't just like, well, here's a book and here's the author and here's the name of the book and
here's an ISBN. There's quite a bit going on, right? You know, you would think
an ISBN would be usable internationally. It actually isn't. So you have
a lot of issues where you have different variants of books, and
there's no way to know, except as a human looking at them, that they are the same book just in different editions. I think there's about 14 million books in our books database,
you know, including all variants of all books. But at that scale, you know, we're obviously
just going to have some data that doesn't make sense. We're going to accidentally have authors
that are misattributed, and that may be because the data we got from the warehouse had a mistake in it.
Or it may be because, during our import, we made a mistake. So managing, you know, managing the
amount of data that we need to import pretty regularly is challenging because we can run up
against some very long running
background jobs. And we have to be very careful that those are as interruptible as possible.
So making sure that the app is reasonably fast, making sure that we, you know, that we're able
to keep up with not only the demand of orders that we need to send to the fulfillment partner,
but that we're able to import, you know, the pretty vast deltas on inventory every couple of hours
and that not take days to complete.
So I think the thing for me, you know, to wrap that all up is that the issue is that
just about anything that you could normally get away with in a Ruby application or any web application
becomes really hard when every table has hundreds of thousands of things in it, if not millions.
I tried to load a dropdown with a list of publishers, not thinking about the fact that,
you know, there's some 15,000 publishers in the database and, you know, a normal select box with 15,000 items in it is not really a great user experience for anybody. So at this sheer volume of data, even the smallest task can be a little bit challenging.
It's definitely, it takes a lot of extra thought to even do something as simple as a migration on the database.
That's very interesting, Mason.
Actually, I wanted to ask you, I mean, apart from the front end parts of the website,
which is like the interface that your customers have with interacting and trying to find the books that they need,
there's a lot of work that needs to happen behind the scenes.
And I think you touched this a little bit about like gathering all the
different data that is needed around the products.
And when we are talking about books, I mean,
they are not like the simplest kind of product to represent, right?
But it's a lot of metadata that you need to have there in order to,
for the customers like to find what they need, like descriptions,
titles, ISBNs, like there's a wealth of well-structured data there that is needed.
And I assume that also in your case,
because you are more of a decentralized marketplace, let's say,
like you have many different bookshops that are represented there.
You probably have like some kind of like different challenges compared to, I don't know,
like a company centralized, super centralized company like Amazon that controls everything.
So how do you deal with that? How do you collect this data? What's the importance of this data?
Where do you find this data? Like I'm going to Amazon and to be honest, like I don't know
how they find and how they collect and how they validate all this information around the books and the different titles. So can you give us a
little bit more information around that? I think it's going to be very interesting.
Yeah, for sure. So one thing that's interesting, and it takes a little while for everyone to kind
of wrap their mind around the fact that, you know, we may be supporting a, you know, let's say a bookstore
downtown in your city. But while they may need to have an affiliate account so that we can pay them
via Stripe Connect, they, we don't need to know their inventory. It's really, you know, it's,
I mean, I'm sure they have wonderful inventory, but we don't actually need to know it for the transaction because they're not responsible for shipping. They're not responsible for ever even touching the book. Um, you know, it sounds like we're just sort of giving free money to them, but that's, that's actually the intention is they don't have to have done anything other than exist as an indie bookstore. And, you know, we were trying to sort of make right some of the
wrongs of how, you know, how we shifted so much of our commerce away from local bookstores
because of the convenience of being able to order online. And so in that sense, we don't really
need data from the bookshops, which I think is good and has allowed us to do what we do in a relatively
short amount of time. You know,
they're not having to manage their inventory on some kind of tool that we
built. We're not having to import their, their inventory. So that is good.
However, you know, dealing with our fulfillment partner,
we, we, you know, we have a gigantic set of data. A full
import can take an enormous amount of time. And so, you know, luckily we're able to get deltas
on the inventory, but even then, you know, we have to do some, some ETL type things with it,
where we take the, a giant inventory file, we split it into discrete files. We process them in parallel.
We have to, you know,
deal with the fact that we may be getting an import of book cover images,
not necessarily after we have gotten the book's author data.
So we're
absorbing all of this different data sort of in parallel and
not necessarily in any sort of ideal order. So making sure that we store that in sort of a
temporary place that can be safely written to and then ETL it into our actual, you know,
sort of available inventory and making sure that's indexed correctly with Elasticsearch, it's a pretty big headache because even a very, very small negligible amount
of additional time loading something or writing something into the database,
when it's multiplied times millions of items,
we can end up adding days to an import if we're not careful.
So optimizing those queries is really, really challenging.
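As an illustration of the import flow described above (split a large inventory file into discrete chunks, process them in parallel, write to a temporary staging area, then move the result into live inventory), here is a minimal sketch in Python. It is not Bookshop's Rails code. The `isbn,stock` line format, the function names, and the in-memory `staging` dict standing in for a staging table are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(lines, size):
    """Yield successive lists of at most `size` lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def parse_chunk(batch):
    """Parse hypothetical 'isbn,stock' lines into (isbn, stock) pairs."""
    rows = []
    for line in batch:
        isbn, stock = line.strip().split(",")
        rows.append((isbn, int(stock)))
    return rows


def import_inventory(lines, chunk_size=2, workers=4):
    """Parse chunks in parallel and collect them in a staging dict."""
    staging = {}  # stand-in for a temporary staging table
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for rows in pool.map(parse_chunk, chunked(lines, chunk_size)):
            staging.update(rows)  # only the main thread writes to staging
    return staging


def promote(staging, live):
    """Apply the staged rows to the live inventory in one step."""
    live.update(staging)
    return live
```

Keeping the parallel work limited to parsing, and writing to the staging area from a single place, is one way to make out-of-order arrival harmless; a real import would also need the interruptibility and indexing concerns mentioned in the conversation.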
And dealing with the sheer amount of metadata that we have to use and get right.
And the reality is that even the best fulfillment partner in the world is going to still just
have human typos on an author name or an author bio.
And so we need an ability in our own system to go in and override that.
So say an author says, hey, I went to go look at my book on bookshop.org,
but you have the wrong book listed under my work;
I didn't write this one particular book.
So being able to override that, but make sure that on the next import
we don't destroy that override,
there are also some interesting challenges there.
So trying to treat the fulfillment partner
as the one source of truth is great,
but also realizing that there's gonna be times
where we're just simply going to have
to provide overrides for that data.
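One way to keep manual corrections from being destroyed by the next import is to store them separately from the partner's data and layer them on top at merge time. The sketch below is in Python and is only an illustration of that idea; the field names and the `overrides` mapping are hypothetical, and the conversation does not say how Bookshop actually implements this.

```python
def apply_import(record, incoming, overrides):
    """Merge a partner update into `record`, keeping staff overrides on top.

    record:    stored field -> value for one book
    incoming:  fields from the latest partner import
    overrides: fields corrected by staff (hypothetical separate store)
    """
    merged = {**record, **incoming}  # partner data replaces stored data
    merged.update(overrides)         # manual corrections always win
    return merged
```

Because the override lives outside the imported record, the partner can keep resending the uncorrected value and the correction still survives every import.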
Yeah, it can definitely be a big challenge. It's, you know, it's also a challenge when we're dealing with warehouses. So we're dealing with two different fulfillment partners, one in the United States, and we're working on one in the United Kingdom as we're getting ready to roll out in the UK. And, you know, the thing that
probably anyone who deals with warehouse integrations knows is that every warehouse
wants to receive their order in a slightly different way, in a different format. So that
may be something like generating an actual file, maybe a fixed width file and actually sending it to an FTP server.
And that's how the order is placed, which sounds incredible in 2020, but that's perhaps how some of these warehouses have worked forever.
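For readers who have not worked with fixed-width files, here is a minimal Python sketch of what writing and reading one order line can look like. The layout in `FIELDS` (field names and widths) is entirely made up for illustration; each warehouse defines its own format, and the FTP transfer itself is left out.

```python
# Hypothetical layout: (field name, width in characters).
FIELDS = [("order_id", 10), ("isbn", 13), ("quantity", 5)]


def format_order_line(order):
    """Render one order as a fixed-width line."""
    parts = []
    for name, width in FIELDS:
        value = str(order[name])
        if len(value) > width:
            raise ValueError(f"{name} exceeds {width} characters")
        if name == "quantity":
            parts.append(value.rjust(width, "0"))  # numbers: zero padded
        else:
            parts.append(value.ljust(width))       # text: space padded
    return "".join(parts)


def parse_order_line(line):
    """Slice a fixed-width line back into a dict."""
    order, pos = {}, 0
    for name, width in FIELDS:
        order[name] = line[pos:pos + width].strip()
        pos += width
    order["quantity"] = int(order["quantity"])
    return order
```

Every field occupies the same columns on every line, which is why a value that is too long has to be rejected rather than silently shifting the fields that follow it.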
And there's no reason, you know, there's a responsibility on the
warehouse to, you know, to try to receive those orders in a reasonable amount of time and reliably,
but also on our part to make sure that we built in really robust retrying mechanisms and fail-safes
so that a bad FTP connection doesn't wreck the business, you know? So yeah,
it's, it's as I know that that's probably, you know,
just barely scratching the surface of all the things that you wanted to hear
about, but it is,
it is a challenging thing to deal with at just about every level because of
the scale,
but also because of the amount of metadata that needs to be ETL'd and the sometimes
non-typical non-REST API sort of ways of interacting with some of these vendors where
it may be that we're developing some sort of parser for a type of file that
exists nowhere else except with this one warehouse,
you know?
Yeah, yeah, yeah. I totally understand what you are describing. I think that's most of the industries out there and especially, okay, if you start seeing what markets are doing outside,
you know, like the Silicon Valley or like the high-tech sector, you will see that like things
are very, very different from what we read on Hacker News and all these places.
And there is a reason behind that.
They are building their systems for all these decades and there are things that are working
and change might cost a lot in mistakes.
So whenever you try to expand in a market like this, you always have to deal with that.
And it's something that we keep forgetting
of how important it is.
So I totally understand your point.
If I understood correctly,
the way that you have structured your product,
you have Elasticsearch that I assume
that's used for searching, for free-text searching.
And then you probably also use like some kind of like database system
where you keep all the records there.
Is this correct?
Yeah, so we use MySQL as our primary data store.
And I think most everyone on the team is very, very used to PostgreSQL.
And during the very, very initial phases of the application,
there were some features in Google Cloud Platform's MySQL that were not yet available
for general use that the team needed to deal with some of the large file format, large file imports.
So, or sorry, they were not available in Postgres,
but they were available in MySQL.
So we actually ended up going with MySQL, which is fine.
And for the, you know, for the most part, I don't think, you know,
there's an enormous difference.
There's definitely some things to get used to and the difference there.
But yeah, generally speaking, our data storage is fairly straightforward. We're using MySQL and Elasticsearch is indexed for search on the site.
Yeah.
Can you give us like a bit of an idea of like the complexity of the data model or the schema
that you have to deal with for the books?
I'm not talking about the business logic that you have as a bookstore, but about the books themselves and how they are represented differently on an inverted index like Elasticsearch, and how you manage to keep your index in sync with your database, which is a very classical problem with these systems. It would be very interesting to see what kind of challenges you have in your
case, exactly because of these deltas that you try to introduce and the volume
of the data that you have to work with and all that stuff.
Sure. Yeah. So the schema for
the books themselves is not particularly, so if you take the sort of e-commerce, you know, the idea of an order and line items,
and you take those things out of it, the schema
for a book is not particularly complicated,
but it's not maybe as simple as most people would guess either. Right.
So we have, you know,
we have a contributors table because a particular book is going
to have who wrote the foreword, who
wrote the afterword, who edited, and also who wrote the book and whether there were multiple
authors. And you'll even have one book maybe in different editions, different
variants, having different contributors. So there's a need to make sure that all of the contributors are not only correctly
joined to the variants, but that we're listing them in an order that makes sense, right?
You know, if you're the primary contributor to a book, that's important.
You probably need to be listed differently than someone who, you know, just
wrote the foreword. There's contributor metadata. There's the metadata for the book itself. There
are all sorts of different categories that are called BISAC categories, and they are nested categories that can go pretty
deep on a book as far as how a book is categorized. So it may be, you know, there's BISAC for, say,
science fiction, but then there's hundreds of subcategories of science fiction, right? So a
book can belong to multiple BISAC categories. And you have different language variants of the books. I mean, it's not particularly
crazy from a data structure, but it's a little bit more complicated than would probably be guessed.
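To make the shape of that data a little more concrete, here is a small Python sketch of a book variant with role-tagged contributors, nested categories, and a display order. The class names, the role priority table, and the category paths are assumptions for illustration; they are not Bookshop's schema, and the paths are plain labels rather than real BISAC codes.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    name: str
    role: str      # e.g. "author", "editor", "foreword", "afterword"
    position: int  # display order among contributors with the same role


@dataclass
class Variant:
    isbn: str
    language: str
    contributions: list = field(default_factory=list)
    # Nested categories as paths from broad to narrow; a variant can
    # belong to several. Illustrative labels, not real BISAC codes.
    category_paths: list = field(default_factory=list)


# Assumed priority: primary contributors are listed first.
ROLE_PRIORITY = {"author": 0, "editor": 1, "foreword": 2, "afterword": 3}


def display_order(variant):
    """Sort contributors by role priority, then by position within a role."""
    return sorted(
        variant.contributions,
        key=lambda c: (ROLE_PRIORITY.get(c.role, 99), c.position),
    )
```

Attaching contributors to the variant rather than to the book reflects the point that different editions of the same book can have different contributors.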
As far as indexing with Elasticsearch, I'm of the opinion that you could probably have
somebody's full-time job be just working on Elasticsearch for this application, and that would be a really good use of time.
We definitely don't have that.
So we definitely have to deal with issues where, let's say, a warehouse has just simply attached the wrong ISBN to a book.
And we have imported that book.
We have set up all the data structure, and it has been indexed
in Elasticsearch. And then in a delta, they've corrected their mistake, but the deltas are large
enough that we're not necessarily going to be paying attention to every, you know,
every small change like this. So we can end up in situations where a book is still indexed in
Elastic but is not actually available anywhere with the warehouse. Our system may still have a record of it.
It may not.
It may be that an order has been placed and is waiting to be transmitted,
and then the warehouse removes that ISBN from their database.
So there are a bunch of tricky issues,
not only with having such a large elastic index that re-indexing can take a significant amount of time, but also trying to cache this stuff effectively.
So as we're trying to store things in the same way. So trying to mitigate that can consume an enormous amount of time. You know, for such a simple data structure, it really can still end up in a lot of complication.
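A simple way to reason about the drift described here is to compare the set of ISBNs in the primary database with the set present in the search index. The Python sketch below does only that comparison; how the two lists are fetched (from MySQL and from Elasticsearch) is out of scope, and the function name is hypothetical.

```python
def index_drift(db_isbns, indexed_isbns):
    """Report ISBNs that are indexed but gone from the database, and vice versa."""
    db, idx = set(db_isbns), set(indexed_isbns)
    return {
        "stale": sorted(idx - db),    # still searchable, no longer in the database
        "missing": sorted(db - idx),  # in the database, not yet searchable
    }
```

A report like this could drive targeted removals and re-indexing of individual documents instead of a full re-index, which the conversation notes can take a significant amount of time.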
I think, too, the e-commerce Ruby gem that we use, Solidus, you know, it has a lot of features.
It's been around for a very long time, but it tries to deal with a lot of concerns
that don't actually matter in our particular case because we're selling books
and because they vary in very specific ways.
So we definitely have some aspects of Solidus,
the Ruby gem that we don't actually use,
but that aren't necessarily simple to remove, right?
So we may end up having to touch quite a few tables
just to update a book
to make sure that all of our e-commerce data has
been updated and the pricing and the availability and the stock and everything has been updated,
but also then dealing with all of the metadata of the book. So yeah, dealing with Elastic is,
it can be very easy to really ruin the search results on the website.
Yeah, absolutely.
We have done that a couple of times, even since I've started.
And the problem is that it's really not simple to see that mistake until someone starts to point out that,
for whatever searches you're doing to make sure that the re-indexing makes sense,
there's gonna be somebody else
who's doing a completely normal search on the site
as a customer.
And they're like,
this is a New York Times bestseller.
Why is it not in the search results?
And we have to go realize
that the particular combination of title
and everything just happened to be words that
are deprioritized, you know, or ignored in our search. So it's a challenge. I mean, it's,
I barely scratched the surface on everything that we're doing as far as our Elastic goes. And even
then it's, it's a lot to take in.
Yeah, yeah, absolutely. So the way that you describe it,
from what I understand, like consistency,
not only internally, I mean,
between your Elasticsearch and your database,
but especially with the warehouses
that you have to work together
because the whole interface that you have with them
at the end is around data, right?
You need to have like the right ISBN,
you have to send like the right information to them back and forth.
And there might be like mistakes that might come from them.
So what kind of, like,
have you figured out any kind of like mechanism to identify this kind of
issues? I mean, you mentioned that many times,
like the customers themselves,
like they find the issues and they let you know because, okay, it's a very hard problem when it comes to search.
But in terms of like the consistency between you and the rest of the vendors that you work with and you interface through data, what have you figured out so far that it works and how hard as a problem it is at the end for a business?
Yeah. So that's a good question. So, you know, there's kind of a couple of layers to it. So one is when we're placing an order, which is, you know, in some ways you could consider the most,
the most important as, as what we're effectively saying is you told us you have this thing,
you told us this is the ID of the thing. This customer
wants this many of this thing. And as simple as that conversation is, there are so many things
that can go wrong, particularly because of the way that books are ordered and how back orders
and pre-orders are dealt with. So you may have a situation where
you say, well, look, you know, we'll, we'll, we'll accept something being on back order,
but only up until a certain amount of time. So 30 days, 60 days, whatever you set that number to.
And if they can't fill the order by that point in time, you'd want to cancel the order, right?
Because there's just a certain,
only certain amount of time that a customer will want to have given you their money and wait for
a backordered book. But that's a bit different from pre-orders where you may have an extraordinary
amount of time before the book is actually released. You may even have delays. I had a
book pre-ordered since August of last year and it got delayed multiple times.
So it was technically pre-ordered for a whole year.
So you have those situations.
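The difference between the two cases can be written down as a small rule: backorders are cancelled once they exceed a configured waiting period, while pre-orders are allowed to wait for release. This Python sketch is illustrative only; the function name, the `kind` values, and the 30-day default are assumptions based on the numbers mentioned in passing above.

```python
from datetime import date, timedelta


def should_cancel(kind, placed_on, today, max_backorder_days=30):
    """Decide whether a waiting order should be cancelled."""
    if kind == "preorder":
        return False  # pre-orders wait for release, even through delays
    if kind == "backorder":
        return (today - placed_on) > timedelta(days=max_backorder_days)
    raise ValueError(f"unknown order kind: {kind}")
```

For example, a backorder placed on January 1 would be cancelled when checked on February 15 under the 30-day default, while a pre-order placed a full year earlier would not.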
And I think the thing that we've found so far still
is that when something goes haywire with an order,
it's important enough to just send it to a human
to be looked at, right?
So like we can, we can try
to recognize patterns. We can try to recognize trends and those are things that are important
and that we should do. But at the end of the day, a human's still going to be really great at just
looking at the order and being like, this should not have been canceled. Or I know this book is
in stock. Something's wrong. Oh, we've mangled an ISBN or the order was in some way malformed.
Or, you know, we just, it could be that on either side of us, a warehouse or us, we just,
we have an outage, right? Where someone has an outage where they just simply can't process
orders during some hopefully small timeframe and, you know, trying to discover, well,
what did, did they actually receive them? And we didn't simply get the acknowledgement or, you know, all those different back and forth. So there's a surprising amount of human intervention that has to happen, even when things are working, aka, you know, we're collecting money and people are getting their books. There's still even then a lot of just really making sure that, you know, those orders go through.
We're sort of seeing the same thing, you know, as we're integrating into the UK. Our fulfillment partner there even says in their documentation, you probably won't be able to code around every, you know, every possible exception that could be raised. We really want to encourage you
to look at these as a human and try to understand what's happening. So it's always going to be a
challenge between, do we raise an exception because something strange has happened, and do we stop execution, do we stop
processing? Or do we want to assume that this may actually be okay? And what level of trust do we
want to put in, you know, our partners and ourselves to make sure the order goes through even if
something seemingly mysterious happened? I accidentally sent an order to our UK fulfillment partner that had no shipping or
billing address or name. So it was just like, I just sent them an ISBN, no information about where,
what they should do with it. And it actually like, because of some interesting things, it,
the order wasn't necessarily rejected. They actually just emailed and said, Hey,
we don't know what to do with this order. Right. And so in a sense,
that's actually pretty nice because that was an honest mistake. The trick,
you know, the trick is what do we do? You know,
how do we protect ourselves from sending huge amounts of orders that may not
necessarily fail validation, but will require human intervention.
So threading that needle is honestly, it's a challenge, right?
There's nothing like having a person look at some bad data and figure out what's going wrong,
but there's also a limit to how much of that we can do in a 24-hour span with a small staff.
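The trade-off Mason describes here, between rejecting an order outright and letting it through for a person to look at, could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Order` struct, the `triage` method, and the three outcomes are hypothetical names, not Bookshop's code. The only real rule in it is the standard ISBN-13 check digit.

```ruby
# Hypothetical order triage. Names and rules are illustrative assumptions,
# not Bookshop's actual implementation.
Order = Struct.new(:id, :isbn, :name, :shipping_address, keyword_init: true)

# Standard ISBN-13 check: weights alternate 1 and 3, and the weighted
# sum of all 13 digits must be divisible by 10.
def valid_isbn13?(isbn)
  digits = isbn.to_s.delete("-").chars
  return false unless digits.length == 13 && digits.all? { |c| c.match?(/\d/) }

  sum = digits.each_with_index.map { |c, i| c.to_i * (i.even? ? 1 : 3) }.sum
  (sum % 10).zero?
end

# A mangled ISBN is a hard failure. A well-formed order with no name or
# address would pass format validation but still needs a person to look at it.
def triage(order)
  return :reject unless valid_isbn13?(order.isbn)

  incomplete = [order.name, order.shipping_address].any? { |f| f.nil? || f.strip.empty? }
  incomplete ? :needs_human_review : :send_to_partner
end
```

Keeping `:needs_human_review` as its own outcome makes it possible to count how many orders land there per day, which is the limit Mason points to with a small staff.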
Hmm. Mason, I mean, number one, it's just amazing to hear about the plumbing complexity behind just ordering a book.
So thank you, from everyone who's purchased on Bookshop, for all the work you're doing on the back end to actually make that experience seem really simple to us.
Stepping back a little bit, one thing we like to do on the show is just ask about the stack.
So we know that it's a Ruby app, but it sounds like you have a pretty heavy-duty warehouse setup to manage different jobs, and we would love to know more about the infrastructure. You said you work
on some DevOps stuff too, and any tooling around, you know, sort of the data pipeline flow
in the stack would be interesting.
Yeah, totally. So it's kind of interesting because you've
got the first version of the app out and that's working for the United States. And, you know,
it's a lot of just sort of figuring stuff out on the fly. And again, hats off to the folks over at
HFC for even getting the app to run and keep up with the traffic. But we've been working on the
UK version of things and we're trying to make some small improvements in the UK on kind of a separate branch
that we hopefully can merge back into the US master branch.
Because right now we're technically deploying
two different versions of the app,
one for the UK and one for the US.
The UK version, we are actually trying to run on Kubernetes,
which is Google's distributed, or not distributed, the Docker orchestration library.
I really like Kubernetes. I've used it before, and it definitely has a little bit of a learning curve. Even though we're really not doing microservices, we still have some needs for, you know, having some kind of unique one-off jobs and job queues and, you know, small services.
I mean, it might count as microservices, but that's really not generally the architecture we're working with.
It's generally a monolith. So Kubernetes is great. We're on Google Cloud Platform. I think that generally
speaking, because we sort of positioned ourselves as an alternative to Amazon, it seemed like it was maybe in bad taste to host the app on AWS. So put it on Google Cloud Platform.
We're using Redis and Sidekiq as our job queuing system.
And Redis is sort of a masterpiece at this point.
I mean, it's a really, really fine piece of software
as far as dealing with, in a Rails application,
dealing with job queues.
And I think we're going to pretty quickly end up,
because we've made some of these jobs very, very granular,
where we will have hundreds of thousands
or millions of jobs running in the queue
and not necessarily running in parallel, but we'll have a lot of jobs in the queue.
I've tried to structure the new version of how fulfillment works in a bit of
an event stream. So it's not necessarily a proper event stream.
You know, we're not trying to reinvent Kafka,
and there are tools like Kafka, but
it's really, really important, when we are dealing with orders containing lots of items and separate
shipping addresses, and so many little things can both succeed and fail, that we treat those things
as atomic actions and that we have a very, very clear paper trail of
exactly what events happened, when and where and why and how, and what our strategy is for
when an event fails. How do we want to retry it? Should we retry it? So that's been really
exciting to kind of see that coming along, because the first version of the app did have some ideas like that
baked into the app, but I'm trying to formalize them so that the event stream for fulfillment
processing really doesn't know anything about the underlying data structures in the e-commerce
Ruby gem Solidus. So there's really a very, very small layer that has knowledge of the events and has knowledge of the
e-commerce data structure, but otherwise, you know,
it's trying to be an event stream.
And I think we're making some really good progress there.
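As a rough illustration of the idea, treating each fulfillment step as an atomic action with a paper trail and an explicit retry policy might look something like the sketch below. Everything in it is a hypothetical assumption (`FulfillmentEvent`, `EventLog`, `ItemSubmitter`, the event type names, the retry limit); it shows the pattern only, not how Bookshop implements it.

```ruby
# Hypothetical sketch. Class names, event types, and the retry limit are
# illustrative assumptions, not Bookshop's actual design.
FulfillmentEvent = Struct.new(:order_id, :type, :detail, :attempt, :at, keyword_init: true)

# Append-only log: events are frozen when recorded and never edited,
# so the log doubles as the paper trail for an order.
class EventLog
  def initialize
    @events = []
  end

  def append(**attrs)
    event = FulfillmentEvent.new(at: Time.now, **attrs).freeze
    @events << event
    event
  end

  def for_order(order_id)
    @events.select { |e| e.order_id == order_id }
  end
end

# Knows about events and retries, but nothing about the e-commerce models:
# the caller passes a block that does the real submission.
class ItemSubmitter
  def initialize(log, max_attempts: 3)
    @log = log
    @max_attempts = max_attempts
  end

  def submit(order_id, item)
    1.upto(@max_attempts) do |attempt|
      begin
        yield item
        @log.append(order_id: order_id, type: :item_submitted, detail: item, attempt: attempt)
        return true
      rescue StandardError => e
        @log.append(order_id: order_id, type: :item_failed, detail: e.message, attempt: attempt)
      end
    end
    # Retries exhausted: record that a person needs to look at this item.
    @log.append(order_id: order_id, type: :needs_human_review, detail: item, attempt: @max_attempts)
    false
  end
end
```

Because the submitter only receives a block, the thin layer that knows both the events and the store's data structures can live in one place, which is the separation described above.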
But aside from GCP, MySQL, Rails,
and Redis.
I don't know that we're really doing anything extraordinary.
The hardest parts of this app are by far dealing with things
for which there are no Ruby gems and there aren't necessarily tools.
So we wrote, for example,
we've been working on a wrapper around Ruby's standard library FTP to add a
whole bunch of modern niceties and robustness and retryability and things.
And, you know,
you would think that that library would already exist somewhere that a Ruby
gem for that would exist.
But I just don't know that most Rubyists
are having to deal with placing, you know,
20,000 FTP orders in a very short amount of time
with a probably pretty slow, you know,
FTP server that has probably been around
for a very long time.
So making sure that we're dealing with that responsibly
and we're not crushing the underlying FTP server, but that we also can really try to increase the guarantee that those things
will get over the wire. I mean, the stack is actually pretty boring. I think that's a good
thing for us. I would be a little worried if we were doing a lot of super crazy edge stuff here,
because by keeping it pretty boring, we're more or less kind of matching what we're seeing with
various vendors where we're not going to necessarily be seeing any of our partners doing leading edge stuff. And in a sense, that's nice
because that means things are probably not going to change and we can implement things and trust
that that will work for a long time. But it does kind of warrant a different approach than everyone's
got a fancy GraphQL and API. And it's a little bit of a different set of concerns, but I kind
of like it. I actually really enjoy it.
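The FTP wrapper Mason mentions is not shown in the episode, but the general shape of adding retryability on top of an FTP connection can be sketched as below. The `RetryingUploader` class, its parameters, and the backoff values are hypothetical. The connection is injected so the sketch does not depend on a live server; with Ruby's `Net::FTP`, the `connect` lambda would build and log in a real connection, and `putbinaryfile` and `close` are the methods it would need to respond to.

```ruby
require "timeout"

# Hypothetical sketch of a retrying upload wrapper. The class name,
# parameters, and backoff policy are assumptions for illustration.
class RetryingUploader
  # Network-ish failures worth retrying. With Net::FTP loaded one might
  # also add Net::FTPTempError (temporary 4xx replies) to this list.
  RETRYABLE = [IOError, SystemCallError, Timeout::Error].freeze

  # connect: a callable returning an object that responds to
  #          putbinaryfile(local, remote) and close, e.g. a Net::FTP instance.
  # sleeper: injectable so tests do not actually sleep.
  def initialize(connect:, max_attempts: 4, base_delay: 0.5, sleeper: ->(s) { sleep(s) })
    @connect = connect
    @max_attempts = max_attempts
    @base_delay = base_delay
    @sleeper = sleeper
  end

  # Uploads one file and returns the number of attempts it took.
  def upload(local_path, remote_path)
    attempt = 0
    begin
      attempt += 1
      conn = @connect.call
      begin
        conn.putbinaryfile(local_path, remote_path)
      ensure
        conn.close
      end
      attempt
    rescue *RETRYABLE
      raise if attempt >= @max_attempts
      # Exponential backoff gives a slow server room to recover
      # instead of hammering it with immediate retries.
      @sleeper.call(@base_delay * (2**(attempt - 1)))
      retry
    end
  end
end

# Possible wiring with the standard library client (host and credentials
# are placeholders):
#   uploader = RetryingUploader.new(connect: lambda {
#     require "net/ftp"
#     ftp = Net::FTP.new("ftp.example.com")
#     ftp.login("user", "secret")
#     ftp
#   })
```

Opening a fresh connection per attempt is a deliberately simple choice here; a wrapper handling tens of thousands of uploads would more likely reuse connections and throttle the overall rate as well.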
Yeah, I think,
I'm amazed by the way that you say it.
And I totally agree with this approach
that you're using the technology
to solve the business problem
and not focusing on solving technology problems
that many companies have to do
when they adopt the latest
super interesting and state-of-the-art technology.
And I think that especially when we are talking about e-commerce and about dealing with the
desires of people, their purchases and their money, we need to be boring.
That's the way to do it.
I think too, you know, I should mention now that you mentioned it, one of the
things that's exceptionally boring, but I think is absolutely a wonderful part of Bookshop, is we
really don't make algorithmic recommendations about reading or about books. So we really don't,
like, we don't look at the things you purchased and then try to make, you know, some sort of
algorithmic assumption about what you'd like.
When you're on the homepage of bookshop, it's,
it's recommended lists from actual affiliates and users and indie bookstores
and our curated lists.
And I actually really enjoy that because I think the results,
while sometimes, you know,
you may have already read everything on there and, you know,
it may not be the most exciting, you know, like "suggested for you" type results, but honestly,
the results I think are excellent because humans curate them. And we actually have strayed from trying to do this automatically, programmatically. And I think it works out really well because
not only are these lists attached to affiliates, so affiliates and indie bookstores will get their commissions from these, but it's also
just great recommendations. And sometimes that can be really hard, you know, for anyone who's ever been
just unbelievably bored with the recommendations on Spotify, because you're like, I don't want to
listen to any of this stuff, why is this what you're telling me to listen to? You know, I typically have a distaste for overly, sort
of, algorithmic recommendations. But yeah, I think that's right. I think that the boring, you know,
when we're talking about people's money and we're talking about bookstores' livelihoods and we're
talking about making recommendations that are really actually great recommendations, we've gone a pretty boring route, and I actually think that's
been in our favor.
Yeah, I think... I'm sorry, go ahead.
Go ahead, Eric.
I was just gonna say, I think,
I mean, not to get too philosophical, but I think, you know, when you think about the product itself, a book, and recommendations,
the, like one of the great things about books that I love is that they can expose you to new ideas,
um, and ideas that challenge you, but a recommendation engine that continually zeroes in on things that you've already read
doesn't necessarily serve you well
in terms of exposing you to new things.
So it's fascinating to hear that the human curation
works really well for you
as opposed to algorithmic recommendation.
Yeah.
Yeah, that's an amazing point.
Actually, all the time that both of you were talking about
this, I was thinking about some latest discussions that I've seen where people were talking about
places like Goodreads or subreddits for reading and recommendations around books. And it looks like, okay, technology
helps a lot to scale things up, right? Like having a recommendation algorithm there,
it really helps with that. But at the end you reach a plateau where the technology is not enough
anymore to provide the right content for people, because, okay, at the end, reading, listening to music, and all these activities, they're also very social events, right?
Like, I read the book because I heard about something from a group that I care about, from my community, and all these things that you also mentioned, that you touched on at the beginning, about the importance of the bookstore as part of the community.
And I think, again, it's a bit more of a philosophical point, but I think it's quite
important, and that's where we see the limits of technology at this point with all these
algorithms. So, having said all that, it's great that you're actually trying to do that. I mean, it's a very refreshing idea in driving
e-commerce around books. And this is like the last question. After that, we can close
this amazing discussion we had. Any thoughts about also building a Goodreads? So many people
are still using it and complaining that Amazon pretty much has abandoned it and they are not updating it.
So that's a question we get asked quite a bit. And I don't know that I have a clear answer on
specific plans to do one thing or another that would be something you could say is
an alternative to Goodreads per se. But
I think the reason people ask that is that when they interact with Bookshop and they see the engagement we have with the community
and the good we're hopefully able to do in the community,
I think everyone sort of sees the connection, that tools like Goodreads help.
They help communities, they help book clubs, they help bookstores, they help people connect.
And especially during a pandemic, like to connect in ways that they maybe couldn't in person and also maybe in ways that like might not totally make sense over Zoom. And so I think that there's a real desire in a lot of communities to have a way to
facilitate book clubs, to facilitate discovering new things and interacting with authors. And so
we're starting to do some of that with Bookshop where we're hosting author events, digital author
events now, and those are actually going great. And, you know, we're excited to have more community events. And I think we're excited to figure out ways that we can
further help communities read and enjoy books and help people discover new things,
whether or not that ends up looking like something that's built into bookshop itself,
or whether it feels like a more external tool. I don't know, but I do think, I know,
I know that that's something that we really are excited about.
And especially just because people keep asking, like, how do I share my,
you know, on Bookshop, you can't really leave reviews at all.
You know, there's just no mechanism for that.
And I think people want to share their thoughts and share, you know,
what they got out of books and what they're excited about reading.
So finding ways to do that with our users who are already really engaged,
I think would be really exciting, but yeah, I don't,
I don't know that we have any specific plans right now. Right now,
we're just trying to get things up and running in the
UK for Christmas. So we can hopefully help, you know,
help a lot of bookstores over there with the holidays coming up. So,
yeah, we've got a lot on our plate right now, but it's definitely something we're
talking about internally.
Yeah, I was going to say, migrating to Kubernetes,
that sounds like enough work in itself.
Yeah, that's surprisingly been the easy part.
But yeah, it is definitely a lot of challenges. So we've got a good team and I'm excited. I don't have any doubt that we will be able to continue to
build a great product and help bookstores, but the bigger this gets and the more people that buy books, the bigger the throughput, well, the less room there is to get it wrong,
right? So we've got to really, really get it right going forward. But yeah, I'm excited.
It's been, it's been truly enjoyable to hear about the, I mean, sort of, in many ways,
chaos behind the simplicity of, you know, of selling books online.
So thank you for the time.
As always, you approach things in such a thoughtful way.
And I think our listeners will really love
just hearing of all the ways
that you're trying to solve those problems.
So thanks for taking the time to join us on the show today.
Yeah, absolutely.
Thanks to both of y'all.
I really appreciate it.
Oh, that was great.
I think the conversation with Mason was more interesting
than I would ever have expected, to be honest, Eric.
And one of the most amazing outcomes is how important boring technology
can be at the end when you are dealing with business problems at scale
like the one of building an e-commerce site to sell books.
What do you think about this?
Yeah, I think that, you know, I've known Mason for a long time.
And I think one of the qualities that I appreciate most about him really came through in our discussion.
And that is that he is very thoughtful about the
way to solve problems and deliberately choosing you know what he described as a boring stack
in order to provide a consistent stable experience is to me just a really mature decision
and I think the other thing that just blew me away was,
you know, I've purchased books from Bookshop, and it's such a seamless experience. And I just had
no idea how much they're dealing with on the back end to actually make it seamless. So it was very
eye-opening for me and just appreciate all
of the thought that's going into it. So yeah, I would love to, we should absolutely catch
up with Mason after the Kubernetes migration, just to hear how that goes.
Absolutely. I think you put it very well. I mean, what Mason was calling a boring stack, I think at the end we should call the mature stack.
And when you are dealing with important and difficult business problems, that's, I think,
also how engineers should approach these problems. I think the approach that Mason has
is, as you said, a very mature approach to dealing with hard business-related problems and solving them using the
developed technology that we have today.
And yeah, I'm really looking forward to chatting with him again and learning more about his
experience with Kubernetes and also their experience with expanding the company to more
countries and see what kind of complexity this is going to introduce to both the technology
and the business.
Thank you everyone for being with us today
and sharing the stories of Mason and Bookshop.
Talk to you next time.