Software at Scale - Software at Scale 46 - Authorization with Or Weis
Episode Date: May 10, 2022
Or Weis is the CEO and founder of Permit.io, a Permission as a Service platform. Previously, he founded Rookout, a cloud-debugging tool.
Many of us have struggled (or are struggling) with permission management in the various applications we’ve built. The complexity of these systems always tends to increase through business requirements - for example, some content should only be accessed by paid users or users in a certain geography. Certain architectures like filesystems have hierarchical permissions that require efficient evaluation, and there’s technical complexity that’s often unique to the specific application.
We talk about all the complexity around permission management, and techniques to solve it in this episode. We also explore how Permit tries to solve this as a product and abstract this problem out for everyone.
Highlights
[0:00] - Why work on access control?
[02:00] - Sources of complexity in permission management
[08:00] - Which cloud system manages permissions well?
[11:00] - Product-izing a solution to this problem
[17:00] - What kind of companies approach you for solutions to this problem?
[22:00] - Why are there research papers written about permission management?
[38:00] - Permission management across the technology stack (inter-service communication)
[42:00] - What are you excited about building next?
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
Transcript
Welcome to Software at Scale, a podcast where we discuss the technical stories behind large software applications.
I'm your host, Utsav Shah, and thank you for listening.
Hey, welcome to another episode of the Software at Scale podcast.
Joining me today is Or Weis, the founder and CEO of Permit.io, which is a permissions as a service platform.
Thank you for joining me.
It's a pleasure to be here, Utsav. I'm really excited for our conversation.
It's pretty early in the morning there, right, in Israel.
Let me start with just asking about your background.
My background starts in the intelligence corps in the IDF.
I had a long career in the IDF, and then as a VP of R&D, and I worked in several startups,
and I founded another company before this one, another DevTools company called Rookout.
And throughout our careers, both myself and my co-founder, we've built access control
for the products that we've been building probably thousands of times.
But the most annoying part is that it was more than once per product. So for example, in my previous company, Rookout,
I ended up rebuilding access control five times for a product that wasn't even three years old.
It literally drove me insane. At each point, I thought, okay, I've built this, it's perfect,
I'm done. And every time it surprised me anew with more challenges coming either from the customers,
from security, compliance, from the infrastructure, or also from weird angles.
So for example, we were working with Cisco as a biz dev partner.
They were selling Rookout directly to market.
And at some point, they came in and said, we want our own back office,
we want to manage users on our own, we want to assign permissions on our own, we want our
salespeople to be able to work with this. And I looked at what we'd built and said,
there's no freaking way that I can make this solution support two back offices. I have to,
once again, throw it out the window and start from scratch. And I just thought, this is so silly. I don't want to do this. I want to focus on actually
building my product. And I remembered feeling that sensation, that mindset across my career.
And I just thought there must be a better way. And that's what brought me to create a
permission service so developers can focus on building
their products and not rebuilding this over and over.
Maybe can you walk us through where does the complexity in permission management come in?
As I think about this as a layperson who hasn't thought too much, you have different user
types, maybe they have different attributes or different roles.
When does this get complicated?
So I think actually the question that you're posing
is the crux of the problem. At each point that we're building it, it's hard to see what we'll
actually need down the road. I myself have also fallen for the same fallacy.
At each point that I was building this, I thought, oh, I have the entire picture. I know what I need.
I'm going to build this and I'm done.
But things are constantly changing.
And so if we zoom out and look at broader strokes, we can see that almost every company,
or every product, starts with having admin and not admin.
And then you move to admin, not admin and super admin.
Then you move to access control lists.
So the people on that list can do A,
people on that list can do B.
Then usually as you start working with customers
and you need to have more structure in your permissions,
you move to role-based access control.
Then it also often comes with compliance
because compliance like SOC 2
specifically talks about these kinds of controls.
And then you realize, oh, actually, roles are not enough because I need more granular things. Like
I need roles plus ownership. It's not enough that an editor can edit files. They need to be able to
edit only their own files. Or you need other attributes. I want to enable this only if a
customer is paying or only if they're in a specific geolocation.
So that's RBAC plus ownership plus some attributes.
And as you start to add more attributes, you start to slide toward attribute-based access
control.
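To make the progression concrete, here is a minimal Python sketch of "RBAC plus ownership plus some attributes". It is an illustration only, not something shown in the episode: the role names, the `User` and `Document` fields, and the `is_allowed` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical role-to-action mapping; names are illustrative only.
ROLE_PERMISSIONS = {
    "admin": {"read", "edit", "delete"},
    "editor": {"read", "edit"},
    "viewer": {"read"},
}


@dataclass
class User:
    id: str
    role: str
    is_paying: bool = False
    country: str = ""


@dataclass
class Document:
    id: str
    owner_id: str
    premium: bool = False
    allowed_countries: frozenset = frozenset()  # empty means no geo restriction


def is_allowed(user: User, action: str, doc: Document) -> bool:
    # 1. RBAC: the role must grant the action at all.
    if action not in ROLE_PERMISSIONS.get(user.role, set()):
        return False
    # 2. Ownership: editors may only edit their own documents.
    if user.role == "editor" and action == "edit" and doc.owner_id != user.id:
        return False
    # 3. Attributes: premium content requires a paying user...
    if doc.premium and not user.is_paying:
        return False
    # ...and geo-restricted content requires an allowed country.
    if doc.allowed_countries and user.country not in doc.allowed_countries:
        return False
    return True


alice = User(id="alice", role="editor", is_paying=True, country="IL")
own_doc = Document(id="d1", owner_id="alice")
other_doc = Document(id="d2", owner_id="bob")
print(is_allowed(alice, "edit", own_doc))    # True
print(is_allowed(alice, "edit", other_doc))  # False
```

Each new requirement adds another branch to the same function, which is one way to see why hand-rolled checks like this tend to get rewritten as the model grows.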
So everything that is an arbitrary element or property of either the identity, or the
resources in the application itself, or the actions you perform
on those resources is generally referred to as an attribute. And as you start to add those,
you either find yourself at attribute-based access control or policy-based access control.
And as those gain complexity, you might try to simplify it with relationship-based access control, which can
also work with graph-based access control. And nowadays, most of these would translate
into some kind of policy as code. The challenge is not necessarily understanding each of those
models. By the way, for each of those models that I described, we can probably have a discussion
for five hours straight on the different structures
and layouts and objects that you can have for that. And also how you create DB schemas for it.
And how you create an update mechanism for it and how you create an edit mechanism for it and a
versioning mechanism for it and an auditing mechanism for it. And for each of them, it will
be slightly different, but also none of them is the correct answer. Each application is a snowflake. Each application is unique, otherwise it wouldn't need to exist,
because there would be another application like it. So you need to be able to adapt these mindsets
or concepts to the concrete requirements of your product. And the most challenging part
there is that your product is evolving, just like your company is
evolving and you as a development team are moving forward and gaining more features, more capabilities,
more infrastructure. And with that evolution, your permission model will change. It will also,
as we said before, be affected by what the customers want and what the product managers want and what
security, compliance, infrastructure, all of that will change your product constantly. On average,
every company refactors or rebuilds their permissioning system every three to five. And
the change, depending on how they're shifting or what they're shifting to, can take between one and eight months
of intensive labor from a team of, on average, three people. So the cost is also very high every time you
readdress this. And the organizational friction is also very high because these requirements often
float in. So you'd probably see something like a product manager talking to a customer and the customer
saying, oh, we need another role because we have this guy who's working on that department
and we need them to have a slightly different set of permissions.
And the product manager would go, oh, yeah, sure.
Let's just go back and open up a ticket in Jira.
And some poor schlep of a developer receives that
ticket. And the people that opened that ticket don't actually realize that there's a world of
pain behind that simple requirement of adding another role or making roles dynamic, or making
roles more auditable or whatever it is. So it ends up just out of the blue becoming a huge project. And as it gets delayed and has more friction, more people start to clamor around it.
And that puts more pressure on R&D.
And then it asks.
And there's this gap between the requirements and the people floating them in (product, security,
compliance) and the understanding that's actually needed to build this, which only resides with developers.
So a lot of the tension is there.
So it's both about the organization understanding and the developers understanding where this
is going and also aligning the organization around the journey because otherwise people
are constantly being surprised about simple things that are actually interesting.
And you mentioned so many different kinds of
access control: role-based, attribute-based, graph-based. I've certainly dealt with
policy-based when it comes to cloud systems like AWS. So I'm just curious, off the top
of your head, are there any particular cloud systems that you think do permissions well? I ask
because permissioning in IAM on AWS has always been the biggest source of
confusion for me. What's your opinion as someone who thinks about permissions day in and day
out?
Yeah, so I think, for example, AWS IAM is an amazing system. It's super powerful,
but I see a lot of developers that are impressed by the power of that solution and consider that
breadth of scope ideal. I think that's actually not the right way to think about it,
because what's right for a big cloud infrastructure solution like AWS is not the same as what's right for a SaaS
application or even a PaaS application. So there are definitely
concepts that you can take from it. But you probably shouldn't take it one-to-one, and
definitely shouldn't fall in love with it (sometimes I actually see that too) and expect it to work
perfectly for your solution. At the end of the day, all of these models, RBAC, ABAC, the IAM
system, policy-based systems, are concepts and tools. They're not a final
product for a specific application. So you should look at them more as options and suggestions
and pick and choose what's right for you throughout a journey where you'll constantly
be updating this. The reason that IAM is so powerful is that it needs to be. It's catering to a highly technical audience
that requires super flexibility. It can allow itself to be less interfaceable or addressable
by the average Joe, and it needs to cover a lot of different services. That's not your run-of-the-mill
application. So definitely it's a very good example
of a well-designed solution,
but it's not the perfect example for every application.
Yeah, no, I think it's definitely really flexible,
but I do think the complexity is just inscrutable at times,
like service accounts, linked roles and stuff.
The concepts just keep piling on,
but you definitely see how it allows you
to do a lot of interesting things. There are
solutions you can build on top of that, like access manager, getting all those access logs, ensuring
you have auditability, then combining roles and stuff together. There's a security audit role
that combines a bunch of policies, so that's interesting. My question for you then is,
how do you productize this problem, right?
Like you, as you mentioned,
there's a lot of complexity here
and every solution,
every company probably needs
a different set of answers to this problem
based on how technical their audience is,
what kind of flexibility they need.
How do you build a product
that encapsulates all of that?
Great question.
I'll start by saying that with the IAM and with AWS, there's a choice there to keep it
more complex and not addressable for the common Joe.
And that's on purpose.
That's not really what's relevant for most applications.
But you did
mention some other things there, like the auditing and logging of it and connecting this to higher
level concepts like roles. That's something that you'd probably find in almost every application.
So that's something that we can definitely look at in a positive light there. So every application
should probably have audit logs at some point, and should have versioning on its policies, and should have the ability to combine roles and to combine attributes.
With those, it's a matter of time until they chime into the conversation.
And the way to think about it, I think, as in general with software, is to think about it as a kind of modular stack. So you don't have to have everything at day one, but you want to
have the right stack and components built in so you can add more capabilities as you go. So you
want to start simple and you want to start with something that answers your needs now, but can
grow and can add more interfaces to the other people, the other stakeholders that are involved.
The way I like to think about it is in kind of three ways. And those are also the things that
we offer people when they work with us. I like to think about it in best practices,
infrastructure, which can ideally be open source, and then experiences and interfaces on top.
So with best practices, you have things like decoupling the policy and code.
Once you understand that things are going to change, that both your application
and your authorization layer and policy are going to change,
you understand that if you couple them together, every time you want to change one of them, you'll have to change everything. And that's going to be very painful
and add a lot of friction, or basically slow you down and often force you to redo everything for
every little thing. So by decoupling policy and code, which essentially in a modern application
means creating a separate microservice for authorization,
you can keep your application simpler and your authorization simpler, and both
can evolve separately, but side by side.
So that's one of the key best practices.
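A rough Python sketch of that decoupling is below. It is an assumption-laden illustration, not Permit's implementation: the `Authorizer` interface, the `AllowAll` and `RoleBased` classes, and `delete_report` are hypothetical names, and an in-process object stands in for what would be a separate authorization service. The point is only that application code asks a question and holds no policy logic, so the policy behind the interface can be replaced without touching the application.

```python
from typing import Protocol


class Authorizer(Protocol):
    def check(self, user: str, action: str, resource: str) -> bool: ...


class AllowAll:
    """Simplest possible stand-in: answers True to every query."""

    def check(self, user: str, action: str, resource: str) -> bool:
        return True


class RoleBased:
    """A later policy, swapped in without touching application code."""

    def __init__(self, roles: dict[str, str], grants: dict[str, set[str]]):
        self._roles = roles    # user -> role
        self._grants = grants  # role -> allowed actions

    def check(self, user: str, action: str, resource: str) -> bool:
        role = self._roles.get(user)
        return role is not None and action in self._grants.get(role, set())


def delete_report(authz: Authorizer, user: str, report_id: str) -> str:
    # Application code only asks a question; it contains no policy rules.
    if not authz.check(user, "delete", f"report:{report_id}"):
        return "403 Forbidden"
    return f"deleted {report_id}"


print(delete_report(AllowAll(), "bob", "42"))  # deleted 42

rbac = RoleBased(
    roles={"alice": "admin", "bob": "viewer"},
    grants={"admin": {"read", "delete"}, "viewer": {"read"}},
)
print(delete_report(rbac, "bob", "42"))    # 403 Forbidden
print(delete_report(rbac, "alice", "42"))  # deleted 42
```

Note that `delete_report` is identical in both cases; only the object passed in changes.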
Another one would be keeping things event-driven.
Permissions and access control is a critical experience.
You want it to be quick, you want it to be performant, you want it to be consistent.
So if you have something that updates with delays, you're going to have a bad time. For example,
say you want a policy where only users that have paid for a feature can use it. The information
on who paid doesn't exist in your database today. It lives in a third-party service like Stripe or
Chargebee or PayPal. So you need a way to synchronize with
those services as they change. And the best way to do that is to listen to events. So you have events
propagating in from different services, and you allow your authorization layer to be updated by
those events in a real-time manner. There are more best practices, but we can circle back to them in a minute.
The second part is building the right infrastructure.
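The event-driven update described for the paid-feature policy can be sketched roughly as follows. This is a hypothetical Python illustration: the event types and field names are invented for the example and do not match any real billing provider's webhook format.

```python
from dataclasses import dataclass, field


@dataclass
class AuthzData:
    """Local, authorization-specific view of facts owned by other systems."""

    paid_users: set = field(default_factory=set)

    def handle_event(self, event: dict) -> None:
        # Hypothetical event shapes; real billing webhooks look different.
        if event["type"] == "subscription.activated":
            self.paid_users.add(event["user_id"])
        elif event["type"] == "subscription.cancelled":
            self.paid_users.discard(event["user_id"])

    def can_use_paid_feature(self, user_id: str) -> bool:
        # Answered from local state: no call to the billing provider per request.
        return user_id in self.paid_users


authz_data = AuthzData()
print(authz_data.can_use_paid_feature("u1"))  # False

authz_data.handle_event({"type": "subscription.activated", "user_id": "u1"})
print(authz_data.can_use_paid_feature("u1"))  # True

authz_data.handle_event({"type": "subscription.cancelled", "user_id": "u1"})
print(authz_data.can_use_paid_feature("u1"))  # False
```

The design choice being illustrated is that the permission check reads only local state, and that state changes when an event arrives rather than when a request happens to trigger a refresh.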
You want a pluggable infrastructure that is extensible, so that you can add more
interfaces on top.
So everyone starts, as we said, with just having basic permissions, basic enforcement
and a really simple model.
But on top of the model changing, you want more capabilities on top.
So you probably want to add user management with the ability to assign roles.
And you want to add API key management because you also provide some automation.
You want secrets management, you want audit logs, you want to be able to see who did what
in your system.
You want multi-tenancy. You'd want impersonation, for the ability to see who did what within
the system by logging in as that
user. You'd want approval flows, asking permissions from another user. And this list is, first of all,
things you've seen a billion times, and also never ending. There's always another item to add to that
list. So if we design the authorization layer so that those kinds of interfaces and experiences can
plug in on top, we can grow gradually with the evolution of the application.
So we don't need to have impersonation, for example, at day one.
But we want to be able to easily add it
without refactoring everything when we get to that point.
And if we use the right best practices
and the right infrastructure, we'll be able to.
And it's just a matter of either adopting the right tools or learning yourself how
to work with those tools and best practices. And lastly, there are the experiences themselves. I
think it's the recognition that you're not just delivering a feature here, you're delivering
an organizational pattern. So it's not just the developers being involved with this. It's
all the other stakeholders: product managers, security, compliance. They'll need a modicum of
self-determination, an ability to manage this on their own and at least chime in on the conversation.
So we want to be able to provide them with interfaces early on, not necessarily at day one,
but we want to plug in those interfaces.
Once we recognize that,
we are ready for most patterns of that evolution.
And once we have interfaces for ourselves,
we can also offer interfaces for our customers,
which is also something that arrives pretty early.
The customers themselves want this democratized.
They want to be able to control
who they're adding to their organization within your application,
have a subset of permissions that they can mutate on their own, maybe create a few roles on
their own or attributes on their own, et cetera, et cetera.
So generally, at what stage of a company does someone approach you?
Have they generally built an auth system or two and realized
they should be outsourcing this? Is it different for B2B companies versus B2C
companies? Because I can totally understand that kind of complexity: oh, if you want to do
geolocation-based permission checks, I don't want to build that on my own.
Yeah, so we're seeing
companies of all sizes. All of them come
to us at the point where they are actively working on this, they are actively thinking about
this, because some requirement has come in and changed the way that they need to
build this. We are seeing companies starting at square one just saying, I have so much else to
build, I don't want to deal with this at all. I think in
general, that's the common thread. Developers often don't care about this. They want this to
work well, but it's not a unique part of their product. Just like they don't want to build billing
or authentication, no one really wants to build this, and definitely not build this and make errors
while building it. The other two types are either companies that have already built something in place
and realized that they need to change it because of those incoming requirements, or companies even
going through a more significant change. So we see big companies, for example, as they're going
through an IPO process or an M&A process, there are a lot of demands coming in, pushing also
a critical timeline on the changes that they need to apply, or when they're
doing a significant infrastructure change. So for example, we had several companies moving from
monoliths to microservices. When you're working with a monolith, you can often rely on the built-in
access control mechanism. So in Django, in Python, or in the Spring Framework, there are some
basic RBAC admin panels baked in. The moment you move to
microservices, that just stops working at all, especially if you're polyglot, if you have multiple
languages. So that often brings players to the table. And the painful part is, if you arrive at
this later rather than earlier, the amount of refactoring you have to do is where most of the pain is.
And I think the most painful cases are people that have already learned,
they've glimpsed that there's a different way to go about this,
and they've decided, we don't want to put in the effort of changing this,
we'll just tweak what we have.
And then they come back a year later and say,
okay, we realized that didn't solve it.
And now we have to completely revamp it. And we actually added more friction on the way. So bottom line, we're seeing companies
of all sizes, but they come in with different requirements and different needs. And the idea,
like I said before, is to enable them to find a quick solution for what they need now,
and gradually evolve with it.
Okay, that's interesting to know and doesn't match my intuition. I
guess I assumed that as companies get bigger they would run into this, but I guess it makes
sense that sometimes people just know that this is going to be a problem from their
previous job and they're like, I'm just going to outsource this from day one so I just don't have
to think about this at all.
I think the difference there is that people are learning
that this is an option,
just like with authentication. If you go five, seven years back, most companies would say,
why do I need to use an authentication vendor? I can just store passwords. What's the big deal?
And now most, I think most developers would react to that and say, okay, that's insane.
Storing passwords is really hard. It's the security and cryptographic aspects of it,
like hashing and salting and just tracking everything and doing SSO around that. That's
a huge pain point. And there's no unique value in implementing this again. And as people learn that
authentication solutions are an option, and that they are readily available, the mindset shifted.
I think the same thing is happening
now with authorization, a lot of developers are learning that they don't have to build this.
And most of them don't want to build this anyway. So if there's an alternative, they'll often
stick to it. Some people are still struggling with it, saying, and it's
actually with the bigger companies: we've built this huge complex thing that we're really proud of. So what if it doesn't meet our
requirements anymore? So what if it doesn't meet the modern standards anymore? I think I can make
this work. And they're right. But every time they make that statement, they're just postponing
another point where they'll have to reconsider and actually adopt
the modern patterns. Because again, it's not about having the right solution now,
it's about having something that can evolve quickly.
Okay, then one question that I have
for you is, you mentioned the Google Zanzibar paper in one of your documents.
Maybe you can walk us through how, even behind the scenes,
permission management is not easy to run
in a nice and fast and consistent and scalable way.
Why have people written research papers about this?
Isn't there just an access control list
and you need to check whether a person's in the list
or not in the list?
Where does the performance challenge come in?
First of all, you need to realize that the average microservice sends three authorization
queries for every request it gets.
So if your authorization layer is inefficient, you're going to have a bad time, because
if it adds, let's say, 50 milliseconds, you're quickly getting to several hundreds of milliseconds before your application has done anything.
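As a back-of-the-envelope illustration of how those numbers compound, here is a small Python calculation. It assumes the 50 milliseconds is paid per authorization query, that queries run one after another, and that a request passes through a sequential chain of services; none of those assumptions is stated in the conversation.

```python
# Illustrative numbers only: the per-query overhead and call depth are assumptions.
QUERIES_PER_REQUEST = 3       # figure quoted in the conversation
OVERHEAD_MS_PER_QUERY = 50    # the "let's say 50 milliseconds" example


def added_latency_ms(services_in_chain: int) -> int:
    """Authorization overhead when each service in a sequential call chain
    runs its authorization queries one after another."""
    return services_in_chain * QUERIES_PER_REQUEST * OVERHEAD_MS_PER_QUERY


for depth in (1, 2, 3):
    print(f"{depth} service(s): {added_latency_ms(depth)} ms of authorization overhead")
```

Under these assumptions a single service already adds 150 ms, and a chain of three services adds 450 ms before any application work is done.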
On top of that, there are other hidden complexities in how you store the data that you need for authorization and how you fetch it.
The data that you have for the application, first of all, is not all the data that you need for authorization.
We already covered the third-party services and distributed data plane and data sinks
you're working with.
But even just the data for the application itself, the way you structure the schema of
your database for the application is not the ideal way to structure it for the authorization
layer because they're actually querying different things and they need to do different joins and different aggregates. And you see that often that pain point starts
when people are moving from RBAC to attribute based. So they're piling in attributes, just
adding more queries to the database, essentially. And initially, it's fine. But then at some point,
the database chokes, because there are too many queries, they're too slow,
and while the authorization layer might be still quick, the underlying data layer can't
really support it, and everything screeches to a halt.
And so there are complexities in how you store your data, how you propagate it, and how you
manage its schemas.
And lastly, and that's something that is actually unique
specifically to Zanzibar, is how you apply consistency. So one of the key challenges
when you have a large complex system is that things can change while the system is handling a request. For example,
you're sending a request to the service, and it starts at microservice one. And as that microservice is querying another microservice, during that transaction, the
world picture, the data for authorization, has changed.
That's often referred to as the new enemy problem, or a subset of the new enemy problem.
So now, as you're running queries for your systems and handling requests,
they're inconsistent.
So now you can have a case where at one moment you're giving someone permissions and at the next they
don't have them, or they have a different set of permissions. And you end up either failing the
request or providing the wrong result, or worse, leaking data or access that you weren't supposed
to. And that's something that's really hard to track, especially if you have a high-scale system.
So in general, taking a step back,
there are two camps today.
What's interesting about the authorization landscape
is it's still nascent.
It's still evolving.
As, I don't know, humanity, society,
I don't know what you want to call it,
we haven't decided on what are the best practices
and standards. We have
some of them, but it's not finalized yet. We're still writing that book. So unlike with authentication
and with JSON Web Tokens and with SAML and OpenID Connect on the IAM side, things are still evolving
in the authorization space. And currently there are two camps. There's the code-based camp and
the graph-based camp for implementing access
control. In the code-based camp, you'd find things like Open Policy Agent, which essentially says,
you should write policies, load them into an engine, and load data in the form of JSON documents into that
engine. You can have that engine run as a sidecar or as a cluster next to your services, and then
they can query it. It's really the equivalent of
the policy decision point in the XACML methodology, for those who are familiar with it.
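The shape of that code-based approach (policies loaded into an engine, data loaded as JSON documents, services sending queries) can be sketched with a toy decision point in Python. This is a stand-in for illustration only: Open Policy Agent policies are written in its own policy language, Rego, and nothing below is OPA's API. `DecisionPoint`, `allow_by_role`, and the JSON layout are hypothetical.

```python
import json
from typing import Callable

Policy = Callable[[dict, dict], bool]  # (query input, loaded data) -> decision


class DecisionPoint:
    """Toy policy decision point: holds policies and data, answers queries."""

    def __init__(self) -> None:
        self._policies: dict[str, Policy] = {}
        self._data: dict = {}

    def load_policy(self, name: str, policy: Policy) -> None:
        self._policies[name] = policy

    def load_data(self, json_document: str) -> None:
        self._data.update(json.loads(json_document))

    def query(self, name: str, input_doc: dict) -> bool:
        policy = self._policies.get(name)
        # Default deny when no such policy is loaded.
        return policy(input_doc, self._data) if policy else False


def allow_by_role(input_doc: dict, data: dict) -> bool:
    role = data.get("user_roles", {}).get(input_doc["user"])
    return input_doc["action"] in data.get("role_grants", {}).get(role, [])


pdp = DecisionPoint()
pdp.load_policy("allow", allow_by_role)
pdp.load_data(
    '{"user_roles": {"alice": "editor"},'
    ' "role_grants": {"editor": ["read", "edit"]}}'
)
print(pdp.query("allow", {"user": "alice", "action": "edit"}))    # True
print(pdp.query("allow", {"user": "alice", "action": "delete"}))  # False
```

Because the policy is ordinary code, it can express any rule, which is the flexibility attributed to this camp; the sketch also shows that it only answers one question at a time for a given input.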
And the graph-based camp says something different. There's a lot of data here, a lot of complexity,
a lot of users. We need to manage it in a consistent picture, a consistent graph,
and be able to query it all the time in an efficient manner. And these camps have pros and
cons, and I'll try to run through some of them quickly. So with code, first of all, code is
Turing complete, so you can describe any policy that you want. With a graph, you can't really have
Turing completeness, because then navigation on the graph won't be efficient. The moment you
make it cyclical, the more it's
not a DAG, not a directed acyclic graph, especially if the graph is large, you're
going to have a really bad time navigating through it. And it will most likely fail. So
with Zanzibar and most graph-based solutions, you can only have simpler policies,
mostly around relationship-based access control. It's really
great for describing hierarchies like nested files or folders or organizational structures, but it fails
when you start to do multiple attributes, for example, when you try to do more ABAC. Another
thing is the ability to do reverse indices. So you often ask the question in authorization, can Utsav access this thing? But a lot
of times you want the reverse of that. You want to ask, who can access that thing? So
with code, answering the question "can X do Y" works, but it's basically impossible
to get the reverse. Code only runs one way. You can try and maybe brute force it and enumerate all the options, but that's
really a bad way to do that. With a graph, you have the advantage of navigating the other way
around. So basically you get reverse indices out of the box. That's what some people
call the spice of Google Zanzibar. With the graph, because you're managing a big graph in the cloud,
you get consistency. You control the whole picture, so you can make sure that picture is consistent.
When you work with a distributed layout, that's harder to do. But if you work with a graph,
and it's a big graph that is remote from the services themselves, you're paying for latency
when you're querying it, as opposed to a small, efficient agent at the edge that
you can query.
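The graph side can be illustrated with a small in-memory Python sketch: a folder hierarchy with direct grants, a forward check ("can this user view this resource?") and the reverse question ("who can view this resource?"). This is a simplified illustration under assumed names (`RelationGraph`, `add_child`, `grant`); it is not Zanzibar's data model or API and ignores scale and consistency entirely.

```python
from collections import defaultdict


class RelationGraph:
    """Toy relationship graph: resources nested in folders, direct viewer grants."""

    def __init__(self) -> None:
        self._parents = defaultdict(set)  # resource -> containing folders
        self._viewers = defaultdict(set)  # resource -> users granted directly

    def add_child(self, folder: str, child: str) -> None:
        self._parents[child].add(folder)

    def grant(self, user: str, resource: str) -> None:
        self._viewers[resource].add(user)

    def _ancestors_and_self(self, resource: str):
        # Walk up the hierarchy; the `seen` set guards against cycles.
        stack, seen = [resource], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            yield node
            stack.extend(self._parents[node])

    def can_view(self, user: str, resource: str) -> bool:
        """Forward check: is there a grant on the resource or any ancestor?"""
        return any(user in self._viewers[n] for n in self._ancestors_and_self(resource))

    def who_can_view(self, resource: str) -> set:
        """Reverse question: everyone granted on the resource or any ancestor."""
        users = set()
        for node in self._ancestors_and_self(resource):
            users |= self._viewers[node]
        return users


graph = RelationGraph()
graph.add_child("folder:root", "folder:reports")
graph.add_child("folder:reports", "file:q1.pdf")
graph.grant("utsav", "folder:root")
graph.grant("or", "file:q1.pdf")

print(graph.can_view("utsav", "file:q1.pdf"))     # True
print(graph.can_view("or", "folder:reports"))     # False
print(sorted(graph.who_can_view("file:q1.pdf")))  # ['or', 'utsav']
```

Both questions are answered by the same traversal over stored relationships, which is why the reverse lookup comes almost for free here, whereas a policy written as arbitrary code would have to be evaluated for every candidate user.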
So you can see that there are more pros and cons, but there are a lot of them that we've
already touched on.
And another thing that I think is interesting to see is that they're complementary.
So what code is good for is the complementary or opposite image, the mirror
image, of what the graph is good for. So what I'm
actually advocating for is using both: using the graph-based solution to manage the bigger
picture in the cloud, and using the code-based solution to have efficient answers at the edge. And if you
have a component in between that syncs the two, you can actually enjoy both options. And I think that's
probably the ideal way to think about it. But it's still evolving. We'll still have to see
where things go.
So the ideal use case for a graph-based solution would be something like Google Drive,
where you might mark that this person has access to this folder, therefore they have access to every file and
recursive subdirectory. And that gets complicated really quickly because you could have tons of
subdirectories and they all need to inherit that access, so you need to traverse, and a code-based
solution there is tricky. And you're advocating for keeping both of them because they have
these different use cases, and then you have to figure
out how to keep them consistent, which is tricky. Yeah, and the more I think about it,
it's not just a Google Drive that needs it. It's anybody who maintains things like, here's
a collection of documents that maybe don't have a lot of subdirectories, but you can add permissions
to the collection, and you can add permissions to the document itself.
So a lot of people building something with a use case like Figma, or even the company
that I work at, might have to think about this kind of stuff.
And yeah, I guess I just didn't appreciate how complicated all of this could be.
And to be sure, just to clarify, we're just
scratching the surface here. Just on Google Zanzibar, we can talk easily for 10 hours and
not get to all the concepts there. We didn't even touch on the main reason that Google Zanzibar was
created, which is great scale. So if you just have a few users and a few objects that you're
interacting with, it doesn't really matter how you manage
this.
You can just shove it into a database, keep most of the data available in cache, and it
would just work.
But as you start to move from hundreds of thousands to millions and above that, both
managing all of that data and the continuous scaling up of that data, that's what's going
to get you.
And so Google Zanzibar was built for those scales.
It was built to maintain that constant huge picture for things like Google Drive and YouTube,
which are running within Google and Google Zanzibar.
I should probably mention also that there are open source implementations of Google
Zanzibar. So Google hasn't released Zanzibar as open source.
They just threw a white paper at us. But some cool folks at companies like Authzed and Auth0 have taken up the mantle of implementing
it.
They actually haven't implemented it fully, but it's getting there.
But I think it's important to understand that for most companies, at least at the beginning, you don't need Zanzibar, you're not going to run things at Google
scale, you might need to be able to grow into that scale down the road. And that's an important
difference. So you want to create a modular solution with the interfaces that will later
on enable you to change your data layer into something like Zanzibar, for example, you can
definitely start with Zanzibar if you want, but you need to understand that
there are trade-offs.
So, for example, you'll have more latency and general performance
degradation, but you'll get a better, more consistent picture, and you'll have an easier
time scaling.
But I think if anyone takes anything out of this, it's that you should stick to the best practices.
Decouple your policy
and code, create a separate authorization layer, update it in an event-driven fashion, and
make it modular enough so you can layer interfaces on top. And then it doesn't matter. You can start
with the stupidest thing. You can have a microservice that always returns true for any
authorization query. That would be a good place to start because you can build on top of that,
as opposed to having something baked into some if statement in your code, so that later on, if you want to
refactor, you have to do a full code review and change everything in the application itself.
So start simple, start modular, grow gradually. You don't have to cover all of this on day one.
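The "start with something that always returns true" advice can be sketched as follows. This is a minimal illustration under assumptions: the `Authorizer` interface, the `AllowAll` and `RoleBased` classes, and `delete_document` are hypothetical names, not part of any product mentioned in the episode. The point is that the application only depends on the interface, so the implementation behind it can grow later.

```python
from typing import Protocol


class Authorizer(Protocol):
    """The only thing application code is allowed to depend on."""
    def is_allowed(self, user: str, action: str, resource: str) -> bool: ...


class AllowAll:
    """Day-one implementation: every query is answered with True."""
    def is_allowed(self, user: str, action: str, resource: str) -> bool:
        return True


class RoleBased:
    """A later implementation, swapped in without touching callers."""
    def __init__(self, user_roles: dict, role_grants: dict):
        self.user_roles = user_roles      # user -> list of roles
        self.role_grants = role_grants    # role -> set of (action, resource)

    def is_allowed(self, user: str, action: str, resource: str) -> bool:
        return any(
            (action, resource) in self.role_grants.get(role, set())
            for role in self.user_roles.get(user, ())
        )


def delete_document(authz: Authorizer, user: str, doc_id: str) -> str:
    """Application code asks the authorization layer instead of inlining an if."""
    if not authz.is_allowed(user, "delete", "document"):
        raise PermissionError(f"{user} may not delete documents")
    return f"deleted {doc_id}"
```

Swapping `AllowAll()` for `RoleBased(...)` changes the behavior of `delete_document` without editing it, which is the refactoring cost the conversation is trying to avoid.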
It's also so hard to code review or like check for correctness with authentication
checks or like authorization checks. Like very few people write sufficient integration tests when
they add things like permissions logic or like they evolve it from admin, non-admin to something
more involved. So refactoring that code is often like another whole project.
That's also why, in the systems themselves, in the way you manage the code, you rarely see in the modern
solutions just functional code. You don't see Python or Java as the recommended language to
write policies in, because it's hard to make sure that you cover all your bases when you're running.
If you have a rule but you don't invoke that rule, you're basically screwed. But, for
example, OPA and Oso are using logic programming languages. They're both derivatives
of Prolog. So Oso is a derivative of Prolog, and Rego, the language for OPA,
is a derivative of Datalog, which is a derivative of Prolog. And the idea there is that you have
a recursive engine that runs through all of the rules that are defined in a performant way. And
that way it ensures that you cover all your bases. Same thing is true of the graph, you have an
engine that does the graph navigation for you. So as long as you structured the graph correctly, it's going to do what you're
planning for. So it translates the problem from making sure that you cover all the bases within
the logical layer of the policy to structuring the policy correctly and auditing the policy itself,
takes it to another level higher and enables
you to focus on what you actually want as opposed to how it should work.
With Prolog, it really
takes me back to college, like thinking about dataflow languages. I haven't thought about that in a
while. But we've been talking about OPA, like Open Policy Agent, right? So there's two separate
permission conversations
that we're having.
One is for like the end user
when you want to build like a system
that lets a certain user access
a certain part of your application
or a certain document or whatever.
There's also the microservice,
can this service call this other service type of logic,
which I think OPA helps with
because like OPA,
you can put that into like your Kubernetes, you can put that in as like a sidecar, as you mentioned. But the more I think
about it, you're basically trying to solve the same problem within your product and as like an
infrastructure component. Like does that sound right to you? Like, what do you think?
Yeah, yeah. So both OPA and Oso are general purpose decision engines. You can use them to make whichever decisions are relevant to you. They're focused on policy, but they're general purpose decision engines. OPA got its real kick in infrastructure, and you need access control across the stack. You need physical access control.
You need like locks on doors.
And then you need network level access control,
like firewalls and zero trust networks.
Then you have infrastructure level access control
with admission control and service to service access control.
And then you have application level access control.
And then it evolves more and more in complexity
within the application layer into more logical.
And OPA really got its go in the infrastructure authorization layer.
And it's actually quite difficult on its own to take it to the application layer.
The big problem there is how do you keep it in sync with the changing application?
Like a new user has paid for the service.
How do I make sure that OPA knows about that user? Or we change the policy,
we added a new role and we did it from the UI. How do we make OPA know that there's a new role now?
And that's actually solved by another open source project. I'm actually wearing the t-shirt for it
now. So we created OPAL, Open Policy Administration Layer, that essentially takes that event-driven best practice and applies it to policy.
You are able to subscribe to topics for both policy and data.
And as events come in, they propagate into each of the instances at the edge, keeping them constantly up to date with both the policy and data that they
need, and only those that they need. And so you have a distributed administration layer for OPA,
and you can have your different third party services that are changing with your applications,
webhook and notify Opal on what has changed. And you can have your Git repository webhook on policy changes to Opal,
and it will pick those elements and trickle them down like rain to the various
OPA agents through what we call the Opal client. Opal does two things through that. One, it
solves that challenge of bringing OPA to the application layer. And two, it really helps you
tackle the inconsistency problem
because it really focuses on propagating events quickly.
So the agents at the edge, even if they don't have the data,
they know that they're missing data, that the picture has changed.
And you can already start seeing this working with something like Zanzibar.
So if you have a big graph in the cloud managing the bigger aspects, you can take subsets of it through Opal as the graph changes and
propagate them in real time into each of the edge nodes. So each edge node has what it needs
being supported by the bigger picture managed for everyone in the cloud. So that kind of also
touches on the hybrid solution that we're seeing here,
and also how we are literally moving towards the hybrid solution and implementing it.
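The event-driven idea described above, where edge agents subscribe to topics and receive only the updates they need, can be sketched in a few lines. This is an in-memory toy under assumptions: `Broker`, `EdgeAgent`, and the topic names are hypothetical and are not OPAL's actual API, which works over the network with its own server and client components.

```python
class EdgeAgent:
    """Stands in for a policy agent running next to a service."""
    def __init__(self, name: str):
        self.name = name
        self.store = {}  # local copy of the policy/data this agent needs

    def apply(self, topic: str, key: str, value) -> None:
        # Keep the local picture up to date as events arrive.
        self.store[(topic, key)] = value


class Broker:
    """Fans out each update to the agents subscribed to its topic."""
    def __init__(self):
        self._subscribers = {}  # topic -> list of agents

    def subscribe(self, topic: str, agent: EdgeAgent) -> None:
        self._subscribers.setdefault(topic, []).append(agent)

    def publish(self, topic: str, key: str, value) -> None:
        for agent in self._subscribers.get(topic, []):
            agent.apply(topic, key, value)


broker = Broker()
billing = EdgeAgent("billing")
docs = EdgeAgent("docs")
broker.subscribe("users", billing)
broker.subscribe("users", docs)
broker.subscribe("roles", docs)   # only docs cares about roles

broker.publish("users", "dana", {"plan": "paid"})   # e.g. a payment webhook
broker.publish("roles", "editor", ["read", "write"])  # e.g. a change from the UI
```

After these two events, both agents know that `dana` is a paid user, but only the `docs` agent holds the new `editor` role, matching the "only those that they need" point in the discussion.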
So your company is not just working on like end user like application security,
but it's also working on tooling for basically permissions across the stack.
Yeah, so we just try to solve this. So our notion is, developers don't want to build this.
It's really hard to build, there's a lot of complexities, and it's really hard to be aware of
all those complexities. We want to abstract those away. We want to always enable developers to have
access to the code, to manage this with GitOps, to manage this with infrastructure that they control.
But unless they want to do something, they shouldn't
be forced to. They should have the option, but not the responsibility all the time. I don't think
most people care about the difference between RBAC and ABAC. And I don't think they should.
I think a solution should abstract that and enable you to dive into that only when it's relevant.
You should be able to start simple,
build this, have it work and grow with you as you go. And the way to achieve this is by creating
standards. It's by creating solutions that are inherently built to address the problem and are
flexible enough to be extensible by the different snowflake solutions that need to use them. And that's really the mindset that we had with Opal.
And that's also why I think, though it's a really young project, only a year old,
it's seeing so much success.
It's already in use in production in companies like Tesla, Zapier, Accenture, and
dozens of others.
And it has a significant community in Slack of people asking questions on a daily basis.
I think we were able to do that because we built something that is both powerful enough and flexible enough for developers to adjust it for what they're building.
Yeah, I'm noticing this
consolidation across the industry around standards and going up the stack. It's very similar to what
AWS is doing, but more in like the open source way.
Like now you have like OpenTelemetry.
I was talking to the Lightstep people a few years ago
and it really seemed like it's matured.
And now I'm guessing there's like more and more standards
coming out on authorization,
like how you should be doing this.
People are converging on to OPA
and saying this is the way it should be done.
It's interesting to see.
Yeah, as the industry matures,
you think less about the infrastructure that's running your systems and more about your end
use cases. It's basically the story of humankind, right? At the beginning,
you'd just pick a stone and use that to hunt or to cut your meat or whatever. And then one day someone
came in and said, oh, you should take stone from that guy, he make good stone. And then everyone
said, you should take spear wood shaft from that guy, he makes good shaft. And then one
day someone came in and offered you a shaft with a stone already tied to it, and you say, oh, this is much better than getting it and assembling it on my own.
And we constantly spread out, create new solutions, then we consolidate and then we build more
layers on top.
And every time we add a layer on top, we have to specialize.
We have to create people that are or solutions that are specialized in building that.
So other people don't have to understand all of those complexities.
And the same thing is happening here.
The only difference is that
we don't have the right answer yet.
It's still evolving.
So what we're trying to do as a vendor
is to give you that promise of
no matter what spear or sling
will come into existence,
we'll wrap it for you
and make it available for you.
So you don't have to care about it.
As you go, you can focus on building your product.
And I also think it's our responsibility
to chime in on the conversation
and make sure that together
through the open source we're offering
and through integrations that are being built,
we create the right standards.
That's why we took this open source.
So we can have a public conversation
on how we can
all together build the correct thing for, again, us as society, humanity, whatever you want to.
So then let me wrap up with: what are you most excited about in what you're building? What's the
next big thing that you're excited about? What's like the next feature or like the next project?
That's a good question. I'd say I'm
most excited about the human interfaces, which is funny to say for a developer tools product. But I
think that's really key, because when we explored the space, we started with our own pain,
but we wanted to see how it looks across the space. So we looked into the bigger
organizations like Facebook and Google as a glimpse into the future. And what we realized
there is that they've invested a lot of time to build this. So for example, in Facebook,
they invested a team of 30 people for half a decade to just build the infrastructure components
for their X. And what they did is two things.
One, at some point, they had to move from just static rules, just policy you create,
to an intelligent component, to a machine learning component that can react to the gray points
between the policies.
And two, that AI ends up translating the interactions back into organizational behaviors
and flows. So for example, when an employee tries to access the Facebook database, or the metadata
database, I should say, and they're querying more data than they probably should, or they do on
average, the AI can detect that as an anomaly. But because they
want business to continue, they don't just shut it down. Because you have thousands of employees
doing thousands of things. If you just shut down everything that passes the anomaly, things will
just screech to a halt. So what they do instead is they translate that into human interactions.
So for example, they ask the team lead for that person,
is what they're doing okay? Is there an assignment around this? Should we throttle this? Should we limit this? Maybe you should talk to them. And by going back to conversations and having
the people align back with the machine, they're able to both keep it secure and keep it fast
enough for the business to run. And I think that's something
that's coming up for all of us. When we're building applications now,
it's mostly human users we're thinking about using our applications, but more and more, it's
applications on behalf of applications on behalf of applications on behalf of applications
using our application. It's like with algo trading: in the past, it was just like humans yelling at each other,
buy, sell, buy. Nowadays, it's all automated at a speed that humans can't really work with. So we
need a very quick layer that can react to those things, interpret them, and provide back interfaces
for us as humans to manage it and have it work the way
we want. So what we're building today, we already covered a significant part of the basic
infrastructure. And we're starting to look at the more automation around it. But mostly and more
importantly, building interfaces, low code interfaces, no code interfaces, human conversation
interfaces, so that all the stakeholders can come in and
build this together in a way that can move quickly.
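The "flag the anomaly but ask a human instead of blocking" flow described above might look roughly like this. It is a toy sketch under stated assumptions: `review_query`, the `Decision` type, and the simple "several times the historical average" threshold are all hypothetical, standing in for whatever machine learning component a large organization would actually use.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Decision:
    allowed: bool
    # Who should be asked "is what they're doing okay?", if anyone.
    escalate_to: Optional[str] = None


def review_query(rows_requested: int, history: list, team_lead: str,
                 factor: float = 5.0) -> Decision:
    """Let the query through, but escalate when it is far above the user's norm.

    `history` holds the row counts of the user's previous queries; `factor`
    is an assumed threshold, not a value taken from the episode.
    """
    if history and rows_requested > factor * mean(history):
        # Business continues; a human is looped in rather than a hard block.
        return Decision(allowed=True, escalate_to=team_lead)
    return Decision(allowed=True)


usual = [100, 120, 80]  # average of 100 rows per query
big = review_query(10_000, usual, team_lead="lee")
small = review_query(150, usual, team_lead="lee")
```

Here `big` is still allowed but carries an escalation to the team lead, while `small` passes silently. Throttling or limiting, as mentioned in the conversation, would be further outcomes the reviewer could choose.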
Yeah, like, it seems very similar to IAM right sizing, right?
Like this kind of stuff seems like super chaotic, but it makes sense that if you notice a certain
role is not using all the permissions that are assigned, AWS can tell you, you should
reduce the set of permissions, or increase the set of permissions if you see like an access denied.
But what if you do that in a more naturalistic way? I can also imagine you can actually use
your permission system to understand whether somebody is worth upselling. Oh, this person
keeps going to a feature that they don't have access to, maybe show them an ad saying buy
the product. It's interesting to think about. One thing I remember from Dropbox is like the
most popular way for them to make money was when a user was
over quota and they got like an error message, because once they got that error message, there
was a click: do you want to buy more space? That's what made them most of their cash. So it would be cool if we had an inbuilt like feature
flagging permission-based system. I know you all have, I remember looking at OPToggles, which
does something like that, right?
Yeah, so that's one of our other open source projects. So you want
to be able to have a one core place where you manage your policy and have all of
your application feed from that. So with Opal, we already talked about how that propagates with
Opal and OPA. We talked about how that propagates to the backend. But what about the front end?
You want the front end experience to also adjust. So for example, if someone's going to get an error,
like a 403 error when they query the API, you don't want that to just be thrown in the
UI. You want to give them a different experience. If they can't click that button,
don't show them that button. And the way to do that today in general is with feature flag solutions.
That's the way front-end applications adjust their experience. So with OPToggles, you can sync your
feature flag solution to your open policy. So you change your open policy and through Opal,
OPToggles listens in and then updates your LaunchDarkly, Split.io, etc. So you can have everything
chime in the right way. But more importantly, and kind of like touching on what I said before,
everyone gets the right
interface.
So the backend engineers can work with the policy engine and the GitOps solutions.
And the frontend engineers can work with what they're accustomed to, which is a feature
flag solution.
So everyone chimes in on the same conversation, but with the right interface for them.
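The idea of one policy feeding both the backend and the front end can be sketched as deriving feature flags from permission checks. The names here (`POLICY`, `FEATURES`, `is_allowed`, `derive_flags`) are hypothetical, and the sketch does not call any real feature flag service; a tool like the one discussed would push the resulting flags to such a service instead of returning a dictionary.

```python
# Single source of truth: which plan may perform which action (assumed data).
POLICY = {
    "free": {"view_report"},
    "paid": {"view_report", "export_report"},
}

# Which UI flag depends on which permission (assumed mapping).
FEATURES = {
    "show_report_tab": "view_report",
    "show_export_button": "export_report",
}


def is_allowed(plan: str, action: str) -> bool:
    """Backend check, the same one an API would use before returning a 403."""
    return action in POLICY.get(plan, set())


def derive_flags(plan: str) -> dict:
    """Front-end flags computed from the same policy, so UI and API agree."""
    return {flag: is_allowed(plan, action) for flag, action in FEATURES.items()}
```

With this data, a free user gets `show_export_button` set to `False`, so the button is hidden rather than shown and then rejected by the API.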
Yeah, ideally, to me, you should just have the same thing. Like feature flagging, permissioning,
etc. should just be like this one big product that manages all of that for you and helps you
like maybe upsell and block unless necessary. But anyways, thank you so much for joining. This was a
lot of fun, and I hope you had a great time.
I had a great conversation. Thank you so much. It was great talking to you, and I look forward to next time.
Yeah, thank you. I will take you up on it.