Postgres FM - Contributing to Postgres
Episode Date: October 28, 2022

Here are links to a few things we mentioned:
PostgreSQL 14 coin (upside down!)
Intro to Planner Hacking (talk by Melanie Plageman)
Demystifying contributing to PostgreSQL (talk by Lætitia Avr...ot)
How to become a PostgreSQL contributor (blog post by Aleksander Alekseev)
Compile and install from source code (PostgreSQL Wiki)
PostgreSQL mailing lists
GitHub PostgreSQL mirror
GitLab PostgreSQL mirror
Commitfests
So, you want to be a developer? (PostgreSQL Wiki)
Resolving the search engine issue (mailing list thread)
Planet PostgreSQL
pg_stat_monitor (extension by Percona)
ZomboDB
pgx (framework for developing extensions)
Awesome Postgres
Depesz blog

------------------------

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we talking about today?
Hi Michael, it was a great pleasure to meet you in person a couple of days ago.
Finally.
Yep, finally.
In Berlin, Germany.
But we didn't take a photograph, so we can't...
Yeah, I also already realized it was a mistake, definitely.
So let's have an almost non-technical discussion about how to contribute to Postgres,
what obstacles you can meet, how to get past those obstacles,
and what kinds of contributions we can make, right?
Yeah, absolutely.
So we've had requests about talking about this. Reading them, I assume they were about contributing code-wise specifically, or at least everything that's included, so maybe documentation. But it would be great to also discuss contributing in other ways: community ways, other ways that help the project in general.
Right. There are several good talks from people who... First of all, we don't contribute very actively.
I'm a very, very minor contributor.
I have this coin from Postgres 14,
and I already had a notification
that I am included in the Postgres contributor list this year,
for Postgres 15,
so maybe I'll have another coin.
I also contributed a little bit more 15 years ago or so, in Postgres 8.3, working on the XML data type and functions.
But I'm definitely not a C programmer at all, so that's it for me.
Well, you say that's it, but I think hopefully a lot of our listeners would completely disagree with you in terms of contributing to Postgres evangelism, talking about Postgres, and educating about Postgres. You do loads. The Postgres TV channel has been a great resource for me, especially over the years. I've learned a lot of things through there. I've seen some great talks I wouldn't have got a chance to see otherwise. Equally, you open source a lot of other code in and around the Postgres ecosystem.
There's a lot of contributions there.
So I think it depends a little bit on definitions.
If we're talking only about the Postgres code base, I guess I've contributed zero to Postgres.
But if we talk about the wider ecosystem, I feel like I've contributed a fair amount. You know, I've translated a migration guide, I've written a little browser extension, I've done some external documentation for EXPLAIN, and things like that. So it feels to me like we're contributing, maybe by giving talks at conferences, that kind of thing.
But if you only count the code base, yeah, there are definitely always people that contribute more.
Yeah. So our classification of contributions could include at least one dimension where we distinguish code and non-code contributions, right?
Sometimes they come together.
And another dimension is, you know, this very, very old concept that open source is more like a bazaar, compared to enterprise software, which is more like a cathedral.
I think in our Postgres bazaar, in the center of it, we have a huge cathedral, which is controlled in a centralized way.
And this is the Postgres code base. For that code base, you need to follow some rules and approaches and so on, and be ready to apply quite a good amount of energy to be included and considered a contributor to that cathedral.
So I started by describing this cathedral, but around it we have a lot of other buildings, and sometimes not buildings, many things. It's really a bazaar. First of all, a lot of additional pieces of software, for example extensions and tools. Most of them are open source, some are not. Many of them have the PostgreSQL license, which is a very short and very open BSD-style license. You can do whatever you want, just don't erase author names.
That's it, basically, right?
So you can even sell it and not share your changes.
But some software related to Postgres, of course, has a different license, and you can still contribute. For example, Timescale has a dual license, and you can contribute to the purely open source part, and so on.
Or, for example, GitLab.
Yeah, so GitLab, or also Database Lab, I wanted to mention our product as well. We have an AGPLv3 license, so it's not BSD at all, but you can still contribute there as well.
Full transparency, pgMustard is completely closed source.
Yeah, and we could have a discussion about which is better, right?
But yeah, because of the requests and because of what people seem to want to know most, I think it would be great to start with how to contribute from a code perspective, how to get started there, and then we can talk a bit more about contributions more generally, or ways of getting started elsewhere.
Right. So again, let's distinguish the big cathedral in the center of our bazaar from all the other places in the bazaar.
With all the other places, it's easy. You go to the project.
If you like the project, usually they have a contribution section in the README, or a CONTRIBUTING.md.
It's kind of standard right now for open source projects. Our Database Lab Engine also has a CONTRIBUTING.md explaining where to start, where to find good first tasks, how to check code style, and so on.
But if we talk about the main cathedral, I wanted to mention several good materials. We can recommend them in the show notes. Unfortunately, the material I quickly remembered was from Anastasia Lubennikova, and it was in Russian, sorry. So I cannot recommend it to those who don't speak Russian. But there was a blog post on the Timescale blog last year from... I forgot the name, sorry. Also a Russian speaker, from Timescale. It was quite a good intro post. You can check that. And you can provide a couple of recommendations, and people can check our show notes, right?
Yeah, there was a talk by Lætitia recently, I believe. I can't remember if it was recorded. I'll get the link to that if it was. She's very good at publishing her slides, so maybe at least those. And there was one with examples, I think, from Melanie Plageman.
I'm sorry, I probably pronounced both of those badly.
But those talks were fantastic.
It definitely seems like there's quite a high barrier to entry, but those talks give you a nice process for how to go about it.
Right.
But so far, we're talking only about code contributions.
Very roughly, the process is this.
First, you need to learn how to compile the source code on your system.
It's super easy, especially on Linux.
There is a wiki page which describes what you need to install on your system.
I usually do it on Linux, remotely, in tmux.
I have Postgres checked out.
Usually, you will have the master branch.
You can check it out from GitHub or GitLab:
github.com/postgres or gitlab.com/postgres.
It's a mirror, right?
Postgres doesn't use branches for development.
There's no regular Git flow with development or feature branches.
Branches are only used to maintain major versions.
Right, so something like REL_15_STABLE,
in uppercase, you will see it.
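Editor's sketch: the branch naming mentioned above can be captured in a small shell helper. This is only an illustration, not something from the episode. The function name `pg_branch_for` is hypothetical, and the naming scheme it encodes (REL_<major>_STABLE for version 10 and later, REL<x>_<y>_STABLE for older two-part versions such as 9.6) is how the stable branches appear in the Git mirror.

```shell
#!/bin/sh
# pg_branch_for VERSION: print the Git branch that holds a given major version.
# "master" is the current development line; stable branches are uppercase.
# (pg_branch_for is a hypothetical helper name used only for this sketch.)
pg_branch_for() {
    case "$1" in
        master) echo "master" ;;
        # Older two-part versions, e.g. 9.6 -> REL9_6_STABLE
        *.*)    echo "REL$(echo "$1" | tr '.' '_')_STABLE" ;;
        # Version 10 and later, e.g. 15 -> REL_15_STABLE
        *)      echo "REL_${1}_STABLE" ;;
    esac
}

# Example usage (needs network access, so it is left commented out):
#   git clone https://github.com/postgres/postgres.git
#   git -C postgres checkout "$(pg_branch_for 15)"
```

Staying on `master` after cloning keeps you on the development line; the helper is only useful when you want to look at or patch an already released major version.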
So you can check the code out and switch to the branch you want to explore. For example, we check out and either stay on the current development line, which is Postgres 16, or switch to the previous one, 15, if we want to develop our addition against the 15 branch. So we switch, and then we just need to compile it. It's just three lines.
You need to install dependencies, as I've said, several packages. But then first it's configure with a prefix. I usually set the prefix to somewhere in my home directory, just to avoid dealing with permissions and sudo at all.
This Postgres will be temporary, for development purposes, so I don't install it outside of my home directory. Everything is inside.
So you specify the prefix and that's it: ./configure --prefix= some path. You wait a little bit. If you installed all the dependencies correctly, it won't complain.
And then you just run make.
Make is the compilation.
It will require some time, CPU time,
but usually it finishes successfully.
And then make install,
and you have Postgres installed
in the directory you used as the prefix.
That's it.
Only three lines.
Simple, easy, super easy.
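Editor's sketch: the three steps described above, written as a shell function. It assumes the build dependencies from the wiki page are already installed and that the source tree is already checked out. The names `run`, `build_postgres`, and the `DRY_RUN` switch are hypothetical conveniences for this sketch; only `./configure --prefix=...`, `make`, and `make install` come from the conversation.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
# (Hypothetical helper so the steps can be previewed without building.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_postgres SRC_DIR PREFIX: configure, compile, and install into PREFIX.
# Using a PREFIX inside your home directory avoids permissions and sudo.
build_postgres() {
    (
        run cd "$1" &&
        run ./configure --prefix="$2" &&
        run make &&
        run make install
    )
}

# Example usage:
#   DRY_RUN=1 build_postgres "$HOME/src/postgres" "$HOME/pg-dev"   # preview
#   build_postgres "$HOME/src/postgres" "$HOME/pg-dev"             # real build
```

The subshell keeps the `cd` from changing the caller's working directory, and chaining with `&&` stops at the first step that fails, for example when `configure` complains about a missing dependency.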
Yeah, and for anybody for whom that sounds intimidating, baby steps before then can be: anytime you're curious about a feature, how something works in Postgres, you can start by looking at the code, searching on the GitHub mirror, for example, looking at the code comments, things like that, starting to get familiar with things.
And also, I guess it's obvious, but there are some parts of the code base that don't touch as much of the rest of it.
There's no list of open issues or anything, so maybe you need to start monitoring some of the mailing lists to see what questions are coming up, that kind of thing.
Is that sensible, or no?
Development is done on the pgsql-hackers@postgresql.org mailing list, so you need to subscribe. You can see the archives on the website. And there is also Commitfest, a kind of supporting software that groups discussions and patches on a single page. It's like a replacement for pull requests, I would say.
Kind of.
If we have some feature, the development can last years, actually.
A Commitfest entry is where people manually register, okay, we have this feature, and track its status, its lifecycle: is it under review, or returned with comments to the author, so some changes are requested, and so on.
But everything is done through the mailing list, and you need to attach a patch to your email.
So the Commitfest is almost like a way of project managing a little bit. But I would add that it often comes up as a really good recommendation for a place to start: reviewing other people's patches.
So sometimes it's a lower barrier to entry.
So once you know how to compile and get Postgres working... I mean, you run initdb to initialize a database and you connect to Postgres, on a special port, for example. You adjust the port if you have another Postgres running. We can run multiple Postgres instances; we just need to use different ports. So once you are able to compile from source and get Postgres running, you can check some patch. It's a great idea. You go to the Commitfest list of current patches. There are always many things that need review, so you can check something. Then you need to learn how to apply a patch to your source code.
An interesting thing: once you have compiled everything, if you change something, you can just say make and it will compile only what was changed, so it will be much faster. That's how you can iterate. It's the same thing as with minor upgrades: you just need to replace the binaries and restart Postgres, and you already have the new version.
So iteration is quite simple and fast: you changed the code, make, make install, restart Postgres, and you're already working with the new binary. Or you applied a patch, make, make install, restart Postgres, and you're already working with patched Postgres.
The only thing is that usually you need something like -p1 to ignore some leading parts of the path, so you need to know how to use the patch command on the command line in Linux.
So people need to be familiar with C, I guess,
or get familiar with C to contribute.
I disagree with you.
I disagree.
Oh, okay.
Yes, I disagree.
C is...
Familiar with C,
I don't consider myself familiar with C.
Like I learned it at university.
I used it many times, but I forgot everything.
Everything.
Well, there is Java, JavaScript, other languages which are similar, kind of, right?
You can lack a lot of knowledge, but you still can try to compile and use Postgres, right?
Reviewing a patch is a super big contribution.
Not from the perspective of code,
but from the perspective of the feature, of behavior, right?
So this is one thing.
Second thing: to read source code,
you don't need to know C.
Open the GitHub or GitLab mirror.
Just read the comments.
Comments explain a lot.
And sometimes there's also a README file in the same directory, which explains what's happening inside that directory. And this is enough to start.
Knowing C is not a requirement for making source contributions.
But of course, if you need to write a substantial part of the code, you need to be familiar with C.
But not for reviewing patches.
Very good point.
I wish reviewing patches were even easier.
I know there is CI/CD already in use, from... I forgot the company.
It's not GitHub, not GitLab.
Some other company provides infrastructure to run tests and so on.
Maybe we already have it; I may be missing some details here.
But imagine if we had binaries for various platforms for each feature branch.
Then reviewing from a user perspective would be much easier.
You just install that branch, you know, that revision.
This is the commit you're reviewing, or the patch you're reviewing.
But unfortunately, this is a barrier.
Still, I must assure you, knowing C is not a requirement for reviewing patches.
Yeah, fair.
And one other project, very, very close to Postgres...
Sorry for the interruption.
Just to complete the picture.
Knowing C is not a requirement.
The requirement is knowing how Postgres works and being a Postgres user.
So if you're a DBA or a developer who uses PostgreSQL,
you know how it works in terms of SQL, some details.
And you know this patch adjusts some behavior.
This is what's needed.
You can say, before it worked this way, now it works this way.
I checked these corner or edge cases. And you can provide very useful, helpful feedback to the mailing list.
Exactly. Replying on the mailing list is as easy as replying to an email. That's the barrier to entry there. And I've been able to do that with a couple of different patches. One of them, which is still not committed, is planning to add extra information to nested loop operations in EXPLAIN output. Do you...
Yes, exactly.
And as a tool provider, I'm able to add context as to when that's useful. For example, they're planning to add a minimum and a maximum rows per loop, which is really useful when, for example, the average rows per loop is zero, so below 0.5. It's really helpful to know whether any rows are returned, because if you only get the average, you as a tool need to either display zero or make an estimate, which is really not optimal and misleading at times. So having that max is really helpful at times, and a total especially. So I think giving the extra context that you have as a user, or as someone else in the ecosystem, can be really helpful if all people are thinking about is the code.
And there was another area that I was interested in talking about, which is projects around the core that seem quite central. For example, the software that runs the mailing lists and displays the mailing lists; I believe that's a Django project. So there are other projects, written in different languages, very close to the core experience of Postgres, that use different technologies as well.
Right. And before we move further: of course, if you know C and you can analyze the patch from a code perspective, or write some patch, then you need to know some details, like the Postgres macros. There are a lot of macros inside the code, something like PG_TRY, a lot of stuff. And also how to use the gdb debugger, how to deal with core dumps when it crashes, and so on. But that's already moving further, right? I think it will be obvious. The wiki has a lot of information, of course, so it's easy to find some interesting details.
But if you want to discuss the mailing list, yes, for me it's a huge barrier. Sometimes I have some proposal and I post a patch for code or documentation. By the way, documentation lives in the same repository, in the doc or docs directory, and it's in SGML, which is a superset of HTML, XHTML, and XML.
So it's then converted to HTML.
Kind of an interesting markup language that I would like to avoid.
But anyway, it's there.
It's not difficult to adjust it.
And then you can make a diff and send the patch to the mailing list.
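Editor's sketch: the documentation workflow described above, as a shell function. It assumes a checked-out and already configured source tree, the documentation toolchain installed, and an edit already made to a file under `doc/src/sgml`. The names `run`, `doc_patch`, and `DRY_RUN` are hypothetical; the `html` make target in `doc/src/sgml` and `git diff` are real, but treat the whole thing as an illustration and not as the project's prescribed procedure.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# doc_patch SRC_DIR OUT_PATCH: rebuild the HTML docs to check the edit, then
# write the uncommitted changes under doc/ to OUT_PATCH for attaching to an email.
doc_patch() {
    (
        run cd "$1" &&
        run make -C doc/src/sgml html &&
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "+ git diff -- doc > $2"
        else
            git diff -- doc > "$2"
        fi
    )
}

# Example usage:
#   doc_patch "$HOME/src/postgres" "$HOME/doc-fix.patch"
```

Building the HTML first is optional for a small wording fix, but it catches markup mistakes before the patch goes to the list.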
But it's painful. For example, the documentation on our website I can edit right on my phone or iPad, easily. I don't need to compile anything and so on. I don't need to check out the code. I can do it on GitLab or GitHub, and that's it. And using CI/CD, I automatically get the result and can check how it works.
In fact, I have a recommendation for you for open talks.
There was an absolutely fantastic talk at the Postgres EU conference just now by somebody not from the Postgres ecosystem.
And she's a professional technical writer and actually documentation architect
and really went into detail about how much...
Yeah, Postgres gets a lot of compliments
for how good its documentation is,
but there are also a lot of ways it can be a lot better,
and one of them was how much we could lower the barrier
to people being able to contribute to it.
The process of changing documentation,
it's the same as the source code.
So you don't need to compile, maybe, unless you need to see how it converts to HTML. But again, you need to check out the source code and deal with these files, roughly, only on a computer. Yeah, you can connect to a server and tmux right from a phone, but it will be really painful. So, yes, this is the process. And each time I have some proposal, it hurts. I consider this workflow very old-fashioned and outdated. And because of these barriers, I'm 100% sure many people don't contribute. So if you want to contribute, you need to be prepared to apply more energy than you might expect initially.
I think, unlike in a lot of open source projects, the default is not... With mailing lists, you don't have to have a resolution. For example, in a GitHub workflow, if you open an issue,
it has to eventually be closed. Otherwise, it continues.
Or pull request, merge request, right?
Exactly. But that would still close the issue, right? Whereas with a mailing list, a thread...
Statuses.
Yeah.
Yes. But my point really is that, in my experience, for example with the documentation, one of my minor contributions was trying to push forward the change for defaulting to the current version.
So trying to get search engines to pick that up, through what we were telling search engines.
Problem solved. This is good.
Yes, it's a big achievement.
I like it, actually.
So I notice it almost daily.
And it's so good that it's finally solved.
So Nikolay and I know what we're talking about here, but for everybody else,
it's the problem where you Google search for something that you hope the Postgres documentation will cover.
Alter table Postgres.
And you see something old.
Exactly.
In the past, maybe if you searched that six months ago,
it was a real lottery as to which version of the Postgres documentation
you would get linked to.
In most cases, it was not the latest version.
Yeah, exactly.
Almost always, it was not the latest.
And now, through a history of maybe 10, 12, 15 different threads that were started on the mailing list, it eventually gathered enough consensus.
I raised this a couple of times.
A couple of those threads are mine, yes.
I think the final thread that did work actually might have started with you.
Well, actually, no, I think it was Andres from Microsoft.
Yeah, he did a very good analysis of previous threads.
Like, this is the whole picture, right?
Yeah, and that, for me, was a really good example of the patience that's sometimes needed, but also of the process. The process by default ends in nothing happening.
You need to push, constantly push.
Yeah, exactly. So don't give up, and also build consensus. Starting with a patch isn't always the best advice. Maybe it is the best way of getting feedback on the specifics, but you can also start with a discussion.
Start with a suggestion and start to try and build consensus.
See if people even agree with your approach before you start coding.
That seems to be how some people do it and get success.
So talk about design and so on.
Yeah, exactly. Trade-offs, alternatives that you considered, examples of the problem that you've seen, that kind of thing.
Right. And speaking of the mailing list, there's another problem, quite a huge one. We've all got used to discussing a diff line by line. In a pull request or merge request, we can say, in this line, we see this, and start discussing it.
And there's a single context for everyone.
And multiple discussions can happen at the same time.
When you discuss some patch on the mailing list, you refer to some line.
And then another person has already sent a new version of the patch, and you have to start visualizing it.
So you need quite an imagination, holding the code in your head, to speak about multiple versions.
It even hurt me when I worked on translation of the release notes.
Some bug was in the original English text, and it was fixed, and the line changed. And then the patch may be adjusted, but previous discussions still refer to the old lines.
So it's difficult, believe me.
Yeah.
Translations... in fact, maybe that's a good time.
I know we don't have loads of time left,
but it'd be good to cover at least a few other ways
that people can contribute to the wider project.
And translations are a really good starting point for that, I think. If you speak a language other than English and you're keen to help, there are a lot of efforts to translate different things. And often, if you look at the list of people that have done each language, it's a very small list of people doing a lot of work. So I'm grateful to everybody who has been involved in those projects.
Those people who translate release notes have an understanding of what's new.
Actually, I translated them into Russian for many years, until this year.
And I actually enjoyed it.
I spent some time translating, so I know many details.
But maybe you can just write blog posts about new features and so on. Of course, here we are going outside of the main cathedral in the center. So you can write content, you can create videos. I think it's obvious Postgres needs more. When you search for good materials, for every problem you dive into you will maybe find something, but usually it's not enough. And you can discuss your experience and help the content base grow as well.
Yeah, absolutely.
In this area, I also want to say that planet.postgresql.org is a good place to start seeing what the Postgres ecosystem has. But keep in mind that it focuses mostly on texts.
For example, if you try to push video only, or some infographics,
something non-textual,
they probably will not approve the post,
and you won't be able to be present on Planet.
But we do a podcast, a YouTube channel, some infographics. Not everyone can read long texts. I personally have issues with reading long texts. I can write one, but reading one is always even more difficult for me than writing. I'm a very visual person.
And I see in very different technical areas how good visualization can sometimes be.
And I'm sure Postgres can have more materials.
Postgres ecosystem can have more materials.
Even sometimes when dealing with code.
I remember we created several things like this.
You take code and you start visualizing: this line is responsible for this behavior. It helps you understand how a particular thing works. So infographics are also a good idea, right?
Yeah, I've seen some good ones. They get distributed on Twitter, for example. So it can be as easy as putting something together, sharing it, and maybe tagging some people that you know would share it, and letting us know.
On that note, things like Postgres Weekly are quite good at distributing things like that as well.
Yeah.
Picking up on things.
They share all types of materials.
I agree.
Yeah.
Right.
And why are we focusing on this?
Because the documentation is amazing.
I like documentation, actually.
It's great.
3,000 pages if you print it on A4 format.
But the problem with the documentation is, of course, that it lacks many things.
For example, pictures. There are only one or two.
How many?
Like, GIN indexes?
Five already.
Wow, I'm lagging.
But only five, right?
So there is an opportunity here.
And it lacks a lot of how-to style pieces.
Sometimes you have something at the end of a page, some usage notes, but it completely lacks how-tos.
For example, I just want to set up replication, with steps, ideally with some screenshots or a visualization of the console, or something like a screencast.
It lacks that.
There are attempts to build additional websites that try to deliver similar content,
but it's still far from perfect in this area.
How-to style documents.
Yeah, exactly.
But all those additional things you're talking about
are examples of people contributing,
I think, to the Postgres project.
There are lots of examples of those. I think it's also quite important to say that you don't have to ask for permission to put something up as a third party. So, for example, before the problem was solved on Google and other search engines, I made a browser extension that redirected to the latest version. I put it on the Firefox and Chrome web stores and made the source available. I didn't have to ask for permission from anybody. It's not widely used, but you don't have to wait. You don't have to go to the cathedral and ask permission to be let in. You can set up your own little stall. You don't even have to pay to have a stall at the bazaar.
Because, taking a wide view of it, it's a huge bazaar.
Sometimes people prefer not going to the cathedral.
For example, Percona created pg_stat_monitor,
and it's actually a replacement for pg_stat_statements,
solving some of its problems.
And they decided not to work on it as a part of core,
not improving pg_stat_statements itself,
just to have their own release cycle,
not to be dependent on the core development
and so on and so on.
It's an option, definitely.
Extensions are a huge contribution
to the Postgres ecosystem.
And even the team behind the extension
ZomboDB, which is a bridge between Elasticsearch and Postgres.
Amazing project, but I'd say
an even bigger contribution they've made
to the Postgres ecosystem is
the extension framework behind it,
which helps make extensions even easier
to write. So that's a fantastic
contribution to the ecosystem.
There are thousands of extensions.
But also don't forget additional utilities, not extensions, but utilities.
Our Database Lab Engine, for example, is not an extension.
It's additional software.
And there are many, many other pieces of software.
And there is a curated list of Postgres projects, extensions, tools and so on.
It's called Awesome Postgres.
We will provide the link, yes. It's on GitHub. Of course it doesn't have everything, but it has many things.
It has a lot of things that are awesome on it.
It's sad that there is no such curated list of materials, of content about Postgres.
I think on Awesome Postgres there are some. Maybe we should add some more to that.
But it should be structured very differently from software, because content is not software. It's another story.
Maybe you need a pull request. You can use your preferred method on Awesome.
Maybe we will discuss this separately. But there are several starting points where you can check and find an opportunity to contribute.
Yeah, absolutely.
Was there anything else you wanted to talk about?
Yeah, that's it, I think.
Oh, I have one more, actually.
I wanted to give a shout out to everybody in the Telegram, Discord, and Slack groups.
IRC was particularly helpful to me.
People that help beginners.
That's a big contribution to the ecosystem as well, especially people that are friendly to beginners.
Stack Overflow as well, right?
Yeah, and you don't have to be that technical. If you're one step ahead of somebody, you can help them along.
The problem with this for me, always, when I start doing this... By the way, it's a good antidepressant or something. If you have some issues, some stress, just go and help someone, on Stack Overflow, in some chats, Telegram, I don't know, anywhere. And the problem for me is usually that someone asks something and you think you know the answer, but then you start researching, and a couple of hours later you understand you've already learned a lot. It's good, but you also didn't want to spend so much time on it.
But it's good, really.
I know people who do it constantly.
It's like some hobby, helping others.
It's good.
It feels good, definitely.
But it can eat all your time.
Yeah.
And maybe if you learn and you spot patterns in what people are asking... I know depesz is very good at this. He spots patterns in what people are asking and then writes a blog post about it, so he can just send them the link in future. It's a good way of getting into understanding what people care most about. What should I do a video about? What should the documentation cover better? Or maybe even, should Postgres behave differently? Should it have a different default? That kind of thing. So if you want to contribute, getting involved in those communities might even be a really good starting point.
Postgres definitely should have different defaults.
Yeah, different topic.
Thanks so much for joining us, everybody.
Sorry this one's a bit later than usual.
And looking forward to getting another one to you next week.
Did you get it as well? I've got several good pieces of personal feedback. Yeah, this is incredibly great.
At the conference. Yeah, it was bizarre being approached, being asked, are you the people from Postgres FM? Yeah, I enjoyed it.
I answered, this is exactly what drives us.
So thank you for the feedback.
So please continue providing feedback.
Like, share.
Oh, by the way, special announcement today.
If you have an Apple device, iPhone or iPad,
please go to Apple Podcasts and write a review.
Please.
Yeah, thank you. That would be much appreciated. And special thanks to everybody that came up to us and gave us feedback in person as well. I know it's not possible for everybody, but we appreciated it.
Great. Thank you. Take care. We did it, we didn't miss another week.
Great. No, that's fine.
Take care. Bye.