librarypunk - 135 - The Once and Future Linked Open Data feat. Dorothea and Jonny
Episode Date: August 28, 2024
The gang talks about linked open data. What did we learn in library school? What’s the future? Where does it fall off a cliff?
Media mentioned
https://www.hillelwayne.com/post/graph-types/
The Eth...ics of Sustaining Linked Data Infrastructure https://scholarsphere.psu.edu/resources/430f30bf-e029-483d-b1c8-d7e9bb430a8e
Aaron Swartz unfinished book https://web.archive.org/web/20220512132144/https://www.morganclaypool.com/doi/abs/10.2200/S00481ED1V01Y201302WBE005
Tim Berners Lee https://www.w3.org/DesignIssues/Overview.html https://www.w3.org/DesignIssues/RDFnot.html
https://neuromatch.social/@jonny/111938123937338008
https://neuromatch.social/@jonny/111939286501257910
https://neuromatch.social/@jonny/111731459659839112
https://jon-e.net/surveillance-graphs/#knowledge-graphs-a-backbone-in-the-surveillance-economy
http://microblogging.infodocs.eu/wp-content/uploads/2014/10/publishing_bnb_as_lod.pdf
https://www.oclc.org/en/news/releases/2024/20240507-introducing-oclc-meridian.html
Join the Discord: https://discord.gg/zzEpV9QEAG
Transcript (plaintext): https://pastecode.io/s/qundkp0n
Transcript
I'm so glad that you're here with that, Dorothea,
just because, like, I'm just, like,
always interested in your perspective on this,
having, like, lived in the library world of linked data for so long,
because on the other end of, like, living in programmer world,
sometimes I still get the sort of, like,
both the persnickety, you know, purist side and the people that are trying to make it work
happening, but, like, very few, like,
"actually, this doesn't even come close to meeting my needs
or, like, resemble my work style at all."
I remember it was so funny.
Like, Scott Carlson helped edit or write that, like, linked data in libraries book.
And then, like, two days later was like, linked data's dead.
And then like became a like a programmer.
I love Scott.
I think he was stuck in a deeply shitty workplace.
I agree.
It happens to us, and then we get out of them.
Hooray!
Woo!
Proud of you.
Yay.
Okay.
I'm Justin. I'm a scholcomm librarian.
My pronouns are he and they.
I'm Sadie. I work IT at a public library, and my pronouns are they/them.
I'm Jay and I'm a...
No longer a music librarian.
Surprise.
Finally fucking a cataloging librarian again for the first time, seven years after finishing grad school.
And I won't say where.
And my pronouns are he.
Just post the address this time around.
If you're in the Discord, you know.
Okay.
And we have guests. Would you like to introduce yourselves?
Sure.
I'll start.
I'm Dorothea Salo, pronouns she, her,
and I teach at the University of Wisconsin-Madison Information School.
I'm Jonny Saunders.
They and them. I'm just sort of like, I guess I do various forms of, like, information-based work at UCLA.
Thank you, yeah, for the belated applause.
I was waiting for that.
Thank you.
Very kind.
Welcome, welcome.
I still have my reorganized soundboard, so I still have like 10 sounds.
It's just.
Oh.
It's going to stop me.
I started making Justin watch
It's Always Sunny
and it was a bad decision
because now the soundboard
has always sunny theme on it.
And it's got to be the full-length version
too. No soundboard is complete
without the one that keeps going for an hour.
Because that's just a piece of like
public domain music.
It's not even like written for the show.
I'm pretty sure.
Sweet.
I think I had just the full.
full Soviet Union and
Yeah
That's no cause button
Yeah I was like this is anime one you piece of shit
Yeah that one keeps going
So this was an episode we came up with
Because Sadie wanted us to explain linked open data
And I think I probably know the second least
So I figured it would be
Funnest for me to start
and try and explain what linked open data is,
which is all from what I remember in grad school,
which is the last time I ever had to interact with it,
that I'm aware of,
besides the parts of linked data that are used by Google,
is it's primarily you can think about it as triples,
and everything is one item linked to another item.
So Hamlet is a character in Hamlet.
Hamlet the character and Hamlet the book, those are two separate URIs.
And then...
It's a play.
Well, it's in book form.
Okay.
And then Shakespeare is the author of Hamlet,
and so there's an "is the author of" statement where each piece has a URI, and these three-part statements
can chain together forever.
And that way you would have something that's both machine readable and human
readable and somehow that makes data boxes in Google work.
Or certain extremely non-human readable forms of human readable.
Right.
So once you're trying to organize it in other ways, like say, make a list of things,
suddenly it doesn't work anymore.
Yeah.
Because now you have these series of statements.
Yep.
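As a rough illustration of the triples idea described above, here is a minimal Python sketch. It is not tied to any real RDF library; the `example.org` URIs and the predicate names (`isCharacterIn`, `isAuthorOf`, `bornIn`) are placeholders invented for the example. It shows statements as three-part tuples and how they chain through a shared URI.

```python
# Each statement is a (subject, predicate, object) triple.
# The example.org URIs below are placeholders made up for this sketch.
EX = "https://example.org/"

triples = {
    (EX + "HamletTheCharacter", EX + "isCharacterIn", EX + "HamletThePlay"),
    (EX + "Shakespeare", EX + "isAuthorOf", EX + "HamletThePlay"),
    (EX + "Shakespeare", EX + "bornIn", EX + "StratfordUponAvon"),
}


def about(node, statements):
    """Return every statement that mentions `node` as subject or object."""
    return sorted(t for t in statements if node in (t[0], t[2]))


def short(uri):
    """Strip the namespace so the output is easier to read."""
    return uri.replace(EX, "")


# Statements "chain" because they share URIs: both of these touch the play.
for s, p, o in about(EX + "HamletThePlay", triples):
    print(short(s), short(p), short(o))
```

Note that the result is still just a pile of statements. Getting an ordered list out of it, as mentioned above, takes extra modeling.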
I'm like just chin hands here waiting for all of these super smart people.
This episode is for Sadie.
Yeah.
Literally.
This is "we explain linked data to Sadie."
There's like the triples explanation and then immediately you fall off the cliff of ideology
and 25 years of some of the most prickly and opinionated people in the world making like claims on reality that you truly can't believe until you see them.
So it's like, you know, we got talking about technology and beliefs.
And then also, like, for a lot of people, like, a huge amount of, like, wasted time, drama, or success, depending on if you work for Amazon or Google or not.
Yeah.
Like, my experience with linked data is that I took ontology development in grad school with Dave Dubin, shouts out Dave Dubin.
And we developed, we learned RDF.
And we mainly wrote in Turtle, I think.
But we learned, like, all the others, like N-Triples and N3 and all that,
but I think he liked Turtle the best, if I remember.
The only one that worked.
Yeah, as a class, we collectively created an ontology.
Together, each of us had our own specific section of it that we had to create.
And mine's still, it's still on my GitHub and everything.
Like, it's still, like, theoretically a working RDF, like, ontology.
Is this the origin of the Homosaurus?
Yes, no.
But I'm also...
I'm also on the Homosaurus board, and that is actually linked data, but none of us on the board actually interact with that part so much.
Like, we have, like, a software dude who does that.
But, like, we all know about it to some degree.
And then I've also done some, like, Wikidata.
Like, I did, like, a Wikidata training.
Like, I went through one of those trainings one summer.
And that was cool.
And I wanted to write it up. I
submitted a proposal for a paper on, like, thinking of Wikidata and linked data as, like, a cyborg
kind of thing, but, like, interrogating that. And I submitted this to the Code4Lib Journal
issue that ended up being the one that everyone yelled at. So I'm glad it got rejected. Like, literally that
issue was the one that I submitted to, with, like, the bad data practices. That one. That was
what I had submitted to, so I'm glad I got rejected now.
Narrow miss, narrow miss.
Dorothea, weren't you the one to blow the whistle on that, or is that different?
It was like a different time.
It was you and Becky, right?
Oh, because, well, I mean, if we blew the whistle over anything, it wasn't over linked data.
It was over privacy.
It's a thing.
You might want to let people keep it.
Yeah.
Yeah.
Yeah.
I just happened to be writing about linked data for the thing I was writing about.
Right.
Yeah, so I'm very glad that my goofy little theory, like, high theory article got rejected.
So I actually never ended up using Turtle.
I think I learned N3 notation.
It was very not hands-on, the way I learned about it.
And so it was never clear how it worked, except for the aspects that kind of pulled from Wikidata, and that explained a little bit.
But I never got an in-depth explainer for how Wikidata works.
So it was very theoretical.
And my metadata teacher was like very on the theoretical side of things.
So I never got to see a lot of practical applications of a lot of the stuff we talked about in class.
So that is not how I teach metadata.
Yeah.
If you're not new at once a point.
Yeah.
Exactly.
And that's like one of the major cultural fissures is that just like, is it supposed to be something that you touch?
Or is it something that is supposed to be like a true artifact of the world and needs to be done once and never touched again?
You know, so, like, the division between the teaching styles, it's, like, reflective
of the entire system of belief
that goes into linked open data as well.
I'm, like, curious, like,
hearing people's, like, origin stories with linked data
because, like,
Dorothea, you've been doing this for, like,
a while, in libraries and stuff like that.
I'm curious, like, what your origins are.
I mean, you know,
I got into it the same way
a lot of people did, as it started
to be talked about as
potentially where libraries
move from MARC.
And, you know,
it's,
that's a really awkward question
when you think about it.
We,
sticking with the
homegrown,
if you will,
like, MARC encoding,
which we made up from scratch
in the 1960s,
Lord bless Henriette Avram.
She was awesome.
But it doesn't map
cleanly onto any
of the dominant
data structures,
data models
that we have today.
It's pulling teeth
to try to stuff MARC into a relational database
such that you can actually do anything with it.
You can kind of do it in XML,
but XML is really squishy that way.
And I don't mean that in a bad way.
XML squishyness is actually quite useful.
If you look at, for example, EAD, Encoded Archival Description,
some of EAD is what you and I, Jonny,
would probably think of as data,
but a lot of EAD is narrative, right?
It's storytelling.
And you know what?
Databases are shit at storytelling.
You can't represent Hamlet in a database.
Linked data is shit at storytelling.
One of the things that really pissed me off
about the very early days of linked data
was some of its boosters going around
and just bragging on it
as something where you could literally represent anything.
Right. If you could put it in a computer, you could put it in linked data.
And my retort to that is, as it has always been, and it's pure coincidence, but I kind of love it.
All right, express Hamlet in RDF and get back to me, okay?
You can't do it.
And I was reading through some of the stuff in the show notes for today.
and I happened on one of the Tim Berners-Lee pages.
Let me see if I can find that.
Ah, yes.
And Tim Berners-Lee on this particular page talks about a semantic web,
or sorry, a magical artificial intelligence.
He's talking about AI.
And he says this:
"The concept of machine-understandable documents
does not imply some magical artificial intelligence
which allows machines to comprehend human
mumblings."
That's literally what he says.
Human mumblings.
Excuse you, Tim Berners-Lee.
Excuse you.
Language is one of the most magnificent things we have as human beings.
And you are calling it mumblings.
Excuse you very much.
Sorry.
That was my rant.
No, well, I mean, yeah, his relationship to this sort of, like, you know,
the fuzziness of language
is, like, one of the most fascinating parts of the early outlooks on what linked data could be. Because on the one hand there's sort of the romanticism of language, and, like, the fluidity of language as being something to embrace, but then almost immediately that becomes, like, squished out. The thing that's almost immediately excluded is the ability for people to actually express ambiguity, uncertainty, and so on.
Yeah, right. I think last time you were on, Jonny,
or maybe this was in, like, just, oh, no, this is when we were watching it together.
But we talked about, like, the Ted Nelson versus the Tim Berners-Lee view of, like, the interconnected internet and data.
Right.
I'll see if I can find that.
Interesting dude, Ted Nelson.
I actually did get to meet him once.
I was like wiped out at the time, unfortunately.
But yeah, I will always treasure that.
He was an interesting... he is, I think still is, an interesting dude. Yeah. The chad Ted Nelson.
So, like, the story that I don't have a good, like, hold on is just, like, so what happened? And this probably relates to, you know, some of the stuff that we talk about all the time in, like, the cybersecurity screaming channel, and, Sadie, what you might have to deal with as well: the state of the technologies that go into libraries, and how they're not
actually under any of our control, and we sort of do the best we can to exist on whatever
scraps that IT wants to feed us and stuff. And so I imagine that's, like,
the intertwined story of, like, why did linked data not happen all the way at
libraries, like, sort of related to the institutional inertia as well. Yeah, that's part of it.
And you know, getting back to my point about the question of getting off MARC,
relational databases weren't going to work.
XML wasn't going to work
and was in kind of a little bit of a decline
as we were asking ourselves
this question.
So what was left?
I remember a blog post by Jonathan
Rochkind, who hates linked data,
boy, does he hate RDF,
and you know, he backs it up,
he's not just a random hater,
but he was like, we
can't, we cannot move to this, and I'm like,
okay, what's the alternative, right?
And there are things about RDF that are attractive ideologically, but also practically to libraries.
The idea of the open in linked open data.
We can really, truly share, and OCLC can't stop us.
Oops, did I say that out loud?
I mean, you know, really, the elephant in the room is OCLC, and
its enclosure of
MARC and MARC
cataloging for
its own corporate, and
I am going to call them corporate, I don't care
that they're legally
a nonprofit, for their own corporate benefit.
So linked data
to some of us
looked like a possible
way out of that. And
you know, I can't fault anybody for that.
It's definitely a goal worth pursuing.
So why
didn't it get as far
as we might have wanted it to.
Part of it is that RDF was not built,
and Jonny can speak to this more,
because he's read more of the STS and sociology literature around it than I have,
but it was not really built for practicality or computability.
I, as a complete SPARQL duffer,
and SPARQL, if you haven't run into it,
is the query language for linked data,
it is to RDF
what SQL is to relational databases,
I can make a typo in a SPARQL query
and knock a server over dead.
It's not even hard.
So,
like, the brittleness
of just being able to ask a question
without killing a server,
this is not a consideration
for the early designers of the semantic
web.
And, like, how do you build a library infrastructure
on a foundation that is that technologically
brittle? And the answer is you can't. You really, really can't.
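To make the "one typo knocks a server over" point concrete, here is a hedged Python sketch of a deliberately naive triple-pattern matcher. It is a toy, not SPARQL and not how any real triple store is implemented; the data and the variable names (`?book`, `?bok`) are invented. The point is only that a misspelled variable silently stops two patterns from joining, so the result becomes every combination of the two instead of one row per book.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Naive join: match each pattern against every triple in turn."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# 200 made-up books, each with one author and one title.
triples = []
for i in range(200):
    triples.append((f"book{i}", "author", f"person{i}"))
    triples.append((f"book{i}", "title", f"Title {i}"))

good = query([("?book", "author", "?who"), ("?book", "title", "?title")], triples)
# Same query with a typo: "?bok" is a brand-new variable, so nothing joins.
typo = query([("?book", "author", "?who"), ("?bok", "title", "?title")], triples)

print(len(good), len(typo))  # 200 rows versus 200 * 200 = 40000 rows
```

With 200 books the difference is harmless. The same mistake against millions of triples is the kind of query that can exhaust a shared endpoint.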
Another, I'm not going to say this is a problem actually.
I actually think it was good, but it's a situation that does not
commend itself to libraries, to librarians.
We tend to be very orderly people.
and catalogers as much as anybody and more than some.
So in the aughts, right?
In the, well, no, not the aughts, in the teens, I guess,
particularly in Europe, there was just this flowering of experimentation
with how are we going to represent the things in the library universe,
like books and maps and musical scores and all, and movies and all that good stuff.
How are we going to represent this in RDF?
Lots of experimentation. A lot of it was fantastic.
Europe is great.
Yeah. Yeah.
There's a lot of really good thinking, very practical thinking going into this,
but there were models, data models, RDF models, ontologies, if you will,
springing up all over the place.
And so if you're an average cataloger, you're looking at this and going,
well, which one do I learn and which one are we going to use?
And when is there a tool that's going to work with any of
this? Yeah. And the answer is there wasn't. Now, what seems to have fallen out of that
is that BIBFRAME, for all of its faults, and it has many, it is not my favorite bibliographic
ontology, seems to be kind of taking over the world and muscling out a lot of that
European experimentation. And that frankly makes me sad, because there's several countries in Europe
that just plain kicked BIBFRAME's ass
as far as modeling quality.
And I hate that they're getting plowed under, basically,
by this crappy American juggernaut.
But why, why is this happening?
Because there's finally tooling.
There are finally cataloging tools
that, as much as any RDF-based tool can, fail to suck.
Yeah, like, I know in Alma, you can do BIBFRAME stuff in Alma.
Yeah, but you can look at Sinopia and you can look at Marva and you can imagine an actual person using these, right, and making them work and getting good records out of them, which we didn't have for at least a literal actual decade after BIBFRAME happened.
So when Tim Berners-Lee calls human language mumbling, I think it's a symptom of the contempt that so many linked data people have for human beings.
And I yelled at the Semantic Web in Libraries conference in, like, 2014, a decade ago, about exactly that.
Stop dissing human beings. You can't do that if you actually want linked data.
But nobody listened and here we are.
Right.
Yeah, like, another idea, and this was also something I think I talked with Jonny about, like, another idea for a goofy, like, high-minded theory paper I had was thinking of linked data as this attempt to, like, do a reverse confusion of tongues, like, a pre-Tower of Babel divine language, which ignores that, like, the reason that linked data is cool is that it has
the potential for everyone to have their own way of doing it,
and have it all talk together and intermingle.
Instead, it's just turned into this, like, nope,
everything looks this way now, and this almost, like, mechanized version of language,
like, taking over.
Like, it doesn't care about being human readable, actually.
Right.
And, like, so it's this
tension that was there from the origin of it,
and actually the dawn of the term linked data
as opposed to the semantic web is, like, a part of
the same thing.
I feel like we need to, like, at least nod to,
because it's like,
we talked about this at length last time I was on here,
but also nod to the Lindsay Poirier piece,
"A Turn for the Scruffy,"
which, like, we both called out as being, like,
a seminal work on understanding the culture of the semantic web.
And that points to, and it's there too
in TimBL's website, that the separation of linked
data and linked open data from the semantic web was about, like, reclaiming stuff that
worked as opposed to stuff that, like, was perfect.
It was about, like, trying to make a bunch of separate ontologies.
So it's like the initial idea being there's one graph, like, one global graph where everything
is always linked together.
And there should be one URI that represents each unique concept and only one.
To the point where there's these sort of, like, absurd blog posts.
And one of the things that's amazing always about web history is that a lot of it is just, like, still there.
And still up there, at least on archive.org.
But these blog posts, I think this is 2009.
I put this in the links as well.
But they apparently took down the comment section on it.
But someone from, like, the semantic web world in this era posted a blog post about the first time that the New York Times had, like, linked data in their web version of the product.
And so what they'd done is they'd made some, you know, article that was about Barack Obama and the quote unquote, you know, the racist controversy, like, you know, "Barack Obama is a Muslim," whatever.
So it was an article about that controversy. And so there was an RDF claim that was like "Barack Obama, related to, Muslim" or something like that. This is just trying to describe the contents of this piece of writing. But then people immediately were like, this is messed up, because that's now a claim
on reality. It's not just, like, "someone says this." It's "this is a fact." And that
is something that the RDF group had specifically designed it to be doing. And so, like,
the model of the world that people keep trying to escape from, but then have to return to,
is that, like, when you make a statement in RDF,
there's a difference between the way that the language and the syntax and the systems
designers thought about it, as being literally, like, there's some really remarkable
quotes in the W3C archives.
I was trying to pull up earlier.
But it's like
this one quote from Brian McBride, 2001,
so this would have been just, like, only a couple years
after the project formally launched at W3C.
It's like, "RDF is not just a data model.
The RDF specs should define a semantics
so that an RDF statement on the web is interpreted
as an assertion of that statement
so that its author would be responsible in law
as if it had been published in a newspaper."
So they're supposed to be, like, legally binding documents in this way,
where there is no such thing as an author,
no "someone says this," when in reality
everything has an author.
Everything has a point of view and a perspective, and was said by or written by somebody.
But, like, you know, it took a while for even that notion to be encoded in the
language at all as an expressible thing, period: adding the fourth item to the triple,
like, being able to say that this doesn't belong to the global graph of everything,
but in fact is my, like, local system of meaning. And then, but then, like,
you know, you have to keep escaping that because it doesn't actually work. Because it's like
the thing that I always come back to is that just like imagine if language worked this way
where I want to use a word,
and I have to use Jonny's version of this word.
And so I have to say, like, I have to go to, like,
jonny.net slash this word,
and now I'm referring to that one.
And there's no way that I can make my own copy of this word.
It's like, in the way that language works,
you know,
we have these sort of, like, parallel representations of ideas
and concepts and words and phrases that are, like,
you know, they're not the same at all,
or even close to the same, from person to person, or even utterance to utterance.
And yet, like, we're trying to express like a system of meaning where there is one version of each of these things.
Like, no one would do it.
Like, no one would, if I had to go to the dictionary every time and look up each person's unique word and, like, use that or else it was meaningless, then it just doesn't work.
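The "fourth item" mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea of attaching a source (a named graph) to each statement; the graph names and the claims are invented, and real quad stores have far more machinery than this.

```python
from collections import defaultdict

# A quad is a triple plus a fourth slot naming the graph it belongs to,
# which can stand in for "who says so". All names here are invented.
quads = [
    ("Hamlet", "genre", "tragedy", "graph:alice"),
    ("Hamlet", "genre", "revenge play", "graph:bob"),
    ("Hamlet", "author", "Shakespeare", "graph:alice"),
    ("Hamlet", "author", "Shakespeare", "graph:bob"),
]


def claims_by_source(quads):
    """Group statements by the graph that asserts them."""
    grouped = defaultdict(set)
    for s, p, o, g in quads:
        grouped[g].add((s, p, o))
    return grouped


def who_says(triple, quads):
    """List the graphs that assert a given triple."""
    return sorted(g for s, p, o, g in quads if (s, p, o) == triple)


print(who_says(("Hamlet", "genre", "tragedy"), quads))
print(who_says(("Hamlet", "author", "Shakespeare"), quads))
```

With the fourth slot, two sources can disagree about the genre without either statement becoming a context-free claim about the world.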
So, like, and it's like intimately, I don't know, I don't want to just, like, trail off forever on here.
It's, like, intimately related to the tooling problem, where, like, theoretically, and so, like, one of
the authors of SKOS, like, the Simple Knowledge Organization System, like, the ontology and modeling system
for, like, modeling relatedness and similarity, it's how you do controlled vocabularies,
and it's actually quite functional, quite useful, and if I'm not wrong, I think Homosaurus is actually
based on it, that's your underlying how you're modeling this stuff? Yeah, it's SKOS. Yeah, yeah, yeah.
It works pretty well.
And so you'd imagine that a tool like that, where you're able to say that something is a similar match or this is exactly the same as this other thing, would enable this kind of, like, expressive system.
And it doesn't, because doing all of those queries and lookups is preposterously expensive because of the way that it's encoded as URIs, i.e., URLs, i.e., I need to hit a web server every time to actually retrieve this item, as opposed to, yeah, there's
any number of different web architectural models that that could take, but that's the form it took.
And so as a result, like, yeah, it's intimately related to the tooling as well as the
implementation of the technology, like, in the same way that it is a reflection of the ideas behind it.
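The cost argument above can be sketched with a stand-in for the web. In this hedged Python example, `FAKE_WEB` is an in-memory dictionary playing the role of remote servers, and the vocabulary URIs are invented; only the idea of an exact-match link between vocabularies comes from the discussion. Each call to `dereference` represents what would be a network round trip.

```python
# Stand-in for the web: each URI "dereferences" to the statements published there.
# Vocabulary names and URIs are invented for this sketch.
FAKE_WEB = {
    "https://vocab-a.example/cats": [("exactMatch", "https://vocab-b.example/felines")],
    "https://vocab-b.example/felines": [("exactMatch", "https://vocab-c.example/chats")],
    "https://vocab-c.example/chats": [("prefLabel", "chats")],
}

fetch_count = 0


def dereference(uri):
    """Pretend to do an HTTP GET; in real life each call is a network round trip."""
    global fetch_count
    fetch_count += 1
    return FAKE_WEB.get(uri, [])


def equivalents(uri):
    """Follow exactMatch links outward from `uri`, one fetch per concept."""
    seen, todo = set(), [uri]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for predicate, target in dereference(current):
            if predicate == "exactMatch":
                todo.append(target)
    return seen


found = equivalents("https://vocab-a.example/cats")
print(len(found), "concepts,", fetch_count, "fetches")
```

Three concepts cost three fetches here. Resolving every term in a large catalog this way multiplies that by the size of the catalog, which is the expense being described.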
Yep, right on.
Yeah.
And so how are we doing, Sadie, clear as mud?
Yeah, just about. Like, I think the thing that gets me about linked data, and, like, I haven't gone to
library school, I have just, like, the barest knowledge of cataloging and that kind of thing,
is, like, I'm a very practical, hands-on person. So, like, I have to dig into a system to be able
to, like, really see how it works. And every time I have tried to do that,
to even think about linked open data, I'm like, I don't see how this is
usable. So, yeah, like, you talking about how there needs to be
tools to be able to use it, it sounds like the heart of the problem with a lot of library technology,
where, like, I keep saying this, there's a very small selection of vendors
that have a very large amount of control, and they just keep conglomerating together. So there's, like, three now.
And somehow libraries, who are the ones using the tools, are the most powerless people
in the whole ecosystem of it, right? So a big topic at my work lately, and maybe a tangent
here, is why the fuck are we still using SIP2?
Like,
like,
yeah, can't blame me on that one.
Like, I don't know
if you're, like, familiar with SIP2, Jonny.
It's basically a protocol.
So integrated library systems,
ILSes, are the biggest software that libraries use,
like, to keep track
of all of their stuff, and SIP2 is basically the protocol that passes information between, like, these
systems, right? So, like, a lot of vendors use SIP. So, like, OverDrive, you know, OverDrive has
to know what you already have checked out to be able to enforce your limits. Like you can only have
five books checked out. So it uses SIP to query that information from your library system, right? It is
entirely unencrypted.
Clear text
unencrypted.
And has been its entire life.
And SIP2, which is different
from the IT SIP, which is a VoIP protocol,
which causes no end of confusion every time,
like, every time we have to talk to a vendor's IT
to figure out how to set something up.
I just totally gave myself, if a single one of my co-workers
is listening to this, I just absolutely gave myself
away, because I've had this conversation so many times. But yeah, it's like, and it's been in
use for so long, and across all of these ILSes, it's the only one that is actually usable,
like, actually, what's the word I'm looking for, agnostic, system agnostic. So it's starting to be
replaced by a lot of APIs, but each API for each system is its own thing. So you have to wait for
other, like, you know, oh, we could do this API. We could do, I don't
know if this is true, we can do Alma's API, but we can't do Sierra/Millennium's API.
So it's just like, in IT, it's just like, why the fuck are we still using this? And then we talk to
people like vendors and they're just like, well, what's the problem? And we're like, it's completely
clear text and requires extra tunneling to be able to actually keep our patron data,
like, not readable over the internet. And it's sent all over the entire internet for anybody.
And, like, looking at the strings, it's literally, like, library card number, name, address, you know, number of checkouts.
Like, it's just so ridiculous, and people are still just like, well, I don't understand what the problem is.
Until you talk to an IT person and you say it's in clear text, it's completely unencrypted, and they go, oh, that's bad.
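Why "clear text" is the whole problem can be shown in a few lines of Python. The message below is made up: it is a pipe-delimited patron record in the general spirit of what is described above, and its field names are illustrative, not the actual SIP2 field codes or message layout. The sketch only shows that unencrypted bytes need no key or special tooling to read.

```python
# A made-up, pipe-delimited patron record. The field names are illustrative
# and are NOT the real protocol's field codes.
message = "card=21234000123456|name=Pat Patron|address=1 Main St|checkouts=3\r"

# What actually travels over the network when nothing encrypts the connection.
on_the_wire = message.encode("ascii")


def eavesdrop(captured: bytes) -> dict:
    """What anyone who captures the bytes can do: decode and split."""
    fields = captured.decode("ascii").strip().split("|")
    return dict(field.split("=", 1) for field in fields)


print(eavesdrop(on_the_wire))
```

This is why the traffic has to be wrapped in some encrypted tunnel, as mentioned above, before it crosses a network the library does not control.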
But no libraries have like the power to go to these freaking vendors and just be like, you have to figure something else out.
Something has to be worked out.
But it's going to end up being, you know, OCLC who does that kind of stuff or something like that.
And then, yeah.
It would be nice out, right?
And they're vendor panties.
That's all we are.
In a lot of ways, yeah.
Yeah.
Yeah, what was it Bree said in the ScholComm Discord? ACAB includes
NISO.
Yeah.
Yeah.
Absolutely.
So like,
I still don't think I understand entirely what linked data is,
but I do think that I like,
I can start to get to it.
If you know what I mean.
Because yeah, like it's, it's just, it's a system.
It's a system to connect data to other data in meaningful ways.
And it once had the promise to actually help,
libraries figure shit out, and it has completely kind of shit the bed on that. Is that an accurate
summation? That's completely accurate. I still have tiny little sparks of hope. I do.
Did we describe why it's called the semantic web? Oh, I don't think we did.
Jonny, I'll leave you that one. It's a really simple story. It's like, web happened, right?
And so web is documents with links between them.
But those links are meaningless.
They're just the relationship from one page to another.
And it's hard to imagine this in retrospect of a web without search engines or without any sort of overlay to them.
Because basically the way that everyone interacts with the web now is either through search or through some mediating discovery mechanism.
Like you don't just go on the web and then go to a URL and then just be.
like, well, I'm here now and just like, I've found the internet.
And it's like, yeah, so like, that's like the way that the web was sort of designed and like
the way that was supposed to work is it just like it would be self-organizing where the like literally,
like when you go back to like the founding obviously, it's like, we will just have people that have
lists of links on their personal websites and they will link everything together and then just
like people will find their way from these like local nodes of meaning.
And the imagination there was always that just like the web would be super easy for the average person to make a website on.
And that just like everyone would basically have one.
And that didn't work at all.
Not even close.
Not even from the very beginning.
Like, you know, even the ultra nerds that were on the internet at the very first part of it still gravitated towards sort of, like, mediating platforms like bulletin board systems, et cetera.
So the semantic web was supposed to be a way of encoding computer-readable information into the protocols of the web, and specifically into HTML documents that are, you know, that are XML, a dialect of XML.
I don't even know how to describe the relationship between HTML and XML.
But, like, so that it would be possible to both annotate a given page and then also be able to link them together, so that you'd have this sort of, like, you know, coexistence between documents that people are on
that have, like, you know, human-readable text, and then embedded within that and embedded
between that are just sort of like, "in this paragraph, I'm talking about this person." And, like,
then I can sort of, like, say, go to that page and theoretically go and find backlinks to all
the times that that person was mentioned, or something like that. And so that's, like, why it's called
the semantic web: we're adding semantics to the web, which formerly was just sort of, like,
naked links and documents. Yep. Like, the computer could understand
that Jonny is a person because it knows what those URIs are and what they point to,
and it can then tell what the relationship between those is, not in a way where it knows what a person is,
but it knows what this URI is.
And if you use this URI, then it sees other things that have that URI and knows that they're people too.
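Both halves of that description, backlinks and "knows they're people too," fit in a short Python sketch. Everything here is a placeholder: the `example.org` URIs, the page annotations, and the `PERSON` class URI are invented, and this is not how any particular semantic web toolkit works. It shows that the machine never learns what a person is; it only notices that two things share the same type URI.

```python
from collections import defaultdict

PERSON = "https://example.org/Person"  # placeholder URI for the class "person"

# Pages annotated with the URIs of the things they mention (all invented).
mentions = {
    "https://example.org/page1": ["https://example.org/jonny"],
    "https://example.org/page2": ["https://example.org/jonny", "https://example.org/dorothea"],
    "https://example.org/page3": ["https://example.org/dorothea"],
}

# "Is a" statements: the machine only knows these URIs share a type URI.
types = {
    "https://example.org/jonny": PERSON,
    "https://example.org/dorothea": PERSON,
}


def backlinks(mentions):
    """Invert page -> things into thing -> pages that mention it."""
    index = defaultdict(list)
    for page, things in mentions.items():
        for thing in things:
            index[thing].append(page)
    return index


def same_type_as(uri, types):
    """Everything else that shares `uri`'s type URI."""
    return sorted(other for other, t in types.items() if t == types[uri] and other != uri)


print(backlinks(mentions)["https://example.org/jonny"])
print(same_type_as("https://example.org/jonny", types))
```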
And there's a certain amount of magical thinking that like,
because language sort of works this way, that it's like entirely relational,
and metaphor-based and like, you know, the meaning of a word is only sensible in context of
surrounding meanings and contrast with similar, you know, that just like meaning would emerge.
And like, again, like, that's sort of true.
Like, language does work like that.
Just like sort of local negotiations over meaning and indigent.
But like, you need to have the people there negotiating in order for it to work.
And that never really existed.
So just like, so like there's, and it sort of like points to one of the
salient features that is both like, it's like, you know, eerily prescient, but also just like
another one of these like critical pieces. We're talking about just like the missing tools
is like from the very beginning, like there's this 1999 piece in Scientific American by
Tim Berners-Lee that was like sort of like the public announcement of like, you know, the existence
of the semantic web as a product. I remember reading that. I was at work. I remember reading it.
And so it's this wonderful document and just like that like is like this very pie in the sky kind of system of, you know,
beliefs about just like what it could be. And like there's a bunch of just like really basic and obvious things that like, wow, we should really have the computers work like that.
Where just like, you know, like the idea that I have a calendar appointment or whatever.
Why can't my computer know that like I also have a photo that was taken on that day?
So I can just like say, computer, find me the photos that were taken during this appointment on my calendar or something like that.
So like a sort of universal acid for this data where just like I can just relate, you know, totally heterogeneous systems between one another.
But the part that's like really like, you know, come to be, we all like thinking about just like AI is like, you know, this year and this last year being like that it was always going to be dependent on compute.
that it's just like there's metadata there,
but even from the very beginning,
you need what Tim Berners-Lee was talking about as agents,
like as about just like little bots,
little scripts or whatever,
that are running around getting all of this metadata around.
And this is like around the time when Google
and like the first algorithmic search engines
were starting to exist.
So like this idea of crawlers
ingesting this information and making sense of it
was like a relatively new one,
especially like at a mass scale like this.
And like, but that's always been the tension, where just like, like Sadie's talking about, like, what is it, where do I touch it, how am I supposed to use that? That was sort of always the intention with that: you would have like a little computer butler thing that would just like be going out, and you have your own set of commands, just sort of like, go get this for me, go fetch this for me. But like, yeah, again, it's never really materialized, just because like, you know, how, with what infrastructure, does the average
person have a constantly running bot that goes out and scrapes the web for them all the time?
And so even from, yeah, from like there are a couple of moments in the history of the semantic web of like,
you know, times when Google basically bought it. And that happens actually several times.
Where just like a good, this sort of domestication of this process where like now like when you
think about it's like, where does it exist, how does it exist? Pretty much the only way that
people usually interact with it is like the metadata,
the Open Graph metadata, and like, well, Open Graph is slightly different, but like the JSON-LD
document that you'll have at the top of your website header that is just like using schema.org
terms to say that this is a website about an organization or an event or whatever.
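A rough sketch of what that header block looks like, generated from Python. The function name, the organization name, and the URL are placeholders invented for this example; the `@context`, `@type`, `name`, and `url` keys are ordinary schema.org and JSON-LD usage, but this is an illustration and not a complete markup recipe.

```python
import json


def jsonld_script_tag(name, url):
    """Build a <script> element carrying schema.org JSON-LD for an organization.

    Hypothetical helper for illustration: a real page would usually describe
    more properties than just a name and a URL.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
    }
    return (
        '<script type="application/ld+json">'
        + json.dumps(data, indent=2)
        + "</script>"
    )


if __name__ == "__main__":
    # Placeholder values; paste the printed tag into the page's <head>.
    print(jsonld_script_tag("Example Library", "https://example.org/"))
```

A crawler that understands schema.org can read this block without parsing any of the human readable page text, which is the constrained form of linked data being described.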
And like as Justin was saying in the beginning, just like sometimes it makes the Google info boxes
work.
And like that's pretty much the most concrete realization that the average person has for linked data
on the average, and that's because who owns
the crawler? Google owns the crawler. And so it becomes something where you make metadata
available to be crawled by Google in this very constrained, commercially focused context.
But it's not a system of expression. And just one more thing is like there's like this other
technology, like RDFa, like this dialect of RDF, which is supposed to be like the thing that goes
embedded in documents where like as I'm writing, I will tag a particular paragraph as, you know,
with some semantic web tag or something like that.
That's arguably one of the most attempts
at making a human linked data interface for that.
Where just like, you could imagine
I have like a document editing software or something like that
and I can highlight a sentence
and add a tag to it or whatever.
It's like actually embedding this in documents
that people actually use.
That is actually no longer supported
by the main RDF parsing
library, RDFLib in Python, because it's complicated to parse, but also it's just sort of like,
that's not really the important one. It's like, you know, for all these like mushy positional
document tags and stuff like that, and people don't really want to know the information in
context, they want it all split out into like, you know, something where I can do an HTTP
request and just get the headers and that's it. And so like, it's like, it's just, one of these
this mutating landscape of technology always ratchets more and more towards.
It's intended for doing the big web of open data that you're not a part of,
but you get to experience through platforms.
And a lot of platforms are, in fact, powered by linked data, at least, if not RDF,
knowledge graph, TM derivatives of that idea, where, like, it is an extremely powerful
set of ideas, but not for you. So if you, but if you are a company that exists as a giant
conglomeration of data sets that you've bought by acquiring smaller companies over time,
it is an incredibly powerful system for integrating all of that information being able to
do complex queries across them. So in that piece of surveillance. For Tim Berners-Lee,
not for thee. Exactly. And increasingly for the surveillance state and just like the
people who have this nightmarish, multi-sided market of selling your data to insurance providers
at the same time as selling it to police at the same time as selling you back a little slice of
it as well. So, like, yeah, the way that it exists now is largely in the shadows and that's
by no means passive effort. There's an active corralling and an active, you know, domestication
of this set of ideas. Yeah. And to bring it back to tooling for just a second,
some of the more pro-social, I guess I will use that word,
experiments in this space, like Wikidata, for example,
are already running up against the absolute limits of what you can do with linked data
if you're not like Google.
They've already, and the technical details here completely escape me,
but Wikidata has gotten too big for its britches.
The infrastructure literally cannot cope with it anymore,
So they're sharding it, is my understanding.
They're kind of splitting it down the middle
and figuring out how to get the two shards
to talk to one another,
which I'm sure is really exciting technically,
but wow, that's not great
for those of us who are not Google,
but are interested in this technology stack.
Did you see the cause of this issue?
It's the underlying database software,
Blazegraph that it's running on.
Amazon hired away all of the engineers,
So they're, oh, great.
Yeah.
So.
All right.
Typical.
So again, this is like, the big company is literally buying the underlying
technologies.
Where just like, you know, software needs maintenance, you know, that like, that it needs
maintenance and needs constant improvements.
And just like to be able to handle an ever growing stack of triples like Wikidata, you need
to have active maintenance workers.
And like, who pays for open source work?
Like, if I'm the, if I'm a software developer and Amazon
says, here's, you know, 250K a year to make the, do the thing you were already doing for free,
then it's like, sure, I have a family. You know, I, you know, I'd like to have like, you know,
go on vacation sometimes. And so like, yeah, it's just like, yeah, that actively, that, that
there was another moment of like, yeah, actively poaching away the talent so that like the
underlying technology can. And I will say for all that we are cultural heritage organizations
founded on the idea that culture should persist,
we're very bad in libraries and archives at admitting
that software needs maintenance, that standards need maintenance, right?
That's the SIP-2 problem in a nutshell,
though that was proprietary, actually.
So Ruth Kitchin Tillman and I wrote an article,
got published about a year ago,
about the ethics of linked data sustainability.
You can find it open access online.
And we took a pot shot, actually.
Okay, we, I took a pot shot.
This one was my.
At information scientists, okay?
Because there are too many information scientists
who are serial project and standard abandoners, right?
They get grant money to do this fancy-dancy thing,
and they get as far as it being implemented in libraries.
and then they just wander
away to do the, to write the next
grant application and do the next
fancy-dancy thing, and then it rots.
Totally. Right, whatever they built,
it rots, because inevitably
they didn't build it right in the first place.
I'm totally thinking about OAI-PMH here
since we have some, some
scholcomm folks in the room. But SIP2
is another beautiful example.
Gosh, we are so
bad at versioning stuff.
It's a really basic idea.
You gotta version stuff.
You can never get it right the first time.
So, yeah, I in that article took a pot shot at serial project abandoners and said,
funders, stop funding them.
Ask what happened to their last three projects.
And if they're dead in the water, that's a black mark.
For real.
Yeah, this is a general issue in any sort of, like, you know, publicly funded tooling space.
Is that just like, like, I was allegedly on some
review panel for some funding agency that is theoretically talking about software sustainability.
And that was a completely novel concept that just like what we want to do is we want to fund
sustainable software ecosystems.
That just like we're not trying to start a new project.
We're not trying to like, you know, fund the new feature.
But just like these are the already existing things that are happening in open source.
And let's just keep that going.
like paying for like stuff like documentation and like making the tests work and like you know years and
years of technical debt and like security audits yeah totally yeah and please yeah and so this is like
one of one of my entry points in the thinking about semantic web and thinking about just like linked open
data was just like initially thinking about because i was like living with someone who is like working in
metadata in a library at the time and there was this like increasing cry of just like the we all know
the journal system is broken. And like, there's this recurring strain of papers that are just sort of
like, let's just like make the libraries do it. You know, just like that just like we can sort of like
get libraries to host a bunch of journal like things, journal like overlays or whatever, completely
ignoring the reality of work and the reality of bureaucracy in libraries that just like,
and, and so like, you know, you wonder who I'm talking about that.
Oh, I don't have to wonder.
I didn't talk about it out.
Yeah.
And so, like, that just like, this is where, like, on the one hand, it seems like an obvious thing where just, like, of course, like, it seems like libraries in general in the abstract should be invested in just, like, you know, maintaining some their catalogs at least.
But just like, also the all the other things that just like, you know, that are being archived and cataloged and just like, you know, exist in libraries and just like making that as available as a public catalog.
Like, sure, surely they're already doing stuff like that.
So it shouldn't be that much of additional effort to have an institutional repository that acts like a journal and can link together these things.
But, as you all know.
I keep going back to tooling.
Yeah.
The tooling was shit.
The tooling for open access is and always has been shit.
Right.
Yeah.
And so it's just a matter of like, definitely.
like there is this universe of like,
where like, okay, we could get sort of some of these things aligned,
like funding priorities for maintaining sustainable software.
Okay, if we can then like get some sort of like IT consortium
to help out with like maybe, you know, quote unquote, public cloud.
So it's not the case that just like every library needs to have like an on-prem IT team.
And just like there are some of these things that could like lock into place that just could theoretically make some of this work.
but just like, that's just not the way academic work is done generally.
That's just not the way it's structured to make these sort of like long-lasting
infrastructural efforts.
Like, as you say, that these are just like grant cycle to grant cycle.
Let's just like ride to the next thing.
And even within, so like part of my role in the last six months of work is like I'm
working with actually a lovely group of people who I like and they have welcomed me in,
so I'm not trying to speak ill of them at all.
But this is a linked open data project.
And basically what I've been trying to do for the last like six months is like pay down technical debt.
Where just like there's this like really good idea of this like this way of having authorable linked data schemas.
It doesn't require you to be part of the priesthood to be able to describe what exists in your reality.
But it's just like it didn't really work.
It's just sort of like that it's just like the people that are concerned with the
modeling part about the like what you know what is this kind of thing do we put it in this category
like this like are not usually the same people who are just like going to be able to write a really
good implementation of that and so like trying to figure out how to make those collaborations
happen as well because this is another point where like I don't see this as a thing that really
could exist or come from any sort of startup like rest in peace to the solid project which I have
been trying to find for several years and I keep seeing little promising scraps of it. But this is like,
so solid was like the thing that Tim Berners-Lee was like, this will be the semantic web, like,
the thing that we're trying to like do to, so it's like, he had like a crisis of conscience. Like actually
the web sort of sucks. Like, like, I think around like 2015 and 2016 and like, you know,
starting to be just like, okay, let's try and make solid as like a way for people to do the like the more like
vernacularist dream of the semantic web, where I have my, like, this, now they're talking about
like activity pods. Like, I have my little unit of my semantic web, like, information graph.
But that quickly got bogged down in the academic cycle. No one could manage a project. Then they
spun that off into a startup. And wouldn't you know it once that happened, then it became owning your
own data was a bug, not a feature. And so now you have to, you're supposed to be pushed on to, like,
renting a cloud server for it and so on. So I think that just like this doesn't come from
startups or from any sort of like company. It also doesn't come from the scattered weights of
open source world. You can't just like ask people to do it for free. And it also doesn't come from
just like local efforts of like trying to make tools for like an individual institution. And so
just like what's left is like we need to use some sort of public funding and try and
rally public funding in a way that it's not designed to be allocated in order to like make these
kind of technologies and also the belief that there should be these technologies in the first place
in order to make that real and so like that's this just like this an unending knot of like who do we
who is the next little thread that we need to pull in order to make this large tapestry but then like
you you're dealing with 25 years of baggage at the same time so
it's like a lot of the people that are still in that space either have distanced themselves from it
and look back on it with this chain of mixed emotional memories, but I don't want to touch that anymore.
Or they're like in some way still true believers that just like, what do you mean?
Nothing is actually broken.
It's totally fine.
And like you just need to learn how to do it good.
So.
Awesome.
Yeah.
So, like, and so this is like one of the reasons why I'm just like, like, we were talking about this earlier today as being like that in some ways, like, talking about like serial project abandoners, protocol abandoners.
And just like there needs to be like a break in a way that's like backwards compatible.
We bring the past with us or like or have some way to like carry it through with us.
But we're not beholden by all of this baggage that.
And and so I don't know.
Like I'm talking about just like what happens in the future.
I guess, I don't know if we've even gotten past the expository part of what even are we talking about yet.
But like, maybe I'm jumping the gun there.
But like, yeah, last thoughts on that idea is like that's another part.
Like the twin entry points for me into this whole line of thinking are just like thinking about just like what could be an alternative to scholarly communication and publishing.
It just like it should be possible for me to throw stuff up on the web and then have it be part of.
of this sort of blob of information without a lot of gatekeepers in the way.
The other part of it is that even long before I got interested in,
I keep coming across these various like graveyards of things that are just like,
this is a really cool idea, like a browser extension that, like, everywhere I go,
I can make sort of personal annotations and not just like bookmarks,
but just like I highlight this section and then I can relate it and share it to my friends.
like, oh, actually, that extension was for like Netscape 6.0 and like was abandoned 20 years ago.
And like no one has thought about this ever since. And just like this long string of just like dead projects that are that are exactly like this.
Because again, like thinking about like the kinds of open source projects that work and like are sustainable are usually ones that have some material tangible benefit for the people that use them day to day.
Like, this is a tool I have active use for, or they're baseline, behind-the-scenes infrastructural work that like a lot of companies will just like sort of rely on. For this niche of technology, what you have to have in order to use it are, A, a website, so that rules out 99% of all people, and then, B, like a website where you are deeply in control of the HTML that goes on that page, and that
rules out 80% of the remaining 1%.
And so, like, there just, yeah,
there never was a time when it had like an actual practical use.
And this is something that just like gets called out as early as,
the earliest I've seen of people saying,
what is the point of all this was like in 2005 and 2006,
where just like there's a series of these blog posts of just like abandoning the
semantic web.
It's like, no one actually figured out why we're doing this at all.
Like, there's one interesting example of, like, music annotation,
or just, like, it's sort of like a peer-to-peer-ish music system.
And then that's it.
Like, the rest of it is totally pointless.
Like, why would I ever do this?
And the first, like, invest all this time into learning these incredibly complicated parts of it.
Because, like, one of the things that we're missing in the exposition stack is,
exposition section is, like, the sort of stack of things that the data is.
like you have the triples part which we talked about,
but then you also have like ontologies and schemas
and just like the way that these things all sort of relate together.
And it took me a year to even figure out what these meant
and like what they look like and why they existed
and just like why is a schema different than an ontology?
Like that seems like the same sort of thing,
but like different roles in the ecosystem.
And also definitely different.
But like, just to say that just like,
Why is neither of them a record constraint language?
Yeah.
Ontology means that your professor goes on tangents about first order logic when you're learning it.
That's right.
Yeah.
Yeah.
And schemas are on schema.org.
Exactly.
That's how you know they're schemas.
Also, was the music project you were talking about Linked Jazz?
I will look up this.
It's in this blog post abandoning the semantic web.
I'll see if I can find it.
Because Linked Jazz rules.
Yeah, that's a great little site. I love it.
That was like the first I ever heard of linked data.
I was like an undergrad still working in a music library.
Sure.
And my, and my like mentor, professor, or not professor, my mentor, like, boss was like, this is the coolest thing I've ever seen in my life.
Well, and music in particular in a library context is actually a really wonderfully subversive place for linked data to get a foothold.
Because MARC for music
So bad.
It's terrible.
Music cataloging, like music copyright,
is something that even seasoned professionals will not touch.
Yeah, music cataloging has its own rules.
I mean, heaven, blood, but wow,
MARC was just not designed for that, and it shows.
Oh, it shows.
It shows.
Yeah.
Back to the explaining,
part of things as well.
One of the main benefits always sold about linked data is that since the web is sort of a page
or document focused sharing of information, this would allow subsets of information to be
pulled.
Like Jonny said, pulling like all the headers from an article with a request.
The thing is that like without like, I could pull 9,000, I don't know, 500 fields from a
MARC record.
What do I need that for? Because I don't know anything about the context of it without the full document.
Plus, I'm guessing that's probably why it's so computationally heavy is that everything has to be done through servers, whereas documents can be retained locally.
And it's mostly just text files, right?
So it's sort of the same problem blockchain had where everything had to be done computationally.
And that's why it took 20 minutes to buy a donut because it had to get pushed out to like 20 ledgers.
And instead, this is like, if I want to query information, it has to go through different servers, which I think was kind of the idea of websites that heal.
I have it pulled up.
It's John Rhodes blog posts.
But when Jonny was talking about bots, I think that was the idea was websites, like link rot would happen between them.
And eventually bots would just kind of communicate server to server constantly and then just fix links.
And they would heal themselves.
and that was kind of the idea.
And that blog post ended with,
if anyone wants to write this, I'll help.
But until then.
But that's the thing is like,
it's very difficult to do that
because if you've ever worked with like government websites,
particularly like healthcare websites,
every presidential administration stuff moves entire divisions of the government.
And so they're on completely different domains.
And that's why government websites always break.
And like really important ones.
And that's also why the,
the government tends to do a lot of like dot coms now where it's just like health care.
Healthcare.com.
Okay, just go there and we'll point it wherever it ends up because trying to keep, because
I was an allied health librarian and trying to keep those pages about like the Affordable Care Act
up to date in LibGuides.
I mean, thank God LibGuides has a very good link checker.
But I constantly had to run that link checker because those things broke all the time.
They don't even keep their PURLs or whatever it is
they use, because like one year in grad school I was the gov docs library graduate
assistant and half of my job was just like going through SuDoc stuff and then also like checking
the PURLs or whatever permalink system that government websites and online gov docs use
and just finding all of the broken ones which was all of them um they don't like they don't even
maintain their permalinks. Yeah, which is the point
of permalinks, is so that the back,
that the URL itself can change.
Well, I don't know, was it OCLC again?
Yeah, always.
Yeah, that was actually the very example
that Ruth and I wrote about in our piece,
was OCLC and purl.org,
which was not originally OCLC's.
It was a grassroots little thing for,
okay, here's a place where you can mint
permalinks and we'll keep the database
of where they point to and everything will just work
and we'll have a happy permalink utopia.
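A minimal sketch of the mechanism being described, assuming a permalink service is essentially a lookup table from a stable path to whatever the current URL is. The class name, method names, and example paths are invented for this illustration and do not reflect how purl.org is actually implemented.

```python
class PermalinkRegistry:
    """Toy model of a permalink service: stable path in, current URL out."""

    def __init__(self):
        self._targets = {}

    def mint(self, path, target_url):
        """Create a new permalink; refuse to silently overwrite an existing one."""
        if path in self._targets:
            raise ValueError(f"permalink already exists: {path}")
        self._targets[path] = target_url

    def retarget(self, path, new_target_url):
        """Point an existing permalink at a new location.

        This is the maintenance step: a person has to notice that the
        target moved and update the table.
        """
        if path not in self._targets:
            raise KeyError(path)
        self._targets[path] = new_target_url

    def resolve(self, path):
        """Return the current target, or None if the permalink is unknown."""
        return self._targets.get(path)


if __name__ == "__main__":
    registry = PermalinkRegistry()
    # Hypothetical permalink and targets.
    registry.mint("/gov/report-42", "https://agency-a.example/reports/42")
    registry.retarget("/gov/report-42", "https://agency-b.example/archive/42")
    print(registry.resolve("/gov/report-42"))
```

The table only stays correct while someone keeps calling `retarget` and while the organization running the registry keeps it online, which is the maintenance point the speakers keep returning to.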
And then with absolutely no warning,
some years after
OCLC
took over purl.org
and made a very loud
statement about how it was very important
and they were going to maintain it.
And definitely, it broke.
They broke it.
I don't know the details.
I think the person who had been maintaining
it left, retired, who even knows. But purl.org just completely broke. OCLC, of course, didn't
give a fuck. And it remained broken for like several years. And now the Internet Archive eventually
took it over and they don't give a fuck. So you can't actually get any support for it. And a bunch
of innocent third parties who believed OCLC's lies and gleefully minted all kinds
of PURLs because they thought that infrastructure was going to stick around
got burned, right?
This idea that Justin, I believe, was talking about of self-healing websites.
Right, that is nonsense.
That is garbage.
The world does not work that way.
The world needs maintenance.
Yeah.
And so there's like the whole nest of ideas about it's like roads not taken in the internet with a lot of this.
Because it's like, I have the same feeling about just like
permanent IDs as I do about just like in general when I see like yet another platform for scholarly communication, or like, we're going to fix the ills of like academia by making yet another platform, is that just like this is intrinsically a political one, and it's one where you are putting power in the hands of a specific organization, and the longevity of that is strictly social. Where just like, it's the same way, it's like
permalinks exist as long as the organization exists.
And so, like, I have, in general, sort of, like, more faith than average that
archive.org will continue to exist in the next year, although they're sort of, like,
damaging that reputation lately to sort of, like, like, just like, you know, anyway, we won't
go there just being sort of like, I think that they have good longevity plans for their
archive of the web. Okay. But, and I also, in general, think that,
like the DOI system is probably not going anywhere.
That's largely because it's like,
you know,
one of the mechanisms for extracting billions of dollars
from public funding every year.
Then just like,
so there's like social reasons why these things persist.
But it's like there's the major thing that was not taken.
Like why the like as you're saying is like the web doesn't work in such a way
where it would be possible to do self-healing websites or self-healing links
is because it's designed to be a client to server.
you go to a place and get something that someone else controls entirely and like you're not
actually supposed to have any agency in this in this world and like there's good reasons for that
don't get me wrong but just like this is like one of the true things about linked open data is that
just like it needs to be peer to peer like the thing that like the way that it could conceivably
work is as a peer to peer system where that's where there it's possible to do efficient querying
and caching between a bunch of different peers.
So it's like designed to be distributing labor in this way
instead of every time someone updates a link or makes a new record,
everyone has to go and hit this one server to get this one, you know,
URI that represents this core concept or whatever.
That just like, and so like as long as that doesn't exist,
there's this duality of this beautiful idea of,
of basing semantic web and linked data on URIs.
Is that just like, okay,
and elegant simplicity of this idea
that the identifier is actually a location,
that location and identity are the same thing.
And when I go to that location,
I'm supposed to get something useful from it,
and then that allows me to go to the next thing.
That's like a wonderful, wonderful idea.
But in reality, it doesn't work at all
because identity and location are not the same thing.
that like I didn't and you know for one one reason is identities change and like that like that like
and so like there's this like, you know, classic thing that everyone always references on the web, is that it's like, Cool
URIs don't change. That's another Tim Berners-Lee classic. It's like, actually, all URIs change all the time.
And like, if you have a polemic trying to force something to behave in a way that it doesn't,
rather than adapting to the reality of that thing, then just like, yes, you buy yourself an infinite failure. And so like, one of the, there's this...
Raise your hand, please. I just want to jump in. Yeah, we do the raise hand thing too. Like you can keep going
and then when you're done Sadie will say something. But also just like interrupt. I actually would start
trying to make some notes to organize this thought because this is a long idea. So like, but like, yeah, like
Oh, I just, just, I, I've been thinking a lot about the, the purpose of a system is what it does.
Completely.
Right.
Not what it thinks, it's not what it was designed to do, because we all know how design goes awry.
But yeah, the purpose of a system is what it does.
Right on.
And I just think about, I don't remember where I saw that.
I love systems theory.
Yeah, right.
If anyone has ever maintained a website or any sort of web technology, where just like, if the intention of this thing is to be liberating and freeing, it certainly doesn't feel that way, that just like, that like, you know, what it would take to actually maintain a URL for forever.
Like, if that's the way the web is supposed to be, that's the purpose of the web is to, like, put these documents on the web.
Like, it didn't, it doesn't do that.
So, yeah, exactly.
That just like, the purpose of the system is different.
Where just like, and like, again, like, thinking about just like all the ways that the technical development has been stunted by the, you know, commercialization of the web that just like precluded a lot of these things from existing is like, it's not an accident.
And so like, one of the ways that linked data is working en masse right now in a pretty invisible way is the Fediverse.
And this is like what we were talking about the last time I was on here. So I won't belabor the point.
But it's just like that that's built on linked data, at least in the abstract.
And there's sort of a fascinating, like,
realization of that where just like, like, for example,
like Mastodon, like the largest implementation of that, does not actually use
linked data as its internal data model.
That's all like a Postgres database that then it sort of just like synthesizes
JSON-LD out of.
And like, there's benefits and tradeoffs there, where just like, as a result,
it sort of doesn't do all of the linked data parts of what ActivityPub
was supposed to do.
But there's the other,
like, one other major
alternative to this, which is Pleroma and Akkoma,
like, the fork of Pleroma, that is based on a graph database.
And that can do a bunch of really interesting things,
but it also is, like, always crashing all the time
and, like, sort of hard to...
because it's like, you know,
thinking about just like,
because social networks are networks,
it's, like, easily modeled by a graph.
And so doing something as simple as just like... there's this notion of, like, containers and ordered collections and stuff like that in ActivityPub. And, you know, I obviously have lots of feelings about this particular spec, but one of them is: I have this notion of who I'm addressing my message to, and I should be able to address it to whoever I want to. I can address it to this one controlled ontology term, Public, and that's just like, I'm sending it to
the world. But also it should be possible for me to have collections of people, and I can
address it to this collection of people. And so in that way I have a graph, and
all the relationships are modeled within ActivityPub as being like,
I'm allowed to send it to these people, and I want to send it to this subset of them in this particular
case. And so you can do stuff like that in Akkoma and Pleroma. Like, the UI for it is a little
less than what could be desired, but that's not something you can do in Mastodon,
where each one of those addressing features has to be carefully architected from, like, as a database query.
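The addressing model described here can be sketched roughly as follows. This is a minimal illustration, not how any particular server implements it: the actor and collection URLs are made up, and `resolve_recipients` is a hypothetical helper. The one real identifier is the ActivityStreams public collection, `https://www.w3.org/ns/activitystreams#Public`.

```python
# The special "Public" collection defined by ActivityStreams / ActivityPub.
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

# Hypothetical collections: each is just a named set of actor URIs.
collections = {
    "https://example.social/users/jonny/followers": {
        "https://a.example/users/alice",
        "https://b.example/users/bob",
    },
    "https://example.social/users/jonny/collections/close-friends": {
        "https://a.example/users/alice",
    },
}


def resolve_recipients(activity, collections):
    """Expand `to` and `cc` into individual actor URIs.

    Returns (recipients, is_public). Collection URIs are expanded to
    their members; anything else is treated as a single actor.
    """
    recipients = set()
    is_public = False
    for target in activity.get("to", []) + activity.get("cc", []):
        if target == PUBLIC:
            is_public = True
        elif target in collections:
            recipients |= collections[target]
        else:
            recipients.add(target)
    return recipients, is_public


# A post addressed only to a subset of the people I could reach.
note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://example.social/users/jonny",
    "to": ["https://example.social/users/jonny/collections/close-friends"],
    "object": {"type": "Note", "content": "Only for a few people."},
}

recipients, is_public = resolve_recipients(note, collections)
print(sorted(recipients), is_public)
```

In a graph-shaped store the collection is just another node to follow; in a relational store each kind of addressing tends to become its own hand-written query, which is the tension being described.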
So, like, there's this tension of just like, okay, we try and do it the semantic web way.
It has the beautiful possibilities, but it's, like, really hard to implement.
And one of the things that's hardest, that was an extremely, like, big reach and was really only, like, done and made to work by just the sheer hegemony of Mastodon as, like, you know, the thing that if it does something, everyone else has to adapt around it,
is like implementing editing.
Like, you know, thinking about just like, I have a post, I want to edit that post.
That means I have to propagate that new version out to everybody else.
And so, like, thinking about just like what it would take to have like these sort of self-healing websites
or just like the ability for the web to adapt to change is like you need to have that expectation
that just like everything that I know about, I should be able to receive changes and be able
to propagate those among the people.
in the same way that just like, that's how rumors and horizontal information transfer works generally is that just like, oh, I heard that this new thing happened.
And I tell my friends about it. And, you know, doing so in a way that's, like, actually safe and that is resistant to counterfeiting is a remarkably hard thing to retrofit into a system.
And so like that's like.
Like how do we make the web actually rhizomatic?
Yeah.
And yeah.
And this is like, again, it goes back to, like, the dawn of the web browser and what it is as a technology, this idea of the read-write web.
Where just like, it should be just as easy to write as it is to read on the web, and, like, you know, obviously controlled by permissions in some way.
But like this, that experiment died basically when Netscape won in the early browser wars.
But then it persisted in the form of wikis and this notion of soft security.
where just like, how do we make that work?
Is we make it so that doing this kind of like, you know,
we allow stuff to happen,
but then make it so it can't damage the system in some profound way.
Where just like, if someone does something they're not supposed to do,
you know,
someone goes and vandalizes a Wikipedia page or whatever,
then like, sure,
the next person that goes and loads that page might see a bunch of vandalism.
And that's bad.
But like, it's not,
it doesn't ruin the page.
It doesn't break it forever and completely.
Like, it's possible for me to revert to the old version of it and so on and so forth.
So, like, that's a radically different political vision than most of the web stack that we're familiar with.
So it's just like, ultimately, for this technology to work, it needs to be constructed on a different set of political primitives that include other people existing and being
able to do stuff, in a way that is very uncomfortable for most of the people who
design web technology nowadays, who think of it as being: I'm going to design a platform that
I administer for other people. And so instead, like, thinking about it as being stuff that is
designed so you get out of the way. Like, the most successful technology that would enable, like,
semantic web stuff is one that no longer requires the developer to be there and allows people to
actually have autonomy on computers.
But again, there's no percentage in that.
It's in fact, anti-profitable.
And so that's a very difficult thing to organize that kind of,
not only a technical vision, but social vision as well.
Yeah.
I always end up just like back in Wiki world.
It's just like some of the most lovely parts of the web as far as I'm concerned.
I'm so curious if I can find this, like, linked data music project
that also is a major.
Oh, so like, I don't know.
I feel like the thing,
when I think about, just like,
survivable web technology,
I would just, like, return to pirate networks
just being sort of like the things that can exist
and do survive on the web.
Where just like,
what are the longest lived things on the internet?
And it's like the W3C website,
just sort of they win by the hell.
But like, other than that,
like pirate networks.
Like, that is the other major answer. That just like, some of those, like, MP3s that were, like,
released on Kazaa or something like that are still floating around.
And that just like you compare that to the extreme adversarial conditions by which
the entire global intellectual property regime is bearing down.
And still it happens.
Like why does that work?
And like, you know, to some degree it's a technological question.
But it's also a social question of just being like, because people
take it as their responsibility.
That it's like, I see myself
as an active participant in this system.
And so when my pirate site gets
shut down, I go to the next one
and put everything back up.
And so, yeah, that's...
Anyway, you've got to love the pirates.
Although there's a huge amount of power
and political problems in those circles as well.
Librarians need to read that, like,
how to form an affinity group zine
and, like, go from there, see what happens.
Totally.
I mean, it's as likely to work as anything, really.
Yeah.
I think one of the practical reasons also linked open data is always difficult is that kind of all files are local files in the same way that like all history is local history.
Because it's always local to somewhere.
Anytime I try and think of, you know, particularly like when you mentioned EADs, there used to be a lot of stuff in the EAD literature about like, why does no one share their local authority files?
Like, you know, like John Fuck Smith donated to the library and we have his name authority file in like our decks, but he doesn't have, like, a Library of Congress name authority because he wasn't famous enough.
Right.
So everyone's got there.
Right, right.
He just had a bunch of money.
Right.
And so so we have all of these people who are local in our local name authority files and they never ever get shared and they always stay siloed.
and there is almost no solution to it because the amount of labor it would take to
disambiguate the names, people who have common names, and is this the same person?
And then who's going to do it too?
Because they barely have enough staff in special collections anyway.
So who cares if like every local donor is going to get their own name authority file?
And I think another thing is, like Jonny mentioned, having, like, the way Jonny used
a word would have to go to a URI.
It's kind of,
when we were talking about taxonomy
last week, and that episode hasn't come
out yet, but
sort of like the issues with
like taxonomy for animals and everything,
you need
like smaller sets of words,
not bigger ones in order to actually
make it useful for humans. So
when I was working with the bird working group,
it was like, everyone
keeps using too many different
words. All we need
to solve this problem is, like, a shortlist.
And then we can use that as, like, user-submitted metadata and tags.
And that's really all we need, is just to agree between us humans.
We're going to use the word paleo-anthropology instead of archaeo-anthropology.
And, like, that's all we had to do, is, like, kind of get people to agree to that.
There's not really, like, a technical solution, because, you know,
the entire bird working group of paleo-ornithologists is like, if they were all
on a boat and it sank, there wouldn't be a bird working group.
Right.
So it's not too difficult to like, it's, it's not an impossible, like, political solution.
And I always keep kind of thinking about is like, we have all these documents.
Yeah.
And they're, it's, it would be nice to break things up into data and share it as linked data.
But as an organization, you don't really need to, depending on the size and scale.
And so that's why, like, so many libraries
have their own. When I think of how a library is organized, it is ultimately, you know,
the reason why MARC is like that is its access points. And it's kind of what we always default
back to is what's the access point for this. And I don't really care semantically, like,
how the data works, as long as like, this is a subject area, this is the title, this is the author,
how do I get to the information, like the quickest possible steps? And then that leads to,
I feel like that's where always the disconnect has been for me with linked open data of like when is this going to help my users in my library?
It's like, well, you can get stuff out into the.
And it's easy for me as a scholcomm person because it's like, I'm the only person who's like, no, I want this out everywhere in the world.
I want everyone to look at this.
But everything else in the library is categorically organized around how do people in here find the stuff that we're looking for?
and I'm the only one who has to flip that and try and say,
how do we get what's in here out to the world with no barriers and restrictions and logins?
Yeah, like, last year, maybe a couple years ago,
I was part of the like PCC ad hoc group that put out the final decision about like,
hey, maybe don't put gender in name authority files.
Because there was the initial one and then a lot of people got mad at that one.
and I was part of the ad hoc,
hey, let's revisit this.
Thank you for your service.
And one of the final sticking points,
like,
because most of us were on board with,
like,
maybe let's just don't.
Like,
it's too complicated to think of any ways
to,
like,
put consistent language,
ways to do this ethically
that's not going to hurt,
like,
trans people was mainly who we were thinking of.
But,
like,
there's other reasons why you might put gender.
And, like, some of the reasons were like, but with, like, Asian names, sometimes it's hard to disambiguate.
And I'm like, that's racist.
Like, that's just lazy and racist.
But the big one, like the final kind of sticking point where we were like, maybe there's a point here, but ultimately, no, we don't care, was, well, in a linked data environment, people could query books about XYZ written by trans authors.
Or, for example, like, you can do a SPARQL query with Wikidata where you can be like,
pull all of the towns that currently have female mayors, or whatever is usually the example
that they use when they tell you what SPARQL can do with Wikidata.
Like, what if you could do that with the library catalog?
Whoa.
And we had to be like, yeah, but no discovery layers, like, Primo doesn't even do that yet.
like no discovery layer right now that's like popularly used by academic or public libraries has that capability.
They might have linked data in the records and they might have APIs exposed if you have a developer who can do neat shit.
But ultimately that's not how those searches work right now.
So maybe it is available in the future, but for right now we don't care and that's not the purpose of name authority files.
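For readers who have not seen it, the Wikidata demo query mentioned above looks roughly like the sketch below. The property and item IDs (P31 instance of, P279 subclass of, Q515 city, P6 head of government, P21 sex or gender, Q6581072 female, P582 end time) are Wikidata's own; the prefixes are assumed to be predefined, as they are on the public Wikidata Query Service. The Python wrapper and the `build_request_url` helper are only illustrative, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# "Cities whose current head of government is recorded as female."
QUERY = """
SELECT DISTINCT ?city ?cityLabel ?mayor ?mayorLabel WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .                # instance of (a subclass of) city
  ?city p:P6 ?statement .                          # head-of-government statement
  ?statement ps:P6 ?mayor .
  ?mayor wdt:P21 wd:Q6581072 .                     # sex or gender: female
  FILTER NOT EXISTS { ?statement pq:P582 ?end }    # no end time, so still current
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""


def build_request_url(query, endpoint=WIKIDATA_ENDPOINT):
    """Build a GET URL for the query (hypothetical helper; does not send it)."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_request_url(QUERY)
print(url[:60])
```

The query only returns anything because someone recorded a gender value on each person's item, which is exactly the design decision the name authority group was weighing.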
Right.
Like, the question of just like, what is it for, what is the point of it, you know, why would I do it if there's no use, is also ultimately really just, like, beliefs about how things are supposed to be designed. Where just like, is the goal of it to be able to get an exhaustive and true answer of all of the cities that have a woman as a mayor? You know, is that the point of what we should be doing with semantic web, to, like,
make the correct information exist in a unified vocabulary?
And like, I don't, like, spoiler alert, I don't think so.
That just like, well, because there's no such thing as, like, the authoritative and complete true archive of all knowledge.
But it's also just like, thinking about it, it's like, well, that's, like, an impressive technical feat that I could put on, like, some sort of tech specs document, that just like, my query engine can produce 10 billion triples in, like,
one second. But like, yeah, like, what's the point of that? And just like thinking about it,
like, in the context of language, where just like, it's also related to the notion of, like,
ontology curation, about just like, how do we come to, like, know the terms that are the one term
to use? It's like, that's only an important question if the goal of it is to, like, make everything
be totally uniform, and also that that act of searching is, like, relatively precious and hard
to do, and, like, I can only do one of these or something like that. That just like, this is not an iterative
process of exploration, and ultimately... and also that just like, you're not able to... So, like, I think about
just like the way that this works with language, where just like, it doesn't ever work with language like,
say a new phenomenon exists in the world, like, we need to get the council of languages together to agree on
the one word for that, and then everyone from then on has to agree to only use that word to refer to
that phenomenon. It's like, that's never how it has happened and it never will be. And just like,
instead just like this sort of local interpretation of what's happening in my immediate reality
and just like you try and use this word and is this effective with it when I say it in this way.
Oh, what I'm talking about is this and oh, I know it is this. And just like the sort of negotiation
of what things mean and in what context and to who. And like being able to have your
personal vocabulary and ontology where just like as your history of your browsing,
It's like, I've come to know that these terms are the same terms or just like when I am in this neighborhood of semantic space, I use this word instead of this word.
And like, then you can imagine like the collective power of something like that.
Where just like, okay, all of my friends know these words as being the same.
And so just like, in general, I can ask around and say, I'm looking for this.
Does anyone know how I would refer to that?
and just like being able to make sense
of just as like an iterative
and a social and an interactive process,
not one that's done
once as if it were like
a database query with a very
controlled database schema that's
like known in advance.
And so like it just changes our expectations
for what technology should look like.
That just like I don't go to the vast
impersonal search engine that indexes the whole
web. But instead I have to
actively cultivate sort of like
like a set of nodes and friends and like relationships and like prior acquaintances with this kind of thing
and then expect it to take a little bit of time to find stuff, you know, that just like that and like,
that sounds sort of counterintuitive.
I'm not saying to create exclusion or create inefficiency, but, like, that just like, the goal of the system isn't to produce maximally true,
maximally numerous and maximally cleanly organized data all the time. And like, it's just like,
it's, I can imagine, like, thinking about just like, what happens, you know, just like, like,
like, just like, why doesn't everybody share their, their, like, local. I actually am not
familiar with this term, like, authority file. I assume that's like, you know, like a local, like,
reference. Like subject headings or like if you publish a book, like your name, how that's in the
Library of Congress. It's an authority file. Gotcha. Yeah. It's also just like one of the things like
who gets to do that, you know, that like, the same problem with just like, you know, libraries and
museums being the sites of just like pillaged cultural artifacts. It's just sort of like not your
job and not your role to be the purveyor of this information, like, about this person.
And it becomes your role because, like, they have no means of doing so themselves. Like,
there's just like, these systems aren't ones that can be touched by the average person.
Like I can't like deposit a book myself in Library of Congress.
I need some intermediary force.
And so like that's just like that like that's like another part just like why doesn't it happen and why doesn't it work is because like on the other end is like,
who is it for?
And should we even do that at all?
Because, like, same thing of just like, what happens when you need to change your dead name in all the bibliometric records?
Like how does that happen?
I freak all my software friends out when I talk about eventually needing to write the anti-performance manifesto. That just like, sort of like... and someone who is, like, a friend on the Fediverse, and it's like, we talk all the time, is just sort of, like, horrified, just like, what do you mean? Software should be delightful to run. And just like, yeah, yeah, that's not exactly what I'm referring to, though. Just being sort of like, that, like, the "we need to get page load time down
to two milliseconds or life will be lost and meaningless as we know it," as just, like, a set of
ideological commitments rather than making stuff be usable by people is the thing I'm talking about.
Oh my God, I'm opening this. I'm opening this. You have an authority file. You have an official URI.
I do. I have a URI. I'm part of the problem.
We shall have many URIs.
yeah I helped I helped
write a book in like 2018
during my first job
hell yeah
Like, one of the interesting things that I think
that Bluesky and the AT Protocol have done
is, like, make it so that, like, domains are sort of meaningful
as identity. Where just like,
that's cool yeah that just like I
have a domain and like control over a domain
and that gives me a source of identity.
Even if it doesn't give me control over the computers
that host the thing that, you know, whatever.
Like, we can talk about that a different time,
but just being like, it's very interesting that just like, that has resurged
and it's actually genuinely useful.
And I think one of the best ideas to come out of it
is, like, actually using those, like, you know, URIs and URLs as just literally,
this can be my name.
Yeah.
Because it's language independent,
human-language independent.
And things like dead naming, which we have to deal with in the authority file environment because it is predicated on names... it's just a URI.
You don't have to do that. You can attach any name you want to it.
So there's definitely...
That's the good thing about URIs is it allows the flexibility for trans names or any other kind of name that might change.
Absolutely.
That's the good part about them.
Yeah.
Love URIs.
That's one thing that I want to keep.
one thing.
Out of all this nonsense, URIs as identifiers was genuinely a clever and useful idea.
Yeah, like, it was a big deal when the Homosaurus moved from having the terms be the URIs to having alphanumeric URIs so that we could change terms as language use changed.
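The idea of an opaque identifier whose label can change underneath it can be sketched like this. It is an illustration only: the `https://vocab.example/id/` namespace, the `Term` class, and the sample labels are all made up and are not the Homosaurus data model.

```python
from dataclasses import dataclass, field

BASE = "https://vocab.example/id/"  # hypothetical namespace, for illustration only


@dataclass
class Term:
    """A vocabulary term whose URI carries no meaning of its own."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)

    def rename(self, new_label, keep_old=True):
        """Change the preferred label without touching the URI.

        With keep_old=False the previous label is dropped entirely,
        which matters when the old label should not be exposed.
        """
        if new_label == self.pref_label:
            return
        if keep_old:
            self.alt_labels.append(self.pref_label)
        self.pref_label = new_label


term = Term(uri=BASE + "t0000123", pref_label="Older term")
term.rename("Current term")
print(term.uri, term.pref_label, term.alt_labels)
```

Everything that already points at the URI keeps working after the rename, which is the flexibility being praised here.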
Yeah.
Love it.
Yeah.
Did they ever tell you, don't put semantic information into the URI?
Everyone does it.
It's so stupid.
we're queer we don't listen
fuck you
DOI
dot org slash
DOI dot org
slash my journal
volume one
and it's like
yeah
If you ever
meet Geoff Bilder,
who works at Crossref,
wonderful human being,
he has many, many,
many rants about publishers
coming to Crossref
wanting to change a DOI prefix
because they merged with another publisher
or a journal changed publishers or whatever the hell.
And he's like, no, that's not the point.
They have a suffix generator now.
It's just, it's literally just a spreadsheet that generates a suffix.
But they're like, use this, idiots.
Yes, please.
Is that like half your job, Justin, is just being like, hey.
No, I don't mint, I mean, I don't mint DOIs manually, usually.
But the thing that always bugged me was, OJS used to put,
semantic information into the automated strings that it would create.
So it would create, it would say like V and then the volume number and then the article number.
And I was like, don't do that.
Just put random numbers.
Just put random numbers.
Just general, just random number generator.
That's all you need to do.
But they didn't do it until the latest update.
So now they do it properly.
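The difference between the two suffix styles can be sketched as below. This is not OJS's or Crossref's actual algorithm: the function names, the alphabet, the suffix length, and the `10.5555` prefix are all placeholders chosen for illustration.

```python
import secrets
import string

PREFIX = "10.5555"  # placeholder DOI prefix, assumed for illustration
ALPHABET = string.ascii_lowercase + string.digits


def semantic_suffix(volume, article):
    """The pattern being criticised: meaning baked into the identifier."""
    return f"v{volume}.{article}"


def opaque_suffix(length=8):
    """A random suffix that says nothing about volume, journal, or publisher."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# If the article later moves to another volume or journal, this one looks wrong:
print(f"{PREFIX}/{semantic_suffix(1, 3)}")
# This one never has a reason to change:
print(f"{PREFIX}/{opaque_suffix()}")
```

A real minting service would also check each new suffix against those already issued; that bookkeeping is left out here.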
Or you could do what every single baby
database administrator knows to do and just count.
I don't know how to count.
I don't know how to count. I'm gay, as we've learned from the homosaurus.
I do have an Excel sheet of like manuscripts and database bases, and it's just 0-000-0-0-0-0.
What?
Yeah.
Me one.
What happens when you go beyond the capacity for how many zeros you picked?
What then?
Add another zero at the end.
It doesn't matter.
Okay.
And, like, it's like, all of these things have their times and applications and usages and everything like that. Where just like, just do all of them and make them all point to, you know, the same thing, different things, et cetera. That just like, I think, you know, sequential numbering identifiers work. You know, there are times when you don't want to use it, where just like, you have potentially personally identifying information, where you don't want someone to be able to enumerate over all
possible things and find all the stuff on the server.
And spoiler alert, it's like, university IT does a terrible job at this,
and frequently will just have, like, very sensitive documents hanging out that can be publicly
enumerated on their public web.
But like, you know,
so it's, like, super useful when designing some systems, in the same way that just like, having
totally anonymous strings is super useful in, like, PID space,
but then you want to have semantic URIs in some other contexts.
Just do all of these things.
And the other one is, like, the content hashing, where just like, the identifier is, like, intrinsically based on the content of the thing.
So if I have the thing, I know what it would be called everywhere in the world. Like, that has its own benefits and tradeoffs.
That's like that is one of those dangerous ideological territories where just like you get pirates and also cryptocurrency zealots in the same room.
And it's just sort of like, it becomes this maelstrom of just like,
the same idea meaning completely different things to different people.
But, like, yeah, we're not going to solve the identification problem.
But basically just like, you know, it's the rigidity in being only able to use one thing that, like, is the problem to me.
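Content hashing, as described a moment ago, can be sketched in a few lines. The `sha256:` label and the `content_id` name are just conventions picked for this example; real content-addressed systems each define their own encoding.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes themselves.

    Anyone holding the same bytes computes the same identifier,
    with no registry or naming authority involved.
    """
    return "sha256:" + hashlib.sha256(data).hexdigest()


original = b"some document"
copy_elsewhere = b"some document"
edited = b"some document, revised"

print(content_id(original) == content_id(copy_elsewhere))  # same bytes, same name
print(content_id(original) == content_id(edited))          # any change, new name
```

The tradeoff is visible in the last line: the identifier cannot follow a thing through edits, so it suits fixed files better than living records.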
Yeah.
Now, I don't have a Library of Congress name authority file.
Although someone from Florida with my name born same year as me does, which is confusing.
there's so many people with my
I went to high school with someone with my name
it's very confusing it doesn't seem like it should be that common
It makes you harder to dox, though.
so that's like passive self-defense
it is really good I can I have
successfully scrubbed my information
off the web several times it's not hard
or one time I couldn't do it so I just redirected it to another dude
with my name and so I just changed my information
to I changed my address to his
And, like, I feel like
this would be something that just like,
Dorothea probably has, like,
stronger thoughts about. It's like, the notion
of privacy, and, like,
when it comes to, like, linked open data and stuff like that,
where just like,
the fact that just like, we don't want
all the world's information to be public.
We don't want, like, the Justin
authority record that includes your,
you know, social security number and,
you know, phone number and everything like that.
Like, limits to openness, you know,
that just like, there needs to be some amount of, like,
fungibility and, yeah.
I'll actually give you a real-world example.
If you go and look at my Wikidata page,
and you can just go to wikidata.org and look up Dorothea Salo.
I'm the only one as far as I know that has ever existed,
so what you find will be me.
Although I identify as female,
that is how I identify, that's who I am,
my Wikidata page actually says no gender, no gender recorded.
And the reason for that is that Wikipedia,
with which I have a very vexed relationship,
runs through Wikidata
every now and again to do things
like make lists of people
who maybe should have Wikipedia entries
but don't.
And of course they do this for
minorities and underrepresented populations.
And of course, Wikipedia is well known
for having a huge gender problem,
gender disparity coverage problem.
So I get sucked up
into those lists and nobody asked me,
I do not actually want a Wikipedia page, thank you very much.
And I would rather not be.
So I changed my gender that is listed on Wikidata.
I do not actually change my gender.
That's dumb.
Like, anti-bot action.
Like, you just like to read a digital channel.
It seemed to be the only option for saying, no.
Don't make me a Wikipedia entry.
Trans for the privacy of it.
Pretty much.
Gender opsec.
My gender is fuck off.
Get this gender working for me.
Yeah, no, that's why I also like ORCID iDs too, because it's a very nice system that you get to control and you get to change.
You get to write your name how you want it.
You can write it in multiple scripts.
And it's just an ORCID.
And it just will point to whatever you tell it.
So you can change it
whenever you want.
And that's what I really like about it is,
you know,
that would be something that would be very nice to use for like local archiving and stuff like that.
But the reason why is like,
no one's going to bother to do that.
But like I couldn't even get like faculty to do it,
even when this would save them time in the long run or it would make.
Right.
Or it would solve headaches.
Like if they don't,
if they have a double barrel first name and people keep putting their second first name as
their last name,
it would solve them this problem.
but they don't go sign up for an ORCID.
I was actually, when I was cited in the ethics and name authority files
book, one of the chapters,
they asked how I wanted to be cited.
I was like, I would like my ORCID, because they were citing one of my articles
or my thesis or something that had my dead name on it.
And I was like, I want you to do it this way and I want you to have my ORCID in there
so that it's collocated, like, properly links back to, like, all of my stuff.
Right. And I think it was Brie, actually, then went on to write an article and talk about,
like, how I asked to be cited in that book, as, like, using ORCIDs and URIs and linked data as a way
to help trans people who maybe have published under dead names. Yeah. And if they don't want
to go back and change, like, ask for it to be changed, which I don't, but this way I can have
people cite me and just use my first initial,
then it points back to my current stuff and everything
I've done with my current name while also still being like,
but I'm also the person that wrote that.
Yeah.
It's not that hard.
Yeah, especially if you like use initials,
because I use my initial a lot, because I do have a very common name.
So I think, but I used to write my full middle name and I don't do that anymore.
So it's nice to be able to be like, okay, I published my thesis with my full name,
but now I only like using my middle initial.
And now I'm at an institution where I'm the only one of me, so I don't even have a number after my name.
I was very excited when I got my email assigned to me because there is now someone else at my university with my name.
So there is like a zero one now.
And I'm like, ha, finally got there first.
I used to get detention because some dude had my name.
Are you serious?
I used to get his detention.
Yeah, they used to put out a roll with the names.
At the beginning of the period, teachers had to check them.
And if you were on the list, you had to go to the cafeteria.
So I kept getting called into the cafeteria because they wouldn't disambiguate my name.
Did you?
I had that happen to me too.
I had my birth last name, which is I changed my last name when I got married.
My birth last name is Johnson.
So there's, like, not only are there 70 billion, like, S. Johnsons out there,
but, like, I have a cousin who has, like, almost the same exact name, as like, we were born
in almost the same exact... like, person, practically, right?
We have the same name, the same first name, same last name.
Neither of us use our middle name, right?
Yeah, so I got told I was supposed to go to detention a couple of times in high school
because there was another person with my name.
It's common.
But like that's a, you know, like free bad kid and social currency, you know, just like,
hell yeah, I'm going to detention, baby.
Like, that's like, right.
And then you don't even have to do it.
So you get the best of both worlds.
Well, I used that. What they said to me was, well, if they don't put your middle initial, it's not you. And I use that excuse for the next four years, even though that dude was a senior when I was a freshman.
Said, no middle initial, it's not me.
Can't make me do it.
That's like just social engineering, you know, just in the real world. You know, just people just intuitively do it.
There is no difference between social engineering and con artistry.
Hell yeah.
Yeah.
I will die on that hill.
Yeah, a good friend of mine is having a crisis of, like, direction in life. And I'm like,
okay, so your strengths. You are super good at like infiltrating unfriendly organizations and
groups of people and like taking on roles and shit. And did you know that that is a job?
And like, and so like trying to like, yeah, turn this person.
Literally a job. Like, it's like, and a lot of the people that do it sort of accidentally
find themselves, you know, like, seeing it the first time, it was like,
holy shit, you can do that. And then just like suddenly becoming really good at it.
Anyway.
I feel like the alternate of that fork is improv comedian.
No.
Fuck.
Their true, their true destiny is they just become podcasters.
Improv people are good at doing podcasts.
Like all my favorite podcasts, I've learned, like, the people did improv.
I have no idea what I'm doing here.
Yeah.
That's like something.
We did improv that one episode.
What you did like improv games or like what?
What are you talking about?
We had Srsly Wrong on.
We did skits and those were improv.
Oh, yeah.
I dipped.
I was bad at it.
We were very bad at it, but they are very good at editing.
They're so good at editing, my God.
When I finally listened to that episode, I was like, oh, wow, they made something.
that it is.
But yeah, the only thing that we didn't mention that I wanted to maybe mention is kind of what we talked about last time, which was: whoever controls the nodes of a graph can control the graph. And so I was also thinking about that as a security problem with linked open data. You know, when we were talking about, like, all of the privatization happening, if someone buys a certain node of the graph, then it's the same problem Sadie was describing with everyone having their own API: if you're controlling this graph, even though it's open, and you control the write permissions, then, like, I don't know, I assume that's a problem that's going on, because OCLC has Meridian now, and I assume that that only exists because it will make money.
If you control the spice, you control the universe.
Yeah.
Is that an animal?
This is a very cranky and just, like, desirous animal.
It's like, my turn.
Like, I haven't heard about this. This Meridian thing, the first time I heard about this was today. Is this just, like, a... It says May 2024. Is it, like, I assume it's... is it that new?
I hadn't known about it until today either, for what it's worth.
Oh, OCLC just loves to do shit.
Our metadata librarian at my job is currently on, like, a committee for, I think, what is the organization, the Program for Cooperative Cataloging, and they're working on a task group for, like, URIs in MARC implementation. So I guess, like, they're going to have separate types of, like, handle-based permalinks or something, I don't know, that are going to be in MARC. But they were also talking about how they had, like, a demonstration of Meridian. And I don't, I think it's just the linked data they've made out of WorldCat.
So they're, they're using an entry for Octavia Butler as the demo data.
And I'm like, that's like an interesting, interesting like person and body of work to
evoke in your like corporate platform.
Like that's just like.
Oh, yeah.
The don't build this machine.
Yeah.
The torment nexus.
Thank you.
Don't create the torment nexus.
Wouldn't it be terrible if we created the torment nexus?
It creates the torment nexus anyways.
So here's a gift.
And this is totally off the cuff just because, again, I only heard about this today.
I think it is clear to OCLC that their WorldCat monopoly is not long for this world. One way or another, whether it's a customer revolt or we finally find a way to do this with linked data without getting sued out of existence, that's not going to last.
So how can OCLC come up with a linked data store that they can fence around, limit to their customers, the same way that they've done with WorldCat?
That's what I think Meridian is.
Probably. I mean, as you're saying, like, they're doing it because it makes money somehow, and, like, I think that's a pretty good bet. I mean, and it's, like, continuous with the way that the rest of, like, linked open data has worked, where just, like, that's, like, what Wikidata is to some degree, is that it's, like, basically a captive labor pool. Like, and so it's, like, who funds Wikidata is largely Google. And so, like, Google bought Freebase, like, the predecessor to it. You know, they did their attempts at cleaning it up and everything like that, and then basically, like, shunted that into Wikidata, and they profit from it immensely by it being clean, corporate-friendly, like, there's no, like, swearing on Wikidata, you know, and a way of concentrating a bunch of labor, so that then they can mine it and make derivative profits from it. And, like, where just, like, the people that work on Wikidata are, like, genuinely true believers in, like, the beneficence of cataloging the world's data.
They're like not corporate stooges. They're like, view themselves as being like, we're just trying to do the same mission as Wikipedia, which is just like, yeah, make a global information store, but not really evaluating the like, why would Google want us to do this?
you know, and like, and so just like that, that sort of pure production as captive labor model is one of those biggest sort of like, you know, red pilling moments for like information people.
Is that just like, what if it's actually bad to have like these sort of like crowdsourced information platforms that just like, so when we were watching, when we were watching Lo and Behold, like one of the like examples of, like,
just like the beauty of the internet.
And so it's like, again, like, every time I think about this is like, this is a movie that
was released in 2016, which is not that long ago.
But yet, and yet it feels like the completely different universe where just like, this is
like one of the promising things about it where you had this like chemical reaction,
crowdsourced thing, where just like, the wisdom of the crowds, lots of people playing this game about like protein
folding or whatever, was able to do something that, you know, the best scientists in the
world couldn't do. And it's just like, cool, but were any of those people on the paper that got
published from that and from all of that work? And like, where just like, if it's just a thing where
you farm out other people's labor and time, or just like, in this case, like, farm out all of
the cataloging labor that like happens in libraries into sort of curating this like collection
of information, in the same way that, I don't know the politics of WorldCat, I assume it's a
similar kind of way, where just like, everyone is required to use this, but we don't actually
have much control over it, kind of thing. And just like, yeah, that like that is a massive extraction
vector sort of hiding in plain sight under the guise of pro-social technologies. Yeah, and this is
probably more of the same, which is to make that data then usable and useful to AI products,
I would assume. Particularly, it's interesting that they mentioned, like, incorporating
ORCID and ROR, which are, like, scholcomm-specific things, really.
Especially ROR is like a weird one to throw in there because that's like research
organizations, right, to make sure that those are disambiguated because journals are
really, really bad at disambiguating like the biology department of this university because
departments change all the time and also people abbreviate them.
And, you know, so there's no, there's no like one identity and that causes all kinds of
problems, even just like getting the university right half the time.
It's like, it's wrong.
So ROR is kind of like ORCID for organizations.
And so that's a very specific thing.
And I find that very strange.
Like, do they want, like, regular cataloging librarians to, like, fix the scholcomm metadata
problems that are out there?
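[Editor's sketch: the disambiguation problem described above, where the same organization shows up under many abbreviated or reworded names, can be illustrated roughly as follows. This is a toy under stated assumptions, not how ROR itself works: the registry contents, the `org:example-...` identifiers, and the `normalize` and `resolve` helpers are all hypothetical.]

```python
import re

# Hypothetical registry: one made-up identifier per organization.
# These IDs are placeholders, not real ROR identifiers.
REGISTRY = {
    "org:example-0001": ["University of Example", "Univ. of Example", "U of Example"],
    "org:example-0002": ["Example State University", "Example State Univ."],
}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(name.split())


# Map every normalized name variant to its single identifier.
LOOKUP = {
    normalize(variant): org_id
    for org_id, variants in REGISTRY.items()
    for variant in variants
}


def resolve(affiliation: str):
    """Return the identifier for a known name variant, or None if unknown."""
    return LOOKUP.get(normalize(affiliation))
```

[The point of the sketch is only that a stable identifier lets differently spelled affiliation strings collapse to one record; anything the lookup has never seen, such as a renamed department, still comes back unresolved and needs a person to fix it.]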
They do own OAIster.
Yeah, that like clarivating fix.
Scooped that up back in the day.
What's that?
Oh, it was a union search engine for institutional and sometimes disciplinary repositories is what it was.
Basically, there were always problems with it, but the problems go back to OAI-PMH being complete garbage, such that, for one, one of the things it does not allow you to say is: is there full text associated with this item? And so one of the reasons OAIster became completely useless is that it was choked with metadata-only records, which really disappointed end users, because they couldn't click on it and get to the thing.
Right.
And that's definitely why I auto-embed Sci-Hub links in all of my writing, because it's just like, what use is it to someone else for me to cite something if they can't actually see it?
I wonder how they scrape the full text information now when stuff gets pulled from OAI-PMH, because it still does.
Because OAI-PMH is how we push out to CORE, but it definitely does know if we've got full text.
I have to think they implemented a check,
which is fascinating because they would have had to
implement such a check for pretty much every single repository
and repository design in existence.
Like you're literally looking for a link that says PDF or something.
Yeah.
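[Editor's sketch: the kind of check being guessed at here, "looking for a link that says PDF," might look something like the following for a single Dublin Core (`oai_dc`) record. This is an assumption-laden illustration, not how CORE or any real harvester does it: the sample record, the `repository.example.edu` URLs, and the `looks_like_full_text` helper are all made up, and only the two XML namespaces are standard.]

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace used inside oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Made-up sample record; the URLs are placeholders.
SAMPLE_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example item</dc:title>
  <dc:identifier>https://repository.example.edu/items/123</dc:identifier>
  <dc:identifier>https://repository.example.edu/bitstream/123/paper.pdf</dc:identifier>
</oai_dc:dc>"""


def looks_like_full_text(record_xml: str) -> bool:
    """Heuristic guess: does any dc:identifier look like a link to a PDF?"""
    root = ET.fromstring(record_xml)
    for element in root.iter(DC_NS + "identifier"):
        url = (element.text or "").strip().lower()
        # Ignore any query string before checking the file extension.
        if url.startswith("http") and url.split("?")[0].endswith(".pdf"):
            return True
    return False
```

[Because the record format has no field that says "full text attached," a check like this can only guess from URL shapes, which is why it would need adjusting for every repository design that lays out its links differently.]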
Wow.
Oh, because Herbert Van de Sompel is complete crap at building protocols and things that will be useful and last.
All right, I said the name.
This is obscure beef.
Oh, I, you know, Herbert Van de Sompel, when I say serial project abandoner, he is the paradigm example.
He totally did that with OAI-PMH.
He totally did it with Memento.
There are probably six other projects of his.
I could also.
Right.
Remember Memento?
Yeah.
And I'm just like,
Funders, stop giving this guy money.
It never turns out well.
We got more obscure beef than a Wagyu farm.
Heck yeah.
Look at me like that.
I look at you however I want to.
All right.
I was very proud of that.
It's good.
Well done.
Thank you.
Okay. I think we should wrap up.
Yeah. Sadie, did we accomplish the mission?
Yes, I've got a sleepy bitch disease.
Did we clarify what the hell's going on or still cloudy?
I think I've got a pretty good gist, actually.
And you know what? Knowing the beef actually helps. It does.
Good.
That's like...
And you know, I do teach this stuff, Sadie. You know my email address.
You can totally ask me questions.
That's true.
Yeah.
That's true.
And like one of the things I have come to love in this world,
you know, the few things that you can love in it.
It's just like every time you get close to something,
like you just like realize that it's all just people.
And just like all these things that are these immutable features of the world,
one day you might just come face to face with like, oh, that was you?
and then just be able to be just sort of like that just like yeah all of a sudden it makes sense
where it's like I get why it is that way that just like you know you knowing the beef and knowing
the people is the way to know the thing yep it all makes sense now
oh glad to hear it thanks y'all as always love being on the on the podcast yeah oh thank you so much
for coming on. Yeah, thanks.
And I'm glad we got to do this.
Yep. Yes. Good to see you yet again.
Let's find time to watch a movie sometime soon. It's been a while.
Yes. I need to do more movies in the Discord, which I was about to plug, because Dorothea, you've also been answering questions in the Discord. It's very helpful.
Yes. And we appreciate it.
It's just us shitposting and you being helpful.
Yeah.
Well, I mean, you know, that's the reverse of the way it usually is: everybody else is being helpful and I'm shitposting.
So, even the score.
Good night.
