librarypunk - 135 - The Once and Future Linked Open Data feat. Dorothea and Jonny

Episode Date: August 28, 2024

The gang talks about linked open data. What did we learn in library school? What’s the future? Where does it fall off a cliff?  Media mentioned https://www.hillelwayne.com/post/graph-types/ The Eth...ics of Sustaining Linked Data Infrastructure https://scholarsphere.psu.edu/resources/430f30bf-e029-483d-b1c8-d7e9bb430a8e  Aaron Swartz unfinished book https://web.archive.org/web/20220512132144/https://www.morganclaypool.com/doi/abs/10.2200/S00481ED1V01Y201302WBE005  Tim Berners Lee https://www.w3.org/DesignIssues/Overview.html https://www.w3.org/DesignIssues/RDFnot.html https://neuromatch.social/@jonny/111938123937338008 https://neuromatch.social/@jonny/111939286501257910  https://neuromatch.social/@jonny/111731459659839112  https://jon-e.net/surveillance-graphs/#knowledge-graphs-a-backbone-in-the-surveillance-economy  http://microblogging.infodocs.eu/wp-content/uploads/2014/10/publishing_bnb_as_lod.pdf  https://www.oclc.org/en/news/releases/2024/20240507-introducing-oclc-meridian.html Join the Discord: https://discord.gg/zzEpV9QEAG Transcript (plaintext): https://pastecode.io/s/qundkp0n

Transcript
Starting point is 00:00:00 I'm so glad that you're here with that, Dorothea, just because, like, I'm just, like, always interested in your perspective on this, having, like, lived in the library world of linked data for so long, because on the other end of, like, living in programmer world, sometimes I still get the sort of, like, both the persnickety, you know, purist side and the people that are trying to make it work happening, but, like, very few, like,
Starting point is 00:00:25 actually, this doesn't even come close to meeting my needs or like resemble my work style at all. I remember it was so funny. Like Scott Carlson helped edit or write that, like, linked data in libraries book. And then like two days later was like, linked data's dead. And then, like, became a programmer. I love Scott. I think he was stuck in a deeply shitty workplace.
Starting point is 00:00:54 I agree. It happens to us, and then we get out of them. Hooray! Woo! Proud of you. Yay. Okay. I'm Justin. I'm a scholcomm librarian.
Starting point is 00:01:37 My pronouns are he and they. I'm Sadie. I work IT at a public library, and my pronouns are they/them. I'm Jay and I'm a... No longer a music librarian. Surprise. Finally, fucking, a cataloging librarian again for the first time, seven years after finishing grad school. And I won't say where. And my pronouns are he.
Starting point is 00:02:09 Just post the address this time around. If you're in the Discord, you know. Okay. And we have guests. Would you like to introduce yourselves? Sure. I'll start. I'm Dorothea Salo, pronouns she/her, and I teach at the University of Wisconsin-Madison Information School.
Starting point is 00:02:26 I'm Jonny Saunders. They and them. I'm just sort of like, I guess I do various forms of like information-based work at UCLA. Thank you, yeah, for the belated applause. I was waiting for that. Thank you. Very kind. Welcome, welcome. I still have my reorganized soundboard, so I still have like 10 sounds.
Starting point is 00:02:49 It's just. Oh. It's going to stop me. I started making Justin watch It's Always Sunny and it was a bad decision because now the soundboard has always sunny theme on it.
Starting point is 00:03:11 And it's got to be the full-length version too. No soundboard is complete without the one that keeps going for an hour. Because that's just a piece of like public domain music. It's not even like written for the show. I'm pretty sure. Sweet.
Starting point is 00:03:27 I think I had just the full, full Soviet Union anthem. Yeah. That's no pause button. Yeah. I was like, this is an anime one, you piece of shit. Yeah, that one keeps going. So this was an episode we came up with because Sadie wanted us to explain linked open data.
Starting point is 00:03:52 And I think I probably know the second least. So I figured it would be funniest for me to start and try and explain what linked open data is, which is all from what I remember in grad school, which is the last time I ever had to interact with it, that I'm aware of, besides the parts of linked data that are used by Google,
Starting point is 00:04:17 is it's primarily you can think about it as triples, and everything is one item linked to another item. So Hamlet is a character in Hamlet. Hamlet, the book, those are two separate URIs. And then... It's a play. Well, it's in book form. Okay.
Starting point is 00:04:44 And then Shakespeare is the author of Hamlet, and so there's an is the author of statement that each has a URI, and these three things can chain together forever. And that way you would have something that's both machine readable and human readable and somehow that makes data boxes in Google work. Or certain extremely non-human readable forms of human readable. Right. So once you're trying to organize it in other ways, like say, make a list of things,
Starting point is 00:05:18 suddenly it doesn't work anymore. Yeah. Because now you have these series of statements. Yep. I'm like just chin hands here waiting for all of these super smart people. This episode is for Sadie. Yeah. Literally.
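The triples idea described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the `http://example.org/` URIs, the predicate names `isAuthorOf` and `isCharacterIn`, and the helper functions are all made up for illustration, and plain tuples stand in for a real RDF library.

```python
# Hypothetical identifiers under example.org; nothing here is a real dataset.
EX = "http://example.org/"

# Each statement is one (subject, predicate, object) triple.
# Hamlet the character and Hamlet the play get separate URIs.
TRIPLES = [
    (EX + "Shakespeare", EX + "isAuthorOf", EX + "Hamlet_play"),
    (EX + "Hamlet_character", EX + "isCharacterIn", EX + "Hamlet_play"),
]


def subjects_of(predicate, obj):
    """Return every subject that points at obj through the predicate."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def objects_of(subject, predicate):
    """Return every object the subject points at through the predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def authors_for_character(character):
    """Chain two triples: character -> work it appears in -> author of the work."""
    authors = []
    for work in objects_of(character, EX + "isCharacterIn"):
        authors.extend(subjects_of(EX + "isAuthorOf", work))
    return authors


print(authors_for_character(EX + "Hamlet_character"))
```

Chaining statements like this is easy. Turning them back into something like an ordered list is where the model stops helping, since a bag of triples carries no ordering of its own.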
Starting point is 00:05:32 This is we explain linked data to Sadie. There's like the triples explanation and then immediately you fall off the cliff of ideology and 25 years of some of the most prickly and opinionated people in the world making like claims on reality that you truly can't believe until you see them. So it's like, you know, we got talking about technology and beliefs. And then also, like, for a lot of people, like, a huge amount of, like, wasted time, drama or success, depending on if you work for Amazon or Google or not. Yeah. Like, my experience with linked data is that I took ontology development in grad school with Dave Dubin, shouts out Dave Dubin. And we developed, we learned RDF.
Starting point is 00:06:19 And we mainly wrote in Turtle, I think. But we learned, like, all the other, like, N-Triples and N3 and all that, but I think he liked Turtle the best, if I remember. The only one that worked. Yeah, as a class, we collectively created an ontology. Together, each of us had our own specific section of it that we had to create. And mine's still, it's still on my GitHub and everything. Like, it's still, like, theoretically a working RDF, like, ontology.
Starting point is 00:06:51 Is this the origin of the Homosaurus? Yes, no. But I'm also... I'm also on the Homosaurus, which is actually linked data, but I don't, none of us on the board actually interact with that part so much. Like, we have like a software dude who does that. But like, we all know about it to some degree. And then I've also done some like Wikidata. Like, like I was like, I did like a Wikidata training.
Starting point is 00:07:19 Like I went through one of those trainings one summer. And that was cool. And I wanted to write up, I submitted a proposal for a paper on like thinking of Wikidata and linked data like as like a cyborg kind of thing. And but like interrogating that. And I submitted this to the Code4Lib Journal issue that ended up being the one that everyone yelled at. So I'm glad it got rejected. Like literally that issue was the one that I submitted to, with like the data, like bad data practices. That one. That was what I had submitted to, so I'm glad I got rejected now.
Starting point is 00:08:02 Narrow miss, narrow miss. Dorothea, weren't you the one to blow the whistle on that, or is that different? It was like a different time. It was like you and Becky, right? Oh, because, well, I mean, if we blew the whistle over anything, it wasn't over linked data. It was over that privacy, it's a thing. You might want to let people keep it.
Starting point is 00:08:26 Yeah. Yeah. Yeah. I just happened to be writing about linked data for the thing I was writing about. Right. Yeah, so I'm very glad that my goofy little theory, like, high theory article got rejected. So I actually never ended up using Turtle. I think I learned it in N3 notation.
Starting point is 00:08:51 It was very not hands-on the way I learned about it. And so it was never clear how it worked, except for the aspects that kind of pulled from Wikidata and that explained a little bit. But I never got an in-depth explainer for how Wikidata works. So it was very theoretical. And my metadata teacher was like very on the theoretical side of things. So I never got to see a lot of practical applications of a lot of the stuff we talked about in class. So that is not how I teach metadata. Yeah.
Starting point is 00:09:29 If you're not new at once a point. Yeah. Exactly. And that's like one of the major cultural fissures is that just like, is it supposed to be something that you touch? Or is it something that is supposed to be like a true artifact of the world and needs to be done once and never touched again? You know, so like the division between the teaching styles, it's like reflective of the entire system of belief that goes into linked open data as well.
Starting point is 00:10:00 I'm like, I'm curious, like, hearing people's, like, origin stories with linked data, because, like, Dorothea, you've been doing this for, like, a while in libraries and stuff like that. I'm curious, like, what your origins are. I mean, you know, I got into it the same way
Starting point is 00:10:18 a lot of people did, as it started to be talked about as potentially where libraries move from MARC. And, you know, it's, that's a really awkward question when you think about it.
Starting point is 00:10:32 We, sticking with the homegrown, if you will, like MARC encoding, which we made up from scratch in the 1960s, Lord bless Henriette Avram.
Starting point is 00:10:45 She was awesome. But it doesn't map cleanly onto any of the dominant data structures, data models, that we have today. It's pulling teeth
Starting point is 00:10:58 to try to stuff MARC into a relational database such that you can actually do anything with it. You can kind of do it in XML, but XML is really squishy that way. And I don't mean that in a bad way. XML squishiness is actually quite useful. If you look at, for example, EAD, Encoded Archival Description, some of EAD is what you and I, Jonny,
Starting point is 00:11:21 would probably think of as data, but a lot of EAD is narrative, right? It's storytelling. And you know what? Databases are shit at storytelling. You can't represent Hamlet in a database. Linked data is shit at storytelling. One of the things that really pissed me off
Starting point is 00:11:40 about the very early days of linked data was some of its boosters going around and just bragging on it as something where you could literally represent anything. Right. If you could put it in a computer, you could put it in linked data. And my retort to that is, as it has always been, and it's pure coincidence, but I kind of love it. All right, express Hamlet in RDF and get back to me, okay? You can't do it.
Starting point is 00:12:13 And I was reading through some of the stuff in the show notes for today, and I happened on one of the Tim Berners-Lee pages. Let me see if I can find that. Ah, yes. And Tim Berners-Lee on this particular page talks about a semantic web, or sorry, a magical artificial intelligence. He's talking about AI. And he says this,
Starting point is 00:12:39 the concept of machine understandable documents does not imply some magical artificial intelligence which allows machines to comprehend human mumblings. That's literally what he says. Human mumblings. Excuse you, Tim Berners-Lee. Excuse you.
Starting point is 00:12:59 Language is one of the most magnificent things we have as human beings. And you are calling it mumblings. Excuse you very much. Sorry. That was my rant. No, well, I mean, yeah, his relationship to this sort of like, you know, the fuzziness of language, is like one of the most fascinating parts of like the early outlooks on what linked data could be, because on the one hand there's sort of the romanticism of language and like the fluidity of language as being something to embrace, but then almost immediately that becomes like squished out, just sort of like the thing that's almost immediately excluded is the ability for people to actually express ambiguity, uncertainty and so on. Yeah, right. I think last time you were on, Jonny, or
Starting point is 00:13:48 Or maybe this was in like just, oh, no, this is when we were watching it together. But we talked about how like the Ted Nelson versus the Tim Berners-Lee view of like the interconnected internet and data. Right. I see I can find that. Interesting dude, Ted Nelson. I actually did get to meet him once. I was like wiped out at the time, unfortunately. But yeah, I will always treasure that.
Starting point is 00:14:17 He was an interesting... He is. I think still is. Interesting dude. Yeah. The Chad, Ted Nelson. So I'm like, the story that I don't have a good, like, hold on is just like, so like what happened? And this probably relates to just like, you know, some of the stuff that we talk about all the time in like the cybersecurity screaming channel and just like, Sadie, what you might have to deal with as well, of just like the state of technologies that go into libraries and how just like, they're not actually under any of our control and we sort of do the best we can to exist on whatever like scraps that IT wants to feed us and stuff. And so just like I imagine like that's like the intertwined stories of like where did, like why did linked data not happen all the way at libraries, like sort of related to the institutional inertia as well. Yeah, that's part of it. And you know, getting back to my point about the question of getting off MARC, relational databases weren't going to work,
Starting point is 00:15:18 XML wasn't going to work and was in kind of a little bit of a decline as we were asking ourselves this question, so what was left? I remember a blog post by Jonathan Rochkind, who hates linked data, boy does he hate RDF,
Starting point is 00:15:34 and you know he backs it up, he's not just a random hater, but he was like, we can't, we cannot move to this. And I'm like, okay, what's the alternative? Right. And there are things about RDF that are attractive ideologically, but also practically to libraries. The idea of the open in linked open data. We can really truly share and OCLC can't stop us.
Starting point is 00:16:05 Oops, did I say that out loud? I mean, you know, really, the elephant in the room is OCLC, and its enclosure of MARC and MARC cataloging for its own corporate, and I am going to call them corporate. I don't care that they're legally
Starting point is 00:16:26 a non-profit, for their own corporate benefit. So linked data to some of us looked like a possible way out of that. And you know, I can't fault anybody for that. It's definitely a goal worth pursuing. So why
Starting point is 00:16:42 didn't it get as far as we might have wanted it to? Part of it is that RDF was not built, and Jonny can speak to this more, because he's read more of the STS and sociology literature around it than I have, but it was not really built for practicality or computability. I, as a complete SPARQL duffer, and SPARQL, if you haven't run into it,
Starting point is 00:17:10 is the query language for linked data. It is to linked data, to RDF, what SQL is for relational databases. I can make a typo in a SPARQL query and knock a server over dead. It's not even hard. So, like, the brittleness
Starting point is 00:17:31 of just being able to ask a question without killing a server, this is not a consideration for the early designers of the semantic web. And like, how do you build a library infrastructure on a foundation that is that technologically weak? And the answer is you can't. You really, really can't.
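One plausible way a single typo can blow up a graph query is by disconnecting two patterns that were meant to share a variable, so the engine pairs every match of one with every match of the other. The sketch below is not SPARQL and not how any particular triple store works; it is a naive pattern matcher over made-up data (`book0`, `person0`, and so on are hypothetical), only meant to show the shape of the problem.

```python
# Hypothetical data: 50 books, each with one author and one title.
DATA = []
for i in range(50):
    DATA.append((f"book{i}", "author", f"person{i}"))
    DATA.append((f"book{i}", "title", f"Title {i}"))


def _unify(term, value, binding):
    """Match one pattern term against one value, binding ?variables as needed."""
    if term.startswith("?"):
        # setdefault binds the variable on first use, then enforces consistency.
        return binding.setdefault(term, value) == value
    return term == value


def solve(patterns, data):
    """Naive join: extend every partial solution with every matching triple."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in data:
                candidate = dict(binding)
                if all(_unify(t, v, candidate) for t, v in zip(pattern, triple)):
                    extended.append(candidate)
        solutions = extended
    return solutions


# Intended query: both patterns share ?b, so each book joins only to itself.
intended = solve([("?b", "author", "?a"), ("?b", "title", "?t")], DATA)

# Typo: ?book instead of ?b. The patterns no longer join, so every author
# row is paired with every title row.
typo = solve([("?b", "author", "?a"), ("?book", "title", "?t")], DATA)

print(len(intended), len(typo))
```

With 50 books the typo turns 50 answers into 2,500. The growth is quadratic in the number of rows, so the same slip against a catalog-sized dataset gets very large very quickly.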
Starting point is 00:17:54 Another, I'm not going to say this is a problem actually. I actually think it was good, but it's a situation that does not commend itself to libraries, to librarians. We tend to be very orderly people, and catalogers as much as anybody and more than some. So in the aughts, right? In the, well, no, not the aughts, in the teens, I guess, particularly in Europe, there was just this flowering of experimentation
Starting point is 00:18:25 with how are we going to represent the things in the library universe, like books and maps and musical scores and all, and movies and all that good stuff. How are we going to represent this in RDF? Lots of experimentation. A lot of it was fantastic. Europe is great. Yeah. Yeah. There's a lot of really good thinking, very practical thinking going into this, but there were models, data models, RDF models, ontologies, if you will,
Starting point is 00:18:58 springing up all over the place. And so if you're an average cataloger, you're looking at this and going, well, which one do I learn and which one are we going to use? And when is there a tool that's going to work with any of this? Yeah. And the answer is there wasn't. Now, what seems to have fallen out of that is that BIBFRAME, for all of its faults, and it has many, it is not my favorite bibliographic ontology, it seems to be kind of taking over the world and muscling out a lot of that European experimentation. And that frankly makes me sad because Europe, there's several countries in Europe
Starting point is 00:19:39 that just plain kicked BIBFRAME's ass as far as modeling quality. And I hate that they're getting plowed under, basically, by this crappy American juggernaut. But why, why is this happening? Because there's finally tooling. There are finally cataloging tools that, as much as any RDF-based tool can, fail to suck.
Starting point is 00:20:07 Yeah, like I know in Alma, you can do BIBFRAME stuff in Alma. Yeah, but you can look at Sinopia and you can look at Marva and you can imagine an actual person using these, right, and making them work and getting good records out of them, which we didn't have for at least a literal actual decade after BIBFRAME happened. So when Tim Berners-Lee calls human language mumbling, I think it's a symptom of the contempt that so many linked data people have for human beings. And I yelled at the Semantic Web in Libraries conference in like 2014, a decade ago, about exactly that. Stop dissing human beings. You can't do that if you actually want linked data. But nobody listened and here we are. Right. Yeah, like another idea, and this was also something I think I talked with Jonny about, like, another idea for a goofy, like, high-minded, like, theory paper I had was thinking of, like, linked data as this attempt to, like, do a reverse confusion of tongues, like, a pre-Tower of Babel, like, divine language that ignores the actual, like, the reason that linked data is cool is that it has
Starting point is 00:21:31 the potential for everyone to have their own way of doing it, and it all talk together and intermingle. Instead, it's just turned into this, like, nope, everything looks this way now and this almost like mechanized version of language, like taking over. Like, it doesn't care about being human readable actually. Right. And like, so it's like this,
Starting point is 00:21:53 this tension that was there from the origin of it, and it's actually just like the dawn of the term linked data as opposed to the semantic web is just like a part of this, the same thing of like part of this. I feel like we need to like at least nod to, because it's like, we talked about this at length last time I was on here, but just like also nod to the Lindsay Poirier piece,
Starting point is 00:22:13 like "A Turn for the Scruffy," which like we both called out as being like one of this, this is like seminal work on like understanding the culture of the semantic web. And just like that just like points to, and also just like it's there too in Tim B.L.'s website, of that just like the separation of linked data and linked open data from the Semantic Web was about, like, reclaiming just like stuff that worked as opposed to stuff that like was perfect. That just like, we're about like trying to make a bunch of separate ontologies.
Starting point is 00:22:41 So it's like the initial idea of being there's one graph, like one global graph where everything is always linked together. And there should be one URI that represents each unique concept and only one. And to the point we're just like, there's these sort of like absurd blog post. And one of the things that's amazing always about just like web history is that a lot of it is just like still there. And still up there, at least on archive.org. But just like these blog posts that I think this is 2009. I put this in the links as well.
Starting point is 00:23:13 But I was just like they apparently took down the comment section on it. But just like someone that was like from like semantic web like in this era of just posting a blog post about when the first time that the New York Times had like linked data in their web version of the product. And so what they'd done is they'd made some, you know, article that was about Barack Obama and the quote unquote, you know, the racist controversy, like, you know, Barack Obama is a Muslim, whatever. So it was an article about that controversy. And so there was an RDF claim that was like Barack Obama related to Muslim or something like that. That just like, this is just like trying to describe the contents of this piece of writing. But then people immediately were just like, this is messed up because that's now a claim on reality. It is like, it's not just like, someone says this. It's just this is a fact. And that is just like something that like the RDF group had specifically designed it to be doing. And so like the, the model of the world that like people keep trying to escape from, but now need to return to, but keep trying to escape to have to return to is that like when you make a statement in RDF,
Starting point is 00:24:20 like there's a difference between the way that like the language and the syntax and the systems designers thought about it as being literally like, like there's some like really remarkable quotes in the W3C archives. I was trying to pull up earlier. But it's like that, this one quote from Brian McBride, 2001, so this would have been just like only a couple years after the project formally launched at W3C.
Starting point is 00:24:45 That's like, RDF is not just a data model. The RDF specs should define a semantic so that an RDF statement on the web is interpreted as an assertion of that statement so that its author would be responsible in law as if it had been published in a newspaper. So these are like, they're supposed to be like legally binding documents in this way, where there doesn't, there is no such thing as an author.
Starting point is 00:25:08 Someone says this, you know, that just like, when in reality, everything has an author. Everything has a point of view and a perspective and just like was said by or written by somebody. But like, you know, it took a while for even that notion to be encoded in the language at all as like an expressible thing, period, adding the fourth item in the triples, like being able to say that this doesn't belong to the global graph of everything, but in fact is my like local system of meaning. And then, but then like that just like this, you know, you have to keep escaping that because it doesn't actually work. Because it's like
Starting point is 00:25:47 the thing that I always come back to is that just like imagine if language worked this way where I want to use a word, and I have to use Johnny's version of this word. And so I have to say, like, I have to go into like, johnny.net slash this word, and now I'm referring to that one. And there's no way that I can make my own copy of this word. It's like in the way that language works of just like,
Starting point is 00:26:13 you know, we have these sort of like parallel representations of ideas and concepts and words and phrases that are like, you know, they're not the same at all. even close to the same in between person to person when or even utterance to utterance. And yet, like, we're trying to express like a system of meaning where there is one version of each of these things. Like, no one would do it. Like, no one would, if I had to go to the dictionary every time and look up each person's unique word and, like, use that or else it was meaningless, then it just doesn't work.
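The "fourth item" mentioned a little earlier, which lets a statement belong to someone's local graph instead of the one global graph, can be pictured as a quad: a triple plus a graph name. The sketch below is a toy with hypothetical names throughout (`ex:Hamlet`, `graph:speaker-a`, `graph:speaker-b`, and both helper functions are invented for illustration) and does not use a real RDF library.

```python
# Each statement is (subject, predicate, object, graph).
# The graph name records whose "local system of meaning" a statement lives in.
QUADS = [
    ("ex:Hamlet", "ex:isA", "ex:Play", "graph:speaker-a"),
    ("ex:Hamlet", "ex:isA", "ex:Book", "graph:speaker-b"),
]


def who_asserts(subject, predicate, obj):
    """Return the graphs in which this exact statement appears."""
    return sorted(
        g for s, p, o, g in QUADS if (s, p, o) == (subject, predicate, obj)
    )


def flatten_to_global_graph(quads):
    """Drop the fourth item, as a single global graph would."""
    return {(s, p, o) for s, p, o, _graph in quads}


print(who_asserts("ex:Hamlet", "ex:isA", "ex:Play"))
print(sorted(flatten_to_global_graph(QUADS)))
```

With the graph name kept, "it's a play" and "it's in book form" stay attributable to whoever said them. Once flattened, both read as bare facts with no author attached.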
Starting point is 00:26:46 it's like intimately related to the tooling problem where like theoretically, and so like one of the authors of SKOS, like the Simple Knowledge Organization System, like the ontology and modeling system for like modeling relatedness and similarity, it's like, it's how you do controlled vocabularies, and it's actually quite functional, quite useful, and if I'm not wrong, I think Homosaurus is actually based on it. That's your underlying, how you're modeling this stuff. Yeah, it's SKOS. Yeah, yeah, yeah. It works pretty well. And so you'd imagine that like a tool like that where you're able to say that something is a similar match or this is exactly the same as this other thing would enable this kind of like expressive system.
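SKOS does define mapping properties for this, including `skos:exactMatch` ("exactly the same as this other thing") and `skos:closeMatch` ("a similar match"). A minimal sketch of following those mappings is below. The concept identifiers (`local:plays`, `other:Plays`, `third:stage_plays`, `other:Drama`) and the `exact_equivalents` helper are hypothetical, and everything sits in an in-memory list, which sidesteps any cost of fetching URIs over the network.

```python
# Hypothetical mappings between three made-up vocabularies.
MAPPINGS = [
    ("local:plays", "skos:exactMatch", "other:Plays"),
    ("other:Plays", "skos:exactMatch", "third:stage_plays"),
    ("local:plays", "skos:closeMatch", "other:Drama"),
]


def exact_equivalents(concept):
    """Follow skos:exactMatch links in both directions, transitively.

    skos:closeMatch links are deliberately not followed: SKOS does not
    treat closeMatch as transitive, so chaining it would overstate similarity.
    """
    seen = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in MAPPINGS:
            if p != "skos:exactMatch":
                continue
            if s == current and o not in seen:
                seen.add(o)
                frontier.append(o)
            elif o == current and s not in seen:
                seen.add(s)
                frontier.append(s)
    return sorted(seen - {concept})


print(exact_equivalents("local:plays"))
print(exact_equivalents("third:stage_plays"))
```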
Starting point is 00:27:34 And it doesn't, because doing all of those queries and lookups is preposterously expensive because of just like the way that it's encoded as URIs, i.e., URLs, i.e., I need to hit a web server every time to actually retrieve this item, as opposed to, yeah, there's any number of different web architectural models that that could take, but that's the form it took. And so as a result, like, yeah, it's like intimately related to the tooling as well as the implementation of the technology, like, in the same way that it is a reflection of the ideas behind it. Yep, right on. Yeah. And so how are we doing, Sadie, clear as mud? Yeah, just about, like, I think the thing that gets me about linked data and, like, I haven't gone to
Starting point is 00:28:21 library school, I have just like the most barest knowledge of cataloging and that kind of thing is, is like, I'm a very practical hands-on person. So like, I have to dig into a system to be able to show, like, to really see how it works. And every time I, every time I have tried to do that to even think about open linked data, I'm like, I don't, I don't see how this is usable. So that, yeah, like, you talking about like, there is, there needs to be tools to be able to use it. It sounds like the heart of the problem at a lot of library technology where, like, I keep, I keep saying this is just like there's a very small selection of vendors that have a very large control and they just keep conglomerating together. So there's like three now.
Starting point is 00:29:11 And somehow libraries, who are the ones who are using the tools, are the most powerless people in the whole ecosystem of it, right? So a big topic at my work lately, and maybe a tangent here, is why the fuck are we still using SIP2? Like, like, yeah, can't blame me on that one. Like, I don't know if you're like, like familiar with SIP2, Jonny.
Starting point is 00:29:40 It's basically a protocol. So interlibrary, or integrated library systems, the ILS is the biggest software that libraries use, like to keep track of all of their stuff, it's basically the protocol that passes information between like these systems, right? So like a lot of vendors use SIP. So like OverDrive, you know, like OverDrive has to know what you already have checked out to be able to enforce your limits. Like you can only have five books checked out. So it uses SIP to query that information from your library system, right? It is
Starting point is 00:30:20 entirely unencrypted. Clear text, unencrypted. And has been its entire life. And SIP2, which is different from the IT SIP, which is a VoIP protocol, which causes no end of confusion every time people are, like every time we have to talk to a vendor IT
Starting point is 00:30:41 to figure out how to set something up. I just totally gave myself, if a single one of my co-workers is listening to this, I just absolutely gave myself away, because I've had this conversation so many times. But yeah, it's like, and it's been in use for so long and all of these integrated library systems, it's the only one that is actually usable, like actually, what's the word I'm looking for, agnostic, system agnostic. So it's starting to be replaced by a lot of APIs, but each API for each system is its own thing. So you have to wait for other like, you know, oh, we could do this API. We could do, I don't
Starting point is 00:31:19 know if this is true, we can do Alma's API, but we can't do Sierra Millennium's API. So it's just like, in IT, it's just like, why the fuck are we still using this? And then we talk to people like vendors and they're just like, well, what's the problem? And we're like, it's completely clear text and requires extra tunneling to be able to actually keep our patron data, like, not readable over the internet. And it's sent all over the entire internet for anybody. And like looking at the strings, it's literally like library card number, name, address, you know, number of checkouts. Like it's just like, it's so ridiculous and people are still just like, well, I don't understand what the problem is. Until you talk to an IT person and you say it's in clear text, it's completely unencrypted and they go, oh, that's bad.
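To make the clear-text concern concrete, here is a small sketch of what "anybody can read the strings" means in practice. The message layout below is invented for illustration and is not the actual SIP2 wire format: the `PATRON` prefix, the `key=value` fields, the sample patron, and the `eavesdrop` function are all hypothetical.

```python
# A made-up, simplified patron message. Real SIP2 messages use a different
# layout; the point is only that the fields travel as readable text.
message = (
    "PATRON|card=21234000012345|name=Pat Example"
    "|address=1 Example St|checkouts=3"
)

# What actually crosses the network when there is no encryption layer.
wire_bytes = message.encode("ascii")


def eavesdrop(captured: bytes) -> dict:
    """Parse captured bytes the way any observer on the network path could."""
    text = captured.decode("ascii")
    _kind, *fields = text.split("|")
    return dict(field.split("=", 1) for field in fields)


print(eavesdrop(wire_bytes))
```

No key, password, or special tooling is needed to recover the card number, name, and address. That is why the traffic has to be wrapped in a separate encrypted tunnel if the protocol itself offers nothing.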
Starting point is 00:32:13 But no libraries have like the power to go to these freaking vendors and just be like, you have to figure something else out. Something has to be worked out. But it's going to end up being, you know, OCLC who does that kind of stuff or something like that. And then, yeah. It would be nice, though, right? And they're vendor patsies. That's all we are. In a lot of ways, yeah.
Starting point is 00:32:38 Yeah. Yeah, what was it Bree said in the scholcomm Discord? ACAB includes NISO. Yeah. Yeah. Absolutely. So like, I still don't think I understand entirely what linked data is,
Starting point is 00:32:55 but I do think that I like, I can start to get to it. If you know what I mean. Because yeah, like it's, it's just, it's a system. It's a system to connect data to other data in meaningful ways. And it once had the promise to actually help libraries figure shit out, and it has completely kind of shat the bed on that. Is that an accurate summation? That's completely accurate. I still have tiny little sparks of hope. I do.
Starting point is 00:33:30 Did we describe, did we describe why it's called the semantic web? Oh, I don't think we did. Jonny, I'll leave you that one. It's a really simple story. It's like being like, web happened, right? And so web is documents with links between them. But those links are meaningless. They're just the relationship from one page to another. And it's hard to imagine this in retrospect of a web without search engines or without any sort of overlay to them. Because basically the way that everyone interacts with the web now is either through search or through some mediating discovery mechanism. Like you don't just go on the web and then go to a URL and then just be
Starting point is 00:34:12 like, well, I'm here now and just like, I've found the internet. And it's like, yeah, so like, that's like the way that the web was sort of designed and like the way that was supposed to work is it just like it would be self-organizing, where the like literally, like when you go back to like the founding documents, it's like, we will just have people that have lists of links on their personal websites and they will link everything together and then just like people will find their way from these like local nodes of meaning. And the imagination there was always that just like the web would be super easy for the average person to make a website on. And that just like everyone would basically have one.
Starting point is 00:34:50 And that didn't work at all. Not even close. Not even from the very beginning. Where just like, you know, it was the case for just like the ultra nerds that were on the internet at the very first part of it still gravitated towards sort of like mediating platforms like bulletin board systems and et cetera. So the semantic web was supposed to be a way of encoding computer readable information into the protocols of the web and specifically into HTML documents that are, you know, that are XML, a dialect of XML. I don't even know how to describe the relationship between HTML and XML. But like that like so that it would be possible to both annotate a given page and then also just like be able to link them together so that you'd have this sort of like, you know, coexistence between documents that people are on that have like, you know, human readable text and then embedded within that and embedded
Starting point is 00:35:42 between that are just sort of like, in this paragraph, I'm talking about this person. And like, then I can sort of like say, go to that page and theoretically go and find backlinks to all the times that that person was mentioned or something like that. And so that's like why it's called like the semantic web is we're adding semantics to the web, which formerly was just sort of like naked links and documents. Yep. Like the computer could understand that Jonny is a person because it knows what those URIs are and what they point to, and it can then tell what the relationship between those are, not in a way where it knows what a person is, but it knows what this URI is.
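A rough sketch of the idea being described here, using plain Python tuples rather than any real RDF library. The `https://example.org/` URIs are made up for illustration; the `rdf:type`, `schema.org/Person`, and `schema.org/mentions` URIs are real vocabulary terms, but how they are combined here is only an assumption about what such data might look like. The point is that the program never knows what a person is: it only matches URIs.

```python
# Hypothetical identifiers: nothing under example.org is a real record.
EX = "https://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # real RDF term
PERSON = "https://schema.org/Person"                           # real schema.org class
MENTIONS = "https://schema.org/mentions"                       # real schema.org property

# A graph is just a set of (subject, predicate, object) triples.
triples = {
    (EX + "jonny", RDF_TYPE, PERSON),
    (EX + "dorothea", RDF_TYPE, PERSON),
    (EX + "episode-135#para-3", MENTIONS, EX + "jonny"),
    (EX + "blog/post-1", MENTIONS, EX + "jonny"),
}


def instances_of(graph, class_uri):
    """Everything typed with the same class URI ("they're people too")."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == class_uri)


def backlinks(graph, target_uri):
    """Every subject that points at target_uri, other than by typing."""
    return sorted(s for s, p, o in graph if o == target_uri and p != RDF_TYPE)


print(instances_of(triples, PERSON))
print(backlinks(triples, EX + "jonny"))
```

Matching on the shared URI is the whole trick: the "backlinks to all the times that person was mentioned" query is a filter on the object position.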
Starting point is 00:36:23 And if you use this URI, then it sees other things that have that URI and knows that they're people too. And there's a certain amount of magical thinking that like, because language sort of works this way, that it's like entirely relational, and metaphor-based and like, you know, the meaning of a word is only sensible in context of surrounding meanings and contrast with similar, you know, that just like meaning would emerge. And like, again, like, that's sort of true. Like, language does work like that. Just like sort of local negotiations over meaning and indigent.
Starting point is 00:36:56 But like, you need to have the people there negotiating in order for it to work. And that never really existed. So just like, so like there's, and it sort of like points to one of the salient features that is both like, it's like, you know, eerily prescient, but also just like another one of these like critical pieces. We're talking about just like the missing tools is like from the very beginning, like there's this 1999 piece in Scientific American by Tim Berners-Lee that was like sort of like the public announcement of like, you know, the existence of the semantic web as a product. I remember reading that. I was at work. I remember reading it.
Starting point is 00:37:33 And so it's this wonderful document and just like that like is like this very pie in the sky kind of system of, you know, beliefs about just like what it could be. And like there's a bunch of just like really basic and obvious things that like, wow, we should really have the computers work like that. Where just like, you know, like the idea that I have a calendar appointment or whatever. Why can't my computer know that like I also have a photo that was taken on that day? So I can just like say, computer, find me the photos that were taken during this appointment on my calendar or something like that. So like a sort of universal acid for this data where just like I can just relate, you know, totally heterogeneous systems between one another. But the part that's like really like, you know, come to be, we all like thinking about just like AI is like, you know, this year and this last year being like that it was always going to be dependent on compute, that it's just like there's metadata there,
Starting point is 00:38:34 but even from the very beginning, you need what Tim Berners-Lee was talking about as agents, like as about just like little bots, little scripts or whatever, that are running around getting all of this metadata around. And this is like around the time when Google and like the first algorithmic search engines were starting to exist.
Starting point is 00:38:52 So like this idea of crawlers ingesting this information and making sense of it was like a relatively new one, especially like at a mass scale like this. And like that's, but that's always been the tension where just like, like say it's like talking about like, what is it, where do I touch it, like how am I supposed to use it, that just like that was sort of always the intention with that, just like you would have like a little computer butler thing that would just like be going out and like you could, you have your own set of commands, just sort of like go get this for me, go fetch this for me. But like, yeah, like again, it's never really materialized just because like, you know, how, with what infrastructure does the average person have a constantly running bot that goes out and scrapes the web for them all the time? And so even from, yeah, from like there are a couple of moments in the history of the semantic web of like, you know, times when Google basically bought it. And that happens actually several times.
Starting point is 00:39:48 Where just like, this sort of domestication of this process where like now like when you think about it's like, where does it exist, how does it exist? Pretty much the only way that people usually interact with it is like the metadata, the Open Graph metadata, and like, well, that Open Graph is slightly different, but like the JSON-LD document that you'll have at the top of your website header that is just like using schema.org terms to say that this is a website about an organization or an event or whatever. And like as Justin was saying in the beginning, just like sometimes it makes the Google info boxes work.
Starting point is 00:40:23 And like that's pretty much the most concrete realization that the average person has for linked data on the average, and that's because who owns the crawler? Google owns the crawler. And so it becomes something where you make metadata available to be crawled by Google in this very constrained, commercially focused context. But it's not a system of expression. And just one more thing is like there's like this other technology, like RDFa, like this dialect of RDF, which is supposed to be like the thing that goes embedded in documents where like as I'm writing, I will tag a particular paragraph as, you know, with some semantic web tag or something like that.
Starting point is 00:41:04 That's arguably one of the main attempts at making a human linked data interface for that. Where just like, you could imagine I have like a document editing software or something like that and I can highlight a sentence and add a tag to it or whatever. It's like actually embedding this in documents that people actually use.
Starting point is 00:41:24 That is actually no longer supported by the main RDF parsing library, rdflib in Python, because it's complicated to parse, but also it's just sort of like, that's not really the important one. It's like, you know, for all these like mushy positional document tags and stuff like that, and people don't really want to know the information in context, they want it all split out into like, you know, something where I can do an HTTP request and just get the headers and that's it. And so like, it's like, it's just, one of these, this mutating landscape of technology always ratchets more and more towards.
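For a sense of what that header document looks like, here is a minimal sketch that builds a JSON-LD block using schema.org terms and wraps it in the script tag a page would carry. The event name, date, and organizer are invented for illustration, and the helper name `jsonld_script_tag` is hypothetical; only `@context`, `@type`, and the `application/ld+json` media type come from the actual JSON-LD and schema.org conventions.

```python
import json


def jsonld_script_tag(data):
    """Serialize a dict as JSON-LD inside an HTML script element.

    "</" is escaped as "<\\/" (a valid JSON escape) so that a string value
    cannot close the script element early.
    """
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Invented example values; the keys are real schema.org properties.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example linked data meetup",
    "startDate": "2024-08-28",
    "organizer": {"@type": "Organization", "name": "Example Library"},
}

print(jsonld_script_tag(event))
```

Nothing in this block is visible to a person reading the page, which fits the point being made: it is metadata written for whoever runs the crawler.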
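To make the "tag a sentence as you write" idea concrete, here is a deliberately simplified sketch that pulls RDFa-style annotations out of a paragraph using only Python's standard `html.parser`. The attributes `vocab`, `typeof`, and `property` are real RDFa attributes, but this is not a conforming RDFa processor: it ignores nesting and scoping rules, and the class name and sample sentence are invented for illustration.

```python
from html.parser import HTMLParser


class RDFaLiteSketch(HTMLParser):
    """Collect (type, property, text) from elements carrying RDFa attributes.

    Simplification (an assumption of this sketch): the most recent `typeof`
    stays in effect for the rest of the input; real RDFa scopes it to the
    element that declares it.
    """

    def __init__(self):
        super().__init__()
        self.current_type = None
        self.current_property = None
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.current_type = attrs["typeof"]
        if "property" in attrs:
            self.current_property = attrs["property"]

    def handle_data(self, data):
        # Only text directly following a property-bearing tag is recorded.
        if self.current_property and data.strip():
            self.found.append(
                (self.current_type, self.current_property, data.strip())
            )
            self.current_property = None


# Invented sample text; the person is only an example.
paragraph = (
    '<p vocab="https://schema.org/" typeof="Person">In this paragraph I am '
    'talking about <span property="name">Ada Lovelace</span>, who was a '
    '<span property="jobTitle">mathematician</span>.</p>'
)

parser = RDFaLiteSketch()
parser.feed(paragraph)
print(parser.found)
```

The annotations live inside the human-readable sentence, which is what makes this approach both friendly to authors and awkward for parsers.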
Starting point is 00:42:00 It's intended for doing the big web of open data that you're not a part of, but you get to experience through platforms. And a lot of platforms are, in fact, powered by linked data, at least, if not RDF, Knowledge Graph(TM) derivatives of that idea, where, like, it is an extremely powerful set of ideas, but not for you. So if you, but if you are a company that exists as a giant conglomeration of data sets that you've bought by acquiring smaller companies over time, it is an incredibly powerful system for integrating all of that information and being able to do complex queries across them. So in that piece of surveillance. For Tim Berners-Lee,
Starting point is 00:42:45 not for thee. Exactly. And increasingly for the surveillance state and just like the people who have this nightmarish, multi-sided market of selling your data to insurance providers at the same time as selling it to police at the same time as selling you back a little slice of it as well. So, like, yeah, the way that it exists now is largely in the shadows and that's by no means a passive effort. There's an active corralling and an active, you know, domestication of this set of ideas. Yeah. And to bring it back to tooling for just a second, some of the more pro-social, I guess I will use that word, experiments in this space, like Wikidata, for example,
Starting point is 00:43:31 are already running up against the absolute limits of what you can do with linked data if you're not like Google. They've already, and the technical details here completely escape me, but Wikidata has gotten too big for its britches. The infrastructure literally cannot cope with it anymore, so they're sharding it, is my understanding. They're kind of splitting it down the middle and figuring out how to get the two shards
Starting point is 00:43:59 to talk to one another, which I'm sure is really exciting technically, but wow, that's not great for those of us who are not Google, but are interested in this technology stack. Did you see the cause of this issue? It's the underlying database software, Blazegraph that it's running on.
Starting point is 00:44:20 Amazon hired away all of the engineers, So they're, oh, great. Yeah. So. All right. Typical. So again, this is like, the big company is literally buying the underlying technologies.
Starting point is 00:44:32 Where just like, you know, software needs maintenance, you know, that like, that it needs maintenance and needs constant improvements. And just like to be able to handle an ever growing stack of triples like Wikidata, you need to have active maintenance workers. And like, who pays for open source work? Like, if I'm the, if I'm a software developer and Amazon says, here's, you know, 250K a year to make the, do the thing you were already doing for free, then it's like, sure, I have a family. You know, I, you know, I'd like to have like, you know,
Starting point is 00:45:03 go on vacation sometimes. And so like, yeah, it's just like, yeah, that actively, that, that there was another moment of like, yeah, actively poaching away the talent so that like the underlying technology can. And I will say for all that we are cultural heritage organizations founded on the idea that culture should persist, we're very bad in libraries and archives at admitting that software needs maintenance, that standards need maintenance, right? That's the SIP-2 problem in a nutshell, though that was proprietary, actually.
Starting point is 00:45:39 So Ruth Kitchin Tillman and I wrote an article, got published about a year ago, about the ethics of linked data sustainability. You can find it open access online. And we took a pot shot, actually. Okay, we, I took a pot shot. This one was my. At information scientists, okay?
Starting point is 00:46:01 Because there are too many information scientists who are serial project and standard abandoners, right? They get grant money to do this fancy-dancy thing, and they get as far as it being implemented in libraries. and then they just wander away to do the, to write the next grant application and do the next fancy-dancy thing, and then it rots.
Starting point is 00:46:27 Totally. Right, whatever they built, it rots, because inevitably they didn't build it right in the first place. I'm totally thinking about OAI-PMH here since we have some, some ScholComm folks in the room. But SIP2 is another beautiful example. Gosh, we are so
Starting point is 00:46:42 bad at versioning stuff. It's a really basic idea. You gotta version stuff. You can never get it right the first time. So, yeah, I in that article took a pot shot at serial project abandoners and said, funders, stop funding them. Ask what happened to their last three projects. And if they're dead in the water, that's a black mark.
Starting point is 00:47:06 For real. Yeah, this is a general issue in any sort of, like, you know, publicly funded tooling space. Is that just like, like, I was allegedly on some review panel for some funding agency that is theoretically talking about software sustainability. And that was a completely novel concept that just like what we want to do is we want to fund sustainable software ecosystems. That just like we're not trying to start a new project. We're not trying to like, you know, fund the new feature.
Starting point is 00:47:40 But just like these are the already existing things that are happening in open source. And let's just keep that going. Like paying for like stuff like documentation and like making the tests work and like, you know, years and years of technical debt and like security audits. Yeah, totally. Yeah. And please, yeah. And so this is like one of, one of my entry points in the thinking about semantic web and thinking about just like linked open data was just like initially thinking about, because I was like living with someone who is like working in metadata in a library at the time, and there was this like increasing cry of just like the, we all know the journal system is broken. And like, there's this recurring strain of papers that are just sort of
Starting point is 00:48:21 like, let's just like make the libraries do it. You know, just like that just like we can sort of like get libraries to host a bunch of journal like things, journal like overlays or whatever, completely ignoring the reality of work and the reality of bureaucracy in libraries that just like, and, and so like, you know, you wonder who I'm talking about that. Oh, I don't have to wonder. I didn't talk about it out. Yeah. And so, like, that just like, this is where, like, on the one hand, it seems like an obvious thing where just, like, of course, like, it seems like libraries in general in the abstract should be invested in just, like, you know, maintaining some their catalogs at least.
Starting point is 00:49:07 But just like, also the all the other things that just like, you know, that are being archived and cataloged and just like, you know, exist in libraries and just like making that as available as a public catalog. Like, sure, surely they're already doing stuff like that. So it shouldn't be that much of additional effort to have an institutional repository that acts like a journal and can link together these things. But, as you all know. I keep going back to tooling. Yeah. The tooling was shit. The tooling for open access is and always has been shit.
Starting point is 00:49:41 Right. Yeah. And so it's just a matter of like, definitely. like there is this universe of like, where like, okay, we could get sort of some of these things aligned, like funding priorities for maintaining sustainable software. Okay, if we can then like get some sort of like IT consortium to help out with like maybe, you know, quote unquote, public cloud.
Starting point is 00:50:06 So it's not the case that just like every library needs to have like an on-prem IT team. And just like there are some of these things that could like lock into place that just could theoretically make some of this work. but just like, that's just not the way academic work is done generally. That's just not the way it's structured to make these sort of like long-lasting infrastructural efforts. Like, as you say, that these are just like grant cycle to grant cycle. Let's just like ride to the next thing. And even within, so like part of my role in the last six months of work is like I'm
Starting point is 00:50:37 working with actually a lovely group of people who I like and they have welcomed me in, so I'm not trying to speak ill of them at all. But this is a linked open data project. And basically what I've been trying to do for the last like six months is like pay down technical debt. Where just like there's this like really good idea of this like this way of having authorable linked data schemas. It doesn't require you to be part of the priesthood to be able to describe what exists in your reality. But it's just like it didn't really work. It's just sort of like that it's just like the people that are concerned with the,
Starting point is 00:51:14 modeling part about the like, what, you know, what is this kind of thing, do we put it in this category, like this, like are not usually the same people who are just like going to be able to write a really good implementation of that. And so like trying to figure out how to make those collaborations happen as well, because this is another point where like I don't see this as a thing that really could exist or come from any sort of startup. Like rest in peace to the Solid project, which I have been trying to find for several years and I keep seeing little promising scraps of it. But this is like, so Solid was like the thing that Tim Berners-Lee was like, this will be the semantic web, like, the thing that we're trying to like do to, so it's like, he had like a crisis of conscience. Like actually
Starting point is 00:52:01 the web sort of sucks. Like, like, I think around like 2015 and 2016 and like, you know, starting to be just like, okay, let's try and make Solid as like a way for people to do the like the more like vernacularist dream of the semantic web, where I have my, like, this, now they're talking about like ActivityPods. Like, I have my little unit of my semantic web, like, information graph. But that quickly got bogged down in the academic cycle. No one could manage a project. Then they spun that off into a startup. And wouldn't you know it, once that happened, then it became owning your own data was a bug, not a feature. And so now you have to, you're supposed to be pushed on to, like, renting a cloud server for it and so on. So I think that just like this doesn't come from
Starting point is 00:52:47 startups or from any sort of like company. It also doesn't come from the scattered weights of open source world. You can't just like ask people to do it for free. And it also doesn't come from just like local efforts of like trying to make tools for like an individual institution. And so just like what's left is like we need to use some sort of public funding and try and rally rally public funding in a way that it's not designed to be allocated in order to like make these kind of technologies and also the belief that there should be these technologies in the first place in order to make that real and so like that's this just like this an unending knot of like who do we who is the next little thread that we need to pull in order to make this large tapestry but then like
Starting point is 00:53:33 you you're dealing with 25 years of baggage at the same time so it's like a lot of the people that are still in that space either have distanced themselves from it and look back on it with this chain of mixed emotional memories, but I don't want to touch that anymore. Or they're like in some way still true believers that just like, what do you mean? Nothing is actually broken. It's totally fine. And like you just need to learn how to do it good. So.
Starting point is 00:54:05 Awesome. Yeah. So, like, and so this is like one of the reasons why I'm just like, like, we were talking about this earlier today as being like that in some ways, like, talking about like serial project abandoners, protocol abandoners. And just like there needs to be like a break in a way that's like backwards compatible. We bring the past with us or like or have some way to like carry it through with us. But we're not beholden by all of this baggage that. And and so I don't know. Like I'm talking about just like what happens in the future.
Starting point is 00:54:35 I guess, I don't know if we've even gotten past the expository part of what even are we talking about yet. But like, maybe I'm jumping the gun there. But like, yeah, last thoughts on that idea is like that's another part. Like the twin entry points for me into this whole line of thinking are just like thinking about just like what could be an alternative to scholarly communication and publishing. It just like it should be possible for me to throw stuff up on the web and then have it be part of this sort of blob of information without a lot of gatekeepers in the way. The other part of it is that even long before I got interested in, I keep coming across these various like graveyards of things that are just like,
Starting point is 00:55:17 this is a really cool idea, like a browser extension that, like, everywhere I go, I can make sort of personal annotations and not just like bookmarks, but just like I highlight this section and then I can relate it and share it to my friends. Like, oh, actually, that extension was for like Netscape 6.0 and like was abandoned 20 years ago. And like no one has thought about this ever since. And just like this long string of just like dead projects that are, that are exactly like this. Because again, like thinking about like the kinds of open source projects that work and like are sustainable are usually ones that have some material tangible benefit for the people that use them day to day, like this is a tool I have active use for, or they're baseline behind-the-scenes infrastructural work that like a lot of companies that will just like sort of rely on them. Like the types of, like this niche of technology, just like what you have to have in order to use it are, A, a website, so that rules out 99% of all people, and then, B, like a website where you are deeply in control of the HTML that goes on that page, and that rules out 80% of the remaining 1%.
Starting point is 00:56:29 And so, like, there just, yeah, there never was a time when it had like an actual practical use. And this is something that just like gets called out as early as, the earliest I've seen of people saying, what is the point of all this was like in 2005 and 2006, where just like there's a series of these blog posts of just like abandoning the semantic web. It's like, no one actually figured out why we're doing this at all.
Starting point is 00:56:56 Like, there's one interesting example of, like, music annotation, or just, like, it's sort of like a peer-to-peer-ish music system. And then that's it. Like, the rest of it is totally pointless. Like, why would I ever do this? And the first, like, invest all this time into learning these incredibly complicated parts of it. Because, like, one of the things that we're missing in the exposition stack is, exposition section is, like, the sort of stack of things that the data is.
Starting point is 00:57:24 like you have the triples part which we talked about, but then you also have like ontologies and schemas and just like the way that these things all sort of relate together. And it took me a year to even figure out what these meant and like what they look like and why they existed and just like why is a schema different than an ontology? Like that seems like the same sort of thing, but like different roles in the ecosystem.
Starting point is 00:57:52 And also definitely different. But like, just to say that just like, Why does neither of a record constraint language? Yeah. Ontology means that your professor goes on tangents about first order logic when you're learning it. That's right. Yeah. Yeah.
Starting point is 00:58:07 And schemas are on schema.org. Exactly. That's how you know they're schemas. Also, was the music project you were talking about Linked Jazz? I will look up this. It's in this blog post abandoning the semantic web. I'll see if I can find it. Because Linked Jazz rules.
Starting point is 00:58:24 Yeah, that's a great little site. I love it. That was like the first I ever heard of linked data. I was like an undergrad still working in a music library. Sure. And my, and my like mentor, professor, or not professor, my mentor, like, boss was like, this is the coolest thing I've ever seen in my life. Well, and, and, and, and, and music in particular in a library context is actually a really wonderfully subversive place for linked data to get a foothold. Because MARC for music is so bad.
Starting point is 00:58:58 It's terrible. Music cataloging, like music copyright, is something that even seasoned professionals will not touch. Yeah, music cataloging has its own rules. I mean, heaven, blood, but wow, MARC was just not designed for that, and it shows. Oh, it shows. It shows.
Starting point is 00:59:21 Yeah. Back to the explaining part of things as well. One of the main benefits always sold about linked data is that since the web is sort of a page or document focused sharing of information, this would allow subsets of information to be pulled. Like Jonny said, pulling like all the headers from an article with a request. The thing is that like without like, I could pull 9,000, I don't know, 500 fields from a
Starting point is 00:59:52 MARC record, what do I need that for, because I don't know anything about the context of it without the full document? Plus, I'm guessing that's probably why it's so computationally heavy is that everything has to be done through servers, whereas documents can be retained locally. And it's mostly just text files, right? So it's sort of the same problem blockchain had where everything had to be done computationally. And that's why it took 20 minutes to buy a donut because it had to get pushed out to like 20 ledgers. And instead, this is like, if I want to query information, it has to go through different servers, which I think was kind of the idea of websites that heal. I have it pulled up.
Starting point is 01:00:33 It's John Rhodes blog posts. But when Johnny was talking about bots, I think that was the idea was websites, like link rot would happen between them. And eventually bots would just kind of communicate server to server constantly and then just fix links. And they would heal themselves. and that was kind of the idea. And that blog post ended with, if anyone wants to write this, I'll help. But until then.
Starting point is 01:00:59 But that's the thing is like, it's very difficult to do that because if you've ever worked with like government websites, particularly like healthcare websites, every presidential administration, stuff moves, entire divisions of the government. And so they're on completely different domains. And that's why government websites always break. And like really important ones.
Starting point is 01:01:19 And that's also why the, the government tends to do a lot of like dot coms now where it's just like healthcare. Healthcare.com. Okay, just go there and we'll point it wherever it ends up because trying to keep, because I was an allied health librarian and trying to keep those pages about like the Affordable Care Act up to date in LibGuides. I mean, thank God it has a very good link checker. But I constantly had to run that link checker because those things broke all the time.
Starting point is 01:01:47 They don't even keep their PURLs or whatever it is they use, because like one year in grad school I was the gov docs library graduate assistant and half of my job was just like going through SuDoc stuff and then also like checking the PURLs or whatever permalink system that government websites and online gov docs use and just finding all of the broken ones, which was all of them. They don't like, they don't even maintain their permalinks. Yeah, which is the point of permalinks, is so that the URL itself can change.
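The permalink mechanism being described is small enough to sketch. This is a hypothetical in-memory resolver, not how purl.org or any government PURL service is actually implemented: the class name, paths, and URLs are invented, and returning a 302 redirect is an assumption about one reasonable behavior. It shows why the scheme only works while somebody keeps the table current.

```python
class PermalinkTable:
    """Hypothetical permalink resolver: a stable path maps to a movable URL."""

    def __init__(self):
        self._targets = {}

    def mint(self, purl_path, target_url):
        """Create a new permalink; refuses to silently overwrite one."""
        if purl_path in self._targets:
            raise ValueError(f"{purl_path} is already minted")
        self._targets[purl_path] = target_url

    def retarget(self, purl_path, new_target_url):
        """The maintenance step: point an existing permalink at a new home."""
        if purl_path not in self._targets:
            raise KeyError(purl_path)
        self._targets[purl_path] = new_target_url

    def resolve(self, purl_path):
        """Return (status, location), redirecting if the permalink is known."""
        target = self._targets.get(purl_path)
        if target is None:
            return (404, None)
        return (302, target)


# Invented example: a document moves when an agency reorganizes its site.
table = PermalinkTable()
table.mint("/docs/report-1", "https://old-agency.example.gov/reports/1.pdf")
table.retarget("/docs/report-1", "https://new-agency.example.gov/archive/1.pdf")
print(table.resolve("/docs/report-1"))
print(table.resolve("/docs/never-minted"))
```

If nobody calls `retarget` when a document moves, the permalink keeps resolving to a dead address, which is the failure described in the transcript.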
Starting point is 01:02:25 Well, was that OCLC again? Yeah, always. Yeah, that was actually the very example that Ruth and I wrote about in our piece, was OCLC and purl.org, which was not originally OCLC's. It was a grassroots little thing for, okay, here's a place where you can mint
Starting point is 01:02:46 permalinks and we'll keep the database of where they point to and everything will just work and we'll have a happy permalink utopia. And then with absolutely no warning, some years after OCLC took over purl.org and made a very loud
Starting point is 01:03:05 statement about how it was very important and they were going to maintain it. And definitely, it broke. They broke it. I don't know the details. I think the person who had been maintaining it left, retired, who even knows. But purl.org just completely broke. OCLC, of course, didn't give a fuck. And it remained broken for like several years. And now the Internet Archive eventually
Starting point is 01:03:35 took it over and they don't give a fuck. So you can't actually get any support for it. And a bunch of innocent third parties who believed OCLC's lies and gleefully minted all kinds of PURLs because they thought that infrastructure was going to stick around got burned, right? This idea that Justin, I believe, was talking about of self-healing websites. Right, that is nonsense. That is garbage. The world does not work that way.
Starting point is 01:04:07 The world needs maintenance. Yeah. And so there's like the whole nest of ideas about, it's like roads not taken in the internet with a lot of this.
Starting point is 01:04:09 Because it's like, I have the same feeling about just like, permanent IDs as I do about just like in general when I see like yet another platform for scholarly communication or like we're going to fix the ills of like academia by making yet another platform, is that just like this is intrinsically a political one where, and it puts, and it's one where you are putting power in the hands of a specific organization that just like, and the longevity of that is strictly social, where just like, it's the same way, it's like permalinks exist as long as the organization exists. And so, like, I have, in general, sort of, like, more faith than average that archive.org will continue to exist in the next year, although they're sort of, like,
Starting point is 01:05:01 damaging that reputation lately to sort of, like, like, just like, you know, anyway, we won't go there just being sort of like, I think that they have good longevity plans for their archive of the web. Okay. But, and I also, in general, think that, like the DOI system is probably not going anywhere. That's largely because it's like, you know, one of the mechanisms for extracting billions of dollars from public funding every year.
Starting point is 01:05:29 Then just like, so there's like social reasons why these things persist. But it's like there's the major thing that was not taken. Like why the like as you're saying is like the web doesn't work in such a way where it would be possible to do self-healing websites or self-healing links is because it's designed to be a client to server. you go to a place and get something that someone else controls entirely and like you're not actually supposed to have any agency in this in this world and like there's good reasons for that
Starting point is 01:05:58 don't get me wrong but just like this is like one of the true things about linked open data is that just like it needs to be peer to peer like the thing that like the way that it could conceivably work is as a peer to peer system where that's where there it's possible to do efficient querying and caching between a bunch of different peers. So it's like designed to be distributing labor in this way instead of every time someone updates a link or makes a new record, everyone has to go and hit this one server to get this one, you know, URI that represents this core concept or whatever.
Starting point is 01:06:35 And so, like, as long as that doesn't exist, there's this duality to this beautiful idea of basing semantic web and linked data on URIs. There's an elegant simplicity to this idea that the identifier is actually a location, that location and identity are the same thing. And when I go to that location,
Starting point is 01:07:00 I'm supposed to get something useful from it, and then that allows me to go to the next thing. That's like a wonderful, wonderful idea. But in reality, it doesn't work at all, because identity and location are not the same thing. You know, for one, one reason is identities change. And so there's this, you know, classic thing that everyone always references on the web, which is "cool URIs don't change." That's another Tim Berners-Lee classic. It's like, actually, all URIs change all the time.
Starting point is 01:07:34 And if you have a polemic trying to force something to behave in a way that it doesn't, rather than adapting to the reality of that thing, then yes, you buy yourself an infinite failure. And so, like, one of the... there's this raised hand. Please. I just want to jump in. Yeah, we do the raise hand thing, like, you can keep going and then when you're done Sadie will say something. But also, just, like, interrupt. I actually was starting to make some notes to organize this thought, because this is a long idea. So, like, yeah. Oh, I just, I've been thinking a lot about: the purpose of a system is what it does. Completely. Right.
Starting point is 01:08:22 Not what it thinks, it's not what it was designed to do, because we all know how design goes awry. But yeah, the purpose of a system is what it does. Right on. And I just think about, I don't remember where I saw that. I love systems theory. Yeah, right. If anyone has ever maintained a website or any sort of web technology, we're just like, if the intention of this thing is to be liberating and freeing, it certainly doesn't feel that way, that just like, that like, you know, what it would take to actually maintain a URL for forever. Like, if that's the way the web is supposed to be, that's the purpose of the web is to, like, put these documents on the web.
Starting point is 01:09:02 Like, it doesn't do that. So, yeah, exactly. The purpose of the system is different. And, again, thinking about all the ways that the technical development has been stunted by the, you know, commercialization of the web, which precluded a lot of these things from existing: it's not an accident. And so one of the ways that linked data is working en masse right now, in a pretty invisible way, is the Fediverse. And this is what we were talking about the last time I was on here, so I won't belabor the point. But that's built on linked data, at least in the abstract. And there's this sort of fascinating, like,
Starting point is 01:09:40 realization of that where, for example, Mastodon, the largest implementation of that, does not actually use linked data as its internal data model. That's all, like, a Postgres database that it then sort of just synthesizes JSON-LD out of. And there's benefits and tradeoffs to that, where, as a result, it sort of doesn't do all of the linked data parts of what ActivityPub was supposed to do.
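Editor's sketch of the "synthesize JSON-LD out of a database row" pattern described above. This is not Mastodon's actual code or schema: the row fields, the `social.example` domain, and the URL layout are all made up for illustration. The only parts taken from the real spec are the ActivityStreams 2.0 context URL and the standard property names on a Note.

```python
import json

# Hypothetical relational row, standing in for whatever a server keeps in its own tables.
row = {
    "id": 42,
    "account": "jonny",
    "text": "Linked data is a social problem.",
    "created_at": "2024-08-28T12:00:00Z",
}

BASE = "https://social.example"  # placeholder domain, not a real server


def row_to_note(row: dict) -> dict:
    """Render one database row as an ActivityStreams 2.0 Note (JSON-LD)."""
    actor = f"{BASE}/users/{row['account']}"
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{actor}/statuses/{row['id']}",
        "type": "Note",
        "attributedTo": actor,
        "content": row["text"],
        "published": row["created_at"],
    }


note = row_to_note(row)
print(json.dumps(note, indent=2))
```

The tradeoff mentioned in the conversation shows up here in miniature: the JSON-LD only exists at the edge, so anything the tables do not model cannot be expressed, however flexible the vocabulary is.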
Starting point is 01:10:09 But one other major alternative to this is Pleroma and Akkoma, the fork of Pleroma, that is based on a graph database. And that can do a bunch of really interesting things, but it also is, like, always crashing all the time and sort of hard to... because it's like, you know,
Starting point is 01:10:31 social networks are networks. They're easily modeled by a graph. And so, doing something as simple as, like, there's this notion of containers and ordered collections and stuff like that in ActivityPub. And I have, you know, obviously lots of feelings about this particular spec, but one of them is this notion of who I'm addressing my message to, and I should be able to address it to whoever I want to. I can address it to this one controlled ontology term, Public, and that's just, like, I'm sending it to the world. But also it should be possible for me to have collections of people, and I can address it to this collection of people. And so in that way I have a graph, and all the relationships are modeled within ActivityPub as being like, I'm allowed to send it to these people and I want to send it to this subset of them in this particular
Starting point is 01:11:27 case. And so you can do stuff like that in Akkoma and Pleroma. Like, the UI for it is a little less than what could be desired, but that's not something you can do in Mastodon, where each one of those addressing features has to be carefully architected as a database query. So there's this tension of just, like, okay, we try and do it the semantic web way. It has the beautiful possibilities, but it's, like, really hard to implement. And one of the things that's hardest, that was an extremely big reach and was really only done and made to work by just the sheer hegemony of Mastodon as, like, you know, the thing that if it does something, everyone else has to adapt around it, is implementing editing. Like, you know, thinking about just, like, I have a post, I want to edit that post.
Starting point is 01:12:12 That means I have to propagate that new version out to everybody else. And so, like, thinking about just like what it would take to have like these sort of self-healing websites or just like the ability for the web to adapt to change is like you need to have that expectation that just like everything that I know about, I should be able to receive changes and be able to propagate those among the people. in the same way that just like, that's how rumors and horizontal information transfer works generally is that just like, oh, I heard that this new thing happened. And I tell my friends about it and just like, you know, maybe and doing so in a way that's like actually safe and that is resistant to counterfeiting is a remarkably hard thing to retrofit into a system. And so like that's like.
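Editor's sketch of the two ideas just discussed: addressing a post to collections of people, and propagating an edit to the same audience. It is a minimal illustration under stated assumptions, not any server's implementation. The actor and collection URIs are invented, the in-memory `COLLECTIONS` table stands in for real collection lookups, and actual delivery (HTTP signatures, inboxes, retries) is left out entirely. The `Public` URI and the `Update` activity type are the ones defined by ActivityStreams and ActivityPub.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"

# Hypothetical collections: each URI names a set of actors.
COLLECTIONS = {
    "https://social.example/users/jonny/followers": {
        "https://a.example/users/sadie",
        "https://b.example/users/justin",
    },
    "https://social.example/users/jonny/collections/close-friends": {
        "https://a.example/users/sadie",
    },
}


def expand_audience(addressed: list[str]) -> set[str]:
    """Resolve addressing targets to individual actors.

    Known collections expand to their members; anything else is treated as a
    single actor. The Public marker is skipped because it names no inbox.
    """
    actors: set[str] = set()
    for target in addressed:
        if target == PUBLIC:
            continue
        actors |= COLLECTIONS.get(target, {target})
    return actors


def make_update(note: dict, new_content: str) -> dict:
    """Wrap an edited copy of a note in an Update activity for the same audience."""
    edited = {**note, "content": new_content}
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Update",
        "actor": note["attributedTo"],
        "object": edited,
        "to": note["to"],
    }


note = {
    "id": "https://social.example/users/jonny/statuses/42",
    "type": "Note",
    "attributedTo": "https://social.example/users/jonny",
    "content": "first draft",
    "to": [
        "https://social.example/users/jonny/collections/close-friends",
        "https://c.example/users/dorothea",
    ],
}

update = make_update(note, "second draft")
recipients = expand_audience(update["to"])
print(sorted(recipients))
```

The hard part the speaker points at is everything this leaves out: every receiving server has to accept the `Update`, trust that it came from the original author, and replace its own cached copy.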
Starting point is 01:12:58 Like, how do we make the web actually rhizomatic? Yeah. And yeah. And this, again, goes back to the dawn of the web browser and what it is as a technology: this idea of the read-write web, where it should be just as easy to write as it is to read on the web, and, you know, obviously controlled by permissions in some way. But that experiment died, basically, when Netscape won in the early browser wars. But then it persisted in the form of wikis and this notion of soft security, where just, like, how do we make that work?
Starting point is 01:13:35 We make it so that, you know, we allow stuff to happen, but then make it so it can't damage the system in some profound way. Where, if someone does something they're not supposed to do, you know, someone goes and vandalizes a Wikipedia page or whatever, then, like, sure, the next person that goes and loads that page might see a bunch of vandalism.
Starting point is 01:13:58 And that's bad. But it doesn't ruin the page. It doesn't break it forever and completely. Like, it's possible for me to revert to the old version of it and so on and so forth. And that's a radically different political vision than most of the web stack that we're familiar with. So ultimately, for this technology to work, it needs to be constructed on a different set of political primitives that include other people existing and being able to do stuff, in a way that is very uncomfortable for, like, most of the people who
Starting point is 01:14:38 design web technology nowadays, who think of it as being: I'm going to design a platform that I administer for other people. And so instead, thinking about it as being stuff that is designed so you get out of the way. Like, the most successful technology that would enable semantic web stuff is one that no longer requires the developer to be there and allows people to actually have autonomy on computers. But again, there's no percentage in that. It's, in fact, anti-profitable. And so that's a very difficult thing, to organize that kind of,
Starting point is 01:15:13 not only a technical vision, but social vision as well. Yeah. I always end up just like back in Wiki world. It's just like some of the most lovely parts of the web as far as I'm concerned. I'm so curious I can find this like link data music project. that also is a major. Oh, so like, I don't know. I feel like the thing,
Starting point is 01:15:37 and I think about just like, survivable web technology, I would just like return to like pirate networks just being sort of like the things that can exist and do survive on the web. We're just like, what are the longest lived things on the internet? And it's like the W3C website,
Starting point is 01:15:54 just sort of they win by the hell. But, like, other than that: pirate networks. That is the other major answer. Some of those MP3s that were released on Kazaa or something like that are still floating around. And you compare that to the extreme adversarial conditions by which the entire global intellectual property regime is bearing down, and still it happens.
Starting point is 01:16:23 Like why does that work? And like, you know, to some degree it's a technological question. But it's also a social question of just being like, because people. take it as their responsibility. That it's like, I see myself as an active participant in this system. And so when my pirate site gets shut down, I go to the next one
Starting point is 01:16:41 and put everything back up. And so, yeah, that's... Anyway, you've got to love the pirates. Although there's a huge amount of power and political problems in those circles as well. Librarians need to read that, like, how to form an affinity group zine and, like, go from there, see what happens.
Starting point is 01:16:58 Totally. I mean, it's as likely to work as anything, really. Yeah. I think one of the practical reasons linked open data is always difficult is that kind of all files are local files, in the same way that all history is local history. Because it's always local to somewhere. Anytime I try and think of, you know, particularly when you mentioned EADs, there used to be a lot of stuff in the EAD literature about, like, why does no one share their local authority files? Like, you know, John Fuck Smith donated to the library and we have his name authority file in, like, our decks, but he doesn't have a Library of Congress name authority because he wasn't famous enough. Right.
Starting point is 01:17:46 So everyone's got there. Right, right. He just had a bunch of money. Right. And so so we have all of these people who are local in our local name authority files and they never ever get shared and they always stay siloed. and there is almost no solution to it because the amount of labor it would take to disambiguate the names, people who have common names, and is this the same person? And then who's going to do it too?
Starting point is 01:18:16 Because they barely have enough staff in special collections anyway. So who cares if, like, every local donor is going to get their own name authority file? And I think another thing is, like, Jonny mentioned having, like, the way Jonny used a word would have to go to a URI. It's kind of like when we were talking about taxonomy last week, and that episode hasn't come out yet, but
Starting point is 01:18:39 sort of like the issues with like taxonomy for animals and everything, you need like smaller sets of words, not bigger ones in order to actually make it useful for humans. So when I was working with the bird working group, it was like everyone
Starting point is 01:18:55 keeps using too many different words. We need to just all we need to solve this problem is like a shortlist. And then we can use that as like user-submitted metadata and tags. And that's really all we need is just to agree between us humans. We're going to use the word paleo-anthropology instead of archaeoanthology. And like that's all we had to do is like kind of get people to agree to that. There's not really like a technical solution because, you know,
Starting point is 01:19:25 the entire bird working group of paleo-ornithologists is like, if they were all on a boat and it sank, there wouldn't be a bird working group. Right. So it's not an impossible, like, political solution. And what I always keep kind of thinking about is, like, we have all these documents. Yeah. And it would be nice to break things up into data and share it as linked data. But as an organization, you don't really need to, depending on the size and scale.
Starting point is 01:19:57 And so that's why, like, so many libraries have their own. When I think of how a library is organized, it is ultimately, you know, the reason why MARC is like that is it's access points. And what we always default back to is, what's the access point for this? And I don't really care semantically, like, how the data works, as long as, like, this is a subject area, this is the title, this is the author, how do I get to the information in, like, the quickest possible steps? And I feel like that's where the disconnect has always been for me with linked open data, of, like, when is this going to help my users in my library? It's like, well, you can get stuff out into the.
Starting point is 01:20:42 And it's easy for me as a scholcomm person, because I'm the only person who's like, no, I want this out everywhere in the world. I want everyone to look at this. But everything else in the library is categorically organized around how do people in here find the stuff that we're looking for? And I'm the only one who has to flip that and try and say, how do we get what's in here out to the world with no barriers and restrictions and logins? Yeah, like, last year, maybe a couple years ago, I was part of the, like, PCC ad hoc group that put out the final decision about, like, hey, maybe don't put gender in name authority files.
Starting point is 01:21:25 Because there was the initial one and then a lot of people got mad at that one. and I was part of the ad hoc, hey, let's revisit this. Thank you for your service. And one of the final sticking points, like, because most of us were on board with, like,
Starting point is 01:21:41 maybe let's just don't. Like, it's too complicated to think of any ways to, like, put consistent language, ways to do this ethically that's not going to hurt,
Starting point is 01:21:53 like, trans people was mainly who we were thinking of. But, like, there's other reasons why you might put gender. And, like, some of the reasons were like, but with, like, Asian names, sometimes it's hard to disambiguate. And I'm like, that's racist. Like, that's just lazy and racist.
Starting point is 01:22:07 But the big one, like, the final kind of sticking point where we were like, maybe there's a point here, but ultimately, no, we don't care, was: well, in a linked data environment, people could query books about XYZ written by trans authors. Or, for example, you can do a SPARQL query with Wikidata where you can be like, pull all of the towns that currently have female mayors, or whatever is usually the example that they use when they tell you what SPARQL can do with Wikidata. Like, what if you could do that with the library catalog? Whoa. And we had to be like, yeah, but no discovery layers, like, Primo doesn't even do that yet. Like, no discovery layer right now that's popularly used by academic or public libraries has that capability.
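Editor's sketch of the "cities with female mayors" demo query mentioned above, for readers who have not seen it. The query is modeled on the well-known Wikidata Query Service example and uses Wikidata's identifiers for city (Q515), head of government (P6), sex or gender (P21), female (Q6581072), and end time (P582). The Python wrapper only builds the request URL; it does not contact the service, and an unrestricted version of this query may be slow or time out on the public endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT DISTINCT ?city ?cityLabel ?mayor ?mayorLabel WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .              # instance of (a subclass of) city
  ?city p:P6 ?statement .                        # head-of-government statement
  ?statement ps:P6 ?mayor .                      # ... whose value is ?mayor
  ?mayor wdt:P21 wd:Q6581072 .                   # sex or gender: female
  FILTER NOT EXISTS { ?statement pq:P582 ?end }  # no end date, so still in office
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""


def query_url(query: str) -> str:
    """Build a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = query_url(QUERY)
print(url)
```

Note how much the answer depends on someone having recorded a gender statement on each person's item, which is exactly the practice the working group was deciding not to adopt for name authority files.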
Starting point is 01:23:00 They might have linked data in the records and they might have APIs exposed if you have a developer who can do neat shit. But ultimately that's not how those searches work right now. So maybe it is available in the future, but for right now we don't care, and that's not the purpose of name authority files. Right. Like, the question of what is it for, what is the point of it, you know, why would I do it if there's no use, is also ultimately really just beliefs about how things are supposed to be designed. Is the goal of it to be able to get an exhaustive and true answer of all of the cities that have a woman as a mayor? You know, is the point of what we should be doing with semantic web to, like, make the correct information exist in a unified vocabulary? And, spoiler alert, I don't think so. Well, because there's no such thing as, like, the authoritative and complete true archive of all knowledge.
Starting point is 01:24:06 But it's also just like thinking about it's like, well, that's like an impressive technical feat that I could put on like some sort of like tech specs document that just like my query engine can produce 10 billion triples in like one. one second. But like, yeah, like, what's the point of that? And just like thinking about it, like, in the context of language, we're just like, it's also related to the notion of like ontology curation about just like, how do we come to like know the terms that are the one term to use is like, that's only an important question. If the goal of it is to like make everything be totally uniform and also that that act of searching is like relatively precious and hard to do and like I can only do one of these or something like that that just like this is not an iterative process of exploration and ultimate and also that just like you're not able to so like I think about
Starting point is 01:25:00 just like the way that this works with language we're just like it doesn't ever work with language like say new phenomenon exists in the world like we need to get the council of languages together to agree on the one word for that and then everyone from then on has to agree to only use that word to refer to that phenomenon. It's like that never how it has happened and it never will be. And just like, instead just like this sort of local interpretation of what's happening in my immediate reality and just like you try and use this word and is this effective with it when I say it in this way. Oh, what I'm talking about is this and oh, I know it is this. And just like the sort of negotiation of what things mean and in what context and to who. And like being able to have your
Starting point is 01:25:43 personal vocabulary and ontology where just like as your history of your browsing, It's like, I've come to know that these terms are the same terms or just like when I am in this neighborhood of semantic space, I use this word instead of this word. And like, then you can imagine like the collective power of something like that. We're just like, okay, all of my friends know these words as being the same. And so just like in general, I can ask around and say who I'm looking for this. Does anyone know how I would refer to that? and just like being able to make sense of just as like an iterative
Starting point is 01:26:19 and a social and an interactive process, not one that's done once as if it were like a database query with a very controlled database schema that's like known in advance. And so like it just changes our expectations for what technology should look like.
Starting point is 01:26:35 I don't go to the vast impersonal search engine that indexes the whole web. But instead I have to actively cultivate, like, a set of nodes and friends and relationships and prior acquaintances with this kind of thing, and then expect it to take a little bit of time to find stuff, you know. And that sounds sort of counterintuitive. I'm not saying create exclusion or create inefficiency, but the goal of the system isn't to produce maximally true,
Starting point is 01:27:12 maximally numerous and maximally cleanly organized data all the time. And like, it's just like, it's, I can imagine, like, thinking about just like, what happens, you know, just like, like, like, just like, why doesn't everybody share their, their, like, local. I actually am not familiar with this term, like, authority file. I assume that's like, you know, like a local, like, reference. Like subject headings or like if you publish a book, like your name, how that's in the Library of Congress. It's an authority file. Gotcha. Yeah. It's also just like one of the things like who gets to do that, you know, that like, the same problem with just like, you know, libraries and museums being the sites of just like pillaged cultural artifacts. It's just sort of like not your
Starting point is 01:27:59 job and not your role to be the purveyor of this information about this person. And it becomes your role because they have no means of doing so themselves. These systems aren't ones that can be touched by the average person. Like, I can't deposit a book myself in the Library of Congress. I need some intermediary force. And so that's another part of why doesn't it happen and why doesn't it work: because on the other end is, like, who is it for? And should we even do that at all?
Starting point is 01:28:30 Because, like, same thing of what happens when you need to change your dead name in all the bibliometric records. Like, how does that happen? I freak all my software friends out when I talk about eventually needing to write the anti-performance manifesto. Someone who is, like, a friend on the Fediverse, and we talk all the time, was just sort of horrified, like, what do you mean? Software should be delightful to run. And, like, yeah, yeah, that's not exactly what I'm referring to, though. The "we need to get page load time down to two milliseconds or life will be lost and meaningless as we know it," as a set of ideological commitments rather than making stuff be usable by people, is the thing I'm talking about. Oh my God, I'm opening this. I'm opening this. You have an authority file. You have an official URI. I do. I have a URI. I'm part of the problem. We shall have many URIs.
Starting point is 01:29:41 Yeah, I helped write a book in, like, 2018, during my first job. Hell yeah. Like, one of the interesting things that I think Bluesky and AT Protocol have done is, like, make it so that domains are sort of meaningful as identity, where just, like,
Starting point is 01:30:01 That's cool. Yeah. That just, like, I have a domain and, like, control over a domain, and that gives me a source of identity. Even if it doesn't give me control over the computers that host the thing, you know, whatever. We can talk about that a different time. But it's very interesting that that has resurged and is actually genuinely useful. And I think one of the best ideas to come out of it is actually using those, like, you know, URIs and URLs as just, literally, this can be my name.
Starting point is 01:30:21 Yeah. Because it's language independent, human language independent. And things like deadnaming, which we have to deal with in the authority file environment because it is predicated on names: it's just a URI. You don't have to do that. You can attach any name you want to it.
Starting point is 01:30:50 So there's definitely... That's the good thing about URIs is it allows the flexibility for trans names or any other kind of name that might change. Absolutely. That's the good part about them. Yeah. Love URIs. That's one thing that I want to keep. one thing.
Starting point is 01:31:06 Out of all this nonsense, URIs as identifiers was genuinely a clever and useful idea. Yeah, like, it was a big deal when the Homosaurus moved from having the terms be the URIs to having alphanumeric URIs so that we could change terms as language use changed. Yeah. Love it. Yeah. Doesn't everyone tell you don't put semantic information into URIs? Everyone does it. It's so stoop
Starting point is 01:31:35 we're queer we don't listen fuck you DOI dot org slash DOI dot org slash my journal volume one and it's like
Starting point is 01:31:51 Yeah, if you ever meet Geoff Bilder, who works at Crossref, wonderful human being, he has many, many, many rants about publishers
Starting point is 01:32:00 coming to Crossref wanting to change a DOI prefix because they merged with another publisher or a journal changed publishers or whatever the hell. And he's like, no, that's not the point. They have a suffix generator now. It's literally just a spreadsheet that generates a suffix. But they're like, use this, idiots.
Starting point is 01:32:22 Yes, please. Is that like half your job, Justin, is just being like, hey. No, I don't mint, I mean, I don't mint DOIs manually, usually. But the thing that always bugged me was, OJS used to put semantic information into the automated strings that it would create. So it would say, like, V and then the volume number and then the article number. And I was like, don't do that. Just put random numbers.
Starting point is 01:32:50 Just put random numbers. Just a general, just a random number generator. That's all you need to do. But they didn't do it until the latest update. So now they do it properly. Or you could do what every single database administrator knows to do and just count. I don't know how to count.
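Editor's sketch of the "just put random numbers" advice: minting an opaque identifier suffix that carries no volume, issue, or title information, so nothing in it goes stale when a journal is renamed or moves publishers. This is an illustration only, not how Crossref's suffix generator or OJS actually works. The prefix `10.5555` is used purely as a placeholder, and the alphabet and length are arbitrary choices.

```python
import secrets

# Lowercase letters and digits, skipping i, l, o, 0 and 1 to avoid lookalikes.
ALPHABET = "abcdefghjkmnpqrstuvwxyz23456789"

PREFIX = "10.5555"  # placeholder prefix for illustration


def mint_suffix(issued: set[str], length: int = 8) -> str:
    """Return a random opaque suffix not yet in `issued`, and record it there."""
    while True:
        suffix = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if suffix not in issued:
            issued.add(suffix)
            return suffix


issued: set[str] = set()
first = f"{PREFIX}/{mint_suffix(issued)}"
second = f"{PREFIX}/{mint_suffix(issued)}"
print(first, second)
```

Counting, as the database administrators in the room suggest, works just as well when nobody minds the identifiers being guessable; random suffixes only add the property that one identifier tells you nothing about its neighbors.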
Starting point is 01:33:12 I don't know how to count. I'm gay, as we've learned from the homosaurus. I do have an Excel sheet of like manuscripts and database bases, and it's just 0-000-0-0-0-0. What? Yeah. Me one. What happens when you go beyond the capacity for how many zeros you picked? What then? Add another zero at the end.
Starting point is 01:33:32 It doesn't matter. Okay. And, like, all of these things have their times and applications and usages and everything like that. Just do all of them and make them all point to, you know, the same thing, different things, etc. Like, I think, you know, sequential numbering of identifiers works. You know, there are times when you don't want to use it, like when you have potentially personally identifying information, where you don't want someone to be able to enumerate over all possible things and find all the stuff on the server. And spoiler alert: university IT does a terrible job at this, and frequently will just have very sensitive documents hanging out that can be publicly enumerated on their public web. But like, you know,
Starting point is 01:34:20 so it's super useful when designing some systems, in the same way that having totally anonymous strings is super useful in, like, PID space, but then you want to have semantic URIs in some other context. Just do all of these things. And the other one is content hashing, where the identifier is intrinsically based on the content of the thing. So if I have the thing, I know what it would be called everywhere in the world. That has its own benefits and tradeoffs. That is one of those dangerous ideological territories where you get pirates and also cryptocurrency zealots in the same room. And it becomes this maelstrom of just, like,
Starting point is 01:35:03 the same idea meaning completely different things to different people. But, like, yeah, we're not going to solve the identification problem. But basically, you know, it's the rigidity in being only able to use one thing that is the problem to me. Yeah. Now, I don't have a Library of Congress name authority file. Although someone from Florida with my name, born the same year as me, does, which is confusing. There's so many people with my... I went to high school with someone with my name.
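Editor's sketch of the content hashing idea raised a moment earlier: the identifier is computed from the bytes themselves, so anyone holding the same bytes derives the same name without asking a central server. This is a bare illustration using SHA-256 from the Python standard library. The `sha256:` label is just a convention chosen here, not the format of any particular system.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the content alone."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


original = b"Cool URIs don't change."
copy_elsewhere = b"Cool URIs don't change."
edited = b"Cool URIs change all the time."

print(content_id(original))
print(content_id(original) == content_id(copy_elsewhere))
print(content_id(original) == content_id(edited))
```

The tradeoff is the mirror image of a URI you control: the name can never drift from the content, but any edit, even a corrected typo, produces a different identifier, so "the same work, updated" has to be tracked some other way.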
Starting point is 01:35:36 It's very confusing. It doesn't seem like it should be that common. It makes you harder to dox, though, so that's, like, passive self-defense. It is really good. I have successfully scrubbed my information off the web several times. It's not hard. Or one time I couldn't do it, so I just redirected it to another dude with my name, and so I just changed my information
Starting point is 01:35:57 to, I changed my address to his. And, like, I feel like this would be something that Dorothea would probably have stronger thoughts about: the notion of privacy when it comes to linked open data and stuff like that, where just, like,
Starting point is 01:36:13 the fact that we don't want all the world's information to be public. We don't want, like, the Justin authority record that includes your, you know, Social Security number and, you know, phone number and everything like that. Like, limits to openness, you know, there needs to be some amount of, like,
Starting point is 01:36:29 fungibility and, yeah. I'll actually give you a real-world example. If you go and look at my Wikidata page, and you can just go to wikidata.org and look up Dorothea Salo. I'm the only one, as far as I know, that has ever existed, so what you find will be me. Although I identify as female, that is how I identify, that's who I am.
Starting point is 01:36:53 My wikidata page actually says, no gender, no gender recorded. And the reason for that is that Wikipedia, with which I have a very vexed relationship runs through wikidata every now and again to do things like make lists of people who maybe should have Wikipedia entries but don't.
Starting point is 01:37:14 And of course they do this for minorities and underrepresented populations. And of course, Wikipedia is well known for having a huge gender problem, gender disparity coverage problem. So I get sucked up into those lists and nobody asked me, I do not actually want a Wikipedia page, thank you very much.
Starting point is 01:37:36 And I would rather not be. So I changed my gender that is listed on Wikidata. I did not actually change my gender. That's dumb. Like, anti-bot action. Like, you just like to read a digital channel. It seemed to be the only option for saying, no. Don't make me a Wikipedia entry.
Starting point is 01:37:57 Trans for the privacy of it. Pretty much. Gender opsec. My gender is fuck off. Get this gender working for me. Yeah, no, that's why I also like ORCID iDs too, because it's a very nice system that you get to control and you get to change. You get to write your name how you want it. You can write it in multiple scripts.
Starting point is 01:38:24 And it's just an ORCID. And it just will point to whatever you tell it. So you can change it whenever you want. And that's what I really like about it is, you know, that would be something that would be very nice to use for, like, local archiving and stuff like that. But the reason why is like,
Starting point is 01:38:41 no one's going to bother to do that. But like I couldn't even get like faculty to do it, even when this would save them time in the long run or it would make. Right. Or it would solve headaches. Like if they don't, if they have a double barrel first name and people keep putting their second first name as their last name,
Starting point is 01:38:57 it would solve this problem for them, but they don't go sign up for an ORCID. Actually, when I was cited in the ethics and name authority files book, in one of the chapters, they asked how I wanted to be cited. I was like, I would like my ORCID, because they were citing one of my articles or my thesis or something that had my dead name on it. And I was like, I want you to do it this way and I want you to have my ORCID in there
Starting point is 01:39:22 so that it's collocated, like, properly links back to, like, all of my stuff. Right. And I think it was Brie, actually, who then went on to write an article and talk about how I asked to be cited in that book, as, like, using ORCIDs and URIs and linked data as a way to help trans people who maybe have published under dead names. Yeah. And if they don't want to go back and change, like, ask for it to be changed, which I don't, this way I can have people cite me and just use my first initial, then it points back to my current stuff and everything I've done with my current name while also still being like,
Starting point is 01:40:05 but I'm also the person that wrote that. Yeah. It's not that hard. Yeah, especially if you like use initials, because I use my initial a lot, because I do have a very common name. So I think, but I used to write my full middle name and I don't do that anymore. So it's nice to be able to be like, okay, I published my thesis with my full name, but now I only like using my middle initial.
Starting point is 01:40:25 And now I'm at an institution where I'm the only one of me, so I don't even have a number after my name. I was very excited when I got my email assigned to me because there is now someone else at my university with my name. So there is like a zero one now. And I'm like, ha, finally got there first. I used to get detention because some dude had my name. Are you serious? I used to get his detention. Yeah, they used to put out a roll with the names.
Starting point is 01:40:50 At the beginning of the period, teachers had to check them. And if you were on the list, you had to go to the cafeteria. So I kept getting called into the cafeteria because they wouldn't disambiguate my name. Did you? I had that happen to me too. I had my birth last name, which is I changed my last name when I got married. My birth last name is Johnson. So there's like, not only are there 70 billion like S. Johnson's out there.
Starting point is 01:41:17 But like I have a cousin who has like almost the same exact name, like we were born almost the same exact, like, person practically, right? We have the same name, the same first name, same last name. Neither of us use our middle name, right? Yeah, so I got told I was supposed to go to detention a couple of times in high school because there was another person with my name. It's common. But like that's a, you know, like free bad kid and social currency, you know, just like,
Starting point is 01:41:44 hell yeah, I'm going to detention, baby. Like, that's like, right. And then you don't even have to do it. So you get the best of both worlds. Well, I used that. What they said to me was, well, if they don't put your middle initial, it's not you. And I use that excuse for the next four years, even though that dude was a senior when I was a freshman. Said, no middle initial, it's not me. Can't make me do it. That's like just social engineering, you know, just in the real world. You know, just people just intuitively do it.
Starting point is 01:42:14 There is no difference between social engineering and con artistry. Hell yeah. Yeah. I will die on that hill. Yeah, a good friend of mine is having a crisis of like direction in life. And I'm like, okay, so your strengths. You are super good at like infiltrating unfriendly organizations and groups of people and like taking on roles and shit. And did you know that that is a job? And like, and so like trying to like, yeah, turn this person.
Starting point is 01:42:47 Literally a job. Like it's like, and a lot of the people that do it sort of accidentally find themselves, you know, like, like, you know, seeing it the first time, I was like, holy shit, you can do that. And then just like suddenly becoming really good at it. Anyway. I feel like the alternate of that fork is improv comedian. No. Fuck. Their true, their true destiny is they just become podcasters.
Starting point is 01:43:15 Improv people are good at doing podcasts. Like all my favorite podcasts, I've learned, like, the people did improv. I have no idea what I'm doing here. Yeah. That's like something. We did improv that one episode. What you did like improv games or like what? What are you talking about?
Starting point is 01:43:33 We had Srsly Wrong on. We did skits and those were improv. Oh, yeah. I dipped. I was bad at it. We were very bad at it, but they are very good at editing. They're so good at editing, my God. When I finally listened to that episode, I was like, oh, wow, they made something.
Starting point is 01:43:51 that it is. But yeah, the only thing that we didn't mention that I wanted to maybe mention is kind of what we talked about last time was whoever controls the nodes of a graph can control the graph. And so I
Starting point is 01:44:12 was also thinking about that as a security problem with linked open data is, you know, when we were talking about like all of the privatization happening, if someone buys a certain node of the graph, then the same problem Sadie was saying with everyone having their own API is like if you're controlling this graph,
Starting point is 01:44:31 even though it's open and you control the write permissions, then like, I don't know, I assume that's a problem that's going on because OCLC has Meridian now and I assume that that only exists because it will make money. If you control the spice, you control the universe. Yeah.
Starting point is 01:44:50 Is that an animal? This is a very cranky and just, like, desirous animal. It's like, my turn. Like, I haven't heard about this.
Starting point is 01:45:02 This Meridian thing, the first time I heard about this was today. Is this just like a, it says May 2024. Is it, like, I assume it's, is it that new? I hadn't known about it until today either.
Starting point is 01:45:12 For what it's worth, OCLC just loves to do shit. Our metadata librarian at my job is currently on, like, a committee for, I think, what is the organization, the Program for Cooperative Cataloging, and they're working on a task group for, like, URIs in MARC implementation. So I guess, like, they're going to have separate types of, like, handle-based permalinks
Starting point is 01:45:41 or something, I don't know, that are going to be in MARC. But they were also talking about how they had, like, a demonstration of Meridian. And I think it's just the linked data they've made out of WorldCat. So they're using an entry for Octavia Butler as the demo data. And I'm like, that's an interesting, like, person and body of work to evoke in your, like, corporate platform. Like that's just like. Oh, yeah.
Starting point is 01:46:14 The don't build this machine. Yeah. The torment nexus. Thank you. Don't create the torment nexus. Wouldn't it be terrible if we created the torment nexus? It creates the torment nexus anyways. So here's a gift.
Starting point is 01:46:32 And this is totally off the cuff just because, again, I only heard about this today. I think it is clear to OCLC that their WorldCat monopoly is not long for this world. One way or another, whether it's a customer revolt or we finally find a way to do this with linked data without getting sued out of existence, that's not going to last. So how can OCLC come up with a linked data store that they can fence around, limit to their customers, the same way that they've done with WorldCat? That's what I think Meridian is. Probably. I mean, as
Starting point is 01:47:14 as you're saying like they're doing it because it makes money somehow and like I think that's a pretty good bet I mean and it's like continuous with the way that the rest of like linked open data has worked
Starting point is 01:47:27 where just like that's like what Wikidata is to some degree is that it's like basically a captive labor pool like and so it's like who funds Wikidata is largely Google and so like Google bought Freebase, like the predecessor
Starting point is 01:47:42 to it. You know, they did their attempts at cleaning it up and everything like that, and then basically, like, shunted that into Wikidata, and they profit from it immensely by being clean, corporate-friendly,
Starting point is 01:47:59 like, there's no, like, swearing on Wikidata, you know, and a way of concentrating a bunch of labor, so that then they can mine it and make derivative profits from it. And like, where just like, the people that work on Wikidata are like genuinely true believers in like the beneficence of cataloging the world's data. They're like not corporate stooges. They, like, view themselves as being like, we're just trying to do the same mission as Wikipedia, which is just like, yeah, make a global information store, but not really evaluating the like, why would Google want us to do this?
Starting point is 01:48:38 you know, and like, and so just like that, that sort of pure production as captive labor model is one of those biggest sort of like, you know, red pilling moments for like information people. Is that just like, what if it's actually bad to have like these sort of like crowdsourced information platforms that just like, so when we were watching, when we were watching Lo and Behold, like one of the like examples of, like, just like the beauty of the internet. And so it's like, again, like, every time I think about this is like, this is a movie that was released in 2016, which is not that long ago. And yet it feels like a completely different universe where just like, this is like one of the promising things about it where you had this like chemical reaction, crowdsourced thing.
Starting point is 01:49:27 Where just like, the wisdom of the crowds, lots of people playing this game about like protein folding or whatever was able to do something that, you know, the best scientists in the world couldn't do. And it's just like, cool, but were any of those people on the paper that got published from that and from all of that work? And like, where just like, if it's just a thing where you farm out other people's labor and time, or just like, in this case, like, farm out all of the cataloging labor that like happens in libraries into sort of curating this like collection of information in the same way that, I don't know the politics of WorldCat. I assume it's the similar kind of way. Where just like, everyone is required to use this, but we don't actually
Starting point is 01:50:12 have much control over it, kind of thing. And just like, yeah, that like that is a massive extraction vector sort of hiding in plain sight under the guise of pro-social technologies. Yeah, and this is probably more of the same, which is to make that data then usable and useful to AI products, I would assume. Particularly, it's interesting that they mentioned like incorporating ORCID and ROR, which are like scholcomm-specific things, really. Especially ROR is like a weird one to throw in there because that's like research organizations, right, to make sure that those are disambiguated because journals are really, really bad at disambiguating like the biology department of this university because
Starting point is 01:50:56 departments change all the time and also people abbreviate them. And, you know, so there's no, there's no like one identity and that causes all kinds of problems, even just like getting the university right half the time. It's like, it's wrong. So ROR is kind of like ORCID for organizations. And so that's a very specific thing. And I find that very strange. Like, do they want like regular like cataloging librarians to like fix the scholcomm metadata
Starting point is 01:51:24 problems that are out there? They do want to, OAIster. Yeah, that like clarivating fix. Scoop that up back in the day. What's that? Oh, it was a union search engine for institutional and sometimes disciplinary repositories is what it was. Basically, there were always problems with it, but the problems go back to OAI-PMH being complete garbage, such that, one of the things it does not allow you to say is, is there full text associated with this item?
Starting point is 01:52:02 And so one of the reasons OAIster became completely useless is that it was choked with metadata-only records, which really disappointed end users because they couldn't click on it and get to the thing. Right. And that's definitely why I auto embed Sci-Hub links in all of my writing
Starting point is 01:52:22 because it's just like, what use is it to someone else for me to cite something if they can't actually see it? I wonder how they scrape the full text information now when stuff gets pulled from OAI-PMH because it still does. Because OAI-PMH is how we push out to CORE, but it definitely does know if we've got full text.
Starting point is 01:52:41 I have to think they implemented a check, which is fascinating because they would have had to implement such a check for pretty much every single repository and repository design in existence. Like you're literally looking for a link that says PDF or something. Yeah. Wow. Oh, because Herbert Van de Sompel
Starting point is 01:53:02 is complete crap at building protocols and things that will be useful and last. All right, I said the name. This is obscure beef. Oh, I, you know, Herbert Van de Sompel, when I say serial project abandoner, he is the paradigm example. He totally did that with OAI-PMH. He totally did it with Memento. There are probably six other projects of this. I could also.
Starting point is 01:53:32 Right. Remember Memento? Yeah. And I'm just like, P funders, stop giving this guy money. It never turns out well. We got more obscure beef than a Wagyu farm. Heck yeah.
Starting point is 01:53:51 Look at me like that. I look at you however I want to. All right. I was very proud of that. It's good. Well done. Thank you. Okay. I think we should wrap up.
Starting point is 01:54:07 Yeah. Sadie, did we accomplish the mission? Yes, I've got a sleepy bitch disease. Did we clarify what the hell's going on or still cloudy? I think I've got a pretty good gist, actually. And you know what? Knowing the beef actually helps. It does. Good. That's like... And you know, I do teach this stuff, Sadie. You know my email address.
Starting point is 01:54:32 You can totally ask me questions. That's true. Yeah. That's true. And like one of the things I have come to love in this world, you know, the few things that you can love in it. It's just like every time you get close to something, like you just like realize that it's all just people.
Starting point is 01:54:53 And just like all these things that are these immutable features of the world, one day you might just come face to face with like, oh, that was you? and then just be able to be just sort of like that just like yeah all of a sudden it makes sense where it's like I get why it is that way that just like you know you knowing the beef and knowing the people is the way to know the thing yep it all makes sense now oh glad to hear it thanks y'all as always love being on the on the podcast yeah oh thank you so much for coming on. Yeah, thanks. And I'm glad we got to do this.
Starting point is 01:55:34 Yep. Yes. Good to see you yet again. Let's find time to watch a movie sometime soon. It's been a while. Yes. I need to do more movies in the Discord, which I was about to plug, because Dorothea, you've also been answering questions in the Discord. It's very helpful. Yes. And we appreciate it. It's just us shitposting and you being helpful.
Starting point is 01:55:56 Yeah. Well, I mean, you know, that's the reverse of the way it usually is, everybody else is being helpful and I'm shitposting. So even the score. Good night.
