Grey Beards on Systems - 130: GreyBeards talk high-speed database access using Apache Arrow Flight, with James Duong and David Li
Episode Date: March 23, 2022. We had heard about Apache Arrow and Arrow Flight as being a high-performing database with access speeds to match for a while now, and finally got a chance to hear what it was all about with James Duong, Co-Founder of Bit Quill Technologies/Senior Staff Developer at Dremio, and David Li (@lidavidm), Apache Arrow PMC member and software engineer at Voltron Data.
Transcript
Hey everybody, Ray Lucchesi here with Matt Leib.
Welcome to the next episode of Greybeards on Storage podcast,
a show where we get Greybeards Storage bloggers to talk with system vendors and other experts
to discuss upcoming products, technologies, and trends affecting the data center today. And now it is my pleasure to introduce James Duong, co-founder of Bit Quill Technologies
and senior staff developer at Bit Quill and Dremio, and David Li, Apache Arrow PMC member
and software engineer at Voltron Data.
So why don't you two tell us a little bit about yourselves and what Apache Arrow Flight is all about. Thanks, Ray. It's great to be here. So
as mentioned, so I am a PMC member for Apache Arrow. That means I'm one of the maintainers.
I can vote on decisions. I'm also a software engineer at Voltron Data, which also contributes a lot towards the Arrow project.
So before we introduce Arrow Flight, I think we should introduce Apache Arrow first.
So just to be brief, Apache Arrow is an in-memory format, a cross-language standardized in-memory format for columnar data. It also comes with other things like an IPC format
for serialization and Arrow Flight,
which is an RPC framework built on top of the in-memory format
and the IPC format.
So Arrow Flight is an RPC framework.
It's specialized for transferring columnar data
in Arrow format across the network.
It's built on top of gRPC and protobuf.
It is just a framework.
It's included with many of the Arrow libraries.
But of course, you need to take it and do interesting things with it.
And one of the things recently is a project called Flight SQL, which I think James can
introduce.
Thanks, David. So yeah, as previously mentioned, I'm James Duong. I'm a senior staff developer at Bit Quill Technologies and Dremio Corporation.
A while back at Dremio, we decided to introduce this layer on top of the Arrow Flight project, called Flight SQL, which is a standardized way of
accessing SQL databases using the Arrow Flight protocol and server framework.
So Flight SQL has a single Flight client, a single Flight SQL client, that can connect to any Flight SQL server.
And so, maybe just: I'm not a database expert, and some of the storage guys are not necessarily either,
but they work with database experts all the time. What's the distinction between row-based
data and columnar data? Sorry, I can't even pronounce the thing. One of you, right?
David, maybe?
Yeah, so databases, there are both row-based databases and columnar databases.
It's really just how you shape the data. If you look at a table of data,
when you flatten it out, do you flatten it out along the rows or along the columns?
And each of those has trade-offs and advantages.
Arrow focuses on columnar data because we think that has advantages for data science.
For instance, columnar data compresses better because the values within a column are of
the same type. You can use different compression techniques,
or even if you're just using
a general-purpose compression technique,
that'll probably work better.
If you're doing processing on the data,
columnar has advantages because, again,
all of the data in a column is adjacent in memory now,
and you can apply things like
SIMD or vectorization
to get a speed boost.
Of course, row-based has its advantages too.
If you're seeking through data row by row,
columnar is not necessarily going to be a good fit.
But for a lot of applications, we think columnar has advantages.
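For a concrete picture, here's a minimal pyarrow sketch of the columnar layout David describes; the column names and values are invented for illustration.

```python
import pyarrow as pa

# Row-oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "price": 9.99},
    {"id": 2, "price": 4.50},
]

# Columnar (Arrow): all values of a column sit adjacent in memory.
# Same-typed, contiguous values compress well and suit SIMD kernels.
table = pa.table({
    "id": pa.array([1, 2], type=pa.int64()),
    "price": pa.array([9.99, 4.50], type=pa.float64()),
})
print(table.column("price"))  # one contiguous array of doubles
```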
And the other thing you mentioned about Arrow
was that it was an in-memory solution.
You want to talk a little bit about that?
Yeah, so Wes McKinney,
one of the co-founders of the project,
when he introduces Arrow,
he kind of likens it to breaking up the layers
of a traditional analytic database
so that you can use all of its components
separately.
So Arrow provides an in-memory format, which is basically if you have, say, a column of
integers, how is that supposed to be laid out in memory?
But also, once you have a column of integers and you have a few columns and you want to
write that out to disk or transfer it
over to the network, how should you serialize that data?
How should you encode that data on the network?
And actually, for Arrow, we also try
to focus on avoiding encoding and copying as much as possible
for efficiency.
And speed and performance and that sort of stuff.
That's interesting. We talk a little bit from the storage side about in-memory advantages,
and particularly lately about the inherent advantages of expanding that memory,
leveraging Optane, for example, are there benefits to Arrow by increasing
the capacity of what's available in memory?
Yeah, so I'm not super familiar with Optane specifically, but one of the advantages, one
of the properties that the Arrow in-memory and file formats try to maintain is being able to just
memory map a file and start working with the data right away. So you can work with larger
than memory data sets by just memory mapping an arrow file without having to load it all
into memory, without having to decode it first.
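As a rough sketch of that pattern, assuming a file already written in the Arrow IPC file format (the file name here is hypothetical):

```python
import pyarrow as pa
import pyarrow.ipc as ipc

# Memory-map the file: no bulk load, no decode step up front.
with pa.memory_map("data.arrow", "r") as source:
    reader = ipc.open_file(source)
    print(reader.num_record_batches)
    batch = reader.get_batch(0)  # only the pages actually touched are faulted in
    print(batch.num_rows)
```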
And so the challenge, of course, is that there's only so much memory in the world.
Even with Optane, maybe, I don't know, 64 terabytes per server might be a reasonable maximum or something like that.
So if your Arrow database exists and it's, I don't know, let's say it's a couple hundred terabytes, you page that in and out of memory? Is that how it would work? I mean, just for in-memory processing; we're not even starting to talk about Flight yet, which is the other side of this coin.
Sure, right. Yeah, so at that point, how do I want to say this? So columns don't have to be entirely contiguous, in a sense.
You can break up your columns into little chunks called record batches.
Inside a record batch, everything's contiguous.
But at a higher level, you can then stream or iterate through record batches
and process those incrementally.
So yeah, in a sense, you're paging data in and out.
But this is accounted for in the file format, in the in-memory format, and so on.
And when you say a record batch, that would be all the columns for, let's say, the database across 1,000 records, or however many would fit into the record batch buffer or something like that? Or would it just be column one, down to however much of column one actually fits into that record batch?
It's the former. It's a 2D, rectangular chunk of data: all the columns, all with the same number of rows.
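A small pyarrow sketch of streaming through record batches incrementally, as just described; the file name is an assumption:

```python
import pyarrow as pa
import pyarrow.ipc as ipc

with pa.OSFile("stream.arrow", "rb") as f:
    reader = ipc.open_stream(f)
    for batch in reader:
        # Each batch is a rectangular chunk: all columns, same row count.
        print(batch.num_rows, batch.num_columns)
```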
Okay. So bringing hundreds of terabytes of data back and forth into memory and writing it out must be quite I/O intensive. Where do Arrow Flight and Arrow Flight SQL really improve that sort of overhead?
So it used to be, you know, you'd write out something from a database.
It moved from a database buffer to a memory cache buffer, and from the memory cache buffer to, let's say, a NIC, and then from the NIC out to the storage, which has its own set of buffers and all that stuff.
Right?
I mean, those are the old days.
Or maybe it's today, I don't know. So what does Arrow Flight bring to the table?
Right. So Arrow Flight basically tries to make all that more convenient and faster, especially if you're working with Arrow data, or really only if you're working with Arrow data. So if you have Arrow data in memory and you want to transfer it over the network using Arrow Flight, you don't have to go implement all that yourself. You get these high-level methods that let you just say, I want to read the next record batch, or I want to write the next record batch. And Arrow will take your data, it'll punch through the layers of all the networking stuff it uses, gRPC and protobuf, to avoid copying data as much as possible, get that onto the NIC, and get that across the network as fast as possible.
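A minimal sketch of what those high-level methods look like from a Python Flight client; the server URI and ticket contents are invented for illustration:

```python
import pyarrow.flight as flight

client = flight.connect("grpc://localhost:8815")
reader = client.do_get(flight.Ticket(b"my-dataset"))  # hypothetical ticket

for chunk in reader:            # each chunk carries one Arrow record batch
    print(chunk.data.num_rows)
# or, to materialize everything at once: reader.read_all()
```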
Sorry.
Go ahead.
And then Flight SQL takes those benefits, the benefits of Arrow Flight and of Arrow, and tries to bring them towards SQL databases.
I always thought SQL databases, James, were always row-based. You'd sit there and do, like: if column X is Matt Leib's social security number, then bump his pay raise by 10, or something like that. It was row-oriented: bring it in, do something with it, and put it out. So how does Flight SQL work with a common database?
Well, Flight SQL provides the protocol for high-performance transport by sending the data in a columnar format.
Traditional APIs like ODBC and JDBC are row-oriented.
With a JDBC driver, when you're accessing data from a SELECT query, you get a ResultSet interface. What you do is check if there's a row using ResultSet.next, and then you get values on that single row using, say, getObject or getString on each column. One at a time.
If you're using Flight SQL, using Flight SQL's interface, you can just get a single record batch and pull out a vector representing a column for that batch.
And you could go through the stream getting record batches until you've gotten all the data.
Now, if your application layer is working with Arrow data, that's when you really get the benefits out of Flight.
You're already working with vectors that do not
have to be deserialized.
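As a hedged illustration of that columnar fetch pattern, here's a sketch using the ADBC Flight SQL driver for Python, a later arrival in the ecosystem and an assumption here, not something from the episode; the URI and query are made up:

```python
import adbc_driver_flightsql.dbapi as flightsql

with flightsql.connect("grpc://localhost:32010") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT id, price FROM orders")
        reader = cur.fetch_record_batch()  # a stream of Arrow record batches
        for batch in reader:
            prices = batch.column(1)  # a whole column vector at once,
            print(prices)             # no row-by-row getObject()/getString()
```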
You mentioned serialization and deserialization before.
Can you explain to me what that sort of process is
or what that means in this sense?
Yeah, so say you have a JDBC driver.
Well, JDBC has its own formats for integers, strings,
and timestamps, for example.
So when you build a JDBC driver,
you have to convert from the database's wire format representation of those to JDBC's format. Potentially the database also needs to convert from its own internal representation to the wire format as well. So you've got a conversion from the database to the wire format, and from the wire format to JDBC's format. Whereas if you're using Arrow Flight, and say your database uses Arrow internally, it's just copying data to the wire, and then the client doesn't even deserialize the data, but is just able to operate with it.
So there are very few format conversion requirements, unlike ODBC and JDBC, which would require multiple format conversions during the data transfer. Is that what you're saying?
That's correct. Yeah.
Okay.
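A small pyarrow sketch of that idea: the serialized IPC bytes carry the same layout as the in-memory data, so the receiving side does no per-value format conversion. The column here is invented:

```python
import pyarrow as pa
import pyarrow.ipc as ipc

batch = pa.RecordBatch.from_pydict({"id": [1, 2, 3]})

# "Serialize": pack the batch's buffers into the IPC stream format.
sink = pa.BufferOutputStream()
with ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
wire_bytes = sink.getvalue()  # what would cross the network

# "Deserialize": reconstruct the batch around the same buffers;
# no per-value decoding the way a JDBC driver would do it.
same_batch = ipc.open_stream(wire_bytes).read_next_batch()
```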
There was some mention of parallelization as part of Arrow Flight. Could you explain how that plays out in this game?
I could talk about this.
So modern computing engines often support multi-node systems.
Most systems are distributed nowadays.
Yeah, yeah, yeah, yeah. And you're not talking multi-core,
you're talking multi-server node, right?
Multi-server, yeah. I'll use Dremio as an example. We have a
coordinator node, and then we have several executor nodes for processing a query. The coordinator does the planning, and then we delegate the work to the executors, and they individually execute the query plan.
What Arrow Flight provides is a way, in response to a request, to report each endpoint where the data is being served, so that your client layer can then start consuming data at multiple different endpoints at once. Potentially, if your client itself is also distributed, you could have your client working with data on multiple nodes on its side as well.
So each core is its own compute engine, effectively. If I've got parallel access to the data, can I have all the cores effectively working on their own columns or record batches separately? Or would it be, within a server, a record batch that this server gets and a record batch that some other server gets? So the unit of granularity for parallelization is record batches?
So I would say Arrow and Arrow Flight
give you all the tools to parallelize
and split things up
whichever way is the best for your application.
For instance, so you can have multiple clients
making requests to the same server.
You can have one client making multiple requests
to multiple servers.
You can split data up.
So there's a little detail here.
So the Arrow Java and Arrow C++ libraries
conceptualize things slightly differently,
but effectively a record batch, or a vector schema root, is like a unit of data that each thread can work on independently: its own chunk of data, which it can process and send back over Arrow Flight independently.
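A hedged sketch of that pattern from the client side: one metadata call, then one do_get per endpoint, each on its own thread. The host name and command are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor
import pyarrow.flight as flight

client = flight.connect("grpc://coordinator:8815")
descriptor = flight.FlightDescriptor.for_command(b"SELECT * FROM t")
info = client.get_flight_info(descriptor)  # lists the endpoints serving data

def fetch(endpoint):
    # Connect to whichever executor node serves this partition.
    c = flight.connect(endpoint.locations[0])
    return c.do_get(endpoint.ticket).read_all()

with ThreadPoolExecutor() as pool:
    tables = list(pool.map(fetch, info.endpoints))  # endpoints in parallel
```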
Is there any specialized hardware involved in creating these Arrow nodes?
Arrow as an in-memory format is intended to be hardware agnostic. It's designed in a way that it's efficient to implement, but it's not tied to particular hardware. For instance, Arrow's CI infrastructure tests Arrow on x86 machines, it tests Arrow on Macs, it tests Arrow on an S390X from IBM, and some PowerPC machines.
Mainframe? Did you say mainframe?
I said S390X, yeah.
Ah, okay.
And I guess the other side of this is it's all open source, right?
I mean, it's Apache project, right?
Yeah.
Apache Arrow is under the Apache umbrella.
It's open source.
We have contributors from many companies. We have contributors from all over the world. We have Arrow projects in all sorts of different languages; the Julia project recently joined the main Arrow umbrella as well. So we have lots of things that are supported. Yeah.
And so, I mean, the reason I really wanted you guys to come on the show was because there are not a lot of high-performance access mechanisms or access protocols that exist out there,
especially in the open source community.
I mean, most of the high performance access protocols
are either proprietary or, yeah, they're POSIX based effectively.
So you would have a POSIX client for vendor X, and they'd have their own servers to support their own parallelization. Now, NFS is coming out with some parallelization in 4.2, I believe, but this is something different. I'm trying to think what the question is here. So do you have any performance statistics on what Arrow Flight could potentially deliver, as far as gigabytes per second or record batches per second, I guess?
One of the things that occurs to me is that the software side, which is where you two are working, is highly hardware dependent. It could be, who knows, line-level speeds, but it's going to require fast networking. And all of these hardware functionalities are variables that are going to make it hard to compare apples to apples.
Yeah, I agree.
That is a good point.
So I think I briefly mentioned Arrow Flight uses gRPC underneath. gRPC is an RPC framework from Google, and it's been pretty well optimized for TCP communication. But recently, we're also looking at integrating the UCX networking library into Arrow Flight as well, because Arrow Flight abstracts away the underlying networking library it's using. So UCX is a library that's designed to take advantage of specialized hardware like InfiniBand interfaces.
Oh, okay. InfiniBand interfaces.
The tests were conducted on a cluster that I can't disclose exact numbers from, but UCX does quite well when it has access to specialized hardware.
And this would be an InfiniBand solution. So let's talk about the hardware configuration here.
Arrow Flight obviously requires client software sitting on the client, and there's server software as well sitting on some server someplace. And then behind that server, would there be SSDs directly attached, or disks directly attached, or do you support other storage systems behind that?
Yeah, so think of Arrow and Arrow Flight as more of a toolkit and a set of standards and protocols.
So, again, we're not trying to make particular requirements on the kind of hardware setup you have or anything like that.
But basically, Arrow Flight at the network level is a set of APIs based on gRPC.
And then we also ship client and server libraries
that any application can use in a variety of different programming
languages to build higher level things like Flight SQL
on top of these libraries.
I see James has some performance figures if you want to mention those.
Yeah, please do, James.
Yeah, so when we did some testing of this at Dremio, we saw throughput rates of 20 gigabytes per second without using Flight's parallelization features.
Without Flight's parallelization. So you potentially could see 20 gigabytes per second per parallel transfer?
That's right.
Yeah, if you had the hardware that could support it. And that's over Ethernet, TCP, right? I mean, you don't require any special switching or anything like that, right?
Right.
And yet you did mention InfiniBand.
So is there some
reliance on InfiniBand as a protocol?
No.
So once we have UCX
fully integrated, we'll be able to take
advantage of InfiniBand hardware
if you have it. But if you don't,
you can continue
using gRPC, and everything will just work over TCP.
And you mentioned IPC as well, as another protocol that you use. Maybe just for my own edification, can you tell me the distinction between IPC and gRPC?
Yeah. So we have the Arrow in-memory format.
That's just how the data gets laid out in RAM.
If we want to serialize it and then send it to another process
or write it to disk or something, that's when you use the IPC format.
IPC format basically specifies how you pack the buffers on the wire, the message headers, stuff like that.
That all gets sent over gRPC.
gRPC is an RPC framework from Google and the Cloud Native Computing Foundation.
It handles all the networking details.
So those are the three layers here.
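To make the layering concrete, here's a small pyarrow sketch using the same IPC format for disk persistence; the file name is hypothetical:

```python
import pyarrow as pa
import pyarrow.ipc as ipc

table = pa.table({"x": [1, 2, 3]})      # layer 1: the in-memory format

with pa.OSFile("layers.arrow", "wb") as f:
    with ipc.new_file(f, table.schema) as writer:
        writer.write_table(table)       # layer 2: the IPC (file) format

# Layer 3 is Flight: the same IPC messages, but carried over gRPC
# instead of written to disk.
```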
Oh, okay. I got you. Not like alternate layers; they all combine to support the transfers and that sort of stuff. So where do you see in-memory databases being used these days, columnar in-memory databases, I mean? What sort of clients or customers would be using them, and what would they be doing with them?
So I'd say in-memory databases are really good at doing batch analytics: dealing with large fact tables and being able to produce meaningful data using BI tools.
And you don't see that being applicable to things like machine learning or anything of that nature?
I can see this being used in machine learning. One of the big use cases for Arrow
is to be able to efficiently process data
using Spark,
be able to load data into Spark,
Arrow data into Spark without serialization.
Right, right, right.
Normally, when you work with Spark jobs, you'd write a Python script and then send the work to a JVM that processes it.
But if you're using Arrow,
there's no serialization required
to go from the Python data to the JVM data
because it's just Arrow data.
Right.
Yeah, that's a good example.
So I guess think of Arrow as kind of like a bridge between all these different systems.
So Spark uses Arrow to implement its Python user-defined functions.
And other systems like BigQuery and Snowflake also use Arrow to transfer data at different points, in the client libraries in these cases, I think.
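As a rough illustration of Arrow's role in Spark's Python UDFs, here's a sketch using pandas_udf, which moves data between the JVM and Python as Arrow batches; the data and function are invented:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

@pandas_udf("double")
def add_tax(price: pd.Series) -> pd.Series:
    return price * 1.07  # applied to a whole Arrow-backed batch at once

df = spark.createDataFrame([(9.99,), (4.50,)], ["price"])
df.select(add_tax("price")).show()
```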
Kafka could be a potential solution here as well, or I know Kafka has some Spark support.
Off the top of my head, I can't think of anyone combining Kafka and Arrow per se, but there's nothing stopping you if you need to get columnar data from point A to point B.
Right, right, right.
So what about high availability? That sort of thing is sometimes a required attribute, especially of databases, quite frankly, because they become so critical to BI and other critical corporate functionality. Does Arrow Flight offer high availability, or is that something you just kind of configure it with?
So ultimately that's up to the application
being built on top,
but Arrow Flight does provide things
to try to make it easier
to implement reliable applications.
So again, because we're building on top of gRPC,
that means we inherit a lot of the tooling.
gRPC is its own rich ecosystem.
So Arrow Flight building on top of that means we inherit all the tooling,
all the best practices that have been built up on top of gRPC over the years.
All of the observability, monitoring and logging tooling, all of the knowledge of how to debug
things, all of that still applies to Arrow Flight because it is gRPC underneath.
Mm-hmm. Okay. So you get the advantage of gRPC and that sort of stuff. I'd like to mention that Arrow Flight's ability to report multiple endpoints can be used for data redundancy as well.
So if, say, one of the endpoints has gone down with that source data, you can go to another one.
Right, if you've got a copy of it at that other endpoint.
I see.
That's interesting. And forgive me if I'm making too much of an assumption, Ray, but it seems as if you and I are thinking in terms of how hardware might resolve a lot of these issues. But ultimately, with a database language or a file format, we're really looking here at how those problems are actually resolved by software. So, you know, split brain taking place, that's a transaction that doesn't necessarily complete.
And is there a cache coherency from site to site?
And you're saying that's not really a function of Arrow.
That's really a function of the overriding architecture that actually handles the transactions.
Would that be a correct statement?
Yeah, that's correct.
I got you.
But you mentioned that you could automatically replicate or mirror Arrow Flight data onto different storage servers, if I'm using the correct terminology. Just by configuring it that way, I guess.
Kind of.
So, well, sorry.
So I guess there's always more layers to peel back here, right?
So Flight just defines a protocol and some RPC methods
that you can use to build things like that.
And it kind of tries to be suggestive
and corral you into doing stuff like that.
So, for instance, when you're requesting data from a Flight service, the recommended pattern is that you first make a metadata call called GetFlightInfo.
And that tells you where this data set can be fetched from and how it's partitioned.
And as James mentioned,
alternative endpoints that you can fetch data from
if the primary endpoint is down.
And as long as your application implements that,
as long as your client implements that,
then yeah, you can build in redundancy.
You can build in parallelism.
It's still up to your application
to actually implement those details,
but Flight tries to encourage you to do that
and make it easy for you to do that.
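A hedged sketch of that redundancy pattern on the client: try each location listed for an endpoint until one answers. The helper is made up for illustration:

```python
import pyarrow.flight as flight

def fetch_with_fallback(endpoint):
    last_error = None
    for location in endpoint.locations:  # alternative servers for the same data
        try:
            return flight.connect(location).do_get(endpoint.ticket).read_all()
        except flight.FlightUnavailableError as e:
            last_error = e  # this endpoint is down; try the next location
    raise last_error
```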
Right, right, right.
The other challenge that open source has had historically
is operations or configurations and that sort of stuff.
It's always been, I would say, that open source is typically developed by technical development teams, and there aren't necessarily usability teams associated with that.
How hard is it to configure
and make use of something like Arrow
and Arrow Flight and Arrow Flight SQL?
That's something that the community is actively working on, I would say.
So we've been trying to improve the documentation, especially in languages like Java.
We recently started an Arrow cookbook initiative to try to provide these simple, reusable recipes
for accomplishing common tasks with the Arrow libraries.
Now, this is maybe a common cop-out of open source projects, but if there's something that's not clear, if there's something that you want improved, please let us know. At least for me, because I've been in the project for a few years now, it can be hard to see where things are confusing or unclear. So having these questions really helps me as a contributor know where to focus my efforts, know where we need to explain more, basically.
Right.
That's a very valid point.
You know, the forest for the trees conversation.
But the difficulty with open source in general has always been a lack of support.
Which is the other side of this.
Yeah, I agree. So I imagine that community support, though, is quite robust in a project of this magnitude.
Yeah, I'd say so. Well, I guess there are a couple of ways to approach it. So Dremio, as far as I can see, is actually fairly active in the project itself. And one of Dremio's co-founders was also a co-founder of the Arrow project.
But also, yeah, I and many of the other contributors do our best to monitor
Stack Overflow, our mailing lists, GitHub issues,
all that to try to provide support as best we can.
And maybe that's not, of course, that's not guaranteed,
but I think we try our best to address everyone's questions.
Nobody's denying that.
I think that the historic need has
been finding a community of practitioners who actually understand the product and actually
understand the avenue that the end user is attempting to go through to resolve these
questions, et cetera. In my mind, when you've got a product of this significance, you've more than likely got people that have faced similar issues in the past and can set you in a decent direction, even if it's only ad hoc support.
So, you know, I think that
we're not seeing the same issues we used to see.
Mm-hmm.
Right.
Yes.
I think the Arrow community has grown a lot, and it's still growing. So yeah, there is a fairly active community around it now, across all of these different languages. And of course, there's also commercial support; that's always an option.
Oh, there is commercial support for Arrow and Arrow Flight?
Yeah, it's through my employer, Voltron Data. I won't speak too much to it, but it's also an option.
Right.
Well, that's good.
We were kind of looking, we were probing to see if that was available as an option.
And that's good.
I mean, obviously, if you are a modern institution, you're relying on the data and the accessibility in the long term.
You want to know if you can gain greater levels of support.
And obviously you can.
Can you guys speak to some of, let's say, your bigger installations? You don't have to actually name the company, but you might talk about, you know, what they're doing from a vertical perspective
with Arrow, Arrow Flight, and perhaps Arrow Flight SQL.
Well, Dremio is the obvious candidate here.
James?
Right. So Dremio has recently made Dremio Cloud available. And with Dremio Cloud comes support for Arrow Flight through a centralized service now. So that's one of the big changes. We adopted Arrow Flight into the Dremio Enterprise Edition a couple of years ago. So we added support for Arrow Flight on its own
and then started the initiative to do Flight SQL.
We're currently building up Flight SQL support.
Can you tell us just a little bit about Dremio as a company,
what you guys are doing?
Because I tell you the truth, I've heard about them,
but I don't know exactly what you're doing.
Dremio is a query engine for accessing data lakes efficiently and executing SQL using an Arrow-based execution engine.
using an arrow-based execution engine.
So we take advantage of the features
that David's mentioned, like being
able to do vectorized computations on data
for the purpose of processing SQL,
as well as exposing data to users using Arrow Flight.
So Dremio can connect to a lot of different sources,
including Azure Data Lake, Amazon S3, and Google Cloud Storage.
So data lakes, as well as more traditional sources like relational databases,
such as SQL Server, Postgres, or Redshift, for example.
Yeah, MySQL is included. Oracle as well.
Oracle as well. Yeah.
That's the first mention we've had thus far,
and it had occurred to me.
But if you've got raw data sitting in an Oracle database, it seems logical that there be an interpreter of that data into Arrow.
So what Dremio does is it provides a connector based on JDBC to suck data in from a traditional database and then get it into Arrow format so that Dremio can work with it. It tries to push as much work as possible down to the backend database, though.
Right, right, right, right.
You mentioned vectorization, and I would assume that because this data is sitting in columnar format in memory, vector operations would be very useful here. So are you using things like GPUs to do those sorts of things, or are you using, I'll call it, the vector instructions of x86, et cetera?
So we use a component of Arrow that was developed at Dremio
called Gandiva.
I'm sorry, Gandiva?
Yeah, so Dremio is a Java-based server,
and we use Gandiva to be able to access some of Arrow's more
lower level features, including its SIMD operations.
Single instruction, multiple data operations.
Right.
I just want to translate from our – so this is vectorization.
But, I mean, vectorization could occur at the CPU level.
It could occur in a GPU.
It could occur in an FPGA.
Am I assuming that you're using primarily the SIMD instruction sets for the CPUs that you're
operating on? I'm actually not sure about the answer to that, David. Do you know? This is kind
of abstracted from me. Yeah, no worries. So Gandiva is based on the LLVM compiler framework.
As far as I know, it targets CPUs mostly.
The interesting thing there is Gandiva is written in C++,
even though much of Dremio uses Java.
But because Arrow is a standardized memory layout,
those two languages can share data between them
without having to copy it all.
They can just pass pointers around.
So that's a big advantage of Arrow here,
that a JVM-based system can take full advantage
of native C++ capabilities.
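Gandiva itself is surfaced through Arrow's C++ and Java layers, but as a rough Python analogue, pyarrow's compute kernels run vectorized (SIMD-capable) loops over whole columns rather than per-row code:

```python
import pyarrow as pa
import pyarrow.compute as pc

prices = pa.array([9.99, 4.50, 12.00])
taxed = pc.multiply(prices, 1.07)  # one vectorized kernel call over the column
print(taxed)
```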
But you mentioned GPUs and FPGAs, and I want to say,
so the NVIDIA RAPIDS ecosystem has a library called cuDF, which implements data frame operations using the Arrow memory layout on GPUs.
So we do see Arrow usage with GPUs as well.
And the name escapes me at the moment, but there is also a project that works with FPGAs in Arrow.
You can take
basically you can give
it an Arrow schema
basically the data types and it'll generate
I think VHDL
or Verilog to
work with that data
on an FPGA.
Wait a minute.
Wait a minute. So you can give an arrow schema
to this process
and it will generate
the hardware design language
to program an FPGA
to process it?
Is that what you're saying?
Yes.
You still have to bring your own...
You still have to write
the actual processing part,
but it will generate...
The interfaces or something like that?
It'll generate all the interfaces for you, yeah.
So it reduces the amount of work you have to do to program the FPGA.
So yeah, again, that's called Fletcher, if you want to look at it.
Fletcher.
Okay.
Yes.
There's lots and lots and lots of arrow-based puns.
Yeah. Yeah.
Okay.
All right.
Well, hey, this has been great.
David and James, any last questions for Matt and myself?
Or, if there's something you want to get involved in, please reach out on the mailing list, dev@arrow.apache.org, or you can send GitHub issues or pull requests on GitHub at apache/arrow.
Okay, great. Matt, anything you'd like to ask before we leave?
No, no questions. But I just want to thank you guys. This is a very interesting
project you're working on. And I learned a lot.
Yeah, yeah, yeah, yeah. Well, this has been great. David and James,
thank you for being on our show today. Thanks for having us.
Thank you. That's it for now. Bye, David. Bye,
James. Bye. Bye, Matt. Until next time. Next time, we will talk to the system storage technology
person. Any questions you want us to ask, please let us know. And if you enjoy our podcast,
tell your friends about it. Please review us on Apple Podcasts, Google Play, and Spotify,
as this will help get the word out.