ACM ByteCast - Jeffrey Heer - Episode 11

Episode Date: February 9, 2021

In this episode of ACM ByteCast, Rashmi Mohan hosts ACM Grace Murray Hopper Award recipient Jeffrey Heer. Heer is the co-founder of Trifacta, a provider of interactive tools for scalable data transformation, and the Jerre D. Noe Endowed Professor of Computer Science & Engineering at the University of Washington, where he directs the Interactive Data Lab and conducts research on data visualization, human-computer interaction, and social computing. The visualization tools developed by Heer and his collaborators – Vega(-Lite), D3.js, Protovis, Prefuse – are used by researchers, companies, and data enthusiasts around the world. In the interview, Heer explains how his longstanding interest in psychology and cognitive science led him to focus on human-computer interaction as a student in computing. He describes the deep satisfaction (and fun) of interdisciplinary research drawing on computer science, statistics, psychology, and design, as well as his passion for building open-source tools that people in the real world can use. He also covers some of the challenges particular to building visualizations in the age of big data, starting a company to commercialize academic research, and his current efforts to promote more comprehensive, robust, and transparent analysis results.

Transcript
Starting point is 00:00:00 This is ACM ByteCast, a podcast series from the Association for Computing Machinery, the world's largest educational and scientific computing society. We talk to researchers, practitioners, and innovators who are at the intersection of computing research and practice. They share their experiences, the lessons they've learned, and their own visions for the future of computing. Today's guest is a master storyteller. He's the force behind transforming your spreadsheets of data into works of art that speak to you. His work in data visualization is considered pioneering.
Starting point is 00:00:38 Jeff Heer is a professor of computer science and engineering at the University of Washington, where he directs the Interactive Data Lab, and he is also Chief Experience Officer at Trifacta, a company he co-founded. He has won several awards and has been on the MIT Technology Review's Innovators Under 35 list. Jeff, welcome to ACM ByteCast. Thank you for having me. I'd like to lead with a simple question that I ask all of my guests. If you could please introduce yourself and talk about what you currently do and give us a little bit of insight into what drew you into this field of work. Sure.
Starting point is 00:01:13 So as you mentioned, I'm a professor at the University of Washington and then also an entrepreneur having founded a company. And generally, I'm interested in how we help people make sense of and more successfully work with data. And so I draw on a background in human-computer interaction, data visualization, and a larger field of interactive data analysis. And so we both run experiments and studies to understand how people work with and perceive data, and then build various systems. So that has included visualization tools, such as D3, which was written by my former student, Mike Bostock, and the Vega and Vega
Starting point is 00:01:50 Lite tools for data visualization as well, among with a number of other systems. Let's see. Well, in terms of background, taking it way back, I was intrigued by computers as a child, originally playing games and just messing around with applications on my parents' Commodore and later Amiga computers. And I was interested enough that when I applied to colleges, I selected computing as a primary interest. And so I was fortunate enough to be accepted into the EECS major, that's electrical engineering and computer science at UC Berkeley. And I should note, I didn't have any programming experience prior to college. But when I started taking comp sci courses, my interest really magnified. Programming was something I came to really enjoy. At the same time, I also had a longstanding interest in
Starting point is 00:02:34 psychology and cognitive science, which ended up being my unofficial minor at Berkeley. And so this naturally led me to human-computer interaction as a field that richly combines these two interests. So while I was at Berkeley, I took my first HCI course with Professor James Landay. And during one of our discussion sections, my TA brought up the topic of information visualization, which is using interactive graphics to present and explore data. And he demoed a technique that was developed at Xerox PARC called the hyperbolic tree. And the idea was to take a branching hierarchy, whether it's a file system, taxonomy, something else, and then position the nodes and edges of that tree within a hyperbolic geometric space, which, like many tree structures, actually expands exponentially. And then you can project it back into the Euclidean space to create a visualization.
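To make that concrete, here is a toy sketch of the layout idea in Python. It is illustrative only, not the actual PARC algorithm: real hyperbolic tree browsers use true hyperbolic isometries and animated refocusing, while this sketch simply squashes depth with tanh so an exponentially growing tree fits inside the unit disk. The tree and constants are invented.

```python
import math

# Toy hyperbolic-style tree layout: place each node in the unit disk.
# Children share their parent's angular wedge; radius grows with depth
# but is squashed by tanh, so deep, exponentially large subtrees still
# fit inside the disk, which is the effect that makes refocusing feel
# like "whipping through" a large information space.
def layout(node, depth=0, lo=0.0, hi=2 * math.pi, scale=0.45):
    angle = (lo + hi) / 2
    r = math.tanh(scale * depth)          # radius stays below 1
    pos = {node["name"]: (r * math.cos(angle), r * math.sin(angle))}
    kids = node.get("children", [])
    step = (hi - lo) / max(len(kids), 1)  # split the wedge among children
    for i, child in enumerate(kids):
        pos.update(layout(child, depth + 1, lo + i * step, lo + (i + 1) * step))
    return pos

tree = {"name": "root", "children": [
    {"name": "a", "children": [{"name": "a1"}, {"name": "a2"}]},
    {"name": "b", "children": [{"name": "b1"}]},
]}
for name, (x, y) in layout(tree).items():
    print(f"{name}: ({x:+.2f}, {y:+.2f})")
```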
Starting point is 00:03:25 but really the mesmerizing feel of whipping through large information spaces, I was hooked. And that began my journey as a data visualization researcher. And so then beyond my experiences at Berkeley, the biggest catalyst for my career was first an undergraduate internship I had at Xerox Park, where I got to work in the user interface group alongside a number of HCI pioneers who mixed both a computer science and psychology background. Prior to graduate school, I actually returned to PARC for a year
Starting point is 00:03:54 as a member of research staff. And during that time, I began earnestly working on visualization projects, many of which that would inform a lot of the work I would do in the years to come. And funny enough, some of those projects there even involved studies that showed why alternative browsers might be preferable to the hyperbolic tree. So even arguing against that technique I first fell in love with that had captured my imagination. So we have to be good scientists
Starting point is 00:04:17 and sort of discard our pet projects when the evidence mounts against them. And so these experiences at PARC and at Berkeley laid the foundation for my work in graduate school and then everything beyond. And so what ultimately sustained my interest in the area of data visualization in particular was really the fun of being able to draw on such a richly interdisciplinary field. So that includes computer science, statistics,
Starting point is 00:04:40 psychology, and design. And then to be able to apply that to create new systems and techniques that demonstrably help people better understand their data. And so I've been exploring variations on that theme, you know, ever since. Wonderful. Jeff, is that common? Do you find that most researchers who work in this field have that sort of interdisciplinary experience or interest? Yes, I would say so. So certainly in visualization and, of course, in human-computer interaction more broadly,
Starting point is 00:05:10 it's usually a conglomeration of people with various backgrounds, both in terms of interdisciplinary teams, but also interdisciplinary people, whether it's someone with backgrounds in both art and technology or empirical studies of behavior, whether that's coming from psychology or qualitative research methods. The intersection of these things, I think, is one of the most enjoyable and dynamic, fun tensions that helps drive the field forward.
Starting point is 00:05:36 I'd also note that one other aspect, certainly within my own research group, I think it really attracts people who, while they have deep intellectual academic research interests, are also very oriented to the world at large. They want to build tools or techniques that people out in the world will use and benefit from. And so I find that a lot of the students who apply to my group, Self-Select, because they're really excited to really build things that others can pick up and use in addition to making fundamental research contributions.
Starting point is 00:06:05 Got it. Would you say that from the time that you got into this field to where we are today, the interest as well as the demand for better visualization, the interest in data and using it to make crucial decisions has really been magnified? I think without a doubt. Certainly when I started, visualization already had a long history prior to computer scientists, for instance,
Starting point is 00:06:33 statisticians and others, certainly map makers, cartographers, others done really groundbreaking work in advancing visual forms of information presentation. But around the time I graduated with my PhD and switched to becoming a professor, I started as an assistant professor at Stanford. It was really when terms like big data and data science entered the mainstream. And so the amount of general interest, business interest, et cetera, in data systems, including data visualization, really took off.
Starting point is 00:07:10 And actually, kind of funny, I remember I had applied to multiple faculty positions back in 2008 and then toured. And one of the places I visited but ultimately didn't go to at the time was the University of Washington. Later, after a number of years at Stanford, I was back on the job market considering other places and ended up interviewing at the University of Washington. And one of my colleagues there looked at me and said, you're back. Since we last spoke, the world has really turned your way. He was pointing out that the stuff they had maybe thought was interesting, they didn't know how important it was.
Starting point is 00:07:37 The importance had become undeniable. And so that's, I think, been both a blessing and then also, I think, in some ways a curse as we as a society are still trying to figure out how to best use data productively, reliably, and in a way that protects individual privacy and liberty as well. Absolutely. I mean, you touch upon so many different aspects that I'd probably get into more during our discussion. But you mentioned big data, right? I mean, today we collect data about everything. Every trip to the grocery store, I think my car is probably collecting millions of data points about my driving and the route and the air quality, et cetera. So how does volume of data really play
Starting point is 00:08:17 into the field of data visualizations? Is it a challenge? Are there specific issues with representation of this data or the accuracy of it? Yes, absolutely. I think as data sets get larger and more diverse, it brings new and interesting challenges. Maybe not new in the sense that no one's ever thought about them before, but certainly more pressing in the issue of scale. But I think I would start by actually pointing out that regardless of scale, I think some of the core things that make data visualization important are there regardless of whether you're talking about small, medium, or large data, however you choose to create thresholds to define those. And so the goal of data visualization, one way of framing it is helping people use vision to think. So that is, use our long-evolved perceptual system to understand patterns,
Starting point is 00:09:05 trends, and outliers in data. And this use graphical displays and input techniques as a mean to develop and further investigate domain-specific questions. And so the greater availability of data, which I think is an important component of this so-called big data movement, just becomes more pressing, these abilities, given that variety of data sets. And so I think given the number of times a graphic has influenced a business decision, it's probably too numerous to count. I mean, like how many millions of Excel charts have been included in presentations to decision makers? I mean, your visualization is, you know, I think quite important and quite prolific. But as a result of this, and this relates to big data as well, I think my favorite examples tend to involve cases where data visualization helps us identify questions
Starting point is 00:09:49 that help decision makers be more skeptical. It's not just about taking this bulk of data and using it to drive decisions. I think we really need to think about what the data can and cannot support. So just as one example, some years ago, I once tried to export my social graph from Facebook and I visualized it in various ways. And initially it all looked quite reasonable to me. I could identify recognizable clusters of friends and so forth. But then I visualized it in one particular way as an adjacency matrix with rows and columns ordered by join dates, really showing the sort of checkerboard pattern of all the links between people. And in this particular form, I discovered that a large proportion of newcomers were missing any connections between each other, which means people who joined maybe even months ago,
Starting point is 00:10:34 for some reason had no friendship connection in this data set. And that just didn't seem right. You know, it turns out this was a missing data problem where a silent query limit was enforced and it left out up to 20% of the edges in the graph. So any conclusions or models built from this data would have suffered from a garbage in, garbage out problem. And so I think, you know, one thing that comes up with big data is actually even harder data quality challenges. So given that, let's look at some of the different ways, you know, big data affects analysis and particularly visualization. So we can roughly group it into three kinds of challenges. So one would be that we have lots of records. So as I've imagined, we have a database table, this table has lots of rows, maybe millions, billions, or more rows. This is an interesting scalability problem. It's one that
Starting point is 00:11:21 the database field and the visualization field have both worked on in recent years. But we can largely handle this, not that there aren't challenges that remained, but using scalable database systems and smart indexing approaches, you can actually approach kind of real-time interactive querying over summaries of billions of records of data. So that's something the field, I think, has successfully contributed in recent years. A more difficult challenge is big data in the sense of having lots of attributes. If you think about it as a David table, having thousands or hundreds of thousands or more columns. This is harder in just that the algorithms involved are much more expensive, harder to get quick results for analysis. And the right thing to do is often
Starting point is 00:12:00 task dependent. Do I need to include all of these variables in my analysis? Can I combine them in some way? Can I leave some out? It's just one example. There's lots of work on dimensionality reduction algorithms that try and take high dimensional data and project it down to 2D or 3D representations in ways that still preserve important aspects of the structure of the data in order for us to visualize it. So still lots of work there. And this also brushes up against visualization techniques for machine learning, which I think we'll talk about later in the interview. And then the third challenge is the great variety of data sources. So big data isn't just, oh, I have lots of data that all has a homogeneous structure,
Starting point is 00:12:41 but rather it's many different types of observations from different devices, maybe from different people collected for different reasons. And so if I want to meaningfully combine those data, because I think it would be useful for my analysis, this requires significant effort just transforming that data to clean it up and combine it in an accurate way. And I think this also raises potential pitfall or danger with respect to opportunistic analysis, in that many data sets may have been designed and collected for a specific purpose, and then people to know how that data was collected, what might have been omitted, overlooked, what biases might be underlying the collection of this data. And so if we want to be able to draw causal conclusions, for example, we need to know about and control for confounders. And I think that's actually one of the larger challenges of so-called big data in that it's often opportunistic in nature and that it wasn't a scientific collection exercise that was designed for a specific analysis. Rather, we're trying to
Starting point is 00:13:50 repurpose data and we have to be very careful about how we go about doing that. I think you raise excellent points across the board. But one question I had was, as researchers or as practitioners, what can we do to protect from these kind of sort of pitfalls, right? So for example, are there tools that we build that can actually have some intelligence that will, you know, help us steer away from making these sort of, you know, huge glaring errors? I mean, data cleanliness, of course, is a huge issue. But what other things are you maybe thinking about and say, hey, I'm going to build these tools which will really help protect against some of these issues? It's an interesting question. It's one that the information visualization community is actively thinking about these days.
Starting point is 00:14:36 So just to give you a couple examples, in our own work, we often think about, you know, perceptual models of visualization. So when someone looks at a visualization, what do they take away from it? Oftentimes, that's operationalized in terms of what value comparisons or patterns can you rapidly and accurately decode seeing the graphic. And so we've built, for example, visualization recommender systems that allow people to communicate the data that they're interested in. And the system might help produce visualizations that on average for a large number of people are maybe more likely to be interpreted accurately. So that's one kind of more intelligent control where it's not just, let me give you unbridled power to create visualizations. Maybe the tools
Starting point is 00:15:21 start to bring a bit of their own design opinion, ideally backed by perceptual science and helping guide you through the space of choices. But that's already at the point where you've already committed to visualizing particular data variables, perhaps transformed a certain way. Kind of higher level problems in the space is the process of analysis itself. So for example, there are researchers asking questions like, how do we identify and potentially mitigate cognitive biases that may come up in people's exploration processes? Whether that's, is there appropriate coverage of the data
Starting point is 00:15:56 or people overlooking potential relationships of interest? And so to do this, some of the strategies people are thinking about is, well, can I look not just at the data itself and potential statistical analyses of that data? How do I log the various interactions that people are doing? So I learned something about people's processes and what they have and haven't looked at. Building a model of the user, not just of the data in a way to try and make recommendations. And I think, you know, some of the most exciting ideas in the space are being currently looked at. I know, for example, my colleague Jessica Holman, who's now a professor at Northwestern, has been thinking deeply about these issues as well in terms of, well, how do we safeguard processes of exploratory data analysis?
Starting point is 00:16:37 You know, I might look at hundreds of different charts and think I come away with some interesting findings. But when you're looking at that many things, the odds that you'll see something spurious just due to chance, you know, goes up and up and up and up. And so are there ways to, you know, bring ideas from statistical methodology more directly into some of these interactive exploration tools and help, you know, safeguard against false discovery? Got it. Yeah. And I think, I mean, when you talk about biases, I mean, I'd like to go back to our previous conversation, you were talking about, you know, having, you know, some sort of ML in use of ML and data visualization. What is the scene in that space? You know, how is it that you're using, you know, artificial intelligence in your research? Sure. Well, like almost every
Starting point is 00:17:21 other part of computer science, ML is making a splash in the database community. And so I'll note these three areas that have recently gained prominence in our field. to aid machine learning interpretability, right? So understanding why a model might make the predictions it does, how well can we compare and predict model behavior? I think this is particularly critical for production deployment situations where you're going to put out a model that might be affecting hundreds, if not millions, or more people. And these aspects of quality control and testing of models are extremely important. And so within the visualization field in particular, I think initially this started with important but relatively straightforward representations, like showing the architecture
Starting point is 00:18:13 and activation pattern within a neural network, which is sort of a direct representation, but one that was often hard to interpret what that really meant in terms of the output. And that's also, you know, included high dimensional visualization techniques I alluded to earlier. So for example, a lot of machine learning techniques, learn a latent vector space where points in that space, whether they're words or images or whatever else they represent, their similarity or distance within that, you know, high dimensional geometric space has important meaning, right? So more synonymous words might be related or things such as genderization or parallel structure of words might be reflected as linear structures in this space.
Starting point is 00:18:53 And so visualization of these vector spaces has been another kind of common popular approach to trying to make sense of what are the internal representations that ML techniques are learning. This is something that we've worked on in our group, for example, trying to map human meaningful concepts within the space and then seeing how they compare. And then also seeing how they change. For example, if you change the number of dimensions that algorithms allowed to use for learning these spaces, how does it affect the representations that are then formed? Of course, many, many others are looking at these techniques as well. And so these efforts are pushing forward, really trying to help better elucidate why ML techniques make the predictions they do, ideally in ways that people can begin to understand. I'd say there's still a really long way to go, but I'm hopeful these efforts will ultimately influence how we go about designing and engineering ML systems, not just slapping a visualization on them after the fact, but understanding how these tools can be part of a design process for end-user facing systems that use ML to really, again, improve the
Starting point is 00:19:55 quality control and help reduce bias and many of the other problems we've seen when ML systems are deployed at scale and make mistakes. So that's one area, an important one. Another area where machine learning is being picked up is using machine learning as a method to help generate visualizations. So this includes a wide variety of projects, including visualization recommender systems. So using ML techniques to help reason through the design space of visualizations and then help recommend potential charts given a task. That includes natural language interfaces,
Starting point is 00:20:29 you know, be able to describe either data features or correlations or tasks you'd like to see, and then having the system, you know, create a corresponding representation that's responsive to that query. And then, of course, the underlying use of ML algorithms, you know, similar to classical data mining techniques, to try and detect and highlight patterns of potential interest within a data set. And that might be pre-trained ML algorithms or mixed initiative systems where people are providing examples to train up a system rapidly for the types of patterns they want to discover, then letting the algorithm loose on the data and then kind of visualizing the corresponding highlights that come back. And then the third area, which I find pretty fun to
Starting point is 00:21:10 think about, is actually using ML and particularly computer vision methods to try and automatically interpret charts or to try and simulate in some way human perception. So just one example, my former postdoc Jorge P, led a project on reverse engineering visualizations. And this takes a bitmap image as input and then tries to produce as output the actual visualization program that could produce that image. So if visualization is typically concerned with taking data and going to image, the corresponding inverse problem, kind of borrowing from computer vision, is to start with a visualization image and can the computer tell you back what were the encodings applied, perhaps even begin to recover some of the underlying data that's been visualized.
Starting point is 00:21:54 And then lastly, others in the same topic have started looking at using ML techniques to partially evaluate the effectiveness of visualizations. And I think this, by and large, is still work very much in an early stage. But you can imagine potentially trying to run user studies, but on a neural network rather than actual people as ways to maybe initially test some different visualization ideas, maybe in terms of low level perception, before then moving on to more costly human subject experiments. Got it. I mean, I think so many areas that you touched upon certainly seems like an extremely rich, you know, space to work in. But I'd love to go back to the point that you were
Starting point is 00:22:33 talking about, about interactive, you know, data analysis. I know that's an area of interest as well as, you know, deep work that you've spent time on. What is it that drove your particular interest in this area, Jeff? And, you know, was the area of work even popular when you started? Yeah, so broadly speaking, the task of interactive data analysis has been around for centuries, because any data analysis requires human intervention, whether it's in the design of, you know, the data to collect in terms of what questions you're trying to answer, the choice of models, choice of graphics, etc're trying to answer, the choice of models,
Starting point is 00:23:06 choice of graphics, etc. Though, obviously, in a modern context, the rise of computers brought a fundamental sea change to how we go about interactive data analysis and what we can do. And for me, being interested in that topic, it was a natural expanding of my research horizons over time. And so I started off with this deep interest in data visualization. In fact, when I started at Stanford, my group was called the Stanford Visualization Group. But as our interest expanded and I moved to UW, I even renamed us the Interactive Data Lab because what we did, while largely focused on visualization, grew beyond just visualization itself.
Starting point is 00:23:42 So just working on visualizations alone, one can't help but run headlong into many other challenges that you associate with data analysis. So one notable among these is data wrangling, which is, of course, the process of cleaning, preparing, and profiling a data set in order to understand its shape and structure and then make it actionable for further downstream analysis. For example, trying to create better tools that make it easier for anyone to properly clean, format, and prepare their data was the PhD topic of my former student, Sean Candle. And then along with Joe Hellerstein,
Starting point is 00:24:15 Sean and I, we founded the company Trifacta, which is really commercializing Sean's thesis. And so that was just kind of one step from visualization to considering data wrangling and data transformation. And of course, there are many other steps in an analysis pipeline. And so more recently, we've become quite interested in considering this larger lifecycle of end-to-end data analysis. And in particular, what are the myriad decisions that people make throughout this process of analysis? So just as one example of work in this space,
Starting point is 00:24:45 my student, Yong Liu, and we're in collaboration with Alex Kael and Tim Althoff, we've been investigating ways to account for all of these different decisions throughout an analysis process. Something that many other researchers are also quite interested in, particularly in the face of replication crises
Starting point is 00:25:00 in a variety of scientific disciplines. And so, you know, in our case, we ran interview studies with a variety of analysts to better And so, you know, in our case, we ran interview studies with a variety of analysts to better understand how they do and don't make decisions, how they choose to include certain results and not others in their research reports. And based on that, we developed recently a tool called Boba for authoring and visualizing what are called multiverse analyses. So rather than evaluate just a single set of analytic decisions, such as one choice of variables, one way to handle outliers,
Starting point is 00:25:31 one choice of model machinery, instead, multiverse analyses seek to enumerate what are all the a priori reasonable specifications. So what are all the decisions that, given some theoretical and methodological background, you're not sure how to choose among them, but all seem like valid analysis decisions. Actually specify them all and then evaluate them all in parallel. And so our tool aids with specifying this combinatorial space of decisions and then visualizing and performing inference on the results. So, for example, not just seeing what all the different effect sizes might have been coming out of that, but also helping to identify which of these analytic decisions, right, choice of a covariate, choice of a particular model machinery or parameter choice, to which of these decisions are the final results most sensitive.
Starting point is 00:26:19 And so the goal of this and some other recent projects in my lab is to really promote more comprehensive, robust, and transparent analysis results. Got it. You mentioned Trifacta, and I know that you wear two primary hats, maybe three. You're a researcher, you're a teacher, and you're an entrepreneur. How do those worlds blend? How did you decide from being more in the academic space to jump headlong into industry and start a company? Yeah, so it happened rather naturally, or so I think.
Starting point is 00:26:52 So I was a university professor, pre-tenure, so working quite hard, focused on initially building up the research group, having my students become successful, etc. But along the way, an important component of that is not just publishing papers, but we've produced a lot of open source software. So I'd say one of my other hats among the others you listed is being an open source developer and maintainer. And that's true to this day. But in particular, when it came to Sean's thesis, our work on a system called Data Wrangler, this had interesting algorithmic components, but it was primarily an interactive UI. Unlike some of our other projects, which resulted in new toolkits or APIs, this felt like the
Starting point is 00:27:34 kind of project that really would benefit from having the infrastructure of a company behind it, both to make it robust, releasable, have the training, etc. It just seemed like a better fit in terms of bringing that technology to the broader world. It also aligned with Sean's interests. So Sean was finishing his PhD and he was oriented towards industry as opposed to academia. And our collaborator, Joe Hellerstein, was also keen to start a company. And so they convinced me that not just be an advisor, but to really, you know, jump in with both feet with them and help get the company up and running.
Starting point is 00:28:10 And so, you know, I can certainly tell you, it was quite exhausting having, you know, these multiple hats to wear, particularly with the start of the company. You know, one thing that ended up working well for me was that from the beginning, so that my part-time status was baked in as part of the DNA of the beginning, so that my part-time status was baked in as part of the DNA of the company. And so, you know, I was working quite hard for both the university and the company, but it was also understood from the beginning that there were certain limits. And so, you know, like as any entrepreneur starting out does, you know, you make many different mistakes, but I think one thing that we got right as a set of co-founders was really clearly laying out our expectations of each other from day one and being really well synced.
Starting point is 00:28:50 And so that then as both good times and bad times, you know, come upon us, you know, in the weeks and months and years to follow. We had that shared understanding. And I think that allowed us to continue to be great collaborators throughout the years. Excellent. And do you find that, you know, so I know a lot of people actually find that, you know, the blend between academia and industry can be quite challenging. I mean, there is intent, of course, everybody understands the value with those sort of interactions, but it's not always easy to put into practice. What do you find are, you know, maybe techniques that help you, you know, to just, you know, sort of ride along both these worlds in a way that is effective? And do you find that one sort of helps fuel, you know, when I first founded the company with Joe and Sean, it was actually really refreshing to just have a change of context and a change of expectation, the type of work that you have to do, right? So in many projects, you do a proof of concept in research. You make a prototype, you show that it works, you evaluate it, you publish the results.
Starting point is 00:30:02 You're proud of what you do, but it's not necessarily ready, you know, production ready for real world use. And there's also sort of a bias in terms of novelty, sometimes in terms of cleverness, et cetera, as opposed to simplicity or effectiveness to the core problem. I mean, obviously for it to be research, it's not enough for it to be useful. It has to be new. And that's, you know, I think that that makes sense. But when you switch to the industrial context, you can be really focused on, you know, what works, whether or not it's new,
Starting point is 00:30:30 that creates new pressures, of course, in terms of things being usable, correct, not too buggy, etc. But I found that the change of emphasis and thought was really refreshing. Unsurprising, you know, you work long enough in one strain of work, you start to tire that as well. So I also then, when I, you know, switch back to more focus on academic work, I was sort of newly reinvigorated and refreshed by that because it was this change. So just psychologically having these different realms in which to do this work, I found, you know, particularly rewarding. And of course, the content fueled different insights as well. So this was certainly true even before we started the company, just being at Stanford, having a lot of exposure to Silicon Valley and what people in industry were thinking,
Starting point is 00:31:17 helped me navigate choices of research projects and that there were many projects that might seem interesting, but which ones seem like they might be more impactful or meeting a need that was being demonstrated out in the real world. And then going into the company sort of feet first only further reinforces that. So I do think there is at least a high potential for a virtuous cycle between research and practice. It's useful for them not to be too tightly coupled, but I think the important flow of information between the two is vital to both of them, you know, succeeding.
Starting point is 00:31:50 And certainly at an individual level, I find that more invigorating and interesting and a great source for new ideas on what to work on next. Does it require an inordinate amount of strength? It sounds like a lot of work. It's certainly a lot of work, but I think it's also, it's about finding joy in the work. And so, I mean, not that you have the expectations that everything is going to be fun all the time, far from it, but nevertheless, the intellectual challenge and the interestingness of what you're working on, but just as importantly,
Starting point is 00:32:22 the people you're working on it with, whether that's been, you know, my students and collaborators in research, you know, my co-founders and collaborators in the company, you know, having those environments that you really enjoy being a part of, I think is part of what, you know, allows you to go, you know, you're trying to tackle these problems with the requisite amount of energy. For what it's worth, though, I also, you know, you know, always want to get at least eight hours of sleep, if not more per night. So I think having a healthy and balanced life actually helps you do these things more effectively, too. Awesome. Thank you. What would you say would be your advice to somebody maybe early in career, Jeff, if they wanted to get into
Starting point is 00:32:59 this field? You know, a lot of us end up going into college and thinking about computer science as an overall sort of major that we want to focus on. But how would you say that, at what point does the decision say, hey, I want to get into this field of work, into data visualization? What is it that I need to identify maybe an ability in myself to say, I think I might be good at this? One thing I'd recommend is if you have that opportunity, take a variety of courses. So for me, you know, I really enjoyed my undergraduate years
Starting point is 00:33:30 taking a variety of courses in computer science, right? So everything from, you know, data visualization, they didn't have a class, but it was part of human-computer interaction. So I was exposed to it there, but also, you know, classes on AI, you know, computer vision, you know, database systems, any of these topics can be quite interesting, but also taking classes AI, computer vision, database systems.
Starting point is 00:33:45 Any of these topics can be quite interesting, but also taking classes outside your major. So as I mentioned earlier, like the various cognitive science courses I took were some of my favorites. When I went to graduate school, actually I took half of my coursework, not in the CS department, which was great,
Starting point is 00:34:00 but also at the School of Information at Berkeley, where I got exposed to many other approaches and methodologies that have been really informative and influential for me. So the first is just having, I think, a broad based education that allows you to figure out what you're passionate about, but also allows you to build an intellectual foundation that's broad enough that even if it's not until years later, all of a sudden there will be just that amazing connection between things that you didn't realize at the time, but then, you know, becomes, you know, central to some later project. So that's one. And the case of data visualization specifically, you know, it should your interests draw you that way. There's really rich online resources and community. So on social media and Twitter and others, you know, there's many active data visualization
Starting point is 00:34:43 researchers and practitioners sharing ideas, sharing design, sharing process. There's lots of great visualization tools out there, including many tutorials and free curricula out there. So lots of things to not only become acquainted with the field, but really start to build up your skills and deeper understanding. And from that point, I recommend just diving in, like, you know, starting to do the work of data visualization. Like, what are the questions or topics that are most important to you? Maybe starting even in your local area, what kind of government or other, you know, services data might you collect, begin to create, you know, analyses and visualizations around that. I think it's a discipline that really rewards simultaneously learning concepts
Starting point is 00:35:26 and fundamentals, but really putting them into practice through hands-on exercises. So that's how I would recommend to get started. And again, I find it to be a very welcoming community. I hope that's true for everyone else as well, really getting involved, whether it's in terms of the research community or the practice community. I think a lot of meetups, et cetera, in various cities, lots of ways to start to connect. Excellent. I mean, I think you bring up a very good point. Experimentation, starting small,
Starting point is 00:35:53 trying out using the open source tools that might be available would be a great way to at least understand what does this entail and get a flavor of the field. So thank you. Thank you for that excellent advice. I'd love to know, what do you do outside of work? What are your interests? Well, you know, at the moment, I'm actually not in Seattle, where, you know, I teach at the
Starting point is 00:36:13 university. I'm actually living in Berlin, Germany. So we came here for sabbatical last year, and due to the coronavirus pandemic, ended up staying put for the time being. And so while the overall reasons for having to stay are obviously quite unfortunate, nevertheless, we're trying to take advantage of being in a different culture. Myself, my wife, my family, we're all learning and perfecting our German and really kind of enjoying the interesting contrast between the culture here and in the United States. I also have two small children. So I think my number one and most time-consuming hobby is spending time with them and also just sort of being fascinated at how they make sense of the world and watching them learn. I also have to admit though that, as I mentioned before,
Starting point is 00:36:59 programming is something that I came to really enjoy. So as boring as it may be, actually programming and particularly supporting a number of our open source projects is also one of my biggest hobbies. Fortunately, not my only one, but I would remiss and I'd be quite dishonest if I didn't include it among some of the top line responses here.
Starting point is 00:37:19 Got it. Any chances of Oktoberfest happening this year in Germany? No. Interestingly enough, it happens in September anyway, so it's coming and going. Okay. And also, I'm in Berlin, not Bavaria, so it's slightly different. But no, I definitely saw when the Oktoberfest beers came through the supermarket and are largely gone now, unfortunately. Well, I hope you get to enjoy, you know, all the
Starting point is 00:37:46 other lovely things that Germany has to offer. We'd love to sort of close the interview with, you know, our final bite. What is it that you are most excited about in this field of data visualization over the next five years, Jeff? Thanks so much. I think there's really interesting questions that people are starting to address that are, I think, much. I think there's really interesting questions that people are starting to address that are, I think, longstanding core visualization questions. So that includes not just how people decode, you know, everyday visualizations like bar charts or line charts, they're really starting to look more deeply at how we understand uncertainty and representations of uncertainty and how do we reason about that and corresponding perceptual studies that are
Starting point is 00:38:24 continuing to get richer in terms of expanding our understanding of what people do and don't perceive beyond visualization. So not just taking data, mapping it to an image, interacting with that image, etc. I think visualization alone is not enough. So one reason for this is that there are people who have different capabilities in terms of their sensory and physical abilities. So some people have visual impairments, other people may have other issues that affect how they interface with computer systems. I think data visualization needs to do a better job of thinking beyond the purely visual, whether that's thinking about tactics, sonic, or other modalities, or even generating meaningful text summaries of what is otherwise visual content.
Starting point is 00:39:22 Accessibility, I think, is an important area that visualization researchers care about. But I think as a community, we haven't done anywhere near enough. I think it requires a lot more attention. The other topics that I already mentioned, so I'm kind of voting with my feet in these cases. I hope to work on accessibility in the months to come. But we're also already working on end-to-end analysis.
Starting point is 00:39:42 So not just looking at visualization, but looking at it as one component in this larger process of data analysis and making that larger process, you know, kind of the overarching phenomena of our study and what our tool seeks to support. Whether that's better kind of cognitive support for, you know, processes of exploration with data, or as I mentioned before, better tools for scaffolding and analysis and helping promote robust and transparent results. And then finally, we also touched upon machine learning. And so I think visualization researchers and practitioners have an important role to play in a more human-centered approach to machine learning, system design, evaluation, and deployment. So both helping with the creators of
Starting point is 00:40:25 these systems and the users of the system, you know, have a better understanding of how they work. And but most importantly, I think making sure that the systems and the deployments are accountable and responsible for the decisions and recommendations that they make. Thank you, Jeff. This has been such a rich and detailed conversation. We really appreciate the fact that you took so much time to, you know, expand the idea of data visualization for us and our listeners. Thank you for taking the time to speak to us at ACM ByteCast. Thank you so much for having me.
Starting point is 00:40:57 ACM ByteCast is a production of the Association for Computing Machinery's Practitioners Board. To learn more about ACM and its activities, visit acm.org. Thank you. dot o-r-g slash b-y-t-e-c-a-s-t
