Advancing Basic Science for Humanity
The Future of Computing, from Extreme to Green
CAN COMPUTING KEEP UP WITH THE NEEDS OF SCIENCE? In 2009, 22 scientists from neuroscience, nanoscience, astrophysics, computer science and engineering gathered in Costa Rica to discuss this question — one that goes to the heart of whether the 21st century fulfills its promise of breakthroughs in knowledge and technology. Twenty months later, the scientists are returning to the table, this time in Tromsø, Norway. Now, along with the problem, their attention is on the solution.
In 2009, scientists gathered in Costa Rica for the second Kavli Futures Symposium to discuss what science will need from computing. Now the dialogue continues in Tromsø, Norway, as scientists return to move from needs to solutions.
Their focus: the development of extremely energy-efficient computing technologies that one day might mirror the efficiency of the human brain. This would take what is now called “extreme computing” to a new level, where it is not just powerful but genuinely green.
The dialogue is being facilitated by the Kavli Futures Symposia, which bring together small groups of scientists from different fields to discuss common future trends, challenges and opportunities in science. In the run-up to Tromsø, four of the scientists who had participated in the Costa Rica symposium joined a teleconference to assess the current state of computing and to sketch out some key themes for the upcoming symposium. All globally recognized leaders in their disciplines, they brought diverse perspectives to the discussion, including cosmology, astrophysics, computer science and engineering, biology and neuroscience.
Like the larger group at the Costa Rica symposium and the group that will assemble in Tromsø, they represented both sides of scientific computing – the developers of computer technology and those who rely on it to process huge amounts of data. The participants:
- Tom Abel, Associate Professor of Physics at Stanford University and the Kavli Institute for Particle Astrophysics and Cosmology;
- Andreas G. Andreou, Professor of Electrical and Computer Engineering, Computer Science and the Center of Language and Speech Processing at Johns Hopkins University;
- William J. Dally, the Willard R. and Inez Kerr Bell Professor in the Stanford University School of Engineering and former Chairman of the Stanford Computer Science Department;
- Terry Sejnowski, Investigator with the Howard Hughes Medical Institute and Francis Crick Professor at The Salk Institute for Biological Studies.
Leading the teleconference was Andreas G. Andreou. Below is an edited version of the July 7, 2010 teleconference.
Kavli Futures Symposium Roundtable
Andreas Andreou (AA): I would like to start by asking what each of you sees as the most significant developments in computing science and technology since the 2009 Costa Rica workshop, “Real Problems for Imagined Computers.”
FROM EXTREME TO GREEN
William Dally (WD): What I’ve seen is … kind of a coming-of-age of GPU [graphics processing unit] computing. … The world is shifting from one way of doing business to another. And as we look forward, there is going to be an even greater phase change as people start realizing that getting real energy efficiency is going to require coding their programs not just for parallelism, which they are doing today, but for locality. That’s probably one of the big leaps we’re going to have to make going forward.
AA: So locality of reference, the tendency of programs to reuse data and instructions that have been used recently, becomes even more important in an era where energy is a key concern in computing. I want to turn to Terry for his thoughts on developments since Costa Rica.
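To make the idea concrete, here is a minimal Python sketch, not taken from any participant’s work, that simulates a small fully associative LRU cache (the 64-line, 8-word geometry is an arbitrary assumption) and feeds it the same array visited in two orders. Only the order of access changes, yet the row-by-row walk reuses each fetched line while the column-by-column walk evicts every line before returning to it.

```python
from collections import OrderedDict

def lru_hit_rate(addresses, cache_lines, line_words):
    """Simulate a fully associative LRU cache and return its hit rate.

    addresses: sequence of word addresses
    cache_lines: number of lines the cache can hold
    line_words: words per cache line
    """
    cache = OrderedDict()  # line tag -> None, ordered least to most recently used
    hits = 0
    for addr in addresses:
        tag = addr // line_words
        if tag in cache:
            hits += 1
            cache.move_to_end(tag)  # mark as most recently used
        else:
            cache[tag] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)

def row_major(n):
    # Visit a row-major n x n array row by row: consecutive addresses.
    return [i * n + j for i in range(n) for j in range(n)]

def column_major(n):
    # Visit the same array column by column: addresses n words apart.
    return [i * n + j for j in range(n) for i in range(n)]

N, LINES, WORDS = 256, 64, 8  # assumed array size and cache geometry
print(f"row-major traversal:    {lru_hit_rate(row_major(N), LINES, WORDS):.3f}")
print(f"column-major traversal: {lru_hit_rate(column_major(N), LINES, WORDS):.3f}")
```

With these assumed sizes the row-major walk hits on 7 of every 8 accesses (0.875) and the column-major walk never hits (0.000), even though both touch exactly the same data.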
Terry Sejnowski (TS): Since we met in Costa Rica and discussed how computing is changing the way that neuroscientists record from neurons and also reconstruct the complex connectivity between neurons in the cortex, there’s been an explosion of research in an area that’s now called “connectomics.” This is an attempt to reconstruct not just the connections in a local neighborhood but every connection, every synapse within a large brain. Historically, the first complete reconstruction was done on the brain of C. elegans, a worm that only has 302 neurons; this was done by hand about 40 years ago. But it’s now possible, using large-scale parallel computing and machine learning, to do that semi-automatically. Unfortunately, it’s not yet at a 100% level, but the accuracy of each step is getting better, first segmenting the profiles of all the axons and dendrites and synapses from electron microscope images, and then reconstructing the third dimension from serial sections.
Tom Abel (TA): The part of Costa Rica that really helped [my research team] a lot was understanding the mantra repeated to us: worry mostly about how to move data around on the architecture rather than about FLOP counts. This has influenced our thinking on the future design of the codes we’re writing. [We are] now thinking about how to make our codes more flexible and more adaptable to these changing architectures. One example we explored was to combine message passing, thread parallelism and graphics processing units into a hybrid model, and we now have a fairly general framework for the differential equations we’d like to solve that is light enough, easy enough to learn and easy enough to use. A key aspect is to do it in a way that’s (how should I say?) easy on the eye, so that a new user can learn it in a reasonable amount of time. That’s what’s been very much on our minds since then.
AA: In my mind, the most exciting development in technology was the emergence of 3D CMOS technology, which allows you to stack multiple wafers on top of each other, thus allowing the placement of main memory in close proximity to the processor. By doing so, one can see big savings: a factor of 100 or more in the energy expended to access main memory. This is a big deal! In essence we see computing technology come closer to the organization of the brain, which has a layered laminar structure where neurons communicate both laterally within a layer and vertically between layers, much as we are witnessing integrated circuit technology evolve today.
Let’s turn to another question. Stu Feldman, computer scientist and VP of engineering at Google, declared the following at a recent seminar at Johns Hopkins University: “Extreme computing is by definition specialized. If you were to use off-the-shelf components for the system, it is by definition not extreme.” Keep in mind that specialized in this context refers to both hardware and software architecture. This seems to be a rather provocative comment. How do we move beyond general-purpose central processing units (CPUs) and even GPUs, in the sense that GPUs are off-the-shelf components? Also, how about the brain? The brain seems to be homogeneous and yet also specialized at the same time. Who wants to start?
WD: I can take a stab at it. I think it’s interesting that someone from Google says that, since Google basically has a very rigid policy of using only off-the-shelf hardware in its computing centers. While it is true that for specific functions such as video encoding or a radio modem you can get far more efficiency by specializing hardware, I think that over time specialization is becoming less important, because efficiency is being dominated not by the operation – which is what you tend to specialize – but by data movement and memory access, which tend to be generic. Moving a bit is a generic commodity. Communication, regardless of the application, boils down to bits per second over some communication pattern, and memory bandwidth is generic – it’s bits per second over an interface, regardless of what those bits are doing.
So I’ll actually agree that there are niches where specialization is important, usually at the user-interface end of things, such as video coding and radio modems. But certainly for scientific computing, you need a fairly general-purpose set of resources to carry out a wide range of computations, and I don’t see people building special-purpose machines aimed at particular scientific computations. There are some notable exceptions, like Anton, the supercomputer David Shaw built for computational chemistry, but in many ways I think that’s an exception that proves the rule. If you look at what he’s done, he’s really specialized the data movement in that machine, and I think you could build a general-purpose machine with very efficient data movement and get results very close to those of the specialized machine he spent enormous resources building.
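Dally’s argument can be checked with back-of-the-envelope arithmetic. The Python sketch below uses per-event energy costs that are assumptions chosen only for illustration (not measurements of any particular chip) and applies them to a streaming vector add. It then asks how much a hypothetical specialized unit that made arithmetic ten times cheaper would save.

```python
# Assumed, illustrative energy costs in picojoules; not measured values.
ENERGY_PJ = {
    "op": 20.0,            # one 64-bit arithmetic operation
    "onchip_word": 50.0,   # moving one 64-bit word across the chip
    "dram_word": 2000.0,   # one 64-bit word to or from off-chip DRAM
}

def total_energy_pj(ops, onchip_words, dram_words, costs=ENERGY_PJ):
    """Total energy in picojoules for a kernel described by its event counts."""
    return (ops * costs["op"]
            + onchip_words * costs["onchip_word"]
            + dram_words * costs["dram_word"])

def movement_fraction(ops, onchip_words, dram_words, costs=ENERGY_PJ):
    """Fraction of the total energy spent moving data rather than computing."""
    total = total_energy_pj(ops, onchip_words, dram_words, costs)
    return 1.0 - ops * costs["op"] / total

# Streaming vector add c[i] = a[i] + b[i] over n elements: one operation and
# three DRAM words (two loads, one store) per element, with no on-chip reuse.
n = 1_000_000
print(f"data movement share: {movement_fraction(n, 0, 3 * n):.4f}")

# A hypothetical specialized unit that makes each operation 10x cheaper.
specialized = dict(ENERGY_PJ, op=ENERGY_PJ["op"] / 10)
saving = 1.0 - total_energy_pj(n, 0, 3 * n, specialized) / total_energy_pj(n, 0, 3 * n)
print(f"energy saved by specializing the operation: {saving:.2%}")
```

Under these assumed costs, about 99.7% of the energy goes to data movement, and a tenfold cheaper operation saves only about 0.3% of the total, which is the sense in which the operation is no longer what dominates.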
AA: Terry, any thoughts on this?
TS: The lesson from the brain is that it’s not a single architecture, but hundreds of different special-purpose architectures, each optimized for a particular task: the retina, which is specialized for image processing, or the cerebral cortex, which is specialized for memory and learning and in which processing and memory are very closely integrated. It’s true that all of these architectures can be simulated on general-purpose hardware, but some can be simulated much more efficiently than others. So what I would like to know is how to optimize the memory, processing and communications in a computer architecture for different applications.
AA: Thank you Terry. How about you, Tom?
TA: For me it’s a bit of a semantic point, perhaps. The words “extreme computing” can mean a lot of things. You could be running on 100,000 GPUs; that’s “extreme” in one way. The way we use our algorithms can also be extreme. One that I was working on uses adaptive mesh refinement to resolve structures across a dynamic range of 10-to-the-15th in length scale. That is, in a model of the entire Earth we would still resolve a very small bacterium. In our case, we can simulate an entire galaxy yet still resolve an individual star. There is a lot of computing that is “extreme” in many different, interesting ways.
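A rough sketch of why adaptive mesh refinement makes a 10-to-the-15th dynamic range tractable: each refinement level is assumed here to halve the cell size (a common choice, but an assumption of this sketch), so the number of levels grows only logarithmically with the range, and fine cells are placed only where a hypothetical refinement criterion flags them. The gradient threshold below is illustrative and is not the criterion used in the simulations described.

```python
def refinement_levels(dynamic_range, refine_factor=2):
    """Smallest L such that refine_factor**L >= dynamic_range (integer loop avoids log rounding)."""
    levels, reach = 0, 1
    while reach < dynamic_range:
        reach *= refine_factor
        levels += 1
    return levels

def flag_cells(values, threshold):
    """Flag both cells on either side of any jump larger than threshold.

    Only flagged cells would be covered by a finer grid patch; smooth regions
    stay at coarse resolution.
    """
    flags = [False] * len(values)
    for i in range(len(values) - 1):
        if abs(values[i + 1] - values[i]) > threshold:
            flags[i] = flags[i + 1] = True
    return flags

print(refinement_levels(1e15))  # 50 levels of 2x refinement
print(flag_cells([1, 1, 1, 1, 50, 1, 1, 1], threshold=10))
```

Fifty levels of factor-of-two refinement span the range, whereas a uniform three-dimensional grid at the finest resolution would need on the order of (10^15)^3 = 10^45 cells.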
AA: I will respectfully disagree with Bill. Indeed, in the old days, designing a computer architecture and a custom chip for a specific task (because we are really talking about computing for a specific class of problems, for example graphical models, or neuron models for large-scale brain simulations) used to take many years of development. However, thanks to the Mead and Conway approach to Very Large Scale Integration (VLSI) circuit design, together with advances in design automation and computing technology, designing task-specific extreme computing machines (whether high-performance or green or both) is today comparable to designing an algorithm and “compiling” a computer program. Andrew Cassidy, a graduate student in my lab, has recently completed the design of a massively parallel and pipelined processor architecture for speech recognition by “compiling” about 40,000 lines of high-level description language code, about a man-year of work. So industry and academia have the tools and machinery to do this.
What we could use, and do not have yet, is a high-level quantitative approach to parallel computer architecture exploration: a framework that links problems, algorithms and computing resources (memory and processors) to energy and delay costs. Essentially I am in complete agreement with Terry’s earlier statement [that we would like to know how to optimize memory, processing, and communications in a computer architecture for different applications]. The good news is that the Costa Rica workshop has challenged us to start thinking about the latter problem, especially in the context of energy-aware extreme computing, and I will have something to report at the Tromsø workshop ... so we are making progress!
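As a toy illustration of the kind of quantitative exploration Andreou describes (this is not his framework, whose details are not given here), the Python sketch below scores a few hypothetical architecture configurations under an assumed linear cost model: runtime from aggregate throughput, dynamic energy from operation and memory-access counts, and static energy from leakage power integrated over the runtime. Every parameter value is invented for illustration, and ranking by the energy-delay product is one common choice of metric, not the only one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str
    processors: int
    ops_per_sec_per_proc: float
    pj_per_op: float      # dynamic energy per operation
    pj_per_word: float    # energy per word fetched from main memory
    static_watts: float   # leakage/static power of the whole configuration

def evaluate(cfg, ops, words):
    """Return (energy in J, delay in s, energy-delay product in J*s)."""
    delay_s = ops / (cfg.processors * cfg.ops_per_sec_per_proc)
    dynamic_j = (ops * cfg.pj_per_op + words * cfg.pj_per_word) * 1e-12
    energy_j = dynamic_j + cfg.static_watts * delay_s
    return energy_j, delay_s, energy_j * delay_s

# Hypothetical candidates; all numbers are assumptions for illustration.
CANDIDATES = [
    Config("few fast cores", 4, 2e9, 100.0, 2000.0, 40.0),
    Config("many simple cores", 256, 5e8, 10.0, 2000.0, 30.0),
    Config("many simple cores + stacked DRAM", 256, 5e8, 10.0, 20.0, 30.0),
]

ops, words = 1e12, 1e11  # an assumed workload: 10 operations per memory word
for cfg in CANDIDATES:
    e, d, edp = evaluate(cfg, ops, words)
    print(f"{cfg.name:34s} energy={e:8.1f} J  delay={d:7.2f} s  EDP={edp:10.1f}")
best = min(CANDIDATES, key=lambda c: evaluate(c, ops, words)[2])
print("lowest energy-delay product:", best.name)
```

With these made-up numbers the stacked-memory option wins mainly because cheaper memory access cuts dynamic energy, echoing Andreou’s point about 3D stacking; different parameters would of course give different rankings.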
WD: People have become very accustomed over the last 40 years to computing performance getting faster every year. It has actually fueled productivity growth in all sorts of industries beyond computing. It has led to all sorts of computing-related industries – e-commerce, commercial databases – and many forms of entertainment and business are fundamentally based on information technology today. But it’s also been an underlying driver behind things as mundane as better automobiles and better aircraft. The bulk of General Motors’ engineers are software engineers, writing code for the 30 or 40 embedded processors in their cars.
This numerical simulation of the earliest stars in the Universe was created using adaptive mesh refinement, a computing technique that employs millions of different grid patches in space and time. Such sophisticated algorithms lead to enormous savings in required computing time and associated power consumption. Simulation: John Wise & Tom Abel (KIPAC); Visualization: Ralf Kaehler & Tom Abel (KIPAC)
So people have gotten used to this performance increase, but what a lot of people haven’t realized is that, while Gordon Moore predicted in his original 1965 paper that the number of transistors would basically double every year – that was later revised to every 18 months – he also predicted that this would happen at constant power, because the energy scaling would go as L-cubed. That is, as you cut the line width in half you would get one-eighth the energy for each switching event. And that energy scaling ended when we stopped scaling supply voltage – which happened about 2005, because of leakage power. With the end of energy scaling, we were no longer able to deliver what Moore’s law popularly means, which is turning those additional transistors into additional performance at historical rates, about 50% per year. ...[This means that] mainstream computing is all going to have to become parallel in the future. But there are very different ways for it to become parallel. And what many people are trying to do is to take conventional processors and stick them together, either via their memory systems or by tying their I/O systems together with the network, and to build a parallel system that way.
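Dally’s L-cubed argument is what is usually called Dennard (constant-field) scaling. The Python sketch below encodes an idealized version of it, with the simplifying assumptions stated in the comments: shrinking the line width by a factor k scales capacitance by 1/k, packs k squared more transistors into the same area and, historically, raised clock frequency by k. Power for a fixed die then stays constant only while supply voltage also scales by 1/k.

```python
def switch_energy(k, voltage_scales=True):
    """Energy per switching event, E = C * V**2, relative to the unshrunk chip (k = 1).

    Idealized assumption: capacitance scales as 1/k; voltage scales as 1/k
    under Dennard scaling, or stays fixed once voltage scaling stops.
    """
    capacitance = 1 / k
    voltage = 1 / k if voltage_scales else 1.0
    return capacitance * voltage ** 2

def relative_chip_power(k, voltage_scales=True, frequency_scales=True):
    """Power of a fixed-area chip: transistors * energy per switch * clock frequency."""
    transistors = k ** 2                          # same area, k**2 more devices
    frequency = k if frequency_scales else 1.0    # idealized clock scaling
    return transistors * switch_energy(k, voltage_scales) * frequency

for k in (1, 2, 4):
    print(f"k={k}: E/switch={switch_energy(k):.4f}  "
          f"power (Dennard)={relative_chip_power(k):.1f}  "
          f"power (fixed V)={relative_chip_power(k, voltage_scales=False):.1f}  "
          f"power (fixed V and f)={relative_chip_power(k, False, False):.1f}")
```

Halving the line width (k = 2) gives one-eighth the energy per switch and constant chip power when voltage scales; with voltage held fixed, the same shrink quadruples power, or doubles it even if the clock is held constant, which is why the extra transistors can no longer all be turned into performance.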
What I argue in the Forbes op-ed piece is that this is like trying to build an airplane by sticking wings on a train. …[W]hereas if instead you back off and you say, “we’re in a realm of parallelism, let’s build a processor that aims at parallelism and at being very efficient,” you wind up with something much more like a GPU, where today we can put 500 processors on a chip and will very shortly have thousands. That basically gives you a completely different view of parallelism. Instead of incrementally trying to parallelize your program by breaking it into threads and putting locks in there, which is extremely error-prone and difficult to do, you say, I’ve got thousands of processors, I have thousands of threads; I will have a different approach to parallelism where I do everything in parallel – by launching thread arrays to process data in parallel.
By taking this more radical approach to parallelism, you actually accomplish two things. One is that you have a more efficient platform, by roughly an order of magnitude. And the second is that you’ve actually made the programming job easier, because by avoiding this incremental thread-and-lock approach to parallelism and jumping in there with things where everything goes in parallel, the difficult parts of parallelism, which all tend to center around synchronization, actually become easier. The synchronization becomes based on the semantics of what you’re doing and not some arcane issue of a locking protocol.
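The contrast Dally draws can be sketched in plain Python, with a loop standing in for hardware that would run the logical threads concurrently. The helper names `launch` and `saxpy_kernel` are hypothetical and are not a CUDA or GPU API. Each logical thread writes only its own output element, so no locks are needed; the reduction proceeds in rounds of independent pairwise additions, and the only synchronization is the implicit barrier between rounds, which follows from the data dependencies rather than from a locking protocol.

```python
def launch(kernel, n_threads, *args):
    """Emulate a 1-D array of n_threads logical threads, each running kernel(i, *args).

    On parallel hardware these would execute concurrently; the sequential loop
    here gives the same result because no thread reads another thread's output.
    """
    for i in range(n_threads):
        kernel(i, *args)

def saxpy_kernel(i, a, x, y, out):
    # Logical thread i owns element i: no shared writes, hence no locks.
    out[i] = a * x[i] + y[i]

def tree_reduce_sum(values):
    """Sum in about log2(n) rounds of independent pairwise additions."""
    data = list(values)
    n, stride = len(data), 1
    while stride < n:
        # All additions in this round touch disjoint elements and could run in parallel.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2  # barrier: the next round needs every result from this one
    return data[0] if data else 0

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)                   # [12.0, 24.0, 36.0, 48.0]
print(tree_reduce_sum(out))  # 120.0
```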
AA: That’s an insightful commentary. Indeed, Moore’s Law is not just about the growth in transistor count but about growth at a given cost. And, as we know, the die of a typical chip has remained more or less the same size -- one and a half centimeters by one and a half centimeters or even smaller. So I think it really is about how the extra transistors are used, at a cost in energy and in manufacturing, which in some ways is what Moore’s Law is really about.
WD: The real constraint today is total die power, which is in an envelope of somewhere between 100 and 200 watts maximum; 150 is probably pretty typical. Today you’re actually power-limited rather than transistor-limited, which means if you fill your chip entirely with functions, you can’t turn it on all at once, because that would melt the chip. So it’s really a question of how, given a constant total die power, which is not increasing over time, you best turn that into value for the end-user. Massive parallelism is really the only answer.
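Being power-limited rather than transistor-limited can be stated as simple arithmetic. In the Python sketch below, the 150-watt budget is the typical figure Dally mentions, while the unit count, per-unit power and energy-per-operation values are hypothetical numbers chosen only for illustration.

```python
def active_fraction(power_budget_w, units, watts_per_unit):
    """Fraction of identical on-chip units that can run at once within the budget."""
    max_on = min(units, int(power_budget_w // watts_per_unit))
    return max_on / units

def max_ops_per_second(power_budget_w, joules_per_op):
    """Throughput ceiling when power, not transistor count, is the limit."""
    return power_budget_w / joules_per_op

BUDGET_W = 150.0  # typical die power figure from the discussion

# Hypothetical chip: 1,000 units drawing 0.5 W each when active.
print(active_fraction(BUDGET_W, units=1000, watts_per_unit=0.5))  # 0.3

# Hypothetical energies per operation for two design styles.
print(f"{max_ops_per_second(BUDGET_W, 100e-12):.2e} ops/s at 100 pJ/op")
print(f"{max_ops_per_second(BUDGET_W, 10e-12):.2e} ops/s at 10 pJ/op")
```

Only 30% of this hypothetical chip can be switched on at once, and at a fixed budget throughput scales inversely with energy per operation, so more performance has to come from more efficient operations spread across many parallel units rather than from more power.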
Screen shots of two tiers of a heterogeneous parallel processor and memory interface in a four-tier processor architecture for speech recognition (two additional layers of main memory are not shown). 3D stacking of main memory onto the parallel processor architecture enables a memory interface that is simultaneously high-bandwidth and low-power. This low cost of main memory communication enables an architecture with a high degree of vector, functional-unit and processor-level parallelism for 100x to 1000x real-time performance. Such specialized HLPUs (human language processor units) may replace the general-purpose CPUs that perform the myriad natural language processing tasks on the internet, making data center computing more powerful and greener. (Courtesy: Andrew Cassidy and Andreas Andreou)
TS: It’s intriguing to me that computer science has discovered that energy is a limiting issue, and energy density in particular, because nature has evolved highly energy-efficient ways of computing in brains. Here are some performance figures: Our brains weigh about 2 percent of our bodies and consume about 20 percent of the power -- around 20 watts. A conservative estimate for [the brain’s] computing speed is 10-to-the-15th operations per second; we can argue about what an operation is, but that’s a lot of operations for 20 watts. This will be the major theme at our next conference on green computing, because we need to start using energy consumed as a major part of the cost function and find ways of reducing it. Nature has already been down the road you have brought up, which is very sparse coding. At any given moment only a few neurons at any location are actually firing at any appreciable rate, and so you’re representing the world at any given time with a relatively small number of active components. And it’s also load-balanced, in the sense that with brain imaging you can see oxygenated blood being shunted between different parts of the brain on a time scale of a second or two as activity shifts – these are the “hot” spots. If the visual cortex is firing more than average, the blood will be shunted there. So there is an interesting convergence here between constraints from nature and from the computing world.
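Sejnowski’s figures translate into an energy per operation with one line of arithmetic, and his point about sparse coding can be illustrated with a generic k-winners-take-all rule. The Python sketch below takes the 20-watt and 10-to-the-15th operations-per-second estimates from the text; the k-winners-take-all function is a standard textbook illustration of a sparse code, not a model of cortex, and treating the number of active units as a proxy for energy is a simplifying assumption.

```python
BRAIN_WATTS = 20.0       # power estimate quoted above
BRAIN_OPS_PER_S = 1e15   # conservative operations-per-second estimate quoted above

joules_per_op = BRAIN_WATTS / BRAIN_OPS_PER_S
print(f"{joules_per_op:.1e} J per operation")  # 2.0e-14, i.e. 0.02 pJ

def k_winners_take_all(activations, k):
    """Keep the k largest activations and silence the rest, giving a sparse code."""
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    winners = set(order[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]

def firing_fraction(code):
    """Share of units active; under the assumption above, a proxy for energy use."""
    return sum(1 for a in code if a != 0.0) / len(code)

code = k_winners_take_all([0.1, 0.9, 0.3, 0.7, 0.2], k=2)
print(code, firing_fraction(code))  # [0.0, 0.9, 0.0, 0.7, 0.0] 0.4
```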
AA: Tom, any thoughts on this?
TA: Both comments apply to scientific applications in astrophysics. I am looking at a particular problem where I only look at a particular region of the universe that I want to simulate. This doesn’t scale particularly well to the new architectures. So I’ve started thinking in another way. Since I’ve got a hundred times the number of processors, we ask what else we could be computing that’s interesting. While we still get out the science we initially intended, we also use the capability of the computer to decide what else to go after. What’s interesting from Terry’s discussion is that, if you think about the brain, there are a number of things we can do at the same time – you know, looking out the window while talking to you. We’ve got all these operations lying around; we might as well use them. I think from a software design point of view that’s rather interesting, in particular if we have to worry about fault tolerance, which I’d like to hear from Bill about: how much he thinks end users actually have to be aware of this.
AA: Let me bring up another issue here related to going beyond Moore’s Law. As time goes on our transistors are not going to be the transistors that we know today. The device will be on and off with a probability of point-eight or point-seven. We need to think forward about how to compute with these probabilistic elements. Terry, any thoughts on this?
TS: Redundancy is the traditional way to reduce noise, but we can do better. Rather than duplicate the same signal on many neurons, a large population of neurons can be used to represent combinations of signals, which is called a distributed representation. There are also probabilistic algorithms that take advantage of unreliable components. One of the remarkable things about the brain that amazes many who think about it as a computer is that synapses in the cortex (the vast majority of which are excitatory) do not transmit information every time they are activated. The average probability of release of a single packet of neurotransmitter at a cortical synapse is about 10 to 20 percent; a minority have a probability of 50 or 70 percent. The person who has thought most deeply about why this is so is Simon Laughlin at Cambridge University. In a review paper published in Science in 2002, he argued that brains use redundancy of synapses rather than neurons. On a chip this would correspond to unreliable wires rather than unreliable gates.
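The reliability figures in this exchange (devices that work with probability 0.7 to 0.8, synapses that release with probability 10 to 20 percent) can be plugged into two standard formulas. The Python sketch below assumes that failures are independent, which is a strong simplification. It computes a majority vote over n redundant copies, the traditional redundancy baseline Sejnowski says we can improve on, and the chance that at least one of n unreliable synapses releases.

```python
from math import comb

def majority_correct(p, n):
    """Probability that a majority of n independent elements (n odd) is correct,
    when each element is correct with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

def any_release(p_release, n_synapses):
    """Probability that at least one of n independent synapses releases transmitter."""
    return 1 - (1 - p_release) ** n_synapses

for n in (1, 3, 5, 9):
    print(f"majority of {n} elements at p=0.8: {majority_correct(0.8, n):.3f}")
for n in (1, 5, 10, 20):
    print(f"{n:2d} synapses at p=0.2: P(at least one release) = {any_release(0.2, n):.3f}")
```

Redundancy helps, but each gain costs a linear increase in components: under these assumptions three copies lift a 0.8-reliable element only to 0.896.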
AA: Yesterday, July 6, The New York Times had a piece titled, “Building One Big Brain,” by Robert Wright. Wright asks, “Could it be that, in some sense, the point of evolution — both the biological evolution that created an intelligent species and the technological evolution that a sufficiently intelligent species is bound to unleash — has been to create these social brains, and maybe even to weave them into a giant, loosely organized planetary brain?” We as humans have learned to “speak” the language of silicon – we know how to tap on the computer and dial a number and so on – and now we as engineers and scientists are challenged to endow silicon with the ability to deal with human-language technology. Also, we have a challenge in engineering to actually build robust human language systems. From a computer science and engineering perspective, the challenge here is about data-intensive computing. It’s all about data and how you extract information and process data in a timely manner. Any thoughts on this? Let’s start with Tom.
TA: Yes, I feel really strongly it’s a big deal. We’re all building up another dimension. Astronomers are opening up the time domain. With the Large Synoptic Survey Telescope, for example, which we hope to build in a few years’ time, we will collect 20 terabytes [of data] a night [and] do it in an interesting way. We will take images of half the sky every four days, and take exposures down to the [arc-]second level until you see everything that’s varying on the night sky image, whether it varies rapidly or over year-long timescales. Similarly, the numerical simulations we use to model astronomical objects routinely produce many TB of data, giving a full temporal evolution over many spatial scales while tracing an enormous number of variables capturing the temperature, density, chemical composition, radiation intensity, etc. With some of these enormous data sets, it is difficult to figure out the role of the human and how the human interacts with these data -- in a humane way [laughs], in an ergonomic way and in a meaningful way, in which we actually extract as much information as possible… And these issues likely will always be with us. Data rates will continue to go up as we get larger and larger detectors.
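The 20-terabytes-a-night figure is easier to appreciate as yearly volume and sustained bandwidth. The Python sketch below assumes, purely for illustration, decimal units, observing on every night of the year (an upper bound, since weather and maintenance reduce this) and a 10-hour observing night.

```python
TB = 10**12  # bytes in a decimal terabyte

def yearly_volume_pb(tb_per_night, nights_per_year):
    """Petabytes collected per year."""
    return tb_per_night * nights_per_year * TB / 10**15

def sustained_rate_mb_s(tb_per_night, observing_hours):
    """Average megabytes per second needed to keep up during the night."""
    return tb_per_night * TB / (observing_hours * 3600) / 10**6

print(f"{yearly_volume_pb(20, 365):.1f} PB per year")             # 7.3
print(f"{sustained_rate_mb_s(20, 10):.0f} MB/s while observing")  # 556
```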
WD: I think you touched on two really important trends in computing as a whole. One is human-computer interaction [HCI] and the social networking things that come along with that. And the other is the prevalence of massive data. And actually they are very closely tied together. Let me talk about HCI first. I think that as we build much more capable computers and we look at tasks that individuals do with them – not so much the big scientific tasks, but where there’s a particular thing in your life that you would like your computer to do for you – very often the bottleneck, because our computers are so capable today, is the interface between the person and the computer. And I think that that interface is evolving very rapidly to be one that we throw lots of computing cycles at to make it more convenient for the user, whether it’s a richer visual experience, or whether it’s doing good speech recognition, or whether it’s having new modalities of, for example, recognizing gestures of people and the like, to make the interaction with computers be more natural. Coupled very closely with this is the thing you’ve observed, which is that many fields are becoming characterized by truly massive data sets, very often measured in terabytes and petabytes of data.
There’s lots of crossover between these two. A great example of this is the current best natural language translation software, basically from written text to written text, [which] is Google Translate. It turns out that they were able to beat many hand-tuned grammar models by training simply with a very large corpus of data. By basically mining the information that was inherent in a massive data set they were able to build a best-of-its-kind human interface program. And there are all sorts of examples now where people are creating massive data sets and mining huge amounts of data. This is now sort of the standard approach in medical research. In fact, medical researchers no longer do that much hypothesis-driven research -- they now refer to that as “biased” research -- because you know what you’re looking for. Instead, they get massive amounts of data from some experimental procedure and feed it into their machine learning program to see what things they discover. And it’s really interesting, coupling this with the social-networking aspect of computing that is fusing the world into one giant machine, because as we get better human-to-computer interfaces, our computers become less a distant peripheral and more an extension of our own mind.
TS: This is a fascinating discussion, because all of these questions are ones that neuroscientists have been asking for decades. When you ask a person a question, they immediately know whether or not they even know the answer, which in a sense implies that they’ve done a complete memory search in a hundred milliseconds. Somehow nature has figured out how to index a huge memory store so that almost instantaneously you can come up with an approximation to an answer, and you also have a sense of how confident you are. There’s another issue that comes up with these very large data sets. I’m a director of the Science of Learning Center at UCSD, sponsored by the National Science Foundation, and we collectively share and analyze large data sets. As you pointed out, machine learning can be used to make discoveries, literally, that would not otherwise have been hypothesized.
But here’s a problem that we’re running into, having to do with access: who gets access to data under what conditions. As you probably know, science is a social enterprise, and who’s using whose data is often a very sensitive issue, as is who gets onto which papers that result from analyzing the data. There have been embarrassing cases where someone has put data up on the Internet and found that some other group had downloaded and analyzed the data and published before they did.
WD: I think going forward, in some sense, all computing is going to have to become more green, because it’s all energy-limited. So whether you’re in a very energy-constrained environment such as a mobile device or whether you’re simply trying to get as much performance as you can, it all boils down to performance per watt. And I think that, to achieve performance per watt, we’re going to see a couple of things in the future. The first is that much of the evolution of computing in the 1980s and 1990s is going to be reversed, because it was all about getting performance at any cost. The goal was single-thread performance, and most of that was achieved by techniques that are, in one form or another, speculation -- you try something, you’re not sure it’s correct, and if it’s not correct you throw that work away and re-do it. Fundamentally that’s very energy-inefficient, and modern CPUs are horribly inefficient for these reasons. They have lots of hardware that’s basically reordering instructions and attempting things before they know the right answer, and that hardware is going to go away, because it’s not efficient.
The more fundamental way that we’re going to get energy efficiency is by realizing that most energy consumed in computers today is not consumed doing operations but is consumed moving data around. So I think programmers are going to need to be aware of memory hierarchy at a much deeper level, and programs that they want to be efficient are going to have to exploit this.
I also think programmers’ view of locality will have to change, especially for HPC [high-performance computing]. In HPC today, people tend to think about what I call horizontal locality, where you worry about whether a particular piece of data is on your node or on some other node. To a first approximation, that doesn’t matter. What does matter is that when you pull something out of main memory, whether from your node or from a node at the other end of the machine, you localize it in a very local store on-chip and get significant re-use of it on-chip. Vertical locality is what matters: the movement of data up and down the hierarchy, and getting re-use once you move something down the hierarchy, not which node you happen to place the data on. And I think the evolution of programming to embrace locality is really going to be the thing that enables green computing.
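Vertical locality is what loop tiling (blocking) exploits. In the Python sketch below, the multiply is written over b-by-b tiles and counts the words copied between main memory and a local store, assuming, as a simplification, that three tiles fit in that store. Nothing in the count depends on which node holds the matrices; traffic falls as 2n² + 2n³/b because each element brought down the hierarchy is reused b times.

```python
def tiled_matmul(A, B, b):
    """Multiply square matrices A and B (lists of lists) using b x b tiles.

    Returns (C, words_moved), where words_moved counts words copied between
    'main memory' (A, B, C) and a local tile buffer, assuming three b x b
    tiles fit in the local store (an assumption of this model).
    """
    n = len(A)
    assert n % b == 0, "sketch assumes b divides n"
    C = [[0.0] * n for _ in range(n)]
    words = 0
    for ii in range(0, n, b):
        for jj in range(0, n, b):
            # Load the C tile once and keep it local while it accumulates.
            c_tile = [row[jj:jj + b] for row in C[ii:ii + b]]
            words += b * b
            for kk in range(0, n, b):
                a_tile = [row[kk:kk + b] for row in A[ii:ii + b]]
                b_tile = [row[jj:jj + b] for row in B[kk:kk + b]]
                words += 2 * b * b
                # All b**3 multiply-adds below reuse data already in the local store.
                for i in range(b):
                    for k in range(b):
                        aik = a_tile[i][k]
                        for j in range(b):
                            c_tile[i][j] += aik * b_tile[k][j]
            # Write the finished tile back to main memory.
            for i in range(b):
                C[ii + i][jj:jj + b] = c_tile[i]
            words += b * b
    return C, words

n = 8
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i - j) for j in range(n)] for i in range(n)]
for b in (1, 2, 4, 8):
    _, words = tiled_matmul(A, B, b)
    print(f"tile {b}: {words} words moved")  # equals 2*n**2 + 2*n**3 // b
```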
AA: How about you, Tom?
TA: What Bill just said resonated very much with me; I agree with essentially all his points. There’s another one that interests me, and that is about extreme computing in the sense of a community approach to certain applications. Let me just focus again on the example of the Large Synoptic Survey Telescope. What’s nice is that, to first order, it doesn’t use a hypothesis-driven approach. It just treats the sky as an image, tries to record it on all different sorts of time scales and makes the data public. This approach is sometimes very nice. Instead of small groups of people building smaller telescopes and each observing different patches of the sky, you get a larger community getting together behind one big experiment that does a lot of science at the same time. Similarly, there are already some good examples of large groups getting together to do big numerical simulations and extract a lot of science from the data they create this way, without having to re-run calculations at many places. So I think there’s actually a sociological aspect to one version of making computing a little bit greener: reorganizing duplicated efforts into productive and rewarding joint projects.
AA: Terry, is there something analogous for the brain?
TS: We’re in the middle of a revolution right now in the brain sciences and in many other sciences that deal with extraordinarily high-dimensional problems involving complex, non-linear interactions. We haven’t made as much progress as I would have liked based on the traditional approaches of analyzing low-dimensional models, which have been successful in physics and many other areas of science. A major shift has occurred in genomics, where complete genomes became available, and now this is occurring in other areas of biology. The change is the ability to collect complete data sets and then to ask questions with computational tools and get answers back quickly enough so that you don’t forget what the question was. We’re entering an era now where computers have finally become fast enough to allow us to make rapid progress over the next decade. Of course, everyone’s always optimistic about how much they can do in the short term, but I really think that in this case I can see it happening. So I think we’re on the way.
AA: Continuing on Terry’s earlier thread: in 2004, David Goldberg, a student in my lab, did some theoretical work to model information transmission in a spiking axon, the canonical structure in the brain through which neurons communicate with each other over long distances. What we discovered is that the maximum spike rate (in spikes per second) achieved by spiking neurons, consistent with experimental observations, suggests that neural communication may have evolved to maximize energy efficiency rather than information rate alone. In other words, it is all about bits per joule rather than bits per second.
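The distinction between maximizing bits per second and bits per joule can be illustrated with a toy model. The Python sketch below is hypothetical and is not the model from the 2004 study: it assumes an information rate that grows with diminishing returns in spike rate, and a power cost made of a fixed baseline plus a cost per spike, all in arbitrary units. Under those assumptions bits per second keeps rising with spike rate, while bits per joule peaks at a moderate rate.

```python
import math

def info_rate(rate, r0=10.0, scale=10.0):
    """Hypothetical information rate (bits/s) with diminishing returns in spike rate."""
    return scale * math.log2(1 + rate / r0)

def power(rate, baseline=1.0, cost_per_spike=0.05):
    """Hypothetical power: fixed baseline plus a cost per spike (arbitrary units)."""
    return baseline + cost_per_spike * rate

rates = [0.5 * i for i in range(1, 1001)]  # candidate spike rates 0.5 .. 500
best_for_bits_per_s = max(rates, key=info_rate)
best_for_bits_per_joule = max(rates, key=lambda r: info_rate(r) / power(r))
print(best_for_bits_per_s)      # 500.0, the top of the range
print(best_for_bits_per_joule)  # an interior optimum, about 26 in this toy model
```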
TS: Andreas, if I could summarize: in a sense, the next conference is going to be about this transition from bits per second, which has been the focus of optimization, to bits per second per watt, that is, bits per joule.
AA: I think that summarizes things very well. With that, I propose that we end our discussion.
TA: Sounds good, and we’ve got time in Norway to flesh this out.