Decoding the Universe at the New Center for Data-Driven Discovery

by Adam Hadhazy

Jia Liu, inaugural director of Kavli IPMU's new center, brings creativity and purpose to the looming challenge of big data

CD3 Opening Symposium (Credit: Kavli IPMU)

From scientists' handwritten notes of centuries past to the vast computerized databases of today, the volume of astrophysical observations has grown many thousandfold. But even this dramatic growth is a mere prelude to the coming big data deluge. Over the next decade or so, new observatories will haul in more observations than have been compiled in the entire history of astrophysics.

Ahead of this wave, the Kavli Institute for the Physics and Mathematics of the Universe (Kavli IPMU) at the University of Tokyo has launched the Center for Data-Driven Discovery (CD3). The new center, which opened its doors on April 1, 2023, is dedicated to leveraging advances in data science, as well as artificial intelligence and machine learning (AI/ML), to enable breakthroughs by Kavli IPMU researchers and colleagues for years to come.

"CD3 is built to meet our desperate need to cope with the large amount of incoming data from science projects," says Jia Liu, the inaugural director of CD3 and a Project Associate Professor at Kavli IPMU. "Our mission is to 'decode the universe,' using software algorithms to understand the fundamental physical laws of the universe."

CD3 already has around 70 members from across the many disciplines at Kavli IPMU. To illustrate the scale of the data challenge these members and the broader community face, Liu notes that "our astro group is used to dealing with millions of galaxies; soon we will have billions of them." She further points out that "the challenge is from not only the quantity, but also the quality of the data: higher resolution and higher precision."

Liu is singularly excited to take on the new challenge, just as she has embraced new challenges before in her professional life. Case in point: she came to astrophysics after having earned a master's degree and started a career in corporate consulting. Yet the fit was not there. "The longer I stayed in business, the more unsatisfied I became," she says. "So I started exploring other interests."

Driven by curiosity, Liu found herself drawn to the unbounded wonder of the cosmos. "Astrophysics was the most convincing [interest] because it points me to answers to the most distant history and biggest things that I can imagine," Liu says.

Liu and her colleagues in the field plan to plumb these cosmic depths as never before with an array of upcoming, mammoth data-generating projects. The list of projects that Kavli IPMU is involved in is extensive, ranging from ground-based observatories and instruments such as the Prime Focus Spectrograph, Hyper-Kamiokande, the Vera Rubin Observatory’s Legacy Survey of Space and Time, and Simons Observatory, to spaceborne instrumentation including LiteBIRD, Euclid, and the

Liu explains that her key job through CD3 is to "identify common data-related obstacles faced by different people and connect them to work as a team, be it a 15-minute discussion to fix a minor bug, or a yearlong project to tackle an interdisciplinary problem that requires everyone to overthrow their preconceptions."

As Liu explains, bringing AI/ML fully to bear will help in three key ways. One is simply doing the things astrophysicists already know how to do, but faster and less expensively. An example Liu points to is how researchers will increasingly be able to use so-called generative models to create thousands or even millions of possible universe images based on a handful of complex and very expensive simulations. Doing so will enable far more extensive comparison of real observations with theory-based simulations, helping to discern the rules and interactions that govern the universe.

AI/ML will also compensate for the limits of human intelligence, "or ignorance," as Liu jokes. "In the past, we liked to first design some statistics and then apply them to our data. But the data have become so complex, even the smartest statisticians can no longer design a perfect statistic to extract all the information." To assist, systems known as deep neural networks, inspired by the architecture and functionality of the human brain, are becoming ever more powerful tools for replacing human labor and exceeding our abilities in finding patterns in data.

A third key way that AI/ML can move astrophysics forward is by detecting outliers in data that might be hints of new physics. The grandest mysteries in astrophysics, such as the true nature of dark matter and dark energy and the precise expansion rate of the universe, could all ultimately be solved with new physics, and perhaps are solvable only with it. Naturally, new physics can be hard to pinpoint, and doing so has typically required many decades of diligent and at times frustrating effort. AI/ML might just be the ticket for breaking through to new and deeper levels of understanding. "Discoveries often start with discrepancies between experimental data and existing theories," says Liu. "We want to scan our data very carefully, with the help of AI/ML, and find hints of new physics."

In organizing CD3 to deliver on these promises, Liu credits the influence of her three-year-old daughter. "I observed that she is the most creative when she is in the free-playing mode, and definitely not when I lectured her to memorize that 11 comes after 10," says Liu. She has seen her daughter, for example, build an airplane with merely three wooden blocks and construct an entire amusement park with chairs and pillows.

To encourage this sort of blue-sky creativity when it comes to crunching scientific data, Liu has started a data playground for CD3, dubbed "Hack Friday." CD3 members sit together in an open space to code, experiment, and connect with each other. "Everyone is asked to bring a small challenge that is explorable on a time scale of an afternoon," Liu explains. The time interacting and exploring together is also fostering collaboration, though Liu jokes that one discovery is "that no single playlist can make everyone happy!"

When it comes to her own subfield within astrophysics, Liu looks forward to CD3 advancing the state of the art there as well. Liu uses computer simulations and observational data to study the universe on its largest scales, probing dark energy, dark matter, neutrinos, and inflation. As Liu explains to her daughter, "I generate many, many mini-universes on my computer to learn how things work in the world."

To uncover the hidden truths about how the universe operates, CD3 will help Liu's science team develop AI/ML solutions that make their codes faster and more efficient, as well as better at extracting information that existing statistical tools fail to detect.

Gearing up in this way via CD3 ensures that the upcoming data deluge will not inundate so much as fulfill.

"By the time all the data comes in," Liu says, "we will be ready!"