Dust storms test the limits of scientific computing

Many of the systems studied by scientists evolve through time and space, like diseases spreading through a population, weather fronts rolling across a continent, and solar flares churning the atmosphere. A paper, part of a special series on what's termed "spatial computing," argues that understanding these processes will necessarily require computing systems to have a structure that reflects the corresponding spatial and temporal limits. According to its authors, the spatial nature of the systems being studied is reflected in everything from data gathering to its processing and the visualization of the results. So, unless we engineer the computing systems to handle spatial issues, we're not going to have very good results.

It would be easy to dismiss spatial computing as a buzzword, except the authors provide concrete examples that demonstrate spatial issues do play a major role in scientific computing. Unfortunately, they also demonstrate that they play several roles, some of them largely unrelated to each other; as a result, the spatial tag ends up getting applied to several unrelated problems, which dilutes the message to an extent.

Still, each of the individual problems, which they illustrate with the example of forecasting dust storms, make for compelling descriptions of spatial issues in scientific computing.

Given our current knowledge of dust storms, there are over 100 variables that have to be recorded, like wind speed, relative humidity, temperature, etc. This data is both dynamic (it can change rapidly with time) and spatial, in that even neighboring areas may experience very different conditions. The data itself is recorded by a variety of instruments, with different locations and capabilities. Even within the US, these are run by different government agencies — just finding all of the available stations could be a challenge.

Integrating these readings into a single, coherent whole is a nightmare; response times can be anywhere from a second to hours, and the different capabilities of the stations require some extensive processing before the data can be used. To tackle these issues, the authors set up a system of local servers to process the data on-site, and created a central portal to provide rapid access to their data; in short, they eliminated the spatial component of data access while retaining the spatial nature of the data itself. The end result was a set of data from 200 stations that could be accessed within a second from anywhere in the US.

That data can then be used for dust storm forecasting, where other spatial issues come into play. When the authors started, their algorithm took a full day to perform a three-day forecast when using a grid with a 10km square resolution. Obviously, that's not especially useful, and the clear solution was to generate parallel code so that each grid square could be assigned to an individual core.

Still, the spatial nature of the problem limited the benefits of parallel code. Each grid square will influence its nearest neighbors, so the simulation runs into a lot of dependencies that require communication between the parallel processes. You can speed things up considerably if you get neighboring nodes onto individual cores of a single processor; balancing all of this requires that the spatial arrangement of the grid be considered when dividing up the tasks. But that quickly runs into limits both in terms of how many cores a node has.