Big Data in the Geosciences: 2

We are inundated with environmental data: Earth-observing satellites stream terabytes of data back to us daily; ground-based sensor networks track weather, water quality, and air pollution, taking readings every few minutes; and community scientists log hundreds, even thousands, of observations every day, recording everything from bird sightings to road closures and accidents. But this very richness of data has created a new set of problems.
This second post in our four-part series gives a high-level view of the challenges of portraying and communicating big data in the geosciences, and of how these challenges are being addressed. It is loosely based on the Earth and Space Science Informatics sessions and town halls at the AGU Fall Meeting in December 2016.

Data Visualization
One of the challenges facing geoscientists is simply how to wrangle meaning from big data and effectively communicate their findings to other interested scientists, communities, students, planners or policy-makers. Big data is challenging as it can have a large number of variables with complex, non-linear relationships among them. Scientists are turning to data visualization – which leverages the incredible pattern-recognition power of the human eye – to design graphics that effectively convey complex information.

One area of data visualization that has been extensively studied is the use of color to convey information. Human perception of color is non-linear, and can be resolved into three axes: hue, saturation, and lightness.

The range of colors perceived by humans is uneven. (Equiluminant colors from the NASA Ames Color Tool.) Credit: Robert Simmon’s blog, Subtleties of Color, Part 1
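
As a small illustration of the three axes, the sketch below uses Python's standard colorsys module to split a few fully saturated colors into hue, saturation, and lightness. The color list and the describe helper are our own illustrative choices, not part of any talk summarized here. Note that the "lightness" colorsys reports is a simple geometric quantity: all four colors score 0.5, even though yellow looks far brighter than blue to the eye, which is one way to see that perceived color is uneven.

```python
import colorsys

# Fully saturated colors as (R, G, B) triples in the 0-1 range (illustrative choices).
COLORS = {
    "red": (1.0, 0.0, 0.0),
    "yellow": (1.0, 1.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}


def describe(rgb):
    """Return (hue in degrees, saturation, lightness) for an RGB triple."""
    h, l, s = colorsys.rgb_to_hls(*rgb)  # colorsys returns H, L, S in that order
    return round(h * 360), s, l


for name, rgb in COLORS.items():
    hue, sat, light = describe(rgb)
    print(f"{name:>6}: hue={hue:3d} deg  saturation={sat:.2f}  lightness={light:.2f}")
```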

Kristen Thyng presented simple, powerful guidelines on how to use color in scientific visualizations so as to reduce cognitive load. Her suggestions included using perceptually linear color maps; using sequential, divergent, or categorical color ramps as appropriate; and using colors that viewers intuitively associate with the displayed feature, such as green for trees and gray for roads, or red for warmer temperatures and blue for cooler.
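
One rough way to check whether a color ramp is a candidate for a perceptually linear map is to see whether its perceived lightness rises steadily from one end to the other. The sketch below is our own illustration, not a method from the talk: it assumes colors are given as sRGB triples in the 0-1 range, approximates perceived lightness with the standard CIE L* formula, and compares a gray ramp with a classic rainbow ramp. The ramp definitions and function names are hypothetical. Steady lightness is only one ingredient of perceptual linearity; dedicated tools such as viscm examine more than this.

```python
def srgb_to_lightness(rgb):
    """Approximate CIE L* (0 = black, 100 = white) for an sRGB triple in 0-1."""
    def linearize(c):
        # Undo the sRGB transfer curve to get linear-light values.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # relative luminance
    if y > 216 / 24389:
        return 116 * y ** (1 / 3) - 16
    return (24389 / 27) * y


def is_lightness_monotonic(ramp):
    """True if perceived lightness strictly increases along the ramp."""
    values = [srgb_to_lightness(rgb) for rgb in ramp]
    return all(a < b for a, b in zip(values, values[1:]))


# Two hypothetical five-step ramps for comparison.
gray_ramp = [(v, v, v) for v in (0.0, 0.25, 0.5, 0.75, 1.0)]
rainbow_ramp = [
    (0.0, 0.0, 1.0),  # blue
    (0.0, 1.0, 1.0),  # cyan
    (0.0, 1.0, 0.0),  # green
    (1.0, 1.0, 0.0),  # yellow
    (1.0, 0.0, 0.0),  # red
]

for label, ramp in (("gray", gray_ramp), ("rainbow", rainbow_ramp)):
    lightness = [round(srgb_to_lightness(rgb), 1) for rgb in ramp]
    print(label, lightness, "monotonic:", is_lightness_monotonic(ramp))
```

The rainbow ramp fails the check because its lightness jumps up at cyan, dips at green, peaks at yellow, and falls again at red, so equal steps in the data do not look like equal steps on the map.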

In his talk, Dan Pisut stressed two things. First, scientists need to consider the entire visualization, which includes not only color maps but also legends, logos, and credits, along with their size, location, and color. Second, scientists must design scientific visualizations not just as a way of communicating information to other scientists, but also to serve different purposes and engage different audiences. For example, color maps and layouts for visualizations designed to intrigue and draw in a museum crowd will be different from those appropriate for engaging high school students or communicating critical information to stakeholders.

There are many online resources for the optimal use of color maps in scientific visualizations. For an excellent introduction to the use of color, see Robert Simmon’s blog on the subtleties of color. Perceptually linear color palettes for mapping sequential or divergent variables, as well as categorical variables, can be found at Cynthia Brewer’s ColorBrewer web page. Tools to develop custom perceptually linear color maps are available online for Python (viscm) and for R (ocean).