One way to gain insight into large data collections (nowadays known as ‘big data’) is to depict them visually and then explore the resulting visualizations interactively. However, both the number of data points, or measurements, and the number of dimensions describing each measurement can be very large, much as a data table can have many rows and columns. Visualizing such high-dimensional datasets is very challenging. One way to do this is to construct low-dimensional (two- or three-dimensional) depictions of the data and to look for patterns of interest in these depictions rather than in the original high-dimensional data. Techniques that do this, called projections, have several advantages: they are visually scalable, work well with noisy data, and are fast to compute. However, a major limitation is that the images they generate are hard for the average user to interpret.
In this thesis, we approach this problem from several angles: by showing where errors appear in a projection, and by explaining projections in terms of the original dimensions of the data, both locally and globally. Our proposed mechanisms are simple to learn, computationally scalable, and easy to add to any data exploration pipeline that uses any type of projection. We demonstrate and validate our proposals on several applications, using data from measurements, scientific simulations, software engineering, and networks.