
Principal Component Analysis

Superconductor Precursor Example


Principal component analysis of a group of images is done with the stack tool. Current algorithms limit the number of images to about 100. This example has 7 images.


Load the images

(detailed example with same image set)

Images are now loaded and appear small and all stacked behind the first one.

Images now look like this (after moving the Image Buttons window out of the way):


PCA on all 7 images:

For these small images, the stack initially appears at the upper left of the screen.

   The stack initially is not zoomed. Behind it is the 'stack plot window'.

 

 Initially, there is nothing in the plot window (top), and there is no mark in the stack image (bottom). The green line in the plot window indicates that the first slice (level) is being shown in the image.

Clicking on the image will show a plot of the intensity values down through the stack at the point clicked.

Clicking on the plot window will show the image for that level...
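The two click actions can be pictured as two ways of indexing the same 3-D array. The following is a minimal NumPy sketch, not MLx code: the array layout (slices, rows, columns), the synthetic data, and the helper names profile_at and slice_at are all assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in for the 7-image stack: shape (slices, rows, cols).
rng = np.random.default_rng(0)
stack = rng.random((7, 32, 32))

def profile_at(stack, row, col):
    """Intensity values down through the stack at one pixel
    (the curve drawn when the image is clicked)."""
    return stack[:, row, col]

def slice_at(stack, level):
    """The image for one level (what a click on the plot window selects)."""
    return stack[level]

profile = profile_at(stack, 10, 20)       # one value per slice
peak_level = int(np.argmax(profile))      # level where the profile peaks
image = slice_at(stack, peak_level)       # image shown for that level
```

Here profile plays the role of the curve in the plot window, and image is the slice that would be displayed after clicking the plot at its peak.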

 

 This is the appearance after clicking on the image window where the red circle is, and then clicking on the plot window at the peak - where the green line is.

Note that the title of the image changes accordingly, as levels (slices) are changed by clicking on the plot window.

The plot changes in real time, as the mouse is dragged over the image window.

   The component images are labeled Score0, Score1,... Here the first component (the one corresponding to the largest eigenvalue) is shown, along with a plot for a position close to that for the original data stack above.
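As a rough illustration of how score images can be obtained from a stack, the sketch below treats each pixel as an observation and each slice as a variable. It is not the MLx implementation: it assumes the slices are mean-centred and that the covariance matrix (rather than the correlation matrix) is decomposed, and the function name pca_scores and the synthetic data are hypothetical.

```python
import numpy as np

def pca_scores(stack):
    """Return (eigenvalues, P, score_stack) for a stack shaped (slices, rows, cols)."""
    n, h, w = stack.shape
    X = stack.reshape(n, h * w).T.astype(float)   # one row per pixel, one column per slice
    Xc = X - X.mean(axis=0)                       # centre each slice (assumption)
    cov = np.cov(Xc, rowvar=False)                # n x n covariance between slices
    eigvals, P = np.linalg.eigh(cov)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder: largest eigenvalue first
    eigvals, P = eigvals[order], P[:, order]
    scores = Xc @ P                               # rotate pixel data onto the components
    return eigvals, P, scores.T.reshape(n, h, w)  # Score0, Score1, ... as images

rng = np.random.default_rng(0)
stack = rng.random((7, 16, 16))                   # synthetic stand-in for the 7 images
eigvals, P, score_stack = pca_scores(stack)
```

With this convention score_stack[0] corresponds to Score0, and its variance equals the largest eigenvalue.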


Eigenvalues and 'P' Matrix

 
 The top row is the actual eigenvalue for each score image (in order), and the bottom is its percentage of the total. In other words, the first score image accounts for 64% of the variation in all of the image data.
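The percentage row is each eigenvalue divided by the sum of all eigenvalues. A small sketch of that arithmetic follows. The data are synthetic (not the superconductor images), so the numbers will not match the table, and the helper components_needed is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic pixel table: 500 pixels x 7 slices, built so a few components dominate.
base = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 1.0])
X = base @ rng.normal(size=(3, 7)) + 0.05 * rng.normal(size=(500, 7))

eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending order
percent = 100.0 * eigvals / eigvals.sum()                    # share of total variation
cumulative = np.cumsum(percent)                              # running total of the shares

def components_needed(cumulative, target):
    """Smallest number of leading components whose combined share reaches `target` percent."""
    idx = int(np.searchsorted(cumulative, target))
    return min(idx + 1, len(cumulative))

k = components_needed(cumulative, 99.0)
```

The cumulative sum is what supports statements of the form "the first k score images account for a given percentage of the variation".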

 
 S1 - S7 are the slices of the original stack. For example, Slice 2 (second image) and Score 0 are highly correlated, since the matrix value is 0.867.
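How a P matrix entry relates to an actual correlation coefficient depends on how the matrix is scaled, which this page does not state. The sketch below assumes that P holds unit-length eigenvectors of the covariance matrix. Under that assumption, the correlation between slice j and score k is P[j, k] * sqrt(eigenvalue k) / (standard deviation of slice j). The data are synthetic and the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 7)) @ rng.normal(size=(7, 7))  # synthetic: 400 pixels x 7 slices
Xc = X - X.mean(axis=0)

eigvals, P = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]                        # largest eigenvalue first
eigvals, P = eigvals[order], P[:, order]
scores = Xc @ P

def slice_score_correlation(j, k):
    """Pearson correlation between slice j and score image k, measured directly."""
    return np.corrcoef(Xc[:, j], scores[:, k])[0, 1]

# The same quantity predicted from the loadings (rows = slices, columns = scores).
predicted = P * np.sqrt(eigvals) / Xc.std(axis=0, ddof=1)[:, None]
```

If the PCA were instead run on the correlation matrix, every slice would have unit standard deviation and the division would drop out.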

 

 

 The flexible loading plot shows various rows or columns of the P matrix graphically.

    • Score - plot rows or columns, to correspond to the original data or the score images (here score is selected)
    • Names - plot the names of the images or the indices only (here, indices are plotted)
    • X and Y - which row (or column) to plot against which.

I have found this tool somewhat confusing, but PCA experts use loading plots routinely. (DSB)


Score images

To view the score images,

 

 Here are the score images after the tiling. They are in order.

Note from the eigenvalues that the first four account for 99% of the variation in the data set. Also note that the last score images appear to be mostly noise. They can be scaled more appropriately to examine the noise.


Reconstructing Data

Since the P matrix is a rotation matrix, all of the pixel information in the original data is present in the score images and can be reconstructed from them. The purpose of PCA is to reduce the dimensionality of the data set (the number of images that need to be analyzed or explored). It is therefore interesting to see how well the image data can be reconstructed from a subset of the score images.
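A reconstruction from the first k components multiplies the first k score images by the transpose of the matching columns of P. The sketch below illustrates this on synthetic data. It assumes a mean-centred covariance PCA, which may differ from what MLx does internally, and the function name reconstruct is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
stack = rng.random((7, 16, 16))                  # synthetic stand-in for the 7 images
n, h, w = stack.shape
X = stack.reshape(n, -1).T                       # pixels x slices
mean = X.mean(axis=0)
Xc = X - mean

eigvals, P = np.linalg.eigh(np.cov(Xc, rowvar=False))
P = P[:, np.argsort(eigvals)[::-1]]              # columns ordered by decreasing eigenvalue
scores = Xc @ P                                  # score images, one per column

def reconstruct(k):
    """Rebuild the stack from the first k score images."""
    Xk = scores[:, :k] @ P[:, :k].T + mean       # rotate back, then restore slice means
    return Xk.T.reshape(n, h, w)

# Root-mean-square reconstruction error for k = 1 .. 7 components.
errors = [np.sqrt(np.mean((reconstruct(k) - stack) ** 2)) for k in range(1, n + 1)]
```

Because P is orthogonal, using all seven components returns the original stack to within rounding error, and the error can only grow as components are dropped.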

Here are various reconstructions using the PCA -> Reconstruct Data menu. The reconstructed data is put into a stack. To display the images all in a row, the Slices -> Windows menu (in the Stack... button) and the Tile.. button were used as above.

 

   Original images.
   Image set reconstructed from all seven score images. This should be identical to the original.

 

These images look like the originals, but since the last three components, which are visually noisy, are not included, these images also have had that 'noise' removed.

 Image set reconstructed from the first four components (score images). Still visually identical to the original except for some loss of contrast in the second-to-last image.
   Reconstruction from the first three components only. Reconstruction errors easily visible.

 

This shows the 'danger' of reconstructing data from a reduced number of components when the missing components clearly carry visual information. The reconstructed images have detail, but it is not correct detail.

 Reconstruction from only one component. All 'reconstructed' data are either the 1st score image or the inverse (negative) of it.
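The one-component behaviour follows from the algebra: each reconstructed slice is the first score image multiplied by a single number, the loading of that slice on the first component. A positive loading gives the score image, and a negative loading gives its inverse. The sketch below checks this on synthetic data, assuming mean-centred covariance PCA and ignoring the constant mean offset of each slice.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 7)) @ rng.normal(size=(7, 7))  # synthetic: 300 pixels x 7 slices
Xc = X - X.mean(axis=0)

eigvals, P = np.linalg.eigh(np.cov(Xc, rowvar=False))
p0 = P[:, np.argmax(eigvals)]        # loadings of the first component, one per slice
score0 = Xc @ p0                     # first score image (flattened)

# Centred one-component reconstruction: column j is p0[j] * score0.
recon = np.outer(score0, p0)

# Correlation of every reconstructed slice with the first score image.
corr = np.array([np.corrcoef(recon[:, j], score0)[0, 1] for j in range(7)])
```

Every entry of corr is +1 or -1, with the sign given by the corresponding entry of p0.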