Tutorial: Quasi-Diagonalization of a Correlation Matrix

We first need to specify the default heat map colors that are going to be used. Thus, in the Preferences pull-down menu, select Color and graphics defaults:

Next, on the left side of the panel that appears, select the radiobutton at the bottom of the Cluster Heat Colors options:

Select the radiobutton for opening an Excel file that contains a class feature, with the number of classes and the class names in the first row of the spreadsheet:

This tutorial uses expression values for p=23 mRNAs which are strongly predictive of n=63 SRBCTs in 4 diagnostic classes (Ewing's sarcoma, neuroblastoma, Burkitt's lymphoma, and rhabdomyosarcoma).

In the file-open dialog window, locate the SRBCT4.xlsx file that is distributed with Explorer CE and double-click on it:

After the run has completed, you will notice the following icons in the treeview to the left. Click on the Heat map icon:

The heat map is shown below. Notice that the feature-by-feature matrix is now diagonally dominant and symmetric. However, this is not a correlation matrix; it is a matrix based on the Euclidean distance sqrt(0.5*(1-r)), where r is the correlation coefficient between two features. We generate a correlation matrix next (see below).
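To make the distance behind the cluster heat map concrete, here is a minimal sketch in plain Python that converts Pearson correlations into sqrt(0.5*(1-r)) distances. It is not Explorer CE's implementation; the function names and the three toy features are assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(r):
    """Distance sqrt(0.5*(1-r)): 0 when r = 1, and 1 when r = -1."""
    # max() guards against a tiny negative argument caused by rounding.
    return math.sqrt(max(0.0, 0.5 * (1.0 - r)))

def distance_matrix(features):
    """Symmetric feature-by-feature distance matrix."""
    p = len(features)
    return [[correlation_distance(pearson(features[i], features[j]))
             for j in range(p)] for i in range(p)]

# Toy data (hypothetical): each inner list is one feature measured on 5 samples.
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # feature A
    [2.0, 4.1, 5.9, 8.2, 9.9],   # feature B, rises with A
    [5.0, 4.0, 3.0, 2.0, 1.0],   # feature C, falls as A rises
]
D = distance_matrix(features)
for row in D:
    print(["%.3f" % d for d in row])
```

Strongly positively correlated features end up close together, and negatively correlated features end up far apart, which is why clustering on this distance pulls correlated features next to each other.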

In addition, the order of the features in memory has been rearranged to match the order shown in the cluster heat map.
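The reordering step can be sketched as follows: cluster the features on a distance matrix, read off the order of the leaves, and permute the feature list accordingly. This is only an illustration. The average-linkage rule, the function `leaf_order`, and the 4x4 toy matrix are assumptions; the source does not state which linkage Explorer CE uses.

```python
def leaf_order(D):
    """Naive average-linkage agglomerative clustering.

    Returns the feature indices in the order in which the merged
    clusters list them, which serves as a dendrogram leaf order.
    """
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between all members of the two clusters.
                total = sum(D[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]

# Hypothetical distance matrix: features 0 and 2 are close, as are 1 and 3.
D = [
    [0.00, 0.90, 0.10, 0.80],
    [0.90, 0.00, 0.85, 0.20],
    [0.10, 0.85, 0.00, 0.95],
    [0.80, 0.20, 0.95, 0.00],
]
names = ["g0", "g1", "g2", "g3"]

order = leaf_order(D)
reordered_names = [names[i] for i in order]
reordered_D = [[D[i][j] for j in order] for i in order]
print(reordered_names)
```

After the permutation, the small distances sit next to the diagonal of `reordered_D`, which is the quasi-diagonal pattern seen in the heat map.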

Since the order of the input features is now changed, we can straightforwardly run a correlation analysis and generate a correlation matrix as well as a "significance heat map" to reveal that the greatest correlation coefficients are near the diagonal of the correlation matrix.

From the Analysis pull-down menu, select Association, then Correlation, Covariance, or Euclidean distance:

Click on the Pearson correlation (html) icon, which shows the HTML version of the correlation matrix. Notice that a majority of the greatest and most significantly positive (r>0) coefficients are near the diagonal of the correlation matrix. (To zoom out from the HTML table, hold down Ctrl and use your mouse wheel.)
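One rough way to check the "near the diagonal" observation numerically is to average the coefficients at each offset from the diagonal and confirm that the averages fall as the offset grows. The sketch below does this for a small, made-up correlation matrix; the helper `mean_by_offset` and the values in `R` are hypothetical and are not output from Explorer CE.

```python
def mean_by_offset(R):
    """Mean coefficient on each superdiagonal of a square matrix.

    Element k-1 of the result is the average of R[i][i+k], that is,
    of all cells lying k steps away from the main diagonal.
    """
    p = len(R)
    return [sum(R[i][i + k] for i in range(p - k)) / (p - k)
            for k in range(1, p)]

# Hypothetical correlation matrix for 4 features after reordering.
R = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

means = mean_by_offset(R)
for k, m in enumerate(means, start=1):
    print("offset %d: mean r = %.3f" % (k, m))
```

For a quasi-diagonalized matrix, the first entries of `means` should be the largest.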

Next, click on the Pearson correlation icon (the 2nd icon) and you will see a heat map showing the correlation matrix with color-coding related to significance test results for each coefficient. Red-colored coefficient cells in the image represent positive correlation coefficients whose p-values are significant at the 0.01 level. This image is perhaps the best one for showing a diagonally dominant correlation matrix based on quasi-diagonalization, since the original cluster heat map (shown above, after hierarchical cluster analysis, or HCA) is based on Euclidean distance rather than correlation.
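The coloring rule for the red cells can be sketched as a test on each coefficient: the cell is red when r is positive and its two-sided p-value is below 0.01. The source does not say which test Explorer CE applies, so the example below assumes the Fisher z-transform approximation; the function names are hypothetical, and n=63 is the sample count used in this tutorial.

```python
import math

def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via the Fisher z-transform.

    Assumption: z = atanh(r) * sqrt(n - 3) is treated as standard normal.
    """
    if abs(r) >= 1.0:
        return 0.0  # atanh is undefined at +/-1; treat as maximally significant
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

def is_red(r, n, alpha=0.01):
    """True when r is positive and significant at level alpha."""
    return r > 0 and fisher_p_value(r, n) < alpha

n = 63  # number of samples in the SRBCT data set
for r in (0.5, 0.1, -0.5):
    print("r = %+.2f  p = %.4g  red = %s" % (r, fisher_p_value(r, n), is_red(r, n)))
```

Note that a strongly negative coefficient can be just as significant as a positive one, but under this rule it is not colored red, because red is reserved for positive correlations.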