Heterogeneous Expression Profile Analysis

The principle of HEPA analysis

Tumor-specific antigen genes are pivotal targets in the management of human cancers. By analyzing gene expression profiles for eight well-established tumor-specific antigen genes that are widely accepted as clinical targets, we observed that these prototype tumor-specific antigen genes usually exhibit distinctive heterogeneous expression profiles (Figure 1).

Figure 1. The heterogeneous gene expression profiles of eight prototype tumor-specific antigens widely adopted as clinical targets. Gene expression profiles were analyzed using publicly available Affymetrix U133 Plus 2.0 microarray datasets for 34 normal tissues from the Human Body Index dataset and for 28 cancer types (see Data Source). The expression values are median-centered, scaled to a median absolute deviation of 1, and then depicted on a grey color scale. Each of the antigens shows a typical heterogeneous expression pattern.
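The normalization named in the Figure 1 caption (median-centering followed by scaling to a median absolute deviation of 1) can be illustrated with a minimal sketch. The function name `median_mad_scale` and the sample values are hypothetical, and whether the scaling is applied per gene or per dataset is an assumption here; the caption does not specify it.

```python
from statistics import median

def median_mad_scale(values):
    """Center values at their median and scale them so the MAD becomes 1."""
    center = median(values)
    # Median absolute deviation: the median distance from the median.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the values cannot be scaled")
    return [(v - center) / mad for v in values]

# Hypothetical expression signals for one gene across five samples.
signals = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = median_mad_scale(signals)
print(scaled)  # [-2.0, -1.0, 0.0, 1.0, 97.0]
```

Because both the center and the scale are medians, the single outlying sample (100.0) does not distort the scaled values of the other samples, which is one plausible reason to prefer this normalization when outliers carry the signal of interest.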

The observed heterogeneity lays the foundation for discovering tumor-specific genes as immunological and clinical targets through gene expression profile analysis. To this end, we developed a novel method called Heterogeneous Expression Profile Analysis (HEPA), which incorporates the key expression features of clinically adopted tumor-specific antigen (TSA) genes (Figure 2).

Figure 2. The principle and algorithm of HEPA. A. The heterogeneous expression profile of MAGEA3, a canonical cancer-testis antigen gene, across a compendium of gene expression datasets from multiple tumor entities and a spectrum of normal tissues. Individual samples from normal or malignant tissues are sorted in descending order of gene expression signal to reveal the marked over-expression of MAGEA3 in a small subset of tumor or normal samples (outliers). B. The HEPA algorithm. Microarray gene expression data from j normal somatic tissue types and cancer type k are shown in the heat map. Data were processed as described in Methods, resulting in a final HEPA score that accentuates genes heterogeneously expressed in cancers. C. The rationale for using the adjusted upper quartile mean (Mean{P75~P95}) to highlight genes heterogeneously expressed in cancer. The upper plot shows the expression of Genes X and Y, which have the same average expression level in transitional cell carcinoma (TCC). The mean fails to discriminate Gene Y, which is distinctly over-expressed in a subset of samples. The middle plot shows that the 85th percentile fails to highlight Gene Y when it is over-expressed in less than 15% of TCC samples. The lower plot shows the expression of Genes X and Y with the same upper quartile mean (UQM). The UQM fails to discriminate Gene X, which has a biased maximum expression signal.
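The contrast drawn in panel C can be reproduced with a small sketch that compares the plain upper quartile mean with the adjusted upper quartile mean, Mean{P75~P95}. This is an illustration under stated assumptions rather than the HEPA implementation: the function names, the rank-based percentile convention, and the expression values are all hypothetical, and the full HEPA score described in Methods is not computed here.

```python
def percentile_band_mean(values, lower_pct=75, upper_pct=95):
    """Mean of the sorted values ranked between two percentiles.

    Assumption: percentiles are taken as simple rank cut-offs on the
    sorted samples; the original Methods may use another convention.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = n * lower_pct // 100   # first rank inside the band
    hi = n * upper_pct // 100   # first rank above the band (excluded)
    band = ordered[lo:hi]
    if not band:
        raise ValueError("too few samples for the requested percentile band")
    return sum(band) / len(band)

def upper_quartile_mean(values):
    """Plain UQM: mean of all values at or above the 75th-percentile rank."""
    ordered = sorted(values)
    top = ordered[len(ordered) * 75 // 100:]
    return sum(top) / len(top)

# Hypothetical signals across 20 tumor samples.
gene_spike = [0.0] * 19 + [100.0]       # one biased maximum signal
gene_subset = [0.0] * 18 + [10.0] * 2   # over-expressed in 10% of samples

print(upper_quartile_mean(gene_spike))    # 20.0
print(upper_quartile_mean(gene_subset))   # 4.0
print(percentile_band_mean(gene_spike))   # 0.0
print(percentile_band_mean(gene_subset))  # 2.5
```

In this toy example the plain UQM ranks the single-spike gene above the gene over-expressed in a subset of samples, whereas trimming the top 5% of ranks reverses the order. The 85th percentile of `gene_subset` would be 0.0, which mirrors the caption's point that a single percentile misses genes over-expressed in fewer than 15% of samples.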