Statistical analysis of gene expression in blood before diagnosis of breast cancer

Publication details

The analyses in this note are based on a dataset of gene expression in blood sampled before diagnosis of breast cancer. The dataset consists of case-control pairs matched on birth year and time of blood sampling, and the data for each pair is the log2 difference in gene expression between the case and the control. For each case-control pair, gene expression is measured once before diagnosis. Because the blood samples of the different case-control pairs were taken at different points in time before diagnosis, we have used the dataset to examine whether the gene expression profile varies with time. We have also used it to examine whether the gene expression profile differs between cases and controls, or between cases with and without spread (metastases), and to predict whether a case has breast cancer with or without spread. The dataset consists of two subdatasets, one where the cases participated in the screening program (the screening group) and one where the cases did not participate in the screening program (the clinical group). All analyses have been performed separately for these two subdatasets. We have used and adapted a method based on hypothesis testing by randomization that is able to identify small changes that vary slowly in time and/or among strata by using a large number of genes in each hypothesis test and predictor. Even though the signals in the data are weak, we concluded that the gene expression profile varies in time, between cases and controls, and between cases with and without spread (metastases). The dataset is quite small, with only 108 (30) case-control pairs with spread and 272 (57) without spread in the screening (clinical) group, distributed over an eight-year period before diagnosis. We therefore cannot draw any firm conclusion about whether the method predicts the metastasis status of the cases sufficiently well.
In the screening group we obtained a p-value of 0.5 for the entire period but 0.03 for the last year before diagnosis. For the clinical group the p-value for the entire period was 0.05. Here the results indicated the best prediction 3-4 years before diagnosis. The p-value equals 0.05 in this time period, but this may be due to the small dataset.
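The abstract does not specify the test statistic or the randomization scheme, so the following is only an illustrative sketch of how a randomization test on paired log2 differences might pool many genes into one hypothesis test. It assumes a sign-flip scheme (swapping case and control within a pair negates that pair's differences) and a simple global statistic (the sum over genes of the squared mean difference). The function names, the statistic, and the simulated data are all hypothetical and are not taken from the report.

```python
import random


def global_statistic(diffs):
    """Sum over genes of the squared mean log2 difference across pairs.

    diffs: list of pairs, each a list of per-gene log2 differences
    (case minus control). This statistic is an assumption for illustration.
    """
    n_pairs = len(diffs)
    n_genes = len(diffs[0])
    total = 0.0
    for g in range(n_genes):
        mean_g = sum(row[g] for row in diffs) / n_pairs
        total += mean_g ** 2
    return total


def sign_flip_p_value(diffs, n_perm=1000, seed=0):
    """Randomization p-value for 'no case-control difference'.

    Under the null hypothesis the case and control labels within a pair
    are exchangeable, so each pair's difference vector may be negated.
    """
    rng = random.Random(seed)
    observed = global_statistic(diffs)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1.0, 1.0))  # one sign per pair, all genes
            flipped.append([sign * x for x in row])
        if global_statistic(flipped) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)


def simulate_pairs(n_pairs, n_genes, shift, seed):
    """Synthetic log2 differences: Gaussian noise plus a common shift."""
    rng = random.Random(seed)
    return [[rng.gauss(shift, 1.0) for _ in range(n_genes)]
            for _ in range(n_pairs)]


if __name__ == "__main__":
    null_data = simulate_pairs(40, 50, 0.0, seed=1)
    shifted_data = simulate_pairs(40, 50, 0.5, seed=2)
    print("p-value, no shift:  ", sign_flip_p_value(null_data))
    print("p-value, with shift:", sign_flip_p_value(shifted_data))
```

Flipping one sign per pair, rather than per gene, preserves the correlation between genes within a pair, which is what allows a single test to combine a large number of genes. Testing for variation with time or between strata would need a different randomization, for example permuting the time-to-diagnosis labels among pairs.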