RNA Expression Analysis Using a 30 Base Pair Resolution Escherichia coli Genome Array

Douglas W. Selinger1, Kevin J. Cheung2, Rui Mei3, Eric M. Johansson3, Craig S. Richmond5, Frederick R. Blattner5, David J. Lockhart3,4, and George M. Church1*

1Department of Genetics, Harvard Medical School, 200 Longwood Ave, Boston, MA 02115, 2Harvard College, Cambridge, MA 02138, 3Affymetrix Inc., 3380 Central Expressway, Santa Clara, CA, 4Genomics Institute of the Novartis Research Foundation, 3115 Merryfield Row, San Diego, CA 92121, 5Laboratory of Genetics, University of Wisconsin, Madison, WI 53706. *Corresponding author (e-mail church@arep.med.harvard.edu).

High-density DNA microarrays allow the simultaneous quantitation of large numbers of transcripts. For smaller, completely sequenced organisms, every open reading frame (ORF) in the genome can be assayed. To date, all such analyses have focused strictly on ORFs, usually with a single assay per ORF, and have paid no attention to intergenic regions.

In this study, we describe the first use of a "genome" array, which has probes for both ORFs and intergenic regions in the sequenced model organism Escherichia coli. This array, synthesized by Affymetrix using a highly parallel light-directed in situ oligonucleotide synthesis method, contains almost 300,000 oligonucleotide probes of known sequence. This large number of oligos allows the genome to be sampled at an average resolution of 1 oligo probe every 30 bases. Intergenic regions are probed at a higher resolution, with 1 probe every 6 bases, compared with 1 every 60 bases for the ORFs. A "reverse complement" array has also been designed that allows the opposite strand to be probed in the same way.
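The relationship between the two per-region spacings and the overall average can be checked with a short calculation. The following is a minimal sketch, not the array's actual design procedure: the function name is hypothetical, and the coding fraction of 0.88 is an assumed round figure for E. coli, not a value stated here. Under that assumption, the average comes out at roughly 29 bases per probe, in line with the quoted figure of about 30.

```python
def average_spacing(coding_fraction, orf_spacing=60.0, intergenic_spacing=6.0):
    """Average bases per probe for a two-tier tiling of one strand.

    coding_fraction is the share of the genome covered by ORFs (an assumed
    input); the two spacings are the per-region values quoted in the text.
    """
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must lie between 0 and 1")
    # Probes per base is the length-weighted mean of the two probe densities.
    probes_per_base = (coding_fraction / orf_spacing
                       + (1.0 - coding_fraction) / intergenic_spacing)
    return 1.0 / probes_per_base


# ASSUMPTION: about 88% of the genome is coding sequence.
ASSUMED_CODING_FRACTION = 0.88

spacing = average_spacing(ASSUMED_CODING_FRACTION)
print(f"about one probe every {spacing:.1f} bases")
```

Because the average is a harmonic-style mean of the two spacings, the small intergenic fraction pulls it well below 60 even though most of the genome is tiled at the coarser ORF spacing.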

A genome array with dense lateral coverage of the genome has a number of uses, including identifying transcript starts and stops, studying operon structure, identifying small and antisense RNAs, and potentially acting as a whole-genome RNA secondary structure readout. These applications are demonstrated using a software package we have developed called Genome Array Processing Software, or 'GAPS', which allows oligo-by-oligo analysis of array results. We have carried out a comparison of E. coli cells growing in log phase vs. stationary phase and have shown that the assay can sensitively and accurately identify ORFs that are known to be growth-phase regulated, as well as identify new putatively growth-phase regulated genes. Complete coverage of both strands has also produced the perhaps surprising result that the vast majority of the genome is transcribed at a detectable level.
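To illustrate what a log phase vs. stationary phase comparison built on per-oligo signals might look like, here is a minimal sketch. It is not the GAPS implementation, whose internals are not described here. The function names, the median summary per ORF, the intensity floor, the twofold threshold, and the toy intensities are all assumptions chosen for illustration.

```python
import math
from statistics import median


def log2_ratios(log_phase, stationary, floor=1.0):
    """Return log2(stationary / log phase) for each ORF present in both inputs.

    Each input maps an ORF name to a list of per-probe intensities. Probe
    values are summarized by their median (an assumed choice), and medians
    are clamped to `floor` to avoid dividing by zero.
    """
    ratios = {}
    for orf in log_phase.keys() & stationary.keys():
        base = max(median(log_phase[orf]), floor)
        test = max(median(stationary[orf]), floor)
        ratios[orf] = math.log2(test / base)
    return ratios


def regulated(ratios, min_abs_log2=1.0):
    """Names of ORFs whose absolute log2 ratio meets the assumed threshold."""
    return sorted(orf for orf, r in ratios.items() if abs(r) >= min_abs_log2)


# Hypothetical toy intensities with placeholder ORF names; not measured data.
log_phase = {"orfA": [100.0, 120.0, 110.0], "orfB": [400.0, 380.0, 420.0]}
stationary = {"orfA": [450.0, 430.0, 440.0], "orfB": [390.0, 410.0, 400.0]}

ratios = log2_ratios(log_phase, stationary)
print(regulated(ratios))
```

Summarizing several probes per ORF with a median makes the ratio less sensitive to a single poorly hybridizing oligo, which is one practical benefit of having many probes per gene.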