An open-access long oligonucleotide microarray resource for analysis of the human and mouse transcriptomes
July 19, 2006
Kévin Le Brigand, Roslin Russell, Chimène Moreilhon, Jean-Marie Rouillard, Bernard Jost, Franck Amiot, Virginie Magnone, Christine Bole-Feysot, Philippe Rostagno, Virginie Virolle, Virginie Defamie, Philippe Dessen, Gary Williams, Paul Lyons, Géraldine Rios, Bernard Mari, Erdogan Gulari, Philippe Kastner, Xavier Gidrol, Tom C. Freeman and Pascal Barbry
Nucleic Acids Research
Two collections of oligonucleotides have been designed for preparing pangenomic human and mouse microarrays. A total of 148 993 and 121 703 oligonucleotides were designed against human and mouse transcripts, respectively. Quality scores were created in order to select 25 342 human and 24 109 mouse oligonucleotides. They correspond to: (i) a BLAST-specificity score; (ii) the number of expressed sequence tags matching each probe; (iii) the distance to the 3' end of the target mRNA. Scores were also used to compare in silico the two microarrays with commercial microarrays. The sets described here, called RNG/MRC collections, appear at least as specific and sensitive as those from the commercial platforms. The RNG/MRC collections have now been used by an Anglo-French consortium to distribute more than 3500 microarrays to the academic community. Ad hoc identification of tissue-specific transcripts and an 80% correlation with hybridizations performed on Affymetrix GeneChip™ suggest that the RNG/MRC microarrays perform well. This work provides a comprehensive open resource for investigators working on human and mouse transcriptomes, as well as a generic method to generate new microarray collections in other organisms. All information related to these probes, as well as additional information about commercial microarrays, has been stored in a freely accessible database called MEDIANTE.
Microarray technologies for expression profiling may be split into two broad categories: platforms that are based on in situ synthesis of oligonucleotide probes and those that are based on the deposition of preassembled DNA probes. The first class of array platforms is dominated by the commercial sector with a number of companies, e.g. Affymetrix (1), Nimblegen (2), Agilent (3), offering a range of off-the-shelf or custom arrays to their customers. Microarrays fabricated using preassembled probes have traditionally been favoured by many academic laboratories and are also available from a number of commercial sources, e.g. GE Healthcare's Codelink platform (4), Illumina's 'BeadChip' arrays (5). Primarily for reasons of flexibility and cost, many academic laboratories still favour the use of spotted arrays made in-house for their research.
For a number of years the fabrication of spotted microarrays largely relied on the attachment of gene fragments amplified from cDNA libraries (6). Whilst this approach clearly works and can provide useable tools for expression analysis, it suffers from several fundamental limitations: gene representation within cDNA libraries is incomplete; there is often a significant degree of redundancy within clone collections; annotation of clones can be flawed and cDNA libraries often come with legal restrictions on their distribution and use. Furthermore, the relatively large size of the cDNA amplicons can be associated with the presence of repeat sequences or homology to related genes, which can compromise the specificity of the probes in an unpredictable way (7). An alternative approach that addresses this issue involves the production of gene-specific DNA fragments by PCR amplification using specific primers (8–10). The existence of a significant fraction of genes for which a specific PCR amplicon cannot be designed or generated, as well as the high costs and technical difficulty of DNA production, makes this approach impractical for the fabrication of mammalian whole genome expression microarrays.
An alternative approach for probe synthesis for spotted microarray production has come through the use of long (50–70mers) oligonucleotides (11,12). A significant reduction in the cost of production of the synthetic oligonucleotides, an improvement of the quality control provided by the different suppliers and the ability to design one or several specific probes to any given target sequence have made the use of long oligonucleotides for the fabrication of microarrays a very attractive option. As a result, the last few years have seen a number of companies offering aliquots of oligonucleotide libraries for array fabrication. Transcript coverage has since increased alongside our knowledge of transcript diversity. However, these sets have been relatively expensive to purchase and the small aliquots provided can severely limit the utility of the resource. In addition, though less of an issue now, the design criteria and the sequence of the oligonucleotides often remained proprietary. Finally, the use of a diverse range of probe sets by different laboratories has made comparison of data between groups difficult (13–19).
In order to address the need for improved access and standardization of microarray resources within the academic biomedical research community, a programme to develop long-oligonucleotide resources for every human and mouse gene was created. Specifically, a collaboration was launched between the French Genopole Network (RNG), a consortium of French laboratories involved in functional genomics, and the Microarray Programme of the MRC Rosalind Franklin Centre for Genomics Research, which had a remit to provide spotted microarrays for human and mouse expression analysis to the UK academic community. The primary objective of the project was to develop an open-access probe resource that would support the fabrication of high-quality, cost-effective microarrays in UK and French academic laboratories. To ensure that probe design was open and dynamic, and that annotation of the resources was kept up to date and available to the wider community, the creation of ad hoc bioinformatics tools was also central to the project.
Here we describe the bioinformatic pipeline that has been used in the design of two pangenomic oligonucleotide collections for expression profiling of human and mouse systems. This includes in silico validation steps and benchmark comparisons with commercial human and mouse oligonucleotide probe collections, and the creation of an open-access database called MEDIANTE, which integrates information about the RNG/MRC, Affymetrix, Agilent and Illumina probe sets. Lastly, we present experimental validation data obtained after hybridizing distinct RNAs originating from human or mouse tissues on microarrays spotted with the RNG/MRC probe collections.
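As an illustration only, the following Python sketch shows how one probe per transcript might be chosen from a larger candidate pool using the three quality criteria named in the abstract (BLAST specificity, number of matching ESTs, distance to the 3' end of the target mRNA). The `Probe` fields, the score scale and the lexicographic ranking are assumptions made for this example; the text above does not state how the scores were actually weighted or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical record for one candidate oligonucleotide."""
    probe_id: str
    transcript_id: str
    blast_specificity: float  # assumed scale: higher means fewer cross-hybridizing hits
    est_matches: int          # number of ESTs matching the probe
    distance_to_3prime: int   # bases between the probe and the 3' end of the mRNA


def rank_key(probe):
    """Assumed ranking: specificity first, then EST support, then 3' proximity.

    Smaller tuples are better, so criteria to maximize are negated.
    """
    return (-probe.blast_specificity, -probe.est_matches, probe.distance_to_3prime)


def select_best_probes(candidates):
    """Return a dict mapping each transcript ID to its best-ranked probe."""
    best = {}
    for probe in candidates:
        current = best.get(probe.transcript_id)
        if current is None or rank_key(probe) < rank_key(current):
            best[probe.transcript_id] = probe
    return best
```

Calling `select_best_probes` on a list of candidates returns one `Probe` per transcript. A weighted composite score would be an equally plausible design; the lexicographic order is used here only because it keeps the example short.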