~ About AGR ~
This section briefly explains what AGR is, what it contains, and how you can
download the database for local use. A more complete description of what is in
AGR is available from the AGR help pages. These help pages are stored within
AGR itself, and they also provide other information that is not covered on this
page.
What's in AGR?
AGR contains several million pieces of data, most of which are cross linked
to each other. The majority of this data consists of the following:
- Sequences (DNA, RNA, and protein)
- Maps and markers (physical, genetic, and recombinant inbred maps)
- Clone information for nearly 150,000 clones
- Germplasm resources (details of over 10,000 stocks)
- Authors, published papers, and patent information
- BLAST homology information (millions of individual BLAST hits)
- Insert data - details of insert sequences and their genomic locations
- Images - pictures of plants and DNA gels
DNA sequence information is updated daily from the EMBL database, and protein
sequence information is updated regularly from the SwissProt and TREMBL
databases, to ensure that AGR contains all of the available Arabidopsis
sequences. BLAST information is updated with each new release of EMBL (every
3 months).
The ACEDB database system makes it easy for links to be generated
between these different types of data. Each major category of data is known as
a 'class' in ACEDB, and categories can also have subcategories. For example,
AGI_Genome_Sequence is a subcategory of the Sequence class and contains only
those sequences that have been associated with the Arabidopsis genome project.
Many of these classes are extremely large and, because there is so much data,
not always easy to browse. The advantage of this database structure, though, is
that the ACEDB query language can be used to extract very specific pieces of
information. For example, consider the following query:
Find AGI_Genome_Sequence; Chromosome = "IV"; Follow Locus; Map
The semi-colons separate the individual parts of this query. In plain English
it reads as "Find all Arabidopsis genome sequences that are known to be on
chromosome IV. Extract a list of all locus objects from these, and only list
those which are known to have a map position." This query returns a list of
about 150 locus objects, all of which have a map position and correspond to a
genome sequence from chromosome IV. This may not be what you are after, but
it demonstrates how very specific queries can be made.
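To make the step-by-step reading of the query concrete, the following Python
sketch splits the query at its semi-colons and then mimics each step on a tiny
in-memory data set. The sequence and locus names, the dictionary layout, and the
function names are all invented for this illustration. It is not how ACEDB
stores or evaluates data, only a rough model of what the query asks for.

```python
QUERY = 'Find AGI_Genome_Sequence; Chromosome = "IV"; Follow Locus; Map'


def split_query(query):
    """Split a query into its individual steps at the semi-colons."""
    return [step.strip() for step in query.split(";") if step.strip()]


# Hypothetical toy data: every name and field here is made up.
SEQUENCES = {
    "SEQ_A": {"Chromosome": "IV", "Locus": ["LOC_1", "LOC_2"]},
    "SEQ_B": {"Chromosome": "II", "Locus": ["LOC_3"]},
    "SEQ_C": {"Chromosome": "IV", "Locus": []},
}
LOCI = {
    "LOC_1": {"Map": "IV"},
    "LOC_2": {},  # a locus with no map position
    "LOC_3": {"Map": "II"},
}


def run_toy_query(sequences, loci):
    """Imitate the four steps of the example query on the toy data."""
    # Step 1 (Find AGI_Genome_Sequence): start with every genome sequence.
    found = dict(sequences)
    # Step 2 (Chromosome = "IV"): keep only sequences on chromosome IV.
    on_iv = {name: seq for name, seq in found.items()
             if seq.get("Chromosome") == "IV"}
    # Step 3 (Follow Locus): move from the sequences to their linked loci.
    followed = sorted({locus for seq in on_iv.values()
                       for locus in seq["Locus"]})
    # Step 4 (Map): keep only loci that have a map position.
    return [name for name in followed if "Map" in loci.get(name, {})]


print(split_query(QUERY))
print(run_toy_query(SEQUENCES, LOCI))
```

In this toy data set only one locus is linked to a chromosome IV sequence and
also has a map position, so only that locus survives all four steps.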
Download AGR
AGR is freely available for download
from our FTP site.
To use AGR locally, you will also need to download an ACEDB executable
from the Sanger Centre ACEDB site. ACEDB
can be installed on UNIX, Linux, Mac, and PC platforms. Once you have copied
and installed your ACEDB program, copy the files from the above FTP link to
a suitable location on your computer. You may wish to use a program such as
Mirror to do this.
At a minimum, you need to copy the files
that are in the /AGR/database folder on our FTP site. These are the raw
data files that comprise all of the information in AGR. However, you will probably
also want to copy the images in the /AGR/externalFiles folder. Furthermore, if
you want your copy of AGR to perfectly resemble the copy here, you will also need
to copy the files in the /AGR/wspec folder. We therefore suggest that you simply
copy the entire /AGR folder. Information on our FTP site is updated every night.
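As a rough illustration of the choices described above, the Python sketch below
works out which folders to copy and then fetches the files in each one with the
standard ftplib module. The host name passed to download_agr is a placeholder
(use the address of the FTP link above), the function names are invented for
this example, and the sketch assumes that the server allows anonymous login and
that each folder contains only plain files. For nested folders, a mirroring
program is likely the better choice.

```python
import os
from ftplib import FTP


def folders_to_copy(with_images=True, with_wspec=True):
    """Return the FTP folders to copy; /AGR/database is always required."""
    folders = ["/AGR/database"]
    if with_images:
        folders.append("/AGR/externalFiles")  # images
    if with_wspec:
        folders.append("/AGR/wspec")  # needed to match the hosted copy
    return folders


def download_folder(ftp, remote_dir, local_root):
    """Copy every file in one remote folder (assumed to hold files only)."""
    local_dir = os.path.join(local_root, remote_dir.strip("/"))
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    for entry in ftp.nlst():
        name = os.path.basename(entry)
        if name in ("", ".", ".."):
            continue
        with open(os.path.join(local_dir, name), "wb") as handle:
            ftp.retrbinary("RETR " + name, handle.write)


def download_agr(host, local_root, with_images=True, with_wspec=True):
    """Download the selected AGR folders from `host` into `local_root`."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login (an assumption about the server)
        for folder in folders_to_copy(with_images, with_wspec):
            download_folder(ftp, folder, local_root)


# Example call (placeholder host, not the real address):
# download_agr("ftp.example.org", "agr_copy")
```

Calling folders_to_copy(with_images=False, with_wspec=False) returns only
/AGR/database, which corresponds to the minimum set of files described above.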