RefSeq

The Reference Sequence (RefSeq) database is a non-redundant collection of richly annotated DNA, RNA, and protein sequences from diverse taxa. The collection includes sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq represents a single, naturally occurring molecule from one organism. The goal is to provide a comprehensive, standard dataset that represents the sequence information for a species. Note, however, that RefSeq is built exclusively from data in public archival databases.

RefSeq biological sequences (also known as RefSeqs) are derived from GenBank records but differ in that each RefSeq is a synthesis of information, not an archived unit of primary research data. Like a review article in the literature, a RefSeq represents the consolidation of information by a particular group at a particular time. RefSeqs are available without restriction and can be retrieved in several ways: by searching NCBI databases, including Nucleotide, Protein, Gene, and Map Viewer; by searching with a sequence via BLAST; by FTP download; or by following links from other NCBI resources, including Gene, Map Viewer, and PubMed.
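
As an illustration of programmatic retrieval, the sketch below builds a request for a single RefSeq record from the Nucleotide database. It assumes NCBI's E-utilities EFetch endpoint, which is not one of the retrieval routes listed above, and the accession NM_000546 is used only as an example. The function names build_efetch_url and fetch_refseq are hypothetical helpers introduced here, not part of any NCBI library.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NCBI E-utilities EFetch endpoint (not named in the text above).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an EFetch URL for one accession.

    db="nuccore" targets the Nucleotide database; db="protein" targets Protein.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_refseq(accession, db="nuccore", rettype="fasta"):
    """Download the record as text (performs a network request)."""
    url = build_efetch_url(accession, db, rettype)
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_refseq("NM_000546") to download the record.
    print(build_efetch_url("NM_000546"))
```

Keeping URL construction separate from the download makes the request easy to inspect before anything is sent over the network. For bulk retrieval, the FTP download mentioned above is likely the better fit.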

The Reference Sequence (RefSeq) Database - The NCBI Handbook - NCBI Bookshelf