Brian T Foley,
Los Alamos National Lab, Los Alamos, NM 87545 USA
Jeffrey Evans wrote:
“... Would you agree that dashes indicate a complete lack of information for a particular nucleotide position? If so, what is the source for the original alignments on which consensus genomes are built? ...”
No. The dashes do not indicate a complete lack of information. As stated in the text you quoted, the dashes indicate sites in the multiple sequence alignment, where gaps were inserted to maintain the alignment. Gaps need to be inserted to maintain the alignment whenever any one isolate of an organism (in this case HIVs and/or SIVs) has either an insertion or a deletion in its genome, relative to the other genomes in the alignment.
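To illustrate how a dash can end up in a consensus sequence without any data being missing, here is a minimal sketch. It assumes a simple majority rule per alignment column; the actual rule used to build the database consensus sequences is not described in this letter, and the toy sequences and the function name `consensus` are hypothetical.

```python
from collections import Counter

def consensus(aligned):
    """Majority-rule consensus of equal-length, already-aligned sequences.

    Assumption: the most frequent character in each column wins,
    and the gap character '-' is counted like any other character.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*aligned):
        # most_common(1) gives the single most frequent character in the column
        out.append(Counter(column).most_common(1)[0][0])
    return "".join(out)

# Hypothetical toy alignment: the second isolate carries a 3-base insertion,
# so gaps were inserted in the other two to keep the columns lined up.
toy = [
    "ACGT---ACGT",
    "ACGTTTAACGT",
    "ACGT---ACGA",
]

print(consensus(toy))
```

Every base of all three toy sequences is known here. The dashes in the consensus only record that most sequences have nothing at those columns, because one isolate has an insertion relative to the others.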
Jeffrey Evans wrote:
No. Individual researchers do not make consensus sequences for an entire group of viruses, such as the HIV-1 M group subtype B viruses. Each researcher can only sample and sequence a relatively small subset of the viruses circulating in the human population. They sequence the subset they have sampled, and submit those sequences to the GenBank or EMBL DNA sequence databases. From there, the HIV Sequence Databases staff retrieves all primate (human and simian) lentiviral sequences, organizes them and selects a representative sample for inclusion in the multiple sequence alignments, which are then used to build the consensus sequences to which you are referring.
Jeffrey Evans asks:
The answers to those questions depend upon which region of the genome is analyzed (the env gene is more variable than the pol gene, for example), and which group of viruses is being analyzed (for example, there is less diversity within the HIV-1 M group subtype B viruses than there is among all sequences from all subtypes of the HIV-1 M group).
Jeffrey Evans asks:
There would only be one “consensus genome” from a single patient. The degree of diversity between individual cloned sequences from a single patient depends on many factors, most notably the length of time the patient has been infected, because diversity accrues over time as the virus evolves. Typically, HIV-1 M group viruses are observed to evolve at a rate of close to 0.5% per year in the env gene, and close to 0.2% per year in the pol gene. Thus, two virus clones from a patient who has been infected for one year could be as much as 1% different from each other in the env gene, because each of the two could have evolved 0.5% away from the infecting virus. It is very common for a single patient to be followed over time to observe the degree and pattern of viral sequence evolution. It is also common for a simian research animal (such as a macaque) to be infected with a single infectious molecular clone and followed over time. Although there are dozens of such studies, I would not call them “typical”, because the majority of the tens of thousands of sequences in the HIV Databases are single sequences from an individual.
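The arithmetic behind the 1% figure can be written out as a short sketch. It assumes a simple linear model in which two lineages descend from one infecting virus and each accumulates changes independently at the quoted rate; it ignores selection, repeated changes at the same site, and variation in rate, so it is only an upper-bound illustration. The function name `max_pairwise_divergence` is hypothetical.

```python
# Approximate rates quoted in the text, as fractions per year
ENV_RATE = 0.005  # close to 0.5% per year in env
POL_RATE = 0.002  # close to 0.2% per year in pol

def max_pairwise_divergence(rate_per_year, years):
    """Upper bound on the divergence between two clones from one patient.

    Each clone can have moved rate * years away from the infecting virus,
    so the two clones can differ from each other by up to twice that amount.
    """
    return 2 * rate_per_year * years

for gene, rate in (("env", ENV_RATE), ("pol", POL_RATE)):
    for years in (1, 5, 10):
        bound = max_pairwise_divergence(rate, years)
        print(f"{gene}, {years:2d} year(s) after infection: up to {bound:.1%}")
```

For env after one year this gives the 1% mentioned above. The longer time spans are included only to show how the bound scales with time under this simple model.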
Jeffrey Evans asks:
Researchers discuss the variability of HIV in at least dozens of different contexts. But yes, any site in the genome that is 100% conserved among all isolates could not be said to be “variable”; it is conserved. Nomenclature is the attempt of humans to label the viruses such that they can be discussed accurately. The viruses are under no biological obligation to evolve in a way that is convenient for such human classification. Thus there are nomenclature conventions, but they are primarily for the convenience of humans, rather than primarily for accuracy in describing the evolution of the viruses, although this description of the evolution of the viruses is a major influence in the nomenclature process.
With regard to the most highly conserved regions of the lentiviral genome, such as the Lys tRNA primer binding site, Jeffrey Evans asks: “...Would it be accurate to describe these "less than 50 bp" sequences as absolutely consistent in all HIV and SIV genomes ever reported? If less than 100%, by how much? How many of these 50 bp or so sequences are there, and with which genes are they associated? ...”
The answer to that question could fill dozens of years of research time and whole books could be written. Every gene in the genome of lentiviruses has codons which encode the amino acids which are critical to the function of the encoded protein. These codons are conserved in all functional viruses, but not all genomes that have ever been sequenced are from functional viruses. The error rate of retroviral reverse transcriptases is very high, and this, coupled with a relative lack of proofreading compared to mammalian DNA replication machinery, leads to production of many non-functional genomes. On top of this, sequencing errors can lead to incorrectly reported bases. So we would not expect even the most highly conserved single base to be 100% conserved among all sequences if more than 10,000 or so different isolates were sequenced at that base. In all aspects of biology, there is an exception to nearly every rule.
There are many sites in the genomes of HIV-1 M group viruses that have been sequenced in more than 1,000 different isolates and found to be more than 99.99% identical in all isolates. For one example, Asp-185 and Asp-186 are part of the catalytic core of the reverse transcriptase.
Here is a small sample of the result:
Catalytic Asp (D) amino acids:                    **
HIV-1 M group subtype B clone HXB2R:          YQYMDDLYVG
HIV-1 M group Subtype G isolate 92NG083:      YQYMDDLYVG
SIV-Chimpanzee isolate CAM5:                  YQYMDDLYVG
To see an alignment of the reverse transcriptase from other organisms, see the PFAM (protein families) database entry.
Here is a small fragment of that result:
Catalytic Asp (D) amino acids:                    **
HIV-1 M group subtype B clone HXB2R:          YQYMDDLYVG
HIV-1 M group Subtype G isolate 92NG083:      YQYMDDLYVG
SIV-Chimpanzee isolate CAM5:                  YQYMDDLYVG
Equine Infectious Anemia Virus:               YQYMDDLFMG
Feline Immunodeficiency Virus Isolate Gp2:    YQYMDDIYIG
Puma Immunodeficiency Virus Isolate Snowy:    YQYMDDIYIG
Lion Immunodeficiency Virus Isolate SB10-5:   YQYMDDIYIG
Commelina yellow mottle virus (CoYMV):        AVYIDDILVF
Human T-Cell Leukemia Virus Type I:           LQYMDDILLA
Gibbon Ape Leukemia Virus:                    LQYVDDLLVA
Rous Sarcoma Virus:                           LHYMDDLLLA
Human Endogenous Retrovirus RTLV-Hp3:         IQYIDDLRLC
Human Endogenous Retrovirus K:                IHCIDDILCA
As you can see, the lentiviruses are more similar to each other than they are to other families of exogenous and endogenous retroviruses.
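Both observations can be checked directly on the ten-residue fragment shown above. The following is a minimal sketch, not a phylogenetic analysis: it uses only the sequences listed in this letter, treats the first seven entries as the lentiviruses, and the helper names `conserved_columns`, `identity` and `mean` are hypothetical.

```python
from itertools import combinations, product

# The ten-residue reverse transcriptase fragment as listed in the alignment above
FRAGMENT = {
    "HIV-1 M group subtype B clone HXB2R": "YQYMDDLYVG",
    "HIV-1 M group Subtype G isolate 92NG083": "YQYMDDLYVG",
    "SIV-Chimpanzee isolate CAM5": "YQYMDDLYVG",
    "Equine Infectious Anemia Virus": "YQYMDDLFMG",
    "Feline Immunodeficiency Virus Isolate Gp2": "YQYMDDIYIG",
    "Puma Immunodeficiency Virus Isolate Snowy": "YQYMDDIYIG",
    "Lion Immunodeficiency Virus Isolate SB10-5": "YQYMDDIYIG",
    "Commelina yellow mottle virus (CoYMV)": "AVYIDDILVF",
    "Human T-Cell Leukemia Virus Type I": "LQYMDDILLA",
    "Gibbon Ape Leukemia Virus": "LQYVDDLLVA",
    "Rous Sarcoma Virus": "LHYMDDLLLA",
    "Human Endogenous Retrovirus RTLV-Hp3": "IQYIDDLRLC",
    "Human Endogenous Retrovirus K": "IHCIDDILCA",
}

# Assumption for this sketch: the first seven entries are the lentiviruses
names = list(FRAGMENT)
lenti = [FRAGMENT[n] for n in names[:7]]
others = [FRAGMENT[n] for n in names[7:]]

def conserved_columns(seqs):
    """0-based indices of columns where every sequence has the same residue."""
    return [i for i, col in enumerate(zip(*seqs)) if len(set(col)) == 1]

def identity(a, b):
    """Fraction of aligned positions at which two sequences are identical."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

conserved = conserved_columns(list(FRAGMENT.values()))
within = mean(identity(a, b) for a, b in combinations(lenti, 2))
between = mean(identity(a, b) for a, b in product(lenti, others))

print("Fully conserved columns (0-based):", conserved)
print(f"Mean identity among lentiviruses:        {within:.2f}")
print(f"Mean identity, lentiviruses vs. others:  {between:.2f}")
```

In this fragment the only columns identical in all thirteen sequences are the two catalytic Asp positions marked with asterisks, and the mean identity among the lentiviruses is higher than between lentiviruses and the other entries. A ten-residue window is far too short to support conclusions about relationships on its own; it only makes the pattern visible.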
Competing interests: None declared