Re: Re: The Perth Group "answer" - Questions for Noble and/or Foley 9 September 2003
Brian T Foley,
HIV Researcher
Los Alamos National Lab, Los Alamos, NM 87545 USA


Jeffrey Evans wrote:
“... Would you agree that dashes indicate a complete lack of information for a particular nucleotide position? If so, what is the source for the original alignments on which consensus genomes are built? ...”

No. The dashes do not indicate a complete lack of information. As stated in the text you quoted, the dashes indicate sites in the multiple sequence alignment, where gaps were inserted to maintain the alignment. Gaps need to be inserted to maintain the alignment whenever any one isolate of an organism (in this case HIVs and/or SIVs) has either an insertion or a deletion in its genome, relative to the other genomes in the alignment.
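To illustrate why the gaps are needed, here is a minimal sketch in Python. The two short sequences and the function name column_identity are made up for this example; they are not taken from the HIV databases or from any alignment tool.

```python
def column_identity(seq_a, seq_b, gap="-"):
    """Fraction of compared columns in which both sequences carry the same base.

    Columns where either sequence has a gap are skipped, because a gap marks
    an insertion or deletion rather than an unknown base.
    """
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += (a == b)
    return matches / compared if compared else 0.0


# Hypothetical toy sequences: the second one lacks the two bases "CC".
full_length = "ATGCCGTA"
with_deletion = "ATGGTA"

# Without gaps, every base after the deletion lands in the wrong column.
unaligned = column_identity(full_length, with_deletion)

# With gaps inserted at the deletion, the remaining bases line up again.
aligned = column_identity(full_length, "ATG--GTA")

print(unaligned, aligned)  # 0.5 1.0
```

The dashes carry information: they record where one genome is shorter or longer than the others, so that every other column still compares like with like.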

Jeffrey Evans wrote:
“...Do individual researchers complete the process of determining a consensus genome prior to submission of the sequence to the HIV database? If so, is there a minimum requirement for the number of sequences used? ...”

No. Individual researchers do not make consensus sequences for an entire group of viruses, such as the HIV-1 M group subtype B viruses. Each researcher can only sample and sequence a relatively small subset of the viruses circulating in the human population. They sequence the subset they have sampled, and submit those sequences to the GenBank or EMBL DNA sequence databases. From there, the HIV Sequence Databases staff retrieves all primate (human and simian) lentiviral sequences, organizes them and selects a representative sample for inclusion in the multiple sequence alignments, which are then used to build the consensus sequences to which you are referring.

Jeffrey Evans asks:
“...In the example above, there is only one nucleotide position which is reported as less than 100% consistent, but more than 50%. How common is this occurrence in practice, as a percentage of the nucleotide positions in the string being reported? What percentage typically fails to satisfy the 50% cut-off? ...”

The answers to those questions depend upon which region of the genome is analyzed (the env gene is more variable than the pol gene, for example), and which group of viruses is being analyzed (for example, there is less diversity within the HIV-1 M group subtype B viruses than there is among all sequences from all subtypes of the HIV-1 M group).
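For any particular alignment, the percentages asked about can simply be counted. The following is a rough sketch in Python, not the code used by the HIV Sequence Databases. The toy alignment is invented, and the display convention (upper case for unanimous columns, lower case for a majority above the cut-off, "?" otherwise) is an assumption made for this illustration.

```python
from collections import Counter


def column_summary(alignment, cutoff=0.5):
    """Build a consensus string and tally how conserved each column is.

    A gap character is treated as just another state in the column.
    """
    consensus = []
    counts = {"unanimous": 0, "majority": 0, "below_cutoff": 0}
    for column in zip(*alignment):
        residue, n = Counter(column).most_common(1)[0]
        fraction = n / len(column)
        if fraction == 1.0:
            counts["unanimous"] += 1
            consensus.append(residue)
        elif fraction > cutoff:
            counts["majority"] += 1
            consensus.append(residue.lower())
        else:
            counts["below_cutoff"] += 1
            consensus.append("?")
    return "".join(consensus), counts


# Hypothetical toy alignment of four sequences, five columns each.
toy_alignment = ["ATGCA", "ATGTA", "ATGCC", "ATACG"]

consensus, counts = column_summary(toy_alignment)
print(consensus)  # ATgc?
print(counts)
```

Running the same count on an env alignment and on a pol alignment, or on one subtype versus the whole M group, would give different tallies, which is the point made above.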

Jeffrey Evans asks:
“...Assuming a certain number of sequences as the minimum basis for a consensus genome, what degree of homology is demonstrated among different consensus genomes derived from the same patient? Is this typically attempted? ...”

There would only be one “consensus genome” from a single patient. The degree of diversity between individual cloned sequences from a single patient depends on many factors, most notably the length of time the patient has been infected, because diversity accrues over time as the virus evolves. Typically, HIV-1 M group viruses are observed to evolve at a rate of close to 0.5% per year in the env gene, and close to 0.2% per year in the pol gene. Thus, two virus clones from a patient who has been infected for one year could be as much as 1% different from each other in the env gene, as each of the two could have evolved 0.5% from the infecting virus. It is very common for a single patient to be followed over time to observe the degree and pattern of viral sequence evolution. It is also common for a simian research animal (such as a macaque) to be infected with a single infectious molecular clone and followed over time. Although there are dozens of such studies, I would not call them “typical” because the majority of the tens of thousands of sequences in the HIV Databases are single sequences from an individual.
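The arithmetic behind the 1% figure can be written out as a small Python sketch. It assumes divergence accumulates linearly and that the two lineages change at different sites, so it is an upper bound rather than a prediction; the function name is chosen for this example only.

```python
def max_pairwise_divergence(rate_percent_per_year, years):
    """Upper bound on the difference between two clones from one founder virus.

    Each lineage can move up to rate * years away from the infecting virus,
    so two lineages can differ from each other by up to twice that amount.
    """
    return 2 * rate_percent_per_year * years


ENV_RATE = 0.5  # percent per year, as quoted above for env
POL_RATE = 0.2  # percent per year, as quoted above for pol

for years in (1, 5):
    env = max_pairwise_divergence(ENV_RATE, years)
    pol = max_pairwise_divergence(POL_RATE, years)
    print(f"{years} yr: env up to {env:.1f}%, pol up to {pol:.1f}%")
```

For one year of infection this gives up to 1.0% in env and up to 0.4% in pol.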

Jeffrey Evans asks:
“...It appears that when researchers discuss the variability of HIV, they may be thinking of "less than 100% consistency at a certain nucleotide position", "less than 50%", some range of variation among consensus genomes (either between or within patients), or any number of observations of mutation or evolution over time. Are there any nomenclature conventions which insure a commonality of context? ...”

Researchers discuss the variability of HIV in at least dozens of different contexts. But yes, any site in the genome that is 100% conserved among all isolates could not be said to be “variable”; it is conserved. Nomenclature is the attempt of humans to label the viruses such that they can be discussed accurately. The viruses are under no biological obligation to evolve in a way that is convenient for such human classification. Thus there are nomenclature conventions, but they are primarily for the convenience of humans, rather than primarily for accuracy in describing the evolution of the viruses, although this description of the evolution of the viruses is a major influence in the nomenclature process.

With regard to the most highly conserved regions of the lentiviral genome, such as the Lys tRNA primer binding site, Jeffrey Evans asks: “...Would it be accurate to describe these "less than 50 bp" sequences as absolutely consistent in all HIV and SIV genomes ever reported? If less than 100%, by how much? How many of these 50 bp or so sequences are there, and with which genes are they associated? ...”

The answer to that question could fill dozens of years of research time and whole books could be written. Every gene in the genome of lentiviruses has codons which encode the amino acids which are critical to the function of the encoded protein. These codons are conserved in all functional viruses, but not all genomes that have ever been sequenced are from functional viruses. The error rate of retroviral reverse transcriptases is very high, and this, coupled with a relative lack of proofreading compared to mammalian DNA replication machinery, leads to production of many non-functional genomes. On top of this, sequencing errors can lead to incorrectly reported bases. So we would not expect even the most highly conserved single base to be 100% conserved among all sequences if more than 10,000 or so different isolates were sequenced at that base. In all aspects of biology, there is an exception to nearly every rule.
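The reasoning about 10,000 isolates can be sketched numerically in Python. The per-sequence probability used here (1 in 10,000 that a given base is reported differently, whether from a defective genome or a sequencing error) is a hypothetical value chosen only to illustrate the point, and the calculation assumes the errors are independent.

```python
def prob_any_mismatch(per_sequence_error, n_sequences):
    """Probability that at least one of n sequences differs at a given site,
    assuming independent errors with the same probability per sequence."""
    return 1.0 - (1.0 - per_sequence_error) ** n_sequences


# Hypothetical value, chosen only for illustration.
ASSUMED_ERROR = 1e-4

for n in (100, 1_000, 10_000):
    print(n, round(prob_any_mismatch(ASSUMED_ERROR, n), 3))
```

Under that assumption, a site that is functionally invariant would still more likely than not show at least one discordant base once about 10,000 sequences had been collected.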

There are many sites in the genomes of HIV-1 M group viruses that have been sequenced in more than 1,000 different isolates and found to be more than 99.99% identical in all isolates. For example, Asp-185 and Asp-186 are part of the catalytic core of the reverse transcriptase enzyme:
Rodgers DW, Gamblin SJ, Harris BA, Ray S, Culp JS, Hellmig B, Woolf DJ, Debouck C, Harrison SC.
The structure of unliganded reverse transcriptase from the human immunodeficiency virus type 1.
Proc Natl Acad Sci U S A. 1995 Feb 14;92(4):1222-6.
PMID: 7532306
Thus, no retrovirus can replicate without these aspartic acid residues in its reverse transcriptase protein. The codons for these amino acids are therefore conserved not only in the lentiviruses, but also in all other retroviruses. The conservation of this region of the Pol protein can be ascertained by querying the HIV databases using our epitope alignment tool with the M group consensus sequence “YQYMDDLYVG”.

Here is a small sample of the result:

Catalytic Asp (D) amino acids:                **
HIV-1 M group subtype B clone HXB2R:      YQYMDDLYVG 
HIV-1 M group Subtype G isolate 92NG083:  YQYMDDLYVG 
SIV-Chimpanzee isolate CAM5:              YQYMDDLYVG
To see an alignment of the reverse transcriptase from other organisms, see the PFAM (protein families) database entry.

Here is a small fragment of that result:

Catalytic Asp (D) amino acids:                  **               
HIV-1 M group subtype B clone HXB2R:        YQYMDDLYVG 
HIV-1 M group Subtype G isolate 92NG083:    YQYMDDLYVG 
SIV-Chimpanzee isolate CAM5:                YQYMDDLYVG
Equine Infectious Anemia Virus:             YQYMDDLFMG
Feline Immunodeficiency Virus Isolate Gp2:  YQYMDDIYIG
Puma Immunodeficiency Virus Isolate Snowy:  YQYMDDIYIG  
Lion Immunodeficiency Virus Isolate SB10-5: YQYMDDIYIG         
Commelina yellow mottle virus (CoYMV):      AVYIDDILVF
Human T-Cell Leukemia Virus Type I:         LQYMDDILLA
Gibbon Ape Leukemia Virus:                  LQYVDDLLVA
Rous Sarcoma Virus:                         LHYMDDLLLA
Human Endogenous Retrovirus RTLV-Hp3:       IQYIDDLRLC
Human Endogenous Retrovirus K:              IHCIDDILCA

As you can see, the lentiviruses are more similar to each other than they are to other families of exogenous and endogenous retroviruses.
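The same comparison can be made by counting. This short Python sketch uses only the ten-residue fragments listed above; the shortened labels and the grouping into lentiviruses and other sequences are my own annotation for this example, and ten residues is of course far too short for any real phylogenetic conclusion.

```python
# Fragments copied from the alignment above (labels shortened).
LENTIVIRUSES = {
    "HIV-1 B HXB2R": "YQYMDDLYVG",
    "HIV-1 G 92NG083": "YQYMDDLYVG",
    "SIVcpz CAM5": "YQYMDDLYVG",
    "EIAV": "YQYMDDLFMG",
    "FIV Gp2": "YQYMDDIYIG",
    "Puma IV Snowy": "YQYMDDIYIG",
    "Lion IV SB10-5": "YQYMDDIYIG",
}
OTHERS = {
    "CoYMV": "AVYIDDILVF",
    "HTLV-I": "LQYMDDILLA",
    "GaLV": "LQYVDDLLVA",
    "RSV": "LHYMDDLLLA",
    "HERV RTLV-Hp3": "IQYIDDLRLC",
    "HERV-K": "IHCIDDILCA",
}


def identity(a, b):
    """Fraction of positions at which two equal-length fragments agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


reference = LENTIVIRUSES["HIV-1 B HXB2R"]
lenti_scores = {n: identity(reference, s) for n, s in LENTIVIRUSES.items()}
other_scores = {n: identity(reference, s) for n, s in OTHERS.items()}

# Columns (0-based) where every listed sequence has the same residue.
all_fragments = list(LENTIVIRUSES.values()) + list(OTHERS.values())
conserved_columns = [
    i for i, col in enumerate(zip(*all_fragments)) if len(set(col)) == 1
]

for name, score in {**lenti_scores, **other_scores}.items():
    print(f"{name:16s} {score:.0%}")
print("Conserved in all:", conserved_columns)
```

Every lentiviral fragment matches the HXB2R fragment at 8 or more of the 10 positions, none of the other fragments matches at more than 6, and the only two columns identical in every sequence are the two catalytic Asp (D) residues.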

Competing interests:   None declared