We base our primary analysis on a set of viruses collected between Jan 2015 and Feb 2017. We analyze mutation frequency trajectories in different geographic regions using all available sequence data for each region. Our phylogenetic analysis is based on a subsample of approximately 100 viruses per month where available, chosen to equilibrate sample counts geographically where possible. This equilibration attempts to collect equal numbers of samples from Africa, China, Europe, Japan/South Korea, North America, Oceania, South America, South Asia, Southeast Asia and West Asia. In the following analysis we collapse samples from China, South Asia, Southeast Asia, Japan and Korea into a single region referred to here as "Asia", resulting in Asia possessing greater sample counts than North America or Europe. There is reasonably broad geographic sampling throughout, except in Dec 2016 and Jan 2017, which have more samples from Europe and North America.
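The subsampling described above can be illustrated with a minimal sketch. It is not the pipeline used for this report: the function name `subsample`, the record fields (`name`, `month`, `region`) and the round-robin strategy are all assumptions made for illustration. The idea is to draw one virus per region in turn until the monthly target is reached, so that sparsely sampled regions are used up first and heavily sampled regions fill the remainder.

```python
import random
from collections import defaultdict

def subsample(viruses, per_month=100, seed=0):
    """Pick up to `per_month` viruses per month, balancing regions.

    `viruses` is a list of dicts with hypothetical keys 'name',
    'month' (e.g. '2016-12') and 'region'.
    """
    rng = random.Random(seed)
    # Group records by month, then by region.
    by_month = defaultdict(lambda: defaultdict(list))
    for v in viruses:
        by_month[v["month"]][v["region"]].append(v)

    picked = []
    for month in sorted(by_month):
        pools = by_month[month]
        regions = sorted(pools)
        for region in regions:
            rng.shuffle(pools[region])  # random order within a region
        month_pick = []
        # Round-robin over regions until the target is met or pools are empty.
        while len(month_pick) < per_month and any(pools[r] for r in regions):
            for region in regions:
                if pools[region] and len(month_pick) < per_month:
                    month_pick.append(pools[region].pop())
        picked.extend(month_pick)
    return picked
```

With this scheme a month in which one region contributes far more sequences than the others still yields a roughly balanced subsample, and a month with fewer than `per_month` sequences in total is simply returned in full.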
Of the clades that emerged from the Texas/2012 background in early 2014, only 3c2.a currently remains at globally high prevalence. 3c3.b viruses have scarcely been seen since the end of 2015. The uptick of 3c3.a viruses in mid-2016 in the US has failed to carry forward to the 2016-2017 US winter season. Oceania still harbors 3c3.a viruses, but we fully expect these to be replaced by another clade in the coming year as Oceania is reseeded from the global reservoir.
Within clade 3c2.a, a number of subclades have emerged. Notable clades include N171K (clade 3c2a.1), N171K/N121K, N121K/S144K, 131K/142K and R142K/Q197K. Clades with R142K, N121K/R144K, Q197K arose in late 2014 or early 2015 but have not come to dominate. However, the 171K clade (3c2a.1) that was first observed in mid-2015 has continued to increase in frequency at a moderate rate and is now at a global frequency of approximately 60%. Within the 171K clade, a subclade bearing the 121K mutation emerged at the beginning of 2016 and has steadily increased in frequency, now making up the majority of the parent 171K clade.
Currently, the only noteworthy sister clade of 171K is the clade with mutations T131K/R142K. This clade was first observed in mid-2016 in China and has dominated in China during the last 3 months according to sequences available in GISAID, making up ~80% of Chinese isolates since October. It has been rising rapidly and has spread to Europe and North America. In the branch leading up to this clade, we observe a rapid succession of mutations in codon 131 from threonine to asparagine and then to lysine, together with the mutation R142K. Rapid succession of multiple mutations in the same codon suggests that these changes are adaptive. Interestingly, we observe a large number of mutations to positively charged residues and particularly many parallel mutations to lysine at positions 121, 131, 142, 144, 171, and 197. Clades N171K/N121K and T131K/R142K are on opposing genetic backgrounds so that there is competition between these clades; both cannot succeed.
Despite the large genetic diversity at sites in the vicinity of the receptor binding site, HI or FRA assays show no evidence of substantial antigenic change. HI assays by the WHO collaborating center at the Crick Institute in London indicate a moderate two-fold titer drop relative to A/HongKong/4801/2014 for several subclades, including the 171K subclade. However, there is substantial variability in those data. More dense and recent data by the CDC in Atlanta, Georgia, do not support a homogeneous titer drop of the 171K clade relative to the current vaccine strain Hong Kong/4801/2014. The figure below shows trees colored by predicted titer drop relative to Hong Kong/4801/2014 using the tree model (Neher et al. 2016) trained on HI data from Crick and CDC as well as FRA data from CDC. Taken together, these data provide little evidence for substantial antigenic evolution within 3c2.a.
The local branching index (LBI), a phylogenetic indicator of clade growth, corroborates the observations made on the basis of clade frequencies. Over the last 6 months, the 131K clade has the highest LBI, while the 171K clade has the highest LBI when considering sequences from all of 2016. Interpretation of sequence-based indices such as the number of epitope or non-epitope mutations is difficult due to the extensive genetic heterogeneity. The 131K clade has accumulated the same number of epitope mutations as 171K but has fewer non-epitope mutations relative to A/Texas/50/2012.
Given current patterns of clade growth and decline, we predict clades 171K/121K and 131K/142K to be the most successful of currently circulating clades. We expect both to increase in the coming months. Longer-term projections are difficult. 171K/121K may continue to dominate, 131K/142K may displace 171K/121K, or another mutation may appear that determines the eventual outcome.
Very few H1N1pdm viruses have been observed in recent months. The dominant clade continues to be 6b.1 and there is little amino acid sequence variation within HA. The only notable subclade that has been growing recently is the clade bearing HA1:R205K/S183P. This clade is dominated by North American viruses and we see no evidence that this clade has a particular competitive advantage.
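For readers unfamiliar with the LBI, the following is a minimal sketch of one way to compute it, not necessarily the implementation behind this report. It assumes the usual definition of the LBI as the total tree length surrounding a node, discounted exponentially with distance on a scale `tau`. The `Node` class, the function name and the toy tree are hypothetical, and the sketch omits any normalization or restriction to recent samples that a production pipeline might apply.

```python
import math

class Node:
    """Minimal tree node; branch_length is the branch leading to this node."""
    def __init__(self, name, branch_length=0.0, children=None):
        self.name = name
        self.branch_length = branch_length
        self.children = children or []

def local_branching_index(root, tau):
    """Return {node name: LBI} using two message-passing sweeps."""
    up, down = {}, {}  # discounted tree length seen from below / above

    def discount(node):
        decay = math.exp(-node.branch_length / tau)
        # Second value: discounted length of the node's own branch.
        return decay, tau * (1.0 - decay)

    def pass_up(node):
        below = 0.0
        for child in node.children:
            pass_up(child)
            below += up[child]
        decay, own = discount(node)
        up[node] = own + decay * below  # message sent to the parent

    def pass_down(node):
        for child in node.children:
            siblings = sum(up[s] for s in node.children if s is not child)
            decay, own = discount(child)
            down[child] = own + decay * (down[node] + siblings)
            pass_down(child)

    pass_up(root)
    down[root] = 0.0  # nothing lies above the root
    pass_down(root)
    return {n.name: down[n] + sum(up[c] for c in n.children) for n in up}

# Toy tree: a rapidly branching clade "A" next to a single long branch "B".
tips = [Node("a%d" % i, 0.1) for i in range(1, 5)]
tree = Node("root", 0.0, [Node("A", 0.1, tips), Node("B", 1.0)])
lbi = local_branching_index(tree, tau=0.5)
```

In the toy tree the bushy clade `A` receives a higher index than the isolated tip `B`, which is the sense in which a high LBI flags a clade that has been branching rapidly in the recent past.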
As above, we base our primary analysis on a set of viruses collected between Jan 2015 and Feb 2017, comprising approximately 100 viruses per month where available and seeking to equilibrate sample counts geographically where possible. However, in the case of H1N1pdm we were not able to collect a full 100 samples per month from the database in many months due to low absolute prevalence.
There has been continued dominance of clade 6b.1 (characterized by mutation 162N) over clade 6b.2 (characterized by mutation 152T).
Within clade 6b.1 there is very little amino acid sequence variation. The only mutations of any size are I166V, S183P and R205K. Interestingly, these three amino acid mutations occur in rapid succession along a single lineage, resulting in a series of nested clades.
Mutations 183P and 205K have been rising in Europe and North America starting in early to mid 2016, but this rise corresponds to a small number of cases and might turn out to be insignificant during the next H1N1pdm-dominated season. The pace of increase is only moderate, suggesting a lack of strong selective pressure. Additionally, no consistent antigenic variation is observed among recently circulating H1N1pdm viruses (according to primary infection in naive animals).
Given the lack of competition, we predict the continued dominance of 6b.1 viruses.
Clade 1A has continued to dominate and mutation 117V has all but taken over the global population. The rise of this mutation was fairly gradual and we have no evidence that it is associated with antigenic change or other benefit to the virus.
As above, we base our primary analysis on a set of viruses collected between Jan 2015 and Feb 2017, comprising approximately 100 viruses per month where available and seeking to equilibrate sample counts geographically where possible.
We observe the continued dominance of the clade bearing mutations N129D, V146I and I117V. There is very little HA amino acid variation within the 117V viruses, with no subclades rising to appreciable frequency. There is a deletion at amino acid sites 162/163 that first appeared in mid-2016 but remains at low global frequency. The frequency graph shows the frequency of '-' at position 162 as a proxy for sites 162 and 163.
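Tracking a gap character at one alignment column, as done for the 162/163 deletion, can be sketched as follows. This is an illustrative sketch only: the function name `state_frequency` and the toy four-residue sequences are hypothetical, and it reports raw per-month fractions, whereas the frequency graphs in this report may be based on smoothed estimates.

```python
from collections import Counter

def state_frequency(samples, position, state="-"):
    """Fraction of sequences per month carrying `state` at `position`.

    `samples` is an iterable of (month, aligned_sequence) pairs and
    `position` is 1-based, matching the usual amino acid numbering.
    """
    totals, hits = Counter(), Counter()
    for month, seq in samples:
        totals[month] += 1
        if seq[position - 1] == state:
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}

# Toy alignment in which column 3 stands in for site 162.
samples = [
    ("2016-06", "NK-V"),
    ("2016-06", "NKSV"),
    ("2016-07", "NK-V"),
    ("2016-07", "NK-V"),
]
freq = state_frequency(samples, position=3)
```

The same function applies to any amino acid state, for example `state="K"` at position 171, as long as the sequences are aligned to a common numbering.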
Neither the local branching index nor clade frequencies single out a particular variant within the 1A/I117V clade.
Given the lack of competition, we predict the continued dominance of 117V viruses.
Clade 3 has continued to dominate. Within clade 3, a clade with mutation HA1:251V has been at a global frequency of about 80% throughout 2016. Within this clade, mutation 211R is at 25% frequency. In addition, a clade without prominent amino acid mutations has been rising throughout 2016.
As above, we base our primary analysis on a set of viruses collected between Jan 2015 and Feb 2017, comprising approximately 100 viruses per month where available and seeking to equilibrate sample counts geographically where possible.
Clade 3 viruses have continued to dominate with clade 2 viruses scarcely observed throughout 2016. Within clade 3, mutations L172Q and M251V remain dominant and mutation K211R continues to persist at low frequency. There is little evidence of emerging clades of strong selective effect.
The LBI highlights the central clade in the tree below. This clade has no amino acid mutations in HA along its backbone relative to its parent clade 251V.
Genetic variation within 172Q/251V viruses is beginning to develop, but given the lack of credible competition, we expect 172Q/251V to continue to dominate in the coming months.
Written by Trevor Bedford and Richard Neher. This work is made possible by the GISAID Initiative and the open sharing of genetic data by influenza research groups from all over the world. We gratefully acknowledge their contributions. Give us a shout at @trvrb or @richardneher with questions or comments.