We base our primary analysis on a set of viruses collected between Sep 2014 and Aug 2016, comprising approximately 100 viruses per month where available and seeking to equilibrate sample counts geographically where possible. This equilibration attempts to collect equal samples from Africa, China, Europe, Japan/South Korea, North America, Oceania, South America, South Asia, Southeast Asia and West Asia. In the following analysis we collapse samples from China, South Asia, Southeast Asia, Japan and Korea into a single region referred to here as "Asia", resulting in Asia possessing greater sample counts than North America or Europe. The only month that significantly departs from equitable sampling is Aug 2016 with 38 viruses, primarily from Europe and North America. We subsample to 100 viruses per month and not more to keep sample counts as equitable as possible across space and time. Repeating mutation and clade frequency calculations with up to 1200 viruses per month yields similar results (see below).
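The subsampling described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used for this report: the function names, the `(strain, region)` tuple format, the region label spellings, and the round-robin selection rule are all assumptions made for the example.

```python
import random
from collections import defaultdict

# The four sampling regions that the text collapses into "Asia".
# The exact label spellings are assumed for this sketch.
ASIA = {"China", "South Asia", "Southeast Asia", "Japan/South Korea"}

def collapse_region(region):
    """Map the four Asian sampling regions to the single label 'Asia'."""
    return "Asia" if region in ASIA else region

def subsample_month(viruses, per_month=100, seed=0):
    """Pick up to `per_month` viruses from one month, cycling over regions.

    `viruses` is a list of (strain, region) tuples (hypothetical format).
    Cycling over regions keeps counts as equal as the data allow.
    """
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for strain, region in viruses:
        by_region[region].append((strain, region))
    for pool in by_region.values():
        rng.shuffle(pool)

    picked = []
    regions = sorted(by_region)
    # Take one virus per region per pass until the quota or the data run out.
    while len(picked) < per_month and any(by_region[r] for r in regions):
        for r in regions:
            if by_region[r] and len(picked) < per_month:
                picked.append(by_region[r].pop())
    return picked
```

With ten equally well sampled regions, each contributes about 10 viruses to a month of 100, so the collapsed "Asia" region ends up with roughly 40. That is why Asia has greater sample counts than North America or Europe in the analysis.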
Viral clades 3c3.a, 3c3.b and 3c2.a emerged from the Texas/2012 background in early 2014 and rapidly spread through the viral population. Subsequently, we have observed competition among these clades, with 3c2.a viruses being globally dominant beginning in 2015. Recently, we have observed the steady decline of 3c3.b viruses. At this point, we estimate that they are either extinct or nearly extinct. 3c3.a viruses were largely replaced by 3c2.a viruses starting in 2015. However, we have observed an anomalous late-season epidemic of 3c3.a viruses within the USA from Jan to Aug 2016. Elsewhere in the world, 3c2.a viruses have remained dominant, particularly in Asia, where we estimate 3c2.a frequencies to be >95% throughout 2016.
The late-season 3c3.a epidemic in the USA is puzzling. However, we suspect that this epidemic is due to epidemiologic circumstance rather than the emergence of a selectively advantageous variant. The strongest evidence for this is that there is not a single clade within 3c3.a that is spreading throughout the USA. Instead, a variety of 3c3.a viruses are spreading, each with different HA1 mutations. The emergence and spread of an adaptive variant would have appeared as a single clade bearing a characteristic epitope mutation. This is not what we see. One small clade, however, carries the S145N and F193S mutations close to the receptor binding site; mutations near this site resulted in antigenic evolution in previous years. This small clade is restricted to viruses from the USA and remains at less than 1% global frequency.
We doubt that these 3c3.a viruses will spread globally. It is conceivable, however, that the 2016-2017 USA season could derive from over-summering transmission chains. This would be an unusual event, but not unheard of (Bedford et al. 2010, Bedford et al. 2015). In one recent season (2008-2009), USA viruses derived primarily from the previous USA season. In other years (at least back to 2000), USA seasons were primarily reseeded from elsewhere. We believe that the 2016-2017 USA season will likely derive from 171K viruses due to their rapid spread (see below).
The dominant 3c2.a viruses have continued to genetically diversify with the emergence of multiple subclades. We observe three subclades at appreciable frequency in 2016 viruses. These are characterized by HA1 mutations 142K/197R, HA1 mutation 197K and HA1 mutation 171K + HA2 mutations 77V/155E. The 171K variant comprises approximately 58% of 3c2.a viruses collected in 2016. Within 171K, the variant 121K has emerged and comprises approximately 23% of 3c2.a viruses collected in 2016.
The 142K/197R clade first emerged around June 2015 and rose to nearly 20% global frequency in Nov 2015. However, it has declined throughout 2016. Based on current clade success, we doubt this is a competitive virus. Still, site 142 has mutated several times within 3c2.a to 142G or 142K, suggesting that mutating this position might enable the virus to escape preexisting immunity at a fitness cost; it warrants continued observation. The 197K clade emerged in late 2015 and has slowly grown in frequency since. We estimate that it now comprises ~14% of H3N2 viruses. However, 197K is at extremely low frequency in Asia, with an estimated present-day frequency of ~2%. On the other hand, clade 171K viruses have done remarkably well throughout 2016 and now comprise an estimated 69% of currently circulating H3N2 viruses. This steady and rapid increase is strongly suggestive of an adaptive origin. Notably, 171K emerged and spread first in Asia, reaching nearly 80% frequency in April 2016. Given historical geographic patterns (Bedford et al. 2015), we expect the higher frequency of 171K in Asia to spread to the rest of the world. Within 171K, the variant 121K has spread. However, it doesn't appear to be spreading more rapidly than its parent 171K clade. Phylogenetic patterns suggest 171K as the driver rather than 121K. Still, the rate of increase of 121K suggests that a sizable fraction of 2016-2017 viruses will carry 121K.
Without strong competition from another novel H3N2 virus, we believe that 171K will continue to increase in frequency in the global population and predominate in the 2016-2017 influenza season. The continued spread of 171K is fully in line with the predictions we made in Feb 2016. At this point, we can't say whether the 171K mutation in HA1 or the HA2 mutations 77V/155E (or some combination) is driving the selective spread of this clade.
Other indicators suggest evolutionary success of 171K viruses. Notably, the "local branching index" (Neher et al. 2014) supports 171K as a high fitness virus. 3c3.a viruses and other clades within 3c2.a do not show signal in the "local branching index" analysis.
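For readers unfamiliar with the local branching index, here is a rough sketch of how such a quantity can be computed. It scores each node by the total branch length in its neighborhood, discounted exponentially with distance along the tree. The dict-based tree representation, the function name, and the value of the distance scale `tau` are assumptions for illustration; the analysis in this report may restrict the calculation to recent tips or choose `tau` differently.

```python
import math

def local_branching_index(children, branch_length, root, tau=0.1):
    """Return {node: LBI} for a rooted tree.

    children:      dict mapping a node to a list of its child nodes
    branch_length: dict mapping a node to the length of the branch
                   leading to its parent (use 0.0 for the root)
    tau:           distance scale of the exponential discount (assumed value)
    """
    # Preorder traversal: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    def through_branch(node, beyond):
        # Discounted length of the node's own branch, plus whatever lies
        # beyond that branch, attenuated by the decay along the branch.
        decay = math.exp(-branch_length[node] / tau)
        return tau * (1.0 - decay) + decay * beyond

    # Upward pass: discounted tree length in each subtree, seen from the parent.
    up = {}
    for node in reversed(order):
        up[node] = through_branch(node, sum(up[c] for c in children.get(node, [])))

    # Downward pass: discounted tree length outside each subtree, seen from the node.
    down = {root: 0.0}
    for node in order:
        kids = children.get(node, [])
        total = sum(up[c] for c in kids)
        for c in kids:
            down[c] = through_branch(c, down[node] + total - up[c])

    return {n: down[n] + sum(up[c] for c in children.get(n, [])) for n in order}
```

Nodes sitting in rapidly branching parts of the tree accumulate more nearby branch length and therefore score higher, which is the sense in which a clade "shows signal" in this analysis.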
Unfortunately, we lack sufficient recent serological data to distinguish antigenic evolution for most subclades within 3c2.a and 3c3.a. These observations derive entirely from genetic data.
There are a variety of viruses at the base of the 171K clade, possessing HA1:171K along with HA2:77V/155E, but lacking further amino acid changes in HA.
As discussed above, we base our primary analysis on a set of viruses collected between Sep 2014 and Aug 2016, comprising approximately 100 viruses per month where available and seeking to equilibrate sample counts geographically where possible. Recent months through June 2016 have largely sufficient sample counts and sample distributions. There are fewer samples from July to present.
Within clade 6b, two major genetic variants have emerged. These are clade 6b.1, characterized by HA1:84N/162N/216T, and clade 6b.2, characterized by HA1:152T and HA2:174E. Most recent samples have been from 6b.1 viruses, with 85% of 2016 samples being from 6b.1 viruses.
At this point, all regions of the world are dominated by 6b.1 viruses. This clade rose from low frequency in Aug 2015 to reach present-day global frequencies of ~98%. There remains a minority of circulating 6b.2 viruses. We estimate 2% of H1N1pdm viruses globally to be 6b.2. These viruses are more prevalent in Asia, but still a distinct minority. We estimate that 6b.1 is at 85% frequency in Asia, while 6b.2 is at 15% in Asia. Notably, the frequency of 6b.2 has remained stable for almost 12 months.
Every indication suggests the continued dominance of 6b.1 viruses, and their extremely rapid rise suggests a selective origin. We are now watching for the emergence of genetic variants within the 6b.1 clade. Notably, the continued rise and dominance of 6b.1 viruses fits with our predictions from Feb 2016.
Analysis of local branching index also flags clade 6b.1 as rapidly expanding and as the most successful lineage within H1N1pdm. Analysis of antigenic evolution via the method of Neher et al. 2015, using recent HI data from the Feb 2016 VCM report from the Crick Institute Collaborating Center, suggests a small effect on HI titer in both the 6b.1 and 6b.2 clades. In this case, 6b.1 shows an antigenic impact of 0.65 log2 units and 6b.2 shows an impact of 0.66 log2 units. Interestingly, this analysis places the impact on clade 6b.1 from the 84N mutation rather than the later 162N or 216T mutations.
Within clade 1A viruses, the clade 129D/146I/117V has risen to high frequency, but at a rate that suggests a smaller effect of natural selection. At this point, 117V viruses are dominant in the global viral population.
As above, we base our primary analysis on a set of viruses collected between Sep 2014 and Aug 2016, comprising approximately 100 viruses per month where available and seeking to equilibrate sample counts geographically where possible.
In the past year, B/Vic clade 129D/146I/117V has dominated the viral population, with 94% of samples possessing the 117V mutation. This dominance has increased throughout 2016 and we estimate that currently circulating Vic viruses are 94% 117V. At this point, we haven't observed new variants of appreciable frequency within the 117V clade and there aren't decent competitors outside the 117V clade.
Additionally, the 117V clade has been growing most quickly and is picked by the local branching index as the currently most successful clade. All indicators suggest the continued success of 117V in the coming year.
As above, we base our primary analysis on a set of viruses collected between Sep 2014 and Jul 2016, comprising approximately 100 viruses per month where available and seeking to equilibrate sample counts geographically where possible.
During 2016, the vast majority of B/Yamagata isolates were clade 3 viruses. We estimate that the current frequency of clade 2 is ~2%. At this rate, we expect clade 2 to go extinct in the coming year.
Within clade 3, HA1:172Q has predominated throughout 2015 and 2016. On this background the HA1:251V variant has emerged and on top of 251V, the HA1:211R variant has appeared.
The 251V clade increased from low frequency in Oct 2014 to predominate in the population. We estimate that 251V is currently at 79% globally. The 211R variant rose from low frequency in Apr 2015 to reach 34% in currently circulating viruses. The rate of increase, however, has been rather mild.
The clade and mutation frequencies discussed above were based on a limited sequence subsample with an equitable geographical distribution. More accurate region-specific trajectories can be estimated using all sequence data available in GISAID. We repeated the clade and mutation frequency estimation using up to 500 sequences per month and region.
With this more inclusive sampling, northern hemisphere winter months tend to be dominated by North America and Europe, while other months are dominated by samples from North America, Asia, and Oceania. The global average will track the regions contributing the majority of the sequence data.
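A small worked example makes the last point concrete. The counts below are invented purely for illustration and are not taken from the data; the function names are likewise hypothetical. When samples are pooled, the global frequency is a sample-size-weighted average of the regional frequencies, so heavily sequenced regions pull the estimate toward their own value.

```python
def pooled_frequency(counts):
    """Frequency of a variant in the pooled sample.

    counts: {region: (n_with_variant, n_total)}
    """
    total = sum(n for _, n in counts.values())
    return sum(k for k, _ in counts.values()) / total

def region_averaged_frequency(counts):
    """Unweighted mean of the per-region frequencies."""
    freqs = [k / n for k, n in counts.values() if n > 0]
    return sum(freqs) / len(freqs)

# Invented counts: one heavily sampled region, one sparsely sampled region.
example = {"North America": (90, 450), "Asia": (40, 50)}

print(pooled_frequency(example))           # dominated by North America
print(region_averaged_frequency(example))  # treats both regions equally
```

Here the regional frequencies are 20% and 80%, but the pooled estimate is 130/500 = 26% because 450 of the 500 sequences come from one region. The unweighted regional mean is 50%.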
The H3N2 clades 3c3.a (159S) and 3c2.a (159Y) show broadly similar trajectories as discussed above. 3c2.a came to dominate Asia completely, while viruses outside of 3c3.a and 3c2.a continue to circulate at low levels in Oceania. Within 3c2.a, the mutation 171K dominates across all regions.
Written by Trevor Bedford and Richard Neher. This work is made possible by the GISAID Initiative and the open sharing of genetic data by influenza research groups from all over the world. We gratefully acknowledge their contributions. Give us a shout at @trvrb or @richardneher with questions or comments.