For each combination of virus \(i\) and antiserum \(\alpha\), we define the antigenic distance as \(H_{i,\alpha} = \log_2(T_{a\alpha}) - \log_2(T_{i\alpha})\), where \(T_{i\alpha}\) is the titer of antiserum \(\alpha\) required to inhibit virus \(i\) and \(T_{a\alpha}\) is the homologous titer, i.e., the titer of \(\alpha\) against the reference virus \(a\) used to raise it.
When multiple measurements are available, we average the base-2 logarithms of the titers. When no homologous titer is available, the maximal titer of that antiserum is used as a proxy. We then model \(H_{i,\alpha}\) by \(\Delta_{i,\alpha}\), the sum of the virus avidity \(c_i\), the serum potency \(p_\alpha\), and the antigenic contributions \(d_b\) of the branches \(b \in (i \ldots \alpha)\) connecting the virus and the antiserum in the phylogenetic tree:
$$
\Delta_{i,\alpha} = p_\alpha + c_i + \sum_{b\in (i\ldots\alpha)} d_{b}
$$
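A minimal sketch of how \(H_{i,\alpha}\) and \(\Delta_{i,\alpha}\) could be computed from raw titers and a tree. Everything concrete here is an assumption for illustration: the input format (a list of `(virus, serum, titer)` tuples), the hypothetical helper names `antigenic_distances`, `path_branches`, and `predicted_distance`, the tree stored as a child-to-parent dictionary with each branch named by its child node, and placing antiserum \(\alpha\) on the tree at the node of its reference virus.

```python
import math
from collections import defaultdict


def antigenic_distances(measurements, homologous_virus):
    """Return H[(virus, serum)] = log2(T_homologous) - log2(T_virus).

    measurements: list of (virus, serum, titer) tuples; replicates allowed.
    homologous_virus: dict serum -> virus used to raise it (hypothetical format).
    """
    # Average the log2 titers over replicate measurements.
    logs = defaultdict(list)
    for virus, serum, titer in measurements:
        logs[(virus, serum)].append(math.log2(titer))
    per_serum = defaultdict(dict)
    for (virus, serum), values in logs.items():
        per_serum[serum][virus] = sum(values) / len(values)

    H = {}
    for serum, mean_log in per_serum.items():
        # Homologous titer if measured, otherwise the maximal titer as a proxy.
        ref = mean_log.get(homologous_virus.get(serum), max(mean_log.values()))
        for virus, value in mean_log.items():
            H[(virus, serum)] = ref - value
    return H


def path_branches(parent, a, b):
    """Branches on the tree path between nodes a and b.

    parent: dict child -> parent; each branch is named by its child node.
    """
    def lineage(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    up_a, up_b = lineage(a), lineage(b)
    shared = set(up_a) & set(up_b)  # the common ancestor and everything above it
    return [n for n in up_a if n not in shared] + [n for n in up_b if n not in shared]


def predicted_distance(virus, serum, reference, parent, d, c, p):
    """Delta = p_serum + c_virus + sum of d_b on the path virus -> reference virus."""
    path = path_branches(parent, virus, reference[serum])
    return p[serum] + c[virus] + sum(d.get(b, 0.0) for b in path)
```

For example, with replicate titers 256 and 512 against a serum whose homologous titer is 1024, the averaged log titer is 8.5 and the distance is 1.5.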
The parameters \(d_b\), \(p_\alpha\), \(c_i\) are then estimated by minimizing the cost function
$$
C = \sum_{i,\alpha} (H_{i,\alpha} - \Delta_{i,\alpha})^2 + \lambda \sum_b d_b + \gamma \sum_i c_i^2 + \delta \sum_\alpha p_\alpha^2
$$
subject to the constraints \(d_b\geq0\). To avoid overfitting, the parameters of the model are regularized by the last three terms in the above equation. Titer drops are penalized by their absolute value multiplied by \(\lambda\) (\(\ell_1\) regularization; since \(d_b\geq0\), \(|d_b| = d_b\)), which results in a sparse model in which most branches have no titer drop (Candès and Tao, 2005). The virus avidities and antiserum potencies are \(\ell_2\)-regularized by \(\gamma\) and \(\delta\), respectively, penalizing very large values without enforcing sparsity. In the substitution model, the sum over the path in the tree is replaced by a sum over amino acid differences in HA1. Sets of substitutions that always occur together are merged and treated as one compound substitution. The substitution model parameters are inferred in the same way as those of the tree model (see Harvey et al., 2014; Sun et al., 2013 for similar approaches).
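To illustrate the constrained minimization, the sketch below evaluates \(C\) and minimizes it by projected gradient descent: an ordinary gradient step on all parameters, followed by clipping each \(d_b\) at zero. This is a swapped-in technique chosen so the example needs only the standard library; it is not the quadratic-programming solver described in the text. Because \(d_b\geq0\), the \(\ell_1\) term is the linear term \(\lambda\sum_b d_b\) and is differentiable on the feasible set. The observation format (tuples of \(H\), branch path, virus, serum), the hypothetical names `cost` and `fit`, and the default step size and iteration count are assumptions; the step must be small relative to the inverse of the largest curvature of \(C\) for the iteration to converge.

```python
def cost(obs, d, c, p, lam, gam, dlt):
    """C = sum (H - Delta)^2 + lam*sum d_b + gam*sum c_i^2 + dlt*sum p_a^2."""
    sq = sum((h - (p[s] + c[v] + sum(d[b] for b in path))) ** 2
             for h, path, v, s in obs)
    return (sq + lam * sum(d.values())
            + gam * sum(x * x for x in c.values())
            + dlt * sum(x * x for x in p.values()))


def fit(obs, lam=0.5, gam=1.0, dlt=1.0, step=0.01, n_iter=2000):
    """Projected gradient descent on C subject to d_b >= 0 (illustrative sketch).

    obs: list of (H, branches_on_path, virus, serum) tuples (hypothetical format).
    """
    d = {b: 0.0 for _, path, _, _ in obs for b in path}
    c = {v: 0.0 for _, _, v, _ in obs}
    p = {s: 0.0 for _, _, _, s in obs}
    for _ in range(n_iter):
        # Gradients of the regularization terms.
        gd = {b: lam for b in d}
        gc = {v: 2 * gam * c[v] for v in c}
        gp = {s: 2 * dlt * p[s] for s in p}
        # Gradients of the squared residuals.
        for h, path, v, s in obs:
            r = h - (p[s] + c[v] + sum(d[b] for b in path))
            for b in path:
                gd[b] -= 2 * r
            gc[v] -= 2 * r
            gp[s] -= 2 * r
        # Gradient step; project branch effects back onto d_b >= 0.
        for b in d:
            d[b] = max(0.0, d[b] - step * gd[b])
        for v in c:
            c[v] -= step * gc[v]
        for s in p:
            p[s] -= step * gp[s]
    return d, c, p
```

Setting `lam` very large keeps every \(d_b\) at zero, so all titer differences must then be absorbed by avidities and potencies alone.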
For both the tree and the substitution model, this optimization problem can be cast as a canonical quadratic program, which we solve using cvxopt (M. Andersen and L. Vandenberghe).
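For reference, one way to write \(C\) in the standard form accepted by cvxopt's `solvers.qp`, which minimizes \(\tfrac12 x^\top P x + q^\top x\) subject to \(Gx \preceq h\). The symbols \(x\), \(M\), \(e\), \(R\), and \(n_b\) are notation introduced here for illustration: \(x = (d, c, p)\) stacks all \(n_b\) branch effects, the avidities, and the potencies; each row of \(M\) corresponds to one measured pair \((i,\alpha)\) and has ones in the columns of the branches on \((i\ldots\alpha)\), of \(c_i\), and of \(p_\alpha\), so that \(\Delta = Mx\); \(e\) has ones in the branch entries and zeros elsewhere; and \(R = \operatorname{diag}(0,\ldots,0,\gamma,\ldots,\gamma,\delta,\ldots,\delta)\).

$$
\begin{aligned}
C &= (H - Mx)^\top (H - Mx) + \lambda\, e^\top x + x^\top R\, x \\
  &= x^\top \left(M^\top M + R\right) x + \left(\lambda e - 2 M^\top H\right)^\top x + H^\top H .
\end{aligned}
$$

Dropping the constant \(H^\top H\) gives

$$
P = 2\left(M^\top M + R\right), \qquad q = \lambda e - 2 M^\top H, \qquad G = \begin{pmatrix} -I_{n_b} & 0 \end{pmatrix}, \qquad h = 0 ,
$$

where the rows of \(G\) encode \(d_b \geq 0\). Since \(M^\top M + R\) is positive semidefinite, the problem is convex. For the substitution model, the branch columns of \(M\) would be replaced by columns for the (compound) substitutions.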