Double electron-electron resonance (DEER) spectroscopy, also called pulsed electron-electron double resonance (PELDOR) spectroscopy, measures the magnetic dipolar coupling between two or more paramagnetic centers, such as spin labels attached to proteins [ 1 – 3 ]. DEER data analysis usually involves the removal of a background signal followed by a transformation of the oscillatory time-domain signal into a distance-domain probability distribution function describing the distances between nearby paramagnetic centers (1.5-10 nm).
There exist several different approaches for extracting distance distributions from DEER data: Tikhonov regularization [ 4 – 7 ], Gaussian mixture models [ 8 – 10 ], Tikhonov regularization post-processed with Gaussians [ 11 , 12 ], Tikhonov regularization combined with maximum entropy [ 13 ], Bayesian inference (based upon Tikhonov regularization) [ 14 ], regularization by limiting the number of points in the distance domain [ 15 ], wavelet denoising [ 16 ], truncated singular-value decomposition [ 6 , 17 ], and neural networks [ 18 ]. Among them, Tikhonov regularization is the most widely employed method.
In this paper, we are concerned with the determination of optimal settings for Tikhonov regularization. This involves the choice of a regularization operator L and of a value for the regularization parameter α . An optimal choice of L and α ensures good distance distribution recovery and prevents overfitting the data; a bad choice causes poor recovery and either under- or overfitting to the data. There are several operators to choose from, and many methods are available for selecting α , each based on a defensible rationale. However, they vary greatly both in terms of theoretical justification and empirical track record. Therefore, the selection of the method/operator combination ought to be based on a thorough comparison of their performance for a practically relevant benchmark set of data analysis problems.
Tikhonov regularization was introduced to NMR for de-Pake-ing [ 19 ], for the extraction of internuclear distances from dipolar time-domain signals such as those from REDOR [ 20 ], for the determination of orientational distributions from ²H NMR data [ 21 , 22 ], and for the determination of relaxation-rate distributions [ 23 , 24 ]. These approaches used the self-consistent method for selecting α , as introduced and implemented in the program FTIKREG [ 25 , 26 ]. In the context of extracting distance distributions from DEER data, Tikhonov regularization was initially mentioned in 2002 [ 27 , 28 ], and first applications appeared in 2004 [ 4 , 5 ]. In these papers, the regularization parameter was selected manually or using FTIKREG. A thorough paper examining Tikhonov regularization and introducing the use of the L-curve maximum-curvature criterion appeared in 2005 [ 6 ]. A different L-curve method, the minimum-radius criterion, was introduced in 2006 in the program DeerAnalysis [ 7 ] and is used in its current release (2016).
Despite the long history and the widespread use of Tikhonov regularization for DEER data analysis, there has been no systematic assessment and efficiency comparison of regularization operators and regularization parameter selection methods. This is what we present here. We evaluate a large number of regularization parameter selection methods and three regularization operators with respect to their performance on a large set of synthetic DEER time-domain data derived from model distributions obtained by in silico spin labeling of a PDB crystal structure of T4 lysozyme. We examine and compare all methods by their efficiency in recovering the underlying model distributions. The results indicate that the performance of the methods varies significantly, and that there are methods that perform better than the currently employed L-curve α selection methods.
The paper is structured as follows. Section 2 summarizes the principles of Tikhonov regularization for DEER and introduces the necessary notation. Section 3 explains the construction and characteristics of the large test set of distance distributions and associated synthetic DEER time-domain traces. Section 4 describes the assessment methodology. Section 5 presents the results and discusses benefits and drawbacks of various methods in light of these results. The paper ends with recommendations regarding the preferred methods. Mathematical details of the various regularization parameter selection methods are listed in the Appendix .
DEER measures the electron spin echo intensity V as a function of the position in time, t , of one or more pump pulses. For dilute samples of doubly-labelled proteins or complexes, V is a product of an overall amplitude, V 0 , a modulation function due to intra-complex coupling, F , and a background modulation function due to the interaction between spins on different complexes, B : V ( t ) = V 0 F ( t ) B ( t ), with F ( t ) = (1 − λ ) + λ S ( t ).
λ is the modulation depth parameter related to the excitation efficiency of the DEER experiment. When utilizing Tikhonov regularization, a background model is typically fitted to V ( t )/ V 0 and divided out, yielding the isolated intra-complex modulation function S ( t ) after scaling by λ . S ( t ) is normalized such that S (0) = 1 in the noise-free limit.
In the absence of orientation selection, exchange coupling, differential relaxation, and multi-spin effects, S ( t ) is related to the distance distribution between the two spins on the complex, P ( r ), by
S ( t ) = ∫₀^∞ K ( t , r ) P ( r ) d r , where the kernel is K ( t , r ) = ∫₀¹ cos[(1 − 3 z ²) ω ⊥ ( r ) t ] d z , with z = cos θ and the perpendicular dipolar angular frequency ω ⊥ ( r ) = 2π · (52 MHz) · (nm/ r )³. P ( r ) integrates to one: ∫₀^∞ P ( r ) d r = 1.
When S is discretized as an n t -element vector with elements S i = S ( t i ), and P as an n r -element vector with elements P i = P ( r i ), the integral transformation from P to S is represented by the matrix-vector multiplication
where K is the n t × n r kernel matrix. Its elements are K ij = K ( t i , r j ) · Δ r , with the distance increment Δ r . Usually, n r is set equal to n t .
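For concreteness, here is a minimal numerical sketch of how such a kernel matrix can be constructed, assuming the standard kernel form given above (time in microseconds, distance in nanometers, perpendicular dipolar frequency 52 MHz·(nm/r)³). The function name and grid choices are illustrative and not taken from the paper.

```python
import numpy as np

def deer_kernel(t_us, r_nm, nz=501):
    """Kernel matrix with elements K(t_i, r_j) * dr.

    The orientation integral over z = cos(theta) is approximated by a
    midpoint sum (a simple sketch; closed-form Fresnel expressions exist).
    """
    t = np.asarray(t_us, float)                     # times, microseconds
    r = np.asarray(r_nm, float)                     # distances, nanometers
    w_perp = 2 * np.pi * 52.0 / r**3                # perpendicular dipolar angular frequency, rad/us
    z = (np.arange(nz) + 0.5) / nz                  # midpoints of a uniform grid on [0, 1]
    phase = (1 - 3 * z[None, None, :]**2) * w_perp[None, :, None] * t[:, None, None]
    K0 = np.cos(phase).mean(axis=-1)                # K(t_i, r_j)
    return K0 * (r[1] - r[0])                       # include the distance increment dr

# example: t from 0 to 1.6 us in 20 ns steps, r from 1.0 to 7.0 nm with n_r = n_t
t = np.arange(0.0, 1.6 + 1e-9, 0.02)
r = np.linspace(1.0, 7.0, t.size)
K = deer_kernel(t, r)                               # shape (81, 81)
```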
The matrix K is ill-conditioned (close to singular), so the inverse of K^T K, which would be needed to obtain P from S by ordinary least-squares fitting, cannot be computed accurately. To enable a stable solution, Tikhonov regularization is used: the regularized solution P αL is the non-negative vector P that minimizes ‖ KP − S ‖² + α ²‖ LP ‖² (Eq. (5)).
This is a form of penalized least-squares fitting. The first term is the least-squares term capturing the misfit between the model KP and the data S . The second term penalizes for unwanted properties of the solution P and depends on a specific form for the regularization operator L and a specific value for the regularization parameter α . The fit is constrained by the requirement that all elements of P be non-negative ( P ≥ 0). The subscripts in P αL indicate that the solution depends on the particular choices of α and L .
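As an illustration, the constrained minimization in Eq. (5) can be recast as a non-negative least-squares problem on a stacked system; the sketch below uses this standard reformulation (it is not necessarily the algorithm used in the paper), with a hypothetical function name.

```python
import numpy as np
from scipy.optimize import nnls

def tikhonov_nonneg(K, S, L, alpha):
    """P_{alpha,L} = argmin_{P >= 0} ||K P - S||^2 + alpha^2 ||L P||^2,
    solved as non-negative least squares on the stacked system [K; alpha*L] P ~ [S; 0]."""
    A = np.vstack([K, alpha * L])
    b = np.concatenate([S, np.zeros(L.shape[0])])
    P, _ = nnls(A, b)
    return P
```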
L defines the criterion by which P is penalized. Physically reasonable distance distributions between spin labels on proteins are smooth on a tenths-of-nanometer scale. Three choices of L are common; all encourage smoothness and penalize roughness of P , each in a different sense. The second derivative, represented by the ( n r − 2) × n r second-order difference matrix L 2 (Eq. (6)), whose rows contain the stencil (1, −2, 1), penalizes sharp turns in the distribution, which arise from sharp peaks. The first derivative, represented by the ( n r − 1) × n r first-order difference matrix L 1 (Eq. (7)), whose rows contain the stencil (−1, 1), penalizes steep slopes, which are also associated with sharp peaks. Note that the matrices defined in Eqs. (6) and (7) do not contain Δ r . Lastly, the n r × n r identity matrix L 0 = I can be used. It penalizes tall peaks, which tend to be narrow due to the overall normalization of P .
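The three operators are easy to construct; a short sketch (with an illustrative function name) follows.

```python
import numpy as np

def regularization_operator(n_r, order):
    """L_0 (identity), L_1 (first-difference), or L_2 (second-difference) operator,
    without the Delta-r factor, as noted in the text."""
    if order == 0:
        return np.eye(n_r)                            # n_r x n_r identity
    return np.diff(np.eye(n_r), n=order, axis=0)      # (n_r - order) x n_r; rows (-1, 1) or (1, -2, 1)

L2 = regularization_operator(81, 2)                   # second-order difference matrix
```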
The value of α determines the balance between the two terms in Eq. (5) . Large α values lead to oversmoothing of P and a poor fit to the time-domain data S , while small α values lead to unrealistically spiky distributions (undersmoothing) and overfitting of the data. Therefore, a proper choice of α is essential. A large number of α selection methods are described in the literature, and we include many of them in our performance analysis. They are described in the Appendix .
The optimal values of both α and L , α opt and L opt , are the ones which together best recover the underlying true distribution P 0 , i.e. the ones for which the model recovery error (mre), defined as the root-mean-square deviation (rmsd) between P αL and the model distribution P 0 (Eq. (8)),
is minimal. For synthetic data, as used in this study, this error and thereby α opt and L opt can be determined, since P 0 is given. However, in experimental practice P 0 is not known, and one has to resort to other methods to choose a form for L and a value for α .
In order to test α selection methods and regularization operators on DEER data from practically relevant distance distributions, we numerically generated a large set of synthetic noisy DEER time-domain signals based on a crystal structure of T4 lysozyme (PDB ID 2LZM, 1.7 Å resolution); its structure is shown in Fig. 1 . We use this protein since it is currently one of the most thoroughly investigated proteins by DEER and other EPR techniques [ 29 – 37 ] and is of a size commensurate with the typical DEER distance range of 1.5 to 6 nm. We in silico labeled the 2LZM protein structure with the nitroxide spin label MTSSL which is to date, by a wide margin, the most commonly used label for DEER. In this section, we detail the construction of the test set.
Using scripts based on MMM 2017.1 [ 38 – 40 ] with the default 216-member rotamer library R1A_298K_UFF_216_r1_CASD for MTSSL at ambient temperature, we calculated rotamer distributions for all 164 sites on the protein. For each site, this yielded co-ordinates of the mid-points of the N-O bonds of all 216 rotamers and their associated populations, as well as the site partition function. In order to keep the test set realistic, we eliminated all sites with low predicted labelling probability (partition function smaller than 0.05), leaving 129 sites. These sites are indicated in Fig. 1 .
For each of the resulting 8256 site pairs, we calculated the associated model distance distribution, consisting of a sum of Gaussian lineshapes, with centers corresponding to the inter-rotamer distances, intensities corresponding to the products of the respective populations, and with a uniform full width at half maximum (FWHM) of 0.15 nm to account for structural heterogeneity around the energy minimum of each rotamer due to librational motion [ 38 ]. Each distribution was discretized with a high resolution of Δ r = 0.005 nm, corresponding to 1/30 of the FWHM. To keep the test set experimentally realistic, we discarded all distributions with more than 5% integrated population below 1.7 nm. This guarantees that there is no population below 1.5 nm. Site pairs with population at such short distances are always avoided in experimental studies in order to avoid potential distortions due to exchange coupling and incomplete excitation.
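The following sketch shows how such a sum-of-Gaussians model distribution can be assembled from a list of inter-rotamer distances and weights. The distance-grid limits are illustrative assumptions, and the conversion from FWHM to standard deviation is the usual Gaussian relation.

```python
import numpy as np

def model_distribution(distances_nm, weights, fwhm=0.15, dr=0.005,
                       r_min=1.0, r_max=7.0):
    """Sum of Gaussians, one per rotamer pair, with a common FWHM,
    discretized with increment dr and normalized to unit integral."""
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))       # FWHM -> standard deviation
    r = np.arange(r_min, r_max + dr / 2, dr)
    P = np.zeros_like(r)
    for d, w in zip(distances_nm, weights):
        P += w * np.exp(-(r - d)**2 / (2 * sigma**2))
    P /= P.sum() * dr                                 # enforce integral of P(r) dr = 1
    return r, P
```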
This procedure resulted in a set of 5622 model distance distributions. Their properties are summarized in Fig. 2 . The distribution modes (location of the distribution maxima) range from 1.7 to 6.1 nm, and the inter-quartile ranges (iqr; range of central 50% integrated population) are between 0.09 and 1.1 nm wide. Long, short, narrow, wide, unimodal, multimodal, symmetric, and skewed distributions are all well-represented in the test set. Skewness is relatively evenly distributed across both the mode and iqr. The number of significant peaks is evenly distributed across modal distance, but there is a positive correlation between iqr and number of peaks. Fig. 3 shows several representative example distributions.
An alternative strategy for constructing a test set would be to generate completely artificial distributions, without reference to a protein, in which the center, width, multi-modality, and skewness are varied either systematically or randomly. However, such a test set would be less experimentally relevant. Figure 2 shows that the structure-based test set used here covers a sufficiently diverse range of distribution centers, widths, and shapes.
Each model distribution was used to generate a set of time-domain traces S with varying maximum evolution times t max , time increments Δ t , noise levels σ , and noise realizations, subject to the constraints that (1) each S must be long enough to permit accurate recovery of the longest distances in the corresponding P , and (2) the sampling rate must be high enough to recover the shortest distances. The starting time for all time traces was set to t min = 0. We used multiple values of t max and Δ t for each model distribution in order to be able to assess whether any α selection methods display uneven performance with respect to these parameters.
To choose a set of appropriate t max for a given distribution, we take as an upper limit t mm = 3 T ⊥,95 , where T ⊥,95 is the period corresponding to the perpendicular-orientation dipolar frequency of the 95-percentile distance, i.e., T ⊥,95 = ( r 95 /nm) 3 /(52 MHz). The factor 3 prolongs the time in order to completely capture the long-distance tail of the distribution, which has the lowest frequencies. With this, we choose the set of t max according to the following procedure: if t mm > 6.4 μs, then t max is 6.4 and 3.2 μs; if 3.2 μs < t mm < 6.4 μs, then t max is 3.2 and 1.6 μs; if 1.6 μs < t mm < 3.2 μs, then t max is 1.6 and 0.8 μs; if 0.8 μs < t mm < 1.6 μs, then t max is 0.8 μs; and if t mm < 0.8 μs, then t max is 0.4 μs. Figure 4 shows an example distribution for which t mm = 2.36 μs, hence t max is 1.6 and 0.8 μs. Note that even the shorter t max value of 0.8 μs is long enough to accommodate a full oscillation of period T ⊥,95 = 0.786 μs.
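A direct transcription of this rule into code might look as follows; boundary cases are treated with strict inequalities here, which the text leaves unspecified, and the example value of r 95 is back-calculated from the Fig. 4 example.

```python
def choose_tmax(r95_nm):
    """t_max values (microseconds) for a distribution with 95-percentile distance r95 (nm):
    t_mm = 3 * T_perp,95 with T_perp,95 = (r95/nm)^3 / (52 MHz)."""
    t_mm = 3 * r95_nm**3 / 52.0          # in microseconds
    if t_mm > 6.4:
        return [6.4, 3.2]
    if t_mm > 3.2:
        return [3.2, 1.6]
    if t_mm > 1.6:
        return [1.6, 0.8]
    if t_mm > 0.8:
        return [0.8]
    return [0.4]

choose_tmax(3.44)   # t_mm ~ 2.35 us -> [1.6, 0.8], consistent with the Fig. 4 example
```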
To find a reasonable set of time increments Δ t for each distribution, we first determined the longest allowable increment, Δ t max , that correctly samples all frequencies in the signal. This is dictated by the highest dipolar frequency in the time-domain signal, which corresponds to the shortest distance in the distribution. We use the period T ‖,05 of the parallel-orientation dipolar frequency corresponding to the 5-percentile distance. According to the Nyquist theorem, Δ t max must be less than T ‖,05 /2 to avoid frequency aliasing. To capture the short-distance tail of the distribution and to obtain enough time-domain points to achieve reasonable distance resolution in the Tikhonov fits in all cases, we chose Δ t max = T ‖,05 /6. This resulted in values from 8 ns to 280 ns. Δ t max now guides the choice of Δ t : If Δ t max > 200 ns, then Δ t is 50, 100, and 200 ns; if 100 ns < Δ t max < 200 ns, then Δ t is 20, 50, and 100 ns; if 50 ns < Δ t max < 100 ns, then Δ t is 10, 20, and 50 ns; if 20 ns < Δ t max < 50 ns, then Δ t is 10 and 20 ns; if 10 ns < Δ t max < 20 ns, then Δ t is 10 ns; and if Δ t max < 10 ns, then Δ t is 5 ns. For the example distribution in Fig. 4 , Δ t max = T ‖,05 /6 = 31.3 ns, so Δ t was set to 10 and 20 ns.
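Similarly, the Δt rule can be sketched as follows. It assumes that the parallel dipolar frequency is twice the perpendicular one, so that T ∥,05 = (r 05 /nm)³/(104 MHz); the example input is back-calculated from the Fig. 4 values.

```python
def choose_dt(r05_nm):
    """Time increments (ns) for a distribution with 5-percentile distance r05 (nm):
    dt_max = T_par,05 / 6 with T_par,05 = (r05/nm)^3 / (104 MHz)."""
    dt_max = 1e3 * r05_nm**3 / 104.0 / 6       # in ns
    if dt_max > 200:
        return [50, 100, 200]
    if dt_max > 100:
        return [20, 50, 100]
    if dt_max > 50:
        return [10, 20, 50]
    if dt_max > 20:
        return [10, 20]
    if dt_max > 10:
        return [10]
    return [5]

choose_dt(2.69)   # dt_max ~ 31 ns -> [10, 20], consistent with the Fig. 4 example
```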
To generate the noise-free time trace for a particular combination of t min , t max , and Δ t (with n t = 33, 41, 65, 81, 129, 161, 321, or 641 points), we used in Eq. (4) the high-resolution model distribution ( n 0 = 1341 points) and generated the n t -element time trace using an n t × n 0 kernel matrix. Therefore, the model distributions used for this forward modeling have a higher resolution than the ones obtained via Tikhonov regularization ( Eq. (5) ) of the simulated noisy data, with n r = n t . This avoids circular reasoning and committing the “inverse crime” [ 41 , 42 ]. Fig. 4 shows the synthetic time-domain traces resulting from the above rules for an example distribution.
The combinations of Δ t , t min , and t max resulted in a total of 20701 noise-free time-domain traces with between 33 and 641 points. A few traces with fewer than 33 points were thrown out, since such short traces are not acquired in practice.
Finally, we generated 30 noisy traces from each noise-free trace. We utilized uncorrelated Gaussian noise [ 14 ] with standard deviations σ = 0.02, 0.05, and 0.1. These noise levels span the range typically obtained experimentally for proteins. In experimental settings, the background removal step typically increases the magnitude of the noise at the end of the S ( t ) trace relative to the start. This effect is more pronounced for steeper background functions and can be negligible for shallower ones. We did not attempt to simulate this characteristic of the data. For each noise level, m = 10 noise realizations were generated using a random-number generator. For reproducibility, seeds for the random-number generator are stored as part of the test set.
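A reproducible noise-generation step along these lines could look as follows; the seed scheme and the dummy noise-free trace shown here are illustrative, the actual seeds being part of the published test set.

```python
import numpy as np

def add_noise(S_clean, sigma, seed):
    """One noisy realization with uncorrelated Gaussian noise of standard deviation sigma."""
    rng = np.random.default_rng(seed)
    return S_clean + sigma * rng.standard_normal(S_clean.shape)

# S_clean would come from the forward model S = K P (Eq. (4)); a dummy trace is used here
S_clean = np.ones(81)

# 3 noise levels x m = 10 realizations = 30 noisy traces per noise-free trace
noisy_traces = [add_noise(S_clean, sigma, seed)
                for sigma in (0.02, 0.05, 0.1)
                for seed in range(10)]
```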
Altogether, with these selections for t max , Δ t , σ , and m , the 5622 model distributions P 0 resulted in 621030 noisy time-domain traces S . These constitute the final test data set.
For each of the traces in the test set, we use 60 different variants of α selection methods to determine an appropriate regularization parameter value α sel and the associated Tikhonov solution. Mathematical details of the methods are given in the Appendix . Each method is referred to via a short acronym, which is listed in Table 1 . We evaluate each α selection method for each of the three regularization operators L (identity, first derivative, and second derivative). Thus, we examine Tikhonov regularization along both its degrees of freedom ( L and α ), with a total of 180 approaches.
Acronym | Full name |
---|---|
AIC | Akaike information criterion |
AICC | corrected Akaike information criterion |
BIC | Bayesian information criterion |
BP | balancing principle |
CV | leave-one-out cross validation |
DP | discrepancy principle |
EE | extrapolated error |
GCV | generalized cross validation |
GML | generalized maximum-likelihood |
hBP | hardened balancing principle |
ICOMP | information complexity criterion |
LC | L-curve, maximum curvature |
LR | L-curve, minimum radius |
LR2 | L-curve, minimum radius 2 |
MCL | Mallows’ C L |
mGCV | modified generalized cross validation |
NCP | normalized cumulative periodogram |
QO | quasi-optimality criterion |
rGCV | robust generalized cross validation |
RM | residual method |
SC | self-consistency method |
srGCV | strong robust generalized cross validation |
tDP | transformed discrepancy principle |
Each method determines α by minimizing or thresholding some cost function over α . To search for these points, we use an α range from 10 −3 to 10 3 with 61 points on a logarithmic scale, i.e. 10 points per decade (1 dB increments). All methods require the data S as input and need to calculate P αL over the entire α range for each L . For this, P αL can be determined with or without the constraint P ≥ 0 in Eq. (5) , and we test both variants in all cases. Several methods require knowledge of the time-domain noise standard deviation, σ . The cost functions of a few methods are equipped with a tuning or scaling parameter that can be adjusted to increase stability.
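Schematically, the per-trace workflow is a sweep over this α grid. In the sketch below, the cost function argument stands for any one of the minimizing criteria in Table 1 (e.g. the GCV functional sketched in the Appendix); thresholding criteria such as the discrepancy principle require a slightly different loop. All names are illustrative.

```python
import numpy as np

def select_alpha(K, S, L, cost_fn):
    """Sweep the alpha grid and return the alpha minimizing a given selection criterion."""
    alphas = np.logspace(-3, 3, 61)             # 10^-3 ... 10^3, 10 points per decade
    costs = [cost_fn(K, S, L, a) for a in alphas]
    return alphas[int(np.argmin(costs))]
```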
The analysis procedure also requires the choice of a distance range and resolution for P αL . We use a range of 1.0 nm to 7.0 nm for all cases. These limits encompass the full ranges of all model distributions in the test set. The number of points in the distance domain, n r , is determined separately for each case and is set equal to the number of points in the time-domain trace, n t .
Since the goal of analyzing the time-domain data is to reveal the underlying model distribution, we base our performance evaluation of the various selection methods not on α (which is a nuisance parameter whose numerical value is physically irrelevant), but on the model recovery error defined in Eq. (8) . It quantifies how close the calculated Tikhonov solution, P αL , for a given α and L is to the model distribution P 0 . This error is generally non-zero, since noise in the data and the fundamental ill-posedness of the problem prevent full recovery of the model from the data, no matter which α and L are used. As the actual performance measure, we use the inefficiency [ 43 ], defined as the ratio mre( α sel , L )/mre opt (Eq. (9)), with mre opt = min over α and L of mre( α , L ). This measure relates the model recovery error for a given L and the particular α value chosen by a selection method to the smallest possible model recovery error, obtainable with α opt and L opt ; this minimal error and the associated optimal solution P opt are found by minimizing the mre as a function of α and L . The inefficiency equals 1 when P αL = P opt and is greater otherwise. The closer the inefficiency is to 1, the better the method.
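In code, the inefficiency of a selected solution could be computed as in the following sketch, assuming the model distribution P 0 has been interpolated onto the same distance grid as the Tikhonov solutions (an assumption of this illustration).

```python
import numpy as np

def rmsd(P, P0):
    return np.sqrt(np.mean((P - P0)**2))

def inefficiency(P_selected, P_candidates, P0):
    """Ratio of the model recovery error of the selected solution to the smallest
    error achievable over all candidate (alpha, L) solutions; equals 1 at the optimum."""
    mre_opt = min(rmsd(P, P0) for P in P_candidates)
    return rmsd(P_selected, P0) / mre_opt
```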
Figure 5 shows an overview of the calculated inefficiencies for all 621030 time traces, 60 α selection methods, and three regularization operators. Each histogram shows the distribution of inefficiencies for a particular combination of method and operator. The methods are sorted by increasing 75 th -percentile inefficiency from top left to bottom right. There is a wide range of performances. Most methods in the top row achieve low inefficiencies that are sharply peaked at 1. Methods shown lower down have a propensity towards worse inefficiencies. The high-inefficiency tails of the histograms are not shown, but are protracted in many cases, indicating a not insignificant failure rate for those methods. It is apparent from the histograms that the L 0 operator, shown in blue, tends to perform worse than the others for the same method, except for some with overall bad performance. For most methods, L 1 and L 2 appear to give similar results.
To compare the method/operator combinations more quantitatively, we use the 99 th percentile from each inefficiency histogram as the metric. This stringent choice is motivated by the consideration that a method/operator combination can be deemed good and safe if it delivers overall low inefficiencies with negligible risk of large inefficiencies (i.e. severe under- or over-smoothing). Figure 6 shows the results. There is a group of methods that perform similarly well, leading to small 99 th -percentile inefficiencies just above 2. Mallows’ C L (MCL), the Akaike information criterion (AIC) and the generalized cross validation (GCV) perform equally well with L 1 and L 2 , whereas the generalized maximum likelihood method and its unconstrained variant (GML and GMLu) perform well only with L 1 . L 0 is the worst-performing operator for these top methods. The methods ranking below this top group are parameterized modifications of GCV and AIC. Since they are inferior to their parent methods, they can be disregarded. Currently, the most commonly applied α selection methods are based on the L-curve. The results show that the L-curve methods LC, LCu, LR, LRu, LR2, LR2u are not competitive with the top group. Their overall tendency is to oversmooth. Similarly, the SC and SCu methods display elevated inefficiency for DEER. The computational cost of all methods is essentially identical, since it is dominated by the solution of the Tikhonov minimization problem for each α .
The performance comparison shown in Fig. 6 is based on our particular choice of metric, the 99 th percentile of the ratio of rmsds, defined in Eq. (9) . Varying the percentile to 90, 75, or 50 does not significantly affect the composition of the top group, although it affects the detailed rankings (see SI ). Altering the definition of inefficiency in Eq. (9) by using the difference instead of the ratio, or by using the maximum absolute deviation (mad) instead of the rmsd, yields similar results (see SI ). Therefore, we conclude that the identity of the top methods is insensitive towards the particular choice of ranking metric, and that our assessment procedure is overall robust.
To check for uneven performance for certain subsets of the test set, we examined the performance across noise level, the ratio of iqr to mode (a measure of the damping rate of oscillations in S ), number of points, and number of modal periods, using the 99 th percentile rmsd ratio inefficiency metric. The breakdowns are available in the SI . For all values of these variables, MCL, AIC, GCV, GML, and GMLu are usually within 5% of the optimal method/operator combination and never deviate by more than 11%. The specific rankings vary between subsets, but actual performance changes to such a small degree that this is not significant. The conclusions reached above regarding the three operators generalize to the subsets considered in the SI. In addition to poor performance relative to L 1 and L 2 , L 0 also displays much higher variation in performance across test subsets. Several variations on GCV (mGCV, mGCVu, rGCV, and rGCVu) with varied tuning parameter values, as well as AICC, come within 10% of the best case for many of the test subsets. However, none display sufficient consistency or quality of performance to surpass MCL, AIC, GCV, GML, or GMLu.
Since there is no correlation between the performance of any top method and these subset characteristics, the results are likely to be applicable to situations with different relative representations of distance distribution characteristics, such as other spin labels or proteins with more β sheets than T4 lysozyme.
The performance differences among the top method/operator combinations are so small that we cannot identify one as an evident best choice. However, additional considerations can provide some guidance. MCL requires the time-domain noise variance σ 2 as an input. An under- or overestimation of σ 2 , which is likely in experimental settings, will affect the performance and likely degrade it. GML and GMLu depend on a thresholding value to remove eigenvalues close to zero, and the choice of this value can affect the performance. In contrast, AIC and GCV do not require a priori knowledge of σ 2 nor do they depend on thresholding or tuning parameters. Therefore, the parameter-free AIC and GCV methods with either L 1 or L 2 appear to be the simplest, best, and safest choices for practical applications.
Due to the diversity of distribution shapes and time traces, and due to the wide range of inefficiencies, it is impossible to visualize method performance with a few sample datasets. Nevertheless, for the sake of illustration, Figure 7 shows the relative performance of LR2u, GCV, and AIC for a typical test dataset. This example is typical in the sense that the inefficiency of each method is very close to the method’s median inefficiency. The figure shows that the L-curve has a tendency to oversmooth, and that AIC and GCV perform similarly well. Fig. S1 in the Supplementary Material illustrates the uncertainty in the extracted P ( r ) based on these α selection methods, quantified using Bayesian inference [ 14 ].
Our performance analysis of a large number of regularization parameter selection methods over a large set of protein-based synthetic DEER data indicates that there are several method/operator combinations that perform equally well. Among these, the Akaike information criterion (AIC) and the generalized cross validation (GCV) are preferable, as they are parameter-free and do not require accurate knowledge of the noise level. They work equally well with the first- and second-derivative operator, but not with the identity. L-curve methods, some of which are currently widely employed, perform distinctly worse.
The structure-based test set developed in this work is useful beyond Tikhonov regularization, as it can be used to assess the performance of other existing analysis methods (truncated singular-value decomposition, Gaussian mixture models) and of any new solution approaches.
This work was supported by the National Science Foundation with grant CHE-1452967 (S.S.) and by the National Institutes of Health with grants R01 EY010329 (S.S.) and T32 GM008268 (T.H.E.). This work was further facilitated though the use of advanced computational, storage, and networking infrastructure provided by the Hyak super-computer system funded by the STF at the University of Washington.
Here, we summarize all α selection methods included in this study and give the expressions necessary to implement them. Further details about the methods can be found in the cited statistics literature.
Most methods are derived assuming unconstrained Tikhonov regularization, i.e. without the P ≥ 0 constraint in Eq. (5) . In this case, the solution P αL can be expressed in closed form as P αL = (K^T K + α ² L^T L)^−1 K^T S.
For the physically relevant constrained problem (with P ≥ 0), a closed-form solution is not possible, and we obtain P αL via an iterative optimization algorithm [ 44 ]. In either case, once P αL is obtained, the time-domain fit is S αL = K P αL .
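A compact sketch of the unconstrained closed-form solution, the corresponding fit, and the influence matrix used by several criteria below (illustrative names):

```python
import numpy as np

def tikhonov_unconstrained(K, S, L, alpha):
    """Closed-form unconstrained Tikhonov solution, time-domain fit, and influence matrix."""
    A = K.T @ K + alpha**2 * (L.T @ L)
    P = np.linalg.solve(A, K.T @ S)              # P_{alpha,L}
    H = K @ np.linalg.solve(A, K.T)              # influence matrix H_{alpha,L}
    return P, K @ P, H                           # (P, S_fit, H)
```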
We apply each method described below in two ways, one using the unconstrained solutions and one using the constrained solutions. The methods are referred to by short acronyms (see Table 1 ). A suffix ’u’ is appended to the method acronym when unconstrained P αL are used. In addition, if the method contains a tuning or scaling parameter, then its value is appended to the acronym as well.
Several methods select α based on heuristic considerations about a parametric log-log plot of the Tikhonov penalty term, η = ‖ LP αL ‖, against the residual norm, ρ = ‖ S − S αL ‖, as a function of α [ 45 , 46 ]. Such a plot shows a monotonically decaying η ( ρ ) and is called the L-curve since it tends to form a characteristic “L” shape with a visual corner. The optimal value for α is considered to correspond to this corner, since it intuitively represents a reasonable balance between the fitting error ρ and the regularization error η . The corner is not a mathematically defined quantity, therefore different operational definitions of locating this corner exist.
One of them is the maximum-curvature method (LC) [ 46 ]. It selects the α corresponding to the point of maximum (positive) curvature of the L-curve, given by κ ( α ) = ( ξ ′ ζ ″ − ξ ″ ζ ′)/[( ξ ′)² + ( ζ ′)²]^(3/2), where ξ = lg ρ , ζ = lg η , and the primes indicate derivatives with respect to lg α (or α ). This method picks the global maximum of the curvature, even though there can be several local maxima.
Another possible definition of the corner is the point closest to the lower left corner of the L-curve plot [ 45 ]. DeerAnalysis uses one implementation of this idea [ 7 ]. The two L-curve coordinates, lg ρ and lg η , are evaluated over a range of α values and then rescaled to the interval [0, 1]. The corner is determined as the α value whose rescaled coordinates lie closest to the origin, i.e. the one that minimizes the sum of their squares.
The results from this method depend on the α range. We refer to this method as the minimum-radius method and use two variants of it. In the first one (LR), we use the same α range as for all other methods (10^−3 to 10^3), and in the other one (LR2), we employ the same α range as DeerAnalysis2016. The latter extends from the largest generalized singular value of K and L , s max , down to c·s max with c = 16 ε · 10^6 · 2 σ /0.0025, where ε ≈ 2.2 · 10^−16 is the machine epsilon and σ is the noise level. In order to keep the range large enough, we limit c to ≤ 10^−6.
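A sketch of the minimum-radius corner search, assuming (as described above) that the rescaled coordinates are the logarithms of ρ and η:

```python
import numpy as np

def lcurve_min_radius(alphas, rho, eta):
    """Pick the alpha whose rescaled L-curve point (lg rho, lg eta) lies closest to the origin."""
    x, y = np.log10(rho), np.log10(eta)
    x = (x - x.min()) / (x.max() - x.min())      # rescale to [0, 1]
    y = (y - y.min()) / (y.max() - y.min())
    return alphas[int(np.argmin(x**2 + y**2))]
```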
We use several methods based on the idea of cross validation. Leave-one-out cross validation (CV) [ 47 , 48 ] is conceptually the simplest of them. For a given α , it minimizes the total prediction error, which is obtained as the sum of the prediction errors for each individual data point. For that, a single data point is excluded and a fit to the remaining data is calculated. That fit is judged by its ability to reproduce the excluded data point. Repeating this for each data point gives the total prediction error, and the α value that minimizes it is selected. This procedure can be condensed into the following expression: α sel = argmin over α of Σ i [ ( S i − S αL,i )/(1 − H αL ( i , i )) ]²,
with the α -dependent influence matrix H αL = K (K^T K + α ² L^T L)^−1 K^T.
Generalized cross validation (GCV) [ 49 , 50 ] is very similar to leave-one-out cross validation, but the diagonal matrix element H αL ( i , i ) in the denominator is replaced by the average of all diagonal elements, which renders the method more stable. The expression simplifies to GCV( α ) = ‖ S − S αL ‖²/[1 − tr( H αL )/ n t ]², and the α with the smallest GCV value is selected.
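A sketch of the GCV criterion in its unconstrained form (GCVu), using the influence matrix introduced above:

```python
import numpy as np

def gcv_cost(K, S, L, alpha):
    """GCV(alpha) = ||S - S_fit||^2 / (1 - tr(H)/n_t)^2 for the unconstrained solution."""
    n_t = S.size
    A = K.T @ K + alpha**2 * (L.T @ L)
    H = K @ np.linalg.solve(A, K.T)
    resid = S - H @ S                            # S_fit = H S in the unconstrained case
    return float(resid @ resid) / (1 - np.trace(H) / n_t)**2
```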
The modified GCV (mGCV c ) [ 51 , 52 ] method is a simple tunable modification of GCV, intended to stabilize the method further; it replaces tr( H αL ) in the denominator by c · tr( H αL ), i.e. mGCV( α ) = ‖ S − S αL ‖²/[1 − c · tr( H αL )/ n t ]², with the tuning parameter c ≥ 1. We use this with c = 1.2, 1.5, 2, and 3, to test a range of stabilizations.
The robust GCV (rGCV γ ) method [ 53 , 54 ] is also designed to exhibit greater stability than the GCV method. It is given by rGCV( α ) = [ γ + (1 − γ ) μ ( α )] · GCV( α ), with μ ( α ) = n t ^−1 tr( H αL ²) and the tuning parameter γ ≤ 1. With γ = 1, the method reduces to GCV. As γ gets smaller, the method becomes increasingly stable and less likely to undersmooth; the smaller γ , the larger the contribution of the second term in the bracket. We examine γ values of 0.1, 0.5, and 0.9.
Strong robust GCV (srGCV γ ) [ 55 ] is another tunable modification to GCV that is based on stronger statistical arguments than rGCV. It selects α via
with the tuning parameter γ ≤ 1. Like rGCV, with γ = 1, the method reduces to GCV. We use γ values of 0.8 and 0.95.
This criterion, from Tikhonov and coworkers [ 56 – 58 ], selects α such that a small change in α away from the selected value has a minimal effect on the resulting P αL ; in practice, it minimizes the norm of the change in the solution between adjacent values on the α grid, ‖ P α i+1 , L − P α i , L ‖.
The underlying rationale is that the model recovery error is flat at its minimum.
This principle [ 59 – 61 ] is predicated upon the idea that the root-mean-square residual, ‖ S − S αL ‖/√ n t , should be on the order of the noise standard deviation in the data, σ . It requires a priori knowledge of σ . The value for α is selected as the largest value such that ‖ S − S αL ‖/√ n t ≤ τ σ ,
where τ is a safety factor ≥ 1 to guard against under-smoothing in the case σ is underestimated. We use this principle with τ = 1 and 1.5.
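A sketch of the discrepancy principle over a discrete α grid (unconstrained solutions for simplicity; names illustrative):

```python
import numpy as np

def discrepancy_alpha(K, S, L, alphas, sigma, tau=1.0):
    """Largest alpha whose rms residual does not exceed tau * sigma."""
    n_t = S.size
    selected = None
    for alpha in sorted(alphas):                          # ascending; keep the last that passes
        A = K.T @ K + alpha**2 * (L.T @ L)
        P = np.linalg.solve(A, K.T @ S)
        if np.sqrt(np.sum((S - K @ P)**2) / n_t) <= tau * sigma:
            selected = alpha
    return selected
```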
The transformed discrepancy principle (tDP τ ) [ 62 – 64 ] performs the comparison in the distance domain, choosing the largest α that satisfies
where . We use τ = 1.5.
The balancing principle (BP) [ 65 , 66 ] balances the propagated noise error with the unknown model recovery error. Using
and , it selects α sel as the largest α value that satisfies B ( α ) ≤ 1. The hardened balancing principle (hBP) [ 66 ] selects α using
This method [ 43 , 67 ] determines α by minimizing a scaled norm of the residual vector
where B = K T (I − H αL ). The scaling penalizes under-smoothing.
This method [ 25 , 26 ], utilized in FTIKREG, seeks to minimize the sum of the estimated model recovery error and the propagated data noise variance.
This expression is valid for the unconstrained case. In order to include the non-negativity constraint, further steps are necessary. First, α sel for the unconstrained case is determined. Next, the constrained solution is determined using this α value and the indices q of the active non-negativity constraints are stored. Finally, the columns of K and L with indices q are removed and Eq. (25) is re-evaluated.
This method [ 68 ] selects the α that maximizes the likelihood (or, equivalently, minimizes the negative log-likelihood) and is given by
where det nz (·) indicates the product of the non-zero eigenvalues, and m is their number. To account for numerical errors, we treat all eigenvalues with magnitude below 10^−8 as zero.
This method [ 69 – 71 ] minimizes an estimate of the regularization error via
This method [ 72 – 74 ] is based on the idea that the power spectrum of the residuals should match the power spectrum of the noise. The unscaled power spectrum (periodogram) for a given α is a vector p with elements p k = |[dft( S − S αL )] k |², where k = 1, …, n t and dft refers to the discrete Fourier transform. The normalized cumulative periodogram is an ( n t − 1)-element vector c with elements c i = ‖( p 2 , …, p i+1 )‖ 1 /‖( p 2 , …, p n t )‖ 1 , where ‖…‖ 1 is the ℓ 1 norm (sum of absolute values). The zero-frequency component, p 1 , is omitted. c represents the integrated power spectrum of the residuals. The α value is selected that minimizes the deviation ‖ c − c noise ‖ between c and the integrated power spectrum c noise expected for the noise,
where c noise is a ( n t − 1)-element vector, with elements c noise, i = i /( n t − 1) for white noise.
This method [ 75 ] minimizes an approximation to the model recovery error, derived under the assumption of unconstrained regularization and uncorrelated Gaussian noise; it selects the α that minimizes ‖ S − S αL ‖² + 2 σ ² tr( H αL ).
This requires the knowledge of the noise level σ .
In the context of information theory [ 76 , 77 ], the set of P αL is regarded as a set of candidate models, and criteria have been developed that select a parsimonious model balancing a small fitting error against low model complexity. The general expression is α sel = argmin over α of [ n t ln(‖ S − S αL ‖²/ n t ) + c · tr( H αL ) ],
where the second term is a measure of model complexity, which decreases with increasing α . The term tr( H αL ) can be regarded as the effective number of free parameters in the model. The constant c depends on the particular criterion: c = 2 for the Akaike information criterion (AIC) [ 78 ], c = 2 n t /( n t − tr( H αL ) −1) for the corrected AIC (AICC) [ 79 , 80 ], and c = ln( n t ) for the Bayesian information criterion (BIC) [ 81 ]. The information complexity criterion (ICOMP) [ 82 , 83 ] is yet another information-theoretical procedure and is given by
where the two quantities in the last term are the arithmetic and geometric means of the singular values of (K^T K + α ² L^T L)^−1. The last term penalizes interdependence among model parameters. In contrast to the other information-theoretical criteria, this one requires knowledge of the noise level σ .
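For illustration, a sketch of the AIC variant of the general information criterion above, evaluated with the unconstrained solution (the AICu variant in the notation of Table 1); the specific form n t ln(RSS/ n t ) + 2 tr( H ) is the standard Gaussian-error AIC assumed here.

```python
import numpy as np

def aic_cost(K, S, L, alpha):
    """AIC(alpha) = n_t * ln(RSS/n_t) + 2 * tr(H) for the unconstrained solution."""
    n_t = S.size
    A = K.T @ K + alpha**2 * (L.T @ L)
    H = K @ np.linalg.solve(A, K.T)
    rss = float(np.sum((S - H @ S)**2))
    return n_t * np.log(rss / n_t) + 2 * np.trace(H)
```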
Supplementary material
The online supplementary material contains (a) tables of relative performance based on different metrics and percentiles, including breakdowns into subsets based on data characteristics, (b) Bayesian uncertainty analysis for the test case in Fig. 7 . The complete set of synthetic model distributions and test time-domain traces is freely available for download from the University of Washington ResearchWorks public repository at http://hdl.handle.net/1773/40984 .