After some work on singular equations and quasi-Newton methods (both were a central part of my thesis in 1980), I became engrossed in automatic or algorithmic differentiation (AD) shortly after my move to Argonne National Laboratory in 1987. This was as much an effort in seeking out and bringing together different communities as in doing mathematical analysis. After all, there is not much left to discover about the rules of (smooth) differentiation, except that they can be applied backward in what we call the reverse mode. Bert Speelpenning had it all in his thesis, finished in 1980, though as we all know now, there were plenty of antecedents.
The only thing I got wrong in my first paper, published in the proceedings of the ISMP in Tokyo in 1989, was to grossly overestimate the complexity of higher derivatives. Also, with due respect to subsequent developers, ADOL-C, as I wrote it in 1988 during a two-week visit at ZIB in Berlin, already had the key capabilities of AD tools based on operator overloading. Of course, there was no sparsity, no cross-country elimination, no parallelism and no checkpointing. All of these things are important and required a lot of imagination and hard work to implement. I am proud to say that my former students Jean Utke, Uwe Naumann and Andrea Walther have played a major role in spreading the gospel by co-authoring books and developing top-notch software. My latest technical contribution to AD as such was an article with Andrea Walther and Kshitij Kulshreshtha that demonstrates the backward stability of first derivative evaluations in an appropriate sense. Most recently, I contributed a short survey on automatic differentiation to the Princeton Companion to Applied Mathematics, edited by Nick Higham.
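To convey the operator-overloading idea in a few lines (this sketch is my own illustration, not ADOL-C, which is a C/C++ library): every arithmetic value carries its directional derivative along, so differentiation happens as a by-product of evaluation. The sketch shows the forward mode; the reverse mode instead records a tape during evaluation and propagates adjoints backward through it.

```python
# Minimal forward-mode AD via operator overloading (illustrative sketch).
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot   # value and directional derivative

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):           # product rule for the derivative part
        other = self._lift(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def f(x):                # code written for plain numbers works unchanged
    return x * x + 3 * x + 1

y = f(Dual(2.0, 1.0))    # seed the input derivative dx/dx = 1
print(y.val, y.dot)      # f(2) = 11, f'(2) = 2*2 + 3 = 7
```

The point is that the user's evaluation code needs no modification beyond the type of its arguments, which is precisely what makes overloading-based tools so convenient to retrofit onto existing programs.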
Since coming to Berlin in 2003 I have worked mainly on applications of AD in various areas of nonlinear scientific computing. The main thrust was a total quasi-Newton approach to nonlinear optimization, based on the theoretical observation that Jacobians and Hessians can be orders of magnitude more expensive to evaluate than their products with vectors from the left or the right. For the problems in the popular test set CUTEr, however, that is never the case, as their complete Jacobians are never more than 20 times as expensive to evaluate as the underlying vector function. Another effort, which has soft-stalled, is the analysis and numerical solution of DAEs on the basis of the computational graph of the defining equations. This generalized structural analysis continues to be viewed as mathematically unsound by the DAE mainstream.
For the last six years I have worked extensively with the groups of my former associate Nicholas Gauger, now at Aachen, and my colleague Thomas Slawig on general schemes for one-shot design optimization. Our joint project within the DFG priority program SPP 1253 has just come to an end. Again the focus was on flexible and adaptive procedures that minimize the human effort needed to turn a legacy simulation code into a reasonably efficient optimization code. The results of the project have been reported and are compared to similar approaches in a survey paper.
Finally, for the last three years I have been involved in an extensive collaboration with physicists on the application of quasi-Monte Carlo methods to problems from high-energy physics. This is currently a project within the Center of Computational Sciences Adlershof and will hopefully soon be supported by a DFG grant. So far, a conference proceedings contribution has appeared and a regular journal paper has been accepted. On the mathematical side, the project has mainly been carried by the doctoral student Hernan Leovey, who is a scientific jack of all trades.
The new departure:
Even though I am supposedly an expert on differentiation in some sense, I never seriously looked at the tremendous development in nonsmooth analysis and generalized differentiation until some five years ago. The convex origins seemed too restrictive, and nothing seemed to be implementable in an automated, AD-like fashion. Since generalized differentiation rules are merely inclusions rather than strict equalities, every family of problems apparently needs to be analyzed by a skillful mathematician before it becomes computationally tractable. While in some schools of economics and mechanics the Clarke calculus is now well understood and occasionally applied, nonsmooth analysis remains a mystery to most computational practitioners, and there is no general-purpose software implementation.
I think all that can be changed, at least under the following restrictions on the problem specific functions in question:
The functions of interest are set in finite dimensions and piecewise smooth.
The selection and evaluation of the smooth pieces is integrated in an evaluation procedure of uniformly limited length, invoking a finite set of elemental functions, including the absolute value function.
While these may appear to be serious limitations, one may allow, on the other hand, that:
Nonsmoothness may be superimposed, so that generalized derivative cancellations can occur.
The last effect cannot arise for standard complementarity equations and other simply switched systems, where nonsmooth elemental functions occur only independently of each other. Only in such cases do I consider the usual claim, that evaluating 'just one' generalized Jacobian poses no problem at all, to be justified.
In fact, similar to the approach of Paul Barton and Kamil Khan, we have found a way to constructively calculate generalized Jacobians that are 'conically active', a property that is slightly stronger than 'essentially active' in the sense of Scholtes. Wherever the function in question is Fréchet differentiable, only its proper Jacobian is conically active. The generalized derivative cancellation effect mentioned above is properly treated in this approach.
Our construction is based on the piecewise linearization of piecewise smooth functions in an AD-like fashion and can be found in 'On stable piecewise linearization and generalized algorithmic differentiation'. This piecewise linear model can be viewed as a globally defined generalized Taylor expansion with an error that is of second order in the distance to the development point. Moreover, in contrast to all other generalized derivative concepts, which are at best outer semi-continuous in the sense of multifunctions, our nonhomogeneous piecewise linear model varies Lipschitz-continuously with respect to the development point. This is an absolutely indispensable property for tools aimed at the construction of stable numerical algorithms.
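A hypothetical one-dimensional instance may illustrate the construction (the function, base point, and increments below are my own illustrative choices): smooth elementals are replaced by their tangents, while the absolute value is kept exactly, and the model error remains of second order in the increment even when the increment crosses a kink.

```python
import math

# Piecewise smooth example function with a kink at x = 1 and x = -1.
def f(x):
    return math.sin(x) + abs(x * x - 1.0)

# Increment of its piecewise linearization at base point x0:
# sin is replaced by its tangent, the inner smooth value x^2 - 1
# by its tangent increment, but abs(...) is kept exactly.
def delta_f(x0, dx):
    u0 = x0 * x0 - 1.0          # inner smooth value at the base point
    du = 2.0 * x0 * dx          # its tangent increment
    return math.cos(x0) * dx + abs(u0 + du) - abs(u0)

# The model error is O(dx^2), even across the kink at x = 1.
x0 = 0.9
for dx in (0.2, 0.1, 0.05):
    err = abs(f(x0 + dx) - f(x0) - delta_f(x0, dx))
    print(dx, err)
```

Running the loop shows the error shrinking with the square of the step, in line with the generalized Taylor expansion property claimed above; an ordinary (one-sided) linearization would only achieve first-order accuracy across the kink.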
Another essential aspect is the effective representation and manipulation of piecewise linear functions. The more or less standard explicit representation by linear pieces on polyhedral subdomains is fraught with redundancy, inconsistency, and combinatorial complexity. Here we have proposed a representation in abs-normal form, which arises naturally from the linearization of an underlying piecewise smooth evaluation procedure. The representation has been implemented in a package called PLAN-C (Piecewise Linear Abs-Normal function in C). It already contains methods for accumulating generalized Jacobians, checking for coherent orientation, and solving systems by several Newton and fixed-point variants.
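In the abs-normal form, a piecewise linear map is written as z = c + Zx + L|z| and y = b + Jx + Y|z|, where z collects the switching variables and L is strictly lower triangular, so that z can be computed one switch at a time. PLAN-C itself is written in C; the following Python sketch, with a toy instance of my own choosing, shows only the evaluation of such a form.

```python
# Evaluate a piecewise linear function given in abs-normal form
#   z = c + Z x + L |z|,   y = b + J x + Y |z|,
# with L strictly lower triangular (illustrative sketch, not PLAN-C).
def eval_abs_normal(c, Z, L, b, J, Y, x):
    z = []
    for i in range(len(c)):              # forward substitution through switches
        zi = c[i] + sum(Z[i][k] * x[k] for k in range(len(x)))
        zi += sum(L[i][k] * abs(z[k]) for k in range(i))
        z.append(zi)
    y = [b[j] + sum(J[j][k] * x[k] for k in range(len(x)))
              + sum(Y[j][k] * abs(z[k]) for k in range(len(z)))
         for j in range(len(b))]
    return y, z

# Toy instance: encode y = |x1| + |x2 - |x1|| with two switching variables
# z1 = x1 and z2 = x2 - |z1|.
c, Z, L = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [-1.0, 0.0]]
b, J, Y = [0.0], [[0.0, 0.0]], [[1.0, 1.0]]

y, z = eval_abs_normal(c, Z, L, b, J, Y, [0.5, -2.0])
print(y[0])   # |0.5| + |-2.0 - 0.5| = 3.0
```

The strict lower triangularity of L reflects the fact that each switching variable may depend only on earlier ones in the evaluation procedure, which is exactly what the linearization of a piecewise smooth program produces.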
As of now, we have considered three main applications of the piecewise linearization approach:
(i) The unconstrained optimization of a piecewise smooth f : ℝⁿ → ℝ
(ii) The solution of equations F(x) = 0 for piecewise smooth F : ℝⁿ → ℝⁿ
(iii) The numerical integration of ODEs ẋ = F(x) for piecewise smooth F : ℝⁿ → ℝⁿ
In the nonsmooth setting, constrained optimization can always be reduced to unconstrained optimization by exact penalization using the L1 or L∞ norm. Between (i) and (ii) lies least squares fitting for F : ℝⁿ → ℝᵐ, where special adaptations have already been considered but not yet implemented. Similarly, a specialization of (iii) should yield new methods for the numerical quadrature of Lipschitzian integrands.
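The exact penalization idea fits naturally into this piecewise smooth framework, since the penalty term itself is built from absolute values. A hypothetical one-dimensional example (objective, constraint, and penalty weight are my own choices): minimizing x² subject to x = 1 via the L1 penalty p(x) = x² + ρ|x − 1| recovers the constrained minimizer x* = 1 exactly, once ρ exceeds the magnitude of the multiplier, here |f'(1)| = 2.

```python
# Exact L1 penalization sketch: min x^2 s.t. x = 1
# becomes the unconstrained nonsmooth problem min p(x).
def penalty(x, rho):
    return x * x + rho * abs(x - 1.0)

rho = 3.0                                   # > 2, so the penalty is exact
grid = [i / 1000.0 for i in range(-2000, 3001)]
x_best = min(grid, key=lambda x: penalty(x, rho))
print(x_best)   # 1.0 (the constrained minimizer, attained exactly)
```

The crude grid search is only for illustration; the point is that no smoothing of the penalty is needed, since the kink at x = 1 is exactly the kind of absolute-value nonsmoothness that the piecewise linearization machinery handles natively.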
A much more significant extension will be the generalization of (ii) and (iii) to algebraic and differential inclusions, where neither the original F nor its local piecewise linearizations are required to be continuous. A symmetric scenario of an algebraic inclusion is F = ∂f with f as in (i). Moreover, it was also found, based on the analysis of Hiriart-Urruty and Lemaréchal, that (i) can also be solved by following a true steepest descent trajectory ẋ ∈ −∂f(x).
Generally, for (iii) with F the convex, outer semi-continuous envelope of a piecewise smooth function, the theory of Filippov ensures the existence of some solution. However, that solution is usually not unique unless some regularization is applied. Even further beyond lie optimal control and the general calculus of variations, where F is a true multi-function, i.e., set-valued on open parts of its domain.
The grand finale:
As of now I would like to dedicate the rest of my professional life to a project tentatively titled
'Nonsmooth Numerics via Piecewise Linearization'
It should take at least 3 and hopefully not more than 5 years to become self-sustaining. The components should be, in reverse order of completion:
(v) A book similar to the SIAM piece on AD. (I feel that one was really well done and has held up in both theory and sales.)
(iv) Packages for solving the three main applications considered above and hopefully some of their extensions, especially to algebraic and differential inclusions.
(iii) Analytical 'downloads' of the existing non-smooth theory of Mordukhovich and others to the piecewise smooth case in our limited sense.
(ii) Extension of the core package PLAN-C for performing basic piecewise linear algebra tasks exploiting structure, parallelism etc.
(i) Provision of front-ends from various AD tools to fill PLAN-C with piecewise linear models in abs-normal form. Possibly some feedback for outer-loop solvers.
At some bare-bones level I can do all of that by myself. Since I have neither the opportunity nor the interest to hang around in Berlin beyond my retirement date in early 2015, I won't be able to rely on the support of undergraduate students much longer. Several of the present ones are quite enthusiastic and productive; it's a shame I can't offer them an option to continue. I have never been able to manage big groups effectively anyway, so collaboration with some colleagues, visitors, and postdocs will have to do the trick. I'll be looking for a host institution outside Germany that supports me with some funds for travel and accommodation for myself and some visitors.