The following notational conventions are applied in this disclosure. We use x and y to denote scalar reals through subsection 1-5 and arbitrary VLAD values from subsection 1-6 on. We use boldface x and y to denote real vectors and X and Y to denote real matrices. We often use x and its typographic variants to denote function arguments and y and its typographic variants to denote function results. We use primes and subscripts to denote distinct names and brackets, as in x[i], to denote selection of vector components. We take 1 to be the index of the first element of vectors (and lists) and of the first row and column of matrices. We use comma to denote pair formation and CAR and CDR to denote the functions that extract the elements of pairs. We use e to denote expressions and τ to denote VLAD types. We use f and g to denote functions from real vectors to real vectors through subsection 1-5 and procedures of type τ1→τ2 from subsection 1-6 on. We use α to denote functions of type ℝ→ℝ, b to denote functions of type ℝ→ℝ→ℝ or (ℝ×ℝ)→ℝ, and p to denote functions of type τ→boolean. We use juxtaposition of a function and its argument to denote function application, of two functions to denote function composition, as in (g f) x = g (f x), of a matrix and a vector to denote matrix-vector multiplication, of two matrices to denote matrix-matrix multiplication, and of two scalars to denote ordinary multiplication. Note that matrices can be viewed as linear functions; thus matrix-vector multiplication is application of a linear function and matrix-matrix multiplication is composition of linear functions. Scalars can be viewed as one-element vectors or 1×1 matrices; thus ordinary multiplication can be viewed as either function application or function composition.
We use infix e1+e2 and e1⊕e2 to denote +(e1, e2) and PLUS (e1, e2), and $\sum_{i=1}^{n} e_i$ to denote e1+ . . . +en. Comma associates to the right; juxtaposition, +, and ⊕ associate to the left; and vector-component selection has highest precedence, followed by comma, then juxtaposition, and then + and ⊕. The scope of lambda expressions and summations extends as far right as possible. We use parentheses solely to specify precedence. Note that function composition and matrix multiplication are associative. More generally, a juxtaposition sequence denoting either a sequence of function compositions followed by a function application or a sequence of matrix-matrix multiplications followed by a matrix-vector multiplication is associative. We use $X^{\mathsf T}$ to denote the transpose of a matrix X. Recall that $(X\,Y)^{\mathsf T} = Y^{\mathsf T}\,X^{\mathsf T}$ and, more generally,

$$(X_1 \ldots X_n)^{\mathsf T} = X_n^{\mathsf T} \ldots X_1^{\mathsf T}.$$
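As a quick illustration of the transpose identity, the following Python sketch multiplies three small matrices with arbitrary integer entries, stored as lists of rows. The helper names transpose and matmul are our own and are not part of this disclosure.

```python
def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Matrix-matrix product: (X Y)[i][j] = sum_k X[i][k] * Y[k][j]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


# Arbitrary illustrative matrices with compatible shapes (3x2, 2x3, 3x2).
X1 = [[1, 2], [3, 4], [5, 6]]
X2 = [[0, 1, 2], [1, 0, -1]]
X3 = [[2, 0], [1, 1], [0, 3]]

lhs = transpose(matmul(matmul(X1, X2), X3))                        # (X1 X2 X3)^T
rhs = matmul(matmul(transpose(X3), transpose(X2)), transpose(X1))  # X3^T X2^T X1^T
print(lhs == rhs)  # True
```

Integer entries keep the comparison exact, so equality can be tested directly.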
In the next subsection, we give an unconventional derivation of forward- and reverse-mode AD. The particular derivation that we present lends itself to extension to functional programming languages.
1-2.1 Noncompositionality of the Jacobian Operator
An operator ◯ is compositional if ◯ (g f)=(◯g) (◯f) and, more generally, ◯ (fn . . . f1)=(◯fn) . . . (◯f1). If we take f1, . . . , fn to be primitives of some programming language and fn . . . f1 to be a program constructed out of those primitives, then compositional operators have a desirable property: one can compute ◯ (fn . . . f1) by an abstract interpretation of fn . . . f1, interpreting each fi abstractly as ◯fi.
We can see that the Jacobian operator is not compositional. The chain rule states that:
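$$\mathcal{J}\,(g\,f)\,x = (\mathcal{J}\,g\,f\,x)\,(\mathcal{J}\,f\,x)$$

and, more generally, that:

$$\mathcal{J}\,(f_n \ldots f_1)\,x = (\mathcal{J}\,f_n\,f_{n-1} \ldots f_2\,f_1\,x)\,(\mathcal{J}\,f_{n-1} \ldots f_2\,f_1\,x) \ldots (\mathcal{J}\,f_2\,f_1\,x)\,(\mathcal{J}\,f_1\,x)$$

The factor $\mathcal{J}\,g\,f\,x$ is the Jacobian of g evaluated at f x rather than at x, so computing $\mathcal{J}\,(g\,f)$ requires f x, which neither $\mathcal{J}\,g$ nor $\mathcal{J}\,f$ provides. Because the Jacobian operator is not compositional, we seek alternative operators that are compositional and allow us to compute the Jacobian.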
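The following Python sketch illustrates the chain rule and why the Jacobian operator is not compositional. The functions f and g, their hand-derived Jacobians jac_f and jac_g, and the finite-difference helper are hypothetical examples chosen for illustration; the finite-difference Jacobian serves only as an independent reference. Multiplying the Jacobian of g evaluated at f x by that of f at x reproduces the Jacobian of the composition, whereas evaluating the Jacobian of g at x does not.

```python
import math


# Hypothetical example functions f, g: R^2 -> R^2 with hand-derived Jacobians.
def f(x):
    return [x[0] * x[1], math.sin(x[0])]


def jac_f(x):
    return [[x[1], x[0]],
            [math.cos(x[0]), 0.0]]


def g(y):
    return [y[0] + y[1] ** 2, math.exp(y[0])]


def jac_g(y):
    return [[1.0, 2.0 * y[1]],
            [math.exp(y[0]), 0.0]]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def numeric_jacobian(h, x, eps=1e-6):
    """Central-difference approximation of J h x, used only as a reference."""
    m = len(h(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        hp, hm = h(xp), h(xm)
        for i in range(m):
            J[i][j] = (hp[i] - hm[i]) / (2 * eps)
    return J


def close(A, B, tol=1e-5):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


x = [0.7, 1.3]
reference = numeric_jacobian(lambda v: g(f(v)), x)
chain = matmul(jac_g(f(x)), jac_f(x))  # (J g f x)(J f x): g's Jacobian taken at f x
naive = matmul(jac_g(x), jac_f(x))     # (J g x)(J f x): g's Jacobian at the wrong point

print(close(chain, reference))  # True
print(close(naive, reference))  # False
```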
1-2.2 Adjoint Computation: the Essential Insight of AD
As a first step in our search for compositional alternatives to the Jacobian operator, we introduce:
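$$\overrightarrow{\nabla}\,f\,x, \acute{x} \;\overset{\Delta}{=}\; \mathcal{J}\,f\,x\,\acute{x}$$

$$\overleftarrow{\nabla}\,f\,x, \grave{y} \;\overset{\Delta}{=}\; (\mathcal{J}\,f\,x)^{\mathsf T}\,\grave{y}$$

We refer to x as the primal and to $\acute{x}$ and $\grave{x}$ as the forward and reverse adjoints of x, respectively. Note that the rows and columns of $\mathcal{J}\,f\,x$ can be computed as $\overleftarrow{\nabla}\,f\,x, e$ and $\overrightarrow{\nabla}\,f\,x, e$, respectively, for basis vectors e. Thus one can derive $\mathcal{J}\,f$ from either $\overrightarrow{\nabla}\,f$ or $\overleftarrow{\nabla}\,f$.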
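A minimal Python sketch of the two definitions follows. It assumes, purely for illustration, a function f: ℝ²→ℝ³ whose Jacobian jac_f is written out by hand, and it implements forward_adjoint and reverse_adjoint literally by forming $\mathcal{J}\,f\,x$; an AD system would not need to build this matrix. Applying the two operators to basis vectors recovers the columns and the rows of the Jacobian, respectively.

```python
import math


def f(x):  # hypothetical example f: R^2 -> R^3
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]


def jac_f(x):  # its hand-derived 3x2 Jacobian
    return [[x[1], x[0]],
            [math.cos(x[0]), 0.0],
            [0.0, 2.0 * x[1]]]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def forward_adjoint(jac, x, x_acute):
    """Literal definition: (J f x) applied to a forward adjoint."""
    return matvec(jac(x), x_acute)


def reverse_adjoint(jac, x, y_grave):
    """Literal definition: (J f x)^T applied to a reverse adjoint."""
    return matvec(transpose(jac(x)), y_grave)


def basis(n, k):
    return [1.0 if i == k else 0.0 for i in range(n)]


x = [0.5, 2.0]
n, m = len(x), len(f(x))
cols = [forward_adjoint(jac_f, x, basis(n, j)) for j in range(n)]  # columns of J f x
rows = [reverse_adjoint(jac_f, x, basis(m, i)) for i in range(m)]  # rows of J f x
print(transpose(cols) == jac_f(x))  # True
print(rows == jac_f(x))             # True
```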
The motivation for introducing $\overrightarrow{\nabla}$ and $\overleftarrow{\nabla}$ can be seen from the following. Suppose we wish to compute $\overrightarrow{\nabla}\,(f_n \ldots f_1)\,x_0, \acute{x}_0$. Let $x_i$ denote $f_i \ldots f_1\,x_0$ and $\acute{x}_i$ denote $\overrightarrow{\nabla}\,(f_i \ldots f_1)\,x_0, \acute{x}_0$ for i=1, . . . , n. Note that each $x_i$ can be computed from $x_{i-1}$ as $f_i\,x_{i-1}$. Furthermore, each $\acute{x}_i$ can be similarly computed from $x_{i-1}$ and $\acute{x}_{i-1}$ as $\overrightarrow{\nabla}\,f_i\,x_{i-1}, \acute{x}_{i-1}$:
$$\begin{aligned}
\acute{x}_i &= \overrightarrow{\nabla}\,(f_i \ldots f_1)\,x_0, \acute{x}_0\\
&= \mathcal{J}\,(f_i \ldots f_1)\,x_0\,\acute{x}_0\\
&= (\mathcal{J}\,f_i\,f_{i-1} \ldots f_1\,x_0)\,(\mathcal{J}\,f_{i-1} \ldots f_1\,x_0) \ldots (\mathcal{J}\,f_1\,x_0)\,\acute{x}_0\\
&= \mathcal{J}\,f_i\,f_{i-1} \ldots f_1\,x_0\,\acute{x}_{i-1}\\
&= \mathcal{J}\,f_i\,x_{i-1}\,\acute{x}_{i-1}\\
&= \overrightarrow{\nabla}\,f_i\,x_{i-1}, \acute{x}_{i-1}
\end{aligned}$$
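The recurrence can be sketched in Python as a single left-to-right pass that carries each $x_i$ together with its forward adjoint. The three primitives in PRIMS and their Jacobians are hypothetical examples, and full_jacobian is included only to check the result against the explicit product of Jacobians in the derivation above.

```python
import math


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical primitives f_1, f_2, f_3 : R^2 -> R^2, each paired with its Jacobian.
PRIMS = [
    (lambda x: [x[0] * x[1], x[0] + x[1]],
     lambda x: [[x[1], x[0]], [1.0, 1.0]]),
    (lambda x: [math.sin(x[0]), x[1] ** 2],
     lambda x: [[math.cos(x[0]), 0.0], [0.0, 2.0 * x[1]]]),
    (lambda x: [math.exp(x[0]) - x[1], 3.0 * x[1]],
     lambda x: [[math.exp(x[0]), -1.0], [0.0, 3.0]]),
]


def forward_sweep(prims, x0, xa0):
    """One left-to-right pass computing x_n and its forward adjoint xa_n."""
    x, xa = x0, xa0
    for f, jac in prims:
        # x_i = f_i x_{i-1} and xa_i = (J f_i x_{i-1}) xa_{i-1}; both read x_{i-1}.
        x, xa = f(x), matvec(jac(x), xa)
    return x, xa


def full_jacobian(prims, x0):
    """Reference only: (J f_n x_{n-1}) ... (J f_1 x_0) as an explicit matrix."""
    J, x = [[1.0, 0.0], [0.0, 1.0]], x0
    for f, jac in prims:
        J, x = matmul(jac(x), J), f(x)
    return J


x0, xa0 = [0.3, 0.8], [1.0, -2.0]
xn, xan = forward_sweep(PRIMS, x0, xa0)
expected = matvec(full_jacobian(PRIMS, x0), xa0)
print(all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(xan, expected)))  # True
```

The pass never materializes a Jacobian product; each step needs only the current primal and adjoint.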
In a similar fashion, suppose we wish to compute $\overleftarrow{\nabla}\,(f_n \ldots f_1)\,x_0, \grave{x}_n$. Let $\grave{x}_i$ denote:
$$\grave{x}_i = (\mathcal{J}\,f_{i+1} \ldots f_1\,x_0)^{\mathsf T} \ldots (\mathcal{J}\,f_n \ldots f_1\,x_0)^{\mathsf T}\,\grave{x}_n$$

for i = 0, . . . , n−1. We can see that $\overleftarrow{\nabla}\,(f_n \ldots f_1)\,x_0, \grave{x}_n = \grave{x}_0$:
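$$\begin{aligned}
\overleftarrow{\nabla}\,(f_n \ldots f_1)\,x_0, \grave{x}_n
&= (\mathcal{J}\,(f_n \ldots f_1)\,x_0)^{\mathsf T}\,\grave{x}_n\\
&= \bigl((\mathcal{J}\,f_n \ldots f_1\,x_0) \ldots (\mathcal{J}\,f_1\,x_0)\bigr)^{\mathsf T}\,\grave{x}_n\\
&= (\mathcal{J}\,f_1\,x_0)^{\mathsf T} \ldots (\mathcal{J}\,f_n \ldots f_1\,x_0)^{\mathsf T}\,\grave{x}_n\\
&= \grave{x}_0
\end{aligned}$$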
Furthermore, each $\grave{x}_{i-1}$ can be similarly computed from $x_{i-1}$ and $\grave{x}_i$ as $\overleftarrow{\nabla}\,f_i\,x_{i-1}, \grave{x}_i$:
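$$\begin{aligned}
\grave{x}_{i-1} &= (\mathcal{J}\,f_i \ldots f_1\,x_0)^{\mathsf T}\,\grave{x}_i\\
&= (\mathcal{J}\,f_i\,x_{i-1})^{\mathsf T}\,\grave{x}_i\\
&= \overleftarrow{\nabla}\,f_i\,x_{i-1}, \grave{x}_i
\end{aligned}$$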
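A corresponding Python sketch of the reverse computation follows, under illustrative assumptions: the primitives in PRIMS and their hand-written Jacobians are hypothetical. Because each reverse step needs $x_{i-1}$, the sketch first runs the program forward and records the primal inputs, then applies the transposed Jacobians from i = n down to 1. The helper full_jacobian is only a reference for checking the result.

```python
import math


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical primitives f_1, f_2, f_3 : R^2 -> R^2, each paired with its Jacobian.
PRIMS = [
    (lambda x: [x[0] * x[1], x[0] + x[1]],
     lambda x: [[x[1], x[0]], [1.0, 1.0]]),
    (lambda x: [math.sin(x[0]), x[1] ** 2],
     lambda x: [[math.cos(x[0]), 0.0], [0.0, 2.0 * x[1]]]),
    (lambda x: [math.exp(x[0]) - x[1], 3.0 * x[1]],
     lambda x: [[math.exp(x[0]), -1.0], [0.0, 3.0]]),
]


def reverse_sweep(prims, x0, xgn):
    """Forward pass recording primal inputs, then a right-to-left adjoint pass."""
    inputs, x = [], x0
    for f, _ in prims:
        inputs.append(x)  # x_{i-1}, needed later by the reverse pass
        x = f(x)
    xg = xgn
    for (_, jac), x_prev in zip(reversed(prims), reversed(inputs)):
        # xg_{i-1} = (J f_i x_{i-1})^T xg_i
        xg = matvec(transpose(jac(x_prev)), xg)
    return x, xg


def full_jacobian(prims, x0):
    """Reference only: (J f_n x_{n-1}) ... (J f_1 x_0) as an explicit matrix."""
    J, x = [[1.0, 0.0], [0.0, 1.0]], x0
    for f, jac in prims:
        J, x = matmul(jac(x), J), f(x)
    return J


x0, xgn = [0.3, 0.8], [1.0, 0.5]
xn, xg0 = reverse_sweep(PRIMS, x0, xgn)
expected = matvec(transpose(full_jacobian(PRIMS, x0)), xgn)
print(all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(xg0, expected)))  # True
```

Storing the primal inputs is what lets the reverse pass run in the opposite order from the program.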
The chain rule allows us to derive expressions for $\overrightarrow{\nabla}\,(g\,f)$ and $\overleftarrow{\nabla}\,(g\,f)$:
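$$\begin{aligned}
\overrightarrow{\nabla}\,(g\,f)\,(x, \acute{x}) &= \mathcal{J}\,(g\,f)\,x\,\acute{x}\\
&= (\mathcal{J}\,g\,f\,x)\,(\mathcal{J}\,f\,x)\,\acute{x}\\
&= (\mathcal{J}\,g\,f\,x)\,(\overrightarrow{\nabla}\,f\,x, \acute{x})\\
&= \overrightarrow{\nabla}\,g\,(f\,x), (\overrightarrow{\nabla}\,f\,x, \acute{x})
\end{aligned}$$

$$\begin{aligned}
\overleftarrow{\nabla}\,(g\,f)\,(x, \grave{y}) &= (\mathcal{J}\,(g\,f)\,x)^{\mathsf T}\,\grave{y}\\
&= \bigl((\mathcal{J}\,g\,f\,x)\,(\mathcal{J}\,f\,x)\bigr)^{\mathsf T}\,\grave{y}\\
&= (\mathcal{J}\,f\,x)^{\mathsf T}\,(\mathcal{J}\,g\,f\,x)^{\mathsf T}\,\grave{y}\\
&= \overleftarrow{\nabla}\,f\,x, \bigl((\mathcal{J}\,g\,f\,x)^{\mathsf T}\,\grave{y}\bigr)\\
&= \overleftarrow{\nabla}\,f\,x, (\overleftarrow{\nabla}\,g\,(f\,x), \grave{y})
\end{aligned}$$

Note that $\overrightarrow{\nabla}$ and $\overleftarrow{\nabla}$ are still not compositional. This is because $\overrightarrow{\nabla}\,f$ and $\overleftarrow{\nabla}\,f$ map primals paired with adjoints to adjoints; neither returns the primal f x that the next stage needs.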
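The two composition formulas can be checked numerically with the following Python sketch. The functions f and g and their hand-written Jacobians are hypothetical; fwd_adj and rev_adj implement the definitions of the forward and reverse operators literally, and a central-difference Jacobian of g f serves as an independent reference. The sketch must compute the primal f x separately, since neither operator returns it.

```python
import math


def f(x):
    return [x[0] * x[1], math.sin(x[0])]


def jac_f(x):
    return [[x[1], x[0]], [math.cos(x[0]), 0.0]]


def g(y):
    return [y[0] + y[1] ** 2, math.exp(y[0])]


def jac_g(y):
    return [[1.0, 2.0 * y[1]], [math.exp(y[0]), 0.0]]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def fwd_adj(jac, x, xa):
    """Literal definition: (J f x) applied to a forward adjoint."""
    return matvec(jac(x), xa)


def rev_adj(jac, x, yg):
    """Literal definition: (J f x)^T applied to a reverse adjoint."""
    return matvec(transpose(jac(x)), yg)


def numeric_jacobian(h, x, eps=1e-6):
    """Central-difference reference for J h x (built column by column)."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        cols.append([(p - m) / (2 * eps) for p, m in zip(h(xp), h(xm))])
    return transpose(cols)


x, xa, yg = [0.7, 1.3], [1.0, 2.0], [0.5, -1.0]
J_gf = numeric_jacobian(lambda v: g(f(v)), x)

fx = f(x)  # the primal f x must be computed separately and passed along
fwd = fwd_adj(jac_g, fx, fwd_adj(jac_f, x, xa))  # forward: g at f x, after f at x
rev = rev_adj(jac_f, x, rev_adj(jac_g, fx, yg))  # reverse: g at f x first, then f

print(all(abs(a - b) < 1e-5 for a, b in zip(fwd, matvec(J_gf, xa))))             # True
print(all(abs(a - b) < 1e-5 for a, b in zip(rev, matvec(transpose(J_gf), yg))))  # True
```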
1-2.3 Compositionality of the AD Transforms
It is easy to derive a compositional variant of $\overrightarrow{\nabla}$. Recall that $f_i$ maps $x_{i-1}$ to $x_i$ and $\overrightarrow{\nabla}\,f_i$ maps $x_{i-1}$ paired with $\acute{x}_{i-1}$ to $\acute{x}_i$. We simply introduce a variant of $\overrightarrow{\nabla}$ that combines these two maps:
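$$\overrightarrow{\mathcal{J}}\,f\,x, \acute{x} \;\overset{\Delta}{=}\; (f\,x), (\overrightarrow{\nabla}\,f\,x, \acute{x})$$

$\overrightarrow{\mathcal{J}}\,f_i$ thus maps $x_{i-1}$ paired with $\acute{x}_{i-1}$ to $x_i$ paired with $\acute{x}_i$. Note that $\overrightarrow{\nabla}\,f\,x, \acute{x} = \mathrm{CDR}\,(\overrightarrow{\mathcal{J}}\,f\,x, \acute{x})$. Thus one can derive $\overrightarrow{\nabla}\,f$ from $\overrightarrow{\mathcal{J}}\,f$ and ultimately derive $\mathcal{J}\,f$ from $\overrightarrow{\mathcal{J}}\,f$.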
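A minimal Python sketch of this construction follows, assuming hypothetical primitives with hand-written Jacobians. Here j_fwd lifts a function to a map from a primal paired with a forward adjoint to the corresponding output pair, and car and cdr are our own stand-ins for CAR and CDR. Running the program with every $f_i$ replaced by its lifted version is an abstract interpretation in the sense of subsection 1-2.1, and yields both the primal result and its forward adjoint.

```python
import math


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def car(pair):
    return pair[0]


def cdr(pair):
    return pair[1]


def j_fwd(f, jac):
    """Lift f (with its Jacobian) to the pair map (x, xa) -> (f x, (J f x) xa)."""
    def lifted(pair):
        x, xa = pair
        return f(x), matvec(jac(x), xa)
    return lifted


# Hypothetical primitives f_1, f_2, f_3 : R^2 -> R^2, each paired with its Jacobian.
PRIMS = [
    (lambda x: [x[0] * x[1], x[0] + x[1]],
     lambda x: [[x[1], x[0]], [1.0, 1.0]]),
    (lambda x: [math.sin(x[0]), x[1] ** 2],
     lambda x: [[math.cos(x[0]), 0.0], [0.0, 2.0 * x[1]]]),
    (lambda x: [math.exp(x[0]) - x[1], 3.0 * x[1]],
     lambda x: [[math.exp(x[0]), -1.0], [0.0, 3.0]]),
]

x0, xa0 = [0.3, 0.8], [1.0, -2.0]

# Abstract interpretation of the program f_3 f_2 f_1: each f_i is replaced by j_fwd(f_i).
pair = (x0, xa0)
for f, jac in PRIMS:
    pair = j_fwd(f, jac)(pair)  # (x_{i-1}, xa_{i-1}) -> (x_i, xa_i)

# Reference values: the ordinary run, and the explicit Jacobian product.
x, J = x0, [[1.0, 0.0], [0.0, 1.0]]
for f, jac in PRIMS:
    x, J = f(x), matmul(jac(x), J)

print(car(pair) == x)  # True: the primal is carried along unchanged
print(all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(cdr(pair), matvec(J, xa0))))  # True
```

Because every lifted primitive both consumes and produces a (primal, adjoint) pair, the lifted primitives can be chained exactly as the original ones are.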
It is easy to see that $\overrightarrow{\mathcal{J}}$ is compositional:
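$$\begin{aligned}
\overrightarrow{\mathcal{J}}\,(g\,f)\,x, \acute{x} &= (g\,f\,x), (\overrightarrow{\nabla}\,(g\,f)\,x, \acute{x})\\
&= (g\,f\,x), (\overrightarrow{\nabla}\,g\,(f\,x), (\overrightarrow{\nabla}\,f\,x, \acute{x}))\\
&= \overrightarrow{\mathcal{J}}\,g\,(f\,x), (\overrightarrow{\nabla}\,f\,x, \acute{x})\\
&= (\overrightarrow{\mathcal{J}}\,g)\,(\overrightarrow{\mathcal{J}}\,f)\,x, \acute{x}
\end{aligned}$$

It is a little more difficult to derive a compositional variant of $\overleftarrow{\nabla}$. The reason is that we need to derive the $\grave{x}_i$ values in reverse order from the $x_i$ values, because $\overleftarrow{\nabla}\,f_i$ maps $x_{i-1}$ paired with $\grave{x}_i$ to $\grave{x}_{i-1}$. Recall that: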
$$\grave{x}_i = (\mathcal{J}\,f_{i+1} \ldots f_1\,x_0)^{\mathsf T} \ldots (\mathcal{J}\,f_n \ldots f_1\,x_0)^{\mathsf T}\,\grave{x}_n$$

and, in particular, that:

$$\grave{x}_0 = (\mathcal{J}\,f_1\,x_0)^{\mathsf T} \ldots (\mathcal{J}\,f_n \ldots f_1\,x_0)^{\mathsf T}\,\grave{x}_n$$

So:

$$\grave{x}_0 = (\mathcal{J}\,f_1\,x_0)^{\mathsf T} \ldots (\mathcal{J}\,f_i \ldots f_1\,x_0)^{\mathsf T}\,\grave{x}_i$$
Let $\tilde{x}_i$ denote the function that maps $\grave{x}_i$ to $\grave{x}_0$, so that $\tilde{x}_0$ is the identity. We can derive $\tilde{x}_i$ from $\tilde{x}_{i-1}$:
$$
\begin{aligned}
\tilde{x}_i &= \lambda \grave{x}_i.\,(\mathcal{J} f_1\, x_0)^T \ldots (\mathcal{J} f_i\, f_{i-1} \ldots f_1\, x_0)^T\, \grave{x}_i\\
&= \tilde{x}_{i-1}\; \lambda \grave{x}_i.\,(\mathcal{J} f_i\, f_{i-1} \ldots f_1\, x_0)^T\, \grave{x}_i\\
&= \tilde{x}_{i-1}\; \lambda \grave{x}_i.\,(\mathcal{J} f_i\, x_{i-1})^T\, \grave{x}_i\\
&= \tilde{x}_{i-1}\; \lambda \grave{x}_i.\,\overleftarrow{\nabla} f_i\, x_{i-1}, \grave{x}_i\\
&= \lambda \grave{x}_i.\,\tilde{x}_{i-1}\,(\overleftarrow{\nabla} f_i\, x_{i-1}, \grave{x}_i)
\end{aligned}
$$

Just as $\overrightarrow{\mathcal{J}} f_i$ is a variant of $\overrightarrow{\nabla} f_i$ that maps $x_{i-1}$ paired with $\acute{x}_{i-1}$ to $x_i$ paired with $\acute{x}_i$, we can define $\overleftarrow{\mathcal{J}} f_i$ to be a variant of $\overleftarrow{\nabla} f_i$ that maps $x_{i-1}$ paired with $\tilde{x}_{i-1}$ to $x_i$ paired with $\tilde{x}_i$:
$$\overleftarrow{\mathcal{J}} f\; x, \tilde{x} \triangleq (f\,x), \lambda \grave{y}.\,\tilde{x}\,(\overleftarrow{\nabla} f\; x, \grave{y}).$$
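The transposed-Jacobian product for $\grave{x}_0$ and the recursion that builds $\tilde{x}_i$ from $\tilde{x}_{i-1}$ should compute the same thing, which can be checked numerically. The following is a minimal Python sketch under stated assumptions: the stages f1, f2, f3 on $\mathbb{R}^2$, their Jacobians J1, J2, J3, and the helpers vjp, explicit_sweep, extend and backpropagator_sweep are hypothetical and chosen only for illustration. explicit_sweep applies the transposed Jacobians right to left, while backpropagator_sweep chains one closure per stage, starting from the identity $\tilde{x}_0$. In the comments, x~ and x` stand for $\tilde{x}$ and $\grave{x}$.

```python
import math

# Hypothetical stages f_i : R^2 -> R^2 with hand-written Jacobians J_i
# (rows = outputs, columns = inputs), chosen only for illustration.
def f1(x): return [x[0] * x[1], x[0] + x[1]]
def J1(x): return [[x[1], x[0]], [1.0, 1.0]]
def f2(x): return [math.sin(x[0]), x[0] * x[1]]
def J2(x): return [[math.cos(x[0]), 0.0], [x[1], x[0]]]
def f3(x): return [x[0] + 2.0 * x[1], x[1] * x[1]]
def J3(x): return [[1.0, 2.0], [0.0, 2.0 * x[1]]]
STAGES = [(f1, J1), (f2, J2), (f3, J3)]


def vjp(J, x, ygrave):
    """<-nabla f x, y` = (J f x)^T y` for a 2x2 Jacobian function J."""
    m = J(x)
    return [sum(m[r][c] * ygrave[r] for r in range(2)) for c in range(2)]


def explicit_sweep(x0, xgrave_n):
    """x`_0 = (J f_1 x_0)^T ... (J f_n x_{n-1})^T x`_n, applied right to left."""
    xs = [x0]
    for f, _ in STAGES:                        # record x_0, ..., x_n
        xs.append(f(xs[-1]))
    xgrave = xgrave_n
    for i in range(len(STAGES), 0, -1):        # i = n, ..., 1
        xgrave = vjp(STAGES[i - 1][1], xs[i - 1], xgrave)
    return xgrave


def extend(xtilde_prev, x_prev, J):
    """x~_i = lambda x`_i . x~_{i-1} (<-nabla f_i x_{i-1}, x`_i)."""
    return lambda xgrave_i: xtilde_prev(vjp(J, x_prev, xgrave_i))


def backpropagator_sweep(x0):
    """Return (x_n, x~_n), building x~_i from x~_{i-1} during the forward pass."""
    x, xtilde = x0, (lambda xgrave: xgrave)    # x~_0 is the identity
    for f, J in STAGES:
        xtilde = extend(xtilde, x, J)          # capture x_{i-1} before advancing
        x = f(x)
    return x, xtilde
```

For a weight vector w, applying the backpropagator returned by backpropagator_sweep([0.3, -0.7]) to w gives the same vector as explicit_sweep([0.3, -0.7], w). The helper extend exists so that each closure captures its own $x_{i-1}$ and $\tilde{x}_{i-1}$; a lambda written directly in the loop body would see the loop variables' final values because of Python's late binding.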
We refer to $\tilde{x}$ as the backpropagator associated with the primal $x$. If $y = f\,x$, then $\grave{x} = \tilde{y}\,\grave{y}$. Note that $\overleftarrow{\nabla} f\; x, \grave{y} = \mathrm{CDR}\,(\overleftarrow{\mathcal{J}} f\; x, I)\,\grave{y}$, where $I$ denotes the identity function. Thus one can derive $\overleftarrow{\nabla} f$ from $\overleftarrow{\mathcal{J}} f$ and ultimately derive $\nabla f$ from $\overleftarrow{\mathcal{J}} f$.
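A minimal Python sketch of $\overleftarrow{\mathcal{J}}$ acting on reverse conjoints, and of recovering $\overleftarrow{\nabla} f$ as $\mathrm{CDR}\,(\overleftarrow{\mathcal{J}} f\; x, I)$, follows. The primitive f, its transposed-Jacobian product f_vjp, and the helpers reverse_J, car, cdr and identity are hypothetical stand-ins chosen for illustration. A reverse conjoint is represented as a Python tuple (x, x~) whose second element is a closure; in the comments, x~ and y` stand for $\tilde{x}$ and $\grave{y}$.

```python
# Hypothetical primitive f(x) = (x0 * x1, x0 + x1) and its transposed-Jacobian
# product  <-nabla f x, y` = (J f x)^T y`, where J f x = [[x1, x0], [1, 1]].
def f(x):
    return (x[0] * x[1], x[0] + x[1])


def f_vjp(x, ygrave):
    return (x[1] * ygrave[0] + ygrave[1], x[0] * ygrave[0] + ygrave[1])


def reverse_J(prim, vjp):
    """<-J prim maps (x, x~) to (prim x, lambda y`. x~ (<-nabla prim x, y`))."""
    def lifted(conjoint):
        x, xtilde = conjoint
        return prim(x), (lambda ygrave: xtilde(vjp(x, ygrave)))
    return lifted


def identity(v):
    return v


def car(pair):
    return pair[0]


def cdr(pair):
    return pair[1]


def vjp_via_reverse_J(x, ygrave):
    """<-nabla f x, y` recovered as CDR (<-J f x, I) y`."""
    return cdr(reverse_J(f, f_vjp)((x, identity)))(ygrave)
```

With x = (2.0, 3.0) and y` = (1.0, 0.5), vjp_via_reverse_J returns (3.5, 2.5), the same value f_vjp computes directly, and the CAR of the lifted result is the primal f x = (6.0, 5.0).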
It is easy to see that $\overleftarrow{\mathcal{J}}$ is compositional:
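$$
\begin{aligned}
\overleftarrow{\mathcal{J}} (g\,f)\; x, \tilde{x}
&= (g\,f\,x), \lambda \grave{y}.\,\tilde{x}\,\overleftarrow{\nabla} (g\,f)\; x, \grave{y}\\
&= (g\,f\,x), \lambda \grave{y}.\,\tilde{x}\,\overleftarrow{\nabla} f\; x, (\overleftarrow{\nabla} g\,(f\,x), \grave{y})\\
&= (g\,f\,x), \lambda \grave{y}.\,(\lambda \grave{y}.\,\tilde{x}\,\overleftarrow{\nabla} f\; x, \grave{y})\,(\overleftarrow{\nabla} g\,(f\,x), \grave{y})\\
&= \overleftarrow{\mathcal{J}} g\,(f\,x), \lambda \grave{y}.\,\tilde{x}\,\overleftarrow{\nabla} f\; x, \grave{y}\\
&= (\overleftarrow{\mathcal{J}} g)\,(\overleftarrow{\mathcal{J}} f)\; x, \tilde{x}
\end{aligned}
$$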
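The compositionality of $\overleftarrow{\mathcal{J}}$ can be spot-checked numerically. This minimal Python sketch assumes hypothetical scalar primitives f = sin and g(u) = u^3 with hand-written VJPs; lhs lifts the composite $g\,f$ directly, using its VJP from the chain rule, while rhs composes the separately lifted $\overleftarrow{\mathcal{J}} g$ and $\overleftarrow{\mathcal{J}} f$. The names reverse_J, lhs and rhs are illustrative only; in the comments, x~ and y` stand for $\tilde{x}$ and $\grave{y}$.

```python
import math


def reverse_J(prim, vjp):
    """<-J prim maps (x, x~) to (prim x, lambda y`. x~ (<-nabla prim x, y`))."""
    def lifted(conjoint):
        x, xtilde = conjoint
        return prim(x), (lambda ygrave: xtilde(vjp(x, ygrave)))
    return lifted


# Hypothetical scalar primitives f and g with their VJPs (1x1 Jacobians).
def f(x): return math.sin(x)
def f_vjp(x, ygrave): return math.cos(x) * ygrave
def g(u): return u * u * u
def g_vjp(u, ygrave): return 3.0 * u * u * ygrave


# g f, with its VJP given by the chain rule:
#   <-nabla (g f) x, y` = <-nabla f x, (<-nabla g (f x), y`)
def gf(x): return g(f(x))
def gf_vjp(x, ygrave): return f_vjp(x, g_vjp(f(x), ygrave))


lhs = reverse_J(gf, gf_vjp)                       # <-J (g f)
J_f, J_g = reverse_J(f, f_vjp), reverse_J(g, g_vjp)


def rhs(conjoint):                                # (<-J g)(<-J f)
    return J_g(J_f(conjoint))
```

Applying lhs and rhs to the same reverse conjoint, for example (0.5, lambda v: 2.0 * v), yields equal primals, and the two backpropagators return the same value for the same y`, as the derivation predicts.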
We refer to a primal $x$ paired with a forward adjoint $\acute{x}$ as a forward conjoint and to a primal $x$ paired with a backpropagator $\tilde{x}$ as a reverse conjoint. This gives the basic recipe for AD transformations. The forward transformation is an abstract interpretation where primal values are interpreted abstractly as forward conjoint values and functions $f$ are interpreted abstractly as functions $\overrightarrow{\mathcal{J}} f$ that map forward conjoint values to forward conjoint values. The reverse transformation is an abstract interpretation where primal values are interpreted abstractly as reverse conjoint values and functions $f$ are interpreted abstractly as functions $\overleftarrow{\mathcal{J}} f$ that map reverse conjoint values to reverse conjoint values.
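The abstract-interpretation view can be illustrated by interpreting one straight-line program twice. In this minimal Python sketch, the primitive table PRIMS and the functions forward_J, reverse_J and program are hypothetical: program is written once against a lifting function, forward_J interprets each primitive p as $\overrightarrow{\mathcal{J}} p$ on forward conjoints, and reverse_J interprets it as $\overleftarrow{\mathcal{J}} p$ on reverse conjoints. Only scalar primitives are handled, so each Jacobian is a single derivative. In the comments, x' stands for the forward adjoint $\acute{x}$, x~ for $\tilde{x}$, and y` for $\grave{y}$.

```python
import math

# Hypothetical scalar primitive table: name -> (primal, derivative).  For
# scalars the Jacobian and its transpose are both the derivative.
PRIMS = {
    "sin": (math.sin, math.cos),
    "exp": (math.exp, math.exp),
    "sq": (lambda x: x * x, lambda x: 2.0 * x),
}


def forward_J(name):
    """->J p maps a forward conjoint (x, x') to (p x, (J p x) x')."""
    p, dp = PRIMS[name]
    return lambda c: (p(c[0]), dp(c[0]) * c[1])


def reverse_J(name):
    """<-J p maps a reverse conjoint (x, x~) to (p x, lambda y`. x~ ((J p x)^T y`))."""
    p, dp = PRIMS[name]

    def lifted(c):
        x, xtilde = c
        return p(x), (lambda ygrave: xtilde(dp(x) * ygrave))
    return lifted


def program(lift, value):
    """The program sq(exp(sin x)), written once and interpreted via `lift`."""
    for name in ("sin", "exp", "sq"):
        value = lift(name)(value)
    return value
```

Running program(forward_J, (x, 1.0)) seeds the forward adjoint with 1 and returns the derivative in the second slot; running program(reverse_J, (x, lambda v: v)) seeds the backpropagator with the identity, and applying the returned backpropagator to 1.0 gives the same derivative.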