The set of all decisions is called the operating policy or, more simply, the policy. An optimal policy is one which in some sense gets the best out of the process as a whole by maximizing the value of the product. There are thus three components to an optimal design problem: (1) the specification of the state of the process stream; (2) the specification of the operating variables and the transformation they effect; (3) the specification of the objective function whose optimization is desired. For a chemical process the first of these might involve the concentrations of the different chemical species and the temperature or pressure of the stream. For the second we might have to choose the volume of the reactor or the amount of cooling to be supplied; the way in which the transformation of state depends on the operating variables for the main types of reactors is discussed in the next chapter. The objective function is some measure of the increase in value of the stream by processing; it is the subject of Chapter 4.

The essential characteristic of an optimal policy, when the state of the stream is transformed in a sequence of stages with no feedback, was first isolated by Bellman. He recognized that whatever transformation may be effected in the first stage of an $R$-stage process, the remaining stages must use an optimal $(R-1)$-stage policy with respect to the state resulting from the first stage, if there is to be any chance of optimizing the complete process. Moreover, by systematically varying the operating conditions in the first stage and always using the optimal $(R-1)$-stage policy for the remaining stages, we shall eventually find the optimal policy for all $R$ stages. Proceeding in this way, from one to two and from two to three stages, we may gradually build up the policy for any number. At each step of the calculation the operating variables of only one stage need be varied.

To see how important this economy is, let us suppose that there are $M$ operating variables at each stage and that the state is specified by $N$ variables; then the search for the maximum at any one stage will require a number of operations of order $a^M$ (where $a$ is some number not unreasonably large). To proceed from one stage to the next, a sufficient number of feed states must be investigated to allow for interpolation; this number will be of the order of $a^N$. If, however, we are seeking the optimal $R$-stage policy for a given feed state, only one search for a maximum is required at the final step. Thus a number of operations of the order of $Ra^{M+N}$ are required. If all the operating variables were varied simultaneously, $a^{RM}$ operations would be required to do the same job, and as $R$ increases this grows very much more rapidly than the number of operations required by the dynamic program. But even more important than this is the fact that the direct search by simultaneous variation of all operating conditions has produced only one optimal policy, namely, that for the given feed state and $R$ stages. In contrast, the dynamic program produces this policy and a whole family of policies for any smaller number of stages. If the problem is enlarged to require a complete coverage of feed states, $Ra^{M+N}$ operations are needed by the dynamic program and $a^{RM+N}$ by the direct search. But $a^{(R-1)M}$ is vastly larger than $R$. No optimism is more baseless than that which believes that the high speed of modern digital computers allows the crudest of methods to be used in searching out a result. Suppose, for example, that $a = 10$, $M = 2$, $N = 3$, $R = 6$, and that the average operation requires only $10^{-4}$ sec. Then the dynamic program would require about a minute, whereas the direct search would take more than three millennia!
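These counts are easily checked. The fragment below is a minimal sketch in Python; the particular values of $a$, $M$, $N$, $R$, and the time per operation are simply the illustrative figures supposed above, not data from any actual computation.

```python
# Operation counts for the two strategies, using the figures supposed
# above: a = 10 grid points per variable, M = 2 operating variables,
# N = 3 state variables, R = 6 stages, 1e-4 sec per operation.
a, M, N, R, t_op = 10, 2, 3, 6, 1e-4

dynamic = R * a ** (M + N)   # stage-by-stage search, covering all feed states
direct = a ** (R * M + N)    # all operating variables varied simultaneously

print(f"dynamic program: {dynamic * t_op:,.0f} sec")             # about a minute
print(f"direct search:   {direct * t_op / 3.156e7:,.0f} years")  # over three millennia
```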
The principle of optimality thus brings a vital organization into the search for the optimal policy of a multistage decision process. Bellman (1957) has enunciated it in the following terms: "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with respect to the state resulting from the first decision." This is the principle which we shall invoke in every case to set up a functional equation. It appears in a form that is admirably suited to the powers of the digital computer. At the same time, every device that can be employed to reduce the number of variables is of the greatest value, and it is one of the attractive features of dynamic programming that room is left for ingenuity in using the special features of the problem to this end.

2.2 The Discrete Deterministic Process

Consider the process illustrated in Fig. 2.1, consisting of $R$ distinct stages. These will be numbered in the direction opposite to the flow of the process stream, so that stage $r$ is the $r$th stage from the end. Let the state of the stream leaving stage $r$ be denoted by a vector $p_{r-1}$ and the operating variables of stage $r$ by $q_r$. Thus $p_R$ denotes the state of the feed to the $R$-stage process, and $p_0$ the state of the product from the last stage. Each stage transforms the state $p_r$ of its feed to the state $p_{r-1}$ in a way that depends on the operating variables $q_r$. We write this

$$p_{r-1} = T(p_r; q_r). \tag{1}$$

This transformation is uniquely determined by $p_r$ and $q_r$, and we therefore speak of the process as deterministic. In practical situations there will be restrictions on the admissible operating conditions, and we regard the vectors $q_r$ as belonging to a fixed and bounded set $S$. The set of vectors $q_1, q_2, \ldots, q_R$ constitutes the operating policy or, more briefly, the policy, and a policy is admissible if all the $q_r$ belong to $S$. When the policy has been chosen, the state of the product can be obtained from the state of the feed by repeated application of the transformation (1); thus

$$p_0 = T(T(\cdots T(p_R; q_R) \cdots ; q_2); q_1). \tag{2}$$

The objective function, which is to be maximized, is some function, usually piecewise continuous, of the product state. Let this be denoted by

$$P = P(p_0). \tag{3}$$

An optimal policy is an admissible policy $q_1, q_2, \ldots, q_R$ which maximizes the objective function $P$. The policy may not be unique, but the maximum value of $P$ certainly is, and once the policy is specified this maximum can be calculated by (2) and (3) as a function of the feed state $p_R$. Let

$$f_R(p_R) = \max P(p_0), \tag{4}$$

where the maximization is over all admissible policies $q_1, q_2, \ldots, q_R$. When it is necessary to be specific, we say that the optimal policy is an optimal $R$-stage policy with respect to the feed state $p_R$.

For any choice of admissible conditions $q_R$ in the first stage, the state of the stream leaving this stage is given by

$$p_{R-1} = T(p_R; q_R). \tag{5}$$

This is the feed state of the subsequent $R-1$ stages which, according to the principle of optimality, must use an optimal $(R-1)$-stage policy with respect to this state. This will result in a value $f_{R-1}(p_{R-1})$ of the objective function, and when $q_R$ is chosen correctly this will give $f_R(p_R)$, the maximum of the objective function. Thus

$$f_R(p_R) = \max_{q_R} f_{R-1}(p_{R-1}), \tag{6}$$

where the maximization is over all admissible $q_R$, and $p_{R-1}$ is related to $p_R$ by (5). The sequence of equations (6) can be solved for $f_R$ when $f_1$ is known, and clearly

$$f_1(p_1) = \max_{q_1} P(T(p_1; q_1)), \tag{7}$$

the maximization being over all admissible $q_1$. The set of equations (5), (6), and the starting equation (7) is of a recursive type well suited to programming on the digital computer.
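As a concrete illustration of the recursion, the sketch below tabulates $f_r$ over a grid of feed states for a hypothetical stage model with one state variable and one operating variable; the transformation T, the objective P, and both grids are invented stand-ins, while the structure follows equations (5)-(7), with interpolation supplying $f_{r-1}$ between grid points.

```python
import numpy as np

# Hypothetical one-dimensional stage model: state p (a conversion, say)
# and a single operating variable q drawn from the admissible set S.
def T(p, q):
    return p + q * (1.0 - p)           # the transformation (1)

def P(p0):
    return p0                          # the objective function (3)

p_grid = np.linspace(0.0, 1.0, 101)    # feed states kept for interpolation
q_grid = np.linspace(0.0, 0.5, 51)     # the admissible set S

R = 6
f = np.array([max(P(T(p, q)) for q in q_grid) for p in p_grid])   # eq. (7)
policy = [np.array([max(q_grid, key=lambda q: P(T(p, q))) for p in p_grid])]

for r in range(2, R + 1):              # eq. (6): only q_r is varied
    f_prev = f.copy()
    value = lambda p, q: np.interp(T(p, q), p_grid, f_prev)  # f_{r-1}(T(p;q))
    f = np.array([max(value(p, q) for q in q_grid) for p in p_grid])
    policy.append(np.array([max(q_grid, key=lambda q: value(p, q))
                            for p in p_grid]))

# f now tabulates f_R for every feed state on the grid, and policy[r-1]
# records the optimal q_r at each state.
print(np.interp(0.2, p_grid, f))       # f_R at the feed state p_R = 0.2
```

Note that only the previous table $f_{r-1}$ is kept at each pass, which is precisely the economy in storage discussed next.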
In finding the optimal $R$-stage policy from that of $R-1$ stages, only the function $f_{R-1}$ is needed. When $f_R$ has been found it may be transferred into the storage location of $f_{R-1}$ and the whole calculation repeated. We also see how the results may be presented, although if $N$, the number of state variables, is large, any tabulation will become cumbersome. A table or set of tables may be set out as in Table 2.1. To extract the optimal $R$-stage policy with respect to the feed state $p_R$, we enter section $R$ of this table at the state $p_R$ and find immediately from the last column the maximum value $f_R(p_R)$ of the objective function. In the third column is given the optimal policy $q_R$ for stage $R$, and in the fourth, the resulting state $p_{R-1}$ of the stream when this policy is used. Since by the principle of optimality the remaining stages use an optimal $(R-1)$-stage policy with respect to $p_{R-1}$, we may enter section $R-1$ of the table at this state $p_{R-1}$ and read off the optimal policy for stage $R-1$ and the resulting state $p_{R-2}$. Proceeding in this way up the table we extract the complete optimal policy and, if it is desired, we can check on $f_R(p_R)$ by evaluating $P(p_0)$ at the last stage.

It may be that the objective function depends not only on $p_0$ but also on the $q_r$, as when the cost of the operating policy is considered. A moment's reflection shows that the above algorithm and presentation work equally well in this case. A form of objective function that we shall often have occasion to consider is

$$P_r = P(p_{r-1}) - P(p_r) - Q(q_r).$$

Here $P(p)$ represents the value of the stream in state $p$ and $Q(q_r)$ the cost of operating the stage with conditions $q_r$. Hence $P_r$ is the increase in value of the stream minus the cost of operation, that is, the net profit of stage $r$. If $f_R(p_R)$ denotes the maximum of the total net profit $P_1 + P_2 + \cdots + P_R$, then the principle of optimality gives

$$f_R(p_R) = \max_{q_R} \, [P_R + f_{R-1}(p_{R-1})].$$

This sequence of equations may be started with the remark that with no process ($R = 0$) there is no profit, i.e., $f_0(p) \equiv 0$.

2.3 The Discrete Stochastic Process

The process in which the outcome of any one stage is known only statistically is also of interest, although for chemical reactor design it is not as important as the deterministic process. In this case stage $r$, operating with conditions $q_r$, transforms the state of the stream from $p_r$ to $p_{r-1}$, but only the probability distribution of $p_{r-1}$ is known. This is specified by a distribution function $G(p_{r-1}; p_r, q_r)$ such that the probability that $p_{r-1}$ lies in some region $D$ of the state space is

$$\int_D dG(p_{r-1}; p_r, q_r).$$

We cannot now speak of maximizing the value of the objective function, since this function is known only in a probabilistic sense. We can, however, maximize its expected value. For a single stage we may define

$$f_1(p_1) = \max_{q_1} \int P(p_0) \, dG(p_0; p_1, q_1),$$

where the maximization is by choice of $q_1$. We thus have an optimal policy which maximizes the expected value of the objective function for a given feed state $p_1$. If we consider a process in which the outcome of one stage is known before passage to the next, then the principle of optimality shows that the policy in subsequent stages should be optimal with respect to the outcome of the first. Then

$$f_R(p_R) = \max_{q_R} \int f_{R-1}(p_{R-1}) \, dG(p_{R-1}; p_R, q_R),$$

the maximization being over all admissible $q_R$ and the integration over the whole of the state space. The type of presentation of results used in the deterministic process may be used here, except that now the fourth column is redundant. The third column gives the optimal policy, but we must wait to see the outcome of stage $R$ before entering the preceding section of the table at the state that results. The discussion of the optimal policy when the outcome of one stage is not known before passing to the next is a very much more difficult matter.
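To suggest how this expected-value recursion might be computed, the sketch below replaces the integral against $dG$ by a finite sum, supposing the outcome of a stage to be the deterministic result perturbed by a disturbance taking three discrete values. The disturbance model is hypothetical, and T, P, p_grid, and q_grid are the invented stand-ins of the previous fragment.

```python
import numpy as np

# Hypothetical three-point disturbance: the stage delivers the
# deterministic outcome T(p, q) shifted by w, with probability pr.
outcomes = [(-0.05, 0.25), (0.0, 0.50), (+0.05, 0.25)]

def expected(f_prev, p, q):
    # the integral of f_{r-1} against dG, reduced to a finite sum
    return sum(pr * np.interp(T(p, q) + w, p_grid, f_prev)
               for w, pr in outcomes)

R = 6
# one stage: maximize the expected value of P by choice of q_1
f = np.array([max(sum(pr * P(T(p, q) + w) for w, pr in outcomes)
                  for q in q_grid) for p in p_grid])

for r in range(2, R + 1):              # the functional equation for f_r
    f_prev = f.copy()
    f = np.array([max(expected(f_prev, p, q) for q in q_grid)
                  for p in p_grid])

# f tabulates the maximum expected value of the objective function; the
# optimal q_r would be recorded as before, but the resulting state cannot
# be tabulated, since it is known only when the stage's outcome is seen.
```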
2.4 The Continuous Deterministic Process

In many cases it is not possible to divide the process into a finite number of discrete stages, since the state of the stream is transformed in a continuous manner through the process. We replace $r$, the number of the stage from the end of the process, by $t$, a continuous variable which measures the "distance" of the point considered from the end of the process. The word distance is used here in a rather general sense; it may in fact be the time that will elapse before the end of the process. If $T$ is the total "length" of the process, its feed state may be denoted by a vector $p(T)$ and the product state by $p(0)$; $p(t)$ denotes the state at any point $t$, and $q(t)$ the vector of operating variables there.