MAbEnv

Introduction to Modeling and Operation of a Continuous Monoclonal Antibody (mAb) Manufacturing Process

Drugs based on monoclonal antibodies (mAbs) play an indispensable role in the biopharmaceutical industry, both therapeutically and commercially. In therapy and diagnosis, mAbs are widely used to treat autoimmune diseases and cancer, among other conditions. According to a recent publication, mAbs also show promising results in the treatment of COVID-19. As of September 22, 2020, 94 therapeutic mAbs had been approved by the U.S. Food & Drug Administration (FDA), and the number of mAbs approved within 2010-2020 is three times more than the number approved before 2010. The mAb market is expected to reach a value of $198.2 billion in 2023. Integrated continuous manufacturing of mAbs represents the state of the art in mAb manufacturing and has attracted a lot of attention because of its steady-state operation, high volumetric productivity, and reduced equipment size and capital cost. However, there is no existing mathematical model of the integrated manufacturing process and no optimal control algorithm for the entire integrated process. This project fills these knowledge gaps by first developing a mathematical model of the integrated continuous manufacturing process of mAbs.

Process description

The mAb production process consists of an upstream and a downstream process. In the upstream process, mAb is produced in a bioreactor, which provides a conducive environment for mAb growth. The downstream process, on the other hand, recovers the mAb from the upstream process for storage. In the upstream process, fresh media is fed into the bioreactor. A cooling jacket, through which a coolant flows, is used to control the temperature of the reaction mixture. The contents exiting the bioreactor are passed through a microfiltration unit, which recovers part of the fresh media in the stream. The recovered fresh media is recycled back into the bioreactor, while the stream with a high amount of mAb is sent to the downstream process for further processing. A schematic diagram of the upstream process is shown in Fig. 1.

_images/upstream_process.png

Fig. 1 A schematic diagram of the upstream process for mAb production

The objective of the downstream process for mAb production is to purify the stream with a high concentration of mAb from the upstream process and obtain the desired product. It is composed of a set of fractionating columns, for separating mAb from impurities, and holdup loops, for virus inactivation (VI) and pH conditioning. A schematic diagram of the downstream process is shown in Fig. 2.

_images/downstream_process.png

Fig. 2 A schematic diagram of the downstream process for mAb production

Prebuilt Upstream Controllers

Two implementations of advanced process control (APC) techniques are provided for operating the upstream continuous mAb production process: Model Predictive Control (MAbUpstreamMPC) and Economic Model Predictive Control (MAbUpstreamEMPC). Here we provide a brief description of both.

Simulation settings

After extensive open-loop tests, the control and prediction horizons \(N\) for both controllers were fixed at 100. At a sampling time of 1 hour, this means the controllers plan 100 hours into the future. The weights on the deviations of the states and inputs from the setpoint were identity matrices.
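As a rough illustration of these settings, the sketch below evaluates a setpoint-tracking cost over a horizon of 100 steps with identity weights. The quadratic form of the cost, the function name tracking_cost, and the toy dimensions are assumptions made for illustration; the text above only states the horizon, the sampling time, and the identity weights.

```python
import numpy as np


def tracking_cost(states, inputs, x_sp, u_sp, Q=None, R=None):
    """Sum of weighted squared deviations from the setpoint over a horizon.

    Hypothetical helper: assumes a quadratic stage cost, which the
    documentation does not spell out.
    """
    states = np.asarray(states, dtype=float)  # shape (N, n_x)
    inputs = np.asarray(inputs, dtype=float)  # shape (N, n_u)
    # Identity weights by default, as in the simulation settings
    Q = np.eye(states.shape[1]) if Q is None else Q
    R = np.eye(inputs.shape[1]) if R is None else R
    dx = states - x_sp
    du = inputs - u_sp
    # einsum sums dx_k^T Q dx_k (and du_k^T R du_k) over every step k
    state_term = np.einsum("ki,ij,kj->", dx, Q, dx)
    input_term = np.einsum("ki,ij,kj->", du, R, du)
    return float(state_term + input_term)


N = 100            # horizon: 100 steps of 1 hour each
n_x, n_u = 3, 2    # toy dimensions, not the real model sizes
x_sp = np.ones(n_x)
u_sp = np.zeros(n_u)

states = np.tile(x_sp, (N, 1))
states[0] += 1.0   # only the first step deviates from the setpoint
inputs = np.zeros((N, n_u))

cost = tracking_cost(states, inputs, x_sp, u_sp)
```

With identity weights every state and input deviation is penalized equally, which is one reason unscaled variables of very different magnitudes can dominate the cost.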

Upstream Simulation Results

The state and input trajectories of the system under both MPC and EMPC are shown in Fig. 3 through Fig. 14. MPC and EMPC use different strategies to control the process. For example, Fig. 11 shows that EMPC initially heats up the system before gradually reducing the temperature, whereas MPC goes to the setpoint and stays there. Similarly, EMPC tries to reduce the flow of the recycle stream while MPC increases it, as can be seen in Fig. 13. Under both controllers, though, the recycle stream flow rate was kept low. Although the setpoint for the MPC was determined under the same economic cost function used in the EMPC, the EMPC does not go to that optimal steady state. This could be because the horizon is too short for EMPC. Another possibility is numerical error, since the cost function was not scaled in the EMPC. The cases where MPC was unable to reach the setpoint could be due to numerical errors resulting from the large values of the states and inputs. Further analysis may be required to confirm these assertions.

_images/Figure_1.png

Fig. 3 Trajectories of concentration of viable cells in the bioreactor and separator under the two control algorithms

_images/Figure_2.png

Fig. 4 Trajectories of total viable cells in the bioreactor and separator under the two control algorithms

_images/Figure_3.png

Fig. 5 Trajectories of glucose concentration in the bioreactor and separator under the two control algorithms

_images/Figure_4.png

Fig. 6 Trajectories of glutamine concentration in the bioreactor and separator under the two control algorithms

_images/Figure_5.png

Fig. 7 Trajectories of lactate concentration in the bioreactor and separator under the two control algorithms

_images/Figure_6.png

Fig. 8 Trajectories of ammonia concentration in the bioreactor and separator under the two control algorithms

_images/Figure_7.png

Fig. 9 Trajectories of mAb concentration in the bioreactor and separator under the two control algorithms

_images/Figure_8.png

Fig. 10 Trajectories of reaction mixture volume in the bioreactor and separator under the two control algorithms

_images/Figure_9.png

Fig. 11 Trajectories of the bioreactor temperature and the coolant temperature under the two control algorithms

_images/Figure_10.png

Fig. 12 Trajectories of flow in and out of the bioreactor under the two control algorithms

_images/Figure_11.png

Fig. 13 Trajectories of the recycle flow rate and the flow rate out of the upstream process under the two control algorithms

_images/Figure_12.png

Fig. 14 Trajectories of glucose in fresh media under the two control algorithms

Model parameters

Upstream

Table 1 Parameters for the upstream process model

| Parameter | Unit | Value |
|---|---|---|
| \(K_{d,amm}\) | \(mM\) | \(1.76\) |
| \(K_{d,gln}\) | \(hr^{-1}\) | \(0.0096\) |
| \(K_{glc}\) | \(mM\) | \(0.75\) |
| \(K_{gln}\) | \(mM\) | \(0.038\) |
| \(KI_{amm}\) | \(mM\) | \(28.48\) |
| \(KI_{lac}\) | \(mM\) | \(171.76\) |
| \(m_{glc}\) | \(mmol/(cell \cdot hr)\) | \(4.9 \times 10^{-14}\) |
| \(Q_{mAb}^{max}\) | \(mg/(cell\cdot hr)\) | \(6.59 \times 10^{-10}\) |
| \(Y_{amm,gln}\) | \(mmol/mmol\) | \(0.45\) |
| \(Y_{lac,glc}\) | \(mmol/mmol\) | \(2.0\) |
| \(Y_{X,glc}\) | \(cell/mmol\) | \(2.6 \times 10^8\) |
| \(Y_{X,gln}\) | \(cell/mmol\) | \(8.0 \times 10^8\) |
| \(\alpha_1\) | \((mM \cdot L)/(cell \cdot h)\) | \(3.4 \times 10^{-13}\) |
| \(\alpha_2\) | \(mM\) | \(4.0\) |
| \(-\Delta H\) | \(J/mol\) | \(5.0 \times 10^5\) |
| \(\rho\) | \(g/L\) | \(1560.0\) |
| \(c_p\) | \(J/(g ^\circ C)\) | \(1.244\) |
| \(U\) | \(J/(h ^\circ C)\) | \(4 \times 10^2\) |
| \(T_{in}\) | \(^\circ C\) | \(37.0\) |
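For use in scripts, the values of Table 1 can be collected in a plain dictionary. The sketch below is only a transcription of the table; the key names and the name UPSTREAM_PARAMS are illustrative and are not taken from the package. The last line is a simple unit check: multiplying the density by the specific heat gives a volumetric heat capacity in J/(L °C).

```python
# Values transcribed from Table 1; key names are illustrative only.
UPSTREAM_PARAMS = {
    "K_d_amm": 1.76,         # mM
    "K_d_gln": 0.0096,       # 1/hr
    "K_glc": 0.75,           # mM
    "K_gln": 0.038,          # mM
    "KI_amm": 28.48,         # mM
    "KI_lac": 171.76,        # mM
    "m_glc": 4.9e-14,        # mmol/(cell hr)
    "Q_mAb_max": 6.59e-10,   # mg/(cell hr)
    "Y_amm_gln": 0.45,       # mmol/mmol
    "Y_lac_glc": 2.0,        # mmol/mmol
    "Y_X_glc": 2.6e8,        # cell/mmol
    "Y_X_gln": 8.0e8,        # cell/mmol
    "alpha_1": 3.4e-13,      # (mM L)/(cell h)
    "alpha_2": 4.0,          # mM
    "minus_delta_H": 5.0e5,  # J/mol
    "rho": 1560.0,           # g/L
    "c_p": 1.244,            # J/(g degC)
    "U": 4.0e2,              # J/(h degC)
    "T_in": 37.0,            # degC
}

# Unit check: (g/L) * (J/(g degC)) = J/(L degC)
rho_cp = UPSTREAM_PARAMS["rho"] * UPSTREAM_PARAMS["c_p"]
```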

Downstream

The parameters of the downstream model are taken from the work of Gomis-Fons et al.; several parameters are modified because the process is scaled up from lab scale to industrial scale. They are summarized in Table 2.

Table 2 Parameters of digital twin of downstream

| Step | Parameter | Unit | Value |
|---|---|---|---|
| Capture | \(q_{max,1}\) | \(mg/mL\) | \(36.45\) |
| | \(k_{1}\) | \(mL/(mg~min)\) | \(0.704\) |
| | \(q_{max,2}\) | \(mg/mL\) | \(77.85\) |
| | \(k_{2}\) | \(mL/(mg~min)\) | \(2.1\cdot10^{-2}\) |
| | \(K\) | \(mL/mg\) | \(15.3\) |
| | \(D_{eff}\) | \(cm^{2}/min\) | \(7.6\cdot10^{-5}\) |
| | \(D_{ax}\) | \(cm^{2}/min\) | \(5.5\cdot10^{-1}v\) |
| | \(k_{f}\) | \(cm/min\) | \(6.7\cdot10^{-2}v^{0.58}\) |
| | \(r_{p}\) | \(cm\) | \(4.25\cdot10^{-3}\) |
| | \(L\) | \(cm\) | \(20\) |
| | \(V\) | \(mL\) | \(10^5\) |
| | \(\epsilon_c\) | \(-\) | \(0.31\) |
| | \(\epsilon_p\) | \(-\) | \(0.94\) |
| | \(q_{max,elu}\) | \(mg/mL\) | \(114.3\) |
| | \(k_{elu}\) | \(min^{-1}\) | \(0.64\) |
| | \(H_{0,elu}\) | \(M^{\beta}\) | \(2.2\cdot10^{-2}\) |
| | \(\beta_{elu}\) | \(-\) | \(0.2\) |
| Loop | \(D_{ax}\) | \(cm^{2}/min\) | \(2.9\cdot10^{2}v\) |
| | \(L\) | \(cm\) | \(600\) |
| | \(V\) | \(mL\) | \(5\cdot10^5\) |
| CEX | \(q_{max}\) | \(mg/mL\) | \(150.2\) |
| | \(k\) | \(min^{-1}\) | \(0.99\) |
| | \(H_{0}\) | \(M^{\beta}\) | \(6.9\cdot10^{-4}\) |
| | \(\beta\) | \(-\) | \(8.5\) |
| | \(D_{app}\) | \(cm^{2}/min\) | \(1.1\cdot10^{-1}v\) |
| | \(L\) | \(cm\) | \(10\) |
| | \(V\) | \(mL\) | \(5\cdot10^{4}\) |
| | \(\epsilon_{c}\) | \(-\) | \(0.34\) |
| AEX | \(D_{app}\) | \(cm^{2}/min\) | \(1.6\cdot10^{-1}v\) |
| | \(k\) | \(min^{-1}\) | \(0\) |
| | \(L\) | \(cm\) | \(10\) |
| | \(V\) | \(mL\) | \(5\cdot10^{4}\) |
| | \(\epsilon_{c}\) | \(-\) | \(0.34\) |

mAbEnv module

Following the description above, we provide the following APIs:

class smpl.envs.mabenv.MAbEnvGym(dataset_dir='smpl/configdata/mabenv', dense_reward=True, normalize=True, debug_mode=False, action_dim=9, observation_dim=1970, reward_function=None, done_calculator=None, max_observations=None, min_observations=None, max_actions=None, min_actions=None, observation_name=None, action_name=None, np_dtype=<class 'numpy.float32'>, max_steps=200, error_reward=-100.0, initial_state_deviation_ratio=0.1, upstream_states=19, switch_threshold=0.5, dt_itgr=60, dt_spl=60, ss_dir=None, standard_reward_style='setpoint')

Bases: smpl.envs.utils.smplEnvBase

done_calculator_standard(current_observation, step_count, reward, done=None, done_info=None)
Check whether the current episode is considered finished.

Returns a boolean indicating whether the episode is done, together with a dictionary of information. In done_calculator_standard, done_info looks like {“terminal”: boolean, “timeout”: boolean}, where “timeout” is true when the episode ends because the maximum episode length is reached, and “terminal” is true when “timeout” is true or the episode ends due to a termination condition, such as an environment error (that is, “terminal” is effectively the same as done).

Parameters
  • current_observation ([np.ndarray]) – This is denormalized observation, as usual.

  • step_count ([int]) – step_count.

  • reward ([float]) – reward.

  • done ([bool], optional) – Defaults to None.

  • done_info ([dict], optional) – how the environment is finished. Defaults to None.

Returns

done and done_info.

Return type

[(bool, dict)]
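The done_info convention described above can be sketched as follows. This is a simplified, hypothetical stand-in rather than the library's implementation: it takes only a step count and an error flag (env_error is an invented parameter), while the real method also receives the observation and reward. The default max_steps=200 mirrors the constructor default.

```python
def done_calculator_sketch(step_count, max_steps=200, env_error=False):
    """Return (done, done_info) following the terminal/timeout convention."""
    # "timeout": the episode reached its maximum length
    timeout = step_count >= max_steps
    # "terminal": timeout, or any other termination condition such as an error
    terminal = timeout or env_error
    done_info = {"terminal": terminal, "timeout": timeout}
    return terminal, done_info


done, done_info = done_calculator_sketch(step_count=200)
errored, errored_info = done_calculator_sketch(step_count=5, env_error=True)
```

Keeping "timeout" separate from "terminal" lets downstream code distinguish an episode that was cut off by the step limit from one that ended for another reason.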

evaluate_rewards_mean_std_over_episodes(algorithms, num_episodes=1, error_reward=None, initial_states=None, to_plt=True, plot_dir='./plt_results', computer_on_episodes=False)

Returns the mean and std of rewards over all episodes. Because rewards_list is not aligned (e.g. some trajectories are shorter than others), it cannot be converted directly to a numpy array; the nested list has to be converted and unwrapped. If computer_on_episodes is set, the rewards_list is first averaged over episodes and then the mean and std are computed. Otherwise, the mean and std are computed directly for each step.
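The two aggregation modes can be illustrated with a ragged list of per-episode rewards. The sketch below is one reading of the description, not the library's code: in the per-episode mode each episode is reduced to its average reward first, and in the other mode the nested list is flattened and statistics are taken over all step rewards. The function name and the flag spelling compute_on_episodes are hypothetical.

```python
import numpy as np


def rewards_mean_std(rewards_list, compute_on_episodes=False):
    """Mean and std of rewards for episodes of unequal length."""
    if compute_on_episodes:
        # One value per episode: the average reward of that episode
        values = np.array([np.mean(episode) for episode in rewards_list])
    else:
        # Flatten the ragged nested list into a single array of step rewards
        values = np.array(
            [r for episode in rewards_list for r in episode], dtype=float
        )
    return float(values.mean()), float(values.std())


# Two episodes of different length, so a direct np.array(...) would be ragged
rewards_list = [[1.0, 2.0, 3.0], [4.0]]
per_step = rewards_mean_std(rewards_list)
per_episode = rewards_mean_std(rewards_list, compute_on_episodes=True)
```

The two modes weight the data differently: flattening gives long episodes more influence, while averaging per episode first gives every episode equal weight.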

evalute_algorithms(algorithms, num_episodes=1, error_reward=None, initial_states=None, to_plt=True, plot_dir='./plt_results')

When executing evalute_algorithms, self.normalize should be False.

Parameters
  • algorithms – list of (algorithm, algorithm_name, normalize). Each algorithm must have a method predict(observation) -> action: np.ndarray.

  • num_episodes – number of episodes to run.

  • error_reward – overwrites self.error_reward.

  • initial_states – None, the location of a numpy file of initial states, or a (numpy) list of initial states.

  • to_plt – whether to generate plots.

  • plot_dir – None or the directory in which to save plots.

Returns

list of average_rewards over each episode and num of episodes.
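The algorithm interface expected here is small: any object with a predict(observation) method that returns an action array. The sketch below builds a trivial constant-action baseline and the (algorithm, algorithm_name, normalize) list. ConstantAlgorithm and the name "constant_zero" are hypothetical; the dimensions 9 and 1970 are the constructor defaults for action_dim and observation_dim.

```python
import numpy as np


class ConstantAlgorithm:
    """Hypothetical baseline that always returns the same action."""

    def __init__(self, action):
        self.action = np.asarray(action, dtype=np.float32)

    def predict(self, observation):
        # The observation is ignored here; a real controller would use it
        return self.action.copy()


baseline = ConstantAlgorithm(np.zeros(9))

# (algorithm, algorithm_name, normalize) tuples, as described above
algorithms = [(baseline, "constant_zero", False)]

action = baseline.predict(np.zeros(1970, dtype=np.float32))

# With the package installed, the list would be passed along the lines of:
# env.evalute_algorithms(algorithms, num_episodes=1, to_plt=False)
```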

evenly_spread_initial_states(val_per_state, dump_location=None)

Evenly spread initial states. This function is needed only if the environment has steady_observations.

Parameters

val_per_state (int) – how many values to sample per state.

Returns: [initial_states]: evenly spread initial_states.
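One plausible way to spread initial states evenly is a regular grid with val_per_state points along each state dimension. The sketch below assumes that interpretation; the function name, the explicit low and high bounds, and the use of a full Cartesian grid are assumptions, since the documentation does not say how the spreading is done.

```python
import itertools

import numpy as np


def evenly_spread(low, high, val_per_state):
    """Regular grid of initial states between per-state bounds."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # One evenly spaced axis per state dimension
    axes = [np.linspace(lo, hi, val_per_state) for lo, hi in zip(low, high)]
    # The Cartesian product of the axes enumerates every grid point
    return np.array(list(itertools.product(*axes)))


grid = evenly_spread(low=[0.0, 10.0], high=[1.0, 20.0], val_per_state=3)
```

A full grid has val_per_state ** n_states points, so it is only practical for a small number of states or a small val_per_state.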

observation_beyond_box(observation)

Check whether the observation lies outside the observation box, which is undesirable.

Parameters

observation ([np.ndarray]) – This is denormalized observation, as usual.

Returns

observation is beyond the box or not.

Return type

[bool]
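A box check of this kind amounts to an element-wise comparison against lower and upper bounds. The sketch below is a minimal stand-alone version; the function name is hypothetical, the bounds are passed explicitly rather than read from the environment, and treating values exactly on the boundary as inside the box is an assumption.

```python
import numpy as np


def beyond_box(observation, min_observations, max_observations):
    """True if any component of the observation leaves [min, max]."""
    observation = np.asarray(observation, dtype=float)
    below = observation < min_observations
    above = observation > max_observations
    # A single out-of-range component is enough to flag the observation
    return bool(np.any(below | above))


low = np.array([0.0, 0.0])
high = np.array([1.0, 5.0])

inside = beyond_box([0.5, 2.0], low, high)
outside = beyond_box([0.5, 6.0], low, high)
```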

reset(initial_state=None, normalize=None)

Required by gym. This function resets the environment and returns an initial observation.

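reward_function_standard(previous_observation, action, current_observation, reward=None)

Computes the reward for one transition from the previous observation, the action taken, and the current observation (the s, a, r, s, a calculation).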

Parameters
  • previous_observation ([np.ndarray]) – This is denormalized observation, as usual.

  • action ([np.ndarray]) – This is denormalized action, as usual.

  • current_observation ([np.ndarray]) – This is denormalized observation, as usual.

  • reward ([float]) – If reward is provided, directly return the reward.

Returns

reward.

Return type

[float]
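The constructor default standard_reward_style='setpoint' suggests a reward based on distance from a setpoint, but the exact formula is not given here. The sketch below therefore uses a negative squared deviation purely as an assumed example; the function name and the setpoint argument are hypothetical. It also shows the pass-through behaviour described above, where a supplied reward is returned directly.

```python
import numpy as np


def setpoint_reward(current_observation, setpoint, reward=None):
    """Assumed setpoint-style reward: negative squared deviation."""
    if reward is not None:
        # If a reward is provided, return it directly
        return float(reward)
    deviation = np.asarray(current_observation, dtype=float) - setpoint
    return -float(np.sum(deviation ** 2))


setpoint = np.array([1.0, 2.0])
at_setpoint = setpoint_reward([1.0, 2.0], setpoint)
off_setpoint = setpoint_reward([2.0, 2.0], setpoint)
passed_through = setpoint_reward([2.0, 2.0], setpoint, reward=5)
```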

sample_initial_state(lower_bound=None, upper_bound=None)

Samples an initial state, with the bounds expressed as proportions of the steady state.

Parameters
  • lower_bound (float, optional) – proportional to steady state.

  • upper_bound (float, optional) – proportional to steady state.

Returns

The sampled initial state.

Return type

[np.ndarray]
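Since the bounds are proportional to the steady state, sampling can be pictured as scaling the steady state by a random ratio per component. The sketch below assumes uniform sampling; the function name, the explicit steady_state argument, and the default bounds of 0.9 and 1.1 (chosen to echo initial_state_deviation_ratio=0.1) are assumptions, not the library's documented behaviour.

```python
import numpy as np


def sample_initial_state_sketch(steady_state, lower_bound=0.9,
                                upper_bound=1.1, rng=None):
    """Scale each steady-state component by a uniform random ratio."""
    rng = np.random.default_rng() if rng is None else rng
    steady_state = np.asarray(steady_state, dtype=float)
    # One independent ratio per state component
    ratios = rng.uniform(lower_bound, upper_bound, size=steady_state.shape)
    return steady_state * ratios


state = sample_initial_state_sketch(
    [10.0, 200.0], rng=np.random.default_rng(0)
)
```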

step(action, normalize=None)

Required by gym. This function performs one step within the environment and returns the observation, the reward, whether the episode is finished, and debug information, if any.
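The reset and step methods combine into the usual gym-style interaction loop. To keep the example runnable without the package, the sketch below uses a small stand-in environment; StubEnv, its dynamics, and the rollout helper are hypothetical and only mimic the call shapes described above (reset returns an observation, step returns observation, reward, done flag, and info).

```python
import numpy as np


class StubEnv:
    """Stand-in with gym-style reset/step; not the real MAbEnvGym."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(3, dtype=np.float32)

    def step(self, action):
        self.t += 1
        observation = np.full(3, self.t, dtype=np.float32)
        reward = -float(np.sum(np.square(action)))
        done = self.t >= self.max_steps
        return observation, reward, done, {}


def rollout(env, policy):
    """Run one episode and return the total reward and step count."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


total_reward, steps = rollout(StubEnv(), lambda obs: np.ones(2))
```

In principle the same rollout function could be given a MAbEnvGym instance and a controller's predict method in place of the stub and the lambda.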

class smpl.envs.mabenv.MAbUpstreamEMPC(controller, action_dim=9, observation_dim=1970)

Bases: object

predict(o)

class smpl.envs.mabenv.MAbUpstreamMPC(controller, action_dim=9, observation_dim=1970)

Bases: object

predict(o)