Dynamic programming with Java

Introduction

Dynamic programming is a very powerful tool  for optimization that not all developers have encountered.

So what is dynamic programming? Dynamic programming is a method for solving complex problems by dividing them into simpler subproblems and storing the solution to each subproblem so it can be looked up later. There is another type of algorithm, the ”greedy” algorithm, that also solves problem by dividing them into to subproblem but the difference is that for greedy algorithms you only use the result of the last subproblem when calculating the result for the following problem. The greedy approach will not work for more complicated problems.

Example: Using dynamic programming to attack the death star

Suppose you are the commander of the alliance fleet and you want to attack the death star. Rebel spies have gathered information about potential targets on the death star and calculated the amount of damage that would be caused by attacking these. The problem is how to deal with the protective shield surrounding the death star and the imperial star fleet. Now the alliance fleet has a brand new star ship that is able to penetrate the shield but this still leaves the problem with the imperial fleet, what to do? You come up with the idea that the rest of the rebel fleet should make a diversion by attacking an imperial stronghold on a nearby planet. This will create a window of opportunity for say 10 minutes. So the goal is to make as much possible damage to the death star given these 10 minutes.

You have the following available missions:

 Mission nr Estimated time (minutes) Estimated damage on death star (%) 1 2 4 2 4 10 3 6 12 4 7 14

Let T be the total time (10 minutes).
So we have the following inputs:

• Missions m1,….,m4
• Each mission takes time t1,…., t4
• Each mission causes damage d1,…., d4
• Total time T = 10 minutes

What missions should we choose given the timeframe of 10 minutes to make the most damage?

Initial solution

To make things easier we will initially assume that the missions are ”repeatable” that is it is possible to execute the same mission several times.

A naive and ”greedy” approach would be to always choose the mission that has makes the most damage and is possible to execute given the time left. This would result in the following:

 7 m 2 1 m m4 m1

This results in a total of 18% damage. It is obvious that a better solution would be:

 4 m 4 m 2 m m2 m2 m1

This results in a total of 24% damage.

Now for this simple problem with few items (missions) and a few discrete values (amount of damage) it is pretty easy to find the optimal solution by visual inspection but when the number of items increases it will soon be impossible.

It is obvious that in order to find an optimal solution you must investigate different combinations of items (missions). One way of doing it would be to compare the value for all possible combinations of missions for a given time limit. Say that we have 10 items and that each item can be position in 1 of 20 locations that means 10^20 combinations = it will take more than a lifetime to calculate so we need a better solution. Dynamic programming to rescue!

We will try to find a solution to the problem by dividing it into subproblems.

Lets say we have an optimal solution for total time T and it contains missions m1, mi, mn with their corresponding times of t1, ti, tn:

 t1 ti tn m1 mi mn

Now if we remove mission mi with time ti, clearly what is left is an optimal solution for time T – ti. This implies that if we want to calculate maximal damage for a given time, damage(t), then we can do this by solving damage(t – ti) + di. Here ”i” can denote any mission with time ti and damage di but we dont know which one so we must try all of them.

Lets define the subproblems more formally:
D(t) is the maximal damage for that can be achieved given a timeframe of t.
Then D(t) can be defined using the subproblem: max { D(t – ti) + di}. Where ti <= t

So in order to find maximal D(t) the maximal damage for t – ti must already be known. We will therefore start with the simplest case e.g D(t0) = 0 and then move forward, minute by minute, until we reach t.

For a given timelimit t we will try to add all possible missions with time less or equal to the total available time and then choose the mission that result in greatest damage in total. To calculate the optimal total damage for a given time we need to know the optimal total damage for all cases with less time so that we can compare the net effect for adding a given mission.

An optimal vector for t0,…,t10 will be:

 t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 0% 0% 4% 4% 10% 10% 14% 14% 20% 20% 24%

How do we reach this result?

• For t0, t1 the maximal damage will be 0% since there are no missions that are less than 2 minutes
• For t2, t3 the maximal damage will be 4% since the only mission that can be executed is m1
• For t4 things get a little more interesting. We can now choose to execute m1 or m2. To calculate which one will result in most damage for the total solution we will compare these two options. To estimate the total damage for using m1 we will look at the optimal damage for t4-2 = 4% and then add the damage for m1 (4%) which implies a total damage of 8%. To estimate the total damage for using m2 we will look at the optimal damage for t4-4 (0%) and then add the damage for m2 (10%) which implies a total damage of 10%. To get maximal damage we therefore choose m2.

The following values are calculated in a similar way and note how we look up the results of previous subproblems two make the right decision for a given point in time.

The optimal damage given time = 10 minutes will be the value for t10 which is 24% but what missions were involved for reaching this value?

To find the missions we need to backtrack which subproblems were used to arrive at the value of 24% for t10.

• For t10 we added mission m2 that has time t2 = 4 minutes
• For t10 – 4 we also added mission m2 that has time t2 = 4 minutes
• For t10 – 4 – 4 we added mission m1 that has time t1 = 2 minutes
 t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 0% 0% 4% 4% 10% 10% 14% 14% 20% 20% 24%

To summarize, the solution included:

• m2 (d2 = 10%, t2 = 4 minutes)
• m2 (d2 = 10%, t2 = 4 minutes)
• m1 (d1 = 4%, t1 = 2 minutes)
• Total damage 24%, total time = 10 minutes

Below is function in Java that can be used to calculate a vector of damage for a given total time. The function returns a matrix where the first column contains the damage in procent and the second column is a pointer to the mission that was used chosen for the current time.

int[][] optimalForRepeatableMissions(int totalTime, int[] missionTimes, int[] missionDamages) {
int totalNrOfAvailableMissions = missionTimes.length;
int damages[][] = new int[totalTime + 1]; // add one extra dimension for storing index for used mission

for (int currentTotalTime = 1; currentTotalTime <= totalTime; currentTotalTime++) {
damages[currentTotalTime] = -1;
for (int missionNr = 1; missionNr <= totalNrOfAvailableMissions; missionNr++) {
int missionTime = missionTimes[missionNr - 1];
int damage = missionDamages[missionNr - 1];
if (missionTime <= currentTotalTime) {
int damageForAddingCurrentMission = damages[currentTotalTime - missionTime] + damage; if (damages[currentTotalTime] < damageForAddingCurrentMission) { damages[currentTotalTime] = damageForAddingCurrentMission; damages[currentTotalTime] = missionNr; } } } } return damages; }

To print the maximal damage for the previous example it can be invoked like this:

int[] missionTimes = new int[] { 2, 4, 6, 7 };
int[] missionDamages = new int[] { 4, 10, 12, 14 };
int[][] damages = optimalForRepeatableMissions(10, missionTimes, missionDamages);
System.out.println(damages[damages.length - 1]);

The expected running time for calculating maximal damage is O(Tn) where n is the number of available missions and T is total available time.

Final solution

Now consider a more "realistic" situation where the missions are not repeatable (its only possible to blow up a target once...) what changes must we do to our initial solution?

Previous we defined a subproblem as:

max { D(t - ti) + di}. Where ti <= t

But in the statement above we assumed that mission mi with time ti and damage di could be any mission. This is no longer true since a given mission can only be used once so we can only consider missions that have not been executed. We need to add an additional variable to indicate what missions are being used.

Therefore will use the following definitions:
D(t, i) is the maximal damage that can be achieved given a timeframe of t using missions 1,...,i
Then D(t, i) can be defined using the subproblem: max { D(t - ti, i - 1) + di, D(t, i - 1)}

D(t - ti, i - 1) will be applied when mission i is needed to achieve the optimal solution and D(t, i - 1) will be applied when i is not needed to achieve the optimal solution.

Instead of a vector the result will be a matrix with time on one axis and number of available missions on the other. We will calculate the maximum damage progressively looking up the damage for previous cases.

 t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 m0 0 0 0 0 0 0 0 0 0 0 0 m1 0 0 4 4 4 4 4 4 4 4 4 m2 0 0 4 4 10 10 14 14 14 14 14 m3 0 0 4 4 10 10 14 14 16 16 22 m4 0 0 4 4 10 10 14 14 16 18 22

We find that using non repeatable missions leads to a total damage of 22% for time = 10 minutes.

We can find the path taken through the matrix (ie what missions were used) by backtracking from D(t10, m4).

• We see that moving from D(t10, m4) to D(t10, m3) does not imply any change in damage, this means that m4 was not included in the solution.
• We see that moving from D(t10, m3) to D(t10, m2) implies a change in damage, this means that m3 was included in the solution.
• m3 has a mission time of 6 minutes which means that we will move back 6 minutes to find previous solution so we end up on D(t4, m2)
• Using the same reasoning again we can conlude that m2 was included in the solution.
• m2 has a mission time of 4 minutes which leads us back to time = 0 and we are finished

To summarize, the final solution included:

• m3 (d3 = 12%, t3 = 6 minutes)
• m2 (d2 = 10%, t2 = 4 minutes)
• Total damage 22%, total time = 10 minutes

Below is the function in Java that is used to calculate the matrix above.

int[][] optimalForNonRepeatableMissions(final int totalTime, final int[] missionTimes, final int[] missionDamages) {
int totAvailableMissions = missionTimes.length;
int damages[][] = new int[totalTime + 1][totAvailableMissions + 1];
for (int missionNr = 1; missionNr <= totAvailableMissions; missionNr++) {
int missionTime = missionTimes[missionNr - 1];
int damage = missionDamages[missionNr - 1];
for (int currentTotTime = 1; currentTotTime <= totalTime; currentTotTime++) {
damages[currentTotTime][missionNr] = damages[currentTotTime][missionNr - 1];
if (missionTime <= currentTotTime) {
int damageAddingCurrentMission = damages[currentTotTime - missionTime][missionNr - 1] + damage;
if (damages[currentTotTime][missionNr] < damageAddingCurrentMission) {
}
}
}
}
return damages;
}

The complete java code example can be downloaded from GitHub

May the force be with you... Java specialist at CAG Contactor