Introduction
Dynamic programming is a very powerful tool for optimization that not all developers have encountered.
So what is dynamic programming? Dynamic programming is a method for solving complex problems by dividing them into simpler subproblems and storing the solution to each subproblem so it can be looked up later. There is another type of algorithm, the ”greedy” algorithm, that also solves problem by dividing them into to subproblem but the difference is that for greedy algorithms you only use the result of the last subproblem when calculating the result for the following problem. The greedy approach will not work for more complicated problems.
Example: Using dynamic programming to attack the death star
Suppose you are the commander of the alliance fleet and you want to attack the death star. Rebel spies have gathered information about potential targets on the death star and calculated the amount of damage that would be caused by attacking these. The problem is how to deal with the protective shield surrounding the death star and the imperial star fleet. Now the alliance fleet has a brand new star ship that is able to penetrate the shield but this still leaves the problem with the imperial fleet, what to do? You come up with the idea that the rest of the rebel fleet should make a diversion by attacking an imperial stronghold on a nearby planet. This will create a window of opportunity for say 10 minutes. So the goal is to make as much possible damage to the death star given these 10 minutes.
You have the following available missions:
Mission nr | Estimated time (minutes) | Estimated damage on death star (%) |
1 | 2 | 4 |
2 | 4 | 10 |
3 | 6 | 12 |
4 | 7 | 14 |
Let T be the total time (10 minutes).
So we have the following inputs:
- Missions m_{1},….,m_{4}
- Each mission takes time t_{1},…., t_{4}
- Each mission causes damage d_{1},…., d_{4}
- Total time T = 10 minutes
What missions should we choose given the timeframe of 10 minutes to make the most damage?
Initial solution
To make things easier we will initially assume that the missions are ”repeatable” that is it is possible to execute the same mission several times.
A naive and ”greedy” approach would be to always choose the mission that has makes the most damage and is possible to execute given the time left. This would result in the following:
7 m | 2 | 1 m |
m_{4} | m_{1} |
This results in a total of 18% damage. It is obvious that a better solution would be:
4 m | 4 m | 2 m |
m_{2} | m_{2} | m_{1} |
This results in a total of 24% damage.
Now for this simple problem with few items (missions) and a few discrete values (amount of damage) it is pretty easy to find the optimal solution by visual inspection but when the number of items increases it will soon be impossible.
It is obvious that in order to find an optimal solution you must investigate different combinations of items (missions). One way of doing it would be to compare the value for all possible combinations of missions for a given time limit. Say that we have 10 items and that each item can be position in 1 of 20 locations that means 10^20 combinations = it will take more than a lifetime to calculate so we need a better solution. Dynamic programming to rescue!
We will try to find a solution to the problem by dividing it into subproblems.
Lets say we have an optimal solution for total time T and it contains missions m_{1}, m_{i}, m_{n} with their corresponding times of t_{1}, t_{i}, t_{n}:
t_{1} | t_{i} | t_{n} |
m_{1} | m_{i} | m_{n} |
Now if we remove mission m_{i} with time t_{i}, clearly what is left is an optimal solution for time T – t_{i}. This implies that if we want to calculate maximal damage for a given time, damage(t), then we can do this by solving damage(t – t_{i}) + d_{i}. Here ”i” can denote any mission with time t_{i} and damage d_{i} but we dont know which one so we must try all of them.
Lets define the subproblems more formally:
D(t) is the maximal damage for that can be achieved given a timeframe of t.
Then D(t) can be defined using the subproblem: max { D(t – t_{i}) + d_{i}}. Where t_{i} <= t
So in order to find maximal D(t) the maximal damage for t – t_{i} must already be known. We will therefore start with the simplest case e.g D(t_{0}) = 0 and then move forward, minute by minute, until we reach t.
For a given timelimit t we will try to add all possible missions with time less or equal to the total available time and then choose the mission that result in greatest damage in total. To calculate the optimal total damage for a given time we need to know the optimal total damage for all cases with less time so that we can compare the net effect for adding a given mission.
An optimal vector for t_{0},…,t_{10} will be:
t_{0} | t_{1} | t_{2} | t_{3} | t_{4} | t_{5} | t_{6} | t_{7} | t_{8} | t_{9} | t_{10} |
0% | 0% | 4% | 4% | 10% | 10% | 14% | 14% | 20% | 20% | 24% |
How do we reach this result?
- For t_{0}, t_{1} the maximal damage will be 0% since there are no missions that are less than 2 minutes
- For t_{2}, t_{3} the maximal damage will be 4% since the only mission that can be executed is m_{1}
- For t_{4} things get a little more interesting. We can now choose to execute m_{1} or m_{2}. To calculate which one will result in most damage for the total solution we will compare these two options. To estimate the total damage for using m_{1} we will look at the optimal damage for t_{4-2} = 4% and then add the damage for m_{1} (4%) which implies a total damage of 8%. To estimate the total damage for using m_{2} we will look at the optimal damage for t_{4-4} (0%) and then add the damage for m_{2} (10%) which implies a total damage of 10%. To get maximal damage we therefore choose m_{2}.
The following values are calculated in a similar way and note how we look up the results of previous subproblems two make the right decision for a given point in time.
The optimal damage given time = 10 minutes will be the value for t_{10} which is 24% but what missions were involved for reaching this value?
To find the missions we need to backtrack which subproblems were used to arrive at the value of 24% for t_{10}.
- For t_{10} we added mission m_{2} that has time t_{2} = 4 minutes
- For t_{10 – 4} we also added mission m_{2} that has time t_{2} = 4 minutes
- For t_{10 – 4 – 4} we added mission m_{1} that has time t_{1} = 2 minutes
t_{0} | t_{1} | t_{2} | t_{3} | t_{4} | t_{5} | t_{6} | t_{7} | t_{8} | t_{9} | t_{10} |
0% | 0% | 4% | 4% | 10% | 10% | 14% | 14% | 20% | 20% | 24% |
To summarize, the solution included:
- m_{2} (d_{2} = 10%, t_{2} = 4 minutes)
- m_{2} (d_{2} = 10%, t_{2} = 4 minutes)
- m_{1} (d_{1} = 4%, t_{1} = 2 minutes)
- Total damage 24%, total time = 10 minutes
Below is function in Java that can be used to calculate a vector of damage for a given total time. The function returns a matrix where the first column contains the damage in procent and the second column is a pointer to the mission that was used chosen for the current time.
int[][] optimalForRepeatableMissions(int totalTime, int[] missionTimes, int[] missionDamages) {
int totalNrOfAvailableMissions = missionTimes.length;
int damages[][] = new int[totalTime + 1][2]; // add one extra dimension for storing index for used mission
for (int currentTotalTime = 1; currentTotalTime <= totalTime; currentTotalTime++) {
damages[currentTotalTime][1] = -1;
for (int missionNr = 1; missionNr <= totalNrOfAvailableMissions; missionNr++) {
int missionTime = missionTimes[missionNr - 1];
int damage = missionDamages[missionNr - 1];
if (missionTime <= currentTotalTime) {
int damageForAddingCurrentMission = damages[currentTotalTime - missionTime][0] + damage;
if (damages[currentTotalTime][0] < damageForAddingCurrentMission) {
damages[currentTotalTime][0] = damageForAddingCurrentMission;
damages[currentTotalTime][1] = missionNr;
}
}
}
}
return damages;
}
To print the maximal damage for the previous example it can be invoked like this:
int[] missionTimes = new int[] { 2, 4, 6, 7 };
int[] missionDamages = new int[] { 4, 10, 12, 14 };
int[][] damages = optimalForRepeatableMissions(10, missionTimes, missionDamages);
System.out.println(damages[damages.length - 1][0]);
The expected running time for calculating maximal damage is O(Tn) where n is the number of available missions and T is total available time.
Final solution
Now consider a more "realistic" situation where the missions are not repeatable (its only possible to blow up a target once...) what changes must we do to our initial solution?
Previous we defined a subproblem as:
max { D(t - t_{i}) + d_{i}}. Where t_{i} <= t
But in the statement above we assumed that mission m_{i} with time t_{i} and damage d_{i} could be any mission. This is no longer true since a given mission can only be used once so we can only consider missions that have not been executed. We need to add an additional variable to indicate what missions are being used.
Therefore will use the following definitions:
D(t, i) is the maximal damage that can be achieved given a timeframe of t using missions 1,...,i
Then D(t, i) can be defined using the subproblem: max { D(t - t_{i}, i - 1) + d_{i}, D(t, i - 1)}
D(t - t_{i}, i - 1) will be applied when mission i is needed to achieve the optimal solution and D(t, i - 1) will be applied when i is not needed to achieve the optimal solution.
Instead of a vector the result will be a matrix with time on one axis and number of available missions on the other. We will calculate the maximum damage progressively looking up the damage for previous cases.
t_{0} | t_{1} | t_{2} | t_{3} | t_{4} | t_{5} | t_{6} | t_{7} | t_{8} | t_{9} | t_{10} | |
m_{0} | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
m_{1} | 0 | 0 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
m_{2} | 0 | 0 | 4 | 4 | 10 | 10 | 14 | 14 | 14 | 14 | 14 |
m_{3} | 0 | 0 | 4 | 4 | 10 | 10 | 14 | 14 | 16 | 16 | 22 |
m_{4} | 0 | 0 | 4 | 4 | 10 | 10 | 14 | 14 | 16 | 18 | 22 |
We find that using non repeatable missions leads to a total damage of 22% for time = 10 minutes.
We can find the path taken through the matrix (ie what missions were used) by backtracking from D(t_{10}, m_{4}).
- We see that moving from D(t_{10}, m_{4}) to D(t_{10}, m_{3}) does not imply any change in damage, this means that m_{4} was not included in the solution.
- We see that moving from D(t_{10}, m_{3}) to D(t_{10}, m_{2}) implies a change in damage, this means that m_{3} was included in the solution.
- m_{3} has a mission time of 6 minutes which means that we will move back 6 minutes to find previous solution so we end up on D(t_{4}, m_{2})
- Using the same reasoning again we can conlude that m_{2} was included in the solution.
- m_{2} has a mission time of 4 minutes which leads us back to time = 0 and we are finished
To summarize, the final solution included:
- m_{3} (d_{3} = 12%, t_{3} = 6 minutes)
- m_{2} (d_{2} = 10%, t_{2} = 4 minutes)
- Total damage 22%, total time = 10 minutes
Below is the function in Java that is used to calculate the matrix above.
int[][] optimalForNonRepeatableMissions(final int totalTime, final int[] missionTimes, final int[] missionDamages) {
int totAvailableMissions = missionTimes.length;
int damages[][] = new int[totalTime + 1][totAvailableMissions + 1];
for (int missionNr = 1; missionNr <= totAvailableMissions; missionNr++) {
int missionTime = missionTimes[missionNr - 1];
int damage = missionDamages[missionNr - 1];
for (int currentTotTime = 1; currentTotTime <= totalTime; currentTotTime++) {
damages[currentTotTime][missionNr] = damages[currentTotTime][missionNr - 1];
if (missionTime <= currentTotTime) {
int damageAddingCurrentMission = damages[currentTotTime - missionTime][missionNr - 1] + damage;
if (damages[currentTotTime][missionNr] < damageAddingCurrentMission) {
damages[currentTotTime][missionNr] = damageAddingCurrentMission;
}
}
}
}
return damages;
}
The complete java code example can be downloaded from GitHub
May the force be with you...
Kommentarer