Featured Post

Trie implementation in C

Longest common subsequence in Java

Longest Common Subsequence (LCS)

  • It's the longest sequence of characters that is present in the same relative order within two or more strings.
  • It doesn't have to be a contiguous substring; the characters can appear in different positions within the original strings.

Understanding the Problem

  • Imagine two strings, written vertically on separate columns.
  • The goal is to find the longest sequence of characters that appears in the same order within both strings, even if they're not consecutive.

Character Grid

  • Create a grid where each row represents a character from the first string, and each column represents a character from the second string.
  • Fill each cell with a color indicating whether the corresponding characters match (e.g., green) or not (e.g., white).

Tracing Paths

  • Start at the top-left corner of the grid.
  • Move diagonally down-right only when characters match (green cells).
  • If characters don't match, move either down or right to continue searching for matches.
  • The path you take represents the LCS.

Visualize the LCS

  • Highlight the cells along the longest path with a bolder color or border.
  • The characters in those cells, read in order, form the LCS.

Example:

Consider strings "AGGTAB" and "GXTXAYB":

LCS: "GTAB" (highlighted cells)

Algorithm

  1. Dynamic Programming Table:

    • Create a 2D table dp with dimensions (m+1) x (n+1), where m and n are the lengths of the two strings.
    • Fill the first row and column with 0s, indicating no LCS for empty strings.
  2. Filling the Table:

    • Iterate through the table, starting from dp[1][1].
    • For each cell dp[i][j]:
      • If the characters at indices i-1 and j-1 in the strings match:
        • Set dp[i][j] = dp[i-1][j-1] + 1 (add 1 to the LCS length of the previous substrings).
      • Else:
        • Set dp[i][j] = max(dp[i-1][j], dp[i][j-1]) (take the maximum LCS length from the previous options).
  3. Reconstructing the LCS:

    • Start from the bottom-right corner of the table (dp[m][n]).
    • Backtrack:
      • If the characters at indices i-1 and j-1 match, add the character to the LCS and move diagonally up (i--, j--).
      • Else, move to the cell with the maximum value (either i-- or j--).
    • Reverse the constructed LCS string to get the correct order.
 


public class LCS {

    public static String lcs(String str1, String str2) {
        int m = str1.length();
        int n = str2.length();

        int[][] dp = new int[m + 1][n + 1];

        // Build the table in a bottom-up manner
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        // Reconstruct the LCS
        StringBuilder lcs = new StringBuilder();
        int i = m, j = n;
        while (i > 0 && j > 0) {
            if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                lcs.append(str1.charAt(i - 1));
                i--;
                j--;
            } else if (dp[i - 1][j] > dp[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }

        // Reverse the LCS string
        return lcs.reverse().toString();
    }

    // Example usage for testing
    public static void main(String[] args) {
        System.out.println(lcs("ABCDEFG", "BCDGK"));
        System.out.println(lcs("AGGTAB", "GXTXAYB"));
        System.out.println(lcs("ABCDE", "A"));
        System.out.println(lcs("A", "BCDEF"));
    }
}



Explanation for code

1. lcs() Method:

  • Takes two strings as input.
  • Uses dynamic programming to find the longest common subsequence (LCS).
  • Returns the LCS string.

Steps

  • Create a DP Table: Stores the lengths of LCSs for substrings of different lengths.
  • Fill the DP Table:
    • Iterates through characters of both strings.
    • For each cell, calculates the length of the LCS based on previous calculations:
      • If characters match, add 1 to the LCS length of the previous substrings.
      • If not, take the maximum LCS length from the previous options.
  • Reconstruct the LCS: Backtracks through the DP table to find the characters in the LCS.
  • Reverse the LCS: Since it's built backward, reverse it before returning.

Take away points

  • Bottom-up approach: Builds the solution from smaller subproblems to the final LCS.
  • Dynamic programming principle: Avoids redundant calculations by storing intermediate results.
  • Time complexity: O(mn), where m and n are string lengths.
  • Space complexity: O(mn), due to the DP table.

Comments