Check If Two Strings Match Where One String Contains Wildcard Characters In C++ Program
Matching strings with wildcards is a common problem in computer science, used in file system searches, text editors, and database queries. This article explores how to implement a wildcard matching algorithm in C++, focusing on efficiency and clarity. You will learn two distinct approaches to solve this problem, handling both single-character and multi-character wildcards.
Problem Statement
The challenge is to determine if a given text string s matches a pattern string p that can contain wildcard characters. Specifically, the pattern p can include:
?: Matches any single character.*: Matches any sequence of characters (including an empty sequence).
This problem often arises in scenarios where flexible string comparison is needed, such as filtering filenames, searching logs, or validating user input against a templated format.
Example
Consider the text s = "adceb" and pattern p = "*a*b". The desired output is true because the * can match "adce" before 'a' and "e" before 'b'.
Background & Knowledge Prerequisites
To understand the solutions presented, a basic grasp of the following C++ concepts is beneficial:
- Strings: Manipulating
std::stringobjects. - Recursion: Understanding how functions call themselves.
- Dynamic Programming (DP): Familiarity with memoization or tabulation to optimize recursive solutions.
- Pointers or Indices: Tracking current positions in strings.
No specific libraries beyond and are required for the core logic.
Use Cases or Case Studies
Wildcard matching finds application in various domains:
- File Search Utilities: When you type
*.txtto find all text files, a wildcard matching algorithm is at work. - Text Editors: Features like "find with wildcards" allow users to search for patterns beyond exact substrings.
- Database Query Optimization: Some SQL
LIKEclauses utilize similar logic for pattern matching. - Network Packet Filtering: Defining rules based on source/destination addresses or port numbers that might contain wildcards.
- Configuration File Parsing: Matching specific lines or parameters that follow a general pattern.
Solution Approaches
We will explore two common approaches: recursive backtracking and dynamic programming.
Approach 1: Recursive Backtracking
This approach directly translates the problem's rules into recursive calls. It's intuitive but can be inefficient due to repeated computations for the same subproblems.
- One-line summary: Recursively checks all possible matches by handling characters,
?, and*case by case, backtracking when a path fails.
// Wildcard Matching Recursive
#include <iostream>
#include <string>
#include <vector>
// Function to check if string s matches pattern p using recursion
bool isMatchRecursive(const std::string& s, const std::string& p, int s_idx, int p_idx) {
// Base Case 1: If both pattern and string are exhausted, it's a match.
if (p_idx == p.length()) {
return s_idx == s.length();
}
// Base Case 2: If string is exhausted but pattern still has characters
// Only a match if remaining pattern consists solely of '*'
if (s_idx == s.length()) {
for (int i = p_idx; i < p.length(); ++i) {
if (p[i] != '*') {
return false;
}
}
return true;
}
// Case 1: Current pattern character is '*'
if (p[p_idx] == '*') {
// Option A: '*' matches an empty sequence (advance pattern)
// Option B: '*' matches the current character in s (advance string)
return isMatchRecursive(s, p, s_idx, p_idx + 1) ||
isMatchRecursive(s, p, s_idx + 1, p_idx);
}
// Case 2: Current pattern character is '?' or matches current string character
else if (p[p_idx] == '?' || s[s_idx] == p[p_idx]) {
return isMatchRecursive(s, p, s_idx + 1, p_idx + 1);
}
// Case 3: No match
else {
return false;
}
}
int main() {
// Step 1: Define test cases
std::string s1 = "adceb";
std::string p1 = "*a*b"; // Expected: true
std::string s2 = "acdcb";
std::string p2 = "a*c?b"; // Expected: false
std::string s3 = "aa";
std::string p3 = "a"; // Expected: false
std::string s4 = "aa";
std::string p4 = "*"; // Expected: true
std::string s5 = "ab";
std::string p5 = "?*"; // Expected: true
std::string s6 = "mississippi";
std::string p6 = "mis*is*p*?"; // Expected: false
// Step 2: Test the recursive function
std::cout << "Test 1 (\\"" << s1 << "\\", \\"" << p1 << "\\"): "
<< (isMatchRecursive(s1, p1, 0, 0) ? "true" : "false") << std::endl;
std::cout << "Test 2 (\\"" << s2 << "\\", \\"" << p2 << "\\"): "
<< (isMatchRecursive(s2, p2, 0, 0) ? "true" : "false") << std::endl;
std::cout << "Test 3 (\\"" << s3 << "\\", \\"" << p3 << "\\"): "
<< (isMatchRecursive(s3, p3, 0, 0) ? "true" : "false") << std::endl;
std::cout << "Test 4 (\\"" << s4 << "\\", \\"" << p4 << "\\"): "
<< (isMatchRecursive(s4, p4, 0, 0) ? "true" : "false") << std::endl;
std::cout << "Test 5 (\\"" << s5 << "\\", \\"" << p5 << "\\"): "
<< (isMatchRecursive(s5, p5, 0, 0) ? "true" : "false") << std::endl;
std::cout << "Test 6 (\\"" << s6 << "\\", \\"" << p6 << "\\"): "
<< (isMatchRecursive(s6, p6, 0, 0) ? "true" : "false") << std::endl;
return 0;
}
- Sample output:
Test 1 ("adceb", "*a*b"): true
Test 2 ("acdcb", "a*c?b"): false
Test 3 ("aa", "a"): false
Test 4 ("aa", "*"): true
Test 5 ("ab", "?*"): true
Test 6 ("mississippi", "mis*is*p*?"): false
- Stepwise explanation:
- The
isMatchRecursivefunction takes the strings, patternp, and their current indicess_idxandp_idx. - Base Cases:
- If
p_idxreaches the end ofp: The match is true only ifs_idxalso reached the end ofs.
- If
- If
s_idxreaches the end ofs: The match is true only if all remaining characters inpare*.
- Wildcard
*: Ifp[p_idx]is*, there are two possibilities:*matches an empty sequence: Recurse by advancing onlyp_idx(isMatchRecursive(s, p, s_idx, p_idx + 1)).
*matches the current character ins: Recurse by advancing onlys_idx(isMatchRecursive(s, p, s_idx + 1, p_idx)).- The result is
trueif either of these paths leads to a match.- Wildcard
?or direct match: Ifp[p_idx]is?ors[s_idx]equalsp[p_idx], they match. Recurse by advancing both indices (isMatchRecursive(s, p, s_idx + 1, p_idx + 1)). - No match: If none of the above conditions are met, the characters do not match, and
falseis returned.
Approach 2: Dynamic Programming
Dynamic programming optimizes the recursive approach by storing the results of subproblems in a table, avoiding redundant calculations. This transforms the exponential time complexity of pure recursion into a polynomial one.
- One-line summary: Builds a 2D boolean table (DP table) where
dp[i][j]indicates if the firsticharacters of the string match the firstjcharacters of the pattern.
// Wildcard Matching Dynamic Programming #include <iostream> #include <string> #include <vector> // Function to check if string s matches pattern p using dynamic programming bool isMatchDP(const std::string& s, const std::string& p) { int m = s.length(); int n = p.length(); // dp[i][j] will be true if the first i characters of s match // the first j characters of p. std::vector<std::vector<bool>> dp(m + 1, std::vector<bool>(n + 1, false)); // Base Case 1: Empty string and empty pattern match dp[0][0] = true; // Base Case 2: Empty string s can match patterns consisting only of '*' // e.g., s = "", p = "***" => true for (int j = 1; j <= n; ++j) { if (p[j - 1] == '*') { dp[0][j] = dp[0][j - 1]; // '*' matches an empty sequence } } // Fill the DP table for (int i = 1; i <= m; ++i) { for (int j = 1; j <= n; ++j) { // Case 1: Current pattern character is '*' if (p[j - 1] == '*') { // '*' matches an empty sequence (dp[i][j-1]) OR // '*' matches the current character in s (dp[i-1][j]) dp[i][j] = dp[i][j - 1] || dp[i - 1][j]; } // Case 2: Current pattern character is '?' or matches current string character else if (p[j - 1] == '?' || s[i - 1] == p[j - 1]) { dp[i][j] = dp[i - 1][j - 1]; } // Case 3: No match (dp[i][j] remains false) } } return dp[m][n]; } int main() { // Step 1: Define test cases std::string s1 = "adceb"; std::string p1 = "*a*b"; // Expected: true std::string s2 = "acdcb"; std::string p2 = "a*c?b"; // Expected: false std::string s3 = "aa"; std::string p3 = "a"; // Expected: false std::string s4 = "aa"; std::string p4 = "*"; // Expected: true std::string s5 = "ab"; std::string p5 = "?*"; // Expected: true std::string s6 = "mississippi"; std::string p6 = "mis*is*p*?"; // Expected: false // Step 2: Test the DP function std::cout << "Test 1 (\\"" << s1 << "\\", \\"" << p1 << "\\"): " << (isMatchDP(s1, p1) ? "true" : "false") << std::endl; std::cout << "Test 2 (\\"" << s2 << "\\", \\"" << p2 << "\\"): " << (isMatchDP(s2, p2) ? "true" : "false") << std::endl; std::cout << "Test 3 (\\"" << s3 << "\\", \\"" << p3 << "\\"): " << (isMatchDP(s3, p3) ? "true" : "false") << std::endl; std::cout << "Test 4 (\\"" << s4 << "\\", \\"" << p4 << "\\"): " << (isMatchDP(s4, p4) ? "true" : "false") << std::endl; std::cout << "Test 5 (\\"" << s5 << "\\", \\"" << p5 << "\\"): " << (isMatchDP(s5, p5) ? "true" : "false") << std::endl; std::cout << "Test 6 (\\"" << s6 << "\\", \\"" << p6 << "\\"): " << (isMatchDP(s6, p6) ? "true" : "false") << std::endl; return 0; }- Sample output:
Test 1 ("adceb", "*a*b"): true Test 2 ("acdcb", "a*c?b"): false Test 3 ("aa", "a"): false Test 4 ("aa", "*"): true Test 5 ("ab", "?*"): true Test 6 ("mississippi", "mis*is*p*?"): false- Stepwise explanation:
- Initialize a 2D boolean array
dpof size(m+1) x (n+1), wheremiss.length()andnisp.length(). dp[i][j]will betrueifs[0...i-1]matchesp[0...j-1].- Base Case
dp[0][0] = true: An empty string matches an empty pattern. - Initialize first row (
dp[0][j]): Ifsis empty, it can only matchpifpconsists solely of*. So,dp[0][j]istrueifp[j-1]is*anddp[0][j-1]is alsotrue. - Fill the table: Iterate
ifrom1tomandjfrom1ton.- If
p[j-1]is*:
- If
- Wildcard
dp[i][j] = dp[i][j-1](the*matches an empty sequence, effectively skipping*in pattern).dp[i][j] = dp[i][j] || dp[i-1][j](the*matches the current characters[i-1], or previous characters, so we try to matchs[i-1]with*and see ifs[0...i-2]already matchedp[0...j-1]).- If
p[j-1]is?ors[i-1] == p[j-1]: dp[i][j] = dp[i-1][j-1](if current characters match, then the overall match depends on the preceding substrings matching).- Otherwise (no match):
dp[i][j]remainsfalse.
- The final result is
dp[m][n].
Conclusion
Wildcard matching is a classic problem with practical applications. While a recursive backtracking approach offers an intuitive understanding, it can lead to performance issues on larger inputs due to repeated computations. Dynamic programming provides an efficient solution by memoizing results, transforming the problem into filling a 2D table and ensuring optimal performance. Both approaches effectively handle the ? (single character) and * (zero or more characters) wildcards.
Summary
- Wildcard matching allows flexible string comparisons using
?(any single character) and*(any sequence of characters). - Recursive Backtracking: Directly implements the matching rules; simple to understand but can be inefficient for long strings/patterns.
- Dynamic Programming: Optimizes recursion by storing subproblem results in a table (
dp[i][j]meanss[0...i-1]matchesp[0...j-1]), leading to a more efficient solution. - The DP approach involves initializing
dp[0][0]totrueand handling*by considering it matching an empty sequence or one or more characters, and?or direct character matches by checkingdp[i-1][j-1].