Sunday, July 10, 2011

Finding the last match

One relatively common search that is unfortunately not provided by the Java Standard Library's Matcher is finding the last match of a regular expression.  The most common solution is to do something like:
int lastFind;
while (matcher.find()) {
    lastFind = matcher.start();
}
// Do things with lastFind...

That's fine as long as you only want the absolute last instance of a match.  However, if you want the last match in a particular range, that while loop becomes significantly more complicated. I found that out when I tried to implement a find previous function similar to Firefox's.

With some major help from this StackOverflow post I found a much better solution that uses a binary-like search:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecursiveFind {
    public static int findLastIndexOf(int start, int end, String pattern,
            String haystack) {
        return findLastIndexOf(start, end,
                Pattern.compile(pattern).matcher(haystack));
    }

    private static int findLastIndexOf(int start, int end, Matcher matcher) {
        int pivot = ((end - start) / 2) + start;
        if (matcher.find(pivot)) {
            // recurse on right side
            int foundIndex = findLastIndexOfRecurse(end, matcher);
            if (foundIndex != -1) {
                return foundIndex; // match found on right side
            }
        }

        if (matcher.find(start)) {
            // recurse on left side
            int foundIndex = findLastIndexOfRecurse(end, matcher);
            if (foundIndex != -1) {
                return foundIndex; // match found on left side
            }
        }

        // not found at all between start and end
        return -1;
    }

    private static int findLastIndexOfRecurse(int end, Matcher m) {
        int foundIndex = m.start();
        if (foundIndex > end) {
            return -1;
        }
        int recurseIndex = findLastIndexOf(foundIndex + 1, end, m);
        if (recurseIndex == -1) {
            return foundIndex;
        } else {
            return recurseIndex;
        }
    }
}

No comments:

Post a Comment