Regular expression example: IP location (ctd)

We now have a total of three expressions to extract the country code from Yahoo and Google referrer strings. To get the country code from a referrer string, we simply try matching the string against each pattern in turn. Since in each case the captured country code will be group 1, we can declare a single Matcher variable, which we reassign to a matcher for the next pattern whenever the current one fails to match. The code looks something like this. Note that we write it in such a way as to avoid calling matches() more than once on the same matcher:

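  import java.util.Locale;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // Fields and method of the enclosing class.
  // In every pattern, group 1 captures the two-letter code.

  // google.com with an hl=xx parameter
  Pattern pGoogle1 =
    Pattern.compile("(?:http://)?www\\.google\\.com/.*hl=([a-z]{2}).*");
  // Country-specific Google domains, e.g. google.fr, google.co.uk
  Pattern pGoogle2 = Pattern.compile("(?:http://)?" +
      "www\\.google(?:\\.com|\\.co)?\\.([a-z]{2})/.*");
  // Yahoo search with a two-letter subdomain, e.g. fr.search.yahoo.com
  Pattern pYahoo = Pattern.compile("(?:http://)?" +
    "([a-z]{2})\\.search\\.yahoo\\.com/.*");

  public String guessCountryCode(String referrer) {
    // Try each pattern in turn; matches() is called once per matcher.
    Matcher m = pGoogle1.matcher(referrer);
    if (!m.matches()) {
      m = pGoogle2.matcher(referrer);
      if (!m.matches()) {
        m = pYahoo.matcher(referrer);
        if (!m.matches()) {
          return null;
        }
      }
    }
    // Locale.ROOT keeps the upper-casing independent of the default locale.
    String code = m.group(1).toUpperCase(Locale.ROOT);
    // Correct the "uk" suffix to the standard country code.
    if ("UK".equals(code)) {
      code = "GB";
    }
    return code;
  }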

Of course, if we had a large number of Patterns (as may well happen in real life), we would probably want to put them in an array and cycle through them in a loop. Notice that at the end of this method we can put in any corrections necessary to turn the domain suffixes and/or language codes into standard country codes (e.g. the domain suffix uk corresponds to the standard country code GB).
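
As a rough illustration of the array-and-loop idea, here is a minimal sketch. The class name CountryCodeGuesser and the PATTERNS array are made up for this example, the method is made static for convenience, and Locale.ROOT is used so that the upper-casing does not depend on the machine's default locale. The three expressions are the same ones used above, and the sketch relies on group 1 holding the code in every pattern.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, named here only for illustration.
final class CountryCodeGuesser {

  // The same three expressions as before; group 1 captures the code in each.
  private static final Pattern[] PATTERNS = {
    Pattern.compile("(?:http://)?www\\.google\\.com/.*hl=([a-z]{2}).*"),
    Pattern.compile("(?:http://)?www\\.google(?:\\.com|\\.co)?\\.([a-z]{2})/.*"),
    Pattern.compile("(?:http://)?([a-z]{2})\\.search\\.yahoo\\.com/.*")
  };

  // Returns the guessed country code, or null if no pattern matches.
  static String guessCountryCode(String referrer) {
    for (Pattern p : PATTERNS) {
      // A fresh matcher per pattern, so matches() is called once on each.
      Matcher m = p.matcher(referrer);
      if (m.matches()) {
        String code = m.group(1).toUpperCase(Locale.ROOT);
        // Correct the "uk" suffix to the standard country code.
        return "UK".equals(code) ? "GB" : code;
      }
    }
    return null;
  }
}
```

Because the patterns are tried in array order, the order of the array plays the same role as the nesting of the if statements in the original version. Adding support for another search engine then means adding one more entry to the array, provided its expression also captures the code as group 1.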



Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.