Regular Expressions: Replace the Last Occurrence of a Pattern In Ruby

As of this writing, there are not a lot of great resources out there to show you how to replace the last occurrence of a pattern in any language. Here I’ll show you the Ruby version, but for the most part you can use a similar technique in other languages. We’ll begin by solving the basic problem and then dive into some more advanced usage. Let’s set up an example use case:

  • You have a database of human-entered addresses.
  • These addresses typically use abbreviations for street type (e.g., “ST” for “Street”).
  • We want to display the unabbreviated version.
  • Don’t forget that addresses like “1234 St Peter St, Ste 12” are a real possibility.

Doing it wrong first, replacing every instance

Let’s start off with a simple pattern that won’t work quite right, just to get the basics down. We’ll use gsub to replace every “st” (the /i flag makes the match case-insensitive) with “Street”:


'1234 St Peter St, Ste 12'.gsub(/st/i, 'Street')
# => "1234 Street Peter Street, Streete 12"

First things first: we need to stop replacing “Ste” with “Streete”. We will only replace “st” when it stands alone between word boundaries (\b):


'1234 St Peter St, Ste 12'.gsub(/\bst\b/i, 'Street')
# => "1234 Street Peter Street, Ste 12"
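
One detail worth pinning down here is the difference between Ruby’s sub and gsub: sub replaces only the first match, while gsub replaces every match. A minimal comparison on the same address:

```ruby
address = '1234 St Peter St, Ste 12'

# sub stops after the first match.
address.sub(/\bst\b/i, 'Street')
# => "1234 Street Peter St, Ste 12"

# gsub keeps going and replaces every match.
address.gsub(/\bst\b/i, 'Street')
# => "1234 Street Peter Street, Ste 12"
```

Neither one gives us what we want yet: sub fixes the wrong “St”, and gsub fixes both.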

 

Using a capturing group to replace just the last instance

Okay, now we need to figure out how to stop replacing the “st” in “St Peter”. This is where it gets a bit trickier. Again, we’ll start off with the doesn’t-quite-work version. We’re going to replace everything up to a final word-bounded “st” by adding .* at the beginning. . is any character (except a newline, by default). .* is zero or more of those characters, and because it is greedy it grabs as much of the string as it can, so the \bst\b that follows ends up on the last “st”:


'1234 St Peter St, Ste 12'.sub(/.*\bst\b/i, 'Street')
# => "Street, Ste 12"

 

So why would we do that? Well, it’s kind of useless in this form, but what we can do now is throw parentheses around the .* bit to put it in capturing group \1 and glue it back onto the front:

'1234 St Peter St, Ste 12'.sub(/(.*)\bst\b/i, '\1Street')
# => "1234 St Peter Street, Ste 12"

Boom! Got it! Here’s the breakdown:

  1. The full regex (the part to be replaced) matches “1234 St Peter St”.
  2. Capture group \1 matches “1234 St Peter ” (including the trailing space).
  3. '\1Street' essentially says, “Replace '1234 St Peter St' with '1234 St Peter ' + 'Street'.”
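
To see that breakdown for yourself, you can inspect the match directly. This is only a small sketch; split_at_last_st is a hypothetical helper name, not something from Ruby’s standard library:

```ruby
# Hypothetical helper: returns [full_match, prefix] for the last
# word-bounded "st", or nil when the address contains none.
def split_at_last_st(address)
  match = address.match(/(.*)\bst\b/i)
  return nil unless match

  [match[0], match[1]] # match[0] is the whole match, match[1] is group \1
end

split_at_last_st('1234 St Peter St, Ste 12')
# => ["1234 St Peter St", "1234 St Peter "]
```

The trailing “, Ste 12” is not part of the match at all, which is why sub leaves it untouched.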

Advanced usage with block syntax and interpolation

So obviously there are more street types than just street. What about lanes and courts and boulevards? How do we handle all of those? Let’s extend our regex to make it match any of these, and test it on “1234 St Mary Ln”…

'1234 St Mary Ln'.sub(/(.*)\b(st|ct|ln|blvd)\b/i, '\1Street')
# => "1234 St Mary Street"

Oops, now “St Mary” is a street, not a lane. We need to use a different replacement depending on the abbreviation. For this, we’ll have to make a few changes.

  • We need to set up a dictionary to map abbreviations to long forms.
  • We need a capture group for the abbreviation. (We already do because it has parentheses around it; it’s \2.)
  • We need to use sub’s block syntax in place of the '\1Street' that we have now.

expansions = {
  'st' => 'Street',
  'ct' => 'Court',
  'ln' => 'Lane',
  'blvd' => 'Boulevard',
}

Did you know that you can use interpolation in a Ruby Regexp? #{expansions.keys.join('|')} inserts st|ct|ln|blvd:

pattern = /(.*)\b(#{expansions.keys.join('|')})\b/i
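
If the abbreviations ever come from somewhere less controlled than a hand-written hash, it may be worth escaping them before interpolating, so a stray regex metacharacter doesn’t change the pattern. A hedged sketch, where street_type_pattern is a hypothetical helper name:

```ruby
# Hypothetical helper: builds the pattern from any
# abbreviation => expansion hash, escaping each key first.
def street_type_pattern(expansions)
  alternation = expansions.keys.map { |abbr| Regexp.escape(abbr) }.join('|')
  /(.*)\b(#{alternation})\b/i
end

pattern = street_type_pattern(
  'st' => 'Street', 'ct' => 'Court', 'ln' => 'Lane', 'blvd' => 'Boulevard'
)
pattern.source
# => "(.*)\\b(st|ct|ln|blvd)\\b"
```

For the plain alphabetic keys used here, Regexp.escape changes nothing, so this produces the same pattern as the one-liner.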

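With block syntax, the capture groups are available inside the block as the special global variables $1, $2, etc. Using $2 we can look up the correct expansion for the street abbreviation. Because the pattern is case-insensitive, $2 may come back as “Blvd” or “BLVD”, so we downcase it before looking it up in the hash.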

"1234 St Peter's State Blvd Ste E".sub(pattern) do
  "#{$1}#{expansions[$2.downcase]}"
end
# => "1234 St Peter's State Boulevard Ste E"
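
Putting the pieces together, here is one way the whole thing might be wrapped up for reuse. Treat it as a sketch: street_expansions and expand_last_street_type are hypothetical names, and the dictionary only covers the four abbreviations from this post.

```ruby
# Hypothetical lookup table; extend it with whatever abbreviations you need.
def street_expansions
  {
    'st' => 'Street',
    'ct' => 'Court',
    'ln' => 'Lane',
    'blvd' => 'Boulevard',
  }
end

# Expands only the last known street-type abbreviation in the address.
# sub returns the string unchanged when the pattern does not match.
def expand_last_street_type(address)
  expansions = street_expansions
  pattern = /(.*)\b(#{expansions.keys.join('|')})\b/i
  address.sub(pattern) { "#{$1}#{expansions[$2.downcase]}" }
end

expand_last_street_type('1234 St Mary Ln')
# => "1234 St Mary Lane"
expand_last_street_type('42 Main Road')
# => "42 Main Road"
```

One caveat: “last occurrence” means the last abbreviation the dictionary knows about. With this four-entry table, “1234 St Mary Ave” comes back as “1234 Street Mary Ave”, because “St” is the only match, so the dictionary needs to cover every street type you expect to see.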

