TL;DR: If you need to build dynamically regular expressions memoize them. Compiling them is expensive.
I don’t like memoizing methods, or caching in general. Obviously, caching adds more complexity to your source code and might lead to more errors, or even worse, errors which are harder to reproduce because they only happen due to a specific history. For memoized methods, you have to ensure that there are no other dependencies for the result than the passed arguments. Or that the result of a memoized method does not reference a larger object tree ending with a memory leak. There might be even more pitfalls.
Thus, it should be worth the trouble. Caching larger page fragments in order to avoid hunderds of method calls or even database queries might be a great boost for the performance. Memoizing smaller objects, just to avoid composing them again and again seems usually not to be the best idea to start optimizing.
Anyway, today, we were able to kick a performance killer from one of our data analysis processors by memoizing quite small objects: regular expressions, more precisely dynamically built regular expressions.
The impact on the response time of the data analysis processor at 9:08 pm:
Example
In our data analysis processor, we are trying to make some educated guess about the browser or the mobile app of a request based on the User-Agent
header. We are calling some helper methods building the regular expressions to extract that information for each known browser or app.
Let’s take the Safari browser. The version of that browser in the User-Agent
string is not after the browser name, but after the string Version/
. The call of the following helper method
user_agent_browser('Safari', version_prefix: 'Version/')
will build two simple regular expressions like /Safari/i
and /Version\/([a-z0-9.]+)/i
.
Unfortunately that expressions have not been hard coded literals in Ruby source file, but have been built from /#{identifier}/i
and /#{version_prefix}([a-z0-9.]+)/i
. And that was the performance killer.
The solution was to extract the building of the regular expressions into memoized class methods:
class << self
extend Memoist
def user_agent_name_regexp(identifiers)
/#{Regexp.union(identifiers).source}/i
end
memoize :user_agent_name_regexp
def user_agent_version_regexps(identifiers, version_prefix)
prefixes = Array(version_prefix || identifiers.map { |i| ["#{i}/", "#{i} "] }.flatten)
prefixes.map { |prefix| /#{Regexp.escape(prefix)}([a-z0-9.]+)/i }
end
memoize :user_agent_version_regexps
end
This had reduced the time for processing the browser and mobile app guessing based on the User-Agent
header by half; and the entire processing step after all by one quarter up to one third.