Dynamically Built Regular Expressions

TL;DR: If you need to build dynamically regular expressions memoize them. Compiling them is expensive.

I don’t like memoizing methods, or caching in general. Obviously, caching adds more complexity to your source code and might lead to more errors, or even worse, errors which are harder to reproduce because they only happen due to a specific history. For memoized methods, you have to ensure that there are no other dependencies for the result than the passed arguments. Or that the result of a memoized method does not reference a larger object tree ending with a memory leak. There might be even more pitfalls.

Thus, it should be worth the trouble. Caching larger page fragments in order to avoid hunderds of method calls or even database queries might be a great boost for the performance. Memoizing smaller objects, just to avoid composing them again and again seems usually not to be the best idea to start optimizing.

Anyway, today, we were able to kick a performance killer from one of our data analysis processors by memoizing quite small objects: regular expressions, more precisely dynamically built regular expressions.

The impact on the response time of the data analysis processor at 9:08 pm:

Response Time After Memoizing

Example

In our data analysis processor, we are trying to make some educated guess about the browser or the mobile app of a request based on the User-Agent header. We are calling some helper methods building the regular expressions to extract that information for each known browser or app.

Let’s take the Safari browser. The version of that browser in the User-Agent string is not after the browser name, but after the string Version/. The call of the following helper method

user_agent_browser('Safari', version_prefix: 'Version/')

will build two simple regular expressions like /Safari/i and /Version\/([a-z0-9.]+)/i.

Unfortunately that expressions have not been hard coded literals in Ruby source file, but have been built from /#{identifier}/i and /#{version_prefix}([a-z0-9.]+)/i. And that was the performance killer.

The solution was to extract the building of the regular expressions into memoized class methods:

class << self
  extend Memoist

  def user_agent_name_regexp(identifiers)
    /#{Regexp.union(identifiers).source}/i
  end
  memoize :user_agent_name_regexp

  def user_agent_version_regexps(identifiers, version_prefix)
    prefixes = Array(version_prefix || identifiers.map { |i| ["#{i}/", "#{i} "] }.flatten)
    prefixes.map { |prefix| /#{Regexp.escape(prefix)}([a-z0-9.]+)/i }
  end
  memoize :user_agent_version_regexps

end

This had reduced the time for processing the browser and mobile app guessing based on the User-Agent header by half; and the entire processing step after all by one quarter up to one third.

Dynamically Built Regular Expressions

Example

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112