How to Cluster / Categorize Keywords with Google Sheets Wizardry

Have I ever mentioned how much I loved Google Sheets? Yes, all the time. One of the reasons is the ability to use regular expressions with: REGEXMATCH, REGEXREPLACE, REGEXEXTRACT

It becomes handy when trying to classify any messy data source. For instance, you downloaded the top X,000 pages for a domain, and using that methodology on the URL and Title tag, you can quickly regroup them in buckets.

(I know AI is a thing but I grew up watching Terminator so…)

It’s a 4k image, your eyes are blurry

Here, we’ll use REGEXMATCH to define clusters on the list of keywords extracted for the gap analysis done here: CONTENT GAP ANALYSIS MADE EASY WITH GOOGLE SHEETS MAGIC

The Formula

REGEXMATCH(lower(A2:A),".*full\smoon.*"),"Full Moon",
REGEXMATCH(lower(A2:A),".*(phase|calendar|cycle|full moon).*"),"Phases",
REGEXMATCH(lower(A2:A),".*(jupiter|saturn).*"),"Other planets",
REGEXMATCH(lower(A2:A),".*(landing|apollo|first m(a|e)n).*"),"Landing"

You can find it again Column U of the “data” tab in here:

Formula Explained

Let’s break it down into pieces so it’s easier to tweak it.


ARRAYFORMULA allows us to apply the formula to a range instead of a single cell. That’s why we’re using A2:A instead of A2, A3, etc. The main benefit is for you to update that one cell only (vs. pulling the formula down and apply to the cells below)

IFS is a magic IF: it allows you to evaluate multiple conditions and returns the first that meets the requirement (Important!). e.g. I intentionally added “full moon” in 2 different conditions (3rd and 4th line), if your keyword contains ‘ full moon’ the evaluation will stop when the first condition returning TRUE will be met, so the 4th condition will never even be tested and those keywords would systematically be tagged “full moon” and never “phases”.

IFNA will help us to return a default value if our string doesn’t meet any of the conditions in the IFS. In our case we return nothing “” but we could return “Other”, “N/A”, etc.

REGEXMATCH(lower(A2:A),".*(phase|calendar|cycle|full moon).*"),"Phases",

REGEXMATCH matches a pattern to a string.
Our pattern here is: .*(phase|calendar|cycle|full moon).* which means anything (.*) followed by any of those expressions: phasem calendar, cycle etc. (| means OR) followed by anything again.

It could be translated by: contains phase or contains calendar or contains cycle etc.

LOWER(A2:A) because we don’t want to be dealing with the case.

The Result

In a few minutes, of browsing the list and picking themes, I was able to categorize 34% of the 10k keywords. Chances are you will never be able to go through all the noise, but I’m sure you won’t mind Google Sheets helping you get through a lot of it without much effort.

Inspired by Iky Tai, who wrote a post with another method:

Leave a Reply 0

Your email address will not be published. Required fields are marked *