Content Gap Analysis made easy with Google Sheets magic

Did I ever mention my favorite thing in the whole world? It’s the QUERY() function… and making computers work for me 🤖, which is essentially the same thing. I might write an article about how much I love it (or have a computer write it for me now it’s a thing), but for now, you’d have to read about it elsewhere. I suggest this excellent post that covers it really well: Coding is for Losers

QUERY() allows you to use Sheets like a small database and query the data with a query language very similar to SQL. Combined with the power of Regular Expressions, it becomes VERY powerful (see how I wrote “very” with capital letters?).

One of the challenges with content gap analyses is usually the volume of data and getting through it efficiently. SEMrush, AHrefs, etc are fantastic for providing a big pile of data but now what? … QUERY() it!

Let’s define a use case (for an actual website, so some SEO will owe me a beer).

Use Case: identify gaps on space.com about the moon

Step 1: Export the data

I used SEMrush (thanks for the 7-day trial, hopefully, I won’t forget to cancel), here’s the link to the report if you can access the tool. I’m using the first four competitors they suggested and exported the first 10K results, which is the limit for my trial. Export “all”

Import in Google Sheets.

Step 2: Prepare the data

Note for a product manager @SEMrush: the tool provides the option to filter the results by gap type: untapped (=gap), weak (=ranking but not well), missing (= you’re the only domain not ranking) etc. but doesn’t provide it in the export *sigh* why???
Well… We can “lemons-lemonade” this by downloading ‘All’ and re-recreate our own simplified (and customizable) categorization

Define the gap type

=iferror(ifs(and(min(C2:F2)<10,B2=0), "untapped", and(min(C2:F2)<10,B2>15), "weak"), "skip")

Col B is the domain we’re working on. Col C to F are the competitors.

if any of the competitors is ranking top 10:

  • and we’re not ranking at all –> untapped / gap
  • and we’re ranking > 15 –> weak (you can apply the threshold you want)
  • we’re ranking < 15 –> skip

Top Ranking URL + Position (optional)

Optional but I find it very useful to provide the top-ranking URL in the final report.

Let’s start with the comp. top rank. MIN() won’t work because we want to exclude zeros. MINIFS() does that perfectly

=minifs(B2:F2,B2:F2,">0")

Now we use that data to get the top-ranking URL. (There’s probably a more elegant way but this is the most straightforward)

=ifs(B2=T2,M2,C2=T2,N2,D2=T2,O2,E2=T2,P2,F2=T2,Q2)

Data Cleaning

Just by looking at the list, you should notice some noise you can easily filter using regular expressions. I usually create a separate tab ‘config’ (or ‘setup’ depending on my mood) that contains useful information if I ever need to work on the report again (e.g. link to the semrush / ahrefs report) and the filters.

In our case, we don’t want mentions of ‘moons’ because I’m only interested in our unique moon. “MoonS” would refer to Jupiters and other planets with more than one moon. I usually remove by default mentions of dates, cities, state, countries, etc.

Pro-tip: Create your filter library you can re-use anytime you want

Step 3: Magic!

In a separate tab, you can now QUERY() your “database”.

Let’s start with the “untapped” keywords:

  • Col R = untapped
  • not A matches our regex exclusion filter (config!B2)
  • and T (i.e. top comp. ranking) is 5 maximum! the higher you go, the more noise you’ll have
=query(data!A1:T, "select A,G,R,B,M,T,S where not A matches '.*("&config!B2&").*' and R = 'untapped' and T < 5  ",1)

We’re down to 93 results! not too bad considering we started with 10k

ok, looks like our moon is rusting…

Now you have a better idea of what you’re missing out, let’s check the “questions”, my favorite because very actionable. Just duplicate the tab and edit the query.

=query(data!A1:T, "select A,G,R,B,M,T,S where not A matches '.*("&config!B2&").*' and A matches '^(what|where|how|why|when|is|does|are|will).*' and R matches 'weak|untapped' and T < 5  ",1)

you get 143 results (weak + untapped) and the tab didn’t exist 5 seconds ago!

by now you should start to wrap your head around the topic and filter ideas are flowing:

  • moon waxing
  • moon rust
  • moon colors
  • moon phases
  • etc.

you can’t stop creating tabs! Maybe you can’t resist asking ChatGPT to find clusters for you?
Look at you!

Link to the report: https://docs.google.com/spreadsheets/d/1ww4ZmSolv1W0lFMInteyj8UNenXgQpFeZMzzf4ypXoc/edit#gid=0

Was this helpful?

1 / 0

Leave a Reply 0

Your email address will not be published. Required fields are marked *