- Quickly analyze large amounts of text for specific character patterns
- Extract, edit, replace or delete substrings of text.
- Add extracted strings to a collection in order to generate a report.
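To make the three capabilities above concrete, here is a minimal Python sketch. The crawled URLs are hypothetical sample data, not real pages. The sketch finds matching text, extracts a substring (the language folder), replaces part of each string (scheme and host), and collects the extracted values into a small count report.

```python
import re
from collections import Counter

# Hypothetical list of crawled URLs, for illustration only.
text = """https://www.example.com/en/blog/post-1
https://www.example.com/es/blog/post-2
https://www.example.com/en/services/"""

# Analyse and extract: capture the two-letter language folder of every URL.
languages = re.findall(r"https?://[^/\s]+/([a-z]{2})/", text)

# Replace: strip the scheme and host, keeping only the path of each URL.
paths = re.sub(r"https?://[^/\s]+", "", text).splitlines()

# Collect the extracted strings into a simple report (count per language).
report = Counter(languages)

print(languages)  # ['en', 'es', 'en']
print(paths)      # ['/en/blog/post-1', '/es/blog/post-2', '/en/services/']
print(report)     # Counter({'en': 2, 'es': 1})
```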
In many cases, regular expressions become an indispensable tool.

Regular expressions in SEO

In SEO, regular expressions can be very useful: with them we can define a pattern and quickly find it in one or several documents. Several SEO tools on the market support regex, notably:
- Crawlers: Screaming Frog and DeepCrawl, among others.
- Google Analytics: we can create custom filters to isolate the traffic of certain pages.
- Google Sheets: in Google’s own spreadsheets, we can use the =REGEXEXTRACT function to extract data from URL strings, among other uses.
Some basic regular expressions

In this section we go a little deeper and show a set of regular expressions that are very useful in SEO. They are quite effective and save a lot of work, especially when we need to extract specific information from several documents on a site, or when a website is so big that a full crawl is a nightmare and we prefer to crawl only a specific path or exclude some paths. Here are some examples of regular expressions (regex) in crawlers such as Screaming Frog:
- If, on our blog https://www.makingscience.com/, we want to crawl only the pages that contain `/en/` in the URL path, in Screaming Frog we go to the top menu, select "Configuration" > "Include", and enter the following regular expression in the "Include" function: `.*/en/.*`
As a result, only the URLs containing that path will be crawled, as the following images show:
data:image/s3,"s3://crabby-images/3593f/3593fd381e547680610c707da95462545583316b" alt=""
data:image/s3,"s3://crabby-images/f6149/f6149ffc96fc8370242cc787130eb2e2d28e68d7" alt=""
data:image/s3,"s3://crabby-images/bc6dc/bc6dc2aaff4e2b28894131ba1b0e2437a8a7f339" alt=""
data:image/s3,"s3://crabby-images/692e4/692e4dd866b179c470c0b45dea6530203c794e0e" alt=""
data:image/s3,"s3://crabby-images/7804e/7804e45d9423abd52e13f40736005adf46cacb8c" alt=""
data:image/s3,"s3://crabby-images/4662f/4662fa3bbbaaeb80cde10ccf43da2d8af4014db8" alt=""
data:image/s3,"s3://crabby-images/c126a/c126a0ab89f2c939886b65594445d88ac4bd9097" alt=""
data:image/s3,"s3://crabby-images/b1c2f/b1c2f6b2c2ec949280b5874d3e2f50ff2165fa5a" alt=""
data:image/s3,"s3://crabby-images/ce555/ce55584519c438fc7596d3c50b26e37d14485c90" alt=""
- If we want to exclude pages that contain a specific term in the URL, such as "developer", the regex would be: `.*developer.*`
data:image/s3,"s3://crabby-images/f12b3/f12b35b2120d89bda331992cdf56e4b4961b57a8" alt=""
- If we want to exclude URLs that use the secure protocol (HTTPS), the regular expression would be: `.*https.*`
And if we want to exclude all pages served over HTTP, the regex would be: `http://www.dominio.com/.*` (where www.dominio.com stands for your own domain).
data:image/s3,"s3://crabby-images/16399/16399165dcfc2b578b8fb6df82490c98d863f782" alt=""
- As an example of a more complex regular expression, imagine we have gathered in Google Sheets a list of URLs from different domains and want to extract only the domains. We can use the following formula:
`=REGEXEXTRACT(A2,"^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)")`
(In spreadsheet locales that use a comma as the decimal separator, such as Spanish, the arguments are separated with a semicolon instead of a comma.)
Below is an example of this formula in Google Sheets, using URLs from our own blog, so you can see the result:
data:image/s3,"s3://crabby-images/0dea0/0dea05301fa440dcefec71db9a61b179549a30f9" alt=""
data:image/s3,"s3://crabby-images/c946a/c946a50639b8265c7bd4f2a72ef9790d856d6104" alt=""
data:image/s3,"s3://crabby-images/7bb21/7bb2198e4775e748f33029f857086bfa81f57f83" alt=""