- Quickly analyze large amounts of text for specific character patterns
- Extract, edit, replace or delete substrings of text.
- Add extracted strings to a collection in order to generate a report.
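The three operations listed above can be sketched in a few lines of Python using the standard `re` module. This is only an illustration: the sample text, the URL pattern and the variable names are assumptions made for the example, not something taken from a specific tool.

```python
import re
from collections import Counter

# Hypothetical sample text, used only for illustration.
text = (
    "Visit https://example.com/en/blog and https://example.com/es/blog, "
    "or mail info@example.com."
)

# A deliberately simple URL pattern: scheme, then anything up to a space or comma.
url_pattern = re.compile(r"https?://[^\s,]+")

# 1. Analyze: check whether the pattern occurs in the text at all.
has_url = url_pattern.search(text) is not None

# 2. Extract and replace substrings.
urls = url_pattern.findall(text)
redacted = url_pattern.sub("[URL]", text)

# 3. Collect extracted strings into a small report:
#    count how many URLs belong to each two-letter language folder.
report = Counter()
for url in urls:
    match = re.search(r"/([a-z]{2})/", url)
    if match:
        report[match.group(1)] += 1

print(has_url)       # True
print(urls)          # ['https://example.com/en/blog', 'https://example.com/es/blog']
print(redacted)      # Visit [URL] and [URL], or mail info@example.com.
print(dict(report))  # {'en': 1, 'es': 1}
```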
In many cases they become an indispensable tool.

Regular expressions in SEO

In SEO, regular expressions can be very useful. With them we can define a pattern and find it quickly in one or several documents. Several SEO tools on the market support regex, among which we can highlight:
- Crawlers: Screaming Frog and DeepCrawl, among others.
- Google Analytics: we can create custom filters to isolate the traffic of certain pages.
- Google Sheets: in Google's own spreadsheets we can use the REGEXEXTRACT function to extract data from URL strings, among other uses.
Some basic regular expressions

Going a little deeper, in this section we show a set of regular expressions that are very useful in SEO. They are quite effective and save a lot of work, especially when we have to extract specific information from several documents of a site, or when a website is so big that a full crawl is a nightmare and we prefer to crawl a specific path or exclude some paths. Here are some examples of the use of regular expressions (regex) in crawling tools such as Screaming Frog:
- If, on our blog https://www.makingscience.com/, we want to crawl only the pages that contain '/en/' in the URL path, in Screaming Frog we can go to the top menu, select "Configuration" > "Include", and enter the following regular expression: .*/en/.*
As a result, only the URLs containing that path will be crawled, as can be seen in the following images:
- If we want to exclude pages that contain a specific term in the URL, such as "developer", the regex would be: .*developer.*
- If we are interested in excluding URLs served over the secure protocol (HTTPS), the regular expression would be: .*https.*
And if we want to exclude all the HTTP pages of a domain, the regex would be: http://www.dominio.com/.*
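To see how these include and exclude patterns behave, here is a minimal Python sketch of a crawl filter. It is not Screaming Frog's implementation: the `crawl_filter` helper and the sample URL paths are hypothetical, and the sketch assumes that a pattern has to match the whole URL, which is why the expressions above carry `.*` on both sides.

```python
import re

def crawl_filter(urls, include=(), exclude=()):
    """Keep URLs matching any include pattern (if given) and no exclude pattern.

    Hypothetical helper: assumes each pattern must match the entire URL.
    """
    kept = []
    for url in urls:
        if include and not any(re.fullmatch(p, url) for p in include):
            continue
        if any(re.fullmatch(p, url) for p in exclude):
            continue
        kept.append(url)
    return kept

# Sample URLs invented for the example.
urls = [
    "https://www.makingscience.com/en/blog/",
    "https://www.makingscience.com/es/blog/",
    "https://www.makingscience.com/en/developer/docs/",
    "http://www.dominio.com/page",
]

only_en = crawl_filter(urls, include=[r".*/en/.*"])
no_developer = crawl_filter(urls, exclude=[r".*developer.*"])
no_https = crawl_filter(urls, exclude=[r".*https.*"])
# Dots are escaped here so that they match a literal "." only.
no_http_domain = crawl_filter(urls, exclude=[r"http://www\.dominio\.com/.*"])

print(only_en)         # the two /en/ URLs
print(no_developer)    # everything except the /developer/ URL
print(no_https)        # only the http:// URL
print(no_http_domain)  # the three makingscience.com URLs
```

Note that the patterns are case-sensitive in this sketch, so `.*/En/.*` would not match a lowercase `/en/` path.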
- To give an example of a more complex regular expression, imagine we have a list of URLs from different domains in Google Sheets and we want to extract only the domains. We can use the following formula:
=REGEXEXTRACT(A2;"^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)")
Below is an example of this formula in Google Sheets, using URLs from our own blog, so you can see the result of the process:
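The same pattern can also be tried outside Google Sheets. The following Python sketch applies it to a few sample URLs; the `extract_domain` helper and the sample URLs are hypothetical, and Python's `re` engine is not identical to the one used by Google Sheets, so treat it as an approximation of what the formula does.

```python
import re

# Same pattern as in the formula: optional scheme, optional "user@",
# optional "www.", then capture everything up to ":", "/" or a newline.
DOMAIN_PATTERN = re.compile(r"^(?:https?://)?(?:[^@\n]+@)?(?:www\.)?([^:/\n]+)")

def extract_domain(url):
    """Return the captured domain, or None if the pattern does not match."""
    match = DOMAIN_PATTERN.match(url)
    return match.group(1) if match else None

print(extract_domain("https://www.makingscience.com/en/blog/"))  # makingscience.com
print(extract_domain("http://user@example.org:8080/path"))       # example.org
print(extract_domain("www.screamingfrog.co.uk/seo-spider/"))     # screamingfrog.co.uk
```

The capturing group `([^:/\n]+)` is the part that REGEXEXTRACT returns; the other groups are non-capturing `(?:...)` and only serve to skip the scheme, credentials and "www." prefix.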
- https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
- https://docs.microsoft.com/es-es/dotnet/standard/base-types/regular-expressions
- https://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/
- https://support.google.com/a/answer/1371415?hl=es
- https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html