robots.txt file: The Ultimate Guide
There are many ways to enhance SEO that aren't difficult or time-consuming, and the robots.txt file is one of them. You don't need any technical knowledge to leverage the power of this text file. If you can find the source code for your website, you can use it. One of the major goals of SEO is to get search engines to crawl your website easily so they can improve your rankings in the SERPs.
This tiny text file is part of every website on the Internet, but most people don't even know about it. It's designed to work with search engines, and it's a source of SEO value just waiting to be unlocked. The robots.txt file (also called the robots exclusion protocol or standard) is a legitimate SEO technique you can start using right away, and it's not difficult to implement.
What is robots.txt?
It is a text file that tells search engine bots whether to crawl, or not to crawl, certain pages or sections of a website. Bots from major search engines such as Google, Bing, and Yahoo recognize and respect the directives in this file.
Is robots.txt file really important?
Now, let's take a look at why the robots.txt file matters so much for how a website gets crawled.
The robots.txt file is a text file that tells web robots (for example, search engine crawlers) which pages on your site to crawl, and which pages not to crawl.
For example, suppose a search engine is about to visit a site. Before it visits the target page, it checks the robots.txt file for crawling instructions. The search engine looks for the robots.txt file in the domain's root directory.
There are different types of robots txt files. So let’s look at a few different examples of what they look like.
Examples of robots file
Below is the basic skeleton of a robots.txt file.
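A sketch of that skeleton:

```
User-agent: *
Disallow:
```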
The asterisk after “user-agent” implies that it applies to all search engine robots that go to the site.
The absence of anything after "Disallow" tells the robot that it may visit each and every page on the site.
Here is another basic skeleton of a robots file.
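That skeleton looks like this:

```
User-agent: *
Disallow: /
```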
The asterisk after “user-agent” implies that the robots txt applies to all internet web robots that visit the website.
The slash after “Disallow” tells the robot not to visit any pages on the site.
Now, let's come back to the robots.txt file.
You probably have a lot of pages on your website, right? Even if you don't think you do, go and check; you might be surprised by how many links the crawler finds. If a search engine crawls your site, it will crawl every page of the website, and for sites with many pages it takes the search engine bot a while to crawl them all. That can have negative effects on your ranking.
That's because Googlebot (Google's search engine bot) has a "crawl budget" for each and every website.
Googlebot’s Crawl budget breaks down into two parts.
1. The first is crawl rate limit
2. The second part is the crawl demand.
Roughly speaking, the crawl rate limit caps how fast Googlebot can fetch pages without degrading your server's performance, while crawl demand reflects how much Googlebot wants to crawl your pages, based largely on their popularity and how stale they are in Google's index.
Fundamentally, the crawl budget is “the number of URLs Googlebot can and really wants to crawl.”
So you want to help Googlebot spend its crawl budget on your site in the best way possible. In other words, it should be crawling your most valuable pages.
According to Google, There are certain factors that will “negatively affect a site’s crawling and indexing.”
According to Google, those factors include faceted navigation and session identifiers, on-site duplicate content, soft error pages, hacked pages, infinite spaces and proxies, and low-quality or spam content.
So let’s come back to robots txt.
If you create the proper robots.txt file, you can tell search engine bots (Googlebot in particular) to avoid certain pages of the website. Think about the implications: if you tell search engine bots to crawl only your most useful content, the bots will crawl and index your site based on that content alone.
As Google puts it, you don't want your web server to be overwhelmed by Google's crawler, or to waste crawl budget "crawling unimportant or similar pages on your site."
By using your robots.txt file the right way, you can tell search engine bots to spend their crawl budgets wisely. That's what makes the robots.txt file so useful in the context of SEO.
How to locate your website's robots.txt file
Want to look at your own robots.txt file? There's a super-easy way to view it.
All you have to do is type the base URL of the site into your browser's address bar (e.g., https://auniqueweb.in), then add /robots.txt to the end of that URL, i.e., https://auniqueweb.in/robots.txt.
This method works for any site, so you can peek at other sites' files and see what they're doing in their robots.txt.
While doing this, you'll run into one of three situations:
1) You may find a robots file.
2) You may find an empty file. For example, the Disney website appears to lack a robots.txt file.
3) You may get a 404 error, meaning the site returns a 404 for the robots.txt URL.
Now, take a moment to review your own website's robots.txt file. If you find an empty file or a 404 error, you'll want to fix that. If you find a valid file, it's probably set to the default settings created when you made your website.
Note: if you're using WordPress, you might see a robots.txt file when you visit yourwebsite.com/robots.txt even though no file exists on disk. That's because WordPress creates a virtual robots.txt file if there is none in the domain's root directory.
If you don't find a robots.txt file in your domain's root directory, you will need to create one from scratch.
How to create a robots.txt file from scratch
Now, let’s look at actually changing your robots text file or creating one from scratch.
To create a new robots.txt file, open a plain text editor like Notepad (Windows) or TextEdit (Mac). Always use a plain text editor, since word processors can insert characters that break the file.
If you already have a robots.txt file, you'll find it in your domain's root directory. You can usually reach the root directory by logging in to your hosting account and heading to the file manager or FTP section of your site.
You should see something that looks like this:
Find your website's robots.txt file in the root directory and open it for editing. Delete all of the text, but keep the file itself.
Creating a robots text file
As mentioned earlier, you can create a new robots.txt file using the plain text editor of your choice. (Remember, only use a plain text editor.) If your website already has a robots.txt file, make sure you've deleted its text, but not the file itself.
First, it helps to become familiar with some of the syntax used in a robots.txt file.
Google has a good explanation of the basic robots.txt terms.
Let's start with the user-agent term. We'll set it so that it applies to all web robots, which you do by placing an asterisk after user-agent, like this:

User-agent: *
On the next line, type "Disallow:" but don't type anything after it.
Since there's nothing after the disallow, web robots will be directed to crawl your entire website. Right now, everything on your site is fair game for all crawlers.
So far, your website's robots.txt file should contain just those two lines.
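Putting the two directives together, the file reads:

```
User-agent: *
Disallow:
```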
Those two short lines already do a lot of work for crawlers. Believe it or not, this is what a basic robots.txt file looks like. Now, let's take it to the next level and turn this little text file into an SEO booster.
One optional extra: you can point web robots to your XML sitemap from robots.txt. It's not necessary, but if you want to, here's what to type.
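A sitemap reference is a single line; the URL below is a placeholder for your own sitemap's location:

```
Sitemap: https://example.com/sitemap.xml
```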
Optimizing robots text file for SEO
Optimization of robots text file depends on the content you have on your website. There are many ways to use robots.txt file to your advantage.
Please keep in mind that you should not use robots.txt to block pages from appearing in search results; that's not what it's for (more on this later).
Here are the most common ways to use the robots.txt file:
1. Maximize search engines' crawl budgets by telling them not to crawl the parts of your site that aren't displayed to the public. This is one of the best uses of the robots.txt file.
For example, if you visit the robots text file of auniqueweb.in, you’ll see that it disallows certain file paths (wp-admin) for crawling.
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Since /wp-admin/ is only used for logging into the backend of the site, it wouldn't make sense for search engine bots to waste their time crawling it.
2. You can use the same directive (or command) to stop bots from crawling specific pages of the website. After the disallow, enter the part of the URL that comes after the .com (or .in, etc.), enclosed between two forward slashes.
So if you want to tell a bot not to crawl your page "Prohibited Page", i.e., https://auniqueweb.in/Prohibited-Page/, you can write it like this in the robots.txt file:
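Following the pattern just described:

```
User-agent: *
Disallow: /Prohibited-Page/
```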
Types of pages to exclude from indexation
Since there are no universal rules for which pages to disallow, your robots.txt file will be unique to your website. Use your own judgment, and verify the result with Google's robots.txt testing tool. You might be wondering specifically what types of pages to exclude from indexation.
Here are a couple of common scenarios where it makes sense.
1. Purposeful duplicate content.
While duplicate content is mostly a bad thing, there are a handful of cases in which it’s necessary and acceptable.
For example, if you have a printer-friendly version of a page, you technically have duplicate content. In this case, you could tell bots to not crawl one of those versions (typically, the printer-friendly version of the page).
Another example is split-testing pages that have the same content but different designs. This is another well-known scenario for keeping pages out of Google's SERPs.
2. Thank You Pages.
The thank you page is one of a marketer's favorite pages because it signals a new lead.
If your thank you pages are reachable through Google search, people can land on them without ever going through your lead-capture process. By blocking your thank you pages in robots.txt, you help make sure only qualified leads see them.
For example, suppose the thank you page is at https://auniqueweb.in/thankyou/. In your robots.txt file, blocking that page would look like this:
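```
User-agent: *
Disallow: /thankyou/
```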
However, the disallow directive doesn't actually prevent the page from being indexed by Google. So, theoretically, you could disallow a page and it could still end up in Google's index. Generally, you don't want that, right?
Two more directives: noindex and nofollow
There are two directives you should know: noindex and nofollow.
That's where the noindex directive comes in. Combined with the disallow directive, it's meant to make sure bots neither visit nor index certain pages.
For example, if your website has a page that you don't want indexed (such as a thank you page), you could use both the disallow and noindex directives:
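As described in this guide, that combination would look like the following. One caution, though: Google announced in 2019 that it does not support a Noindex rule inside robots.txt, so this technique is no longer reliable for Google; the meta robots tag covered later in this article is the dependable way to keep a page out of the index.

```
User-agent: *
Disallow: /thankyou/
Noindex: /thankyou/
```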
Now, that page won’t show up in the SERPs.
Now, Let’s look into the nofollow directive.
This works just like a nofollow link: it tells web robots not to crawl the links on a page.
However, the nofollow directive is implemented a bit differently, because it's not actually part of the robots.txt file. It still gives instructions to web robots, so it's the same concept; the only difference is where it takes place.
Open the source code of the page you want to change, and make sure you place the tag between the <head> tags.
Then paste this line:
<meta name="robots" content="nofollow">
So it should look like this:

<head>
  <meta name="robots" content="nofollow">
</head>
If you want to add both the noindex and nofollow directives to a page, use this line of code between the <head> tags:
<meta name="robots" content="noindex,nofollow">
This will give web robots both directives at once.
Now, Let's Test It Thoroughly
Finally, test your robots.txt file to make sure everything is valid and working the way Google expects. Google provides a free robots.txt tester as part of its Webmaster tools (Google Search Console). Here is the process for testing your file.
First, sign in to your Webmasters account or Google Search Console by clicking “Sign In” on the top right corner.
Select the appropriate property (i.e., your website) and click "Crawl" in the left-hand sidebar.
You'll see "robots.txt Tester." Click on that.
If there’s any code already in the box, delete it and replace it with your new robots txt file.
Now, Click “Test” located on the lower right section of the screen.
If the "Test" text changes to "Allowed," your robots.txt file is valid.
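If you'd like a quick offline sanity check as well, Python's built-in urllib.robotparser module can parse robots.txt rules and answer allow/deny questions. This is just a sketch; the rules and URLs below are illustrative, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content; substitute your own rules.
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /thankyou/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether a generic crawler may fetch specific URLs.
print(parser.can_fetch("*", "https://example.com/thankyou/"))   # blocked, prints False
print(parser.can_fetch("*", "https://example.com/blog/post/"))  # allowed, prints True
```

Note that Python's parser applies rules in file order, whereas Google uses longest-match precedence, so keep this as a rough check rather than a substitute for Google's own tester.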
Finally, upload your robots.txt file to your domain's root directory (or overwrite the file there if you already had one). You're now armed with a tiny but powerful file, and you may well see an increase in your search visibility.
Using robots.txt well can make a significant difference to your site's SEO. By setting up your robots.txt file the right way, you're not just enhancing your own SEO; you're also helping your visitors. If search engine bots can spend their crawl budgets wisely, they'll organize and display your content in the SERPs in the best way, which means your website will be more visible in Google's search results.
Further, it's mostly a one-time setup. It doesn't take a lot of effort to create your robots.txt file, and you can make small changes as your website needs them. Give it a spin if you haven't set up a robots.txt file before. This post covered how to find and read a robots.txt file, how to set up a simple one, and how to customize it for SEO.