
Robots.txt: A Perfect Tool To Improve Website SEO


There are many methods of enhancing SEO that aren't difficult or time-consuming, and the robots.txt file is one of them. You don't need any technical knowledge to leverage the power of robots.txt; if you can find the source code for your website, you can use it. One of the major goals of SEO is to get search engines to crawl your website easily so they can improve your rankings in the SERPs.

This tiny text file is part of most websites on the Internet, but many people don't even know about it. It's designed to work with search engines, but surprisingly, it's also a source of SEO value just waiting to be unlocked. The robots.txt file (also called the robots exclusion protocol or standard) is a legitimate SEO hack that you can start using right away to improve your website's SEO, and it's not difficult to implement either.

Is the robots.txt file really important?

Now, let's take a look at why the robots.txt file is such an important driver of how your website gets crawled.

The robots.txt file is a text file that tells web robots (for example, search engines) which pages on your site to crawl. It also tells web robots which pages not to crawl.

For example, say a search engine is about to visit a site. Before it visits the target page, it checks the robots.txt file for crawling instructions. The search engine looks for the robots.txt file in the domain's root directory.

There are different types of robots.txt files. So let’s look at a few different examples of what they look like.

Examples of robots.txt file


Below are the basic skeletons of a robots.txt file.

Example 1:

User-agent: *

Disallow:

The asterisk after “user-agent” means that the robots.txt file applies to all search engine robots that visit the site.

Leaving “Disallow” blank (no slash) tells the robots they may visit every page on the site.

Example 2:

Here is another basic skeleton of a robots.txt file.

User-agent: *

Disallow: /

The asterisk after “user-agent” means that the robots.txt file applies to all web robots that visit the website.

The slash after “Disallow” tells the robot not to visit any pages on the site.
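
For completeness, here is one more hedged example (the bot name is real, but the folder path is purely hypothetical): a rule can also target a single named crawler and block just one directory while leaving the rest of the site open.

         User-agent: Googlebot
         Disallow: /example-private-folder/

Here, the user-agent line names one specific bot (Google's crawler), and the Disallow line blocks only the /example-private-folder/ directory for that bot.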

Now, let's talk about why this matters for your SEO.

You probably have a whole lot of pages on your website, right? Even if you don't think you do, go and check; you might be surprised by how many there are. If a search engine crawls your site, it will crawl every page of the website, and for websites with a lot of pages, it takes the search engine bot some time to crawl them all. That can have negative effects on your website's ranking.

This is because Googlebot (Google's search engine bot) has a “crawl budget” for each and every website.

Googlebot's crawl budget breaks down into two parts:

1. The first is the crawl rate limit.

2. The second is the crawl demand.

Here's how Google explains crawl rate and crawl demand:

Fundamentally, the crawl budget is “the number of URLs Googlebot can and wants to crawl.”

So, you need to help Googlebot spend its crawl budget on your site in the best way possible; in other words, it should spend that budget crawling your most valuable pages.

According to Google, certain factors, such as large numbers of low-value or duplicate pages, will “negatively affect a site's crawling and indexing.”

So let’s come back to robots.txt.

If you create the proper robots.txt file, you can tell search engine bots (e.g., Googlebot) to avoid crawling certain webpages of your website. Think about the implications: if you tell search engine bots to crawl only your most valuable content, the bots will crawl and index your site based on that content alone.

See what Google says in this regard:

“You don't want your server to be overwhelmed by Google's crawler or to waste crawl budget crawling unimportant or similar pages on your site.”

By using your robots.txt file the proper way, you can tell search engine bots to spend their crawl budgets wisely. And that's why the robots.txt file is so useful in the context of SEO.

How to locate your website's robots.txt file

Want to look at your own robots.txt file? There's a super easy way to view it.

All you have to do is type the base URL of the site into your browser's address bar (e.g., https://auniqueweb.in). Then add /robots.txt to the end of that URL, i.e., https://auniqueweb.in/robots.txt.

This method works for any site, so you can peek at other sites' robots.txt files and see what they're doing.

When you do this, you'll encounter one of three situations:

1) You may find a robots.txt file.

2) You may find an empty file, i.e., a robots.txt that exists but contains no directives. The Disney website, for example, has been cited as lacking a meaningful robots.txt file.

3) You may get a 404 error, meaning the site returns a 404 response for the robots.txt URL.

Now, take a look at your own website's robots.txt file. If you find an empty file or a 404 error, you'll want to fix that. If you do find a valid file, it's probably set to the default settings that were created when you made your website.

Note: If you're using WordPress, you might see a robots.txt file when you go to yourwebsite.com/robots.txt even though no physical file exists. This is because WordPress creates a virtual robots.txt file if there is no robots.txt in the domain's root directory.
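
For reference, the virtual file WordPress generates typically looks something like this (the exact contents can vary by WordPress version and installed plugins):

         User-agent: *
         Disallow: /wp-admin/
         Allow: /wp-admin/admin-ajax.php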

If you don't find a robots.txt file in the domain's root directory, you will need to create one from scratch.

How to create a robots.txt file from scratch (or edit an existing one)

Now, let’s look at actually changing your robots.txt file or creating one from scratch.

To create a new robots.txt file, open a plain text editor like Notepad (Windows) or TextEdit (Mac).

If you already have a robots.txt file, you'll need to locate it in your domain's root directory. Usually, you can find the root directory by going to your hosting provider's website, logging in, and heading to the file management or FTP section of your site.

You should see something that looks like this:

(Screenshot: the robots.txt file in the domain's root directory)

Find your website's robots.txt file in the domain's root directory and open it for editing. Delete all of the text, but keep the file itself.

Creating a robots.txt file

As mentioned earlier, you can create a new robots.txt file using the plain text editor of your choice; just remember to use only a plain text editor. If your website already has a robots.txt file, make sure you've deleted its text (but not the file itself).

First, it helps to become familiar with some of the syntax used in a robots.txt file.

Google provides a good explanation of the basic robots.txt terms.

Let's start by setting the “user-agent” term so that it applies to all web robots. Do this by placing an asterisk after the user-agent term, like this:

user-agent: *

On the next line, type “Disallow:” but don't type anything after it.

Since there's nothing following the Disallow, web robots will be directed to crawl your entire website. At this point, everything on your site is fair game for all crawlers.

So far, your robots.txt file should look like this:

           User-agent: *

           Disallow:

Even though they're short, these two lines are already doing a lot of work for crawlers.

Believe it or not, this is what a basic robots.txt file looks like. Now, let's take it to the next level and turn this little text file into an SEO booster.

How to link your XML sitemap to your robots.txt file

Please note this step isn't strictly necessary, but if you want to, here's what to type in robots.txt:

Sitemap: https://auniqueweb.in/sitemap.xml
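
Putting it all together, a minimal robots.txt that allows full crawling and also points crawlers to the sitemap might look like this (the sitemap URL is assumed for illustration):

         User-agent: *
         Disallow:

         Sitemap: https://auniqueweb.in/sitemap.xml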

Optimizing robots.txt for SEO

How you optimize your robots.txt file depends on the content you have on your website. There are many ways to use robots.txt to your advantage.

Keep in mind that you should not rely on robots.txt to hide pages from search engines; it controls crawling, not indexing.

Here are the most common ways to use the robots.txt file:

1. Maximize search engines' crawl budgets by telling them not to crawl the parts of your site that aren't displayed to the public. This is one of the best uses of the robots.txt file.

For example, if you visit the robots.txt file of auniqueweb.in, you'll see that it disallows certain file paths (/wp-admin/) from crawling.

         User-agent: *
         Disallow: /wp-admin/
         Allow: /wp-admin/admin-ajax.php

Since this path is only used for logging into the backend of the site, it wouldn't make sense for search engine bots to waste their time crawling it.

2. You may use the same directive (or command) to prevent bots from crawling specific webpages. After the Disallow, enter the part of the URL that comes after the domain (the part after .in, .com, etc.), placed between two forward slashes.

So if you want to tell a bot not to crawl your page “Prohibited Page”, i.e., https://auniqueweb.in/Prohibited-Page/, you can write it like this in the robots.txt file:

Disallow: /Prohibited-Page/

Types of pages to exclude from indexation

Since there are no universal rules for which pages to disallow, your robots.txt file will be unique to your website. Use it as you see fit, and make sure it validates in Google's robots.txt testing tool. You might still be wondering specifically which types of pages to exclude from indexation.

Here are a couple of common scenarios where it comes in handy.

1. Purposeful duplicate content.

While duplicate content is mostly a bad thing, there are a handful of cases in which it’s necessary and acceptable.

For example, if you have a printer-friendly version of a page, you technically have duplicate content. In this case, you could tell bots not to crawl one of those versions (typically, the printer-friendly version of the page).

Another example is split-testing pages that have the same content but different designs. This is another well-known case for excluding pages from Google's SERPs, as the sketch below shows.
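
As a hedged illustration (both paths are hypothetical), blocking a printer-friendly copy and a split-test variant could look like this:

         Disallow: /print/
         Disallow: /landing-page-variant-b/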

2. Thank You Pages.

The thank you page is one of a marketer's favorite pages because it means a new lead.

However, if your thank you pages are reachable through Google, people can land on them without going through your lead capture process. By blocking your thank you page, you can make sure only qualified leads are seeing it.

For example, say the thank you page is at https://auniqueweb.in/thankyou/. In your robots.txt file, blocking that page would look like this:

Disallow: /thankyou/

However, using the Disallow directive doesn't actually prevent the page from being indexed by Google. So theoretically, you could disallow a page, but it could still end up in Google's index, and generally, you don't want that, right?

Two more directives to know: noindex and nofollow

There are two directives you should know: noindex and nofollow.

noindex directive

That's why you need the noindex directive. It works with the disallow directive to make sure bots don't visit or index certain pages, as instructed in the robots.txt file.

For example, if your website has a page that you don't want indexed (such as a thank you page), you can use both the disallow and noindex directives:

       Disallow: /Prohibited-Page/
       Noindex: /Prohibited-Page/

Now, that page won’t show up in the SERPs.

nofollow directive

Now, Let’s look into the nofollow directive.

This works much like a nofollow link: in short, it tells web robots not to crawl the links on a page.

However, the nofollow directive is implemented a bit differently because it isn't actually part of the robots.txt file. Nevertheless, it still gives instructions to web robots, so it's the same concept; the only difference is where it is placed.

Open the source code of the page you want to change, and make sure you're between the <head> tags.

Then paste this line:

         <meta name="robots" content="nofollow">

So it should look like this:

<head>

<meta name="robots" content="nofollow">

</head>

If you want to add both the noindex and nofollow directives to a page, use the line below between the <head> tags.

         <meta name="robots" content="noindex,nofollow">

This will give web robots both directives at once.

Now, Let's Test It Thoroughly

Finally, test your robots.txt file to make sure everything is valid and working the way Google expects.

As part of its Webmaster tools (Google Search Console), Google provides a free robots.txt tester for checking your robots.txt file.

Here is the process to test the robots.txt file.

First, sign in to your Google Search Console (Webmaster Tools) account by clicking “Sign In” in the top right corner.

Select the appropriate property (i.e., your website) and click on “Crawl” in the left-hand sidebar.

You’ll see “robots.txt Tester.” Click on that.

If there's any code already in the box, delete it and replace it with your new robots.txt file.

Now, click “Test” in the lower right section of the screen.

If the “Test” text changes to “Allowed”, your robots.txt file is valid.

Finally, upload your robots.txt file to your domain's root directory (or overwrite the file there if you already had one). You're now armed with a tiny but powerful file, and you should see an increase in your search visibility.

Conclusion

Using robots.txt can make a significant difference to your website's SEO. By setting up your robots.txt file the right way, you're not just enhancing your own SEO; you're also helping your visitors. If search engine bots can spend their crawl budgets wisely, they'll organize and display your content in the SERPs in the best way possible, which means your website will be more visible in Google search results.

Further, it's mostly a one-time setup. It doesn't take a lot of effort to create your robots.txt file, and you can make small changes as your website needs them. If you haven't set up a robots.txt file before, I'd recommend giving it a spin. This post discussed how to find and view the file, how to set up a simple robots.txt file, and how to customize it for SEO.
