This article was co-authored by wikiHow Staff. Our trained team of editors and researchers validate articles for accuracy and comprehensiveness. wikiHow's Content Management Team carefully monitors the work from our editorial staff to ensure that each article is backed by trusted research and meets our high quality standards.
The wikiHow Video Team also followed the article's instructions and verified that they work.
This article has been viewed 227,114 times.
Learn more...
Search engines are equipped with robots, also known as spiders or bots, that crawl and index webpages. If your site or page is under development or contains sensitive content, you may want to block bots from crawling and indexing your site. Learn how to block entire websites, pages, and links with robots.txt files and block specific pages and links with <meta> </meta> html tags. Read on to discover how to block specific bots from accessing your content.
Steps
Blocking Search Engines with robots.txt Files
-
1Understand robots.txt files. A robots.txt file is a plain or ASCII text file that informs search engine spiders what they are allowed to access on your site. Files and folders listed in a robots.txt file may not be crawled and indexed by a search engine spiders. You may need a robots.txt file if:
- You want to block specific content from search engine spiders.
- You are developing a live site and are not prepared to have search engine spiders crawl and index the site
- You want to limit access to reputable bots.[1]
-
2Create and save and robots.txt file. To create the file, launch a plain text editor or a code editor. Save the file as: robots.txt. The file name must be all lowercase.[2]
- Do not forget the “s.”
- When you save the file, choose the extension “'.txt”'. If you are using Word, select the “Plain Text” option.
Advertisement -
3Write a full-disallow robots.txt file. It is possible to block every reputable search engine spider from crawling and indexing your site with a “full-disallow” robots.txt. Write the following lines in your text file:
- Using a “full-disallow” robots.txt file is not strongly recommended. When a bot, such as Bingbot, reads this file, it will not index your site and the search engine will not display your website.
- User-agents: this is another term for search engine spiders, or robots
- *: the asterisk signifies that the code applies to all user-agents
- Disallow: /: the forward slash indicates that the entire site is off-limits to bots[3]
User-agent: * Disallow: /
-
4Write a conditional-allow robots.txt file. Instead of blocking all bots, consider blocking specific spiders from certain areas of your site.[4] Common conditional-allow commands include:
- Block a specific bot: replace the asterisks next to User-agent with googlebot, googlebot-news, googlebot-image, bingbot, or teoma.[5]
- Block a directory and its contents:
User-agent: * Disallow: /sample-directory/
- Block a webpage:
User-agent: * Disallow: /private_file.html
- Block an image:
User-agent: googlebot-image Disallow: /images_mypicture.jpg
- Block all images:
User-agent: googlebot-image Disallow: /
- Block a specific file format:
User-agent: * Disallow: /p*.gif$
-
5Encourage bots to index and crawl your site. Many people want to welcome, instead of block, search engine spiders because they want their entire site indexed. To accomplish this, you have three options. First, you can opt out of creating a robots.txt file—when the robot does not find a robots.txt file, it will continue to crawl and index your entire site. Second, you can create an empty robots.txt file—the robot will find the robots.txt file, recognize that it is empty, and continue to crawl and index your site. Lastly, you can write a full-allow robots.txt file.[6] Use the code:
- When a bot, such as googlebot, reads this file, it will feel free to visit your entire site.
- User-agents: this is another term for search engine spiders, or robots
- *: the asterisk signifies that the code applies to all user-agents
- Disallow: the blank disallow command indicates that all files and folders are accessible
User-agent: * Disallow:
-
6Save the txt file to the root of your domain. After you have written the robots.txt file, save the changes. Upload the file to your site's root directory. For example, if your domain is www.yourdomain.com, place the robots.txt file at www.yourdomain.com/robots.txt.
Blocking Search Engines with Meta Tags
-
1Understand HTML robots meta tags. The robots meta tag allows programmers to set parameters for bots, or search engine spiders. These tags are used to block bots from indexing and crawling an entire site or just parts of the site. You can also use these tags to block a specific search engine spider from indexing your content. These tags appear in the head of your HTML file.[7]
- This method is commonly used by programmers that do not have access to a website's root directory.
-
2Block bots from a single page. It is possible to block all bots from indexing a page and or from following a page's links. This tag is commonly used when a live site is under development. Once the site is complete, it is strongly recommended that you remove this tag. If you do not remove the tag, your page will not be indexed or searchable via search engines.[8]
- You may block bots from indexing the page and from following any of the links:
<meta name=”robots” content=“noindex, nofollow”>
- You may block all bots from indexing the page:
<meta name=”robots” content=“noindex”>
- You may block all bots from following the page's links:
<meta name=”robots” content=“nofollow”>
- You may block bots from indexing the page and from following any of the links:
-
3Allow the bots to index a page, but not follow its links. If you allow the bots to index the page, the page will be indexed; if you prevent the spiders from following the links, the link path from this specific page to other pages will break.[9] Insert the following line of code into your header:
<meta name=”robots” content=“index, nofollow”>
-
4Let the search engine spiders follow the links but not index the page. If you allow the bots to follow the links the link path from this specific page to other pages will remain in tact; if you restrict them from indexing the page, your web page will not appear in the index.[10] Insert the following line of code into your header:
<meta name=”robots” content=“noindex, follow”>
-
5Block a single outgoing link. To hide a single link on a page, embed a rel tag within the <a href> </a> link tag. You may wish to use this tag to block links on other pages that lead to the specific page you want to block.[11]
<a href="yourdomain.html" rel="nofollow">Insert Link to Blocked Page</a>
-
6Block a specific search engine spider. Instead of blocking all bots from your web page, you may wish to prevent one bot from crawling and indexing the page. To accomplish this, replace “'robot”' within the meta tag with the name of a specific bot.[12] Examples include: googlebot, googlebot-news, googlebot-image, bingbot, and teoma.[13]
<meta name=”bingbot” content=“noindex, nofollow”>
-
7Encourage bots to crawl and index your page. If you want to ensure that your page will be indexed and its links will be followed, you can insert a follow-allow meta “robot” tag into your header.[14] Use the following code:
<meta name=”robots” content=“index, follow”>
Community Q&A
-
QuestionWhat does the phrase 'Blocking Search Engines with Meta Tags' mean? iI's ambiguous, is it a) Use Metatags to block Search Engines, or b) Block Search Engines that have Metatags?PinguTop AnswererThis article describes using meta tags to block search engines. You can use this method to block search engines regardless of whether they have meta tags or not.
References
- ↑ https://varvy.com/robottxt.html
- ↑ https://varvy.com/robottxt.html
- ↑ https://support.google.com/webmasters/answer/6062596?hl=en
- ↑ https://support.google.com/webmasters/answer/6062596?hl=en
- ↑ https://www.elegantthemes.com/blog/tips-tricks/how-to-stop-search-engines-from-indexing-specific-posts-and-pages-in-wordpress
- ↑ https://varvy.com/robottxt.html
- ↑ https://searchenginewatch.com/sew/how-to/2067564/how-to-use-html-meta-tags
- ↑ https://searchenginewatch.com/sew/how-to/2067564/how-to-use-html-meta-tags
- ↑ https://searchenginewatch.com/sew/how-to/2067564/how-to-use-html-meta-tags
- ↑ https://searchenginewatch.com/sew/how-to/2067564/how-to-use-html-meta-tags
- ↑ https://css-tricks.com/snippets/html/meta-tag-to-prevent-search-engine-bots/
- ↑ https://css-tricks.com/snippets/html/meta-tag-to-prevent-search-engine-bots/
- ↑ https://www.elegantthemes.com/blog/tips-tricks/how-to-stop-search-engines-from-indexing-specific-posts-and-pages-in-wordpress
- ↑ https://searchenginewatch.com/sew/how-to/2067564/how-to-use-html-meta-tags