Robots and Robots.txt file
Posted On 06 Dec, 2007
Before discussing the robots.txt file, it is important to understand the term ‘Robot’ as used on the WWW.

What is a Robot?
A Robot is an automated software program that locates and collects data from web pages for inclusion in a search engine's database, and follows links to find new pages on the World Wide Web. Normal Web browsers are not robots, because they are operated by a human and don't automatically retrieve referenced documents (other than inline images).

Web robots are sometimes referred to as Web Wanderers, Web Crawlers, or Spiders. These names are a bit misleading, as they give the impression that the software itself moves between sites like a virus. This is not the case; a robot simply visits sites by requesting documents from them.

Autonomous agents
These are programs that do travel between sites, deciding themselves when to move and what to do. They can only travel between special servers and are currently not widespread on the Internet.

Intelligent agents
These are programs that help users with things, such as choosing a product, guiding a user through form filling, or even helping users find things. They generally have little to do with networking.

User-agent
This is the technical name for programs that perform networking tasks for a user, such as Web User-agents like Netscape Navigator and Microsoft Internet Explorer, and Email User-agents like Qualcomm Eudora.

Spiders
Worms
WebAnts

How to make a robots.txt file?
The basic structure is:

    User-agent: *
    Disallow: /filename here

User-agent
This line names the robot that the record applies to. To address all robots, the line will be:

    User-agent: *

To address a single robot, such as Google's, the line will be:

    User-agent: Googlebot

Disallow
This line names the file or directory that the robot should not visit. While writing this section of the file, the following rules must be followed:

    Disallow: /mydirectory/

This line disallows an entire directory.

    Disallow: /file.htm

This line disallows an individual file.

You have to use a separate line for each disallow. You also need to include both the user agent and a file or folder to disallow. Using a comma between filenames is incorrect, e.g.

Incorrect:

    Disallow: /file1.htm,file2.html

Correct:

    Disallow: /file1.htm
    Disallow: /file2.html

Some more rules:

    User-agent: *
    Disallow:

This record allows all robots to visit the whole site. If you don’t have a robots.txt file, robots are free to access and index all of your web pages. An empty file named robots.txt also allows robots to freely access and index all web pages.

Googlebot’s Allow
Suppose the photo folder is disallowed, but it contains a photo called myphoto.jpg that you want Googlebot to index. Then the following record

    User-agent: *
    Disallow: /photo/
    Allow: /photo/myphoto.jpg

would tell Googlebot that it can visit "myphoto.jpg" in the photo folder, even though the "photo" folder is otherwise excluded.

Important Note
Utmost care must be taken when writing a robots.txt file, because an incorrect file can block the bots that index your website. There is also a robots.txt tool that allows you to experiment a little, letting you know if there are any problems with your file before you put it online. If you are using a Google sitemap as part of Google's webmaster tools, you can log in and see whether Google is having any issues crawling your site.
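The rules described above can also be checked on your own machine before a file goes online. What follows is only a minimal sketch, assuming Python's standard urllib.robotparser module; it is not the robots.txt tool mentioned in the note. The file content, the robot name SomeOtherBot, and the helper names build_parser and check are made up for illustration. Python's parser applies the first matching rule in file order, so the Allow line is placed before the Disallow line here; real robots may resolve such conflicts differently (Google documents that the most specific rule wins).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining the rules discussed in the article.
# Allow comes before Disallow because Python's parser uses the first
# matching rule in file order (an assumption specific to this parser).
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /photo/myphoto.jpg
Disallow: /photo/

User-agent: *
Disallow: /mydirectory/
Disallow: /file.htm
"""


def build_parser(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def check(parser, agent, paths):
    """Return a {path: allowed} mapping for one user agent."""
    return {path: parser.can_fetch(agent, path) for path in paths}


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    samples = {
        "Googlebot": ["/photo/myphoto.jpg", "/photo/other.jpg"],
        "SomeOtherBot": ["/mydirectory/page.html", "/file.htm", "/index.html"],
    }
    for agent, paths in samples.items():
        for path, allowed in check(rp, agent, paths).items():
            verdict = "allowed" if allowed else "blocked"
            print(f"{agent:<13} {path:<24} {verdict}")
```

Passing an empty string, or a record whose Disallow line has no value, to build_parser gives a parser that allows every path, which matches the behaviour described above for an empty robots.txt file.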
All Rights Reserved to SEOsite.in