How to Keep Robots Out of Your Web Site

Posted by writer on Monday, 26 September 2011




You know that search engines are designed to help people quickly find information on the Internet. Search engines gather much of their data through robots (also known as spiders or crawlers), which look for sites on their behalf.

Spiders or crawlers explore the web looking for all kinds of information. They usually start with a URL submitted by a user, or from links they find on web pages, sitemap files, or top-level sites.

Once the robot has accessed a home page, it recursively accesses all pages linked from that page. In this way, the robot can check every page it finds on a server.
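The link-following behaviour described above can be sketched roughly as follows. This is only an illustration: the LINKS table is a hypothetical in-memory site rather than real fetched pages, and the sketch uses a queue instead of literal recursion, which visits the same set of pages.

```python
from collections import deque

# Hypothetical in-memory "site": each page maps to the pages it links to.
# A real robot would download each page and extract its links instead.
LINKS = {
    "/": ["/about", "/e-books/"],
    "/about": ["/"],
    "/e-books/": ["/e-books/one.html"],
    "/e-books/one.html": [],
}


def crawl(start, links):
    """Return every page reachable from start, in first-visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Skip pages already visited so that link cycles terminate.
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


visited = crawl("/", LINKS)
print(visited)
```

Starting from the home page, the sketch reaches every page on the made-up site, including the ones inside the e-books directory, which is why an exclusion rule is needed if those pages should stay out of the index.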

Once the robot finds a web page, it indexes the title, keywords, text, and so on. But sometimes you might want to prevent search engines from indexing some of your web pages, such as news posts or specially designated sites (in this example, affiliate sites). There are conventions for this, but whether individual robots comply with them is purely voluntary.

Robots Exclusion Protocol

So, if you want to keep robots away from parts of your website, you can ask them to ignore the pages you do not want indexed. To do that, put a robots.txt file in the root directory of your web server.

For example, if you have a directory called e-books and you want search robots to stay out of it, the robots.txt file should read:

User-agent: *
Disallow: /e-books/
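One way to check how a well-behaved robot would read such a file is Python's standard urllib.robotparser module. The sketch below assumes the rule is written with a leading slash, as Disallow: /e-books/. The robot name ExampleBot and the www.example.com URLs are placeholders, not real values.

```python
from urllib.robotparser import RobotFileParser

# The rules under test, assuming the directory is named /e-books/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /e-books/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and the URLs are hypothetical, for illustration only.
ebook_ok = parser.can_fetch("ExampleBot", "http://www.example.com/e-books/index.html")
home_ok = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")

print("e-books page allowed:", ebook_ok)
print("home page allowed:", home_ok)
```

This only shows what a compliant robot would conclude from the file. A robot that ignores the convention is not stopped by it.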

If you do not have enough control over your server to set up a robots.txt file, you can try adding a META tag to the head section of each HTML document.

For example, this tag tells robots not to index a specific page and not to follow its links:

<meta name="robots" content="noindex, nofollow">

Support for the robots META tag is not as widespread as support for the Robots Exclusion Protocol, but most major web indexes currently support it.
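To see how a robot might read this tag, here is a minimal sketch built on Python's standard html.parser module. The class name RobotsMetaParser and the sample PAGE are made up for this illustration, and real robots may interpret the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag (illustrative)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Compare the name case-insensitively; a missing attribute counts as empty.
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


# Hypothetical page used only for this example.
PAGE = (
    '<html><head><meta name="robots" content="NOINDEX, nofollow"></head>'
    "<body>Affiliate page</body></html>"
)

meta_parser = RobotsMetaParser()
meta_parser.feed(PAGE)
directives = meta_parser.directives
print(sorted(directives))
```

The directives are lower-cased before comparison, so NOINDEX and noindex are treated the same way in this sketch.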

News Posts

If you want to keep search engines from archiving your news posts, you can add an "X-no-archive" line to your posts' headers:

X-no-archive: yes

However, although many news clients allow you to add an X-no-archive header line to news postings, some of them do not.
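If your news client cannot add the header but you can compose posts with a script, the header can be set there instead. The sketch below uses Python's standard email.message module. The address, newsgroup, and subject are placeholders, and actually sending the post to a news server is left out.

```python
from email.message import EmailMessage

# Build a post in memory; every value below is a placeholder.
post = EmailMessage()
post["From"] = "writer@example.com"
post["Newsgroups"] = "misc.test"
post["Subject"] = "Test post"
# Ask archives that honour this header not to keep a copy.
post["X-No-Archive"] = "yes"
post.set_content("This post asks archives not to keep a copy.\n")

# Header lookups are case-insensitive.
header_value = post["x-no-archive"]
raw_post = post.as_string()
print(raw_post)
```

As with robots.txt, the header is only a request: an archive has to choose to respect it.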

The problem is that most search engines assume that all the information they find is public unless otherwise indicated.

So be careful: even though the robot exclusion standard and the no-archive header can help keep your material out of the major search engines, there are others that do not respect such rules.

If you are very concerned about the privacy of your e-mail and Usenet postings, you should use anonymous remailers and PGP. You can read about them here:

[http://www.well.com/user/abacard/remail.html]

[http://www.io.com/~combs/htmls/crypto.html]

[http://world.std.com/~franl/PGP/]

Even if you're not particularly concerned about privacy, do not forget that whatever you write may be indexed and archived somewhere for eternity, so use the robots.txt file as you need.
