Hi… welcome to the world of SEO and SEM again.
When we think of Search Engine Optimization, it is not only about ranking on the search results page but also about preventing unnecessary ranking of our internal pages and other confidential pages. Different SEO professionals suggest many ways to do this. Here I am discussing a very simple thing: robots.txt. It is a tiny text file, yet it has a powerful impact on SEO (Search Engine Optimization) practice.
First of all, I would like to make clear that robots.txt won’t stop a URL from being displayed in Google SERPs. We should use the meta ‘noindex’ tag to stop a URL from being displayed in Google SERPs. The difference comes from a subtle distinction in the specs: robots.txt is about whether a spider can ‘visit’ the page; the robots meta tag is about whether the spider can ‘index’ the page (or harvest links from the page). The Standard for Robot Exclusion says that robots.txt is intended to control robots’ fetching/visiting of pages. Quote: “Disallow – The value of this field specifies a partial URL that is not to be visited” http://www.robotstxt.org/wc/norobots.html Note that it says “not to be visited”.
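To make the “visiting” side concrete, here is a minimal robots.txt sketch (the domain and paths are placeholder examples, not from any real site). The file must sit at the root of the domain:

```text
# http://www.example.com/robots.txt  (example.com is a placeholder)
# "Disallow" tells obedient robots which partial URLs are not to be VISITED.

User-agent: *
Disallow: /internal/
Disallow: /private-report.html
```

Remember: an obedient spider will not fetch these pages, but Google can still show them as a bare “URL only” entry in the SERPs if other pages link to them.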
The robots META tag allows HTML authors to indicate to visiting robots whether a document may be indexed, or used to harvest more links. http://www.robotstxt.org/wc/exclusion.html Note that this one talks about ‘indexing’. The basic idea is that if you include a robots meta tag with ‘noindex’ in your HTML document, that document won’t be indexed. If you use ‘nofollow’, the links in that document will not be parsed by the robot. http://www.robotstxt.org/wc/faq.html#noindex Relying on robots.txt will give you a ‘URL only’ result in the SERPs. Using the meta robots tag will ensure the URL is NOT in the SERPs.
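Here is what the “indexing” side looks like in practice: a minimal page carrying the robots meta tag (the title and body are placeholders for illustration):

```html
<!-- "noindex" asks spiders not to index this document;
     "nofollow" asks them not to harvest/follow its links. -->
<html>
<head>
  <title>Confidential page</title>
  <meta name="robots" content="noindex, nofollow">
</head>
<body>
  ...
</body>
</html>
```

For this tag to have any effect, the robot must be allowed to fetch the page in the first place, so do not also disallow the same URL in robots.txt.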
Scrubs said: “I personally would concrete the non-indexing by: 1. 301 in place 2. noindex, follow meta 3. robots.txt to include noindex all.” That won’t work the way you intend it to either, because some methods will ‘trump’ other methods. If you ‘disallow’ a page in robots.txt, then your on-page meta tags can have no effect on a robot which obeys the robots.txt exclusion, because an ‘obedient’ robot won’t visit/fetch the page. Logically, if it does not visit the page, it can’t parse the meta tag. Also, if it can’t request the page, how can it get the 301 HTTP header in response to the request? Personally, based on what Beth has said, I’m with Robert: 301 one domain to the other. So now you know what you should use for the noindex property.
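As a sketch of the “301 one domain to the other” approach, assuming an Apache server with mod_rewrite enabled (olddomain.com and newdomain.com are placeholder names), the redirect could look like this in the old domain’s .htaccess:

```apache
# Permanently (301) redirect every URL on the old domain
# to the same path on the new domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteRule ^(.*)$ https://www.newdomain.com/$1 [R=301,L]
```

And again: for the robot to receive that 301 response, the old URLs must not be blocked in robots.txt, or the redirect will never be seen.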
Wishing you SEO success…