Many search engine optimisation campaigns focus primarily on getting content indexed by Google, but sometimes the opposite is needed: perhaps your whole staging environment got indexed, sensitive content ended up in Google by accident, or a hacker added spam pages, and you want those URLs removed fast.
A link or URL needs to be removed from Google's search results as soon as possible when it contains duplicate or outdated content, when your staging environment has been indexed, when your site has been hacked and serves spam pages, or when sensitive content has been indexed by accident.
How to remove URLs with duplicate or outdated content
Duplicate or outdated content is the most common reason to remove URLs (links) from Google's search results, because most outdated content is of no value to your visitors. The content can, however, still carry some SEO value. Either way, duplicate content will negatively impact your SEO performance, because Google and other search engines get confused about which exact URL to index and rank.
When content needs to remain accessible to visitors
There are instances where URLs ought to remain accessible to visitors, but you don't want Google and other search engines to index them, because that would hurt your SEO. Duplicate content is a typical case.
Here’s a typical example: you run an online store offering T-shirts that are identical except for colour and size, each variant on its own page. These product pages have no unique product descriptions, just different names and images. Google therefore considers the product pages near-duplicate content on the same site.
Near-duplicate pages force Google to decide which URL to choose as the canonical one to index, and your crawl budget gets spent on pages that add no SEO value. In this situation you need to signal to Google which URLs should be indexed and which should be removed from the index. The best course of action for a URL depends on a few factors:
The URL has value: if the URL receives organic traffic and incoming links from other sites, canonicalise it to the preferred URL you want indexed. Google will then consolidate its value into the selected URL, while the other URLs remain accessible to your visitors.
The URL has no value: if the URL receives no organic traffic and has no incoming links from other sites, implement the noindex robots tag. This sends a clear signal not to index the URL, so it won't show up on the search engine results pages. Be aware that in this case Google will not consolidate any value.
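Both signals go in the page's <head> section. A minimal sketch, with hypothetical URLs:

```html
<!-- Variant page with value: consolidate it into the preferred URL -->
<link rel="canonical" href="https://example.com/t-shirt/" />

<!-- Variant page without value: keep it out of the index entirely -->
<meta name="robots" content="noindex" />
```

Use one or the other on a given page: a noindexed page is simply dropped from the results, while a canonicalised page passes its signals on to the preferred URL.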
Lots of low-quality, thin, or duplicate content will only impact your SEO efforts negatively. When duplicate content issues exist, you don't necessarily need to remove the offending pages; they can be canonicalised if they are needed for other reasons. Better yet, merge the duplicated pages into one robust, high-quality piece of content. This increases organic traffic for the entire website.
Andreja Čičak says that if you wish to avoid duplicate content issues on product variants, it is of paramount importance to build a strategic SEO plan and be ready to adapt when you notice the need for change.
For instance, say your catalogue consists only of simple products, where each product represents a specific variation. In that situation you need to index them all, even though the differences between the products are minor. Still, monitor their performance closely, and if duplicate content issues arise, introduce parent products to your online store. As soon as the parent products are shown on the front end, adjust your indexing strategy.
If both parent and child products are visible as separate items on the front end, it is strongly suggested to implement the same rel="canonical" on all products to steer clear of duplicate content issues. In such situations, the preferred version should be the parent product that collects all product variants; this significantly boosts UX performance in addition to improving your store's SEO, because customers can find their desired product variant more easily.
This refers to products with the same or very similar content. If all product pages have unique content, every page should have a self-referencing canonical URL.
When content shouldn’t remain accessible to visitors
If your website contains outdated content that you don't want anyone to see, there are two ways to handle it, determined by the context of the URLs:
When the URLs have traffic and/or links: implement 301 redirects to the most relevant URLs on your website. Avoid redirecting to irrelevant URLs, because Google may treat those redirects as soft 404 errors and not assign any value to the redirect target.
When the URLs do not have any traffic and/or links: return the HTTP 410 status code, which tells Google the URLs have been permanently removed. Google removes URLs much faster when the 410 status code is used.
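Both options can be implemented in the web server configuration. A minimal Apache .htaccess sketch, with hypothetical paths:

```apache
# URL with traffic/links: permanently redirect it to the most relevant page
Redirect 301 "/old-guide/" "https://example.com/new-guide/"

# URL without traffic/links: report it as permanently gone
Redirect 410 "/outdated-page/"
```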
Andy Chadwick says that once you have implemented the redirects, you should still submit the old sitemap to Google Search Console alongside the new one, and leave it there for a period of 3 to 4 months. That way, Google picks up the redirects quickly, and the new URLs start showing in the SERPs.
Remove cached URLs with Google Search Console
Google usually keeps a cached copy of your pages, and it can take a very long time for that copy to be updated or removed. Use the Clear Cached URL feature in Google Search Console if you don't want visitors to see the cached copy of a page.
How to clear cached URLs using Google Search Console
- Sign in to your Google Search Console account.
- Select the right property.
- Click the Removals button in the left-hand menu.

Screenshot: the Removals menu in Google Search Console

- Click the New Request button.
- Switch to the CLEAR CACHED URL tab.

Screenshot: removing cached URLs through the Google Search Console Removals tool

- Choose whether you want Google to clear the cache for just one URL or for all URLs that start with a specific prefix.
- Enter the URL and hit Next.
Remember that you can tell Google not to keep cached copies of your pages by using a noarchive meta robots tag.
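The noarchive directive is a regular meta robots tag placed in the page's <head>:

```html
<meta name="robots" content="noarchive" />
```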
How to remove staging environment URLs
Staging and acceptance environments exist so that releases can be tested and approved. They are not meant to be accessible and indexable by search engines, yet by mistake they often are, and you end up with staging-environment URLs indexed by Google.
This happens more often than you'd think. In this section, we discuss how to remove these URLs from Google quickly and effectively.
When staging URLs aren’t outranking production URLs
Usually, your production URLs will not be outranked by the staging URLs. If that is your situation too, follow the steps below to rectify it.
- Sign in to your Google Search Console account.
- Select the staging property.
- Click the Removals button in the left-hand menu.
- Click the New Request button, which takes you to the TEMPORARILY REMOVE URL tab.
- Choose "Remove all URLs with this prefix", enter the URL, and hit Next. Google will keep the URLs hidden for 180 days, but they will remain in Google's index, so extra action is needed to remove them.
- Apply the noindex directive, either via the HTML source code or via the X-Robots-Tag HTTP header.
- Create an XML sitemap containing the noindexed URLs, so that Google has no trouble discovering them and processing the noindex robots directive.
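The noindex step can be applied across the whole staging host in one go via the X-Robots-Tag response header. A minimal Apache sketch, assuming the staging site has its own virtual host and mod_headers is enabled:

```apache
# Serve a noindex directive with every response from the staging host
Header set X-Robots-Tag "noindex"
```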
Once Google has deindexed the staging URLs, you can remove the XML sitemap and add HTTP authentication to protect your staging environment, which prevents this from happening again. Note that if you're removing URLs from Microsoft Bing, you can use its content removal tool.
When staging URLs are outranking production URLs
If your staging URLs are outranking your production URLs, you need Google to assign the staging URLs' signals to the production URLs while making sure that, at the same time, visitors don't end up on the staging URLs. First follow the steps explained in the previous section.
Then implement 301 redirects from the staging URLs to the production URLs. Set up a new staging environment on a different domain than the one that got indexed, and apply HTTP authentication to it to prevent it from being indexed again.
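HTTP authentication on the staging environment can be as simple as basic auth. A minimal Apache sketch, with a hypothetical password file path:

```apache
# Require a username and password for the entire staging site
AuthType Basic
AuthName "Staging environment"
AuthUserFile "/etc/apache2/.htpasswd"
Require valid-user
```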
What to avoid when dealing with indexed staging URLs
If you want to remove staging environment URLs from Google, never do so with a Disallow: / in your robots.txt file.
That would only deny Google access to the staging URLs, preventing it from discovering the noindex tag. Google would go on surfacing the staging URLs, using a very poor snippet like this example:
Does a robots.txt disallow instruct search engines to deindex pages?
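For clarity, this is the robots.txt anti-pattern in question (do not use this to deindex pages):

```
User-agent: *
Disallow: /
```

A blanket disallow blocks crawling, not indexing: Google can still keep the URLs in its index via external links, it just can no longer read the noindex directive on the pages themselves.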
Tomas Ignatavicius says that when pushing website changes live, you should talk to your developers to make sure the process is 100% right. A few SEO-critical bits can harm the progress of your entire website if managed improperly. These include:
- Web server config files such as .htaccess, nginx.conf or web.config.
- Files supporting meta tag deployment, which protect your staging environment from getting indexed and your live website from getting de-indexed.
- JS files involved in content and DOM rendering.
If during deployment the robots.txt file is overwritten by the staging version with its disallow directive, or the other way around, healthy websites have been seen to drop in Google's SERPs, mainly because removing important directives opens the indexing flood gates.
How to remove spam URLs
If your website has been hacked and contains numerous spam URLs, get rid of them as fast as you can so they stop hurting your SEO performance and your trustworthiness in the eyes of your visitors. Follow the steps below to reverse the damage quickly.
Step 1: Use Google Search Console’s Removals Tool
Google's Removals tool helps you remove spammy pages from Google's SERPs at a fast pace. Remember that this tool does not deindex the pages; it merely hides them temporarily.
How to remove URLs using GSC’s Removals tool
- Sign in to your Google Search Console account.
- Select the right property.
- Click the Removals button in the left-hand menu.
- Click the New Request button; you will land on the TEMPORARILY REMOVE URL tab.
At this stage, choose "Remove this URL only", enter the URL you want removed, and hit Next. Google will now keep this URL hidden for 180 days, but remember that it will still be in Google's index, so you need to take additional action to remove it.
Repeat the process as often as you need to. When dealing with a large number of spam pages, it is recommended to first hide the ones that surface most often in Google.
Use the "Remove all URLs with this prefix" option with caution, because it may hide every URL matching the prefix entered in the URL field. In addition, remove Google's cached copies of the spam URLs by following the steps described in the "Remove cached URLs" section.
Step 2: Remove the spam URLs and serve a 410
Restoring a backup allows you to return your website to its previous state. Run updates and add extra security measures to remove your site's vulnerability. Then verify that all spam URLs are gone from your website, and serve a 410 HTTP status code when they are requested, making it clear these URLs are gone and never coming back.
Step 3: Create an additional XML sitemap
Include the spam URLs in a separate XML sitemap and submit it to Google Search Console. That way, Google can quickly work through the spam URLs, and the removal process can easily be monitored via Google Search Console.
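A minimal sitemap for this purpose could look as follows; the URLs are hypothetical placeholders for the removed spam pages:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/spam-page-1/</loc></url>
  <url><loc>https://example.com/spam-page-2/</loc></url>
</urlset>
```

Once Google Search Console reports the URLs in this sitemap as no longer indexed, the sitemap can be deleted.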
How to remove URLs with sensitive content
If you collect sensitive data on your website, for instance customers' personal data or résumés from job applicants, it is of paramount importance to keep it safe. Under no circumstances should Google, or any other search engine, index this data.
Mistakes are part of life, though, and sometimes sensitive content ends up in Google's search results. Don't worry too much; below we explain how to remove this content from Google quickly.
Step 1: Use Google Search Console’s Removals Tool
The fastest way to make Google stop showing URLs with sensitive content in its SERPs is to hide them through GSC's Removals tool. Remember, though, that this tool merely hides the submitted pages for 180 days; it does not remove them from Google's index.
How to hide URLs using the GSC Removals tool
- Sign in to your Google Search Console account.
- Select the right property.
- Click the Removals button in the left-hand menu.
- Click the New Request button, which will take you to the TEMPORARILY REMOVE URL tab.
Choose "Remove this URL only", enter the URL you want removed, and click Next. Google will now keep the URL hidden, but keep in mind that it will remain in Google's index, so take extra action to remove it, as outlined in the steps below.
Repeat as often as you need to. If the sensitive content is located in a specific directory, it is recommended to use the "Remove all URLs with this prefix" option, which lets you hide all URLs within that directory in one go. If you are up against a large number of URLs with sensitive content that do not share a URL prefix, focus on hiding the ones that appear most often in Google. In addition, remove Google's cached copies of the sensitive content by following the steps described in the "Remove cached URLs" section.
Step 2: Remove the content and serve a 410
Once the sensitive content is no longer on your website, delete the URLs and return the 410 HTTP status code, which tells Google that the URLs have been permanently removed.
Step 3: Use an additional XML sitemap
To keep control of, and be able to monitor, the removal process for URLs with sensitive content, add them to a separate XML sitemap and submit it in Google Search Console.
Step 4: Prevent sensitive data from leaking
If you don't want sensitive content to be indexed and leaked again, take appropriate security measures to make sure this cannot happen.
How to remove content that’s not on your site
If you discover that other websites are using your content, there are several ways to get it removed from Google.
Reach out to the website owner
The first step is to contact the people running the website. In most cases, your content was copied by mistake, and swift action will be taken. You can offer them to point a cross-domain canonical to your content along with a link, or ask them to 301 redirect it to your URL to remove it altogether.
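A cross-domain canonical is the same rel="canonical" tag, placed on the copied page and pointing back at the original on your domain; a hypothetical example:

```html
<link rel="canonical" href="https://yoursite.example/original-article/" />
```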
What if the website’s owners aren’t responding or refuse to take any action?
If the owners of the website are not cooperative, there are other ways to ask Google to remove the content.
For the removal of personal information, there is a form you can use to request that Google permanently delete the data.
When it comes to legal violations, you can ask Google to evaluate a request filed under applicable law.
In the case that you have discovered content violating your copyright, you can submit a DMCA takedown request.
Note that if the pages have been deleted on the other site but Google hasn't caught up yet, you can speed up the process using the Remove Outdated Content tool.
You can also use it when content has already been updated, but Google’s still showing the old snippet and cache. It’ll force them to update it.
How to remove images from Google Search
Although using the robots.txt file to remove indexed pages from Google Search is not recommended, Google does recommend using it to remove indexed images.
Yes, although this might sound confusing, it is the right way.
Google's documentation lacks clarity here: the Removals Tool documentation contains a section covering both HTML and non-HTML files, stating that robots.txt should not be used as a blocking mechanism.
Screenshot from Google’s Removal Tools documentation
At the same time, their "Prevent images on your page from appearing in search results" article recommends using robots.txt to do just that.
Screenshot from Google’s Prevent images on your page from appearing in search results article
So, how do you go about removing these images?
Suppose some images in a folder have been indexed accidentally. Here is how you remove them:
Carefully follow steps 1-6 in the section above to quickly hide the URLs in Google Search.
After that is done, a disallow rule for those images has to be added to your robots.txt.
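A sketch of such a rule, assuming the indexed images live under a hypothetical /images/ folder:

```
User-agent: Googlebot-Image
Disallow: /images/
```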
The next time Googlebot downloads your robots.txt, it will see the disallow directive for the images and remove the images from its index.
How do I cancel a URL removal request?
If your URL still shows a pending removal request, you can quickly cancel it by clicking Cancel. Keep in mind that this method only temporarily hides the URL.