If you’ve ever wondered ‘how can I remove a link from Google search?’, you’ve come to the right place. Many search engine optimisation campaigns focus primarily on getting content indexed by Google, but sometimes you need the opposite: your whole staging environment got indexed, sensitive content that should never have been indexed shows up in Google, or a hacker has added spam pages, and you want those URLs removed fast.
You should remove a link or URL from Google’s search results as soon as possible when it points to duplicate or outdated content, when your staging environment has been indexed, when your site has been hacked and is serving spam pages, or when sensitive content has been indexed by accident.
Ignite SEO will guide you step by step through doing this the optimal way.
How to remove URLs with duplicate or outdated content
Duplicate or outdated content is the most common reason people remove URLs (remove a link) from Google’s search results, because most outdated content is of no value to your visitors, even if it still carries some SEO value. Duplicate content in particular will hurt your SEO performance, because Google and other search engines get confused about which exact URL to index and rank.
When content needs to remain accessible to visitors
There are instances where URLs ought to remain accessible to visitors, but you don’t want Google and other search engines to index them, because that would hurt your SEO. Duplicate content is a typical case.
Here’s a typical example: you run an online store that offers the same T-shirt in different colours and sizes, each variant on its own page. None of these product pages has a unique product description; they only differ in name and images. Google treats these product pages as near-duplicate content on the same site.
Near-duplicate pages force Google to decide which URL to pick as the canonical one to index, and they waste your crawl budget on pages that add no SEO value. In this situation you have to signal to Google which URLs should be indexed and which should be removed from the index. The best course of action for a URL depends on a few factors:
The URL has value
If the URL receives organic traffic and incoming links from other sites, canonicalise it to the preferred URL you want indexed. Google will then consolidate its value into the selected URL, while the other URLs remain accessible to your visitors.
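For illustration, the canonical is a single tag in the variant page’s head; a minimal sketch with hypothetical URLs could look like this:

```html
<!-- On a T-shirt variant page, pointing search engines at the preferred product URL -->
<!-- The URL below is a hypothetical example -->
<link rel="canonical" href="https://www.example.com/t-shirts/classic-tee/" />
```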
The URL has no value
If the URL receives no organic traffic and has no incoming links from other sites, implement the noindex robots tag. This sends a clear signal that the URL should not be indexed, so search engines stop showing it. Understand, though, that in this case Google will not consolidate any value.
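As a minimal sketch, the noindex robots tag is one line in the page’s head; the value below also allows link following, which is a common choice:

```html
<!-- Tell search engines not to index this page, but still follow the links on it -->
<meta name="robots" content="noindex, follow" />
```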
Lots of low-quality, thin or duplicate content will only hurt your SEO efforts. When duplicate content issues exist, you don’t have to remove the offending pages; they can be canonicalised if they’re needed for other reasons. Better still, merge the duplicated pages into one high-quality, robust piece of content. This tends to increase organic traffic for the entire website.
In a nutshell?
Andreja Čičak says that if you want to avoid duplicate content issues on product variants, it is of paramount importance to build a strategic SEO plan and be ready to adapt when you notice a need for change.
For instance, suppose your catalogue consists only of simple products, each representing a specific variation. In that situation you can index them all, even though the differences between the products aren’t that important. You still need to monitor their performance in search results closely, and if duplicate content issues appear, introduce parent products to your online store. As soon as the parent products start showing on the front end, adjust your indexing strategy for every search engine.
If both parent and child products are visible as separate items on the front end and in search results, the solid recommendation is to implement the same rel canonical on all products to avoid duplicate content issues. In such situations the preferred version should be the parent product, which collects all the product variants. This gives a significant boost to UX performance in addition to improving your store’s SEO, because customers will find their desired product variant in search results more easily.
This applies to products with the same or very similar content. If every product page has unique content, each page should carry a self-referencing canonical URL.
When content shouldn’t remain accessible to visitors
If your website contains outdated content that you don’t want anyone to see in search results, there are two ways to handle it, depending on the context of the URLs:
When the URLs have traffic and links: implement 301 redirects to the most relevant URLs. Steer clear of redirecting to irrelevant URLs, because Google may treat those as soft 404 errors and won’t assign any value to the redirect target.
When the URLs have no traffic and links: return the HTTP 410 status code, which tells Google that the URLs have been permanently removed. Google removes URLs from search results faster when the 410 status code is used. A server configuration sketch covering both cases follows below.
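As a rough sketch, here is how both cases might look in an nginx server block; the paths are hypothetical, and the same can be achieved with Apache’s .htaccess or your CMS:

```nginx
# Hypothetical example paths

# URL with traffic and links: 301 redirect to the most relevant live URL
location = /old-summer-sale/ {
    return 301 /summer-collection/;
}

# URL with no traffic or links: tell Google it is permanently gone
location = /outdated-press-release/ {
    return 410;
}
```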
Andy Chadwick says that once you’ve implemented the redirects, you should keep both the old sitemap and the new one submitted in Google Search Console, and leave them there for 3 to 4 months. That way Google picks up the redirects quickly, and the new URLs show up in the SERPs.
Remove cached URLs with Google Search Console
Search engines keep a cached copy of your pages, and it can take a very long time for that copy to be updated or for URLs to be removed from Google Search. If you don’t want visitors to see the cached copy of a web page, use the Clear cached URL feature in Google Search Console.
How to clear cached URLs using Google Search Console
- Sign in to your Google Search Console account.
- Select the right property.
- Click the Removals button in the menu.
- Click the New request button.
- Switch to the CLEAR CACHED URL tab.
- Choose whether you want Google to clear the cache for just one URL or for all URLs that start with a specific prefix.
- Enter the URL, and hit Next.
Remember that you can also tell Google not to keep cached copies of your pages by using a noarchive meta robots tag.
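That tag is another one-liner in the page’s head; a minimal sketch looks like this, and it can be combined with other robots directives if needed:

```html
<!-- Ask search engines not to store a cached copy of this page -->
<meta name="robots" content="noarchive" />
```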
How to remove staging environment URLs
Staging and acceptance environments exist to test and approve releases. They are not meant to be accessible and indexable by search engines, yet they often are by mistake, and you end up with staging-environment URLs indexed by Google.
This happens more often than you’d think. This section explains how to remove these URLs from Google Search quickly and effectively.
When staging URLs aren’t outranking production URLs
Usually, staging URLs won’t outrank your production URLs. If that’s the case for you, follow the steps below to rectify the situation.
- Log in to your Google Search Console account.
- Select the staging property.
- Click the Removals button in the menu.
- Click the New request button, which takes you to the Temporarily remove URL tab.
- Choose Remove all URLs with this prefix, enter the URL, and hit Next. Google will keep the URLs hidden for 180 days, but they remain in Google’s index, so extra action is needed to remove them.
- Apply the noindex tag, either in the HTML source code or via the X-Robots-Tag HTTP header (see the sketch after this list).
- Create an XML sitemap containing the noindexed URLs so that Google has no trouble discovering them and processing the noindex robots directive.
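As a hedged sketch, a site-wide noindex for a staging environment can be sent with the X-Robots-Tag HTTP header; in nginx (assuming the staging site has its own server block and hostname) that might look like this:

```nginx
# Hypothetical staging server block
server {
    server_name staging.example.com;

    # Send a noindex directive with every response, including non-HTML files
    add_header X-Robots-Tag "noindex" always;

    # ... rest of the staging configuration
}
```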
Once Google has deindexed the staging URLs, you can remove the XML sitemap and add HTTP authentication to protect your staging environment, which will prevent this from happening again. Note that Bing offers its own content removal tool if you need to remove URLs from Microsoft Bing.
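HTTP authentication is usually a couple of lines of server configuration; a minimal nginx sketch, assuming you have created an .htpasswd file, could look like this:

```nginx
# Protect the whole staging site behind a username/password prompt
server {
    server_name staging.example.com;

    auth_basic           "Staging - authorised users only";
    auth_basic_user_file /etc/nginx/.htpasswd;  # created with a tool such as htpasswd

    # ... rest of the staging configuration
}
```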
When staging URLs are outranking production URLs
If staging URLs are outranking your production URLs, you need Google to assign the staging URLs’ signals to the production URLs while making sure that, at the same time, visitors don’t end up on the staging URLs. Start by following the first steps explained in the previous section.
Then implement 301 redirects from the staging URLs to the production URLs. Set up a new staging environment on a different domain from the one that got indexed, and apply HTTP authentication to it so it cannot be indexed again.
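A catch-all redirect from the indexed staging domain to production might look like this in nginx; the domain names are placeholders:

```nginx
# Redirect every staging URL to its production counterpart
server {
    server_name staging.example.com;
    return 301 https://www.example.com$request_uri;
}
```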
What to avoid when dealing with indexed staging URLs
If you want to remove staging environment URLs from Google, never do so by adding Disallow: / to your robots.txt file.
That would only deny Google access to the staging URLs, preventing it from discovering the noindex tag. A robots.txt disallow does not instruct search engines to deindex pages: Google may keep surfacing the staging URLs, just with a very poor snippet.
Tomas Ignatavicius says that when pushing website changes live, talk to your developers to make sure the process is 100% solid. A few SEO-critical pieces can harm the progress of your entire website if they’re managed improperly. These include:
- The robots.txt file.
- Web server config files such as .htaccess, nginx.conf or web.config.
- The files that handle meta tag deployment, which protect your staging environment from indexing and your live website from deindexing.
- JS files involved in content and DOM rendering.
If, during deployment, the robots.txt file of a healthy website is overwritten by the staging version with its disallow directive, or the other way around, the site can drop in Google’s search results, mainly because the indexing floodgates are opened when necessary directives are removed.
How to remove spam URLs
If your website has been hacked and contains spam URLs, get rid of them as fast as you can so they don’t keep hurting your SEO performance and your trustworthiness in the eyes of your visitors. Follow the steps below to reverse the damage.
Step 1: Use Google Search Console’s Removals Tool
Google’s Removals tool helps you remove spammy web pages from Google Search quickly. Remember that this tool does not deindex the web pages; it only hides them temporarily.
How to remove URLs from Google using GSC’s Removals tool
- Sign in to your Google Search Console account.
- Select the right property.
- Click the Removals button in the menu.
- Click the New request button, and you’ll find yourself on the Temporarily remove URL tab.
At this stage, choose Remove this URL only, enter the URL you want to remove, and hit Next. Google will now hide the URL for 180 days, but remember that it remains in Google’s index, so you need to take additional action to remove it for good.
Repeat the process as often as you need to. If you’re dealing with a large number of spam pages, it’s best to hide the ones that surface most often in Google.
Use the Remove all URLs with this prefix option with caution, because it hides every URL that matches the prefix entered in the URL field. In addition, remove Google’s cached copies of the spam URLs by following the steps described in the Remove cached URLs section.
Step 2: Remove the spam URLs and serve a 410
Restoring a backup lets you return your website to its previous state. Run updates and add extra security measures to close the vulnerability across your entire site. After that, check that all the spam URLs are gone from your website, and serve a 410 HTTP status code whenever they are requested to make it clear that these URLs are gone and will never come back.
Step 3: Create an additional XML sitemap
Include the spam URLs in a separate XML sitemap and submit it to Google Search Console. That way Google can quickly work through the spam URLs, and you can easily monitor the removal process in Google Search Console.
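Such a sitemap is just a plain XML file listing the removed URLs; a minimal sketch with a made-up filename and URLs might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- removed-spam-urls.xml: hypothetical spam URLs that now return a 410 -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/cheap-watches-123/</loc>
  </url>
  <url>
    <loc>https://www.example.com/payday-loans-456/</loc>
  </url>
</urlset>
```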
How to remove URLs from Google with sensitive content
If you collect sensitive data on your website, for instance your customers’ personal data or résumés from job applicants, it is of paramount importance to keep it safe. Under no circumstances should Google, or any other search engine, index this data.
Mistakes happen, though, and sometimes sensitive content ends up in Google’s search results. Don’t worry too much: here’s how to remove this content from Google quickly.
Step 1: Use Google’s URL Removal Tool
The fastest way to make Google stop showing URLs with sensitive content in its SERPs is to hide them with Google’s URL Removal Tool. That said, remember that this tool merely hides the submitted pages for 180 days; it does not remove them from Google’s index.
How to hide URLs using the GSC Removals tool
- Sign in to your Google Search Console account.
- Select the right property.
- Click the Removals button in the menu.
- Click the New request button, which will direct you to the Temporarily remove URL tab.
Choose Remove this URL only, enter the URL you want removed, and then click the Next button. Google will now keep the URL hidden, but keep in mind that it remains in Google’s index, so you need to take extra action to remove it, as outlined in the steps below.
Repeat as often as you need to. If the sensitive content is located in a specific directory, use the Remove all URLs with this prefix option, because it lets you hide all URLs within that directory in one go. If you’re up against a large number of sensitive-content URLs that don’t share a URL prefix, focus on hiding the ones that appear most often in Google. In addition, remove Google’s cached copies of the sensitive content by following the steps described in the Remove cached URLs section.
Step 2: Remove the content and serve a 410
If the sensitive content is no longer needed on your website, go ahead and delete the URLs and return the 410 HTTP status code, which tells Google that the URLs have been permanently removed.
Step 3: Use an additional XML sitemap
If you want to control and monitor the process of removing URLs with sensitive content, add them to a separate XML sitemap and submit it in Google Search Console.
Step 4: Prevent sensitive data from leaking
If you don’t want sensitive content to be indexed and leak into Google’s search results again, take the appropriate security measures to make sure it can’t happen.
How to remove content that’s not on your site
If you discover that other websites are making use of your content, there are several ways to remove it from Google’s search results.
Reach out to the website owner
The first step is to contact the people running the website. In most cases your content was copied by mistake, and they’ll take swift action. You can ask them to point a cross-domain canonical to your content along with a link, to 301 redirect the page to your URL, or to remove it altogether.
What if the website’s owners aren’t responding or refuse to take any action?
If the website owners aren’t cooperative, there are other ways to ask Google to remove the content.
For the removal of personal information, Google provides a form you can use to submit additional removal requests and have the data deleted permanently.
For legal violations, you can file a request asking Google to evaluate the content under applicable law.
If you have discovered content violating your copyright, you can submit a DMCA takedown request.
Note that if the pages have already been deleted on the other site but Google hasn’t caught up yet, you can speed up the process with the Remove Outdated Content tool.
You can also use it when content has already been updated but Google is still showing the old snippet and cache; it forces Google to correct it.
How to remove images from Google Search
Although using the robots.txt file is not recommended for quickly removing indexed pages from Google Search, Google does recommend using it to remove indexed images. Yes, this might sound confusing, but it is the right way.
Google’s documentation lacks some clarity here: the Removals Tool documentation contains a section about both HTML and non-HTML files which suggests that robots.txt should not necessarily be the blocking mechanism.
At the same time, Google’s “Prevent images on your page from appearing in search results” article recommends using robots.txt rules to keep images out of search results.
So, how do you go about removing these images?
A typical case is a folder of images that got indexed accidentally. Here is how you remove them:
- Carefully follow steps 1-6 in the section above to quickly hide the URLs in Google Search.
- After that’s done, add the following lines to your robots.txt:
- User-agent: Googlebot-Image
- Disallow: /images/secret/
- The next time Googlebot downloads your robots.txt, it will see the disallow directive for the images and remove them from Google’s index.
How do I cancel a URL removal request?
If your URL still shows as a pending removal request, you can cancel it quickly by simply clicking Cancel. Keep in mind that a removal request only hides the URL temporarily in any case.
FAQs
What is the impact of having outdated content in Google search results?
Outdated content can negatively impact your SEO performance as it may lead to duplicate content issues, causing confusion for search engines like Google. This confusion can prevent your most relevant pages from ranking well. To maintain a healthy search index, it’s advisable to either update the content, use a noindex tag, or remove URLs from Google’s search results if they no longer serve a purpose.
How can I remove URLs from Google search results if they contain sensitive content?
If sensitive content has been indexed, the quickest way to remove URLs from Google search is by using Google’s URL Removal Tool. Additionally, applying a noindex tag to those pages will prevent them from being indexed again. For an added layer of security, consider using the X-Robots-Tag HTTP header or the noindex meta tag to ensure the content doesn’t appear in cached results.
What is a noindex tag, and how does it affect my website’s visibility in search engines?
A noindex tag is a directive used to tell search engines not to index a particular page, meaning it won’t appear in search results. This tag is essential when you want to keep certain content accessible to visitors but prevent it from affecting your SEO, particularly in cases of duplicate content or pages that don’t contribute to your overall SEO strategy.
How do I prevent staging environment URLs from being indexed by Google?
To prevent staging environment URLs from being indexed, apply a noindex tag to those indexed pages or use HTTP authentication to block access entirely. If these URLs have already been indexed, you can use Google Search Console’s removals tool to temporarily hide them and add a noindex tag or X-Robots-Tag to ensure they’re not indexed in the future.
What should I do if my website is hacked and contains spam URLs that appear in Google search results?
If your site has been hacked and spam URLs are showing up in Google search results, you should act quickly to remove URLs. First, use Google Search Console’s Removals Tool to hide the URLs temporarily. Then, remove the spam content and return a 410 status code for the affected URLs. Submitting an XML sitemap with these URLs will help Google process the removal more efficiently.
What steps should I take if I want to remove a URL from Google’s index but keep the content accessible on my site?
If you want to remove a URL from Google’s index without deleting the content from your site, you can add a noindex meta tag to the page. This tells web crawlers not to include the page in search results while keeping it accessible via a direct link. It’s important to submit a removal request in Google Search Console for faster processing. Additionally, ensure that the page is not blocked by robots.txt since web crawlers need to access it to see the noindex directive.
How can I ensure my non-HTML files are not indexed by search engines?
To prevent non-HTML files like PDFs or images from being indexed by search engines, you can use the X-Robots-Tag in the HTTP header. This method allows you to apply a noindex directive to files that don’t contain HTML, ensuring they don’t appear in Google’s index or in search results. Additionally, you can use the robots.txt file to block access to these files by specifying the user agent and disallowing the full URL path.
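As a hedged example, applying that header to PDFs in nginx could look like the snippet below; the file pattern is an assumption, so adapt it to your own setup:

```nginx
# Send a noindex directive with every PDF response
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex" always;
}
```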