After publishing a new post, webmasters want it to show up in searches as soon as possible; at other times, you may instead want to prevent Google from indexing certain pages. Below, we use WordPress as an example to explain how Google finds a web page, how to speed up indexing of your site, and how to prevent crawling.
How Does Google Find a Web Page?
Before starting to optimize your new blog posts, it helps to understand how Google Search works and how Google discovers a web page. In brief, this happens in three steps: crawling, indexing and serving.
Step 1: Crawl the URL of the new web page. Google fetches newly updated pages using Googlebot, a web crawling bot developed by Google, also called Google Crawler. Crawled pages are added to a searchable index used by the Google search engine. However, if a page is blocked from Googlebot, it cannot be indexed by Google.

Step 2: Index crawled pages. After collecting the newly updated pages, Googlebot compiles all the words included in them. The words are sorted into categories and stored in the Google Index. When users type words into the search bar, the pages that match their queries are included in the search results.

Step 3: Serve relevant search results. When a user enters a query in the search bar, Google looks up all matching pages in the index and lists the most relevant ones on the search engine results page. Pages with a high match rate and high-quality content rank near the top of the results.
Tips to Get New Blog Posts Indexed by Google Quickly
Obviously, users have no way to search for a post that has not been crawled and indexed by Google, and such a post stands no chance of attracting traffic. To avoid that situation, we list the following tips, designed to optimize new blog posts or sites so that newly published content gets indexed by Google in the shortest time.
1. Create an XML Sitemap on Your Website
An XML sitemap lists all the web pages of your website in an XML file, which lets Google clearly understand the site's organization and the relationships between pages. The listed pages are therefore more likely to be crawled by Googlebot and other crawlers. This method is especially valuable for a new website or blog.
Creating an XML sitemap on your site allows Google to fetch your newly updated pages efficiently. To simplify matters, we suggest doing the job with a WordPress plugin; Google XML Sitemaps, WP Sitemap Page and PS Auto Sitemap are recommended here. These plugins create an XML sitemap easily and include rich features for customizing it.
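For reference, an XML sitemap follows the sitemaps.org protocol; a minimal sketch looks like this (the domain and URL below are hypothetical placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; example.com is a placeholder domain -->
  <url>
    <loc>https://example.com/buddypress-review/</loc>
    <lastmod>2015-06-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

The plugins above generate and update a file like this automatically, so you rarely need to write it by hand.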
2. Pay Close Attention to Keywords
Once a web page is crawled, all the words it contains are sorted and stored in the Google Index. Words with high search volume are more likely to be searched for by users, so before writing a post, research some high-traffic search terms to include in it.
Look for High-Traffic Search Terms
The best tool for finding words with high search volume is Google AdWords, which surfaces the most popular keywords easily. Log into Google AdWords, select Keyword Planner from Tools, search for your target keyword using a phrase, and pick the phrases with a high average monthly search volume.
Add Keywords to Post Title, Content, Image Alt Tag, etc.
After identifying keywords with high monthly searches, make good use of keyword SEO strategies to increase the probability of the post being found by users. Include the keywords in the post title, page URL and post content. If the post contains images, the image alt text and title should also include keywords so that search engine robots can understand and index the images.
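For example, an image tag whose alt text carries the target keyword (the file name and alt text here are hypothetical):

```html
<img src="/wp-content/uploads/wordpress-seo-tips.png"
     alt="WordPress SEO tips for faster Google indexing">
```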
Use WordPress SEO Plugin
To check the SEO grade of a new web page, you'd better install a WordPress SEO plugin on the website; WordPress SEO by Yoast is recommended here. Upon activation, a WordPress SEO by Yoast section is added to the post editor, where you enter a focus keyword, SEO title and meta description.

The section also indicates how the keyword is used in the article heading, page title, page URL, content and meta description; covering more of these places generally improves the optimization. In addition, the Page Analysis function reports keyword density, readability, use of outbound links, and so on.
3. Write High-Quality Content
Google hates low-quality content, and a page full of it will never rank on the front search results page. Google released Google Panda, a search ranking algorithm, to penalize webmasters running thin sites: it lowers the ranking of sites with too much duplicate content, copied material and grammar mistakes.

To avoid this, learn some essential writing tips to improve content quality. Make sure each post is original and readable, and note that each post should run to more than 400 words.
4. Get Quality Backlinks
Get quality backlinks to make the new blog post reachable from other websites or web pages. Once a backlink points to the new post, visitors can reach it simply by clicking the anchor text, and the post also becomes more likely to be crawled by Googlebot. The simple methods listed below can help you get quality backlinks.
Guest Blogging – Guest blogging means publishing a post on someone else's site and including a backlink to your own site in that guest post. If visitors find the anchor text interesting, they will click it and visit the linked post.

Share the Blog Post Link on Social Networks – Share the new blog post's URL on popular social networks like Google+, Facebook, Twitter, etc., along with a brief introduction to arouse followers' interest. Readers are then more likely to click the link and learn more about the post. Below is an example of sharing a new post on Google+.

Leave Comments on Forums – Search popular forums for questions related to your new blog post, then reply to a question and attach the blog post URL to your answer. This lets questioners find a detailed answer in the linked post.

Use Internal Links – An internal link tells Googlebot to crawl another page within the same domain, which helps it rank well in search results. Embed the new post's URL naturally in other relevant posts and give the link proper anchor text.
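An internal link is just a standard anchor tag pointing at a page on the same domain; the URL and anchor text below are hypothetical:

```html
<!-- Descriptive anchor text helps both readers and search engine robots -->
<a href="https://example.com/buddypress-review/">our BuddyPress review</a>
```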
5. Make Good Use of Ping Services
Ping is a push mechanism by which a weblog sends a signal to ping servers to tell them about new updates on your website. Instead of waiting passively for search engines to index your updates, you can actively ping to get them indexed fast.

The WordPress Ping List is the list of services that WordPress pings about new or changed posts and pages. Whenever you publish a new post or page, or change an existing one, WordPress sends pings to every service in the list. Search engines and web directories are thus notified of your updates or modifications and can index them quickly, helping you drive traffic from search engines.
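Under the hood, a ping is a small XML-RPC call to the service's endpoint using the standard weblogUpdates.ping method. The Python sketch below builds such a request (Ping-O-Matic's endpoint is shown as one real-world example; the blog name and URL are placeholders); actually sending it is left commented out since it needs network access:

```python
import xmlrpc.client

# Ping-O-Matic is one well-known ping service; every service in your
# Update Services list is contacted the same way.
PING_ENDPOINT = "http://rpc.pingomatic.com/"

def build_ping(site_name, site_url):
    """Build the XML-RPC request body for weblogUpdates.ping."""
    return xmlrpc.client.dumps((site_name, site_url),
                               methodname="weblogUpdates.ping")

body = build_ping("My Blog", "https://example.com/")
print(body)

# To actually send the ping (requires network access):
# server = xmlrpc.client.ServerProxy(PING_ENDPOINT)
# result = server.weblogUpdates.ping("My Blog", "https://example.com/")
```

WordPress performs exactly this kind of call for you in the background whenever you publish or update a post.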
WordPress Ping List
Ping services not only get your new posts indexed faster, but also increase your site's visibility. In the following, we collect a wide set of ping services for your WordPress Ping List, which can help improve site performance as well as boost SEO.
How to Add and Update WordPress Ping List
With so many helpful ping services available, you should add them to your WordPress site to get content indexed fast. Follow the guidelines below to add or update your WordPress Ping List step by step.

Log into your WordPress dashboard, go to Settings -> Writing, and scroll down to the bottom, where you will see the Update Services box as in the following screenshot.

Copy and paste the above WordPress Ping List into the text area, and then click the Save Changes button.

That completes the process. When you publish a new post, WordPress automatically notifies the ping services listed in the Update Services box so that your content is indexed quickly and can draw plenty of traffic from search engines.
6. Submit WordPress RSS Feed to Directories
RSS, the abbreviation of Really Simple Syndication, uses a family of standard web feed formats to publish frequently updated content such as blog entries, news headlines, audio, video, and more. An RSS feed lets publishers syndicate content automatically and benefits users who want the latest content from their favorite websites.

For a WordPress website, submitting the RSS feed to top-ranked RSS feed directories and search engines is one of the best ways to boost site traffic, and a great way to get Google to index your website faster.
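By default, a WordPress site already exposes the feeds you would submit; for a site at example.com (placeholder domain) they live at the standard locations:

```
https://example.com/feed/            main posts feed (RSS2)
https://example.com/comments/feed/   sitewide comments feed
```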
Prevent Search Engines from Indexing a Specific Page or Post
Most webmasters want their sites crawled as frequently as possible, since crawling is a great opportunity to boost traffic and reach the front search results page. By default, all web content is indexed by search engines like Google, Yahoo, Bing, etc. However, not every page or post needs search visibility – think of a friendly error page, a confirmation page, or a thank-you page.

Unfortunately, WordPress only lets you enable or disable indexing for the whole website, via Dashboard > Settings > Reading; it does not let you limit search engine indexing for individual pages or posts. To fill that gap, this guide shows how to prevent search engines from indexing a page or post by editing the instructions in robots.txt.
Stop Search Engines by Using robots.txt
Find the robots.txt file on your server; if you don't have one yet, create a new file and name it robots.txt. Since our website integrates with cPanel, we locate the file via cPanel > Files > File Manager and open it in an editor.
For instance, if we plan to disallow a post called “BuddyPress Review” from being crawled, we add the following code to robots.txt, using the post's URL slug “buddypress-review”. To disallow other posts or pages, just copy the pattern with the relevant URL slugs.

User-agent: *
Disallow: /buddypress-review/
Besides, you can also prevent search engines from indexing a whole directory. Here we take an images directory as an example, using the following code. Note that the “*” means the instruction applies to all search engines.

User-agent: *
Disallow: /images-directory/
However, if you only want to stop certain search engines, such as Google, from indexing the content, replace the “*” with “googlebot” as in the following code. Keep in mind that bingbot refers to Bing, teoma to Ask, googlebot-image to Google Images and googlebot-news to Google News.

User-agent: googlebot
Disallow: /images-directory/
The following robots.txt blocks the whole website from all search engines.

User-agent: *
Disallow: /
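Before deploying a robots.txt change, you can sanity-check the rules locally; here is a quick sketch using Python's standard urllib.robotparser (the domain and URLs are hypothetical):

```python
from urllib import robotparser

# The same rules as the first example above
rules = """\
User-agent: *
Disallow: /buddypress-review/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The disallowed slug is blocked for every user agent...
print(rp.can_fetch("*", "https://example.com/buddypress-review/"))  # False
# ...while other pages remain crawlable.
print(rp.can_fetch("*", "https://example.com/another-post/"))       # True
```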
Use “noindex” Page Meta Tags
The “noindex” meta tag is an easy method suitable for everyone, even those without any coding knowledge. Add the following meta tag to any page or post to keep it out of search engine results. Note that using “robots” as the tag name means the rule applies to all search engines.
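The tag itself goes inside the page's head section; this is the standard robots meta tag:

```html
<head>
  <meta name="robots" content="noindex">
</head>
```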
If you want to block all search engines except certain ones, use meta tags like the ones below. This example allows only Google to index the page/post.
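One way to express this relies on crawler-specific meta names: Google documents that a `googlebot` meta tag takes precedence over the generic `robots` tag for its own crawler, so the generic block can be overridden for Google alone:

```html
<!-- Block every engine by default, then re-allow Googlebot specifically -->
<meta name="robots" content="noindex">
<meta name="googlebot" content="index">
```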
Besides, you can exempt more than one search engine from blocking with meta tags like the ones below. This example lets both Google and Bing index the page/post.
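Following the same pattern, each crawler you want to exempt gets its own meta tag (Bing's crawler recognizes the `bingbot` name):

```html
<!-- Block all engines, then re-allow Googlebot and Bingbot -->
<meta name="robots" content="noindex">
<meta name="googlebot" content="index">
<meta name="bingbot" content="index">
```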
Block Search Engines with WordPress Plugin
A large number of WordPress plugins can block search engines, such as WordPress SEO by Yoast, WordPress Meta Robots and PC Hide Pages, among which WordPress SEO by Yoast is the most popular, with a bundle of advanced features.

If you are new to this field and know little about robots.txt and meta tags, a robots meta plugin is a great option for you.
Plugin URL: https://wordpress.org/plugins/wordpress-seo/
Fix Googlebot Cannot Access CSS and JS Files Warning
Google has sent a message through Google Search Console (previously Google Webmaster Tools) to thousands of webmasters, warning that Googlebot cannot access the CSS and JS files on their sites. You may be wondering why Google needs to access those files. In brief: when crawling and indexing your site, Google reads not only the content but also the visual layout. If Google can fully understand your site's layout and structure, you stand a better chance of higher rankings in search results, so it all comes down to SEO performance.

WordPress itself is not configured to block Googlebot from CSS and JS files, but you may have blocked them accidentally while trying to improve site speed or security. WordPress security plugins can also cause the problem and trigger the warning email.

The warning email includes some instructions for fixing the blocking issue, but the information can be a little hard to understand, so below is a simple but detailed tutorial.
Identify the Blocked Resources
First, find the files that are blocked from Googlebot so that you can take the appropriate actions later. You can do this in two ways, both requiring access to your site's Google Search Console account.
Check the blocked resources
In the Search Console dashboard, expand the Google Index menu and click Blocked Resources. If any resources are blocked, you will see a list of them along with how many pages each one affects.

Click the URLs under the “Host” column to see the locations of the files that Googlebot is not allowed to access. Check through the results manually; if some of the JS or CSS files were added by themes or plugins, you will have to edit the robots.txt file to make some modifications.
Use the Fetch as Google feature
You may not have seen any blocked resources in the section discussed above. In that case, use the fetching feature in Google Search Console to identify those resources and see how the blocking affects your website's layout.

To do this, go to Fetch as Google under the Crawl menu, then fetch and render the homepage. Remember to do this for both Desktop and Mobile.

Once the fetch succeeds, the result appears in a row. Click it to see a comparison of how your website displays for visitors and for Googlebot.

If the two renderings differ, some CSS/JS files have been blocked from Googlebot. The blocked resources are listed at the bottom of the page, and clicking the robots.txt Tester link after each URL shows the lines of your robots.txt that block Googlebot. Note those lines down.
Modify the robots.txt File
Now that you have located the blocked resources, you can start correcting the robots.txt file to grant Googlebot the appropriate access. Generally speaking, there are three easy ways to edit the file.
- You can connect to your site with an FTP client. Once connected, scroll down and you will find the robots.txt file in the root directory.
- You can access the file by using a web-based file manager. The popular cPanel includes an easy-to-use one, which is also our choice.
- If you have installed WordPress SEO by Yoast plugin on your site, then you are able to edit the file directly in the WordPress dashboard by going to SEO > Tools > File editor.
When you open the file, you may find lines that disallow access to some directories of your site. Below is an example.
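A typical restrictive robots.txt of this kind looks like the following; the exact directories vary from site to site:

```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
```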
Remove the restrictions
If you took notes while checking the blocked resources, you can now remove the corresponding lines to give Googlebot access to your CSS and JS files.

Typically, many of these files sit in the themes and plugins folders, so those are the lines you will most often need to remove. Denied access to the wp-includes folder can also be the source of the problem, because some themes and plugins call scripts in that folder.
Another easy way out
If you are unsure which lines to remove, or don't want to lift the existing restrictions, you can simply add the following lines to the robots.txt file to allow Googlebot to access all CSS and JS files. These lines override the “Disallow” rules for CSS and JS without affecting the rules for any other file.
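A commonly used form of these override lines is shown below; the Allow directive and wildcard patterns are extensions that Googlebot supports, with `$` anchoring the match to the end of the URL so the rules cover any .js or .css file:

```
User-agent: Googlebot
Allow: /*.js$
Allow: /*.css$
```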
Don’t have a robots.txt file?
A less likely case is that your robots.txt is empty or does not exist, which means all files should be crawled and indexed automatically, and yet you still receive a warning email.

If so, you may have to contact your hosting provider and ask whether they apply restrictions to some folders by default. You can either ask them to unblock the folders, or create a robots.txt file yourself and add a line like “Allow: /wp-includes/js/” to override the default configuration.
Confirm the Modification
After editing and saving the robots.txt file, make sure the changes work by performing “Fetch as Google” once more. If the two renderings look exactly the same and no blocked resources are listed, the problem is resolved.