Who’s your biggest fan? Have you sat down and really thought about who might love your blog the most? A reader base of any size always has ONE viewer who loves your content more than anyone else – someone who knows your content top to bottom, and your site back to front. You’ve probably got a few people in mind, but there’s one thing you may not have considered – what if it’s not a person at all? Your site, and every other site on the Internet, has a small, dedicated fanbase that wants to know ALL about it. They’re called search engine robots (or ‘spiders’), and they regularly check in on your site, looking for updates and changes.
While it sounds a bit stalkery, these robots are ESSENTIAL to your blog’s success. So what are they, and what do they actually do?
Who’s visiting my site?
Search engines rely solely on their spiders to gather data. Without them, they would have ZERO knowledge of the web and couldn’t produce search results at all. Each search engine has its own set of data collectors, often with their own names. The most common is Googlebot which, unsurprisingly, belongs to Google. As the world’s largest search engine, it’s the one most frequently talked about, and its movements are easy to track through Google Search Console. You’ll also get visits from Bingbot (Bing), Slurp Bot (Yahoo), DuckDuckBot (DuckDuckGo) and Baiduspider (Baidu), along with bots from the many other search engines out there.
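If you’re curious which bots are dropping by, most identify themselves in your server’s access logs via their user-agent string. Here’s a minimal Python sketch of how you might tally them up – the log path and the assumption that the user agent appears in each log line are placeholders to adapt to your own hosting setup (and note that some bots impersonate the big names):

```python
# A minimal sketch for spotting crawler visits in a web server access log.
# The log path and log format are assumptions – adjust to your own setup.
from collections import Counter

# User-agent fragments used by the major search engine crawlers
BOT_SIGNATURES = ["Googlebot", "Bingbot", "Slurp", "DuckDuckBot", "Baiduspider"]

def count_bot_visits(log_path: str) -> Counter:
    """Count log lines whose user-agent string mentions a known bot."""
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for bot in BOT_SIGNATURES:
                if bot in line:
                    counts[bot] += 1
    return counts

print(count_bot_visits("/var/log/nginx/access.log"))
```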
What do they want from me?
Fully taking on the spider persona, search robots ‘crawl’ your site to learn as much as they can. They want to know where everything lives on your site, what kind of content you have, which keywords you use, how it all links together, and what’s changed since their last visit. They don’t want to steal your stuff – they simply want to build a picture of your content and how it relates to search terms.
Spiders are crawling your site.
The crawling process is how search engines gather information about your site and how it might be helpful to their users. Essentially, they visit and interpret pages across your site, then collate everything they find into an index – a ranking of pages, with the most helpful listed first, for every possible search term.
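To make the idea concrete, here’s a toy Python sketch of the core crawl loop: fetch a page, pull out its links, and treat each link as a candidate for the next visit. Real spiders are vastly more sophisticated, and example.com is just a stand-in for your own site:

```python
# A toy illustration of crawling: fetch a page, extract its links,
# and queue them for the next visit. Real spiders do far more than this.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_page(url: str) -> list[str]:
    """Fetch one page and return the absolute URLs it links to."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(url, link) for link in collector.links]

# Each discovered link becomes a candidate for the spider's next visit
print(crawl_page("https://example.com/"))
```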
How often is my site crawled?
Search engines visit your site on a semi-regular schedule, and you’ll be able to see how often they visit in your crawl statistics. Once they understand your site and how your audience interacts with it, you might notice your peak crawl times are VERY similar to your peak traffic times – this is to ensure your audience is receiving the most relevant content when they’re most likely to be looking for it.
How can I get crawled MORE?
If search engines notice you’re regularly adding fresh content, they won’t want to miss it. They’ll crawl more frequently to make sure they’re capturing new content as soon as it’s added. If you find you’re not being crawled as often as you’d like, or you only add new content occasionally, you can request a crawl through Google Search Console. Other ways to encourage more frequent crawling include speeding up your site and earning quality backlinks from other sites.
What is a sitemap!?
A sitemap is a list of every page on your site that you’d like to appear in search results. You may be thinking you want EVERY page to appear, but that’s not always the case. There may be pages you want to exclude, such as tag and category pages, which can create duplicate content and other errors. A sitemap gives robots an easy guide to every corner of your site. Without one, they navigate by following links and hoping to stumble across new ones as they move around, which means they may miss pages entirely. WITH a sitemap, they know exactly where you want them to go, and will find everything they need to properly index and understand your site.
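Most blog platforms and SEO plugins will generate a sitemap for you, but it helps to see how simple the format is. Here’s a minimal Python sketch that builds one in the standard sitemaps.org XML format – the URLs and dates are placeholders for your own pages:

```python
# A minimal sketch of generating an XML sitemap (sitemaps.org format).
# The URLs and last-modified dates below are placeholders.
import xml.etree.ElementTree as ET

# Pages you want search engines to index, with their last-modified dates
pages = [
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/about/", "2024-01-10"),
    ("https://example.com/my-first-post/", "2024-01-12"),
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, lastmod in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod

# Write the finished sitemap where crawlers can find it, e.g. /sitemap.xml
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```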
Crawl errors
Errors occur when robots attempt to crawl your site but the requested URL is unavailable. Sometimes this is down to an individual page or section that’s broken or has been removed from your site. Other times it can affect your entire site, for example if your server crashes. The more errors that occur, the less frequently the offending page or site will be crawled. If the errors are never resolved, search engines will eventually give up and stop trying to crawl it at all.
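You can catch many of these problems before the bots do by periodically checking that your pages still respond. Here’s a small Python sketch along those lines – the URL list is a placeholder, and in practice you’d point it at your own pages (or feed it from your sitemap):

```python
# A minimal sketch for catching crawl errors early: request each URL
# and flag anything that doesn't come back with a 200 status.
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

urls = [
    "https://example.com/",
    "https://example.com/old-post/",  # a page you suspect may be broken
]

for url in urls:
    try:
        status = urlopen(url, timeout=10).status
    except HTTPError as err:
        status = err.code  # e.g. 404 for a removed page
    except URLError as err:
        status = f"unreachable ({err.reason})"  # e.g. the whole server is down
    if status != 200:
        print(f"Crawl error likely: {url} -> {status}")
```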
Crawl budgets
While you might hear the term ‘crawl budget’ thrown around, it’s not something to stress over. You should, however, be aware of how it can affect your site. Crawl budgets limit how often robots visit, so as not to overload your server. As always, Google prioritises the user’s ability to access and interact with your site over its own need to gather information. If your site is slow to respond during crawls, they’ll lower the crawl rate; if your site speed improves, the crawl rate will rise with it.
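If you want a rough sense of how responsive your site looks to a crawler, you can time a few requests yourself. This Python sketch averages the fetch time for a page – the URL is a placeholder, and the numbers include network latency from wherever you run it, so treat them as indicative only:

```python
# A rough sketch for checking how quickly a page responds – the kind of
# signal crawlers weigh when deciding how hard to crawl a site.
import time
from urllib.request import urlopen

def response_time(url: str, samples: int = 5) -> float:
    """Average the time taken to fetch a URL over a few requests."""
    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        urlopen(url, timeout=10).read()
        total += time.perf_counter() - start
    return total / samples

# Consistently slow responses can lead search engines to ease off crawling
print(f"{response_time('https://example.com/'):.2f}s average")
```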