The .htaccess file can be used to control how crawlers and search engine bots access a site. Adding an X-Robots-Tag header with noindex, nofollow tells search engines not to index the site or follow its links.
Alternatively, mod_rewrite rules can keep bots out entirely by answering their requests with a 503 error.
If selective access is preferred, the same rules can be adjusted to let specific bots, such as Googlebot, through while blocking all others.
Using noindex, nofollow
TIP: This method does not stop bots from requesting pages; it tells search engines that honor the X-Robots-Tag header not to index the site or follow its links.
- Find the document root for the desired domain
- Right-click on the .htaccess file and select Edit
- Add the following code to the top of the file
<IfModule mod_headers.c>
Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
- Click Save Changes at the top-right of the screen
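To see what a crawler receives once the directive is in place, here is a minimal, self-contained Python sketch that serves the same X-Robots-Tag header from a local test server (on a live site you would simply run `curl -I yourdomain.com` and look for the header):

```python
import http.server
import threading
import urllib.request

# Minimal local server that sets the same header the .htaccess snippet adds.
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.end_headers()
        self.wfile.write(b"hello")

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    tag = resp.headers.get("X-Robots-Tag")

print(tag)  # → noindex, nofollow
server.shutdown()
```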
Using mod_rewrite
TIP: This method answers bot requests with a 503 Service Unavailable error, but still allows bots to fetch the site's robots.txt file.
- Find the document root for the desired domain
- Right-click on the .htaccess file and select Edit
- Add the following code to the top of the file
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(bot|crawl|spider).*$ [NC]
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule .* - [R=503,L]
- Click Save Changes at the top-right of the screen
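Outside Apache, the matching logic of the two RewriteCond lines can be sanity-checked with a short Python sketch (the user agents below are illustrative, not an official list):

```python
import re

# Mirrors: RewriteCond %{HTTP_USER_AGENT} ^.*(bot|crawl|spider).*$ [NC]
BOT_PATTERN = re.compile(r"(bot|crawl|spider)", re.IGNORECASE)

def gets_503(user_agent: str, request_uri: str) -> bool:
    """Return True if the .htaccess rules above would answer with a 503."""
    if request_uri == "/robots.txt":  # second RewriteCond exempts robots.txt
        return False
    return bool(BOT_PATTERN.search(user_agent))

print(gets_503("Mozilla/5.0 (compatible; Googlebot/2.1)", "/"))      # → True
print(gets_503("SomeCrawler/1.0", "/robots.txt"))                    # → False
print(gets_503("Mozilla/5.0 (Windows NT 10.0) Firefox/124.0", "/"))  # → False
```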
Allow Bot to Bypass Block
TIP: This method allows certain bots, such as Googlebot, to crawl the site while blocking all other crawlers and bots.
- Find the document root for the desired domain
- Right-click on the .htaccess file and select Edit
- Add the following code to the top of the file
RewriteCond %{HTTP_USER_AGENT} !Bot [NC]
REPLACE: Bot with the user agent of the bot that should be allowed through; the ! negation exempts that bot from the block.
EXAMPLE: The following code allows Googlebot to visit the site, but blocks any other visitor whose user agent contains the words bot, crawl, or spider.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(bot|crawl|spider).*$ [NC]
# Allow Googlebot through:
RewriteCond %{HTTP_USER_AGENT} !Google [NC]
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule .* - [F]
- Click Save Changes at the top-right of the screen
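The allow-list logic above can likewise be sketched in Python: a request is refused (403, since the rule uses the [F] flag) only if the user agent matches the bot pattern, does not contain Google, and the request is not for robots.txt. The user agents below are illustrative:

```python
import re

# Mirrors: RewriteCond %{HTTP_USER_AGENT} ^.*(bot|crawl|spider).*$ [NC]
BOT_PATTERN = re.compile(r"(bot|crawl|spider)", re.IGNORECASE)
# Mirrors: RewriteCond %{HTTP_USER_AGENT} !Google [NC]
ALLOWED = re.compile(r"Google", re.IGNORECASE)

def gets_403(user_agent: str, request_uri: str) -> bool:
    """Return True if the .htaccess rules above would answer with 403 Forbidden."""
    if request_uri == "/robots.txt":       # robots.txt stays reachable
        return False
    if ALLOWED.search(user_agent):         # negated condition lets Googlebot through
        return False
    return bool(BOT_PATTERN.search(user_agent))

print(gets_403("Mozilla/5.0 (compatible; Googlebot/2.1)", "/"))  # → False
print(gets_403("BingBot/2.0", "/"))                              # → True
print(gets_403("Mozilla/5.0 Firefox/124.0", "/"))                # → False
```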