Category: AI Search Optimization

  • Robots.txt and noindex robots meta tag might not be the best way to block AI Search Crawlers

    I recently conducted an experiment on my test website. I applied the noindex meta robots tag to every page of the site. Then, I prompted several AI search engines to extract specific information from the website and observed the responses.
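    For reference, the tag I applied site-wide was the standard robots meta tag, placed in the <head> of each page:

    <meta name="robots" content="noindex">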

    For this scenario, let’s consider my test website to be domain.in.

    The prompt I used was: “What services does https://www.domain.in/ offer as a business? Please check the website directly before answering.”

    Key Findings

    • ChatGPT: Accessed the site’s content, as indicated by the ChatGPT-User/1.0 agent in my server logs (+https://openai.com/bot), and accurately quoted the requested information.
    • Perplexity: Did not retrieve any content from the website, suggesting it honors the noindex meta tag. Its response explicitly stated that direct website access is unavailable and that no content from the test site appears in search results.
    • Claude AI: Successfully obtained the required answers. The server logs showed the user agent Claude-User/1.0; +Claude-User@anthropic.com.
    • Google AI Mode: Generated fabricated information unrelated to the actual site content. This indicates it primarily relies on Google’s search index during its query process.
    • Deepseek AI: Could not access any content from the test website. Its response specified that it doesn’t browse the web directly and depends entirely on search results.
    • Qwen AI: Managed to retrieve content from the website, but my server logs showed it used a HeadlessChrome browser rather than a self-identifying bot user agent.

    This shows me that adding a noindex meta robots tag is not a reliable way to block web pages from AI search engines.

    After this, I ran a second experiment on another test website, adding the following rules to its robots.txt to disallow all bots from accessing the site:

    User-agent: *
    Disallow: /

    With this rule in place, I used the prompt below to ask about this new test website.

    Prompt: “what services does https://www.domain.com/ as a business provide. Please check the website now and then only let me know. as they have updated their website What makes them different and why should i choose them.”

    ChatGPT (specifically the ChatGPT-User agent, as it appeared in the server logs) was able to access my website and quoted the information I asked for in the prompt.

    I also tested the same prompt in other AI answer engines, but none of the rest were able to fetch any content from the website. See the findings below for more detail.

    Claude AI: It specifically said that it was unable to access the website directly because it appeared to be blocked by robots.txt rules. This shows that it abides by robots.txt.

    Perplexity AI: It wasn’t able to get any information from the website, which implies it respects the robots.txt rule.

    Deepseek: It wasn’t able to get any information either. It gave the following response with some related information:

    Based on the search results provided, I do not have specific information about the services offered by https://www.domain.in/, as this particular website was not included in the search results. However, I can provide a general overview of what a typical AI Search Optimization Agency might offer based on the industry trends and common services described in the search results.

    This shows that Deepseek relies on search results from certain search engines rather than fetching pages directly.

    Google AI Mode: Wasn’t able to provide any information from the website.

    Qwen AI: Wasn’t able to provide any information from the website.

    This tells me that even with a robots.txt Disallow rule in place, it is up to each AI search engine bot whether to respect it. So if you are serious about blocking these crawlers, use a service like Cloudflare, which actively blocks these bots by identifying their IP addresses, user agents, and other criteria.
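If you want to experiment yourself before reaching for a managed service, user-agent matching is the simplest first layer. Here is a minimal Python sketch; the token list is illustrative, drawn from the agents observed above, and note that a UA check alone won't catch bots masquerading as a normal browser, like the HeadlessChrome access seen with Qwen:

```python
# Minimal sketch: flag requests from known AI crawlers by User-Agent token.
# Token list is illustrative, based on the agents observed/discussed above.
AI_BOT_TOKENS = (
    "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User",
    "PerplexityBot", "Google-Extended",
)

def is_ai_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known AI crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_BOT_TOKENS)

# Your web server or middleware would return HTTP 403 when this is True.
print(is_ai_bot("Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"))  # True
print(is_ai_bot("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))                           # False
```

Remember this only catches bots that identify themselves; IP- and behaviour-based blocking, which is what services like Cloudflare do, is needed for the rest.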

  • How to access website server logs in Hostinger hPanel

    Did you know that you can access your website’s server logs (access logs) in Hostinger hPanel, even on Starter plans?

    If not, read on; by the end you will be able to access the server logs and find any 4XX and 5XX errors in them.

    Follow the steps below.

    1. Log in to your Hostinger account.
    2. Select your hosting package and click Manage.
    3. Select your website from the dropdown at the top of the left-hand vertical menu.
    4. Click Analytics in the second vertical menu from the left, as shown below.
    Fig: Hostinger hPanel left-hand menu
    5. Click the Access logs tab as shown.
    Fig: Access logs tab inside the rectangular box

    This is where you will see all your logs: the time your website was accessed, the IP address, country, device/user agent, and response time.

    You can also see which bots accessed your website. For AI search engine optimization, look specifically at user agents such as GPTBot, ChatGPT-User, Google-Extended, ClaudeBot, PerplexityBot, and others. You can see which pages these bots request and which they don’t, which gives you an idea of the pages being served in AI search results on those platforms. Additionally, check the Response Time column to see whether response times are healthy or whether certain pages take a long time to respond to these bot requests.
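To put that into practice, here is a short Python sketch that tallies which pages each AI bot requested. The sample rows are hypothetical and follow the common/combined log format, which carries the same fields (IP, time, request, status, user agent) the Access logs tab shows:

```python
import re
from collections import Counter

# Hypothetical sample rows in combined log format.
LOG_LINES = [
    '20.15.240.1 - - [01/Jan/2025:10:00:00 +0000] "GET /services HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '3.12.0.7 - - [01/Jan/2025:10:01:00 +0000] "GET /about HTTP/1.1" 200 4100 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '8.8.8.8 - - [01/Jan/2025:10:02:00 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
]

AI_BOTS = ("GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot", "PerplexityBot")

def bot_hits(lines):
    """Count (bot, path) pairs so you can see which pages each AI bot requests."""
    counts = Counter()
    path_re = re.compile(r'"(?:GET|POST|HEAD) (\S+)')
    for line in lines:
        m = path_re.search(line)
        path = m.group(1) if m else "?"
        for bot in AI_BOTS:
            if bot in line:
                counts[(bot, path)] += 1
    return counts

print(bot_hits(LOG_LINES))
```

Running this over a week of copied rows quickly shows which pages the AI crawlers favour and which they ignore.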

    Note: You can’t export the server logs from here, and only the last 7 days of log data are available.

    Finding Error Logs

    1. Go through the Error Code 5xx tab to find all the pages returning internal server errors. This will help you find and resolve the issues these AI bots face when accessing your web pages.
    2. Finally, go through the Error Code 4xx tab to find any broken pages. You can then either fix these pages or redirect the broken links to closely related web pages.
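Since the status code in common log format sits right after the quoted request line, a few lines of Python can pull out every 4xx/5xx page from copied log rows. The sample rows below are hypothetical:

```python
import re

# Hypothetical sample rows in common log format; the status code follows
# the quoted request line.
LOG_LINES = [
    '1.2.3.4 - - [01/Jan/2025:10:00:00 +0000] "GET /old-page HTTP/1.1" 404 200 "-" "ClaudeBot/1.0"',
    '1.2.3.5 - - [01/Jan/2025:10:01:00 +0000] "GET /services HTTP/1.1" 200 5123 "-" "GPTBot/1.0"',
    '1.2.3.6 - - [01/Jan/2025:10:02:00 +0000] "GET /contact HTTP/1.1" 500 0 "-" "PerplexityBot/1.0"',
]

STATUS_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3})')

def error_pages(lines):
    """Return (path, status) pairs for every 4xx/5xx response."""
    errors = []
    for line in lines:
        m = STATUS_RE.search(line)
        if m and m.group(2)[0] in "45":
            errors.append((m.group(1), int(m.group(2))))
    return errors

print(error_pages(LOG_LINES))  # [('/old-page', 404), ('/contact', 500)]
```

The resulting list gives you the exact paths to fix (5xx) or redirect (4xx).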

    Once you have access to the server logs, you can run various analyses on the data to find and resolve issues. If you are interested in how to do this analysis, go through the Log Files Analysis guide by Matt Diggity.