How Ni1 Respects robots.txt
At Ni1, respect for website owners is a core principle. Our crawler follows established web standards and honors the instructions provided through the robots.txt protocol. We believe website owners should have clear control over how their content is accessed and indexed.
What Is robots.txt?
A robots.txt file is a simple text file placed in the root directory of a website. It tells web crawlers which areas of a site they are allowed or not allowed to access.
For example:
User-agent: *
Disallow: /private/
This rule tells all crawlers not to access the /private/ directory.
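To see how a crawler might evaluate this rule, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name "ExampleBot" and the helper is_allowed are made up for illustration; this is not Ni1's own code.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the example above.
RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical crawler; the wildcard group applies to it.
    print(is_allowed(RULES, "ExampleBot", "https://example.com/private/report.html"))  # False
    print(is_allowed(RULES, "ExampleBot", "https://example.com/blog/"))  # True
```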
A robots.txt file is typically located at:
https://example.com/robots.txt
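Because the file always lives at the root of a host, its address can be derived from any page URL on that host. The sketch below shows one way to do that in Python; the function name robots_url is an assumption for illustration. Note that each host (for example, a subdomain) has its own robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop any query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://example.com/blog/post?id=1#top"))
    # https://example.com/robots.txt
    print(robots_url("https://shop.example.com/cart"))
    # https://shop.example.com/robots.txt
```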
How Ni1 Uses robots.txt
Before crawling a website, Ni1 checks the site’s robots.txt file and follows the rules specified for our crawler.
Our crawler:
- Reads robots.txt before crawling.
- Respects crawl restrictions.
- Avoids blocked directories and files.
- Operates responsibly to minimize server load.
- Focuses on publicly accessible content only.
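The behavior listed above can be pictured as a "check first, then fetch" loop. The following is a simplified sketch, not Ni1's actual implementation: the function polite_crawl, the injected fetch callback, and the delay_seconds pause are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable
from urllib.robotparser import RobotFileParser


def polite_crawl(
    robots_txt: str,
    agent: str,
    urls: Iterable[str],
    fetch: Callable[[str], None],
    delay_seconds: float = 1.0,  # hypothetical pause to limit server load
) -> list[str]:
    """Fetch only the URLs that robots.txt allows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # read the rules before crawling

    fetched: list[str] = []
    for url in urls:
        if not parser.can_fetch(agent, url):
            continue  # skip blocked directories and files
        if fetched:
            time.sleep(delay_seconds)  # space out consecutive requests
        fetch(url)
        fetched.append(url)
    return fetched


if __name__ == "__main__":
    rules = "User-agent: Ni1Bot\nDisallow: /admin/\n"
    pages = [
        "https://example.com/",
        "https://example.com/admin/users",
        "https://example.com/about",
    ]
    # `print` stands in for a real HTTP fetch in this demonstration.
    polite_crawl(rules, "Ni1Bot", pages, fetch=print, delay_seconds=0)
```

Passing the fetch function in as a parameter keeps the sketch free of network calls, so the rule-checking logic can be tried out on its own.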
Allowing Ni1 to Crawl Your Website
If you would like Ni1 to crawl and index your website, you can explicitly allow our crawler in your robots.txt file.
Example:
User-agent: Ni1Bot
Allow: /
This allows Ni1 to access all publicly available pages on your website.
To allow every crawler, not only Ni1Bot, use the wildcard user agent:
User-agent: *
Allow: /
Blocking Ni1 From Specific Areas
You can restrict access to selected directories.
Example:
User-agent: Ni1Bot
Disallow: /admin/
Disallow: /private/
Ni1 will avoid these locations.
Blocking Ni1 Completely
If you do not want Ni1 to crawl your website, use:
User-agent: Ni1Bot
Disallow: /
Ni1 will respect this directive and will not crawl your site.
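Rules placed under User-agent: Ni1Bot apply only to Ni1Bot, so other crawlers are unaffected unless they have rules of their own. The sketch below checks both blocking configurations with Python's standard parser; "OtherBot" and the helper allowed are hypothetical names used only for this example.

```python
from urllib.robotparser import RobotFileParser

PARTIAL_BLOCK = """\
User-agent: Ni1Bot
Disallow: /admin/
Disallow: /private/
"""

FULL_BLOCK = """\
User-agent: Ni1Bot
Disallow: /
"""


def allowed(rules: str, agent: str, url: str) -> bool:
    """Evaluate one URL for one crawler against robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "https://example.com"
    print(allowed(PARTIAL_BLOCK, "Ni1Bot", site + "/admin/login"))  # False
    print(allowed(PARTIAL_BLOCK, "Ni1Bot", site + "/products"))     # True
    print(allowed(FULL_BLOCK, "Ni1Bot", site + "/products"))        # False
    # No group names OtherBot and there is no wildcard group, so it is allowed.
    print(allowed(FULL_BLOCK, "OtherBot", site + "/products"))      # True
```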
Managing Your robots.txt File
Creating and updating a robots.txt file is straightforward:
Step 1: Create the File
Create a plain text file named:
robots.txt
Step 2: Add Your Rules
Specify which crawlers may access which areas of your website.
Example:
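User-agent: *
Disallow: /temp/
Disallow: /backup/

User-agent: Ni1Bot
Allow: /
Under the robots.txt standard, a crawler follows only the most specific group that matches it. In this example the /temp/ and /backup/ restrictions apply to other crawlers, while Ni1Bot follows its own group and may access the whole site.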
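When a file contains both a wildcard group and a group for a named crawler, it helps to confirm which group each crawler ends up using. The sketch below runs the Step 2 example through Python's standard parser, which picks a matching named group before falling back to the wildcard group. "OtherBot" is a hypothetical crawler name, and the result reflects the parser's behavior rather than a statement about Ni1's internals.

```python
from urllib.robotparser import RobotFileParser

# The Step 2 example: a wildcard group plus a dedicated Ni1Bot group.
RULES = """\
User-agent: *
Disallow: /temp/
Disallow: /backup/

User-agent: Ni1Bot
Allow: /
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text once so it can be queried repeatedly."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(RULES)
    url = "https://yourdomain.com/temp/cache.html"
    print(parser.can_fetch("Ni1Bot", url))    # True: the Ni1Bot group applies
    print(parser.can_fetch("OtherBot", url))  # False: falls back to the * group
```

If the intent is to keep Ni1Bot out of /temp/ and /backup/ as well, the Disallow lines would need to be repeated inside the Ni1Bot group.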
Step 3: Upload to Your Website Root
Place the file in your site’s root directory so that it is served at:
https://yourdomain.com/robots.txt
Step 4: Verify Accessibility
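Open the robots.txt URL in a browser and confirm that the file loads and shows the rules you added. If the browser returns an error or a different page, crawlers will not be able to read your rules either.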
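The same check can be scripted. The sketch below is one possible approach, assuming that a healthy robots.txt answers with HTTP status 200 and a text/plain content type; the function names looks_servable and check_robots are made up for this example.

```python
import urllib.request


def looks_servable(status: int, content_type: str) -> bool:
    """Heuristic: a readable robots.txt returns 200 with a text/plain body."""
    return status == 200 and content_type.lower().startswith("text/plain")


def check_robots(url: str, timeout: float = 10.0) -> bool:
    """Fetch `url` and report whether it looks like a reachable robots.txt."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return looks_servable(response.status, content_type)
    except OSError:
        # Covers HTTP errors (such as 404), DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    # Replace with your own domain before running.
    print(check_robots("https://yourdomain.com/robots.txt"))
```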
Example robots.txt Configurations
Allow Everything
User-agent: *
Allow: /
Block a Private Folder
User-agent: *
Disallow: /private/
Block Multiple Directories
User-agent: *
Disallow: /admin/
Disallow: /internal/
Disallow: /backup/
Allow Ni1 but Block Others
User-agent: Ni1Bot
Allow: /
User-agent: *
Disallow: /
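Before publishing any of these configurations, it can be useful to preview which crawlers can reach which paths. The sketch below builds a small access table with Python's standard parser. It assumes the user-agent token Ni1Bot used in the earlier examples; "OtherBot" and the function access_matrix are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The "Allow Ni1 but Block Others" configuration, using the Ni1Bot token.
ALLOW_NI1_ONLY = """\
User-agent: Ni1Bot
Allow: /

User-agent: *
Disallow: /
"""


def access_matrix(
    robots_txt: str, agents: list[str], paths: list[str]
) -> dict[tuple[str, str], bool]:
    """Map each (agent, path) pair to True if the rules allow the fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, path): parser.can_fetch(agent, path)
        for agent in agents
        for path in paths
    }


if __name__ == "__main__":
    table = access_matrix(ALLOW_NI1_ONLY, ["Ni1Bot", "OtherBot"], ["/", "/blog/"])
    for (agent, path), ok in table.items():
        print(f"{agent:10} {path:8} {'allowed' if ok else 'blocked'}")
```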
Best Practices
- Only block content that should not be crawled.
- Keep robots.txt simple and easy to maintain.
- Review your rules whenever your website structure changes.
- Protect sensitive areas with authentication or other access controls, and use robots.txt only to guide crawlers.
- Remember that robots.txt manages crawler access; it is not a security mechanism for protecting sensitive data.
Our Commitment
Ni1 is built on transparency, privacy, and respect for the open web. We honor robots.txt directives, focus exclusively on publicly accessible content, and give website owners clear control over how their sites interact with our crawler.
If you have questions about Ni1Bot or need assistance managing crawler access, our team is always happy to help.