jGromit

Posts: 6,792
Registered: 31-Jan-2006
The opposite of SEO
Posted: 10-Nov-2019 00:32
Some users are eager to have their albums listed by search engines like Google and Bing, and go to great lengths to figure out how to get higher rankings for their sites. This is referred to as Search Engine Optimization, or SEO.

But consider the opposite case. You have albums that are intended only for family and friends, and you don't want the search engines to crawl all over them. How do you get them to ignore your site?

The only way to protect a directory from all unauthorized viewing is to put a password on it, which is probably annoying for your site visitors. But you can at least discourage search engines from indexing parts of your site that you don't want to appear in search results.

The first step is to turn off "indexing" on your hosting account. Say you have this structure on your account:
public_html
public_html/albums
public_html/albums/pets
public_html/albums/pets/fluffy (there's an album here)
public_html/albums/pets/sparky (there's an album here)
public_html/blog (some other stuff, like a Wordpress site)
If you point a browser to http://example.com/albums, the server will look for an index.html file there - in other words, a web page that can be shown in the browser. What happens if there isn't one? If "indexing" is turned on, the server will send back a listing of the folders in that directory, and you'll see an entry for pets. If you point the browser to http://example.com/albums/pets, and there's no index.html file, you'll see a listing for fluffy and sparky. So, a search engine will be able to burrow down into your account, even if there are no pages with links to those lower levels.

If you turn off indexing, however, the search engine can't browse its way down through your directories. A browser that's pointing to http://example.com/albums or http://example.com/albums/pets will get a 403 error, telling the visitor that there's no web page there and that it doesn't have permission to view the raw listing of directories. This is handy if you want people to view your pages only if you give them the specific URL that points to them, like http://example.com/albums/pets/sparky. There's no way a search engine can find sparky on its own, unless there are other pages out there that point to that album.

On many hosts, indexing is turned off by default. If it isn't, you can turn it off. In a typical hosting cPanel, there will be an icon labeled Indexes, often in a Security or Advanced section.
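If you'd like to check whether indexing really is off, here's a rough Python sketch. It's only a heuristic and rests on an assumption of mine: that the host runs Apache, whose automatic listings normally carry a page title beginning with "Index of /". Other servers may format their listings differently. The function names are made up for this example. (On Apache hosts, the cPanel setting usually amounts to an Options -Indexes line in an .htaccess file, but check with your host.)

```python
import re
import urllib.error
import urllib.request


def looks_like_directory_listing(status, body):
    """Heuristic: a 200 response whose <title> starts with 'Index of /'.

    This matches the listing pages Apache generates by default; it is an
    assumption, not a guarantee, for other servers.
    """
    if status != 200:
        return False
    return re.search(r"<title>\s*Index of /", body, re.IGNORECASE) is not None


def check_url(url):
    """Fetch a URL and describe what the server returned."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            status = response.status
            # Only the start of the page is needed to find the title.
            body = response.read(65536).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 403, 404 and friends arrive as exceptions; keep just the code.
        status, body = err.code, ""

    if status == 403:
        return "forbidden (indexing appears to be off)"
    if looks_like_directory_listing(status, body):
        return "directory listing exposed"
    return f"status {status}, no listing detected"
```

Calling check_url("http://example.com/albums/") with your own domain in place of example.com should report "forbidden" once indexing is off. Run it against each folder that has no index.html of its own.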
jGromit

Re: The opposite of SEO
Posted: 10-Nov-2019 00:34   in response to: jGromit
The next step is to provide a robots.txt file. Well-mannered search engines will honor this file, and won't crawl things if this file tells them not to. The file is effective only if it sits at the top level of your domain; a robots.txt placed in a subfolder is ignored. So, in the example above, you can tell the search engines that it's OK to crawl blog, but that you don't want them to crawl albums. The robots.txt file should be placed here:
public_html/robots.txt
To tell the search engines not to crawl albums, it should contain this:
User-agent: *
Disallow: /albums/ 
To tell the search engines to ignore your entire site:
User-agent: *
Disallow: / 
But be careful. Say your robots.txt file looks like this:
User-agent: *
Disallow: /albums/ 
Disallow: /albums/pets/ 
Disallow: /albums/pets/fluffy/ 
Disallow: /albums/pets/sparky/ 
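The well-mannered search engines will leave albums, pets, fluffy, and sparky alone, but at the same time, you've revealed to the rogue search engines that those directories exist. The last three lines are also unnecessary, because Disallow: /albums/ already covers everything beneath albums.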

Rogue search engines will simply ignore your robots.txt file, and will scoop up whatever they can find.
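
Before uploading a robots.txt file, you can test what a well-mannered crawler would make of it. Here's a small sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the user agent ExampleBot are my own inventions for illustration, and real search engines may interpret edge cases a little differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The short form from above: one rule covers everything under /albums/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /albums/
"""


def is_allowed(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Nested albums are blocked without being named in the file.
print(is_allowed(ROBOTS_TXT, "http://example.com/albums/pets/fluffy/index.html"))
# The blog is still open to crawlers.
print(is_allowed(ROBOTS_TXT, "http://example.com/blog/"))
```

The first call reports that fluffy is off limits even though the file never mentions it, so there's no need to list your album folders one by one.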
jGromit

Re: The opposite of SEO
Posted: 10-Nov-2019 00:36   in response to: jGromit
The last step is to use a meta tag on your pages that tells a search engine what to do. For example, your fluffy album may be accessible to anyone - your home page has a link to it, or you've posted links to it on other sites. So, a search engine will find it. You can still provide some guidance to the search engines about what to do when they land there. The following lines should appear in the <head> section of the page.

This tells a search engine that it shouldn't index this page, and it shouldn't follow any of the links on the page:
<meta name='robots' content='noindex, nofollow'>
This tells the search engine that it's OK to index the page, but that it shouldn't follow any of the links on the page:
<meta name='robots' content='index, nofollow'>
This tells the search engine that it shouldn't index the page, but that it's OK to follow the links on the page:
<meta name='robots' content='noindex, follow'>
Some skins provide a way for you to insert material in this section of the page. Others have a simple pull-down list of choices (in my skins, it's always on the Misc tab in the skin settings). Some skins, alas, have a hard-coded entry, and changing it will require modifying the skin's template files.

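As with a robots.txt file, rogue search engines will ignore this directive, and will crawl whatever they can find. Keep in mind, too, that a search engine can read the meta tag only on pages it's allowed to fetch, so the tag does nothing on a page that robots.txt already blocks.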
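
If you aren't sure what your skin actually wrote into a page, you can check the generated HTML. Here's a sketch using Python's standard html.parser module; the class and function names are mine, and it reads only the robots meta tag, not any HTTP headers your host might add.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # "noindex, nofollow" becomes ["noindex", "nofollow"].
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html):
    """Return the robots directives found in a page, in order."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = "<html><head><meta name='robots' content='noindex, nofollow'></head></html>"
print(robots_directives(page))
```

An empty list means the page has no robots meta tag at all, in which case search engines generally treat it as free to index and follow.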