This question is answered. Helpful answers available: 2. Correct answers available: 1.


Replies: 8 - Pages: 1 - Last Post: 14-Feb-2020 19:20 Last Post By: bleraillez
bleraillez

Posts: 33
Registered: 17-May-2009
How to block robots from indexing the site
Posted: 12-Feb-2020 19:23
 
I just want to have a private family site so no Google & Co indexing.

Right now I have added <meta name="robots" content="noindex"> in the Tiger/Site/CustomCode. Is this enough?

My other ideas :
  • Change Index page name to something like index1.html and have an empty index.html page in each folder. But I don't know how to add an invisible page to the site.
  • Create a user-pwd for all the persons "allowed" to visit the site. But that's cumbersome for some persons even if the login&pwd are simple and easily rememberable as Paris-France (my mother's age prevents her from learning new tricks).
TIA

N.B. All the images keywords contain the names of everyone present, so an indexed image has a lot of search data available. And I have my own domain name and web sites with some being public.
jGromit

Posts: 7,772
Registered: 31-Jan-2006
Re: How to block robots from indexing the site
Posted: 12-Feb-2020 19:37   in response to: bleraillez
First, cover all the bases with the meta tag:
<meta name='robots' content='noindex, nofollow'>
That says, "don't index this page, and don't follow any of the links on it."
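
To check that a generated page really carries the tag, a short script can parse the HTML and list the robots directives it finds. This is only a minimal sketch using Python's standard library; the class and function names (`RobotsMetaParser`, `robots_directives`) and the sample pages are made up for illustration. It also shows why the quotes matter: typographic (curly) quotes, which some editors substitute automatically, are not HTML attribute delimiters, so the tag is not read as intended.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values may be None when no value is given, hence the "or" fallback.
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Sample pages (assumptions, not taken from a real album).
good = "<html><head><meta name='robots' content='noindex, nofollow'></head></html>"
curly = "<html><head><meta name=\u201drobots\u201d content=\u201dnoindex\u201d></head></html>"

print(robots_directives(good))   # directives found with straight quotes
print(robots_directives(curly))  # nothing usable found with curly quotes
```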

Then create a simple robots.txt file, and upload it to the root of your domain (do not put it in any of the folders/subdirectories - it will just be ignored there):
User-agent: *
Disallow: / 
Ill-mannered search bots will still crawl your site, but the majors (Google, Bing) will leave it alone.
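
If you want to confirm how a well-behaved bot would read that file, Python's standard `urllib.robotparser` module can evaluate it. The sketch below is an illustration only: `example.com` and the `allowed` helper are placeholders, not your real site. One thing worth knowing: a crawler that obeys this file never fetches the pages, so it never reads the meta tag on them; the two measures overlap rather than stack.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as the robots.txt above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, url):
    """Return True if a bot honouring robots.txt may fetch the URL."""
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain.
print(allowed("Googlebot", "http://example.com/family/index.html"))
print(allowed("*", "http://example.com/"))
```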

Using "fake" pages won't work. If a visitor to your site can access the real pages, so can a search bot.

But the simplest way to protect things is to put your album under http://mysite.com/family/, never publish that link anywhere else (give it to your friends and family only), and then don't put up a redirect page at http://mysite.com/index.html - don't put an HTML file there at all. And also turn off "indexing" in your web host control panel, so that a search bot going to http://mysite.com/ won't see a list of the subdirectories, and won't be able to find the family folder.

Password-protecting things is, indeed, inconvenient, and it can also cause a lot of bizarre problems in a complicated album (like having to login again when you hit a search page, or try to play a video).
bleraillez

Posts: 33
Registered: 17-May-2009
Re: How to block robots from indexing the site
Posted: 12-Feb-2020 20:18   in response to: jGromit
jGromit wrote:
First, cover all the bases with the meta tag:
<meta name='robots' content='noindex, nofollow'>
That says, "don't index this page, and don't follow any of the links on it."

I have added <meta name="robots" content="noindex"> in the Tiger/Site/CustomCode. Is this good? I didn't find anywhere else in the preferences.

Then create a simple robots.txt file, and upload it to the root of your domain (do not put it in any of the folders/subdirectories - it will just be ignored there):
User-agent: *
Disallow: / 
Ill-mannered search bots will still crawl your site, but the majors (Google, Bing) will leave it alone.

Done.

Using "fake" pages won't work. If a visitor to your site can access the real pages, so can a search bot.
+1

But the simplest way to protect things is to put your album under http://mysite.com/family/, never publish that link anywhere else (give it to your friends and family only), and then don't put up a redirect page at http://mysite.com/index.html - don't put an HTML file there at all.
Are you talking about the real "mysite.com" or my (real domain) site? I'm a bit lost.

And also turn off "indexing" in your web host control panel, so that a search bot going to http://mysite.com/ won't see a list of the subdirectories, and won't be able to find the family folder.
Sorry, what is mysite.com? My domain or one called mysite? What control panel are you talking of?

Password-protecting things is, indeed, inconvenient, and it can also cause a lot of bizarre problems in a complicated album (like having to login again when you hit a search page, or try to play a video).

+1
Since I mainly use search folders (in the jAlbum sense) to navigate the site, it's out of the question. I want people to navigate with "Last Name", "First Name" and "Decade" filters, and I've found no other way than to create a tree of search folders. I don't think it's possible to have drop-down menus that are modified by the selections in other menus. For example, if you choose Family_1, you would only see the first names that exist in that family and the decades for which there are images. (I hope I'm clear, I don't speak much English these days ;)
Take a look at my site (link removed): apart from the decades, all other folders are search links.

Anyways, thanks a lot for your time and help.

Edited by: bleraillez on 12-Feb-2020 20:18

Edited by: jGromit on 12-Feb-2020 15:37, trying to keep Google from indexing your site! Stop posting the real domain name!!!
karlmistelberger

Posts: 532
Registered: 5-Dec-2013
Re: How to block robots from indexing the site
Posted: 12-Feb-2020 21:00   in response to: bleraillez
bleraillez wrote:
I just want to have a private family site so no Google & Co indexing.
  • Create a user-pwd for all the persons "allowed" to visit the site. But that's cumbersome for some persons even if the login&pwd are simple and easily rememberable as Paris-France (my mother's age prevents her from learning new tricks).

Actually the browser remembers "User Name" and "Password". Mom (92) does not learn new tricks, she clicks "OK": http://www.mistelberger.net/Bilder/
jGromit

Posts: 7,772
Registered: 31-Jan-2006
Re: How to block robots from indexing the site
Posted: 12-Feb-2020 21:37   in response to: bleraillez
bleraillez wrote:
jGromit wrote:
First, cover all the bases with the meta tag:
<meta name='robots' content='noindex, nofollow'>
That says, "don't index this page, and don't follow any of the links on it."

I have added <meta name="robots" content="noindex"> in the Tiger/Site/CustomCode. Is this good? I didn't find anywhere else in the preferences.


Please re-read my answer. Add the code I've posted, in the Custom Code box, including the nofollow attribute. Just add one word to what you've already done.

But the simplest way to protect things is to put your album under http://mysite.com/family/, never publish that link anywhere else (give it to your friends and family only), and then don't put up a redirect page at http://mysite.com/index.html - don't put an HTML file there at all.
Are you talking about the real "mysite.com" or my (real domain) site? I'm a bit lost.

I was just giving an example, and trying not to keep posting your actual domain name. These forums get indexed by Google, very quickly!

And also turn off "indexing" in your web host control panel, so that a search bot going to http://mysite.com/ won't see a list of the subdirectories, and won't be able to find the family folder.
Sorry, what is mysite.com? My domain or one called mysite? What control panel are you talking of?

Again, that's just an example. You want to turn off indexing on your domain, of course. You don't have a mysite.com domain.

Log on to your web hosting account. Does it give you access to something called cPanel? If so, that's where you'll find an icon for doing this. If not, contact your web host, and ask them how to turn off "indexing" for your site.
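
To see whether "indexing" is currently on, fetch a folder URL that has no index.html and look at what comes back. Auto-generated listings from common web servers typically have a page title starting with "Index of", so a rough check can be sketched as below. The function name and sample pages are hypothetical, and the test is only a heuristic. If your host runs Apache (an assumption; ask them), the setting involved is usually the `Options -Indexes` directive.

```python
import re


def looks_like_directory_listing(html):
    """Heuristic: auto-generated listings usually have a title starting with 'Index of'."""
    match = re.search(r"<title>\s*(.*?)\s*</title>", html, re.IGNORECASE | re.DOTALL)
    return bool(match) and match.group(1).lower().startswith("index of")


# Sample responses (made up for illustration).
listing_page = "<html><head><title>Index of /</title></head><body>...</body></html>"
normal_page = "<html><head><title>Our holiday photos</title></head><body>...</body></html>"

print(looks_like_directory_listing(listing_page))
print(looks_like_directory_listing(normal_page))
```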

Since I mainly use search folders (jAlbum wise) to navigate in the site it's out of the question.

Not relevant. Your searches will work just fine.

What I'm just trying to explain to you is that if you have a site called http://example.com, there's no way a search bot can find your album if it's located at http://example.com/thealbum. You give that link to your family and friends, and they can then visit the site, and use all of the searches. There's no problem. But a search bot won't know to look for http://example.com/thealbum if there's no link to it on http://example.com. Search bots can't examine your entire web hosting account. All they can do is follow links.
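
The "all they can do is follow links" point can be sketched as a toy crawler over a made-up site map. Everything here is hypothetical (the `SITE` dictionary, the page names, the `crawl` function); it is a simplified model, not how any real search engine works. A bot starting at the home page only ever reaches pages that are linked, directly or indirectly, from where it started. The same logic means that a link posted anywhere public, such as a forum, gives the bot a second starting point.

```python
from collections import deque

# Hypothetical site: each page maps to the links it contains.
SITE = {
    "/": ["/about.html"],
    "/about.html": ["/"],
    "/thealbum/": ["/thealbum/1990s.html"],   # exists, but nothing links to it
    "/thealbum/1990s.html": ["/thealbum/"],
}


def crawl(start, site):
    """Breadth-first walk that discovers pages only by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


print(sorted(crawl("/", SITE)))

# As soon as one link to the album exists, the whole album becomes reachable.
LINKED_SITE = dict(SITE, **{"/": ["/about.html", "/thealbum/"]})
print(sorted(crawl("/", LINKED_SITE)))
```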
jGromit

Posts: 7,772
Registered: 31-Jan-2006
Re: How to block robots from indexing the site
Posted: 12-Feb-2020 22:16   in response to: jGromit
The more I think about this, the more I think you should use the "meta robots" tag and the "robots.txt" file, and leave it at that. Those will prevent Google, Bing, etc., from crawling your site. That's enough. No need to worry about "fake pages," or hidden subdirectories, or passwords.

It's actually very difficult to get Google to list your site in its search results, even if you want it to. Why? Because Google doesn't care about your family, and because there aren't links to your family album from a lot of other sites, Google assumes that no one else cares about your family, either.

Some of my stuff has been out in the open for years, but doesn't show up in the Google search results, or if it does, it's on page 37 of the results.
bleraillez

Posts: 33
Registered: 17-May-2009
Re: How to block robots from indexing the site
Posted: 13-Feb-2020 22:27   in response to: jGromit
jGromit wrote:
The more I think about this, the more I think you should use the "meta robots" tag and the "robots.txt" file, and leave it at that. Those will prevent Google, Bing, etc., from crawling your site. That's enough. No need to worry about "fake pages," or hidden subdirectories, or passwords.
Ok
It's actually very difficult to get Google to list your site in its search results, even if you want it to. Why? Because Google doesn't care about your family, and because there aren't links to your family album from a lot of other sites, Google assumes that no one else cares about your family, either.
True, unless a few links start to circulate too much in the wild.
Some of my stuff has been out in the open for years, but doesn't show up in the Google search results, or if it does, it's on page 37 of the results.
I think I'll follow you and maybe change the directory name once in a while like family -> family2021 -> family2022...
A search in google once in a while will serve as an alert.
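
For that periodic check, the `site:` search operator restricts results to one domain or folder. A small sketch to build the search link is below; `example.com/family` is a placeholder and `site_query_url` is a name made up here. Opening the resulting URL in a browser and getting no results suggests nothing has been indexed so far.

```python
from urllib.parse import urlencode


def site_query_url(domain_and_path):
    """Build a Google search URL limited to one domain or folder via the site: operator."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain_and_path}"})


# Placeholder address, not a real album.
print(site_query_url("example.com/family"))
```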

I'm also going to look in the jAlbum preferences to see if it's possible to erase all the keywords and other tags from a posted jpeg. Post just the image nothing else included.

Thank you very much for your help, ideas and suggestions.
jGromit

Posts: 7,772
Registered: 31-Jan-2006
Re: How to block robots from indexing the site
Posted: 13-Feb-2020 22:37   in response to: bleraillez
bleraillez wrote:
I'm also going to look in the jAlbum preferences to see if it's possible to erase all the keywords and other tags from a posted jpeg. Post just the image nothing else included.

If you don't check Settings > Advanced > Metadata > Include photographic data in generated images (the second checkbox, not the first), the JPG's will have no extra information in them at all - just pure images, containing only information about the image size, resolution, data compression, etc.

The other things - keywords, for example - aren't stored in the images. That information is all in the JSON files that are part of the album. Without those, of course, things like searches wouldn't be possible. When you do an album search, it's not looking at the JPG files at all - it's only looking at the JSON data.
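
As a rough picture of what "searching the JSON data" means, here is a sketch over an invented record layout. The field names (`file`, `keywords`), the sample entries and the `search` function are all assumptions for illustration; the real jAlbum JSON files are structured differently and should be inspected directly. The point is only that the names live in a separate text file published alongside the images, so that file is what needs to stay out of search engines.

```python
import json

# Invented example data, NOT the actual jAlbum file format.
ALBUM_JSON = """
[
  {"file": "img001.jpg", "keywords": ["Dupont", "Marie", "1950s"]},
  {"file": "img002.jpg", "keywords": ["Dupont", "Jean", "1960s"]},
  {"file": "img003.jpg", "keywords": ["Martin", "Marie", "1950s"]}
]
"""


def search(records, *terms):
    """Return the files whose keywords contain every search term (case-insensitive)."""
    wanted = {term.lower() for term in terms}
    return [
        record["file"]
        for record in records
        if wanted <= {keyword.lower() for keyword in record["keywords"]}
    ]


records = json.loads(ALBUM_JSON)
print(search(records, "dupont"))
print(search(records, "Marie", "1950s"))
```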
bleraillez

Posts: 33
Registered: 17-May-2009
Re: How to block robots from indexing the site
Posted: 14-Feb-2020 19:20   in response to: jGromit
jGromit wrote:
bleraillez wrote:
I'm also going to look in the jAlbum preferences to see if it's possible to erase all the keywords and other tags from a posted jpeg. Post just the image nothing else included.

If you don't check Settings > Advanced > Metadata > Include photographic data in generated images (the second checkbox, not the first), the JPG's will have no extra information in them at all - just pure images, containing only information about the image size, resolution, data compression, etc.


Great, since the camera make, model... is of no interest for 18XX->2000 pictures

The other things - keywords, for example - aren't stored in the images. That information is all in the JSON files that are part of the album. Without those, of course, things like searches wouldn't be possible. When you do an album search, it's not looking at the JPG files at all - it's only looking at the JSON data.

That's why I was a bit lost since the keywords are in the image from a LightRoom export that includes keywords if you export everything (as far as I know).

Tyvm