Submitting a sitemap to Google is an easy and effective way to help make sure all the pages of your website are indexed by Googlebot. Without a sitemap, you depend on Googlebot following links from one page to the next until it has eventually found and indexed all your pages. This works, but it can take months.
Sitemap submission is one of the features of Google Webmaster Tools that you can use to monitor and improve Google's handling of your website. And since everyone wants to score well in a Google search, why would you pass on this opportunity?
Now, if your website is just a bunch of html pages in the free web space that came with your internet account, you don't have any access to the web server, so you can't run the Google Sitemap Generator there. You do, however, have a copy of your website on your local hard disk. To create a sitemap, you can just list the paths and filenames of all the html files. If you then replace the local top level directory in each path with the relevant http://hosting_domain/site_directory string, you've got yourself a text file that can be submitted to Google as a (simple) sitemap. You can also transform the text file into Google's preferred format, an xml file, by feeding it to the Site Map Generator.
#!/bin/bash
# script to create sitemap.txt
# Koen Noens, October 2006

LOCAL_ROOT="/home/jp/websites/mysite"   # replace with your path
SITE_ROOT="http://my.isp.com/my_site"   # replace with your site URL
EXTENSIONS=".htm .html .php .asp .aspx .jsp"

# work from the local copy of the site, so that find returns relative paths
cd "$LOCAL_ROOT" || exit 1
rm sitemap.txt 2>/dev/null || echo "no previous sitemap found"

# find all .htm, .html, .php, ... pages
FOUNDFILES=$(mktemp)
for ext in $EXTENSIONS ; do
    find . -name "*$ext" >> "$FOUNDFILES"
done

# remove the leading . and prepend SITE_ROOT to build urls
sed -i 's/^\.//' "$FOUNDFILES"
while IFS= read -r FILE; do
    echo "$SITE_ROOT$FILE"
done < "$FOUNDFILES" > "$FOUNDFILES.0"

# if there is an exclude list, exclude the files in it from the sitemap
# (an entry that matches a whole url leaves an empty line, removed below)
if [[ -e exclude.lst ]]; then
    while IFS= read -r entry; do
        [[ -n $entry ]] || continue    # skip blank lines in the list
        sed -i "s,$entry,,g" "$FOUNDFILES.0"
    done < exclude.lst
    # remove blank lines as well
    sed -i '/^$/d' "$FOUNDFILES.0"
fi

# finishing touches
sort -f -u "$FOUNDFILES.0" > sitemap.txt
rm "$FOUNDFILES.0" "$FOUNDFILES"

# add sitemap to files_to_upload
echo "$LOCAL_ROOT/sitemap.txt" >> "$LOCAL_ROOT/upload"
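If you would rather produce the xml file yourself, the conversion from sitemap.txt is small enough to sketch in bash. This is only a minimal sketch, not what the Site Map Generator produces: txt2xml is a name I made up for illustration, and I'm assuming the sitemaps.org 0.9 format with nothing but the required loc element for each url.

```shell
#!/bin/bash
# Sketch: wrap each url of a plain text sitemap in sitemaps.org xml.
# txt2xml is a hypothetical helper name, not part of any Google tool.
# It reads urls (one per line) on stdin and writes the xml to stdout.
txt2xml() {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    # drop blank lines, escape the characters xml reserves, wrap each url
    sed -e '/^$/d' \
        -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
        -e 's,.*,  <url><loc>&</loc></url>,'
    echo '</urlset>'
}
```

To use it, run something like txt2xml < sitemap.txt > sitemap.xml in the directory that holds the sitemap. The escaping step matters mostly for urls with a query string, where a bare & would make the xml invalid.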
I'm sure there are even shorter ways to write this script, using pipes and more advanced sed scripts, but so far this is the best I can do. For Windows, you can use a Visual Basic "sitemap generator" script that does roughly the same (see also these Visual Basic scripts). Or you can create an html sitemap to add to your website.
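For what it's worth, here is roughly what a pipe-based version could look like. Take it as a sketch under a few assumptions rather than a drop-in replacement: make_sitemap is a made-up function name, the site url is assumed to contain no comma or & (both would confuse the sed expression), and exclude.lst is assumed to hold complete urls, one per line, which are dropped only on an exact match. That last point is a deliberate change from the sed substitution used in the script.

```shell
#!/bin/bash
# Sketch of a pipe-based variant; make_sitemap is a hypothetical name.
# usage: make_sitemap LOCAL_ROOT SITE_ROOT > sitemap.txt
make_sitemap() {
    local local_root=$1 site_root=$2
    (
        # run in a subshell so the caller's working directory is untouched
        cd "$local_root" || exit 1
        # list the pages, turn the leading . into the site url,
        # drop excluded urls (exact, whole-line matches), sort and dedupe
        find . -type f \( -name '*.htm' -o -name '*.html' -o -name '*.php' \
                          -o -name '*.asp' -o -name '*.aspx' -o -name '*.jsp' \) |
            sed "s,^\.,$site_root," |
            if [[ -e exclude.lst ]]; then
                grep -v -x -F -f exclude.lst
            else
                cat
            fi |
            sort -f -u
    )
}
```

Called as make_sitemap /home/jp/websites/mysite http://my.isp.com/my_site > sitemap.txt, it needs no temporary files, because every step hands its output straight to the next one.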