A site in need of a sitemap

The sitemap is the Holy Grail of a website. It’s the sheet (or sheets) of XML that new webmasters don’t know to use and some experienced webmasters neglect to create. Consider that every website has a front, a back, a mouthpiece, a gang of security guards and a guide. Visitors see the front, the webmaster uses the back end to create the front, the RSS feed tells the world what’s happening at the website, robots.txt and other little bits help protect it, and the sitemap guides search engine spiders around it.

Usually, if you use a content management system (CMS) you will be blessed with automatic sitemap generation, either through a built-in process or a plugin. In that case, you only need to locate the sitemap, submit it to search engines, link to it from your index page or from the footer of every page, and regularly ping search engines to tell them it has been updated. You will usually find your sitemap sitting comfortably close to your robots.txt at the root of your domain, e.g. your-domain.com/sitemap.xml

If you are not blessed with automatic sitemap generation and submission, then you will need to create your own sitemap. Of course, that is what this article is all about, and below are the instructions you should follow to do that.

Most often, a sitemap needs to be manually created when a website is hand-crafted in (X)HTML or when a sitemap is to be remotely hosted, i.e. the sitemap is placed on a different domain or server from the website it maps. That is frequently the case when a sponsor provides a co-brand or white label site but not enough space or facilities to host a sitemap. You can learn how to split a domain across multiple hosts in this EasyGuide.

There are programs and scripts that can be used to generate sitemaps. These can be split into two categories: those that work and those that don’t work. Pedants might point out that a third category exists which includes those that only work when they feel like it or after a lot of flirtatious smooth-talking, as is often the case.

Those sitemap generators that do work can be subdivided into two subcategories:

  • those that run from a desktop PC, and
  • those that run from a web server.

And they may be subdivided into paid and free. Guess which we’re going to work with :-)

Most of the free sitemap tools that work from a desktop PC are the same ones used to check for dead links. You should read How to Check for Deadlinks to learn more about them because I am not going to discuss them here. More often than not, the “sitemaps” created by those programs need to be manually edited into XML sitemap format. For example, the URLs

http://journalxtra.com/downloads/

http://journalxtra.com/tools/

would become:

<?xml version="1.0" encoding="UTF-8"?>
<urlset
 xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<!-- The site URLs go below here -->

<url>
 <loc>http://journalxtra.com/downloads/</loc>
 <changefreq>weekly</changefreq>
</url>
<url>
 <loc>http://journalxtra.com/tools/</loc>
 <changefreq>weekly</changefreq>
</url>

</urlset>
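If you would rather not hand-edit a link checker’s URL list into that format, a few lines of Python can do the wrapping. This is a minimal sketch, not the Scriptilitious sitemap-maker: it assumes the URLs arrive one per item (for example, the lines of a text file), reuses the “weekly” change frequency from the example above, and omits the optional xsi schema-location attributes. The function name urls_to_sitemap is hypothetical.

```python
from xml.sax.saxutils import escape

# Namespace required by the sitemaps.org 0.9 protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def urls_to_sitemap(urls, changefreq="weekly"):
    """Wrap plain URLs (one per item) in a sitemaps.org <urlset> document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    seen = set()
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:  # skip blank lines and repeated URLs
            continue
        seen.add(url)
        lines += ["<url>",
                  f" <loc>{escape(url)}</loc>",  # escape &, < and > for XML
                  f" <changefreq>{changefreq}</changefreq>",
                  "</url>"]
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

For example, with open("urls.txt") as f: sitemap = urls_to_sitemap(f) turns a file of URLs (the file name is only an illustration) into a string you can save as sitemap.xml.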

The Scriptilitious ScriptBox is free and comes with a sitemap-maker utility that makes it easy to convert those URLs into a sitemap. You can get it by clicking the download button at the end of this article. I advise you to use KLinkStatus (or similar) to index a website from your desktop, then use Scriptilitious to convert the indexed URLs into an XML sitemap. There is a rumour that KLinkStatus will soon have a specific template for XML sitemap creation, which is good news for webmasters who use Linux (like me). Unfortunately, neither of these programs can yet upload a sitemap to a server automatically.

So let’s take a look at the working, free online sitemap generators:

There are many scripts that can be uploaded to a web server and configured to automatically rebuild a sitemap and submit it to various search engines. Unfortunately, they are incredibly awkward to set up and configure; plus, for security reasons, many of them will only map a website on the same domain as the script. That restriction prevents them from being used to create sitemaps for remote sites.

A better option is to use free online sitemap generators. They work, they are not limited to one website, they don’t care whether you own the site being mapped and they can be used frequently. There is one catch: most limit their free maps to either 500, 1000 or 5000 URLs and only map URLs that can be reached from the root (index) page of a website. The ones I use are no exception:

Those three sitemap generators are more than enough for most sites but what if you have a co-brand, white label or hand-crafted website that updates daily and has hundreds of thousands of pages that must be indexed? How might all those lovely URLs be indexed?

Think about this:

A sitemap lists the URLs that exist at the moment it is generated. Every new web page adds a new URL that must be added to that map. If you start out with 1,000 URLs and add 10 new URLs every day, then after 20 days another 200 URLs must be mapped. If a sitemap generator maps only the first 1,000 URLs it encounters from a website’s index page and there are 1,200 URLs to index, then 200 URLs will be left out of the map. An incomplete map is bad news: it could result in a site being poorly indexed by search engines.
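The arithmetic above is easy to check for your own site’s numbers. A small sketch; both function names are hypothetical, and runs_needed assumes (optimistically) that each generator run can be steered to a different, non-overlapping set of pages, which is what the combining trick below tries to achieve.

```python
def missed_urls(initial, added_per_day, days, cap):
    """How many URLs a generator capped at `cap` entries leaves out."""
    total = initial + added_per_day * days
    return max(0, total - cap)


def runs_needed(total_urls, cap):
    """Minimum generator runs to cover every URL, assuming (hypothetically)
    each run maps a different, non-overlapping set of pages."""
    return -(-total_urls // cap)  # ceiling division using integers only


# The example above: 1,000 URLs, 10 new ones a day for 20 days, 1,000-URL cap
print(missed_urls(1000, 10, 20, 1000))  # 200
print(runs_needed(1200, 1000))          # 2
```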

Is there a way to coax the online generators to create a bigger sitemap?

Fortunately, sitemap generators do not check the size of an existing sitemap and cannot tell whether a sitemap is made up of the contents of several sitemaps generated by free sitemap generators. This failing can be turned to our advantage: we can use the same free tools to create daily or weekly sitemaps, then combine their results to build one super sitemap. We can steer the generator towards different parts of a website by putting links to those parts on the website’s index page. For best results, one of those links should point to an artificial linklist that points to the sections of the site that need to be mapped; but we must be careful not to duplicate data lines!

The Method

The method is easy for those who use Linux. Windows does not ship with “sed”, but Windows users can run Linux in VirtualBox, boot a Linux LiveDisk, or install Cygwin (or Cygwin/X). These instructions assume you have already placed strategic links on your site’s front (index) page that point to the deeper parts of your website, or to a linklist that contains deep links to the parts you wish to have mapped. Strategic links should be as close to the top of the index page as possible (machines read web pages top-to-bottom, left-to-right). You can make your life easier by using the automated sitemap-ripper utility that comes with Scriptilitious, which can be downloaded at the bottom of this article. So, here’s how we create a sitemap using online generators, with or without the free sitemap-ripper utility:

  1. Use one of the sitemap generation tools listed above. Generators are sometimes mistaken for DoS attacks or hack attempts, so they can be blocked by server security software. My general route is to try 5,000, then 1,000, then 500 URLs; the 500-URL run is rarely blocked;
  2. Upload the sitemap to your server. It should usually be placed in the root directory e.g. your-domain.com/sitemap.xml;
  3. Register the sitemap with the big two search engines (Google and Bing) and with Yahoo;
  4. Place a link to the sitemap in the footer of your site’s index page (I suggest the footer because, most often, the same footer is repeated on every page). This ensures that Yahoo! and other search engines can easily find the sitemap;
  5. If possible, point search engines to your sitemap from robots.txt by adding this line to it:
       Sitemap: http://www.example.com/sitemap.xml
  6. Use My Page Rank to ping the major search engines with the details of your sitemap;
  7. To update the sitemap, run one of the sitemap generation tools again, but instead of overwriting the old sitemap with the newly created one, combine their contents. You can do this with sitemap-ripper or with this little bit of code:
    1. Place the contents of both sitemaps (old and new) into one file called sitemap.xml.
    2. Open a terminal (Bash/Konsole/Console) in the folder that holds sitemap.xml, then type, or copy and paste, this script into it (it relies on GNU sed, the version that ships with Linux):
      sed -i 's#^[[:space:]]*##' sitemap.xml
      sed -i 's#http://www\.#http://#g' sitemap.xml
      sed -i 's#http://#http://www.#g' sitemap.xml
      sed -i 's#<url>##g' sitemap.xml
      sed -i 's#</url>##g' sitemap.xml
      grep "<loc>" sitemap.xml > extracted.xml
      sort -u extracted.xml > sorted.txt
      rm sitemap.xml
      rm extracted.xml
      mv sorted.txt sitemap.xml
      sed -i 's#<loc>#<url>\n  <loc>#g' sitemap.xml
      sed -i 's#</loc>#</loc>\n  <changefreq>daily</changefreq>\n  <priority>0.5</priority>\n</url>#g' sitemap.xml
      sed -i.bak '1i <?xml version="1.0" encoding="UTF-8"?>\n<urlset\n      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9\n      http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">\n      <!-- The site URLs go below here -->\n      <!-- formatted with a script from http://journalxtra.com/downloads -->\n' sitemap.xml
      echo '</urlset>' >> sitemap.xml
      rm sitemap.xml.bak
    3. That code removes superfluous whitespace at the beginning of every line, changes all URLs to the http://www. format, extracts all the mapped URLs, sorts them, removes duplicates, sets their priority to “0.5” and sets their change frequency to “daily”. The final file it creates is the all-important sitemap.xml. The downloadable script is more interactive: it lets you specify the URL format, change frequency and page priority as it runs, and it automatically combines the original sitemaps before it rips them apart, extracts the URLs, cleans them up, removes duplicates and reformats them into our Holy Grail.
  8. Repeat step 7, upload the combined sitemap as in step 2, then repeat step 6 every time a new sitemap is generated.
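The sed script relies on GNU sed and on every &lt;loc&gt; entry sitting on its own line, which not every generator guarantees. As an alternative sketch (not the Scriptilitious sitemap-ripper and not the author’s script), Python’s standard XML parser can do the same merge without caring about line breaks. It copies the script’s choices: http:// URLs are rewritten to the http://www. form, duplicates are dropped, entries are sorted, and each gets a daily change frequency and 0.5 priority. The function names are hypothetical, and it assumes both files use the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# ElementTree reports namespaced tags as "{namespace}tag"
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def locs_from_sitemap(xml_bytes):
    """Return every <loc> URL found in one sitemaps.org sitemap."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter(NS + "loc") if el.text]


def to_www(url):
    """Mirror the sed script: rewrite http:// URLs to the http://www. form."""
    if url.startswith("http://") and not url.startswith("http://www."):
        return "http://www." + url[len("http://"):]
    return url


def merge_sitemaps(*documents, changefreq="daily", priority="0.5"):
    """Combine several sitemaps into one, without duplicate URLs."""
    urls = set()
    for doc in documents:
        urls.update(to_www(u) for u in locs_from_sitemap(doc))
    out = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in sorted(urls):
        out += ["<url>",
                f"  <loc>{escape(url)}</loc>",  # re-escape & etc. for XML
                f"  <changefreq>{changefreq}</changefreq>",
                f"  <priority>{priority}</priority>",
                "</url>"]
    out.append("</urlset>")
    return "\n".join(out) + "\n"
```

For example, merge_sitemaps(open("old.xml", "rb").read(), open("new.xml", "rb").read()) returns the combined sitemap as a string, ready to be saved as sitemap.xml (the two input file names are only illustrations).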

Ensure that your URLs use only one of the http:// or http://www. formats. If your URLs are mixed, then the pages could be indexed twice or thrice, which could be rewarded with a search engine penalty and a lower page rank because different backlinks point to different versions of the same page (http:// is different to http://www.).
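Before submitting a combined sitemap, it can be worth checking that it really uses only one form. A small sketch, assuming you already have the sitemap’s URLs as a list of strings (for example, its &lt;loc&gt; values); the function name mixed_host_forms is hypothetical. It reports every host name that appears both with and without the www. prefix.

```python
from urllib.parse import urlsplit


def mixed_host_forms(urls):
    """Return bare host names that appear both with and without 'www.'."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # lower-cased; None if there is no host
        if host:
            hosts.add(host)
    mixed = set()
    for host in hosts:
        if host.startswith("www.") and host[len("www."):] in hosts:
            mixed.add(host[len("www."):])
    return sorted(mixed)
```

An empty result means the list is consistent; anything else names the sites whose URLs still need to be normalised.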

If you cannot be bothered to do all the above, for as little as $19.99 you can get a sitemap generator that will create as many sitemaps as you need as big as you need them. It’s available from xml-sitemaps.com, provides automatic sitemap updates and will save you a lot of time.

Scriptilitious

This free interactive utility box comes with two sitemap creation scripts that automate step 7 to produce a sitemap named sitemap.xml. Scriptilitious is known to work on Linux; it might work natively on Windows, but it will most likely require Cygwin or some other Linux terminal emulator.

Instructions

  1. Unzip the downloaded file,
  2. Place the two sitemaps that need to be combined into the WorkBox folder (give each one a different name),
  3. Open a terminal in the Scriptilitious folder and type ./scriptilitious.sh
  4. Further usage instructions are provided as the script runs.

Scriptilitious ScriptBox


Crafty Sitemap Building. Copyright secured by Digiprove © 2010

PlayOnLinux Logo


The past couple of weeks have been interesting. Like a graffiti artist’s tag, I’ve seen PlayOnLinux cropping up in all sorts of different forum posts and blog comment spots. What is it, I thought? Is it some kind of ploy to get people to consider Linux as a gaming platform (which it’s brilliant for)? Or is it a gaming forum or a new game? Well, enough is enough, I said to myself a few days ago; it’s time to take a look at it. So, in the interest of gaming, I googled PlayOnLinux. And this is what I found…

PlayOnLinux is a complement to Wine (of the Wine Is Not an Emulator variety). It provides a graphical interface between Wine and the PlayOnLinux repository, which contains loads of scripts that make it easy to install Windows programs on Linux under Wine.

It relies on Wine to make programs work so Wine must be installed before PlayOnLinux. The interface for PlayOnLinux provides a list of categorised scripts for installing programs such as Internet Explorer, Safari, Windows Media Player, WinAmp, DirectX and tons of games – the video below shows its current software list (6th Nov. 2009). You can even create your own scripts to aid the installation of other software. PlayOnLinux automatically downloads and installs any dependencies or Wine versions that are required for specific programs to successfully install. As you might have guessed, it permits multiple versions of Wine to be installed on one system – it automatically downloads and configures them so you have no messy configuration files to muddle your mind.

PlayOnLinux is a program I highly recommend to any Linux or Mac user (yes, there’s a Mac version too; it’s called, surprise, surprise, PlayOnMac):

It simplifies the installation of multiple Wine versions,

It simplifies the installation of Windows applications,

It simplifies the installation of Windows Games.

In short, it won’t give you a headache, and if you’re having problems with Wine then this is your analgesic.

You can get up to date details from its home website, here or, if you want to jump straight in, the download page for all Linux distros is here.

If you’re yet to install Wine, you can do it by typing “Wine” into your Package Manager (Synaptic, Apt, KPackageKit), or by visiting WineHQ.

Kubuntu uses a familiar boot from CD screen.
Image via Wikipedia

Update, 3rd Nov. 2009: Karmic Koala is now out of beta testing and is working as well as anticipated. I’m using Karmic Koala 64 bit, and this review and guide are based on the 64 bit version.

If you’re looking for extra repositories then try this JournalXtra article.

Just 14 days to go till version 9.10 of Kubuntu is unleashed on the world. Codenamed Karmic Koala, the beta version is ready for download, and I totally ignored the developers’ warnings not to install the beta on a machine that needs to be stable and reliable. I plunged in and gambled my sanity: I downloaded it, burned it to disc, wiped my hard drive and installed it. That was a few days ago. I am now using the beta version and it is impressive. It is lightning fast: it took less than 30 seconds to boot to a working desktop immediately after installation. It takes a little longer now that I’ve added a bit of software, but it’s still quicker than any previous release. Given that the Kubuntu team are aiming for a 25 second boot time, and my machine’s on its last legs, I reckon they’re very close to getting there.

This new release ships with:

OpenOffice.org 3.1.1, which sports better KDE integration, uses the Oxygen theme by default and uses the KDE file browser dialogue;

Amarok 2.2 RC which has an improved browser, movable layout elements, and a feature rich playlist editor. Aesthetically, it’s much better than previous versions;

K3b has become K3b 2 as it moves up from KDE 3 to KDE 4 integration;

Firefox 3.5, GIMP 2.6.7 and Wine 1.2 are available from the repositories along with a whole load of the latest versions of many software packages;

Firefox can be installed from the applications menu with a quick click of a link;

by default, the quicker and more efficient Ext4 file system is used instead of Ext3;

it comes with a USB disc creator for creating a USB Live Disc for trying and installing Kubuntu;

the Alternate installation disc provides options to repair a broken system and create a USB Live Disc at the boot prompt; and

The NetworkManager applet has been “improved” although I found it a little awkward to use compared to its predecessor.

Installation is easy: burn the disc, pop it in the boot drive, restart the computer and follow the prompts.

The option to try Kubuntu without installation is, as always, available to testers and those who need or prefer to use a Live Disc instead of an installed system. The installation took longer this time than with previous releases, and the installation ISO had to be burned to disc at a slow 12x speed (it took about 15 minutes).

The overall feel of the product is one of smoothness and accessibility. I’ve been a fan of KDE 4 since it evolved into 4.2. Now that it’s grown into the more mature 4.3.2, it’s renewed my relationship with my Linux box. As the screenshot shows, it has a desktop folder widget that opens folders in pop-up windows when the mouse is hovered over them; likewise, it previews text files and images. The bottom right of the screenshot shows a Comic Strip widget that automatically downloads, displays and cycles through selected strips, and the top right of the screenshot shows an RSS feed widget. Widgets, including folder views, can be added to, moved about and removed from the desktop(s).


Kubuntu Karmic Koala 9.10 64 bit Screenshot (yes, they are two Canadian Geese and one domestic duck; my lake isn't yet cold enough for penguins ;-) )

The beta’s not without faults. It’s prone to package breakages when updated, but they’re usually fixed with the next update. Fingers crossed, I’ve had no serious, unrecoverable problems since I started testing it. The default package manager (KPackageKit) is known to be unreliable, but a simple sudo apt-get install synaptic resolves that, and I prefer Synaptic to KPackageKit anyway. Another issue I’ve noticed is that installed packages do not always appear in the programs menu; the solution is easy: right-click the programs menu, select “Menu Editor” and, once it’s opened, click Save before closing it without actually editing anything.

If you’ve never tried KDE 4 or you were put off it by its original release then I challenge you to try Karmic Koala as a Live Disc and tell me that it hasn’t developed into one tasty baby and you don’t love it.

Making Kubuntu Multimedia Friendly

Like all official *buntu distros, Karmic needs a few extra repositories adding to it and packages installing into it to make it multimedia friendly. Here’s a list of the plastic surgery I gave mine to give the bodywork that extra lift (instructions to follow):

added the medibuntu repositories so I could grab all those multimedia playing goodies and non-free codecs;

added the ubuntu studio repositories so I could get even more multimedia extras;

installed Synaptic;

enabled the extra repositories;

installed kubuntu-restricted-extras;

installed libdvdcss2 and w64codecs (w32codecs for non 64 bit systems) to enable use of non-native media formats;

installed Firefox, Flash, VLC, Mplayer, GIMP (+ the GIMP repository), ImageMagick, Wine, LMMS (Linux MultiMedia Studio), RealPlayer and Quicktime; then

activated proprietary hardware drivers (ATI, Nvidia etc.).

Final Opinion:

Karmic Koala is a great product and I eagerly anticipate the release of the “finished” product so I can worry less about updates and the package breakages they might cause. Give it a try, it’s only a download away.

A few instructions for those who need them

Install the Medibuntu repository by copying and pasting the following lines into the Terminal (remember to press the return key afterwards). These lines of code are valid for all *buntu distros, including 32 and 64 bit ones:

sudo wget http://www.medibuntu.org/sources.list.d/$(lsb_release -cs).list \
--output-document=/etc/apt/sources.list.d/medibuntu.list &&
sudo apt-get -q update &&
sudo apt-get --yes -q --allow-unauthenticated install medibuntu-keyring &&
sudo apt-get -q update

Install the Ubuntu Studio repository by copying and pasting the following lines into the Terminal (remember to press the return key afterwards):

sudo su -c 'echo deb http://archive.ubuntustudio.org/ubuntustudio karmic main >> /etc/apt/sources.list'
wget -q http://archive.ubuntustudio.org/ubuntustudio.gpg -O- | sudo apt-key add - && sudo apt-get update

Install the Synaptic package manager by typing or copying and pasting the following lines into the Terminal:

sudo apt-get install synaptic

Enable the extra repositories by opening Synaptic (use the program menu or type kdesudo synaptic into Terminal), then navigate through the top menu to Settings > Repositories and tick all items listed under “Downloads from the Internet”. Next, select the “Other Software” tab and tick all items but CD Rom. Close the Repositories dialogue, then reload the package list (click Reload in the menu bar or press Ctrl+R).

Use Synaptic’s search box to find and install each of these,

kubuntu-restricted-extras, libdvdcss2, w64codecs (w32codecs for non 64 bit systems), Firefox, VLC, Mplayer, GIMP (+ the GIMP repository), ImageMagick, Wine, LMMS (Linux MultiMedia Studio) and Quicktime.

The installation instructions written below for Flash and RealPlayer may not be easy for you to follow if you’re new to Linux; but take solace, my friend, there’s another 64 bit Flash and RealPlayer installation guide here at JournalXtra.

Download RealPlayer from the RealPlayer website and move it to your /opt folder (you might need to run Dolphin as root to do that: type kdesudo dolphin in Terminal). Unpack it if it’s compressed, right-click the extracted binary file (.bin) and click Properties. Under the Permissions tab, tick “Is executable” then click “OK”. Open a terminal, type cd /opt and press return, then type ./RealPlayer11Gold.bin to begin the installation process (change RealPlayer11Gold.bin to the name of the extracted binary file; the full stop before the forward slash (./) is important).

Install Flash via Synaptic unless you are using a 64 bit version of Kubuntu. 64 bit users should download 64 bit Flash from Adobe by clicking here, then remove any current Flash installations by entering the following line into Terminal:

sudo apt-get purge flashplugin-nonfree gnash gnash-common mozilla-plugin-gnash nspluginwrapper swfdec-mozilla

Next, unpack the downloaded Flash plugin and move it to /usr/lib/firefox/plugins, then start or restart Firefox and check that the plugin is properly installed by visiting Tools > Add-ons > Plugins and confirming that Shockwave Flash is listed.

To enable the proprietary (non-free) drivers, navigate the Programs Menu (Kickoff) to Applications > System > Hardware Drivers and activate the drivers relevant to your hardware.

WINE can be found in the Package Manager or installed with sudo apt-get install wine. For those looking for an easier way to use WINE, there’s a package called PlayOnLinux (a graphical front end that still relies on WINE), which can be installed by typing the following lines into Terminal:

sudo wget http://deb.playonlinux.com/playonlinux_karmic.list -O /etc/apt/sources.list.d/playonlinux.list
sudo apt-get update
sudo apt-get install playonlinux

GetDeb is another repository that provides titles not usually found in the official Ubuntu one (for example Acetone ISO). Instructions for adding it can be found here. The easiest method is to use the repository installer.

A Little Trivia

For anyone wondering from which part of the air the Kubuntu developers pluck the titles: the numbers represent the year and month of release, and the code name is an adjective and an animal name, both beginning with the same letter. The initial letter of the code name follows the order of the alphabet, so Intrepid came before Jaunty, which comes before Karmic. Karmic Koala 9.10 is to be released during the 10th month of 2009, and Jaunty Jackalope 9.04 was released during the 4th month of 2009.
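The naming rule is simple enough to write down as code. A toy sketch: the function names are hypothetical, release_date assumes a release from the year 2000 onwards, and alliterative assumes a two-word code name.

```python
def kubuntu_version(year, month):
    """Build a YY.MM version number from the release year and month."""
    return f"{year % 100}.{month:02d}"


def release_date(version):
    """Split a YY.MM version back into (year, month), assuming 20xx."""
    yy, mm = version.split(".")
    return 2000 + int(yy), int(mm)


def alliterative(codename):
    """True when the adjective and the animal share an initial letter."""
    adjective, animal = codename.split()
    return adjective[0].lower() == animal[0].lower()
```

So kubuntu_version(2009, 10) gives "9.10", kubuntu_version(2009, 4) gives "9.04", and alliterative("Karmic Koala") is True.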
