Sunday, September 12, 2010

Vim Makes a Sitemap File

This is probably the last tech (Vim) post I will be making, now that Traditional Hydrotherapy is online.

To try to jump-start Google into indexing the site, I made a sitemap.xml file manually. I'm not sure if it will work, but I submitted it to Google anyway.

The instructions in Manually Creating Sitemap Files were quite clear, but how could I do it quickly in Vim with over 900 files to index?

I used split screens, reading in files and, of course, a repeating macro.

I started by manually doing my home directory and making sure the parts worked. This was the longest part of the process and involved making some files (to read in) and the macro.

I had to produce code like this for each file:
<url>
<loc>http://www.traditionalhydrotherapy.com/Problems/VisceralCongestion.html</loc>
<lastmod>2010-08-17</lastmod>
</url>

I first made two files, "top.n" and "end.n", to hold the beginning and end of the url block.
This is top.n:
<url>
<loc>http://www.traditionalhydrotherapy.com/Problems/
This is what end.n looked like:
</loc>
<lastmod>2010-08-17</lastmod>
</url>
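
For anyone who would rather script this step than use read-in files, here is a minimal Python sketch of what top.n and end.n accomplish together: a fixed prefix and suffix wrapped around each page name. The function name url_entry and its parameters are my own choices for illustration, not anything from Vim; the base URL and date are the ones shown above.

```python
def url_entry(base, page, lastmod):
    """Build one sitemap <url> block from a directory URL and a page name."""
    top = "<url>\n<loc>" + base                                  # the role of top.n
    end = "</loc>\n<lastmod>" + lastmod + "</lastmod>\n</url>"   # the role of end.n
    return top + page + end


print(url_entry("http://www.traditionalhydrotherapy.com/Problems/",
                "VisceralCongestion.html", "2010-08-17"))
```

Changing the base argument per directory corresponds to editing top.n for each section of the site.
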
Then I opened "sitemap.xml" in the hydro home directory (where the file would end up) and split the screen to "Problems/PageIndex.html" (using :sp Problems/PageIndex.html). PageIndex consists of an alphabetical list of all the pages in the directory, in this case the Problems directory.

I had to get the cursor in the right place in both files. In the PageIndex window I moved to the line above the first file in the list, then used Ctrl-W p to move back to the sitemap file and "G" to go to the end of the file. Then I ran this macro:
:read top.n^MG$^Wpj0f"lyt"^Wpp:read end.n^MkJxG$
:read top.n^M - read in the contents of the top.n file (^M represents the single character for Enter; don't try to copy this line, as I've written out all the special characters as plain text. It's better to record your own keyboard macro.)

G$^Wpj0 - move the cursor to the end of the sitemap file, then Ctrl-W p (^W is Ctrl-W) to switch to the PageIndex file, then move down one line and to the beginning of the line.
f"lyt" - find the first quote on the line, move one character to the right and yank to the next quote (this simply copies the url of the file from the link).
^Wpp - Ctrl-W-p back to the sitemap file and put the yanked text (the url)
:read end.n^M - read in the contents of end.n
kJx - move up one line and join the next line, deleting the space.
G$ - move the cursor to the end of the file and end of the line, ready for the next iteration of the macro.

Of course, this macro was repeatable, so I just typed:
300@a
and it simply iterated down the list, up to 300 times (the macro was in the "a" register).
When it got to the "</ul>" at the end of the list of files, the macro just stopped, as there was no " on the line for f" to find.
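
The same loop can be sketched outside Vim. This is a rough Python equivalent, not what I actually ran: it takes the first double-quoted string on each line (the job of f"lyt") and stops at the first line that has none, just as the macro did. The function name pages_from_index and the sample lines are hypothetical; the real PageIndex.html markup may differ.

```python
import re


def pages_from_index(lines):
    """Collect the first double-quoted string on each line, stopping at the
    first line that contains no quoted string."""
    pages = []
    for line in lines:
        match = re.search(r'"([^"]*)"', line)
        if match is None:
            break  # comparable to f" failing, which aborts the Vim macro
        pages.append(match.group(1))
    return pages


# Hypothetical index lines, for illustration only.
sample = [
    '<li><a href="PageOne.html">Page One</a></li>',
    '<li><a href="PageTwo.html">Page Two</a></li>',
    '</ul>',
    '<p class="footer">never reached</p>',
]
print(pages_from_index(sample))  # ['PageOne.html', 'PageTwo.html']
```

Stopping at the first line with no quoted string is what makes an overshoot like 300@a harmless: nothing after the end of the list gets picked up.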

I had to modify top.n and end.n for each of the four sections (Diseases, Effects, Problems, and Techniques) but otherwise it was quite quick.

I indexed 955 files in all.

The last bits were easy: I just added the xml header and footer and the file was complete.
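
For reference, here is a small Python sketch of that wrapping step. The header and footer are the ones defined by the sitemaps.org protocol (an XML declaration plus a urlset element in the 0.9 namespace), which I am assuming is what any hand-made sitemap needs; the function name wrap_sitemap is made up for this example.

```python
def wrap_sitemap(url_blocks):
    """Wrap ready-made <url> blocks in the sitemap header and footer."""
    header = ('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    footer = '</urlset>\n'
    return header + "\n".join(url_blocks) + "\n" + footer


block = ("<url>\n"
         "<loc>http://www.traditionalhydrotherapy.com/Problems/VisceralCongestion.html</loc>\n"
         "<lastmod>2010-08-17</lastmod>\n"
         "</url>")
print(wrap_sitemap([block]))
```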

Originally it was over 130 KB, so I compressed it into a 7 KB .gz file (sitemap.xml.gz) and FTP'd it to the site. Then I used Google Webmaster Tools to submit it to Google.
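
The compression step can also be done from a script. Below is a minimal sketch using Python's standard gzip module, assuming the sitemap text is already in memory; the helper name gzip_text and the repeated sample entry (with a placeholder page name) are hypothetical, so the sizes it prints will not match my real file.

```python
import gzip


def gzip_text(text):
    """Return the gzip-compressed bytes of a text document (UTF-8 encoded)."""
    return gzip.compress(text.encode("utf-8"))


# Hypothetical stand-in for the real sitemap: one entry repeated 955 times.
entry = ("<url>\n"
         "<loc>http://www.traditionalhydrotherapy.com/Problems/Page.html</loc>\n"
         "<lastmod>2010-08-17</lastmod>\n"
         "</url>\n")
sample = entry * 955
packed = gzip_text(sample)
print(len(sample), "bytes ->", len(packed), "bytes")
```

Writing packed to a file opened in binary mode would give the equivalent of sitemap.xml.gz. Sitemaps compress well because every entry repeats the same tags and most of the same URL.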

Done. I think.
