
Friday 10 May 2013

Practical use of macros and spreadsheets

This follows on from Planning a book with a spreadsheet.  It will make more sense if you read that first.  There are two more to follow: the links are at the end.

A word of warning: this is written for the sort of people who think "hacker" is an honourable designation, people who consider Notepad an excellent application for writing web pages, the sort of people who grok this:


Real programmers type COPY CON PROGRAM.EXE

This item is based on tricks I developed while writing Australia's Pioneers, Heroes and Fools, ISBN 9781741960488, published November 1, 2007 by Pier 9 in Australia.

Cover, Australia's Pioneers, Heroes and Fools

That book owed a lot to a lot of people. Most of them gave their time selflessly to contribute to Australian scholarship. I used what they did to make a book, but I believe it is worth acknowledging and explaining what I did.

This explanation is for people comfortable with macros and jumping text from Word to Excel and back again. If this is not you, don't go past the fourth paragraph!


In the first place, I owe thanks to the people who collected and collated a large number of historical records which were printed in limited editions, a few of which I found in libraries. Then there were the volunteers who transcribed some of the mss, saving me from having to puzzle through unfamiliar handwriting.

Most of all, though, I owe a debt of gratitude to the people at Project Gutenberg Australia. These lovely people have scanned and proofed most of the journals of the explorers, and that was my starting point for a database. I added text from original material and even papers, pamphlets and transcribed documents, but my major source was Project Gutenberg Australia.


If I describe what I did with their pieces, that effectively covers what I did with the mss and other sources as well, so read this with a bit of imagination.


First, I downloaded the text files, then I trimmed the Project Gutenberg end matter and pasted the rest into Microsoft Word. I have a standard macro which takes all double carriage returns and converts them to &&##, then it converts all single carriage returns to a space, before converting &&## to a carriage return. That fixed almost all of the paragraphs, but next, the macro replaced any double carriage returns with singles (3 times) and replaced double spaces with singles (5 times).
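The author's macro is in Word VBA, but the logic of that replacement sequence can be sketched in Python, with &&## as the sentinel string:

```python
def fix_paragraphs(text):
    # Protect the real paragraph breaks (double carriage returns) with a sentinel.
    text = text.replace("\n\n", "&&##")
    # Collapse the remaining single line breaks into spaces.
    text = text.replace("\n", " ")
    # Restore the real paragraph breaks.
    text = text.replace("&&##", "\n")
    # Mop up: collapse doubled breaks (3 passes) and doubled spaces (5 passes).
    for _ in range(3):
        text = text.replace("\n\n", "\n")
    for _ in range(5):
        text = text.replace("  ", " ")
    return text
```

The sentinel matters: without it, there would be no way to tell a hard-wrapped line ending from a genuine paragraph break once the single returns are gone.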


This left me with fairly neat text, which I was ready to work on. My aim was to select paragraphs to go into a database. I faced a major problem, in that spelling often differed: a crocodile in one account might be an alligator in another, or even an alegator! So I decided to add keyword codes to the entries.


I also needed to attach a date, the name of the explorer and other details, and I knew that with some 5300 entries, mistakes would creep in, unless I let the machine do the hack work. I needed to be able to sort by day of month, month and year, so these were separate fields.


I will let you in on a secret: I have been living with repetitive strain injury since 1981, and I maintain a gruelling writing schedule. I survive because I know how to make the machine do the work. I decided that rather than enter into Excel by pasting, I would do the initial pasting in Word, so I set up a couple of macros to add tab-delimited fields to a paragraph. Later, these fields would distribute into different columns.


I created a paragraph header like this: 

21{TAB}July{TAB}1847{TAB}John{TAB}Smith{TAB} -- in this case, the first entry in Smith's journal was 21 July 1847.


I copied this, and then set up a macro that added that to the front of a paragraph, highlighted the paragraph and changed the text to blue (to make it stand out -- my colour vision is deficient, and blue is better than red) and then added a string of tabs at the end. At the right point for the keyword codes, I added a single backslash -- I will come back to this later.
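A rough Python equivalent of that header-stamping step (the field order follows the sample header above; the number of trailing tabs is an assumption, since the original count is not stated):

```python
def stamp(paragraph, day, month, year, first, last, n_trailing_tabs=5):
    # Prefix the paragraph with the tab-delimited date and name fields.
    header = f"{day}\t{month}\t{year}\t{first}\t{last}\t"
    # Trailing tabs reserve empty columns; the lone backslash marks the slot
    # where the mnemonic keyword codes will be typed later.
    return header + paragraph + "\t" * n_trailing_tabs + "\\"
```

Pasted into Excel later, each tab becomes a column boundary, so the fields distribute themselves with no retyping.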


Then I started to go through the journal. If I chose a whole paragraph, I just ran the macro; if the passage I wanted was less than a paragraph, I selected the portion I wanted, made it into a paragraph, and ran the macro.


If there was more than one selection from a given date, I had a field, just after date, where I manually added a sequential number -- in the sorting that would follow, I wanted to make sure paras could always be returned to their order.


When the date changed, I pasted the old header, changed the date, and copied it to the clipboard.


When I had been right through the journal, I sorted the paragraphs. Then I cut the ones at the top, all the ones starting with dates, and pasted them to a new file before closing the old file (I did not want to save THOSE changes!).


Now I was ready to add keywords, but I did not need massive amounts of typing and occasional typos that make a mess of later searches, so I used mnemonic codes. \ta was transport animals, \tac and \tah were transport using camels and transport using horses and so on. So now I could add labels like \dac for dangerous animals crocodiles or \das which was snakes, and so on. Now I went to the leading backslash and added the appropriate codes. There were no spaces anywhere in the codes.


Now I was ready to paste the results into an Excel scratchpad where the dates and sequential numbers were converted to a unique value that I could always sort on to get the entries back in order. Then I could add the journal name in the top row and copy that down, and add a unique record accession number before pasting back into Word as unformatted text.
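The exact key format the author used is not given, but one plausible way to fold date and sequence number into a single sortable value looks like this in Python:

```python
# Month names map to numbers so that July sorts after June, not after January.
MONTHS = {"January": 1, "February": 2, "March": 3, "April": 4, "May": 5,
          "June": 6, "July": 7, "August": 8, "September": 9,
          "October": 10, "November": 11, "December": 12}

def sort_key(day, month, year, seq):
    # Pack year, month, day and the manual sequence number into one integer,
    # e.g. 21 July 1847, entry 1 -> 1847072101. Sorting on this value always
    # restores the entries to their original order.
    return year * 10**6 + MONTHS[month] * 10**4 + day * 10**2 + seq
```

In the spreadsheet itself, a formula concatenating zero-padded fields does the same job.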


There usually seemed to be a few extra clean-ups needed to remove extraneous spaces, then I was ready to add the data to the database. While I sometimes ported the results into Access, for most purposes, a flatfile database in Excel was all I needed, and I made Excel the native format.


I repeated this for each volume, and then I was ready to convert the keyword codes, which were in a separate Excel worksheet. The codes were in column A, the meaning was in column B.


Creating a macro to convert 200 codes was a bit hairy. First, I had to sort the codes in Excel, according to length: it was essential to convert \tahf before \tah before \ta -- think about it! Now macros in Word (yes, Word -- bear with me!) are large and sloppy. Trial and error revealed that macros converting more than 50 codes fell over. I also knew that macro conversion is tedious, so I wrote a macro to convert just three codes and broke into it.
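The longest-first rule is easy to see in a Python sketch. The meanings of \ta and \tah come from the text above; the meaning of \tahf is a guess for illustration, since only the code itself is mentioned:

```python
codes = {
    r"\ta": "transport animals",
    r"\tah": "transport using horses",
    r"\tahf": "transport using horses (feed)",  # meaning is a guess
}

def expand(text, codes):
    # Replace longest codes first: if \ta were replaced before \tah,
    # every \tah in the text would be mangled into "transport animals" + "h".
    for code in sorted(codes, key=len, reverse=True):
        text = text.replace(code, codes[code])
    return text
```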


Once you have some repetition, parsing the code is easy. I identified the whole sequence, then using the =CONCATENATE command in Excel, I put the entire code for a conversion in column C. There were a few problems like line breaks and quote marks, but I used stray symbols for those. Then I copied all the conversion codes into Word and replaced the stray symbols with the necessary line breaks and quote marks.
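The CONCATENATE step amounts to this: for each row, glue the code (column A) and its meaning (column B) into one line of macro text. A Python sketch of the column C formula, where the DoReplace syntax is illustrative only, not the author's actual recorded VBA:

```python
def conversion_line(code, meaning):
    # Build one line of find-and-replace macro text from a code/meaning pair.
    # "DoReplace" stands in for whatever the recorded macro's replace call was.
    return f'    DoReplace "{code}", "{meaning}"'
```

Generate 200 of these, and the tedious part of writing the macro is done by the spreadsheet.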


Now I was ready: I selected and copied the first 50 codes, opened my macro with the editor and pasted over the codes.


I now opened a new file, and carefully copied the key codes column and pasted it into that file, ran a search and destroy to remove ANY spaces and ran the macro. Then I went back to the codes file, moved the second fifty codes into the macro, went to the other file and ran the macro, and so on. At the end, I ran a check for any outstanding back-slashes. These were the invalid codes, and each was then converted manually, once I worked out what it should be.
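That fifty-at-a-time shuffle is just batching. A minimal sketch in Python, with 50 as the practical per-macro limit the author found by trial and error:

```python
def batches(items, size=50):
    # Split the code list into chunks of at most `size`,
    # one chunk per run of the conversion macro.
    return [items[i:i + size] for i in range(0, len(items), size)]
```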


Next, I took the file of converted codes, and pasted them back into Excel. Now I had the tool I needed, ready to group the events of a given day, or to find the instances of dangerous insects, or whatever. Everything was in Excel, and could be manipulated into other formats as needed.


And now for something completely different: 


Handling bibliographies.


The figure above shows how I set my bibliographies out in a spreadsheet. The important thing to notice is Column G, where all of the information stored in the other columns is pulled together by an =CONCATENATE command. There are three of these: one for books, one for journals and one for web resources, and they read as follows:

Book resources (from G2)

=CONCATENATE(B2&", "&C2&". "&D2&": "&E2&", "&F2&".")

Journals and periodicals (from G425)

=CONCATENATE(B425&", '"&C425&"'. "&E425&", "&F425&".")

Web resources (from G433)

=CONCATENATE(B433&", "&C433&". "&D433&", "&F433&", last viewed "&E433&".")
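To make the book formula concrete, here is its logic in Python, assuming columns B to F hold author, title, place, publisher and year (an assumption; the column meanings are not spelled out in the text):

```python
def book_citation(author, title, place, publisher, year):
    # Mirrors the G2 formula: B2&", "&C2&". "&D2&": "&E2&", "&F2&"."
    return f"{author}, {title}. {place}: {publisher}, {year}."
```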


From there, you are on your own. Note that Excel is very unforgiving about the syntax in these strings. Extra spaces, missing brackets and forgotten ampersands will all make it baulk.

So why do I bother, when there is brilliant software that does this sort of thing? It wasn't around when I started, and because I always use the same format and often come back to the same references in other works, it suits me to be able to port stuff across, rather than use EndNote or whatever. That's just me.


One of the good things about this is that I can assign priorities in another column, sort by that or by the "seen" field, and within a given library, sort by call number, making shelf searches easier. A file name for a PDF or a GIF is my short-hand way of reminding myself that the file in question is on my computer. There is also a comments field for me to note the references which proved to be absolutely useless. You need that, sometimes.


As indicated, there is more to this series, which actually began with Effective Science Writing, Part 1.


Planning a book with a spreadsheet;
Macro tricks in Word, Part 1; and 
Macro tricks in Word, part 2.
