[Just happened upon Joshua Tallent's very similar technique for creating indexes in his Kindle Formatting book. For the record, I wrote this before I ever saw that section in his book. Great minds think alike! :)]
I really like indexes. I wish fiction books had them, but I believe non-fiction books MUST have them, especially mine. I'm intrigued by the possibilities for indexes with ebooks, but wasn't able to wait for the fancy stuff to be ready for my new book about EPUB. So, I created my own.
I use InDesign to prepare my EPUBs. Curiously, InDesign won't export either the TOC or the index to its EPUB file. They simply don't appear, with or without links. So, I had to get creative.
The first problem was how to get the contents of my already created index out of InDesign. If you copy the index out to a text document, you don't get any of the formatting. You can export to Dreamweaver, and it will create an XHTML document, including everything but your TOC and index.
The solution is to copy the index and paste it into a new InDesign document. Then choose File > Export for... > Dreamweaver. You'll have successfully fooled InDesign into thinking the index was just normal text and it will create a nice XHTML file for you, with properly formatted index entries.
I'll leave it up to you to add the proper CSS so that your index looks snazzy.
The real trouble is how to make the links work. There's no way to know what the pagination will be like in an EPUB document. An EPUB can be viewed on screens of all different sizes, and your readers can change the font size, and thus the pagination, as many times as they like. Page 23 in an EPUB might be half of page 16 in the print edition one day, and all of page 30 the next!
So, how do you link the index to the referenced pages without totally recreating the index entries one by one? (Not even an index lover like me is going to do that.)
My solution was to mark the actual physical print pages in the EPUB. This has two benefits. First, if ereaders ever get around to supporting the pageList function, I'll be able to use it to relate the print edition to the EPUB edition (which could be useful in classrooms, for example, to get everyone to the same content). For the record, Apple recommends the use of pageList, though I can't see that iBooks does anything with the information, at least not yet. Second, and more to the point here, the page markers give the page numbers in the index something to link to.
My book only has 192 pages, so I decided the fastest way to mark the beginning of each physical page in the EPUB was by hand, inserting <span id="pxx"></span>, where xx is the appropriate page number. It certainly took a lot less time than creating a target for each and every index entry (which still wouldn't have solved the problem of compiling multiple references into a single entry with multiple page references).
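Since the markers go in by hand, it's easy to skip one. Here is a minimal Python sketch of a check for that. It is not part of my InDesign workflow, and the function name missing_markers is made up for illustration. It assumes the markers are written exactly as shown above.

```python
import re

# Matches the hand-inserted page markers, e.g. <span id="p23"></span>
MARKER = re.compile(r'<span id="p(\d+)"></span>')

def missing_markers(xhtml, first, last):
    """Return the page numbers in first..last that have no marker in xhtml."""
    found = {int(number) for number in MARKER.findall(xhtml)}
    return [page for page in range(first, last + 1) if page not in found]

# Tiny made-up fragment: page 8 was forgotten.
fragment = '<p><span id="p7"></span>Some text<span id="p9"></span>more text</p>'
print(missing_markers(fragment, 7, 9))
```

You would feed it the text of each XHTML file along with the first and last print page that file is supposed to contain.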
Next, I used GREP to convert my XHTML index entries into links to the now-marked pages.
The first step was to change the very few index entries that had numbers in the text itself. For example, I changed HTML5 temporarily into "HTMLfive" so that the 5 wouldn't be affected by my global changes later.
Since my EPUB has five separate XHTML files that might contain index entries, I had to use five different GREP expressions.
Here's the first one:
Search For:
( |–)(7|8|9|1[0-8])(,|</p>|–)
(Which means: search for a string that starts with a space or an en dash, followed by either a 7, 8, 9, or a 1 followed by one of 0-8, and then followed by either a comma, a closing p element, or an en dash. In plainer English: find me any index entry number from 7 to 18, the pages in my introduction. I limited it to numbers after a space or en dash and before a comma, closing </p>, or en dash so as not to grab two-digit numbers within three-digit ones.)
Replace with:
\1<a href="ePub-STTP-1.xhtml#p\2">\2</a>\3
This Replace phrase references the three things we found in the Search phrase. The first was either a space or an en dash. The second is the page number, and the third is either a comma, closing p element, or an en dash. And so we'll replace what we've found with either the space or the en dash, followed by the opening of a link that references the XHTML document that contains pages 7-18 (the introduction), followed by a hash symbol and the letter p, followed by the page number we found, followed by the end of the opening a tag, followed by the page number we found as the clickable text, followed by </a>, followed by the comma, closing p element, or en dash.
You have to run it twice. In a page range like 7–9, the first pass links the number before the en dash and, in doing so, uses up the en dash as part of that match. In my tests, a single pass couldn't then find that same en dash as the start of the next match (which makes sense, since it had already been claimed by the previous one), so the second pass picks up the numbers after the en dash.
And poof, all the index entries for the introduction are done.
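If you'd rather script this than run it in a text editor, here is a rough Python equivalent of the search and replace above. The pattern, the replacement, and the file name come straight from this post. The function name link_pages and the sample entry are made up for illustration, and Python's re module stands in for whatever GREP tool you use.

```python
import re

# Search and Replace phrases copied from above; Python also writes
# back-references as \1, \2, \3 in the replacement string.
PATTERN = re.compile(r"( |–)(7|8|9|1[0-8])(,|</p>|–)")
REPLACEMENT = r'\1<a href="ePub-STTP-1.xhtml#p\2">\2</a>\3'

def link_pages(xhtml):
    """Run one pass of the search and replace over the index markup."""
    return PATTERN.sub(REPLACEMENT, xhtml)

sample = "<p>sample entry 7–9, 12</p>"

first_pass = link_pages(sample)       # links 7 and 12, but not the 9 after the en dash
second_pass = link_pages(first_pass)  # now the 9 is linked as well

print(first_pass)
print(second_pass)
```

The first pass leaves the 9 alone because the en dash in front of it was already used up by the match for 7. The second pass catches it, and a third pass changes nothing.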
Now, we'll do the index entries for the first chapter, which, in this example, goes from page 19 to 44.
We'll Search for:
( |–)(19|2[0-9]|3[0-9]|4[0-4])(,|</p>|–)
The only thing different here is the page numbers we're looking for. Here we want 19, or a 2 followed by 0-9 (that is, all the 20s), or a 3 followed by 0-9 (the 30s), or a 4 followed by 0-4 (from 40–44).
And we'll replace it with:
\1<a href="ePub-STTP-2.xhtml#p\2">\2</a>\3
Which is the same as the first replacement text, except that this time it's the XHTML file that contains pages 19-44.
I won't bore you with the three remaining pairs of search and replaces, although I'm happy to share them if you need them.
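For anyone scripting this, the separate expressions could probably be folded into one pass with a lookup table. The sketch below is a different technique from the one I actually used: it swaps the consumed delimiters for lookbehind and lookahead, so both ends of a range get linked in one go, and it picks the target file with a small function. Only the two page ranges described above are filled in, and the names PAGE_FILES, file_for, and link_index are made up for illustration.

```python
import re

# Page ranges and file names for the two parts described in this post.
# Rows for the remaining files would be added in the same shape.
PAGE_FILES = [
    (7, 18, "ePub-STTP-1.xhtml"),
    (19, 44, "ePub-STTP-2.xhtml"),
]

# Lookbehind and lookahead check the delimiters without consuming them,
# so both numbers in a range such as 19–23 match in a single pass.
PAGE_REF = re.compile(r"(?<=[ –])(\d{1,3})(?=,|</p>|–)")

def file_for(page):
    """Return the XHTML file that holds this print page, or None if unknown."""
    for first, last, name in PAGE_FILES:
        if first <= page <= last:
            return name
    return None

def _link(match):
    page = match.group(1)
    target = file_for(int(page))
    if target is None:
        return page  # no file listed for this page, so leave it as plain text
    return f'<a href="{target}#p{page}">{page}</a>'

def link_index(xhtml):
    """Turn every recognized page number in the index markup into a link."""
    return PAGE_REF.sub(_link, xhtml)

print(link_index("<p>sample entry 9, 19–23, 150</p>"))
```

In the sample, 9 gets a link into the first file, 19 and 23 get links into the second, and 150 stays as plain text because no range in the table covers it yet.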
Of course, you may need to adjust your search patterns to match your own index's formatting. And if you have cross references (see this topic, or see also that topic), you should change those into links as well.
Then I changed those few index entries that had numbers in the text (like HTMLfive) back into numbers (HTML5).
I was done with the links themselves, and all that was left was to finish off the XHTML code (the head section, closing body and html tags, etc.), and then add it to the content.opf and toc.ncx files (as described in my book).
Once compiled into the EPUB it looks like this:
Find the entry you're interested in, click the page number, and away you go!
This is not a perfect system. Since any ereader might divide what used to be a print page into various ereader pages, showing the beginning of the printed page may or may not include the referenced topic. The reader may have to turn an electronic page to find the topic. I added a note to the index to explain this.
Still, it's pretty good. It's lovely to be able to follow contextual index entries and find what you're looking for without having to search, say, every instance of "content.opf".
By the way, I have posted the complete index for my new EPUB book. (The links are not live on the website, since they'd have no place to go.)