Thursday, September 30, 2010

Why I Crack Open InDesign EPUBs


InDesign CS5 is currently my favorite tool for generating EPUB files from existing documents. It's probably no coincidence that I'm a longtime InDesign user in general, quite comfortable with its features and idiosyncrasies. In fact, I don't use a standalone word processor at all. I hadn't bought Word for years until I needed it for examples for my latest book. I use InDesign even for the most basic letters. And InDesign's EPUB generation is pretty good: it generates all the persnickety XML files and exports a fair bit of the design from your original document.

But it could do so much more.

Here's why I currently have to crack open the EPUB files that InDesign generates (solutions to all these problems are in my book). Let's hope the next version of InDesign does all this for me!

1. First and foremost, because the EPUB doesn't validate. To make it validate, I must add the dc:date element.

2. Because InDesign doesn't properly designate an image or given page as a cover. This is huge!

3. To add metadata. Without standard, accurate metadata, you might as well not publish an ebook because no-one will be able to find it.

4. In order to apply text wrap to images and sidebars. Since InDesign supports text wrap around individual images, groups of images and captions, and sidebars of text, it should also be able to export it to the EPUB (especially considering that only small adjustments to the resulting XHTML and CSS are required).

5. To wrap text around drop caps.

6. To format first line of text (what in InDesign is a Nested Style) so that it maintains formatting in EPUB.

7. To specify which paragraphs should be kept together, for example, images with captions, headers with first paragraph following, etc. Again, InDesign supports keep together but doesn't export to EPUB.

8. To specify page breaks. Although iBooks does not yet support page breaks (which is incredible in itself), other ereaders do. This is a major concern of ebook publishers.

9. To specify widows and orphans. One more instance where InDesign already has the functionality but does not support it in export to EPUB. Hugely important for ebook publishers (even though not yet supported in iBooks, it is supported by other ereaders).

10. Because Apple says, on page 15 of their Publishing Guidelines, that "All book elements (for example, the cover, table of contents, first chapter, index, and so on) must be identified in the <guide> block." Note the use of the word "must". Note that <guide> is optional according to the current version of EPUB, and Apple has not yet rejected books without <guide> blocks, to my knowledge.

11. To add pagination guidelines, so that readers can find passages from a particular page of a print edition in their ebook edition (e.g., in a college class).

12. To eliminate erroneous default values that InDesign applies to non-specified properties (which mess up CSS inheritance).

13. To correct/use font names supported by the iPad.

14. To add Zapf Dingbats (which must be added as UTF-8).

15. To add text in foreign languages, like Hebrew, Arabic, Thai, Chinese, Japanese, etc.

16. To fix the spacing bug that occurs when exporting headers (see page 163 in the print edition of my EPUB book).

17. To force iBooks to support left-aligned text (by inserting a span tag within every p element). Granted, this is a hack, but text will continue to look really awful when fully justified until we have decent hyphenation dictionaries in ereaders.

18. To change spacing and margin units from ems to percentages in order to adapt to smaller screens. This is a complicated issue, and will probably get more complicated, but also hugely important.

19. To deal with hyphenation.

20. To fix or insert links. Currently, EPUB export breaks if you try to export a link that contains an ampersand, and most links to Amazon to buy books contain ampersands! Update (3/Oct/2011): CS5.5 exports links from multiple chapters correctly!

21. To insert video or audio. I would recommend following Apple's lead on this, using HTML5 tags, but allowing for different formats.

22. To maintain the inheritance among styles in the InDesign document in the generated styles in the CSS. I know this is complicated, but if I can do it with a little bit of GREP, couldn't ID do it automatically?
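
As an illustration of the first item on the list, here is a minimal Python sketch of adding the missing dc:date element. It is only one way to do it, and it makes several assumptions: the OPF file has already been pulled out of the EPUB and read into a string, the metadata block closes with an unprefixed </metadata> tag, and the dc: prefix is already declared there. The function name add_dc_date and the sample date are made up for this example.

```python
import re


def add_dc_date(opf_text, date):
    """Return the OPF text with a dc:date element added if it has none.

    add_dc_date is a hypothetical helper for this sketch. `date` is a
    plain string such as "2010-09-30".
    """
    # Leave the file alone if a dc:date element is already there.
    if re.search(r"<dc:date\b", opf_text):
        return opf_text

    # Assumption: the metadata block ends with an unprefixed closing tag.
    closing_tag = "</metadata>"
    index = opf_text.find(closing_tag)
    if index == -1:
        raise ValueError("no </metadata> closing tag found in the OPF text")

    # Insert the new element just before the end of the metadata block.
    new_element = f"<dc:date>{date}</dc:date>\n"
    return opf_text[:index] + new_element + opf_text[index:]
```

The sketch edits the text directly instead of parsing and rewriting the XML, so everything else in the file stays exactly as InDesign wrote it. The edited OPF still has to be put back into the EPUB and checked with EpubCheck.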

Most of these issues are described (and solved with workarounds) in my EPUB book. But wouldn't it be nice if InDesign just did this for us? What do you wish InDesign could do?

Monday, September 27, 2010

Pages exports video in EPUB incorrectly

One of my big surprises while testing Pages was the way it treats video. Apple claims on its website that Pages supports exporting video in its EPUB files, and thus, Pages would seem like a great tool for creating enhanced ebooks.

Unfortunately, it's not. And frankly, it just looks like sloppy coding to me. Honestly, I don't understand it.

I created a new Pages document and dragged a video into it, while holding down the Command key, so that it would be "inline" (required for Pages to export to EPUB).

I then exported the resulting document to EPUB. And although the EPUB file passed EpubCheck validation, the video doesn't actually play in iBooks.

Video has Broken Play in Pages EPUB

That seemed strange to me, so I opened up the EPUB file and looked at the XHTML files generated by Pages. They don't follow Apple's own recommendations!

The problem seems to be that the video is embedded in the wrong format: .mov instead of .m4v. And the file is declared incorrectly in the manifest in the opf file: video/quicktime instead of video/mpeg4.

Indeed, if I make these small adjustments, the video plays perfectly fine. (You can get full details from my book: EPUB: Straight to the Point.)
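
For anyone who wants to script the manifest half of that fix, here is a rough Python sketch. It assumes the OPF file has been read into a string, that attributes are written with double quotes, and that every item declared as video/quicktime really is a video. The function name fix_video_items is invented for this example, and the new media type is simply the value mentioned above. The video file itself, and any reference to it in the XHTML, would still need the matching .m4v name.

```python
import re

# Matches each <item ...> tag in the manifest, whatever its attribute order.
ITEM_TAG = re.compile(r"<item\b[^>]*>")


def fix_video_items(opf_text):
    """Rewrite manifest items declared as QuickTime video.

    fix_video_items is a hypothetical helper for this sketch. It changes
    the .mov extension in href to .m4v and swaps the media type for the
    one named in the post.
    """
    def fix(match):
        tag = match.group(0)
        # Assumption: attributes use double quotes, as in the values below.
        if 'media-type="video/quicktime"' not in tag:
            return tag
        tag = tag.replace('media-type="video/quicktime"',
                          'media-type="video/mpeg4"')
        # Only touch the extension at the very end of the href value.
        return re.sub(r'(href="[^"]*)\.mov"', r'\1.m4v"', tag)

    return ITEM_TAG.sub(fix, opf_text)
```

Items with any other media type pass through untouched, so the function can be run over the whole OPF file.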

Also note that although the above referenced document doesn't claim that Pages supports audio, Pages does, sort of. It actually exports audio to EPUB as a QuickTime file as well, but somehow it works anyway. Nevertheless, it doesn't follow Apple's own recommendations as outlined in the "iBookstore: Publisher User Guide 1.3.1", which state that audio should have the .m4a extension. And Apple's sample file uses "audio/mp4" as the proper mime type in the OPF file, while Pages exports as "video/quicktime".

It's downright frustrating that enhanced EPUB files created with Apple's own tool don't work in Apple's own ereader!

On using Pages for EPUB (part 1)

Apple recently announced that they had added EPUB export to their word-processing software, Pages. If you already own Pages, you just have to go to Software Update to get the new features. My first tests show that it creates perfectly reasonable EPUB files that pass EpubCheck 1.05 with no errors.

Several people have asked me to do a quick rundown on the usefulness of this feature, and so I've been playing with it a bit. But lately, I've been having trouble doing anything quickly, so instead of obsessing about getting this just right, I'm going to write a series of blog posts as I pull out all the details.

Before I write anything else, I have to tell you that I'm used to laying out technical books in InDesign. With that in mind, I will try not to complain about (or even mention) Pages' weaknesses as a layout program, and instead concentrate on how it converts its documents to EPUB format.

Use Word Processing Templates

First and foremost, you should always choose Word Processing templates (or blank ones) when doing an EPUB project. Pages won't let you export documents based on Page Layout templates.

Use Inline Images (not Floating)

Second, most of Pages’ templates include so-called “floating images” which do not export to EPUB. You either have to choose a template with inline images, or convert the images yourself before exporting. For the record, Pages uses the word floating in a way that is quite confusing for those already versed in web design, where floating has to do with text wrap. What Pages means by floating is much more akin to “absolute positioning” in CSS: the image is not linked to the text, and is instead fixed to a particular area of the page (where it “floats”). If you add more text around or on top of the image, its position is not affected. Since there are no physical pages in an exported EPUB, it doesn't make sense to export a “floating” image.

An “inline image” in contrast is anchored to a specific spot in the text. Add more text before or after that point and the image will move to a new position on the page. EPUB documents are rivers of text and inline images.

You can also choose whether or not text should wrap around your inline images, which I describe in more detail in the next section.

Note that to add inline images to your document, you must hold down the Command key while dragging an image from the Media box. You can also select an existing image, go to the Wrap section of the Inspector, and choose Inline (moves with text).

It's rather easy to stick images between words. That'll look OK in Pages,

images within text in Pages

but it won't look at all right in EPUB.

Image within text, in iPad

Be sure that the anchor symbol is at the beginning of the paragraph.

Wrapping Text around Images

Pages does support wrapping text around images, and even exports such wrapping to EPUB (in contrast with InDesign). Simply go to the Wrap section of the Inspector and choose the desired wrap configuration.

Unfortunately, you have to assign the same amount of extra space to three sides of a wrapped image. That is, if the image is wrapped to the left, and you accept the default value of 12pts for the Extra Space in the Wrap Inspector, Pages will export the image with a style that assigns the margin-bottom, margin-top, and margin-right to precisely 2.5641%, which for the margin-top and margin-bottom is exactly 2.5641% too much, especially if the image should be aligned with the top of the paragraph. The result isn't so pretty:

Text Wrap in Pages in EPUB

You have to decide whether you'd rather have the text jammed up next to the image on all three sides, whether you'd rather have extra space on all three sides, or whether you're willing to go into the EPUB and manually change the CSS (which you can read about in my book).
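
If you do go the manual route, the change itself is small. Here is a rough Python sketch of the idea, under some loud assumptions: that the margins live in a stylesheet rule (not in an inline style attribute), that you already know the selector of that rule, and that zero is the value you want for the top and bottom margins. The selector ".wrapped" used below is invented for this example and is not what Pages actually writes; zero_vertical_margins is likewise a made-up name.

```python
import re


def zero_vertical_margins(css_text, selector):
    """Set margin-top and margin-bottom to 0 inside one CSS rule.

    zero_vertical_margins is a hypothetical helper for this sketch.
    `selector` is the exact selector text of the rule to change.
    """
    # Capture the rule's opening, its declarations, and its closing brace.
    rule = re.compile(r"(" + re.escape(selector) + r"\s*\{)([^}]*)(\})")

    def fix(match):
        # Replace only the values of the two vertical margin properties.
        body = re.sub(r"(margin-(?:top|bottom)\s*:\s*)[^;}]+",
                      r"\g<1>0", match.group(2))
        return match.group(1) + body + match.group(3)

    return rule.sub(fix, css_text)


css = (".wrapped { float: left; margin-top: 2.5641%; "
       "margin-right: 2.5641%; margin-bottom: 2.5641%; }")
print(zero_vertical_margins(css, ".wrapped"))
```

The margin-right value is left alone, so the text still keeps its distance from the side of the image while the top of the image can line up with the top of the paragraph.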

Creating a Cover

Pages has the useful but perhaps tricky option of converting the first page of your document into the cover of your EPUB file. It basically takes a screenshot of all the images and text on the first page and saves it in PNG format, and then designates the PNG file as the cover. This is significant for a few reasons. First, this is the only place in your EPUB document where you can maintain the exact layout of the original page. If your text begins in the middle of the page, it will begin in the middle of the page in the cover as well. If the image takes up half the page, it will continue to do so. This means that you should be careful to choose a section that has a layout that will make sense as a cover. In particular, if you reduce it to a tiny icon, will it still be legible?

And unfortunately, the quality of the resulting PNG file is less than stellar. The only place you can really see the cover in a large size is opposite the table of contents when you're holding the iPad horizontally. In my rough example, it's pretty blurry:

Cover from Pages in iPad

I guess I'd recommend using really big text, and not very much of it.

If you don't use the first page as the cover, Pages makes a screenshot of your first page anyway to display opposite the table of contents, though it uses a generic icon of the book in the book list.

Aligning text (centering, justification)

Here, Apple falls into its own trap. You can set the alignment (centering, justification, or just plain left or right) in Pages, and it even exports it properly, but because Apple displays all text as justified in iBooks unless you use special tricks, the text in the exported EPUB won't be aligned as you wished.

CSS

Pages exports information for the following CSS properties for each and every paragraph (you can forget the Cascade): color, font-size, font-style, font-variant, font-weight, letter-spacing, margin-bottom, margin-top, padding-left, padding-right, text-align, text-decoration, text-indent, text-transform.

Not only that but it gives values to 4 decimal places! Four! A retina display for the iPad must be coming soon...

I write in my book about the perils of specifying too many values in the CSS: basically, you're breaking the Cascade of inheritance from which CSS gets its strength.

Breaks

You can choose to create section breaks, page breaks, column breaks, and layout breaks. The most you'll ever get is a <br style="clear: both;" />. That's not going to get you an actual page break, or any other kind, but it will keep the text that follows from wrapping around any previous elements. I've found that the Layout Break is the most useful for this function as it doesn't add extra page breaks in Pages (which, of course, does support them).

(To be continued)

Tuesday, September 21, 2010

A Proud Internaut

[Cross posted from A Year in Barcelona.]

When we lived in the States and there were considerably fewer Catalans around with whom to keep up the language, I used to look around on the internet for ways to practice. Several years ago, I found a radio show called L’Internauta, whose tagline I just love: “a program about technology, but more importantly, about what people do with technology”. I would download the podcasts, and listen to them in my car, incongruously hearing Catalan while seeing America.

But it wasn’t just language practice. The program is directed by a guy named Vicent Partal, who also began and runs one of the most important sources of Catalan news online, if not the most important: Vilaweb. He is a great radio host: knowledgeable and self-effacing, warm and personable. Each week he finds really interesting people to talk with on the show: a group beginning a twitter-based book club, a Catalan programmer-cum-entrepreneur who has written award-winning tools for social marketing, a rural group that used the internet as an organizational tool, and many more. And there are also regular guests, who help direct the conversation to the topics of the week, be they new products, viruses, or general technology news. It always seemed like they were talking about just what I needed to know more about, and I learned a lot. OK, and I can't deny I'm intrigued by how Vicent, who is from Valencia, conjugates verbs!

Those of you who've read my books know that I tend to use a lot of examples from Catalunya. Usually, I just use my own photographs in order to avoid dealing with rights issues, but several years ago, I needed an example of a video on YouTube. I'd never corresponded with Vicent before, but I thought I'd ask him if he wouldn't mind if I used one of Vilaweb's broadcasts. I'm not sure what he thought of this American stranger writing him in Catalan with such an unusual request, but he kindly gave me permission and you can see the results on pages 306-7 of the Sixth Edition of my HTML book.

Over the next few years, as I listened more assiduously to L’Internauta, and read Vicent's editorials in Vilaweb, often about Catalunya's struggle for independence, I would write Vicent emails from time to time. Probably “fan mails” would be more accurate. I was just an admirer, impressed with the technology, his political insight, and his obvious caring about people. Once in a while, I would suggest some collaboration, and even tried translating Vilaweb articles into English for a time.

It's a curious relationship. There are a few people who follow me in this way too, folks who have used my books to start their careers and then write me letters every once in a while, and update me on what they're doing. I think it's part of how the internet works, and I like it. Somehow it connects us all together.

A few weeks ago, they opened a new Apple Store in Barcelona. I used the purchase of Pages (which can now export to EPUB!) as an excuse to go visit. And there among the throngs of devotees, I recognized Vicent Partal. It took me a while to get up the gumption, but I finally went over and introduced myself. He was perfectly friendly, but distracted with the business he had come to do. I still left happy that I had finally met in person someone I really admired.

When I got home, I was pleasantly surprised to find an email from Vicent. And when I told him about my latest book on EPUB, he suggested I be a guest on L’Internauta. That was today. At first I was really nervous, but it was so fun. It's going to sound stupid, but I love talking about EPUB. I love that anyone can publish a book, that you don't need a publisher, or expensive software, and I love helping people do it. I'm totally intrigued with how the whole market is shifting and how publishing is changing, and I've spent the last six months following all sorts of interesting people on Twitter to learn more about it. And it was really special to be on my own favorite radio program!

I think I've explained this whole thing because this, for me, is the beauty of the internet: making connections with people who might be far away and might speak other languages, but who have a common interest and are willing to share what they know. This is what that blog post the other day about print versus ebooks was getting at: How long are you willing to bet against people using things into beauty? I bet on beauty winning out.

Wednesday, September 15, 2010

Corollary to the Heisenberg Uncertainty Principle

Yesterday in the metro on the way to pick up the kids from school, it occurred to me that there should be a corollary to the Heisenberg Uncertainty Principle. My extremely basic memory from seventh grade of that principle is that you can't observe a particle with absolute accuracy because the very act of observing it affects the particle and makes it do things it wouldn't have otherwise.

But what I'm realizing is that observing not only affects the particle in question, but these days as I drown in tweets, it affects the observer (me). I need to keep track of what's going on, but if I keep completely up-to-date, I don't have any time to create. If I only create, I don't have any time to keep up to date on what's going on outside my office.

Thus, the Corollary: I can neither keep completely up-to-date, nor only create. The trick, as ever, is to find the balance.

Thursday, September 9, 2010

Making Pages EPUBs readable in BBEdit with GREP

I keep meaning to write a full article on using GREP to edit EPUB, but until I do, at least let me share some of the tools I'm working with.

Suppose you're testing Pages, Apple's word processing software that exports to EPUB, and you want to be able to look at the code to see just how Pages translates from word processing document to XHTML. Or why on earth it stuck an image between two words in your text. Unfortunately, Apple doesn't add returns, and frankly, it's tough to slog through:

Pages EPUB before

You can use GREP to add a return before each opening tag that doesn't already have one. Search for ([^\r])(<[^/]) (find a less than sign that is NOT after a return and NOT followed by a forward slash) and replace it with \1\r\2 (whatever was found before the less than sign, followed by a return, followed by the less than sign and whatever followed it).

Then your XHTML will look like this:

Pages EPUB after

And you might actually be able to assess Pages' usefulness as an EPUB creation tool.
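
If you'd rather run the same search and replace from a script than from BBEdit, here is a minimal Python sketch of it. The pattern is the one given above, with one assumption changed: Python strings usually use \n for line breaks where BBEdit uses \r, so \n stands in for the return. The function name add_returns is made up for this example.

```python
import re

# The BBEdit pattern from the post, with \n standing in for BBEdit's \r:
# any character that is not a line break, followed by an opening tag
# (a less than sign that is not followed by a forward slash).
OPENING_TAG_WITHOUT_RETURN = re.compile(r"([^\n])(<[^/])")


def add_returns(xhtml):
    """Put a line break before each opening tag that lacks one.

    add_returns is a hypothetical helper name for this sketch.
    """
    return OPENING_TAG_WITHOUT_RETURN.sub(r"\1\n\2", xhtml)


print(add_returns("<p>One</p><p>Two</p>"))
# prints:
# <p>One</p>
# <p>Two</p>
```

Closing tags are left where they are, so short paragraphs stay on one line, and a tag at the very start of the file is untouched because nothing comes before it.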

And I know I'm far from the first person to ask this, but how is it that Apple can get away with charging more than $100 for a product in Europe that in the US only costs $79 (and about $50 if you buy it on Amazon)?
