Thursday, September 9, 2010

Making Pages EPUBs readable in BBEdit with GREP

I keep meaning to write a full article on using GREP to edit EPUB, but until I do, at least let me share some of the tools I'm working with.

Suppose you're testing Pages, Apple's word processing software that exports to EPUB, and you want to be able to look at the code to see just how Pages translates from word processing document to XHTML. Or why on earth it stuck an image between two words in your text. Unfortunately, Apple doesn't add returns, and frankly, it's tough to slog through:

[Image: Pages EPUB before]

You can use GREP to add a return before each opening tag that doesn't already have one. Search for ([^\r])(<[^/]) (find a less-than sign that is NOT preceded by a return and NOT followed by a forward slash) and replace it with \1\r\2 (the character found before the less-than sign, followed by a return, followed by the less-than sign and the character that followed it).
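For anyone without BBEdit, the same search and replace can be sketched in Python. This is only a rough equivalent under a few assumptions: the function name add_returns and the sample string are made up for illustration, and since BBEdit's \r stands for a line break while Python distinguishes \r from \n, the pattern here excludes both and inserts \n.

```python
import re

# Same idea as the BBEdit pattern: one character that is not a line break,
# followed by "<" and a character that is not "/" (so closing tags are skipped).
OPENING_TAG = re.compile(r"([^\r\n])(<[^/])")


def add_returns(xhtml: str) -> str:
    """Insert a line break before each opening tag that lacks one."""
    # \1 is the character before the tag, \2 is "<" plus the next character.
    return OPENING_TAG.sub(r"\1\n\2", xhtml)


# Hypothetical one-line markup standing in for a Pages-exported XHTML file.
sample = "<body><p>One</p><p>Two</p></body>"
print(add_returns(sample))
```

Running it a second time on its own output changes nothing further, because every opening tag then already follows a line break.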

Then your XHTML will look like this:

[Image: Pages EPUB after]

And you might actually be able to assess Pages' usefulness as an EPUB creation tool.

And I know I'm far from the first person to ask this, but how is it that Apple can get away with charging more than $100 for a product in Europe that in the US only costs $79 (and about $50 if you buy it on Amazon)?

14 comments:

  1. Hi Liz,

    You can also use xmllint (http://xmlsoft.org/xmllint.html) to do this, with the added benefit that the markup is properly indented. The command syntax is simple enough:

    xmllint --format packed.xhtml > formatted.xhtml

    This tool comes preinstalled on Macs and you can download versions for PC and Linux.

    -Steve

  2. As for the price difference ... we in the EU have this thing called VAT that is usually around the 20% mark ... so a US$100 product would, just by adding the VAT, climb to 120 ...

  3. Apple's not the only one to charge more than exchange rates would seem to justify for overseas sales. Adobe has drawn fire for the same policy. Some of Adobe's claims may be justified. In Europe, they have to pay a VAT that, by law, can't be a separate item from the cost (lest the public see how much they're being taxed). They also claim that language tweaks for smaller countries have to be written off over a smaller sales volume. But that doesn't explain why the US-English versions also cost more.

    I know you must be busy settling down in Spain, but I imagine I'm not the only one that would love to see you review the quality of EPub created by Pages and whether it passes muster for ebooks sold through the iBookstore.

  4. BBEdit has a very fine search and replace function and, if that's not sufficient, it also supports GREP fully. It's not free, but there is a free version with fewer features called TextWrangler.

  5. Liz, I believe you can just use BBEdit's Format utility. It's under Markup > Utilities. You can even make it hierarchical.

  6. @Mike... I'm working on it! Thanks for your encouragement (both via email and here). I'll send you a note when it's up.

    @Aaron: I'm quite embarrassed to admit I have never used that menu option, and happy that you told me about it!

  7. You are telepathic, Liz: more information about "GREPPING" this file is just what I was looking for. Thanks,

    Michael Pastore
    50 Benefits of Ebooks

  8. @Aaron. Thanks for pointing me to the Utilities options. Unfortunately, none of them reformat the code just how I like... mostly they either put in too many returns, or not enough. I'm sure every programmer has their preferred way.

    @Michael: I'm working on some more GREP articles. Glad you found this one useful. GREP totally rocks.

  9. "I imagine I'm not the only one that would love to see you review the quality of EPub created by Pages" (Michael).

    As a matter of fact, I only discovered Pages' ability to export EPUB a few minutes ago, a discovery that sent me like a shot to Liz's blog. I'm going to look into it myself, although compared to Liz I'm an amateur.

  10. Hi, Liz

    I'm reading your book "Epub Straight"; it has really good information.
    I have been a graphic designer for many years, and now I'm studying CSS to improve my EPUBs.
    I've been reading about producing EPUBs, and I have one suggestion about your post.

    What do you think about using Dreamweaver to edit an EPUB after unzipping it? It's a great tool for editing CSS and XHTML, with several resources.

    I also have some questions.
    Is it possible for InDesign to export a group (like image + caption) within just one DIV?
    Usually InDesign creates one DIV for each object.

    Do you have any idea how to keep InDesign from exporting a span tag in each p tag?

    Thanks a lot

  11. As far as I can see so far, the Pages EPUB export facility may well prove to be a lot better than the InDesign export facility.

    For a start, it reliably divides a long document into separate HTML files.

    For a second, the immediate result looks a lot "cleaner" to me than the InDesign counterpart.

    Of course, it's just impossible not to have to tinker with the HTML and CSS, and possibly add embedded fonts, what have you. But so far, so good, from this semi-amateur's perspective!

  12. Just use the Format command in the Utilities menu.

    And if you’re really concerned about readability, your fonts and colours aren’t helping. Neither smaller type nor monospaced type is better.

  13. @Joe, I'm not crazy about the options in the Utilities menu; they leave too much space.

  14. Hello, I have downloaded the original text files for Walden from Project Gutenberg, as linked on the examples page of the EPUB Straight to the Point blog, where you said, "After massaging the files with GREP to remove the extra returns, the text files look like this." I just wanted to ask what software you used to remove the extra returns.
    Also, you can open the Word template and choose Save as Quick Style Set.
