
More about creating eBooks from Word documents

August 27, 2012

At the end of June I spent a few hours investigating what cheap/free new tools/processes were on the market for converting complex Word documents into eBooks for reading on tablets, smartphones etc. My last foray into this area was back in February 2012.

The reason I did this was to try to stay a little ahead of the game with my main client, who has massive amounts of documentation in Word and PDF and who may well ask me about eBook options in the future. I also wanted to see for myself whether the claims of some tools were as good as they seemed to be. And I wanted to test the different tools on one of my client’s documents, not on some generic (simple) Word document created purely for testing purposes.

Bottom line: As at the date of this blog post, I believe that PDF is the best option if you have complex Word documents that you want to be read on a small form factor device. Why? PDFs will resize on phones, they preserve Word’s formatting, and PDF is ubiquitous in that you can create a PDF directly from Word 2007 and later without needing to have Adobe Acrobat. They are also incredibly quick and easy to create and require no specialized knowledge to get a decent result.

So how did I reach this bottom line?

First, I’ll discuss some of the features of the Word document that I used as my test document. Then I’ll talk about some of the tools I tested (and didn’t test — and why) and the results I achieved. Along the way I’ll talk about the time taken and the knowledge required to use the tools, which are very important factors to consider when you have a large document set to convert.

My complex test document

The Word documents I work on for my main client are complex. For example, each document can have any or all of these:

  • several hundred pages long
  • up to 100 section breaks for portrait/landscape and A3 page orientations and sizes, and they have odd/even pages for each section
  • headers and footers, which are different for odd/even pages
  • many tables, some of which can be quite complex with merged cells, sideways text, landscape orientation, small fonts, etc.
  • many figures (photos, charts, diagrams, maps, including landscape and A3 size figures)
  • track changes are turned on from one revision to the next
  • multiple authors or reviewers who add their track changes and comments
  • hundreds of comments from reviewers
  • document automation (fields and bookmarks), which populates the headers and footers and some of the front matter (four pages of front matter is common)
  • one or more appendices
  • outline numbering, which is used for all headings including appendices; typically three or four heading levels are used, though five is not unheard of
  • macros that enable the automation of some regular tasks
  • potentially hundreds of automated captions (figures and tables) and cross-references (figures, tables, appendices, and sections)
  • footnotes
  • many bulleted lists, typically to two levels, though sometimes three
  • three-level table of contents (which also includes the first Appendix heading level), list of tables, and list of figures
  • terminology, abbreviations list
  • in-text citations and a references list (bibliography) for those cited works
  • and so on…

So these are not your typical Word documents that many tools are designed to convert (i.e. novels, short documents, documents with just a few headings etc.).

My earlier testing on these sorts of Word documents showed that the areas that the tools have the most trouble with are:

  • bulleted lists
  • tables
  • fonts
  • figures
  • justification and paragraph indents

Tools tested — and not tested

I wanted to test readily available and cheap/free conversion tools only. These tools needed to be simple enough for an author or admin assistant to use and seamless enough to not require specialist technical knowledge of HTML, CSS, XML, etc. They also needed to be quick to use and not require hours of pre- or post-tweaking of the input or output documents.

I did not want to test tools that are the domain of specialist technical writers — ordinary staff need to be able to use these conversion tools and whether I like it or not, Word is still my client’s authoring tool. Putting these Word documents into another tool is NOT an option — the authors must be able to update the documents easily and the authors all know how to use Word (with varying degrees of competency…). The prime job of these authors is to be health, safety, and environmental experts, NOT writers. Report writing is just one of their functions.

Therefore, I didn’t consider testing specialist tools such as ComponentOne’s Doc-to-Help (I believe they have an EPUB output option from Word), Adobe’s RoboHelp (also has EPUB and Kindle publishing options, but I suspect only from content that’s in RoboHelp, not from Word), Adobe’s InDesign (again, content has to be imported and fixed up in InDesign first), and MadCap’s Flare (like RoboHelp, it has an EPUB publishing output option, but the content must first be in Flare, not Word).

I’d already tested Sigil and Calibre back in February, and used those tools (particularly Calibre) in my recent testing. The tools I tested this time were ePubMaker (Epingsoft; US$59) and epub-builder (Wondershare MePub; US$39.95; no longer available). I had intended testing another one I’d heard about earlier in the year on one of my email discussion lists, but it was no longer available on their web page.

Test results

The results of my testing these two products were a mixed bag, and NONE of the outputs was suitable for immediate use. Fonts and indentations were all over the place; headers/footers, footnotes, outline numbering, automated captions/cross-references etc. either weren’t preserved or didn’t work consistently; tables were awful; and the documents could not be resized on my smartphone.

I uninstalled both products after the tests as neither was able to deal with my test document satisfactorily.

In both cases, a lot of tweaking was required, either to the Word document before publishing it to EPUB (e.g. in Word or through a clean-up tool like Word Cleaner) or to the output afterwards (e.g. via Calibre). And even then, the results left a lot to be desired.

Neither tool offered a seamless conversion from one format to the other. For converting one or a few documents, these tools may offer a good solution for some people, but I could never consider them in their present form for a task such as converting complex documents (and possibly thousands of these documents). Like other converters I’ve tested, if you have a simple Word document, then these sorts of converters have their place and may suit your purpose.

However, for a large-scale conversion, the amount of pre- or post-tweaking to get the EPUB result to come even close to looking like the source Word document is just not feasible. In my client’s case, the Word document must be preserved, and both the initial conversion and any future ones need to be quick and easy to do, as updates are made to these documents regularly.

The light at the end of the tunnel…

After being quite disillusioned by the conversion options I tried, I decided on one final option — to PDF the test Word document using both Acrobat and the PDF converter in Word 2010 and put the resulting PDFs on my phone. I didn’t fiddle with the PDF settings in Acrobat. Guess what? Both PDFs worked really well. Fonts, indentations, justifications, bullets, tables, figures, headers/footers, footnotes, section breaks for different sized and oriented pages etc. were ALL preserved and just looked like they did in the Word document. And once the PDF was on my phone, I could use my fingers to ‘pinch’ it to zoom in and out on the document and everything resized accordingly.

The best thing about PDF? The authors and the admin assistants can all use it as it’s just an output option in Word. It’s a no-brainer to create and takes just seconds (well, perhaps a minute or two for the really long docs). It’s quick, easy, and preserves the look of the document. It doesn’t require knowledge of all the various parts of an EPUB document, XML, CSS, HTML etc. and it doesn’t require hours of tweaking to convert just one document, only to do it all over again when the doc has to be updated.
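For readers wondering what "the various parts of an EPUB document" involves, here is a minimal sketch in Python (not something used in these tests) that assembles a bare-bones EPUB 2 container by hand. The file names under OEBPS/, the sample title, and the placeholder identifier are all made up for illustration, and the result has not been run through a validator. Even this toy example needs three different XML files plus XHTML and CSS, which is exactly the knowledge a PDF export does not ask for.

```python
import io
import zipfile

# Every file below is a required or typical part of an EPUB 2 package.
# All names, titles, and the identifier are illustrative placeholders.
PARTS = {
    # Tells the reading system where the package file lives.
    "META-INF/container.xml": """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
""",
    # Package file: metadata, a manifest of every file, and the reading order.
    "OEBPS/content.opf": """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample report</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
  </spine>
</package>
""",
    # Navigation file: the table of contents shown by the reading app.
    "OEBPS/toc.ncx": """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="urn:uuid:00000000-0000-0000-0000-000000000000"/>
  </head>
  <docTitle><text>Sample report</text></docTitle>
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>1 Introduction</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
""",
    # The content itself, as XHTML, styled by a separate CSS file.
    "OEBPS/chapter1.xhtml": """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Introduction</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
  </head>
  <body>
    <h1>1 Introduction</h1>
    <p>Body text goes here.</p>
  </body>
</html>
""",
    "OEBPS/style.css": "h1 { font-size: 1.4em; }\np { text-align: justify; }\n",
}


def build_epub():
    """Return the bytes of a tiny EPUB 2 file built from PARTS."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # The mimetype entry must come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, text in PARTS.items():
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# List the entries of the generated archive.
for entry in zipfile.ZipFile(io.BytesIO(build_epub())).namelist():
    print(entry)
```

A real document of the kind described above would need one XHTML file per chapter, every figure listed in the manifest, and CSS that approximates the Word styles, which is where the hours of tweaking go.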

So the bottom line for my client, in the absence of totally changing the way they create documents, would be that PDF is the best option at this stage.
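If the PDF route ever had to cover thousands of documents, the same "Save as PDF" step could in principle be scripted rather than done by hand. What follows is only a sketch under several assumptions that are not part of the tests described here: a Windows machine with Word 2010 or later installed, the third-party pywin32 package, and a hypothetical folder of .docx files. It has not been tried on the client's documents, and things like track changes or comments showing up in the output would still need checking.

```python
from pathlib import Path

WD_EXPORT_FORMAT_PDF = 17  # value of Word's wdExportFormatPDF constant


def plan_conversions(folder):
    """Return (docx, pdf) pairs whose PDF is missing or older than the Word file."""
    jobs = []
    for docx in sorted(Path(folder).glob("*.docx")):
        if docx.name.startswith("~$"):
            continue  # skip the lock files Word creates for open documents
        pdf = docx.with_suffix(".pdf")
        if not pdf.exists() or pdf.stat().st_mtime < docx.stat().st_mtime:
            jobs.append((docx, pdf))
    return jobs


def convert_all(folder):
    """Export every out-of-date document in `folder` to PDF via Word itself."""
    # Imported here because pywin32 only exists on Windows (assumption: installed).
    import win32com.client

    jobs = plan_conversions(folder)
    if not jobs:
        return 0
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        for docx, pdf in jobs:
            # Arguments: FileName, ConfirmConversions=False, ReadOnly=True
            doc = word.Documents.Open(str(docx.resolve()), False, True)
            try:
                doc.ExportAsFixedFormat(str(pdf.resolve()), WD_EXPORT_FORMAT_PDF)
            finally:
                doc.Close(False)  # close without saving the Word file
    finally:
        word.Quit()
    return len(jobs)


# Example call (hypothetical folder name):
# convert_all(r"C:\Reports")
```

Because the script drives Word's own exporter, the output should match what an author gets from the Save As dialog; only documents that changed since their last PDF are converted again.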

Of course, that may change in the future as the industry matures and as tools become more widely available and sophisticated and require less technical knowledge on the part of the person doing the conversion (just like WordPress and the like have negated the need for the average person to know HTML to create a website).

But for now, if they ask me, I’ll be recommending PDF.

6 comments

  1. […] More about creating eBooks from Word documents (cybertext.wordpress.com) […]


  2. I haven’t found it in Word specifically but you have to watch out whether autoreflow is set. In earlier versions of PDF (haven’t been active in this space for a while, so please forgive the weasel words), reflow was an accessibility option that was set and allowed the PDF to be interpreted on other devices, such as phones or reading devices for the blind. (One application for creating PDFs had it hidden under Handicapped Access.)

    That may be on by default in Word, or in the latest version of the PDF spec. I don’t know. Or maybe phones are just smarter.


  3. Did you test hyperlinks in footers and headers in 2010 PDF conversion? It does not work in 2007 so I was considering purchasing 2010 for the PDF converter feature. All our eBooks use anchor text hyperlinked and 2007 only converts raw links, too long and ugly!


  4. Hi Teresa

    Surprisingly, we don’t have links in these documents (we have everything else!). However, I just created a test document in Word 2010 and put a clickable URL in the header. I then created a PDF of this document (from the Word option, NOT via Acrobat or the Acrobat plug-in), and tested the hyperlink in the header in the resulting PDF. It worked fine.

    –Rhonda


  5. Rhonda, Thanks for replying. Now, I’m really confused. I called Microsoft today asking if their Word 2010 version had a built-in converter that worked on footer and header hyperlinks. He said “NO” that it was a 3rd party add-on for $49.95. I went to that site, downloaded their trial and it did not convert footer hyperlinks for 2007. Perhaps it’s just for 2010. Does your Word 2010 include the converter in “Save as Pdf”? The tech guy didn’t seem to know very much and their Chat support was equally ignorant so zero confidence in what they told me. Also, if it works as “Save as Pdf” could you try a hyperlink (not a raw link) in the footer and see if that works? If you have the time! Thanks! Teresa


  6. @Rhonda and @Teresa:

    Kudos for the great and thorough article, Rhonda.

    I have a very similar client with complex documents (mainly lots of sections, different odd and even pages, TOC and hyperlinks). They also have been creating all of their documents in Word and want to export them to PDF as well as use them on tablets etc. in the near future.

    I have had some success with Calibre. I was able to make it recognize the headings in the Word document (even though the client uses their own styles instead of the normal Heading 1, 2, 3 styles) using a simple XPath expression.

    When I exported to EPUB and sent the file to the iPad, it did include a nice TOC based on the Word styles.

    Alas, Calibre didn’t seem to convert page for page. Instead, content that was displayed on one single page in the original Word document ended up in the ebook in a very large font, spread over 3 or 4 pages. I think this should be fixable but still wonder how…

    Also, the client is using vertical rectangles on the outer end of each page in Word, showing the text “Section” plus the section number the page belongs to. These were not converted to EPUB either.

    Have you had any progress in the meantime?

    Relating to Teresa’s question:

    I have tried inserting hyperlinks in headers in Word 2010. The reason was that my client would like a hyperlink on every page that can take the reader back to the TOC when clicked upon.

    Result

    Using Word’s built-in SAVE AS – PDF:

    Although linked TOCs will work in the PDF (meaning they are clickable and take you to the linked heading), hyperlinks inserted in the header and footer will (mostly) not.

    If you insert a hyperlink in the header, linking to the TOC (or to any other bookmark in the document) using Insert – Hyperlink, and then convert the document using SAVE AS – PDF, the link is not clickable in the PDF.

    On the other hand, if you insert a web-hyperlink in the header, e.g. http://www.google.com, it WILL be clickable. This only works for external links, e.g. http://www. addresses.

    I have tried to find out how to trick Word into converting the in-document hyperlinks the same way as the external hyperlinks, but have not been able to find a solution.

    Using Adobe Acrobat XI’s PDF Plugin for Word (on the Acrobat ribbon):

    In this case, hyperlinks in the header WILL be clickable – which is exactly what I wanted. Alas, in this case the other thing my client wants is not working: A vertical rectangle on the outer end of each page, showing the text “Section” plus the section number the page belongs to.

    This feature did work using the Word SAVE AS – PDF, but doesn’t using the Acrobat PDF Plugin…

    It is funny how every time you seem to have found a solution, another problem appears… ;-P

    All the best
    Marcus
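Marcus does not show the XPath expression he used, so the following is only a guess at what such a setup might look like. It is a small Python sketch that builds a command line for Calibre's `ebook-convert` tool, using its `--chapter`, `--level1-toc`, and `--level2-toc` options. The assumptions are loud ones: that the Word document was first saved as HTML, that the client's custom heading styles come through as `class` attributes on paragraphs, and that those classes are called `ClientHeading1` and `ClientHeading2` (hypothetical names, as are the file names).

```python
import shlex


def heading_xpath(style_class, tag="p"):
    """XPath matching XHTML elements of one class; `h:` is Calibre's XHTML prefix."""
    return f"//h:{tag}[@class='{style_class}']"


def build_command(source, target, level1_class, level2_class):
    """Assemble an ebook-convert command that builds the TOC from custom styles."""
    return [
        "ebook-convert", source, target,
        "--chapter", heading_xpath(level1_class),      # where chapters start
        "--level1-toc", heading_xpath(level1_class),   # top-level TOC entries
        "--level2-toc", heading_xpath(level2_class),   # second-level TOC entries
    ]


# Hypothetical file and style names, for illustration only.
cmd = build_command("report.htm", "report.epub",
                    "ClientHeading1", "ClientHeading2")
print(shlex.join(cmd))
```

This only addresses the table of contents. It does nothing for the page-for-page layout or the vertical section markers mentioned in the comment, since EPUB content reflows to fit the screen.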


