More about creating eBooks from Word documents
August 27, 2012
At the end of June I spent a few hours investigating what cheap/free new tools/processes were on the market for converting complex Word documents into eBooks for reading on tablets, smartphones etc. My last foray into this area was back in February 2012.
The reason I did this was to try to stay a little ahead of the game with my main client, who has massive amounts of documentation in Word and PDF and who may well ask me about eBook options in the future. I also wanted to see for myself whether the claims of some tools were as good as they seemed to be. And I wanted to test the different tools on one of my client’s documents, not on some generic (simple) Word document created purely for testing purposes.
Bottom line: As at the date of this blog post, I believe that PDF is the best option if you have complex Word documents that you want to be read on a small form factor device. Why? PDFs will resize on phones, they preserve Word’s formatting, and PDF is ubiquitous in that you can create a PDF directly from Word 2007 and later without needing to have Adobe Acrobat. They are also incredibly quick and easy to create and require no specialized knowledge to get a decent result.
So how did I reach this bottom line?
First, I’ll discuss some of the features of the Word document that I used as my test document. Then I’ll talk about some of the tools I tested (and didn’t test — and why) and the results I achieved. Along the way I’ll talk about the time taken and the knowledge required to use the tools, which are very important factors to consider when you have a large document set to convert.
My complex test document
The Word documents I work on for my main client are complex. For example, each document can have any or all of these:
- several hundred pages
- up to 100 section breaks to handle portrait/landscape orientations and A3 page sizes, with odd/even pages in each section
- headers and footers, which are different for odd/even pages
- many tables, some of which can be quite complex with merged cells, sideways text, landscape orientation, small fonts, etc.
- many figures (photos, charts, diagrams, maps, including landscape and A3 size figures)
- track changes, which stay turned on from one revision to the next
- multiple authors or reviewers who add their track changes and comments
- hundreds of comments from reviewers
- document automation (fields and bookmarks), which populates the headers and footers and some of the front matter (four pages of front matter is common)
- one or more appendices
- outline numbering, which is used for all headings including appendices; typically three or four heading levels are used, though five is not unheard of
- macros that enable the automation of some regular tasks
- potentially hundreds of automated captions (figures and tables) and cross-references (figures, tables, appendices, and sections)
- many bulleted lists, typically to two levels, though sometimes three
- three-level table of contents (which also includes the first Appendix heading level), list of tables, and list of figures
- terminology, abbreviations list
- in-text citations and a references list (bibliography) for those cited works
- and so on…
So these are not your typical Word documents that many tools are designed to convert (i.e. novels, short documents, documents with just a few headings etc.).
My earlier testing on these sorts of Word documents showed that the areas that the tools have the most trouble with are:
- bulleted lists
- justification and paragraph indents
Tools tested — and not tested
I wanted to test readily available and cheap/free conversion tools only. These tools needed to be simple enough for an author or admin assistant to use and seamless enough to not require specialist technical knowledge of HTML, CSS, XML, etc. They also needed to be quick to use and not require hours of pre- or post-tweaking of the input or output documents.
I did not want to test tools that are the domain of specialist technical writers — ordinary staff need to be able to use these conversion tools and whether I like it or not, Word is still my client’s authoring tool. Putting these Word documents into another tool is NOT an option — the authors must be able to update the documents easily and the authors all know how to use Word (with varying degrees of competency…). The prime job of these authors is to be health, safety, and environmental experts, NOT writers. Report writing is just one of their functions.
Therefore, I didn’t consider testing specialist tools such as ComponentOne’s Doc-To-Help (I believe they have an EPUB output option from Word), Adobe’s RoboHelp (also has EPUB and Kindle publishing options but I suspect only from content that’s in RoboHelp, not from Word), Adobe’s InDesign (again, content has to be imported/fixed up in InDesign first), and MadCap’s Flare (like RoboHelp, it has an EPUB publishing output option, but the content must first be in Flare, not Word).
I’d already tested Sigil and Calibre back in February, and used those tools (particularly Calibre) in my recent testing. The tools I tested this time were ePubMaker (Epingsoft; US$59) and epub-builder (Wondershare MePub; US$39.95; no longer available). I had intended testing another one I’d heard about earlier in the year on one of my email discussion lists, but it was no longer available on their web page.
The results of my testing these two products were a mixed bag, and NONE of the outputs was suitable for immediate use. Fonts and indentations were all over the place, headers/footers, footnotes, outline numbering, automated captions/cross-references etc. either weren’t preserved or didn’t work consistently, tables were awful, and the documents could not be resized on my smartphone.
I uninstalled both products after the tests as neither was able to deal with my test document satisfactorily.
In both cases, a lot of tweaking was required, either to the Word document before publishing it to EPUB (e.g. in Word or through a clean-up tool like Word Cleaner) or to the output afterwards (e.g. via Calibre). And even then, the results left a lot to be desired.
Neither tool offered a seamless conversion from one format to the other. For converting one or a few documents, these tools may offer a good solution for some people, but I could never consider them in their present form for a task such as converting complex documents (and possibly thousands of these documents). Like other converters I’ve tested, if you have a simple Word document, then these sorts of converters have their place and may suit your purpose.
However, for a large-scale conversion, the amount of pre- or post-tweaking needed to get the EPUB result to come even close to looking like the source Word document is just not feasible. In my client’s case, the Word document must be preserved, and both the initial conversion and any future conversions need to be quick and easy to do, as these documents are updated regularly.
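To give a sense of what keeping a large document set up to date might involve, here is a minimal Python sketch. It is an illustration only and was not part of the testing described in this post. The function name `find_stale_pdfs` is hypothetical, and it assumes each converted file is saved next to its Word source with the same base name (a PDF is used as the example output format). It only reports which documents need converting again; the conversion itself is left to whatever tool is chosen.

```python
from pathlib import Path


def find_stale_pdfs(folder):
    """Return Word files under `folder` whose PDF is missing or out of date.

    Assumes (hypothetically) that each PDF is saved next to its source
    document with the same base name, e.g. report.docx -> report.pdf.
    """
    stale = []
    for doc in sorted(Path(folder).rglob("*.docx")):
        # Skip Word's temporary lock files, whose names start with "~$".
        if doc.name.startswith("~$"):
            continue
        pdf = doc.with_suffix(".pdf")
        # A PDF needs (re)creating if it does not exist yet, or if the
        # Word document has been modified since the PDF was written.
        if not pdf.exists() or pdf.stat().st_mtime < doc.stat().st_mtime:
            stale.append(doc)
    return stale
```

Calling `find_stale_pdfs("path/to/reports")` returns a list of `Path` objects that could be handed to whoever does the conversions. With thousands of documents that are updated regularly, a report like this is only useful if each individual conversion is itself quick.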
The light at the end of the tunnel…
After being quite disillusioned by the conversion options I tried, I decided on one final option: to PDF the test Word document using both Acrobat and the PDF converter in Word 2010, and put the resulting PDFs on my phone. I didn’t fiddle with the PDF settings in Acrobat. Guess what? Both PDFs worked really well. Fonts, indentations, justifications, bullets, tables, figures, headers/footers, footnotes, section breaks for different sized and oriented pages etc. were ALL preserved and looked just like they did in the Word document. And once the PDF was on my phone, I could use my fingers to ‘pinch’ it to zoom in and out on the document and everything resized accordingly.
The best thing about PDF? The authors and the admin assistants can all use it as it’s just an output option in Word. It’s a no-brainer to create and takes just seconds (well, perhaps a minute or two for the really long docs). It’s quick, easy, and preserves the look of the document. It doesn’t require knowledge of all the various parts of an EPUB document, XML, CSS, HTML etc. and it doesn’t require hours of tweaking to convert just one document, only to do it all over again when the doc has to be updated.
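For anyone wondering what those "various parts" of an EPUB are, the rough Python sketch below builds a bare skeleton in memory and lists what is inside it. An EPUB is a zip archive containing a `mimetype` file, a `META-INF/container.xml` file that points to an XML package file, and the XHTML and CSS content itself. The file names under `OEBPS/` and the placeholder contents are assumptions made for illustration; this is not a complete, valid EPUB and not the output of any tool tested here.

```python
import io
import zipfile

# Simplified, hypothetical file contents. A real EPUB needs a complete
# package file (metadata, manifest, spine) and usually many more files.
PARTS = {
    "META-INF/container.xml": (
        '<?xml version="1.0"?>\n'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">\n'
        '  <rootfiles>\n'
        '    <rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/>\n'
        '  </rootfiles>\n'
        '</container>\n'
    ),
    "OEBPS/content.opf": "<!-- XML package file: metadata, manifest, spine -->",
    "OEBPS/chapter1.xhtml": "<!-- XHTML: the text of one chapter -->",
    "OEBPS/styles.css": "/* CSS: fonts, indents, justification */",
}


def build_epub_skeleton():
    """Build the skeleton in memory and return the archive's raw bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        # The mimetype entry must come first and must not be compressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        for name, content in PARTS.items():
            epub.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_parts(epub_bytes):
    """Return the names of the files inside an EPUB archive, in order."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as epub:
        return epub.namelist()
```

Running `list_parts(build_epub_skeleton())` shows the five entries, starting with `mimetype`. Every one of those pieces is something a person may need to open and fix by hand when a conversion goes wrong, which is exactly the knowledge the authors and admin assistants don’t have.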
So the bottom line for my client, in the absence of totally changing the way they create documents, would be that PDF is the best option at this stage.
Of course, that may change in the future as the industry matures and as tools become more widely available and sophisticated and require less technical knowledge on the part of the person doing the conversion (just like WordPress and the like have negated the need for the average person to know HTML to create a website).
But for now, if they ask me, I’ll be recommending PDF.