Posts Tagged ‘eBooks’


More about creating eBooks from Word documents

August 27, 2012

At the end of June I spent a few hours investigating what cheap/free new tools/processes were on the market for converting complex Word documents into eBooks for reading on tablets, smartphones etc. My last foray into this area was back in February 2012.

The reason I did this was to try to stay a little ahead of the game with my main client, who has massive amounts of documentation in Word and PDF and who may well ask me about eBook options in the future. I also wanted to see for myself whether the claims of some tools were as good as they seemed to be. And I wanted to test the different tools on one of my client’s documents, not on some generic (simple) Word document created purely for testing purposes.

Bottom line: As at the date of this blog post, I believe that PDF is the best option if you have complex Word documents that you want to be read on a small form factor device. Why? PDFs will resize on phones, they preserve Word’s formatting, and PDF is ubiquitous in that you can create a PDF directly from Word 2007 and later without needing to have Adobe Acrobat. They are also incredibly quick and easy to create and require no specialized knowledge to get a decent result.

So how did I reach this bottom line?

First, I’ll discuss some of the features of the Word document that I used as my test document. Then I’ll talk about some of the tools I tested (and didn’t test — and why) and the results I achieved. Along the way I’ll talk about the time taken and the knowledge required to use the tools, which are very important factors to consider when you have a large document set to convert.

My complex test document

The Word documents I work on for my main client are complex. For example, each document can have any or all of these:

  • several hundred pages
  • up to 100 section breaks for portrait/landscape and A3 page orientations and sizes, and they have odd/even pages for each section
  • headers and footers, which are different for odd/even pages
  • many tables, some of which can be quite complex with merged cells, sideways text, landscape orientation, small fonts, etc.
  • many figures (photos, charts, diagrams, maps, including landscape and A3 size figures)
  • track changes are turned on from one revision to the next
  • multiple authors or reviewers who add their track changes and comments
  • hundreds of comments from reviewers
  • document automation (fields and bookmarks), which populates the headers and footers and some of the front matter (four pages of front matter is common)
  • one or more appendices
  • outline numbering, which is used for all headings including appendices; typically three or four heading levels are used, though five is not unheard of
  • macros that enable the automation of some regular tasks
  • potentially hundreds of automated captions (figures and tables) and cross-references (figures, tables, appendices, and sections)
  • footnotes
  • many bulleted lists, typically to two levels, though sometimes three
  • three-level table of contents (which also includes the first Appendix heading level), list of tables, and list of figures
  • terminology and abbreviations lists
  • in-text citations and a references list (bibliography) for those cited works
  • and so on…

So these are not your typical Word documents that many tools are designed to convert (i.e. novels, short documents, documents with just a few headings etc.).

My earlier testing on these sorts of Word documents showed that the areas that the tools have the most trouble with are:

  • bulleted lists
  • tables
  • fonts
  • figures
  • justification and paragraph indents

Tools tested — and not tested

I wanted to test readily available and cheap/free conversion tools only. These tools needed to be simple enough for an author or admin assistant to use and seamless enough to not require specialist technical knowledge of HTML, CSS, XML, etc. They also needed to be quick to use and not require hours of pre- or post-tweaking of the input or output documents.

I did not want to test tools that are the domain of specialist technical writers — ordinary staff need to be able to use these conversion tools and whether I like it or not, Word is still my client’s authoring tool. Putting these Word documents into another tool is NOT an option — the authors must be able to update the documents easily and the authors all know how to use Word (with varying degrees of competency…). The prime job of these authors is to be health, safety, and environmental experts, NOT writers. Report writing is just one of their functions.

Therefore, I didn’t consider testing specialist tools such as ComponentOne’s Doc-to-Help (I believe they have an EPUB output option from Word), Adobe’s RoboHelp (also has EPUB and Kindle publishing options but I suspect only from content that’s in RoboHelp, not from Word), Adobe’s InDesign (again, content has to be imported into and fixed up in InDesign first), and MadCap’s Flare (like RoboHelp, it has an EPUB publishing output option, but the content must first be in Flare, not Word).

I’d already tested Sigil and Calibre back in February, and used those tools (particularly Calibre) in my recent testing. The tools I tested this time were ePubMaker (Epingsoft; US$59) and epub-builder (Wondershare MePub; US$39.95; no longer available). I had intended to test another one I’d heard about earlier in the year on one of my email discussion lists, but it was no longer available on the vendor’s web page.

Test results

The results of my testing these two products were a mixed bag, and NONE of the outputs was suitable for immediate use. Fonts and indentations were all over the place, headers/footers, footnotes, outline numbering, automated captions/cross-references etc. either weren’t preserved or didn’t work consistently, tables were awful, and the documents could not be resized on my smartphone.

I uninstalled both products after the tests as neither was able to deal with my test document satisfactorily.

In both cases either a lot of tweaking to the Word document was required before publishing it to EPUB (e.g. in Word or through a clean-up tool like Word Cleaner), or after (e.g. via Calibre). And even then, the results left a lot to be desired.

Neither tool offered a seamless conversion from one format to the other. For converting one or a few documents, these tools may offer a good solution for some people, but I could never consider them in their present form for a task such as converting complex documents (and possibly thousands of these documents). Like other converters I’ve tested, if you have a simple Word document, then these sorts of converters have their place and may suit your purpose.

However, for a large-scale conversion, the amount of pre- or post-tweaking to get the EPUB result to come even close to looking like the source Word document is just not feasible. In my client’s case, the Word document must be preserved, and both the initial conversion and any future conversions need to be quick and easy to do, as updates are made to these documents regularly.

The light at the end of the tunnel…

After being quite disillusioned by the conversion options I tried, I decided on one final option — to PDF the test Word document using both Acrobat and the PDF converter in Word 2010 and put the resulting PDFs on my phone. I didn’t fiddle with the PDF settings in Acrobat. Guess what? Both PDFs worked really well. Fonts, indentations, justifications, bullets, tables, figures, headers/footers, footnotes, section breaks for different sized and oriented pages etc. were ALL preserved and just looked like they did in the Word document. And once the PDF was on my phone, I could use my fingers to ‘pinch’ it to zoom in and out on the document and everything resized accordingly.

The best thing about PDF? The authors and the admin assistants can all use it as it’s just an output option in Word. It’s a no brainer to create and takes just seconds (well, perhaps a minute or two for the really long docs). It’s quick, easy, and preserves the look of the document. It doesn’t require knowledge of all the various parts of an EPUB document, XML, CSS, HTML etc. and it doesn’t require hours of tweaking to convert just one document, only to do it all over again when the doc has to be updated.

So the bottom line for my client, in the absence of totally changing the way they create documents, would be that PDF is the best option at this stage.

Of course, that may change in the future as the industry matures and as tools become more widely available and sophisticated and require less technical knowledge on the part of the person doing the conversion (just like WordPress and the like have negated the need for the average person to know HTML to create a website).

But for now, if they ask me, I’ll be recommending PDF.


Creating an eBook from a Word document

February 3, 2012

These notes are to remind me what I did to create an eBook from a Word document in case I have to create more. They are not step-by-step instructions — they are just my notes from my first foray into creating eBooks.

Some background: I have an Android phone, so was only able to test on that platform.


  1. Format the Word document so that it uses Heading styles for each section. Minimal formatting elsewhere.
  2. Save the Word doc as filtered HTML.
  3. Optional: Clean up the resulting HTML using Word Cleaner (use the ‘Clean up existing Word HTML files’ conversion option; however, this can strip the heading information from the document, and re-applying that and other styles may not be worth the effort).
  4. Use Sigil to create the initial ePub book from the filtered HTML output from Word.
  5. Use Sigil to clean up the output, add a table of contents, etc.
  6. Use Calibre to create ePub and other eBook formats, add a cover page and picture, if necessary.
  7. Test the resulting output in the Calibre reader.
  8. Republish the ePub in Calibre and/or republish to other formats too (e.g. MOBI for Kindle).
  9. Test on various devices/readers.
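Because step 3 can strip the heading information, a quick check before moving on to Sigil may save some rework. The sketch below uses only Python’s standard-library `html.parser` to count `<h1>` to `<h6>` elements in two versions of the HTML (for example, before and after Word Cleaner) and reports any heading levels that lost entries. It assumes that Word’s filtered HTML maps Heading 1, Heading 2, etc. to `<h1>`, `<h2>`, etc.; the function names are hypothetical, and this is not part of the process described above.

```python
from collections import Counter
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCounter(HTMLParser):
    """Counts heading elements (<h1> to <h6>) in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase, so <H1> and <h1> both match.
        if tag in HEADING_TAGS:
            self.counts[tag] += 1


def count_headings(html_text: str) -> Counter:
    """Return a Counter of heading tags found in html_text."""
    parser = HeadingCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


def lost_headings(before: str, after: str) -> dict:
    """Map each heading level whose count dropped to (count_before, count_after)."""
    b, a = count_headings(before), count_headings(after)
    return {tag: (b[tag], a[tag]) for tag in sorted(HEADING_TAGS) if a[tag] < b[tag]}


# Illustrative strings; in practice you would read the two .htm files from disk.
before = "<h1>Intro</h1><p>Text</p><h2>Scope</h2><h2>Method</h2>"
after = "<p>Intro</p><p>Text</p><h2>Scope</h2><p>Method</p>"
print(lost_headings(before, after))
```

An empty result means every heading level survived the clean-up; anything else suggests re-applying heading styles before building the ePub. Note that filtered HTML from Word is not always UTF-8, so check the file’s encoding when reading it in.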

If you’re using an Android phone, you can test the ePub and MOBI formats in the Aldiko and Kindle apps respectively. However, for Kindle on Android you may have to plug the phone into your computer as a hard drive and copy the MOBI document to the Kindle folder. I found that the Kindle app for Android wouldn’t open a MOBI file transferred via Bluetooth or Dropbox, but would open it if I copied it across with the phone connected as a hard drive. The ePub version worked straight away in Aldiko.

Potential issues with Word documents when converted to MOBI and viewed on Kindle

Some of these potential issues could be corrected with adjustments to the CSS; however, for these initial tests, I didn’t do that as I wanted to see how easy it was to create eBooks ‘out of the box’.

  • Bulleted lists: May not display correctly. The bullet is preserved, but the indentation is a little out-of-whack.
  • Tables: The data displays, but cell background colors display as little colored markers, instead of the color filling the cell background. The test table I used had 5 columns — any more and it would struggle to display them all. Heading row repeat is not preserved so as you scroll pages for a long table, there are no headings at the top of the page to tell you what data you’re looking at.
  • Justification: The text is fully justified, even though the Word document was left justified.
  • Figures: Display as smallish thumbnail size images and they can’t be resized.
  • Fonts: The font displayed is a serif font, even though the Word document used Calibri (sans serif) for the base font and Cambria (serif) for the headings. Heading and caption colors and sizes preserved. Paragraph indent (none) preserved and paragraph leading preserved too.

Potential issues with Word documents when converted to ePub and viewed on Aldiko

Some of these potential issues could be corrected with adjustments to the CSS; however, for these initial tests, I didn’t do that as I wanted to see how easy it was to create eBooks ‘out of the box’.

  • Heading fonts: Not preserved. These were Cambria, blue in the original Word document, but became black (unknown serif font) in the ePub version.
  • Caption fonts for figures and tables: Became normal text. Some ended up with a hyperlink (as did the main Heading 1) and I have no idea why.
  • Figures: Display as smallish thumbnail size images and they can’t be resized.
  • Justification: The text is fully justified, even though the Word document was left justified.
  • Paragraphs: First lines were indented slightly, even though they weren’t in the original Word document. Paragraph leading not preserved.
  • Tables: Go off the ‘page’. Reducing the displayed text size helped, but you wouldn’t want to have extensive tables in your original document. No colors displayed for heading rows, banded rows, borders etc — all white background and black borders.
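Both lists mention that some issues could probably be corrected with CSS. One way to try that without editing the ePub by hand is Calibre’s command-line converter, `ebook-convert`, whose `--extra-css` option accepts either a stylesheet path or raw CSS and appends it to the book’s own style rules. The sketch below only builds the command (and optionally runs it); it assumes `ebook-convert` is on your PATH, the file names are hypothetical, and the override rules target just two of the issues noted above (unwanted full justification and first-line indents). They are an untested starting point, not a fix confirmed on these documents.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path

# Hypothetical override rules: left-align body text and remove first-line indents.
OVERRIDE_CSS = "p { text-align: left; text-indent: 0; }"


def build_convert_command(source: Path, target: Path, extra_css: str = OVERRIDE_CSS) -> list[str]:
    """Build an ebook-convert command; Calibre picks the output format from target's extension."""
    return ["ebook-convert", str(source), str(target), "--extra-css", extra_css]


def convert(source: Path, target: Path) -> None:
    """Run the conversion (requires Calibre's ebook-convert to be installed and on PATH)."""
    subprocess.run(build_convert_command(source, target), check=True)


# Print the command as it would be typed in a shell, without running it.
cmd = build_convert_command(Path("report.epub"), Path("report.mobi"))
print(shlex.join(cmd))
```

For the Aldiko issues the target would be another `.epub`; for Kindle, a `.mobi`. Keeping the CSS in one place means the same overrides can be reapplied each time a document is updated and reconverted.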



Fonts: Reading PDFs and ebooks on screen

April 13, 2009

When I flew to the US recently for the WritersUA Conference in Seattle, I spent a LOT of time on planes, and hanging around in airports and hotel rooms. As I don’t sleep on a flight (even a 14-hour one), I need something to occupy my time. I’m not a big movie fan so I usually only watch one or two of the 40+ typically on offer, and maybe a TV episode or two. That takes care of about 6 hours. Likewise, I rarely read much fiction these days and rarely buy physical books or borrow them from the library any more.

Prior to leaving, I created a folder of all those PDFs I’ve been meaning to read and podcasts I’ve been meaning to listen to. On this latest trip, I mostly listened to podcasts while driving to/from the airport (it’s a 3+ hour drive, so I can usually knock off a few!). On the plane and when I didn’t have internet access, I read some of the PDFs.

And this is what I found: serif fonts are much harder for me to read on screen than sans serif fonts.

I was zooming the PDF text up to 125% or 150% so that I could sit back comfortably to read it on my laptop. Even at that zoom factor, the thin strokes of fonts like those in the Times and similar font families got lost against the white background. One factor in this lack of readability could have been that I had the brightness on my laptop turned down a lot so that the light from the screen didn’t disturb those around me too much. But it wasn’t just the thin strokes: the serifs also affected the readability as they added ‘noise’ to the text.

This is by no means a scientific study ;-), just a personal observation after reading some 500+ PDF pages on screen over many hours.

I guess the take-away from this is to consider how your reader will be reading the PDF or ebook you create. Do you expect them to print it out or read it online? If online, have you considered how the brightness settings and zoom factors may affect readability of the font you have chosen? You may not know or be able to find out the answers to these questions, but consider them when you are creating a PDF or ebook.
