Archive for August, 2012


Word: Carriage returns, not paragraph returns

August 28, 2012

One of my readers emailed me with a problem they couldn’t figure out. Here’s a summary of the problem:

I’ve been working on a macro to clean up a document converted from PDF. A lot of these conversions place a pilcrow at the end of each line, I guess because Word isn’t sure what else to do with a line break in a PDF.

Part of the macro runs a find and replace that deletes most of the pilcrows but retains them where it recognizes a paragraph. What’s important here, and what I hope you can help with, is that when it makes a new pilcrow (I’m using ^13 in the find & replace), it treats it like a carriage return. The symbol on the screen is definitely a pilcrow—the same, filled-in black pilcrow I’m used to seeing at the end of a paragraph. I went into the xml of the document, and sure enough there was <cr> instead of <p>.

The paragraphs are all set to 12pts after, but none of them are actually displaying that space. The printed version is the same. It’s not a display issue. It’s just that Word is accepting the command to insert a pilcrow, inserting it, but treating it like a carriage return.

I found a solution, which is to search the document for pilcrows, select them, then type a paragraph. But I’d really like to understand what’s happening (and why) with that find and replace. Why is manually typing a paragraph working while find & replace is not?

My first step in trying to solve the problem was to create a new Word document, add a couple of soft line returns and some normal paragraph returns, then do a find/replace on the soft line returns (^l — that’s a lower case ‘L’) replacing them with ^13. Sure enough, the pilcrow looks the same, but the paragraph formatting (paragraph above/below spacing mostly) wasn’t quite right. When I saved the doc as XML, I found ‘cr’ instead of ‘p’ where I’d done this find/replace, just as my worried reader had found.
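The same ‘cr’ versus ‘p’ check can be done outside Word. Here’s a minimal sketch, assuming the document’s main XML part is available as a string (for a .docx file, that’s word/document.xml inside the zip) and that the ‘cr’ seen in the XML is the WordprocessingML `w:cr` element. The function name `count_returns` and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by document.xml inside a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_returns(document_xml: str) -> dict:
    """Count real paragraphs (w:p) and carriage-return elements (w:cr)."""
    root = ET.fromstring(document_xml)
    return {
        "paragraphs": len(list(root.iter(f"{{{W_NS}}}p"))),
        "carriage_returns": len(list(root.iter(f"{{{W_NS}}}cr"))),
    }


# Hypothetical sample: one paragraph holding two lines split by w:cr,
# followed by a genuine second paragraph.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>line one</w:t><w:cr/><w:t>line two</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>real paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

if __name__ == "__main__":
    print(count_returns(SAMPLE))
```

A non-zero carriage return count after a clean-up pass would suggest that some of the visible pilcrows are not real paragraph breaks.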

I know that ^13 is often required when you’re doing a find/replace with wildcards as ^p doesn’t work (or at least, it doesn’t work under most circumstances).

I did the test again, this time replacing ^l (soft line return) with ^p (paragraph mark, or hard return), and the paragraph formatting changed to what it should be for each paragraph.

I did a final test. I added in some soft line returns, replaced ^l with ^13, then to see if ^p would work, I replaced ^13 with ^p and it worked! Those pesky carriage returns created from the ^l to ^13 replace changed to hard line returns when I replaced them with ^p.

Now I’m not sure how all this would work in the macro my reader had created, but at least they now have something they know will work.

All this testing was just in the normal Find/Replace dialog — no special settings at all.

Interestingly, Wikipedia says that ‘In ASCII and Unicode, the carriage return is defined as 13’ (http://en.wikipedia.org/wiki/Carriage_return) but says nothing about the code for paragraphs/hard line returns.
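The code numbers are easy to check for yourself. A small sketch: the carriage return is code 13 (the number behind ^13) and the line feed is code 10. Unicode also defines a separate PARAGRAPH SEPARATOR character (U+2029). Whether Word uses that character anywhere is not something this sketch shows. The helper name `describe` is made up for illustration.

```python
import unicodedata

CARRIAGE_RETURN = "\r"          # code 13, the number behind ^13
LINE_FEED = "\n"                # code 10
PARAGRAPH_SEPARATOR = "\u2029"  # Unicode's dedicated paragraph character


def describe(char: str) -> tuple:
    """Return (decimal code, Unicode general category) for one character."""
    return ord(char), unicodedata.category(char)


if __name__ == "__main__":
    for label, char in [
        ("CR", CARRIAGE_RETURN),
        ("LF", LINE_FEED),
        ("PS", PARAGRAPH_SEPARATOR),
    ]:
        print(label, describe(char))
```

The category `Cc` marks a control character, while `Zp` is the category Unicode reserves for the paragraph separator.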

More on line feeds, carriage returns, end of paragraph marks etc.:

[Link last checked December 2020]


More about creating eBooks from Word documents

August 27, 2012

At the end of June I spent a few hours investigating what cheap/free new tools/processes were on the market for converting complex Word documents into eBooks for reading on tablets, smartphones etc. My last foray into this area was back in February 2012.

The reason I did this was to try to stay a little ahead of the game with my main client, who has massive amounts of documentation in Word and PDF and who may well ask me about eBook options in the future. I also wanted to see for myself whether the claims of some tools were as good as they seemed to be. And I wanted to test the different tools on one of my client’s documents, not on some generic (simple) Word document created purely for testing purposes.

Bottom line: As at the date of this blog post, I believe that PDF is the best option if you have complex Word documents that you want to be read on a small form factor device. Why? PDFs will resize on phones, they preserve Word’s formatting, and PDF is ubiquitous in that you can create a PDF directly from Word 2007 and later without needing to have Adobe Acrobat. They are also incredibly quick and easy to create and require no specialized knowledge to get a decent result.

So how did I reach this bottom line?

First, I’ll discuss some of the features of the Word document that I used as my test document. Then I’ll talk about some of the tools I tested (and didn’t test — and why) and the results I achieved. Along the way I’ll talk about the time taken and the knowledge required to use the tools, which are very important factors to consider when you have a large document set to convert.

My complex test document

The Word documents I work on for my main client are complex. For example, each document can have any or all of these:

  • several hundred pages long
  • up to 100 section breaks for portrait/landscape and A3 page orientations and sizes, and they have odd/even pages for each section
  • headers and footers, which are different for odd/even pages
  • many tables, some of which can be quite complex with merged cells, sideways text, landscape orientation, small fonts, etc.
  • many figures (photos, charts, diagrams, maps, including landscape and A3 size figures)
  • track changes are turned on from one revision to the next
  • multiple authors or reviewers who add their track changes and comments
  • hundreds of comments from reviewers
  • document automation (fields and bookmarks), which populates the headers and footers and some of the front matter (four pages of front matter is common)
  • one or more appendices
  • outline numbering, which is used for all headings including appendices; typically three or four heading levels are used, though five is not unheard of
  • macros that enable the automation of some regular tasks
  • potentially hundreds of automated captions (figures and tables) and cross-references (figures, tables, appendices, and sections)
  • footnotes
  • many bulleted lists, typically to two levels, though sometimes three
  • three-level table of contents (which also includes the first Appendix heading level), list of tables, and list of figures
  • terminology, abbreviations list
  • in-text citations and a references list (bibliography) for those cited works
  • and so on…

So these are not your typical Word documents that many tools are designed to convert (i.e. novels, short documents, documents with just a few headings etc.).

My earlier testing on these sorts of Word documents showed that the areas that the tools have the most trouble with are:

  • bulleted lists
  • tables
  • fonts
  • figures
  • justification and paragraph indents

Tools tested — and not tested

I wanted to test readily available and cheap/free conversion tools only. These tools needed to be simple enough for an author or admin assistant to use and seamless enough to not require specialist technical knowledge of HTML, CSS, XML, etc. They also needed to be quick to use and not require hours of pre- or post-tweaking of the input or output documents.

I did not want to test tools that are the domain of specialist technical writers — ordinary staff need to be able to use these conversion tools and whether I like it or not, Word is still my client’s authoring tool. Putting these Word documents into another tool is NOT an option — the authors must be able to update the documents easily and the authors all know how to use Word (with varying degrees of competency…). The prime job of these authors is to be health, safety, and environmental experts, NOT writers. Report writing is just one of their functions.

Therefore, I didn’t consider testing specialist tools such as ComponentOne’s Doc-to-Help (I believe they have an EPUB output option from Word), Adobe’s RoboHelp (also has EPUB and Kindle publishing options but I suspect only from content that’s in RoboHelp, not from Word), Adobe’s InDesign (again, content has to be imported/fixed up in InDesign first), and MadCap’s Flare (like RoboHelp, it has an EPUB publishing output option, but the content must first be in Flare, not Word).

I’d already tested Sigil and Calibre back in February, and used those tools (particularly Calibre) in my recent testing. The tools I tested this time were ePubMaker (Epingsoft; US$59) and epub-builder (Wondershare MePub; US$39.95; no longer available). I had intended testing another one I’d heard about earlier in the year on one of my email discussion lists, but it was no longer available on their web page.

Test results

The results of my testing these two products were a mixed bag, and NONE of the outputs was suitable for immediate use. Fonts and indentations were all over the place, headers/footers, footnotes, outline numbering, automated captions/cross-references etc. either weren’t preserved or didn’t work consistently, tables were awful, and the documents could not be resized on my smartphone.

I uninstalled both products after the tests as neither was able to deal with my test document satisfactorily.

In both cases either a lot of tweaking to the Word document was required before publishing it to EPUB (e.g. in Word or through a clean-up tool like Word Cleaner), or after (e.g. via Calibre). And even then, the results left a lot to be desired.

Neither tool offered a seamless conversion from one format to the other. For converting one or a few documents, these tools may offer a good solution for some people, but I could never consider them in their present form for a task such as converting complex documents (and possibly thousands of these documents). Like other converters I’ve tested, if you have a simple Word document, then these sorts of converters have their place and may suit your purpose.

However, for a large-scale conversion, the amount of pre- or post-tweaking to get the EPUB result to come even close to looking like the source Word document is just not feasible. In my client’s case, the Word document must be preserved and both the initial and any future conversion needs to be quick and easy to do as updates are made to these documents regularly.

The light at the end of the tunnel…

After being quite disillusioned by the conversion options I tried, I decided on one final option — to PDF the test Word document using both Acrobat and the PDF converter in Word 2010 and put the resulting PDFs on my phone. I didn’t fiddle with the PDF settings in Acrobat. Guess what? Both PDFs worked really well. Fonts, indentations, justifications, bullets, tables, figures, headers/footers, footnotes, section breaks for different sized and oriented pages etc. were ALL preserved and just looked like they did in the Word document. And once the PDF was on my phone, I could use my fingers to ‘pinch’ it to zoom in and out on the document and everything resized accordingly.

The best thing about PDF? The authors and the admin assistants can all use it as it’s just an output option in Word. It’s a no-brainer to create and takes just seconds (well, perhaps a minute or two for the really long docs). It’s quick, easy, and preserves the look of the document. It doesn’t require knowledge of all the various parts of an EPUB document, XML, CSS, HTML etc. and it doesn’t require hours of tweaking to convert just one document, only to do it all over again when the doc has to be updated.

So the bottom line for my client, in the absence of totally changing the way they create documents, would be that PDF is the best option at this stage.

Of course, that may change in the future as the industry matures and as tools become more widely available and sophisticated and require less technical knowledge on the part of the person doing the conversion (just like WordPress and the like have negated the need for the average person to know HTML to create a website).

But for now, if they ask me, I’ll be recommending PDF.


Word: Styles pane completely blank

August 16, 2012

I got a document from one of my authors yesterday. It was a mess in all sorts of ways. I applied the correct template and set the styles to update automatically (then turned that off again almost immediately). But no matter what I did, I couldn’t see any styles listed in the Styles pane (Word 2007), or in the drop-down box of styles I have on the Quick Access Toolbar. The styles showed in the big icons in the Styles group on the Home tab and I could apply, modify them etc. from there. But I like using the Styles pane. I was baffled.

When I opened the document this morning, the Styles pane was still blank. I clicked the Options link at the bottom of the blank Styles pane, but that just said that All Styles were set to show, despite none showing at all.

I tried saving the doc as an RTF doc, then resaving it back to a DOCX doc. That didn’t work. So I went back to that Options link.

This time I selected Recommended as the styles to show, and suddenly the styles all displayed as they should.

I have no idea why none of the styles would show when All Styles was selected, but changing it to Recommended fixed the problem for me.

Strange.


Word: Blank pages when document is printed

August 14, 2012

If you have a long document with many section breaks (especially for odd/even pages and landscape/portrait orientations) it will look fine on screen and as a PDF. But when you print that document onto paper, you may find that you end up with several blank pages (no headers/footers, just blank) scattered throughout the printout. Why?

Well, the answer is that this is ‘by design’. For example:

  • If you set a section break to start on an odd-numbered page AND the text in the previous section finished on an odd page, then the even page between those two odd pages will be blank and the two odd pages will be numbered as odd numbers (the blank even page won’t get a page number printed on it, but it’s still counted as a page). This is fine and how you expect it to be if you’re printing double-sided but it seems strange to find blank pages scattered throughout when you print on just one side of the paper. (BTW, starting a new chapter or major section on an odd page is a print publishing convention that has likely been around for centuries. Check some printed books on your shelves to see the evidence of this convention — you’ll typically find every new chapter starting on an odd [recto/right-hand] page.)
  • If you have a landscape section in amongst your portrait pages, and there’s only enough content to fit on a single landscape page, then the back of that page will print as a blank page. Again, if you’re printing double-sided, this is what you want to happen, but it’s disconcerting when you’re printing single-sided. A portrait page will NOT print on the back of a landscape page if they are in two sections with different page orientations.
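The odd/even rule in the first bullet can be sketched in a few lines. This is only a simplified model of the behaviour described above, not how Word itself is implemented: it assumes page numbering runs continuously from 1, it ignores the landscape/portrait case, and the function name `printed_pages` is made up for illustration.

```python
def printed_pages(sections):
    """Lay out sections in print order, adding blanks for Odd/Even Page breaks.

    sections: list of (page_count, break_type), where break_type is
    "next", "odd" or "even" (the break that starts the section).
    Returns one label per printed page: "blank" or "section N".
    """
    pages = []
    for number, (page_count, break_type) in enumerate(sections, start=1):
        next_page = len(pages) + 1
        # An Odd Page break cannot start on an even page, and vice versa,
        # so a blank page is slipped in to fix the parity.
        needs_blank = (
            (break_type == "odd" and next_page % 2 == 0)
            or (break_type == "even" and next_page % 2 == 1)
        )
        if needs_blank:
            pages.append("blank")
        pages.extend([f"section {number}"] * page_count)
    return pages


if __name__ == "__main__":
    # A three-page section followed by a section that must start on an odd page.
    for page_number, label in enumerate(printed_pages([(3, "next"), (2, "odd")]), 1):
        print(page_number, label)
```

In this example the first section ends on page 3 (odd), so page 4 comes out blank and the second section starts on page 5.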

This convention has been part of Word since forever, and is a carry-over from the book printing industry.

You won’t see these ‘blanks’ on screen when you’re viewing the Word document, or in a PDF created from that Word document.

Here’s some further reading/explanation for this blank page:

  • http://answers.microsoft.com/en-us/office/forum/office_2007-word/how-can-i-remove-the-blank-page-after-section/a705c115-7cc8-4172-89cc-41cef30fad35: “If you are using Odd/Even pages, Word will always start a new section on the right hand (odd) page which is convention.” (Sept 2, 2010 comment from Terry Farrell, Microsoft Word MVP)
  • http://windowssecrets.com/forums/showthread.php/140608-Word-2007-extra-blank-pages-New-page-Section-break-with-Odd-Even-pages: “That’s normal and always have been so. I wouldn’t want it any other way either. Odd pages always start on the right. Once you select Different Odd/Even H&Fs in Page Layout, Odd will always be on the right and Word will always slip in a blank even page when necessary.” (comment by TerFar (Terry Farrell?), 29 August 2011)
  • http://support.microsoft.com/kb/85392: “In Microsoft Word, if the insertion point is positioned on an odd page and you insert an odd-page section break, a blank page will be generated after the section break…. This behavior occurs because Word for Windows cannot position two odd or two even pages in a row; therefore, it inserts an even or odd page between the pages. NOTE: These blank pages do not contain headers or footers. … If you insert an odd-page or even-page section break when the insertion point is positioned on an odd/even page, respectively, the resulting blank page will not contain a header or footer. Page number information about the blank page will be displayed in the status bar, and the page will appear as a blank page in print preview; however, you cannot position the insertion point on the blank page in the document.”
  • http://wordfaqs.mvps.org/blankpage.htm: “A Continuous section break does not cause a page break. A Next Page break causes the following text to start a new page. An Odd Page break causes the following text to start a new odd page. If the text before the break ends on an odd page, Word will insert a blank even page between the two odd pages. This page is completely invisible to the user (except in Print Preview with facing pages displayed) but will be “printed” by the printer. Similarly, an Even Page break may cause Word to insert a blank odd page. … Sometimes a Next Page break will be converted to an Odd Page break. This frequently happens when a landscape page appears on the back of a portrait one, or vice versa. The reason for this is that most printers really don’t like to print landscape; rotating text and graphics is apparently a more complex operation for them. It seems to be especially difficult for them to duplex (print both sides of) pages with different orientations. Word accommodates this reluctance by changing Next Page breaks to Odd Page so the printer can print the pages on separate sheets.”
  • http://word.tips.net/T001870_Automatic_Blank_Pages_at_the_End_of_a_Section.html: “Two of the section break types result in the addition of blank pages to the document, if necessary. For instance, if you use an Odd Page section break, and the previous section ends on an odd page, then Word automatically inserts a blank even page so that the next section can start on the next odd page. … The problem with this is that Word inserts an absolutely blank page—it doesn’t even print headers or footers on the page.” (Susan Barnhill, Microsoft Word MVP)

[Links last checked August 2012]


Word: Updating fields takes forever

August 10, 2012

Here’s a curly one. I had a client’s Word 2007 document to work on and the fields took FOREVER to update. In fact, I wasn’t even sure if they *did* update as I killed Word after an hour and a half of watching the ‘spinning wheel of death’… The author had had the same issue — it took more than an hour for the fields to update for him too. Neither of us had any error messages.

I tried a few things (listed below), and eventually I found the culprit: a linked Excel chart that couldn’t be found. Perhaps it had been deleted from the server, or moved, or renamed; any of these meant that it wasn’t where the link said it should be. Therefore, when Word tried to update the fields, it went looking for something that didn’t exist and seemed to get stuck in an endless loop of looking and looking.

Once I’d taken a screen capture of that chart (with the author’s permission) and inserted that into the document instead, the fields updated in just a few minutes, as they normally do for these client documents.

Here are some of the things I tried to eventually track down the offending object:

  • Turned on all field codes then skipped to each one to see if there was one that looked out of place. Nope — they all looked fine.
  • Right-clicked on each figure to see if it was a linked Visio diagram or Excel chart instead of an image. Again, all looked fine (including the nasty Excel chart — there was nothing to alert me that it was linked!).

Here’s how I found the culprit:

I copied the document to my PC with Word 2010 on it to see if that made any difference. Well, that was what alerted me to the offending object. When I tried to update the fields in Word 2010, I immediately got a message that a link couldn’t be found. I did NOT get that message in Word 2007.

This was an *excellent* error message — it told me the type of object that it couldn’t find (‘Chart’) and it told me where to go to see the links. Thanks Microsoft! (although it would be an even better message if there was a button to take you straight to that location…)

I went to that location (File > Info > Edit Links to Files at the bottom of the far right panel in Word 2010; it’s harder to find in Word 2007, where it’s under the Microsoft Office button > Prepare > Edit Links to Files).

Scrolling through the list of links in the Links window showed one that wasn’t like all the rest. The info in the Source File column was different, there was nothing in the Item column, and the Type was ‘Chart’. It came after all the _Ref links (internal cross-references in the document) and before the bookmarks/fields that we use for the document properties, which meant it was very likely at the very end of the document (it was the only chart in the final appendix). And when I clicked on the suspect item in the Links window, the full source file information showed that it was an Excel file located on one of my client’s servers. I have access to that server, so the only explanation I can come up with is that the file had been moved, renamed, or deleted since it was originally inserted in the Word document.
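For a .docx file, a similar list of links can be pulled out without opening Word. This is a rough sketch, assuming that a linked object is recorded as a relationship marked `TargetMode="External"` in word/_rels/document.xml.rels (hyperlinks are recorded the same way, so they are filtered out). The function name `external_links`, the sample relationships and the file path in them are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the relationships part inside an Office Open XML package.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def external_links(rels_xml: str) -> list:
    """Return (id, kind, target) for each external, non-hyperlink relationship."""
    found = []
    for rel in ET.fromstring(rels_xml).iter(f"{{{REL_NS}}}Relationship"):
        # The last segment of the Type URI names the kind of relationship.
        kind = rel.get("Type", "").rsplit("/", 1)[-1]
        if rel.get("TargetMode") == "External" and kind != "hyperlink":
            found.append((rel.get("Id"), kind, rel.get("Target")))
    return found


# Hypothetical sample with one internal part, one hyperlink and one linked file.
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
SAMPLE_RELS = (
    f'<Relationships xmlns="{REL_NS}">'
    f'<Relationship Id="rId1" Type="{OFFICE_REL}/styles" Target="styles.xml"/>'
    f'<Relationship Id="rId5" Type="{OFFICE_REL}/hyperlink" '
    'Target="http://example.com/" TargetMode="External"/>'
    f'<Relationship Id="rId7" Type="{OFFICE_REL}/oleObject" '
    'Target="file:///S:/reports/chart.xlsx" TargetMode="External"/>'
    "</Relationships>"
)

if __name__ == "__main__":
    for link in external_links(SAMPLE_RELS):
        print(link)
```

To run it against a real file, the relationships part can be read with the standard `zipfile` module, for example `zipfile.ZipFile("report.docx").read("word/_rels/document.xml.rels").decode("utf-8")`, where `report.docx` is a placeholder name. Any target that points to a file that no longer exists is a candidate for the kind of slow field update described here.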

I removed the chart from the appendix and ran the update fields again in Word 2010 — this time it was done in seconds and with no errors.

Back in Word 2007, I replaced the chart with an image of it and updated the fields again. This time it took a few minutes (which is usual for these documents as they have hundreds of fields), not well over an hour.

Problem solved! And the author was very happy too ;-)


Exchange Server sync with phone not working

August 6, 2012

A week or so ago I upgraded the operating system for my Android phone. Everything seemed to be working fine. A few days later I drove to the city to my client’s offices. I turned off WiFi as I left home and turned on Bluetooth. When I got to the offices, I couldn’t get my mail via Exchange Server on the 3G connection. I didn’t worry too much about it as I had meetings to attend and a lot of running around to do, but when I got home I turned off Bluetooth, and turned the WiFi back on. I still couldn’t get my mail via Exchange Server. I could get internet access without any issue, but I continued to get ‘Connection socket timeout’ errors whenever I tried to sync my mail, contacts or calendar from Exchange Server.

I started to point the blame at the operating system upgrade and even called HTC. They said to restart the phone (I’d already tried that), then to power off fully and power back on (I’d tried that too), then remove the battery for about 5 minutes (yep, I had already tried that too). Their next suggestion was to delete my Exchange Server account on the phone (after writing down all the settings!) and re-enter the account. If that didn’t work, they suggested that I turn off the router/WiFi and then turn it on again, and if that failed, then I was to call them back to do a factory reset (in which case I’d lose ALL my settings, apps etc.).

I was loath to delete the Exchange Server settings off my phone (let alone do a factory reset) as it had worked fine a few days before (I was still blaming the OS upgrade), so I called my saviours at PC Guru to see if there was anything I had to set/reset on the server before I deleted the account and tried reconnecting. After a short discussion with me, my Guru asked if I could remote in from any computer to my server (e.g. via Outlook Web Access). I was working within my network, so I couldn’t test that. So he tried to access my Outlook Web Access from his PC but couldn’t get a connection. He couldn’t get any of my other remote working options to work either, which would also explain why my phone couldn’t connect to my Exchange Server. He thought a service on the server might need to be restarted, so he remoted in to my server and restarted the IIS web service.

Once he’d done that, he could see the login screen for my remote connection options, so he called me back and asked me to try syncing with Exchange Server again. It worked!

I had been unfairly blaming the Android upgrade, when it was purely coincidental that it happened about the same time the IIS service went down. Actually, the IIS service might have stopped some time before that as I usually use the internal WiFi to connect via my phone and rarely check from outside (I work from home).
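A quick way to tell a phone problem from a server problem is to test the server’s web port from a computer outside the network. Here’s a minimal sketch, assuming the remote access options are served over a TCP port such as 443. The function name `can_connect` is made up for illustration, and it only shows whether something is listening, not whether Exchange itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and socket timeouts.
        return False
```

For example, `can_connect("mail.example.com", 443)` (a placeholder host name) returning False from outside the network would point at the server or router rather than the phone.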

Lessons learned:

  • Sometimes events that you think are related are purely coincidental.
  • There’s a web service (IIS) on the server that I should check and restart before I ever consider deleting my Exchange Server account on my phone and setting it up again.
  • PC Guru are awesome! Again!

[Links last checked August 2012]


Milestone! Two million views

August 3, 2012

Sometime between 5pm 2 August 2012 (my time) and 7am 3 August 2012, this blog hit two million views!

In September 2011, this blog cracked the one million views mark since I started blogging in 2008. As I said when the total views passed half a million sometime in 2010, I can’t even comprehend that number.

So to have a further one million views in less than 12 months is mind-boggling! I must be doing something right ;-)

My weekday average number of views is between 4000 and 5000, and my monthly average is more than 100,000 views. I’ll do a full analysis at the end of this year, as I have done in previous years (see below for links).

However, despite the number of views and the overwhelmingly positive comments (‘OMG! You saved my LIFE!’), I make no more than ‘pin money’ from the donate button at the end of many posts (https://cybertext.wordpress.com/2011/05/11/if-only/) and I pay WordPress to NOT have random ads on this blog (my choice).

I mainly write this blog for me. I can’t possibly remember all the things I learn, so when I write instructions they are primarily to back up my brain ;-) Think of this blog as a dumping ground for all those random bits and pieces I learn each day. And yes, I definitely use my own blog to ‘remember’ how to do something I may have only done once or twice before.

The fact that others (all two million others!) find it useful too is humbling.

See also:

[Links last checked August 2012]