Occasional Notes

Posted : admin On 7/14/2022

We’re using Apache POI to manipulate the content of some Word documents. There are other ways to do it, but, on the whole, Apache POI works reasonably well for a nominally free solution. We’ve hit a use case that can be summarised by a simple question: does this Word document contain a (Word-generated) table of contents (TOC)? You would think that that is a reasonably uncontroversial question, perhaps even one commonly asked. Apparently it is not.

Background

The background here is that I know nothing about TOC generation in Word beyond what I’ve been able to deduce from examining Word’s behaviour and trawling the content of word/document.xml. I gather that Word inserts a processing instruction of some kind, but also renders static content into the file—that is, there’s a marker saying “there is a TOC in this document”, but the TOC content itself is also rendered. It seems that instead of dynamically generating the TOC content (say, every time the document is changed), Word instead generates it once, and then it is only updated on a manual re-generation. So the problem we’re facing is:

  • A document has a TOC.
  • We make changes to the body content: say, removing an entire section.
  • The TOC is now stale, and instead of automatically refreshing it, Word inserts error messages at print time.

A basically satisfactory workaround in our case is to call enforceUpdateFields() on the document prior to save, which signals Word to show a dialog on next load asking whether the fields in the document (the TOC included) should be updated.
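A minimal sketch of that call, assuming the document is already loaded as an XWPFDocument. The class and method names (RefreshFieldsOnOpen, saveWithFieldRefresh) are made up for illustration; only enforceUpdateFields() and write() are POI calls.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class RefreshFieldsOnOpen {

    /**
     * Flags the document so that Word offers to update its fields
     * (including any TOC) the next time it is opened, then saves it.
     * Hypothetical helper: the name and signature are illustrative only.
     */
    public static void saveWithFieldRefresh(XWPFDocument document, OutputStream out)
            throws IOException {
        // Sets the "update fields on open" flag in the document settings.
        document.enforceUpdateFields();
        // Serialises the whole package, settings included, to the stream.
        document.write(out);
    }
}
```

The flag only asks Word to refresh on open; POI itself does not recompute the TOC entries, so the stale text stays in the file until a user accepts the prompt.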

Again, this isn’t ideal, but it is satisfactory.

Solution

Apache POI doesn’t expose anything useful in its high-level API for detecting an existing TOC. After an exhaustive Google search, and quite a bit of digging around in the lower-level class hierarchies, it wasn’t obvious that we could solve this at any level using Java alone.

Inspecting word/document.xml suggested that an instruction element (strictly a field instruction rather than an XML processing instruction) that looked something like this was present in all documents containing TOCs:

<w:instrText xml:space='preserve'> TOC \o "1-3" \h \z \u </w:instrText>

How about if we get the XML for the document and search for such an element? If we call getDocument() on the XWPFDocument, we get a CTDocument1 which implements XmlObject and provides a selectPath() method to select nodes via an XPath expression. (If you’re curious, it took a couple of hours of trial and error to be able to come up with the facts in the preceding sentence!) Firstly, add XMLBeans and Saxon to your POM:
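A sketch of the relevant dependencies, assuming a Maven build. The coordinates are the published ones for POI's OOXML module, XMLBeans and Saxon-HE. The version properties are placeholders, not the versions used in the original project, and need to be set to a combination that your POI release is known to work with.

```xml
<!-- Versions are left as properties on purpose: check the dependency list
     of your POI release for a compatible XMLBeans and Saxon-HE pairing. -->
<dependencies>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>${poi.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.xmlbeans</groupId>
    <artifactId>xmlbeans</artifactId>
    <version>${xmlbeans.version}</version>
  </dependency>
  <dependency>
    <groupId>net.sf.saxon</groupId>
    <artifactId>Saxon-HE</artifactId>
    <version>${saxon.version}</version>
  </dependency>
</dependencies>
```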

(Again, that excerpt represents an hour of fun trying to assemble mutually compatible versions of POI, XMLBeans and Saxon, as well as answering the question “Do we also need xmlbeans-xpath?” Spoiler: we don’t.) Then, with an XWPFDocument called document, find any w:instrText elements, where w is a namespace which we’ll also define, and see if any of them contain a magic string:
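A minimal sketch of that search, under a few assumptions. The class name TocDetector, the method hasToc and the choice of "TOC" as the magic string are illustrative guesses, not the original code. The namespace URI is the standard WordprocessingML one.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.xmlbeans.SimpleValue;
import org.apache.xmlbeans.XmlObject;

public class TocDetector {

    // Binds the "w" prefix used in the XPath expression below.
    private static final String DECLARE_W =
        "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' ";

    // Assumed magic string: the field name that starts a TOC instruction.
    private static final String MAGIC = "TOC";

    /** Returns true if any w:instrText element holds a TOC field instruction. */
    public static boolean hasToc(XWPFDocument document) {
        // getDocument() returns the CTDocument1 root, which is an XmlObject.
        XmlObject[] instructions =
            document.getDocument().selectPath(DECLARE_W + ".//w:instrText");
        for (XmlObject instruction : instructions) {
            if (instruction instanceof SimpleValue) {
                String text = ((SimpleValue) instruction).getStringValue();
                // Instructions are padded with spaces, e.g. " TOC \o ... ".
                if (text != null && text.trim().startsWith(MAGIC)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With a loaded document this is just `boolean tocPresent = TocDetector.hasToc(document);`. The sketch only looks at w:instrText, so a field stored in the simple form (a w:fldSimple element with the instruction in its w:instr attribute) would not be detected.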

So, it’s brute force and depends on a magic string, but it seems to work. Better solutions gladly accepted!