HTML5 And The Document Outlining Algorithm

About The Author

Derek builds websites at WebsiteNI and helps curate the HTML5 Gallery, where he gets particularly hung up about document outlines.

By now, we all know that we should be using HTML5 to build websites. The discussion now is moving on to how to use HTML5 correctly. One important part of HTML5 that is still not widely understood is sectioning content: section, article, aside and nav. To understand sectioning content, we need to grasp the document outlining algorithm.

Understanding the document outlining algorithm can be a challenge, but the rewards are well worth it. No longer will you agonize over whether to use a section or div element — you will know straight away. Moreover, you will know why these elements are used, and this knowledge of semantics is the biggest benefit of learning how the algorithm works.

What Is The Document Outlining Algorithm?

The document outlining algorithm is a mechanism for producing outline summaries of Web pages based on how they are marked up. Every Web page has an outline, and checking it is easy using a really simple free online tool, which we’ll cover shortly.

So, let’s start with a sample outline. Imagine you have built a website for a horse breeder, and he wants a page to advertise horses that he is selling. The structure of the page might look something like this:


1. Horses for sale
   1. Mares
      1. Pink Diva
      2. Ring a Rosies
      3. Chelsea’s Fancy
   2. Stallions
      1. Korah’s Fury
      2. Sea Pioneer
      3. Brown Biscuit
Figure 1: How a page about horses for sale might be structured.

That’s all it is: a nice, clean, easy-to-follow list of headings, displayed in a hierarchy — much like a table of contents.

To make things even simpler, only two things in your mark-up affect the outline of a Web page:

1. Heading content (h1 to h6)
2. Sectioning content (section, article, aside and nav)

Obviously, the sectioning of content is the new HTML5 way to create outlines. But before we get into that, let’s go back to HTML 101 and review how we should all be using headings.

Creating Outlines With Heading Content

To create a structure for the horses page outlined in figure 1, we could use mark-up like the following:

<div>             
   <h1>Horses for sale</h1>

   <h2>Mares</h2>

   <h3>Pink Diva</h3>
   <p>Pink Diva has given birth to three Grand National winners.</p>

   <h3>Ring a Rosies</h3>
   <p>Ring a Rosies has won the Derby three times.</p>

   <h3>Chelsea’s Fancy</h3>
   <p>Chelsea’s Fancy has given birth to three Gold Cup winners.</p>

   <h2>Stallions</h2>

   <h3>Korah’s Fury</h3>
   <p>Korah’s Fury has fathered three champion race horses.</p>

   <h3>Sea Pioneer</h3>
   <p>Sea Pioneer has won The Oaks three times.</p>

   <h3>Brown Biscuit</h3>
   <p>Brown Biscuit has fathered nothing of any note.</p>

   <p>All our horses come with full paperwork and a family tree.</p>
</div>
Figure 2: Our “Horses for sale” page, marked up using headings.

It’s as simple as that. The outline in figure 1 is created by the levels of the headings.

Just so you know that I’m not making this up, copy and paste the code above into Geoffrey Sneddon’s excellent outlining tool. Click the big “Outline this” button, et voilà!

An outline created with heading content this way is said to consist of implicit, or implied, sections. Each heading creates its own implicit section, and any subsequent heading of a lower level starts an implicit sub-section nested within it.

An implicit section is ended by a heading of the same level or higher. In our example, the “Mares” section is ended by the beginning of the “Stallions” section, and each section that contains details of an individual horse is ended by the beginning of the next one.

Figure 3 below is an example of an implicit section that ends with a heading of the same level. And figure 4 is an implicit section that ends with a heading of a higher level.

<h3>Sea Pioneer</h3><!-- start of implicit section -->
<p>Sea Pioneer has won The Oaks three times.</p>

<h3>Brown Biscuit</h3><!-- This heading starts a new implicit section,
so the previous Sea Pioneer section is closed -->
Figure 3: An implicit section being closed by a heading of the same level.

<h3>Chelsea’s Fancy</h3><!-- start of implicit section -->
<p>Chelsea’s Fancy has given birth to 3 Gold Cup winners.</p>

<h2>Stallions</h2><!-- this heading starts a new implicit section
using a higher level heading, so Chelsea’s Fancy is now closed -->
Figure 4: An implicit section being closed by a heading of a higher level.

Creating Outlines With Sectioning Content

Now that we know how heading content works in creating an outline, let’s mark up our horses page using some new HTML5 structural elements:

<div>
   <h6>Horses for sale</h6>

   <section>
      <h1>Mares</h1>

      <article>
         <h1>Pink Diva</h1>
         <p>Pink Diva has given birth to three Grand National winners.</p>
      </article>

      <article>
         <h5>Ring a Rosies</h5>
         <p>Ring a Rosies has won the Derby three times.</p>
      </article>

      <article>
         <h2>Chelsea’s Fancy</h2>
         <p>Chelsea’s Fancy has given birth to three Gold Cup winners.</p>
      </article>
   </section>

   <section>
      <h6>Stallions</h6>

      <article>
         <h3>Korah’s Fury</h3>
         <p>Korah’s Fury has fathered three champion race horses.</p>
      </article>

      <article>
         <h3>Sea Pioneer</h3>
         <p>Sea Pioneer has won The Oaks three times.</p>
      </article>

      <article>
         <h1>Brown Biscuit</h1>
         <p>Brown Biscuit has fathered nothing of any note.</p>
      </article>          
   </section>

   <p>All our horses come with full paperwork and a family tree.</p>
</div>
Figure 5: The horses page, marked up with some new HTML5 structural elements.

Now, I know what you’re thinking, but I haven’t taken leave of my senses with these crazy headings. I am making a very important point, which is that the outline is created by the sectioning content, not the headings.

Go ahead and copy and paste that code into the outliner, and you will see that the heading levels have absolutely no effect on the outline where sectioning content is used.

The section, article, aside and nav elements are what create the outline, and this time the sections are called explicit sections.

One of the most talked about features of HTML5 is that multiple h1 elements are allowed, and this is why. It’s not an open invitation to mark up every heading on the page as h1; rather, it’s an acknowledgement that where sectioning content is used, it creates the outline, and that each explicit section has its own heading structure.

The part of the HTML5 spec that deals with headings and sections makes this clear:

“Sections may contain headings of any rank, but authors are strongly encouraged to either use only h1 elements, or to use elements of the appropriate rank for the section’s nesting level.”
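To make the first of those two options concrete, here is a minimal sketch of a cut-down horses page that uses only h1 elements. It illustrates the style the spec describes and is not a recommendation; the body wrapper is included only to show where the sectioning root sits.

```html
<body>
   <!-- Heading of the body sectioning root: top of the outline -->
   <h1>Horses for sale</h1>

   <section>
      <!-- An h1 again, but nested one level down by the section element -->
      <h1>Mares</h1>

      <article>
         <!-- Nested two levels down by section and article -->
         <h1>Pink Diva</h1>
         <p>Pink Diva has given birth to three Grand National winners.</p>
      </article>
   </section>
</body>
```

Under the outlining algorithm, the nesting of the section and article elements alone places “Mares” beneath “Horses for sale” and “Pink Diva” beneath “Mares,” even though all three headings are h1.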

Until browsers and, more critically, screen readers understand that sectioning content introduces a sub-section, using multiple h1 elements is less safe than using a heading structure that reflects the level of each heading in the document, as shown in figure 6 below. I would strongly advise the latter.

This means that user agents that haven’t implemented the outlining algorithm can use implicit sectioning, and those that have implemented it can effectively ignore the heading levels and use sectioning content to create the outline.

At the time of this writing, no browsers or screen readers have implemented the outlining algorithm, which is why we need third-party testing tools such as the outliner. The latest versions of Chrome and Firefox style h1 elements in nested sections differently, but that is very different from actually implementing the algorithm.

When most user agents finally do support it, using an h1 in every explicit section will be the preferred option. It will allow syndication tools to handle articles without needing to reformat any heading levels in the original content.

<div>
   <h1>Horses for sale</h1>

   <section>
      <h2>Mares</h2>

      <article>
         <h3>Pink Diva</h3>
         <p>Pink Diva has given birth to three Grand National winners.</p>
      </article>

      <article>
         <h3>Ring a Rosies</h3>
         <p>Ring a Rosies has won the Derby three times.</p>
      </article>

      <article>
         <h3>Chelsea’s Fancy</h3>
         <p>Chelsea’s Fancy has given birth to three Gold Cup winners.</p>
      </article>
   </section>

   <section>
      <h2>Stallions</h2>

      <article>
         <h3>Korah’s Fury</h3>
         <p>Korah’s Fury has fathered three champion race horses.</p>
      </article>

      <article>
         <h3>Sea Pioneer</h3>
         <p>Sea Pioneer has won The Oaks three times.</p>
      </article>

      <article>
         <h3>Brown Biscuit</h3>
         <p>Brown Biscuit has fathered nothing of any note.</p>
      </article>
   </section>

   <p>All our horses come with full paperwork and a family tree.</p>
</div>
Figure 6: Our horses page, marked up sensibly.

One other point worth noting here is the position of the paragraph “All our horses come with full paperwork and a family tree.” In the example that used headings to create the outline (figure 2), this paragraph is part of the implicit section created by the “Brown Biscuit” heading. Human readers will clearly see that this text applies to the whole document, not just Brown Biscuit.

Sectioning content solves this problem quite easily. Because the closing section tag explicitly ends the “Stallions” section, the paragraph moves back up to the top level, headed by “Horses for sale.”

Mixing It Up

So, what happens when implicit sections and explicit sections are combined? As long as you remember that implicit sections can go inside explicit sections, but not the other way round, you will be fine. For example, the following works well and is perfectly valid:

<h1>Horses for sale</h1>

   <section>
      <h2>Mares</h2>

      <h3>Pink Diva</h3>
      <p>Pink Diva has given birth to three Grand National winners.</p>

      <h3>Ring a Rosies</h3>
      <p>Ring a Rosies has won the Derby three times.</p>

      <h3>Chelsea’s Fancy</h3>
      <p>Chelsea’s Fancy has given birth to three Gold Cup winners.</p>
   </section>

And it creates a sensible hierarchical outline:

1. Horses for sale
   1. Mares
      1. Pink Diva
      2. Ring a Rosies
      3. Chelsea’s Fancy
Figure 7: Implicit sections created by headings inside an explicit section.

However, if you hope to achieve the same outline by nesting an explicit section inside an implicit section, it won’t work. The sectioning element will simply close the implicit section created by the heading and create a very different outline, as shown below:

<h1>Horses for sale</h1>

   <h2>Mares</h2>

   <article>
      <h3>Pink Diva</h3>
      <p>Pink Diva has given birth to three Grand National winners.</p>
   </article>

   <article>
      <h3>Ring a Rosies</h3>
      <p>Ring a Rosies has won the Derby three times.</p>
   </article>

   <article>
      <h3>Chelsea’s Fancy</h3>
      <p>Chelsea’s Fancy has given birth to three Gold Cup winners.</p>
   </article>

This would produce the following outline:

1. Horses for sale
   1. Mares
   2. Pink Diva
   3. Ring a Rosies
   4. Chelsea’s Fancy
Figure 8: Explicit sections can’t go inside implicit sections.

There is no way to make the explicit sections created by the article elements become sub-sections of the implicit “Mares” section.

You can use headings to split up the content of sectioning elements, but not the other way round.

Things To Watch Out For

Untitled Sections

Until now we haven’t really looked at nav and aside, but they work exactly the same as section and article. If you have secondary content that is generally related to your website — say, horse-training tips and industry news — you would mark it up as an aside, which creates an explicit section in the document outline. Similarly, major navigation would be marked up as nav, again creating an explicit section.

There is no requirement to use headings for aside and nav, so they can appear in the outline as untitled sections. Go ahead and try the following code in the outliner:

<nav>
   <ul>
      <li><a href="/">home</a></li>
      <li><a href="/about.html">about us</a></li>
      <li><a href="/horses.html">horses for sale</a></li>
   </ul>
</nav>

<h1>Horses for sale</h1>

<section>
   <h2>Mares</h2>
</section>

<section>
   <h2>Stallions</h2>
</section>
Figure 9: An untitled <nav>.

The nav appears as an untitled section. Now, this generally wouldn’t be a problem and is not considered bad HTML5 code, although in his recent HTML5 Doctor article on outlining, Mike Robinson recommends using headings for all sectioning content in order to increase accessibility.
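If you do want to give the nav a title, a minimal sketch might look like the following. The heading text “Site navigation” is purely illustrative, and this is one reading of the recommendation rather than code taken from that article.

```html
<nav>
   <!-- Hypothetical heading text; any short, descriptive label would do -->
   <h2>Site navigation</h2>
   <ul>
      <li><a href="/">home</a></li>
      <li><a href="/about.html">about us</a></li>
      <li><a href="/horses.html">horses for sale</a></li>
   </ul>
</nav>

<h1>Horses for sale</h1>
```

Run through the outliner, the nav section should now carry its heading instead of appearing as untitled.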

Untitled section and article elements, on the other hand, are generally to be avoided. In fact, if you’re unsure whether to use a section or article, a good rule of thumb is to see whether the content has a natural, logical heading. If it doesn’t, then you will more than likely be wiser to use a good old div.

Now, the spec doesn’t actually require section elements to have a title. It says:

“The section element represents a generic section of a document or application. A section, in this context, is a thematic grouping of content, typically with a heading.”

Your interpretation of this probably hinges on your understanding of the word “typically.” I take it to mean that you need a damn good reason not to use headings with section elements. I do not take it to mean that you can ignore it whenever you feel the urge to use a new HTML5 element.

Where the article element is specified, the spec goes even further by showing an example of blog comments marked up as untitled articles, so there are exceptions. However, if you see an untitled section or article in the outline, make sure you have a good reason for not giving it a title.

If you are unsure whether your untitled section is a nav, aside, section or article, a very handy Opera extension will let you know which type of sectioning content you have left untitled. The tool will also let you view the outline without leaving the page, which can be hugely beneficial when you’re debugging sections.

Sectioning Root

The eagle-eyed among you will have noticed that when I said that sectioning content cannot create a sub-section of an implicit section, there was an h1 (“Horses for sale”) not in sectioning content immediately followed by a section (“Mares”), and that the sectioning content did actually create a sub-section of the h1.

The reason for this is sectioning root. As the spec says, sectioning elements create sub-sections of their nearest ancestor sectioning root or sectioning content.

“Sectioning content elements are always considered subsections of their nearest ancestor sectioning root or their nearest ancestor element of sectioning content, whichever is nearest, regardless of what implied sections other headings may have created.”

The body element is sectioning root. So, if you paste the code from figure 7 into the outliner, the h1 would be the sectioning root heading, and the section element would be a sub-section of the body sectioning root.

The body element is not the only one that acts as sectioning root. There are five others:

1. blockquote
2. details
3. fieldset
4. figure
5. td

The status of these elements as sectioning root has two implications. First, each can have its own outline. Second, the outline of a nested sectioning root does not appear in, nor does it have any effect on, the outline of its parent sectioning root.

In practice, this means that headings inside any of the five sectioning root elements listed above do not affect the outline of the document that they are a part of.
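As a rough illustration, consider a quoted review inside the “Mares” section. The publication name and the quoted text are invented for this sketch.

```html
<h1>Horses for sale</h1>

<section>
   <h2>Mares</h2>

   <blockquote>
      <!-- blockquote is sectioning root, so this h1 belongs to the
           blockquote's own outline, not to the page's outline -->
      <h1>Racing Weekly review</h1>
      <p>Pink Diva is the finest brood mare in the county.</p>
   </blockquote>
</section>
```

The page outline here contains only “Horses for sale” and, beneath it, “Mares.” The h1 inside the blockquote does not appear in it, despite being the highest heading level on the page.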

The final thing (you’ll be glad to hear) that I’ll say about sectioning root is that the first heading in the document that is not inside sectioning content is considered to be the document title.

Try the following code in the outliner to see what happens:

<section>
   <h1>this is an h1</h1>
</section>

<h6>this h6 comes first in the source</h6>

<h1>this h1 comes last in the source</h1>
Figure 10: How heading levels at the root level affect the outline.

I won’t try to explain this to you because it will probably only confuse both of us, so I’ll let you play with it in the outliner. Hint: try using different heading levels for the implicit sections to see how the outline is affected; for example, h3 and h4, or two h5s.

Untitled Documents

If no heading is at the root level of the document (i.e. not inside sectioning content), then the document itself will be untitled. This is a pretty serious problem, and it can occur either through carelessness or, paradoxically, by thinking carefully about how sectioning content should be used.

Roger Johansson addresses this issue in his excellent article on document outlines and HTML5 and the follow-up article.

Johansson asks how a proper document outline is supposed to be created for a blog post or other news-type item using HTML5. If you subscribe to the belief that your logo or website name should not be in an h1 element, you could mark up your blog post along the lines of the following:

<body>
   <article>
      <h1>Blog post title</h1>

      <p>Blog post content</p>
   </article>
</body>

The document is untitled. Somewhat reluctantly, Johansson settles on marking up the website’s title in h1 and using another h1 to mark up the article’s title. This is a sensible solution and is backed up by the results of the WebAIM screen reader user survey, in which the majority of respondents stated a preference for two top-level headings in exactly this format.
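A minimal sketch of that two-h1 approach, reusing the blog post example above, might look like this. It is an illustration of the idea rather than Johansson’s own mark-up, and “Website name” is a placeholder.

```html
<body>
   <!-- Root-level heading: gives the document itself a title -->
   <h1>Website name</h1>

   <article>
      <!-- Heading of the explicit section created by article -->
      <h1>Blog post title</h1>

      <p>Blog post content</p>
   </article>
</body>
```

In the outline, “Website name” becomes the document title and “Blog post title” sits beneath it as a sub-section.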

This same approach is also widely used on static pages that are built with HTML5 structural elements, and it could be very useful indeed for screen reader users. Imagine that you are using a screen reader to find a decent recipe for chicken pie, and you have a handful of recipe websites open for comparison. Being able to quickly find out which website you are on using the shortcut key for headings would be much more useful than seeing only “chicken pie” on each one.

Not too far behind two top-level headings in the screen reader user survey was one top-level heading for the document. This is probably my preferred option in most cases; but as we have already seen, it creates an untitled body, which is undesirable.

In my opinion, there is an easy way around this problem: don’t use article as a wrapper for single blog posts, news items or static page main content. Remember that article is sectioning content: it creates a sub-section of the document. But in these cases, the document is the content, and the content is the document. Setting aside the name of the element, why would we want to create a sub-section of a document before it has even begun?

Remember, you can still use div!
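For comparison, here is the same blog post sketched with a div as the wrapper instead of an article.

```html
<body>
   <div>
      <!-- div is not sectioning content, so this h1 sits at the root
           level and becomes the heading of the body sectioning root -->
      <h1>Blog post title</h1>

      <p>Blog post content</p>
   </div>
</body>
```

Because the div creates no section of its own, the document is no longer untitled: “Blog post title” is the single top-level heading in the outline.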

hgroup

This is the final item in the list of things to watch out for, and it’s very easy to understand. The hgroup element can contain only headings (h1 to h6), and its purpose is to remove all but the highest-level heading it contains from the outline.

It has been and continues to be the subject of controversy, and its inclusion in the specification is by no means a given. However, for now, it does exactly what it says on the tin: it groups headings into one, as far as the outlining algorithm is concerned.
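A short sketch shows the effect, based on hgroup as it is described here. The tagline text is invented for illustration.

```html
<hgroup>
   <h1>Horses for sale</h1>
   <!-- Hypothetical tagline: hidden from the outline by hgroup -->
   <h2>Thoroughbreds from a family-run stable</h2>
</hgroup>

<section>
   <h2>Mares</h2>
</section>
```

Only “Horses for sale” represents the hgroup in the outline, with “Mares” beneath it. Without the hgroup wrapper, the h2 tagline would start an implicit section of its own and show up as an extra entry.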

In Conclusion

The logic behind the document outlining algorithm can be hard to grasp, and the spec can sometimes feel like physics: understandable as you’re reading it, but when you try to confirm your understanding, it dissolves and you find yourself re-reading it again and again.

But if you remember the basics — that section, article, aside and nav create sub-sections on Web pages — then you are 90% of the way there. Get used to marking up content with sectioning elements and to checking your pages in the outliner, because the more you practice creating well-outlined documents, the sooner you will grasp the algorithm.

I promise, you will have it cracked after only a handful of times, and you will never look back. And from that moment on, every Web page you create will be structured, semantic, robust, well-outlined content.
