Friday, April 8, 2016

HTML5 Microdata

Microdata

INTRODUCTION

There are three ways to embed machine-readable content in a classical Web document: HTML+RDFa, microformats and microdata. In this section, we will focus on microdata.
Adding microdata to Web pages helps search engines better understand the pages' content, the topics they cover, etc. The main use of microdata is Search Engine Optimization.
The microdata annotations themselves are not visible to human readers; they are pure semantic information. Popular kinds of microdata are events, a person's profile, the description of an organization, the details of a recipe, a product description, a geographical location, etc.

QUICK EXAMPLE OF MICRODATA THAT DESCRIBES A PERSON:

  <section itemscope itemtype="http://schema.org/Person">
      <h1>Contact Information</h1>
      <dl>
        <dt>Name</dt>
        <dd itemprop="name">Michel Buffa</dd>
        <dt>Position</dt>
        <dd><span itemprop="jobTitle">
             Professor/Researcher/Scientist</span> for
            <span itemprop="affiliation">
                University of Côte d'Azur, France
            </span>
        </dd>
      </dl>
      <!-- SURFACE ADDRESS GOES HERE -->
      <h1>My different online public accounts</h1>
      <ul>
         <li><a href="http://www.twitter.com/micbuffa"
                itemprop="url">Twitter profile</a></li>
         <li><a href="http://www.blogger.com/micbuffa"
                itemprop="url">Michel Buffa's blog</a></li>
      </ul>
  </section>
We can also add another embedded data item in the middle, such as the person's address:
  ...
  </dl>
  <!-- SURFACE ADDRESS GOES HERE -->
  <!-- A <div> is used here because a <dd> is only valid inside a <dl> -->
  <div itemprop="address" itemscope
      itemtype="http://schema.org/PostalAddress">
      <span itemprop="streetAddress">10 promenade des anglais</span><br>
      <span itemprop="addressLocality">Nice</span>,
      <span itemprop="addressRegion">Alpes maritimes, France</span>
      <span itemprop="postalCode">06410</span><br>
      <span itemprop="addressCountry" itemscope
            itemtype="http://schema.org/Country">
           <span itemprop="name">France</span>
      </span>
  </div>
  <h1>My different online public accounts</h1>
  ...
We will look deeper into the details of the itemprop, itemscope and itemtype attributes in the next few sections.

DATA THAT CAN BE PROCESSED, ORGANIZED, STRUCTURED, OR PRESENTED IN A GIVEN CONTEXT

Different use cases:
    • The browser, or a browser extension, may interpret the last example as an address and may propose to send it to a map application,
    • A Web crawler may interpret this as an address and display it in its responses using a dedicated presentation layout,
    • Some JavaScript code in the page can access this data,
    • With other types of microdata, for events, for example, the browser may pop up a calendar application, etc.
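As a rough illustration of the third use case, here is a minimal sketch of a page whose own script reads its microdata with standard DOM selectors. The markup, the variable names and the flat-object result are assumptions made for this sketch, not part of the original example; it ignores nested items and elements whose value lives in an attribute such as href or src.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Reading microdata from JavaScript</title>
</head>
<body>
<div itemscope itemtype="http://schema.org/PostalAddress">
  <span itemprop="addressLocality">Nice</span>,
  <span itemprop="postalCode">06410</span>
</div>
<script>
  // Find the first microdata item in the page (the element with itemscope).
  var item = document.querySelector('[itemscope]');
  // Collect each property name and its text value into a plain object.
  var data = {};
  item.querySelectorAll('[itemprop]').forEach(function (el) {
    data[el.getAttribute('itemprop')] = el.textContent.trim();
  });
  // Logs an object with the addressLocality and postalCode keys.
  console.log(data);
</script>
</body>
</html>
```

A real extractor would also have to follow the value rules that depend on the element type (for example href for links) and handle items nested inside other items.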
Note for advanced users: microdata is very similar to microformats, which use HTML classes, or to RDFa, which doesn’t validate in HTML4 or HTML5. Because RDFa was considered too hard for authors to write (research conducted by Google found that authors make 30% more mistakes with RDFa than with other formats), microdata is HTML5's answer to the need to embed semantics into HTML documents.

Testing tools

GOOGLE RICH SNIPPETS AND STRUCTURED DATA TEST TOOL

One of the most popular resources for testing microdata (as well as microformats and RDFa) is the Google page about rich snippets and structured data. This page contains a link to a structured data testing tool that you can use to see how Google recognizes the semantic data you embed in your HTML code.

Testing a real interactive example with an "about page" for Michel Buffa

Let's now look at a (small) example of an about page. It renders as a very simple paragraph that explains who Michel Buffa is, but since we embedded microdata, it is interesting to see how a search engine sees it, and how it may produce "augmented search results".
Source code:
  <!DOCTYPE html>
  <html>
  <head>
  <meta charset="utf-8">
  <title>Michel Buffa</title>
  </head>
  <body>
  <div itemscope itemtype="http://schema.org/Person">
      My name is <span itemprop="name">Michel Buffa</span>,
      And I'm a <span itemprop="jobTitle">professor/researcher</span> at
       <a href="http://www.i3s.unice.fr/I3S/" itemprop="affiliation">I3S
      Laboratory</a> in the south of France, near the city of Nice. My
      email
      is : <span itemprop="email">micbuffa@gmail.com</span>.
      I live in the city of
      <span itemprop="address" itemscope
          itemtype="http://schema.org/PostalAddress">
           <span itemprop="addressLocality">Biot</span>, in a region named
           <span itemprop="addressRegion">Alpes Maritimes</span>
      </span>
  </div>
  </body>
  </html>
Rendering of the page in a browser:
Rendering of Michel Buffa home page
Here is what Google sees of the page. We just entered its URL (http://jsbin.com/uquboc/144) in the Google page about rich snippets and structured data:
Microdata of the example, as seen by Google
Notice that the address is a fully featured embedded object in the Person's description.

LIVE MICRODATA

The Live Microdata Web site is similar to the previous one, except that it shows the extracted metadata as JSON objects:
Example of Live Microdata output for the previous example. Microdata are displayed as JSON objects
And the JSON view of the microdata:
JSON view of the microdata

Implementing microdata in your page / the itemscope, itemtype and itemprop attributes

BASIC STEPS

Adding microdata to an HTML page is a really simple task and requires only three attributes: itemscope, itemtype and itemprop.

1 - Define a container element by adding an itemscope attribute

First, you need to add an itemscope attribute to an HTML element. This will define the "global object" for which we will define properties. This element can be of different types that we will describe later, but for now let us keep looking at the same example we used in previous sections:
  <section itemscope itemtype="http://schema.org/Person">
  ...
  </section>
We will look at the itemtype attribute later. Now that we have defined a global wrapper object/element (a Person in this case), we can add properties inside this element to define the first name, last name, etc.

2 - Specify the vocabulary used for your microdata with the itemtype attribute of the container element

HTML5 proposes semantic elements for representing sections, articles, headers, etc., but it does not propose any specific elements or attributes to describe an address, a product, a person, etc.
We need a special vocabulary to represent a person or a physical address. With microdata you can define your own vocabulary or, better, reuse one of the existing popular vocabularies, such as the ones from http://www.schema.org.
Microdata works with properties defined as name/value pairs. The names are defined in the corresponding vocabulary. For example, the vocabulary for representing a Person, available at http://schema.org/Person, defines a set of property names, as illustrated by the following screenshot:
Properties examples for the Person schema
As you can see in this small extract from the vocabulary (also called a "schema"), a Person can have a name (some text), an address (the type is defined by another vocabulary named PostalAddress), an affiliation (defined by another vocabulary named Organization) and so on.
Notice that one property, such as the address of a Person, may use another vocabulary: a vocabulary may link to another vocabulary. There is also inheritance between vocabularies. The above screenshot shows that the Person vocabulary inherits from a Thing vocabulary, and the first five properties of the table come from this vocabulary that describes things.
If you are a developer familiar with object-oriented programming, think of properties as class attributes and think of vocabularies as classes.
Vocabularies are meant to be shared
If one of the existing vocabularies available at the schema.org Web site fits your needs, you should reuse it, as the most popular vocabularies are becoming de facto standards and will be taken into account by Web crawlers, browsers, and browser extensions.
However, if you do not find a vocabulary corresponding to your needs, keep in mind that anyone can define a microdata vocabulary and start embedding custom properties in their own Web pages. You need to define a namespace and put a description of your vocabulary in a Web page that has the name of your vocabulary. For example, if you own japaneserobots.com, you may define a vocabulary for describing Mech Warrior robots at http://japaneserobots.com/MechWarrior in the same way as http://schema.org/Person describes the properties of a person.
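To make this more concrete, here is a sketch of what a page using such a custom vocabulary might look like. Everything in it is hypothetical: the vocabulary URL does not exist, and the property names (name, height, weapon) and their values are invented only to show the mechanics.

```html
<!-- Hypothetical vocabulary: neither the URL nor the property names below
     exist; they only illustrate how a custom vocabulary would be used. -->
<div itemscope itemtype="http://japaneserobots.com/MechWarrior">
  <h2 itemprop="name">Atlas</h2>
  <p>Height: <span itemprop="height">14 m</span></p>
  <p>Main weapon: <span itemprop="weapon">Autocannon</span></p>
  <p>Secondary weapon: <span itemprop="weapon">Missile launcher</span></p>
</div>
```

Tools that do not know the vocabulary can still extract the name/value pairs, but they cannot attach any particular meaning to them, which is why reusing a shared vocabulary is usually preferable.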

3 - Add properties using the itemprop attribute in HTML elements inside the container

Basics:
Now that you have defined a container element, you may add properties to the HTML inside:
  <section itemscope itemtype="http://schema.org/Person">
       <h1>Contact Information</h1>
       <dl>
           <dt>Name</dt>
           <dd itemprop="name">Michel Buffa</dd>
           <dt>Position</dt>
           <dd><span
                 itemprop="jobTitle">Professor/Researcher/Scientist
               </span> for
               <span itemprop="affiliation">University of Nice,
                      France
               </span>
            </dd>
       </dl>
       <h1>My different online public accounts</h1>
       <ul>
           <li><a href="http://www.twitter.com/micbuffa"
               itemprop="url">Twitter profile</a></li>
           <li><a href="http://www.blogger.com/micbuffa"
               itemprop="url">Michel Buffa's blog</a></li>
       </ul>
  </section>
In this example the container is a <section> that corresponds to a Person (we have one clue here: the name of the vocabulary given by the itemtype attribute), and each property defined inside this section is identified by the value of the itemprop attribute of sub-elements.
The line:
  <dd itemprop="name">Michel Buffa</dd>
...defines a property called "name" that has a value of "Michel Buffa" (the text value between the opening and closing tags of the <dd> element).
Nesting microdata items:
As we saw with the Person/Address example at the beginning of this chapter, it is possible to nest microdata items inside one another.
To do so, give an element inside a microdata container its own itemscope attribute, along with the recommended itemtype attribute that indicates the name of the vocabulary used by the nested microdata.
Again, look at the Person/Address example:
  1. ...
  2. </dl>
  3. <!-- SURFACE ADDRESS GOES HERE -->
  4. <div itemprop="address" itemscope
  5.     itemtype="http://schema.org/PostalAddress">
  6.      <span itemprop="streetAddress">10 promenade des anglais</span><br>
  7.      <span itemprop="addressLocality">Nice</span>,
  8.      <span itemprop="addressRegion">Alpes maritimes, France</span>
  9.      <span itemprop="postalCode">06410</span><br>
  10.      <span itemprop="addressCountry" itemscope
  11.            itemtype="http://schema.org/Country">
  12.           <span itemprop="name">France</span>
  13.      </span>
  14. </div>
  15. <h1>My different online public accounts</h1>
  16. ...
The properties at lines 6-11 belong to the nested address item (they are defined in the PostalAddress vocabulary, not the Person vocabulary), and "France" (line 12) is the value of a name property that belongs to the Country vocabulary.
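The same nesting technique applies to any property whose value is itself an item. As a sketch, and assuming we want to describe the affiliation with the Organization vocabulary mentioned earlier rather than as plain text, the markup could look like this (the sentence wording is ours; the names and the URL are taken from the about page example):

```html
<section itemscope itemtype="http://schema.org/Person">
  <p>
    <span itemprop="name">Michel Buffa</span> works for
    <!-- The affiliation property is itself an item, described
         with the Organization vocabulary -->
    <span itemprop="affiliation" itemscope
          itemtype="http://schema.org/Organization">
      <span itemprop="name">I3S Laboratory</span>
      (<a itemprop="url" href="http://www.i3s.unice.fr/I3S/">Web site</a>)
    </span>
  </p>
</section>
```

Here the inner name and url properties belong to the Organization item, while the outer name belongs to the Person.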
Several properties with the same name but different values
It is possible to use the same property name several times in one microdata object, but with different values:
  ...
  <h1>My different online public accounts</h1>
  <ul>
  <li><a href="http://www.twitter.com/micbuffa" itemprop="url">Twitter
        profile</a></li>
  <li><a href="http://www.blogger.com/micbuffa" itemprop="url">Michel
        Buffa's blog</a></li>
  </ul>
This states that Michel Buffa has two online accounts: both properties are named url, and each has its own value.
It is possible to set more than one property at once, with the same value
Here are some microdata that represent a song. In this example, at line 5, we set two different properties, genre and keywords, with the same value (see the MusicRecording schema definition at http://schema.org/MusicRecording):
  1. <div itemscope itemtype="http://schema.org/MusicRecording">
  2. <h2>The song I just published</h2>
  3. <ul>
  4. <li>Name: <span itemprop="name">Please buy me on itunes, I need money!</span></li>
  5. <li>Genre: <span itemprop="genre keywords">Punk, Ska</span></li>
  6. </ul>
  7. </div>
And so on...
Now let's see what elements are compatible with the itemprop attribute and where the values of the properties are located, depending on each element type.

THE HTML ELEMENTS COMPATIBLE WITH THE ITEMPROP ATTRIBUTE

Elements that can be associated with microdata, and the value that is used when the itemprop attribute appears on them:
    • <a>, <area>, <audio>, <embed>, <iframe>, <img>, <link>, <object>, <source>, or <video> element: the data is the URL in the element's href, src, or data attribute, as appropriate. For example, an image element inside a container of personal contact information can be recognized as that person's photo and downloaded accordingly.
    • <time> element: the data is the time in the element's datetime attribute. This lets you, for example, just say "last week" in your text content but still indicate the exact date and time.
    • <meta> element: the data is whatever appears in the content attribute of the <meta> element. This is used when you need to include some data that isn't actually in the text of your page.
    • anything else: the data is whatever is in the text of the element.
For example, the value of a property defined in an <img> element will be the value of the src attribute:
  <img itemprop="image" src="MichelBuffa.png" alt="A great professor">
Or for a <time>, it will be the value of the datetime attribute (the Person vocabulary names this property birthDate):
  <time itemprop="birthDate" datetime="1965-04-16">April 16, 1965</time>
Or for an <a> element, the value will be the value of the href attribute:
  <a href="http://www.twitter.com/micbuffa" itemprop="url">profile</a>
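The <meta> and <link> cases have no example above, so here is a small sketch of how they might be used to carry values that are not displayed on the page. The choice of properties is ours, for illustration; the date and URL are reused from the previous examples.

```html
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Michel Buffa</span>
  <!-- Not displayed: the value is taken from the content attribute -->
  <meta itemprop="birthDate" content="1965-04-16">
  <!-- Not displayed: the value is the URL in the href attribute -->
  <link itemprop="url" href="http://www.twitter.com/micbuffa">
</div>
```

Only the name is visible to the reader; the birth date and the URL are available to tools that extract the microdata.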

Microdata Tools

There are many tools available (most are free) that you can use for generating, visualizing and debugging microdata. We propose some of them on this page, but feel free to share the tools you find or like in the forums.

MICRODATA GENERATORS

There are many free tools you can use to automatically generate microdata for describing persons, restaurants, movies, products, organizations, etc. such as:
Example with the first tool of the list:
microdata generator form
Result:
generated microdata from previous screenshot

MICRODATA VISUALIZATION

microdata visualizer

BROWSER EXTENSIONS

Many extensions are available, for all modern browsers. We have compiled an exhaustive list of extensions (and online tools too) for the students from a previous version of this course.

Examples of well structured pages with Microdata

Here, we propose a few links to Web pages that were created by students of previous editions of this course (from W3C's W3DevCampus classrooms).
The students had to create a Web page that presented themselves, with some information: name, job, organization they work for, location, etc. and of course enrich the page with microdata. They also had to follow the best practices concerning the new structural elements, headings, etc.
Click on these pages and look at the source code...

FIRST EXAMPLE

Visit the online page: http://output.jsbin.com/wozoye
Structure:
picture of the first about me page example. Shows the table of content
Microdata:
microdata from the example page

SECOND EXAMPLE

Visit the online page: http://output.jsbin.com/buriqi/35
Example page, shows table of content

THIRD EXAMPLE

Visit the online page: http://www.webbem.nl/
Third page example

EXTERNAL RESOURCES
