
Markup languages

Concept of markup language

Markup languages are ways of annotating an electronic document. Usually markup will either specify how something should be displayed or what something means. The origin of the term is in typesetting, where proofs were marked up with instructions about their visual appearance, but the term then broadened to include the semantic perspective that we’re interested in here.

The names of the most popular languages usually end in "Markup Language" and so are abbreviated as something-ML, for example:

  • HTML – Hypertext Markup Language
  • KML – Keyhole Markup Language
  • MathML – Mathematical Markup Language
  • SGML – Standard Generalized Markup Language
  • XHTML – eXtensible Hypertext Markup Language
  • XML – eXtensible Markup Language

The most widely used markup language is HTML, the foundation of the World Wide Web.

Some examples are:

HTML 4.0

 <h1>Anatidae</h1>
 <p>
   The family <i>Anatidae</i> includes ducks, geese, and swans,
   but <em>not</em> the closely related screamers.
 </p>

XML

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note> 

Advantages

Initially, markup languages focused on document generation, but thanks to their advantages their use has been extended to the definition of data structures and the sharing of information.

The main advantages are:

  • Ease of creation and reading.
  • Compliance with defined and public storage standards.
  • Incorporation of metadata.
  • Definition of the structure of the data.
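
As a rough illustration of these advantages, the following Python sketch builds the note from the XML example with the standard xml.etree.ElementTree module. The function name build_note and the lang attribute are assumptions made for this example only; the attribute simply stands in for a piece of metadata.

```python
import xml.etree.ElementTree as ET


def build_note(to, sender, heading, body, lang="en"):
    """Build a <note> element from plain Python values.

    The `lang` attribute is a hypothetical piece of metadata added for
    illustration; it is not part of the original note example.
    """
    note = ET.Element("note", attrib={"lang": lang})
    # The order of the child elements defines the structure of the data.
    for tag, text in (("to", to), ("from", sender),
                      ("heading", heading), ("body", body)):
        child = ET.SubElement(note, tag)
        child.text = text
    return note


note = build_note("Tove", "Jani", "Reminder", "Don't forget me this weekend!")
# Serialize to plain text, which is easy to store, read, and share.
xml_text = ET.tostring(note, encoding="unicode")
print(xml_text)
```

The result is ordinary text that any XML-aware program can read back, which is what makes the format convenient for sharing information.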

Common features

Markup languages stand out for a number of features that have made them some of the most widely used languages in modern computing for storing and representing data. Among the most interesting of these features are:

  • They intermix the text of a document with markup instructions in the same data stream or file.
  • They are based on plain text.
  • They allow the use of metadata.
  • They are easy to interpret and process.
  • They are easy to create and flexible enough to represent very diverse data.

Internet applications and many other computer programs use them in one way or another.
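
To make the first and fourth features more concrete, here is a minimal Python sketch that reads the HTML sample shown earlier and separates the markup from the text it annotates. It uses the standard html.parser module; the class name MarkupSplitter is an assumption made for this example.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collect tag names and text runs separately from a single stream."""

    def __init__(self):
        super().__init__()
        self.tags = []   # markup instructions, in document order
        self.text = []   # the text that the markup annotates

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Skip whitespace-only runs and normalize line breaks to spaces.
        if data.strip():
            self.text.append(" ".join(data.split()))


sample = """<h1>Anatidae</h1>
<p>
  The family <i>Anatidae</i> includes ducks, geese, and swans,
  but <em>not</em> the closely related screamers.
</p>"""

splitter = MarkupSplitter()
splitter.feed(sample)
splitter.close()
print(splitter.tags)
print(" ".join(splitter.text))
```

Text and markup arrive in the same plain-text stream, and a few lines of code are enough to tell them apart.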

Fields of application

While the idea of markup language originated with text documents, there is an increasing use of markup languages in the presentation of other types of information, including playlists, vector graphics, web services, content syndication, and user interfaces. Most of these are XML applications, because XML is a well-defined and extensible language.

Regarding the fields of application we can define the following classification:

  • Documents in general:
    • Descriptive languages such as XML, HTML 5, YAML.
    • Presentation languages such as RTF, TeX, HTML 4.
    • Lightweight languages such as Markdown.
  • Internet technologies:
    • HTML, XHTML, GladeXML, Atom, RSS, WSDL
  • Specialized languages:
    • SVG, XMPP, COLLADA

You will find more information in the article "Markup language".

Types of markup language

There are three main general categories of electronic markup:

  • Presentation languages, aimed at specifying how the information must be displayed. This kind of markup is used by traditional word-processing systems.
  • Procedural languages, in which the markup is embedded in the text and provides instructions for the programs that process it. Well-known examples include troff, TeX, and PostScript.
  • Descriptive or semantic languages, aimed at describing the structure of the data they contain.

This is the most widely accepted classification but, as is often the case in computer science, some languages combine aspects of several categories: they define both how the information is presented and how it is structured.

In recent years, a number of small and largely unstandardized markup languages have been developed to allow authors to create formatted text via web browsers, such as the ones used in wikis and in web forums. These are sometimes called lightweight markup languages. Markdown, BBCode, and the markup language used by Wikipedia are examples of such languages.
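
The idea behind a lightweight markup language can be sketched in a few lines of Python. The function below is a toy, not an implementation of Markdown or of any real language: it assumes a made-up mini syntax with only two rules, **text** for strong emphasis and *text* for emphasis, and turns one line into HTML. The name tiny_markup_to_html is hypothetical.

```python
import html
import re


def tiny_markup_to_html(line):
    """Convert **strong** and *emphasis* marks in one line of text to HTML.

    This is a toy two-rule syntax assumed for illustration only.
    """
    # Escape characters that have a special meaning in HTML first.
    escaped = html.escape(line, quote=False)
    # Handle the two-asterisk form first so it is not read as emphasis.
    escaped = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    escaped = re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)
    return "<p>" + escaped + "</p>"


print(tiny_markup_to_html("Ducks are **not** *screamers*."))
```

The author types a few unobtrusive marks and the software produces the heavier markup, which is why these languages suit wikis and forums.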

Procedural and presentation

These languages indicate how the data is to be presented, either through formatting information (bold text, headings, etc.) or through procedures to be carried out by the rendering software. The most popular example is HTML, but there are many more, such as TeX and Wikitext.

In these cases the markup determines how the document will be shown to whoever reads it.

LaTeX: a procedural markup language

For example:

\documentclass{article}
\usepackage{graphicx}

\begin{document}

\title{Introduction to \LaTeX{}}
\author{Author's Name}

\maketitle

\begin{abstract}
The abstract text goes here.
\end{abstract}

\section{Introduction}
Here is the text of your introduction.

\begin{equation}
    \label{simple_equation}
    \alpha=\sqrt{\beta}
\end{equation}

\subsection{Subsection Heading Here}
Write your subsection text here.

\begin{figure}
    \centering
    \includegraphics[width = 3.0in]{myfigure}
    \caption{Simulation Results}
    \label{simulationfigure}
\end{figure}

\section{Conclusion}
Write your conclusion here.

\end{document}

Descriptive or semantic

These languages describe the logical structure of the document, regardless of how programs will display it. Markup is added only to identify the parts that give the document its structure. The most important example is XML, but other formats, such as JSON, are also widely supported.

The following document is an example of a marked-up file that represents information about people:

<students>
    <person>
        <name>Pere</name>
        <lastname>Puig</lastname>
    </person>
    <person>
        <name>Manel</name>
        <lastname>Garcia</lastname>
    </person>
</students>

We can clearly see what this data is about: a list of students. At a glance, it is easy to tell that Pere and Manel are names and that Puig and Garcia are surnames. Moreover, the hierarchy of the data lets us infer that Pere Puig and Manel Garcia are students, since each name and lastname pair sits inside a person element, which is in turn inside the students element.

The document thus exposes the structure of the data it contains, and its meaning can be worked out by interpreting the tags together with their content. From this we can conclude that Pere is the name of a person who is a student.
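
A program can follow the same reasoning. The sketch below, which assumes Python and its standard xml.etree.ElementTree module, reads the students document and uses only the tag names and their nesting to recover each student's full name; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

students_xml = """<students>
    <person>
        <name>Pere</name>
        <lastname>Puig</lastname>
    </person>
    <person>
        <name>Manel</name>
        <lastname>Garcia</lastname>
    </person>
</students>"""

root = ET.fromstring(students_xml)

# Every <person> directly inside <students> is taken to be a student;
# the <name> and <lastname> tags say what each piece of text means.
full_names = [
    f"{person.findtext('name')} {person.findtext('lastname')}"
    for person in root.findall("person")
]
print(root.tag, full_names)
```

Nothing in the code depends on how the data would be displayed; it relies only on the structure that the markup describes.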
