This document tries to define the technical terms it uses or to provide links to definitions. If you find terms which are unknown to you and not defined here, please consult eg the Terms section of HTML 2.0 specification or some of the general Internet glossaries. (The most authoritative Internet glossary is probably RFC 1983.)
People who have heard about HTML 3.0 should notice that HTML 3.2 is not an extension or a variant of HTML 3.0, which has now been withdrawn. (The version numbers 3.0 and 3.2 are misleading!) More exactly, HTML 3.2 contains
For a good summary of the new features in HTML 3.2 as compared with HTML 2.0, consult the article What's New in HTML 3.2 in the World Wide Web Journal, but please notice that it contains a few mistakes.
HTML 3.2 has been defined by the World Wide Web Consortium. It is supported by several browsers to a large extent, and it will probably become the common basis understood by almost all relevant Web software. The next version, an extension to HTML 3.2, is being developed under the code name Cougar.
An older standard, HTML 2.0, is supported to an even larger extent, since HTML 3.2 is an extension of HTML 2.0.
However, to be exact, the following HTML 2.0 features have been removed in HTML 3.2:
This document does not discuss general issues of Web authoring, such as overall design of documents and document collections. For those, see my list of suggested reading.
In addition to such issues, you need to know where to put your HTML document to make it accessible to the world; this may involve things like setting up directory and file protections suitably. Please consult your local Web support for information relevant at your site.
This document concentrates on basic HTML usage. In particular, this document does not give realistic examples about applets or image maps. (The main reason for this is that the author felt that a basic document was urgently needed, and providing good examples about such complicated and somewhat controversial issues would have taken too much time.)
For printing on paper, you may wish to use the PostScript version (generated from the HTML version with Netscape), which also exists in a much smaller form, as compressed (with the Unix compress utility).
In general, you should be able to read this document on any decent WWW browser. However, tables (TABLE elements) have been used in this document, mainly in the description of attributes, since they are essentially tabular information best presented so. Unfortunately this means that parts of this document are almost illegible when viewed with browsers which cannot present tables (eg most versions of Lynx).
The author hereby gives general permission to copy and distribute this document or parts thereof in any medium, provided that all copies contain, in a manner appropriate for the medium, an acknowledgement of authorship and the URL of the original document, ie http://www.hut.fi/%7ejkorpela/HTML3.2/
The permission granted above does not imply permission to distribute this document in a modified form or as a translation. Please contact the author to discuss the conditions for such actions.
Explanation: The author wishes to preserve the integrity of the document. This includes specifying the context when distributing or using excerpts and informing the reader about the availability of the entire document in its most up-to-date form.
Please notice that most introductory texts on HTML do not present the language exactly as defined by HTML 3.2; some of them might differ a lot from it. This is understandable, since the language HTML evolves rapidly (and even divergently).
The specification is relatively short and technical, and consulting the older HTML 2.0 specification (also known as RFC 1866) can be useful, since the current HTML 3.2 specification can sometimes be understood only by assuming HTML 2.0 as a background document.
In order to understand the HTML specifications exactly, some fluency in reading SGML (the metalanguage used to describe the syntax of HTML formally) is required. SGML as a whole is rather complicated, and the SGML standard is only available in printed form. However, for the purpose of understanding the SGML descriptions of the syntax of HTML (that is, HTML DTDs), the following material usually gives you enough information:
There are some minor internal inconsistencies in the HTML 3.2 specification.
Notice that documents on HTML (even some of the above-mentioned) very often contain information about features which do not belong to HTML 3.2.
Even if you know HTML 3.2 well, you will by mistake violate the specification; for instance, just forgetting an ending quote can cause a lot of such violations. You may not notice the error in your environment but your readers may get confused.
It is not sufficient to check that "it works" on your browser. Other people will use that browser in a different environment or with different settings, different versions of the browser, or even quite different browsers. Browsers very often pass invalid HTML without giving error messages, perhaps even handling it in such a way that things seem to work fine. For other people, it might be a mess. Looking at your document on a few different browsers may help to detect problems, but it would be too tedious to do that for all important browsing environments.
Therefore, validate your code. You can use eg the HTML Validation Service of WebTechs, which is easy to use.
Passing validation means that there are no violations of HTML syntax (provided that the validator does its job right). Checking the quality of the document is a different thing. There are some checkers such as WebLint which can be used to test the document for various common problems - for things which, although technically legal, are likely to provoke known browser bugs, etc. Checkers may of course perform an HTML syntax check too, but typically they are rougher than validators. They might declare the syntax of a document legal when it isn't, or declare it illegal when it is. Nevertheless, they are useful tools, both for alerting newcomers to potential problems and for picking up errors made by even the most experienced.
For more information, see Heikki Kantola's nice compact list of validators and checkers and WDG's (annotated) rather extensive list of validators and checkers.
In addition to character repertoire and encoding (of characters by bit combinations), there is a special feature which is fixed in HTML: the interpretation of numerical character escapes of the form &#n; where n is a number. Such an escape is to be interpreted as the character corresponding to n in ISO 10646 and Unicode. In practice, browsers cannot represent all ISO 10646 characters, but the specifications imply that if a browser presents &#n; as a character, it must use the ISO 10646 character. (Unfortunately, browsers may violate this.)
In practice, you should use ISO Latin 1 characters only. Currently or in the near future you can hardly expect general support for extensions to it, although support for some national alphabets may exist nationally. Support for ISO Latin 1 should exist in all browsers, but there are problems even with this. You may of course decide to stick to the ASCII character set, which is a subset of ISO Latin 1, especially if you do not need letters with diacritic marks (or, in general, letters other than English a - z).
The printable characters of ASCII (with code values from 32 to 126 in decimal) are the following:
! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~
The other printable characters of ISO Latin 1 (with code values from 160 to 255 in decimal) are the following:
¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
Note: The presentation of some characters in the copy of this document may be defective eg due to lack of font support. Naturally, the appearance of characters varies from one font to another.
If your keyboard or text editor does not allow you to enter (ie to type directly) some ISO Latin 1 characters such as ä or ñ, you can use the character escape conventions.
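As a small sketch of the escape conventions mentioned above, the following fragment writes ä and ñ without typing them directly, once with named escapes and once with numeric escapes (228 and 241 are the ISO Latin 1 code values of ä and ñ). The sample words are just illustrative assumptions.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<TITLE>Character escapes</TITLE>
<P>Named form: p&auml;iv&auml;&auml;, ma&ntilde;ana</P>
<P>Numeric form: p&#228;iv&#228;&#228;, ma&#241;ana</P>
```

A browser which supports ISO Latin 1 should present both paragraphs with the same words, päivää and mañana.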
Some practical warnings to those who create HTML documents on microcomputers:
<H1> <H1 ALIGN=LEFT>
<H1>Foreword</H1>
In such cases the two tags and the part of the document enclosed by them form a unit which is called an HTML element. Some tags, eg <HR>, are HTML elements by themselves, and for them the corresponding end tag would be illegal. - In the sequel we will usually refer to tags by their name only, omitting the obligatory angle brackets.
For some elements which logically consist of a start tag, some content and an end tag, it is legal to omit the end tag, possibly even the start tag. For example, you can omit the end tag </P> and let browsers and other software imply it when necessary. The exact rules for allowable tag omission are given in the HTML specification, often only in the formal (SGML) syntax, so they can be hard to read. Moreover, some browsers are known to misbehave if you omit some end tags even when the specs allow it, and this can have drastic effects eg when nested tables are involved. Thus it is wisest to use explicit end tags always for all elements which logically have an end tag.
You can also omit the quotes from an attribute value if the value consists of the following characters only (cf to the technical concept of name):
Within attribute values, no HTML tags are recognized. On the other hand, escape sequences are recognized and interpreted.
There is a minimized syntax for attributes when the attribute value is the same as the attribute name. For instance, <UL COMPACT="COMPACT"> can be abbreviated as <UL COMPACT> (and it is common practice to do so). Some user agents even require minimization for some attributes (COMPACT, ISMAP, CHECKED, NOWRAP, NOSHADE, NOHREF), so perhaps it is best to use the minimized syntax when applicable.
Successive attribute specifications must be separated with blanks (or newlines).
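The attribute rules above can be illustrated with a short sketch. The element contents are made up for the example; the attributes themselves (ALIGN, HREF, COMPACT, TYPE) belong to HTML 3.2.

```html
<!-- Quoted value, which is always safe -->
<H1 ALIGN="CENTER">Foreword</H1>

<!-- Unquoted value, allowed here because CENTER consists of letters only -->
<H1 ALIGN=CENTER>Foreword</H1>

<!-- A URL contains characters such as : and /, so it is quoted -->
<A HREF="http://www.hut.fi/">www.hut.fi</A>

<!-- Minimized attribute, then a second attribute separated by a blank -->
<UL COMPACT TYPE="SQUARE">
<LI>first item</LI>
<LI>second item</LI>
</UL>
```

When in doubt, quoting the value is the simplest choice, since a quoted value is legal wherever an unquoted one is.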
The general syntax of URLs is the following:
scheme://host:port/path/filename
where
| http | a Web document (to be accessed using Hypertext Transfer Protocol, HTTP) |
| ftp | a file in a so-called FTP server, to be retrieved using File Transfer Protocol |
| gopher | a file in a Gopher server |
| mailto | electronic mail address |
| news | a newsgroup or an article in Usenet news |
| telnet | for starting an interactive session via the Telnet protocol (which is part of TCP/IP) |
www.hut.fi (or sometimes a numerical TCP/IP address); notice that typically, but not necessarily, Web servers have domain names starting with www
:port
http
URLs. For other URLs, simplifications and special interpretations are applied. For example, a mailto URL is just of the form mailto:address where address is a normal Internet E-mail address like Jukka.Korpela@hut.fi.
Please notice that appending anything to the E-mail address in a mailto URL is nonstandard and may result in lost mail without anyone noticing!
As explained above, it is safest to enclose URLs in quotes when writing them as attribute values in HTML.
For an overview of URLs, see W3C material on addressing.
As regards the technical specifications of the syntax of URLs, see RFC 1738 (absolute URLs) and RFC 1808 (relative URLs).
In particular, the specifications say that within a URL only a limited set of characters can be used as such:
A to Z, a to z, 0 to 9)
$-_.+!*'(),
;/?:@=&# provided that they are used in the special meaning reserved for them in the RFCs mentioned above.
;/?:@=&# must also be encoded, if they are not used in the special meaning.)
This encoding (which is defined by URL specifications, not HTML specifications) consists of using the percent sign followed by two hexadecimal digits, presenting the code position. For example, tilde (~) should be presented as %7E and space as %20. (Violating the rules is much more likely to cause problems in the latter case than in the former.)
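The encoding described above might look as follows in practice. The first link uses the address of this document; the file name in the second link is a hypothetical one, chosen only because it contains a space.

```html
<!-- Tilde written as %7E, as in the URL of this document -->
<A HREF="http://www.hut.fi/%7Ejkorpela/HTML3.2/">Learning HTML 3.2 by Examples</A>

<!-- Hypothetical file name "my file.html" with the space written as %20 -->
<A HREF="my%20file.html">My file</A>
```

The hexadecimal digits may be written in upper or lower case, so %7E and %7e denote the same character.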
In this document, upper case letters are used for the above-mentioned constructs. This may help the reader distinguish HTML code from normal text.
However, the following constructs are (in general) case sensitive:
The term newline is used to denote an end of line designation. Theoretically SGML specifies that a line (record) should begin with a record start character (line feed, LF, ASCII code 10) and end with a record end character (carriage return, CR, ASCII code 13). In practice, HTML documents are presented and transmitted using a newline presentation convention of the computer system used. Therefore, HTML browsers are encouraged to accept any of the three common representations, namely CR LF sequence, CR only, and LF only, as line separators and to infer the missing record end and start characters.
Thus, it does not matter how you divide the text into lines, since a newline is equivalent to a blank. Notice, however, that you must not divide a word into two lines in HTML. If you eg divide the word international into two lines as follows:
inter-
national
it will be interpreted as equivalent to
inter- national
and the result is not what you want.
Thus, you must use HTML tags such as P or BR to force line breaks, if they are necessary for the logical representation of your document.
Browsers usually do not divide words into two lines, except possibly when a word contains a hyphen. The HTML 3.2 Reference Specification is not very explicit in this matter; it just says, in the discussion of tables, the following:
For some user agents it may be necessary or desirable to break text lines within words. In such cases a visual indication that this has occurred is advised.
Beware that the line length is outside your control. It depends on the browser, device, and settings used by the people who look at your document. You can force line breaks but not prevent line breaks between words, in general. (You can try to prevent line breaks by using non-breaking spaces.)
As regards newlines in conjunction with HTML tags, there are special rules:
<P> Text
is equivalent to
<P>Text
Text </P>
is equivalent to
Text</P>
The horizontal tab character (HT) can appear in the HTML source. Within PRE elements, tabs have a special interpretation. Otherwise a tab is equivalent to a space. Thus, it does not imply tabulation of any kind. (In order to present tabular data, use the TABLE element.) It is best to avoid tabs in HTML code and to use a suitable number of spaces instead, if one wants to format the HTML source code into tabular form.
Apart from the elements at the topmost levels, namely HTML, HEAD and BODY, the HTML elements are classified into three major categories:
Any text element (including plain text) can appear wherever a block element is allowed, by virtue of implicitly forming a paragraph (P element) when necessary.
A rule of thumb which may help in remembering which elements are block elements and which are text elements: block elements cause paragraph breaks, text elements do not.
Note: Often block elements can contain both text elements and other block elements, ie blocks can be nested. Text elements can be nested, too. On the other hand, text elements may not contain block elements. For example,
<CITE><H3>Origin of Species</H3></CITE>
is invalid (since CITE is a text element and H3 is a block element) and also illogical (you don't really mean that the heading as a structure is a citation, do you?) whereas
<H3><CITE>Origin of Species</CITE></H3>
would be legal, although different browsers might treat it differently (letting either H3 or CITE determine the rendering, or possibly using a mixture of the two). Similarly, don't embed headings into A NAME tags but vice versa.
It is also illegal to have a paragraph break (P tag) within eg a STRONG element; although several browsers can handle it, the semantics is ambiguous and you should use separate start and end STRONG tags within each paragraph (if you really want to emphasize such large portions of text!).
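The last two rules can be sketched as follows. The texts and the anchor name intro are assumptions made for the example.

```html
<!-- Not valid, since a P tag appears inside a STRONG element -->
<STRONG>First paragraph.
<P>Second paragraph.</STRONG>

<!-- Valid, with a separate STRONG element inside each paragraph -->
<P><STRONG>First paragraph.</STRONG></P>
<P><STRONG>Second paragraph.</STRONG></P>

<!-- Heading with a named anchor, A inside H2 and not H2 inside A -->
<H2><A NAME="intro">Introduction</A></H2>
```

Several browsers may present the first fragment in the same way as the second one, which is one reason why validation is more reliable than just looking at the result.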
The same information is presented in the individual tag descriptions, in their Allowed context and Contents parts. Here it is presented in a compact form. This form does not cover all details but might be more illustrative.
Legend:
A, ADDRESS, APPLET, B, BIG, BLOCKQUOTE, BODY, CAPTION, CENTER, CITE, CODE, DD, DFN, DIV, DT, EM, FONT, FORM, H1, H2, H3, H4, H5, H6, HTML, I, KBD, LI, P, PRE (with restrictions), SAMP, SMALL, STRIKE, STRONG, SUB, SUP, TD, TH, TT, U, VAR.
The following are not text containers but may contain text elements indirectly, ie contain elements which are text containers:
DIR, DL, MENU, OL, TABLE, TR, UL.
The following may not contain text elements at all:
AREA, BASE, BASEFONT, BR, HEAD, HR, IMG, INPUT, ISINDEX, LINK, MAP, META, OPTION, PARAM, SCRIPT, SELECT, STYLE, TEXTAREA, TITLE.
Similarly I will use the term block container to denote any element which may contain a block element directly (as opposed to containing an element which contains a block element). Block containers are: BLOCKQUOTE, BODY, CENTER, DD, DIV, FORM, HTML, LI (when within UL or OL), TD, TH.
Obviously, since some characters such as < are used with a very special meaning in HTML, there must be some way of expressing them as data characters, ie when they should appear eg as part of the document itself or in a URL. The convention is that the following notations are used:
| character | notation | usual name(s) of the character |
|---|---|---|
| < | &lt; | less than character, left angle bracket |
| > | &gt; | greater than character, right angle bracket |
| & | &amp; | ampersand |
There was notation &quot; for the double quote (") in HTML 2.0, but it does not belong to HTML 3.2 (for certain technical reasons). The double quote can be typed as such within normal text, and within quoted strings as well if the single quotes are used as the outermost quotes. (In the rare cases where this does not work, you can use &#34; to represent the double quote.)
Notice that the semicolon is part of the escape sequence. In principle, it is necessary only if the following character would otherwise be recognized as part of the name. In practice, it is best to adopt the habit of always terminating an escape sequence with a semicolon.
In escape sequences, the case of letters is significant. For example, the ampersand & may not be represented as &AMP; (this escape sequence is undefined), and the escape sequences &auml; and &Auml; denote two distinct characters, a umlaut (a dieresis, the letter a with two dots above it) in lower case and in upper case (ä and Ä); notice the principle of uppercasing only the first letter in the escape notation (&AUML; is undefined).
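A short sketch of these escape sequences in use follows; the sentences themselves are only illustrative.

```html
<P>Write &lt;HR&gt; to get a horizontal rule.</P>
<P>Research &amp; Development</P>
<P>Lower case &auml; and upper case &Auml; are distinct escapes.</P>
```

A browser should present the first paragraph with the characters < and > around HR as visible text, the second with a plain ampersand, and the third with the letters ä and Ä. Each escape is terminated with a semicolon, as recommended above.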
The need for the above-mentioned escape sequences arises from the syntax of HTML. In fact there are escape sequences for all characters in the ISO Latin 1 character set. There are
| &copy; | copyright sign, © |
| &reg; | registered trademark sign, ® |
| &nbsp; | non-breaking space |
However, there is usually little reason to use other escape sequences than &lt; and &gt; and &amp;. Using &auml; instead of ä might seem to give some character code independency, but it does not; if a browser can display &auml; correctly, it can also display correctly a document in which the character ä is specified directly. But notice that sometimes you cannot input some special characters directly due to keyboard restrictions, and in such cases you can have use for notations like &auml;.
And please notice that "character ä" means the ISO Latin 1 character with name "small letter a with diaeresis" (diaeresis = umlaut), with code 344 in octal, 228 in decimal. It can be entered into an HTML document in various ways. It is possible that pressing a key labeled with ä or Ä is not among those ways. For instance, on a Macintosh with Scandinavian keyboard the ä key normally produces a character quite different from ä in ISO Latin 1. Various programs may or may not handle this by performing character code conversions.
Some browsers support other escape sequences than those mentioned above, for example &trade; and &cbsp;. The use of such notations is strongly discouraged. (Notation &trade; refers to a symbol which does not belong to ISO Latin 1 at all; you may wish to use the HTML 3.2 conformant notation <SUP><SMALL>TM</SMALL></SUP> instead. Notation &cbsp; stands for "conditional breaking space", not in ISO Latin 1 and possibly not intended to be a character at all.)
This name concept occurs in the description of HTTP-EQUIV and NAME attributes of the META element and in the description of NAME attribute of the PARAM element.
In other contexts, a string which is used to name something may contain other characters as well but then it must be quoted.
It is of course possible that due to software or hardware limitations all colors cannot be presented. On some devices, the actual rendering might be just black and white or different shades of grey.
When a color is specified as the value of an attribute, there are two possibilities:
It is not necessary to know the numerical equivalents of the predefined color names in order to use them. However, the following table specifies them as well, since they might help authors who wish to define colors by slightly modifying the predefined ones.
| Black = "#000000" | Green = "#008000" |
| Silver = "#C0C0C0" | Lime = "#00FF00" |
| Gray = "#808080" | Olive = "#808000" |
| White = "#FFFFFF" | Yellow = "#FFFF00" |
| Maroon = "#800000" | Navy = "#000080" |
| Red = "#FF0000" | Blue = "#0000FF" |
| Purple = "#800080" | Teal = "#008080" |
| Fuchsia = "#FF00FF" | Aqua = "#00FFFF" |
These colors were originally picked as being the standard 16 colors supported with the Windows VGA palette. The HTML 3.2 Reference Specification contains a section on colors with sample images in each of the 16 colors. Notice that these colors are rather striking in their brightness. Normally you should use paler colors.
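As a sketch of how color values might be written, the following fragment uses both a predefined name and numerical values. The particular values #FFFFF0 and #A00000 are assumptions chosen for illustration: a pale background and a variant of Maroon (#800000) with slightly more red.

```html
<BODY BGCOLOR="#FFFFF0" TEXT="#000000" LINK="#0000FF" VLINK="#800080">
<P>Normal text on a pale background.</P>
<P><FONT COLOR="Maroon">Text in a predefined color given by name.</FONT></P>
<P><FONT COLOR="#A00000">Text in a slightly modified Maroon.</FONT></P>
</BODY>
```

The numerical values are quoted because the character # is not among the characters allowed in unquoted attribute values.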
See also:
A browser should multiply the pixel values by an appropriate factor when rendering to very high resolution devices such as laser printers. For instance if a user agent has a display with 75 pixels per inch and is rendering to a laser printer with 600 dots per inch, then it should multiply the pixel values given in HTML attributes by a factor of 8.
The question whether &nbsp; should prevent line breaks when rendering HTML documents is ambiguous. The HTML 2.0 specification says:
Use of the non-breaking space and soft hyphen indicator characters is discouraged because support for them is not widely deployed.
The soft hyphen should really be avoided; it serves no useful purpose in HTML. But as regards non-breaking space, you can well use it to try to prevent line breaks where you don't want them. And although the HTML 3.2 Reference Specification is not explicit about the matter in general, it suggests, in the discussion of the NOWRAP attribute of TH and TD elements, that &nbsp; should act as non-breaking space within table cells at least.
If you use non-breaking spaces, use them instead of normal spaces, not in addition to them. For instance, if you wish to prevent a line break between version and 3, type version&nbsp;3 (not version &nbsp; 3).
On the other hand, within a table in HTML 3.2, &nbsp; can have quite different meaning, which can be described as non-empty space: when a table is presented with borders, cells with empty contents are drawn without them, and spaces only do not constitute contents - but &nbsp; does! This peculiar semantics does not prevent &nbsp; from acting as a non-breaking space as well.
For further confusion, some people use &nbsp; to force spaces into the visible presentation of a document, eg by putting an &nbsp; or a few of them into the beginning of a paragraph to get its first line indented. This may actually work on some browsers, but it is unwise to rely on that, and it is normally useless to try to enforce such presentation features anyway.
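The two uses of the non-breaking space discussed above can be sketched as follows. The table contents are made up, and how the cells are actually drawn depends on the browser.

```html
<!-- Non-breaking space replacing the normal space between two words -->
<P>This document describes HTML version&nbsp;3.2 only.</P>

<!-- The second cell contains only a non-breaking space,
     so a browser may draw its borders like those of a filled cell -->
<TABLE BORDER="1">
<TR><TD>filled</TD><TD>&nbsp;</TD></TR>
</TABLE>
```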
You can begin a comment with the four-character sequence <!-- (less than sign, exclamation sign, two hyphens) and terminate it with the three-character sequence --> (two hyphens, greater than sign). Don't use the character pair -- or the character > within a comment. For example:
<!-- Written by Jukka Korpela -->
(For a more thorough discussion of comment syntax, see document HTML comments by WDG.)
It is generally preferable to include metainformation about the document into HTML elements, such as META. Consider making information about purpose, author, creation and last update time etc a visible part of the document itself, too.
Thus, comments should be inserted in rare cases only, eg to comment the HTML code itself to explain things that may look odd. Remember that a comment is part of an HTML file, to be transmitted whenever the document is delivered. Therefore, to avoid wasting bandwidth, if you have a long story to tell, put it into a separate document and insert just its URL into a comment.
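Comments in the spirit of this advice might look like the following sketch. The file name notes.html is a hypothetical one.

```html
<!-- A table is used here because the data is tabular -->

<!-- Longer notes on this page are in notes.html -->
```

Both comments are short and contain neither a pair of hyphens nor the character > in the comment text.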
HTML editors and converters often insert a few comment lines into the beginning of an HTML file. Such indications can be helpful and should not be removed.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<TITLE>Hello</TITLE>
Hello world
In fact, this document implicitly has the following structure, ie it is equivalent to the following:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Hello</TITLE>
</HEAD>
<BODY>
Hello world
</BODY>
</HTML>
This means that apart from the first line, the entire file is an HTML element which contains a HEAD element, with the TITLE element as contents, and a BODY element, with the plain text as contents.
Thus, in the absence of HTML, HEAD, and BODY tags a browser implicitly assumes them in suitable places. Therefore, your document always contains a head and a body.
Here we will simply emphasize that every HTML document should contain certain basic information about its origin. The local recommendations may specify in detail the form in which that information should be provided.
The importance of providing origin information becomes evident if we think how people increasingly find documents using search engines or link lists. In such contexts the document pops up as such, in isolation, even if you may have intended that people find it by following links which you have carefully designed so that they give background information. When a user has eg found your document using AltaVista, he most probably wants to know what kind of document it is. Therefore, each HTML file should provide the very basic information (or link to information) about its origin and nature. For example, in a book-like document collection divided into small files, every file should contain at least a link to the "front page" of the "book".
At least the following origin information should be provided:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>A sample HTML document</TITLE>
<LINK REV="made" HREF="mailto:jukka.korpela@hut.fi">
</HEAD>
<BODY>
<H1>A sample HTML document</H1>
This is a sample HTML document exemplifying a suggested way
of presenting basic origin information.
<HR>
<P>
<A HREF="http://www.hut.fi/~jkorpela/">Jukka Korpela</A>,
<a href="mailto:Jukka.Korpela@hut.fi">Jukka.Korpela@hut.fi</a>
<BR>
This document belongs to the context of
<a href="index.html">Learning HTML 3.2 by Examples</a>
<BR>
The URL for this document is
<KBD> http://www.hut.fi/~jkorpela/HTML3.2/skel.html </KBD>
<BR>
Created: December 5, 1996
</BODY>
</HTML>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> (where you theoretically should have HTML 3.2 Final instead of HTML 3.2)
<TITLE>Introduction to General Absurdity</TITLE>
Most browsers don't complain if you omit these, but they are required by the HTML 3.2 definition. More importantly, there are good practical reasons to include them:
Optionally, the HEAD element may contain the following elements in addition to a TITLE element:
The tags for expressing major structural features, so-called block level tags, are the following:
A recommendable approach, which may need adjustments to fit your local recommendations, is the following:
Lists can be nested in the sense that an item in a list, ie an LI (or DD) element, may in turn contain a list element.
Notice that the basic paragraph element P is not nestable, ie you cannot have P elements within a P element to create subparagraphs. However, the various list elements effectively provide an itemization structure which essentially corresponds to subparagraph division. Moreover, the list elements are nestable.
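A minimal sketch of nested lists, with made-up items: each inner list is placed inside an LI element of the outer list, before the end tag of that item.

```html
<UL>
<LI>Fruits
  <UL>
  <LI>apples</LI>
  <LI>pears</LI>
  </UL>
</LI>
<LI>Vegetables
  <OL>
  <LI>carrots</LI>
  <LI>peas</LI>
  </OL>
</LI>
</UL>
```

The inner list need not be of the same kind as the outer one; here an ordered list is nested within an unordered list.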
Logical markup shall be preferred. Use physical markup only if it is really relevant that part of a text is displayed in a particular physical way (if possible). The need for physical markup may arise when referring to information in fixed presentation form, such as text in a book or in an image. Such situations occur rarely.
For instance, use the STRONG element for strong emphasis, letting the various Web browsers express the emphasis in the way which is the best in the environment where they are used. Do not use the B element (indicating bolding), except in the rare occasions where you are writing about some text appearing in boldface somewhere.
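The difference might be sketched as follows; the sentences are only illustrative assumptions.

```html
<!-- Logical markup, the text itself is strongly emphasized -->
<P><STRONG>Warning:</STRONG> save your work before you continue.</P>

<!-- Physical markup, the text describes how a word is printed elsewhere -->
<P>In the printed manual, the word <B>Warning</B> appears in boldface.</P>
```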
When style sheets become generally usable, both authors and readers will be able to affect the rendering (eg font, color, and background) of elements. For instance, someone might wish to have all program code extracts presented with yellow background and larger than normal font whereas someone might prefer some quite different methods of distinguishing them from normal text. Such operations will be much easier if logical markup has been used consistently.
In addition to being more flexible with respect to various browsers and rendering environments, logical markup has the following advantage over physical markup: Increasingly, computer programs are used for extracting information from HTML documents for various purposes like indexing. For this to work, it is much better to have logical markup indicating eg that some text is more important than the rest or a quotation of computer printout, rather than having designations of physical fonts.
Both logical and physical markup is done using HTML elements with start and end tags. It follows from the nature of the HTML language that markups must not overlap. For instance, the following is in error:
This has some <B>bold and <I></B>italic text</I>.
On the other hand, markup elements can be nested. User agents should do their best when rendering structures like the following:
This is <I>italic text which contains <U>underlined text</U> within it</I> whereas <U>this is normal underlined text</U>.
Obviously, browsers with limited font repertoire can have difficulties in presenting text markup.
Avoid emphasizing too much, since emphasizing everything is tantamount to saying everything with the same emphasis, ie not emphasizing anything! (The proverbial student who underlines everything in his textbook has not grasped the idea of emphasizing.)
Unfortunately there is no phrase element for "de-emphasis", ie for indicating segments of text as less important. If you really need that, you may consider using the SMALL element. But especially if the less important text is relatively long, it might often be a better idea to put it "behind hyperlinks", into separate documents to which there are links in the main document. A person who follows such a link is probably interested in the text, so he probably prefers seeing it as normal text, and there is no need for any de-emphasis.
The DFN element can be regarded as a special kind of emphasis, too, but logically it indicates that a term is used in a context where it is defined. This is a very useful element in principle but unfortunately many browsers, including Netscape, do not effectively support it.
The VAR element indicates that a piece of text (typically, a word) is a variable, ie a generic notation to be replaced by different actual expressions.
The other phrase elements involve different kinds of citations or quotations:
| CITE | citation (title of a book or article or equivalent) |
|---|---|
| CODE | program code or equivalent (eg HTML code) |
| SAMP | sample output from programs, scripts, commands etc |
| KBD | text to be typed from a keyboard by a user; typically used when giving instructions |
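As a small sketch of how DFN, VAR, KBD and CITE might be combined in running text (the sentences are invented for illustration; the command assumes that the Lynx browser is started by typing lynx followed by an address):

```html
<P>A <DFN>URL</DFN> is the address of a resource on the Web.</P>
<P>To view a document with Lynx, type <KBD>lynx</KBD> <VAR>url</VAR>,
where <VAR>url</VAR> is to be replaced by the actual address.</P>
<P>For the exact definitions, consult the <CITE>HTML 3.2 Reference Specification</CITE>.</P>
```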
Please do not identify eg the concept of emphasis with its physical representation on your browser (or even its typical representation on several browsers). See below for notes and examples on rendering markup.
| TT | "teletype" text, ie monospaced text |
|---|---|
| I | italics |
| B | bold |
| U | underlined |
| STRIKE | strike-through text |
| BIG | large font |
| SMALL | small font |
| SUB | subscript |
| SUP | superscript |
Note: SUB and SUP might reasonably be regarded as phrase-level markup, and as mentioned above, SMALL might be used as a substitute for the missing phrase markup for de-emphasis.
The FONT (and BASEFONT) element offers more possibilities to control font sizes than BIG and SMALL. However, all use of font size control in HTML should be avoided.
For example, some browsers (eg Internet Explorer) render TT (and CODE) so that the font is significantly smaller than normal text font, and this disproportion is preserved when the setting for font size is changed; moreover, Internet Explorer renders VAR with monospaced font whereas most graphical browsers use (much more naturally) italics. On the other hand, in Netscape these font sizes are separately settable and by default the same font size is used for both, but "the same" is the technical size in points - in practice monospaced font looks bigger than normal proportional font!
Thus, avoid messing with font sizes; use phrase markup and other structural elements and let the users, if they dislike the font sizes, define fonts in their browser settings the best they can.
The following table is intended to give an idea of the variation. It describes (verbally) the rendering of markup elements in Netscape Navigator, Microsoft Internet Explorer, and Lynx. Notice that there is variation even within each of these programs - depending on version, platform, and system-wide or user's own configuration, so this is just a typical situation. Thus, take this as an illustration of the different things that might happen rather than as a description of what actually happens in some particular program.
| element | Netscape | Internet Explorer | Lynx |
|---|---|---|---|
| EM | italics | italics | underlined |
| DFN | normal text | italics | normal (monospaced) |
| CODE | monospaced | monospaced small | normal (monospaced) |
| SAMP | monospaced | monospaced small | normal (monospaced) |
| KBD | monospaced | monospaced small | normal (monospaced) |
| VAR | italics | monospaced small | normal (monospaced) |
| CITE | italics | italics | underlined |
| TT | monospaced | monospaced small | normal (monospaced) |
| I | italics | italics | underlined |
| B | bold | bold | underlined |
| U | normal text | underlined | underlined |
| STRIKE | strike-through | strike-through | text between [DEL: and :DEL] |
| BIG | larger than normal | larger than normal | normal text |
| SMALL | smaller than normal | slightly smaller than normal | normal text |
| SUB | lowered, slightly smaller | lowered | normal text |
| SUP | raised, slightly larger | raised | normal text |
These relate to unnested elements. Nesting of text elements may affect the rendering.
The following example illustrates the approach in the context of an introduction to the Perl programming language.
<P>The following Perl script prints out its input so that each line begins with
a running line number:</P>
<PRE><CODE>
#!/usr/bin/perl
$line = 1;
while (&lt;&gt;) {
print $line++, " ", $_; }
</CODE></PRE>
<P>The scalar variable <CODE>$line</CODE> is of course the line counter.</P>
<P>The loop construct is of the form<BR>
<CODE>while (&lt;&gt;) {</CODE><BR>
<VAR>process one line of input</VAR> <CODE>}</CODE><BR>
</P>
<P>Assuming that you have written this script (the simpler version of it) into a
file named <KBD>lines</KBD>, you could test it using a command of the form<BR>
<KBD>./lines</KBD> <VAR>datafile</VAR><BR>
In particular, using the script as input to itself, you would do as follows
(the details of system output vary from one system to another):
</P>
<PRE>
<SAMP>lk-hp-23 perl 251 % </SAMP><KBD>./lines lines</KBD>
<SAMP>1 #!/usr/bin/perl
2 $line = 1;
3 while (&lt;&gt;) {
4 print $line++, " ", $_; }
lk-hp-23 perl 252 % </SAMP>
</PRE>
Notes on the example:
Thus, on the Web there is no such thing as the layout of a document. As an author you cannot dictate layout, just make some efforts to affect it. The following notes, and all information related to layout-oriented features of HTML, should be read with this in mind.
Several HTML elements have optional attributes which can be used to affect the way in which the element is rendered. Consult the detailed descriptions of individual HTML tags to see the possibilities and to read notes about them.
In particular, you may wish to center parts of the text to make them more distinguishable from normal text. You can use the ALIGN=CENTER attribute in several elements like P or DIV (or the separate CENTER element).
If you wish to separate major portions of your document visually from each other, you can use the HR element. Typically it is rendered as a full width horizontal line. But please use this in addition to structuring tools like headings, not as a substitute for them.
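A minimal sketch of these two features together, with made-up headings and text: the centered paragraph uses ALIGN=CENTER, and the HR element supplements the headings rather than replacing them.

```html
<H2>Results</H2>
<P ALIGN=CENTER>This paragraph is centered on browsers which support the ALIGN attribute.</P>
<P>This paragraph is rendered in the normal way.</P>
<HR>
<H2>Discussion</H2>
<P>Each major portion still begins with a heading; the HR element only adds a visual separator.</P>
```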
As regards detailed layout issues such as forcing or preventing line breaks, see the section Division into lines and the use of blanks and tabs. Font issues were discussed above.
Technically links are specified using A (anchor) elements, and the technical issues are discussed in the description of the A tag. Here we just present the basic idea, a very simple example, and a few pragmatic or stylistic notes.
A link is a directed connection between a particular point in a document and another particular point in the same or another document. The points are often called anchors in HTML terminology.
The two ends of a link (the anchors) are in different logical positions: the link is from one point to another. The latter, called the target of the link, is very often the beginning of a document or, perhaps more logically speaking, an entire document.
In the simplest case, you create a link from one point of your document to another document (which could be your own or written by someone else, perhaps physically located at the other side of the globe). You have to decide which words act as a visual representation of the link, ie as the phrase which refers to the other document, and you need to know the Web address (the URL) of that document. Then you just put the pieces together into a suitable A element. For instance:
I work at <A HREF="http://www.hut.fi/english.html">HUT</A>.
This might, in one environment, be rendered as follows:
I work at HUT.
The link text, here the abbreviation HUT, acts as a link to a Web document which explains what the abbreviation means and also provides a lot of information about it. The renderings vary a lot - the link text might be underlined, colored, or otherwise distinguishable from normal text. The user (reader) is assumed to know how links are rendered in the particular environment.
Although it is technically easy to set up links, it is pragmatically often very difficult to use them the right way. Here are some practical guidelines:
Assuming that we have some graphics in some format in a file, there are two essentially different ways to use it in a Web document. You can either link to it or embed it into your document. In the first case, you use an anchor (A) element; in the latter case, an IMG element. In the first case, when a user accesses your document he sees eg a verbal phrase which acts as a link, and activating that link causes an image to be displayed, either in the same window or in another, depending on the browser and its settings. On the other hand, an embedded image is part of your document; when a user accesses your document, the image is loaded along with it and displayed as part of it.
In both cases, the user will see the image only if the browser supports the particular graphics format. The most commonly supported formats are GIF and JPEG. They are often the only formats supported for embedded images. For linked images, the support is typically wider (it might include eg PostScript, PDF, and PNG) and extensible by the user (by installing new viewers and making suitable additions to the settings of the browser). The reason is that linked images are typically implemented so that the browser knows nothing of the graphics format itself but only knows how to launch a separate program to present it.
As a special case, it is possible to combine linking and embedding in a sense: you can create a document which contains an image which acts (instead of verbal link text) as a link to another image. Typically, the embedded image is rather small, stamp-like, often a small coarse version of the image to which it points as a link.
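Such a combination might be sketched as follows. The file names photo-small.gif and photo.gif are hypothetical; they stand for a small preview and the corresponding full-size image.

```html
<!-- The embedded small image acts as the link "text"; selecting it fetches the full-size image -->
<A HREF="photo.gif"><IMG SRC="photo-small.gif"
ALT="[Small preview; select it to see the picture in full size]"></A>
```

The ALT text matters here in particular, since on a text-only browser it is the only thing the user can select.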
Linking to an image is usually permitted without specific permission. On the other hand, embedding an image means using it in a way which requires the author's permission, and the author must be mentioned. (See Web Law FAQ.) Obviously, some images are so simple that copyright is not applicable. Moreover, there is a large number of collections of images, some of which are in the public domain.
To illustrate linking to images and embedding images, let us consider a GIF image which has been put onto a suitable place so that it is accessible using the URL http://www.hut.fi/%7elsarakon/sae.gif. Now I could refer to it in the following way:
<A HREF="http://www.hut.fi/~lsarakon/">Liisa Sarakontu</A> has drawn
<A HREF="http://www.hut.fi/~lsarakon/sae.gif">a picture of Siamese algae eater</A>.
On the other hand, since Liisa has given me the permission to do so, I could embed the image into a document of mine as follows:
The Siamese algae eater (<I>Crossocheilus siamensis</I>) is often mixed up with
another algae eating fish, the "false Siamensis" (<I>Garra taeniata</I> or
<I>Epalzeorhynchus sp.</I>). Below you can see drawings of them by
<A HREF="http://www.hut.fi/~lsarakon/">Liisa Sarakontu</A>.
<P>
<IMG SRC="http://www.hut.fi/~lsarakon/sae.gif" ALT="[Picture of Siamese algae eater]">
<P>
<IMG SRC="http://www.hut.fi/~lsarakon/false.gif" ALT='[Picture of "false Siamensis"]'>
The issue of good use of images is very difficult and many-faceted. No attempt to cover it will be made here. The author has written a separate treatise How to use images in communication in general and on the Web in particular.
There is no general support in HTML 3.2 for presenting mathematical formulas. Consult the W3C document on Math Markup to see what work is in progress in this respect. However, you can use some software (eg TeX) to produce the representation of a formula as an image, eg in PostScript form, and use the IMG tag to embed it into your document or the A tag to create a link to it. The latter method is often worth considering, especially for large formulas. The reader may prefer reading the text without distractions and looking at the formula (image) at the very moment he is prepared to do so. Moreover, he may prefer looking at it in a separate window (which is separately adjustable in size and positionable on the screen).
In some cases, when just a few separate symbols are needed within the text and they have reasonable textual alternatives, the following kind of approach can be suitable:
The Greek letter <IMG SRC="http://www.ece.cmu.edu/icons/Sigma.xbm" ALT="sigma"> is often used to denote summation.
There is a problem, however: since an image has fixed dimensions whereas the size of letters is browser-dependent, there might be an unesthetic disproportion.
Sometimes it is best to present mathematical expressions in linearized notation. For example, instead of trying to find a way of presenting the square root of 2 in the normal mathematical way, you might write just sqrt(2). It depends on the intended audience whether you need to explain such notations.
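A small sketch of linearized notation in HTML, with invented sentences. It uses the SUP element for an exponent where that is supported, and a purely textual fallback notation (x^2), which is an assumption about what the intended audience understands.

```html
<P>The length of the diagonal of a unit square is sqrt(2),
ie the positive number <VAR>x</VAR> for which <VAR>x</VAR><SUP>2</SUP> = 2.</P>

<!-- A fully linearized variant which does not depend on SUP being rendered as a superscript -->
<P>In other words, x = sqrt(2) is the positive solution of x^2 = 2.</P>
```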
Table cells are often called table elements, but it is best to avoid that in the HTML context, since it might cause confusion eg with the TABLE element, which is the HTML description of an entire table.
Tables are the most important improvement in HTML 3.2 in comparison with HTML 2.0. On the other hand, the table constructs of HTML 3.2 are only a subset of The HTML3 Table Model (RFC 1942).
Unfortunately tables are not yet supported by all browsers, and even if support exists it may be of poor quality. (Text-only browsers and speech-based user agents will always have difficulties with complicated tables, of course.) See Alan Flavell's review Tables on non-table browser for information about making tables look somewhat reasonable, if possible, also on browsers which do not support tables.
Another unfortunate situation is that people have started using table elements just to get a desired layout of pages, not to represent data which is logically matrix-like in structure.
<TABLE>
<TR> <TD> 1 </TD> <TD> 0 </TD> </TR>
<TR> <TD> 0 </TD> <TD> 1 </TD> </TR>
</TABLE>
This looks like the following on a typical browser:
| 1 | 0 |
| 0 | 1 |
Thus, the TABLE tags enclose the table rows, each of which is enclosed by TR tags and encloses table cells enclosed by TD tags. This corresponds to the logical structure of a table as a set of rows consisting of cells. You can abbreviate the table structure by omitting the TD and TR end tags (since a browser implicitly assumes them), but at the expense of losing the logical clarity to some extent:
<TABLE> <TR> <TD> 1 <TD> 0 <TR> <TD> 0 <TD> 1 </TABLE>
Moreover, although omitting those end tags is legal HTML 3.2, it may in practice confuse some browsers (including Netscape) in some cases.
The use of blanks and newlines in the HTML code for a table is irrelevant to the visual appearance of a table when viewed with a browser, since that appearance is controlled by HTML tags. However, it is often useful to position table elements suitably in the HTML code so that items in the same column are adjusted to make the structure clear for you (or whoever has to maintain the HTML document).
<P>An illustration of the use of the TABLE element in HTML.</P>
<TABLE BORDER=1>
<CAPTION>Finnish, English, and scientific names for some animals</CAPTION>
<TR><TH>Finnish name</TH><TH>English name</TH><TH>Scientific name</TH></TR>
<TR><TD>hirvi</TD><TD>elk</TD><TD><I>Alces alces</I></TD></TR>
<TR><TD>orava</TD><TD>squirrel</TD><TD><I>Sciurus vulgaris</I></TD></TR>
<TR><TD>susi</TD><TD>wolf</TD><TD><I>Canis lupus</I></TD></TR>
</TABLE>
Notice that some table cells in the example contain text markup; in this case, there is a specific reason for using the I element.
In the simplest case you can just write a TABLE element (with attributes defaulted) which contains a single row which contains two data cells, each of which contains a paragraph.
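This simplest case might look like the following sketch: one row, two data cells, one paragraph in each, with all attributes defaulted. The two sentences are taken from the Genesis texts used elsewhere in this document.

```html
<TABLE>
<TR>
<TD><P>In principio creavit Deus caelum et terram.</P></TD>
<TD><P>In the beginning God created the heaven and the earth.</P></TD>
</TR>
</TABLE>
```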
In a more general case, you should divide the parallel texts into logical parts, such as paragraphs, and make each part a cell of the table. This may require a lot of work (unless you have a suitable program to do the job), since you must take care of "merging" the text: after the first part of the first text, you must have the first part of the second text, etc.
The following example presents a passage from the Bible in three versions and translations:
<TABLE>
<CAPTION><STRONG>The beginning of Genesis in three languages</STRONG></CAPTION>
<TR ALIGN=LEFT VALIGN=TOP>
<TH></TH><TH>Latin (Vulgate)</TH><TH>English (King James version)</TH>
<TH>Finnish (1992 version)</TH>
</TR><TR ALIGN=LEFT VALIGN=TOP>
<TH>1</TH>
<TD>In principio creavit Deus caelum et terram.</TD>
<TD>In the beginning God created the heaven and the earth.</TD>
<TD>Alussa Jumala loi taivaan ja maan.</TD>
</TR><TR ALIGN=LEFT VALIGN=TOP>
<TH>2</TH>
<TD>Terra autem erat inanis et vacua et tenebrae super faciem abyssi
et spiritus Dei ferebatur super aquas.</TD>
<TD>And the earth was without form, and void; and darkness was upon
the face of the deep. And the Spirit of God moved upon the face of the waters.</TD>
<TD>Maa oli autio ja tyhjä, pimeys peitti syvyydet, ja Jumalan henki liikkui vetten yllä.</TD>
</TR><TR ALIGN=LEFT VALIGN=TOP>
<TH>3</TH>
<TD>Dixitque Deus "Fiat lux" et facta est lux.</TD>
<TD>And God said, Let there be light: and there was light.</TD>
<TD>Jumala sanoi: "Tulkoon valo!" Ja valo tuli.</TD>
</TR></TABLE>
Notice that the ALIGN and VALIGN attributes can be essential for achieving good rendering. Browsers cannot know the nature of tables from their contents, so there are situations where the document author may need to control formatting issues like alignment.
Using a TABLE element for a definition list is perhaps not an intended use of that element but it is often useful, especially since the author can control things like alignment and use of borders. Consult the document Examples of various list elements in HTML for a very simple example of presenting a definition list as a table with default attribute settings. Usually you probably want the "definition terms" to be left-aligned, as in the following example:
<TABLE> <CAPTION>The first three letters of the Greek alphabet</CAPTION> <TR><TH ALIGN=LEFT>alpha</TH> <TD> the first letter of the Greek alphabet </TD> </TR> <TR><TH ALIGN=LEFT>beta</TH> <TD> the second letter of the Greek alphabet </TD> </TR> <TR><TH ALIGN=LEFT>gamma</TH> <TD> the third letter of the Greek alphabet. </TD> </TR> </TABLE>
For numerical tables, proper alignment is usually crucial for easily readable rendering. (It is in a sense a structural feature, since it relates to the comparability of items of a column.)
Integer values in a column should be right aligned. This is easy to achieve in principle. There are two alternatives:
Values containing a decimal point (or, in many languages, a decimal comma) should be aligned according to that separator, but unfortunately this is not possible in HTML 3.2. (There are suggested ways of expressing such requests, but currently there is little if any support for them.) One solution is to present such values so that there is the same number of digits to the right of the decimal point in every value in a column, and use ALIGN=RIGHT.
However, the rendering might be unsatisfactory if numbers are presented using a proportional font so that digits are of essentially different sizes. It is possible but tedious to overcome this by putting the data in each numerical cell within a TT element. (Notice that it is not legal for a TT element to contain a TABLE element!)
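As a minimal sketch of the TT approach, the following table right-aligns a column of integers and puts each number inside a TT element so that all digits have the same width. The column names and figures are made-up sample values, not real data.

```html
<TABLE>
<TR><TH>year</TH><TH>visitors</TH></TR>
<!-- ALIGN=RIGHT aligns the cell contents to the right; TT forces a monospaced font for the digits -->
<TR><TD>1995</TD><TD ALIGN=RIGHT><TT>870</TT></TD></TR>
<TR><TD>1996</TD><TD ALIGN=RIGHT><TT>12450</TT></TD></TR>
<TR><TD>1997</TD><TD ALIGN=RIGHT><TT>9031</TT></TD></TR>
</TABLE>
```

Notice that the TT elements are inside the cells; the TABLE element itself is not put inside TT.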
The following example contains first a hand-formatted table presented using the PRE element, then the same data using a TABLE element. In general, it takes more work and care to use a TABLE element but the result is often much better.
Measurement results:
<PRE>
time    temperature   pressure
12:00      26          12.8
12:15      22.5         9.8
12:30      11           1.65
12:45       3.3         0.03
13:00       0.05        0.002
</PRE>
<TABLE>
<CAPTION>Measurement results</CAPTION>
<TR><TH>time</TH><TH>temperature</TH><TH>pressure</TH></TR>
<TR ALIGN=RIGHT><TD>12:00 </TD><TD>26.00 </TD><TD>12.800 </TD></TR>
<TR ALIGN=RIGHT><TD>12:15 </TD><TD>22.50 </TD><TD> 9.810 </TD></TR>
<TR ALIGN=RIGHT><TD>12:30 </TD><TD>11.00 </TD><TD> 1.650 </TD></TR>
<TR ALIGN=RIGHT><TD>12:45 </TD><TD> 3.30 </TD><TD> 0.030 </TD></TR>
<TR ALIGN=RIGHT><TD>13:00 </TD><TD> 0.05 </TD><TD> 0.002 </TD></TR>
</TABLE>
The index is implemented in HTML using normal
links, eg
<A HREF="af.html">Afghanistan</A>
What we will discuss here is how to present the link names, or some
other pieces of text, as a list, table, or some other structure.
If you only read HTML specifications, the obvious answer is to use the DIR or MENU construct. However, as mentioned and exemplified in the general discussion of lists, this is not practically feasible. Thus, if we prefer having the menu in multicolumn format, as we usually do, we must use other constructs.
One possibility is to format the menu by hand and enclose it in a PRE element. If the menu items are link texts, you should first format the menu as text only and then add the anchor (A) tags, since adding them obscures the layout. For clarity, therefore, the following example is presented without links (unlike the other alternatives):
<PRE>
Afghanistan          Albania       Algeria    American Samoa
Andorra              Angola        Anguilla   Antarctica
Antigua and Barbuda  Arctic Ocean  Argentina  Armenia
</PRE>
Another possibility, which should be the normal one, is to present the items simply as a text paragraph, using eg a blank or a blank and a comma as separator. This means that the browser takes care of dividing the text into lines and the presentation is very compact:
<BASE HREF="http://www.odci.gov/cia/publications/nsolo/factbook/">
<P>
<A HREF="af.htm">Afghanistan</A>,
<A HREF="al.htm">Albania</A>,
<A HREF="ag.htm">Algeria</A>,
<A HREF="aq.htm">American Samoa</A>,
<A HREF="an.htm">Andorra</A>,
<A HREF="ao.htm">Angola</A>,
<A HREF="av.htm">Anguilla</A>,
<A HREF="ay.htm">Antarctica</A>,
<A HREF="ac.htm">Antigua and Barbuda</A>,
<A HREF="ocat.htm">Arctic Ocean</A>,
<A HREF="ar.htm">Argentina</A>,
<A HREF="am.htm">Armenia</A>
</P>
Of course, it is possible to force line breaks by using a BR element (eg to make a change in the initial letter cause a new line in an example like the one above). If you think the items are not distinguishable enough in the rendering, consider prefixing each item with a special character like * (and using just spaces as separator).
However, if for some reason the presentation must be such that all items occupy the same amount of space, then one can either use the PRE method described above or take the effort of designing a suitable TABLE element. Example:
<BASE HREF="http://www.odci.gov/cia/publications/nsolo/factbook/">
<TABLE><TR>
<TD WIDTH=160><A HREF="af.htm">Afghanistan</A></TD>
<TD WIDTH=160><A HREF="al.htm">Albania</A></TD>
<TD WIDTH=160><A HREF="ag.htm">Algeria</A></TD>
<TD WIDTH=160><A HREF="aq.htm">American Samoa</A></TD>
</TR><TR>
<TD WIDTH=160><A HREF="an.htm">Andorra</A></TD>
<TD WIDTH=160><A HREF="ao.htm">Angola</A></TD>
<TD WIDTH=160><A HREF="av.htm">Anguilla</A></TD>
<TD WIDTH=160><A HREF="ay.htm">Antarctica</A></TD>
</TR><TR>
<TD WIDTH=160><A HREF="ac.htm">Antigua and Barbuda</A></TD>
<TD WIDTH=160><A HREF="ocat.htm">Arctic Ocean</A></TD>
<TD WIDTH=160><A HREF="ar.htm">Argentina</A></TD>
<TD WIDTH=160><A HREF="am.htm">Armenia</A></TD>
</TR></TABLE>
Alternatively, you might wish to consider the effect of using a table with borders.
Notice that this solution is rather unclean. It involves a TABLE structure where the division into lines is (normally) made for layout purposes only, and adding new items usually requires complete restructuring of the table. You typically need to insert WIDTH attributes to ensure that table columns are of the same width, and the specification is inherently device-dependent since it must be given in pixels. In particular, the presentation might not be the desired one if the physical font size in pixels differs too much from what you think it should be.
Thus, this approach should be avoided in general. Hopefully future browsers will support the UL element in a more advanced way, automatically selecting a compact multicolumn presentation when applicable, or at least support the DIR element in the intended way.
neut. masc. fem.
nom. id is ea
acc. id eum eam
gen. eius eius eius
dat. ei ei ei
abl. eo eo ea
Obviously this calls for using a table in HTML, and
using the above-explained constructs you can write a
simple table presentation for the data.
However, if you would like to make it more explicit that there
are identical entries in adjacent cells, you can use the
ROWSPAN and COLSPAN attributes as follows:
<TABLE BORDER=1 ALIGN=CENTER CELLPADDING=3>
<CAPTION>Declension of <I>is</I> in singular</CAPTION>
<TR><TH></TH><TH>neuter</TH><TH>masc.</TH><TH>fem.</TH></TR>
<TR><TH>nom.</TH><TD ROWSPAN=2 VALIGN=MIDDLE><I>id</I></TD>
<TD><I>is</I></TD><TD><I>ea</I></TD></TR>
<TR><TH>acc.</TH><TD><I>eum</I></TD><TD><I>eam</I></TD></TR>
<TR><TH>gen.</TH><TD COLSPAN=3 ALIGN=CENTER><I>eius</I></TD></TR>
<TR><TH>dat.</TH><TD COLSPAN=3 ALIGN=CENTER><I>ei</I></TD></TR>
<TR><TH>abl.</TH><TD COLSPAN=2 ALIGN=CENTER><I>eo</I></TD>
<TD><I>ea</I></TD></TR>
</TABLE>
For example, the first data cell is specified to have ROWSPAN=2, which effectively means that two adjacent cells in the same column are combined into one cell. Notice that when writing the HTML code for the next row (the second TR element after the header row) we simply leave out the cell element corresponding to the location which has already been taken into use.
Nested tables easily become confusing. Moreover, there are browsers which cannot handle nested tables in general or which get confused with complicated nested tables. Of course, nested tables can be the natural way of expressing information, when it is logically an array of something which may in turn be an array.
Basically you just need to be very careful in writing HTML code for nested tables. No new elements or other features are needed, just a combination of those which have already been described. But due to deep nesting one easily makes mistakes, and the results can be really messy, and locating the error may take time.
The simplest case is probably a table with a single row consisting of two cells, each of which contains a table. This might be used for presenting two similar tables in parallel for comparison. To proceed with our grammatical example, here is a table containing two tables, one for declension in singular and one for declension in plural:
<TITLE>tbl</TITLE>
<TABLE ALIGN=CENTER>
<CAPTION>Declension of <I>is</I></CAPTION>
<TR><TD>
<TABLE BORDER=1 ALIGN=CENTER CELLPADDING=3>
<CAPTION>Singular</CAPTION>
<TR><TH></TH><TH>neuter</TH><TH>masc.</TH><TH>fem.</TH></TR>
<TR><TH>nom.</TH><TD ROWSPAN=2 VALIGN=MIDDLE><I>id</I></TD>
<TD><I>is</I></TD><TD><I>ea</I></TD></TR>
<TR><TH>acc.</TH><TD><I>eum</I></TD><TD><I>eam</I></TD></TR>
<TR><TH>gen.</TH><TD COLSPAN=3 ALIGN=CENTER><I>eius</I></TD></TR>
<TR><TH>dat.</TH><TD COLSPAN=3 ALIGN=CENTER><I>ei</I></TD></TR>
<TR><TH>abl.</TH><TD COLSPAN=2 ALIGN=CENTER><I>eo</I></TD>
<TD><I>ea</I></TD></TR>
</TABLE>
</TD>
<TD>
<TABLE BORDER=1 ALIGN=CENTER CELLPADDING=3>
<CAPTION>Plural</CAPTION>
<TR><TH></TH><TH>neuter</TH><TH>masc.</TH><TH>fem.</TH></TR>
<TR><TH>nom.</TH><TD ROWSPAN=2 VALIGN=MIDDLE><I>ea</I></TD>
<TD><I>ii (ei)</I></TD><TD><I>eae</I></TD></TR>
<TR><TH>acc.</TH><TD><I>eos</I></TD><TD><I>eas</I></TD></TR>
<TR><TH>gen.</TH><TD COLSPAN=2 ALIGN=CENTER><I>eorum</I></TD>
<TD><I>earum</I></TD></TR>
<TR><TH>dat.</TH><TD COLSPAN=3 ROWSPAN=2 ALIGN=CENTER VALIGN=MIDDLE>
<I>iis (eis)</I></TD></TR>
<TR><TH>abl.</TH></TR>
</TABLE>
</TD></TR>
</TABLE>
Notice the explicit use of end tags like </TD>. The same code with omissible tags omitted is equivalent according to the HTML 3.2 specification, but Netscape has a bug which can make it present a nested table incorrectly in the absence of end tags.
The default alignment is the following:
There is no way to set different defaults for an entire table. (Although the TABLE element accepts an ALIGN attribute, it affects the positioning of the entire table!)
However, you can use the ALIGN and VALIGN attributes in TH and TD elements to set the alignments for an individual cell, and you can use the same attributes in a TR element to set the alignment defaults for the cells within that element (ie within one row); naturally, such defaults can be overridden in individual cells.
The possible values of ALIGN (in TH, TD and TR elements) are LEFT, RIGHT, and CENTER, for aligning the contents of a cell horizontally with respect to the left, right or center of the space for the cell. Notice that when aligning to the left or right, there can still be some space between the contents and the left or right border of the cell, depending on the setting of the CELLPADDING attribute of the enclosing TABLE element.
The possible values of VALIGN (in TH, TD and TR elements) are TOP, MIDDLE, and BOTTOM, for aligning the contents of a cell vertically with respect to the top, center or bottom of the space for the cell. As stated above, the default is VALIGN=MIDDLE. Notice that when VALIGN=TOP or VALIGN=BOTTOM is used, there can still be some space between the upper or lower border of the cell, depending on the setting of the CELLPADDING attribute of the enclosing TABLE element.
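The interplay of row-level defaults and cell-level settings might be sketched as follows. The cell contents are made up; the BR element is used only to make one cell taller so that vertical alignment becomes visible.

```html
<TABLE BORDER=1 CELLPADDING=3>
<!-- Row-level defaults: contents of the cells in this row go to the right and to the top -->
<TR ALIGN=RIGHT VALIGN=TOP>
<TD>12</TD>
<TD>345</TD>
<!-- This cell overrides both row-level defaults -->
<TD ALIGN=CENTER VALIGN=BOTTOM>n/a</TD>
<!-- A two-line cell which makes the row taller than a single line -->
<TD>first line<BR>second line</TD>
</TR>
</TABLE>
```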
The short answer is: Don't. When necessary, use logical markup for text elements within tables as well as elsewhere. (Previous discussion contained a simple example of this.)
Assuming that you really need to designate font face, size and color (or just insist on doing so), the laborious way of doing it elementwise is the only portable way. Here portable means that you can, with some confidence, expect the HTML code to work on most browsers (assuming that they have table support at all, of course). This is not just a standards issue. In particular, in Netscape the BASEFONT element does not affect text in tables (it is disputable whether it should, according to the standard).
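The elementwise way might be sketched as follows, using only the SIZE and COLOR attributes of FONT. The values are arbitrary examples; notice that the FONT element must be repeated inside every cell.

```html
<TABLE>
<TR>
<!-- The same FONT markup is repeated in each cell, since it cannot be set once for the whole table -->
<TD><FONT SIZE="+1" COLOR="#800000">hirvi</FONT></TD>
<TD><FONT SIZE="+1" COLOR="#800000">elk</FONT></TD>
</TR>
<TR>
<TD><FONT SIZE="+1" COLOR="#800000">susi</FONT></TD>
<TD><FONT SIZE="+1" COLOR="#800000">wolf</FONT></TD>
</TR>
</TABLE>
```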
To summarize the situation, as regards portable solutions in the above-mentioned sense:
Style sheets provide tools for affecting the rendering in a rather detailed manner, but support for them in browsers is still under development.
The basic idea of style sheets is to provide tools for specifying features of the visible (or audible) representation of HTML documents without introducing new HTML tags and attributes for the purpose. The presentation style is specified in a manner which allows several style specifications (by the author and by users, as well as browser defaults) to be taken into account when rendering a document. This will allow control over indentation, colors, fonts, etc in a sophisticated manner. For more information about style sheets in general, consult the W3C pages on style sheets and WDG pages on style sheets.
Almost at the same time as the HTML 3.2 Reference Specification was accepted as a W3C Recommendation, a recommendation with similar status was accepted concerning style sheets: Cascading Style Sheets, level 1, abbreviated CSS1. The two recommendations are, however, separate in the sense that the combination of style sheet specifications with HTML documents has not been defined exactly. In particular, CSS1 mentions the ID and CLASS attributes for selecting specific pieces of text, but these attributes are not in HTML 3.2. The same applies to the attributes of the STYLE element and to the proposed SPAN element.
The HTML 3.2 language provides two ways of referring to style sheets in HTML documents:
Additional methods of referring to style sheets in HTML will probably be possible, and some of them are already supported. For a short general discussion, see Linking Style Sheets to HTML by WDG. There is also a W3C Working Draft HTML3 and Style Sheets which discusses these issues.
An HTML 3.2 conforming browser need not support style sheets in any way (except by recognizing the STYLE element and hiding its contents). However, there is increasing support for some features of CSS1 in browsers.
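A minimal sketch of how a document might refer to style sheets, assuming that the LINK element is used for an external style sheet and the STYLE element for an embedded one. The file name mystyle.css and the CSS1 rule are made up for illustration, and browsers with style sheet support may expect additional attributes which are not part of HTML 3.2.

```html
<HTML>
<HEAD>
<TITLE>Style sheet example</TITLE>
<!-- External style sheet; mystyle.css is a hypothetical file name -->
<LINK REL="stylesheet" HREF="mystyle.css">
<!-- Embedded style sheet; the comment delimiters inside STYLE keep the rule
     from being displayed as text by browsers which do not recognize STYLE -->
<STYLE>
<!--
H1 { color: navy }
-->
</STYLE>
</HEAD>
<BODY>
<H1>A heading which might be rendered in navy</H1>
<P>Browsers without style sheet support render this document in their normal way.</P>
</BODY>
</HTML>
```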
The structure of the tag descriptions is as follows:
This presentation does not discuss the XMP, LISTING, and PLAINTEXT elements. They are now deprecated (obsolete), and PRE should be used instead.
In principle, the A element can also be used for some other purposes which are currently of little practical value.
The user may select the anchor text (in a browser-dependent manner, using eg arrow keys for moving the cursor and enter key for selecting, or the mouse for moving the cursor and a mouse button click for selecting). In that case the document or location in a document as specified by the target, if existent and accessible, will be fetched and presented to the user. A browser may allow the user to select whether the document is to be displayed in the same or in another window on the screen.
The visual appearance of anchor texts can be set by user options in many browsers. It can depend on whether the user has visited the target or not. It is also affected by any LINK and VLINK attributes in the BODY element. When a document is printed, anchor texts might appear, depending on the browser and its settings, eg as normal text or as underlined text, or footnotes (indicating the target URLs) might be attached to them.
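The effect of LINK and VLINK attributes can be tried with a small test document like the one below. It is only a sketch: the color values are arbitrary, and user settings may override them.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Link colors</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000CC" VLINK="#660099">
<!-- LINK sets the color of unvisited links, VLINK that of visited ones.
     The colors used here are arbitrary. -->
<P>See the <A HREF="ADDRESS.html">examples of using ADDRESS tag</A>.</P>
</BODY>
</HTML>
```

When link colors are set, it is probably wise to set BGCOLOR and TEXT as well, so that the combination remains readable.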
If anchor text is (or contains) an IMG element, a browser generally indicates the image as a link by drawing a colored (typically blue) border around the image. The width (and existence) of such a border can be controlled by the BORDER attribute of the IMG element.
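The following fragment sketches an image used as anchor text, once without a border and once with a two-pixel border. The image file name logo.gif is hypothetical.

```html
<!-- Fragment for use inside BODY. The image file name is hypothetical. -->

<!-- BORDER="0" suppresses the link border around the image. -->
<P><A HREF="http://www.hut.fi/english.html"><IMG SRC="logo.gif"
ALT="HUT home page" BORDER="0"></A></P>

<!-- BORDER="2" asks for a border two pixels wide. -->
<P><A HREF="http://www.hut.fi/english.html"><IMG SRC="logo.gif"
ALT="HUT home page" BORDER="2"></A></P>
```

Notice that with BORDER="0" nothing in the image itself tells the user that it is a link, so the image or the surrounding text should make that clear.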
A elements without an HREF attribute have no effect on the rendering of a document.
or
<A NAME="name"></A>
| attribute name | possible values | meaning | notes |
|---|---|---|---|
| NAME | string | a name for a link end | must be unique within the document; case sensitive |
| HREF | URL | network address for the linked resource | could be another HTML document, a PDF file, an image etc |
| REL | string | the forward relationship, also known as the "link type"; cf. LINK with REL | in principle, could be used by browsers in several ways, eg to determine how to deal with the linked resource when printing out a collection of linked resources |
| REV | string | the reverse relationship | a link from document A to document B with REV=relation expresses the same relationship as a link from B to A with REL=relation |
| TITLE | string | a title for the linked resource | advisory |
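The correspondence between REL and REV can be illustrated with two hypothetical files, chapter1.html and chapter2.html. The relationship name "next" is used here only as an example, and browsers may well ignore these attributes altogether.

```html
<!-- In chapter1.html: the target is the "next" document after this one. -->
<P><A HREF="chapter2.html" REL="next" TITLE="Chapter 2">Chapter 2</A></P>

<!-- In chapter2.html: this document is the "next" one after the target.
     This expresses the same relationship as the link above. -->
<P><A HREF="chapter1.html" REV="next" TITLE="Chapter 1">Chapter 1</A></P>
```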
The value of a TITLE attribute might be used eg
mailto: URL
<P>A hyperlink referring to a document in the same directory as the current one: <A HREF="ADDRESS.html">Examples of using ADDRESS tag</A>.
<P>A hyperlink referring to a document elsewhere: <A HREF="http://www.hut.fi/english.html">HUT</A>.
<P>A hyperlink in which the link text contains markup: <a href="http://www.iki.fi/oa/HTML/"><cite>The HTML test set</cite></a>
<p>A hyperlink referring to a label in the same document: <A HREF="#final">final example</A>.
<P>A hyperlink referring to a label in another document: <A HREF="http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimerP2.html#UR"> URL info in HTML Primer</A>
<P>A link to an image: <A HREF="http://www.hut.fi/~jkorpela/perhe.jpg" TITLE="Yucca's family picture, by Minna">a family picture</A>.
<P><A NAME="final">Finally, this is just text to which you can refer with a hyperlink.</A>
As regards ISMAP, see the IMG examples.
It depends on the browser how references to resources like audio and video files are handled. If a browser supports them, it typically supports some particular repertoire of file formats by initiating ("launching") a separate program for "playing" the file. (It might use a distinct program for each file format or a general-purpose media player program for a large set of formats.) Thus, for example, in order to listen to .au files the user needs, in addition to suitable hardware, a program which can produce sounds according to specifications in .au format, and the user's browser must have settings which instruct it to launch that player program for .au files.
Don't use anchor texts like Click here. They look extremely stupid eg in a paper copy of a document. Warren Steel says in Hints for Web Authors:
You don't need to say "Click here for information on our graduate programs;" just insert the link into what you were saying: "Our excellent graduate programs ..." Links to large files or unusual formats should be so marked, perhaps in a parenthetical note. "Our stirring fight song (400k .au) ..."
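In markup, the advice quoted above could look roughly like this. The file names graduate.html and fight.au are invented for the purposes of the sketch.

```html
<!-- Fragment for use inside BODY; the file names are hypothetical. -->

<!-- Avoid: the link text says nothing on its own. -->
<P><A HREF="graduate.html">Click here</A> for information on our
graduate programs.</P>

<!-- Prefer: the link text describes the target. -->
<P>Our excellent <A HREF="graduate.html">graduate programs</A> ...</P>

<!-- A large file in an unusual format is marked as such. -->
<P>Our stirring <A HREF="fight.au">fight song</A> (400k .au) ...</P>
```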
You can make plain text and binary files of various formats available to other people alongside your HTML files, and you can describe them and provide links to them in your HTML documents. However, your server may not support the file format involved, so try to use some widely known format and the corresponding file name suffix; see also WDG Web Authoring FAQ, questions 5 and 6.
Of course, such links will be useful only to people who can use a program which processes the particular file format in a meaningful way. Processing might consist of displaying an image or animation, playing music, or doing some spreadsheet calculations, for example. This might take place within a browser or in a separate program launched automatically by a browser (when programmed to do so), or "offline" so that the Web browser is used just to retrieve the file and to save it into a local file, to be opened later by an application.
Example:
The budget proposal is available as a <A HREF="budget.zip">zipped Excel file</A>
People using computers on which Excel is available will then be able to view your document with it. It depends on the browser and its settings how smoothly this can take place. Of course they also need some program (eg WinZip) for unpacking a .zip file, but such software exists for almost all environments and should be installed anyway. The reason for my suggesting the use of zipped format is twofold:
It is a rather common error to omit quotes or the closing quote in an HREF attribute. Some browsers are permissive, others may get very confused, so that the link may not work at all.
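The fragment below contrasts faulty and correct ways of writing the attribute. It is meant as an illustration only; how a given browser reacts to the faulty forms varies.

```html
<!-- Fragment for use inside BODY. The faulty forms are kept inside
     comments so that the fragment itself stays valid. -->

<!-- Wrong, closing quote missing:
     <A HREF="budget.zip>zipped Excel file</A> -->
<!-- Wrong, no quotes although the value contains ":" and "/":
     <A HREF=http://www.hut.fi/english.html>HUT</A> -->

<!-- Right: the value is enclosed in a matching pair of quotes. -->
<P><A HREF="budget.zip">zipped Excel file</A></P>
<P><A HREF="http://www.hut.fi/english.html">HUT</A></P>
```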
You cannot nest A elements, but you can write a dual-purpose A element which has both an HREF and a NAME attribute, eg
<A NAME="foo" HREF="#bar">zap</A>
It is not obvious what exactly the entity named by an A NAME element is. The most natural interpretation seems to be that it is a part of the document, namely the part between the start and end tags. However, notice that only text elements are allowed within the contents and that most browsers seem to interpret things so that an A NAME element just names a location (a point) in the document, namely the location of the start tag, leaving the position of the end tag meaningless. (However, an end tag </A> is obligatory!)
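One way to name a section so that both interpretations give a sensible result is sketched below. The name "install" and the texts are hypothetical.

```html
<!-- Fragment for use inside BODY; the name "install" is hypothetical. -->
<P>See the <A HREF="#install">installation notes</A> below.</P>

<!-- The A element goes inside the heading, since A may contain
     only text elements, not block elements such as H2. -->
<H2><A NAME="install">Installation notes</A></H2>
<P>Text of the section.</P>
```

Since the start tag is at the very beginning of the heading text, a browser which treats the name as a point and a browser which treats it as a piece of text should both take the user to the heading.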
You can use a mailto: URL in the HREF attribute.
Example:
My E-mail address is <A HREF="mailto:Jukka.Korpela@hut.fi">Jukka.Korpela@hut.fi</A>.
(Please avoid constructs like
<A HREF="mailto:address">Mail me!</A>
which are useless eg when reading a paper copy of the document.)
Selecting such a link typically means that the browser invokes an E-mail composer, with the recipient field prefilled. It is not possible to prefill other fields in any reliable way. Use forms instead of simple mailto: links if you want to prefill something.
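A form with prefilled fields might look roughly like the following sketch. The ACTION URL refers to a hypothetical server-side script, and the field names are invented; what is actually available depends on your server.

```html
<!-- Fragment for use inside BODY. The ACTION URL names a hypothetical
     server-side script; the field names are assumptions too. -->
<FORM ACTION="http://www.example.org/cgi-bin/feedback" METHOD="POST">
<!-- VALUE gives the initial (prefilled) content of the field. -->
<P>Subject: <INPUT TYPE="TEXT" NAME="subject" SIZE="40"
VALUE="Comments on the HTML 3.2 document"></P>
<P>Your E-mail address: <INPUT TYPE="TEXT" NAME="from" SIZE="40"></P>
<!-- The content of TEXTAREA is the initial text of the message. -->
<P><TEXTAREA NAME="message" ROWS="8" COLS="60">
Dear author,
</TEXTAREA></P>
<P><INPUT TYPE="SUBMIT" VALUE="Send"></P>
</FORM>
```

The user can edit the prefilled subject and message before sending, and the script on the server decides what is done with the data.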
<ADDRESS>
<P>
Jukka.Korpela@hut.fi
</P>
</ADDRESS>
One idea is to provide just the author's name but so that it is a link to a home page containing more information. This is typically suitable for short documents to be viewed on the screen only.
<ADDRESS>
<P>
<A HREF="http://www.hut.fi/~jkorpela/">Jukka Korpela</A>
</P>
</ADDRESS>
A longer, more typical example:
<ADDRESS>
<P>
Jukka Korpela, M.S. (Math.)<BR>
Helsinki University of Technology Computing Centre<BR>
FIN-02150 Espoo<BR>
Finland
</P><P>
Telephone International +358 9 451 4319
</P><P>
Electronic mail (Internet):
<A HREF="mailto:Jukka.Korpela@hut.fi">Jukka.Korpela@hut.fi</A><BR>
WWW home page:
<A HREF="http://www.hut.fi/%7Ejkorpela/">http://www.hut.fi/%7Ejkorpela/</A>
</P>
</ADDRESS>