Tutorial
NITF is XML
NITF is an XML-conforming vocabulary. This means that NITF uses the constructs standardized by XML to describe elements of content within a document, and the descriptive attributes of that content.
For example, if a publisher wants to use NITF to distinguish a company name from the surrounding text, the <org> element would be used:
Today, <org>Microsoft</org> announced the release of....
If the publisher wants to embed Microsoft's NASD stock symbol, the value and idsrc attributes of the <org> element would be used:
Today, <org value="MSFT" idsrc="NASD">Microsoft</org>
announced the release of....
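As a rough illustration of how a program might read this markup, the sketch below uses Python's standard xml.etree.ElementTree module. The wrapping <p> element and the helper name find_orgs are assumptions made for this example; they are not part of the fragment above.

```python
import xml.etree.ElementTree as ET

# The fragment from above, wrapped in a <p> so that it parses as one element.
FRAGMENT = (
    '<p>Today, <org value="MSFT" idsrc="NASD">Microsoft</org> '
    'announced the release of....</p>'
)


def find_orgs(xml_text):
    """Return (name, value, idsrc) for every <org> element in xml_text."""
    root = ET.fromstring(xml_text)
    return [
        (org.text, org.get("value"), org.get("idsrc"))
        for org in root.iter("org")
    ]


print(find_orgs(FRAGMENT))
```

An <org> element without the optional attributes would simply yield None in the second and third positions.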
For more information on how XML works, visit our page listing XML resources.
This tutorial covers the most widely used sections of NITF. For details about each element, consult the NITF documentation page. Within this tutorial, elements and attributes are displayed in purple. Comments are included within the XML examples.
Basic Structure of NITF
An NITF document is divided into two sections, the <head> and the <body>:
<nitf>
<head>
</head>
<body>
</body>
</nitf>
In this respect, NITF is just like HTML, but that is where the resemblance ends. Web authors use HTML to describe how their pages are displayed. NITF, on the other hand, is designed to describe the substance of news articles.
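The two-part skeleton can also be checked mechanically. The following is a minimal sketch using Python's standard library; the function name top_level_sections is hypothetical, and a real system would more likely validate against the NITF DTD.

```python
import xml.etree.ElementTree as ET

SKELETON = "<nitf><head></head><body></body></nitf>"


def top_level_sections(xml_text):
    """Return the tag names of the direct children of the <nitf> root."""
    root = ET.fromstring(xml_text)
    if root.tag != "nitf":
        raise ValueError("root element is not <nitf>")
    return [child.tag for child in root]


print(top_level_sections(SKELETON))
```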
NITF <body>
The NITF <body> element is itself divided into three sections:
<body>
<body.head>
</body.head>
<body.content>
</body.content>
<body.end>
</body.end>
</body>
Here is an example of a simple NITF article:
<nitf>
<head>
</head>
<body>
<body.head>
<hedline>
<hl1>This is the main headline</hl1>
</hedline>
<byline>
By Joseph Q. Reporter
</byline>
</body.head>
<body.content>
<p>This is the content of the first
paragraph of the article.</p>
<p>This is the content of the second
paragraph of the article.</p>
</body.content>
</body>
</nitf>
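To show how the pieces of this simple article might be pulled apart by software, here is a small sketch in Python. The helper names flat_text and summarize are assumptions made for illustration, and collapsing whitespace is a choice made here, not something NITF requires.

```python
import xml.etree.ElementTree as ET

ARTICLE = """
<nitf>
<head>
</head>
<body>
<body.head>
<hedline>
<hl1>This is the main headline</hl1>
</hedline>
<byline>
By Joseph Q. Reporter
</byline>
</body.head>
<body.content>
<p>This is the content of the first
paragraph of the article.</p>
<p>This is the content of the second
paragraph of the article.</p>
</body.content>
</body>
</nitf>
"""


def flat_text(element):
    """Join all text inside element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def summarize(xml_text):
    """Return the headline, byline, and paragraphs of an NITF article."""
    root = ET.fromstring(xml_text.strip())
    return {
        "headline": flat_text(root.find("body/body.head/hedline/hl1")),
        "byline": flat_text(root.find("body/body.head/byline")),
        "paragraphs": [
            flat_text(p) for p in root.findall("body/body.content/p")
        ],
    }


print(summarize(ARTICLE))
```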
Containers within <body.content>
Within body.content, NITF allows for several other types of text container at the same level as the paragraph. These include:
- <table>
- <list>
- <block> (groups related content together, for example into a sidebar)
- <hl2> (a sub-headline)
- <media> (a photo or video clip, an associated caption, etc.)
Enriched Text
NITF contains many elements that can distinguish content appearing within headlines, bylines, tables, lists, and paragraphs. These "enriched-text" elements allow the publisher to index and highlight documents better, and to add hyperlinks to richer sources of related and archival information.
Attributes of these enriched-text elements allow publishers to store descriptive codes inline with the text. These codes can provide consistency and dependability across authors, writing styles, and languages.
Enriched-text elements provide facilities for marking up:
- people
- places
- companies and organizations
- values (such as numbers, money, and dates)
- titles and functions
- emphasized words
- hyperlinks
Here is a more enriched and expanded example of an NITF article:
<nitf>
<head>
</head>
<body>
<body.head>
<hedline>
<hl1>This is the main headline</hl1>
<hl2>This is a sub-headline</hl2>
</hedline>
<byline>
By <person value="JQR-412"
idsrc="newspub-corp">
Joseph Q. Reporter</person>
</byline>
</body.head>
<body.content>
<p>Today, <org value="MSFT"
idsrc="NASD">Microsoft</org>
announced the release of....</p>
<p>That company is <em>so</em> big
that they....</p>
<media media-type="image">
<media-reference
mime-type="image/jpeg"
source="gates.jpg"
alternate-text="Bill
Gates at the podium."
>
</media-reference>
<media-caption>
Gates makes speech.
</media-caption>
</media>
</body.content>
</body>
</nitf>
More details on what is allowed in the NITF <body> element can be learned from our documentation and examples.
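The indexing benefit of enriched text can be sketched in a few lines of Python. The example below works on a trimmed copy of the <body> from the enriched article; the function names index_entities and list_media, and the choice to index only <person> and <org>, are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <body> from the enriched example.
BODY = """
<body>
<body.head>
<byline>
By <person value="JQR-412" idsrc="newspub-corp">
Joseph Q. Reporter</person>
</byline>
</body.head>
<body.content>
<p>Today, <org value="MSFT" idsrc="NASD">Microsoft</org>
announced the release of....</p>
<media media-type="image">
<media-reference mime-type="image/jpeg" source="gates.jpg"
alternate-text="Bill Gates at the podium."></media-reference>
<media-caption>
Gates makes speech.
</media-caption>
</media>
</body.content>
</body>
"""

ENTITY_TAGS = ("person", "org")


def flat_text(element):
    """Join all text inside element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def index_entities(xml_text):
    """Return (tag, name, value, idsrc) for each person or org, in order."""
    root = ET.fromstring(xml_text.strip())
    return [
        (el.tag, flat_text(el), el.get("value"), el.get("idsrc"))
        for el in root.iter()
        if el.tag in ENTITY_TAGS
    ]


def list_media(xml_text):
    """Return (media-type, source, caption) for each <media> element."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for media in root.iter("media"):
        ref = media.find("media-reference")
        caption = media.find("media-caption")
        items.append((
            media.get("media-type"),
            ref.get("source") if ref is not None else None,
            flat_text(caption) if caption is not None else None,
        ))
    return items


print(index_entities(BODY))
print(list_media(BODY))
```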
NITF <head>
The <head> element contains metadata that describes the article as a whole. It is divided into five main sections:
<head>
<title>
</title>
<tobject>
</tobject>
<docdata>
</docdata>
<pubdata/>
<revision-history>
</revision-history>
</head>
NITF <title>
The <title> element holds a plain-text string of characters that sums up, usually in one line, what the article covers. Most often, the <title> element is just the headline of the article, with any line breaks or enriched-text elements stripped.
Many NITF processing systems display a list of <title> elements when users search an NITF archive. Hence, it is important that this element be included.
Here is an example:
<title>
President's State of the Union Address Makes Big Splash
</title>
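Stripping line breaks and enriched-text elements from a headline to produce a <title> string might look like the following Python sketch. The headline used here is hypothetical (it adds an <em> element and a line break to the title above), and title_from_headline is an illustrative name, not part of any NITF tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical headline with enriched text and a line break.
HEADLINE = """<hl1>President's <em>State of the Union</em>
Address Makes Big Splash</hl1>"""


def title_from_headline(hl1_xml):
    """Return the headline's text with markup removed and whitespace collapsed."""
    hl1 = ET.fromstring(hl1_xml)
    return " ".join("".join(hl1.itertext()).split())


print(title_from_headline(HEADLINE))
```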
NITF <tobject>
The <tobject> section (which stands for "topic object") contains elements that distinguish feature news stories from, say, news analyses or obituaries. It also holds elements that describe the article's subject through a three-level Subject Codes taxonomy.
The IPTC has issued controlled Subject Code vocabularies that it strongly recommends the publisher use when including <tobject> elements. These vocabularies are available in several languages, and the IPTC is open to requests for additions to the list.
Here is an example:
<tobject>
<tobject.property
tobject.property.type="Wrapup"
/>
<!-- refnum is required; the other attributes are optional:
     code is a three-letter code for Level 1;
     type, matter, and detail are display names for Levels 1, 2, and 3 -->
<tobject.subject
tobject.subject.refnum="4008006"
tobject.subject.code="FIN"
tobject.subject.type="Economy, Business &amp; Finance"
tobject.subject.matter="Macro Economics"
tobject.subject.detail="Foreign Exchange Markets"
/>
</tobject>
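A program that reads the subject taxonomy could be sketched as follows in Python. The sample is the <tobject> example written as well-formed XML, with the ampersand escaped as &amp;amp; and no inline annotations. The function name subject_levels is an assumption, and the check for a missing refnum reflects the "required" note in the example rather than a DTD validation.

```python
import xml.etree.ElementTree as ET

TOBJECT = """
<tobject>
<tobject.property tobject.property.type="Wrapup"/>
<tobject.subject
 tobject.subject.refnum="4008006"
 tobject.subject.code="FIN"
 tobject.subject.type="Economy, Business &amp; Finance"
 tobject.subject.matter="Macro Economics"
 tobject.subject.detail="Foreign Exchange Markets"/>
</tobject>
"""

# Attribute suffixes for the Level 1, 2, and 3 display names.
LEVEL_SUFFIXES = ("type", "matter", "detail")


def subject_levels(xml_text):
    """Return (refnum, [display names present]) for each tobject.subject."""
    root = ET.fromstring(xml_text.strip())
    subjects = []
    for subject in root.iter("tobject.subject"):
        refnum = subject.get("tobject.subject.refnum")
        if refnum is None:
            raise ValueError("tobject.subject.refnum is required")
        names = [
            subject.get("tobject.subject." + suffix)
            for suffix in LEVEL_SUFFIXES
        ]
        subjects.append((refnum, [n for n in names if n is not None]))
    return subjects


print(subject_levels(TOBJECT))
```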
NITF <docdata>
The <docdata> element (which stands for "document data") holds elements that identify, date, and register the article. NITF 3.1 added three news-management attributes to the <docdata> element itself. The element breaks out as follows:
<docdata
management-status="embargoed"
management-doc-idref="4000.21.a"
management-idref-status="canceled"
>
<correction
info="fixes typo in paragraph 3"
id-string="4000.21.a"
/>
<doc-id
id-string="4000.21.b"
/>
<del-list>
<from-src src-name="ScreamingMedia"/>
</del-list>
<urgency
ed-urg="1"
/>
<fixture
fix-id="Investigative Business Feature"
/>
<date.issue
norm="20000522T090500-0500"
/>
<date.release
norm="20000523T090500-0500"
/>
<du-key
generation="3"
part="2"
version="12"
key="microsoft-trial"
/>
<doc.copyright
year="2000"
holder="The Toronto Globe &amp; Mail"
/>
<key-list>
<keyword key="software"/>
<keyword key="antitrust"/>
</key-list>
</docdata>
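The norm attributes in this example can be turned into time-zone-aware timestamps. The Python sketch below uses a trimmed copy of the <docdata> example and assumes, purely for illustration, that an application treats date.release as the earliest moment an article may be published; the tutorial itself does not define that rule. The names parse_norm, release_time, and is_releasable are hypothetical.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Trimmed copy of the <docdata> example.
DOCDATA = """
<docdata management-status="embargoed">
<doc-id id-string="4000.21.b"/>
<date.issue norm="20000522T090500-0500"/>
<date.release norm="20000523T090500-0500"/>
</docdata>
"""

# Matches values such as 20000522T090500-0500 (date, time with seconds, offset).
NORM_FORMAT = "%Y%m%dT%H%M%S%z"


def parse_norm(value):
    """Parse a norm value into a time-zone-aware datetime."""
    return datetime.strptime(value, NORM_FORMAT)


def release_time(xml_text):
    """Return the date.release timestamp of a <docdata> element."""
    root = ET.fromstring(xml_text.strip())
    return parse_norm(root.find("date.release").get("norm"))


def is_releasable(xml_text, now):
    """Assumed rule: releasable once `now` has reached date.release."""
    return now >= release_time(xml_text)


print(release_time(DOCDATA).isoformat())
```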
NITF <pubdata>
The <pubdata> element (which stands for "publication data") has attributes that describe where and when the article was published. It breaks out as follows:
<pubdata
type="print"
name="The Toronto Globe and Mail"
date.publication="20000523T0000-0500"
/>
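Note that date.publication here omits the seconds, whereas the norm values in the <docdata> example include them. A reader that accepts both shapes might look like the Python sketch below. The pattern and the name parse_timestamp are assumptions based only on the sample values in this tutorial, not on the NITF specification.

```python
from datetime import datetime
import re
import xml.etree.ElementTree as ET

PUBDATA = (
    '<pubdata type="print" name="The Toronto Globe and Mail" '
    'date.publication="20000523T0000-0500"/>'
)

# Date, hours and minutes, optional seconds, then a numeric UTC offset.
TIMESTAMP = re.compile(r"(\d{8})T(\d{4})(\d{2})?([+-]\d{4})")


def parse_timestamp(value):
    """Parse a timestamp with or without seconds into an aware datetime."""
    match = TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError("unrecognized timestamp: " + value)
    date, hours_minutes, seconds, offset = match.groups()
    # Pad missing seconds so that a single strptime format covers both shapes.
    padded = date + hours_minutes + (seconds or "00") + offset
    return datetime.strptime(padded, "%Y%m%d%H%M%S%z")


def publication_date(xml_text):
    """Return the date.publication timestamp of a <pubdata> element."""
    element = ET.fromstring(xml_text)
    return parse_timestamp(element.get("date.publication"))


print(publication_date(PUBDATA).isoformat())
```

The explicit pattern is used instead of trying several strptime formats in turn, because strptime accepts one-digit minutes and seconds and could silently misread a four-digit time as hours, minutes, and seconds.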
NITF <revision-history>
The <revision-history> element provides a creative history of the document. It breaks out as follows:
<revision-history
name="Pat Q. Editor"
function="editor"
norm="20000520T0843-0500"
comment="fixed rampant typos"
/>
Summary
You are invited to review our documentation and sample articles.