Tutorial

NITF is XML

NITF is an XML-conforming vocabulary. This means that NITF uses the constructs standardized by XML to describe elements of content within a document, and the descriptive attributes of that content.

For example, if a publisher wants to use NITF to distinguish a company name from the surrounding text, the <org> element would be used:

Today, <org>Microsoft</org> announced the release of....

If the publisher wants to embed Microsoft's NASD stock symbol, the value and idsrc attributes of the <org> element would be used:

Today, <org value="MSFT" idsrc="NASD">Microsoft</org>
announced the release of....
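
As a rough illustration of how a receiving system might read those attributes, the sketch below parses the fragment with Python's standard xml.etree.ElementTree module. The wrapping <p> element and the helper name org_codes are assumptions made for this example only; they are not part of the fragment above.

```python
import xml.etree.ElementTree as ET

# The fragment from the tutorial, wrapped in a <p> element (an assumption)
# so that it forms a well-formed XML document with a single root.
FRAGMENT = (
    '<p>Today, <org value="MSFT" idsrc="NASD">Microsoft</org> '
    'announced the release of....</p>'
)


def org_codes(xml_text):
    """Return (name, value, idsrc) for every <org> element in xml_text."""
    root = ET.fromstring(xml_text)
    return [
        (org.text, org.get("value"), org.get("idsrc"))
        for org in root.iter("org")
    ]


print(org_codes(FRAGMENT))
```

The displayed name stays in the element's text, while the code and the naming authority travel in the attributes, so a reader of the text and an indexing program can each take what they need.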

For more information on how XML works, visit our page listing XML resources.

This tutorial covers the most widely used sections of NITF. For details about each element, consult the NITF documentation page. Explanatory comments in the examples appear inside XML <!-- comment markers -->.


Basic Structure of NITF

NITF is divided into two sections, the <head> and the <body>.

<nitf>
	<head>
		<!--
		Metadata about the document as a whole
		goes here.
		-->
	</head>

	<body>
		<!--
		Contents for direct display to the user
		go here.
		-->
	</body>
</nitf>

In this respect, NITF is just like HTML, but that is where the resemblance ends. Web authors use HTML to describe the display of their pages. NITF, on the other hand, is designed to describe the substance of news articles.


NITF <body>

The NITF <body> element is itself divided into three sections:

<body>
	<body.head>
		<!--
		This section holds core news components,
		such as headline and byline, that are commonly
		displayed before the text of an article.
		-->
	</body.head>

	<body.content>
		<!--
		This section holds the article,
		generally consisting of paragraphs of text,
		but perhaps with embedded tables, lists,
		photos, and other items. These can also be
		referenced by specifying a location on the
		Internet or another computer.
		-->
	</body.content>

	<body.end>
		<!--
		This section holds core news components
		that are commonly displayed at the end of
		an article.
		-->
	</body.end>
</body>

Here is an example of a simple NITF article:

<nitf>
	<head>
	</head>
	<body>
		<body.head>
			<hedline>
				<hl1>This is the main headline</hl1>
			</hedline>

			<byline>
				By Joseph Q. Reporter
			</byline>
		</body.head>

		<body.content>
			<p>This is the content of the first
			paragraph of the article.</p>

			<p>This is the content of the second
			paragraph of the article.</p>
		</body.content>
	</body>
</nitf>
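
One possible way to pull the headline, byline, and paragraphs out of such an article is sketched below, again with Python's standard xml.etree.ElementTree module. The function names flatten and summarize are hypothetical, and the sketch assumes the article has exactly the structure shown above.

```python
import xml.etree.ElementTree as ET

# The simple article from the tutorial, reproduced as a string.
ARTICLE = """\
<nitf>
  <head></head>
  <body>
    <body.head>
      <hedline><hl1>This is the main headline</hl1></hedline>
      <byline>By Joseph Q. Reporter</byline>
    </body.head>
    <body.content>
      <p>This is the content of the first
      paragraph of the article.</p>
      <p>This is the content of the second
      paragraph of the article.</p>
    </body.content>
  </body>
</nitf>
"""


def flatten(element):
    """Join all text inside element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def summarize(xml_text):
    """Return the headline, byline, and paragraph texts of an article."""
    root = ET.fromstring(xml_text)
    return {
        "headline": flatten(root.find("body/body.head/hedline/hl1")),
        "byline": flatten(root.find("body/body.head/byline")),
        "paragraphs": [
            flatten(p) for p in root.findall("body/body.content/p")
        ],
    }


print(summarize(ARTICLE))
```

A production reader would need to handle articles in which the headline or byline is absent; this sketch does not.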


Containers within <body.content>

Within <body.content>, NITF allows several other types of text container at the same level as the paragraph. These include:

  • <table>
  • <list>
  • <block> (groups related content together, for example into a sidebar)
  • <hl2> (a sub-headline)
  • <media> (a photo or video clip, an associated caption, etc.)


Enriched Text

NITF contains many elements that can distinguish content appearing within headlines, bylines, tables, lists, and paragraphs. These "enriched-text" elements allow the publisher to index and highlight documents better, and add hyperlinks to richer sources of related and archival information.

Attributes of these enriched-text elements allow publishers to store descriptive codes inline with the text. These attributes can provide consistency and dependability across authors, writing styles, and languages.

Enriched-text elements provide facilities for marking up:

  • people
  • places
  • companies and organizations
  • values (such as numbers, money, and dates)
  • titles and functions
  • emphasized words
  • hyperlinks

Here is a more enriched and expanded example of an NITF article:

<nitf>
	<head>
	</head>
	<body>
		<body.head>
			<hedline>
				<hl1>This is the main headline</hl1>
				<hl2>This is a sub-headline</hl2>
			</hedline>
			<byline>
				By <person value="JQR-412"
				idsrc="newspub-corp">
				Joseph Q. Reporter</person>
			</byline>
		</body.head>

		<body.content>
			<p>Today, <org value="MSFT"
			idsrc="NASD">Microsoft</org>
			announced the release of....</p>
			<p>That company is <em>so</em> big
			that they....</p>
			<media media-type="image">
				<media-reference
					mime-type="image/jpeg"
					source="gates.jpg"
					alternate-text="Bill
					Gates at the podium."
					>
				</media-reference>
				<media-caption>
					Gates makes speech.
				</media-caption>
			</media>
		</body.content>
	</body>
</nitf>
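
To suggest how the enriched-text markup supports indexing, the sketch below walks a shortened copy of the article body and collects one record per <person> and <org> element. The function name index_entities and the record layout are assumptions chosen for the example; only the two element types used above are handled.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the enriched article body shown above.
BODY = """\
<body>
  <body.head>
    <byline>By <person value="JQR-412" idsrc="newspub-corp">
    Joseph Q. Reporter</person></byline>
  </body.head>
  <body.content>
    <p>Today, <org value="MSFT" idsrc="NASD">Microsoft</org>
    announced the release of....</p>
    <p>That company is <em>so</em> big that they....</p>
  </body.content>
</body>
"""

# Enriched-text elements to index; only the two used in the example.
ENTITY_TAGS = {"person", "org"}


def index_entities(xml_text):
    """Return one record per enriched-text element, in document order."""
    root = ET.fromstring(xml_text)
    records = []
    for element in root.iter():
        if element.tag in ENTITY_TAGS:
            records.append({
                "type": element.tag,
                # Collapse the line breaks that appear inside the markup.
                "text": " ".join("".join(element.itertext()).split()),
                "value": element.get("value"),
                "idsrc": element.get("idsrc"),
            })
    return records


for record in index_entities(BODY):
    print(record)
```

Because the codes sit in attributes, the index does not depend on how each author happens to spell a name in running text.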

More details on what is allowed in the NITF <body> element can be learned from our documentation and examples.


NITF <head>

The <head> element contains metadata that describes the article as a whole. It is divided into five main sections:

<head>
	<title>
		<!--
		A short, plain-text title of the document.
		Often used for display in a listing of
		search-results.
		-->
	</title>

	<tobject>
		<!--
		Codes for the type of article, and the
		subjects covered by the article.
		-->
	</tobject>

	<docdata>
		<!--
		Contains metadata about this document in
		particular. Includes publication date, an
		urgency rating, the column name (if it's
		a regular feature), series name
		(if it's part of a series of articles),
		and information on copyright ownership,
		distribution rights, and basic
		news management features for
		updating and cancelling previously-
		published documents.
		-->
	</docdata>

	<pubdata/>
		<!--
		Information on where and how this article
		was published.
		-->

	<revision-history>
		<!--
		Specifies who revised the document, and why.
		-->
	</revision-history>
</head>


NITF <title>

The <title> element holds a plain-text string of characters that sums up, usually in one line, what the article covers. Most often, the <title> element is just the headline of the article, with any line breaks or enriched-text elements stripped.

Many NITF processing systems display a list of <title> elements when users search an NITF archive. Hence, it is important that this element be included.

Here is an example:

<title>
	President's State of the Union Address Makes Big Splash
</title>
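
Deriving a title from a headline, as described above, can be sketched in a few lines of Python. The helper name title_from_headline and the sample headline are hypothetical; the sketch simply drops nested markup and collapses line breaks.

```python
import xml.etree.ElementTree as ET


def title_from_headline(hl1_xml):
    """Strip enriched-text markup and line breaks from an <hl1> element."""
    hl1 = ET.fromstring(hl1_xml)
    # itertext() yields the text of the element and all of its descendants,
    # so nested tags such as <org> vanish while their text is kept.
    return " ".join("".join(hl1.itertext()).split())


# A made-up headline containing an enriched-text element and a line break.
HEADLINE = """\
<hl1><org value="MSFT" idsrc="NASD">Microsoft</org>
Announces New Release</hl1>
"""

print(title_from_headline(HEADLINE))
```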


NITF <tobject>

The <tobject> section (which stands for "topic object") contains elements that distinguish feature news stories from, say, news analyses or obituaries. It also holds elements that describe what subject the article is about, via a three-level Subject Codes taxonomy.

The IPTC has issued controlled Subject Code vocabularies that it strongly recommends the publisher use when including <tobject> elements. These vocabularies are available in several languages, and the IPTC is open to requests for additions to the list.

Here is an example:

<tobject>
	<tobject.property
		tobject.property.type="Wrapup"
		/>

	<!--
	Attributes of tobject.subject:
		tobject.subject.refnum	required
		tobject.subject.code	optional three-letter code for Level 1
		tobject.subject.type	optional display-name for Level 1
		tobject.subject.matter	optional display-name for Level 2
		tobject.subject.detail	optional display-name for Level 3
	-->
	<tobject.subject
		tobject.subject.refnum="4008006"
		tobject.subject.code="FIN"
		tobject.subject.type="Economy, Business &amp; Finance"
		tobject.subject.matter="Macro Economics"
		tobject.subject.detail="Foreign Exchange Markets"
		/>
</tobject>
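
A consumer might read the article type and the subject levels along the following lines. This is a sketch under stated assumptions: the names subject_path and describe are hypothetical, and the XML string is a compact copy of the example in which the attribute values carry no inline annotations and the ampersand is escaped as &amp;, as XML requires.

```python
import xml.etree.ElementTree as ET

# Compact, well-formed copy of the <tobject> example.
TOBJECT = """\
<tobject>
  <tobject.property tobject.property.type="Wrapup"/>
  <tobject.subject
    tobject.subject.refnum="4008006"
    tobject.subject.code="FIN"
    tobject.subject.type="Economy, Business &amp; Finance"
    tobject.subject.matter="Macro Economics"
    tobject.subject.detail="Foreign Exchange Markets"/>
</tobject>
"""

PREFIX = "tobject.subject."


def subject_path(subject):
    """Return the display names for Levels 1 to 3 that are present."""
    names = ("type", "matter", "detail")
    levels = (subject.get(PREFIX + name) for name in names)
    return [level for level in levels if level]


def describe(xml_text):
    """Return the article type and a (refnum, levels) pair per subject."""
    root = ET.fromstring(xml_text)
    prop = root.find("tobject.property")
    return {
        "article_type": prop.get("tobject.property.type"),
        "subjects": [
            (s.get(PREFIX + "refnum"), subject_path(s))
            for s in root.findall("tobject.subject")
        ],
    }


print(describe(TOBJECT))
```

Since only the refnum is required, subject_path tolerates subjects that omit some or all of the display names.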


NITF <docdata>

The <docdata> element (which stands for "document data") holds elements that identify, date, and register the article. With NITF 3.1, three news-management attributes were added to the docdata element. It breaks out as follows:

<docdata
	management-status="embargoed"
	management-doc-idref="4000.21.a"
	management-idref-status="canceled"
	>
	<!--
	management-status can have one of the following values:
	
		usable		

		Content may be published without restriction.
		If a management-doc-idref attribute is also
		included, then this message indicates that
		the referenced document is now usable.
				
		embargoed	

		May not be published until either docdata's
		date.release timestamp is passed (see below),
		or until a message with management-status=
		"usable" is published, pointing to this 
		document.
				
		withheld	

		Indicates that the current document may not 
		be published until further notice. 
		If a management-doc-idref attribute is also
		included, then this message indicates that
		the referenced document is no longer usable.
				
		canceled	

		Publisher must take immediate action to withdraw 
		or retract the document referenced in 
		management-doc-idref.

	management-doc-idref refers to the docdata's doc-id of a 
		previously published document.
		
	management-idref-status indicates what should be the new 
		status of the aforementioned document referenced 
		in management-doc-idref
	-->
	<correction
		info="fixes typo in paragraph 3"
		id-string="4000.21.a"
		/>
		<!--
		Indicates this article was issued as a
		correction, and which document it corrected.
		-->

	<doc-id
		id-string="4000.21.b"
		/>
		<!--
		ID of this document.
		-->

	<del-list>
		<!--
		Indicates who has delivered / distributed
		this article.
		-->
		<from-src src-name="ScreamingMedia"/>
	</del-list>

	<urgency
		ed-urg="1"
		/>
		<!--
		Highest.
		-->

	<fixture
		fix-id="Investigative Business Feature"
		/>
		<!--
		The name given to a regular feature.
		-->

	<date.issue
		norm="20000522T090500-0500"
		/>
		<!--
		Date the item was published.
		
		The IPTC recommends using one of these ISO date formats:
			YYYYMMDDTHHMMSS±HHMM (preferred) 
			YYYYMMDDTHHMMSSZ (alternative)
		When no time is available, this ISO date format
			is recommended:
			YYYYMMDD000000±HHMM
		-->

	<date.release
		norm="20000523T090500-0500"
		/>
		<!--
		Works in correspondence with docdata's 
		management-status="embargoed" 
		attribute. Earliest date at
		which the item may be published.
		-->

	<du-key
		generation="3"
		part="2"
		version="12"
		key="microsoft-trial"
		/>
		<!--
		Provides a mechanism for grouping related
		stories.
		-->

	<doc.copyright
		year="2000"
		holder="The Toronto Globe &amp; Mail"
		/>
		<!--
		Who owns the copyright to this document.
		-->

	<key-list>
		<!--
		Holds a set of keywords relevant to the article.
		-->
		<keyword key="software"/>
		<keyword key="antitrust"/>
	</key-list>
</docdata>
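
The interplay between management-status and date.release can be sketched as a small check. The function names parse_norm and may_publish are hypothetical, and the sketch covers only the timestamp rule: it does not track a later message that marks an embargoed document as usable.

```python
from datetime import datetime, timezone

# strptime pattern for the preferred format YYYYMMDDTHHMMSS±HHMM.
NORM_FORMAT = "%Y%m%dT%H%M%S%z"


def parse_norm(norm):
    """Parse a value such as 20000523T090500-0500 into an aware datetime."""
    return datetime.strptime(norm, NORM_FORMAT)


def may_publish(management_status, release_norm, now):
    """Apply the embargo rule: publishable once date.release has passed."""
    if management_status == "usable":
        return True
    if management_status == "embargoed" and release_norm:
        return now >= parse_norm(release_norm)
    # withheld, canceled, or embargoed without a release date
    return False


release = "20000523T090500-0500"
print(may_publish("embargoed", release, datetime.now(timezone.utc)))
```

Comparing timezone-aware datetimes keeps the check correct regardless of the offset in which the release time was written. On Python 3.7 and later, the %z directive also accepts the Z suffix used by the alternative format.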


NITF <pubdata>

The <pubdata> element (which stands for "publication data") has attributes that describe where, when, and how the article was published. It breaks out as follows:

<pubdata
	type="print"
	name="The Toronto Globe and Mail"
	date.publication="20000523T000000-0500"
	/>


NITF <revision-history>

The <revision-history> element records who revised the document, when, and why. It breaks out as follows:

<revision-history
	name="Pat Q. Editor"
	function="editor"
	norm="20000520T084300-0500"
	comment="fixed rampant typos"
	/>


Summary

You are invited to review our documentation and sample articles. Send comments to the

© 2007 IPTC, International Press Telecommunications Council