Syntax for format strings in field specifications

This topic describes how to type format strings in field specifications.

Type the format string as accurately as possible. See the Important notes below regarding predefined sets, special occurrences, and examples of complex expressions.

Symbol

Represents

Description

{ }

Ranges

(The default setting is for a character/number to occur exactly once. Using { } creates a custom range.)

Examples: 0{5}, 0{1-5}

Meaning: A range with an exact number of occurrences {n} or a varied but limited number {n-m}

Possible matches: 00000 and 0, 00, 000, 0000, 00000 respectively

A

Uppercase alphabetic characters

(Predefined set)

Example: A{2-5}

Meaning: An alphabetic field containing 2 to 5 uppercase characters

Possible matches: AB, XYZ, ABDDE, etc.

a

Lowercase alphabetic characters

(Predefined set)

Example: a{1-3}

Meaning: An alphabetic field containing 1 to 3 lowercase characters

Possible matches: a, ab, abc, etc.

N or n

Numeric characters

(Predefined set)

Example: N{7}

Meaning: A numeric field containing any 7 digits

Possible matches: 1234567, 7123456, 3334447, etc.

O or o

Special characters corresponding to the language specification. Such characters could be ?, :, &, etc.

(Predefined set)

Example: O N{2}

Meaning: A field containing 1 special character followed by any 2 numeric characters

Possible matches: $10, #99, %75, etc.

X or x

Alphanumeric characters and special characters such as #, >, etc.

(Predefined set)

Example: X{5}

Meaning: A field containing five characters, each of which can be a lower or uppercase letter, a number, or a special character

Possible matches: ABc3D, #bD61, $1500, etc.

.

Any character except <space>

Example: .

Meaning: A period represents any single character except <space>

Possible matches: a, B, 3, $, !, etc.

\

Explicit characters

Examples: \a, \8, \?

Meaning: A backslash followed by any single character matches that exact character

Matches: a, 8 and ? respectively

' '

Explicit strings

Examples: 'The Total', 'date', ' '

Meaning: A string enclosed within single quotation marks matches that exact string (case sensitive)

Matches: The Total, date and <space> respectively

-

Specific characters to remove from the beginning or end of a string

Example: N{4} -'#'?

Meaning: The # character, if any, is removed when this four-digit field proceeds to output.

Field value example: 1234#

Output: 1234

CaseIns

Case insensitive strings

Example: CaseIns'the needles'

Meaning: The "CaseIns" keyword, followed by a string enclosed with single quotation marks matches that exact string but is not case sensitive

Possible matches: The needles, THE Needles, tHe NEEdles, etc.

[ ]

A specified set or range

Examples: [139], [1-9]

Meaning: A field containing any single character from the set or range enclosed in brackets.

Possible matches: 1, 3, 9, and 1, 2, 3, 4, 5, 6, 7, 8, 9 respectively

^

Complements to a set or range

Example: [^123]

Meaning: A field containing any single character not found in the set or range enclosed in brackets.

Possible matches: 4, 5, 6, 7, ?, #, a, A, b, B, etc.

*

A range of "zero to many" for the character or string specified

Examples: 'blah'*, 'Ch' \e* 'se'

Meaning: A field containing, in these examples, 0 to many of the string "blah" and 0 to many of the single character "e".

Possible matches: <empty>, blah, blahblah, blahblahblah and Chse, Chese, Cheese, Cheeese, etc.

?

Zero or one occurrences of the character or string specified

Example: 'cat' \s?

Meaning: A field containing one or no "s" after the specified string.

Possible matches: cat, cats

|

The expression "or", separating two elements or groups where one or the other is allowed

Example: (N{5} | N{10})

Meaning: The "or" operator is used to allow a field containing either five or ten digits

Possible matches: 00001, 12345, 0123456789, 1010101010, etc.

( )

A group of expressions

Example: ('one' | 'two') 'three'

Meaning: A field containing the character strings "one" or "two", and then "three"

Possible matches: onethree, twothree
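The symbols above resemble regular expression syntax. As a rough illustration only, the following Python sketch translates a subset of the format-string syntax into a Python regular expression. It is not how the product evaluates format strings. The function names `to_regex` and `matches` are hypothetical, and the character classes chosen for A, a, and N are an assumption (plain English letters and digits), because the real predefined sets depend on the language specification. The O and X sets and the - removal operator are deliberately left out.

```python
import re

# Assumed character classes for the predefined sets. The real sets are
# determined by the language specification, so treat these as placeholders.
PREDEFINED = {"A": "[A-Z]", "a": "[a-z]", "N": "[0-9]", "n": "[0-9]"}


def to_regex(fmt: str) -> str:
    """Translate a subset of the format-string syntax into a Python regex."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == " ":                        # space separates elements
            i += 1
        elif fmt.startswith("CaseIns'", i):  # case-insensitive string
            end = fmt.index("'", i + 8)
            out.append("(?i:" + re.escape(fmt[i + 8:end]) + ")")
            i = end + 1
        elif ch == "'":                      # explicit, case-sensitive string
            end = fmt.index("'", i + 1)
            out.append("(?:" + re.escape(fmt[i + 1:end]) + ")")
            i = end + 1
        elif ch == "\\":                     # explicit single character
            out.append(re.escape(fmt[i + 1]))
            i += 2
        elif ch == "[":                      # set, range, or complement
            end = fmt.index("]", i)
            out.append(fmt[i:end + 1])
            i = end + 1
        elif ch == "{":                      # {n} or {n-m} occurrences
            end = fmt.index("}", i)
            out.append("{" + fmt[i + 1:end].replace("-", ",") + "}")
            i = end + 1
        elif ch == "(":                      # group (non-capturing)
            out.append("(?:")
            i += 1
        elif ch in ")|*?":                   # same meaning in a regex
            out.append(ch)
            i += 1
        elif ch == ".":                      # any character except space
            out.append("[^ ]")
            i += 1
        elif ch in PREDEFINED:
            out.append(PREDEFINED[ch])
            i += 1
        else:
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    return "".join(out)


def matches(fmt: str, value: str) -> bool:
    """Return True if the whole value fits the format string."""
    return re.fullmatch(to_regex(fmt), value) is not None
```

For example, `to_regex("A{2-5}")` returns `[A-Z]{2,5}`, and `matches("'cat' \\s?", "cats")` returns `True`. Any symbol outside the supported subset raises a `ValueError` rather than being guessed at.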

Additional examples

Represents

Matches/Does not match

[1-5]N{3-11}

(4 to 12 numbers, where the first is in the range from 1-5)

Possible matches: 1111, 12345

Does not match: 65432122

[A-Z0-9]{3}N{7}

(10 characters where the first 3 are uppercase letters or digits and the last 7 are digits)

Possible matches: ABC1234567, 1234567890

Does not match: ABCD123456

(X|' '){0-20}

(Free text including spaces, 0-20 characters)

Possible matches: t+2&G, 12aA 345

Does not match: 123456789012345678901

(N{4}\-\0[1-3]\-N{2}) | (N{2} ' ' (CaseIns'january' | CaseIns'february' | CaseIns'March') ' ' N{4})

(Dates during the first three months of the year)

Possible matches: 1974-03-31, 31 March 1974, 2003-03-24, 23 March 2003

Does not match: 2001-09-11, 21-february-2002
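To experiment with these examples outside the product, each format string can be approximated by a hand-written Python regular expression. The sketch below is an illustration only. The names `EXAMPLES` and `check` are hypothetical, N is assumed to mean the digits 0-9, and X is assumed to cover printable ASCII characters, which may differ from the language specification actually in use.

```python
import re

# Hand-written approximations of the four format strings above.
# Assumptions: N means [0-9]; X covers printable ASCII (the range ! to ~).
EXAMPLES = {
    "first digit 1-5": r"[1-5][0-9]{3,11}",
    "3 + 7 characters": r"[A-Z0-9]{3}[0-9]{7}",
    "free text": r"[!-~ ]{0,20}",
    "first-quarter dates": (
        r"[0-9]{4}-0[1-3]-[0-9]{2}"
        r"|[0-9]{2} (?i:january|february|march) [0-9]{4}"
    ),
}


def check(name: str, value: str) -> bool:
    """Return True if the whole value fits the named example pattern."""
    return re.fullmatch(EXAMPLES[name], value) is not None


print(check("first digit 1-5", "12345"))           # a listed match
print(check("first digit 1-5", "65432122"))        # first digit is 6
print(check("first-quarter dates", "31 March 1974"))
print(check("first-quarter dates", "21-february-2002"))
```

Like the format strings, these patterns check only the shape of a value. A value such as 2003-02-31 has the right shape and is accepted, even though it is not a real date.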

Important notes

Characters belonging to a predefined set are determined by the language being used. This is particularly important to be aware of when you want to specify "special characters." For example, the backslash (\) character is not included in the Swedish language specification. Language specifications included in Capture Components Administration cannot be changed. However, you can create your own language specifications. Therefore, if you want to use a character that is not included in the default language specification for the language being used (for example the backslash in Swedish), you must create a new language specification which includes that character.

Failing to use a <space> between consecutive characters belonging to a predefined set results in an error message.

Characters that have a special meaning in the syntax, but which might also occur in a field, can be matched by preceding them with a backslash (\). Thus, "\\" and "\?" match a backslash and a question mark respectively.
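When a fixed piece of text has to be matched exactly, the backslash rule can be applied mechanically. The following Python sketch builds such a format string. The helper name `literal_format` is hypothetical and not part of the product. It assumes that a backslash may precede any character, as described for explicit characters, and it writes a space as the quoted string ' ' because escaping a space with a backslash is not described in this topic.

```python
def literal_format(text: str) -> str:
    """Build a format string that matches `text` exactly."""
    parts = []
    for ch in text:
        if ch == " ":
            parts.append("' '")      # space written as an explicit string
        else:
            parts.append("\\" + ch)  # backslash + character = that character
    return "".join(parts)


print(literal_format("50%?"))
```

For text that contains no single quotation marks, enclosing the whole text in single quotation marks, as described for explicit strings, is the shorter alternative.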

The format string (X|' ')* (any character including <space> occurring from 0 to many times) is the most universal format, placing no restrictions at all on the field value. All other format strings narrow down the possible set of values.