This topic describes how to type format strings when:
Specifying the extraction format for barcode classifiers and fields.
Specifying the extraction format for character fields and classifiers.
Type the format string as accurately as possible. See the Important notes below regarding predefined sets, special occurrences, and examples of complex expressions.
Symbol | Represents | Description |
{ } | Ranges (The default setting is for a character/number to occur exactly once. Using { } creates a custom range.) | 0{5}, 0{1-5} A range with an exact number of occurrences {n} or a varied but limited number {n-m} 00000 and 0, 00, 000, 0000, 00000 respectively |
A | Uppercase alphabetic characters (Predefined set) | A{2-5} An alphabetic field containing 2 to 5 uppercase characters AB, XYZ, ABDDE, etc. |
a | Lowercase alphabetic characters (Predefined set) | a{1-3} An alphabetic field containing 1 to 3 lowercase characters a, ab, abc, etc. |
N or n | Numeric characters (Predefined set) | N{7} A numeric field containing any 7 digits 1234567, 7123456, 3334447, etc. |
O or o | Special characters corresponding to the language specification. Such characters could be ?, :, &, etc. (Predefined set) | Example: O N{2} A field containing 1 special character followed by any 2 numeric characters $10, #99, %75, etc. |
X or x | Alphanumeric characters and special characters such as #, >, etc. (Predefined set) | X{5} A field containing five characters, each of which can be a lowercase or uppercase letter, a number, or a special character ABc3D, #bD61, $1500, etc. |
. | Any character except <space> | Example: . A decimal/period represents any character, except <space> a, B, 3, $, !, etc. |
\ | Explicit characters | \a, \8, \? A backslash followed by any single character matches that exact character Matches: a, 8 and ? respectively |
' ' | Explicit strings | 'The Total', 'date', ' ' A string enclosed within single quotation marks matches that exact string (case sensitive) The Total, date and <space> respectively |
- | Specific characters to remove from the beginning or end of a string | Example: N{4} -'#'? Meaning: The # character, if any, is removed when this four-digit field proceeds to output. 1234# is output as 1234 |
CaseIns | Case insensitive strings | CaseIns'the needles' Meaning: The "CaseIns" keyword, followed by a string enclosed with single quotation marks matches that exact string but is not case sensitive The needles, THE Needles, tHe NEEdles, etc. |
[ ] | A specified set or range | [139], [1-9] A field containing any single character from the set or range enclosed in brackets. 1, 3, 9, and 1, 2, 3, 4, 5, 6, 7, 8, 9 respectively |
^ | Complements to a set or range | Example: [^123] A field containing any single character not found in the set or range enclosed in brackets. 4, 5, 6, 7, ?, #, a, A, b, B, etc. |
* | A range of "zero to many" for the character or string specified | Examples: 'blah'*, 'Ch' \e* 'se' A field containing, in these examples, 0 to many of the string "blah" and 0 to many of the single character "e". <empty>, blah, blahblah, blahblahblah and Chse, Chese, Cheese, Cheeese, etc. |
? | Zero or one occurrences of the character or string specified | Example: 'cat' \s? A field containing one or no "s" after the specified string. cat, cats |
| | The expression "or", separating two elements or groups where one or the other is allowed | Example: (N{5} | N{10}) The "or" operator is used to allow a field containing either five or ten digits 00001, 12345, 0123456789, 1010101010, etc. |
( ) | A group of expressions | Example: ('one' | 'two') 'three' A field containing the character strings "one" or "two", and then "three" one three, two three |
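To make the symbol table more concrete, the following Python sketch translates a small subset of the syntax into regular expressions. It is only an approximation and not how the product evaluates format strings: `to_regex`, `matches`, and `PREDEFINED` are hypothetical names, the predefined sets are assumed to be ASCII-only (the real sets depend on the language specification), and O, X, CaseIns, and the removal operator (-) are deliberately left unsupported.

```python
import re

# Assumed ASCII approximations of the predefined sets. The real sets are
# determined by the language specification in use.
PREDEFINED = {"A": "[A-Z]", "a": "[a-z]", "N": "[0-9]", "n": "[0-9]"}


def to_regex(fmt: str) -> str:
    """Translate a subset of the format-string syntax to a Python regex."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == " ":                    # <space> only separates elements
            i += 1
        elif ch in PREDEFINED:           # predefined set: A, a, N, n
            out.append(PREDEFINED[ch])
            i += 1
        elif ch == ".":                  # any character except <space>
            out.append("[^ ]")
            i += 1
        elif ch == "\\":                 # \c matches the exact character c
            out.append(re.escape(fmt[i + 1]))
            i += 2
        elif ch == "'":                  # 'text' matches the exact string
            end = fmt.index("'", i + 1)
            out.append("(?:" + re.escape(fmt[i + 1:end]) + ")")
            i = end + 1
        elif ch == "{":                  # {n} stays {n}; {n-m} becomes {n,m}
            end = fmt.index("}", i)
            out.append("{" + fmt[i + 1:end].replace("-", ",") + "}")
            i = end + 1
        elif ch == "[":                  # sets and ranges carry over as-is
            end = fmt.index("]", i)
            out.append(fmt[i:end + 1])
            i = end + 1
        elif ch in "()|*?":              # same meaning in both notations
            out.append(ch)
            i += 1
        else:
            raise ValueError(f"unsupported symbol: {ch!r}")
    return "".join(out)


def matches(fmt: str, value: str) -> bool:
    """Return True if the whole value satisfies the translated format."""
    return re.fullmatch(to_regex(fmt), value) is not None


if __name__ == "__main__":
    for fmt, value in [("N{7}", "1234567"), ("A{2-5}", "XYZ"),
                       ("'cat' \\s?", "cats"), ("(N{5} | N{10})", "123456")]:
        print(fmt, value, matches(fmt, value))
```

Using `re.fullmatch` reflects the assumption that a format string describes the entire field value rather than a substring of it.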
Additional examples | Represents | Matches/Does not match |
[1-5]N{3-11} | (4 to 12 numbers, where the first is in the range from 1-5) | Possible matches: 1111, 12345 Does not match: 65432122 |
[A-Z0-9]{3}N{7} | (10 characters where the first 3 are uppercase letters and numbers and the last 7 are numbers) | Possible matches: ABC1234567, 1234567890 Does not match: ABCD123456 |
(X|' '){0-20} | Free text including spaces, 0-20 characters | Possible matches: t+2&G, 12aA 345 Does not match: 123456789012345678901 |
(N{4}\-\0[1-3]\-N{2}) | (N{2} ' ' CaseIns'january'| CaseIns'february'|CaseIns'March' ' ' N{4}) | (Dates during the first three months of the year) | Possible matches: 1974-03-31, 31 March 1974, 2003-03-24, 23 March 2003 Does not match: 2001-09-11, 21-february-2002 |
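The match and non-match values in the examples above can be sanity-checked against hand-written regular-expression equivalents. This is a rough sketch: the `EQUIVALENTS` mapping and the `check` helper are hypothetical, N is assumed to mean an ASCII digit, and only the numeric (ISO-style) alternative of the date example is covered.

```python
import re

# Hand-translated regex equivalents of three format strings from the table.
# Assumption: N stands for an ASCII digit.
EQUIVALENTS = {
    "[1-5]N{3-11}": r"[1-5][0-9]{3,11}",
    "[A-Z0-9]{3}N{7}": r"[A-Z0-9]{3}[0-9]{7}",
    r"(N{4}\-\0[1-3]\-N{2})": r"[0-9]{4}-0[1-3]-[0-9]{2}",
}


def check(fmt: str, value: str) -> bool:
    """Return True if value fully matches the regex equivalent of fmt."""
    return re.fullmatch(EQUIVALENTS[fmt], value) is not None


if __name__ == "__main__":
    samples = [
        ("[1-5]N{3-11}", "1111"),
        ("[1-5]N{3-11}", "65432122"),
        ("[A-Z0-9]{3}N{7}", "1234567890"),
        ("[A-Z0-9]{3}N{7}", "ABCD123456"),
        (r"(N{4}\-\0[1-3]\-N{2})", "1974-03-31"),
        (r"(N{4}\-\0[1-3]\-N{2})", "2001-09-11"),
    ]
    for fmt, value in samples:
        print(f"{fmt}  {value}  matches={check(fmt, value)}")
```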
Characters belonging to a predefined set are determined by the language being used. This is particularly important to be aware of when you want to specify "special characters." For example, the backslash (\) character is not included in the Swedish language specification. Language specifications included in Capture Components Administration cannot be changed. However, you can create your own language specifications. Therefore, if you want to use a character that is not included in the default language specification for the language being used (for example the backslash in Swedish), you must create a new language specification which includes that character.
Omitting the <space> between consecutive characters belonging to a predefined set results in an error message.
Specific characters used in the syntax, which might actually occur in a field, can be included by preceding them with "\". Thus, "\\" and "\?" include a backslash and a question mark respectively.
The format string (X|' ')* (any character including <space>, occurring from 0 to many times) is the most universal format, placing no restrictions at all on the field value. All other format strings narrow down the possible set of values.
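As an illustration of the backslash rule described in these notes, the following Python sketch builds a format string that matches a given text literally by prefixing every character with a backslash. The helper name `escape_literal` is hypothetical, and the sketch assumes that a backslash may precede any character (including <space>), as the "Explicit characters" row suggests.

```python
def escape_literal(text: str) -> str:
    """Build a format string that matches `text` exactly.

    Every character is prefixed with a backslash, so syntax characters
    such as ?, *, or \\ lose their special meaning. Assumption: escaping
    ordinary characters is harmless, since a backslash followed by any
    single character is documented to match that exact character.
    """
    return "".join("\\" + ch for ch in text)


if __name__ == "__main__":
    # Print the escaped form of a value containing syntax characters.
    print(escape_literal("50%?"))
```

For longer fixed text, an explicit string in single quotation marks is usually more readable; per-character escaping is mainly useful when the text itself contains syntax characters.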