Regular expressions

Regular expressions are used to recognize patterns in textual data. A regular expression is evaluated against the text in a document and matches wherever the text fits the pattern that the expression describes. In Tungsten Transformation, regular expressions are used in format locators, validation methods, and formatters to identify and normalize items on a document.

Regular expressions describe data in an abstract way. The following table lists some common syntax elements, each with an example:

Regular Expression Syntax

Format	Description	Example	Matches	Does Not Match
C	One character	a	a	b, A
. (period)	Any character	b.g	bug, bag, big, bbg	bg, baag
\d	Any single digit	a\d	a5, a8, a0	aA, ab, a
[c₁c₂c₃]	One character out of a set	[abc]	a, b, c	1, 2, d, D, A, ab, bc
[c₁-cₙ]	One character out of a range	[a-z]	b, g, x	1, 2, D, A
? (question mark)	The previous term is optional	x\d?	x, x7, x1	xx, xq
+ (plus sign)	The previous term can be repeated one or more times	\d+	4, 2323, 100	A112, 2b, X
* (asterisk)	The previous term can be repeated zero or more times	x\d*	x6, x, x100	100x, xx
{n}	The previous term can be repeated exactly n times	y{3}	yyy	yy, yyyy
{m,n}	The previous term can be repeated between m and n times	\d{5,9}	12345, 999999999	1234, 999999999999
\	Escape special characters	\$ \\ \- \? \.	$ \ - ? .	!%
()	Group characters	a(\$\$)?b	a$$b, ab	a$b, a$$
(e₁|e₂)	Choice	(abc|ABC)	abc, ABC	aBC, AbC
\n	Back reference (the text matched by the nth group in round brackets must be matched again)	(\d)x\1	1x1, 2x2, 3x3, 4x4, ...	1x2, 6x7, ...
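
The rows above can be checked with a short script. This is a minimal sketch that uses Python's `re` module as a stand-in, on the assumption that the Tungsten Transformation engine treats this basic syntax the same way. The names `CASES` and `check_cases` are hypothetical, and each pattern is tested against the whole string to mirror the Matches and Does Not Match columns.

```python
import re

# (pattern, strings that should match, strings that should not match).
# The values are copied from the table; whole-string matching is assumed.
CASES = [
    (r"b.g", ["bug", "bag", "big", "bbg"], ["bg", "baag"]),
    (r"a\d", ["a5", "a8", "a0"], ["aA", "ab", "a"]),
    (r"[abc]", ["a", "b", "c"], ["1", "d", "D", "ab"]),
    (r"x\d?", ["x", "x7", "x1"], ["xx", "xq"]),
    (r"\d{5,9}", ["12345", "999999999"], ["1234", "999999999999"]),
    (r"a(\$\$)?b", ["a$$b", "ab"], ["a$b", "a$$"]),
    (r"(abc|ABC)", ["abc", "ABC"], ["aBC", "AbC"]),
    (r"(\d)x\1", ["1x1", "2x2"], ["1x2", "6x7"]),
]


def check_cases(cases):
    """Return the (pattern, text) pairs whose result disagrees with the table."""
    failures = []
    for pattern, should_match, should_not_match in cases:
        for text in should_match:
            if re.fullmatch(pattern, text) is None:
                failures.append((pattern, text))
        for text in should_not_match:
            if re.fullmatch(pattern, text) is not None:
                failures.append((pattern, text))
    return failures


print(check_cases(CASES))  # an empty list means every row behaves as listed
```

Whole-string matching is a deliberate choice here: a search anywhere in the text would find `x\d?` inside `xx`, which would contradict the Does Not Match column.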

You can find many third-party resources on the internet about regular expressions. In many cases, however, extensive knowledge of regular expressions is not needed because Tungsten Transformation provides a set of commonly used, predefined templates.

You can also use dictionaries in regular expressions for format locators. If you know the name of the dictionary, you can type it directly into the input box of the format locator as "§ + dictionary name + §." If your keyboard has no § key, you can enter the symbol by holding Alt and typing 0167 on the numeric keypad.

If any of your documents contain special ASCII characters that you want to locate and extract, you can do so using the regular expression codes for those characters. The following table shows the code for each character.

ASCII Hex	Special Character	Regular Expression Code
21	!	\x21
22	"	\x22
23	#	\x23
24	$	\x24
25	%	\x25
26	&	\x26
27	'	\x27
28	(	\x28
29	)	\x29
2A	*	\x2A
2B	+	\x2B
5E	^	\x5E
A7	§	\xA7

For example, a single \x2A entry matches a single character, in this case an asterisk (*). You can also use these codes in a range. For example, [\x21-\x29] matches any one of the following characters: !"#$%&'().
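
The hex codes can be tried outside the product as well. The following sketch again uses Python's `re` module as an assumed stand-in for the Tungsten Transformation engine. The function name `find_special` and the sample text are hypothetical.

```python
import re

# \x2A is the hex code for the asterisk, so this matches one literal "*".
ASTERISK = re.compile(r"\x2A")

# A range of hex codes: any single character from "!" (21) through ")" (29).
SPECIAL_RANGE = re.compile(r"[\x21-\x29]")


def find_special(text):
    """Return every character in text that falls in the \\x21-\\x29 range."""
    return SPECIAL_RANGE.findall(text)


print(ASTERISK.fullmatch("*") is not None)         # True
print(find_special("Total: $40 (incl. 5% tax)!"))  # ['$', '(', '%', ')', '!']
```

The space, colon, and period in the sample text are not reported because their codes (20, 3A, and 2E) lie outside the range.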
