Perform Common Tasks
Extracting Only Part of a Text
To extract only a part of the text in a tag, you can use patterns on the text in the tag. For example, you might want to extract the name "Bob Smith" from the following text: "The article is written by Bob Smith." To do this, use the Extract data converter (do not confuse this with the Extract step) and configure it as described in this topic.
In this example, the pattern is ".*by\s(.*)\.". The subpattern in parentheses matches the text between "by" (followed by one whitespace character) and the final period, and this is the text that is extracted. For more information, see Patterns.
- Open Extract Configuration, and select the Basic tab.
- In the Pattern field, enter the text pattern to extract. Configure the Pattern property to match the entire text, with the text to extract matched by a subpattern enclosed in parentheses.
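The behavior of the example pattern can be checked outside the product. The following is a minimal sketch that uses Python's re module as a stand-in for the product's own pattern engine, which is an assumption; the function name extract_author is hypothetical.

```python
import re

# Same pattern as in the example above. Python's re module stands in for
# the product's own pattern engine (an assumption for illustration).
AUTHOR_PATTERN = re.compile(r".*by\s(.*)\.")


def extract_author(text):
    """Return the text matched by the subpattern, or None if there is no match."""
    # fullmatch mirrors the requirement that the pattern match the entire text.
    match = AUTHOR_PATTERN.fullmatch(text)
    return match.group(1) if match else None


print(extract_author("The article is written by Bob Smith."))  # Bob Smith
```

Because the pattern must match the entire text, a text without the closing period does not match at all, and the sketch returns None instead of a partial result.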
Converting Content
To normalize content, use a conversion, such as replacing one text with another. For example, you can normalize country codes to their natural-language names, such as converting "US" to "United States."
- For plain text conversions, use the Convert Using List data converter.
- For conversions based on patterns or expressions, use the If Then data converter.
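The difference between the two kinds of conversion can be illustrated with a short sketch. It is written in Python and does not use the product's converters; the lookup table, the function names, and the rule for spelling variants are all hypothetical.

```python
import re

# Hypothetical lookup list; the entries are illustrative only.
COUNTRY_NAMES = {"US": "United States", "DE": "Germany", "FR": "France"}

# Hypothetical pattern rule covering spelling variants such as "U.S." or "usa".
US_VARIANTS = re.compile(r"U\.?S\.?(A\.?)?", re.IGNORECASE)


def convert_using_list(value):
    """Plain text conversion: look the value up in a fixed list."""
    # Unknown values are returned unchanged (an assumption of this sketch).
    return COUNTRY_NAMES.get(value.strip().upper(), value)


def convert_if_then(value):
    """Pattern-based conversion: if the value matches, then replace it."""
    if US_VARIANTS.fullmatch(value.strip()):
        return "United States"
    # Otherwise fall back to the plain list lookup.
    return convert_using_list(value)


print(convert_using_list("US"))   # United States
print(convert_if_then("U.S.A."))  # United States
```

A list lookup is enough when the input values are already uniform. A pattern rule is more useful when the same value appears in several spellings.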
Extracting and Formatting Numbers
- To extract a number from content, add an Extract Number data converter.
- To perform additional number formatting, use the Format Number data converter.
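As a rough illustration of the two steps, the sketch below first extracts a number from surrounding text and then formats it. It is plain Python, not the product's converters; the number pattern, the function names, and the assumption that commas are thousands separators and the period is the decimal separator are all choices made for this example.

```python
import re

# Assumes "," as thousands separator and "." as decimal separator.
NUMBER_PATTERN = re.compile(r"[-+]?\d[\d,]*(?:\.\d+)?")


def extract_number(text):
    """Return the first number found in the text as a float, or None."""
    match = NUMBER_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(0).replace(",", ""))


def format_number(value, decimals=2):
    """Format with thousands separators and a fixed number of decimals."""
    return f"{value:,.{decimals}f}"


price = extract_number("Price: 1,299.50 USD")
print(price)                 # 1299.5
print(format_number(price))  # 1,299.50
```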
Extracting the Date from Text
Extract dates in the same way as numbers.
- To extract a date from text, add an Extract Date data converter to your robot. Extract Date uses patterns to extract the date. The pattern does not have to match the entire text, only the date. The extracted date is converted to standard date format.
- To perform additional date formatting, use the Format Date data converter.
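The sketch below shows the same idea in Python: the pattern only has to match the date, not the whole text. The day/month/year input layout, the function names, and the use of ISO 8601 as the "standard" format are assumptions made for illustration and do not describe the product's behavior.

```python
import re
from datetime import date

# Assumes dates written as day/month/year, for example 12/03/2021.
DATE_PATTERN = re.compile(r"(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{4})")


def extract_date(text):
    """Find the first date anywhere in the text; return a date or None."""
    # search (not fullmatch): the pattern need only match the date itself.
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    return date(int(match["year"]), int(match["month"]), int(match["day"]))


def format_date(value, fmt="%d.%m.%Y"):
    """Apply additional formatting to an extracted date."""
    return value.strftime(fmt)


found = extract_date("Published on 12/03/2021 by staff")
print(found.isoformat())   # 2021-03-12
print(format_date(found))  # 12.03.2021
```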
Extracting Only a Subset of the Tags in a Found Tag
Sometimes, you want to extract from a range of tags rather than a single tag.
For example, consider extracting the body text of an article in which each section of the body is in its own tag, and the article title and author are contained in other tags. To extract only the body text without the article title and author, use the Extract action to extract the text, and configure the action so that only the range of tags spanning the body is extracted.
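The idea of extracting a range of tags can be sketched in Python with the standard xml.etree.ElementTree module. The markup, the function name extract_tag_range, and the use of child positions to define the range are hypothetical and only illustrate the concept; they do not show how the action is configured in the product.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: title and author first, then the body sections.
ARTICLE = """<div>
  <h1>Article Title</h1>
  <p class="author">By Bob Smith</p>
  <p>First section.</p>
  <p>Second section.</p>
</div>"""


def extract_tag_range(markup, first, last):
    """Join the text of the child tags from index first to last, inclusive."""
    root = ET.fromstring(markup)
    selected = list(root)[first:last + 1]
    # itertext collects the text of each tag, including any nested tags.
    return " ".join("".join(tag.itertext()).strip() for tag in selected)


# Children 0 and 1 are the title and author; 2 to 3 span the body.
print(extract_tag_range(ARTICLE, 2, 3))  # First section. Second section.
```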