I have these tables: http://www.nemzetisport.hu/cikk.php?cikk=153057&
In the third column there is nationality and the group together, and I would like to extract the nationality and the group.
REGEXP(Field3, "^\w+")
REGEXP(Field3, "[^ ]+$")
It's working well, but I think it's just not a great solution. I would use these formulas:
REGEXP(Field3, "^\s*(.*?)\s*,")
REGEXP(Field3, ",\s*(.*?)\s*$")
In English: if there's a grouping operator "()" in the regexp, then the regexp's result should be the content of the operator, not the whole match.
With this data source, the first solution is OK, but the second solution is not just grabbing the data, but trimming it, too, what may make an import process easier.
Andras
Kirix Support Forums
Calculated fields
3 posts
• Page 1 of 1
Re: Calculated fields
Your regular expressions are correct. The only thing you're missing is more complete documentation on the REGEXP function.
In fact, we can extract the part of the regular expression in parenthesis by supplying an optional third parameter. Two syntaxes are supported:
REGEXP(<string>, <exp>, <integer>)
REGEXP(<string>, <exp>, <string>)
With an integer parameter, the integer denotes which sub-match included in parenthesis we want to return. For example,
REGEXP("ABC", "(A)B(C)", 1)
returns A, which is the first submatch, and
REGEXP("ABC", "(A)B(C)", 2)
returns C, which is the second submatch.
In your example, we would use the following to extract the parts in the parenthesis:
REGEXP(Field3, "^\s*(.*?)\s*,", 1)
REGEXP(Field3, ",\s*(.*?)\s*$", 1)
However, because we can denote submatches, we can even combine the two regexes into one and use the integer parameter to extract the first and second parts:
REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", 1)
REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", 2)
The string parameter works similar to the integer parameter, except that we use a $ in front of the number. For example, following the patterns above, we would use:
REGEXP(Field3, "^\s*(.*?)\s*,", "$1")
REGEXP(Field3, ",\s*(.*?)\s*$", "$1")
The beauty of this syntax is that we can easily rearrange or replace the matching portions of the string to construct new strings. For example, if we want to reverse the parts of the string separate by the comma, we can combine the expressions and use the third parameter to reorganize the matching parts:
REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", "$2 $1")
Finally, because we can nest functions, we can apply additional functions outside the regular expression function. For example, after reversing the parameters, we can capitalize the string using the UPPER function around the REGEXP function:
UPPER(REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", "$2 $1"))
Overall, regular expressions are extremely useful for extracting portions of data, and using them in calculated fields like you're using them opens up huge possibilities not only for cleaning up lists of names, addresses, phone numbers, or other contact information, but also quickly prototyping them for use in other code.
In fact, we can extract the part of the regular expression in parenthesis by supplying an optional third parameter. Two syntaxes are supported:
REGEXP(<string>, <exp>, <integer>)
REGEXP(<string>, <exp>, <string>)
With an integer parameter, the integer denotes which sub-match included in parenthesis we want to return. For example,
REGEXP("ABC", "(A)B(C)", 1)
returns A, which is the first submatch, and
REGEXP("ABC", "(A)B(C)", 2)
returns C, which is the second submatch.
In your example, we would use the following to extract the parts in the parenthesis:
REGEXP(Field3, "^\s*(.*?)\s*,", 1)
REGEXP(Field3, ",\s*(.*?)\s*$", 1)
However, because we can denote submatches, we can even combine the two regexes into one and use the integer parameter to extract the first and second parts:
REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", 1)
REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", 2)
The string parameter works similar to the integer parameter, except that we use a $ in front of the number. For example, following the patterns above, we would use:
REGEXP(Field3, "^\s*(.*?)\s*,", "$1")
REGEXP(Field3, ",\s*(.*?)\s*$", "$1")
The beauty of this syntax is that we can easily rearrange or replace the matching portions of the string to construct new strings. For example, if we want to reverse the parts of the string separate by the comma, we can combine the expressions and use the third parameter to reorganize the matching parts:
REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", "$2 $1")
Finally, because we can nest functions, we can apply additional functions outside the regular expression function. For example, after reversing the parameters, we can capitalize the string using the UPPER function around the REGEXP function:
UPPER(REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", "$2 $1"))
Overall, regular expressions are extremely useful for extracting portions of data, and using them in calculated fields like you're using them opens up huge possibilities not only for cleaning up lists of names, addresses, phone numbers, or other contact information, but also quickly prototyping them for use in other code.
Aaron Williams
Kirix Support Team
Kirix Support Team
-
Aaron - Kirix Support Team
- Posts: 120
- Joined: Fri Dec 16, 2005 3:01 pm
Re: Calculated fields
Great, thanks for the detailed information.
- BarthaziAndras
- Registered User
- Posts: 9
- Joined: Thu Sep 13, 2007 12:08 pm
3 posts
· Page 1 of 1
Return to Bug Reports & Suggestions