Kirix Support Forums

by **BarthaziAndras** on Thu Sep 13, 2007 12:58 pm

I have these tables: http://www.nemzetisport.hu/cikk.php?cikk=153057&

In the third column there is nationality and the group together, and I would like to extract the nationality and the group.

REGEXP(Field3, "^\w+")
REGEXP(Field3, "[^ ]+$")

It's working well, but I think it's just not a great solution. I would use these formulas:

REGEXP(Field3, "^\s*(.*?)\s*,")
REGEXP(Field3, ",\s*(.*?)\s*$")

In English: if there's a grouping operator "()" in the regexp, then the regexp's result should be the content of the operator, not the whole match.

With this data source, the first solution is OK, but the second solution is not just grabbing the data, but trimming it, too, what may make an import process easier.

Andras

by **Aaron** on Thu Sep 13, 2007 2:32 pm

Your regular expressions are correct. The only thing you're missing is more complete documentation on the REGEXP function.

In fact, we can extract the part of the regular expression in parenthesis by supplying an optional third parameter. Two syntaxes are supported:

REGEXP(<string>, <exp>, <integer>)
REGEXP(<string>, <exp>, <string>)

With an integer parameter, the integer denotes which sub-match included in parenthesis we want to return. For example,

REGEXP("ABC", "(A)B(C)", 1)

returns A, which is the first submatch, and

REGEXP("ABC", "(A)B(C)", 2)

returns C, which is the second submatch.

In your example, we would use the following to extract the parts in the parenthesis:

REGEXP(Field3, "^\s*(.*?)\s*,", 1)
REGEXP(Field3, ",\s*(.*?)\s*$", 1)

However, because we can denote submatches, we can even combine the two regexes into one and use the integer parameter to extract the first and second parts:

REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", 1)
REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", 2)

The string parameter works similar to the integer parameter, except that we use a $ in front of the number. For example, following the patterns above, we would use:

REGEXP(Field3, "^\s*(.*?)\s*,", "$1")
REGEXP(Field3, ",\s*(.*?)\s*$", "$1")

The beauty of this syntax is that we can easily rearrange or replace the matching portions of the string to construct new strings. For example, if we want to reverse the parts of the string separate by the comma, we can combine the expressions and use the third parameter to reorganize the matching parts:

REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", "$2 $1")

Finally, because we can nest functions, we can apply additional functions outside the regular expression function. For example, after reversing the parameters, we can capitalize the string using the UPPER function around the REGEXP function:

UPPER(REGEXP(Field3, "^\s*(.*?)\s*,\s*(.*?)\s*$", "$2 $1"))

Overall, regular expressions are extremely useful for extracting portions of data, and using them in calculated fields like you're using them opens up huge possibilities not only for cleaning up lists of names, addresses, phone numbers, or other contact information, but also quickly prototyping them for use in other code.

by **BarthaziAndras** on Thu Sep 13, 2007 3:03 pm

Great, thanks for the detailed information.

Kirix Support Forums

Calculated fields

Calculated fields

Re: Calculated fields

Re: Calculated fields