FOR_EACH_CSVROW

This is an iterator activity. It reads each row in a CSV file and, on each iteration, outputs up to the first 50 column values found in the row. The processing logic nested beneath the FOR_EACH_CSVROW activity is repeated for each row read.

This activity is not intended for routine processing of large volumes of data. Although the MAXROWS parameter permits you to have the activity read and process more than 999 rows of data, doing so is not recommended in most instances. The activity can be useful, however, for transferring limited amounts of information between activities, transformation maps and the processing sequence variable pool.

This activity provides a PARSEOPTION parameter to permit you to control the way that a CSV file is parsed to best suit your case. More detailed information on the parsing options is given below under the heading CSV Parsing Options.

INPUT Parameters:

CSVFILEPATH: Required

This parameter must contain the full path and name of the file to be read.

eg C:\order.csv

or /orders/order_jan.csv

SEPARATOR: Optional

If the CSV file uses a separator other than a comma (for example, a semi-colon is commonly used in some locales), then this value should specify the separator character. The special value *TAB indicates a horizontal tab character. Otherwise, the separator should be 1 character in length and can consist of any character. If not specified, a default value of comma (,) is assumed.

NOTE: if a separator other than comma or *TAB is specified, then the value of the PARSEOPTION parameter will be disregarded. The *EXTENDED parsing option is always used in this case.
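The interaction between SEPARATOR and PARSEOPTION can be summarised in a few lines of Python. This is only an illustrative sketch of the rules stated above, not the activity's implementation: the function name resolve_parsing is hypothetical, and raising an error for a separator longer than one character is an assumption, since the behavior in that case is not described here.

```python
def resolve_parsing(separator=None, parse_option=None):
    """Return (separator_char, effective_option) per the rules described above."""
    # Documented defaults: comma separator and *STANDARD parsing.
    if not separator:
        separator = ","
    if not parse_option:
        parse_option = "*STANDARD"

    if separator == "*TAB":
        # *TAB stands for a horizontal tab; PARSEOPTION is honoured.
        return "\t", parse_option
    if len(separator) != 1:
        # Assumption: the activity's real reaction to this input is not documented.
        raise ValueError("SEPARATOR must be *TAB or a single character")
    if separator != ",":
        # Any other separator forces *EXTENDED, whatever PARSEOPTION says.
        return separator, "*EXTENDED"
    return separator, parse_option
```

For example, resolve_parsing(";", "*SIMPLE") returns (";", "*EXTENDED"), because the semi-colon overrides the requested parsing option.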

MAXROWS: Optional

This activity is not intended for routine processing of large volumes of data. For this reason (and to avoid unintended "runaway" processes), it is limited, by default, to processing a maximum of 999 rows of data. This parameter allows you to override that maximum, permitting you to process larger amounts of data if appropriate for your solution. The default value for this parameter is 999; increasing it is not recommended in most instances.

PARSEOPTION: Optional

Because there are no standards governing the format of data in CSV files, the PARSEOPTION parameter is provided to offer a choice of parsing techniques to suit commonly-used formats. You may choose from the following:

*SIMPLE

*STANDARD (this is the default value, except as noted below)

*EXTENDED

More detailed information on the parsing options is given below under the heading CSV Parsing Options.

Note that if the SEPARATOR parameter specifies a separator other than comma or *TAB, then the value of the PARSEOPTION parameter will be disregarded. The *EXTENDED parsing option is always used in this case.

SKIPFIRSTROW: Optional

Frequently the first row of a CSV file contains identifiers or column headings, with the actual data not starting until row 2. In this case, specify *YES for this parameter to have the activity automatically skip the first row. If not specified, a default of *NO is assumed.

OUTPUT Parameters:

CSVROW

Upon each iteration, this output parameter will contain the row number for the current CSV row read.

Note that if you use the SKIPFIRSTROW parameter to skip the first row in the file (for example, if it contains column headings), the skipped row is not counted in determining the row number. The first row returned still has row number 1, even though it may have been the second row in the file.

CSVCOLUMN1
CSVCOLUMN2
…
CSVCOLUMN50

Upon each iteration, these output parameters will contain the value for the corresponding column for the current CSV row read, up to the number of columns present in the data or a maximum of 50 columns.
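The way the input and output parameters fit together can be pictured with a short Python sketch. It is an analogy only: the function for_each_csvrow is hypothetical, Python's csv module stands in for the activity's own parser, and stopping quietly once MAXROWS is reached is an assumption rather than documented behavior.

```python
import csv
import io

MAX_COLUMNS = 50  # the activity exposes CSVCOLUMN1 to CSVCOLUMN50


def for_each_csvrow(lines, max_rows=999, skip_first_row=False):
    """Yield (row_number, columns) in the way the outputs are described above."""
    reader = csv.reader(lines)
    if skip_first_row:
        next(reader, None)  # the skipped row is not counted
    for row_number, row in enumerate(reader, start=1):
        if row_number > max_rows:
            break  # assumption: iteration simply ends at the limit
        yield row_number, row[:MAX_COLUMNS]


sample = io.StringIO("id,name\n1,Ann\n2,Bob\n")
rows = list(for_each_csvrow(sample, skip_first_row=True))
# rows[0] corresponds to CSVROW = 1 even though it is line 2 of the file
```

Each yielded pair corresponds to one iteration: the first element plays the role of CSVROW and the list plays the role of CSVCOLUMN1 onwards.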

CSV Parsing Options

Unlike some other file types, there are no standards or even universally accepted practices for the formatting of data in CSV files. This activity provides a PARSEOPTION parameter to permit you to strike the best balance between performance and the flexibility to handle a variety of commonly-used formats.

The PARSEOPTION parameter permits you to specify one of the values *SIMPLE, *STANDARD or *EXTENDED. These are further described below.

NOTE: if the SEPARATOR parameter specifies a separator other than comma or *TAB, then the value of the PARSEOPTION parameter will be disregarded. The *EXTENDED parsing option is always used in this case.

Parsing option: *SIMPLE

This parsing option offers the best performance. However:

It does NOT support column separators other than the default comma (,) or *TAB. (The activity will automatically switch to parsing option *EXTENDED if another separator is specified.)
It is NOT sensitive to quotes (double or single) surrounding alphanumeric values (they will be returned as part of the column value).
Instances of the column separator character (for example, comma) embedded inside quoted strings WILL be treated as a column separator, which is usually NOT the desired behavior.

(As further information for users familiar with LANSA development, the activity implements this option using the TRANSFORM_FILE built-in function with either 'O' (comma separator) or 'T' (tab separator) specified for the Input file format argument.)
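The practical effect of *SIMPLE parsing can be approximated in Python by a plain split on the separator. This is an analogy to illustrate the points above, not a description of how TRANSFORM_FILE works internally; the function name parse_simple is hypothetical.

```python
def parse_simple(line, separator=","):
    """Split on every separator; quotes are treated as ordinary characters."""
    return line.split(separator)


columns = parse_simple('1,"Smith, John",Sydney')
# columns is ['1', '"Smith', ' John"', 'Sydney']
```

The quoted name is broken into two columns and the quote characters remain in the values, which is why this option suits only files that contain no quoted values.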

Parsing option: *STANDARD

This parsing option offers a good balance between performance and flexibility.

It IS sensitive to quotes (double or single) surrounding alphanumeric values and will remove them from the values returned.
Additionally it is sensitive to "doubled-up" instances of the quote character (whether double or single) embedded within a quoted string and will treat them as an "escaped" quote character and adjust the returned value accordingly.

However:

It does NOT support column separators other than the default comma (,) or *TAB. (The activity will automatically switch to parsing option *EXTENDED if another separator is specified.)
Instances of the column separator character (for example, comma) embedded inside quoted strings WILL be treated as a column separator, which is usually NOT the desired behavior.
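As a rough illustration of the *STANDARD behavior described above, the following Python sketch splits on the separator first and then strips surrounding quotes and collapses doubled-up quotes in each value. The names unquote and parse_standard are hypothetical. How the activity treats fragments left with unmatched quotes (when a separator appears inside a quoted string) is not specified here, so the sketch assumes they are returned unchanged.

```python
def unquote(value):
    """Remove one pair of surrounding quotes and collapse doubled-up quotes."""
    for quote in ('"', "'"):
        if len(value) >= 2 and value[0] == quote and value[-1] == quote:
            return value[1:-1].replace(quote * 2, quote)
    return value  # assumption: unquoted or unmatched values pass through as-is


def parse_standard(line, separator=","):
    """Split on every separator, then clean up the quotes in each value."""
    return [unquote(value) for value in line.split(separator)]


columns = parse_standard('1,"5"" bolt",\'Sydney\'')
# columns is ['1', '5" bolt', 'Sydney']
```

Because the split happens before the quotes are examined, a separator inside a quoted string still divides the value in two.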

Parsing option: *EXTENDED

This parsing option offers the most flexibility to handle a wide variety of CSV formatting cases.

It IS sensitive to quotes (double or single) surrounding alphanumeric values and will remove them from the values returned.
Additionally it is sensitive to "doubled-up" instances of the quote character (whether double or single) embedded within a quoted string and will treat them as an "escaped" quote character and adjust the returned value accordingly.
It IS sensitive to instances of the column separator character embedded inside quoted strings and will NOT treat them as column separators, but will instead return them intact as part of the column value.
It supports any character as the column separator. Commonly in Europe, for example, a semi-colon (;) is used as the field separator, and this can be handled by this activity by specifying the semi-colon in the SEPARATOR parameter. This will cause the activity to use parsing option *EXTENDED (irrespective of the value specified for the PARSEOPTION parameter).

The *EXTENDED option is the most functional, but it is also the slowest. For best performance, especially if you are expecting to process large amounts of data, you should make sure that you use the *SIMPLE or *STANDARD option unless your CSV case truly requires the additional functionality offered by the *EXTENDED option.

(As further information for users familiar with LANSA development, the TRANSFORM_FILE built-in function is NOT used in the implementation for this option. Instead, the parsing is entirely implemented in LANSA Composer code.)
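The quote-aware behavior described for *EXTENDED can be sketched as a small character-by-character parser. This Python version is a minimal sketch written for illustration and is not the LANSA Composer code; the function name parse_extended is hypothetical, and it assumes that a quote character only opens a quoted string when it is the first character of a value.

```python
def parse_extended(line, separator=","):
    """Split a line on the separator, honouring single or double quoted values."""
    fields, current = [], []
    quote = None  # the quote character currently open, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    current.append(quote)  # doubled-up quote becomes one literal quote
                    i += 1
                else:
                    quote = None  # closing quote
            else:
                current.append(ch)  # separators inside quotes are kept intact
        elif ch in ('"', "'") and not current:
            quote = ch  # opening quote at the start of a value
        elif ch == separator:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


columns = parse_extended('1;"Smith; John";\'5\'\' bolt\'', ";")
# columns is ['1', 'Smith; John', "5' bolt"]
```

Examining every character is what makes this style of parsing more flexible than a plain split, and it is also consistent with the performance cost noted above.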