TEXTFILE_SPLIT

This activity can split a text document into multiple documents at each line that meets a specified condition.

This is particularly useful for breaking an aggregated transaction document file into one or more files, each containing a single transaction. (Note that it will usually be possible to do the same thing using a Transformation Map, but using this activity might be simpler in many cases.)

The output split document paths and names are generated according to the values specified or assumed for the TXTSPLITPATH and TXTSPLITROOT parameters. If documents of the generated name(s) exist in the target location, they will be replaced by this activity. On IBM i servers, the output split documents will be created with the same CCSID value as the input text document.

The variety of possible text document formats is effectively unlimited, and this activity may not be capable of splitting every source document in the way that you intend or expect. Rather, it is intended for certain common scenarios, especially fixed-length field (FLF) files and CSV files.

For example, consider an ORDERS.CSV file like this:

HEADER,12345,543,"ABC Industries","123 North St.",Bankstown,NSW,2087,2017-03-14

DETAIL,1,123,"Gasket Paper",9.95,10

DETAIL,2,456,Glue,13.27,5

HEADER,67890,876,"Banana Distributors Co-op","88 Eighth St.",Sydenham,NSW,2092,2017-04-23

DETAIL,1,789,"Custard powder",8.88,8

DETAIL,2,ABC,"Apple juice",4.32,7

By specifying the value 'HEADER' as the delimiter value in the TXTSPLITONVALUE parameter, this activity can break the file into two files each containing a single order HEADER and its associated details.
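The grouping described above can be illustrated with a minimal Python sketch. This is not the activity's implementation; the function name split_on_value is hypothetical, and the handling of any lines that precede the first 'HEADER' line (here they simply open the first document) is an assumption.

```python
ORDERS_CSV = """\
HEADER,12345,543,"ABC Industries","123 North St.",Bankstown,NSW,2087,2017-03-14
DETAIL,1,123,"Gasket Paper",9.95,10
DETAIL,2,456,Glue,13.27,5
HEADER,67890,876,"Banana Distributors Co-op","88 Eighth St.",Sydenham,NSW,2092,2017-04-23
DETAIL,1,789,"Custard powder",8.88,8
DETAIL,2,ABC,"Apple juice",4.32,7
"""


def split_on_value(text, value):
    """Group lines into documents; a line beginning with `value` opens a new one."""
    documents = []
    for line in text.splitlines():
        # Assumption: lines before the first matching line open the first document.
        if line.startswith(value) or not documents:
            documents.append([])
        documents[-1].append(line)
    return documents


orders = split_on_value(ORDERS_CSV, "HEADER")
for number, lines in enumerate(orders, start=1):
    print(f"document {number}: {len(lines)} lines")
# document 1: 3 lines
# document 2: 3 lines
```

Each resulting group holds one HEADER line followed by its DETAIL lines, which corresponds to the two split documents described for ORDERS.CSV.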

INPUT Parameters:

TXTFILE: Required

This parameter must specify the path and name of the text document file to be split.

TXTSPLITONVALUE: Required

This parameter must specify the delimiter value that triggers a new split document when found at the position and/or CSV column specified by the TXTSPLITONVALUEPOS and TXTSPLITONVALUECOL parameters.

The activity will start a new split document for each line in the input text document file that contains the specified value at the specified position and/or CSV column.

For example, if you specify the value 'HEADER', and one (1) is specified or assumed for the TXTSPLITONVALUEPOS parameter, then each line that begins exactly with the value 'HEADER' will begin a new split document.

TXTSPLITONVALUEPOS: Optional

This parameter specifies the position to search for the delimiter value specified by the TXTSPLITONVALUE parameter. The position is relative to the whole line or to the CSV column number, if any, specified by the TXTSPLITONVALUECOL parameter.

If not specified, a default of one (1) is assumed.

TXTSPLITONVALUECOL: Optional

This parameter specifies the CSV column number to search for the delimiter value specified by the TXTSPLITONVALUE parameter.

The default value, *LINE, means that CSV parsing is not performed and the whole line is searched for the delimiter value in the specified position. For best performance (avoiding the additional CSV parsing), use this value if the delimiter value is always in the same absolute position in the line.

Otherwise, you may specify a CSV column number. In this case, the content of each line is parsed (up to the specified column number) to extract the CSV column value. Note that the CSV parsing will remove surrounding quote marks, if present. The resulting CSV column value is then searched for the specified delimiter value in the position specified by the TXTSPLITONVALUEPOS parameter.

TXTSPLITONVALUESEP: Optional

If you specify a CSV column number in the TXTSPLITONVALUECOL parameter, then this parameter specifies the CSV separator character. The separator must be a single character and can be any character. If not specified, a default of comma (,) is assumed.
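The way TXTSPLITONVALUE, TXTSPLITONVALUEPOS, TXTSPLITONVALUECOL and TXTSPLITONVALUESEP combine can be sketched in Python as follows. The function line_matches and its argument names are hypothetical, col=None stands in for *LINE, and the sketch parses the whole line with the standard csv module rather than stopping at the requested column, so it illustrates the matching rule only, not the activity's actual parsing.

```python
import csv


def line_matches(line, value, pos=1, col=None, sep=","):
    """Return True if `value` is found at 1-based position `pos`.

    With col=None (standing in for *LINE) the whole line is searched.
    Otherwise the 1-based CSV column `col` is extracted first; csv.reader
    removes surrounding quote marks from the column value.
    """
    if col is None:
        target = line
    else:
        fields = next(csv.reader([line], delimiter=sep), [])
        if col > len(fields):
            return False  # the line has no such column, so it cannot match
        target = fields[col - 1]
    # Positions are 1-based in the parameters, 0-based in Python.
    return target.startswith(value, pos - 1)


detail = 'DETAIL,1,123,"Gasket Paper",9.95,10'
print(line_matches("HEADER,12345,543", "HEADER"))    # True
print(line_matches(detail, "Gasket", col=4))         # True
print(line_matches(detail, "Paper", pos=8, col=4))   # True
print(line_matches(detail, "HEADER"))                # False
```

Note that in the third call the position is counted within the unquoted column value 'Gasket Paper', not within the whole line.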

TXTSPLITWHERE: Optional

The standard processing is that the line that meets the specified split condition starts a new split document file. For example, you might wish to start a new split document at each 'header' line, with the 'header' line as the first line in the split document.

That is the behavior when the default value '*BEFORE' is used for this parameter.

In some cases, however, it may be the last transaction line (rather than the first) that signifies that a new split document should be created. For example, imagine that your split condition identifies a 'trailer' line rather than a 'header'. In that case, the 'trailer' line should be written to the current split document, and the next line, if present, should begin a new split document.

If your case requires this behavior, then you may specify '*AFTER' for this parameter.
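The difference between '*BEFORE' and '*AFTER' can be sketched in Python as below. This is an illustration under stated assumptions, not the activity's code: the function split_lines and the predicate argument is_split_line are hypothetical names, and the sample 'H'/'D'/'T' lines are made up for the demonstration.

```python
def split_lines(lines, is_split_line, where="*BEFORE"):
    """Group lines into documents around the lines that meet the split condition."""
    documents, current = [], []
    for line in lines:
        # *BEFORE: the matching line closes the previous document and opens a new one.
        if where == "*BEFORE" and is_split_line(line) and current:
            documents.append(current)
            current = []
        current.append(line)
        # *AFTER: the matching line is the last line of the current document.
        if where == "*AFTER" and is_split_line(line):
            documents.append(current)
            current = []
    if current:
        documents.append(current)  # keep any lines left over at the end
    return documents


with_headers = ["H1", "D1", "D2", "H2", "D3"]
with_trailers = ["D1", "D2", "T1", "D3", "T2"]

print(split_lines(with_headers, lambda s: s.startswith("H")))
# [['H1', 'D1', 'D2'], ['H2', 'D3']]
print(split_lines(with_trailers, lambda s: s.startswith("T"), where="*AFTER"))
# [['D1', 'D2', 'T1'], ['D3', 'T2']]
```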

TXTSPLITPATH: Optional

This parameter specifies the path in which the split document files are to be created. If not specified, a default of *SAME is assumed. *SAME means the split document files will be created in the same location as the input text document file.

TXTSPLITROOT: Optional

This parameter specifies the root file name and the file extension for the split document files. If not specified, a default of *SAME is assumed. *SAME means the activity will use the file name and extension of the input text document file as the root file name and the file extension for the split document files. The activity will append a sequential number to the root file name to make each split document file name. For example, if you specify 'ORDER.txt' as the value for this parameter and the input file is split into three document files, then they would have the names 'ORDER1.txt', 'ORDER2.txt' and 'ORDER3.txt'.
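As a rough illustration of how TXTSPLITPATH and TXTSPLITROOT determine the output names, the following Python sketch builds the list of split document paths. The function split_file_names is hypothetical, forward-slash paths are assumed so that the result is the same on any platform, and numbering from 1 is taken from the 'ORDER1.txt' example above.

```python
import posixpath


def split_file_names(input_file, count, path="*SAME", root="*SAME"):
    """Build the output paths for `count` split documents (illustrative only)."""
    # *SAME for the path: use the folder of the input file.
    folder = posixpath.dirname(input_file) if path == "*SAME" else path
    # *SAME for the root: use the input file's own name and extension.
    base = posixpath.basename(input_file) if root == "*SAME" else root
    stem, ext = posixpath.splitext(base)
    # The sequential number is appended to the root name, before the extension.
    return [posixpath.join(folder, f"{stem}{n}{ext}") for n in range(1, count + 1)]


print(split_file_names("/orders/in/ORDERS.CSV", 2))
# ['/orders/in/ORDERS1.CSV', '/orders/in/ORDERS2.CSV']
print(split_file_names("/orders/in/ORDERS.CSV", 3, path="/orders/out", root="ORDER.txt"))
# ['/orders/out/ORDER1.txt', '/orders/out/ORDER2.txt', '/orders/out/ORDER3.txt']
```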

TXTLINETERMINATOR: Optional

This parameter specifies (on IBM i platforms only) the line terminator that is written at the end of each line for each split document. You may specify one of the following values:

*CRLF: a carriage return/line feed pair (this is the default value)

*LFCR: a line feed/carriage return pair

*CR: a carriage return

*LF: a line feed

*NL: a new line character

This value affects the line terminators in the output split documents only. When reading the input text document, the activity will be sensitive to any standard line terminators.
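The TXTLINETERMINATOR values can be pictured as a simple lookup applied when each output line is written. The Python sketch below is illustrative only: the TERMINATORS table and the function rewrite_terminators are hypothetical, and mapping *NL to the Unicode next-line character (U+0085) is an assumption. Reading is done with str.splitlines, which accepts CRLF, CR and LF terminators (among others) but would see an LF/CR pair as two separate line breaks.

```python
# Hypothetical mapping of the parameter values to output terminators.
TERMINATORS = {
    "*CRLF": "\r\n",  # carriage return/line feed pair (the default)
    "*LFCR": "\n\r",  # line feed/carriage return pair
    "*CR": "\r",      # carriage return
    "*LF": "\n",      # line feed
    "*NL": "\x85",    # assumption: new line treated as Unicode NEL (U+0085)
}


def rewrite_terminators(text, terminator="*CRLF"):
    """Re-terminate every line of `text` with the chosen terminator."""
    end = TERMINATORS[terminator]
    # splitlines() tolerates mixed input terminators; the output is uniform.
    return "".join(line + end for line in text.splitlines())


mixed = "HEADER,1\r\nDETAIL,1\nDETAIL,2\rDETAIL,3"
print(repr(rewrite_terminators(mixed, "*LF")))
# 'HEADER,1\nDETAIL,1\nDETAIL,2\nDETAIL,3\n'
```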

NOTE: On Windows server platforms, the distinct line terminator options are not supported. All of the TXTLINETERMINATOR options (*CRLF, *LFCR, *CR, *LF, *NL) result in a carriage return/line feed pair.

TXTSUPPRESSLEADZEROS: Optional

This parameter specifies whether leading zeros are suppressed in the sequential number that is appended to each split document file name. You may specify one of the following values:

*YES: leading zeros are suppressed, for example FILE1.txt, FILE10.txt and FILE100.txt (this is the default value)

*NO: leading zeros are not suppressed, for example FILE000001.txt, FILE000010.txt and FILE000100.txt. This allows the files to sort in sequence when listed by name.
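The effect of TXTSUPPRESSLEADZEROS on sorting can be seen in a short Python sketch. The function numbered_name is hypothetical, and the six-digit width is inferred from the FILE000001.txt example rather than from any documented rule.

```python
def numbered_name(stem, ext, number, suppress_lead_zeros="*YES", width=6):
    """Build one split document name, with or without leading zeros."""
    if suppress_lead_zeros == "*YES":
        digits = str(number)
    else:
        # Assumption: a fixed width of six digits, as in FILE000001.txt.
        digits = str(number).zfill(width)
    return f"{stem}{digits}{ext}"


numbers = (1, 2, 10, 100)
plain = [numbered_name("FILE", ".txt", n) for n in numbers]
padded = [numbered_name("FILE", ".txt", n, "*NO") for n in numbers]

print(sorted(plain))
# ['FILE1.txt', 'FILE10.txt', 'FILE100.txt', 'FILE2.txt']
print(sorted(padded))
# ['FILE000001.txt', 'FILE000002.txt', 'FILE000010.txt', 'FILE000100.txt']
```

With leading zeros suppressed, a plain name sort places FILE2.txt after FILE100.txt; with leading zeros kept, the name order matches the numeric order.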

OUTPUT Parameters:

TXTSPLITCOUNT:

Upon successful completion this parameter will contain the count of split documents created.

TXTSPLITLIST:

Upon successful completion this parameter will contain a list of the full file paths of the split documents created.