2012-08-08

SQL Operation Filtering

Performing export-transform-load type workflows starting with XLS, delimited, of fixed-record length files using OpenGroupware Coil's Integration Engine is simple; just define the format using a Format Definition and perform a read action.  After that the resulting StandardXML can be channeled into your RDBMS using sqlInsertAction, sqlUpsertAction, etc...  Such straight forward ETLs are almost trivial.

One message to multiple SQL actions.
But what if you need to update some records, insert some records, and maybe even delete some records?  Depending on the criteria it might be sufficient to just perform a series of XPath operations to divide the translated data into discrete methods.  Or if you need the power of OIE's XSLT extensions such as look-up tables a transformAction could serve to divide the data into messages of rows to be updated or inserted.  But both these methods would involve making multiple passes over the translated data to produce discrete messages.  In many cases these multiple passes may be good enough - but in cases where the work-flow is processes hundreds of megabytes of data or hundreds of thousands rows the cost of multiple passes becomes prohibitive.

So are you back to having to write custom code?  No.  With OpenGroupware Coils 0.1.48 OIE's SQL related actions will support operation filtering.  With operation filtering each row element in a result set can be tagged with an operation attribute indicating what SQL related actions it is a candidate for; so sqlInsertAction, for example, will only process tagged rows whose tag is "insert".  Utilizing this method of tagging a single transformation with an XSLT stylesheet can produce a single message that can subsequently be processed by multiple actions - each of which will only processes the intended rows.

This change will not break existing work-flows as untagged rows are treated as if they are tagged with "any", all untagged rows will be processed by every action they pass through.
But if tagged the sqlUpdateAction will only process row elements with an operation attribute of "update", sqlInsertAction will only process row elements tagged as "insert", and sqlDeleteAction will only process row elements tagged as "delete". Row elements may also be explicitly tagged for operation "any" as well; doing so will yield the same behavior as if the row was not tagged with an operation.
Stanza 000060
start:sqlInsertAction
debug:logic command is "action::sql-insert"
info:Table is "workorder"
info:3 records inserted to SQL connection
complete:No description.
duration:0.228353s

Stanza 000065
start:sqlUpdateAction
debug:logic command is "action::sql-update"
error:Update target is table "workorder"
1876 records processed.
complete:No description.
duration:2.281057s
Text 1: What a process log might look like when using operations filters; in this process both the sqlInsertAction in stanza 000060 and the sqlUpdateAction in stanza 000065 processed the same message as input.
The sqlUpsertAction is a special case.  Since upsert is actually a hybrid operation the tagging doesn't make sense in that context and the operation attributes are ignored by sqlUpsertAction.

Using the format classes in conjunction with the power of OIE's XSLT extentions such as lookup tables as well as operation filtering it is possibly to efficiently perform almost any to-RDBMS type ETL work-flow.

No comments:

Post a Comment