Channel: TIBCO Scribe Blog | Scribe Software

Understanding Scribe Online’s Batch processing

A great way to improve your integration or migration performance is to use batch processing. In Scribe Online, batch processing allows Agents to collect a set of source data into a batch (or group) before sending that data to the Target Connector. Once the Target Connector receives the batch, the operation is committed using the target endpoint API's method for bulk, batch, or asynchronous processing. For Connectors that support batch processing, such as Microsoft Dynamics™ CRM and Salesforce®, performance is improved because their APIs accept operations sent asynchronously or as an array, processing the data set much faster.

Batch processing with Scribe Online does:

- Amortize the cost of operations when there are a large number of source rows to process.
- Reduce the number of round-trips between Scribe and the target application.
- Use the API quota on target applications (such as Salesforce) more efficiently.

Batch processing with Scribe Online does not:

- Work atomically against your target application (that is, individual records can succeed and fail within a batch).
- Perform faster lookups.
- Reduce the number of records processed.

Configuring your Batch

Not all Connectors support batch processing, but for those that do, you can create batches for Insert, Update, Delete, and Upsert operations. You determine the batch size when you configure a Scribe Online Connection. The default batch size is 2,000 source records; however, different applications have different limits:

- Salesforce SOAP API – limit of 200 records per batch.
- Salesforce Bulk API – limit of 10,000 records per batch.
- Dynamics CRM API – no hard limit; a maximum batch size of 2,000 – 3,000 is recommended.
- Marketo API – limit of 100 records per batch.
- ExactTarget API – no hard limit; a maximum batch size of 100 is recommended.
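The per-connector limits above can be sketched in a few lines. This is an illustrative Python sketch, not Scribe code: the dictionary keys, function names, and clamping behavior are assumptions used to show how a configured batch size might be capped at an API's limit and the source rows split accordingly.

```python
# Illustrative sketch (not Scribe's implementation): clamp a configured
# batch size to a target API's documented limit, then split source rows
# into batches. The limit values are the ones cited in this article.
API_BATCH_LIMITS = {
    "salesforce_soap": 200,
    "salesforce_bulk": 10_000,
    "marketo": 100,
}

def split_into_batches(rows, configured_size, api):
    """Clamp the batch size to the API limit, then yield batches of rows."""
    effective = min(configured_size, API_BATCH_LIMITS.get(api, configured_size))
    for i in range(0, len(rows), effective):
        yield rows[i:i + effective]

# 2,000 source rows with a configured batch size of 2,000 against the
# Salesforce SOAP API become 10 batches of 200 rows each.
batches = list(split_into_batches(list(range(2000)), 2000, "salesforce_soap"))
```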
If you configure a batch size larger than your target application permits, most Connectors will correct the batch to the largest size the API accepts during processing. For example, if you send 2,000 source rows with an Upsert operation and a batch size of 2,000 to the Salesforce SOAP API, Scribe will send 10 batches of 200 rows.

How it works

Each Scribe Online map starts the same way: the Agent creates the connections needed in the map, then runs the query specified by the user. The results of the query are returned and cycled through each operation top-down, one source row at a time. When batch is enabled, the Agent does not send the data directly to the Connector to be committed to the endpoint. Instead, the Agent holds (batches) a set of records until the batch size is reached. The batch size is evaluated per operation, on each source row.

[Diagram: high-level view of how the Agent processes a map definition that includes a batch operation.]

Lookups

Performance testing reveals that including a Lookup operation in a map designed for batch processing greatly reduces the expected performance boost. The size of the decrease depends on where the Lookup takes place (for example, a web service will likely be much slower than a local SQL API lookup). While the Agent can process field mappings, If/Else statements, functions, and so on very quickly, a Lookup introduces a bottleneck that depends on the Lookup endpoint's API returning the value of your Lookup in a timely fashion.

[Diagram: with a Lookup against a target endpoint, Scribe must perform a round-trip call before it can fully prepare the batch.]

Alternatives to Lookups

If you need to use Lookups, consider cross-referencing the source and target key values to lower the number of Lookups required in everyday integrations.
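That cross-referencing pattern can be sketched as follows. This is a hypothetical Python sketch: the function names and the `target_key` field are assumptions, standing in for map steps, not Scribe's actual API. Rows that already carry the target's key stay on the batch path; only rows missing the key pay for a round-trip Lookup.

```python
# Hypothetical sketch (names assumed; not Scribe's API) of cross-referencing:
# cache the target endpoint's key on the source row so that Update steps can
# stay on the batch path instead of performing a Lookup.
def route_row(row, lookup_target_key, single_update, batch_update, save_key):
    if row.get("target_key") is None:
        # No cached key: perform the Lookup, run a non-batch update,
        # then write the target key back to the source row.
        key = lookup_target_key(row)          # round-trip to the target API
        single_update(key, row)
        save_key(row, key)
    else:
        # Cached key present: no Lookup needed; prepare the batch update.
        batch_update(row["target_key"], row)

# Tiny simulation: only the row without a cached key triggers a Lookup.
calls = {"lookup": 0, "single": 0, "batch": 0}
def fake_lookup(row):
    calls["lookup"] += 1
    return "key-%s" % row["id"]

rows = [{"id": 1, "target_key": None}, {"id": 2, "target_key": "key-2"}]
for r in rows:
    route_row(
        r,
        fake_lookup,
        lambda key, row: calls.__setitem__("single", calls["single"] + 1),
        lambda key, row: calls.__setitem__("batch", calls["batch"] + 1),
        lambda row, key: row.__setitem__("target_key", key),
    )
```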
A small modification to the map above checks for a foreign key (from the target endpoint) on a source-side field. IF this source field is NULL, the Lookup is performed, a non-batch operation occurs, and the target key is updated on the source. ELSE, the batch update is prepared and sent to the target to update that record. By adding target keys to your source data, your Update steps no longer require a Lookup, making use of true batch processing. In addition, fewer Lookups mean fewer performance hits.

Performance

I've posted my results below for processing 20,000 records (and 300,000 where storage permitted) using the same maps, processing the same data, with batch enabled and disabled for contrast. Rows processed per second is calculated as:

Rows Processed / Execution Time in Seconds = Records per Second

When running non-batch operations, both Solutions ran under 10 records/second against Dynamics CRM Online and Salesforce.com. When the same maps used batch processing at their optimal sizes, performance increased dramatically. These statistics will give you an idea of the boost batch processing can provide your integrations. However, note that these results are specific to my laptop, database configuration, and endpoints. You may see better or worse performance depending on circumstances such as indexes, triggers/workflows, network and Internet connection speed, and so on.

Both the Salesforce and Dynamics CRM web services performed at about the same pace with batch turned on, at about 130 records processed per second. Typically, when working with web services, there is lag time waiting for network traffic over HTTP(S) and for responses from the APIs. Unlike web services, connections to database APIs, such as Microsoft® SQL Server, are much faster – even over a network.

[Chart: batch processing throughput with Scribe Online's Dynamics CRM and Salesforce Connectors.]
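The throughput formula above is straightforward to express directly. Note the 154-second figure below is back-calculated from the ~130 records/second quoted for the batched web-service runs; it is not a measured number from the article.

```python
# Rows processed per second, as defined above:
#   Rows Processed / Execution Time in Seconds = Records per Second
def records_per_second(rows_processed, execution_seconds):
    return rows_processed / execution_seconds

# 20,000 rows in roughly 154 seconds works out to about the ~130
# records/second reported for the batched web-service runs.
rate = records_per_second(20_000, 154)
```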
At high data volumes, the results yield performance that exceeds that of the same maps run against a SQL Server target!

Other Notes

You cannot use the result fields from a batch operation, because that operation may not have happened yet at the time of evaluation – therefore, result fields of batch operations are not exposed. Doing a huge initial sync and still want to use batch processing? Since result fields are disabled [...]
