I am currently trying to import large text files with contact information and general journal lines into Navision, using the native database. These files are plain txt files, between 200 MB and 400 MB in size, with up to 3.5 million lines. My first idea was to import the data with dataports, but after the first 10% the process gets really slow, and it took more than 10 hours to process a 250 MB text file.

My second attempt was a codeunit reading the file line by line, with the same effect: the performance drops off just as it does with the dataports. I assumed this slowdown was caused by database handling and memory, so I wrote some functions to split these large files into smaller pieces and to import the smaller files automatically in a “batch process”, committing to the database after every file is completely imported. Again, the first couple of files (only 20 MB in size) get imported at a reasonable speed, but afterwards the speed once more decreases dramatically, and it took more than 7 hours to import 20 files of 15 MB each.

Any ideas how this could be resolved to get reasonable performance? Or some hint on how to “reset” the memory or cache usage? Saludos Nils
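For reference, the codeunit variant is basically a plain read loop along these lines (just a sketch; the staging table "Import Buffer", its fields and the file path are placeholders, not my real objects):

  // Variables: ImportFile : File; Line : Text[1024];
  //            ImportBuffer : Record "Import Buffer"; LineNo : Integer
  ImportFile.TEXTMODE(TRUE);                  // text mode: one line per READ
  ImportFile.OPEN('C:\Import\contacts01.txt');
  LineNo := 0;
  WHILE ImportFile.READ(Line) > 0 DO BEGIN
    LineNo := LineNo + 1;
    ImportBuffer.INIT;
    ImportBuffer."Line No." := LineNo;
    ImportBuffer."Line Text" := COPYSTR(Line,1,250);
    ImportBuffer.INSERT;                      // no OnInsert trigger, no validation
  END;
  ImportFile.CLOSE;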
Try to increase the client cache
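If the database is opened directly in the classic client, the cache can also be set via the startup parameters of the shortcut, roughly like this (cache is given in KB; the database path is only a placeholder and the exact parameter names may differ per version):

  fin.exe database=C:\Navision\mydb.fdb, cache=100000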
Two different suggestions: 1) Check the key optimization once the import starts to become slow, and optimize the keys before importing the next chunks. 2) Import the files without validations into a 50000-range table with the exact field structure of the files, instead of into the final tables; then you will have more control over performance later. I really think this is just a problem of validating fields in the tables, and of the keys, which become highly fragmented during the import.
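To sketch the second suggestion (the buffer table, field and variable names are made up, and journal template/batch handling and line numbering are left out):

  // Step 1: raw insert into the 50000-range buffer table, no VALIDATE calls:
  ImportBuffer.INIT;
  ImportBuffer."Account No." := AccountNoTxt;   // values parsed from the text file
  ImportBuffer.Amount := AmountDec;
  ImportBuffer.INSERT(FALSE);                   // no OnInsert trigger

  // Step 2, run later and in smaller batches: move the data into the real
  // journal with validation, separated from the raw file import:
  IF ImportBuffer.FIND('-') THEN
    REPEAT
      GenJnlLine.INIT;
      GenJnlLine.VALIDATE("Account No.",ImportBuffer."Account No.");
      GenJnlLine.VALIDATE(Amount,ImportBuffer.Amount);
      GenJnlLine.INSERT(TRUE);
    UNTIL ImportBuffer.NEXT = 0;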
quote:
Import the files without validations into a 50000 table with the exact field structure as you have in the files instead of the final tables.
That’s in fact what I am doing: I import the data into such a table, with minimal data validation (just one field) and only 2 keys consisting of one field each. There shouldn’t be too much load from that side…
quote:
Try to increase the client cache
That was really a good hint. For the moment I increased the DBMS cache to 100 MB, and the speed was fine until the imported text files filled up these 100 MB, so it looks like I should reserve as much cache as possible for Navision during these imports. Any idea whether it’s possible to clear or reset this DBMS cache at runtime, e.g. after each file has been processed and committed? Saludos Nils
Isn’t that what the COMMITCACHE parameter is doing?
quote:
Isn’t that what the COMMITCACHE parameter is doing?
That’s what I thought as well, but so far no luck… it looks like increasing the DBMS cache to the maximum possible is the way to go… Saludos Nils
Try putting a COMMIT; in the OnPostDataItem trigger of the dataport.
I would think the COMMIT would work too, but it is probably better to commit every x number of records. Running one single COMMIT for all 3.5 million records may hurt performance. I’d say run a counter and commit every 100th record or something like that.
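A rough sketch of that counter idea, e.g. inside the import loop of the codeunit or in the OnAfterImportRecord trigger of the dataport (CommitCounter is an Integer variable; the interval is just an example):

  CommitCounter := CommitCounter + 1;
  IF CommitCounter MOD 100 = 0 THEN
    COMMIT;   // close the transaction so the cache can be written out and reused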
Nils has already split the big files and imports them one by one, and I would guess that he has already considered calling a COMMIT after each chunk.
Got it… The commit cache on the client was left at the default (blank); setting it explicitly to “Yes” solved the speed issue, and all files now get imported at the same speed! I am down to 20 minutes to import the previously mentioned 20 x 15 MB chunks, which before took around 7 hours. By the way, I was already committing after each “small” file import, but on its own that didn’t help as expected… Thanks for your input. Saludos Nils
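For anyone searching later: as far as I remember the same setting can also be passed as a startup parameter when the database is opened directly in the classic client, e.g. (cache in KB; the database path is a placeholder):

  fin.exe database=C:\Navision\mydb.fdb, cache=100000, commitcache=yes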
Funny. Why haven’t any of you mentioned the possibility of using a temporary table? In your import codeunit, create a variable of type Record with the final table as its subtype, and set the Temporary property of the variable to “Yes”. Import your files into the temporary table, and when the import has finished, transfer the data from the temporary table to the table in the database. A temporary table only exists on the client, and only as long as the codeunit is active.
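A minimal sketch of that approach (TempGenJnlLine is a Record variable for table 81 "Gen. Journal Line" with the Temporary property set to Yes; parsing, journal template/batch and the remaining fields are omitted):

  // Collect the lines in client memory first:
  WHILE ImportFile.READ(Line) > 0 DO BEGIN
    LineNo := LineNo + 10000;
    TempGenJnlLine.INIT;
    TempGenJnlLine."Line No." := LineNo;
    // ... parse Line and fill the remaining journal fields ...
    TempGenJnlLine.INSERT;          // writes to memory only, not to the database
  END;

  // Afterwards transfer everything to the real table:
  IF TempGenJnlLine.FIND('-') THEN
    REPEAT
      GenJnlLine := TempGenJnlLine; // record assignment copies all fields
      GenJnlLine.INSERT;
    UNTIL TempGenJnlLine.NEXT = 0;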
Because using a temporary General Journal Line needs much more management (because of the dimensions, which in this case are NOT temporary)? Nils already took the easiest way by importing into an “interim” table.
quote:
Why haven’t any of you mentioned the possibility of using a temporary table?
It might be possible to improve the performance using temporary tables, but these temporary tables must also use the DBMS cache, and the point is that I have to commit from time to time to free the cache again. I’ll give it a try, I guess it’s not too much work, and will let you know… The point here is the huge amount of data; the data for the general journal itself is simple, with just global dimensions. In the end, the split-and-import-per-small-file approach with a commit after each file did prove to work fine. Thanks for all your ideas! Saludos Nils