C++ Performance tips for dealing with large XML files


Character encoding takes time. If you know your XML data contains only ASCII characters, you can turn off the Unicode conversion routines by calling the LtXmlLib16::ASCIIOnly() global function at the start of your application. This can significantly improve performance because it removes the need to perform UTF data conversion.
Warning: If your data does contain extended (non-ASCII) characters, they will be lost or misinterpreted when this flag is turned on.
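
Because of the warning above, it can be worth checking the input before relying on the ASCII-only flag. The following is a minimal sketch of such a check. IsPureAscii is a hypothetical helper written for this illustration, not part of LtXmlLib16, and it assumes the raw file content is already available as a byte string.

```cpp
#include <string>

// Hypothetical helper (not part of LtXmlLib16):
// returns true when every byte in the buffer is 7-bit ASCII (0x00-0x7F).
bool IsPureAscii(const std::string& data)
{
    for (unsigned char ch : data)
    {
        if (ch > 0x7F)
            return false; // found an extended character
    }
    return true;
}
```

If the check fails for any representative sample of your data, it is probably safer to leave the flag off.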

For Win32 C++, you can reduce the memory footprint by calling these global functions at the start of your application:
LtXmlLib16::AsyncToXmlFile()
LtXmlLib16::AsyncFromXmlFile()
This causes XML data to be read and written asynchronously in chunks, which can improve performance and reduce the memory overhead when reading XML into, and writing XML out of, the library.
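
To illustrate why working in chunks keeps the memory overhead down, here is a minimal sketch of chunked reading using only the C++ standard library. It is not how LtXmlLib16 implements its asynchronous I/O (that is internal to the library), and it is synchronous. ReadInChunks and CountChunks are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the stream in blocks of at most chunkSize bytes
// and hands each block to onChunk. Only one block is held in memory at a time.
// Returns the total number of bytes read.
std::size_t ReadInChunks(std::istream& in, std::size_t chunkSize,
                         const std::function<void(const char*, std::size_t)>& onChunk)
{
    std::vector<char> buffer(chunkSize);
    std::size_t total = 0;
    while (in)
    {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break; // nothing more to read
        onChunk(buffer.data(), got);
        total += got;
    }
    return total;
}

// Convenience wrapper for experimenting: counts the chunks produced
// when an in-memory string is read chunkSize bytes at a time.
std::size_t CountChunks(const std::string& text, std::size_t chunkSize)
{
    std::istringstream in(text);
    std::size_t chunks = 0;
    ReadInChunks(in, chunkSize, [&](const char*, std::size_t) { ++chunks; });
    return chunks;
}
```

The peak buffer size here is chunkSize bytes regardless of how large the input is, which is the property the chunked mode is meant to give you.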

When dealing with very large XML files (30MB+), you may be able to process them in blocks.
You can provide a callback function that is called as each element is processed in FromXML.
This approach lets you handle very large files, or start processing a file before the whole of it is loaded (which can significantly improve performance if you are writing a multithreaded application).

If your XML data is in a form similar to this:

<root elm>
    <Transactions>
        <Transaction>
            .
        </Transaction>
        <Transaction>
            .
        </Transaction>
        Etc.
    </Transactions>
</root elm>

This approach allows you to load, process, and discard each Transaction as it is read.

This is useful when dealing with large files with repeating elements (e.g. bulk imports, batch processing). It can significantly reduce the memory footprint and speed up processing at load time, allowing you to deal with files of any size.

#include "StdAfx.h"
#include "../EventExampleLib.h"
#include "../EventExampleLib/EventExample.h"

using namespace LtXmlLib16;

// forward declarations
void SimpleTestEventExampleLibCEventExample(LPCTSTR);

/// <summary>
/// The main entry point for the application.
/// </summary>
int main(int argc, char* argv[])
{
    /* ---------------------------------------------------------------------------------
     * This will load the sample file
     * "D:\Temp\EventExample\EventExample.xml"
     * and demonstrate briefly how to use it.
     * --------------------------------------------------------------------------------- */
    SimpleTestEventExampleLibCEventExample(_T("D:\\Temp\\EventExample\\EventExample.xml"));

    getchar();
   
    return 0;
}

// final document to write back out to file
EventExampleLib::CEventExamplePtr g_doc = EventExampleLib::CEventExample::CreateInstance();

bool CallBackProcessTransaction(LtXmlLib16::CXmlElement* pElement)
{
    // process each <Transaction> element, one at a time
    if (pElement->GetLocalName() == _T("Transaction"))
    {
        EventExampleLib::CTransactionPtr spTrans = EventExampleLib::CTransaction::CreateInstance();
        spTrans->FromXmlElement(pElement);

        // Do Stuff with Transaction object....


        // Add Transaction into object tree if required.
        // Note: only do this if you want a complete object model when load is complete.
        // You would want to skip this if you were reading very large files.
        g_doc->GetTransactions()->GetTransactionCol()->Add(spTrans);

        return true;
    }

    // ignore all other elements
    return false;
}


void SimpleTestEventExampleLibCEventExample(LPCTSTR lpctFilename)
{
    try
    {
        // This will fire the CallBackProcessTransaction event when elements are loaded.
        // The event handler will deal with each event one at a time.
        LtXmlLib16::CXmlDocument xmlDocProcessTrans(*CallBackProcessTransaction);
        xmlDocProcessTrans.Load(lpctFilename);

        _tprintf(_T("Transaction In Object Tree: %d\n\n\n"), g_doc->GetTransactions()->GetTransactionCol()->GetCount());


        _tprintf(_T("Output Objects:\n\n"));
        // Print via "%s" so any '%' characters in the XML are not treated as format specifiers.
        _tprintf(_T("%s"), g_doc->ToXml().c_str());
    }
    catch (CLtException& e)
    {
        // Note: exceptions are likely to contain inner exceptions
        // that provide further detail about the error. GetFullMessage
        // concatenates the messages from them all.
        _tprintf(_T("Error - %s\n"), e.GetFullMessage().c_str());
    }
}
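
The callback above adds every Transaction to g_doc, which, as its comments note, you would skip when reading very large files. The following is a minimal sketch of the process-and-discard variant. It uses stand-in types so that it compiles on its own: Element, Totals, ProcessAndDiscard and RunOver are hypothetical names for this illustration and are not part of LtXmlLib16 or the generated EventExampleLib.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a parsed element. This is NOT LtXmlLib16::CXmlElement.
struct Element
{
    std::string localName;
    double amount; // hypothetical payload carried by a Transaction
};

// Running totals kept in place of a complete object tree.
struct Totals
{
    std::size_t count = 0;
    double sum = 0.0;
};

// Same shape as CallBackProcessTransaction: returns true when the element
// was handled, false when it was ignored. Nothing is stored per element.
bool ProcessAndDiscard(const Element& element, Totals& totals)
{
    if (element.localName != "Transaction")
        return false;

    ++totals.count;
    totals.sum += element.amount;
    return true;
}

// Stand-in for the loader: feeds elements to the callback one at a time.
// (A real loader would produce them from the file as it is read.)
Totals RunOver(const std::vector<Element>& elements)
{
    Totals totals;
    for (const Element& element : elements)
        ProcessAndDiscard(element, totals);
    return totals;
}
```

With this shape, memory use depends on the size of one Transaction and the totals, not on the number of Transactions in the file.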