What can Differencia do for me?
We announced the forthcoming release of Differencia last month, and we want to provide some more information on what the first release will be able to do.
File Formats
The key problem that Differencia solves is comparing data files that have different formats. The difference might simply be that fields are in a different order, or it might be more complex, such as different text encodings, field separators and line endings.
We also handle differences in field formats, such as those that arise from international date and number formats. So, for example, we can correctly match all of “6 September 2007”, “9/6/07” (written month first) and “2007-09-06”.
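As a rough illustration of the idea, and not Differencia's actual implementation, the following Python sketch normalises the three date strings above to a single value. The `parse_date` helper and the `CANDIDATE_FORMATS` list are hypothetical names chosen for this example, and month names are assumed to be in English.

```python
from datetime import date, datetime

# Hypothetical list of formats to try. The order matters for ambiguous
# values: "9/6/07" is read month first here, as in the example above.
CANDIDATE_FORMATS = ["%d %B %Y", "%m/%d/%y", "%Y-%m-%d"]


def parse_date(text, formats=CANDIDATE_FORMATS):
    """Return a date for the first candidate format that fits the text."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # this format did not fit, try the next one
    raise ValueError(f"unrecognised date: {text!r}")


parsed = [parse_date(s) for s in ("6 September 2007", "9/6/07", "2007-09-06")]
print(parsed[0] == parsed[1] == parsed[2])
```

Once every value is reduced to the same internal representation, comparing fields across formats becomes an ordinary equality check.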
Even if you are comparing two files of the same format, simply having your records sorted in a different order can make automated file comparison difficult. In Differencia you can specify key fields that uniquely identify records. Differencia will use these to match records in your files.
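To show why key fields make sort order irrelevant, here is a minimal Python sketch of key-based matching. It is only an assumption about how such matching could work; the function names and the sample field names (`id`, `name`) are invented for illustration.

```python
def index_by_key(records, key_fields):
    """Build a lookup from key-field values to the full record."""
    index = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key in index:
            raise ValueError(f"key fields do not uniquely identify: {key}")
        index[key] = rec
    return index


def compare(left, right, key_fields):
    """Return keys only in left, keys only in right, and keys whose records differ."""
    l = index_by_key(left, key_fields)
    r = index_by_key(right, key_fields)
    only_left = sorted(l.keys() - r.keys())
    only_right = sorted(r.keys() - l.keys())
    changed = sorted(k for k in l.keys() & r.keys() if l[k] != r[k])
    return only_left, only_right, changed


file_a = [{"id": "1", "name": "Ann"}, {"id": "2", "name": "Bob"}]
file_b = [{"id": "3", "name": "Cy"}, {"id": "2", "name": "Rob"}, {"id": "1", "name": "Ann"}]
print(compare(file_a, file_b, ["id"]))
```

Because records are looked up by key rather than by position, the two files can be in any order.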
Of course, specifying all the fields of a large file can be a chore, so Differencia will identify which of your fields are dates, times or numbers, and even which specific format they use. To remove even this step, we will be hosting an online template database where users can share common file formats.
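Field detection of this kind can be approximated by testing sample values against a few known patterns. The sketch below is a simplified guess at the approach, not a description of Differencia itself; `infer_field_type` and the short format lists are assumptions made for this example.

```python
from datetime import datetime

# Assumed, deliberately short lists of formats to recognise.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%y")
TIME_FORMATS = ("%H:%M:%S", "%H:%M")


def _fits(value, formats):
    """True if the value parses with at least one of the given formats."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


def _is_number(value):
    try:
        float(value.replace(",", ""))  # tolerate thousands separators
        return True
    except ValueError:
        return False


def infer_field_type(values):
    """Guess 'number', 'date', 'time' or 'text' for a column of strings."""
    checks = [
        ("number", _is_number),
        ("date", lambda v: _fits(v, DATE_FORMATS)),
        ("time", lambda v: _fits(v, TIME_FORMATS)),
    ]
    for name, check in checks:
        if values and all(check(v) for v in values):
            return name
    return "text"


print(infer_field_type(["2007-09-06", "2007-10-01"]))
```

A column is only given a type when every sample value fits it; anything else falls back to plain text.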
What to Compare
The only configuration you really need to do is to tell Differencia which fields from each file you want to compare. Again, Differencia will make some sensible choices for you. Additionally, our online template database will include predefined comparison definitions for common file formats.
We hope that, with strong community support, you will be able to compare your data with no configuration at all.
Intelligent Comparisons
Of course, things are not always that simple. What if you are trying to compare separate date and time fields in one file with a single date-time field in another? For example, you want to match “6 September 2007 11:30PM” with separate date “6/9/07” and time “23:30:33” fields. Differencia will simply do the right thing, correctly matching these fields, right down to ignoring the seconds where appropriate.
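The date and time example above can be sketched in Python as follows. This is an illustration under stated assumptions rather than Differencia's own logic: the helper names are hypothetical, the date is assumed to be day first, and "ignoring the seconds" is modelled by truncating both values to the minute.

```python
from datetime import datetime


def combine(date_text, time_text):
    """Join separate day-first date and 24-hour time fields into one value."""
    return datetime.strptime(f"{date_text} {time_text}", "%d/%m/%y %H:%M:%S")


def same_minute(a, b):
    """Compare two date-times while ignoring seconds and microseconds."""
    return a.replace(second=0, microsecond=0) == b.replace(second=0, microsecond=0)


single = datetime.strptime("6 September 2007 11:30PM", "%d %B %Y %I:%M%p")
split = combine("6/9/07", "23:30:33")
print(same_minute(single, split))
```

Truncating to the coarser of the two precisions is one simple choice; a tolerance window would be another.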
Imagine that you are comparing bank account data. In your internet banking statement, you have an “Amount” field that includes both withdrawals and inward payments. In your personal accounting software file, you have separate Debit and Credit fields. Differencia will handle this correctly. If you are not using an existing template, all you have to do is tell Differencia which is the Debit field and which is the Credit field. If you are using a template, of course, there is nothing to do other than select the files to compare.
With bank records, there is not always a unique identifier for each transaction that is common to your online banking statement and your personal accounts system. That is why Differencia allows you to use any combination of fields to identify each record. So you can use the date and value of each transaction to match between the two files, even when one of them has separate Debit and Credit fields.
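A minimal Python sketch of this matching, assuming dictionary records with hypothetical field names `Date`, `Amount`, `Debit` and `Credit`, might look like the following. Treating withdrawals as negative amounts is also an assumption of the example, not something Differencia is confirmed to do.

```python
from decimal import Decimal


def signed_amount(record):
    """Return one signed value whether the record has Amount or Debit/Credit."""
    if "Amount" in record:
        return Decimal(record["Amount"])
    credit = Decimal(record.get("Credit") or "0")  # empty field counts as zero
    debit = Decimal(record.get("Debit") or "0")
    return credit - debit


def match_key(record):
    """Composite key: transaction date plus signed value."""
    return (record["Date"], signed_amount(record))


statement = {"Date": "2007-09-06", "Amount": "-45.00"}
accounts = {"Date": "2007-09-06", "Debit": "45", "Credit": ""}
print(match_key(statement) == match_key(accounts))
```

Using `Decimal` rather than floating point keeps "45" and "45.00" equal without rounding surprises.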
Summary
That’s just a taster of what Differencia can do for you. We’ve done our best to make the process of comparing data as painless as possible. In particular we’ve tried to minimise the configuration so that you can simply select two files and run the comparison.
We’ll have screenshots to illustrate all of this next week.



