A parser for MT940 bank statements

by Sander Marechal

I am working on a new project and needed to parse some MT940 files. MT940 is a common exchange format for bank statements, and most banks let you export your statements in it. I had a look around but wasn't quite happy with the existing parsers, so I decided to implement one myself. This also gave me an opportunity to try out travis-ci and composer/packagist, neither of which I had used before.

So, here is jejik/mt940 on GitHub. You can install it using composer and check the build status on travis-ci. Using the library is very easy:

    <?php

    use Jejik\MT940\Reader;

    $reader = new Reader();
    $statements = $reader->getStatements(file_get_contents('mt940.txt'));

    foreach ($statements as $statement) {
        echo $statement->getOpeningBalance()->getAmount() . "\n";

        foreach ($statement->getTransactions() as $transaction) {
            echo $transaction->getAmount() . "\n";
        }

        echo $statement->getClosingBalance()->getAmount() . "\n";
    }

At the moment four banks are supported: ABN-AMRO, ING, Rabobank and Triodos bank. I'd be happy to add support for your bank as well. Just send me a pull request on GitHub with your parser. Make sure that you also add a unit test for it that parses a test document. You can redact personal information from the test document (e.g. use '123456789' for the account number, etcetera).
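As a rough illustration of that kind of redaction, here is a minimal sketch that swaps the account number on the `:25:` (account identification) line for a placeholder. The function name `redactAccountNumber` and the sample text are made up for this example and are not part of jejik/mt940. It only touches `:25:` lines, so account numbers that appear elsewhere (for instance in `:86:` descriptions) would still need to be redacted by hand.

```php
<?php

/**
 * Replace the contents of every :25: line with a placeholder.
 *
 * Hypothetical helper for preparing test documents; not part of jejik/mt940.
 */
function redactAccountNumber(string $text, string $placeholder = '123456789'): string
{
    // [^\r\n]* stops before the line ending, so CRLF line endings survive.
    // A callback is used so the placeholder is never read as a backreference.
    return preg_replace_callback(
        '/^(:25:)[^\r\n]*/m',
        function (array $match) use ($placeholder) {
            return $match[1] . $placeholder;
        },
        $text
    );
}

// Made-up sample fragment with CRLF line endings.
$sample = ":20:STATEMENT1\r\n:25:NL00BANK0123456789\r\n:28C:1\r\n";
$redacted = redactAccountNumber($sample);

echo $redacted;
```

After redacting, it is worth checking that the test document still parses the same way as the original, since small changes in line contents can change how a parser behaves.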

I am also happy to implement a parser for you, if you prefer that. Just open an issue on GitHub and I will contact you privately, or use the contact form on this website. I will need an unredacted MT940 file from your bank. It needs to be unredacted because the MT940 format isn't well defined and can be fickle. If you redact it, it is possible that the parser I write will work on the file you supplied but not on the real thing. Of course, I will redact the file for you when I add it to my unit tests.

Jejik/MT940 is licensed under the MIT license. Have fun with it!

Creative Commons Attribution-ShareAlike

Comments

#1 Merijn Schering (http://www.group-office.com)

Thanks a lot!

I found and fixed a bug with rtrim in the AbstractParser. If the transaction line contains spaces or tabs at the end, the description will never be found. I basically removed rtrim in getLine and added it in splitTransactions:

    protected function getLine($id, $text, $offset = 0, &$position = null)
    {
        $pcre = '/(?:^|\r\n)\:(' . $id . ')\:'   // ":<id>:" at the start of a line
              . '(.+)'                           // Contents of the line
              . '(:?$|\r\n\:[[:alnum:]]{2,3}\:)' // End of the text or next ":<id>:"
              . '/Us';                           // Ungreedy matching

        // Offset manually, so the start of the offset can match ^
        if (preg_match($pcre, substr($text, $offset), $match, PREG_OFFSET_CAPTURE)) {
            $position = $offset + $match[1][1] - 1;
            // MS: Don't rtrim here or offsets won't match in splitTransactions
            // (was: return rtrim($match[2][0]);)
            return $match[2][0];
        }

        return '';
    }

    /**
     * Split the text into separate statement chunks
     *
     * @param string $text Full document text
     * @return array Array of statement texts
     * @throws \RuntimeException if the statementDelimiter is not set
     */
    protected function splitStatements($text)
    {
        if ($this->statementDelimiter !== null) {
            $chunks = preg_split('/^' . $this->statementDelimiter . '\r$/m', $text, -1);
            return array_filter(array_map('trim', $chunks));
        }

        throw new \RuntimeException('No statementDelimiter set');
    }

    /**
     * Split transactions and their descriptions from the statement text
     *
     * Returns a nested array of transaction lines. The transaction line text
     * is at offset 0 and the description line text (if any) at offset 1.
     *
     * @param string $text Statement text
     * @return array Nested array of transaction and description lines
     */
    protected function splitTransactions($text)
    {
        $offset = 0;
        $position = 0;
        $transactions = array();

        while ($line = $this->getLine('61', $text, $offset, $offset)) {
            // Advance past ":61:" (4), the untrimmed line and "\r\n" (2)
            $offset += 4 + strlen($line) + 2;
            $transaction = array(rtrim($line));

            // See if the next description line belongs to this transaction line.
            // The description line should immediately follow the transaction line.
            $description = array();
            while ($line = $this->getLine('86', $text, $offset, $position)) {
                if ($position == $offset) {
                    // Advance past ":86:" (4), the untrimmed line and "\r\n" (2)
                    $offset += 4 + strlen($line) + 2;
                    $description[] = rtrim($line);
                } else {
                    break;
                }
            }

            if ($description) {
                $transaction[] = implode("\r\n", $description);
            }

            $transactions[] = $transaction;
        }

        return $transactions;
    }
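
To make the offset problem concrete, here is a small standalone sketch (not taken from the library) of the arithmetic that splitTransactions relies on. The sample text and variable names are invented for illustration. It shows that an offset computed from the untrimmed line lands exactly on the following `:86:` tag, while an offset computed from the trimmed line falls short by the number of trailing characters that were removed, which is why the `$position == $offset` check then fails.

```php
<?php

// Made-up fragment: a :61: line with two trailing spaces, then its :86: line.
$text = ":61:1201010101D12,34NTRFNONREF  \r\n:86:Payment description\r\n";

// Contents of the :61: line without the tag and without the CRLF.
$raw = substr($text, 4, strpos($text, "\r\n") - 4);
$trimmed = rtrim($raw);

// Where the ":86:" tag really starts.
$actual = strpos($text, ':86:');

// Same arithmetic as in splitTransactions:
// 4 characters for ":61:", the line contents, and 2 for "\r\n".
$offsetFromRaw = 4 + strlen($raw) + 2;
$offsetFromTrimmed = 4 + strlen($trimmed) + 2;

printf("actual start of :86: : %d\n", $actual);
printf("offset from raw line : %d\n", $offsetFromRaw);
printf("offset from trimmed  : %d\n", $offsetFromTrimmed);
```

With the trimmed length, the search for `:86:` starts before the line break instead of at the tag itself, so the position reported for the description no longer equals the expected offset and the description is skipped.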

#2 Sander Marechal (http://www.jejik.com)

Hi Merijn. You should give the class-injection branch on GitHub a try. That already fixes it (and adds a bunch of nice new features too). It is due to become version 0.3 soon.

#3 Sander Marechal (http://www.jejik.com)

I have just released version 0.3: See this article.