Dirty Data: What is it, how does it cause problems, and what is the solution?

Dirty data is a huge problem during implementation. It slows performance, causes crashes, and is an overall nightmare for our developers. What is it, how does it cause problems, and what is the solution?

Dirty data is data that is riddled with inconsistencies:

  • Misspellings
  • Numbers and letters in the same fields
  • Data in the wrong fields (for example, phone numbers in postal code fields)
  • Invalid data (for example, marking a donor as lapsed when they gave a gift last month)
  • Duplicate data
  • Format errors (such as a comma in an email field)
  • Incorrect, inconsistent, or misspelled titles
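
To make a few of the items above concrete, here is a minimal sketch of how such inconsistencies might be flagged in a batch of records. It is an illustration only, not Orange Leap's actual code: the field names (`postal_code`, `email`, `gift_count`), the US-style postal code pattern, and the deliberately simple email pattern are all assumptions.

```python
import re

# Hypothetical patterns; real rules depend on the locale and the data model.
US_POSTAL_CODE = re.compile(r"\d{5}(-\d{4})?")
EMAIL = re.compile(r"[^@\s,]+@[^@\s,]+\.[^@\s,]+")  # rejects commas and spaces


def find_problems(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    # Data in the wrong field: a phone number will not match the postal pattern.
    if not US_POSTAL_CODE.fullmatch(record.get("postal_code", "").strip()):
        problems.append("postal_code is not a valid postal code")
    # Format error: for example, a comma in the email field.
    if not EMAIL.fullmatch(record.get("email", "").strip()):
        problems.append("email is malformed")
    # Numbers and letters in the same field.
    if not record.get("gift_count", "").strip().isdigit():
        problems.append("gift_count is not a whole number")
    return problems


def find_duplicates(records):
    """Return emails that appear on more than one record, ignoring case."""
    seen, duplicates = set(), set()
    for record in records:
        key = record.get("email", "").strip().lower()
        if not key:
            continue  # records without an email cannot be compared this way
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates
```

Matching on email alone is a simplification chosen to keep the example short; a real duplicate check would probably also compare names and addresses.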

The main problems associated with dirty data are:

  • Slowed performance
  • Invalid reports
  • Inconsistent donor communication (for example, sending a lapsed donor letter to a current donor or failing to recognize a major donor as such)
  • Software crashes and freezes

What’s the business impact of dirty data on a non-profit?

Dirty data can cost a non-profit tens of thousands of dollars! How? It increases implementation costs. It loses donors: failing to properly recognize a donor can result in that donor not giving to your organization again. System crashes and freezes cause unnecessary downtime and generate unwanted maintenance costs for an organization.

What is the solution?

Our developers have created a program that automates the cleanup and conversion of data to facilitate a cleaner implementation. The process locates and fixes the mistakes in the data before conversion.
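
As a rough idea of what an automated cleanup pass can look like, the sketch below normalizes whitespace, titles, and email case, then drops repeated records. It is not the Orange Leap program itself: the `TITLE_FIXES` table, the field names, and the rule of keeping the first record seen for each email are hypothetical choices made for the example.

```python
# Hypothetical mapping of title variants to one canonical spelling.
TITLE_FIXES = {
    "mr": "Mr.", "mr.": "Mr.", "mister": "Mr.",
    "mrs": "Mrs.", "mrs.": "Mrs.",
    "dr": "Dr.", "dr.": "Dr.", "doctor": "Dr.",
}


def clean_record(record):
    """Return a cleaned copy of one record without modifying the original."""
    # Trim each value and collapse runs of whitespace to a single space.
    cleaned = {field: " ".join(str(value).split()) for field, value in record.items()}
    title = cleaned.get("title", "")
    cleaned["title"] = TITLE_FIXES.get(title.lower(), title)
    cleaned["email"] = cleaned.get("email", "").lower()
    return cleaned


def clean_all(records):
    """Clean every record and keep the first record seen for each email."""
    result, seen_emails = [], set()
    for record in records:
        cleaned = clean_record(record)
        email = cleaned["email"]
        if email and email in seen_emails:
            continue  # duplicate of an earlier record
        seen_emails.add(email)
        result.append(cleaned)
    return result
```

Silently dropping the later duplicate keeps the sketch small. A production conversion would more likely merge the two records so that no gift history is lost.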

At Orange Leap, our software prevents users from creating dirty data by limiting the amount of access they have. We use a variety of strategies to prevent the mistakes that lead to dirty data:

  • Pick lists eliminate input errors by forcing the user to make a choice
  • Automatic format verification (using regular expressions)
  • Business rules ensure that you don’t accidentally create a duplicate constituent or manually mark a donor as “lapsed” who has recently given a gift.
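
The three strategies above can be sketched as a single validation step that runs before a record is saved. This is an assumption-laden illustration rather than the product's real API: `validate_constituent`, the `DONOR_STATUSES` pick list, the email pattern, and the one-year definition of "lapsed" are all invented for the example.

```python
import re
from datetime import date, timedelta

DONOR_STATUSES = ("active", "lapsed", "prospect")  # hypothetical pick list
EMAIL = re.compile(r"[^@\s,]+@[^@\s,]+\.[^@\s,]+")  # simple format check
LAPSE_AFTER = timedelta(days=365)  # assumed definition of "lapsed"


def validate_constituent(new, existing_emails, today):
    """Return a list of validation errors; an empty list means the record is OK."""
    errors = []
    # Pick list: only predefined values are accepted.
    if new["status"] not in DONOR_STATUSES:
        errors.append("status must be one of the pick-list values")
    # Automatic format verification with a regular expression.
    if not EMAIL.fullmatch(new["email"]):
        errors.append("email format is invalid")
    # Business rule: no duplicate constituents.
    if new["email"].lower() in existing_emails:
        errors.append("a constituent with this email already exists")
    # Business rule: a donor who gave recently cannot be marked lapsed.
    last_gift = new.get("last_gift_date")
    if (new["status"] == "lapsed" and last_gift is not None
            and today - last_gift < LAPSE_AFTER):
        errors.append("donor gave recently and cannot be marked lapsed")
    return errors
```

For example, calling `validate_constituent` with `status` set to `"lapsed"` and a `last_gift_date` one month before `today` returns an error instead of letting the contradictory record be saved.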

Our short-term goal is to facilitate cleaning up the data during implementation, and our long-term goal is to prevent the data from getting contaminated to begin with. Once our software is installed and tested with clean data, our strategy for keeping data clean should prevent it from becoming corrupted in the future. Find out more about Orange Leap here.
