10 Techniques to Clean Data in Microsoft Excel

Published on Oct 12, 2018 | Data Entry Services

Data cleaning, also known as data cleansing or data scrubbing, involves detecting and correcting corrupt, inaccurate, incomplete, or irrelevant records in a record set, table, or database. The flawed data is then replaced, modified, or deleted. Partnering with an experienced data cleansing company helps businesses maintain a complete, accurate, and up-to-date database. Microsoft Excel is one of the most widely used tools for storing valuable data, and that data must be clean and reliable. By cleansing data, businesses can boost their customer acquisition efforts, improve decision making, streamline business practices, save money, and ensure that employees make the best use of their work hours. Clean, actionable data keeps staff from contacting customers with out-of-date information or creating invalid vendor files in the system. Working from clean records also helps maximize staff efficiency and productivity and improves response rates.

According to a survey conducted by Pharma Intelligence and sponsored by Oracle Health Sciences, a lack of confidence in clinical trial data causes delays in drug development. Besides trial delays, 81 percent of respondents pointed to data governance issues as the biggest challenge in meeting regulatory compliance. The respondents said that the top three data issues were duplicate or inconsistent data, data quality, and data integrity/traceability. The top three operational challenges with clinical trial data were data completeness (cited by 51 percent), data quality (45 percent), and data cleaning (43 percent). Companies should therefore focus more on improving the quality of their data.

Some Techniques to Clean Data in Excel

The following are some of the techniques that can be used to clean data in Excel:

    • Remove Extra Spaces – Extra spaces can be removed with the TRIM function.
      Syntax: =TRIM(Text)

      This function takes a single argument, which can be either text typed in manually or a cell reference. TRIM removes all leading and trailing spaces, and it reduces any run of spaces between words to a single space.
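
To make TRIM's behaviour concrete, the short Python sketch below mimics it outside Excel. It is an illustration only: `excel_trim` is a hypothetical helper name, and the sketch handles just the ordinary space character, leaving tabs and other whitespace untouched.

```python
def excel_trim(text: str) -> str:
    """Mimic Excel's TRIM for ordinary spaces (hypothetical helper)."""
    # Splitting on a literal space leaves empty strings wherever several
    # spaces were adjacent; dropping them collapses each run to one space
    # and also discards leading and trailing spaces.
    words = [word for word in text.split(" ") if word]
    return " ".join(words)


print(repr(excel_trim("   North   East  Region ")))  # prints 'North East Region'
```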

    • Select and Treat All Blank Cells – To fill blank cells with ‘0’ or ‘Not Available’, or simply to highlight them, select the entire data set and press F5 to open the “Go To” dialogue box. Click the Special button at the bottom left to open the “Go To Special” dialogue box, select Blanks, and click OK. This selects all the blank cells in the data set at once; they appear in grey, except for the active cell, which stays white. Type “0” or “Not Available” and press Control + Enter to fill every selected cell.
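
The effect of filling every blank cell in one pass can be sketched in Python as follows. This is a rough illustration, assuming the data has been read into a list of rows in which a blank cell is either an empty string or `None`; `fill_blanks` is a hypothetical name, not an Excel feature.

```python
def fill_blanks(rows, placeholder="Not Available"):
    """Return a copy of rows with every blank cell replaced by placeholder."""
    return [
        # Assumption: a cell counts as blank only if it is None or "".
        # A cell holding 0 or a single space is treated as filled.
        [placeholder if cell is None or cell == "" else cell for cell in row]
        for row in rows
    ]


data = [["Alice", "", 42], ["", "Paris", None]]
print(fill_blanks(data))
# prints [['Alice', 'Not Available', 42], ['Not Available', 'Paris', 'Not Available']]
```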

    • Convert Numbers Stored as Text into Numbers – When data is imported from text files or external databases, numbers may get stored as text. Some people also type an apostrophe before a number to force it to be text. This can lead to errors if those cells are used in calculations. To convert such values back into numbers, type 1 in any blank cell, select that cell, and press Control + C. Then select the cell or range you want to convert, and choose Paste – Paste Special or press Alt + E + S. In the “Paste Special” dialogue box, select “Multiply” and click OK. Multiplying by 1 converts all the numbers stored as text back into numbers.
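
For data that has already left Excel, a similar conversion might look like the Python sketch below. It is a minimal illustration under stated assumptions: `to_number` is a hypothetical helper, the leading apostrophe is assumed to have survived the export as a literal character, and anything that does not parse as a finite number is returned unchanged.

```python
import math


def to_number(cell):
    """Convert a numeric-looking string to int or float; leave the rest alone."""
    if not isinstance(cell, str):
        return cell  # already a number (or some other non-text value)
    # Assumption: a leading apostrophe was exported as part of the text.
    text = cell.strip().lstrip("'")
    try:
        return int(text)
    except ValueError:
        pass
    try:
        number = float(text)
    except ValueError:
        return cell  # not numeric, keep the original text
    # Reject "nan" and "inf", which float() accepts but which are not data.
    return number if math.isfinite(number) else cell


print([to_number(c) for c in ["42", "3.50", "'7", "N/A", 5]])
# prints [42, 3.5, 7, 'N/A', 5]
```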
    • Remove Duplicates – This involves highlighting duplicate data and deleting it. To highlight duplicate data, select the data, go to Home, select Conditional Formatting, click Highlight Cells Rules, and select Duplicate Values. Specify the formatting, and all the duplicate values are highlighted. To remove duplicates, select the data, go to Data, and click Remove Duplicates. If the data has headers, ensure that the ‘My data has headers’ checkbox at the top right is checked. Select the columns from which duplicates have to be removed and click OK. This removes the duplicate values from the list.
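
The idea of removing duplicates based on selected columns can be sketched in Python like this. The sketch keeps the first occurrence of each combination and compares values exactly, which may differ from Excel's own matching rules; `remove_duplicates` and the sample rows are made up for illustration.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key column values."""
    seen = set()
    unique = []
    for row in rows:
        # Only the chosen columns decide whether two rows count as duplicates.
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    ["Ann", "ann@example.com"],
    ["Bob", "bob@example.com"],
    ["Ann", "ann@example.com"],
    ["Ann", "other@example.com"],
]
print(len(remove_duplicates(rows, [0, 1])))  # prints 3
print(len(remove_duplicates(rows, [0])))     # prints 2
```

Choosing fewer key columns removes more rows, which is why the column selection in the dialogue box deserves a second look before clicking OK.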
    • Highlight Errors – You can highlight errors in Excel data by using either Conditional Formatting or Go To Special.
      Using Conditional Formatting – Select the entire data set. Go to Home, click Conditional Formatting, and select New Rule. In the New Formatting Rule dialogue box, select ‘Format Only Cells that Contain’ and, in the rule description, select Errors from the drop-down. Set the format and click OK. This highlights every error value in the selected data set.
      Using Go To Special – Select the entire data set and press F5 to open the Go To dialogue box, then click the Special button at the bottom left. Select Formulas and uncheck all options except Errors, then click OK. This selects all the cells that contain an error.
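
When a sheet has been exported to text, error cells usually arrive as literal strings such as `#DIV/0!`. The Python sketch below locates them; it is only an illustration, assuming errors appear as exactly these strings, and `find_errors` is a hypothetical helper.

```python
# Excel's standard error values as they appear in exported text.
EXCEL_ERRORS = {"#DIV/0!", "#N/A", "#NAME?", "#NULL!", "#NUM!", "#REF!", "#VALUE!"}


def find_errors(rows):
    """Return (row_index, column_index) for every cell holding an error value."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if isinstance(cell, str) and cell in EXCEL_ERRORS
    ]


rows = [[10, "#DIV/0!"], ["#N/A", 3]]
print(find_errors(rows))  # prints [(0, 1), (1, 0)]
```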
    • Change Text to Lower/Upper/Proper Case – When a workbook or data is imported from text files, the text may be all lower case, all upper case, or a mix of both. It can be made consistent by using these three functions:
      LOWER() – Converts all text to lower case.
      UPPER() – Converts all text to upper case.
      PROPER() – Converts all text to proper case (the first letter of each word capitalized).
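
Python's built-in string methods give a rough feel for what the three functions produce. This is a loose analogy rather than an exact match for Excel, and the sample name is made up.

```python
name = "jOHN mcdonald"

lower_name = name.lower()   # comparable to LOWER()
upper_name = name.upper()   # comparable to UPPER()
proper_name = name.title()  # roughly comparable to PROPER()

print(lower_name)   # prints john mcdonald
print(upper_name)   # prints JOHN MCDONALD
print(proper_name)  # prints John Mcdonald
```

Note that proper casing is mechanical: a name like "McDonald" comes out as "Mcdonald", so names with internal capitals still need a manual check.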
    • Parse Data Using Text to Columns – When data is obtained from a database or imported from a text file, all the text may end up crammed into one cell. You can parse this text into multiple cells by using the Text to Columns functionality in Excel. Select the data you want to parse, go to Data, and click Text to Columns to open the Text to Columns Wizard. In the first step, choose the data type: select “Delimited” if your data is not equally spaced and is separated by characters such as a comma, hyphen, or dot, and click Next. Select the delimiter (the character that separates your data), either from the predefined delimiters or by entering another character in the Other option, and click Next. Then select the data format and the destination cell. If no destination cell is selected, the current cell is overwritten.
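
Splitting delimited text into columns boils down to the operation sketched below in Python. It is a minimal illustration with made-up sample data; `text_to_columns` is a hypothetical helper, and it does not handle quoted fields that contain the delimiter.

```python
def text_to_columns(cells, delimiter=","):
    """Split each cell's text into a list of column values."""
    # Each input string becomes one row; the delimiter itself is discarded.
    return [cell.split(delimiter) for cell in cells]


cells = ["Smith,John,Sales", "Lee,Ana,Finance"]
print(text_to_columns(cells))
# prints [['Smith', 'John', 'Sales'], ['Lee', 'Ana', 'Finance']]
```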
    • Spell Check – To run a spell check on your data set, press F7.
    • Delete all Formatting – Select the data set and go to Home and click Clear and select Clear Formats.
    • Use Find and Replace to Clean Data in Excel – Find and Replace is essential for data cleansing. It allows you to find and remove all zeros, change references in formulas, find and change formatting, and more.
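
Removing zeros is a case where it matters whether the search matches the whole cell or any part of it, which is what the “Match entire cell contents” option in the Find and Replace dialogue box controls. The Python sketch below shows the difference on made-up data; `replace_whole_cell` is a hypothetical helper.

```python
def replace_whole_cell(rows, find, replacement):
    """Replace a cell only when its entire content equals the search text."""
    return [
        [replacement if cell == find else cell for cell in row]
        for row in rows
    ]


rows = [["0", "105"], ["20", "0"]]

# Whole-cell match: only the cells that are exactly "0" are cleared.
print(replace_whole_cell(rows, "0", ""))
# prints [['', '105'], ['20', '']]

# Partial match: every "0" character is removed, which corrupts 105 and 20.
print([[cell.replace("0", "") for cell in row] for row in rows])
# prints [['', '15'], ['2', '']]
```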

The above techniques help ensure clean and accurate data in your Excel sheets. Data cleansing tools such as OpenRefine, WinPure, Trifacta Wrangler, Data Ladder, and IBM InfoSphere QualityStage can also help keep your data clean. A practical way to enhance your data integrity and work with accurate data is to use the data cleansing services provided by a reliable data cleansing company.
