Our data cleansing services include directly modifying the data by detecting and correcting errors and inconsistencies.
A leading outsourcing company, based in Tulsa, Oklahoma, Managed Outsource Solutions (MOS) is dedicated to providing cost-effective solutions for businesses in diverse industries – Medical, Legal, Media, Information Technology and more.
In today’s podcast, Julie Clements, the Chief Solutions Manager at MOS, discusses some important techniques to clean data in Microsoft Excel.
So, this is really about how cleaning your databases can play a vital role in better decision making, because the cleanliness of an Excel spreadsheet can really help clear things up for everybody.
So, data cleaning, otherwise known as data cleansing or data scrubbing, involves detecting and correcting corrupt, inaccurate, incomplete, or irrelevant records in a record set, table or database. The coarse data is then replaced, modified, or deleted. Partnering with an experienced data cleansing company helps businesses maintain complete, accurate and up-to-date databases. Microsoft Excel is one of the most popular software tools used to store valuable data. The stored data must be clean and reliable, and by cleansing data, business enterprises can boost their customer acquisition efforts, improve their decision making process, streamline business practices, save a lot of money, and ensure that employees are making the best use of their work hours. Clean and actionable data prevents the staff from contacting customers with out-of-date information or creating invalid vendor files in the system. So, when the records are clean, it helps maximize the staff’s efficiency and productivity, and also improves their response rates.
According to a survey conducted by Pharma Intelligence and sponsored by Oracle Health Sciences, lack of confidence in clinical trial data causes delays in drug development. So this is big. Besides trial delays, 81 percent of respondents pointed out data governance issues as the biggest challenge in meeting regulatory compliance. The survey respondents said that the top three data issues were duplicate or inconsistent data, data quality, and data integrity and traceability. As for the top three operational challenges with clinical trial data, 51 percent cited data completeness, 45 percent data quality and 43 percent data cleaning. Therefore, companies should really focus more on improving the quality of their data.
So, some techniques to clean data in Excel.
One of the easiest things to do is to remove extra spaces between words. The TRIM function can help with this.
Syntax: =TRIM(Text)
This function takes a single argument, which can be either text that you type manually or a cell reference. The TRIM function removes all leading and trailing spaces, as well as extra spaces between words, so that only a single space is left between words.
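For anyone cleaning the same data outside Excel, the behaviour described above can be approximated in Python. This is only a sketch: the helper name `excel_trim` is made up for illustration, and it assumes that only the ordinary space character should be trimmed, leaving tabs and non-breaking spaces alone.

```python
def excel_trim(text: str) -> str:
    """Approximate TRIM: drop leading/trailing spaces, collapse inner runs to one."""
    # Split on the plain space character only (an assumption of this sketch),
    # then discard the empty pieces produced by consecutive spaces.
    words = [word for word in text.split(" ") if word]
    return " ".join(words)


print(excel_trim("   Managed   Outsource  Solutions  "))
# prints: Managed Outsource Solutions
```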
So, select and treat all the blank cells – If you want to fill blank cells with ‘0’ or ‘Not Available’, or simply want to highlight them, select the entire data set and press F5. This opens the “Go To” dialogue box. Click the Special button at the bottom left, which opens the “Go To Special” dialogue box. Then select Blanks and click OK. This selects all the blank cells in your data set at the same time. The selected cells appear in grey and the first one in white, as this is the active cell. Then type “0” or “Not Available” and press “Control + Enter” to enter that value in all the selected cells at once.
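The same idea, filling every blank cell with one placeholder, might look like this in Python once the sheet has been loaded as a list of rows. The function name `fill_blanks` and the sample rows are hypothetical, and the sketch assumes a blank cell shows up as `None` or an empty string.

```python
def fill_blanks(rows, placeholder="Not Available"):
    """Return a copy of rows with blank cells replaced by placeholder."""
    filled = []
    for row in rows:
        new_row = []
        for cell in row:
            # Assumption: a blank cell is either None or an empty string.
            is_blank = cell is None or cell == ""
            new_row.append(placeholder if is_blank else cell)
        filled.append(new_row)
    return filled


# Hypothetical sample data, for illustration only.
sample = [["Alice", "", 30], ["Bob", "Tulsa", None]]
print(fill_blanks(sample))
```

Passing `placeholder=0` gives the ‘0’ variant instead of ‘Not Available’.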
Another thing is to convert numbers stored as text into numbers – When data is imported from text files or external databases, numbers may get stored as text. Some people also type an apostrophe before a number to make it text. This can lead to errors if you are using these cells in calculations. Here is how to convert these numbers stored as text back into numbers. Type 1 in any blank cell, then select that cell and press Control + C. After that, select the cell or range which you want to convert to numbers. Select Paste, then Paste Special, or press Alt + E + S. The “Paste Special” dialogue box appears; select “Multiply” in it. Click OK and this converts all the numbers in text format back to numbers.
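If the data has been exported from Excel and is being cleaned in a script, the conversion could be sketched as below. The helper `to_number` is a made-up name, and the sketch assumes that a leading apostrophe may still be present in the exported text and that anything non-numeric should be left untouched.

```python
import math


def to_number(cell):
    """Convert a text cell that holds a number into an int or float."""
    if not isinstance(cell, str):
        return cell  # already a number (or another type); leave it alone
    # Assumption: drop surrounding spaces and a leading apostrophe used to force text.
    text = cell.strip().lstrip("'")
    try:
        return int(text)
    except ValueError:
        pass
    try:
        value = float(text)
    except ValueError:
        return cell  # not numeric; keep the original text
    # Keep words such as "nan" or "inf" as text rather than turning them into floats.
    return value if math.isfinite(value) else cell


print([to_number(c) for c in ["10", "'20", " 3.5 ", "N/A"]])
# prints: [10, 20, 3.5, 'N/A']
```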
Another thing is to remove duplicates – Duplicates are pretty easy to remove. This involves highlighting duplicate data and deleting it. To highlight duplicate data, select the data and go to Home, select Conditional Formatting, click Highlight Cells Rules and select Duplicate Values. Specify the formatting and all the duplicate values get highlighted. To remove duplicates, select the data, go to Data and click Remove Duplicates. If the data has headers, ensure that the checkbox at the top right is checked. Select the columns from which duplicates have to be removed and click OK. This removes duplicate values from the list.
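As a rough illustration of what removing duplicates on selected columns means, here is a small Python sketch. The name `remove_duplicates`, the `key_columns` parameter and the sample rows are all hypothetical. It keeps the first row for each combination of values and compares values exactly, so Excel's own matching rules (for example, around letter case) may differ.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of values in key_columns."""
    seen = set()
    unique_rows = []
    for row in rows:
        # Build the comparison key from the chosen columns only.
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Hypothetical sample data, for illustration only.
rows = [
    ["Alice", "alice@example.com"],
    ["Bob", "bob@example.com"],
    ["Alice", "alice@example.com"],
]
print(remove_duplicates(rows, key_columns=[0, 1]))
```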
Another thing is to highlight errors – You can highlight errors in your data in Excel by using Conditional Formatting or Go To Special.
Using Conditional Formatting – Select the entire data set. Then go to Home, click Conditional Formatting and select New Rule. In the New Formatting Rule dialogue box, select ‘Format Only Cells that Contain’ and, in the Rule Description, select Errors from the drop down. Select the format and click OK. This highlights any error value in the selected data set.
Using Go To Special – Select the entire data set and press F5. This opens the Go To dialogue box. Click the Special button at the bottom left, select Formulas and uncheck all options except Errors. This selects all the cells that have an error in them.
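When a sheet has been exported to text, the error cells can also be located with a short script. The sketch below is an assumption-heavy illustration: `find_errors` is a made-up helper, and it assumes errors appear in the export as the usual literal strings such as `#DIV/0!` or `#N/A`.

```python
# Common Excel error values as they appear when exported as text (assumed list).
EXCEL_ERRORS = {"#NULL!", "#DIV/0!", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#N/A"}


def find_errors(rows):
    """Return (row_index, column_index) pairs for cells holding an error value."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if isinstance(cell, str) and cell.strip() in EXCEL_ERRORS
    ]


print(find_errors([["10", "#DIV/0!"], ["#N/A", "ok"]]))
# prints: [(0, 1), (1, 0)]
```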
To change text – Another thing here is to change the text to lower, upper or proper case. When a workbook or data is imported from text files, sometimes all the text could be in lower or upper case or a mix of both. However, it can be made consistent by using these three functions:
=LOWER(text) converts all text into lower case.
=UPPER(text) converts all text into upper case.
=PROPER(text) converts all text into proper case, capitalizing the first letter of each word.
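Python strings have close counterparts to these three functions. In this small sketch, `str.title()` is used as a rough stand-in for proper case, and the sample name is only an illustration.

```python
# Illustrative sample text with inconsistent casing.
name = "jULIE clements"

print(name.lower())  # lower case, like LOWER
print(name.upper())  # upper case, like UPPER
print(name.title())  # first letter of each word capitalized, roughly like PROPER
```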
Also, to parse data using Text to Columns – When data is obtained from a database or imported from a text file, it may happen that all the text is crammed into one cell. It is possible to parse this text into multiple cells by using the Text to Columns functionality in Excel. For that, select the data or text you want to parse, go to Data and click Text to Columns. This opens the Text to Columns Wizard.
Select the data type: choose “Delimited” if your data is not equally spaced and is separated by characters such as a comma, hyphen or dot, and click “Next”. Select the delimiter (the character that separates your data), either one of the pre-defined delimiters or anything else using the Other option, and click Next. Then select the data format and the destination cell. If a destination cell is not selected, the current cell is overwritten.
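Outside Excel, splitting delimited text into columns can be sketched with Python's standard `csv` module. The function name `text_to_columns` and the sample lines are hypothetical, and the sketch assumes a single-character delimiter.

```python
import csv


def text_to_columns(lines, delimiter=","):
    """Split each delimited line into a list of cell values."""
    # csv.reader also copes with quoted values that contain the delimiter.
    return [row for row in csv.reader(lines, delimiter=delimiter)]


# Hypothetical sample data, for illustration only.
print(text_to_columns(["Tulsa,Oklahoma,USA"]))
print(text_to_columns(["2020-01-15"], delimiter="-"))
```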
Also, run a Spell Check – In order to run a spell check for your data set, use the keyboard shortcut F7.
To delete all formatting, select the data set and go to Home and click Clear and select Clear Formats.
Another thing is to use Find and Replace to clean the data in Excel – Find and Replace is crucial when it comes to data cleansing. It allows you to select and remove all zeros, change references in formulas, find and change formatting, and so on.
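As one example of the zero-removal case, a whole-cell find and replace could be sketched in Python as follows. The helper name `find_and_replace` is made up, and the sketch only replaces cells whose entire value matches, so a 10 is not affected when replacing 0.

```python
def find_and_replace(rows, find, replace_with):
    """Replace cells whose whole value equals find with replace_with."""
    return [
        [replace_with if cell == find else cell for cell in row]
        for row in rows
    ]


# Hypothetical sample data: blank out the zeros, leave everything else.
print(find_and_replace([[0, 5], [10, 0]], find=0, replace_with=""))
# prints: [['', 5], [10, '']]
```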
So, the above techniques will help ensure clean and accurate data in your Excel sheets. Data cleansing tools such as OpenRefine, WinPure, Trifacta Wrangler, Data Ladder and IBM InfoSphere QualityStage can also help keep your data clean. A practical way to enhance your data integrity and work with accurate data is to utilize data cleansing services provided by a reliable data cleansing company.
If you wanna learn more about our services, you can visit our website at www.managedoutsource.com.
Thank You so much! Bye Bye!
Read our Blog Post “10 Techniques to Clean Data in Microsoft Excel”