Data Cleaning

 

OpenRefine
OpenRefine is a free, open-source tool that enables users to import, clean, and transform data. Its functions include formatting and transforming imported data, filtering and grouping related data through automated clustering, identifying and correcting inconsistencies in data cells, and augmenting data with publicly available information from online sources.
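To give a sense of how automated clustering can group inconsistent entries, here is a minimal Python sketch of a key-collision ("fingerprint") approach. It is an illustration only, not OpenRefine's actual implementation; the function names `fingerprint` and `cluster` are hypothetical, and the normalization steps are simplified assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so near-duplicate spellings collide."""
    # Strip accents by decomposing characters and dropping non-ASCII marks.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then replace punctuation with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", ascii_text.lower())
    # Split on whitespace, drop duplicate tokens, and sort them.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Calling `cluster` on a column of names returns lists of spellings that probably refer to the same thing, which a person can then review and merge. Values with no near-duplicates are left out of the result.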
Trifacta Wrangler
Trifacta Wrangler, formerly Data Wrangler, helps users standardize, structure, and clean up data sets, including formatting cells, eliminating errors or missing fields, and validating data. The software also suggests transformations predictively, letting users edit and evaluate data, and it automatically generates interactive visualizations for presenting findings or exploring data in more depth.
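The kind of cleanup described here (standardizing cells, handling missing fields, and validating values) can be sketched in a few lines of plain Python. This is a generic illustration using the standard library, not Trifacta's interface; the function `clean_rows` and the column names `name` and `amount` are hypothetical, and the sketch assumes every row has no more cells than the header.

```python
import csv
import io


def clean_rows(csv_text, required=("name", "amount")):
    """Split rows into valid and rejected lists after basic standardization."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Standardize: trim surrounding whitespace in every cell.
        row = {key: (value or "").strip() for key, value in row.items()}
        # Missing fields: reject rows where a required cell is empty.
        if any(not row.get(field) for field in required):
            rejected.append(row)
            continue
        # Validate: the amount must parse as a number once "$" and "," are removed.
        try:
            row["amount"] = float(row["amount"].replace("$", "").replace(",", ""))
        except ValueError:
            rejected.append(row)
            continue
        valid.append(row)
    return valid, rejected
```

Keeping the rejected rows in a separate list, rather than silently dropping them, makes it possible to inspect what failed and decide whether to repair or discard it.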
R
R is a free software environment for statistical analysis and computing. It supports linear and generalized linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, clustering, and smoothing. It also helps users create data visualizations and graphics.
Python
Python is a general-purpose programming language suited to both basic and advanced coding. The website features tutorials and guides, along with reference documentation for the core language, its semantics, and the standard library.
ProPublica Dollars for Docs app
ProPublica created this series of online guides for scraping and cleaning data from websites using various software programs, including Google Refine (now OpenRefine), Firebug, Ruby, Nokogiri, Tesseract, and Adobe Acrobat. The guides detail the steps ProPublica journalists took in producing the Dollars for Docs app, which examines payments from pharmaceutical companies to physicians and hospitals.