Data Cleaning
OpenRefine
OpenRefine is a free tool that enables users to import, clean, and transform data. Its functions include formatting and transforming imported data, filtering and grouping related data through automated clustering, identifying and correcting inconsistencies in data cells, and augmenting data with other publicly available information online.
Trifacta Wrangler
Trifacta Wrangler, formerly Data Wrangler, helps users standardize, structure, and clean up data sets, including formatting cells, eliminating errors or missing fields, and validating data. The software also makes suggestions for predictive transformations that allow users to edit and evaluate data, and it generates automated interactive visualizations to present findings or do more in-depth data exploration.
R
R is a free software environment for statistical analysis and computation. It supports linear and generalized linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, clustering, and smoothing. It also helps users create data visualizations and graphics.
Python
Python is a general-purpose programming language suited to both basic and advanced coding. The Python website features tutorials and guides for coding, along with reference documentation for the core language and its semantics.
ProPublica Dollars for Docs app
ProPublica created this series of online guides for scraping and cleaning data from websites using various software programs, including Google Refine, Firebug, Ruby, Nokogiri, Tesseract, and Adobe Acrobat. The guides detail the steps ProPublica journalists took in producing the Dollars for Docs app, which examines payments from pharmaceutical companies to physicians and hospitals.
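To make the idea of "automated clustering" from the OpenRefine entry more concrete, the sketch below groups inconsistent spellings of the same value by reducing each one to a normalized key. It is a simplified illustration of key-collision clustering written in plain Python, not OpenRefine's actual implementation; the function names `fingerprint` and `cluster` and the sample values are hypothetical and chosen only for this example.

```python
import string
import unicodedata
from collections import defaultdict

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def fingerprint(value: str) -> str:
    """Reduce a raw cell value to a normalized key (hypothetical helper)."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, replace punctuation with spaces, then split on whitespace.
    tokens = text.lower().translate(_PUNCT_TO_SPACE).split()
    # Deduplicate and sort tokens so word order no longer matters.
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


# Illustrative, made-up sample values.
sample = ["Acme Pharma Inc.", "acme pharma inc", "Inc. Acme Pharma",
          "Beta Labs & Co.", "Beta Labs & Co", "Gamma Health"]

for key, variants in cluster(sample).items():
    print(key, "<-", variants)
```

Values that collapse to the same key are candidates for merging into one canonical spelling. A human should still review each group, because a key this aggressive can also merge values that are genuinely different.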