The data we are interested in is typically heterogeneous, massive, rapidly evolving, and multi-modal, and it is often dirty: it contains missing values, duplicates, uncertain entries, and outliers. It is collected from the Web, social and wireless networks, connected devices, sensors, and scientific applications.