Data analysis

Information management is the discipline concerned with obtaining information, processing it, and disseminating the knowledge that can be extracted from data.

Thanks to technological advances, we can gain valuable knowledge from the available information, and managing this knowledge is essential to support decision-making in organizations. The difficulty now lies in storing and processing large volumes of data in a reasonable time. Technologies such as Big Data, Data Mining and non-relational databases make this information easier to manage.

Apart from the large amount of data and their fast growth, companies and governments are making their public information available to everyone for reuse, thereby promoting the creation of new applications and services. The challenge for private and public organizations is therefore to manage data well enough to transform them into intelligent and useful information that supports decision-making processes.

Big Data

This concept covers the use, manipulation and exploitation of large volumes of structured and unstructured data so that a company can process them and turn them into a competitive advantage.
Big Data is a term for data sets that are so large or complex that traditional data processing applications and software are inadequate to deal with them.
The main features of Big Data are known as the five Vs:

  •   Volume: the amount of data generated and stored determines whether they can be considered Big Data or not.
  •   Variety: the types of data are very diverse. Data can be structured, unstructured or semi-structured, and they can come from text, images, sensors, audio or video files, log files, etc.
  •   Velocity: data obtained are processed and analyzed in real time, so the system must respond immediately.
  •   Veracity: it is necessary to check the authenticity and reliability of data, considering their origin.
  •   Value: despite having a large amount of data, it is important to know how to extract those that are useful. Having lots of data is not enough if we do not know how to read and use them.

Data Mining

Data Mining is a technology for creating predictive and descriptive models from available data. By identifying patterns or models, it transforms these data into useful knowledge that supports decision-making processes in organizations.

The non-trivial process of identifying valid, previously unknown, potentially useful and comprehensible patterns within data is known as Knowledge Discovery in Databases (KDD).
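
As an illustration of pattern identification, the following minimal Python sketch counts which pairs of items occur together in a set of records and keeps those that reach a minimum number of occurrences. It shows one simple descriptive technique, not the KDD process as a whole. The function name frequent_pairs, the min_support parameter and the purchase records are assumptions invented for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs found together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair is counted once per transaction
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase records, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]

patterns = frequent_pairs(transactions, min_support=2)
for pair, n in sorted(patterns.items()):
    print(pair, n)
```

Raising min_support keeps only the strongest co-occurrences; pairs involving "eggs" are dropped here because they appear in a single record.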

Non-relational databases (NoSQL)

With relational databases, access to large volumes of data can be quite costly because speed and performance drop significantly. In this context, non-relational databases can provide faster access to data.

When storage needs are large, vertical scaling is limited by the hardware. NoSQL, however, is based on horizontal scalability, which makes it possible, for example, to add another server without losing the availability of the rest of the system.
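
The idea of spreading data over several servers can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the mechanism of any particular NoSQL product: the functions shard_for and distribute, the key format and the server names are all hypothetical.

```python
import hashlib


def shard_for(key, servers):
    """Return the server that stores `key`, chosen with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def distribute(keys, servers):
    """Group keys by the server each one is assigned to."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[shard_for(key, servers)].append(key)
    return placement


# Hypothetical keys and server names, for illustration only.
keys = [f"user:{i}" for i in range(1000)]
two_nodes = distribute(keys, ["node-a", "node-b"])

# Scaling out: a third server takes over part of the keys.
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])

for server, assigned in three_nodes.items():
    print(server, len(assigned))
```

With this plain modulo placement, adding a server changes the assignment of many existing keys. Real systems commonly use schemes such as consistent hashing to limit how much data has to move.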

Although companies continue to require the integrity and efficient structure offered by relational databases, the NoSQL model addresses a different kind of storage need. Both models can therefore coexist and be valid in software development.