Before talking about the Coronavirus tracking app, let's remember the lesson of one of our country's computer scientists. Today marks four years since the death of Gianni degli Antoni, one of the founders of Italian computer science and, above all, of the computer science faculty at the University of Milan. A brilliant character with a complex temperament and a degree in Physics, he was the soul of the third hub of Italian computer science (the other two being the Normale di Pisa and the Politecnico, also in Milan), and he was known to everyone as gda. Anyone who, like the writer, had the opportunity to meet him and work with him can only confirm this. And remember.
In fact, one of the things gda often said was: too much data produces the "too much grace, Saint Anthony" effect (an Italian saying for receiving far more than one asked for). That is to say: when feeding data into a system to be processed, increasing the quantity is not necessarily an advantage. This is counterintuitive from both a theoretical and a practical point of view. In theory, one assumes that the more data there is, the better for processing purposes, whether it is an algorithm that handles the sampling of a sound or a system that monitors people's movements in order to prevent the spread of the coronavirus.
In practice, you see big names like Amazon, Facebook, Google and Microsoft collecting data, and it is well known that even if they do not yet know how to use all of it, they will eventually find a purpose for it. Both lines of reasoning are wrong, especially when it comes to the phantom coronavirus app, and here is why.
Data, information, knowledge
First, a small but useful distinction: we use data, information and knowledge almost interchangeably, but they are three profoundly different, though connected, concepts. A datum is a single, decontextualized value: a dimensionless or dimensioned number, such as 33 (a number and nothing more, that is, dimensionless) or 1.5 kg (dimensioned by weight). A piece of information is a contextualized datum: the jar weighs 1.5 kg. Knowledge is the connection between one or more pieces of information: temporal, cause-and-effect or geographic links.
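To make the distinction concrete, here is a minimal Python sketch that models the three levels as types. The class names (`Datum`, `Information`, `Link`) and the second jar reading are hypothetical, chosen only for illustration; the article does not prescribe any particular data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Datum:
    """A single, decontextualized value (hypothetical model)."""
    value: float
    unit: Optional[str] = None      # None = dimensionless, like the bare 33

@dataclass(frozen=True)
class Information:
    """A datum placed in context: what it describes and which attribute it measures."""
    subject: str
    attribute: str
    datum: Datum

@dataclass(frozen=True)
class Link:
    """Knowledge as a connection between two pieces of information."""
    kind: str                       # e.g. "temporal", "cause-effect", "geographic"
    source: Information
    target: Information

bare_number = Datum(33)                           # data: just a number
weight = Datum(1.5, "kg")                         # data: a dimensioned number
jar_now = Information("jar", "weight", weight)    # information: "the jar weighs 1.5 kg"

# Hypothetical later reading, invented for the example
jar_later = Information("jar", "weight", Datum(1.2, "kg"))
trend = Link("temporal", jar_now, jar_later)      # knowledge: how the two readings relate in time
```

Nothing in these types says what a link *means*: as the next paragraph argues, that interpretation is left to the operator.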
While data and information are recorded and processed regardless of who uses them, knowledge requires interpretation, that is, the operator's ability to make sense of it. This operator can be a human or, within certain limits, an artificial intelligence system. End of the clarification on the meaning of words. Now let's look at the problem.
Too much data is bad
It sounds like a paradox, but too much data is not good; quite the contrary. Too much grace, Saint Anthony: too much data is bad. The explanation is not intuitive, though: it requires knowing how processing actually takes place. In a large mass of data, information gets lost. As a result, knowledge cannot be extracted, and so the desired result is not achieved. This is because processing is the most expensive step: data analysis requires time, computing power, suitable algorithms and efficiency. Burying information under data dilutes it, to the point that it can practically disappear. It is a bit like the famous needle in a haystack, except that we decide how much straw to pile up before starting the search. With too little straw there are no needles to find, but with too much straw the needles get lost.
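The haystack image can be turned into a toy calculation. Assume, hypothetically, a fixed processing budget (the number of records we can afford to examine), a fixed number of relevant records (needles) and a growing number of irrelevant ones (straw), with records examined in random order. By linearity of expectation, each needle is examined with probability budget/total, so the expected number of needles found falls as straw is added. This sketches the dilution effect only; it is not a model of any real tracking system.

```python
from fractions import Fraction

def expected_needles_found(needles: int, straw: int, budget: int) -> Fraction:
    """Expected needles found when `budget` records are drawn uniformly at
    random, without replacement, from a pile of needles + straw."""
    total = needles + straw
    if total == 0:
        return Fraction(0)
    examined = min(budget, total)
    # Each needle is examined with probability examined / total;
    # summing over needles (linearity of expectation) gives the result.
    return Fraction(needles * examined, total)

# Same 10 needles, same budget of 1,000 records, more and more straw
for straw in (0, 990, 9_990, 99_990):
    found = expected_needles_found(needles=10, straw=straw, budget=1_000)
    print(f"straw={straw:>6}: expected needles found = {float(found):.2f}")
```

Exact fractions are used so the figures are not blurred by floating-point rounding; only the printout converts to decimals.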
An objection to all this might be that this reasoning ignores the existence of big data and of the mechanisms for collecting and analyzing data held, for example, in data lakes or similar systems. The point is that those systems are built for other kinds of analysis and with other objectives. Not to mention a problem that professionals have known about for over fifty years: the one described by Richard Bellman in 1961, known as the curse of dimensionality. Building a statistical model from a mass of data requires an analysis which, once the number of variables exceeds a certain threshold, demands an enormous effort to obtain the model without adding anything to it. For this reason, data are reduced with the statistical technique of dimensionality reduction. Doing more would mean an explosion in computation time, in storage and in privacy exposure, without adding anything to the final result.
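Both halves of Bellman's argument can be sketched in a few lines, assuming NumPy is available. The first function shows the exponential growth: covering a space with a grid of b bins per axis takes b^d cells in d dimensions. The second applies principal component analysis (PCA), one common dimensionality-reduction technique, not necessarily the one the author has in mind. The synthetic data (50 columns driven by only 2 hidden factors) is invented purely for illustration.

```python
import numpy as np

def cells_to_cover(dimensions: int, bins_per_axis: int = 10) -> int:
    """Cells in a regular grid with `bins_per_axis` bins along each axis."""
    return bins_per_axis ** dimensions

def pca_reduce(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Project the rows of X onto the top-k principal components.

    Returns the projected data (n x k) and the fraction of total
    variance explained by each of the k components.
    """
    Xc = X - X.mean(axis=0)                             # center every column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt = principal directions
    explained = s**2 / np.sum(s**2)                     # variance share per component
    return Xc @ Vt[:k].T, explained[:k]

# Exponential growth: with 10 bins per axis, each extra dimension multiplies cells by 10
for d in (1, 2, 3, 6):
    print(f"{d} dimensions: {cells_to_cover(d):,} cells")

# Synthetic data (invented): 500 records, 50 columns driven by only 2 hidden factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))

reduced, explained = pca_reduce(X, k=2)
print(reduced.shape, f"variance kept: {explained.sum():.4f}")
```

In this toy case, two columns carry almost all of the variance of fifty: the other 48 add storage and computation but, for modeling purposes, almost nothing else.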
What about social media?
Social media and the big names in tech in general, such as Amazon, Facebook, Google and Microsoft, store everything for their profiling. Why? On the one hand because theirs is broad-spectrum profiling, and on the other because the big players tend to have more storage capacity: they analyze and process data at different levels, keep the relevant part and throw away everything else. This is not aimed at scientific completeness but at commercial pragmatism. They have other purposes, and for them the data can be oversized, but it is archived for potential future uses that are not always pleasant, at least from a privacy standpoint.
Coming to the Coronavirus tracking app
The app tracks people's movements so that, in the event of a coronavirus case, it is possible to understand who may have been at risk. It is an app that accumulates data to be treated as information in order to reach, through the work of operators, a form of knowledge useful for containing coronavirus outbreaks. All this while respecting privacy.
The idea that privacy is opposed to the need to track people to prevent contagion, with freedom on one side and security on the other, is a false problem. Indeed, a dangerous one, because it is an incorrect simplification. The tension is not between freedom and security; rather, the app requires an approach that is technically restrained and carefully designed to avoid the "too much grace, Saint Anthony" effect. And the criterion of proportionality and limitation of the data collected is not only a privacy requirement (which could potentially conflict with the app's effectiveness) but also a technical one. Too much data would be useless for computation and would create problems without adding quality to the knowledge obtained.
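Proportionality can be enforced in code as well as in policy. Below is a minimal sketch of data minimization, assuming a hypothetical record format: an allowlist keeps only the fields needed for the declared purpose and discards everything else before storage. The field names and values are invented for illustration and do not describe any actual app.

```python
# Hypothetical schema: field names are invented for illustration only.
ALLOWED_FIELDS = frozenset({"contact_id", "day", "duration_minutes"})

def minimize(record: dict, allowed: frozenset = ALLOWED_FIELDS) -> dict:
    """Return a copy of `record` containing only allowlisted fields."""
    return {key: value for key, value in record.items() if key in allowed}

raw_record = {
    "contact_id": "a91f",            # pseudonymous identifier (assumed)
    "day": "2020-04-20",
    "duration_minutes": 20,
    "gps": (45.46, 9.19),            # precise location: outside the declared purpose here
    "phone_model": "example-model",  # irrelevant to exposure, so never stored
}

stored = minimize(raw_record)
print(stored)
```

An allowlist rather than a blocklist is the cautious choice: any field added to the raw record later is dropped by default instead of being collected by accident.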
We live in the era of potential surveillance, but we went looking for it, in the sense that it is not necessary. The drive to have ever more pervasive tools that trace every possible behavior does not stem from the technical requirements of the declared purposes, set against a misguided notion of privacy, so much as from further uses for undeclared reasons. A good example in this sense is Apple, which, without much fuss, uses only the data strictly necessary, in a modular way, to achieve results comparable to those of the information buccaneers, the four mentioned above.
The Coronavirus tracking app, which several bodies are designing while a final synthesis is pending, certainly poses significant technical challenges (defining a criterion for collecting data in a contextualized way, that is, as information). But it also poses a design policy problem. In an Italy (and a Europe) in lockdown, and then potentially reopened in stages, it is a crucial tool. Using a heavy hand to create an extremely sensitive app capable of collecting mountains of data would go in the opposite direction to that desired by those who requested it and by the citizens who will use it.
It would no longer be a matter of creating a coronavirus tracking app but, more generally, a social control tool that would meet neither regulatory requirements nor technical needs, and would in fact open the door to other purposes, using the emergency and the need for everyone's safety as an excuse to limit everyone's freedom. An unjustifiable choice from both an ethical and a technical point of view. And with an aggravating circumstance: the crass and devastating ignorance of computer culture among politicians and very large sections of the public creates a cone of shadow in which a layer of technocrats and unscrupulous entrepreneurs can find room and feed abundantly on those who, for them, have essentially become free lunches.
Instead, what is needed is data, but relatively little of it and, above all, the right data. The rest is simply an abuse and a violation of the law as well as of principles.
All the macitynet articles that talk about the coronavirus pandemic and the impacts on the world of technology, work and distance learning, as well as on the solutions to communicate remotely, start from this page.