Thursday, 22 December 2016

Machine Learning with Personal Data

By Ralf Keuper

In their paper Machine Learning with Personal Data, the authors Dimitra Kamarinou, Christopher Millard, and Jatinder Singh examine to what extent machine learning techniques are compatible with the GDPR when they are used to process personal data.

On the paper's guiding question:
In this paper we look at the concepts of ‘profiling’ and ‘automated decision-making’ as defined in the EU General Data Protection Regulation (GDPR) and consider the impact of using machine learning techniques to conduct profiling of individuals. More specifically, we look at the right that individual data subjects have not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning them or significantly affects them. In addition, we also look at data subjects’ right to be informed about the existence of automated decision-making, including profiling, and their right to receive meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing.  
Further, the purpose of this paper is to explore how the first data protection principle (requiring that processing be lawful, fair, and transparent) may or may not be complied with when machine learning is used to carry out profiling. We argue that using machine learning for profiling may complicate data controllers’ compliance with their obligations under the GDPR but at the same time it may lead to fairer decisions for data subjects.
The following pages identify the various risks that data processors may face. As in other areas of law, it is often a matter of interpretation whether a data processor violates the provisions of the GDPR. Moreover, the provisions not infrequently contradict one another, or demand that data processors square the circle:
Complying with the principle of data minimization, even at the time of the processing itself, may be particularly problematic given that the effectiveness of many machine learning algorithms is dependent on the availability of large amounts of data.
As regards the validity and objectivity of algorithms used for automated decisions and assessments, the authors take a view that takes some getting used to:
Considering all the uncertainty involved in appeals by data subjects to a human to contest a decision that has significantly adversely affected them, might it perhaps be fairer for individuals to have a right to appeal to a machine instead? This may sound strange at first, as machines are designed by humans and may carry within them the values and subjectivity of their designers in a way that may make them as unsuitable as humans to review such decisions. However, machine learning algorithms have the potential to achieve a high level of objectivity and neutrality, whereby learning techniques can be made to disregard factors such as age, race, ethnicity, religion, nationality, sexual orientation, etc., if instructed to do so, more effectively than humans, as shown in part one of this paper. Moreover, it might be appropriate for the machine-learned models through which decisions are formulated to be reviewed subsequently by other algorithms designed to facilitate auditing.
Taken to its logical conclusion, the algorithms are thus supposed to regulate themselves.
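
To make the "if instructed to do so" point more concrete, here is a minimal sketch (not taken from the paper) of how protected attributes could be explicitly excluded before a model is trained; the data set, column names and the scikit-learn classifier are illustrative assumptions:

# Minimal sketch: explicitly exclude protected attributes before training.
# File name, column names and classifier are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

PROTECTED = ["age", "race", "ethnicity", "religion", "nationality", "sexual_orientation"]

applicants = pd.read_csv("applicants.csv")              # hypothetical data set
X = applicants.drop(columns=PROTECTED + ["approved"])   # instruct the model to disregard these factors
y = applicants["approved"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

Dropping the columns only removes the direct use of these factors; correlated features (a postcode, for instance) can still act as proxies, which is one reason the authors' suggestion of auditing learned models with further algorithms remains relevant.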

A passage a few pages later fits this picture:
Beyond the fact that some machine learning algorithms are non-transparent in the way they are designed, opacity might also be the consequence of online learning in the sense that the algorithms can 'update their model for predictions after each decision, incorporating each new observation as part of their training data. Even knowing the source code and the data (...) is not enough to replicate and predict their behavior'. It is also important to know the precise inputs and outputs to any machine learning system. Needless to say, analysing how a learned model works becomes even more difficult when either the code, its build process, the training data and/or the 'live' input data are hidden. Such opacity may result from the fact that certain algorithms are protected as trade secrets or that their design is based on a company's proprietary code.
This does not make things any easier; rather, it lays bare the real dilemma. In this respect, one is inclined to agree with Yvonne Hofstetter, who calls for a fiduciary body for algorithms. The idea of so-called Algorithmic Angels also has a certain charm. Also worth mentioning in this context are Eric Siegel's thoughts in 9 Bizarre and Surprising Insights from Data Science.
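
The online-learning problem quoted above can be illustrated in a few lines: a model that is updated after every decision changes its state with each new observation, so reproducing its past behaviour requires the exact sequence of 'live' inputs, not just the code. A minimal sketch with scikit-learn's SGDClassifier, on a synthetic stream that stands in for real cases:

# Minimal sketch: online learning. The model is updated after every observation,
# so its behaviour depends on the exact order and content of the input stream.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()                          # incremental linear classifier
classes = np.array([0, 1])

for step in range(1000):                         # stand-in for a stream of live cases
    x = rng.normal(size=(1, 5))                  # one new observation
    y = np.array([int(x[0, 0] > 0)])             # its outcome, observed afterwards
    if step > 0:
        decision = model.predict(x)              # decision taken with the current model ...
    model.partial_fit(x, y, classes=classes)     # ... which is then updated immediately

Even with the source code and the initial training data in hand, replaying the predictions above requires every subsequent observation in the same order, which is exactly the reproducibility problem the authors describe.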

The authors' conclusion:
To be compliant, data controllers must assess how using machine learning to carry out automated processing affects the different stages of profiling and the level of risk to data subjects’ rights and freedoms. In some cases where automated processing, including profiling, is permitted by law, data controllers still have to implement suitable measures to safeguard the data subjects’ rights, freedoms and legitimate interests. Such measures will include preventing machines making decisions before data subjects can express their point of view, allowing for substantive human review when a decision is made by a machine, and ensuring that data subjects can contest the decision. The underlying objective in the Data Protection Directive (and apparently in the GDPR) is that a decision significantly affecting a person cannot just be based on a fully automated assessment of his or her personal characteristics. In the context of machine learning, however, we contend that, in some cases, it might be more beneficial for data subjects if a final decision is, indeed, based on an automated assessment.
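
Read operationally, such 'suitable measures' could take the form of a routing step that keeps decisions with significant effects out of the fully automated path until a human has reviewed them. The following sketch is only an illustration of that idea; the threshold and field names are assumptions, not something the paper specifies:

# Minimal sketch: route significant or low-confidence decisions to human review
# instead of letting the automated outcome take effect directly.
from dataclasses import dataclass

@dataclass
class Decision:
    outcome: str               # e.g. "approve" / "decline"
    score: float               # model confidence (assumed field)
    significant_effect: bool   # does it significantly affect the data subject?

def finalize(decision: Decision, review_queue: list) -> str:
    if decision.significant_effect or decision.score < 0.8:
        review_queue.append(decision)    # substantive human review before the decision takes effect
        return "pending_human_review"
    return decision.outcome

queue: list = []
print(finalize(Decision("decline", 0.65, True), queue))   # -> pending_human_review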
Overall, an important contribution that sheds light on the discussion.
