EthicalML: Injecting Ethical and Legal Constraints into Machine Learning Models

Our choices of which movies to watch or novels to read are often influenced by suggestions from machine learning (ML)-based recommender systems. However, there are important scenarios where current ML systems fall short. In each of the following scenarios, we wish to train an ML system to deliver a service, but an important constraint must be imposed on how the system operates.

Scenario 1: We want a system that will match submitted job applications to our list of academic vacancies. The system must not discriminate against minority groups.

Scenario 2: We need an automated cancer diagnosis system based on biopsy images. We also have HIV test results, which can be used at training time but should not be collected from new patients.

Scenario 3: We wish to have a system that can help us decide whether or not to approve a mortgage application. We need to understand the decision process and relate it to our checklist, for example whether the applicant has had an overdraft in the last three months and whether the applicant is on the electoral roll.

Scenario 1 asks an ML system to be fair in its decisions by being non-discriminatory with regard to, e.g., race, gender, and disability; scenario 2 requires an ML system to protect the confidentiality of sensitive personal data; and scenario 3 demands transparency from an ML system in the form of human-understandable decisions.
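To make the fairness requirement of scenario 1 concrete, the sketch below measures one common notion of non-discrimination, demographic parity: the rate of positive decisions should be similar across groups. This is only one of several fairness criteria, and the text above does not say which one the project adopts. The function names and the toy data are hypothetical and serve only as illustration.

```python
def positive_rate(decisions, groups, group):
    """Fraction of positive decisions (1) among members of `group`."""
    selected = [d for d, g in zip(decisions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups."""
    rates = [positive_rate(decisions, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


# Made-up toy data: 1 = shortlisted, 0 = rejected; "a" and "b" are two groups.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = demographic_parity_gap(decisions, groups)
print(f"demographic parity gap: {gap:.2f}")
```

A gap of zero would mean both groups are shortlisted at the same rate; a fairness constraint of this kind would ask the trained model to keep the gap small.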

Equipping ML models with the ethical and legal constraints illustrated by scenarios 1-3 is a serious issue; without it, the future of ML is at risk. In the UK, this is recognized by the House of Commons Science and Technology Committee, which recommended the urgent formation of a Council of Data Ethics (“The Big Data Dilemma” report, 2016). Furthermore, in 2015 the Royal Society started a policy project that looks at the social, legal, and ethical challenges associated with advances in ML models and their use cases.

Building ML models with fairness, confidentiality, and transparency constraints is an active research area, and disjoint frameworks are available for addressing each constraint. However, how to put them all together is not obvious. My long-term goal is to develop an ML framework with plug-and-play constraints that is able to handle any of the mentioned constraints, their combinations, and also new constraints that might be stipulated in the future.

The proposed ML framework relies on instantiating ethical and legal constraints as privileged information. This privileged information is available at training time, where it helps to train a better decision model and to make that model non-discriminatory, but it will not be accessible for future data at deployment time. For confidentiality constraints, confidential personal data such as HIV test results are the privileged information. For fairness constraints, protected characteristics such as race and gender are the privileged information. For transparency constraints, complex, uninterpretable but highly discriminative features such as deep learning features are the privileged information.
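The train-time versus deployment-time split described above can be illustrated with a minimal sketch. It uses a distillation-style approach to learning with privileged information: a teacher model is fitted on the privileged feature, and a student model that sees only the regular feature is fitted to a blend of the true labels and the teacher's soft predictions. This is a swapped-in, well-known technique chosen for brevity, not the algorithm this project proposes. All names (`fit_logistic`, `x_star`, `blend`, `predict`) and the toy data are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, targets, steps=2000, lr=0.5):
    """Fit p = sigmoid(w * x + b) to targets in [0, 1] by gradient descent
    on the cross-entropy loss (soft targets are allowed)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, targets):
            err = sigmoid(w * x + b) - t  # derivative of the loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Made-up toy data. `x` is the regular feature available at deployment;
# `x_star` is the privileged feature available only at training time.
x = [0.2, 0.9, 0.4, 1.1, 1.6, 0.8, 1.9, 1.3]
x_star = [-2.0, -1.0, -1.5, -0.5, 1.0, 0.5, 2.0, 1.5]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# 1) Teacher: trained on the privileged feature only.
w_t, b_t = fit_logistic(x_star, y)
soft = [sigmoid(w_t * xs + b_t) for xs in x_star]

# 2) Student: trained on the regular feature, imitating the teacher.
blend = 0.5  # hypothetical weight between hard labels and teacher outputs
targets = [blend * yi + (1 - blend) * si for yi, si in zip(y, soft)]
w_s, b_s = fit_logistic(x, targets)


def predict(x_new):
    """Deployment-time prediction: uses the regular feature only."""
    return sigmoid(w_s * x_new + b_s)


print([round(predict(v), 2) for v in (0.0, 1.0, 2.0)])
```

The point of the design is that `predict` never receives `x_star`: the privileged data shapes the student during training but does not need to be collected from new individuals once the model is deployed.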

This project aims to develop an ML framework that produces accurate predictions and uncertainty estimates about its predictions while also complying with ethical and legal constraints. The key contributions of this proposal are: 1) a new privileged learning algorithm that overcomes limitations of existing methods by allowing various constraints to be plugged in at deployment time, by being kernelized, by optimizing its hyperparameters, and by producing estimates of prediction uncertainty; 2) a scalable and automated inference procedure that makes the new privileged learning algorithm easily applicable to large-scale learning problems such as binary classification, multi-class classification, and regression; and 3) an instantiation of the new algorithm for incorporating fairness, confidentiality, and transparency restrictions into ML models.

Planned Impact

Advancement in ethically and legally aware machine learning (ML) models has broad implications. In this project, I will focus on engaging with users in the following domains:

  1. Predictive Policing: Predictive policing refers to “computer systems that use data to forecast where crime will happen or who will be involved” (Upturn’s report, 2016). In the UK, the effectiveness of predictive policing is widely reported; for example, Strathclyde Police cited a reduction in domestic violence reoffenders (Joe Newbold’s report, 2015). Because predictive policing technologies rely on historical and inherently biased crime data to build ML models, they raise several ethical and legal concerns, such as fairness and transparency. Upturn’s 2016 report on “Early Evidence on Predictive Policing and Civil Rights” concluded that predictive policing tools, as currently designed and implemented, reinforce discriminatory policing practices. This is a serious issue that puts the future of predictive policing at risk despite its success in reducing crime. This project will develop an ML model that corrects past biases via non-discriminatory fairness constraints.

  2. Healthcare Analytics: For health workers, hospitals, and governmental decision makers, it is extremely useful to have tools that predict healthcare problems such as maternal mortality rates and cancer. The AI and Life in 2030 Report stressed that ML-driven applications need to “gain the trust of doctors, nurses, and patients”. The proposed ML framework acknowledges the need to use confidential patient data only on a strict need-to-know basis and not to use them in the deployed system. Furthermore, the transparency constraints will aid health professionals in their decisions and will steer clear of the statement “because the computer said so”.

  3. Improving the skills base: “The Big Data Dilemma” report has urged immediate action to tackle the crisis in data analytics skills. The appointed PDRA will develop skills and experience in data analytics, collaboration, scientific and public presentations, and the organization of workshops and stakeholder meetings.

  4. General public: This project will have societal impact by informing both ML enthusiasts and sceptics that ML technologies have permeated our everyday life, and that there is an active push within the ML community to develop models that respect ethical and legal constraints.

Although they are not a direct focus, I recognize the long-term implications of the study in the following areas, and will be alert to any opportunity to establish links for future action:

  1. Human Resource Analytics: Companies and universities use ML models on candidates' background information (e.g. application form data including disability) and employee data to predict whether a candidate should be hired.

  2. Mortgage Approval: Similarly, lenders use ML models on borrowers' background information, including licensed data such as credit score information, to predict whether it is risky to extend a mortgage offer.

  3. Insurance Premium Setting: Insurance companies likewise use ML models on applicants' driving history and biographical data to predict an applicant's driver type, and subsequently to set an insurance premium.

Algorithmic assessment methods used to predict human outcomes, such as recruitment, loan approval, and insurance premium setting (the three areas listed above), can contribute to a world with less human bias. To achieve this, however, we need advanced ML models that are free of algorithmic biases (fairness), even though they are trained on historical and biased data. Additionally, the deployed models should not collect sensitive personal data (confidentiality). Furthermore, in an interactive mode, where humans can check the computer’s judgment, understanding the reasons behind predictions made by ML models (transparency) is a prescription for improved collaborative decisions.

Dr. Zexun Chen
Lecturer/Assistant Professor

