Trusting AI, then eating humble pie

Using machine learning (ML) and artificial intelligence (AI) to process data and help solve issues of governmental importance is nothing new. Their use has found its way into the development field as well, but there is a problem: AI is not the panacea everyone hopes it to be.

Identifying the subject

In my previous blog post I mentioned how AI can become discriminatory towards certain groups of individuals. It can categorize people in the wrong way, expose or label the individuals (or subjects) whose data is being analyzed, and much more. In this context, a subject is anyone of interest who shares data about pretty much anything: location, name, spending, emotional breakdowns, new life updates, political views, critical weather forecasts, etc. Most AI systems take inputs such as numeric values, text or audio-visual media, learn from them and from their history, and on that basis try to predict future behaviour, needs and so on. The problem with such identification is quite complex. According to the USAID report (2018), "Algorithmic mistakes often fall disproportionately on minorities and marginalized groups. Consider facial recognition, which is used in applications such as smart phone unlocking and mobile payments. Researchers have documented how commercial face detection systems often fail to notice dark-skinned faces" (p. 36). One reason behind such outcomes is the lack of diversity in the teams that actually create these AI systems and software. If you look Caucasian and so does everyone else on your IT team, you will most likely take only those faces into account as the basis for facial recognition. Given that facial recognition systems are becoming more widespread, especially in the U.S. judicial system with its crime-prediction algorithms, the responsibility for and quality of this technology should grow with it as well.
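To make the imbalance concrete, here is a minimal sketch with entirely made-up numbers (it does not describe the evaluation of any real product): when a face detector is tested mostly on light-skinned faces, the headline "overall accuracy" can look excellent even while the miss rate for dark-skinned faces is twenty times higher.

```python
# Hypothetical numbers only: how an imbalanced test set hides group-level failure.
# A detector that misses 1% of light-skinned faces but 20% of dark-skinned faces
# still reports high overall accuracy if the test set is mostly light-skinned.

test_set = {"light-skinned": 900, "dark-skinned": 100}     # imbalanced evaluation data
miss_rate = {"light-skinned": 0.01, "dark-skinned": 0.20}  # assumed per-group error rates

total_faces = sum(test_set.values())
total_missed = sum(test_set[g] * miss_rate[g] for g in test_set)

print(f"overall accuracy: {1 - total_missed / total_faces:.1%}")
for group, n in test_set.items():
    print(f"  {group}: detects {1 - miss_rate[group]:.0%} of {n} test faces")
```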

The Whiteness of AI

The idea of just how white this smart and complicated AI technology is has been fed to us by the media: search stock images for the tag #AI or #ML and you will see a very white-looking model, most likely representing the users of this technology, often next to a robot with a smooth plastic-molded face resembling those same users. According to Cave and Dihal (2020), this opens the door to another issue, the racialization of AI: not only does it prevent the discriminated group from active participation, it also amplifies the dominance of those who have access to the technology (p. 686). On the other hand, I might speculate that personifying AI as a white-skinned humanoid can also serve as a cover for the decisions taken by those who stand behind the algorithm, who actually decide how the data is collected, how it is processed and what is done with it.

Predicting Crime using AI

Crime prediction has a long history with one intention: to prevent crime before it even occurs. But how is it done? We take data from previously crime-active places and make sure there are enough police officers on watch in that area. What is the problem with this, and what can go wrong? In the words of Ishanu Chattopadhyay, a professor at the University of Chicago and lead researcher behind one such algorithm: "The question is: To what degree does the past actually influence the future? And to what degree are the events spontaneous or truly random? … Our ability to predict is limited by that."

Such predictions are trained on previously recorded data, crime reports, arrest records and so on, and on the basis of past crimes and incidents claim to predict the next crime with "90%" certainty. But the idea is flawed at its core: crime is then detected in the areas where police officers happen to be, and Black or Latino residents are arrested for minor offences, creating a feedback loop that reinforces the socioeconomic and discriminatory bias already in the system. In turn, more serious crimes go unnoticed in neighborhoods that the biased input never marked as suspicious. Besides, if the prediction rate is 90%, what about the other 10%? Moreover, there is an important factor for this and other identification systems: an estimated 1.1 billion people around the world lack the means to verify who they are, missing the 'legal identity' necessary to participate in many of the functions of society (Yu, 2006). This excludes them from decisions that only concern those who do have access, who do give consent to data collection and who willingly share it and thus enter the statistical record. Hence, however much data is fed in as input, if it has gaps, the missing data creates false assumptions: those who were never on the radar will never be detected if we are considering crime, and those whose voices ask for help will not be heard if we are considering development.
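A toy simulation makes the feedback loop easy to see. The numbers below are invented and the model is deliberately crude (it is not the Chicago algorithm or any deployed system): two districts have identical true crime rates, but one starts with more recorded incidents; patrols follow the records, only patrolled crime gets recorded, and the initial gap locks itself in.

```python
# Toy feedback-loop sketch (hypothetical numbers, not any real system):
# two districts with IDENTICAL true crime, but district A starts with more
# recorded incidents. Patrols follow the records, and only patrolled crime
# gets recorded, so the initial disparity never corrects itself.

true_crimes_per_year = {"A": 100, "B": 100}   # identical ground truth
recorded = {"A": 20, "B": 10}                 # historically biased records
patrols_total = 100.0

for year in range(1, 6):
    total_recorded = sum(recorded.values())
    new_records = {}
    for district, crimes in true_crimes_per_year.items():
        patrols = patrols_total * recorded[district] / total_recorded
        detection_rate = min(1.0, patrols / 100)   # more patrols, more detections
        new_records[district] = crimes * detection_rate
    for district, n in new_records.items():
        recorded[district] += n
    share_a = recorded["A"] / sum(recorded.values())
    print(f"year {year}: district A holds {share_a:.0%} of all records "
          f"(true share of crime: 50%)")
```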

Tolerance for Error

Usually, ML generalizes from past experience and forms expectations about the future. To learn quickly, AI, and ML in particular, requires large amounts of data before it can arrive at an assumption. Given inputs it has never dealt with before, it can simply produce faulty or inappropriate outputs. Statistical data can also be wrong if not all results are taken into account; the COVID-19 pandemic is a handy example, as the number of unregistered positive patients is impossible to know if they never inform the health authorities of their status. A similar pattern of collecting and processing data appears in the examples from the USAID report (2018): when "women have traditionally faced discrimination in hiring, then an algorithm that scores resumes based on past hiring records will discriminate against women" (p. 39). The next time women apply for jobs, the assumptions might not be made in their favor.
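The USAID example is easy to reproduce in miniature. The sketch below uses invented records and a deliberately simplistic "score by similarity to past hires" rule; it is not any vendor's algorithm, but it shows how a proxy feature that correlates with gender quietly lowers the score of an otherwise identical candidate.

```python
# Minimal sketch (invented data): a "score resumes like past hires" rule
# simply reproduces whatever pattern sits in the historical records.

# Historical records: (years_experience, womens_college_marker, hired)
past_hires = [
    (5, 0, 1), (3, 0, 1), (6, 0, 1), (4, 0, 1),   # hired candidates
    (5, 1, 0), (6, 1, 0), (4, 1, 0), (3, 0, 0),   # rejected candidates
]

def score(candidate, history):
    """Average feature overlap with previously *hired* candidates."""
    hired = [h for h in history if h[2] == 1]
    return sum(
        (candidate[0] == h[0]) + (candidate[1] == h[1])
        for h in hired
    ) / len(hired)

# Two equally experienced applicants; one resume carries the gender-correlated marker.
applicant_a = (5, 0)   # no women's-college marker
applicant_b = (5, 1)   # women's-college marker

print("score A:", score(applicant_a, past_hires))  # higher: matches past hires
print("score B:", score(applicant_b, past_hires))  # lower: past hires lacked the marker
```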

As the same USAID report puts it, "All decision systems make mistakes, and decisions made by machines can be just as fallible as those made by people. Relying on machines to make decisions requires honestly assessing the expected rates at which machine outputs will be incorrect — and whether those rates are acceptable. Automation may sometimes require tolerating more errors in order to reduce costs or achieve greater scale" (p. 35). Those tolerances might not seem like much until they are given a context: crime, development or privacy.
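What an "acceptable" error rate means depends heavily on scale and on how rare the predicted event is. A back-of-envelope sketch with assumed numbers: a system that is 90% accurate in both directions, applied to 100,000 people when only 1% are actual cases, flags far more uninvolved people than real ones.

```python
# Back-of-envelope sketch (assumed numbers): what "90% accurate" means at scale
# when the event being predicted is rare.

population  = 100_000   # people the system scores
true_rate   = 0.01      # 1% are actually involved in the predicted event
sensitivity = 0.90      # 90% of true cases are flagged
specificity = 0.90      # 90% of non-cases are correctly left alone

true_positives  = population * true_rate * sensitivity              # 900
false_positives = population * (1 - true_rate) * (1 - specificity)  # 9,900

flagged = true_positives + false_positives
print(f"flagged: {flagged:,.0f} people, of whom only "
      f"{true_positives / flagged:.0%} are actual cases")
```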

ML and AI in development

Within the development field, AI has been used as a means of strengthening early warning systems for nutrition, conflict, food security and more (p. 18). Here the calculation perhaps makes more sense, as a computer can analyze data over a long period of time and surface an observation a human might never notice: weak signals can be spotted, and the conclusion can then be drawn by a human who has all the information and context. However, even if the original project did not achieve its intended results, or extra data was collected along the way, there are ways to "repurpose" that data rather than waste it, recycling it in another context or with another organization, although that is not easy under the new GDPR rules.
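As an illustration of what "spotting a weak signal" can mean in practice, here is a minimal sketch on an invented monthly food-price index (no real dataset or early-warning product is implied): a value is flagged when it drifts well above its recent rolling average, and the interpretation is left to a human with the full context.

```python
# Illustrative sketch (made-up monthly food-price index): flag a "weak signal"
# when the latest value rises well above its recent rolling average.

from statistics import mean, stdev

prices = [100, 101, 99, 102, 100, 103, 101, 102, 100, 104, 109, 118]  # invented data
window = 6  # months used as the rolling baseline

for month in range(window, len(prices)):
    recent = prices[month - window:month]
    baseline, spread = mean(recent), stdev(recent)
    if prices[month] > baseline + 2 * spread:
        print(f"month {month}: price {prices[month]} is unusually high "
              f"(baseline {baseline:.1f} ± {spread:.1f}) -> early-warning flag")
```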

Cinnamon (2019) draws on examples such as civil registration, digital identity and national data infrastructures, data products built from user-generated data, and personal behavioral data produced through interaction with corporate digital platforms, to draw attention to the diverse ways data is associated with inequality of opportunity and harm, and to the efforts underway to address the data access, representation and control divides that impede development (p. 4). Moreover, there is a risk of growing inequality. If people are not exposed to this technology and do not benefit from it, they end up on the opposite side from those who are aware of it, know how to use it and have access to it: "Many people are excluded from the new world of data and information by language, poverty, lack of education, lack of technology infrastructure, remoteness or prejudice and discrimination" (UN Data Revolution Group, 2014, p. 7). This concerns the inequality of technological opportunity, which is impossible to control evenly across the areas of interest. Non-inclusivity becomes a big issue when the "subjects" do not even fully participate in the discourse.

How can AI be inclusive?

Since this technology has many users, one of the best approaches we can take is to join forces: responsible shared learning, partnerships among governmental organizations, and responsible data practices in line with the new GDPR rules, among others. Improving the quality of data will keep the regulatory framework in balance with what is collected, and learning as much from successes as from the failures, which happen nonetheless, will only benefit all the players in this data endeavor. Using Floridi's (2007) term, uneven or biased representations as digital data are reontologizing our world: they are not only creating or recreating inequalities between groups such as men and women and between places such as the Global North and the Global South, they are fundamentally changing the material world itself (p. 223). Breaking free from these obvious, though perhaps unintentional, discriminatory practices can contribute to the diversity of the teams creating AI technology, which still has a long path ahead before it proves able to deliver genuinely useful and inclusive results.

Conclusion

AI and ML form a developing industry whose success does not depend solely on the amount of data and computing power. The people who shape this tool have to think of it as a means to reach a goal rather than as a medicine, or snake oil, that fits any injury or malady. Since discrimination is a pain point in this discourse, and considering that such projects have long-term consequences, it is important to bridge the existing gaps and ensure that those who are left out now are included in the future. AI is a product of human creation: all the biases, priorities, discrimination and other traits are transmitted directly into code. Many rely on AI and ML nowadays, but the technology is not yet ready to meet everyone's expectations, especially those of people who do not use it or have only limited access to it. Access to such technology can be the first high barrier on the way to benefiting from it and, even more importantly, to contributing to a more realistic data pool drawn from all kinds of potential users.

The hyped promise of AI in the context of development can still become what it was expected to be, but it will require even more effort than has been dedicated to it so far. Given how little this topic has been studied, collaboration between various experts needs to be reinforced, bridging the creators of such technological advancements with those who are meant to benefit from them. Once again, access to information does not depend solely on a person's ability to reach the sources of data or the technology itself. According to Yu (2006), information poverty is shaped by structural inequalities arising from personal, cultural-societal, political and economic factors, the latter being the defining ones (p. 232). The data may remain incomplete, or may never be produced by those who are targeted in the development field, which can lead to biased, incorrect and untrustworthy results. The step decision makers have to take is to assess the limitations of the technology they integrate, in any part of the world, together with its context, access limitations, data collection processes and the repurposing of collateral data in collaboration with other actors. It is time to admit that AI will not feed everyone, even if we ask them their cookie preferences.

 

 

Image credit: 50 of 87,517 face images collected by Google from Flickr for use in the Google FEC face expression classification dataset. Faces are blurred to protect privacy. Visualization by Adam Harvey /

References:

Cave, S., & Dihal, K. (2020). The Whiteness of AI. Philosophy & Technology, 33, 685–703.

Harris, R. W. (2016). How ICT4D research fails the poor. Information Technology for Development, 22(1), 177–192.

USAID (2018). Reflecting the Past, Shaping the Future: Making AI Work for International Development. Washington, DC: USAID.

Yu, L. (2006). Understanding information inequality: Making sense of the literature of the information and digital divides. Journal of Librarianship and Information Science, 38(4), 229–252.

The never-ending quest to predict crime using AI.

 
