
Big data and political education

Katharina Zweig

About the person

Dr. Katharina Zweig is Professor of Socioinformatics at the Technical University of Kaiserslautern. Thanks to her interdisciplinary training in biochemistry, bioinformatics and statistical physics, she is, by her own account, interested in statistically significant patterns in complex networks. Her particular research focus is the development of a principle-based network analysis and the question of how computers can support people in solving complex problems.

Contact: [email protected]

Our living environments are increasingly designed on the basis of algorithms. Algorithms determine which online messages we receive, which product advertising reaches us or whether we get a loan - all of this without us having any insight into the underlying decision-making processes.

The increasing datafication poses new challenges for computer science, but also for the social sciences and media education. Claudia Mikat from the magazine "tv diskurs" spoke with Dr. Katharina Zweig, Professor of Socioinformatics at the Technical University of Kaiserslautern.



Claudia Mikat: You founded the "Socioinformatics" degree program and are primarily concerned with the effects of computer science on society. What is your particular research interest?

Dr. Katharina Zweig: It seems that only in the last few months have I understood what my real research interest is. In the beginning I used mathematical methods to interpret my data. As biochemists, we had little mathematical and algorithmic training, and we had to rely on someone explaining to us when we can use which method to interpret our data. Then I became a computer scientist myself and started developing algorithms that condense a certain property of people or organizations into a number. In social networks, for example, there is the so-called centrality index, a number that is assigned to each person and interpreted as the power, the central position, that this person holds. In the USA there is what is known as the segregation index, which determines the extent to which residential areas are occupied only by people of a certain race. Such measures are used in many disciplines and are sometimes misinterpreted. Understanding when to use which centrality indices and what conclusions can be drawn from them has taken me the last ten years.
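To make this concrete: different centrality indices can single out entirely different people in the same network. The following minimal Python sketch (the seven-person network and all names are invented for illustration, not taken from the interview) uses the networkx library to compare two common indices:

```python
import networkx as nx

# Two tightly knit groups of three, linked only through Gina.
G = nx.Graph([
    ("Anna", "Ben"), ("Ben", "Cara"), ("Cara", "Anna"),   # group 1
    ("Dan", "Eva"), ("Eva", "Finn"), ("Finn", "Dan"),     # group 2
    ("Anna", "Gina"), ("Gina", "Dan"),                    # the bridge
])

# One notion of centrality: who has the most direct contacts?
# Highest here: Anna (tied with Dan).
print(max(nx.degree_centrality(G).items(), key=lambda kv: kv[1]))

# Another notion: who lies on the most shortest paths between others?
# Highest here: Gina, because she bridges the two groups.
print(max(nx.betweenness_centrality(G).items(), key=lambda kv: kv[1]))
```

Both numbers are legitimate "centrality indices", yet they crown different people, which is why choosing a measure is itself a modeling decision that must fit the question being asked.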

Let's talk about big data and big data analytics, the increasing collection and analysis of data. How would you explain to non-computer scientists what is happening?

For the first time, digital systems give us the opportunity to observe activities and record information explicitly. Even in the old corner shop, the owner knew what we like to buy and when we buy it. From this he could draw conclusions and perhaps even order a product he thought would suit a customer well. Nowadays we no longer need to rely on intuition; we can collect far more information while someone is shopping - of course only if they shop on the Internet or are kind enough to use a Payback card. One aspect of big data is that data is initially collected without a purpose, and only afterwards does one try to determine whether the information helps predict which products someone will buy or which items they are interested in. In the 1990s and early 2000s, it was believed that if we protected our data, everything would be fine - but that is only part of the story. Even data that is completely harmless in and of itself offers a lot of potential to infer something about us once it is linked together.

What information can be obtained from data that are insignificant in their own right?

A study in the US has shown that some people's sexual orientation can be read from Facebook data, even if that person has not disclosed it. This works via friendships with people who are more generous with this information. Our shopping behavior reveals which time zone we live in and whether or not we are impulse buyers. You can tell pretty quickly how much money we are likely to have, how old we are, how many children we have, whether we are pregnant or recently divorced. All of this information is relatively easy to extract from shopping and media usage data.

What conclusions can be drawn from media use?

For example, if a Netflix customer borrows Bob the Builder and Sex and the City, behind the customer ID there is most likely a mother who is borrowing films for her child and herself. There are films in the United States that are very Christian and that a certain group of evangelical Christians watch a lot. Then there are films that suggest a political inclination, that are more liberal in nature, or films that deal with the coming out of young people. We often underestimate how uniformly people behave within their individual interests: someone interested in Star Wars acts much like almost everyone else interested in Star Wars. This individual information can then be linked with other data.

The upcoming GMK forum - "Software takes command" - will critically deal with the fact that our living environments are increasingly being designed on the basis of algorithms. How do you assess the current development? Can algorithms gain power over us?

Algorithms become interesting when they are used to make decisions or to prepare them. In the future, algorithms will definitely be used to make decisions about the lives of others. This is not entirely new: the Schufa also uses an algorithm to calculate my creditworthiness, and whether this calculation is done by humans or by a computer is not really that important. There is a great chance that algorithms will be more objective than people, but there is also the risk that they are poorly made and many people are treated unfairly. In the USA, for example, there are algorithms designed to help judges set sentences. These decision-support systems calculate a score for an individual's risk of recidivism. For a criminal who stole a handbag for the third time, all possible parameters are fed into this algorithm, which then signals "green", "yellow" or "red" - depending on how prone to recidivism it judges the person to be. I took a look at the mathematics behind it and the algorithms that lead to these scores - in my opinion this is done dangerously badly, but from the outside it looks very objective.

Can you explain the risks in more detail? At the medien impuls event on artificial intelligence in May 2016 you said: "The fact that algorithms never miscalculate does not mean that they are always right." What do you mean by that?

What matters is which variables are included in the decision. In Germany we have decided that immutable characteristics are not to be included and that we may not be judged according to our gender or age. The example from the USA shows discrimination against citizens with African ancestry: for defendants from this population group, the algorithm suggests significantly more often that they are likely to reoffend. This is interesting because, of course, in the USA too, one cannot feed the race recorded in the documents into an algorithm as a variable - in Germany such a classification of people would not be permitted and would not appear in any document in the first place. In any case, there are other variables that correlate so strongly with it, such as school education, social status or single parenthood, that this immutable characteristic is implicitly included. The example shows that when socially important decisions are made, there must be a certain degree of transparency about which kinds of data are used.
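How a characteristic that never appears as an input can still drive the result is easy to demonstrate. The following Python sketch uses invented toy data (not from the interview): the scoring rule sees only a person's district, yet its output differs systematically between the groups because district and group are correlated:

```python
# A toy illustration (invented data) of how a protected attribute can
# re-enter a decision through a correlated proxy variable.
# The rule below never sees "group"; it only sees "district".

people = [
    {"district": "north", "group": "A"},
    {"district": "north", "group": "A"},
    {"district": "north", "group": "B"},
    {"district": "south", "group": "B"},
    {"district": "south", "group": "B"},
    {"district": "south", "group": "A"},
]

def risk_score(person):
    # Hypothetical learned rule: the south district gets a higher score
    # (say, because of weaker average school education there).
    return 0.8 if person["district"] == "south" else 0.3

# Because district and group are correlated in the data, the formally
# "group-blind" rule still scores the two groups differently on average.
for group in ("A", "B"):
    members = [p for p in people if p["group"] == group]
    avg = sum(risk_score(p) for p in members) / len(members)
    print(f"group {group}: average score {avg:.2f}")
# -> group A: average score 0.47
# -> group B: average score 0.63
```

The rule is formally blind to the protected attribute, but the proxy carries it in through the back door.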

So should we as a society be particularly careful in this area of application?

Yes, I think so. In my experience, people either completely distrust computers or trust them completely. There are many systems that support us in our decisions, such as the autopilot or the parking assistant; here we are happy to hand over responsibility to the machine. In the case of recidivism scores and the US judges, it is much more complex. The machine calculates that a person has a 70% risk of reoffending, and people rely on it. One problem is that, for himself, a judge makes the smaller mistake if he follows the machine's suggestion and puts a person rated "red" in jail, instead of listening to a gut feeling that may point the other way. Nevertheless, "red" only means that 70% of the people so rated will reoffend; 30% of them would not have. When decisions about human lives are being made, the use of algorithmic decision-makers must in any case be democratically legitimized.
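The arithmetic behind that caveat is worth spelling out. A back-of-the-envelope sketch, using the 70% figure from the interview applied to an invented cohort size:

```python
# Base-rate illustration: the 70% figure comes from the interview,
# the cohort size of 1,000 is invented for the example.
flagged_red = 1000        # defendants the system labels "red" (high risk)
relapse_risk = 0.70       # the score behind the label

will_reoffend = round(flagged_red * relapse_risk)     # about 700 people
will_not_reoffend = flagged_red - will_reoffend       # about 300 people

print(f"Of {flagged_red} defendants labeled 'red',")
print(f"about {will_reoffend} would reoffend,")
print(f"but about {will_not_reoffend} would not -")
print("yet the label, and a judge who follows it, treats them all alike.")
```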

Making algorithmic decision-making processes comprehensible is the central concern of the Algorithm Watch initiative, which you helped to establish. What exactly is meant by this?

At the moment there is too much hysteria on the one hand, while on the other hand systems are already in use that are highly questionable. At Algorithm Watch, we want to educate, network interested people, examine algorithms and support the process towards democratic control. Algorithm Watch does not demand that code be disclosed or that each and every one of us understand why a lending algorithm made this or that decision. I don't have to understand how a car engine works either; I trust the TÜV to test products properly. Not everyone should have to decide about algorithms. But I would like us as a society to agree on which algorithms we consider worth checking and in need of checking, and to set up the appropriate quality processes that ensure algorithms make sensible decisions. China, for example, wants to introduce a score that decides whether you are a good citizen. Algorithm Watch is the answer to the fact that these things have to be negotiated democratically and also placed under a democratically legitimized supervisory authority.

What specific demands would you make of politicians, from the perspective of Algorithm Watch, so that such an institution could work effectively?

We need to know where algorithms are used to make decisions, especially in the state sector. How exactly is the decision made as to whether or not someone gets a visa, and what kind of data is used for this? We would have to discuss which algorithms we consider so socially relevant that they can have an impact on things we have long established as social goals. And we have to talk about what it is worth to us to have these systems. Because, as I said, there is also the chance that everything will become more objective. So far, the judge has some kind of decision-making process in his head, but we cannot really look into it. With the machines, we would have the chance to understand the decision better.

Let's stick with the opportunities offered by systems based on algorithms: What applications do you have in mind here?

School algorithms, which could ensure that each of us achieves individual learning success, of course also offer a huge opportunity to truly live inclusion. A child could stay in the same class and have the same people around throughout the entire school year, because we could simply make sure that each student gets their own individual program on the screen in front of them. If this is done well and properly accompanied by teachers, it could be very helpful. But it would also be possible for the machines to help us form more homogeneous learning groups from the outset so that we can teach more efficiently. Integration could of course also be improved with the help of digital algorithms. When I think of the many children who have fled to us and do not speak our language - if systems already existed to help them translate something into their mother tongue during class, that would offer many opportunities for inclusion and integration.

What special opportunities are there in current datafication when you think about collecting large amounts of data or about the Internet of Things?

I would not speak of datafication but of the age of logging. Logging is the process by which digital devices store measurements. In addition, things that were previously not considered measurable can now actually be recorded in numbers. We have long understood that you can measure your heartbeat or your blood pressure, but for the first time it is possible to do this all the time and to keep the data as well. That is what is actually new; the measurement itself is not. Artificial intelligence is basically very good at discovering patterns when we have enough data points. A doctor who sees around 30,000 different patients in a lifetime cannot learn much about rare diseases that fewer than 1,000 people have. We need artificial intelligence algorithms to discover such patterns and suggest the appropriate treatment. In all these situations in which a single person rarely has the opportunity to gain insight at all, these algorithms are very good and very helpful.

Algorithms are abstract and can be transferred to various applications. Does this mean that algorithm developers are in no way responsible for the results they produce?

In the end, a subset of algorithms is as general as a medical test that determines whether one treatment or another is better. Such a test can equally be applied to two automobile products or to the question of whether the Japanese live longer than Germans because they eat differently. With these general pattern-finding algorithms, I as a developer cannot predict what they will be used for - there the responsibility is smaller. It is different with the development of key figures. If we map complex situations onto a single number, we make mistakes; we all know that. When schoolchildren come home with a grade at the end of the school year, we likewise feel that it does not really characterize our child, and it is similar with these metrics. As developers of algorithms and measures, we need to communicate better which auxiliary assumptions go into our number.

What should one convey to computer scientists who break down complex issues into a number? Should ethical questions and assessment criteria for algorithms play a greater role in the training of computer scientists?

This is exactly what I find missing in the training of computer scientists. When I ask my students to develop a program that assesses who is really central in a social network, they start on the task immediately, and we end up with 18 different ideas. None of the students is surprised that we all have such different ideas; nobody asks what the intention is or what I mean by "centrality" in the first place. This awareness that a modeling decision is being made here, about how to compress human behavior into a number, is missing in many curricula. Too little emphasis is placed on the fact that algorithms producing numbers from which decisions about human lives are later made require a special duty of care in modeling, and a special ethics.

A socially and democratically responsible handling of big data is also a central challenge for media education. In which areas should media education and computer science cooperate more closely in the future? And what contribution can and must computer science make from your point of view?

As computer scientists, we can contribute a lot to the subject of data protection and teach students why you have to take care of your data and what happens to it. The data pilots in Rhineland-Palatinate are already doing that. However, this new aspect of algorithms is about more than just data protection. In my opinion, algorithm literacy would include starting with psychology in elementary school, so that children understand what makes people tick and what distinguishes people from computers. Man is not a purely rational being; we sometimes make decisions that are not good for us. There is a lot of knowledge about how addictions or mass phenomena work, and it would be important to make this more understandable to children. Why, for example, can computer games be addictive, and what are the consequences of excessive, passive television viewing? For me, media literacy starts with "Gnothi seauton": "Know yourself!" We as computer scientists are then called upon to explain to what extent a computer thinks differently than we do. And of course we also have to make it clear that behind a computer there is always a person who has thought about how the computer should do something. People think in social contexts: when my best friend confides a secret in me, I know I must not pass it on. A computer can cross these social boundaries without any problem. We are often not fully aware of this.

In your opinion, what knowledge and skills will be needed in the future so that people can continue to act and make decisions independently?

I believe that none of our educational programs yet adequately conveys what complex systems are: systems in which things interact in such a complicated way that even a small disturbance can change something in the entire system. We saw that in the global financial crisis, and we also see it in the area of big data. The most important thing is that we think about our understanding of democracy and about what it means to be a free society. It must be clear to every citizen that it is thanks to our democratic system that we can live together safely and largely peacefully today. We are at a crossroads where we could lose these things through the various developments.

Children and young people are less interested in data protection, but want to participate and use the various applications. What do you advise them to do when using the Internet?

My most urgent advice is: don't just use it passively. Writing your own blog or running a YouTube channel is a great thing. Also: think about whether you really want to follow a link. If you move the mouse over the link, the browser shows where it leads. Some website addresses sound so strange that there is probably advertising behind them. My third piece of advice: take a look at the legal notice before you adopt opinions from websites. If a report claiming that cell phones cause brain cancer comes from a natural medicine company that sells drugs to cancer patients, you should be skeptical. It is always advisable to ask: Who is actually paying for this? Who has what interest in offering me this service free of charge? Wikipedia is free because there is a foundation behind it and people are willing to donate to it, so that seems okay. But this mobile game, why is it free? Aha, there are in-app purchases.

And one final piece of advice: don't get carried away! Don't write anything online that you wouldn't say to someone's face. Don't upload photos of yourself and your friends; just leave it!

In the professional sphere, we upload pictures of ourselves too. Are you not worried that a drinks machine, for example, will soon be addressing you by name?

Yes, that is again a question of the social context. If the drinks machine says: "Oh, Ms. Zweig, how nice that you are here for the third time today", I find that inappropriate. When I share information about myself on a dating website for a certain social context, namely because I am looking for a suitable partner for life, I want to be able to rely on this data not being used in a different social context. That is why there should be rules obliging data scientists to respect the social context. We must at least establish these rules firmly as etiquette, as professional ethics - and teach them to computers. A criminal will always disregard them, but as customers and as citizens we can insist that, in principle, publicly available data does not migrate to other social spheres. To say: "It is your own fault, you uploaded your picture to XING" cannot be the solution. And it doesn't have to be.

What developments do we have to prepare for in the next few years? And which prevail for you personally - the opportunities or the risks?

I believe we have to set the course now. I am looking forward to a lot of things, such as driverless cars. We will be able to count on having personalized access at any time to information that may be displayed only to us - keyword "Google Glass". We will be able to learn a great deal about ourselves by constantly measuring ourselves. If a diabetic knows that his blood sugar level drops dramatically whenever a particular teacher enters the classroom, there are many ways to respond. But there will also be side effects. Since the upheaval is happening so dramatically and quickly, the question for me is: how can we accelerate the social processes that ensure the use and possibilities of the new technology are integrated into society's overall goals? It will take us a while to negotiate what we want, what needs to be banned or controlled, and what we simply do not find right. These constant negotiations that we conduct as a democratic and free society are nothing new - we are actually back in waters that we know very well.



Reprinted with the kind permission of tv diskurs.

Source: Zweig, Katharina (2016): "The fact that an algorithm does not miscalculate does not mean that it is always right!" Claudia Mikat in conversation with Katharina Zweig, in: tv diskurs, 20 (4), pp. 12-17.
