Many countries in the Majority World lack clear governance practices on data collection and use, especially when compared to their counterparts in developed countries. Although some developing countries have established offices for National Statistics or similar entities, there have been few large-scale projects to collect and analyze data beyond traditional methods such as census and sample surveys. This has resulted in insufficient governance around data, especially with regard to recent practices of big data collection, analytics, and insights. As big data analytics continue to rise, concerns regarding privacy and discrimination have emerged due to the loss of control over personal data. This article will classify the different forms of data that exist and examine how various frameworks for personal data protection and open data policy govern them. It will also address the potential risks and benefits of these frameworks.
Personal Data Protection Regulations
Personally Identifiable Information (PII): PII encompasses any information that relates to an identified or identifiable living individual. This could be communication records, information about the body, or any other information linked to an identifier. There are three categories of PII:
- Unique Identifiers (UI): An example of this category is Kenya's Huduma Namba, under which a unique identification number is created for every citizen. Biometric records are also sometimes treated as unique identifiers.
- Direct Identifiers (DI): These include information such as an individual's name or home address. Such information enables direct identification; however, it does not guarantee unique identification.
- Indirect Identifiers: These identifiers do not identify an individual on their own but may do so in combination with other information. With the rise of big data and artificial intelligence, the capacity of algorithmic systems to extract and infer indirect identifiers about individuals and groups has increased many times over, and this perhaps poses the greatest privacy risk.
Sensitive Information is a further subset of PII and comprises information whose disclosure or loss may cause substantial harm, embarrassment or inconvenience, for example an individual's HIV/AIDS status, sexual preferences, or passwords. In European law, the General Data Protection Regulation (GDPR) deals with "special categories of data", also known as sensitive data, defined as personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs or trade-union membership, and data concerning health or sex life.
The state of personal data protection in Africa remains poor, with only about 33 countries having some form of legislation that provides data and privacy protection, and not all of these laws are comprehensive. Looking at the evolution of data protection legislation globally, different regions have emerged as the most significant growth areas at different times, as Graham Greenleaf has observed. It is safe to say that at the moment, with 13 countries adopting new privacy laws in the last decade, the African continent is the site of the most significant growth in the adoption of data protection legislation.
Beyond personal information lies information that is freely available in the public domain and is not subject to data protection regulations. This is data voluntarily introduced into the public domain, implying that the individual willingly forgoes their right to data protection and privacy. The issue is that such publicly available information is still personal information and can be combined with other kinds of data.
- Pseudonymised Data: This is data relating to a specific individual in which the identifiers have been replaced with artificial identifiers, or pseudonyms. Pseudonymisation is one way in which data controllers try to leverage personal information without the associated harm to the data subject.
- Anonymised Data: In the case of anonymised data, the identifiers in information relating to specific individuals have been removed to prevent identification. Such data is useful for analysis across large datasets: for example, to study how Indian students are faring in the education system, health performance at the national level, or industry-sector datasets, the best approach is to anonymise, which in principle causes no harm to the data subjects. Once a dataset is truly anonymised, individuals are no longer identifiable and the data falls outside the scope of data protection regulation.
- Aggregate Data: Aggregate datasets are usually governed by open data policies. When data is aggregated, its source is no longer identifiable: aggregate data is information in a statistical form which does not permit the identification of a specific individual without extraordinary effort.
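The distinctions above can be made concrete in code. The sketch below is purely illustrative: the records, field names and key are invented, and the functions show, under those assumptions, how the same record might be pseudonymised (direct identifiers replaced with a keyed hash that only the data controller can reverse), anonymised (identifiers dropped, though indirect identifiers remain), and aggregated (reduced to statistics).

```python
import hashlib
import hmac
from statistics import mean

# Hypothetical records; field names are illustrative, not from any real scheme.
records = [
    {"name": "Amina", "national_id": "KE-001", "district": "Nairobi", "score": 72},
    {"name": "Kwame", "national_id": "GH-002", "district": "Accra", "score": 65},
    {"name": "Chipo", "national_id": "ZM-003", "district": "Lusaka", "score": 80},
]

# Held by the data controller: with this key, re-identification remains possible.
SECRET_KEY = b"held-by-the-data-controller"

def pseudonymise(record):
    """Replace direct identifiers with a keyed hash (a pseudonym)."""
    pseudonym = hmac.new(
        SECRET_KEY, record["national_id"].encode(), hashlib.sha256
    ).hexdigest()[:12]
    return {"pseudonym": pseudonym, "district": record["district"], "score": record["score"]}

def anonymise(record):
    """Drop identifiers entirely; note 'district' is still an indirect identifier."""
    return {"district": record["district"], "score": record["score"]}

def aggregate(rows):
    """Reduce records to statistics from which no individual can be singled out."""
    return {"count": len(rows), "mean_score": mean(r["score"] for r in rows)}

print(pseudonymise(records[0]))
print(anonymise(records[0]))
print(aggregate(records))
```

Note that the pseudonym is deterministic, so the controller can link records over time, while anyone holding the key can reverse the mapping; this is why pseudonymised data generally remains personal data, whereas only the aggregate output escapes identification.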
National open data policies usually govern the disclosure of environmental data and data about other objects and species. It is important to recognise that open data policies and data protection regulations have competing imperatives: while open data protects the right to transparency, data protection protects the right to privacy. Though different in nature, they by and large regulate the same thing. There is a need to optimise between these two imperatives, and it is for this purpose that the classification of information and data above is essential.
Challenges of regulating anonymised data
In a 2008 paper, Arvind Narayanan and Vitaly Shmatikov demonstrated the problems that advances in mathematical and algorithmic techniques have created for data anonymisation. Using the Netflix Prize dataset, they showed how an adversary with only imprecise background information about a particular subscriber could, through cross-correlation with other databases, mathematically identify that subscriber's records if they were present in the dataset. Once datasets have been anonymised, they fall squarely outside the scope of the data protection regime: data protection laws apply only to datasets which contain some personal data about individuals. Narayanan and Shmatikov observe that, more and more, the datasets we deal with are high-dimensional, which gives algorithms greater scope to correlate them with other databases. Paul Ohm echoes these fears in his paper, dramatically titled 'Broken promises of privacy'. Ohm's basic hypothesis is that advances in computer science have exposed the flaws of what he calls the robust anonymization assumption: the idea that anonymization techniques could change data sufficiently to convert personal information into anonymised or aggregated information, and thereby protect the privacy of data subjects. The balance between personal data and open data policies has been upset by the emergence of re-identification techniques which threaten to neutralise the effects of anonymization. The import of this argument is that re-identification techniques allow one to take anonymous and pseudonymous information, which effectively exists in a regulatory vacuum between personal data policies and open data policies, and re-identify the personal information of the individual.
Therefore, information which is not subject to personal data protection laws can be used to bypass the data protection regime. We could respond to this threat in two ways: a) increasing the sophistication of anonymization techniques, or b) reconsidering the scope of data protection law. Imam has insisted that strong anonymization techniques already exist and, if deployed effectively, will minimise the risks. However, the flaws in the k-anonymization technique and the penetrate-and-patch approach have already been highlighted. In Europe, the GDPR has also grappled with the legal definition of anonymous data, and the definitions adopted by the Article 29 Working Party and national supervisory authorities differ significantly. Recital 26 GDPR adopts a risk-based approach to determining whether data is personal, an approach endorsed by the British Information Commissioner's Office (ICO): where identification is 'reasonably likely' to occur, the data is personal data and receives the full spectrum of GDPR protection. The Article 29 Working Party, however, adopts a higher threshold, arguing that anonymised personal data qualifies as non-personal data only when de-identification is irreversible. The Gopalakrishnan Committee in India, on the other hand, appears to be adopting a middle ground between the two approaches.
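The weakness of k-anonymization mentioned above can be illustrated with a toy check. In the sketch below, the dataset and quasi-identifier columns are invented for illustration; the function groups a released dataset by its quasi-identifiers and reports the smallest group size k. Any record in a group of size 1 is unique on those attributes and can be singled out by an adversary holding matching background information, exactly the kind of cross-correlation attack Narayanan and Shmatikov describe.

```python
from collections import Counter

# Invented "anonymised" release: names removed, but quasi-identifiers kept.
released = [
    {"age_band": "30-39", "district": "Nairobi", "diagnosis": "flu"},
    {"age_band": "30-39", "district": "Nairobi", "diagnosis": "asthma"},
    {"age_band": "40-49", "district": "Lusaka",  "diagnosis": "HIV"},
    {"age_band": "20-29", "district": "Accra",   "diagnosis": "flu"},
]

QUASI_IDENTIFIERS = ("age_band", "district")

def k_anonymity(rows, quasi):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    groups = Counter(tuple(row[q] for q in quasi) for row in rows)
    return min(groups.values())

k = k_anonymity(released, QUASI_IDENTIFIERS)
print(k)  # 1: two records are unique on (age_band, district), so they can be linked
```

Even when a dataset satisfies a higher k, high-dimensional data defeats the guarantee in practice: the more attributes released, the more combinations become unique, which is why generalising or suppressing quasi-identifiers (the penetrate-and-patch response) tends to trade utility for only temporary protection.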
Freedom of Information and Open Data
Countries that have adopted freedom of information laws include Sierra Leone, Niger, Tunisia, Angola, Côte d'Ivoire, Ethiopia, Guinea, Liberia, Nigeria, Rwanda, South Africa, Uganda, and Zimbabwe. Legal texts differ from country to country, but many African nations face similar challenges when implementing Freedom of Information (FOI) laws, and technical capacity is often a major obstacle. In Uganda, for instance, the absence of implementing regulations for the access to information law delayed its effective implementation by nearly five years. The government eventually established the regulations with the help of civil society actors both within and outside the country, and has since partnered with civil society to promote the use of ICT in operationalising the Access to Information Act, for example by promoting online information requests through a specialised web portal, www.askyourgov.ug.
Nigeria has made efforts to improve its courts’ ability to apply FOI laws and handle disclosure and exemption disputes. With the help of the Open Society Foundations and the UK’s Department for International Development, the National Judicial Institute (NJI) organized a judicial studies program in May 2014 for 350 judges from 77 Superior Courts of Record. Experts in access to information law and practice, as well as senior judges from other common law countries, taught the program. The program was successful and there are plans to hold similar courses in Sierra Leone and Uganda.
Innovation and adaptation have been necessary in monitoring and overseeing compliance. In Nigeria, the Justice Minister, who is responsible for overseeing compliance with the FOI Act, has established a website (www.foia.justice.gov.ng) through which regular reporting on compliance with disclosure requests is provided. To further facilitate oversight of compliance, Nigeria’s House of Representatives created a new Committee on Reform of Government Institutions which is responsible for overseeing compliance with the law. This committee holds regular hearings and undertakes regular visits to government departments to encourage the establishment of systems for implementing the law effectively.
In the past, numerous African countries had regulations known as "Secrets Acts" which controlled access to information. These laws originated during the colonial period, especially in British colonies, when the government deemed most official information to be a matter of national security and not meant for public knowledge. However, under increasing pressure from donor organizations and an informed population connected through digital platforms, there has been a push to initiate the open data movement, which aims to promote transparency and make government information accessible to the public.
Like other nations, African countries are beginning to open up data in pursuit of transparency and accountability. In some governments, the process of innovation, adoption, resistance, and realignment can be slow and iterative before ultimately resulting in the institutionalization and eventual maturity of Open Government Data. For instance, open data policies in Rwanda, Ethiopia, Nigeria, Uganda and Tanzania went through years of iterations and changes before getting close to finalisation. There is significant diversity among African governments in their approach to open data, with some showing genuine political will to make government datasets available not only to increase transparency but also to achieve economic impact, advance social equity, and stimulate innovation. Morocco, for example, was an early mover, launching the very first African open data platform in 2011, but progress has been slow: another seven years passed before freedom of information legislation was enacted, and the government has struggled to secure broader buy-in for its central data portal. A longer-term and more politically sensitive debate is necessary to explore the philosophical and ethical aspects of access to private data that can be used for the public good. Should data generated by or about users who lack basic resources in the world's poorest countries be locked up in the data centres of Silicon Valley corporations that claim full and exclusive ownership and deny access to national governments or NGOs seeking to use the data for socio-economic development and upliftment? Interestingly, this debate has already played out in the public sphere, when the media questioned why big data from call detail records (CDRs) had not been used to track the origin and spread of Ebola. This demonstrated how widely such data had come to be seen as a potential magic bullet for emergencies and epidemics, which is a rather unfair characterization.
Property, Ownership and Data
The Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS Agreement) requires that databases and compilations be protected by copyright if they constitute an intellectual creation by reason of their selection or arrangement, regardless of whether some or all of their contents are themselves protected by copyright. Many countries therefore protect a database that meets this condition, but offer no separate intellectual property right for databases, or elements of databases, that do not. A distinct database right does, however, exist in certain jurisdictions, most notably the European Union.
A database can be described as “a collection of independent works, data, or other materials, which are arranged in a systematic or methodical way and are individually accessible by electronic or other means.” This broad definition includes traditional mailing lists, customer lists, telephone directories, encyclopedias, and card indexes, whether they are held in electronic or paper form. However, it is important to distinguish between a database and its constituent elements. A database right protects the collection of data, not its individual elements. These elements may or may not be protected on their own, separate from any protection afforded to the database as a whole.
The concept of 'data ownership' has considerable intuitive power. The economic theory of the endowment effect describes how the owner of an object (in this case, our personal data) assigns it greater value than a non-owner would. The basis for this theory is an evolutionary response to surviving in competitive environments "in order to provide a strategic advantage in confrontation with others seeking to appropriate [the object in our possession]". In the case of data, the value attached to one's data lies in the benefits of privacy and financial gain. Unlike other commodities, where ownership is often synonymous with control because the commodity can be physically possessed or its rightful owner enumerated in legal documentation, data cannot be possessed by just one person. Because data is created jointly, through the data subject's interaction with an interface created by a data holder, the answer to the question of who is rightfully entitled to control personal data is nuanced. The exploitative nature of mining personal data creates an imbalance between the benefits accrued by those whose data is utilized for financial gain and those monetizing access to personal data. There has been an upswell of discontent, particularly in the Global South, with several commentators claiming that excessive focus on consent has skewed the discussion in favour of US-based technology corporations which reap monetary dividends from data gathered from Global South citizens, leading to accusations of 'data colonialism'. The idea that individuals should receive fair compensation for the use of their personal data has received significant support over the last decade from a range of commentators. Given that data about individuals has become a commercial asset for data processing organisations, it has been argued that data subjects must be given an instrument that would enable them to negotiate and bargain over the use of their data.
It is also worth noting that although academic discourse suggests that legal frameworks do not favour the propertisation of data, business practices, particularly those dealing with digitally available personal data, suggest otherwise. Data is very much a commodity to be traded, and the valuations of early-stage companies are often linked to the scale and nature of the data they control. Existing laws also permit corporations to contractually claim ownership over data by virtue of their participation in creating it.
In reality, data, including personal data, has been treated like a commodity for some time now. Once personal data becomes a commodity, questions arise about what legal limits, if any, should be placed on data trade. There is also widespread recognition of the market failure inherent in the commodification of data, marked by systemic incentives towards trade in data despite large negative externalities in the form of privacy harms to the data principal. Next, let us consider the nature of the right to privacy, the most obvious legal complication that we must contend with before embarking on any discussion about interests in data. Information privacy entails that the use, transfer, and processing of personal data must occur only with the informed consent of the individual.
Conceptually, one key thing to remember about any kind of property interest in data is that it necessarily means that privacy as a 'value' is owned and, like any other piece of property, can be bartered. The clear implication of vesting property rights in personal data would be that privacy is an alienable right. There have been several definitions of the right to privacy, but perhaps a useful one for our purposes is based on the idea of privacy as individual control: our right to control access to and uses of physical places or locations, as well as personal data about us. It is important to remember, however, that when this right is exercised to relinquish control, for example by sharing some information, this does not amount to a waiver, relinquishment or forfeiture of the right itself or of future claims to control the same data.
In a scenario where corporations could pay individuals for their data, those in lower-income groups would be more willing to trade away their rights than the well-off. This is fundamentally incompatible with the notion of an inherently inalienable right. Further, when trading one's data, how is its value, or the cost to the individual, to be determined? One cost to the user, and a definite benefit to the private platform, arises through the aggregation of data: data is far more valuable when aggregated, which makes it impossible to accurately compute the precise value of an individual's data. Worse, data provided by an individual can be aggregated and used to conduct predatory practices against the group to which the individual belongs. Opponents of the dictum that privacy is an inherent, inalienable right point to the cultural relativism of privacy, arguing that its nature, facets and scope vary with cultural context, and that privacy, by its very nature, allows for deviations in order to sustain social establishments and group values. Yet while the exact manner in which privacy manifests itself may be culturally influenced, the very need for privacy is not. As noted above, the right to control access to and uses of physical places or locations, as well as personal data about us, is essential to human dignity.
Thus, we see data as a highly contested resource, with multiple legal theories and regulatory domains competing to govern it. This contestation reflects, on the one hand, business interest in the monetary benefits of data coupled with narratives that data can lead to social good and, on the other, the impacts that data-driven decision-making can have on people's privacy, as well as its real and imagined exclusionary and discriminatory outcomes. In Africa, there has been considerable interest in the datafication of different aspects of life, including initiatives on e-government services, AI, digital identity, open data, Fintech, Edutech, AgriTech and HealthTech. While the positivistic literature on the data revolution finds it essential for improving development delivery, critical data studies stress the threats of datafication. This article provides only a snapshot of the state of data governance practices and of the different points of view which inform how we look at, use, study and govern data. Through the course of this project, we will look at data governance practices in Sub-Saharan Africa with a focus on Côte d'Ivoire, Ghana, Kenya and Zambia and attempt to arrive at a normative understanding of what Afro-feminist data governance could look like.