IGF 2024 - Day 4 - Workshop Room 1 - WS77 The construction of collective memory on the Internet

The following are the outputs of the captioning taken during an IGF intervention. Although it is largely accurate, in some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid, but should not be treated as an authoritative record.

***

 

>> BIANCA CORREA: Hello.  Good morning, everyone.  Welcome to the workshop, "The Construction of Collective Memory on the Internet."  As the IGF draws to a close, I believe this has been an intense and productive week for debate on Internet Governance.  You must be tired.  But we have a very interesting discussion ahead that's sure to energize and inspire you.

I would like to introduce myself.  My name is Bianca Correa.  I'm a board member of the Brazil Internet Steering Committee, and I hold a Ph.D. in law and technology, and I would like to thank the audience, both online and in person here in Riyadh.

A special thanks to the expert panelists who have kindly agreed to share their ideas and thoughts on this topic today.

The workshop titled "The Construction of Collective Memory on the Internet" will last for 90 minutes.  To make the most of our time, we will follow this discussion format:  Each speaker will have 10 minutes to present their ideas.  After that, we will move to a question-and-answer session, prioritizing interaction with both the in-person and online audiences.  Finally, the panelists will deliver their closing remarks.

So, let's get started.  Memory is a vast and complex topic.  It becomes even more complex when we think about the relationship between memory and the Internet: in preserving memory, promoting social memory and constructing memory itself.

This workshop aims to foster a debate on the challenges of preserving memory in the digital environment.  It seeks to explore how the Internet and digital technologies can serve as tools for preserving, promoting and constructing online memory, especially in a context where much of our cultural, social and political processes are mediated by and even originate on the Internet.

Memory preservation on the Internet means tackling issues such as preserving information, countering disinformation, protecting the right to information, promoting cultural heritage, preserving multilingualism and more.  Everything is on the Internet.  But is everything on the Internet?  Feeling frustrated at not being able to find information online seems to be becoming more and more common, whether it is a news page, a blog, a tweet, et cetera.

Content on the Internet can disappear for different reasons.  Online materials can be deleted.  Vanishing information is a reality.  A study conducted by the U.S.-based think tank Pew Research Center suggested that a quarter of all web pages that existed at one point between 2013 and 2023 are no longer accessible as of October 2023.  In most cases, this is because an individual page was deleted or removed on an otherwise functional website.

For older content, this trend is even starker.  Some 38% of web pages that existed in 2013 are not available today, compared to 8% of pages that existed in 2023.

So, 23% of news web pages contain at least one broken link, as do 21% of web pages from government sites.

News sites with a high level of site traffic and those with less are about equally likely to contain broken links.  Local-level government web pages, those that belong to city governments, are especially likely to have broken links.

So, given this context, this workshop aims to address some questions.  What are the challenges brought by the Internet and the digital platforms to the preservation of collective memory?  How do these new challenges relate to the promotion of information integrity, the protection of the right to information, the promotion of underrepresented cultural heritage and other issues traditionally debated in the Internet Governance field?

We will start the discussion with Marielza Oliveira.  She is the Chair of the Advisory Board of the E-government Institute at the United Nations University and a former director of the UNESCO Communication and Information Sector's Division for Digital Inclusion, Policies and Transformation, where she led support to member states to strengthen capacities for access to information, digital inclusion, digital transformation and protection of documentary heritage.

Before I call Marielza, I would like to introduce our moderator.  Unfortunately, I won't be able to be here for the whole panel due to conflicting agendas with other panels of the IGF.  But I would like to introduce a very important person for us who will be moderating this panel on my behalf.  He is (?).  He holds a Master's and a Ph.D. in communications from the (No English translation) University of São Paulo.  He's the manager of the steering committee advisory team.  He coordinated the centre for studies of information and communication technologies the (?).br, the UNESCO Regional Center for Studies on the Development of the Information Society, and the Brazilian School on Internet Governance, the BGI.

And, last but not least, I would like to mention by name Jean Carlos Ferreira dos Santos and (?), without whom this panel would not exist.  Thank you so much for your hard work on this topic.

So, Marielza, thank you so much for being with us, our dear friend.  So, the floor is yours.

>> MARIELZA OLIVEIRA: Thank you very much, Bianca.  Can you all hear me well?  I hope so.

>> BIANCA CORREA: Yeah, we do.

>> MARIELZA OLIVEIRA: Okay.  Great.  Thank you.  And it's so nice to see you again.  And it's nice to be with the CGI colleagues and all the colleagues around the room and on the Internet that are participating and watching this panel.

I think that this is one of the most absolutely relevant topics that we could be discussing because the Internet is really changing the way we think about, you know, and record and recall our own memories.  It's changing it completely.  And it has done that from the very beginning.  When we started accumulating some information online and when the first browsers came around, we stopped really thinking about memorizing things because we could always find it on the Internet.  It was just about, you know, oh you can Google it.  Literally, you know, the browsers became our collective memory of what was happening, except the browsers and the Internet itself don't have everything that we have in our own minds.

We digitize very selectively, but we have been less selective over time.  And the Internet actually changed the way that we actually record things.  And artificial intelligence made a huge change in the process as well.

The first step that we had was essentially we put things online.  We created content, digital content online and we digitized material but now we went beyond digitizing to datafying so we could start searching and using content in a different way than we did before.

In the beginning, because there's a huge gap, a lot of disparity on the Internet, in terms of who has compute capacities and who has the skill sets and who actually can access the Internet.  In the beginning, it was even worse.  You know, nowadays we have 70% of humanity online already.  It's 5.6 billion, if I am not mistaken out of the 8 billion that exist.

In the beginning, we had quite a few less people online, and then, therefore, the content that was online was essentially the content that came from northern countries, from, you know, the U.S., from Europe, essentially.  And with a lot less content having been recorded by all the countries that had less computer capacity, less access to the Internet and so on.

So, we end up with, for example, nowadays, 46% of the content that we have on the Internet is actually in English, you know, and very little content is in other languages.

We have 7061 languages and, you know, in existence in the world, in use in the world, and, actually, less than 300 of those are in use online.  And, of course, you know, we are seeing quite a lot of effort to increase that number of languages that are active, that we can actually translate from one language to another.  But still, you know, we see quite -- the vast majority of content that the Internet has memorized in its 15.3 million websites, it's essentially, you know, from a subset of the countries that are available and that exist.

But we digitize and we digitize with a lot of disparity as well, like I was saying, because, you know, we simply don't -- you know, not all countries have the capacity.  But also, you know, the digitization process itself is a costly process.  You know, and in the beginning, we were using technologies that are nowadays quite obsolete already.

So, for example, I don't know about you guys, but raise your hand if you have CDs, if you have CDs.  Do you have CDs?  I have 400 CDs.  And no CD player anymore.

It used to be that computers came with a CD player.  And nowadays, if you ask for one, people go, "Why do you need that?"  You know, it's essentially we moved from a technology that existed before, that no longer exists.  And a lot of the storage that this technology had, the capacity to store, was left behind in a lot of the archives that were digitized already, were lost, were lost, you know, because this is no longer an accessible format for most computers.

And just like that kind of format became obsolete, there are quite a lot of different formats that became obsolete as well from the very beginning.  You know, computers started -- my first computer, I actually -- my first personal computer that I used, you know, it recorded things on tape.  So those are gone.  And so we lost quite a lot of what was memorized, you know, and recorded in this kind of archive.  And that's not the only gap that exists, you know, in terms of storage.

In terms of storage, you know, collectively we store in data centres less than 10% of what we actually produce in terms of information or content.  Now, I'm not going to even call it information, because a lot of it is not necessarily information.  It's content that we put on the Internet.

In 2010 already, you know, about 15 years after the first browser, you know, was made available, in 2010 we had two zettabytes of content online, a zettabyte being 1 trillion gigabytes, essentially.  In 2010, two zettabytes.  In 2020 we had 64 zettabytes online.  And in 2025, five years later, we are expected to have 181 zettabytes of content.

In 10 years we went from two zettabytes to 64.  And now in less than -- in five years, we are going to multiply that by three, you know.  So, the amount of content that we produce with the number of people online is growing at a pace that is incredible.  But storing this content is highly expensive and very selective.

So, what we have online is not necessarily what we have in storage, you know, in terms of data centres, for example.  And those are very expensive, very expensive technologies.  And not only expensive in terms of the creation of, you know, the -- you know, the tech itself, the infrastructure itself that is very costly, but also environmentally costly in terms of water that it drinks, you know, to cool the data centres, the energy that it consumes to power these data centres for them to continue working.

So digitization is a process that is incredibly expensive, so selection of what ends up stored is a process that is, you know, on a continuous basis, making a lot of what we produce being discarded and that discarding is not necessarily done by us.  It's not a process that we select to do.  It's by the organizations and the platforms that we use that end up making that kind of selection, what is worth keeping and what is getting thrown away on a daily -- you know, on a continuous basis, you know.

So, for every byte that we have to store nowadays, another byte has to be thrown away.  How do we select that?  You know, organizations make that choice and we end up not having access to a lot of information.  We have the broken links that were mentioned, you know, in the beginning, by Bianca.  You know, we have a lot of this loss of content that we used to store in the cloud or in different types of systems that end up obsolete and discarded and so on, so forth.

But it's beyond that.  Digitization is this costly process, but datafication is a process as well.  We need to be able to actually search this content and the vast amounts of content that exist, you know, to be able to be searchable, to be accessible.  You know, they have to go beyond, you know, just a record.  They have to be a searchable record.  And being searchable is a complex process as well.  You know, you actually have to datafy, to create an index of this kind of information, this kind of content, so this content can then be accessed in different ways.

And, you know, the process of indexing is very complex as well.  It used to be, you know, for example, that when we scanned text, you know, for example, we scanned a book, that we took a picture of that book.  Essentially, it was a digital Xerox copy of that book.  It's not a searchable mechanism.  You just have this kind of a picture.

Now you actually -- then you started using OCR technology, you know, the optical character recognition technology that actually converted a page, you know, instead of being just a picture, to being -- to reading the text and absorbing it.  But even that, you know, it became at some point hard.  So, you actually had to index it in different ways, finding, you know, keywords, for example, for text.  So who decides those keywords?  Who decides on what basis you access information on the Internet?  It is the kind of thing that when we start looking, we find all kinds of issues with that.

For example, I don't know whether, you know, you are familiar with the ImageNet dataset, which was a dataset created, I think it was Harvard, you know, that created this dataset, you know, and started putting a big set of pictures together, and somebody had to figure out a way of making sure that this data was searchable.  And so they started labeling the pictures.  And the labeling of pictures became an issue that brought in all kinds of biases and discrimination.

For example, you know, it would look at the faces of people that are black or brown or, you know, the faces that are not the typical blond, blue-eyed north and labeled them in different -- you know, in many derogatory ways, hypersexualizing women, for example, women of color or calling, you know, men of color, you know, with, you know, criminality-linked, you know, associations and so on, so forth.

So, that's the kind of thing that ended up happening.  And then when we search for these images, when we try to recall the memories that these images, you know, encode, you know, you end up bringing these biases in as well.

So, you have all kinds of issues with digitization, then you have the process of datafication, and then you actually try to generate, you know, nowadays we use this vast amount of data to generate applications, for example, using -- to generate generative AI, artificial intelligence, large language models, diffusion models and so on.  And those encode these datafication mechanisms that are quite, quite biased, quite disrespectful, actually, of different cultures and are keeping content from cultures that are not necessarily, you know, representative of all the cultures of the world.

So, we end up with generative AI, a set of collective memories on the Internet, and particularly in data centres that are not, you know, the memories that we put in, the content that we put in, and then, you know, we end up with this content that is coming out that is not necessarily -- it's a kind of pasteurized amalgamated average content that, you know, is not the memory of the world, you know, but it's the memory of everyone, and it's not respectful of cultural heritage and cultural provenance.  But this is what we have online.

And of course generative AI, it actually generates content as well.  And the generation of content by generative AI actually creates tremendous issues on memory that we collectively have on the Internet.  First it hallucinates.  You know, it creates information, quote, or content of things that never happened and don't exist.  You know, it doesn't have any links to reality or to facts.  It simply predicts the next image or the next pixel or the next word.  So it predicts those and ends up creating, you know, citations of books that don't exist, pictures of events that never happened and so on.

And, actually, historians, many of those are actually using, you know, the images to illustrate, you know, generating images to illustrate episodes in history that had no photograph, you know, of them happening.  You know, before photography was invented.  So we actually now have pictures that never existed of an event and those pictures are incredibly biased as well.

For example, generative AI, one of the things that's interesting, it just generates on the basis of what exists.  There are quite a few tests, for example, about trying to generate images of black doctors treating white children in hospitals.  It happens every day.  But, you know, generative AI has enormous difficulties, you know, creating this kind of image.  But it has, you know, incredible -- you know, it makes it easy for you to create, for example, images of Indians, you know, in the U.S., you know, wearing traditional clothing and sitting around negotiating treaties with cowboy-dressed white men in the 16th century.

So, it's not accurate, you know, and we end up with these images polluting our environment as well.  So, we have hallucination, which is the unintended creation of fact-free content, what I call the fact-free.

Then you have actually intentional creation of content that is also fact-free.  It's not linked to reality.

And then you have actually a malignant kind of distribution of this, which is misinformation, disinformation, which is actually created with the intention to deceive.  So, we actually spit it all out on the Internet again and we keep polluting our information environment to the point that now, you know, we have on the process of digitization, datafication and usage of this information online.  The biggest skill that we need to have is actually the skill to verify, you know, to say, is this real?  You know, is this true?  And how you do that is becoming more and more difficult, exactly because of the broken links and the disappearing behind paywalls of content that is trustworthy, such as content from media organizations that have to charge for the presentation of this content in order for them to survive, you know, instead of what platforms do in presenting information to us by -- that they monetize through ads and other means such as that.

So, yeah, we live in a completely different world from when you could just Google it, when actually search engines are using generative AI to hallucinate results and offer them to us, including as a first option, so we don't have the memories of humans anymore.  We have content generated by computers being presented to us as, you know, the, quote, collective memory of the world.  We need to be very, very cognizant, you know, very, very -- we need to really understand the impact that this has on everything we do.  You know, the devaluing of science, for example.  You know, if facts can be mixed up with non-facts, you know, with fact-free content such as that, what is the value of the trustworthy organizations that used to generate content for us, you know, science, media, you know, authorities and so on?  They are becoming less trustworthy as well, you know, simply because we cannot differentiate between content that is generated, that is part of our collective memory, that is fact-based, evidence-based content, and, you know, something that is being put on the Internet by, you know, some artificial entity.

So, just some provocation to start, because I think that this is one of the most important topics that we have.  How do we preserve the validity, the reliability of our information environment?  This is the question that we have, you know, for the next few years, is the most important question that we could, you know, be discussing.  Thank you.

>> ONSITE MODERATOR:  Thank you so much, Marielza. 

I'm assuming we should now pass to the next speaker, which is Ricardo Pimenta.  So, the floor is yours.

>> RICARDO PIMENTA: Good morning.  I would like to begin by thanking CGI for the invitation and also the Ministry of Science, Technology and Innovation of Brazil for this honour of representing it.

So, to begin with, let me share a popular Yoruba saying from Brazil's Afro-Brazilian culture that says, Exu killed a bird yesterday with a stone he threw today.  So Exu, as we know, is a figure of movement and transition in Yoruba mythology who bridges the human and the divine, enabling communication and connecting them.

This notion of interconnectedness reminds us that maintaining and developing connections in our digitized world is both our responsibility and a challenge, even when the connection is between past and present.

In fact, in our current digital reality, these connections generate immense data and information, raising pressing questions.  What should be preserved, and how do we distinguish the essential from the superfluous?  The challenge of maintaining collective memory has grown exponentially.

We now face a flood of disorganized and even lost data stored across countless devices, complicating retrieval and comprehension.  For public policy, this issue is particularly urgent, given the unprecedented speed and volume of data production in the past three decades.

So, memory, as highlighted in the (?), isn't just about the past.  It is actively constructed in the present.  Remembering today shapes our understanding of yesterday.  And memory itself is updated and written in real time.

In Brazil, the time has come to think about yesterday's bird.  So memory is a political agenda, not just a cultural one, which should primarily unite public and third sector institutions so that it doesn't end up being driven mostly by the market, leading to what Andreas Huyssen calls a dissemination of memory, whose overexploitation would invite us to collective and even medial forgetfulness (?) of public collective information in the digital age.

We must approach it ethically, curating what is preserved while recognizing that not everything can be saved.  Social platforms like Instagram or Facebook, for example, add complexity, as the content they host belongs to their owners, including disinformation and toxic narratives.  This threatens the representations of our past and present and, meanwhile, Brazil's more than 5.3 million Internet domains contribute every day to the vastness of this challenge.

To tackle this, initiatives by institutions like IBICT, the Brazilian Institute of Information in Science and Technology, where I am a researcher and currently the teaching and research coordinator, provide some valuable examples.

First of all, I can say something about the Tainacan software.  Tainacan is software that digitalized and -- from IPHAN, our National Institute of Historic and Artistic Heritage, and IBRAM, which is our Brazilian Institute of Museums.  So, Tainacan can ensure access to museums and memory institutions.  This is one example developed at IBICT, within the Ministry of Science, Technology and Innovation.

The second is the (?) network.  That is a network that preserves over 700 open-access electronic journals, automating processes like storage and validation.

The third is the Akivo.gov.  It's a kind of pilot project that archived nearly all Brazilian government websites in 2021, with plans for user-driven website collection and preservation, inspired by models like the (No English translation) and the Internet (No English translation), but mostly the experience of (No English translation) is the reference for us.

And last, the Tempora.  Tempora is a digital tool developed in the Digital Humanities Laboratory at IBICT.  It is a platform for archiving and visualizing digital information in the form of a timeline, in which, during the 2022 presidential elections, we stored publications from fact-checking agencies with the intention of creating a timeline of disinformation events and contributing to the memory of that event in the midst of this disinformation fever we are experiencing globally.

So, these efforts showcase potential leadership in preserving Brazilian Internet memory.  But broader challenges remain, particularly regarding who preserves the entirety of Brazil's online presence and how storage limitations are addressed.

The issue recalls the Argentine writer Jorge Luis Borges, who wrote "Funes the Memorious," where the desire to remember everything always leads to paralysis.  Memory thrives in balance between remembering and forgetting, recovery and erasure.  The technological promise to store everything is illusory.  We must curate what defines the memory of the Internet, shaping what is remembered and what is not.

To do this, two challenges stand out.  The first, in my perspective, is about management.  The challenge of memory today is its management, its control.  A scenario where space and time are automized and the volume of information expands entropically invites us to feel this kind of Freudian death drive that pushes us to confront, to innovate and generally to the vibrant creation of means, techniques, strategies, policies and practices capable of making us overcome it one day at a time.

The second could be about governance.  A good one.  A good one that is capable of circumscribing different actors able to decide what to preserve and who makes those decisions.  This isn't just a technical issue, but a political and institutional one requiring ethical collaborative solutions.

Furthermore, if the object we are looking at is the Internet, how will any proposal to preserve its memory be able to progress without thinking about the mechanisms that need to be aligned with the devices, actors and institutions that regulate it?

In my perspective, governance could play a singular role to keep proper access to information and freedom of expression, but major ethical complications.  And transgenerational public and collective memory (?) that are now in different parts of our private and public daily lives.

In closing, I return to the Yoruba saying: the actions we take today to preserve the Internet's memory will determine whether the bird was indeed killed yesterday or not.  So, thank you.

>> ONSITE MODERATOR:  Thank you so much, Ricardo.  I gave you the floor without introducing you, Ricardo, so I'm doing it just now.  Ricardo is currently the coordinator of teaching and research in information science and technology at the Brazilian Institute of Information in Science and Technology and is a permanent professor in the postgraduate programme in information science at the Federal University of Rio de Janeiro, and has been a full researcher at IBICT since February 2013.  Sorry, Ricardo.  And thank you so much for your insightful thoughts.

Then I give the floor to Samik Kharel, a journalist and researcher from Kathmandu, Nepal, with over a decade of experience in reporting on contemporary issues for national and international media.  He has contributed to leading research institutions focused on technology, ethics and human rights.  Kharel has received a fellowship and grants, and he teaches critical thinking at university while exploring electronic music.  Kharel, thank you for your participation.  The floor is yours.

>> SAMIK KHAREL: Hi.  Can you hear me?  Yeah.  So, yeah, thank you very much.  Hello to everyone at the IGF in Riyadh, from myself enjoying a sunny winter afternoon in Kathmandu, at least a couple of minutes back.  Overwhelmed to be part of this esteemed panel.  I would like to thank the CGI for this wonderful opportunity to talk about collective memories and digital realms.  I think it is the collective memories that have actually brought us together, our past activities that are on the Internet.  And so, yeah, although this is a very deep one to dive into, I would like to start very general and narrow my interest towards my own expertise and probably my geography as well.

When I was very young I was given a chalk and a slate by my parents, a formidable technology at that time to write and learn my first alphabets.  It was not very long ago, but it was three decades back.  And I thought it was the most convenient tool, because I could scribble on it, write anything, and if I didn't like it, I could erase it as well, because this tool was very ephemeral, you know.

So, I don't remember what I scribbled then, you know, like not much memories of it except writing a few alphabets and maybe, like, scribbling some Mickey Mouses and Donald Ducks.  But passing this phase I was given a notebook and a pencil.  Now I was told I was to have more structures, you know, like write between the lines, do this, do that, be more disciplined, and only erase the errors.

A little bit later, a few years after the pencil, I was given a pen, a more permanent -- it was an idea, which gave me more permanence, and what I scribbled stayed a little bit longer.  There were no traces of my chalk and slate experiences.  And now, still now, like although I don't find anything else in my basement, in my parents' basement, I still find some scribbles of whatever I did with the pen and pencils, you know.

So, yeah, I mean, that's how I would like to start and how these were my memories and that were kept in boxes in my parents' basement.  And probably many of these contexts you can relate to as a collective memory yourself.

So, we have a tendency to save and retrieve our memories as desired, as memories play a huge role in the construction of our identities.

So, fast forward to my teens.  So, like, we got a computer with a little bit of access to the Internet.  A little bit later there were more restricted (?); I was being watched by my guardians, told to go there, not to go there, probably being logged and having my history checked.  Compared to the more analog past I had, the Internet seemed to make everything present, you know.  Even the past was so well woven with the present that everything now felt like a block.  This is likely because at present our memory function is increasingly organized via media systems, specifically digital media, which have become very integrated.  This integrated media system internalizes the main functions as cultural memory now, and it has become a focal point of the document and system of the past and the present.  For example, like, now I use Google Photos and it gives me memories from seven years back: you were in the ocean, and today you're in the ocean and you are doing well, or, I don't know, yeah, this is how you tag your memories with these tokens.

So, like coming again to (?), what they say, the Internet never forgets but people do, and when people do, then Internet actually rightly reminds you again that you have not forgotten.

So, now, you know, like, with Internet and digital technologies and in particular Internet and web-based information and communication technologies, our memories, our collective memories are formed and shaped during the digital era, while Internet systems have enabled kind of democratizing memory with, you know, like everyone's basic technology and Internet, like, devices can produce their own content, promote it on the web, while, now the big part is many who have been left behind, as even with the lack of basic technologies and infrastructures, are not being able to do so.

One of -- my country and the region and the majority world chronicle these digital divides, which majority -- which majorly also affect vulnerable populations and the marginalized ones in my region and our country.

This region particularly witnesses patriarchy on the Internet, as the majority of discourses are still male dominated, like all the narrative discourses coming from political institutions, parties, universities are still very patriarchal.  That's what I feel.

The same population which actually did not have cameras, books, access to libraries, information, newspapers, access to education, basic healthcare, they are the same population who don't have access to the Internet, which is really sad.  Their memories have never been documented.  Rather, sometimes they have been part of some (?) narratives which have been seen by others and brought out into the world.

While this divide is closing in data with more access to technology, the debates on what we call meaningful, uninterrupted access still linger.  That's where we stand in this region, in Nepal, Bangladesh, Sri Lanka and the rest of South Asia.

So, like where we lag is, like, while social media helps form collective communities, you have games, give people who play games, different interest groups who don't have to be face to face, these vulnerable populations are left out of discourse, they don't know what's happening, where they are, where they stand in this technological world which was supposedly principled under democratization and participation of collective memory making.

So, yeah.  Coming forward, you know, like, the process of creating, storing, managing, removing, manipulating digital data.  Collective memory.  So, like where we stand right now in the digital age, collective memory is often intertwined with the data we generate, from the photos we post online, to the interactions we have in social media.

This raises concerns about who controls our public collective memory, how it is used, whether it is subject to manipulation, and most likely we are very vulnerable when it comes to the government using our data.  And with the lack of very comprehensive data policies, mostly in this part of the world and elsewhere, too, there's a lack of accountability.  While the governments have been proactively using available technologies to collect data from the citizens, there has been less accountability of where this data is being used, where it is being stored, for what purposes, for how long, how it will be used and in what cases it will not be used.

There have been no accountable answers to these.  And there have been several breaches in the cases of data and personal information, personal data.  Example, I would like to give a few examples of the (?) data that was collected but that was breached and used for other things.  The government is yet to realize people's -- the value of people's data and being accountable for it.

Also data being collected for one purpose and being used for another one, like you use for national, you know, like, national demographic population data for something else.  You give it to some marketeers, corporate houses for their own benefits so that is another problem.

Also data being -- also other sensitive cases of data collection and retrieval being procured to other countries because we don't have the expertise to manage our own data, which keeps us in a very vulnerable position, in the absence of a comprehensive data law.

Then, again, there's the trustworthiness of social media platforms, which have been pretty active in most of these countries.  While we are using social media platforms in our day-to-day activities from our information sources to businesses, we tend to use all these big social media as a vital tool for our information, for -- even for our businesses.  But no one questions the trustworthiness of it.

The government has tried to group the social media companies in this part of the world and other places as well, asking them to work in coordination, filter harmful data against national integrity and national interests and also establish a focal communication person so that they can actually be in touch with these companies.  A few companies like TikTok, which was banned in Nepal and some other countries in South Asia, have adhered to the government's proposal, established a focal person, worked with the government on data breaches.  But still, it's in a very nascent phase.

TikTok was banned by the government of Nepal, a ban which has been lifted after they agreed to set up their centres and go accordingly.

Also, the rise of misinformation in the platforms is ever increasing.  Political parties and political wings are using the Internet and social media to change narratives that have been abundant, which is everywhere, especially during crises, such as during elections or some national disasters or the pandemics.

Whitewashing, smear campaigning, conspiracy theories are an area of these enforced collective memories.  At the same time, memories shared publicly on social media have been very crucial during natural disasters and pandemics.  I'm not saying all is bad.  There are good things as well.  In the recent floods, the use of social media and, you know, like, the posts made by citizens actually helped rescue many people as well.

Also coming back as a journalist I need to bring this together.  The best example of consolidated open source, what we say, is Wikipedia, which does not conform to historical recording practices.  However, the Internet as a whole and social media are also great tools for open source.  Now as a journalist reporting with limited resources from this country, not being able to travel everywhere on foot, you know, like, I think open source has been very crucial to my coverage on very sensitive issues.  It gives me multiple perspectives, angles, diverse ideas and approaches to report.  I think it's a marvel for modernized journalism if you know how to use it.

So, yeah, the future, you know, I have been closely following the LLMs and, as Marielza also pointed out, how it's going to herald a new ecosystem for the collective memory.  Is it going to be the future of collective memory is a question.  Particularly generative AI seems to have -- seems to have taken a technological leap with building new infrastructures for memory, while it also enables combination of various diverse and counter memories.

Now LLMs are being used to memorialize, chat with historical figures and philosophers, bringing them from past life.  There is this Silicon Valley (?) of saying, longtermism and, you know, like, yeah, memorializing someone so you can talk to Rousseau, even if he lived, like, I don't know, many hundred years back, so the Rousseau chatbot becomes more dynamic, engaging in public memory with all the interactions with other people.  Quite exciting times, even the saturated discourses are likely to be dynamic again.

While AI could be the future of collective memories, it could be crucial to ensure participation of marginalized communities from the Global South in progress towards inclusiveness and multilingualism and multiculturalism, is what I think, so we cannot be left behind, and our already vulnerable community is getting more vulnerable without the lack of -- with the lack of Internet and connected infrastructures.

So, I would like to end there and I would like to discuss more when given.  Thank you.

>> ONSITE MODERATOR: Thank you so much, Samik.  As we are advancing to the closing of the session, I go straight to Carlos Afonso.  Carlos Afonso has a master's degree in economics from university of Toronto.  Also a cultural studies and social political thought (?) at university.  He has worked in the human development field since the early '70s.  Cofounder of the Association for Progressive Communications, APC, he coordinated the Eco '92 Internet project with APC and the United Nations.  He was a member of the United Nations Working Group on Internet Governance.  He was a Special Advisor of the Internet Governance Forum in 2007.  He was a member of the UNCTAD expert group on ICT and public television.  And he was a member of the UN CSTD Working Group on Enhanced Cooperation.  He was a member of the Multistakeholder Advisory Group of the IGF.  He's cofounder and member of the Brazilian Internet Steering Committee and he's cofounder and chair of the Brazilian chapter of the Internet Society.  And finally he's a director of the Nupef Institute in Rio de Janeiro.  The floor is yours.

>> CARLOS AFONSO: Good morning.  Can you hear me?  Hello.

>> ONSITE MODERATOR: Yes, we hear but with a little bit of noise.  But, yes.

>> CARLOS AFONSO: Let me see if I can switch.

>> ONSITE MODERATOR: I'm sorry.  We are having difficulty listening.

>> CARLOS AFONSO: Yes.  Can you hear me now?

>> ONSITE MODERATOR: Yes, perfect, perfect, great.

>> CARLOS AFONSO: Thank you.  Good morning.  It is still morning there?  No, it's not.  It is 5:00 in the morning here.  You probably are looking at the map that is from Wikipedia, which I posted there.  And the map, as most maps are, is distorted, benefiting the northern hemisphere.  So the northern hemisphere shows much bigger than the southern hemisphere.

But the important thing is, the countries painted green are the countries which have significant Internet archiving services, like the Internet Archive, like many other efforts to archive the Internet.

Countries below the equator, which takes most of South America, and also the Caribbean and Mexico, there is no indexing, no indexing of the Internet in those countries.

When I say there is no indexing, I say there is no significant indexing, which is worth mentioning.  There are experimental ones.  We, only a small institute, are doing a project like that.  But it's too small to figure in the map, no.

In Africa, you have only one country with an important indexing service, Internet indexing, web indexing service, which is Egypt.  And why Egypt?  Because they have the Alexandria library which does Internet archiving.  Wonderful.  But it's only Egypt in the entire Africa.  No.

In the southern hemisphere you have only Australia and New Zealand doing significant Internet archiving.  This is a major challenge for the countries, the so-called Global South.

And we need to address that, because we are losing a lot of information, because as other speakers mentioned, the information on the Internet is anything but eternal.  It disappears.  And many government sites disappear when political issues arise.

And this happened recently in Brazil, several sites almost disappeared.

We are trying in Brazil, there are initiatives, but are not at the scale which could be present in that map.

But there are initiatives trying to do something.  And one of them is our -- is our institute's, which we call the Grauna Project.  Grauna is a bird with a strong, tremendous resistance to environmental challenges, and so on, and is also a figure in a famous cartoon in Brazil, which represented people, impoverished people, in the northeast of Brazil.

So, we use the name Grauna to represent our project of trying to do indexing of the Internet in Brazil.  And it has two components.  One of them is indexing based on the technologies used by the Internet Archive, by Arquivo.pt, which is the major indexer in the Portuguese language but does not index Brazil, indexes only Portugal, and several others which use open source technology and reproducible technology to index the Internet.

And this Grauna Project is a small server, which is a small box which you can carry in your -- with you anywhere, which has a copy of many information systems which are there to be used in remote communities which have poor or no connectivity to the Internet.  So, they have a reproduction of Wikipedia in Portuguese, for instance, in this box.  And several other facilities, information facilities.  So, this is also part of the Grauna Project.

And what we are doing right now is the project in an experimental phase, trying to protect content relevant to the democratic processes, which is a potential target of hate attacks, censorship or political practice, or which eventually cannot be backed up satisfactorily, no.

The Grauna archives store websites selecting and using selected -- using a methodology that prioritizes qualitative interviews and analysis of the political scenario.  It is very experimental in this experimental phase and priority areas are defined, like environment, health, culture, human rights.  But we have 18 areas to index and the challenges we are confronting are quite interesting, and we had to do it to understand why people are not indexing the Internet.  And now we know, it's very difficult.  It's a big, big challenge, no.

We have created several interesting features in the system for archiving, like the ability to belong to a group of users, for example, if a research group wants to have multiple users creating archives for the same project in the system.

Ability to schedule recurring archiving to maintain different versions, display of archiving, date and time which is typical of the major Internet archives, no.

And we have defined to begin, 18 themes from culture to government, racial equality, gender, elections, communication, et cetera, et cetera.

In recent years there have been several cases of removal or alteration of the content of public information, as well as deliberate attacks on web pages.  There are also frequent reports from civil society about greater difficulty in accessing previously available public information.  Despite some relevant experiences in the academic field, for instance, at the federal university of (?) there is an initiative, an indexing initiative, Brazil still lacks permanent projects aimed at archiving the web on a scale compatible with the breadth and reach of the Internet in Brazil.

Disappearance of information in all elections due to poor management or incorrect application of electoral (?) is an issue which has to be considered, no.

And Grauna started in 2018, and we managed to get some funding from the Open Society, the Media Democracy Fund, and others to help us start the project.  We have support from NIC.br with equipment and from the national research network which provides connectivity to our project.

And we conducted about 60 interviews about threatened websites, relevant content, security of their own websites.  And we also had a legal context document prepared by our lawyer regarding archiving of content which may be challenged by the actual owners of these contents.  And this is a challenge that has to be contemplated in these projects.  No.

In 2022 and 2023, we improved the infrastructure to ensure the necessary conditions for the system to run securely.  And part of it is to provide almost realtime backup of the system, which is one major challenge.  The minute the centre fails, you have to have a backup to run immediately, almost immediately.  And this is also a challenge that has to be contemplated.

We initially had 18 themes active with 227 archived sites.  And more than 100 dot-gov dot-BR sites, government sites archived.  That was especially important because there was a political transition in Brazil in which many of these government sites were challenged or disappeared, no.

The scale of indexing is much smaller than the Internet Archive and others.  And this project is specifically aimed at preserving content at risk for several reasons.  It's also an experiment that seeks to (?) in content that is publicly available but often extremely difficult to capture.  There are several reasons.  Use of increasingly complex technologies, frequent changes in the technologies used, huge databases, sites with multiple depth levels, many other challenges.

The possibility of archiving that is not made public is one of the features that we managed to install in the system, which is useful, for example, for the storing of sites that promote disinformation and that we do not want to multiply but can preserve.  There is still controversy, however, about the use of archiving as forensic evidence.

The dialogues in Brazil about preservation of web contents have been happening in sessions, in dialogues and meetings of academics and the other interest groups in Brazil since at least 2019.  And this -- and we are discussing this now here at the IGF, no.

And in the IGF, I understand there is an intersessional network, a policy network dedicated to highlighting best practices for preserving and creating local content.  And that's a major challenge, for instance, for the original idioms, languages, which are a challenge in north region, especially, huh.

We are finishing the first version of the software of the system (?) and it will be available on GitHub for free development, use and application by other organizations and also by the public authorities.

And we are organizing a permanent curation team or committee to preserve more sites and review archiving criteria, which is a big challenge (?) which was mentioned here by, I think, Marielza.  And advance research and public debate on the formats to archive, which have to be compatible with several libraries and other (?).  Advance debate on the authenticity of archives in WARC format so that they can constitute evidence, establish support partnerships to advance the development of the project.

And train people to perform archiving, hire a team to perform more complex or large volume archiving.  Further improvement in usability of the tool, which is already online, by the way.  Keep the system up to date in light of the constant transformation of the web and expand infrastructure to increase processing and storage capacity.

Preserving content in any language is a complex challenge.  Brazil currently has more than 300 indigenous ethnic groups with more than 270 languages, all of which are at risk of disappearing, and with them an entire culture disappears.

Similar challenges occur in other Latin American and other countries, in the Portuguese speaking countries and so on.  How Internet resources can be used to support the preservation and continuity of these languages and cultures is a big challenge.

That's it.  Thank you!  I speak -- I talk too much.  Here is the address of our institute, it is (?).org.BR.  I will put this in the chat.  And the address of the Grauna Project is grauna.org.br.  I will put that there as well.  Thank you.

>> ONSITE MODERATOR: Thank you so much, Carlos Afonso.

We have one question from the online audience and I will ask if someone in the room would like to have a question.  We have a question here.

>> PARTICIPANT: Hi.  I am a researcher based in Germany.  I would like to ask Ricardo, can you hear me?  You mentioned the link between memory or political agenda or collective memory and the elections in Brazil.  Can you give us some examples, elaborate how collective memory has been used and/or has impacted the results of the elections in Brazil?

I also have a question for the journalist from Nepal.  Can you give us some examples of the relations between whitewashing and a collective memory?  And if possible, could you give us examples from Nepal?

And I would like to give -- I'm sorry, but I forgot the name of the first speaker, the only female speaker in the room.  Okay.  Okay.  If you can hear me.  Actually, I am also working on collective memory, and when it comes to the dead people in famines and in natural disasters in the past, I couldn't find the number of, you know, female bodies, just because in the past only males -- for each household, only male dead bodies were counted.  What do you suggest I do, like, you know, doing to count -- to counter this challenge?  And I think that is the question for all the panelists.

There was no data on certain issues in the past.  And maybe you apply for funding, when you apply for -- when you talk to your editors, when you talk to your bosses to convince someone of your research proposals, they would ask you, where's your data, where do you get data from?  And in case there is no data due to historical indices, what are you to do?  Thank you.

>> PARTICIPANT: Thank you for the interesting and various interventions, and I was thinking about the interventions you mentioned, public institutions, NGOs or civil society institutions looking for memory, open-source initiatives also very interesting.  But I thought there is a lack of business interest in memory, like companies, and compared to what Marielza said, that technologies are expensive and indexing is expensive, we see in business today people saying cloud storage and cloud processing, it's cheap.

So, what I feel after this is that you have technologies that are cheap for business interests, for producing products and pilot services, and expensive for memory.  And I would like you to comment about that.  Because we see that we have technology for private interests, they are cheap and available.  But when we think of public interest, there is no market interest, and not thinking only of states, but of the public, the commons, common goods and the public interest, we don't have investments of states or even of business.  I'd like you to comment on the difference between the access and availability of the technology for private interests and public interests like memory, we have this, how way to do that?

>> PARTICIPANT: Hi.  I am Alex Mora, originally from Brazil, I work here in Saudi Arabia currently in the Carlos University and I have a question for Carlos Afonso.

As I have a past working back in RNP, the Brazilian academic network, I am aware of the challenges that happen in the science identification area where people struggle also to have data for scientific purposes, for educational purposes in universities, in research institutions.  And this brought me recollection that this is an open problem in Brazil, that we don't have any specific institution towards storage or preservation, digital preservation.

So, how are you tackling this part of the problem of the storage capacity for the Grauna Project and what are your thoughts on how Brazil and other countries can address the problem of the storage capacity for many purposes, not only for Internet memory, but also for scientific education and cultural and arts and et cetera?

>> We have one question that I received from the online questioners.  Dr. TV Gopal from Anna University in Chennai, India.  He says, "Public memory is short.  Internet memory is seldom so.  Any solutions for the mismatch hazard in the geopolitical space?"

>> ONSITE MODERATOR: Very good question in very short time.  I would ask panelists to make their final remarks trying to address the questions, which are very important and interesting but I would also have to ask you to not go further than three minutes because we will have to close the session very soon.

Then we could start from backwards with Carlos Alberto Afonso, and then Samik, and then Marielza -- Ricardo Pimenta and Marielza, please. 

Carlos Afonso, the floor is yours.

>> CARLOS AFONSO: I will be very brief.  Good question.  I recall that the Grauna Project is still an experiment exactly to measure the difficulties which you mentioned, among others, like, for instance, backing up in realtime is a tremendous challenge.  The cost of doing that is already very expensive in a big scale.  That is why we restricted the breadth of the information that the project can capture to mostly civil society organizations' web information, and on the basis of this we will try to progressively expand.  This means more storage and more memory and more backup, which is tremendous, tremendous.  The challenge is tremendous.

Our idea with the project is also to provide a sort of small reference, but easy for reference for organizations that could tackle the challenge in full and really do a Brazilian Internet archive.  And I have to say that, of all the organizations, the one that has these resources to do that, especially technical resources, is NIC.br, and we do hope they consider this in the future.  Thank you.

>> ONSITE MODERATOR: Samik, please.

>> SAMIK KHAREL: Thank you.  I would like to address the question from the lady in Germany.  She asked about political parties and memory and whitewashing, I think.

So, like, it's been a common trend for major political parties in Nepal and in the region to deploy what you call the cyber army.  What they do is look around the Internet and, you know, like, if there's any criticisms about them or any critical discourse about them, then they document that and go to make a counter argument against that to make their image better.  So it's very common for them to do that these days and to inject some populist idea and go against whatever is trending.  That's how it works.

Anyway, finally, speaking of collective memory, with the ubiquity of the Internet, the way we are accessing collective memory, restoring, discovering and retrieving these cultural memories has changed with modern emerging technologies.  The way we interact with our memories has changed and I think it will keep on changing with the advent of LLMs and generative AI.  Mainly social media and platform identifications have implemented new ways of approaching these memories by allowing us to actively contribute to them, making collective memories more interactive and collaborative now.

However, we should be careful to ensure everyone has equal access and infrastructures to these.  There should be accountability of our data and the future should be shared, equal and be democratic and bring together all marginalized and vulnerable populations of the (?) world together.  Thank you.

>> ONSITE MODERATOR: Thank you so much, Samik.

Please, Ricardo Pimenta.  I'm sorry.  Okay.

>> RICARDO PIMENTA: I will try to answer in one breath.  So, about the question about elections, memory is always a place of struggle, political struggle.  Many people talk about the cultural side of memory.  But, okay, that is okay.  It exists, obviously.  But even the cultural side of memory, if we can talk about this, is a result of political struggles, struggles about power.  So, how it impacts is precisely related to the fact that it is potentially violated or rewritten as a fruit of dispute by those who seek to dispute the political past, science and so on.  And let me tell you something about the tool I was talking about.  The Tempora.  The Tempora, this digital tool, was created during the COVID-19 pandemic in Brazil.  From 2019 until mid-2022, we collected there with this digital tool almost 6,000 news stories from media in Brazil, Brazilian media.  Stories about how COVID spread in Brazil, something like that, and, obviously, from the Brazilian media that didn't have a paywall.

So, in the process, most of the new -- most of the news stories produced by the Ministry of Health in Brazil and other Brazilian government bodies' websites had their links broken.  In 2019 this happened very quickly.

And then when -- and then we realized this back in 2019.  We tried to develop the system so we could save an image, a kind of PDF of the website, and also scrape the entire corpus of news that soon tended to disappear.

So, how the question of memory could impact the elections, for example.  The elections is a place of struggle, dispute about discourse, about the past, the near past, and about projects of future.  So, we can afford this kind of thing and we need to develop something, some strategies to avoid that this kind of discourse could stay in some group, some political groups that could do all the things that we almost know -- we already know that they are.

So, the other thing that I think I can answer about the question of data and so on, it's a perspective about algorithmic governmentality, a new regime of truth.  About data, I think our biggest data is the automatization of social existence.  I think we all talk about that.  Automatization of social existence through computational processes deployed in online media.  The memory that comes from it will not be a memory preserved by the demands and conflicts of social groups, institutions or cultural practices.  This kind of memory -- sorry.  Rather, it will be mathematically elaborated by algorithmic devices that are in turn programmed by groups such as the (No English translation), that is Google, Apple, Facebook, Amazon, Microsoft and IBM.

In the end, the perspective of a political surveillance that we are talking about here today, we know that is something that we must fight against.  But the surveillance by the market, many of us let this kind of thing happen.

So, this is correct -- is this correct?  I think there is a kind of ratification of the practices when we just give to (?), for example, our data in exchange for visual and information, informational kinds of consumption.

So, look, I don't agree with any kind of surveillance.  It is a fact that we all practice it on different scales.  The culture of following on social networks, the culture of attention that we all share a little bit, a little bit more, a little bit less, are surveillance practices also.  So that we carry out intimately and in a valid way.  I find this difficult to overcome.  And right now the answer is that I don't know how we can solve this problem.  But I know that in stats, these types of questions are important.  Thank you.

>> ONSITE MODERATOR: Ricardo, thank you so much. 

And then we close the session with Marielza Olivera.  Please, Marielza, the floor is yours.

>> MARIELZA OLIVERA: Thank you.  It was a fabulous exchange.  Thank you very much for this.  I'm going to close with a very simple, very simple thing.  Curation is a political economic process.  It's as simple as that.

We have to ask, you know, whose memory is being preserved.  And, you know, why do we care to preserve memory now, if we didn't care so much before, you know, and the proof that we didn't care so much before is simply that physical archives are being left to rot, you know, essentially.  You go into warehouses of documents that are exposed to, you know, floods and fires and mildew and simple neglect and we haven't really digitized.  That should have been digitized since the beginning.

One of the statistics is about 40% of the birth records of people above 60 years of age are still on paper and not digitized.