The following are the outputs of the captioning taken during an IGF intervention. Although it is largely accurate, in some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid, but should not be treated as an authoritative record.
***
>> CYNTHIA LO: All right. Welcome everybody. Thank you for joining us this morning. Today we're holding a workshop on connecting open code with policymakers for development.
For the agenda that we have here today, we're going to go through a round of introductions and then an overview of connecting open code with policymakers. Then we'll move directly to our panel discussion and a Q&A.
Feel free to ask questions. That goes for our online participants as well.
I'm going to hand it over first to Mike Linksvayer to do an introduction.
>> MIKE LINKSVAYER:
>> CYNTHIA LO: One second. A slight technical difficulty. And...
One moment while I fix that. I'm going to hand it over to Helani and ‑‑ oh, never mind. We have Mike now.
>> MIKE LINKSVAYER: Hey, thanks a lot for solving those technical difficulties. I'm sorry I can't be there in person. I'm Mike Linksvayer, the V.P. of developer policy here at GitHub. I'm a former developer myself, who has now been doing policy work, focussed on making the world better for developers and helping developers make the world a better place. Open source is a big part of the way that happens, and I'm real excited about measuring it and informing policymakers about what is going on. So I'm real excited about this panel.
>> CYNTHIA LO: Great. And I'll pass it over to our speakers here.
>> HELANI GALPAYA: I'm Helani Galpaya, CEO of LIRNEasia, a think tank working on infrastructure, with a huge focus on digital policy.
Thank you.
>> HENRI VERDIER: Hello, good morning. I'm the French Ambassador for Digital Affairs. Just so you know, I'm now a diplomat, but I was a French entrepreneur for a long time and used to be the state CIO of France.
>> CYNTHIA LO: Great. Thank you. So we're going to move directly to our panel talk. And to start, let's talk a little more about challenges from unmet data needs.
Let's start with Helani here. What are some of the challenges that you've seen over the years on unmet data needs?
>> HELANI GALPAYA: From a development perspective, understanding where we are in whatever those development objectives are. That is the starting point of any kind of development. And that is a problem if there is no data.
And particularly when it comes to developing countries, which is where I come from, this is a particular challenge. Right? So traditionally we've relied on government‑produced datasets.
Take, for example, the census. Every 10 years it is supposed to happen. And low levels of digitisation have traditionally meant it takes about three years after the census to actually get some data out in many countries. By which time, you know, the population has changed, migration patterns have changed, and so on.
But we know now there are obviously lots of other proxy datasets we can use. But, you know, timeliness is one concept that we worry about in development, because data is slow to come by, even when it is available.
The second unmet need, if you are outside of government, is the availability of data to actors outside government. And frankly, within government, sometimes the data that is collected by one department or ministry is not even available to others. Right?
So there is a very low level of data access, possibly even within government. And many governments have signed on to open data charters and all of those things. But really the data that they put out is sometimes not what most people need. It is not usually in a machine‑readable format. So you spend enormous amounts of time digitising it and datafying it.
So these are sort of, you know, basic challenges. And from the government point of view, for governance and regulation in particular, the oxygen that feeds that engine is data. Because there is a huge data asymmetry between the government and regulators versus those they govern. The latter have a lot more information about the operations than the regulators or the governing party would. So there are really multiple data challenges that we have.
And increasingly the conversation is that private sector data can act as a proxy to inform development. But negotiating that and accessing that is particularly hard. So there are multiple data challenges in developing countries, particularly from our point of view as a research organisation sitting outside government and outside the private sector.
>> HENRI VERDIER: Thank you for the question. Something like 15 years ago, governments understood that open government data was very important. And together we worked a lot to open data. And then maybe we'd commit later.
And we learned some lessons that data could create much more value if more people can use it. That it was a matter of transparency, democracy, and efficiency and maybe citizenship.
And more and more we understood that government (?) interests and some very important data. And as a private sector.
So it is probably time, and this is the moment, to start thinking deeply about private sector data. In Europe, there is a growing consensus that first we need to help research and promote knowledge. There are a lot of topics where we have to know more.
I can speak about disinformation, the impact of social networks, climate change or some other important topics. We need more knowledge. And for example, if you look at the DSA, which we adopted last year, we do organise specific access to private data for public research.
Of course I know that there are important issues. Privacy. Intellectual property. Sometimes security.
Because if you share everything, you can organise ‑‑ you can allow reverse engineering and hacking and stuff. But we can fix it.
And for example, there is an important field of research on shared computing: you can use the data without taking the data. So there is a growing consensus, and probably we will have, collectively as the international community, to make public research stronger and to organise ourselves to better understand important mechanisms.
But then there are also other actors that need access to this data. And for France, first we do encourage the private sector to be more responsible. Let's think, for example, about transportation ‑‑ the transport industry.
If you don't have all the data, you have nothing. If you don't have buses and taxis and personal cars and motorcycles and metro and trains, you don't understand the system and you cannot make good decisions.
And this is in the interest of everyone. Public decision makers. Private actors. Everyone needs a good comprehension, a good knowledge of the system itself.
So we do encourage cooperation, sharing the data, et cetera.
Then we think that we can go further. Maybe you know the French Nobel laureate in economics, Jean Tirole, and his work on the common good. It is time to achieve some consensus to make the private sector share some important data. And this year in the French government, we are starting to work deeply on what we call "data of general interest." Because sometimes the government does not have a monopoly on the general interest. And some data should be considered too important to be allowed to remain private.
Of course this is complex, because we need a real framework to give a status to this kind of data. But really we did open the case to create a status for some very important, impactful data, and to decide those data have to be open, even if they come from the private sector.
>> CYNTHIA LO: I do also wonder, you mentioned one thing about policy and standards. A lot of metadata has very clear standards in financial markets and health care, for instance. I'm curious to know, are there any unmet needs within the standards of metadata?
Is there a certain standard that isn't quite out there that could be?
>> HELANI GALPAYA: The data we deal with, that my data scientists deal with, is mobile telecom network big data: basically call detail records, for example, that we get from base stations. Trillions of call detail records.
These are not standardized by any means. Right? In fact, you know, the teams spent 4‑6 months cleaning up the data. Because when you get data from 4‑6 telecom operators, they are actually not standardized in how the numbers are recorded. You know, there are no interoperability standards. There are many, many sectors where there aren't interoperability standards.
And of course some of the cooler stuff that comes out is from unstructured data, anyway, right? Like social media data and so on.
So I think, you know, the financial sector has traditionally been well forward in this. But many other sectors haven't.
Health, I think, in developing countries is less developed in interoperability standards. And certainly for cross‑border data sharing, this is a fundamental problem. Right? Like when you look at taxation data, all of that. A lot more work needs to be done, particularly when it comes to developing economies.
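[Illustration, not part of the spoken remarks.] A minimal sketch of the kind of clean-up described above, where call detail records from several operators arrive in different layouts and have to be mapped onto one common shape. The operator names, field names, timestamp formats and the default country code are all hypothetical assumptions chosen for the example; they are not taken from any real operator or from LIRNEasia's pipeline.

```python
from datetime import datetime

# Hypothetical per-operator layouts: every field name and timestamp format
# here is an assumption for illustration, not a real operator's schema.
OPERATOR_LAYOUTS = {
    "operator_a": {"caller": "msisdn_from", "tower": "cell_id",
                   "time": "ts", "time_format": "%Y-%m-%d %H:%M:%S"},
    "operator_b": {"caller": "A_NUMBER", "tower": "BTS",
                   "time": "CALL_START", "time_format": "%d/%m/%Y %H:%M"},
}


def normalise_number(raw, country_code="94"):
    """Reduce a phone number to '+<country code><subscriber digits>'.

    Assumes numbers are written either with the country code, with a
    leading '00' prefix, or with a single national leading zero.
    """
    digits = "".join(ch for ch in raw if ch.isdigit())
    if digits.startswith("00"):
        digits = digits[2:]
    elif digits.startswith("0"):
        digits = country_code + digits[1:]
    return "+" + digits


def normalise_record(operator, record):
    """Map one operator-specific record onto the common schema."""
    layout = OPERATOR_LAYOUTS[operator]
    when = datetime.strptime(record[layout["time"]], layout["time_format"])
    return {
        "caller": normalise_number(record[layout["caller"]]),
        "tower": str(record[layout["tower"]]).strip().upper(),
        "time": when.isoformat(),
    }


record_a = normalise_record("operator_a", {
    "msisdn_from": "0771234567", "cell_id": " ab12 ",
    "ts": "2023-10-09 08:15:00"})
record_b = normalise_record("operator_b", {
    "A_NUMBER": "+94 77 123 4567", "BTS": "AB12",
    "CALL_START": "09/10/2023 08:15"})
print(record_a == record_b)
```

Keeping the per-operator differences in one lookup table means a new operator is added by describing its layout rather than by writing new parsing code, which is one way to contain the months of cleaning the speaker mentions.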
>> HENRI VERDIER: Yeah. I join ‑‑ (?) data policy. The lesson learned is that if you wait for perfect (?) and good metadata, you will never do anything. When I joined, they wanted to index every dataset through an index that was conceived during the Middle Ages for the national archives, with 10,000 words. So it was quite impossible to publish data, because you had to go back to Philippe le Bel to decide where... . So my takeaway is: share your raw data as they are. And don't wait.
But it doesn't mean that standards don't matter. Of course they do.
But let's start by publishing. The second lesson is that maybe the APIfication process is more important than the metadata themselves. First, during maybe five years, we published everything. It was not always very useful, especially for data that has to be refreshed, where you need the latest data and not just some data.
For this, it took three years to get to a proper API ecosystem. And then people told me, first you have to conceive a good architecture for a good API. I said no, let's do the API and then we will optimise the system.
So my personal experience: don't wait for a perfect standardisation. Because you will never reach this goal. This is a moving target. So don't wait.
>> CYNTHIA LO: Thank you. And I think that brings us to the next point quite well. You have highlighted this as well, on private sector data for development purposes. And I know Mike also has some thoughts on that. But I'd love to know: you mentioned that private sector data is, a lot of times, a little unstructured. But that is interesting because you have ‑‑ it's wider. You can take a look in and out and analyse that in an easier way.
Tell us a little more on that. What has been a surprising find?
>> HELANI GALPAYA: On private sector data?
>> CYNTHIA LO: Yeah. On some unstructured data.
>> HELANI GALPAYA: So the unstructured data that we work with includes, let's say, data for misinformation and disinformation identification: automatic identification of mis‑ and disinformation that spreads across platforms, in languages outside of English in particular.
And there I think ‑‑ well, there are sort of two types of problems. One is just the lower levels of data. So you know, even assuming you have all of the language resources, like a language corpus, that are needed to identify this using natural language processing, you are at some point going to need a fact base to check against.
So the unstructured ‑‑ well, structured or unstructured data comes from government resources and maybe other sort of credible sources. Right? So you are dealing with two types of data. To fact-check numbers, you are usually trying to find government data. And to fact-check other things, you are looking at reports and so on.
And there is a serious lack of data. For example, if you look at the big popular English language models, they are trained on millions of articles.
We tried this in Bangladesh and Sri Lanka to fact-check. We're down to 3,000 articles that are credible, you know, sort of data sources that we can use to fact-check against. So we're working with a limited universe of credible data out there. There is very little out there. I think that's for us the biggest challenge.
>> HENRI VERDIER: It is a very complex question. First, I was thinking that completely unstructured data are very rare. Because usually, when someone produces data, it costs something, so the data answer a certain question. But is that your question? So they have a structure, usually. Of course, in the world of the internet of things and sensors, you have more and more data that are not quite structured. But essentially we are living in a world of data with a purpose. So they have a structure. So the question again is to think about interoperability and to build bridges.
One other question with unstructured data, or with minimal structure, is that if you want to share the data to give them as much value as they can have, you also have to protect other important securities. Like, again, privacy. But not just privacy. Like intellectual property.
And if you don't know what is within the data, you are not sure that you are protecting all the securities you have to protect. That is why I pay more and more attention to the field of research, as I said, of shared computing. We have to learn to work with the data, to train a model, to ask questions.
For example, in France, as you know, we have a structured social security system. There is one database, social security, with every prescription that every French doctor made during the last 20 years.
Can you imagine this? 70 million people. Every prescription made by a doctor in 20 years. And we make a statistical archive. So you take 1% and we...
Here of course, you have a lot of knowledge and science. You can discover new drugs. Because you can discover that, I don't know, someone had a lot of headaches at the age of 20 and Alzheimer's 40 years later, and discover some new drug and a lot of things like this. But you cannot just open this kind of data. Because this is pure privacy. This is my health and your health.
But you can organise a technical strategy to access this data without sharing them. And if you do this, you can control a bit who is using the data. And if they don't respect some laws or principles, you can disconnect them.
So this is probably an important field. So again, I'm not looking for a perfect standardisation. But we can organise an ecosystem of how to access the data, when, why and ‑‑ and given as a relation between the knowledge of data.
>> MIKE LINKSVAYER:
>> HELANI GALPAYA: I agree with the minister. Some of the, quality records to still have the data be as usable to inform policy. But without revealing, you know, where an individual might actually be or you know what that person's number is and all of that.
The other part is policy: to have a governing structure to make sure that we are able to use it while preserving privacy, and having some sort of rules around what the data is used for. Like in the health care system, insurance companies cannot use it and then drop ‑‑ you know, private insurance companies cannot drop coverage because they have so much more information about a set of users. Even if they are not individually identifiable, once you are in an insurance pool, they can identify that this is a much higher risk. So sort of policy as well as technical solutions there.
>> CYNTHIA LO: On the privacy part, I'm curious to also know, hear from Mike. What are your thoughts on privacy? And ‑‑ sorry, private sector data. Love to know your thoughts on that too. Or anything to add.
>> MIKE LINKSVAYER: Okay. I first would say I should have said in my introduction, and shouldn't assume people know, what GitHub is, where I work. It is the largest platform where software developers around the world come to develop software collaboratively, a lot of it open source. Software development is kind of a very specific thing, but I think there are a lot of themes that were talked about, unstructured data, APIs, and privacy, where maybe I can paint a little picture about how it works with data about code development. The code that programmers are writing is data itself. And indeed you can think of it as unstructured data. It is a text file, but each programming language also has its own structure, because it needs to be able to parse the individual statements.
So it is really a matter of how much work you want to do, and what the questions are that you have about, for example, software development. And then APIs are another aspect.
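[Illustration, not part of the spoken remarks.] A minimal sketch of the point that source code is a text file with a parseable structure, using Python's standard `ast` module on a small made-up snippet. The function name `summarise_source` and the choice of what to count are assumptions for the example; this is not how GitHub analyses code.

```python
import ast
from collections import Counter


def summarise_source(source):
    """Parse Python source text and count a few kinds of syntax node."""
    tree = ast.parse(source)
    kinds = Counter(type(node).__name__ for node in ast.walk(tree))
    return {
        "functions": kinds["FunctionDef"],
        "imports": kinds["Import"] + kinds["ImportFrom"],
        "calls": kinds["Call"],
    }


# A made-up snippet standing in for a file from a repository.
sample = '''
import csv

def load(path):
    with open(path) as handle:
        return list(csv.reader(handle))
'''

print(summarise_source(sample))
```

Read as plain text, the snippet is just characters; once parsed, questions such as "which libraries does this import" become simple lookups, which is the trade-off between effort and the questions being asked.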
If you want to crawl all of the code ‑‑ we call them repositories, where a project on GitHub or similar platforms is collaborated on ‑‑ if you want to crawl all the code in the world, it will take a long time and be very resource intensive. GitHub and other platforms also make APIs available. I think that is another common theme, and we can look at how exactly that looks with code. You can either do queries to ask questions about the kinds of projects that you are interested in, or you can kind of try to just follow all the activity as it comes out, because GitHub has a very open kind of everything, all‑events feed. But that also is extremely expensive to do.
Some researchers do research around programming trends or, I don't know, cybersecurity. There is a bunch of different research areas that you can look at GitHub to do.
You know, a lot of them spend a lot of their time kind of gathering data before they can even answer or, you know, kind of validate whether they are asking the right questions.
So one approach to that, and to dealing with privacy, is publishing aggregate data that will be, you know, helpful for some use cases. And that is what we've done with a new kind of initiative we have at GitHub that we're calling the Innovation Graph, which is longitudinal data on a per country ‑‑ per economy, roughly country, basis on various kinds of activity. And we did it particularly to inform policymakers and international development practitioners who want to use that data to understand things like digital readiness within their sphere of influence.
And publishing aggregate data kind of satisfies some of these use cases, or at least allows people to explore the aggregate data to figure out whether they want to make an investment in, you know, crawling more. It also certainly deals with the fundamental privacy questions: you don't want to identify, you know, individuals and things like that. So you can do that by thresholding: a certain number of people have to be doing an activity within a country in order to report aggregate statistics. So that covers a lot of different themes I think we've heard covered there.
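[Illustration, not part of the spoken remarks.] A minimal sketch of the thresholding idea just described: count distinct people per economy and only report an economy when enough people are behind the number. The function name, the input shape and the default threshold of 100 are assumptions for the example; the actual threshold and method used for the Innovation Graph are not stated in the session.

```python
from collections import defaultdict


def aggregate_with_threshold(events, min_people=100):
    """Count distinct people per economy, suppressing small groups.

    events: iterable of (economy, person_id) pairs.
    min_people: hypothetical minimum number of distinct people required
        before an economy's count is reported at all.
    """
    people = defaultdict(set)
    for economy, person_id in events:
        people[economy].add(person_id)
    return {
        economy: len(ids)
        for economy, ids in people.items()
        if len(ids) >= min_people
    }


# Made-up events: two distinct people in "FR", one in "LK".
events = [("FR", "a"), ("FR", "b"), ("FR", "a"), ("LK", "c")]
print(aggregate_with_threshold(events, min_people=2))
```

Suppressed economies are simply absent from the result rather than reported as zero, so a reader cannot mistake "too few people to publish" for "no activity".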
And I think there is a ton of promise in a range of technologies like confidential computing and differential privacy, and I'm excited about them all because developers are bringing them in a lot of research. And a lot of the R&D is open source.
But I guess I'll highlight here, you know, a very simple approach ‑‑ kind of a very low‑tech approach. As a first, you know, step in sharing data, you can just share data that doesn't have any privacy concerns. It's actually very much, kind of, to Henri's point about sharing data before you do all of the standards work. Because, you know, you might be waiting forever.
Also, sharing aggregate data is a way to kind of take that first step. Share data that is going to be useful to a range of stakeholders. And then, you know, work on the harder part. That might be, you know, pending more advanced technology to deal with, umm, harder issues.
>> CYNTHIA LO: Yes, please.
>> HENRI VERDIER: First, we know GitHub. When I was the French CIO, France was a government (?). And by French law, everything the government develops has to be open source and free software. So open source and reusable.
And more than this, any time the government uses software to take a decision, it has to publish the source code, but also to tell the citizens that we are using it, and to be able to explain in simple words how it works. So that is an important policy. And regarding structured or unstructured data, what I learned from my open data experience, as I said: the first duty is to share data as they are.
And then some people will structure them. Think about GitHub and, in France, the Software Heritage project. Some researchers decided to build the biggest possible archive of every software. So GitHub, better source and like Google one, and they are working hard to structure it now. So they are working. But we did allow this because we did publish software. And then some people can continue. And maybe someone will do better, but we'll have a variety of experiences.
So my lesson is to separate. First publish. And then structure.
And you can have a diversity of attempts to structure if you have a common ground of raw data or software.
>> CYNTHIA LO: I think you mentioned a really interesting ‑‑ oh, sorry, please.
>> MIKE LINKSVAYER: Yeah, I just wanted to add. Thanks for cherishing GitHub. I definitely cherish Software Heritage. And really archiving is almost a third part that is also extremely important, and I think underinvested in. So I think in the software preservation space, Software Heritage is doing an amazing job. And I think that's, you know, preservation of data is something that can be decoupled from the making available unstructured. But I think it is extremely important to think about.
>> CYNTHIA LO: I think we actually have a slide here on the Innovation Graph that Mike mentioned. I also saw in the audience here that we have Myla Kumar, who helped on the research. Because we wanted to understand exactly what type of data would help, and what type of data the public sector or the social sector would require. And as mentioned, we have the API, which is that large set of data that Henri mentioned first. And now we've gathered all of the datasets into specific aggregated data that is based on economies, as Henri had mentioned. Mike, I don't know if you want to mention anything on that. I think you might also be able to share your screen if you would like. But huge thanks to Myla Kumar for joining us online.
>> MIKE LINKSVAYER: I could share my screen briefly, if it would be useful. I'm not sure if folks will be able to see. But maybe I'll share. And you can tell me whether you can actually see it in a useful way.
Can you see anything on the screen?
>> CYNTHIA LO: Yes.
>> MIKE LINKSVAYER: Okay. Great. I think I'm sharing a window that has the page for France in the Innovation Graph. So this is just to show that we have a bunch of data on a per economy basis. Some of them are fairly technical. Git pushes are code being pushed to GitHub. And you can see that summer vacation actually happens. And repositories, as I was saying, are the kind of unit of a project on GitHub. Similar platforms use the same concept.
Developers are people actually writing the code or, in some cases, doing design around a software project. Organisations are kind of a larger unit of organising projects on GitHub that sometimes corresponds to a real world organisation, and sometimes does not.
Programming languages. This can be very useful for thinking about skilling within a country.
And licenses are about copyright.
And then probably ‑‑ oh, and then topics. This is currently very unstructured. Basically maintainers on GitHub can assign keywords to their projects. So it is very noisy data. But it can be helpful in, you know, really diving in, like identifying a set of projects that you want to study more. And that is one thing that I'm excited about. You can tag with any kind of text.
So going forward, people might even tag, you know, their project as relevant to a sustainable development goal, and we could all navigate the topics in that way.
And finally, perhaps most interesting and new is this kind of trade flow diagram. You can see which economies France is collaborating with, where developers are sending code back and forth.
So you see U.S., Germany, Great Britain, Switzerland, it's not surprising those are some of the top ones. You can also combine all the EU member states.
And this is a first release. There is a lot of exciting analysis that can be done. The data is actually open, in the repository. You can see the data here.
And at the end of the day, data can be extremely boring. This is literally a CSV file. But that boringness is fantastic. Because it means that you can use your tool of choice, whether it is a spreadsheet, a Jupyter notebook or something fancier, to analyse the data.
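[Illustration, not part of the spoken remarks.] A minimal sketch of why a plain CSV file is convenient: a few lines of standard-library Python are enough to compute growth per economy. The column names and every value in `SAMPLE` are made-up placeholders, not the Innovation Graph's actual schema or figures, and the function assumes rows are already in chronological order.

```python
import csv
import io

# Made-up rows in a hypothetical layout; column names and values are
# placeholders for illustration only.
SAMPLE = """economy,year,quarter,developers
FR,2022,1,100
FR,2022,2,110
LK,2022,1,40
LK,2022,2,46
"""


def growth_by_economy(csv_text, column="developers"):
    """Relative change between each economy's first and last row.

    Assumes rows for each economy appear in chronological order.
    """
    first, last = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["economy"]
        value = int(row[column])
        first.setdefault(key, value)  # keep only the earliest value seen
        last[key] = value             # overwrite so the latest value wins
    return {key: (last[key] - first[key]) / first[key] for key in first}


print(growth_by_economy(SAMPLE))
```

The same file opens unchanged in a spreadsheet or a notebook, which is the practical benefit of the "boring" format the speaker describes.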
And I'll quickly show the reports that Cynthia and Myla worked on, useful for international development, public policy and economics practitioners. So we did a lot of, you know, discussions with entities that are part of the Development Data Partnership, for example, to help design this. And then I also pulled up Software Heritage because I'm a big fan. They have a page on here showing all the different projects they index.
I cherish that too. I'll stop sharing. So if people later have questions about a particular country or metric, happy to share again.
>> HENRI VERDIER: Thank you. Very promising. We did agree the best policy is to first publish and think later. But we also have to see and to understand.
I observe that we are more and more living in a world of interdependent free and open source software. And there are dependency and security issues. If we don't understand a bit the very structure of the software ecosystem we are living in, we'll have to face important concerns. We can remember (?). Sometimes when we discover a security failure, because we don't know the story of the evolution of the code, the forks, et cetera, we are not able to correct everything, because we don't have a proper vision of the history and the evolution of the code.
And that's a very important new frontier. We have to build new tools and new approaches to understand and to control this very complex system of software. Do you agree?
>> HELANI GALPAYA: Yeah, I completely agree. I think, you know, Sri Lanka, just one example. Has a really vibrant open source community. So this kind of data if they are using GitHub primarily. It could be really interesting to understand the evolution of that community as one thing.
But just on the, you know, many countries are technology takers and product takers when it comes to e government systems. So don't have the luxury of saying, you know, everything will be open. They are buying software from big companies which will not certainly make the code open. Not even APIs. A very close, tight, licensed system is what they are buying.
And I think as countries go along that technology maturity road, like Sri Lanka came to the point where there was enough capacity with the CTO, with the government agency who was able to say, okay, we will build some of this in‑house. I will use the open source community who is working around the world to build some of these tools to set up the basic government architecture. But that takes a bit of time I think to get to this stage.
Because the easiest thing is to get some donor money and do a procurement of a closed system. And that is really problematic. Yeah.
>> HENRI VERDIER: One comment. When I was in charge, the budget for buying software in France was 4 billion euros a year. Half of it was consumer products, like, I don't know, Windows. So for this we cannot negotiate. But for the other half, 2 billion, you can decide, by law anyway, that in the procurement the software has to be open. And we are trying to do this. And it is quite a standard for French procurements.
>> CYNTHIA LO: I'm very curious. We've been talking a lot during IGF about digital public goods and how they could be discovered a little more. That is maybe a little bit off course, but it made me think a little bit about that.
>> MIKE LINKSVAYER: Actually if I could answer. It is actually not off course in a way. Maybe I can tie it in.
>> CYNTHIA LO: Please.
>> MIKE LINKSVAYER: And maybe I'll share my screen again really quick. This might have been something you were planning to talk about later, but I think it's a good opportunity, actually.
So this, which I'm sharing now, is the digital public goods registry. Digital public goods could be software, data, AI models, could be a lot of different things. But it is mostly software. In fact you can see the breakdown here between software, data and content. And you can see they are all tagged in relation to a particular SDG.
Part of the ‑‑ a big part of motivation here is we're going to find and share solutions, you know, to progress on various SDGs.
The same kind of concepts can be useful: basically, umm, curation of information about open projects is its own data project in a way. And it can be very helpful in not reinventing the wheel, in finding that, you know, a government or civil society institution is already, you know, serving a particular need, and that software was developed in country A, and people in country B can maybe take it and use it, or customize it, so that they have a little bit more sovereignty or autonomy, to use those words that are quite popular now.
And the way it is really tied together, I think, is that ‑‑ yes, these are tools that can be helpful for development, for SDG attainment, for sovereignty. But it is also a data project, kind of doing this kind of organisation. And, you know, that is its own effort.
And I'll stop sharing now.
>> CYNTHIA LO: Thank you, Mike.
I did also want to highlight the Open Terms Archive, which I believe is a digital public good linked with France. And (?), it is a way to publicly record each version of a specific term. And I think it does tie in very well with security. And I was a little curious to go to the next slide about our topic on data, privacy and consent, and then also more widely on security.
Would love to know some of your thoughts on how to really safeguard the ‑‑ all the data that impacts the users. How should public or private sector provide data that is secure and ensures privacy?
It is a big question. And there is no perfect answer, of course. But another way to think about it is: if there is one suggestion for the private sector, for those who would be thinking of releasing datasets, if they released a wide set, is there anything they should keep in mind before doing so?
>> HENRI VERDIER: Yeah. That is a very complex question and there is no silver bullet. In Europe, we started with principles. So the GDPR, which started in France in 1978, decided that regarding personal data, data speaking about you, the consent of the user is needed. So it is mandatory.
So then you can conceive legal approaches or technological approaches. And I'm very interested in project digital and privacy architecture, which does organise technically a way to check the consent, in a way that tries to be an infrastructure to unleash innovation.
This is not a burden. This is infrastructure for innovation. And you can implement various approaches, and some are better than others. But there are strong principles there.
And just to mention, sort of legal controversy between France and Anglo‑Saxon countries. Because we consider personal data as something like your body. You are not the owner of your body. You cannot decide anything regarding your body. And you cannot decide anything regarding your personal data.
There are some fundamental rights. In the world of the copyright, this is a different approach. And that's great. We can exchange on that. But in France we are very ‑‑ we have strong commitment that you cannot treat personal data as an average data.
>> HELANI GALPAYA: I think this approach many countries are taking, seeing a difference sharing data and very different from personal data. I think we talked about it earlier as well. I think what the minister is talking about the policy legal and technical solutions. And I think at a practical level there is private data but also commercially sensitive data.
Our approach, for example, was to say we will not work with one telecom operator's data. Because that is highly commercially sensitive: where the base stations are, which direction each is facing, you know, the power on those base stations, et cetera.
We will say we're going to do the kind of data analytics to understand, you know, where people live, where people move. All of that is possible with mobile network data. But we will only do it if we have more than one company contributing data. And then we sort of anonymise at a company level. No one knows whether the base stations are company X's or Y's. So the more data that you pool, that brings another level of protection on commercially sensitive data in our case. Yeah.
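The pooling rule described here can be illustrated with a minimal sketch. It assumes hypothetical input records of the form (operator, region, subscriber count); the function name, field layout and threshold are illustrative assumptions, not LIRNEasia's actual pipeline. Counts are summed per region, the operator identity is dropped, and a region is released only when more than one operator contributed to it.

```python
from collections import defaultdict


def pool_operator_counts(records, min_operators=2):
    """Pool per-region counts across operators and drop operator identity.

    `records` is an iterable of (operator, region, count) tuples.
    This layout is a hypothetical example, not a real operator data format.
    """
    totals = defaultdict(int)
    contributors = defaultdict(set)
    for operator, region, count in records:
        totals[region] += count
        contributors[region].add(operator)
    # Release a region only if at least `min_operators` companies contributed,
    # so that no published figure can be traced back to a single company.
    return {
        region: total
        for region, total in totals.items()
        if len(contributors[region]) >= min_operators
    }


sample = [
    ("operator_x", "Colombo", 120),
    ("operator_y", "Colombo", 80),
    ("operator_x", "Kandy", 40),
]
pooled = pool_operator_counts(sample)
```

In this sample only Colombo is released, because Kandy has a single contributing operator and its figure would reveal that company's footprint.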
>> HENRI VERDIER: Statistically, it can be useful for some purposes. If you want to make ‑‑ if you want to understand where people, the population, go in case of natural disaster. If you want even to check if France or Germany respected the law more under COVID. You know that we did respect it more than the Germans. We learned this through operators' data. Because of course everyone ‑‑ the Germans would have been more, more strong.
So you can have very important uses of statistical data. But apart from this approach, I think that you can never really anonymise personal data. At some point someone will find you. Here you need other approaches, like confidential computing. Technology solutions.
>> HELANI GALPAYA: Agree. Agree. And I think it sort of depends on the situation and what the company is releasing data for. Right? And what we're saying is, at aggregate level there is a lot of use you can make out of it. You don't even need anything even remotely identifiable. You can talk about groups of people. COVID was a classic example. To understand movement, that was enough. Facebook check‑in data was being used in some governments to see where some people are. But at some point you are looking at an outbreak and trying to contact trace using data. That is a different level of privacy violation, and you need legal backing to say, okay, now this is a national emergency and I'm now going to identify who owns that cell phone, because we need to know where that person may have moved and then spread the virus.
So it depends on the question you are asking, really: what company data can do and what the safeguards are.
>> CYNTHIA LO: Thank you. I also want to make sure to give an opportunity to Mike, if you have any thoughts on safeguards and privacy and consent on private sector data being released.
>> MIKE LINKSVAYER: I think really all of the key points have been covered already. So I don't think I really have anything substantive to add, I mean directly. But maybe I'll just relate it to another thing that's happening now, kind of related to open data and open code, which is a debate around how open, quote, open source AI has to be. And a lot of the reason why there is a link is that a lot of times data can't be fully opened, for privacy and ‑‑ and other reasons.
And yet society can still benefit from having some of the outputs of that training, often called the model. And so there is kind of a debate about what kinds of sharing of the data that's being used to train an open AI model make it open or not. To some extent, this is a very academic debate.
But at the same time it could end up being, you know, reflected in law, you know, because it is often recognised that open source might need special treatment because of its non‑proprietary nature.
But, you know, it can be ‑‑ there are kind of a bunch of different ways that you can ‑‑ for a data corpus used to train an AI model, the raw data is extremely useful. But there are other things that can be useful as well. For example, you know, a description of the schema of the data you are using, so people can bring their own data and replicate the model. If two parties have access to similar private datasets then they can be close substitutes for each other. So I think that is a burgeoning area that all of these issues kind of come back together around.
>> HENRI VERDIER: This is not just an academic issue, the question of which data you use to train the model.
First, you are in California, I feel. I read that one of the important reasons for the screenwriters' strike was digital AI, because they want to be sure their work would be respected.
So you can have very concrete and important impact. And if you don't pay attention to this, first we will delay all the international architecture of interoperability. And then we create new imbalances and inequalities. Because some big companies will take profit from every creation of all humankind. Because they will take everything, everything we dream, write, learn, publish, share. And they will use it to train some big monopolistic models.
From my perspective it is not just economic. It is one of the most important topics of the day. And we have to be sure. And we cannot just think about security issues, security concerns.
So the trustability, if I may, of how this model was educated is a very, very important issue. And we don't have proper answers today.
>> MIKE LINKSVAYER: Yeah. I agree with you. Just to clarify that: the academic comment was exactly that what you can call open or not is the thing that is somewhat academic. But the fundamental issues are of extreme importance. And I really appreciate the, you know, French government's direction around open source AI. It is extremely important.
>> HELANI GALPAYA: Just to say, there are like a million conversations about training data and the problems of using certain data for training. I don't think this is the forum for it.
Women, people of color, developing country people are on the receiving end of decisions made by models ‑‑ that were trained on data that does not talk about them. But, you know, that is a whole other field. So I don't think we need to talk about it. Just to say that I completely agree. The issues around training data are very real and huge.
>> HENRI VERDIER: An important concern is the definition of privacy itself. Because 10 years ago, to protect my privacy, I just had to protect my personal data. And I was protected.
Today, I can know a lot about you without knowing anything about you. Because I ‑‑ educate a model and it will predict something about you. So I cannot protect myself just by protecting my personal data.
>> HELANI GALPAYA: And not living in the digital world is no longer a safeguard against being profiled. You could profile me even with no e‑mail address, no presence online. So yeah.
>> CYNTHIA LO: On privacy, I think being able to layer in different datasets means that, as a result, you have a profile of a person. And I think it's fascinating, with different datasets.
As I'm looking at the time now, I want to move on to our last point, on promoting and supporting open code initiatives, considering all the topics of security, safeguards and privacy.
What is the best way to really promote open code initiatives? And how can member states do so?
>> HENRI VERDIER: First, there are more and more approaches, great. So you have a strong European policy, for example. You have a network of open source officers in European governments.
You have, I did mention, the French law. It was named the law for a digital republic. That required the government to publish everything as open source, and reusable. We are promoting, this is a European foundation for digital commons. Because we want Europe to take its responsibility and to contribute to finance commons that are important for freedom and sovereignty and self‑determination.
So there are a lot of initiatives. But the more I work on this field, the more I observe that financing is not enough. And maybe it is not the most important part. Really using free software and open source, contributing, allowing your public servants to contribute. Paying attention: for example, when we did prepare the DSA, we did nearly kill Wikipedia. Because we said companies with more than, I don't remember, 400,000 connections a month in more than 70 open countries have to have a legal representative in every European state.
For big tech companies that is not very expensive. But for Wikipedia, that's very expensive.
So we need a conviviality. We need proximity. We need direction and mutual understanding. And this is maybe most difficult today.
>> HELANI GALPAYA: I want to add two things to this. I think one is capacity. The public sector has very low technical capacity in many of the majority world countries, you know, in the developing world.
And the expectation of ‑‑ except for a handful of public sector officials ‑‑ anyone else being able to contribute code, I mean, it is a dream for many countries. Right?
So maybe what they ‑‑ what we need are ‑‑ and that's great if you can do that, and that's kind of the aspirational stage you want to be at.
So instead of that, another solution is to build the communities. Because the private sector is a lot more evolved and highly skilled. Right?
So like, I keep going back to Sri Lanka. But there is a really vibrant open source community: the highest number of contributions to Apache, for example. That comes from Sri Lanka. That comes from, you know, people being in highly paid proprietary software companies, but a couple of people really getting this community together to create this.
So how can they participate in government‑related stuff? I think that needs two things. One is that community building. But they can't participate in government procurement. That is really hard. Government procurement is a system that puts out a bid and gives points to a company that has done this 10 times before in five reference countries. Right?
For a group of people who come together and who don't have those references, it is very hard to signal that they can do this.
So I think there is some problem there.
Then at a practical level, I think if you want to maybe, you know, not go all out but at least give some preference for open source, what some governments do is, you know, out of 100, allocate 5‑10 extra points which you will get as a bonus if you are proposing an open system.
And there are variations on open: completely open code, free and open, open ‑‑ you know, open AI, et cetera. So graded marks in the procurement system. So different types of company can at least have hope of participating and competing against the large firms. This is kind of the same strategy that governments in the south have used to promote local companies when it comes to government procurement of IT systems.
It is very hard to compete with. I personally (?) pension systems. A big company will say, I've done pension systems in five countries. It is very hard for a local company. So we say, if you at least have a local partner ‑‑ first year technical support, second year actual deployment ‑‑ you get five more marks. So in the same way you can build this legacy of open source by allocating marks over time in procurement systems.
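The graded bonus scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: the openness categories, the bonus values (chosen inside the 5‑10 point range mentioned) and the five-mark local partner bonus are hypothetical, not any government's actual scoring rules.

```python
# Hypothetical graded bonus marks for how open the proposed system is.
OPENNESS_BONUS = {
    "proprietary": 0,
    "partly_open": 5,
    "fully_open_source": 10,
}


def score_bid(base_score, openness, local_partner=False, local_partner_bonus=5):
    """Return the evaluated score of a bid: base marks out of 100 plus bonuses."""
    if not 0 <= base_score <= 100:
        raise ValueError("base_score must be between 0 and 100")
    bonus = OPENNESS_BONUS[openness]  # raises KeyError for an unknown category
    if local_partner:
        bonus += local_partner_bonus
    return base_score + bonus


large_vendor = score_bid(85, "proprietary")
local_open_bid = score_bid(78, "fully_open_source", local_partner=True)
```

With these assumed numbers, a local open source bid with a lower base score (78) ends up ahead of a large proprietary vendor (85), which is the kind of levelling effect the bonus is meant to produce.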
>> HENRI VERDIER: Definitely a point. And that's interesting, because if you do observe the story of governments, they had technical skills to build bridges, roads, railways. And there is something different in the history of IT. Maybe because it started in the military area, as you know, with a project to launch rockets from a submarine.
It was from the beginning very big procurement. Very expensive, with bizarre rules for conducting projects. And government should learn to work with ecosystems, as you say. To be maybe a bit more humble, to learn about (?). To agree to start with an imperfect project and to improve it. To have a constant improvement policy. So this is a cultural change.
And just to finish, maybe this will be time to conclude. That is why from my perspective, there is a strong connection between open source movement, open government movement. Because you need to learn humility, to be an actor within a network of actors. And a (?). Maybe the new democracy that we need.
You cannot work just on one of the three topics. You need to cross the three topics.
>> CYNTHIA LO: Thank you. And I think, looking at the time, we're almost at time, but before we go to Q&A I want to make sure, Mike, if you have any thoughts as well on this topic of promoting and supporting open code initiatives.
>> MIKE LINKSVAYER: Sure. Everything already said has been great and I have too many thoughts, but I'll just say one thing. You know, I think ‑‑
>> CYNTHIA LO: Okay.
>> MIKE LINKSVAYER: What doesn't get measured doesn't get paid attention to. Fancy we have free and open source software advocates within governments now. But a much broader set of policy makers needs to appreciate the role that open source plays in the economy and development, et cetera. And that is one of the motivations of the Innovation Graph that we, you know, launched. If you want to see numbers that are kind of tuned to your jurisdiction, then you can ‑‑ you can look at those even, umm, you know, even if you don't have a fundamental appreciation of open source, and understand that it is a really big driver of jobs and economic growth. People have used GitHub data to show that, you know, policies that support, that foster open source lead to more start‑ups, more jobs and things like that.
There is a really important study commissioned by the European Commission, I guess, several years ago, kind of putting a floor on the contribution of open source to the EU economy. I believe the range was like 65‑95 billion euros a year. So quite significant. And I would love to see that replicated in, you know, other jurisdictions in a way that is very legible to policy makers who don't know anything about ‑‑ don't have any affinity for open source. Don't know anything about technology necessarily.
So I think making it legible is super important.
>> CYNTHIA LO: Thank you, Mike. And I think before I ‑‑ before we move to our Q&A, in particular on open source in the social sector: there are a lot of organisations that work in the social sector that are also open source. We mentioned digital goods. And there is also research in India, Kenya and Mexico, taking a look at what the drivers were for social sector open source organisations. How are they funded? What are their initiatives as well? I think in another session we can explore more on open source in the social sector.
Okay. And I believe also Milah Kumar was instrumental in leading that research.
So we open the floor up to anybody who has any questions here in person. Please.
>> AUDIENCE: Good morning everyone. I'm from Nepal. And I was very curious to attend this because I have a lot of questions. So the first one is how do you incentivise these entities to actually share data?
When you think about the different actors that exist to improve society, you have the private sector, obviously. You have got government. And you have very influential INGOs and the UN that work there. So what are some of the ideas, what has worked, maybe in Sri Lanka or other parts of the world, to incentivise these different actors to actually share data? In whatever format. Whatever privacy‑setting format. The reason I ask that is one of the things in my previous life, before being a parliamentarian. What I've seen is there is a massive incentive to hoard the data and then come up with insights to then present and say, okay, I have some advantage over everybody else. That then sort of warrants funding for me to go out and do something. It could be going and distributing relief material when there are earthquakes or disasters, for example.
Right? That's one.
And then it will be really great to understand a bit more about this French procurement law that you mentioned, that requires a certain percentage to be open source. In Nepal we have a very big ‑‑. Anything open. They think anything that is free is not good quality, et cetera. So we tend to procure ‑‑ and you are smiling, maybe because we see the same problem.
>> HENRI VERDIER: That is exactly the contrary.
>> AUDIENCE: Well.
>> HENRI VERDIER: ‑‑ closed system. You don't know if there are back doors.
>> AUDIENCE: Right. I think maybe also, I understand, but how did you go about building that level of trust in open sources if you have seen ‑‑ if there was something fundamental you did?
I think it also maybe pertains to the capacity, right? How many people actually have the capacity in Nepal to go check the open source code and then see if there are back doors? So what are some of the in‑built assumptions that you have? And what is the maybe very focussed attention that you paid to strengthen those pillars, to then bring this level of trust in open source?
Let's start with that.
>> HELANI GALPAYA: I'll go on the data part, I think. Sort of the official answer is it's actually very difficult to get the incentives right for data sharing, right?
Data is power. And therefore the incentives are to hoard it. Whether you use it or not, actually. That is the interesting part.
So we've spent the past year looking at public/private data partnerships across Africa, Asia, Latin America, the Middle East and the Caribbean. And mapped, like, over 900 different partnerships around data.
And done some in‑depth case studies. And we see a couple of things. One is that data sharing is a really high transaction cost activity. Right?
Because, you know, capacities are different. Particularly if you are dealing with a large company and trying to get some data, you don't even know who to reach, because there is a regional manager, a marketing manager, somebody in San Francisco, et cetera, et cetera. Right?
So it is high transaction cost. And what that does is it privileges the really large companies. Because they can come negotiate with the government, spend the money. And they can also enter a market and subsidize something with data with a very long‑term view. Microsoft, for example, is a case in point, where they can go and do something in a country that is in the early stages of development. Digitisation. Because in 10 years, when everyone gets a computer, you know that operating system is more likely to be a Microsoft one.
So they can make those kinds of investments for the long term in data partnerships. Many small ones can't.
So partnership building. This is why I said the easy answer is it is difficult. Because partnership building around data is really difficult. So the incentives have to be set up. So we talk often about this incentive of, you know, you can get data from Uber, if it is in Nepal. But I'll talk about Sri Lanka, where it has some percentage market share. Uber can give to government ‑‑ society to understand where people are or something. But actually if you now combine it with two other local taxi companies and share the data back with Uber and everybody in a non‑commercially sensitive way, it is now suddenly much more useful to Uber. It is useful to the local person. Useful to the transport planning person in government as well.
So you kind of find an incentive system that makes it worthwhile for the large and small operators to come and play. And then you set up the technical infrastructure for data sharing, of course. Right? And you give them the kind of confidence that says we are not going to share sensitive data, you know, like in the telecom example I gave.
You also then put the legislation around it. For telecom data in particular, we really had to sort of make sure telecom regulators didn't have a problem. So you need research exceptions or public policy or journalistic exceptions in data sharing, particularly when it comes to sensitive data.
So I mean, reducing those transaction costs and getting the incentives right ‑‑ those are the broad principles, but finding the ‑‑ case by case. Successful ones often have like a middle broker involved in getting these data partnerships going. Right? Someone who can convene multiple people. So a classic example would be in India, the UN had a ‑‑ now defunct, but a UN data governance system sort of ‑‑ you know.
They would sort of sit in the middle and convince government that they need to play in this data game. That they need to use private sector data. They developed that capacity. Because government doesn't automatically say I'll use private sector data. Right? And sometimes governments can't say that either. Because, you know, like the census department often has a rule that thou shalt conduct national surveys, not use call detail records for population projection. Right? So they don't ‑‑ so work with government. Then bring like five different private sector players together.
Sometimes it involves paying for the data. Sometimes setting up the incentive systems. The Global Partnership for Sustainable Development Data in Africa brought together the Group on Earth Observations, which allowed satellite data about Africa as a block. And any country who wanted it, they made it available. So data brokerage plays a role. Not saying a government can't be a broker. But the role of the data broker is really important. Otherwise what you have is one‑off data transactions. Everyone managed to get some Facebook data to understand where people were. That is not really useful because now COVID is over. None of that data is flowing anymore to government or civil society.
So to set it up in a sustainable way that you can understand development and use that data requires a bit more.
>> HENRI VERDIER: Thank you for your precise and important questions.
First, as you say, most people (?) have quite an instinct of hiding the data. But this is the old approach. ‑‑ is not the best global organisation, as you can easily see in the ‑‑ (?). When I joined the French government ten years ago, sometimes four different administrations did build the same data, with mistakes. And they did spend a lot of time and money to sell data between administrations of the same government. So it was useless, expensive, long.
I discovered that because it was expensive, sometimes some administration did use a very old dataset. Because they did buy it just every four years, for example, from the neighbour. And with the same money. Because it is ‑‑ we are one state. So this is not the best global organisation. And maybe it is not the best strategy.
What I have learned from the digital economy story is that platforms (?) better. You have data, you share this and you become part of the ecosystem. And more ‑‑ (?)
And the story of, I don't know, Microsoft, Google, Amazon is a story of people sharing that data. Not of people hiding that data.
So first, this is natural instinct. But we have to fight it because it is a stupid strategy. To hide your data.
Then regarding the controversies regarding open source. Yes, in France we usually consider that open source is the best security approach because you can check. You can ‑‑ you can contribute. So if you discover something, you can fix it. That's funny because, for example, if you take the story of European countries, now everything is converging. But 20 years ago the French public sector did use a lot of open source and free software and not the private sector. And Germany was the contrary. German companies did use a lot of free software and not the German government.
So you have international stories of course. Depend of your ‑‑
But (?), most public decision makers consider that open source is less expensive. And if it is not ‑‑ because sometimes it does cost, of course. But you will spend your money to pay national workers. Not benefit in Seattle. So it is a better use of your public money because you create value in your country. And usually it is less expensive. Better security. And maybe a better democracy. You know, in the declaration of human rights, 1789, we said that the government has to be accountable. That every citizen has a right to understand what the government is doing and to check if this is the most efficient approach.
Now most of the governmental actions are made through big and complex systems. If you don't have the right to understand the black box, you are not a perfect democracy. And you have to rely on someone who pretends to make the best, but you don't know.
So the mix of cost, security and democracy make that in France this is not a controversy anymore. Most people in the public sector encourage this approach.
If you need a strategy, you did ask for ‑‑ the first easy step is about public procurement. I'm not speaking about buying software. I'm speaking about buying services. I remember ten years ago the city of Paris wanted a network of self‑driving cars. But they did write in the procurement, and I will have access to all the data and I will share it as open data. And the companies didn't want to. They said, that is my market. My procurement. If you don't accept, I will take another solution.
So for (?) when you buy a service, when you delegate a public service, just think about writing one clause saying, and I will take the data and I will share the data. That is not so difficult if you have a competitive market.
One thing of course is to explain, to exchange, to build an ecosystem. And to be frank, I don't think this strategy can be done if you don't have any ecosystem.
It can be an ecosystem of open source software. It can be an ecosystem of start‑ups or big tech companies. I don't care. But you need to work with civil society or the private sector. You need to work with the outside of the government. If you ‑‑ if you cannot rely on competences and energy and innovation and creativity, that is very difficult.
And regarding the (?). So to be precise, we wrote that every software the government develops, where we pay for the development, has to be open source. It was built on the premises of the law of free access to information during the '70s. We decided ‑‑ so we wrote that the citizen has the right to ask for every information regarding government action. So how did you pay? Where did your money go? And we did build on these premises. So of course when we buy, as I said, a consumer product, we don't ask for open source. But when we finance the development of the product, when we develop it ourselves, this is mandatory.
Regarding the competencies, as you said, this is very often a problem. But you know, you don't really need very, very, very skilled people. Because we are speaking about simple IT. And sometimes, for example, just a funny story. Ten years ago I did the job of chief officer for the French government and did hire great data scientists to fix and to build good public policies. And I did hire brilliant people and we did help maybe 100 administrations to improve some public policies. And after four years they came to me and they told me, this job is a bit boring. We did just use Excel software and integration. Because government has very structured data and very simple questions. You don't need to make generative AI on big data with big ‑‑ you don't need this to fix 80% of the problems, if you have simple people with simple software, but very focussed to have an impact. And very (?) build in France the first I.D. system, which is used now by 40 million people every week. So 40 million people is something in France. I did build it with six developers in six months. The price was 600,000 euros. Of course, if I had decided to buy it from some big companies that you can imagine, it would have cost, I don't know, 30 million euros. But when you do it yourself with simple principles, with the agile methodology I did mention ‑‑ so make your first minimum product and improve it ‑‑ that is not so expensive and you don't need the Nobel prize, if I may. You just need good and serious developers.
And maybe one last thing. I was there when we did decide this law, regarding the ‑‑
So some people had concerns. We decided to mention a cybersecurity exception. So if the cybersecurity agency says that publishing the code is dangerous, we won't.
In five years, it did never happen. They did never find software for which publishing the code was dangerous. So it was a security to make people comfortable and it was never useful.
>> HELANI GALPAYA: A quick thing. I think this is quite amazing.
Just one little challenge, depending on the structure of your civil service, is to attract people with the skill to do this kind of development. You need to look at what other options they have. And particularly in South Asia they can work for a global IT firm, usually for 5 to 10 times the government salary. And that is a real incentive problem.
So the way some countries deal with it is to have these other structures, like a government‑owned private company that does a lot of this IT development, which doesn't have to abide by government pay scales. And that then suddenly makes it attractive to someone who wants to do civic tech, public technology, but also isn't compromising by making a low government salary.
>> HENRI VERDIER: I can say something because it is very important. Most of the people who went to work with me did divide their salary by two. But you can have very skilled and dedicated people if you give them a mission and autonomy.
But if you ask them to divide their salary and to obey to (?) and to respect stupid and very complex framework. You have to give them mission. A real mission. Let's fight unemployment. Let's educate. Let's... and, kind of autonomy. And that is why we have to change the way we do democracy. And that is not impossible. And actually lot of countries did it. And more and more, I think. And always with people coming from the private sector. Private also big, important. Open source ecosystem. It can be with Wikipedia, GitHub, (?) community. Linux and ‑‑ it is not always private firms. Outside of the government.
>> CYNTHIA LO: Thank you. And taking a look at our virtual attendees, we have some questions on government tools regarding securing data.
And potentially, I think let's start with that first.
If there are any thoughts on that. If not, we do have another question as well.
>> MIKE LINKSVAYER: I have a small comment on that which might not be directly addressing it, but I want to highlight how important basic cybersecurity is for protecting data. If you have (?), your data is exposed no matter what other measures you have taken. And I want to kind of tie that back into the previous discussion. I think the, sort of, the idea that open source is more secure because everybody can audit it and see exploits and fix them is sort of true, but also a little bit of a double‑edged sword and can actually be useful in poll ‑‑ is very pertinent in policy conversations now. Because one analogy is that open source is free, but it is also like a free puppy that you have to take care of.
And due to incidents like law ‑‑ (?), the attention of policy makers has been focussed on the fact that open source is part of our societal infrastructure. And it is something where we can't only rely on the developers of individual projects to adequately secure it. There needs to be kind of investment from a bunch of stakeholders, including governments, in making sure that that ability to ‑‑ for everybody to review the code and make fixes is actually acted on. And Germany is really a leader in this with its Sovereign Tech Fund. But there are others, in the U.S. the Open Technology Fund, and kind of others brewing. But I think that is a really important point: that the potential for open source being more secure actually needs to be actioned and needs coordinated action. And I think ‑‑ and sort of another way that this kind of loops back on itself is that for those decisions about where to invest, what open source code is actually critical, umm, you know, for power plants, whatever, you actually need data to be able to identify where to make those investments, otherwise you are boiling the ocean. So fairly tangential. Just, basic cybersecurity is just absolutely crucial for protecting data.
>> HENRI VERDIER: You are completely right. Open source creates the possibility to check. But someone has to do it.
I have another funny experience. In France we had an interesting free ‑‑ (?) named Framasoft. And during COVID an administration did decide and said publicly, I will use Framasoft. And the people from Framasoft did yell and contest. They said, are you crazy, you really want to put one million teachers and 10 million students on my infrastructure without giving me anything? I will die. You have to finance. Infrastructure. Servers. Or you will kill me. And that was funny because it could be seen as a big victory. That ministry, the French ministry of education. One of the biggest international administrations. Bigger than the Red Army. So it could have been seen as a victory but it was a kiss of death. So we have to be serious and to nurture and protect and finance this ecosystem. Or we'll kill it. There is no such thing as free software. Free lunch. Someone has to pay a bit.
>> CYNTHIA LO: Thank you. I know we are out of time. But I want to double check to see if anybody has any questions in the audience here or online?
All right. Well, thank you so much, everybody, for attending. Any concluding thoughts from our speakers here?
Nope, not a problem. Well, thank you so much, everybody, for attending this very early morning session in Japan. And we look forward to any other thoughts that you have on open code on development. Thank you.