Software Heritage https://www.softwareheritage.org Tue, 12 Mar 2024 12:39:11 +0000

Highlights of the 2024 Software Heritage Symposium https://www.softwareheritage.org/2024/03/12/software-heritage-symposium-summit-2024-website/ Tue, 12 Mar 2024 13:00:21 +0000 https://www.softwareheritage.org/?p=38711 We’re excited to share the news that the Software Heritage Symposium 2024 materials are now accessible online: those who couldn’t join us in person for our annual event on February 1st, 2024 will now find video recordings of all sessions and the detailed slide decks from our speakers on the event webpage.

Over the past few years, the Software Heritage Symposium has evolved into a cornerstone event that unites diverse stakeholders from different sectors. It began with the inaugural Symposium in 2021, which celebrated Software Heritage’s 5th anniversary, and continued with a second Symposium in 2023, further solidifying the importance of our mission. Most recently, in February 2024, we convened the 3rd Symposium, reinforcing our commitment to safeguarding the vast knowledge embedded in software source code.

Organized in collaboration with UNESCO, the event provided an opportunity for attendees to explore the varied impacts and applications of software source code archival across different sectors. By bringing together a diverse group of presenters and panelists, the event fostered a rich exchange of ideas and experiences, highlighting the ongoing significance of software source code in society and the importance of preserving our digital legacy. Here, we delve into the key insights and discussions that emerged during this enlightening event.

UNESCO and Inria’s commitment to Software Preservation

Launched in 2016 by Inria, Software Heritage is a pioneering initiative to collect, preserve, and share all software source code as a common infrastructure at the service of cultural heritage, science, industry, and society as a whole.

In a significant milestone, UNESCO and Inria joined forces in 2017 through a formal partnership agreement to collaborate on preserving and disseminating software source code knowledge. This partnership reflects a commitment to safeguarding our digital legacy for future generations, enhancing access to information, and supporting global innovation and education.

Symposium Highlights

UNESCO – Paris | © Inria / Photo M. Magnin

The third annual Symposium opened with a welcome address by Mr Fackson Banda, Chief of the Unit for Documentary Heritage at UNESCO. The Unit serves as the Secretariat of the Memory of the World (MoW) programme, co-host of the event.

In her opening remarks, Ms Marielza Oliveira, Director Communications and Information – Division for Digital Inclusion, Policies and Transformation, UNESCO, emphasized how pivotal it is to preserve and share the invaluable cultural heritage embedded in software code: “software is the lifeblood of our digital age permeating every aspect of our lives and driving innovation across industries.”

Marielza Oliveira at UNESCO – Paris | © Inria / Photo M. Magnin.

Mr Gilles Mathieu, representative, French Ministry of Research and Higher Education, highlighted the ministry’s commitment to open science by advocating for the promotion and preservation of research-generated source code through archiving in Software Heritage.

Roberto Di Cosmo at UNESCO – Paris | © Inria / Photo M. Magnin

Mr Jean-Frédéric Gerbeau, Inria Deputy CEO for Science, recalled the challenges that Software Heritage faced in harvesting software source code, which is, as he put it, arguably a more difficult operation than referencing the web itself. He recognized that the Software Heritage archive has truly flourished, and emphasized the recent projects that pave the way to new opportunities.

Reinforcing the role of software and its source code as a fundamental enabler in all human activities, and setting the scope for the symposium, Mr Roberto Di Cosmo, Director of Software Heritage, stated:

“Software Heritage is building a universal source code archive as one infrastructure, shared and mutualized across all fields of endeavour, because software is the digital fabric that binds them all together.”

Following the welcome address, the main topics were discussed in panels and presentations:

  • Empowering Innovation through Software Source Code in Industry and Governments
  • Scientific Challenges in Analyzing and Learning from Preserved Source Code
  • Software Source Code in the Open Science Ecosystem
  • Software Source Code as Documentary Heritage

Industry & Public Administration Panel

Kate Stewart at UNESCO – Paris | © Inria / Photo M. Magnin

The first panel covered topics from industry and public administration. It was moderated by Mr Roberto Di Cosmo and included Mr Guillaume Avrin, National coordinator for artificial intelligence, Direction Générale des Entreprises; Mr Omar Mohsine, Office of the United Nations Special Envoy on Technology; Mr Marc Palazon, Board member and President of the Open Source commission, Numeum; and Ms Kate Stewart, VP Dependable Embedded Systems, Linux Foundation.

Ms Kate Stewart emphasized the importance of reproducibility and the need for an infrastructure for tracking the source of truth, where Software Heritage and the SWHID play a key role. Mr Omar Mohsine explained that the UN's engagement with Open Source builds upon a strategy composed of three pillars: policy, culture change and the Open Source heroes. Mr Marc Palazon stated that, in France, the public administration has understood that Open Source is key, and that France is today the biggest market for Open Source in Europe.

Mr Guillaume Avrin discussed the national strategy for AI and the announcement in 2023 of the extension of the super-computing facility on the Plateau de Saclay.

Guillaume Avrin, Roberto Di Cosmo, Omar Mohsine, Marc Palazon and Kate Stewart at UNESCO – Paris | © Inria / Photo M. Magnin

Scientific Challenges: Rust Analytics for Software Heritage

Sebastiano Vigna at UNESCO – Paris | © Inria / Photo M. Magnin

Mr Sebastiano Vigna presented Rust Analytics for Software Heritage, along with the graph representation approach of the WebGraph framework. He showed the results of a collaboration between Télécom Paris and the Università degli Studi di Milano on a new graph compression framework, written in Rust, applied to the Software Heritage history graph. This innovative approach to representing large graphs opens new possibilities, with more predictable performance and a threefold speedup, preparing for the future growth of Software Heritage.

Scientific Challenges: Big Code

Leandro von Werra, Hugging Face and Harm de Vries, Staff research scientist, ServiceNow Paris | © Inria / Photo M. Magnin

Mr Leandro von Werra and Mr Harm de Vries offered insights into the development of large language models for code during their presentation. They showed how the BigCode project that they lead builds the most open and transparent models available today, making available all the data collection, filtering and training pipeline, and providing tools for developers to check whether their code is in the training dataset, to support opt-out. They detailed the reasons for establishing a collaboration with Software Heritage: a shared engagement to contribute to a common good. “We partner with Software Heritage to ensure that the source code used to build the models is accessible and identifiable, enhancing the transparency of our efforts.”

Open Science Panel

The following session featured lively discussions by the Open Science Panel, which demonstrated how software source code is not only a tool for the preservation of the world’s software heritage but also an instrument at the service of Open Science, emphasizing the importance of Open Source as an enabler for Open Science.

Mr Christopher S. Marcum, Senior Statistician and Senior Science Policy Analyst at the Office of the Chief Statistician of the United States, shed light on the substantial source code repository maintained by the U.S. Bureau of Statistics (USB) on GitHub. “The big thing here that is relevant for Software Heritage is that by U.S. federal policy, federal agents are required to share their open-source code…” noting that it is a policy of the United States government for federal agencies to share, at a minimum, the metadata associated with their code repositories on code.gov, aligning with executive orders previously mentioned.

Ms Claudia Bauzer Medeiros, Professor at the University of Campinas (UNICAMP), declared that “Software Heritage is a treasure,” while quoting Paul Valéry:

It depends on those who pass

Whether I am a tomb or treasure

Whether I speak or am silent

The choice is yours alone.

~ Paul Valéry

Claudia Bauzer Medeiros at UNESCO – Paris | © Inria / Photo M. Magnin

She reminded us that “unless you understand its nature and reuse it, unless you take advantage of this treasure, we must make it abundantly clear that it’s not a mere repository to be buried as a tomb.” Claudia’s words highlighted the importance of recognizing and actively engaging with the wealth of knowledge preserved within Software Heritage, emphasizing its potential to drive innovation and progress in the digital age.

Kaitlin Thaney at UNESCO – Paris | © Inria / Photo M. Magnin

Ms Kaitlin Thaney, Executive Director of Invest in Open Infrastructure, encouraged attendees to broaden their perspective on open research during her address, emphasizing the synergies between open research and other areas of innovation. “If you rely on any form of open research in your work in research and development, it’s part of your process,” she asserted. Kaitlin urged participants to consider not only the systems they utilize but also to explore open alternatives and ways to give back. She emphasized the importance of collaboration beyond government and philanthropic support, as they alone are unable to sustain long-term efforts. Kaitlin extended her gratitude and congratulations to Software Heritage for their leadership in this endeavor, concluding her remarks with an invitation for attendees to reflect on these principles.

Mr Bhanu Neupane, Programme Manager for ICT and Sciences and Open Access to Scientific Research at UNESCO, highlighted the need for reliable indicators to gauge the impact of free and open-source software on driving the open science agenda globally. “It’s crucial to develop indicators that member states can use to measure the extent to which free and open-source software contributes to scientific research in their countries”.

The Open Science panel at UNESCO – Paris | © Inria / Photo M. Magnin

Mr Roberto Di Cosmo, CEO of Software Heritage, underscored the universal nature of software during his remarks. “Software is designed for all,” he said. “Not just for specific countries, research areas, or industries, but as a common infrastructure for humanity.” Roberto stressed the importance of communication and collaboration with diverse communities, acknowledging the complexity of the open-source ecosystem. He challenged the notion that open source alone guarantees value, highlighting the need for quality control and decision-making in software development and funding. Roberto outlined Software Heritage’s mission to provide a centralized infrastructure accessible to all, aiming to streamline the multitude of individual repositories into a unified resource for the future.

Software Source code as part of Memory of the World Panel

The last session was a panel on Software Source code as part of Memory of the World, moderated by Mr Fackson Banda, which addressed different aspects of digital cultural heritage and the place of source code in the larger ecosystem of cultural heritage preservation and accessibility.

Rosana Lanzelotte at UNESCO – Paris | © Inria / Photo M. Magnin

Ms Rosana Lanzelotte, President of Musica Brasilis, showcased their collection of 6283 free sheet music downloads of Brazilian music scores, emphasizing the need for Music Character Recognition (MCR) software to decode handwritten scores. Aligning with FBG 11.4, Musica Brasilis contributes to preserving cultural heritage by going digital and promoting accessibility while adhering to the FAIR principles. Their collaboration with IICT for digital preservation and interoperable metadata exchange underscores their commitment to open access. Rosana highlighted the adaptability of Musica Brasilis’s web software, which can now support initiatives beyond its initial scope, promoting cooperation and potential contributions to networks like Software Heritage.

Mr Pio Pellizzari, Delegate of IASA, emphasized the importance of digital archiving for preserving access to invaluable audiovisual materials, highlighting the need for comprehensive documentation and collaboration to address evolving preservation needs effectively.

Valérie Schafer at UNESCO – Paris | © Inria / Photo M. Magnin

Ms Valérie Schafer, Professor at C2DH – University of Luxembourg, expressed her enthusiasm for the Software Heritage mission, highlighting its transparent technical processes and its large, open community. Furthermore, Valérie quoted Lawrence Lessig’s statement “code is law” and shared the broader implications of code as a cultural artifact, shaping societal discussions around politics, gender, and ideology. She emphasized the need to engage citizens in understanding coding basics and its societal impact, envisioning a future where coding literacy becomes more widespread. Reflecting on her own research in web archives, Valérie explored the potential of hidden layers in archived web pages and praised Software Heritage’s efforts to narrate the history of code, acknowledging the diverse range of contributions from scholars in the field.

Conclusion

UNESCO – Paris | © Inria / Photo M. Magnin.

The Software Heritage 2024 Symposium served as a testament to the collective commitment to preserving our software commons. As we embark on the journey ahead, we would like to take this opportunity to thank UNESCO for their continuous partnership, our sponsors for their support and our community for their engagement. Let us continue to nurture our digital heritage while building a large community, ensuring that future generations have access to the universal source code archive.

Didn’t make it to the event? No worries!

Discover the slides to catch up on the presentations, view the snapshots from the day on our event webpage, and watch the sessions online.

Visit our Symposium 2024 webpage!

Big Data Development and Architecture Engineer https://www.softwareheritage.org/2024/03/01/big-data-development-and-architecture-engineer/ Fri, 01 Mar 2024 09:51:32 +0000 https://www.softwareheritage.org/?p=38897 The Software Heritage project

Software Heritage is a universal software source code archive project, whose aim is to recover, preserve for the very long term and share all publicly available source code, together with its development history (e.g., as stored in version control systems). The Software Heritage archive already contains over 17 billion unique source files and 3.6 billion commits, retrieved from over 266 million software development projects. The Software Heritage initiative, hosted by the Inria Foundation, is an entirely free software (FOSS) and non-profit project.

The Position

We are looking for an experienced Big Data-oriented software engineer. The ideal candidate will have significant interest and experience in large-scale data processing and exploitation architectures, including storage, indexing and retrieval.

You can consult a more detailed list of our current projects on the Software Heritage Roadmap 2023 (https://docs.softwareheritage.org/devel/roadmap/roadmap-2023.html)

Main tasks and activities

– Setting up a data processing architecture (à la Spark)
– Designing and modeling Big Data architectures
– Implementing solutions based on defined architectures
– Setting up Big Data pipelines

Skills

The ideal candidate will have experience in Big Data development and architecture, preferably in an open-source context. We expect self-organization and autonomy skills commensurate with the candidate’s experience. Participation in existing FOSS projects in any capacity (developer, community organizer, technical writer, etc.) is an added advantage.

The following skills are expected:

– Mastery of a large-scale data processing system (e.g. Apache Spark, Flink, or Hadoop)
– Fluent software development skills (basics in Rust and Python)
– Good level of English (written and spoken)
– Use of Git
– Use of continuous integration tools (e.g. GitLab and/or Jenkins)

Knowledge and experience of the following will be considered an asset:

– Experience in data processing on a scale of tens of terabytes or even petabytes
– Experience with Cassandra and Kafka
– Knowledge of Java
– Knowledge of Kubernetes
– Data visualization

Software Heritage is a complex technical architecture, based on many different technologies, which continues to evolve. We do not expect candidates to master all of them, but rather to be open to discovery and learning. Prior knowledge of one or more of the above-mentioned subjects will help in the process of getting to grips with the project, but we encourage you to apply whatever your level of experience in these technologies.

Working conditions

We are a team of 15 people, including 9 technical staff (5 developers and 4 sysadmins).
Autonomy, transparency and consultation are at the heart of our values (the project is free and open source).

Most of the team is based at the Inria center in Paris, but the position is open to any location in France close to an Inria center (Bordeaux, Lille, Lyon, Grenoble, Rennes, Saclay, Sophia Antipolis, Nancy).

The contract offered by Inria is a 2-year renewable full-time fixed-term contract, with the prospect of a permanent position.
– Telecommuting: 90 days/year (average 2 days per week)
– Vacation: 35 days + 10 days RTT
– Salary range: 30 to 70 k€ depending on profile and experience.

Application

Please send your application (CV + cover letter) to hiring@softwareheritage.org

Pioneering the Future of Code Preservation and AI with StarCoder2 https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/ Wed, 28 Feb 2024 13:08:36 +0000 https://www.softwareheritage.org/?p=38691 Software Heritage’s mission is to collect, preserve, and make the entire body of software source code easily available, especially emphasizing Free and Open Source Software (FOSS) as a digital commons encapsulating decades of human ingenuity. Our journey is dedicated not only to safeguarding this invaluable resource for future generations but also to maximizing its utility to enhance science and software development for the benefit of all.

Challenges and opportunities

The advent of Large Language Models (LLMs) capable of generating code presents both a challenge and an opportunity. The challenge lies in navigating the complex legal and ethical landscapes governing these innovations. To address this, we unveiled our statement on Large Language Models for Code in October 2023, outlining our guiding principles: openness, transparency, and respect for the authors.

The opportunity arises from the potential to make the vast body of knowledge embedded in humankind’s source code more accessible and reusable for a much broader community: this aligns perfectly with our core mission.

A first milestone with StarCoder2

Today, we are thrilled to see a first realization of this opportunity with the introduction of StarCoder2, the first-ever AI model for code developed using the comprehensive source code repository of the Software Heritage archive, and fully aligned with our principles for LLMs for code.

We congratulate Hugging Face, ServiceNow, and NVIDIA for their collaborative efforts to reach this important milestone within the BigCode project, showing commitment to ethical AI development, and advancing technology for the greater good. StarCoder2 is a significant step towards a world where the barriers to software development are lowered, where innovation is fueled by ethically developed AI, and where the digital commons serve as a foundation for future breakthroughs, ensuring that the knowledge derived from decades of software development benefits humanity at large.

Forward Together

As we celebrate this milestone, we are reminded of the ongoing importance of our mission to collect, preserve, and share the wealth of human knowledge embedded in software.

We invite the global community to join us in this exciting new chapter.

Software Heritage is more than just a repository; it signifies a commitment to a future where every line of code adds to our shared legacy and collective advancement. Embark on this journey with us—because every line of code, and every step towards ethical AI, counts.

Université de Lorraine https://www.softwareheritage.org/2024/02/26/universite-de-loraine/ Mon, 26 Feb 2024 17:59:22 +0000 https://www.softwareheritage.org/?p=38743 Software is often at the heart of the scientific endeavor in most disciplines, all the way from tiny pieces of it to worldwide, decades-long, million-line projects. Its archival, openness and citability are therefore keystones on the path to open and reproducible research. Following the example of the French national plan for open science, the Université de Lorraine has included Software Heritage as a core component of its open science policy, along with the open software forge GitLab and the national open archive HAL for software development and exposure.

— Nicolas Fressengeas,
Vice President in charge of the digital policy, research data and Open Science

Partnering in the FAIR-IMPACT Open Call to implement the Research Software Metadata Guidelines https://www.softwareheritage.org/2024/02/15/swh-partnering-fair-impact-open-call/ Thu, 15 Feb 2024 13:00:03 +0000 https://www.softwareheritage.org/?p=38308 We’re happy to share with you a new opportunity emerging from the FAIR-IMPACT European project: the Research Software support offer. Software Heritage is a partner in the FAIR-IMPACT project to increase recognition of research software and improve software curation and metadata standardization in the scholarly ecosystem. The current support offer is open to researchers, practitioners, scholarly infrastructures, institutions and anyone interested in research software.

The FAIR-IMPACT project, launched in June 2022, supports and disseminates FAIR-enabling practices, tools and services across scientific communities at a European, national and international level. Software Heritage’s role in the project is to establish guidelines for the collection and the curation of metadata to archive, reference, describe and cite research software. In June 2023, the Research Software MetaData (RSMD) Guidelines were published on Zenodo following a community workshop in March and a community review webinar in May.

Understanding the RSMD Guidelines

The RSMD Guidelines, written by Task 4.3 in the FAIR-IMPACT project, provide a comprehensive framework, offering flexible and adaptable recommendations for end-users across various disciplines and software development contexts. This deliverable encompasses:

  • Introduction of goals, methodology, and use cases
  • State-of-the-art review of existing practices and guidelines
  • Comprehensive analysis of the metadata landscape
  • Proposal for RSMD guidelines to collect and curate research software metadata
  • A clear checklist for researchers

A living version of the document will be available on the RSMD Guidelines repository.

Moving toward adoption

The journey doesn’t end with the proposal. Task 4.3 is committed to making the RSMD Guidelines normative within the academic community. This involves ongoing engagement with stakeholders, gathering feedback, and incorporating best practices and advancements in metadata management. By establishing these guidelines as a norm, the aim is to promote widespread adoption and adherence, leading to greater standardization and harmonization of metadata practices across research domains.

We offer two support actions that are designed to enhance the FAIRness and impact of research software:

  • Path I: Assessing and improving existing research software using a new extension of F-UJI which implements some of the metrics for automated FAIR software assessment. Successful applicants to this support action will receive 4000 € to support their participation between May and September 2024.
  • Path II: Implementing the Research Software MetaData guidelines for better archiving, referencing, describing, and citing research software artefacts. Successful applicants to this support action will receive 6000 € to support their participation between May and September 2024.

How can I implement the RSMD guidelines in the FAIR-IMPACT Open Call?

This goal can be achieved in many forms, with the high-level purpose of better archiving, referencing, describing and citing research software artifacts.

Participants in path II of the support action will suggest implementation activities in their own resources or contributions to Open Source / Open Science existing projects, such as CodeMeta.

Mentored by software metadata experts, participants will strive to implement the RSMD guidelines throughout a month-long challenge, and will complete it by writing a detailed implementation story to showcase the adoption of the RSMD guidelines. The outputs will be released in open access via Zenodo.

This support action will consist of four virtual workshops.

  • Introductory session to provide context and background and to introduce scholarly infrastructures and tools to make Research Software a first class output (May);
  • Introduction to the RSMD guidelines and examples of adoption (May);
  • One day sprint to progress planned implementation activity (June);
  • Post-assessment workshop where the Implementation story will be presented (September).

How to submit your proposal?

Apply on the Open Call page: after creating an account, you’ll find the form on the grants application dashboard. Applications should include:

  • Your name, contact details and organisational affiliation. Where the application is being made on behalf of a group, a nominated lead participant will need to provide these details;
  • The type of organisation you are based at;
  • The country you are based in (Please note that applicants must work in a European Union or Associated Country for the duration of the grant. For a full list of Associated Countries see here);
  • Proposal for implementing the RSMD guidelines (Max. 500 words): What are the objectives of your proposed Research Software MetaData (RSMD) implementation? Additionally, specify how you plan to achieve this objective. A few examples are listed below:
    • CodeMeta mapping tools;
    • Import/export tools in a software infrastructure;
    • UI improvements to facilitate software submissions;
    • SWHID exposure on software record;
    • Institutional documentation based on the RSMD guidelines;
    • Contribution to the existing CodeMeta generator or to the CodeMeta community:
      • contributing to the crosswalks
      • contributing to the documentation and to the user/developer guides
  • Impact, adoption and dissemination
    • Describe how your participation in this support action will impact your work/project.
    • Describe how participating in the support action will support the wider adoption and uptake in your field of the tools, methods, and/or solutions employed.
    • Describe how you will disseminate the results of your participation in this support action.

Conclusion

The FAIR-IMPACT open call, with its focus on enhancing research software FAIRness and implementing metadata guidelines, presents a significant opportunity for collaboration and advancement. Together, let’s adopt and implement the RSMD Guidelines and work towards a future where research software is acknowledged as a first class output.

For further inquiries about the RSMD Guidelines or the FAIR-IMPACT project, feel free to reach out to: opencalls[at]fair-impact.eu

Your contributions are invaluable as we strive to build a more open and curated research software landscape.

Software Heritage in 2023: a perspective https://www.softwareheritage.org/2024/02/01/software-heritage-annual-report-2023/ Thu, 01 Feb 2024 09:01:11 +0000 https://www.softwareheritage.org/?p=37646 As we enter 2024, we publish, as usual, our annual report on the past year, and like last year this is now available as a standalone document, making it easier to grasp the breadth of the mission, follow the progress made and share it with a broader audience.

The start of 2023 witnessed the Software Heritage symposium and summit held at UNESCO’s headquarters in Paris, France. This event, organized in collaboration with UNESCO, was an international conference themed “Software Source Code as documentary heritage and an enabler for sustainable development.” The program explored five primary dimensions:

  • Understanding software source code as documentary heritage and its role in digital skills education;
  • Considering software source code as a research object in open science;
  • Examining software source code’s impact on innovation and sharing in industry and administration;
  • Discussing long-term preservation perspectives, and
  • Reviewing technological advances in software source code analysis.

UNESCO – Paris | © Inria / Photo B. Fourrier

The event gathered our community, including team members, ambassadors, grantees, partners, and contributors who discussed the Software Heritage Archive and various aspects of its mission. The dedicated blog post offers a summary of the workshop’s key points, and our annual report, presented as a standalone document for the first time, gives an overview of our progress.

We suggest reading UNESCO’s article, “Positioning software source code as digital heritage for sustainable development”; the complete transcript is accessible in PDF format.

The event’s recording is also available online for those who couldn’t attend.

In 2023, we welcomed 10 new ambassadors to our cause, 5 women and 5 men, bringing the count of our team of ambassadors to 33 worldwide. We featured several ambassador articles this year: one by Simon Phipps titled “Open Source ensures code remains a part of culture” advocating for the preservation of software as a cultural element through Open Source, one by Agustin Bethencourt titled “Why did I become a Software Heritage Ambassador?” that delves into the significance of Software Heritage within the industry, and one titled “Viewpoints on software in research at the Gustave Eiffel University”, an interview with Céline Rousselot and Joenio Marques da Costa.

Throughout the year, the ambassador community held two plenary sessions, in close contact with the Software Heritage core team. One key topic has been software metadata, a complex but essential issue that is detailed in the article “Deep Dive into the archival of Software Metadata”. A special effort has been made to present the broad lines of the 2023 Software Heritage technical roadmap, which was published in the first quarter of 2023.

Supporting Open Source

At Software Heritage, we remain committed to advocating for the importance of open-source software and its role in shaping the future of technology. This is why we co-signed an open letter with the Eclipse Foundation on the Cyber Resilience Act. The objective of this new regulation is to ensure the safety and security of our digital infrastructure, including software, but we must make sure that it does not hinder the progress and innovation of open-source software as an unintended side effect. You can read the open letter and learn more about this important topic on the Eclipse Foundation’s website.

Building a collaboration infrastructure

We know that to succeed in the humbling mission we have undertaken we need to enable a large community to contribute and collaborate. This year we are happy to report several key advances in this direction.

We concluded a multi-year effort, conducted with help from Open Tech Strategies, to transition our development and operations from our previous system to our own GitLab instance, which is more familiar to external contributors.

We opened a new documentation landing page at docs.softwareheritage.org to make it easier for newcomers to find their way in the vast amount of documentation available.

We have been working to make it easier for developers to regularly archive their software in Software Heritage by introducing dedicated save code webhooks in the API for several popular forges and technologies: Bitbucket, Gitea, GitHub, GitLab and SourceForge.

Last, but not least, we have introduced a GraphQL API that greatly simplifies programmatic access to the archive: users can play with it using the Software Heritage GraphQL Explorer. This is an addition to the traditional Software Heritage REST API that will enable clients to craft robust queries and seamlessly retrieve server data.

SWHID sees growing adoption and becomes the Software Hash Identifier

A key part of the Software Heritage infrastructure is the family of persistent identifiers known as SWHIDs, which make it possible to guarantee the integrity of software artefacts without relying on third parties, enabling better scientific reproducibility.

This year, SWHID adoption has been growing in academia. A close collaboration with CCSD and IES-INRIA led to the opening of SWHID deposit on HAL to all French researchers in January 2023, massively simplifying the referencing of research software in French institutional portals and the generation of the many reports often requested in an academic career. At an international level, the Computer Graphic Replicability Stamp Initiative (GRSI) now uses Software Heritage to archive software associated with research articles, and uses SWHIDs to reference it: when code is accepted for the Replicability Stamp, the initiative relies on Software Heritage to create a snapshot of the project and references the accepted version with the corresponding SWHID.

The SWHID identifier has been developed at Software Heritage, where it has been in use in our archive for almost a decade. Since it can be computed independently and used in a variety of other applications, the time has come to create an independent specification, to ensure that all stakeholders can benefit from it. To this end, after almost two years of intense work, an open working group has released the publicly available specification of the SWHID, which is now spelled out “Software Hash Identifier” and no longer “Software Heritage Identifier” (pronounce it /ˈswɪd/).
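To give a feel for what “computed independently” means in practice, here is a minimal sketch for the simplest case, a single file’s content. It relies on the fact that content SWHIDs use the same hashing scheme as Git blobs; the function name `content_swhid` is ours, chosen for illustration, and the sketch does not cover the other object types (directories, revisions, releases, snapshots) or optional qualifiers.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file's content (object type "cnt").

    Illustrative helper, not part of any official library: it hashes
    the bytes the way Git hashes a blob, i.e. SHA-1 over the header
    "blob <length>\\0" followed by the raw content.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    # "swh" is the scheme, "1" the schema version, "cnt" the object type.
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The empty content has a well-known Git blob hash.
    print(content_swhid(b""))  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Anyone holding the same bytes obtains the same identifier without contacting the archive, which is what allows integrity to be checked without relying on a third party.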

Software Heritage in European Research Projects

At Software Heritage, we have a long tradition of participating in collaborative research projects when we can help improve the way research software is archived, referenced, described and cited. On the infrastructural side of Open Science, groundbreaking work is ongoing in a dedicated work package in the FAIRCORE4EOSC European project, to connect scholarly infrastructures with the Software Heritage archive. The first visible outcome is the partnership initiated with the swMATH portal to bridge mathematical publications with comprehensive software records, enriching the scholarly landscape. This year, we also contributed to a collaborative effort by two such projects, FAIR-IMPACT and FAIRCORE4EOSC, during the RDA P20 plenary in Gothenburg.

Software Heritage is also part of the SoFAIR project, recently awarded through the CHISTERA Open Research Data & Software Call, whose goal is to elevate the discoverability and reusability of open research software, aligning with our commitment to advancing the accessibility of software source code artifacts.

Research on Software Heritage

Campus Cyber – Paris | © Inria / Photo B. Fourrier

Software Heritage is an archive, but also an exceptional infrastructure to enable research on software development. This year, we embarked on the SWHSec project, announced during the launch of a new national research and innovation program on cybersecurity, PTCC. This initiative brings together eight expert research teams specializing in security, software engineering, and open-source software to build cybersecurity tools on top of the Software Heritage infrastructure.

Software Heritage and Large Language Models for Code

We acknowledge the huge potential of the Software Heritage archive for the training of machine learning models, particularly large language models (LLMs) that can automatically generate code to assist with software development tasks. In alignment with our mission, we advocate for a transparent and respectful approach to the development of these models, as detailed in our statement for acceptable machine learning use of the Software Heritage archive.

Saving Inria’s software legacy

In the pursuit of safeguarding Inria’s software legacy, we started a collaboration with the Inria alumni network and the Direction of Culture and Scientific Information (DCIS) to reach out to people who formerly worked at Inria and invite them to help enrich the inventory of software heritage created at Inria since its inception.

A first result of this effort is the publication of the story of the web browser and editor Amaya, built with the Software Stories interface, which was created in 2021 in collaboration with the Science Stories team and the University of Pisa with UNESCO’s support.

Software, a pillar of Open Science

Software, and its source code, is a pillar of Open Science, and Software Heritage has been recognized by the Global Sustainability Coalition for Open Science Services (SCOSS) for its key role in ensuring continuous access to software as a research output. We look forward to seeing many new members join the newly created Archives and Libraries Interest Group (ALIG) that will bring together academic stakeholders worldwide.

Thanks to our sponsors

We’re grateful to our sponsors, including our new additions Hugging Face, ServiceNow, and Scanoss: it is their continued support that enables us to make progress in this long term mission.

 

First international mirror

And we finished this intense year with the launch of the first international mirror of the Software Heritage Mirror Network by ENEA, the Italian National Agency for New Technologies, Energy and Sustainable Economic Development. This is a key milestone in the long-term preservation strategy of all our software commons, and is the result of years of technical and organisational development efforts that will make it much easier for the other forthcoming mirrors to go into production.

 

Roberto Di Cosmo
Director, Software Heritage

]]>
Join the 2024 Software Heritage Symposium at UNESCO https://www.softwareheritage.org/2024/01/15/join-the-2024-software-heritage-symposium-at-unesco/ Mon, 15 Jan 2024 12:33:02 +0000 https://www.softwareheritage.org/?p=37227 We’re excited to announce the Software Heritage Annual Symposium on February 1st, 2024, hosted at UNESCO Headquarters. Now in its third iteration, this exclusive gathering will focus on the preservation and sharing of software source code, addressing its impact on digital development, scientific innovation, and cultural heritage.

The hybrid event, offering both virtual and on-site participation, will take place from 14:00 to 18:00 and will showcase distinguished speakers and panels discussing key aspects, such as software as documentary heritage, scientific challenges, and its role in open science for both the industry and public administration. The complete program is available on the event page.

To secure your spot, please register for the event using this form. Depending on seat availability, you will receive an invitation.

Stay tuned for updates, and be sure to mark your calendars for a unique exploration of our software heritage.

]]>
Meet our 10th ambassador in 2023, Wendy Hagenmaier https://www.softwareheritage.org/2023/12/28/ambassador-wendy-hagenmaier/ Thu, 28 Dec 2023 13:00:13 +0000 https://www.softwareheritage.org/?p=37268 We are delighted to introduce our 10th ambassador in 2023, Wendy Hagenmaier!

Wendy Hagenmaier is the Software Preservation Program Manager at Yale University Library, where she leads the Emulation-as-a-Service Infrastructure (EaaSI) program to empower the widespread use of emulation for interaction with software, computer systems, collections, and data. The Yale University Library software preservation team is building an emulation infrastructure that cultural heritage and research practitioners from around the world will be able to use to access their historical software and software-dependent content. Source code is integral to the Library’s cultural heritage collections and to the research enterprise of the university.

Wendy has served as Strategic Coordinator for the Software Preservation Network’s Coordinating Committee and seeks to foster collaboration and build alliances among organizations engaged in software preservation and curation. In 2022, she co-authored a white paper on “Supporting Software Preservation Services in Research and Memory Organizations,” based on survey and interview research with practitioners engaged in software preservation activities.

Wendy was delighted to participate in the 2022 SWHAP Days event and 2023 SWHAP Workshop and is grateful for the opportunity to support the Software Heritage mission by becoming an ambassador. She believes software preservation and curation are global challenges that require international cooperation, relationship-building, and understanding.

If you want to contact Wendy or to learn more about our mission, she will be happy to reply!

And do not forget! We are looking for enthusiastic organizations and individuals to volunteer as ambassadors to help grow the Software Heritage community. If you too want to become an ambassador, please tell us a bit about yourself and your interest in the mission of Software Heritage.

]]>
SCOSS https://www.softwareheritage.org/2023/12/20/scoss/ Wed, 20 Dec 2023 10:01:15 +0000 https://www.softwareheritage.org/?p=36810 The Global Sustainability Coalition for Open Science Services (SCOSS) board has selected Software Heritage for its 5th pledging round, in November 2023, recognizing Software Heritage as a crucial open science infrastructure ensuring continuous access to the software code outputs generated by researchers worldwide.

SCOSS Members, libraries, archives, institutions and research funders supporting open science can make a difference by committing to fund Software Heritage. Pledge an annual donation for three years, offering a secure financial foundation and access to the dedicated Software Heritage ALIG.

Discover the detailed pledging program in the section dedicated to the 5th pledging round on the official SCOSS website.

]]>
FAIRCORE4EOSC https://www.softwareheritage.org/2023/12/20/faircore4eosc/ Wed, 20 Dec 2023 09:59:38 +0000 https://www.softwareheritage.org/?p=36930 The FAIRCORE4EOSC project focuses on the development and realisation of core components for the European Open Science Cloud (EOSC), launched in June 2022. Software Heritage is leading a full work package, named Services and tools to archive, reference, describe and cite research software, following the recommendations and plan defined by the SIRS report. It will develop Research Software APIs and Connectors to ensure the long-term preservation in the Software Heritage archive of research software in different disciplines, establish a mirror of the Software Heritage universal source code archive for the EOSC, and ensure regular archival in Software Heritage of core EOSC code hosting platforms. It will also design curation mechanisms to support quality metadata, in particular for citations, and contribute to standardisation for software metadata and identifiers.

To achieve these objectives we are collaborating with a panel of partners, including CERN (InvenioRDM, Zenodo), LZI (Dagstuhl), FIZ (swMath), GRNET, OpenAire, KNAW-DANS (DataVerse), IES-Inria (episcience), GWDG and DataCite.

]]>