Posted: Thu Jul 10, 2025 8:52 am
Archived web data from IA’s general web collections will be used in the project. Because translations are particularly scarce for Icelandic, Croatian, Norwegian, and Irish, IA will also use customized internal language classification tools to prioritize and extract data in these languages from archived websites in its collections.
The partnership expands on IA’s ongoing effort to provide computational research services to large-scale data mining projects focusing on open-source technical developments for furthering the public good and open access to information and data. Other recent collaborations include providing web data for assessing the state of local online news nationwide, analyzing historical corporate industry classifications, and mapping online social communities. IA is also expanding its work to make custom extractions and datasets available from its 20+ years of historical web data. For further information on IA’s web and data services, contact webservices at archive
The two-year project will first focus on OSF Registrations data and expand to include other open access materials hosted on OSF. Later-stage work will test interoperable approaches to sharing subsets of this data with other preservation networks such as LOCKSS, APTrust, and individual university libraries. Together, IA and COS aim to lay the groundwork for seamless technical integration supporting the full lifecycle of data publishing, distribution, preservation, and perpetual access.
“I think our presentation experience has until now not been as much of a focus as our gathering of materials from different sources,” Cheng said. “So now we are really trying to take time and check with our users, finding out who’s using the site and what they need. And we’re trying to present better experiences for exploring, consuming and searching for content.”