Semantic Web and data model

Data and the semantic web

The Data project is part of our move towards open data. It follows the approach defined by the W3C under the names “semantic web” and “linked data”.

The aim is to structure resources so that machines can reuse them more easily. The Data project draws on data created in various formats, such as InterMarc for the main catalogue, XML-EAD for archive inventories and Dublin Core for the digital library.

This data is restructured, grouped, enriched by automatic processing and published according to the semantic web's descriptive model, RDF. The result is available on this site in several RDF syntaxes (XML, N3, NT) as well as in JSON and JSON-LD.
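To make the RDF model mentioned above more concrete, here is a minimal sketch of how a single statement (subject, predicate, object) can be written out in the N-Triples syntax. The `Triple` type, the `to_ntriples` function and the example resource URI are hypothetical and chosen for illustration; only the `dcterms:title` property URI is a real vocabulary term.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the property
    obj: str                      # URI of another resource, or a literal value
    obj_is_literal: bool = False  # True when obj is plain text rather than a URI


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    if triple.obj_is_literal:
        # Escape the characters that N-Triples does not allow raw in a literal.
        escaped = (
            triple.obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Hypothetical resource URI, used for illustration only.
example = Triple(
    "https://example.org/resource/1",
    "http://purl.org/dc/terms/title",
    "Les Misérables",
    obj_is_literal=True,
)
print(to_ntriples(example))
```

The other syntaxes offered on the site (RDF/XML, N3, JSON-LD) carry the same statements in a different notation.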

Part of the data is matched with external value vocabularies: id.loc.gov for languages and nationalities, DCMI type for document types.
The data is also matched with data sets identified by VIAF, Idref, Wikidata, etc. Finally, data from Rameau theme pages is aligned with other thesauri: LCSH, DnB, BNE, Agrovoc, Geonames, Thesaurus W

The Bibliothèque nationale de France provides:

  • URIs for resources: every resource has a permanent identifier, assigned through the ARK scheme, which makes it possible to locate any resource of the library.
  • for each resource, a set of metadata associated with the resource's URI in the form of RDF triples, using linked open data technologies. This metadata can be retrieved for each page (export) and for the entire database (dump). It can also be queried via a SPARQL console.
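As an illustration of querying the metadata attached to a resource URI, the sketch below builds the URL of a SPARQL request that lists the triples of one resource. The endpoint address, the `build_query_url` helper and the example ARK identifier are assumptions made for this example and should be checked against the site before use; passing the query in a `query` parameter follows the SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode

# Assumption: address of the SPARQL endpoint; verify it on the site.
SPARQL_ENDPOINT = "https://data.bnf.fr/sparql"


def build_query_url(resource_uri: str, limit: int = 10) -> str:
    """Return a GET URL asking for the predicates and objects of one resource."""
    # Reject characters that would break out of the <...> URI delimiters.
    if any(ch in resource_uri for ch in '<>" \n\t'):
        raise ValueError("resource_uri contains characters not allowed in a URI")
    query = (
        "SELECT ?predicate ?object WHERE { "
        f"<{resource_uri}> ?predicate ?object "
        f"}} LIMIT {int(limit)}"
    )
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


# Placeholder identifier, not a real record.
url = build_query_url("https://data.bnf.fr/ark:/12148/cb00000000x", limit=5)
print(url)
```

The resulting URL can be fetched with any HTTP client; the response format usually depends on the `Accept` header sent with the request.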

Testing the IFLA LRM model

The data model of Data is based on the IFLA LRM (Library Reference Model), the conceptual reference model for structuring catalogue data in libraries, defined by the International Federation of Library Associations and Institutions (IFLA).

Published in 2017, this model combines and replaces three previous models: FRBR (which concerned bibliographic records), FRAD (authority records) and FRSAD (subject authority records).

IFLA LRM defines a set of entities (selected for their relevance to the user), endowed with attributes, and linked by properties. This entity-relationship model has been designed to be transposable to semantic web technologies.

To find out more about this model, see the National Bibliographic Transition Programme website.

Complete IFLA LRM diagram

Data does not use the IFLA LRM model in its entirety, but provides a means of navigating the relationships between entities. The various pages of the site (see a brief presentation of their content on the page What can you ask data?) reflect several entities of the model:

  • The work pages give access to information about the work as such and also let you enter the WEMI tree (Work, Expression, Manifestation, Item). The first three levels are set out in the RDF of the pages; the item can only be apprehended in the form of the digital version of a document held in the BnF's collections, when one is available. In the HTML pages themselves, expressions can be identified indirectly by applying the language filter, which distinguishes the language of the editions listed under "Editions".
  • Entities of "Agent" type are represented in the "Authors" pages under their two sub-classes, that of natural persons on the one hand, and that of organisations on the other. A person can be the author of a work (a link then exists between the "author" page and the corresponding "work" page) or a contributor to an expression (preface writer, translator, librettist, etc.). In RDF data, the relationship between an author and a resource will be expressed at the work level if the person is the author of the work (author of the original text, composer, director); or at the expression level if they have produced a translation, an interpretation (in music), etc. The notion of author expressed at the level of the work will in any case be repeated at the level of the expression.
  • All the entities in the IFLA LRM model are likely to be the subject of a work, but theme pages have a more restricted scope: they are constructed from authority records in Rameau, the indexing language used at BnF.
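The relationships described in the list above can be sketched as a small data structure. This is a simplified illustration, not the actual Data model: the class names, fields and the `add_expression` helper are hypothetical. It shows authorship recorded at the work level being repeated at the expression level, and expressions being selected by language.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Agent:
    name: str
    kind: str = "person"  # "person" or "organisation"


@dataclass
class Manifestation:
    title: str


@dataclass
class Expression:
    language: str
    # (agent, role) pairs, e.g. translator, preface writer, librettist
    contributors: List[Tuple[Agent, str]] = field(default_factory=list)
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    authors: List[Agent] = field(default_factory=list)
    expressions: List[Expression] = field(default_factory=list)

    def add_expression(self, expression: Expression) -> None:
        # Authorship stated at the work level is repeated at the expression level.
        for author in self.authors:
            if (author, "author") not in expression.contributors:
                expression.contributors.append((author, "author"))
        self.expressions.append(expression)

    def expressions_in(self, language: str) -> List[Expression]:
        # Mirrors the language filter of the HTML pages.
        return [e for e in self.expressions if e.language == language]


hamlet = Work("Hamlet", authors=[Agent("William Shakespeare")])
french = Expression("fre")
# Hypothetical contributor and edition, for illustration only.
french.contributors.append((Agent("Hypothetical Translator"), "translator"))
french.manifestations.append(Manifestation("Hamlet (hypothetical French edition)"))
hamlet.add_expression(french)

for agent, role in french.contributors:
    print(role, "-", agent.name)
```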

The Data project data model

Ontologies and vocabularies used

Using widely available ontologies

Reuse of existing vocabularies has been favoured to promote interoperability, in particular:

  • rdf: https://www.w3.org/TR/rdf-syntax-grammar/
  • rdfs: https://www.w3.org/TR/rdf-schema/
  • skos: http://www.w3.org/2004/02/skos/core
  • dcterms: https://dublincore.org/specifications/dublin-core/dcmi-terms/#section-2
  • foaf: http://xmlns.com/foaf/0.1/
  • rdaregistry: http://rdaregistry.info/Elements/
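In RDF data, these prefixes are shorthand for namespace URIs. The following sketch shows how a prefixed name such as `foaf:name` expands to a full property URI. The `PREFIXES` table and the `expand` function are illustrative; the namespace URIs are those published by the vocabularies themselves and may differ from the documentation addresses listed above.

```python
# Namespace URIs of some widely used vocabularies.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local_name


print(expand("foaf:name"))
print(expand("skos:prefLabel"))
```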

The following vocabularies are also used:

  • bibo: https://www.dublincore.org/specifications/bibo/bibo/
  • bio: https://vocab.org/bio/
  • dbpedia: http://mappings.dbpedia.org/index.php/Main_Page
  • dc: https://dublincore.org/specifications/dublin-core/dcmi-terms/#section-3
  • dcmi-box: https://www.dublincore.org/specifications/dublin-core/dcmi-box/
  • dcmitype: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#section-7
  • frbr-rda: http://metadataregistry.org/schema/show/id/14.html
  • geo: https://www.w3.org/2003/01/geo/wgs84_pos#
  • geonames: http://www.geonames.org/ontology#
  • go: http://geneontology.org/docs/ontology-documentation/
  • ign: http://data.ign.fr/def/topo/20190212.htm
  • rdagroup1elements: http://rdvocab.info/Elements/
  • rdagroup2elements: http://rdvocab.info/ElementsGr2/

Embedded data to boost referencing

The HTML pages on data.bnf.fr are open to the Web and can be reached directly by Internet users through search engines.
This is why, in addition to the traditional methods of indexing the home page, three types of data are embedded to structure the site's pages:

  • Schema.org, which provides a vocabulary for adding structured information to HTML content in microdata format, making the pages easier for the major search engines to interpret.
  • JSON-LD, a structured metadata format that search engines can read.
  • Open Graph protocol, a very simple vocabulary set up to encode in RDFa some of the metadata that is retrieved when a user adds the resource to their Facebook profile.

In the header of the HTML page, the following metadata are integrated using META tags:

  • og:title (page title)
  • og:description (description of page content)
  • og:type (type of resource described: author or book)
  • og:url (Page URL)
  • og:image (URL of image illustrating the page)
  • og:author (for "Work" pages, the author's name)
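The sketch below shows how META tags of this kind could be generated for a page header. The `open_graph_tags` function and the sample values are hypothetical; the `property`/`content` attribute pair follows the Open Graph protocol, and the site's actual markup may differ.

```python
from html import escape
from typing import Dict


def open_graph_tags(metadata: Dict[str, str]) -> str:
    """Render one <meta> element per Open Graph property."""
    lines = []
    for name, value in metadata.items():
        # Escape both parts so quotes or ampersands cannot break the HTML.
        lines.append(
            f'<meta property="og:{escape(name, quote=True)}" '
            f'content="{escape(value, quote=True)}">'
        )
    return "\n".join(lines)


# Hypothetical values, for illustration only.
print(open_graph_tags({
    "title": "Example work",
    "type": "book",
    "url": "https://example.org/work/1",
    "author": "Example Author",
}))
```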

BnF's own ontology and vocabularies

The bnf-onto ontology

Certain properties and classes can only be expressed by an ontology specific to BnF: bnf-onto. To publish its ontologies, BnF has chosen a uniform namespace: https://data.bnf.fr/ontology.

BnF's own vocabularies

The repositories specific to BnF are declared at the following address: https://data.bnf.fr/vocabulary.

List of vocabularies: