What is the Unique Names Assumption?
And how does it apply to RDF and SPARQL?
Or: why can't I have aggregate functions in SPARQL? E.g. COUNT?
Comment from Andrew Newman (showed up on planetrdf, but disappeared from blog?):
'...purists would probably say that it shouldn't be possible to write (SQL) queries that are semantically inconsistent.
To me this would be similar to extending SPARQL with COUNT, for example, without resolving the semantic inconsistencies between the RDF model and one that assumes unique names.'
Comment from Dan Connolly in evaluating SPARQL w.r.t. an RDF query language survey (5th April 2005):
Count the number of authors of a publication.
unique names assumption.
Whereas Jeen Broekstra responded by saying:
I have trouble with this. I think that defining a counting operation that does not count entities, but simply labels (URIs, bNodes, literals), would be useful. This does not make any unique names assumption, AFAICS. Counting would simply be a way of retrieving the number of results a query would give, without giving the actual result.
This presentation on Equality in logic (Warning: PDF!) describes the UNA as:
In many applications, one makes the assumption that every object has a unique name. This is called the unique names assumption (UNA). The upshot is that a difference in name implies a difference in referent.
σ = τ ⇔ σ^i = τ^i
The unique names assumption is not true in general!!!
This document, Ontologies and Formal Statements for ISO/IEC 11179 (19th Sept 2005) says:
...it is necessary to specify whether the unique name assumption holds for a particular ontology.
The unique names assumption (UNA) asserts that different names for objects implies that they are different individuals, i.e., individuals do not have aliases. Such an assumption is used in equality testing. Unique names are called "keys" in database parlance.
The absence of the unique names assumption implies that individuals may have many names and that no inference of distinctness may be drawn by the observation of differing names. On the World Wide Web (WWW) unique names assumptions are not valid, because hosts may have more than one name, and files may have multiple links to them. In many file systems which support multiple links to a file the UNA does not hold. Finally, in email systems a user may have multiple email addresses (identities).
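To make the difference concrete, here is a minimal Python sketch contrasting a count of names with a count of individuals once some aliases are known. The email addresses, the `aliases` list and the `count_individuals` helper are all made up for illustration. Without the UNA the second number is only an upper bound, since further aliases may exist that we have not been told about.

```python
def count_individuals(names, aliases):
    """Count distinct individuals among `names`, merging known alias pairs.

    Uses a small union-find structure. Without the UNA the result is only
    an upper bound, because unknown aliases may still exist.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in aliases:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return len({find(n) for n in names})


# Hypothetical data: three names, two of which are known to be the same person.
authors = [
    "mailto:alice@example.org",
    "mailto:alice@work.example",
    "mailto:bob@example.org",
]
aliases = [("mailto:alice@example.org", "mailto:alice@work.example")]

label_count = len(set(authors))                   # counts names only
entity_count = count_individuals(authors, aliases)  # counts merged individuals
print(label_count, entity_count)
```

Counting the names needs no assumption at all. It is only the step from "three names" to "three individuals" that relies on the UNA.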
See also Andrew Newman's posting: Why Different Things Are the Same, in which he notes:
So using cardinality restrictions, systems can infer that if you have two values for a property when the ontology defines you should have one, it's not that you've broken cardinality, it's that those values are actually the same/equal. In Paul's example sameAs isn't required - firstPropertyOne and firstPropertyTwo are the same. This seems weird, completely backwards and very non-intuitive.
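The behaviour described in that quote can be sketched roughly as follows. This is a simplified illustration in Python, not how any particular reasoner works: the `check_functional` helper, the `ex:thing` subject and the `ex:first` property are hypothetical, and every value is treated as the name of an individual. With unique names, two values for an at-most-one property is a violation; without them, the two values are inferred to be the same thing.

```python
def check_functional(triples, prop, unique_names):
    """Check a property that is declared to have at most one value.

    Returns (violations, inferred_same):
    - with unique_names=True, two different names are a cardinality violation;
    - with unique_names=False, the names are inferred to denote the same thing.
    """
    values = {}
    for s, p, o in triples:
        if p == prop:
            values.setdefault(s, []).append(o)

    violations, inferred_same = [], []
    for s, objs in values.items():
        distinct = sorted(set(objs))
        if len(distinct) > 1:
            if unique_names:
                violations.append((s, distinct))
            else:
                first = distinct[0]
                inferred_same.extend((first, other) for other in distinct[1:])
    return violations, inferred_same


# Hypothetical triples: one subject with two values for an at-most-one property.
triples = [
    ("ex:thing", "ex:first", "ex:firstPropertyOne"),
    ("ex:thing", "ex:first", "ex:firstPropertyTwo"),
]

with_una = check_functional(triples, "ex:first", unique_names=True)
without_una = check_functional(triples, "ex:first", unique_names=False)
print(with_una)
print(without_una)
```

The same data gives an error under one assumption and a new equality under the other, which is probably why the result feels backwards to anyone coming from databases.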
I'm with Jeen: a mechanism to just count labels would be a useful one. I can (just about!) understand the wider issues for ontologies, reasoning, etc. But for a query language it would be useful simply to be able to perform some aggregate operations on the results.
E.g. count the number of solutions.
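As a rough sketch of what I mean, here is that kind of counting done client-side in Python over a set of query solutions. The solutions, the `ex:` names and both helper functions are invented for illustration; the point is only that the counts are over rows and labels, with no claim about how many distinct individuals those labels denote.

```python
from collections import Counter

# Hypothetical solutions, as might come back from a query such as
#   SELECT ?pub ?author WHERE { ?pub ex:author ?author }
# (ex:author is an assumed property name).
solutions = [
    {"pub": "ex:paper1", "author": "ex:alice"},
    {"pub": "ex:paper1", "author": "ex:bob"},
    {"pub": "ex:paper2", "author": "ex:alice"},
]


def count_solutions(rows):
    """Number of solutions, without looking at what the labels denote."""
    return len(rows)


def count_per_group(rows, group_var, counted_var):
    """Number of distinct labels bound to counted_var for each group_var label."""
    seen = {(r[group_var], r[counted_var]) for r in rows}
    return Counter(group for group, _ in seen)


total = count_solutions(solutions)
authors_per_pub = count_per_group(solutions, "pub", "author")
print(total, dict(authors_per_pub))
```

Whether "two author labels" means "two authors" is still a question for the reasoner, not for the counting.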