Trustworthiness

Overview:

Many large corpora, such as the World Wide Web, databases, and collections of books, contain contradictory assertions. For example, one web page might claim that John F. Kennedy was shot by Lee Harvey Oswald acting alone, while another might claim that the assassination was a conspiracy under the direction of J. Edgar Hoover. These possibilities are mutually exclusive. In this case users typically already hold a well-formed belief distribution over them, but in other cases a lack of expertise makes the decision far harder. If two alternative dates of birth for William Shakespeare (traditionally thought to be April 23rd, 1564, but generally considered unknown) are presented, how can the uninformed user choose between them? A naive system might consider only the source: if one date comes from the Wall Street Journal and the other from a single MySpace profile, the user is likely to prefer the former because of a prior assumption of trustworthiness. However, if a thousand MySpace profiles make the same assertion, the user would probably relent: even the Wall Street Journal can make mistakes. Alternatively, if the MySpace profile were linked to a prominent European historian whom the user knows and respects, trust would flow through this implicit endorsement and the user would again shun the newspaper. To be effective, then, a trust system must look beyond the corpus and incorporate the potentially extensive background knowledge available: both what the user already believes and whom the user already trusts.

We model the information trustworthiness problem as follows: sources (such as authors and publishers) produce documents, each document is a set of assertions, and we must find the appropriate degree of trust in each of these elements. We must also determine exactly what “trustworthiness” means. Trust in an individual assertion is the simplest case, merely a degree of belief, but how the trustworthiness of a document or source should be defined is not obvious. For example, a document may contain only fully believed assertions yet present them in such a way that the document as a whole is biased (for example, by listing only the virtues of one political candidate and only the faults of another). Consequently, part of our task is to define precisely the factors that contribute to trustworthiness, including accuracy, completeness, and bias, and to find a satisfactory way of measuring them.
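
To make the source, document, and assertion structure concrete, the following Python sketch encodes it as simple data types and applies a deliberately naive trust-weighted vote to the Shakespeare example. It is an illustration only, not the method this project proposes: the class names, the `belief` function, the numeric trust values, and the placeholder value "other date" are all assumptions introduced here, and the vote measures nothing about completeness or bias.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str  # what the claim is about
    value: str    # the claimed value; values for one subject are mutually exclusive


@dataclass(frozen=True)
class Document:
    source: str            # author or publisher of the document
    assertions: frozenset  # the set of Assertion objects the document makes


def belief(documents, prior_trust, subject, default_trust=0.01):
    """Naive trust-weighted vote (hypothetical helper, for illustration only).

    Each document adds its source's prior trust to every value it asserts
    for `subject`; the totals are then normalized to sum to 1.
    """
    weight = defaultdict(float)
    for doc in documents:
        trust = prior_trust.get(doc.source, default_trust)
        for assertion in doc.assertions:
            if assertion.subject == subject:
                weight[assertion.value] += trust
    total = sum(weight.values())
    return {value: w / total for value, w in weight.items()} if total else {}


SUBJECT = "William Shakespeare, date of birth"
traditional = Assertion(SUBJECT, "1564-04-23")
alternative = Assertion(SUBJECT, "other date")  # placeholder, not a real claim

# Assumed prior: the user trusts the newspaper far more than an unknown profile.
prior_trust = {"Wall Street Journal": 0.9}

newspaper = Document("Wall Street Journal", frozenset({traditional}))
profiles = [
    Document(f"MySpace profile {i}", frozenset({alternative}))
    for i in range(1000)
]

# One profile against the newspaper, then a thousand profiles against it.
one_profile = belief([newspaper, profiles[0]], prior_trust, SUBJECT)
many_profiles = belief([newspaper] + profiles, prior_trust, SUBJECT)
```

Under these assumed numbers the newspaper's date dominates in `one_profile`, while the combined weight of a thousand low-trust profiles outweighs it in `many_profiles`. The sketch treats every profile as independent and ignores endorsements between sources, which is exactly the kind of background knowledge a fuller trust system would need to incorporate.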