---
abstract: "Social (or folksonomic) tagging has become a very popular way to describe content within Web 2.0 websites. Unlike taxonomies, which impose a hierarchical categorisation of content, folksonomies enable end-users to freely create and choose the categories (in this case, tags) that best describe some content. However, as tags are informally defined, continually changing, and ungoverned, social tagging has often been criticised for lowering, rather than increasing, the efficiency of searching, due to the number of synonyms, homonyms, and polysemous tags, as well as the heterogeneity of users and the noise they introduce. To address this issue, a variety of approaches have been proposed that recommend to users which tags to use, both when labelling and when looking for resources. As we illustrate in this paper, real-world folksonomies are characterized by power-law distributions of tags, over which commonly used similarity metrics, including the Jaccard coefficient and the cosine similarity, fail to compute. We thus propose a novel metric, specifically developed to capture similarity in large-scale folksonomies, that is based on a mutual reinforcement principle: two tags are deemed similar if they have been associated with similar resources, and, vice versa, two resources are deemed similar if they have been labelled with similar tags. We offer an efficient realisation of this similarity metric and assess its quality experimentally by comparing it against cosine similarity on three large-scale datasets, namely Bibsonomy, MovieLens and CiteULike."
altloc: []
chapter: ~
commentary: ~
commref: ~
confdates: ~
conference: 'SEKE ’11: 23rd International Conference on Software Engineering and Knowledge '
confloc: ~
contact_email: ~
creators_id: []
creators_name:
  - family: Quattrone
    given: Giovanni
    honourific: ''
    lineage: ''
  - family: Ferrara
    given: Emilio
    honourific: ''
    lineage: ''
  - family: De Meo
    given: Pasquale
    honourific: ''
    lineage: ''
  - family: Capra
    given: Licia
    honourific: ''
    lineage: ''
date: 2011
date_type: published
datestamp: 2011-10-01 00:34:59
department: ~
dir: disk0/00/00/76/46
edit_lock_since: ~
edit_lock_until: 0
edit_lock_user: ~
editors_id: []
editors_name: []
eprint_status: archive
eprintid: 7646
fileinfo: application/pdf;http://cogprints.org/7646/1/seke2011%2DQuattroneGiovanni.pdf
full_text_status: public
importid: ~
institution: ~
isbn: ~
ispublished: pub
issn: ~
item_issues_comment: []
item_issues_count: ~
item_issues_description: []
item_issues_id: []
item_issues_reported_by: []
item_issues_resolved_by: []
item_issues_status: []
item_issues_timestamp: []
item_issues_type: []
keywords: ~
lastmod: 2011-10-01 00:34:59
latitude: ~
longitude: ~
metadata_visibility: show
note: 'ISBN: 978-1-891706-29-5'
number: ~
pagerange: 385-391
pubdom: TRUE
publication: ~
publisher: ~
refereed: TRUE
referencetext: ~
relation_type: []
relation_uri: []
reportno: ~
rev_number: 9
series: ~
source: ~
status_changed: 2011-10-01 00:34:59
subjects:
  - comp-sci-art-intel
succeeds: ~
suggestions: ~
sword_depositor: ~
sword_slug: ~
thesistype: ~
title: Measuring Similarity in Large-Scale Folksonomies
type: confpaper
userid: 14714
volume: ~