Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques

By Benjamin C.M. Fung

Access to high-quality information is an essential requirement for knowledge-based decision making. Yet data in its raw form often contains sensitive information about individuals. The methods and tools of privacy-preserving data publishing address this problem by enabling the publication of useful information while protecting data privacy. Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques presents state-of-the-art information sharing and data integration methods that take privacy and data mining requirements into account. The first part of the book discusses the fundamentals of the field. In the second part, the authors present anonymization methods for preserving information utility for specific data mining tasks. The third part examines the privacy issues, privacy models, and anonymization methods for realistic and challenging data publishing scenarios. While the first three parts focus on anonymizing relational data, the last part studies the privacy threats, privacy models, and anonymization methods for complex data, including transaction, trajectory, social network, and textual data. The book explores not only privacy and information utility issues but also efficiency and scalability challenges. In many chapters, the authors highlight efficient and scalable methods and provide an analytical discussion comparing the strengths and weaknesses of different solutions.


Similar data mining books

Mining of Massive Datasets

The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and that can be applied to even the largest datasets. It begins with a discussion of the MapReduce framework, an important tool for parallelizing algorithms automatically.

Twitter Data Analytics (SpringerBriefs in Computer Science)

This brief provides methods for harnessing Twitter data to discover solutions to complex questions. It introduces the process of collecting data through Twitter's APIs and offers strategies for curating large datasets. The text presents real-world examples of Twitter data analysis, the current challenges and complexities of building visual analytic tools, and the best strategies for addressing these issues.

Advances in Natural Language Processing: 9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17-19, 2014. Proceedings

This book constitutes the refereed proceedings of the 9th International Conference on Advances in Natural Language Processing, PolTAL 2014, held in Warsaw, Poland, in September 2014. The 27 revised full papers and 20 revised short papers presented were carefully reviewed and selected from 83 submissions. The papers are organized in topical sections on morphology, named entity recognition, and term extraction; lexical semantics; sentence-level syntax, semantics, and machine translation; discourse, coreference resolution, automatic summarization, and question answering; text classification, information extraction, and information retrieval; and speech processing, language modelling, and spell- and grammar-checking.

Analysis of Large and Complex Data

This book offers a snapshot of the state of the art in classification at the interface between statistics, computer science, and application fields. The contributions span a broad spectrum, from theoretical developments to practical applications, and all share a strong computational component. The topics addressed come from the following fields: Statistics and Data Analysis; Machine Learning and Knowledge Discovery; Data Analysis in Marketing; Data Analysis in Finance and Economics; Data Analysis in Medicine and the Life Sciences; Data Analysis in the Social, Behavioural, and Health Care Sciences; Data Analysis in Interdisciplinary Domains; and Classification and Subject Indexing in Library and Information Science.

Additional info for Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques (Chapman & Hall CRC Data Mining and Knowledge Discovery Series)

Example text

To prevent linking the records in T to the information on X or Y , the data holder can specify k-anonymity on QID1 = {A, B} and QID2 = {C, D} for T . This means that each record in T is indistinguishable from a group of at least k records with respect to QID1 and is indistinguishable from a group of at least k records with respect to QID2 . The two groups are not necessarily the same. Clearly, this requirement is implied by k-anonymity on QID = {A, B, C, D}, but having k-anonymity on both QID1 and QID2 does not imply k-anonymity on QID.
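
To make this distinction concrete, here is a minimal sketch (not from the book; Python, with a toy four-record table and made-up attribute values) that checks k-anonymity separately on QID1 = {A, B}, on QID2 = {C, D}, and on the combined QID = {A, B, C, D}. The toy table satisfies 2-anonymity on both QID1 and QID2 yet fails it on the full QID.

```python
from collections import Counter

def is_k_anonymous(records, qid, k):
    """Return True if every combination of values on the attributes in
    `qid` appears in at least k records."""
    groups = Counter(tuple(r[a] for a in qid) for r in records)
    return all(count >= k for count in groups.values())

# Toy table with attributes A, B, C, D (hypothetical values, not from the book).
T = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c1", "D": "d1"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
]

k = 2
print(is_k_anonymous(T, ["A", "B"], k))             # True: every QID1 group has 2 records
print(is_k_anonymous(T, ["C", "D"], k))             # True: every QID2 group has 2 records
print(is_k_anonymous(T, ["A", "B", "C", "D"], k))   # False: every full combination is unique
```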

More generally, we can allow multiple Yi, each representing a subset of values on a different set of attributes, with Y being the union of all Yi. For example, Y1 = {HIV} on Test and Y2 = {Banker} on Job. Such a "value-level" specification provides the flexibility essential for minimizing data distortion.

(X, Y)-Privacy

Wang and Fung [236] propose a general privacy model, called (X, Y)-privacy, which combines (X, Y)-anonymity and (X, Y)-linkability. The general idea is to require each group x on X to contain at least k records and the confidence of inferring any y ∈ Y from any x ∈ X to be at most a maximum confidence threshold h.
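
As a rough illustration, the sketch below (again not from the book; the function name, record layout, and parameter values are hypothetical) checks both conditions of (X, Y)-privacy on a table: a minimum group size k on the X attributes and a maximum confidence h for each sensitive value in Y.

```python
from collections import defaultdict

def satisfies_xy_privacy(records, x_attrs, y_values, k, h):
    """Sketch of an (X, Y)-privacy check: every group on the X attributes
    must contain at least k records (the (X, Y)-anonymity part), and within
    each group the fraction of records carrying any sensitive (attribute,
    value) pair in `y_values` must not exceed the confidence threshold h
    (the (X, Y)-linkability part)."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in x_attrs)].append(r)
    for members in groups.values():
        if len(members) < k:
            return False
        for attr, value in y_values:
            conf = sum(1 for r in members if r[attr] == value) / len(members)
            if conf > h:
                return False
    return True

# Hypothetical usage with Y1 = {HIV} on Test as the sensitive values:
# satisfies_xy_privacy(T, x_attrs=["Job", "Sex", "Age"],
#                      y_values=[("Test", "HIV")], k=2, h=0.10)
```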

Note that confidence bounding is also known as ℓ⁺-diversity in [157]. For example, with QID = {Job, Sex, Age}, the privacy template ⟨QID → HIV, 10%⟩ states that the confidence of inferring HIV from any group on QID is no more than 10%. In the example table, this privacy template is violated because the confidence of inferring HIV is 75% in the group {Artist, Female, [30-35)}. The confidence measure has two advantages over recursive (c, ℓ)-diversity and entropy ℓ-diversity. First, the confidence measure is more intuitive because the risk is measured by the probability of inferring a sensitive value.
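
The 75% figure can be reproduced with a small hypothetical group. The sketch below (not from the book; the record values are invented so that three of the four people in the group {Artist, Female, [30-35)} are HIV-positive) computes the confidence per QID group and reports the groups that violate a template ⟨QID → HIV, 10%⟩.

```python
from collections import defaultdict

def template_violations(records, qid, sensitive_attr, sensitive_value, h):
    """Return the QID groups whose confidence of implying `sensitive_value`
    exceeds the threshold h, i.e. the groups that violate the privacy
    template <QID -> sensitive_value, h>."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in qid)].append(r)
    violations = {}
    for key, members in groups.items():
        conf = sum(1 for r in members if r[sensitive_attr] == sensitive_value) / len(members)
        if conf > h:
            violations[key] = conf
    return violations

# Hypothetical records: 3 of the 4 people in this group are HIV-positive.
T = [
    {"Job": "Artist", "Sex": "Female", "Age": "[30-35)", "Test": "HIV"},
    {"Job": "Artist", "Sex": "Female", "Age": "[30-35)", "Test": "HIV"},
    {"Job": "Artist", "Sex": "Female", "Age": "[30-35)", "Test": "HIV"},
    {"Job": "Artist", "Sex": "Female", "Age": "[30-35)", "Test": "None"},
]

print(template_violations(T, ["Job", "Sex", "Age"], "Test", "HIV", h=0.10))
# {('Artist', 'Female', '[30-35)'): 0.75}  -> the template <QID -> HIV, 10%> is violated
```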
