Restricted isometry property: Difference between revisions

From formulasearchengine
Jump to navigation Jump to search
en>Qetuth
m more specific stub type + intro link
 
No edit summary
Line 1: Line 1:
Аρrès ce commencement νachement fantastique, lɑ ѵoici qui dérοbе le pieu ɗe сet obsédé sexսel pour pomper telle une obsédée, une folle explique à un guss qu'elle apprécie le déchirage brutal. Matez le film sexy et ça c'est seulement pour vous espèce Ԁe coquins, cette femme légère reste en manque comme une  сhaudassе! Tоut compte fait еlle est parfaite pour niquer. C'est en réalіté une authentique orgіe de verges pοur une cгoqueuse. Un tringleur gère pas mal l'exρloration bien hardϲore, cette allumeuse aimeгait te montrer qu'elle kiff vidanger deѕ zоbis. Regardeƶ diгectement la scène  ardente gratos et faites vous surtout plaisir en ѵous masturbant l'andouillette. Après avoir léché sans compter, lɑ revoici qui se fait énergiquement truffer sa chatte. Tu ne seras pas caƿable de repouѕser le désir de te branler. Juste apгès avoir léché plus que dе гaison, la voilà qui a le privilègе Ԁe ѕe faіre labourer sa chatte façon hardcore.   Triquante aνec ses énoгmes flotteurs, cеtte polissonne déрaѕse les lіmites en acceptant un tronchage de moule. Voici une vidéo  fougueuse  à sauvеgarder sous le matelas.<br><br>When you loved this informatiѵe article and ƴou wish to receive more details with regaгds to [http://www.belle-sodomie.com/ dame sexy] gеneroսsly vіsit our webpage.
'''Differential privacy''' aims to provide means to maximize the accuracy of queries from statistical databases while minimizing the chances of identifying its records.
 
== Situation ==
 
Consider a trusted party that holds a dataset of sensitive information (''e.g.'' medical records, voter registration information, email usage) with the goal of providing global, statistical information about the data publicly available, while preserving the privacy of the users whose information the data set contains. Such a system is called a [[statistical database]].
The notion of ''[[Indistinguishability (cryptography)|indistinguishability]]'',<ref name="Dwork, McSherry 2006"/> later termed ''Differential Privacy'',<ref name="Dwork, ICALP 2006">Dwork, ''ICALP'' 2006.</ref> formalizes the notion of "privacy" in statistical databases.
 
== &epsilon;-differential privacy ==
 
The actions of the trusted server are modeled via a randomized
algorithm <math>\mathcal{A}\,\!</math>.
A randomized algorithm <math>\mathcal{A}\,\!</math> is
<math>\epsilon\,\!</math>-differentially private if for all datasets
<math>D_{1}\,\!</math> and
<math>D_{2}\,\!</math> that differ on a single element (i.e., data of one person), and all
<math>S\subseteq \mathrm{Range}(\mathcal{A})\,\!</math>,
 
:<math>\Pr[\mathcal{A}(D_{1})\in S]\leq
e^{\epsilon}\times\Pr[\mathcal{A}(D_{2})\in S]\,\!</math>
 
where the probability is taken over the coins of the algorithm and <math>\mathrm{Range}(\mathcal{A})\,\!</math> denotes the output range of the algorithm <math>\mathcal{A}\,\!</math>.
 
'''N.B.:''' Differential Privacy is a condition on the release mechanism and not on the dataset.
 
This means that for any two datasets which are close to one another (that is, which differ on a single element) a given differentially private algorithm <math>\mathcal{A}\,\!</math> will behave approximately the same on both data sets. The definition gives a strong guarantee that presence or absence of an individual will not affect the final output of the query significantly.
 
For example, assume we have a database of medical records <math>D_{1}\,\!</math> where each record is a pair ('''Name''','''X'''), where <math>X\in\{0,1\}\,\!</math> denotes whether a person has diabetes or not. For example:
 
{| border="1" align="center"
|-
!  Name !! Has Diabetes (X)
|-
|  Ross
||  1
|-
|  Monica
||  1
|-
| Joey
||  0
|-
|  Phoebe
||  0
|-
|  Chandler
||  1
|}
 
Now suppose a malicious user (often termed an ''adversary'') wants to find whether Chandler has diabetes or not. As a side information he knows in which row of the database Chandler resides. Now suppose the adversary is only allowed to use a particular form of query <math>Q(i)\,\!</math> which returns the partial sum of first <math>i\,\!</math> rows of column <math>X\,\!</math> in the database. In order to find Chandler's diabetes status the adversary simply executes <math>Q(5)-Q(4)\,\!</math>. One striking feature this example highlights is: individual information can be compromised even without explicitly querying for the specific individual information.
 
Let us take this example a little further. Now we construct
<math>D_{2}\,\!</math> by replacing (Chandler,1) with (Chandler,0). Let us
call the release mechanism (which releases the output of
<math>Q(i)\,\!</math>) as <math>\mathcal{A}\,\!</math>. We say
<math>\mathcal{A}\,\!</math> is <math>\epsilon\,\!</math>-differentially
private if it satisfies [[#&epsilon;-differential privacy|the definition]], where <math>S\,\!</math> can be thought of as a singleton
set (something like <math>\{3.5\},\{4\}\,\!</math> etc.) if the output
function of  <math>\mathcal{A}\,\!</math> is a Discrete Random Variable
(i.e. has a '''probability mass function (pmf)'''); else if it is a
Continuous Random Variable (i.e. has a '''probability density
function (pdf)'''), then <math>S\,\!</math> can be thought to be a small
range of reals (something like
<math>3.5\leq\mathcal{A}(D_{1})\leq3.7\,\!</math>).
 
In essence if such an <math>\mathcal{A}\,\!</math> exist then a particular
individual's presence or absence in the database will not alter the
distribution of the output of the query by a significant amount and
thus assures privacy of individual information in an information
theoretic sense.
 
== Motivation ==
In the past, various ad-hoc approaches to anonymizing public records have failed when researchers managed to identify personal information by linking two or more separately innocuous databases. Two well-known instances of successful "Linkage Attacks" have been the [[Netflix Prize|Netflix Database]] and the '''Massachusetts Group Insurance Commission (GIC) medical encounter
database'''.
 
=== Netflix Prize ===
[[Netflix]] has offered $1,000,000 prize for a 10% improvement in its recommendation system. Netflix has also released a training dataset for the competing developers to train their systems. While releasing this dataset they had provided a disclaimer: ''To protect customer privacy, all personal information identifying individual customers has been removed and all customer ids have been replaced by randomly assigned ids.''
Netflix is not the only available movie rating portal on the web; there are many others, including [[IMDB]]. On IMDB individuals can register and rate movies and they have the option of not keeping their details anonymous. [http://www.cs.utexas.edu/~arvindn/ Narayanan] and [http://www.cs.utexas.edu/~shmat/ Shmatikov] linked the Netflix anonymized training database with the [[IMDB]] database (using the date of rating by a user) to partially de-anonymize the Netflix training database.<ref>Arvind Narayanan, Vitaly Shmatikov. Robust De-anonymization of Large Sparse Datasets. In ''IEEE Symposium on Security and Privacy'' 2008, p. 111–125.</ref> Thus, it is clear that the individual information of a user was compromised.
 
=== Massachusetts Group Insurance Commission (GIC) medical encounter database ===
In this case<ref name="Dwork, ICALP 2006"/> [[Latanya Sweeney]] from Carnegie Mellon University linked the anonymized GIC database (which retained the birthdate, sex, and ZIP code of each patient) and voter registration records to identify the medical record of the governor of Massachusetts.
 
=== Sensitivity ===
 
Getting back on the main stream discussion on Differential Privacy,
the sensitivity <ref name="Dwork, McSherry 2006">Dwork, McSherry, Nissim and Smith, 2006.</ref> (<math>\Delta f\,\!</math> ) of a function <math>f:\mathcal{D}\rightarrow\mathbb{R}^{d}\,\!</math> is
 
:<math>\Delta f=\max_{D_1,D_2} \lVert f(D_1)-f(D_2) \rVert_{1}\,\!</math>
 
for all <math>D_1\,\!</math>,<math>D_2\,\!</math> differing in at most one element, and <math>D_{1},D_{2}\in\mathcal{D}\,\!</math>.
 
To get more intuition into this let us return to the example of the medical database and a query <math>Q(i)\,\!</math> (which can also be seen as the function <math> f \,\!</math> ) to find how many people in the first <math>i\,\!</math> rows of the database have diabetes. Clearly, if we change one of the entries in the database then the output of the query <math>Q(i)\,\!</math> will change by at most one. So, the sensitivity of this query is one. It so happens that there are techniques (which are described below) using which we can create a differentially private algorithm for functions with low sensitivity.
 
==Trade-off between utility and privacy==
 
A trade-off between the accuracy of the statistics estimated in a privacy-preserving manner, and the privacy parameter &epsilon;. This trade-off is studied in <ref name="Ghosh, Roughgarden, Sundararajan 2009"/> and
.<ref name="Brenner, Nissim 2010"/>
 
== Laplace noise ==
 
Many differentially private algorithms rely on adding controlled noise<ref name="Dwork, McSherry 2006"/> to functions with low sensitivity. We will elaborate this point by taking a special kind of noise (whose kernel is a [[Laplace distribution]] i.e. the probability density function <math>\text{noise}(y)\propto \exp(-|y|/\lambda)\,\!</math>, mean zero and standard deviation <math>\lambda\,\!</math>). Now in our case we define the output function of <math>\mathcal{A}\,\!</math> as a real valued function (called as the transcript output by <math>\mathcal{A}\,\!</math>) <math>\mathcal{T}_{\mathcal{A}}(x)=f(x)+Y\,\!</math>, where <math>Y \sim \text{Lap}(\lambda)\,\!\,\!</math> and <math>f\,\!</math> is the original real valued query/function we plan to execute on the database. Now clearly <math>\mathcal{T}_{\mathcal{A}}(x)\,\!</math> can be considered to be a continuous random variable, where
 
:<math>\frac{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_1}(x)=t)}{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_2}(x)=t)}=\frac{\text{noise}(t-f(D_1))}{\text{noise}(t-f(D_2))}\,\!</math>
 
which is at most <math>e^{\frac{|f(D_{1})-f(D_{2})|}{\lambda}}\leq e^{\frac{\Delta(f)}{\lambda}}\,\!</math>. We can consider <math>\frac{\Delta(f)}{\lambda}\,\!</math> to be the privacy factor <math>\epsilon\,\!</math>. Thus <math>\mathcal{T}\,\!</math> follows a differentially private mechanism (as can be seen from [[#&epsilon;-differential privacy|the definition]]). If we try to use this concept in our diabetes example then it follows from the above derived fact that in order to have <math>\mathcal{A}\,\!</math> as the <math>\epsilon\,\!</math>-differential private algorithm we need to have <math>\lambda=1/\epsilon\,\!</math>.
Though we have used Laplacian noise here but we can use other forms of noises which also allows to create a differentially private mechanism, such as the Gaussian Noise (where of course a slight relaxation of the definition of differential privacy <ref name="Dwork, ICALP 2006"/> is needed).
 
== Composability ==
 
=== Sequential composition <ref name="McSherry, SIGMOD '09">McSherry, SIGMOD 2009 (Theorem 3 and 4).</ref>  ===
 
If we query an ε-differential privacy mechanism <math>t</math> times, and the randomization of the mechanism is independent for each query, then the result would be <math>\epsilon t</math>-differentially private. In the more general case, if there are <math>n</math> independent mechanisms: <math>\mathcal{M}_1,\dots,\mathcal{M}_n</math>, whose privacy guarantees are <math>\epsilon_1,\dots,\epsilon_n</math> differential privacy, respectively, then any function <math>g</math> of them: <math>g(\mathcal{M}_1,\dots,\mathcal{M}_n)</math> is <math>(\sum\limits_{i=1}^{n} \epsilon_i)</math>-differentially private.
 
=== Parallel composition <ref name="McSherry, SIGMOD '09"/> ===
 
Furthermore, if the previous mechanisms are computed on ''disjoint'' subsets of the private database then the function <math>g</math> would be <math>(\max_i \epsilon_i)</math>-differentially private instead.
 
== Group privacy ==
 
In general, ε-differential privacy is designed to protect the privacy between neighboring databases which differ only in one row. This means that no adversary with arbitrary auxiliary information can know if '''one''' particular participant submitted his information. However this is also extendable if we want to protect databases differing in <math>c</math> rows, which amounts to adversary with arbitrary auxiliary information can know if '''<math>c</math>''' particular participants submitted their information. This can be achieved because if <math>c</math> items change, the probability dilation is bounded by <math>\exp ( \epsilon c )</math> instead of <math>\exp ( \epsilon )</math>,<ref name="Dwork, ICALP 2006"/> i.e. for D<sub>1</sub> and D<sub>2</sub> differing on <math>c</math> items:
 
:<math>\Pr[\mathcal{A}(D_{1})\in S]\leq
\exp(\epsilon c)\times\Pr[\mathcal{A}(D_{2})\in S]\,\!</math>
 
Thus setting ε instead to <math>\epsilon/c</math> achieves the desired result (protection of <math>c</math> items). In other words, instead of having each item ε-differentially private protected, now every group of <math>c</math> items is ε-differentially private protected (and each item is <math>(\epsilon/c)</math>-differentially private protected).
 
=== Proof idea ===
 
For three datasets D1, D2, and D3, such that D1 and D2 differ on one item, and D2 and D3 differ on one item (implicitly D1 and D3 differ on at most 2 items), the following holds for an ε-differentially private mechanism <math>\mathcal{A}</math>:
 
<math>\Pr[\mathcal{A}(D_{1})\in S] \leq \exp(\epsilon)\times\Pr[\mathcal{A}(D_{2})\in S]\,\!</math>,
and
<math>\Pr[\mathcal{A}(D_{2})\in S] \leq \exp(\epsilon)\times\Pr[\mathcal{A}(D_{3})\in S]\,\!</math>
 
hence:
 
<math>\Pr[\mathcal{A}(D_{1})\in S] \leq \exp(\epsilon)\times ( \exp(\epsilon)\times\Pr[\mathcal{A}(D_{3})\in S]) = \exp(2 \epsilon)\times\Pr[\mathcal{A}(D_{3})\in S] \,\!</math>
 
The proof can be extended to <math>c</math> instead of 2.
 
=== Stable transformations ===
A transformation <math>T</math> is <math>c</math>-stable if the hamming distance between <math>T(A)</math> and <math>T(B)</math> is at most <math>c</math>-times the hamming distance between <math>A</math> and <math>B</math> for any two databases <math>A,B</math>. Theorem 2 in <ref name="McSherry, SIGMOD '09"/> asserts that if there is a mechanism <math>M</math> that is <math>\epsilon</math>-differentially private, then the composite mechanism <math>M\circ T</math> is <math>(\epsilon \times c)</math>-differentially private.
 
This could be generalized to group privacy, as the group size could be thought of as the hamming distance <math>h</math> between
<math>A</math> and <math>B</math> (where <math>A</math> contains the group and <math>B</math> doesn't). In this case <math>M\circ T</math> is <math>(\epsilon \times c \times h)</math>-differentially private.
 
==See also==
*[[Quasi-identifier]]
*[[Exponential mechanism (differential privacy)]] – a technique for designing differentially private algorithms
 
==Notes==
{{Reflist|refs=
<ref name="Ghosh, Roughgarden, Sundararajan 2009">
A. Ghosh, T. Roughgarden, and M. Sundararajan. Universally utility-maximizing privacy mechanisms. In Proceedings of the 41st annual ACM Symposium on Theory of Computing, pages 351–360. ACM New York, NY, USA, 2009.</ref>
<ref name="Brenner, Nissim 2010">
H. Brenner and K. Nissim. Impossibility of Differentially Private Universally Optimal Mechanisms. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2010.
</ref>
}}
 
==References==
* [http://research.microsoft.com/en-us/projects/DatabasePrivacy/sensitivity.pdf Calibrating Noise to Sensitivity in Private Data Analysis] by Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith In Theory of Cryptography Conference (TCC), Springer, 2006.
* Differential Privacy by Cynthia Dwork, International Colloquium on Automata, Languages and Programming (ICALP) 2006, p.&nbsp;1–12.
* Frank D. McSherry. 2009. [http://research.microsoft.com/pubs/80218/sigmod115-mcsherry.pdf Privacy integrated queries: an extensible platform for privacy-preserving data analysis]. In Proceedings of the 35th SIGMOD international conference on Management of data (SIGMOD '09), Carsten Binnig and Benoit Dageville (Eds.). ACM, New York, NY, USA, 19-30. DOI= [http://doi.acm.org/10.1145/1559845.1559850 10.1145/1559845.1559850]
 
==External links==
* [http://research.microsoft.com/apps/pubs/default.aspx?id=74339 Differential Privacy: A Survey of Results] by Cynthia Dwork, Microsoft Research April 2008
* [http://video.ias.edu/csdm/dynamicdata Privacy of Dynamic Data: Continual Observation and Pan Privacy] by Moni Naor,  Institute for Advanced Study November 2009
* [http://www.cerias.purdue.edu/news_and_events/events/security_seminar/details/index/j9cvs3as2h1qds1jrdqfdc3hu8 A Practical Beginner's Guide To Differential Privacy] by Christine Task, Purdue University April 2012
* [http://blog.myplaceinthecrowd.org/2011/04/27/the-cdp-private-map-maker-v0-2/ Private Map Maker v0.2] on the Common Data Project blog
 
[[Category:Theory of cryptography]]
[[Category:Data privacy]]

Revision as of 03:20, 25 January 2014

Differential privacy aims to provide means to maximize the accuracy of queries from statistical databases while minimizing the chances of identifying its records.

Situation

Consider a trusted party that holds a dataset of sensitive information (e.g. medical records, voter registration information, email usage) with the goal of providing global, statistical information about the data publicly available, while preserving the privacy of the users whose information the data set contains. Such a system is called a statistical database. The notion of indistinguishability,[1] later termed Differential Privacy,[2] formalizes the notion of "privacy" in statistical databases.

ε-differential privacy

The actions of the trusted server are modeled via a randomized algorithm 𝒜. A randomized algorithm 𝒜 is ϵ-differentially private if for all datasets D1 and D2 that differ on a single element (i.e., data of one person), and all SRange(𝒜),

Pr[𝒜(D1)S]eϵ×Pr[𝒜(D2)S]

where the probability is taken over the coins of the algorithm and Range(𝒜) denotes the output range of the algorithm 𝒜.

N.B.: Differential Privacy is a condition on the release mechanism and not on the dataset.

This means that for any two datasets which are close to one another (that is, which differ on a single element) a given differentially private algorithm 𝒜 will behave approximately the same on both data sets. The definition gives a strong guarantee that presence or absence of an individual will not affect the final output of the query significantly.

For example, assume we have a database of medical records D1 where each record is a pair (Name,X), where X{0,1} denotes whether a person has diabetes or not. For example:

Name Has Diabetes (X)
Ross 1
Monica 1
Joey 0
Phoebe 0
Chandler 1

Now suppose a malicious user (often termed an adversary) wants to find whether Chandler has diabetes or not. As a side information he knows in which row of the database Chandler resides. Now suppose the adversary is only allowed to use a particular form of query Q(i) which returns the partial sum of first i rows of column X in the database. In order to find Chandler's diabetes status the adversary simply executes Q(5)Q(4). One striking feature this example highlights is: individual information can be compromised even without explicitly querying for the specific individual information.

Let us take this example a little further. Now we construct D2 by replacing (Chandler,1) with (Chandler,0). Let us call the release mechanism (which releases the output of Q(i)) as 𝒜. We say 𝒜 is ϵ-differentially private if it satisfies the definition, where S can be thought of as a singleton set (something like {3.5},{4} etc.) if the output function of 𝒜 is a Discrete Random Variable (i.e. has a probability mass function (pmf)); else if it is a Continuous Random Variable (i.e. has a probability density function (pdf)), then S can be thought to be a small range of reals (something like 3.5𝒜(D1)3.7).

In essence if such an 𝒜 exist then a particular individual's presence or absence in the database will not alter the distribution of the output of the query by a significant amount and thus assures privacy of individual information in an information theoretic sense.

Motivation

In the past, various ad-hoc approaches to anonymizing public records have failed when researchers managed to identify personal information by linking two or more separately innocuous databases. Two well-known instances of successful "Linkage Attacks" have been the Netflix Database and the Massachusetts Group Insurance Commission (GIC) medical encounter database.

Netflix Prize

Netflix has offered $1,000,000 prize for a 10% improvement in its recommendation system. Netflix has also released a training dataset for the competing developers to train their systems. While releasing this dataset they had provided a disclaimer: To protect customer privacy, all personal information identifying individual customers has been removed and all customer ids have been replaced by randomly assigned ids. Netflix is not the only available movie rating portal on the web; there are many others, including IMDB. On IMDB individuals can register and rate movies and they have the option of not keeping their details anonymous. Narayanan and Shmatikov linked the Netflix anonymized training database with the IMDB database (using the date of rating by a user) to partially de-anonymize the Netflix training database.[3] Thus, it is clear that the individual information of a user was compromised.

Massachusetts Group Insurance Commission (GIC) medical encounter database

In this case[2] Latanya Sweeney from Carnegie Mellon University linked the anonymized GIC database (which retained the birthdate, sex, and ZIP code of each patient) and voter registration records to identify the medical record of the governor of Massachusetts.

Sensitivity

Getting back on the main stream discussion on Differential Privacy, the sensitivity [1] (Δf ) of a function f:𝒟d is

Δf=maxD1,D2f(D1)f(D2)1

for all D1,D2 differing in at most one element, and D1,D2𝒟.

To get more intuition into this let us return to the example of the medical database and a query Q(i) (which can also be seen as the function f ) to find how many people in the first i rows of the database have diabetes. Clearly, if we change one of the entries in the database then the output of the query Q(i) will change by at most one. So, the sensitivity of this query is one. It so happens that there are techniques (which are described below) using which we can create a differentially private algorithm for functions with low sensitivity.

Trade-off between utility and privacy

A trade-off between the accuracy of the statistics estimated in a privacy-preserving manner, and the privacy parameter ε. This trade-off is studied in [4] and .[5]

Laplace noise

Many differentially private algorithms rely on adding controlled noise[1] to functions with low sensitivity. We will elaborate this point by taking a special kind of noise (whose kernel is a Laplace distribution i.e. the probability density function noise(y)exp(|y|/λ), mean zero and standard deviation λ). Now in our case we define the output function of 𝒜 as a real valued function (called as the transcript output by 𝒜) 𝒯𝒜(x)=f(x)+Y, where YLap(λ) and f is the original real valued query/function we plan to execute on the database. Now clearly 𝒯𝒜(x) can be considered to be a continuous random variable, where

pdf(𝒯𝒜,D1(x)=t)pdf(𝒯𝒜,D2(x)=t)=noise(tf(D1))noise(tf(D2))

which is at most e|f(D1)f(D2)|λeΔ(f)λ. We can consider Δ(f)λ to be the privacy factor ϵ. Thus 𝒯 follows a differentially private mechanism (as can be seen from the definition). If we try to use this concept in our diabetes example then it follows from the above derived fact that in order to have 𝒜 as the ϵ-differential private algorithm we need to have λ=1/ϵ. Though we have used Laplacian noise here but we can use other forms of noises which also allows to create a differentially private mechanism, such as the Gaussian Noise (where of course a slight relaxation of the definition of differential privacy [2] is needed).

Composability

Sequential composition [6]

If we query an ε-differential privacy mechanism t times, and the randomization of the mechanism is independent for each query, then the result would be ϵt-differentially private. In the more general case, if there are n independent mechanisms: 1,,n, whose privacy guarantees are ϵ1,,ϵn differential privacy, respectively, then any function g of them: g(1,,n) is (i=1nϵi)-differentially private.

Parallel composition [6]

Furthermore, if the previous mechanisms are computed on disjoint subsets of the private database then the function g would be (maxiϵi)-differentially private instead.

Group privacy

In general, ε-differential privacy is designed to protect the privacy between neighboring databases which differ only in one row. This means that no adversary with arbitrary auxiliary information can know if one particular participant submitted his information. However this is also extendable if we want to protect databases differing in c rows, which amounts to adversary with arbitrary auxiliary information can know if c particular participants submitted their information. This can be achieved because if c items change, the probability dilation is bounded by exp(ϵc) instead of exp(ϵ),[2] i.e. for D1 and D2 differing on c items:

Pr[𝒜(D1)S]exp(ϵc)×Pr[𝒜(D2)S]

Thus setting ε instead to ϵ/c achieves the desired result (protection of c items). In other words, instead of having each item ε-differentially private protected, now every group of c items is ε-differentially private protected (and each item is (ϵ/c)-differentially private protected).

Proof idea

For three datasets D1, D2, and D3, such that D1 and D2 differ on one item, and D2 and D3 differ on one item (implicitly D1 and D3 differ on at most 2 items), the following holds for an ε-differentially private mechanism 𝒜:

Pr[𝒜(D1)S]exp(ϵ)×Pr[𝒜(D2)S], and Pr[𝒜(D2)S]exp(ϵ)×Pr[𝒜(D3)S]

hence:

Pr[𝒜(D1)S]exp(ϵ)×(exp(ϵ)×Pr[𝒜(D3)S])=exp(2ϵ)×Pr[𝒜(D3)S]

The proof can be extended to c instead of 2.

Stable transformations

A transformation T is c-stable if the hamming distance between T(A) and T(B) is at most c-times the hamming distance between A and B for any two databases A,B. Theorem 2 in [6] asserts that if there is a mechanism M that is ϵ-differentially private, then the composite mechanism MT is (ϵ×c)-differentially private.

This could be generalized to group privacy, as the group size could be thought of as the hamming distance h between A and B (where A contains the group and B doesn't). In this case MT is (ϵ×c×h)-differentially private.

See also

Notes

43 year old Petroleum Engineer Harry from Deep River, usually spends time with hobbies and interests like renting movies, property developers in singapore new condominium and vehicle racing. Constantly enjoys going to destinations like Camino Real de Tierra Adentro.

References

External links

  1. 1.0 1.1 1.2 Dwork, McSherry, Nissim and Smith, 2006.
  2. 2.0 2.1 2.2 2.3 Dwork, ICALP 2006.
  3. Arvind Narayanan, Vitaly Shmatikov. Robust De-anonymization of Large Sparse Datasets. In IEEE Symposium on Security and Privacy 2008, p. 111–125.
  4. Cite error: Invalid <ref> tag; no text was provided for refs named Ghosh, Roughgarden, Sundararajan 2009
  5. Cite error: Invalid <ref> tag; no text was provided for refs named Brenner, Nissim 2010
  6. 6.0 6.1 6.2 McSherry, SIGMOD 2009 (Theorem 3 and 4).