Re: [Geopriv] location obscuring

From: Jorge Cuellar ^lt;jrcuellarj@googlemail.com>
Date: Thu Aug 26 2010 - 07:49:10 EDT

Fine! I still have two questions:

1. Is it safe to assume that the algorithm that generates
the reported locations has a memory? It seems that the
algorithm checks somehow, (it doesn't matter how) if we are
still fine with the last reported location we still report
it, and if not, we report a new location. It would be good
if we could rely on this: you will be giving much more
information away if you did not remember what your last
reported location was.

2. Good random number generation is not easy. Shall we say
something about it? (Most hashes are fine: but hashes of
what?)

-----------------------------------------------------------
I would prefer an algorithm that tells that I am in Berlin
(say a circle with center in the center of Berlin) than one
that gives each time a different circle with a center
randomly distributed around my real location. The averages
(plus some clustering algorithms) of those values would
provide a high precision on the places I visit in Berlin.

> The problem is that the (real) location can vary
> slightly, or you might move a little. If it does vary
> slightly, even if this is well below the obfuscation
> distance, then the random number is completely altered.
> I haven't found a good way around this particular
> problem. Anything you do to stabilize the
> direction/length of the shift vector only leads to the
> ability to learn what the shift vector is.

Thus I need one algorithm that chooses out of my location a
"landmark" that is a good approximation of my location (and
of all locations in a neighborhood).

The idea is: if I am "clearly closer" to a landmark: choose it.
If not, choose the two closest landmarks. Then I am "clearly
between those two landmarks". I will choose anyone
either memoryless with prob 1/2 or with a bias toward my
previously reported location.

Consider the following algorithm sketch:

Divide the problem in two steps:

1. What are the locations I will be willing to have a
possible reported locations? The centers of those possible
reported locations are called "landmarks".

2. Given a real measurement, how to choose a landmark and
therefore a reported location?

1st step:

Given (input) a reported uncertainty that one wants to
meet, first associate to them a grid G which is a set of
points ("landmarks" or "possible reported centers") and a
set A (atlas) of regions (say, circles, the "possible
reported locations"). Notice that the associations:

  uncertainty --> G = grid = Set of "landmarks" or
                            "possible reported locations"

  uncertainty --> A = atlas = Set of "possible reported
                             locations"

are independent of any measured location. It is possible to
publish, for given value ranges of uncertainty, the list G
of landmarks (or, even better, the list of "blocks", see
below).

An atlas is a set of regions that cover all possible places
(locations with uncertainty 0). I would restrict the
possible set of atlases by saying that the have to have a
certain degree of coverage (or degree of overlap). Although
not necessarily every point has to be in several regions of
the atlas, the overlaps between neighboring regions has to
be relatively large. This will be discussed later in more
detail.

This can be done in many ways. For instance:

a. the "possible reported centers" could be just the points
in a grid in some geographic coordinate system (say,
latitude and longitude) with a given degree of precision
(significant digits). For instance: represent the latitude
and longitude of a point as a number between 0 and 1 in
base two (more convenient than base 10). Consider first all
points with 1 decimal, than all with two decimal places,
than all with three, etc. For each point considered (this
is one "possible reported center") create the corresponding
"possible reported locations" (which can be, say, a
circle). You continue adding points until you get the degree
of coverage wanted.

b. You can choose the grid points to be the centers of
"large" cities, towns, or other important landmarks in
maps. Again, we call those points in the maps "landmarks".
Assume you have an ordering of landmarks by importance
(say: size of the city in habitants). Given a certain
targeted uncertainty, create G and A as follows: add a
landmark as a "possible reported center" and the
corresponding "possible reported locations" as long as you
need it to cover the area in question with the expected
degree of coverage.

For many applications I myself would choose in Germany the
reported uncertainty to be, say, 100 or 300 km and the
"possible reported centers" to be the center of some "large
cities".

2nd step:

Now: given a measured location, a grid G and an atlas A,
the question is how to choose the region R in the atlas
that will be used as reported location.

You can do it like this:
if the measured location is in only one region R, report
this as the location.

Else, determine c1 and c2 the two closest landmarks, i.e,
"possible reported centers": at measured distances d1 and
d2. If d1 is "clearly smaller" than d2, choose c1, and
report the region centered at c1. "Clearly smaller" means
that the quotient d1/d2 is less than 0.8 (or other some
positive number, smaller than 1, to be chosen carefully a
priori). Similarly for d2/d1<1.3. If d1 and d2 are "about
the same", choose c1 with probability 1/2, c2 with
probability 1/2. In this case we say that the point is
"clearly between the two landmarks c1 and c2.

The idea is that each point is either "close to one
landmark" or "between two landmarks". This partitions the
whole space into connected (even: convex) "blocks" (let us
call them so). Thus all points in one block are all clearly
closer to the same landmark or clearly between two given
landmarks.

What do you think?
-- Jorge

On Tue, Aug 24, 2010 at 7:38 AM, Thomson, Martin
<Martin.Thomson@andrew.com> wrote:
> Hi Jorge,
>
> My email was a little loose in its use of language, so I can understand how you might have trouble understanding this.
>
> More inline...
>
>> -----Original Message-----
>> From: Jorge Cuellar [mailto:jrcuellarj@googlemail.com]
>> Sent: Tuesday, 24 August 2010 1:14 AM
>> To: Thomson, Martin; rjsparks@nostrum.com; Hannes.Tschofenig@gmx.net;
>> geopriv@ietf.org
>> Subject: Re: [Geopriv] location obscuring
>>
>> Hi Martin, Hannes, Robert,
>>
>> I would like to understand you discussions on "location
>> obscuring".  First, let me ask you a couple of things :
>>
>> 1. What is your vocabulary / notation?? Assume there is no
>> uncertainity, first. You have one "real location" (?) and two
>> circles, a "trigger circle" and a "reported circle" (obfuscated
>> circle? obfuscated location?). The centres of those two
>> circles ("trigger centre / reported centre" ??) are, I guess,
>> different (if not, I see some problems).
>
> There are three locations:
>
>  - the original location ("real" might be a little too strong here)
>  - the reported location
>  - the trigger location
>
> The first two probably have uncertainty associated with them (imagine a circle).  The trigger location doesn't have uncertainty in the same way, but can also be described with a corresponding circle enclosing it.  Trigger and reported are different (though there is a slim chance that they are the same...).
>
>> The vectors:
>>
>>   trigger centre  - real location   and
>>   reported centre - real location
>>
>> let us call them "shift in reporting" / "shift for trigger" have
>> lenght (or absolute value or norm) "distance to trigger centre"
>> / "distance to reported centre". The direction (angle to
>> some "x-axis" of those vectors are the "Direction of the shift
>> in reporting" and "Direction of the shift for trigger".
>
> That's a good description.
>
>>
>> I read the formula:
>>
>>   Direction = 2 * pi * random
>>
>> as an angle (random is a random number on [0,1]). And
>>
>>   direction * distance
>>
>> as a vector (in radians, of lenght distance at an angle direction).
>
> Correct; I was using shorthand - to the detriment of clarity :(
>
>> 2. Have you thought about the following problem? Suppose each day
>> I commute from home to work. They are sufficiently apart so that
>> the 2 "reported circles" have to be disjoint. Suppose I turn off
>> my device during the trip, but torn it on as soon as I changed my
>> location. All algorithms you have been discussing, as far as I
>> see, provide every evening, when I come back home, a
>> new "reported circle" centred around home plus a vector of fixed
>> lenght with random direection. True?? Thus if anyone sees my
>> provided locations all evenings he will know rather fast where
>> exacly I live (by averaging the random shifts).
>
> That's one that didn't factor into this, though perhaps it should have.  Though this relies on the assumption that each night you are going to the same place, that's a pretty easy conclusion to come to.
>
> One of the original ideas was to seed a pseudorandom number with the (real) location.  A hash function would be a good substitute.  This would produce the same shift vector for the same location.
>
> The problem is that the (real) location can vary slightly, or you might move a little.   If it does vary slightly, even if this is well below the obfuscation distance, then the random number is completely altered.  I haven't found a good way around this particular problem.  Anything you do to stabilize the direction/length of the shift vector only leads to the ability to learn what the shift vector is.
>
>> 3. In the formula:
>>
>>   Distance = sqrt(random(0, max(0, distance - existing uncertainty)))
>>
>> you have twice distance (one with capitals): is that intended?
>> Later you use only the non-capitalized one (?)
>
> That's a mistake, the second is the obfuscation distance, the first is the resulting shift.
>
>> 4. I did not understand this:
>> This is the same as below, except: (exept what??? and notation below?)
>>
>>     reallyFuzz: function(cu) {
>>         var wobble = Math.max(0, this.distance - cu.uncertainty);
>>         var centre = this.randomize(cu.centroid, wobble);
>>         var unc = Math.max(cu.uncertainty, this.distance);
>>         this.fuzzed = new GeoShape.Circle(centre, unc);
>>         this.triggerPoint = this.randomize(centre, this.distance/2);
>>     }
>
> This is a replacement for the original Javascript code I used to describe the algorithm.  The complete (original) is here:
>
>  <http://www.ietf.org/mail-archive/web/geopriv/current/msg08630.html>
_______________________________________________
Geopriv mailing list
Geopriv@ietf.org
https://www.ietf.org/mailman/listinfo/geopriv
Received on Thu, 26 Aug 2010 13:49:10 +0200

This archive was generated by hypermail 2.1.8 : Thu Aug 26 2010 - 07:49:28 EDT