Available Diagnosis Keys will soon be linkable across 12 days
Describe the bug
@pithumke I agree with the comments by @markuspi in https://github.com/corona-warn-app/cwa-server/issues/108#issuecomment-648497115:
In the near future, the following scenario will become extremely likely: within the time frame that triggers a new package of 140 keys from version/v1/diagnosis-keys/country/DE/date/{date}/hour/{hour},
- user A uploads 13 Diagnosis Keys (I believe this is the maximum the Google/Apple API will release; please correct me if I'm wrong), and
- user B uploads 1 Diagnosis Key (because they are a new user).
If someone has collected Rolling Proximity Identifiers (RPIs) generated by user A's device on up to 12 days, it will be trivial to prove without doubt that up to 12 Diagnosis Keys belong to the same device, and therefore that those RPIs were advertised by the same device.
Expected behaviour
cwa-server should prevent multiple Diagnosis Keys from being linked without doubt to the same device.
Steps to reproduce the issue
POC: https://github.com/mh-/diagnosis-keys/blob/master/lib/count_users.py, used e.g. like this:
curl https://svc90.main.px.t-online.de/version/v1/diagnosis-keys/country/DE/date/2020-06-23/hour/17 --output 2020-06-23-hour-17.zip
./parse_keys.py -l -d 2020-06-23-hour-17.zip -u
This Python script walks through the list of TEKs (Temporary Exposure Keys) following the standard Transmission Risk Level profile [5, 6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1]; in the scenario above, the sequence [8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1] will be unique.
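To make the linking argument concrete, here is a minimal Python sketch. It is not the POC script's implementation, and it rests on stated assumptions: PROFILE[k] is the TRL of the key from k days before the upload, the upload day's own key is not released, all uploads in the package happen on the same day, and keys are modeled as hypothetical (day_index, trl) tuples. Because each device uploads a contiguous run of days, a day that holds exactly one key can only belong to the one device that reaches back that far.

```python
from collections import defaultdict

# Standard Transmission Risk Level profile quoted in the issue.
# ASSUMPTION: PROFILE[k] is the TRL of the key from k days before the upload;
# the upload day's own key (k = 0) is not released.
PROFILE = [5, 6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1]


def upload(submit_day, n_keys):
    """Simulate a device uploading keys for the n_keys days before submit_day.

    Keys are hypothetical (day_index, trl) tuples standing in for real TEKs.
    """
    return [(submit_day - k, PROFILE[k]) for k in range(1, n_keys + 1)]


def linked_trls(package, submit_day):
    """Return the TRLs of keys that provably come from one single device.

    Each device uploads a contiguous run of days ending the day before
    submit_day, so the number of keys per day can only shrink going back in
    time. Once a day holds exactly one key, every older day with one key must
    belong to that same device.
    """
    by_day = defaultdict(list)
    for day, trl in package:
        by_day[day].append(trl)

    chain = []
    for k in range(1, len(PROFILE)):
        trls = by_day.get(submit_day - k, [])
        if len(trls) > 1 and not chain:
            continue  # several devices still cover this day: ambiguous
        if len(trls) != 1 or trls[0] != PROFILE[k]:
            break  # run ended, or the key does not fit the profile
        chain.append(trls[0])
    return chain


# User A uploads 13 keys, new user B uploads 1 key, in the same package.
package = upload(100, 13) + upload(100, 1)
print(linked_trls(package, 100))  # [8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1]
```

Under these assumptions the day before the upload is ambiguous (both A and B cover it), while the 12 older days each hold a single key, which reproduces the 12-key sequence quoted above.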
Possible Fix
There are multiple options, e.g.
- once the maximum usage time of the published app approaches 14 days, increase the shifting-policy-threshold
- pad with a random number of random keys
- …
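The padding option can be sketched as follows. This is a hypothetical illustration, not cwa-server code: keys are modeled as (key_data, day_index, trl) tuples, and `decoy_upload` and `pad_package` are made-up names. It assumes that PROFILE[k] is the TRL of the key from k days before the upload. The key design point is that decoys must look like real uploads (same TRL profile, same end day); random keys with arbitrary TRLs could simply be filtered out.

```python
import os
import random

# Standard TRL profile quoted in the issue.
# ASSUMPTION: PROFILE[k] is the TRL of the key from k days before the upload.
PROFILE = [5, 6, 8, 8, 8, 5, 3, 1, 1, 1, 1, 1, 1, 1]


def decoy_upload(submit_day, rng):
    """Hypothetical decoy: random key data shaped like a real upload.

    Decoys follow the TRL profile and end on the same day as real uploads;
    otherwise they could be filtered out by their TRL pattern alone.
    """
    n_keys = rng.randint(1, len(PROFILE) - 1)
    return [(os.urandom(16), submit_day - k, PROFILE[k])
            for k in range(1, n_keys + 1)]


def pad_package(real_keys, submit_day, max_decoys=5, rng=None):
    """Add a random number of decoy uploads and shuffle the result.

    real_keys: list of (key_data, day_index, trl) tuples (hypothetical shape).
    """
    rng = rng if rng is not None else random.SystemRandom()
    padded = list(real_keys)
    for _ in range(rng.randint(1, max_decoys)):
        padded.extend(decoy_upload(submit_day, rng))
    rng.shuffle(padded)
    return padded
```

Passing `random.Random(seed)` as `rng` makes the decoy count reproducible for testing; the key data itself always comes from `os.urandom`. How many decoys are enough, and whether their lengths should mimic the real distribution of upload lengths, is a policy question this sketch leaves open.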
Additional context
- At the moment, this is mainly problematic because of the hour endpoint. The date endpoint automatically aggregates more uploads, since there are multiple key submissions each day. But when the COVID-19 crisis ramps down, the scenario above could also occur with the daily package.
- As I stated before, I am not asking you to mix the TEKs of 10 users. To me it makes no difference whether you mix the TEKs of 2, 5, 10, or 20 users: in practice they will come from single devices spread all over Germany, and linking the RPIs will be possible with high probability because of the geolocation where the RPIs are recorded. What you should prevent is linking them without doubt.
- From previous comments, I'm not sure whether there is a misunderstanding about how the mobile apps use the backend. The "Epidemiological Motivation of the Transmission Risk Level" document describes four possible ways in which the apps could generate the Transmission Risk Level, and I understand that you may want to prepare the backend for all of them. But in reality, only case 4 is currently used, which allows trivial backtracing through the list of distributed Diagnosis Keys.
Top GitHub Comments
I agree with @mh- about this issue in an academic sense, but I think in practice this may be a very weak issue or a non-issue, considering the following:
Given that this attack is, in my view, practically impossible to prevent regardless of the padding/shifting strategy, and that a one-day tracking time frame will always be possible anyway (no matter how sophisticated the solution for unlinking Diagnosis Keys is), I would suggest not introducing any solution that delays exposure notifications (such as shifting Diagnosis Keys to the next package). If what @mh- describes can be solved “for free”, that's fine, but if it would result in even one person being notified later, I don't think it would be a pragmatic approach.
I have now added a --multiplier option to the script, and because automatic detection isn't that difficult either (as long as the value isn't random per key), I also added the --auto-multiplier option. This will not work properly for the first download batch, which includes submissions with different multipliers, but for subsequent batches it will guess the correct multiplier with very high probability.
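The auto-detection idea can be illustrated under one loudly stated assumption that this thread does not confirm: the server pads each real key into `multiplier` published keys that all share the real key's day and TRL. Then every (day, TRL) count in a package is a multiple of the multiplier, and their greatest common divisor is a natural guess. This is a hypothetical sketch, not the script's code; `guess_multiplier` and `real_counts` are illustrative names.

```python
from collections import Counter
from functools import reduce
from math import gcd


def guess_multiplier(keys):
    """Guess the padding multiplier of one package.

    keys: iterable of (day_index, trl) pairs (hypothetical representation).
    ASSUMPTION: every real key is published `multiplier` times with the same
    day and TRL, so each (day, trl) count is a multiple of the multiplier.
    The GCD can therefore only over-estimate it, e.g. when all real counts
    happen to share a common factor.
    """
    return reduce(gcd, Counter(keys).values(), 0)


def real_counts(keys):
    """Estimated number of real keys per (day, trl) group."""
    counts = Counter(keys)
    m = reduce(gcd, counts.values(), 0)
    return {group: n // m for group, n in counts.items()}


padded = [(99, 6)] * 20 + [(98, 8)] * 10 + [(97, 8)] * 10
print(guess_multiplier(padded))  # 10
print(real_counts(padded))       # {(99, 6): 2, (98, 8): 1, (97, 8): 1}

# A batch mixing unpadded (multiplier 1) and padded submissions breaks the
# assumption: counts 11 and 10 have GCD 1.
print(guess_multiplier([(99, 6)] * 11 + [(98, 8)] * 10))  # 1
```

The last line mirrors the caveat about the first download batch: once submissions with different multipliers are mixed, the counts no longer share the true multiplier as a common factor.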