Shainer's Site
Random stuff I find interesting or I am working on. At some point I will move all the content from my old website, giudoku.sourceforge.net, here.
http://shainer.github.io/
Sun, 22 Oct 2017 15:20:17 +0000
Jekyll v3.6.0
The Chinese remainder theorem (with algorithm)
<p>Let me preface by saying that you could potentially write a dozen blog posts with all the
implications and mathematical connections that I saw involving the <a href="https://en.wikipedia.org/wiki/Chinese_remainder_theorem">Chinese remainder theorem</a>.
That being said, I am going to focus on a basic description and how to implement it.</p>
<p>Crypto enthusiasts will have understood that this post comes directly from set 8
of the crypto challenges. I believe all things considered I spent more time producing
a viable implementation of this theorem than on the rest of the challenge combined.</p>
<h2 id="the-theorem">The theorem</h2>
<p>Let me write the following set of k equations:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = a1 (mod n1)
...
x = ak (mod nk)
</code></pre></div></div>
<p>This is equivalent to saying that <code class="highlighter-rouge">x mod ni = ai</code> (for i=1…k). The notation above is
common in group theory, where you can define the group of integers modulo some number
n and then you state equivalences (or <em>congruences</em>) within that group.</p>
<p>So x is the unknown; instead of knowing x, we know the remainders of the division
of x by a set of numbers. If the numbers ni are pairwise coprime (i.e. each one
is coprime with all the others) then the equations have exactly one solution
modulo N, with N equal to the product of all the ni.</p>
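<p>To see the uniqueness claim in action, here is a small brute-force sketch (the function name <code class="highlighter-rouge">crt_brute_force</code> is mine, and it assumes Python 3.8+ for <code class="highlighter-rouge">math.prod</code>). It simply tries every candidate below N, so it is only meant for tiny moduli.</p>

```python
from math import prod


def crt_brute_force(remainders, moduli):
    """Return every x in [0, N) such that x % n_i == a_i for all i."""
    N = prod(moduli)
    return [x for x in range(N)
            if all(x % n == a for a, n in zip(remainders, moduli))]


# Pairwise coprime moduli: exactly one solution below N = 3 * 5 * 7 = 105.
print(crt_brute_force([2, 3, 2], [3, 5, 7]))  # [23]

# Moduli that are NOT coprime: the guarantee is lost, there may be
# no solution at all, or more than one below N.
print(crt_brute_force([1, 2], [4, 6]))  # []
print(crt_brute_force([1, 3], [4, 6]))  # [9, 21]
```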
<p>For some notes on the history and the reason it was named the Chinese theorem
refer to Wikipedia (or a dozen other math websites); it is quite interesting :)</p>
<h2 id="proof">Proof</h2>
<p>There are many ways to prove this theorem. Most of them are directly related to
the algorithms presented below to compute the solution. I picked
the proof that I found easiest to understand; it is also the basis
of the Gauss algorithm.</p>
<p>Let’s define a slightly simpler problem, where we have only two equations.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = a1 (mod n1)
x = a2 (mod n2)
</code></pre></div></div>
<p>As above, let’s define <code class="highlighter-rouge">N = n1 * n2</code>.</p>
<p>Let’s define <code class="highlighter-rouge">p = n1^-1 (mod n2)</code> and <code class="highlighter-rouge">q = n2^-1 (mod n1)</code>. This is the
operation called <strong>modular inverse</strong>, where we find the inverse of a number in
the group of integers modulo some n. If I say that p is the inverse of n1 modulo n2, this
means that <code class="highlighter-rouge">p * n1 = 1 (mod n2)</code>. Such an inverse only exists when n1 and
n2 are coprime, and here they are by assumption.</p>
<p>Now I claim that a solution y to the set of equations can be expressed as:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>y = a1 * q * n2 + a2 * p * n1 (mod N)
</code></pre></div></div>
<p>This is a valid solution because the second term is a multiple of n1 and vanishes
modulo n1, so <code class="highlighter-rouge">y = a1 * q * n2 = a1 (mod n1)</code>; in the same way
<code class="highlighter-rouge">y = a2 * p * n1 = a2 (mod n2)</code>. This follows from the definition of the
modular inverse telling me that <code class="highlighter-rouge">p * n1 = 1 (mod n2)</code> and
<code class="highlighter-rouge">q * n2 = 1 (mod n1)</code>.</p>
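<p>The two-equation construction above can be checked numerically. This is a minimal sketch, assuming Python 3.8+ so that the built-in <code class="highlighter-rouge">pow(x, -1, m)</code> can stand in for the modular inverse; <code class="highlighter-rouge">crt_two</code> is a name I made up for the example.</p>

```python
def crt_two(a1, n1, a2, n2):
    """Solve x = a1 (mod n1), x = a2 (mod n2) for coprime n1 and n2."""
    p = pow(n1, -1, n2)  # p * n1 = 1 (mod n2)
    q = pow(n2, -1, n1)  # q * n2 = 1 (mod n1)
    return (a1 * q * n2 + a2 * p * n1) % (n1 * n2)


y = crt_two(2, 3, 3, 5)
print(y, y % 3, y % 5)  # 8 2 3
```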
<p>This is easily extended to a generic number of equations, where the final
construction of y becomes:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>y = sum(ai * (N / ni) * invmod(N / ni, ni)) (mod N)
</code></pre></div></div>
<p>When building p and q before, we used only n1 or only n2; what that generalizes
to is the product of all moduli excluding the “current one”: the <code class="highlighter-rouge">N / ni</code>
above. The rest is unchanged.</p>
<p>So we have a solution: the next step is to prove it is the unique solution. Let’s
assume a second solution z exists for the same set of equations. Then <code class="highlighter-rouge">z = a1 (mod n1)</code>,
which implies that <code class="highlighter-rouge">z - y</code> is a multiple of n1, since z and y leave the same
remainder when divided by n1. By the same reasoning, <code class="highlighter-rouge">z - y</code> is also a multiple of n2.
But since n1 and n2 are coprime, <code class="highlighter-rouge">z - y</code> must also be a multiple of their product N, or as it
is often written:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>z = y (mod N)
</code></pre></div></div>
<p>z must be the same as y in the mod N group.</p>
<h2 id="algorithm-1-gauss-algorithm">Algorithm 1: Gauss algorithm</h2>
<p>This is quite easy: it is a direct translation to code of the construction
explained above. The n and a parameters are lists with all the related factors
in order, and N is the product of the moduli.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">ChineseRemainderGauss</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">N</span><span class="p">,</span> <span class="n">a</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">n</span><span class="p">)):</span>
        <span class="n">ai</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">ni</span> <span class="o">=</span> <span class="n">n</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">bi</span> <span class="o">=</span> <span class="n">N</span> <span class="o">//</span> <span class="n">ni</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">ai</span> <span class="o">*</span> <span class="n">bi</span> <span class="o">*</span> <span class="n">invmod</span><span class="p">(</span><span class="n">bi</span><span class="p">,</span> <span class="n">ni</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span> <span class="o">%</span> <span class="n">N</span>
</code></pre></div></div>
<p>The good thing about this algorithm is that the result is guaranteed to be
positive, given bi and ni both positive. This does not apply to the next
implementation.</p>
<p>For an implementation of <code class="highlighter-rouge">invmod</code> (finding the modular inverse), see next
section.</p>
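<p>Here is a self-contained variant of the Gauss construction that you can run right away. It is a sketch, not the exact function above: it computes N itself and assumes Python 3.8+, using the built-in <code class="highlighter-rouge">pow(bi, -1, ni)</code> in place of <code class="highlighter-rouge">invmod</code>.</p>

```python
from math import prod


def chinese_remainder_gauss(n, a):
    """Gauss construction; n holds the moduli, a the matching remainders."""
    N = prod(n)
    result = 0
    for ni, ai in zip(n, a):
        bi = N // ni  # product of all the moduli except ni
        result += ai * bi * pow(bi, -1, ni)
    return result % N


# x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7)
print(chinese_remainder_gauss([3, 5, 7], [2, 3, 2]))  # 23
```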
<h2 id="algorithm-2-euclid">Algorithm 2: Euclid</h2>
<p>This is the <em>direct construction</em> procedure described by <a href="https://en.wikipedia.org/wiki/Chinese_remainder_theorem#Existence_.28direct_construction.29">Wikipedia</a>.</p>
<p>The extended Euclidean algorithm is used to find, for each i, two coefficients ri and si such
that <code class="highlighter-rouge">ri * ni + si * (N / ni) = gcd(ni, N / ni) = 1</code>.</p>
<p>Then x is computed the following way:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = sum(ai * si * (N / ni)) for i=1...k
</code></pre></div></div>
<p>Translated into code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">ChineseRemainderEuclid</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">N</span><span class="p">,</span> <span class="n">a</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">n</span><span class="p">)):</span>
        <span class="n">ai</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">ni</span> <span class="o">=</span> <span class="n">n</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">_</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">si</span> <span class="o">=</span> <span class="n">ExtendedEuclid</span><span class="p">(</span><span class="n">ni</span><span class="p">,</span> <span class="n">N</span> <span class="o">//</span> <span class="n">ni</span><span class="p">)</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">ai</span> <span class="o">*</span> <span class="n">si</span> <span class="o">*</span> <span class="p">(</span><span class="n">N</span> <span class="o">//</span> <span class="n">ni</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">LeastPositiveEquivalent</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">N</span><span class="p">)</span>
</code></pre></div></div>
<p>As you can see, for my specific application I wanted only positive results; but
the si coefficients can be negative in a lot of cases, making the final sum
negative. What do I do in that case? Simply negating the result would change its
remainder, so the fix is to wrap around the modulus of the solution: add multiples
of N to the result until it lands between 0 and N - 1.</p>
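<p>The helper that does the wrap-around is not shown in this post, so the following is only a guess at what it could look like (the snake_case name is hypothetical). In Python the <code class="highlighter-rouge">%</code> operator returns a result with the sign of the divisor, so for a positive modulus one line is enough.</p>

```python
def least_positive_equivalent(value, modulus):
    """Return the representative of value in the range [0, modulus)."""
    # For a positive modulus, Python's % never returns a negative number,
    # so negative values wrap around automatically.
    return value % modulus


print(least_positive_equivalent(-5, 43))    # 38
print(least_positive_equivalent(-82, 105))  # 23
print(least_positive_equivalent(128, 105))  # 23
```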
<h3 id="extended-euclidean-algorithm">Extended Euclidean algorithm</h3>
<p>As explained above, the algorithm takes two numbers, x and y, and returns two
coefficients a and b such that:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>a * x + b * y = gcd(x, y)
</code></pre></div></div>
<p>The implementation returns both the coefficients and the GCD itself.</p>
<p>Now if I take two positive integers x and y, I know I can express them as</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = qy + r
</code></pre></div></div>
<p>where q is the <em>quotient</em> of the division (i.e. <code class="highlighter-rouge">q = x // y</code> where // denotes
the integer division) and r is the <em>remainder</em> and is always strictly smaller
than y. If x is a multiple of y, of course r is going to be zero.</p>
<p>The GCD of two integers can be found by repeating this procedure until the
remainder is 0; more specifically:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = q1 * y + r1
y = q2 * r1 + r2
...
</code></pre></div></div>
<p>The final r before getting to 0 is the GCD. Let’s see this with an example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcd(102, 38)
102 = 2*38 + 26
38 = 1*26 + 12
26 = 2*12 + 2
12 = 6*2 + 0
</code></pre></div></div>
<p>so the GCD is 2. Now to find the coefficients we work backwards from the
second-to-last division, expressing the new remainder in terms of the other
parts:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2 = 26 - 2*12
2 = 26 - 2*(38 - 1*26) = 3*26 - 2*38
2 = 3*(102 - 2*38) - 2*38 = 3*102 - 8*38
</code></pre></div></div>
<p>3 and -8 are the coefficients in the Bezout identity. To compute them in
practice we do not work backward, but simply store them as we go, as they
can be derived from the main division equation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">ExtendedEuclid</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span><span class="p">,</span> <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span>
    <span class="k">while</span> <span class="n">y</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">q</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span> <span class="o">//</span> <span class="n">y</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span> <span class="o">%</span> <span class="n">y</span>
        <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span> <span class="o">=</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x0</span> <span class="o">-</span> <span class="n">q</span> <span class="o">*</span> <span class="n">x1</span>
        <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span> <span class="o">=</span> <span class="n">y1</span><span class="p">,</span> <span class="n">y0</span> <span class="o">-</span> <span class="n">q</span> <span class="o">*</span> <span class="n">y1</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">,</span> <span class="n">x0</span><span class="p">,</span> <span class="n">y0</span> <span class="c"># gcd and the two coefficients</span>
</code></pre></div></div>
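<p>Why does storing the coefficients “as we go” work? At every step the current value of x can be written as a combination of the two original inputs. The sketch below is a traced variant of the iterative algorithm (the name <code class="highlighter-rouge">extended_euclid_trace</code> is mine) that asserts this invariant on each iteration, using the 102 and 38 example from above.</p>

```python
def extended_euclid_trace(x, y):
    """Iterative extended Euclid that checks its loop invariant at each step."""
    X, Y = x, y  # keep the original inputs around for the invariant check
    x0, x1, y0, y1 = 1, 0, 0, 1
    while y > 0:
        q, x, y = x // y, y, x % y
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
        # Invariant: the current x is always x0 * X + y0 * Y.
        assert x0 * X + y0 * Y == x
        print(f"q={q} x={x} coefficients=({x0}, {y0})")
    return x, x0, y0  # gcd and the two Bezout coefficients


print(extended_euclid_trace(102, 38))  # (2, 3, -8)
```

<p>Integer division (<code class="highlighter-rouge">//</code>) matters here: going through floating point would silently lose precision on the large numbers used in crypto.</p>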
<h3 id="modular-inverse">Modular inverse</h3>
<p>Ok so let’s suppose I want to find</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>17*x = 1 (mod 43)
</code></pre></div></div>
<p>my unknown x is the <strong>modular inverse</strong> of 17 in mod 43. First we need to
verify that gcd(17, 43) is 1, otherwise the inverse does not exist. Once we
have done that, we compute the two Bezout coefficients as shown above. If we
work that out manually by retracing all the divisions, we get:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Forward:
43 = 17*2 + 9
17 = 9*1 + 8
9 = 8*1 + 1 # <-- my GCD is here, so it is 1
Backward:
1 = 9 - 8*1
8 = 17 - 9*1
9 = 43 - 17*2
Replacing the first "backward" equation with everything else:
1 = 43 - 17*2 - 17 + 43 - 17*2 = 2*43 - 17*5
</code></pre></div></div>
<p>So we have expressed this in terms of <code class="highlighter-rouge">a*x + b*y = gcd(x, y)</code>: 2 and -5
are our Bezout coefficients. Let’s rewrite the last equation in mod 43.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2*43 - 5*17 = 1 (mod 43)
</code></pre></div></div>
<p>But 2*43 is a multiple of 43 so it is irrelevant here; this leaves us with:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-5*17 = 1 (mod 43)
</code></pre></div></div>
<p>By comparing this with the starting equation in the unknown x, -5 is the inverse
we were looking for. However we need a positive result: in this case we can do
the same wrap-around we have done earlier, adding 43 to -5 to remain in the same
class mod 43. This yields 38, and indeed 17*38 = 646 = 15*43 + 1.</p>
<p>The code is quite simple:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">invmod</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">m</span><span class="p">):</span>
    <span class="n">g</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">ExtendedEuclid</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">m</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">g</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">'modular inverse does not exist'</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">x</span> <span class="o">%</span> <span class="n">m</span>
</code></pre></div></div>
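<p>If you are on Python 3.8 or newer, the built-in <code class="highlighter-rouge">pow</code> accepts a negative exponent together with a modulus, which makes for a handy cross-check of the worked example above. This is just a sanity check, not a replacement for understanding the algorithm.</p>

```python
inverse = pow(17, -1, 43)
print(inverse)  # 38
assert (17 * inverse) % 43 == 1

# When the gcd is not 1 the inverse does not exist and pow raises ValueError.
try:
    pow(4, -1, 6)
except ValueError as error:
    print("no inverse:", error)
```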
<h2 id="references">References</h2>
<p>Several websites contributed to me getting to the bottom of this interesting
theorem and provided the numerical examples I use above:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Chinese_remainder_theorem">Wikipedia page</a></li>
<li><a href="https://crypto.stanford.edu/pbc/notes/numbertheory/crt.html">Stanford university explanation</a></li>
<li><a href="https://www.di-mgt.com.au/crt.html">More details on the Gauss algorithm</a></li>
<li><a href="https://www.youtube.com/watch?v=fz1vxq5ts5I">Tutorial video on the modular inverse computation</a></li>
</ul>
Sun, 22 Oct 2017 15:30:00 +0000
http://shainer.github.io/crypto/math/2017/10/22/chinese-remainder-theorem.html
crypto, math
Introduction to ring signatures
<p>Well finally I got around to reading <a href="https://cryptoservices.github.io/cryptography/2017/07/21/Sigs.html">this article about confidential transactions</a>. Recommended if you are interested in how some
security guarantees in cryptocurrency transactions can be enforced. However, thanks to that article I also learned something
about <a href="https://en.wikipedia.org/wiki/Ring_signature">ring signatures</a>, so in this post I will talk about that.</p>
<p>Ring signatures are a special type of <strong>group signature</strong>. In group signatures you have (you guessed it!) a group of signers;
each of them owns a keypair. Messages are signed by the group in a way that the receiver can verify that the signature
was generated by the group, but cannot uncover which signer specifically made the signature (a property named <em>signer ambiguity</em>).</p>
<p>However group signatures have a “weakness”: the group needs centralized management. There must be a manager that allows new
members in the group, and can revoke their anonymity if they want to; this is because they must know the private keys of each member
and therefore they can also reveal it publicly without “endangering” themselves directly. <strong>Ring signatures</strong>, presented for the first time
in 2001, remove this need. Any user can build a set that includes themselves, and then produce a signature
using their own private key and the public keys of the other members. Signer ambiguity is preserved, so the verifier
cannot tell which of the set members was the signature’s author. Notice however that since the user only needs the public keys
of the other members, they can include them in a ring without the permission (or even the knowledge) of the key owners.</p>
<p>The original paper is aptly named “<a href="https://link.springer.com/content/pdf/10.1007%2F3-540-45682-1_32.pdf">How to leak a secret</a>”,
and indeed it describes a scenario where somebody is leaking a secret, wants the recipient to be sure the origin of the
secret can be trusted (or better, that it belongs to a trustworthy group), but does not want to reveal their identity to the
recipient completely.</p>
<p>Note that after the original publication other schemes of ring signatures were developed; I am going to focus on the first one.
Indeed the original article I linked uses a different one.</p>
<p>The second desirable property of this ring signature scheme is that adding a new ring member is efficient: both the generation
and verification procedures only need to compute one extra modular multiplication and one symmetric encryption. Pretty good :)</p>
<h2 id="details">Details</h2>
<h3 id="the-setup">The setup</h3>
<p>Each signer in the ring is associated with a public key <code class="highlighter-rouge">Pi</code> and the generation scheme is known. All of them use the same scheme
(let’s assume RSA for simplicity). Note that the moduli N of each key could have different bit sizes; to work around this problem,
the paper finds a bit size b larger than any of the moduli’s size, then changes the encryption functions so that their outputs are unchanged
for most inputs, and set equal to the input for a negligible number of inputs. This guarantees that if the original encryption function
was infeasible to invert (as in RSA) given a random output, the new one will be too.</p>
<p>The message to sign is named <code class="highlighter-rouge">m</code>, and we have <code class="highlighter-rouge">r</code> members in the ring. The generation function of each signer is named <code class="highlighter-rouge">gi</code>. This
is the function that computes the public key given the private key (and internal parameters such as the N in RSA). In RSA, this is the modular exponentiation.</p>
<p>We also have:</p>
<ul>
<li>a symmetric encryption function <code class="highlighter-rouge">E</code> such that for any key k, <code class="highlighter-rouge">E</code> with k is a permutation over strings of b bits.</li>
<li>a public collision-resistant hash function <code class="highlighter-rouge">H</code> that maps inputs to strings of the length of k, used then as keys for E.</li>
</ul>
<p>Finally we need a family of keyed <em>combining functions</em> C(k, v). These take as input k, a random initialization vector v and r
arbitrary values Y, each composed of b bits. Each function will use <code class="highlighter-rouge">E(k)</code> to produce outputs of b bits, such that for any
(k, v) pair we have that the function has the following properties:</p>
<ul>
<li>it is a permutation over all the Y values;</li>
<li>when fixing all the Y values but one, and with the output z known, there is exactly one solution for the remaining value and it is easy to compute;</li>
<li>given k, v and z it is infeasible for an attacker to solve the equation <code class="highlighter-rouge">C(k, v, g1(x1)...gr(xr)) = z</code> (given access to each g function),
provided it is also infeasible for them to invert the g functions themselves.</li>
</ul>
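<p>To make the second property less abstract, here is a toy sketch of a combining function. As far as I understand the paper, it chains E over the y values, XOR-ing each one into the running value and starting from the glue v; everything else here is an assumption made for the example. In particular <code class="highlighter-rouge">E</code> is a trivially invertible permutation over 16-bit values, completely insecure, and all the names are mine.</p>

```python
B_BITS = 16
MASK = (1 << B_BITS) - 1
ODD = 0x9E37  # any odd multiplier is invertible modulo 2**B_BITS
ODD_INV = pow(ODD, -1, 1 << B_BITS)


def E(k, x):
    """Toy keyed permutation over B_BITS-bit values (NOT secure)."""
    return (((x ^ k) * ODD) + k) & MASK


def D(k, z):
    """Inverse of E under the same key."""
    return ((((z - k) & MASK) * ODD_INV) & MASK) ^ k


def combine(k, v, ys):
    """C(k, v, y1..yr): chain E over the y values, starting from the glue v."""
    c = v
    for y in ys:
        c = E(k, y ^ c)
    return c


def solve_for(k, v, ys, s):
    """Find the value for slot s that makes combine(k, v, ys) equal to v."""
    forward = v
    for y in ys[:s]:  # run the chain forward up to the gap
        forward = E(k, y ^ forward)
    backward = v
    for y in reversed(ys[s + 1:]):  # run it backward from the output v
        backward = D(k, backward) ^ y
    return D(k, backward) ^ forward


k, v = 0x1234, 0xBEEF
ys = [0x0101, 0, 0x0303, 0x0404]  # slot 1 is the unknown one
ys[1] = solve_for(k, v, ys, 1)
assert combine(k, v, ys) == v
```

<p>Closing the gap requires running E backward, which is cheap here; the hard part of producing a real signature is elsewhere, in inverting your own g to go from y back to x.</p>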
<h3 id="signature-generation">Signature generation</h3>
<p><strong>Step 1</strong>: compute <code class="highlighter-rouge">k = H(m)</code></p>
<p><strong>Step 2</strong>: select a <em>glue</em> value v, a bit string of length b, at random.</p>
<p><strong>Step 3</strong>: select random <code class="highlighter-rouge">xi</code> values, one for each ring member beside yourself, again bit strings of length b; then compute <code class="highlighter-rouge">yi = gi(xi)</code>.
This means using the random <code class="highlighter-rouge">xi</code> as replacements for the private keys you don’t know.</p>
<p><strong>Step 4</strong>: to find your own y value, solve the ring equation for it, i.e. find the one remaining value such that <code class="highlighter-rouge">C(k, v, Y) = v</code>. By definition there must be exactly one solution and it should be easy to find. Then compute your own x value from y by inverting your function g, which only you can do because you hold the private key of that RSA pair.</p>
<p>The signature is the set of public keys P, the glue v and the set of X values you computed, including your own.</p>
<h3 id="signature-verification">Signature verification</h3>
<p>The verification is quite trivial and is also based on the solutions of the ring equation:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>yi = gi(xi) for each xi
k = H(m)
</code></pre></div></div>
<p>then you verify that the equation with C (called <em>ring equation</em>) is satisfied with the given parameters.</p>
<h3 id="security">Security</h3>
<p>The anonymity is guaranteed by the fact that the ring equation, when k and v are fixed, has <code class="highlighter-rouge">(2^b)^(r-1)</code> solutions, and each of them
can be chosen with equal probability by the signing procedure, since it is based on random numbers.</p>
<p>The paper proves a theorem that any forging algorithm A, i.e. any algorithm that is able to forge a valid signature for a message m after
observing a non-huge quantity of signatures for different messages, can be turned into an algorithm B that is able to invert a function gi
for any random y. Since we assumed that inverting g is computationally hard (and it certainly is for common keypair
algorithms such as RSA), A should not exist.</p>
<p>I am thinking it would be cool to attempt this in practice. Pick a bad function g and make the ring members use it. Then build A;
in the proof, A is simply an oracle that can produce valid signatures for messages; we do not care how this is achieved internally, so for
our purposes it can just run the signature procedure. Then we build B from that following the explanation of the paper, and we verify
that it actually inverts g. This looks like a new crypto challenge in the making and I am already excited! The procedure described for building B was not 100% clear to me upon reading the paper, so it will require more careful study. I will let you know if I succeed in this adventure;
I could even propose this as a new challenge for the Matasano team to add to their website.</p>
<h3 id="conclusion">Conclusion</h3>
<p>One interesting improvement of the AOS ring signature scheme (the one used in the original article) is that it works with signers that use
different keypair generation functions. It uses something called Schnorr signature as the basis, and then “chains” multiple signatures together
into a ring to produce the final result. The idea is still to chain all the signatures together in a way that makes them depend on each other
and allows the verifier to repeat the same procedure to verify we end in a loop.</p>
Sun, 15 Oct 2017 22:00:00 +0000
http://shainer.github.io/crypto/2017/10/15/ring-signatures.html
crypto
RSA padding oracle attack
<p>My long series of posts on the Matasano crypto challenges and cryptography in general cannot be called complete without a dissertation
on challenges <a href="http://cryptopals.com/sets/6/challenges/47">47</a> and <a href="http://cryptopals.com/sets/6/challenges/48">48</a>, dedicated to the PKCS1.5 padding oracle and how it is exploited to break RSA and recover a plaintext. I was fascinated by this attack and read the whole paper before coding the implementation, so this post will include a bit more details on why the attack works.</p>
<h1 id="the-setup">The setup</h1>
<p>Alice takes her secret message and applies the PKCS1.5 encoding, getting a byte string of length equal to the number of bytes in the modulus of the RSA pair. She then encrypts it with Bob’s public key and sends it over the network. You, as the attacker, have access to the resulting ciphertext, and an oracle function on Bob’s server: when invoked, the server will decrypt the message and return true if the first two bytes in the plaintext are equal to ‘\x00\x02’. This is a necessary but not sufficient condition for the plaintext to be a PKCS1.5-encoded message; however for our purposes we can pretend that the oracle returning true means this ciphertext decrypts to a message <em>conformant</em> to PKCS1.5.</p>
<p>For a full description of the PKCS1.5 format, refer to the source paper or the implementation directly. The latter is a bit lazy, filling the padding portion of the message with the same byte repeated as many times as needed, rather than pseudorandom bytes.</p>
<h1 id="the-attack">The attack</h1>
<p>Well now you want to use the oracle output to recover the full plaintext. The idea is that the RSA ciphertexts are just numbers; by intelligently searching through the space of numbers, you will find another ciphertext that decrypts to the same plaintext. Once the algorithm completes you will be certain to have found such a number even without any verification.</p>
<p>In cryptography, this is called an <strong>adaptive chosen-ciphertext attack</strong>. Adaptive here means we choose the following ciphertext based on information derived from the previous one.</p>
<p>Let’s go into more detail. We want to decrypt a ciphertext c, i.e. find <code class="highlighter-rouge">m = c^d mod n</code>. Due to a well-known property of RSA, if I decrypt
<code class="highlighter-rouge">cs^e</code> instead of c (for some arbitrary s), the plaintext will be equal to <code class="highlighter-rouge">ms</code>. So if I pick some random s, send <code class="highlighter-rouge">cs^e</code> to the oracle, and the response is “true”, I know that <code class="highlighter-rouge">ms</code> is PKCS1.5 conformant, i.e. it starts with ‘\x00\x02’. Let’s set <code class="highlighter-rouge">B = 2^(8(k-2))</code> where k is the byte length of the modulus of the RSA pair (i.e. of the parameter n). Then it must be true that <code class="highlighter-rouge">2B <= ms mod n <= 3B - 1</code>.</p>
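<p>The multiplicative property is easy to verify with a toy key. The numbers below (p = 61, q = 53, e = 17) are a textbook-sized example chosen only for illustration, with no padding involved, and the modular inverse relies on Python 3.8+.</p>

```python
# Toy RSA key: far too small for real use, for illustration only.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

m = 65
c = pow(m, e, n)  # the ciphertext the attacker observed

s = 7
c_blinded = (c * pow(s, e, n)) % n  # attacker computes c * s^e mod n

# Decrypting the modified ciphertext yields m * s mod n.
assert pow(c_blinded, d, n) == (m * s) % n
print(pow(c_blinded, d, n))  # 455
```

<p>The attacker never needs d for this: only the public key and the original ciphertext go into <code class="highlighter-rouge">c_blinded</code>; the private key appears here just to confirm the claim.</p>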
<p>This means that by choosing different s, we are able to derive a set of intervals that must contain the plaintext m we are looking for. Once we are down to one potential interval, it is possible to choose s such that the probability of <code class="highlighter-rouge">cs^e</code> decrypting to a conformant message is quite high. After sufficient iterations, we’ll end up with a single interval of length 1, and we will have found our plaintext.</p>
<p>The first set of intervals is derived by setting s to 1; we know that m is contained in one interval, <code class="highlighter-rouge">[2B, 3B - 1]</code>, as explained above. This is however too big to be useful yet, so we need to proceed to the next step.</p>
<p>In the next step, we increase our choice of s until we find another cs^e that decrypts to a conformant plaintext. The search either starts from
the value of s we found at the previous iteration, or from <code class="highlighter-rouge">n / (3B)</code> for the first one. This is because small values (besides 1) are
less likely to generate conformant plaintexts. If we have only one interval in our set, however, we are able to narrow down the search a bit,
since we know that the number we are looking for must lie there. So, if our m lies between two numbers a and b, there will be a number r such that:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2B <= ms - rn <= 3B - 1 (for some r)
(2B + rn) / s <= m <= (3B - 1 + rn) / s
</code></pre></div></div>
<p>and this gives us the computation needed to find s in step 2c.</p>
<p>From the same formula above we can also explain step 3, i.e. how to update the list of intervals to use in the next iteration. We just need to
take all possible intervals created by every value of r for which the first inequality remains true, and for all possible (a, b) intervals we
have in the current iteration. We also make sure not to add any interval that is contained in one we already have.</p>
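<p>The interval update can be sketched in a few lines. This is my own reading of the step, not the challenge solution: the function names are made up, <code class="highlighter-rouge">toy_oracle</code> cheats by looking at m directly (a real attacker would query the server instead), and the parameters are tiny so the search finishes instantly. For brevity the sketch deduplicates intervals but does not merge overlapping ones.</p>

```python
def ceil_div(a, b):
    """Integer division rounding up, without going through floats."""
    return -(-a // b)


def narrow_intervals(intervals, s, n, B):
    """Given that m * s mod n is conformant, shrink the intervals containing m."""
    narrowed = set()
    for a, b in intervals:
        # Every r for which 2B <= m*s - r*n <= 3B - 1 is possible with a <= m <= b.
        r_min = ceil_div(a * s - 3 * B + 1, n)
        r_max = (b * s - 2 * B) // n
        for r in range(r_min, r_max + 1):
            lo = max(a, ceil_div(2 * B + r * n, s))
            hi = min(b, (3 * B - 1 + r * n) // s)
            if lo <= hi:
                narrowed.add((lo, hi))
    return sorted(narrowed)


def toy_oracle(m, s, n, B):
    """Stand-in for the server: is m * s mod n in the conformant range?"""
    return 2 * B <= (m * s) % n <= 3 * B - 1


# Toy parameters: a 3-byte "modulus", so B = 2^(8 * (3 - 2)) = 256.
n, B, m = 1000003, 256, 600
s = next(s for s in range(ceil_div(n, 3 * B), n) if toy_oracle(m, s, n, B))
intervals = narrow_intervals([(2 * B, 3 * B - 1)], s, n, B)
assert any(lo <= m <= hi for lo, hi in intervals)
print(s, intervals)
```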
<h1 id="remarks">Remarks</h1>
<p>The merits of this attack compared to others of its kind are simply that the number of oracle calls we have to do is on average smaller than
what other attacks need. The paper has some notes on why this is true and how to approximate the number of iterations required to find the
solution. I won’t go into that part as I found it quite technical and long to explain.</p>
<p>For the <strong>implementation</strong>, the two challenges are very similar; the only difference is that in the first one you can afford to be sloppy
and not implement certain parts (such as handling multiple intervals in one iteration, which never happens). I did the full implementation
at the beginning so I only had to change the parameters to make it work again.</p>
Sat, 14 Oct 2017 14:30:00 +0000
http://shainer.github.io/crypto/matasano/2017/10/14/rsa-padding-oracle-attack.html
crypto, matasano
Usenix Security 2017: global DNS manipulation
<p>Usenix Security 2017 happened recently, and Usenix has since published all the videos and proceedings
(see <a href="https://www.usenix.org/conference/usenixsecurity17">their main page</a>). Since I didn’t attend, I looked
through the videos to watch the ones I found most interesting. <a href="https://www.youtube.com/watch?v=W_rBPdaTojQ">One in particular</a> caught my attention.
The presenter talks about a study performed by several academic bodies to measure DNS manipulation across the world in a reliable
and repeatable fashion.</p>
<p>What is DNS manipulation? DNS manipulation means changing the responses to DNS resolution queries to prevent
the requestor from accessing the actual domain they were looking for. Such “bad” responses will either be errors
or different IP addresses that redirect the user somewhere else.</p>
<p>Of course, the success of this depends on the DNS server(s) that perform the manipulation being somewhat popular
among the audience we intend to target. If people can easily use a truthful server instead, they are eventually
going to do so. But I digress; let’s assume that problem is more or less solved (and in practice this happens
all the time, at least to non-power users).</p>
<p>The study asked how common DNS manipulation is in the current world and what forms it takes.
Surprisingly, while we all agree that Internet censorship is a thing, and can probably name a few notorious examples that
make headlines, comprehensive data on the subject are hard to come by.</p>
<p>This introduces the next question: how do we collect such data? Half of the video presents their pipeline dedicated to collecting
lots of data about DNS manipulation across the world. The other half presents some results, divided by country, category
of domains (pornography, gambling, human rights, multimedia sharing were among those graphed on the slides) and
actual domain (are Google-owned domains more often censored than, say, Wikipedia or Facebook?).</p>
<p>The first lesson taken from this experiment is that it is incredibly easy to introduce bias in the results by
simply not checking a diverse enough set of domains. We can only check whether a domain is manipulated by issuing
DNS queries and looking at the results. We cannot infer whether other domains are also manipulated in the same context.
Therefore if we don’t start with a varied input set, our picture is going to be incomplete.</p>
<p>The second is that manipulation is incredibly common, and it manifests itself in a variety of ways. Discussions on this
go quickly into politics of specific countries, so I encourage people to look at the data presented and form their own
opinions on all this.</p>
Sun, 24 Sep 2017 14:30:00 +0000
http://shainer.github.io/security/dns/2017/09/24/global-dns-manipulation.html
security, dns
Finding solutions to weird equations
<p>A while back, on social media, I found a link to <a href="https://www.quora.com/How-do-you-find-the-positive-integer-solutions-to-frac-x-y%2Bz-%2B-frac-y-z%2Bx-%2B-frac-z-x%2By-4/answer/Alon-Amit?share=1">an interesting post on solving a particular mathematical equation</a>. Now, I know this description might not sound very enticing; people solve equations every day! But some equations carry more meaning than what appears at first glance.</p>
<p>The equation has a pretty symmetric structure and a simple definition:</p>
<p><img src="http://shainer.github.io/images/equation.png" alt="The equation" /></p>
<p>And we want to find the <strong>positive</strong> integer solution or solutions to this equation. How? As will be clear, brute forcing won’t
be enough, so we have to study the properties of this equation and come up with ingenious methods.</p>
<p>The post then explains how we are dealing with a degree-3 equation that has at least one rational (but not positive) solution.
This means the equation describes an <strong>elliptic curve</strong>: by means of complex (and quite boring) transformations, we can
express it in the usual elliptic curve form, which is</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>y^2 = x^3 + ax + b
</code></pre></div></div>
<p>with a and b rational coefficients of the curve. At this point the post author provides one solution for the equation,
the point (-100, 260). By simple math, the corresponding values of the initial unknowns are computed as 4, -1 and 11. So this
is not good, because it’s not a positive solution. Now, we can make use of some properties of elliptic curves to find new
solutions to test for positivity.</p>
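<p>If you want to double check candidate solutions yourself, exact rational arithmetic is enough. The snippet below is a small sketch that assumes the equation in the image is a/(b+c) + b/(a+c) + c/(a+b) = 4, as in the linked post; the function name <code class="highlighter-rouge">lhs</code> is made up for the example.</p>

```python
from fractions import Fraction


def lhs(a, b, c):
    """Left-hand side a/(b+c) + b/(a+c) + c/(a+b), computed exactly."""
    return Fraction(a, b + c) + Fraction(b, a + c) + Fraction(c, a + b)


# The first solution found from the point (-100, 260).
print(lhs(4, -1, 11))  # 4
```

<p>Since <code class="highlighter-rouge">Fraction</code> works on arbitrary-size integers, the same function can be used on much larger candidate values without rounding problems.</p>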
<p>In particular, the points of an elliptic curve are closed under addition: adding two points P and Q on the curve yields another point on
the curve. Such a point is found by drawing a line connecting the two points, looking for a third point where the line intersects
the curve, and taking the reflection of that point across the x axis (which will also belong to the curve).</p>
<p>Now we add our initial point (-100, 260) to itself, finding 2P, 3P, etc., and every time we test the new point to see if
the unknowns are positive. We have to get all the way to 9P to finally find the following solution to the equation:</p>
<p>a=154476802108746166441951315019919837485664325669565431700026634898253202035277999,
b=36875131794129999827197811565225474825492979968971970996283137471637224634055579,
c=4373612677928697257861252602371390152816537558161613618621437993378423467772036</p>
<p>Well, those are huge numbers, so you can see why employing brute force wouldn’t have helped here.</p>
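<p>The “add the point to itself and test” loop can be sketched in a few lines of Python. Careful with the assumptions here: the curve coefficients (109 and 224) and the formulas mapping a point back to the three unknowns are not derived in this post; I am taking them from the linked answer, and all function names are made up for the example. The curve has an extra x^2 term compared to the form shown above, so the addition formulas account for it.</p>

```python
from fractions import Fraction

# Curve y^2 = x^3 + A*x^2 + B*x (coefficients assumed from the linked post).
A, B = 109, 224
P = (Fraction(-100), Fraction(260))


def on_curve(pt):
    x, y = pt
    return y * y == x**3 + A * x * x + B * x


def add(p, q):
    """Chord-and-tangent addition of two finite points on the curve."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and y1 == -y2:
        raise ValueError("the sum is the point at infinity")
    if p == q:
        slope = (3 * x1 * x1 + 2 * A * x1 + B) / (2 * y1)  # tangent line
    else:
        slope = (y2 - y1) / (x2 - x1)  # line through both points
    x3 = slope * slope - A - x1 - x2
    y3 = slope * (x1 - x3) - y1  # reflect the third intersection point
    return (x3, y3)


def to_unknowns(pt):
    """Map a curve point back to (a, b, c), up to a common factor."""
    x, y = pt
    a = (56 - x + y) / (56 - 14 * x)
    b = (56 - x - y) / (56 - 14 * x)
    c = (-28 - 6 * x) / (28 - 7 * x)
    return a, b, c


def first_positive_multiple(limit=20):
    """Return (k, (a, b, c)) for the first k*P whose unknowns are all positive."""
    q = P
    for k in range(1, limit + 1):
        a, b, c = to_unknowns(q)
        if a > 0 and b > 0 and c > 0:
            return k, (a, b, c)
        q = add(q, P)
    return None
```

<p>The values returned by <code class="highlighter-rouge">to_unknowns</code> are fractions; multiplying all three by a common denominator gives an integer solution, e.g. (2/7, -1/14, 11/14) becomes (4, -1, 11) for P itself.</p>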
<p>But why did we pick a point and start adding it to itself? Well, it turns out this point is the one and only <em>generator</em>
of the curve, so by adding it to itself enough times you are able to find any other non-infinity point on the curve. So if the positive solution
existed, we were bound to find it with this method.</p>
<p>The final interesting thing is that the number of digits in the first positive solution we can find with the generation
method depends on the coefficient on the right side of the equation (here, 4). The bigger this coefficient, the bigger the number
of digits is going to be. How much bigger? Can we find a correlation between the two quantities?</p>
<p>Unfortunately, we cannot. This is because there is no general algorithm for finding an integer solution to a Diophantine equation, so
naturally we cannot make statements about the properties of such a solution. We cannot even know in advance if one will exist.
Thanks to this post, I am now aware of the <a href="https://en.wikipedia.org/wiki/Hilbert%27s_tenth_problem">story behind this</a>, which
I might explore in more detail in a future post.</p>
<h2 id="extra-notes">Extra notes</h2>
<p>I am temporarily without my main laptop, using a Chromebook (long story). So I don’t have access to my usual writing environment and I wrote this post entirely on Github. I have to say that the general flow is quite good: it’s easy to upload a bunch of new files to some location and create a commit, and the Markdown editor is decent with live preview. I still prefer Atom for that though :)</p>
<p>Also, embedding LaTeX in Markdown, at least for Github Pages, is still a messy affair. Perhaps one day I’ll find the patience to figure out which method actually works and does not require complex HTML code. That day is not today, sorry, you get the equation as a PNG image cut from a screenshot :D</p>
Tue, 12 Sep 2017 00:30:00 +0000
http://shainer.github.io/math/2017/09/12/finding-solutions-to-weird-equations.html
math
Forging RSA signatures
<p>As I vaguely promised some weeks ago, going back to solve the crypto challenges I had missed in the sets before the eighth one, I found yet another interesting problem to talk about: exploiting several weaknesses to <a href="http://cryptopals.com/sets/6/challenges/42">forge an RSA signature</a>.</p>
<h2 id="preconditions">Preconditions</h2>
<p>The following conditions are necessary for this attack to work:</p>
<ol>
<li>RSA is used to generate and verify digital signatures;</li>
<li>The keypair generation is lazy, and sets the public exponent e to 3;</li>
<li>The signature verification is also a bit lazy; I’ll describe how later.</li>
</ol>
<p>#1 is nothing weird: digital signatures need some form of asymmetric encryption and RSA is the most popular choice.</p>
<p>#2 can happen in practice due to how keypair generation works: the two internal parameters for RSA, q and p, need to be primes, large enough to make the factorization of N = pq hard, and such that (p - 1) and (q - 1) are either coprime or don’t have a lot of common factors after 2. A common way to achieve this is to set e, the “exponent” part of the final public key, to some small prime like 3, and then derive p, q and d (the exponent of the private key) from that. Having a small value for e also makes encryption and signature verification quite cheap, since the exponent is a “small” number.</p>
<p>The full standard to compute signatures with RSA is described in <a href="https://tools.ietf.org/html/rfc2313">RFC 2313</a>. In short, the message is first hashed (most common algorithms are supported), then the following block of bytes is generated:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00 01 FF .. FF 00 ASN.1 HASH
</code></pre></div></div>
<p>where ASN.1 is a byte string identifying the hash algorithm used (see <a href="https://www.ietf.org/rfc/rfc3447.txt">RFC 3447</a>), and there are as many FF bytes as needed to make the total size equal to the size in
bytes of N, the modulus of the RSA keys. Note that N is part of both the public and the private RSA key. The block is then converted to the corresponding integer and encrypted with the private key; optionally, the result is converted again to a hex string or similar representation.</p>
<p>Now we can see where mistake #3 can come from: if I verify an RSA signature using, for example, regular expressions, it’s easy to check that there are one or more FF bytes in the padding zone, but just as easy to forget to check that there is exactly the number I expect. Furthermore, I might not check that there’s nothing else in the signature after the hash. Note that this bites me even if I separately check that the
total signature length is as expected.</p>
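<p>To show what such a lazy check could look like, here is a small Python sketch comparing a regex-based verifier with a strict one. Both functions are hypothetical and only look at the already-decrypted block; the byte string <code class="highlighter-rouge">SHA256_PREFIX</code> is the ASN.1 identifier for SHA-256, and the “at least 8 padding bytes” rule comes from the PKCS #1 standard.</p>

```python
import hashlib
import re

# ASN.1 DigestInfo prefix identifying SHA-256.
SHA256_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")


def sloppy_check(block, message):
    """Lazy check: 00 01, one or more FF, 00, ASN.1, hash, then anything."""
    digest = hashlib.sha256(message).digest()
    pattern = rb"\x00\x01\xff+\x00" + re.escape(SHA256_PREFIX + digest)
    return re.match(pattern, block) is not None


def strict_check(block, message):
    """Strict check: rebuild the only valid block and compare byte by byte."""
    tail = b"\x00" + SHA256_PREFIX + hashlib.sha256(message).digest()
    padding = len(block) - 2 - len(tail)
    return padding >= 8 and block == b"\x00\x01" + b"\xff" * padding + tail
```

<p>A block with a single FF byte and trailing garbage after the hash passes <code class="highlighter-rouge">sloppy_check</code> even when its total length is right, while <code class="highlighter-rouge">strict_check</code> rejects it.</p>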
<p>If all these conditions are there, the attacker is able, without any knowledge of the private key, to forge an RSA signature for pretty much any message, and have it accepted by the verifier.</p>
<h2 id="how-the-forgery-works">How the forgery works</h2>
<p>If I am verifying a signature, I am decrypting with the public key, which means this operation is performed:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D = EncryptedSignature ** e mod N
</code></pre></div></div>
<p>If e is equal to 3, it’s quite possible that <code class="highlighter-rouge">EncryptedSignature ** 3</code> ends up being smaller than N, therefore the modulo operation does not change the result. So, if we forge a block that satisfies only the conditions we know the system checks for, and also corresponds to a perfect cube, we can pass the cube root as a signature to such a verification system, and it will be accepted as valid. How does that happen?</p>
<p>Let’s take a block with this format:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00 01 FF 00 ASN.1 HASH GARBAGE
</code></pre></div></div>
<p>Now let’s take the sub-block composed of the last 00 byte, the ASN.1 code, and the hash. If SHA-256 is used for the hash, the total size is 52 bytes. We then convert this block into an integer, which we call D.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Block = '00 ASN.1 HASH'
D = int(Block)
N = 2^len(Block) - D
</code></pre></div></div>
<p>Now let’s say that our RSA key has length 2048 bits, i.e. 256 bytes; with the format above, there are going to be (256 - 52 - 3) = 201 bytes left on the right for garbage. Let’s call the number of garbage bits X, so X = 1608. Measuring len(Block) in bits as well (and keeping in mind that the N defined above is just a helper value, not the RSA modulus), the numeric block is going to be:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2 ^ (2048 - 15) - 2 ^ (X + len(Block)) + D * 2^X + garbage
2 ^ (2048 - 15) - N * (2^X) + garbage
</code></pre></div></div>
<p><strong>Edit</strong>: the second equation follows from the first quite easily if you remember that according to the definition above, <code class="highlighter-rouge">N = 2^len(Block) - D</code>, and therefore, <code class="highlighter-rouge">D = 2^len(Block) - N</code>.</p>
<p><strong>Disclaimer</strong>: this is where I get lost, sadly. I don’t understand what the 15 bits subtracted from the key size represent. If somebody reads this and wants to send me an email for a more complete explanation, they are welcome to do it!</p>
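<p>For what it’s worth, here is one possible reading of the 15, which can at least be checked numerically. The block starts with the bytes 00 01, that is 15 zero bits followed by a single one bit; a one bit followed by a run of FF bytes and then zeros is just the difference of two powers of two. The sketch below assumes the single-FF layout and the sizes used above (2048-bit key, 52-byte sub-block); the variable names are made up for the example.</p>

```python
KEY_BITS = 2048
BLOCK_BITS = 52 * 8  # 00 + ASN.1 + SHA-256 hash, in bits
X = KEY_BITS - 3 * 8 - BLOCK_BITS  # garbage bits left after 00 01 FF

# 00 01 FF followed by zeros where the sub-block and the garbage will go.
prefix = bytes.fromhex("0001ff") + bytes((BLOCK_BITS + X) // 8)
value = int.from_bytes(prefix, "big")

print(value == 2 ** (KEY_BITS - 15) - 2 ** (X + BLOCK_BITS))  # True
```

<p>Adding <code class="highlighter-rouge">D * 2^X + garbage</code> to this value fills in the zeros, which gives the first formula above.</p>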
<p>From empirical evaluation, it seems that for the garbage number you should prefer higher values: the lowest cube roots might end up being encoded in something that does not quite contain the full hash at the end, possibly because we run out of bits to convert before that. This is why my code takes a shortcut and just sets it to the highest number possible given the allowed number of bits.</p>
<p>The final step is computing the cube root and converting the result to an integer (in my case, by rounding down). Here I ran into a limitation of Python: the suggested way to compute a cube root is to raise the number to the power of (1.0 / 3.0), but this requires the base to be converted to a float, and that does not work for very large integers such as this one. I could have looked at some mathematical library like numpy, but I am reluctant to add too many dependencies; eventually I found a code snippet on the Internet that does the job with the built-in decimal module.</p>
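<p>As an alternative that stays entirely in integer arithmetic, a binary search also works. This is not the snippet I used, just a minimal sketch: <code class="highlighter-rouge">integer_cube_root</code> and <code class="highlighter-rouge">forge</code> are made-up names, the key is assumed to be 2048 bits, the hash SHA-256, and the garbage is set to all FF bytes as described above.</p>

```python
import hashlib

# ASN.1 DigestInfo prefix identifying SHA-256.
SHA256_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")


def integer_cube_root(n):
    """Largest integer r such that r**3 <= n, for n >= 0."""
    low, high = 0, 1 << (n.bit_length() // 3 + 1)
    while low < high:
        mid = (low + high + 1) // 2
        if mid**3 <= n:
            low = mid
        else:
            high = mid - 1
    return low


def forge(message, key_bytes=256):
    """Return a forged signature and the block prefix it must reproduce."""
    digest = hashlib.sha256(message).digest()
    head = b"\x00\x01\xff\x00" + SHA256_PREFIX + digest
    garbage = b"\xff" * (key_bytes - len(head))  # highest possible garbage
    target = int.from_bytes(head + garbage, "big")
    return integer_cube_root(target), head


signature, head = forge(b"hi mom")
# What a verifier with e = 3 computes, as long as signature**3 < N.
recovered = (signature**3).to_bytes(256, "big")
print(recovered.startswith(head))  # True
```

<p>Rounding the cube root down only changes the low-order garbage bytes, which is why the prefix with the hash survives the cubing.</p>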
<p>Aaand we are done!</p>
Sun, 20 Aug 2017 17:30:00 +0000
http://shainer.github.io/crypto/2017/08/20/forging-rsa-signatures.html
crypto
Interesting C++ features, part 2
<p>I have written a lot of C++, both at work and outside. So I like to keep up to date with the new features and utilities introduced by
new versions or available through common libraries. This post will describe a few new things I have discovered recently.</p>
<p>If you are wondering why this is part 2, I have decided <a href="https://shainer.github.io/c++/opensource/2016/11/13/cpp-errors.html">this post</a> can be
considered as “part 1” of this series. I plan to write more about C++ in the future: I’ll make all the posts numbered and under the
“c++” category.</p>
<h2 id="span">span</h2>
<p>This is not part of the STL, but rather of the Guidelines Support Library, which is any library implementing <a href="https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md">this set of guidelines</a> by the standards committee.</p>
<p>Let’s say that you want to pass an array to a function as a pointer. A basic example:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">sum</span><span class="p">(</span><span class="kt">int</span><span class="o">*</span> <span class="n">data</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// sum all elements in data.
</span><span class="p">}</span>
</code></pre></div></div>
<p>Now in order for this code to work, we assume n represents the array size, and that it’s
actually correct and does not make us access out-of-bounds memory.</p>
<p>A better way:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">sum</span><span class="p">(</span><span class="k">const</span> <span class="n">span</span><span class="o"><</span><span class="kt">int</span><span class="o">>&</span> <span class="n">data</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// sum all elements in data.
</span><span class="p">}</span>
</code></pre></div></div>
<p>A <code class="highlighter-rouge">span</code> of the array (which might or might not contain all the elements) is built on-the-fly to
represent a view of the array, and passed to any function. All the information about the size is contained
internally.</p>
<p><code class="highlighter-rouge">array_view</code> (which has a more descriptive name) works the same way, but unlike <code class="highlighter-rouge">span</code> it’s a read-only
view of the original array. This is preferable to ensure no function can write on what is a view of another
data structure, without having to enforce constness instead.</p>
<h2 id="custom-error-codes">Custom error codes</h2>
<p>This was introduced in C++11, but I think it was overshadowed by other more revolutionary features, since you seldom find
any trace of it.</p>
<p>There is another blog which describes in quite some detail <a href="https://akrzemi1.wordpress.com/2017/07/12/your-own-error-code/">how to define your own error code space</a> and <a href="https://akrzemi1.wordpress.com/2017/08/12/your-own-error-condition/">how to write error conditions</a>. It’s no use repeating
the entire content of those posts here, so I’ll make a short summary.</p>
<p><code class="highlighter-rouge">std::error_code</code> is a generic interface (in the non-programming sense of the word) to express custom
error codes. Error codes are identified by a <em>number</em> (except 0, which always means success) and a <em>domain</em> or <em>category</em>; the
latter specifies the types of errors we are dealing with, and is identified by a name.</p>
<p>At a high level, this machinery allows you to construct and use variables of type <code class="highlighter-rouge">std::error_code</code> from the enum values representing the
custom error codes you need. By subclassing <code class="highlighter-rouge">std::error_category</code> you define a custom category,
with some nice functionalities such as associating an error message to each code.</p>
<p>Moving forward, it’s also possible to express complex groupings and conditions on your set of errors by using <code class="highlighter-rouge">std::error_condition</code>.
Let’s say that all of your errors fall into the following sub-categories: <strong>internal errors</strong> (something happens inside your program) and
<strong>external errors</strong> (due to e.g. networking). You can extend the logic of your category with a function that tells you which sub-category
a given code belongs to.</p>
<p>Personal opinion: this is not trivial to use, and requires more boilerplate than usual for the initial set up of codes and categories.
However once that part is done (likely in some utility library) it is incredibly useful: good error handling is
something many applications and libraries miss (and let’s not talk about dealing with <code class="highlighter-rouge">errno</code>…). I am happy to pay the cost to
have sets of well-defined error spaces and codes to deal with.</p>
<h2 id="cppcon">CppCon</h2>
<p>CppCon 2017 is around the corner! I won’t be able to attend, but it’s on my todo list and I am currently spending time watching
talks from previous editions on YouTube. I definitely recommend keeping an eye out for this year’s talks.</p>
Sun, 13 Aug 2017 10:00:00 +0000
http://shainer.github.io/c++/2017/08/13/more-cpp-features.html
c++
Cleaning up the Matasano repositories
<p>I started the Github repository for the Matasano crypto challenges mostly for myself. It’s not software with actual users, so I didn’t pay a lot of attention to code health or general organization.</p>
<p>But as I kept adding more files, and referencing my code in posts, I realized it was in pretty bad shape and would benefit from some love. So I went back and reorganized all the files, making sure viewers (and myself) can easily figure out which Python file is a binary you run to solve a given challenge, and which is a library used to abstract common operations. The README also gives more information about the status of the work and what dependencies the applications have.</p>
<p>Therefore I officially apologize to the people who looked at it or forked it when it was in a much worse shape :-)</p>
<p>This cleanup also had an unintended consequence: I realized I didn’t actually solve all the challenges. I skipped the MD4 one on purpose, as I already explained, but I actually “left for later” another 3-4 challenges, and then completely forgot about this. Well, better late than never, so I am now going back and solving them. Challenge 20 was the first of the list, and I just submitted the solution this morning.</p>
<p>So double yay for code cleanup!</p>
Mon, 07 Aug 2017 12:00:00 +0000
http://shainer.github.io/matasano/cleanup/2017/08/07/cryptopals-code-cleanup.html
matasano, cleanup
Fish shell
<p>Among way more serious topics, there is something trivial I want to share with the world: I officially changed my default shell to <a href="https://fishshell.com">fish</a>.</p>
<p>Why? The autosuggestion features are way more user-friendly. While you are typing a command, you can see a suggestion for the rest of the command, options or files to the right of your cursor, based on your recent history. Multiple suggestions are easy to browse and select should the first one not be the right one for you. Moreover, completions are derived from the manpages installed on your system, so you get the most useful ones out of the box; by looking at the repository I believe it is also possible to add more completions if the manpage does not contain everything. Default colouring is nice too.</p>
<p>Fish comes with its own scripting language; I don’t plan to write any Fish script right now, so I haven’t looked at that at all.</p>
<p>For Chakra Linux users, I packaged fish in [desktop], and I plan to keep an eye on new releases to update it regularly.</p>
Sun, 16 Jul 2017 16:20:00 +0000
http://shainer.github.io/linux/2017/07/16/fish-shell.html
linux
Discrete logarithms: a guide
<p>I am working through the second challenge in the 8th cryptopals set, and I am already learning something new. Let’s talk about discrete logarithms, what they are and how to compute them.</p>
<p>When preparing a Diffie Hellman key exchange, some parameters must be chosen first:</p>
<ul>
<li>P: a large prime; the nonzero integers modulo P form a finite <strong>cyclic group</strong> (of order P - 1) under multiplication.</li>
<li>G: the <strong>generator</strong> of the cyclic group.</li>
</ul>
<p>G is a generator if for every element of the group, there is an x such that <code class="highlighter-rouge">G^x mod P</code> gives that element of the group.</p>
<p>The shared secret of Diffie Hellman is computed using <strong>modular exponentiation</strong> of the generator. There are algorithms to compute modular exponentiation efficiently, even for numbers that have hundreds of bits. However, the security of Diffie Hellman (among others) depends on the fact that the inverse operation, the <strong>discrete logarithm</strong>, cannot be computed efficiently in the general case, if the parameters are chosen with the right properties.</p>
<p>However, algorithms that are able to do better than brute force still exist. Let’s talk about two of them: <strong>baby-step, giant-step</strong> and <strong>Pollard’s kangaroo</strong>.</p>
<h2 id="baby-step-giant-step">Baby-step, giant-step</h2>
<p>Recall the formulation of the problem: we want to find x such that</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>G^x mod P = B
</code></pre></div></div>
<p>We can rewrite x as</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = im + j
</code></pre></div></div>
<p>where m is sqrt(P) rounded up, and i and j are two (unknown) coefficients between 0 and m. So applying some properties of exponents the formulation becomes:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>B(G^-m)^i = G^j
</code></pre></div></div>
<p>All operations happen inside the group, so modulo P. Now, we precompute <code class="highlighter-rouge">G^j mod P</code> for all values of j up to m. We store them in a data structure, for easy lookup later; the data structure maps these exponentials with the value of j. A natural choice here is a hash map.</p>
<p>One of the values of j will be the one that makes the above equation true. But we need to find i too. So we brute force it: for all possible i between 0 and m, we compute <code class="highlighter-rouge">B(G^-m)^i mod P</code>. If the result is in the data structure we built initially, we retrieve the corresponding j. We have found both coefficients, and therefore x.</p>
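<p>The whole procedure fits in a few lines of Python. This is only a minimal sketch (the function name is made up, and it is not the Rust implementation linked further down); it uses <code class="highlighter-rouge">pow</code> with a negative exponent to get <code class="highlighter-rouge">G^-m</code>, which needs Python 3.8 or later, and a dictionary as the hash map.</p>

```python
from math import isqrt


def baby_step_giant_step(g, b, p):
    """Return x with g**x % p == b, or None if no such x is found."""
    m = isqrt(p) + 1  # at least sqrt(p), so x = i*m + j covers 0..p-1

    # Baby steps: remember j for every value of g^j mod p.
    table = {}
    value = 1
    for j in range(m):
        table.setdefault(value, j)
        value = value * g % p

    # Giant steps: multiply b by g^-m until we land in the table.
    factor = pow(g, -m, p)  # modular inverse power, Python 3.8+
    gamma = b % p
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * factor % p
    return None


p, g = 2**31 - 1, 7
target = pow(g, 123456789, p)
x = baby_step_giant_step(g, target, p)
print(pow(g, x, p) == target)  # True
```

<p>For this 31-bit prime the table holds about 46000 entries; the memory cost grows with the square root of P, which is what makes the approach impractical for real Diffie Hellman parameters.</p>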
<h3 id="complexity">Complexity</h3>
<p>This is still a brute force algorithm; however, instead of trying all possible x between 0 and P (the trivial brute force), we only try up to m, the square root of P, for each of the two coefficients. This reduces the worst-case number of operations from about P to about sqrt(P). That is still exponential in the number of bits of P, so in terms of complexity analysis it does not change much. We also “pay” for this speedup by using more memory to store m key-value pairs.</p>
<h3 id="implementation">Implementation</h3>
<p>I did mention I wanted some practice with Rust, so <a href="https://github.com/shainer/baby-step">here is a Rust implementation of baby-step giant-step algorithm</a>. The code contains efficient implementations of modular inverse (to compute <code class="highlighter-rouge">G^-m</code>), and of modular exponentiation. I have used both several times in the Matasano solutions, but here I took the time to examine, understand and explain them in the comments.</p>
<h2 id="pollards-kangaroo-algorithm">Pollard’s kangaroo algorithm</h2>
<p>This algorithm is used when the discrete logarithm is known to lie in a subrange [a, b] of the group. Of course in the worst case you can set the subrange to [0, P-1], i.e. all elements in the group, but in that case more efficient alternatives exist.</p>
<p>The basic idea is to generate two pseudorandom sequences of elements in the range, and then look for collisions. The first sequence starts from an element of known discrete logarithm, the second from the element whose logarithm we want to find (B).</p>
<ul>
<li>Define a deterministic function F from elements of the input group to S, a set of integers.</li>
<li>Choose an integer N and compute a sequence of N integers like this:</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>y0 = G^b mod P
y := y * (G^F(y) mod P) mod P
</code></pre></div></div>
<p>In another variable, usually called the <em>distance</em> D, you store the sum of all the F(y) you computed for the sequence. Also note that for the final element of the sequence, this property holds:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>yFin = G^(b+D) mod P
</code></pre></div></div>
<p>This sequence is the <strong>tame Kangaroo</strong>. It starts from y0, an element whose discrete logarithm is b, the end of our range. From there, we take N jumps to other elements.</p>
<p>Now we define the <strong>wild kangaroo</strong>. The new sequence has the same definition, only we start from B:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>y0 = B
y := y * (G^F(y) mod P) mod P
</code></pre></div></div>
<p>Again we keep track of the “distance” travelled in D’. If the next element of the sequence collides with the final element of the tame kangaroo (yFin), then the discrete logarithm is equal to</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>b + D - D'
</code></pre></div></div>
<p>We stop when we have travelled more than <code class="highlighter-rouge">b - a + D</code>. This algorithm does not guarantee that a solution is always found: it is possible to exceed the limit without colliding with the tame kangaroo, even if a discrete logarithm exists.</p>
<p>F controls the size of the jumps you make at each iteration. Bigger jumps give you a better computation time for large numbers, but also increase the probability that you won’t collide. To reduce the risk, N is chosen so that larger outputs of F correspond to a larger N: this causes the first sequence to have more elements.</p>
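<p>Putting the pieces together, the two kangaroos can be sketched in Python as follows. This is a minimal sketch and not the code from my challenge solution: the function name, the choice of F (a power of two selected by <code class="highlighter-rouge">y mod k</code>) and the choice of N (about four times the average jump) are assumptions made for the example.</p>

```python
def pollard_kangaroo(g, h, p, a, b, k=11):
    """Look for x in [a, b] with g**x % p == h. May return None."""

    def f(y):
        # Assumed jump function: a power of two picked by y mod k.
        return 1 << (y % k)

    # Tame kangaroo: start at g^b and take n jumps, tracking the distance.
    n = 4 * ((1 << k) - 1) // k  # about four times the mean jump size
    tame, d_tame = pow(g, b, p), 0
    for _ in range(n):
        jump = f(tame)
        d_tame += jump
        tame = tame * pow(g, jump, p) % p

    # Wild kangaroo: start at h and jump until it lands on the tame one's
    # final position, or until it has travelled too far.
    wild, d_wild = h % p, 0
    while d_wild < b - a + d_tame:
        jump = f(wild)
        d_wild += jump
        wild = wild * pow(g, jump, p) % p
        if wild == tame:
            return b + d_tame - d_wild
    return None


p, g = 2**31 - 1, 7
h = pow(g, 654321, p)
print(pollard_kangaroo(g, h, p, 0, 2**20))
```

<p>Since the algorithm can miss, a caller would typically retry with a different k (and therefore a different F and N) when the result is <code class="highlighter-rouge">None</code>.</p>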
<h3 id="implementation-1">Implementation</h3>
<p>None for now. Or at least not public: I baked a Python implementation in the solution of the latest challenge I am working on. When I have time I am going to take it out, translate it to Rust, and make it public, but after spending time on the baby-step giant-step algorithm, I don’t feel like it :-)</p>
<h2 id="others">Others</h2>
<p>There are other algorithms that are better than brute force, but none of them run in polynomial time. I have not studied or implemented them so I am not going to talk about them, but <a href="https://en.wikipedia.org/wiki/Discrete_logarithm#Algorithms">Wikipedia has a list</a> if you want to learn more!</p>
Sat, 27 May 2017 17:00:00 +0000
http://shainer.github.io/math/crypto/2017/05/27/discrete-logarithms.html
math, crypto