Shainer's Site
Random stuff I find interesting or I am working on. At some point I will move all the content from my old website, giudoku.sourceforge.net, here.
http://shainer.github.io/
Mon, 25 Dec 2017 21:00:12 +0000
Jekyll v3.6.2
Advent of Code and Rust
<p>The <a href="http://adventofcode.com/2017">Advent of Code</a>, which I did not know
was a thing until this year, is an excellent way to practice a new
language. The puzzles are not so easy that you get bored, but not so hard
that working in a language you are not very familiar with becomes frustrating.
So I had finally found the perfect excuse to practice Rust. And here is
what I learned.</p>
<h2 id="borrow-checking-and-all-that-jazz">Borrow checking and all that jazz</h2>
<p>The Rust Book warns you that fighting with the borrow checker is a common
occurrence for every beginner. What happens is that you do something that is
perfectly normal and logical in any other language, and the compiler gives you
obscure errors, plus suggestions that might or might not work. However, once you
learn the basic concepts, everything falls into place.</p>
<p>Let’s say we have an object O of a non-primitive type (more precisely, a type that does not implement <code class="highlighter-rouge">Copy</code>). If we assign O to another
variable, or pass it by value as a parameter to a function, the object is <strong>moved</strong>.
This is similar to what happens in C++ when you call <code class="highlighter-rouge">std::move</code>: the name O
no longer refers to the memory region it referred to before. Ownership of
the memory region has been passed to the new name. Any subsequent use of O
in the code will cause the Rust compiler to report a (very detailed) error.</p>
<p>How do you get around that? One way is to make a copy of the object, if that
is what you wanted.</p>
<p>But if you want to act on the same structure, you should <strong>borrow</strong> the object
instead of getting full ownership. This is done by acquiring a reference to
the object. If you need to modify the object in the process,
the reference must be mutable, and this is called a <strong>mutable borrow</strong>. There
are two main rules about borrows:</p>
<ul>
<li>they last until the current scope ends;</li>
<li>if you borrow mutably, you cannot borrow the same object again in the scope; conversely
you can have as many non-mutable borrows as you want.</li>
</ul>
<p>The latter implies that iterating over a data structure and trying to modify it in
the same loop is not going to end well: the iteration causes a non-mutable borrow
of the structure (to access its elements), so modifying the structure violates
the rules (it would require a mutable borrow).</p>
<p>The first rule however allows you some freedom. Consider this code snippet:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
    <span class="k">let</span> <span class="n">count</span> <span class="o">=</span> <span class="n">map</span><span class="nf">.entry</span><span class="p">(</span><span class="n">key1</span><span class="p">)</span><span class="nf">.or_insert</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="o">*</span><span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">{</span>
    <span class="k">let</span> <span class="n">count</span> <span class="o">=</span> <span class="n">map</span><span class="nf">.entry</span><span class="p">(</span><span class="n">key2</span><span class="p">)</span><span class="nf">.or_insert</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
    <span class="o">*</span><span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p><code class="highlighter-rouge">map</code> is an object of type <code class="highlighter-rouge">std::collections::HashMap</code>. Given two map keys,
<code class="highlighter-rouge">key1</code> and <code class="highlighter-rouge">key2</code>, I want to increment their values by 1 if they exist, or
insert them with value 1. The <code class="highlighter-rouge">entry</code> method is convenient, because
I can look up a pair, insert it with a default value if it’s not there, and
then modify the value. But for this I need a mutable borrow of <code class="highlighter-rouge">map</code>. This
means that if I put everything in the same scope, Rust is going to complain that
I am trying to do two mutable borrows of <code class="highlighter-rouge">map</code>.</p>
<p>I don’t really care about both instructions being in the same scope:
I am okay with <code class="highlighter-rouge">count</code> only living until I perform the increment. So I create
scopes so that each borrow of <code class="highlighter-rouge">map</code> lasts only as long as it is needed.</p>
<p>How about lifetimes, the other much-whispered Rust feature? I have only
used them once, and this post is getting too long, so perhaps another time.</p>
<h2 id="matching">Matching</h2>
<p>You can read about matching in any good Rust tutorial. It’s cool: having
used it in Haskell, I was pleasantly surprised to see it adopted more widely.</p>
<h2 id="global-variables">Global variables</h2>
<p>You can use mutable global variables (<code class="highlighter-rouge">static mut</code>), but every usage (read or write) needs to be
wrapped in an <code class="highlighter-rouge">unsafe</code> block, which is a generic way of telling the compiler
to relax a few of its safety checks.</p>
<p>In a nutshell, try not to use global variables. I approve of that sentiment.</p>
<h2 id="missing-things">Missing things</h2>
<p>There are a few operations that I take for granted that are not yet available in
the stable Rust build:</p>
<ul>
<li>In a range iteration, you cannot control the step; it’s always 1.</li>
<li>Removing an element from a data structure given an equal element (you can
use <code class="highlighter-rouge">retain</code> though).</li>
<li>Finding the index of an element in a structure.</li>
</ul>
<p>I have been told all of this is available in the experimental builds though, so no big deal.</p>
<h2 id="casting">Casting</h2>
<p>I like how easy casting is. However, in Rust there is no implicit type
conversion. This means that if you want to write a mathematical expression
involving one f64 and a bunch of i64, with the result as f64, it does not
look so nice: every i64 needs to be explicitly converted to f64 when it’s used.</p>
<h2 id="final-opinions">Final opinions</h2>
<p>Well I like Rust. I imagine the move semantics and all the extra safety checks
can be difficult to master if you are a beginner, just learning how to do
for loops and the like. However, if you have used e.g. C++ and have run into
some of the issues Rust is supposed to prevent, you are more likely to understand
why rules are made in a certain way.</p>
Mon, 25 Dec 2017 21:00:00 +0000
http://shainer.github.io/rust/2017/12/25/rust-and-advent-of-code.html
rust
About reverting Git commits
<p>Today I created a new repository on GitHub and I wanted to upload some local
code. However I messed it up and created the first commit locally too soon,
so I needed to revert it to fix things.</p>
<p>This is when I realized that the recommended way to revert a commit, in the
case where we don’t want to also undo the corresponding changes, is:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git reset <old commit>
</code></pre></div></div>
<p>This tells git to move HEAD (and the current branch) back to the given commit; the
changes introduced by the commits that came after it stay in the working tree as a local uncommitted diff.</p>
<p>But what if the commit you want to revert is the first one? You have no previous
commit hash to revert to.</p>
<p>The way I found is to delete the branch you implicitly created with your first
commit:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git update-ref -d HEAD
</code></pre></div></div>
<p>And all is good!</p>
Sun, 03 Dec 2017 18:00:00 +0000
http://shainer.github.io/git/2017/12/03/git-revert.html
git
The return of Coppersmith's attack
<p>It was a close call between this and a post about elliptic curves. But in the
end I decided a post was going to help me summarize all I learned about
Coppersmith’s attack in the past days. So here we go!</p>
<h2 id="whats-up">What’s up?</h2>
<p>A group of researchers has found a vulnerability in how RSA keypairs are
generated in widely used cryptographic libraries. A lot of these libraries are
deployed on hardware that generates keys for smartcards or similar devices. This
vulnerability is easy to recognize given a few keys generated by the affected
software, and can be exploited to retrieve the private key of the pair in a
feasible computational time. Not a nice discovery :) but let’s describe how
this is accomplished.</p>
<h2 id="the-coppersmiths-attack">Coppersmith’s attack</h2>
<p>The first building block of this vulnerability is a well-known “total break”
attack against RSA. Total break means that we are able to recover the
private key of the pair, therefore we can then decrypt any ciphertext we
intercept.</p>
<p>With RSA, a ciphertext is computed as:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>c = m^e mod N
</code></pre></div></div>
<p>The attacker wants to find m; what if they knew a part of m? What if</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>m = m0 + x0
</code></pre></div></div>
<p>with m0 known for some reason, and x0 the new unknown to recover. There are
several ways to translate this into an equation that looks like:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>f(x) = c - (m0 + x)^e mod N
</code></pre></div></div>
<p>Now there are efficient algorithms to find the roots of a polynomial with
integer coefficients. But here our polynomial is defined over “mod N”, and there are no
simple algorithms for this case. What helps us is that the root we are looking
for is small: let’s call X the upper bound on our root x0, which will be our solution.</p>
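<p>To make the setup concrete, here is a toy sketch in Python. The numbers (a textbook-sized modulus N = 3233, e = 17, and the split of the message into m0 and x0) are my own assumptions for illustration. The root is found by plain brute force over the bound X, which is not Coppersmith’s method: the sketch only shows what “a small root of f(x) mod N” means.</p>

```python
# Toy parameters, chosen only for illustration (far too small to be secure).
N = 61 * 53          # 3233
e = 17               # public exponent, coprime with (61 - 1) * (53 - 1)
m0 = 1200            # known part of the message (assumed)
x0 = 34              # unknown part the attacker wants to recover (assumed)
X = 100              # assumed upper bound on the unknown part

c = pow(m0 + x0, e, N)   # the intercepted ciphertext


def f(x):
    """f(x) = c - (m0 + x)^e mod N; x0 is a root of f modulo N."""
    return (c - pow(m0 + x, e, N)) % N


# Brute force over [0, X): feasible only because X is tiny here.
roots = [x for x in range(X) if f(x) == 0]
print(roots)   # [34]
```

<p>With realistic sizes the range of candidates is astronomically large, which is why a smarter way of finding the small root is needed.</p>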
<p>What we need to do is to build a second polynomial <code class="highlighter-rouge">g(x)</code> that shares the root x0 with
f(x) but is defined over the integers Z. To do this we use Howgrave-Graham’s
theorem, which states that if <code class="highlighter-rouge">g(x0) = 0 mod N</code> (with <code class="highlighter-rouge">|x0| <= X</code>) and
<code class="highlighter-rouge">||g(xX)|| < N/sqrt(n)</code> then <code class="highlighter-rouge">g(x0) = 0</code> holds over the integers
Z. In the second condition, n is the number of monomials that compose g(x), and the norm is the Euclidean norm of the coefficient vector of g(xX).</p>
<p>Now we need to find such a g(x). Here is where lattices and the LLL
algorithm are useful. Let’s describe what they are briefly.</p>
<h3 id="lattices-and-lll">Lattices and LLL</h3>
<p>If I take two vectors in 2D space, and neither is a multiple of the
other, then such vectors can <em>generate</em> the whole space by computing
different linear combinations. Now let’s say that I am only allowed to compute
linear combinations with integer coefficients; instead of the whole 2D space
I can only generate a set of discrete points in the space: this set of points
is called a <em>lattice</em>, and the two starting vectors are a <em>basis</em> of
the lattice.</p>
<p>The LLL algorithm takes a basis of a lattice and returns a <em>reduced</em> basis of
the same lattice, made of short and nearly orthogonal vectors (short, but not
necessarily the shortest possible). In particular there is a clear upper bound
on the length of the first vector of the new basis. Exactly what we need!</p>
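<p>LLL itself is too long to show here, but its two-dimensional ancestor, the Lagrange-Gauss reduction, fits in a few lines and gives the flavour: keep subtracting integer multiples of the shorter vector from the longer one. This is a sketch of that simpler algorithm, not of LLL; the function names and the example basis are my own choices.</p>

```python
from fractions import Fraction


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def lagrange_gauss(u, v):
    """Reduce a basis (u, v) of a 2D integer lattice to two short vectors."""
    while True:
        if dot(v, v) < dot(u, u):
            u, v = v, u                      # keep u as the shorter vector
        # Nearest integer to the projection coefficient of v onto u.
        m = round(Fraction(dot(u, v), dot(u, u)))
        if m == 0:
            return u, v                      # nothing left to subtract
        v = (v[0] - m * u[0], v[1] - m * u[1])


# (5, 3) and (7, 4) have determinant -1, so they generate the whole grid Z^2.
short_u, short_v = lagrange_gauss((5, 3), (7, 4))
print(short_u, short_v)   # (-1, 0) (0, 1)
```

<p>The two bases describe exactly the same set of points; the reduced one simply makes it obvious that the lattice contains vectors of length 1.</p>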
<h3 id="putting-it-all-together">Putting it all together</h3>
<p>The final part is easy: instead of generating one f(x) I generate multiple
polynomials that all share the root x0 modulo N, and I use their coefficient
vectors as the basis for a lattice. I apply LLL on the result and
then take the first vector of the new basis (and its known upper bound) as
my g(x). Howgrave-Graham’s theorem allows me to treat this g(x), still defined
in mod N, as a polynomial defined over the integers: since its coefficients
are small enough, x0 is a root there too.</p>
<p>There are a few more caveats on how the starting polynomial must be defined,
but this is the gist of the attack; once I have g(x) over the integers,
finding the roots is a solved problem. Mission accomplished!</p>
<h2 id="the-new-attack">The new attack</h2>
<p>So why did this attack deserve renewed attention recently? The researchers
performed statistical analysis on the RSA primes generated by common cryptographic
libraries and found some patterns that should not be there. Generating the very large
primes required for RSA to be secure, especially if you need to be fast, is
tricky, and there are a lot of conditions that you need to watch out for to
avoid accidentally making your pair easier to attack even without Coppersmith.</p>
<p>Such vulnerable libraries therefore use formulas to construct their primes. In particular the libraries
examined by the paper set:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>P (or Q) = k * M + (65537^a mod M)
</code></pre></div></div>
<p>where k and a are the only internal parameters. M is set once for all
the generations of pairs of a given bit size and is public. M is also quite large,
which means that k and a tend to be small. Therefore the resulting P has very
low entropy: two different P values will only differ by a relatively small number
of bits (much smaller than the keysize), and the space of possible primes the
library can generate becomes smaller.</p>
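<p>The structure is easy to see with toy numbers. In the sketch below M is the product of the first six primes (an assumption of mine; the real M is far larger) and the primes are built with the formula above. Because P mod M is always a power of 65537, so is N mod M for N = P * Q, and as far as I understand this is the kind of fingerprint that makes affected keys recognizable.</p>

```python
M = 2 * 3 * 5 * 7 * 11 * 13   # 30030, a toy stand-in for the real M
G = 65537


def is_prime(n):
    """Trial division, adequate for the tiny numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# Every value that 65537^a mod M can take: the only possible values of P mod M.
subgroup = set()
r = 1
while r not in subgroup:
    subgroup.add(r)
    r = (r * G) % M

# Primes of the special form P = k * M + (65537^a mod M), for small k.
structured_primes = [
    k * M + res
    for k in range(1, 20)
    for res in sorted(subgroup)
    if is_prime(k * M + res)
]

p, q = structured_primes[0], structured_primes[1]
toy_modulus = p * q
print(len(subgroup), toy_modulus % M in subgroup)
```

<p>An honestly random prime could land on any of the 5760 residues coprime with this toy M; the structured ones only ever hit the handful of residues in <code class="highlighter-rouge">subgroup</code>.</p>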
<p>Now our “polynomial” has two unknowns, k and a, so the idea is to iterate through values of
one of them and use Coppersmith’s method to compute the other. The researchers
tried setting a and computing k, but the required number of attempts was
infeasibly large in the average case. So they got creative, by transforming the
equation so as to use M’, a smaller divisor of M, instead of M
itself. The way M is chosen makes sure such divisors always exist: it is actually
computed as the product of several small primes up to a given number. This makes
finding the corresponding k’ and a’ much easier. The optimal M’ value in terms of
speed of attack, for every M supported by the library, was found by local
brute force search plus some heuristics; note that they only need to do this
once for every possible key size.</p>
<h2 id="conclusion">Conclusion</h2>
<p>No matter how many challenges I solve there’s always more to cryptography than
meets the eye, even for a relatively simple and well-known algorithm such as RSA.
One shortcut or simple failure in checking conditions can result in pretty bad
failures down the line; and it takes a beginner like me days of careful study
and scribbling on paper to even understand how such failures materialize.</p>
<h2 id="references">References</h2>
<ul>
<li><a href="https://crocs.fi.muni.cz/public/papers/rsa_ccs17">Overview of the attack and the paper</a>.</li>
<li><a href="https://github.com/mimoo/RSA-and-LLL-attacks">Implementation for SageMath by David Wong</a>.</li>
<li><a href="https://www.youtube.com/watch?v=3cicTG3zeVQ">David Wong’s excellent explanatory video</a>.</li>
<li><a href="https://en.wikipedia.org/wiki/Coppersmith_method">Coppersmith’s method</a>.</li>
</ul>
Sun, 03 Dec 2017 10:00:00 +0000
http://shainer.github.io/crypto/math/2017/12/03/the-return-of-coppersmith.html
crypto, math
The Chinese remainder theorem (with algorithm)
<p>Let me preface by saying that you could potentially write a dozen blog posts with all the
implications and mathematical connections that I saw involving the <a href="https://en.wikipedia.org/wiki/Chinese_remainder_theorem">Chinese remainder theorem</a>.
That being said, I am going to focus on a basic description and how to implement it.</p>
<p>Crypto enthusiasts will have understood that this post comes directly from set 8
of the crypto challenges. I believe all things considered I spent more time producing
a viable implementation of this theorem than on the rest of the challenge combined.</p>
<h2 id="the-theorem">The theorem</h2>
<p>Let me write the following set of k equations:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = a1 (mod n1)
...
x = ak (mod nk)
</code></pre></div></div>
<p>This is equivalent to saying that <code class="highlighter-rouge">x mod ni = ai</code> (for i=1…k). The notation above is
common in group theory, where you can define the group of integers modulo some number
n and then you state equivalences (or <em>congruences</em>) within that group.</p>
<p>So x is the unknown; instead of knowing x, we know the remainder of the division
of x by each of a set of numbers. If the numbers ni are pairwise coprime (i.e. each one
is coprime with all the others) then the equations have exactly one solution
modulo N, with N equal to the product of all the ni.</p>
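<p>For small moduli the statement can be checked by brute force. The sketch below uses the classic example with moduli 3, 5 and 7 (my choice of numbers) and simply tries every x below N.</p>

```python
moduli = [3, 5, 7]        # pairwise coprime
remainders = [2, 3, 2]

N = 1
for n in moduli:
    N *= n                # N = 105

# Try every candidate in [0, N) and keep those satisfying all the equations.
solutions = [
    x for x in range(N)
    if all(x % n == a for n, a in zip(moduli, remainders))
]
print(solutions)          # [23]
```

<p>Exactly one value survives, as the theorem promises; any other solution differs from it by a multiple of 105.</p>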
<p>For some notes on the history and the reason it was named the Chinese remainder theorem,
refer to Wikipedia (or a dozen other math websites); it is quite interesting :)</p>
<h2 id="proof">Proof</h2>
<p>There are many ways to prove this theorem. Most of them are directly related to
the algorithms we are going to present below to compute the solution. I picked
the proof that I found easiest to understand; its construction is the one employed
in the Gauss algorithm.</p>
<p>Let’s define a slightly simpler problem, where we have only two equations.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = a1 (mod n1)
x = a2 (mod n2)
</code></pre></div></div>
<p>As above, let’s define <code class="highlighter-rouge">N = n1 * n2</code>.</p>
<p>Let’s define <code class="highlighter-rouge">p = n1^-1 (mod n2)</code> and <code class="highlighter-rouge">q = n2^-1 (mod n1)</code>. This is the
operation called <strong>modular inverse</strong>, where we find the inverse of a number in
the group of integers modulo another number. If I say that p and n1 are inverses in mod n2, this
means that <code class="highlighter-rouge">p * n1 = 1 (mod n2)</code>. Such an inverse only exists when n1 and
n2 are coprime, and here they are by assumption.</p>
<p>Now I claim that a solution y to the set of equations can be expressed as:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>y = a1 * q * n2 + a2 * p * n1 (mod N)
</code></pre></div></div>
<p>this is a valid solution because <code class="highlighter-rouge">y = a1 * q * n2 = a1 (mod n1)</code> and
<code class="highlighter-rouge">y = a2 * p * n1 = a2 (mod n2)</code>. This follows from the definition of the
modular inverse telling me that <code class="highlighter-rouge">p * n1 = 1 (mod n2)</code> and
<code class="highlighter-rouge">q * n2 = 1 (mod n1)</code>.</p>
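<p>Here is the two-equation construction with concrete numbers (3 and 5 as moduli, my own choice). The sketch relies on Python’s built-in <code class="highlighter-rouge">pow(x, -1, m)</code> for the modular inverse, which needs Python 3.8 or later.</p>

```python
n1, n2 = 3, 5
a1, a2 = 2, 3
N = n1 * n2

p = pow(n1, -1, n2)   # inverse of n1 modulo n2 (Python 3.8+)
q = pow(n2, -1, n1)   # inverse of n2 modulo n1

# The a1 term vanishes modulo n2 and the a2 term vanishes modulo n1.
y = (a1 * q * n2 + a2 * p * n1) % N
print(y, y % n1, y % n2)   # 8 2 3
```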
<p>This is easily extended to a generic number of equations, where the final
construction of y becomes:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>y = sum(ai * (N / ni) * invmod(N / ni, ni)) (mod N), for i=1...k
</code></pre></div></div>
<p>When building p and q before, we used only n1 or only n2; what that generalizes
to is the product of all moduli excluding the “current one”: the <code class="highlighter-rouge">N / ni</code>
above. The rest is unchanged.</p>
<p>So we have a solution: the next step is to prove it is the unique solution. Let’s
assume a second solution z exists for the same set of equations. Then <code class="highlighter-rouge">z = a1 (mod n1)</code>,
which implies that <code class="highlighter-rouge">z - y</code> is a multiple of n1, since the remainder of their division
by n1 is the same number. By the same reasoning, <code class="highlighter-rouge">z - y</code> is also a multiple of n2.
But since n1 and n2 are coprime, it must also be a multiple of N = n1 * n2, or as it
is often written:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>z = y (mod N)
</code></pre></div></div>
<p>z must be the same as y in the mod N group.</p>
<h2 id="algorithm-1-gauss-algorithm">Algorithm 1: Gauss algorithm</h2>
<p>This is quite easy: it is a direct translation to code of the construction
explained above. The n and a parameters are lists with all the related factors
in order, and N is the product of the moduli.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">ChineseRemainderGauss</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">N</span><span class="p">,</span> <span class="n">a</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">n</span><span class="p">)):</span>
        <span class="n">ai</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">ni</span> <span class="o">=</span> <span class="n">n</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">bi</span> <span class="o">=</span> <span class="n">N</span> <span class="o">//</span> <span class="n">ni</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">ai</span> <span class="o">*</span> <span class="n">bi</span> <span class="o">*</span> <span class="n">invmod</span><span class="p">(</span><span class="n">bi</span><span class="p">,</span> <span class="n">ni</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span> <span class="o">%</span> <span class="n">N</span>
</code></pre></div></div>
<p>The good thing about this algorithm is that the result is guaranteed to be
positive, given bi and ni both positive. This does not apply to the next
implementation.</p>
<p>For an implementation of <code class="highlighter-rouge">invmod</code> (finding the modular inverse), see the
“Modular inverse” section further down.</p>
<h2 id="algorithm-2-euclid">Algorithm 2: Euclid</h2>
<p>This is the <em>direct construction</em> procedure described by <a href="https://en.wikipedia.org/wiki/Chinese_remainder_theorem#Existence_.28direct_construction.29">Wikipedia</a>.</p>
<p>The extended Euclidean algorithm is used to find two coefficients ri and si such
that <code class="highlighter-rouge">ri * ni + si * (N / ni) = gcd(ni, N / ni) = 1</code>.</p>
<p>Then x is computed the following way:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = sum(ai * si * (N / ni)) for i=1...k
</code></pre></div></div>
<p>Translated into code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">ChineseRemainderEuclid</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">N</span><span class="p">,</span> <span class="n">a</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">n</span><span class="p">)):</span>
        <span class="n">ai</span> <span class="o">=</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">ni</span> <span class="o">=</span> <span class="n">n</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
        <span class="n">_</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">si</span> <span class="o">=</span> <span class="n">ExtendedEuclid</span><span class="p">(</span><span class="n">ni</span><span class="p">,</span> <span class="n">N</span> <span class="o">//</span> <span class="n">ni</span><span class="p">)</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="n">ai</span> <span class="o">*</span> <span class="n">si</span> <span class="o">*</span> <span class="p">(</span><span class="n">N</span> <span class="o">//</span> <span class="n">ni</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span> <span class="o">%</span> <span class="n">N</span> <span class="c"># % maps a negative sum into [0, N)</span>
</code></pre></div></div>
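<p>As you can see, for my specific application I wanted only positive results; but
the si coefficients can be negative in a lot of cases, making the final sum
negative. What do I do in that case? The last step replaces the sum with the least
non-negative number congruent to it modulo N: add N (as many times as needed) until
the value falls between 0 and N - 1. Simply flipping the sign would give a different
residue. In Python the <code class="highlighter-rouge">%</code> operator with a positive modulus already does this.</p>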
<h3 id="extended-euclidean-algorithm">Extended Euclidean algorithm</h3>
<p>As explained above, the algorithm takes two numbers, x and y, and returns two
coefficients a and b such that:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>a * x + b * y = gcd(x, y)
</code></pre></div></div>
<p>The implementation returns both the coefficients and the GCD itself.</p>
<p>Now if I take two positive integers x and y, I know I can express them as</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = qy + r
</code></pre></div></div>
<p>where q is the <em>quotient</em> of the division (i.e. <code class="highlighter-rouge">q = x // y</code> where // denotes
the integer division) and r is the <em>remainder</em> and is always strictly smaller
than y. If x is a multiple of y, of course r is going to be zero.</p>
<p>The GCD of two integers can be found by repeating this procedure until the
remainder is 0; more specifically:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = q1 * y + r1
y = q2 * r1 + r2
r1 = q3 * r2 + r3
...
</code></pre></div></div>
<p>The last non-zero remainder is the GCD. Let’s see this with an example:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcd(102, 38)
102 = 2*38 + 26
38 = 1*26 + 12
26 = 2*12 + 2
12 = 6*2 + 0
</code></pre></div></div>
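<p>The same sequence of divisions can be reproduced with a short loop. This is a minimal sketch; the function name <code class="highlighter-rouge">gcd_steps</code> is mine.</p>

```python
def gcd_steps(x, y):
    """Return gcd(x, y) together with every division x = q*y + r performed."""
    steps = []
    while y != 0:
        q, r = divmod(x, y)
        steps.append((x, q, y, r))
        x, y = y, r          # the divisor becomes the new dividend
    return x, steps


g, steps = gcd_steps(102, 38)
for x, q, y, r in steps:
    print(f"{x} = {q}*{y} + {r}")
print("gcd:", g)             # gcd: 2
```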
<p>so the GCD is 2. Now to find the coefficients we work backwards from the
second-to-last division, expressing the new remainder in terms of the other
parts:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2 = 26 - 2*12
2 = 26 - 2*(38 - 1*26) = 3*26 - 2*38
2 = 3*(102 - 2*38) - 2*38
2 = 3*102 - 8*38
</code></pre></div></div>
<p>3 and -8 are the coefficients in the Bezout identity. To compute them in
practice we do not work backward, but simply store them as we go, as they
can be derived from the main division equation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">ExtendedEuclid</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span><span class="p">,</span> <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span>
    <span class="k">while</span> <span class="n">y</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">q</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span> <span class="o">//</span> <span class="n">y</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span> <span class="o">%</span> <span class="n">y</span>
        <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span> <span class="o">=</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x0</span> <span class="o">-</span> <span class="n">q</span> <span class="o">*</span> <span class="n">x1</span>
        <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span> <span class="o">=</span> <span class="n">y1</span><span class="p">,</span> <span class="n">y0</span> <span class="o">-</span> <span class="n">q</span> <span class="o">*</span> <span class="n">y1</span>
    <span class="k">return</span> <span class="n">x</span><span class="p">,</span> <span class="n">x0</span><span class="p">,</span> <span class="n">y0</span> <span class="c"># gcd and the two coefficients</span>
</code></pre></div></div>
<h3 id="modular-inverse">Modular inverse</h3>
<p>Ok so let’s suppose I want to solve for x in</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>17*x = 1 (mod 43)
</code></pre></div></div>
<p>my unknown x is the <strong>modular inverse</strong> of 17 in mod 43. First we need to
verify that gcd(17, 43) is 1, otherwise the inverse does not exist. Once we
have done that, we compute the two Bezout coefficients as shown above. If we
work that out manually by retracing all the divisions, we get:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Forward:
43 = 17*2 + 9
17 = 9*1 + 8
9 = 8*1 + 1 # <-- my GCD is here, so it is 1
Backward:
1 = 9 - 8*1
8 = 17 - 9*1
9 = 43 - 17*2
Replacing the first "backward" equation with everything else:
1 = 43 - 17*2 - 17 + 43 - 17*2 = 2*43 - 17*5
</code></pre></div></div>
<p>So we have expressed this in terms of <code class="highlighter-rouge">a*x + b*y = gcd(x, y)</code>: 2 and -5
are our Bezout coefficients. Let’s rewrite the last equation in mod 43.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2*43 - 5*17 = 1 (mod 43)
</code></pre></div></div>
<p>But 2*43 is a multiple of 43 so it is irrelevant here; this leaves us with:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-5*17 = 1 (mod 43)
</code></pre></div></div>
<p>By comparing this with the starting equation in the unknown x, -5 is the inverse
we were looking for. However we need a positive result: in this case we add the
modulus, 43, to -5, which gives an equivalent number in the same group.
This yields 38, and indeed 17*38 = 646 = 15*43 + 1.</p>
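<p>A quick check of the arithmetic: the positive representative of -5 modulo 43 is -5 + 43 = 38, which is also what Python’s <code class="highlighter-rouge">%</code> operator returns for a positive modulus.</p>

```python
inverse = -5 % 43             # Python's % yields a value in [0, 43) here
print(inverse)                # 38
print((17 * inverse) % 43)    # 1
```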
<p>The code is quite simple:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">invmod</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">m</span><span class="p">):</span>
    <span class="n">g</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">ExtendedEuclid</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">m</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">g</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">'modular inverse does not exist'</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">x</span> <span class="o">%</span> <span class="n">m</span>
</code></pre></div></div>
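<p>To try everything in one go, the pieces from this post can be assembled into a single script. This is a sketch under a few assumptions of mine: it uses integer division (<code class="highlighter-rouge">//</code>) in the extended Euclidean algorithm, returns the last non-zero x as the GCD, reduces both results with <code class="highlighter-rouge">% N</code>, and is exercised on the small examples used above plus the classic moduli 3, 5, 7.</p>

```python
def ExtendedEuclid(x, y):
    """Return (g, a, b) such that a * x + b * y == g == gcd(x, y)."""
    x0, x1, y0, y1 = 1, 0, 0, 1
    while y > 0:
        q, x, y = x // y, y, x % y
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return x, x0, y0


def invmod(a, m):
    g, x, _ = ExtendedEuclid(a, m)
    if g != 1:
        raise ValueError('modular inverse does not exist')
    return x % m


def ChineseRemainderGauss(n, N, a):
    result = 0
    for ai, ni in zip(a, n):
        bi = N // ni
        result += ai * bi * invmod(bi, ni)
    return result % N


def ChineseRemainderEuclid(n, N, a):
    result = 0
    for ai, ni in zip(a, n):
        # si is the coefficient of N // ni in the Bezout identity.
        _, _, si = ExtendedEuclid(ni, N // ni)
        result += ai * si * (N // ni)
    return result % N   # % maps a negative sum into [0, N)


n = [3, 5, 7]
a = [2, 3, 2]
N = 3 * 5 * 7

print(ExtendedEuclid(102, 38))           # (2, 3, -8)
print(invmod(17, 43))                    # 38
print(ChineseRemainderGauss(n, N, a))    # 23
print(ChineseRemainderEuclid(n, N, a))   # 23
```

<p>Both algorithms agree, which is a cheap sanity check worth keeping around when the moduli get large.</p>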
<h2 id="references">References</h2>
<p>Several websites contributed to me getting to the bottom of this interesting
theorem and provided the numerical examples I use above:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Chinese_remainder_theorem">Wikipedia page</a></li>
<li><a href="https://crypto.stanford.edu/pbc/notes/numbertheory/crt.html">Stanford university explanation</a></li>
<li><a href="https://www.di-mgt.com.au/crt.html">More details on the Gauss algorithm</a></li>
<li><a href="https://www.youtube.com/watch?v=fz1vxq5ts5I">Tutorial video on the modular inverse computation</a></li>
</ul>
Sun, 22 Oct 2017 15:30:00 +0000
http://shainer.github.io/crypto/math/2017/10/22/chinese-remainder-theorem.html
crypto, math
Introduction to ring signatures
<p>Well, I finally got around to reading <a href="https://cryptoservices.github.io/cryptography/2017/07/21/Sigs.html">this article about confidential transactions</a>. Recommended if you are interested in how some
security guarantees in cryptocurrency transactions can be enforced. However, thanks to that article I also learned something
about <a href="https://en.wikipedia.org/wiki/Ring_signature">ring signatures</a>, so in this post I will talk about that.</p>
<p>Ring signatures are a special type of <strong>group signature</strong>. In group signatures you have (you guessed it!) a group of signers;
each of them owns a keypair. Messages are signed by the group in a way that the receiver can verify that the signature
was generated by the group, but cannot uncover which signer specifically made the signature (a property named <em>signer ambiguity</em>).</p>
<p>However, group signatures have a “weakness”: the group needs centralized management. There must be a manager who admits new
members to the group and who can revoke a member’s anonymity at will; this is possible because the manager must know the private key of each member
and can therefore reveal who signed without “endangering” themselves directly. <strong>Ring signatures</strong>, presented for the first time
in 2001, remove this need. Any user can build a set that includes themselves, and then produce a signature
using their own private key and the public keys of the other members. Signer ambiguity is preserved, so the verifier
cannot tell which of the set members was the signature’s author. Notice, however, that since the user only needs the public keys
of the other members, they can include them in a ring without the permission (or even the knowledge) of the key owners.</p>
<p>The original paper is aptly named “<a href="https://link.springer.com/content/pdf/10.1007%2F3-540-45682-1_32.pdf">How to leak a secret</a>”,
and indeed it describes a scenario where somebody is leaking a secret and wants the recipient to be sure that the origin of the
secret can be trusted (or better, that it belongs to a trustworthy group), but does not want to reveal their identity to the
recipient completely.</p>
<p>Note that after the original publication other schemes of ring signatures were developed; I am going to focus on the first one.
Indeed the original article I linked uses a different one.</p>
<p>The second desirable property of this ring signature scheme is that adding a new ring member is efficient: both the generation
and verification procedures only need to compute one extra modular multiplication and one symmetric encryption. Pretty good :)</p>
<h2 id="details">Details</h2>
<h3 id="the-setup">The setup</h3>
<p>Each signer in the ring is associated with a public key <code class="highlighter-rouge">Pi</code> and the generation scheme is known. All of them use the same scheme
(let’s assume RSA for simplicity). Note that the moduli N of the keys could have different bit sizes; to work around this problem,
the paper picks a bit size b larger than any of the moduli’s sizes, then extends the encryption functions so that their outputs are unchanged
for most inputs, and set equal to the input for a negligible number of inputs. This guarantees that if the original encryption function
was infeasible to invert (as in RSA) given a random output, the new one will be too.</p>
<p>The message to sign is named <code class="highlighter-rouge">m</code>, and we have <code class="highlighter-rouge">r</code> members in the ring. The function associated with each signer is named <code class="highlighter-rouge">gi</code>; this
is the extended encryption function determined by that signer’s public key, which only the owner of the matching private key can invert. In RSA, this is modular exponentiation with the public exponent.</p>
<p>We also have:</p>
<ul>
<li>a symmetric encryption function <code class="highlighter-rouge">E</code> such that for any key k, <code class="highlighter-rouge">E</code> with k is a permutation over strings of b bits.</li>
<li>a public collision-resistant hash function <code class="highlighter-rouge">H</code> that maps inputs to strings of the length of k, used then as keys for E.</li>
</ul>
<p>Finally we need a family of keyed <em>combining functions</em> C(k, v). These take as input k, a random initialization vector v and r
arbitrary values Y, each composed of b bits. Each function will use <code class="highlighter-rouge">E(k)</code> to produce outputs of b bits, such that for any
(k, v) pair we have that the function has the following properties:</p>
<ul>
<li>it is a permutation over all the Y values;</li>
<li>when fixing all the Y values but one, and with the output z known, there is exactly one solution for the remaining value and it is easy to compute;</li>
<li>given k, v and z it is infeasible for an attacker to solve the equation <code class="highlighter-rouge">C(k, v, g1(x1)...gr(xr)) = z</code> (given access to each g function),
provided it is also infeasible for them to invert the g functions themselves.</li>
</ul>
<h3 id="signature-generation">Signature generation</h3>
<p><strong>Step 1</strong>: compute <code class="highlighter-rouge">k = H(m)</code></p>
<p><strong>Step 2</strong>: select a <em>glue</em> value v, a bit string of length b, at random.</p>
<p><strong>Step 3</strong>: select random <code class="highlighter-rouge">xi</code> values, one for each ring member besides yourself, again bit strings of length b; then compute <code class="highlighter-rouge">yi = gi(xi)</code>.
In other words, since you cannot invert the other members’ g functions without their private keys, you pick their inputs at random and compute the outputs instead.</p>
<p><strong>Step 4</strong>: to find your own y value, solve the equation <code class="highlighter-rouge">C(k, v, Y) = v</code> for it, i.e. find the one remaining value that makes the equation hold. By definition there must be exactly one solution and it should be easy to find. Then compute your own x value from y by inverting your own g function, which you can do because you know your private key.</p>
<p>The signature is the set of public keys P, the glue v and the set of X values you computed, including your own.</p>
<h3 id="signature-verification">Signature verification</h3>
<p>Verification is straightforward; the verifier first recomputes the inputs of the ring equation:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>yi = gi(xi) for each xi
k = H(m)
</code></pre></div></div>
<p>then verifies that the equation with C (called the <em>ring equation</em>) is satisfied with the given parameters.</p>
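<p>To make the generation and verification steps concrete, here is a toy sketch in Python. It is not a faithful implementation of the paper, and several pieces are assumptions of mine: the combining function is the chained form <code class="highlighter-rouge">E(yr xor E(... E(y1 xor v)))</code> (which is how I read the construction), <code class="highlighter-rouge">E</code> is a small hash-based Feistel network standing in for a real block cipher, <code class="highlighter-rouge">H</code> is SHA-256, b is 64 and the RSA keys are built from tiny Mersenne primes. All helper names are hypothetical, and <code class="highlighter-rouge">pow(e, -1, phi)</code> needs Python 3.8 or later.</p>

```python
import hashlib
import secrets

B = 64                      # common bit size b, larger than every modulus
HALF = B // 2
MASK = (1 << HALF) - 1


def make_key(p, q, e=65537):
    """Toy RSA keypair from two known primes: returns (n, e, d)."""
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


def g(x, n, exp):
    """Extended RSA permutation over b-bit values (exp is e, or d to invert)."""
    quot, rem = divmod(x, n)
    if (quot + 1) * n <= (1 << B):
        return quot * n + pow(rem, exp, n)
    return x                # the few inputs close to 2^b are left unchanged


def _round(k, rnd, half):
    data = k + bytes([rnd]) + half.to_bytes(HALF // 8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:HALF // 8], "big")


def E(k, x):
    """Toy 4-round Feistel network: a keyed permutation over b bits."""
    left, right = x >> HALF, x & MASK
    for rnd in range(4):
        left, right = right, left ^ _round(k, rnd, right)
    return (left << HALF) | right


def D(k, x):
    """Inverse of E."""
    left, right = x >> HALF, x & MASK
    for rnd in reversed(range(4)):
        left, right = right ^ _round(k, rnd, left), left
    return (left << HALF) | right


def sign(message, pubkeys, s, d):
    """Sign as ring member number s, whose private exponent is d."""
    k = hashlib.sha256(message).digest()           # step 1
    v = secrets.randbits(B)                        # step 2: the glue
    xs = [secrets.randbits(B) for _ in pubkeys]    # step 3 (xs[s] is replaced below)
    ys = [g(x, n, e) for x, (n, e) in zip(xs, pubkeys)]
    u = v                                          # step 4: walk forward up to s...
    for y in ys[:s]:
        u = E(k, y ^ u)
    w = v                                          # ...and backward down to s
    for y in reversed(ys[s + 1:]):
        w = D(k, w) ^ y
    y_s = D(k, w) ^ u                              # the only value closing the ring
    xs[s] = g(y_s, pubkeys[s][0], d)               # invert g with the private key
    return v, xs


def verify(message, pubkeys, signature):
    v, xs = signature
    k = hashlib.sha256(message).digest()
    z = v
    for x, (n, e) in zip(xs, pubkeys):
        z = E(k, g(x, n, e) ^ z)
    return z == v                                  # the ring equation


keys = [make_key(2**31 - 1, 2**13 - 1),
        make_key(2**19 - 1, 2**17 - 1),
        make_key(2**7 - 1, 2**5 - 1)]
pubkeys = [(n, e) for n, e, _ in keys]
signature = sign(b"a leaked secret", pubkeys, 1, keys[1][2])
print(verify(b"a leaked secret", pubkeys, signature))
```

<p>Note that nothing in the signature says which index was passed to <code class="highlighter-rouge">sign</code>: the verifier treats all the x values in exactly the same way.</p>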
<h3 id="security">Security</h3>
<p>The anonymity is guaranteed by the fact that the ring equation, when k and v are fixed, has <code class="highlighter-rouge">(2^b)^(r-1)</code> solutions, and each of them
can be chosen with equal probability by the signing procedure, since it is based on random numbers.</p>
<p>The paper proves that any forging algorithm A, i.e. any algorithm that is able to forge a valid signature for a message m after
observing a feasible number of signatures for different messages, can be turned into an algorithm B that is able to invert one of the functions gi
on a random y. Since we assumed that inverting g is computationally hard (and it certainly is for common keypair
algorithms such as RSA), A should not exist.</p>
<p>I am thinking it would be cool to attempt this in practice. Pick a bad function g and make the ring members use it. Then build A;
in the proof, A is simply an oracle that can produce valid signatures for messages; we do not care how this is achieved internally, so for
our purposes it can just run the signature procedure. Then we build B from that following the explanation of the paper, and we verify
that it actually inverts g. This looks like a new crypto challenge in the making and I am already excited! The procedure described for building B was not 100% clear to me upon reading the paper, so it will require more careful study. I will let you know if I succeed in this adventure;
I could even propose this as a new challenge for the Matasano team to add to their website.</p>
<h3 id="conclusion">Conclusion</h3>
<p>One interesting improvement of the AOS ring signature scheme (the one used in the original article) is that it works with signers that use
different keypair generation functions. It uses something called Schnorr signature as the basis, and then “chains” multiple signatures together
into a ring to produce the final result. The idea is still to chain all the signatures together in a way that makes them depend on each other
and allows the verifier to repeat the same procedure to verify we end in a loop.</p>
Sun, 15 Oct 2017 22:00:00 +0000
http://shainer.github.io/crypto/2017/10/15/ring-signatures.html
crypto
RSA padding oracle attack
<p>My long series of posts on the Matasano crypto challenges and cryptography in general cannot be called complete without a dissertation
on challenges <a href="http://cryptopals.com/sets/6/challenges/47">47</a> and <a href="http://cryptopals.com/sets/6/challenges/48">48</a>, dedicated to the PKCS1.5 padding oracle and how it is exploited to break RSA and recover a plaintext. I was fascinated by this attack and read the whole paper before coding the implementation, so this post will include a bit more details on why the attack works.</p>
<h1 id="the-setup">The setup</h1>
<p>Alice takes her secret message and applies the PKCS1.5 encoding, getting a byte string of length equal to the number of bytes in the modulus of the RSA pair. She then encrypts it with Bob’s public key and sends it over the network. You, as the attacker, have access to the resulting ciphertext, and an oracle function on Bob’s server: when invoked, the server will decrypt the message and return true if the first two bytes in the plaintext are equal to ‘\x00\x02’. This is a necessary but not sufficient condition for the plaintext to be a PKCS1.5-encoded message; however for our purposes we can pretend that the oracle returning true means this ciphertext decrypts to a message <em>conformant</em> to PKCS1.5.</p>
<p>For a full description of the PKCS1.5 format, refer to the source paper or the implementation directly. The latter is a bit lazy, filling the padding portion of the message with the same byte repeated as many times as needed, rather than pseudorandom bytes.</p>
<h1 id="the-attack">The attack</h1>
<p>Well, now you want to use the oracle output to recover the full plaintext. The idea is that RSA plaintexts and ciphertexts are just numbers; by intelligently searching through the space of numbers and asking the oracle about related ciphertexts, you progressively narrow down the range of values that can contain the plaintext. Once the algorithm completes, a single candidate is left, so you can be certain you have found the plaintext even without any further verification.</p>
<p>In cryptography, this is called an <strong>adaptive chosen-ciphertext attack</strong>. Adaptive here means we choose the following ciphertext based on information derived from the previous one.</p>
<p>Let’s go into more details. We want to decrypt a ciphertext c, i.e. find <code class="highlighter-rouge">m = c^d mod n</code>. Due to a well-known property of RSA, if I decrypt
<code class="highlighter-rouge">cs^e</code> instead of c (for some arbitrary s), the plaintext will be equal to <code class="highlighter-rouge">ms</code> (modulo n). So if I pick some s, send <code class="highlighter-rouge">cs^e</code> to the oracle, and the response is “true”, I know that <code class="highlighter-rouge">ms mod n</code> is PKCS1.5 conformant, i.e. it starts with ‘\x00\x02’. Let’s set <code class="highlighter-rouge">B = 2^(8(k-2))</code> where k is the byte length of the modulus of the RSA pair (i.e. of the parameter n). Then it must be true that <code class="highlighter-rouge">2B <= ms mod n <= 3B - 1</code>.</p>
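<p>The multiplicative property is easy to check on a toy key. The sketch below uses two small Mersenne primes and arbitrary values for m and s, all chosen by me for illustration (a real key would be far larger); <code class="highlighter-rouge">pow(e, -1, phi)</code> needs Python 3.8 or later.</p>

```python
# Toy RSA key: two known (Mersenne) primes, far too small for real use.
p, q = 2**31 - 1, 2**19 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))

m = 123456789                  # the secret plaintext
c = pow(m, e, n)               # the ciphertext the attacker sees

s = 42                         # any multiplier chosen by the attacker
c_s = (c * pow(s, e, n)) % n   # built from public values only

# Decrypting c * s^e gives m * s (mod n), although the attacker never saw m.
print(pow(c_s, d, n) == (m * s) % n)
```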
<p>This means that by choosing different s, we are able to derive a set of intervals that must contain the plaintext m we are looking for. Once we are down to one potential interval, it is possible to choose s such that the probability of <code class="highlighter-rouge">cs^e</code> decrypting to a conformant message is quite high. After sufficient iterations, we’ll end up with one interval of length 1, and we will have found our plaintext.</p>
<p>The first set of intervals is derived by setting s to 1; we know that m is contained in one interval, <code class="highlighter-rouge">[2B, 3B - 1]</code>, as explained above. This is however too big to be useful yet, so we need to proceed to the next step.</p>
<p>In the next step, we increase our choice of s until we find another cs^e that decrypts to a conformant plaintext. The search either starts from
the value of s we found at the previous iteration, or from <code class="highlighter-rouge">n / 3B</code> for the first one. This is because small values (besides 1) are
less likely to generate conformant plaintexts. If we have only one interval in our set, however, we are able to narrow down the search a bit,
since we know that the number we are looking for must lie there. So, if our m lies between two numbers a and b, there will be a number r such that:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2B <= ms - rn <= 3B - 1 (for some r)
(2B + rn) / s <= m <= (3B - 1 + rn) / s
</code></pre></div></div>
<p>and this gives us the computation needed to find s in step 2c.</p>
<p>From the same formula we can also explain step 3, i.e. how to update the list of intervals to use in the next iteration. We just need to
take all the intervals produced by every value of r for which the first inequality can hold, for each (a, b) interval we
have in the current iteration, intersecting each new interval with the (a, b) it came from. We also make sure not to add any interval that is contained in one we already have.</p>
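<p>The interval update can be sketched as follows. This is only the narrowing step, not the full attack: the oracle is simulated with a known m, and the toy modulus, the helper names and the choice of always scanning for the next s are assumptions made for the example.</p>

```python
def ceil_div(x, y):
    """Ceiling of x / y for integers, with y > 0."""
    return -(-x // y)


def narrow(intervals, s, n, B):
    """Given a conformant s, shrink every (a, b) interval that may contain m."""
    result = set()
    for a, b in intervals:
        # Every r for which 2B <= m*s - r*n <= 3B - 1 is possible with a <= m <= b.
        r_min = ceil_div(a * s - 3 * B + 1, n)
        r_max = (b * s - 2 * B) // n
        for r in range(r_min, r_max + 1):
            low = max(a, ceil_div(2 * B + r * n, s))
            high = min(b, (3 * B - 1 + r * n) // s)
            if low <= high:
                result.add((low, high))
    return sorted(result)


n, B = 2**20 + 7, 2**8     # toy 3-byte "modulus": k = 3, so B = 2^(8(k-2))
m = 600                    # the secret, with 2B <= m < 3B


def oracle(s):
    """Stands in for Bob's server; a real attacker cannot see m."""
    return 2 * B <= (m * s) % n < 3 * B


intervals = [(2 * B, 3 * B - 1)]
s = ceil_div(n, 3 * B)
for _ in range(5):
    while not oracle(s):
        s += 1
    intervals = narrow(intervals, s, n, B)
    s += 1
print(intervals)
```

<p>After every call to <code class="highlighter-rouge">narrow</code> the secret m is still inside one of the intervals, while their total size can only shrink.</p>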
<h1 id="remarks">Remarks</h1>
<p>The merits of this attack compared to others of its kind are simply that the number of oracle calls we have to do is on average smaller than
what other attacks need. The paper has some notes on why this is true and how to approximate the number of iterations required to find the
solution. I won’t go into that part as I found it quite technical and long to explain.</p>
<p>For the <strong>implementation</strong>, the two challenges are very similar; the only difference is that in the first one you can afford to be sloppy
and not implement certain parts (such as handling multiple intervals in one iteration, which never happens). I did the full implementation
at the beginning so I only had to change the parameters to make it work again.</p>
Sat, 14 Oct 2017 14:30:00 +0000
http://shainer.github.io/crypto/matasano/2017/10/14/rsa-padding-oracle-attack.html
crypto, matasano
Usenix Security 2017: global DNS manipulation
<p>Usenix Security 2017 happened recently, and Usenix has since published all the videos and proceedings
(see <a href="https://www.usenix.org/conference/usenixsecurity17">their main page</a>). Since I didn’t attend, I looked
through the videos to watch those I found most interesting. <a href="https://www.youtube.com/watch?v=W_rBPdaTojQ">One in particular</a> caught my attention.
The presenter talks about a study performed by several academic bodies to measure DNS manipulation across the world in a reliable
and repeatable fashion.</p>
<p>What is DNS manipulation? DNS manipulation means changing the responses to DNS resolution queries to prevent
the requestor from accessing the actual domain they were looking for. Such “bad” responses will either be errors
or different IP addresses that redirect the user somewhere else.</p>
<p>Of course, the success of this depends on the DNS server(s) that perform the manipulation being somewhat popular
among the audience we intend to target. If people can easily use a truthful server instead, they are eventually
going to do so. But I digress; let’s assume that problem is more or less solved (and in practice this happens
all the time, at least to non-power users).</p>
<p>The study asked how common DNS manipulation is in today’s world and what forms it takes.
Surprisingly while we all agree that Internet censorship is a thing, and can probably name a few notorious examples that
make headlines, comprehensive data on the subject are hard to come by.</p>
<p>This introduces the next question: how do we collect such data? Half of the video presents their pipeline, dedicated to collecting
lots of data about DNS manipulation across the world. The other half presents some results, divided by country, category
of domains (pornography, gambling, human rights, multimedia sharing were among those graphed on the slides) and
actual top-level domain (are Google-owned domains more often censored than, say, Wikipedia or Facebook?).</p>
<p>The first lesson taken from this experiment is that it is incredibly easy to introduce bias in the results by
simply not checking a diverse enough set of domains. We can only check whether a domain is manipulated by issuing
DNS queries and looking at the results. We cannot infer whether other domains are also manipulated in the same context.
Therefore if we don’t start with a varied input set, our picture is going to be incomplete.</p>
<p>The second is that manipulation is incredibly common, and it manifests itself in a variety of ways. Discussions on this
go quickly into politics of specific countries, so I encourage people to look at the data presented and form their own
opinions on all this.</p>
Sun, 24 Sep 2017 14:30:00 +0000
http://shainer.github.io/security/dns/2017/09/24/global-dns-manipulation.html
security, dns
Finding solutions to weird equations
<p>A while back, on social media, I found a link to <a href="https://www.quora.com/How-do-you-find-the-positive-integer-solutions-to-frac-x-y%2Bz-%2B-frac-y-z%2Bx-%2B-frac-z-x%2By-4/answer/Alon-Amit?share=1">an interesting post on solving a particular mathematical equation</a>. Now I know this description might not sound very enticing; people solve equations every day! But, some equations carry more meaning than what appears at first glance.</p>
<p>The equation has a pretty symmetric structure and a simple definition:</p>
<p><img src="http://shainer.github.io/images/equation.png" alt="The equation" /></p>
<p>And we want to find the <strong>positive</strong> solution or solutions to this equation. How? As will be clear, brute forcing won’t
be enough, so we have to study the properties of this equation and come up with ingenious methods.</p>
<p>The post then explains how we are dealing with a cubic (degree-3) equation that has at least one rational (but not positive) solution.
This means the equation describes an <strong>elliptic curve</strong>: by means of complex (and quite boring) transformations, we can
express it in the usual elliptic curve form, which is</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>y^2 = x^3 + ax + b
</code></pre></div></div>
<p>with a and b rational coefficients of the curve. At this point the post author provides one solution for the equation,
the point (-100, 260). By simple math, the corresponding values of the initial unknowns are computed as 4, -1 and 11. So this
is not good, because it’s not a positive solution. Now, we can make use of some properties of elliptic curves to find new
solutions to test for positivity.</p>
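<p>Checking a candidate solution is simple with exact rational arithmetic. The sketch below assumes the equation is the one given in the title of the linked post, x/(y+z) + y/(z+x) + z/(x+y) = 4; the function name is made up for this example.</p>

```python
from fractions import Fraction


def lhs(x, y, z):
    """Left-hand side x/(y+z) + y/(z+x) + z/(x+y), computed exactly."""
    x, y, z = Fraction(x), Fraction(y), Fraction(z)
    return x / (y + z) + y / (z + x) + z / (x + y)


# 4/10 - 1/15 + 11/3 = 4: a valid solution, but -1 is not positive.
print(lhs(4, -1, 11) == 4)
```

<p>Using <code class="highlighter-rouge">Fraction</code> instead of floats matters here: with numbers of 80 digits, a floating point check would be meaningless.</p>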
<p>In particular, elliptic curves are closed under addition: adding two points P and Q on the curve yields another point on
the curve. Such point is found by drawing a line connecting the two points, looking for a third point where the line intersects
the curve, and taking the point that is symmetric to this on the x axis (which will also belong to the curve).</p>
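<p>The addition rule can be sketched in a few lines with exact fractions. The coefficients of the curve from the post are not reproduced here, so the example runs on a different, much simpler curve that I picked for illustration: y^2 = x^3 - 2, which contains the point (3, 5). When the two points coincide, the line through them is taken to be the tangent.</p>

```python
from fractions import Fraction


def add(P, Q, a):
    """Add two finite points on y^2 = x^3 + a*x + b (b itself is not needed)."""
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return None                                # vertical line: point at infinity
    if P == Q:
        slope = (3 * x1 * x1 + a) / (2 * y1)       # tangent line
    else:
        slope = (y2 - y1) / (x2 - x1)              # line through P and Q
    x3 = slope * slope - x1 - x2                   # third intersection with the curve
    return x3, slope * (x1 - x3) - y1              # mirrored across the x axis


def on_curve(P, a, b):
    x, y = P
    return y * y == x ** 3 + a * x + b


a, b = 0, -2                                       # toy curve y^2 = x^3 - 2
P = (Fraction(3), Fraction(5))
Q = P
for k in range(2, 6):
    Q = add(Q, P, a)                               # Q is now k*P
    print(k, Q, on_curve(Q, a, b))
```

<p>Even on this tiny example the numerators and denominators grow quickly with every addition, which hints at why the first positive solution of our equation is so large.</p>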
<p>Now we add our initial point (-100, 260) to itself, finding 2P, 3P, etc., and every time we test the new point to see if
the unknowns are positive. We have to get to 9P to finally find the following solution to the equation:</p>
<p>a=154476802108746166441951315019919837485664325669565431700026634898253202035277999,
b=36875131794129999827197811565225474825492979968971970996283137471637224634055579,
c=4373612677928697257861252602371390152816537558161613618621437993378423467772036</p>
<p>Well, those are huge numbers, so you can see why employing brute force wouldn’t have helped here.</p>
<p>But why did we pick a point and start adding it to itself? Well it turns out this point is the one and only <em>generator</em>
of the curve, so by adding it to itself enough times you are able to find any other non-infinity point on the curve. So if the positive solution
existed, we were bound to find it with this method.</p>
<p>The final interesting thing is that the number of digits in the first positive solution we can find with the generation
method depends on the coefficient on the right side of the equation (here, 4). The bigger this coefficient, the bigger the number
of digits is going to be. How much bigger? Can we find a correlation between the two quantities?</p>
<p>Unfortunately, we cannot. This is because there is no general algorithm for deciding whether a Diophantine equation has an integer solution, so
naturally we cannot make general statements about the properties of such a solution. We cannot even know in advance whether one exists.
Thanks to this post, I am now aware of the <a href="https://en.wikipedia.org/wiki/Hilbert%27s_tenth_problem">story behind this</a>, which
I might explore in more detail in a future post.</p>
<h2 id="extra-notes">Extra notes</h2>
<p>I am temporarily without my main laptop, using a Chromebook (long story). So I don’t have access to my usual writing environment and I wrote this post entirely on Github. I have to say that the general flow is quite good: it’s easy to upload a bunch of new files to some location and create a commit, and the Markdown editor is decent with live preview. I still prefer Atom for that though :)</p>
<p>Also, embedding LaTeX in Markdown, at least for Github Pages, is still a messy affair. Perhaps one day I’ll find the patience to figure out which method actually works and does not require complex HTML code. That day is not today, sorry, you get the equation as a PNG image cut from a screenshot :D</p>
Tue, 12 Sep 2017 00:30:00 +0000
http://shainer.github.io/math/2017/09/12/finding-solutions-to-weird-equations.html
math
Forging RSA signatures
<p>As I vaguely promised some weeks ago, going back to solve the crypto challenges I had missed in the sets before the eighth one, I found yet another interesting problem to talk about: exploiting several weaknesses to <a href="http://cryptopals.com/sets/6/challenges/42">forge an RSA signature</a>.</p>
<h2 id="preconditions">Preconditions</h2>
<p>The following conditions are necessary for this attack to work:</p>
<ol>
<li>RSA is used to generate and verify digital signatures;</li>
<li>The keypair generation is lazy, and sets the public exponent e to 3;</li>
<li>The signature verification is also a bit lazy; I’ll describe how later.</li>
</ol>
<p>#1 is nothing weird: digital signatures need some form of asymmetric encryption and RSA is the most popular choice.</p>
<p>#2 can happen in practice due to how keypair generation works: the two internal parameters for RSA, q and p, need to be primes, large enough to make the factorization of N = pq hard, and such that (p - 1) and (q - 1) don’t have a lot of common factors besides 2. A common way to achieve this is to set e, the “exponent” part of the final public key, to some small prime like 3, and then derive p, q and d (the exponent of the private key) from that. Having a small value for e also makes encryption and signature verification quite cheap, since the exponent is a “small” number.</p>
<p>The full standard to compute signatures with RSA is described in <a href="https://tools.ietf.org/html/rfc2313">RFC 2313</a>. In short, the message is first hashed (most common algorithms are supported), then the following bytes block is generated:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00 01 FF .. FF 00 ASN.1 HASH
</code></pre></div></div>
<p>where ASN.1 is a byte string identifying the hash algorithm used (see <a href="https://www.ietf.org/rfc/rfc3447.txt">RFC 3447</a>), and there are as many FF bytes as needed to make the total size equal to the size in
bytes of N, the modulus of the RSA keys. Note that N is part of both the public and the private RSA key. The block is then converted to the corresponding integer and encrypted with the private key; optionally, the result is converted again to a hex string or similar representation.</p>
<p>Now we can see where mistake #3 can come from: if I verify an RSA signature using, for example, regular expressions, it’s easy to check that there are one or more FF bytes in the padding zone without checking that there is exactly the number I expect. Furthermore, I might not check that there’s nothing else in the signature after the hash. Note that this bites me even if I separately check that the
total signature length is as expected.</p>
<p>If all these conditions are there, the attacker is able, without any knowledge of the private key, to forge an RSA signature for pretty much any message, and have it accepted by the verifier.</p>
<h2 id="how-the-forgery-works">How the forgery works</h2>
<p>If I am verifying a signature, I am decrypting with the public key, which means this operation is performed:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D = EncryptedSignature ** e mod N
</code></pre></div></div>
<p>If e is equal to 3, it’s quite possible that <code class="highlighter-rouge">EncryptedSignature ** 3</code> ends up being smaller than N, therefore the modulo operation does not change the result. So, if we forge a block that satisfies only the conditions we know the system checks for, and also corresponds to a perfect cube, we can pass the cube root as a signature to such a verification system, and it will be accepted as valid. How does that happen?</p>
<p>Let’s take a block with this format:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00 01 FF 00 ASN.1 HASH GARBAGE
</code></pre></div></div>
<p>Now let’s take the sub-block composed of the last 00 byte, the ASN.1 code, and the hash. If SHA-256 is used for the hash, the total size is 52 bytes. We then convert this block into an integer, which we call D.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Block = '00 ASN.1 HASH'
D = int(Block)
N = 2^len(Block) - D
</code></pre></div></div>
<p>Now let’s say that our RSA key has length 2048 bits; with the format above, the first three bytes (00 01 FF) and the 52-byte block take up 55 bytes, so there are going to be 2048 - 8 * (52 + 3) = 1608 bits left on the right for garbage. Let’s call this number X. With len(Block) measured in bits, the numeric block is going to be:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2 ^ (2048 - 15) - 2 ^ (X + len(Block)) + D * 2^X + garbage
2 ^ (2048 - 15) - N * (2^X) + garbage
</code></pre></div></div>
<p><strong>Edit</strong>: the second equation follows from the first quite easily if you remember that according to the definition above, <code class="highlighter-rouge">N = 2^len(Block) - D</code>, and therefore, <code class="highlighter-rouge">D = 2^len(Block) - N</code>.</p>
<p><strong>Disclaimer</strong>: this is where I get lost, sadly. I don’t understand what the 15 bits subtracted from the key size represent. If somebody reads this and wants to send me an email for a more complete explanation, they are welcome to do it!</p>
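<p>The formula can at least be checked numerically. The sketch below builds the block byte by byte and compares it with the formula, under assumptions of mine: lengths are counted in bits, the hash is SHA-256 with its standard 19-byte ASN.1 prefix, the message is arbitrary and the garbage is all ones. Under this reading, the 15 would be the 8 zero bits of the leading 00 byte plus the 7 leading zero bits of the 01 byte, since the bytes 00 01 FF followed only by zeros are equal to 2^(2048 - 15) - 2^(2048 - 24).</p>

```python
import hashlib

KEY_BITS = 2048
# DER prefix identifying SHA-256 inside a PKCS1.5 signature block (19 bytes).
ASN1_SHA256 = bytes.fromhex("3031300d060960864801650304020105000420")

digest = hashlib.sha256(b"hi mom").digest()
block = b"\x00" + ASN1_SHA256 + digest          # '00 ASN.1 HASH', 52 bytes
D = int.from_bytes(block, "big")
block_bits = 8 * len(block)                     # len(Block) in bits

X = KEY_BITS - 8 * (3 + len(block))             # bits left for the garbage
garbage = (1 << X) - 1                          # highest possible garbage

# The whole block, assembled byte by byte: 00 01 FF | 00 ASN.1 HASH | GARBAGE
full = b"\x00\x01\xff" + block + garbage.to_bytes(X // 8, "big")
value = int.from_bytes(full, "big")

formula = 2 ** (KEY_BITS - 15) - 2 ** (X + block_bits) + D * 2 ** X + garbage
print(value == formula)
```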
<p>From empirical evaluation, it seems that for the garbage number you should prefer higher values: the lowest cube roots might end up being encoded in something that does not quite contain the full hash at the end, possibly because we run out of bits to convert before that. This is why my code takes a shortcut and just sets it to the highest number possible given the allowed number of bits.</p>
<p>The final step is computing the cube root and converting the result to an integer (in my case, by rounding down). Here I ran into a limitation of Python: the suggested way to compute a cube root is to raise the number to the power of (1.0 / 3.0), but this requires the base to be converted to a float, and that does not work for very large integers such as this one. I could have looked at some mathematical library like numpy, but I am reluctant to add too many dependencies; eventually I found a code snippet on the Internet that does the job with the decimal builtin module.</p>
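<p>As an alternative to the decimal module, the cube root can also be found with a binary search on plain integers. The sketch below is not my challenge code: the lax verifier, the helper names and the stand-in modulus are all hypothetical, and the modulus is not even a real RSA modulus, which is fine here only because the cube of the forged signature is smaller than it.</p>

```python
import hashlib
import re

ASN1_SHA256 = bytes.fromhex("3031300d060960864801650304020105000420")


def icbrt(n):
    """Largest integer r such that r**3 <= n, using integers only."""
    low, high = 0, 1 << (n.bit_length() // 3 + 1)
    while low < high:
        mid = (low + high + 1) // 2
        if mid ** 3 <= n:
            low = mid
        else:
            high = mid - 1
    return low


def forge(message, key_bits=2048):
    """Build 00 01 FF 00 ASN.1 HASH GARBAGE and return its rounded-down cube root."""
    prefix = b"\x00\x01\xff\x00" + ASN1_SHA256 + hashlib.sha256(message).digest()
    block = prefix + b"\xff" * (key_bits // 8 - len(prefix))   # highest garbage
    return icbrt(int.from_bytes(block, "big"))


def lax_verify(message, signature, n, key_bits=2048, e=3):
    """A sloppy verifier: one or more FF bytes, nothing checked after the hash."""
    block = pow(signature, e, n).to_bytes(key_bits // 8, "big")
    pattern = rb"\x00\x01\xff+\x00" + re.escape(ASN1_SHA256) + rb"(.{32})"
    match = re.match(pattern, block, re.DOTALL)
    return match is not None and match.group(1) == hashlib.sha256(message).digest()


N = (1 << 2048) - 1          # hypothetical stand-in for a 2048-bit modulus
forged = forge(b"hi mom")
print(lax_verify(b"hi mom", forged, N))
```

<p>Rounding down only changes the low bits of the cube, which all fall in the garbage area; this is also why starting from the highest possible garbage is the safe choice.</p>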
<p>Aaand we are done!</p>
Sun, 20 Aug 2017 17:30:00 +0000
http://shainer.github.io/crypto/2017/08/20/forging-rsa-signatures.html
crypto
Interesting C++ features, part 2
<p>I have developed in C++ a lot, both at work and outside. So I like to keep updated with the new features and utilities introduced by
new versions or available through common libraries. This post will describe a few new things I have discovered recently.</p>
<p>If you are wondering why this is part 2, I have decided <a href="https://shainer.github.io/c++/opensource/2016/11/13/cpp-errors.html">this post</a> can be
considered as “part 1” of this series. I plan to write more about C++ in the future: I’ll make all the posts numbered and under the
“c++” category.</p>
<h2 id="span">span</h2>
<p>This is not part of the STL, but rather of the Guidelines Support Library, which is any library implementing <a href="https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md">this set of guidelines</a> by the standards committee.</p>
<p>Let’s say that you want to pass an array to a function as a pointer. A basic example:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">sum</span><span class="p">(</span><span class="kt">int</span><span class="o">*</span> <span class="n">data</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// sum all elements in data.
</span><span class="p">}</span>
</code></pre></div></div>
<p>Now in order for this code to work, we assume n represents the array size, and that it’s
actually correct and does not make us access out-of-bounds memory.</p>
<p>A better way:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">sum</span><span class="p">(</span><span class="k">const</span> <span class="n">span</span><span class="o"><</span><span class="kt">int</span><span class="o">>&</span> <span class="n">data</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// sum all elements in data.
</span><span class="p">}</span>
</code></pre></div></div>
<p>A <code class="highlighter-rouge">span</code> of the array (which might or might not contain all the elements) is built on-the-fly to
represent a view of the array, and passed to any function. All the information about the size is contained
internally.</p>
<p><code class="highlighter-rouge">array_view</code> (which has a more descriptive name) works the same way, but unlike <code class="highlighter-rouge">span</code> it’s a read-only
view of the original array. This is preferable to ensure no function can write on what is a view of another
data structure, without having to enforce constness instead.</p>
<h2 id="custom-error-codes">Custom error codes</h2>
<p>This was introduced in C++11, but I think it was overshadowed by other, more revolutionary features, since you seldom
see it in use.</p>
<p>Another blog describes in detail <a href="https://akrzemi1.wordpress.com/2017/07/12/your-own-error-code/">how to define your own error code space</a> and <a href="https://akrzemi1.wordpress.com/2017/08/12/your-own-error-condition/">how to write error conditions</a>. There is no point in repeating
the entire content of those posts here, so I’ll give a short summary.</p>
<p><code class="highlighter-rouge">std::error_code</code> is a generic interface (in the non-programming sense of the word) for expressing custom
error codes. An error code is identified by a <em>number</em> (any value other than 0, which always means success) and a <em>domain</em> or <em>category</em>; the
latter specifies the kind of errors we are dealing with, and is identified by a name.</p>
<p>At a high level, this machinery allows you to construct and use variables of type <code class="highlighter-rouge">std::error_code</code> from the enum values representing the
custom error codes you need. By subclassing <code class="highlighter-rouge">std::error_category</code> you define a custom category,
with some nice extras such as associating an error message with each code.</p>
<p>Going one step further, it’s also possible to express complex groupings and conditions on your set of errors by using <code class="highlighter-rouge">std::error_condition</code>.
Let’s say that all of your errors fall into the following sub-categories: <strong>internal errors</strong> (something goes wrong inside your program) and
<strong>external errors</strong> (due to e.g. networking). You can extend the logic of your category with a function that tells you which sub-category
a given code belongs to.</p>
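<p>A compact sketch of how the pieces fit together. The enums <code class="highlighter-rouge">FetchError</code> and <code class="highlighter-rouge">FailureSource</code>, the two category classes and their messages are all hypothetical names made up for illustration. Mapping codes to sub-categories by overriding <code class="highlighter-rouge">default_error_condition</code> is one possible approach; the linked posts describe a more flexible one.</p>

```cpp
#include <string>
#include <system_error>
#include <type_traits>

// Hypothetical codes for illustration. 0 is skipped: it always means success.
enum class FetchError { ParseFailure = 1, ConnectionLost = 2 };

// Hypothetical sub-categories, modelled as error conditions.
enum class FailureSource { Internal = 1, External = 2 };

// Opt the enums in to implicit conversion to error_code / error_condition.
namespace std {
template <> struct is_error_code_enum<FetchError> : true_type {};
template <> struct is_error_condition_enum<FailureSource> : true_type {};
}  // namespace std

std::error_condition make_error_condition(FailureSource source);

// The category gives the codes a name, messages and a sub-category mapping.
class FetchCategory : public std::error_category {
public:
    const char* name() const noexcept override { return "fetch"; }

    std::string message(int value) const override {
        switch (static_cast<FetchError>(value)) {
            case FetchError::ParseFailure:   return "could not parse the response";
            case FetchError::ConnectionLost: return "connection lost";
        }
        return "unknown fetch error";
    }

    // Tells which sub-category a given code belongs to.
    std::error_condition default_error_condition(int value) const noexcept override {
        switch (static_cast<FetchError>(value)) {
            case FetchError::ParseFailure:
                return make_error_condition(FailureSource::Internal);
            case FetchError::ConnectionLost:
                return make_error_condition(FailureSource::External);
        }
        return std::error_condition(value, *this);
    }
};

class FailureSourceCategory : public std::error_category {
public:
    const char* name() const noexcept override { return "failure-source"; }

    std::string message(int value) const override {
        return value == static_cast<int>(FailureSource::Internal)
                   ? "internal error"
                   : "external error";
    }
};

// Categories are compared by address, so each one is a single shared object.
const std::error_category& fetch_category() {
    static const FetchCategory instance{};
    return instance;
}

const std::error_category& failure_source_category() {
    static const FailureSourceCategory instance{};
    return instance;
}

// Found through argument-dependent lookup by the standard library.
std::error_code make_error_code(FetchError error) {
    return {static_cast<int>(error), fetch_category()};
}

std::error_condition make_error_condition(FailureSource source) {
    return {static_cast<int>(source), failure_source_category()};
}
```

<p>After this setup, <code class="highlighter-rouge">std::error_code ec = FetchError::ConnectionLost;</code> compiles as is, <code class="highlighter-rouge">ec.message()</code> returns the text defined in the category, and <code class="highlighter-rouge">ec == FailureSource::External</code> asks which sub-category the code falls into.</p>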
<p>Personal opinion: this is not trivial to use, and the initial setup of codes and categories requires more boilerplate than usual.
However, once that part is done (likely in some utility library), it is incredibly useful: good error handling is
something many applications and libraries lack (and let’s not talk about dealing with <code class="highlighter-rouge">errno</code>…). I am happy to pay the cost to
have sets of well-defined error spaces and codes to deal with.</p>
<h2 id="cppcon">CppCon</h2>
<p>CppCon 2017 is around the corner! I won’t be able to attend, but it’s on my todo list and I am currently spending time watching
talks from previous editions on YouTube. I definitely recommend keeping an eye out for this year’s talks.</p>
Sun, 13 Aug 2017 10:00:00 +0000
http://shainer.github.io/c++/2017/08/13/more-cpp-features.html
http://shainer.github.io/c++/2017/08/13/more-cpp-features.htmlc++