Charlie Harvey

Digits in page numbers and thoughts on optimization

Sometimes solving problems with dumb code involves a simpler expression of a problem than solving the same problem with highfallutin' maths. Sometimes the simpler expression involves more time or space complexity. That might be fine, depending on the inputs we need to process, indeed spending a long time coming up with a better implementation to deal with inouts you will never encounter is wasted effort. But sometimes the extra effort is critical and makes the difference between solving and not solving a problem.

The other morning I was working on problem 19 in Levitin and Levitin’s Algorithmic Puzzles. The problem is called Page Numbering and it goes like this:

Pages of a book are numbered sequentially starting with 1. If the total number of decimal digits used is equal to 1578, how many pages are there in the book?

Well, I could just make a string of the digits of all the positive integers and look at the first 1578 of them.

In GHCi I can construct a list of all the positive integers, smush their digits into a string and grab the first 1578 of them. Intuitively it seems like there should be 3 or maybe 4 digits in the answer and indeed the last 3 digits are the answer.

Prelude> let posInts = [1..] Prelude> let allDigits = concatMap show posInts Prelude> take 1578 allDigits "123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562"

So, problem solved? In a sense, yes. This particular instance of the problem is solved but not especially elegantly. If I was in a hurry to get the answer, this would do. But we can do better.

A more general solution

I suppose that means that I should now define better. For now, I will take it as meaning building a general solution that will tell me how many pages are represented by any number of digits. And make the answer human-readable rather than just a long string of digits that need deciphering.

Since we need any number of digits to be supported I will use Integers as my datatype. That means I need a couple of functions from Data.List. posInts and allDigits are as before.

module PageNumbers where import Data.List (genericLength, genericReplicate) posInts :: [Integer] posInts = [1..] allDigits :: [String] allDigits = map show posInts

Next I can code the caller for my recursive code. I will call a recursive function step with the target number of digits, a list of all the digits of the positive integers as a string (for ease of counting), the current number of digits seen and the "page number".

solution :: Integer -> Integer solution n = step n allDigits 0 1

Implementing step is the hard bit.

If our list of string representations of the positive integers is empty, then we are not going to be able to solve things.

step :: Integer -> [String] -> Integer -> Integer -> Integer step _ [] _ _ = error "Can't solve unless started with digits"

Assuming we got our list of strings, then:

  • If the our accumulated count of digits plus the length of the next digit string is exactly our target number of digits then we are done and we can return our page count
  • If the count is less than our target number of digits, call ourselves recursively, incrementing our counter.
  • Otherwise (if the count is larger) then we can’t solve for this number of digits, so bail with an error message.

step n (x:xs) y i | y' == n = i | y' < n = step n xs y' (i+1) | y' > n = error $ "No solution for " ++ show n ++ " digits" where y' = y + genericLength x

Now we have a good solution for about any book length one is likely to encounter. Note too that the code is short and that it doesn’t try and do anything fancier than some counting and comparing.

Here it is in action.

PageNumbers> solution 1578 562 PageNumbers> solution 15787 ** Exception: No solution for 15787 digits PageNumbers> solution 15789 4224 PageNumbers>

Using maths to make a performant solution

The solution given in the book uses maths rather than simple counting however. If we need to deal with really big books then this starts being useful. If we switch on stats in GHCi we can see performance falling off around 8 digits.

PageNumbers> :set +s PageNumbers> solution 12121220 1890332 (1.84 secs, 1669633384 bytes)

Here is how the book describes solving the problem. There is a function D(n) where n is the number of numbers it gives the number of digits.

  • For 1 <= n <= 9, D(n) is just n, meaning that for the last number in the interval, 9 it is 9
  • For 10 <= n <= 99, D(N) is 9 + 2(n-9) and for the last number (99) it is 189
  • For 100 <= n <= 999, D(n) is 189 + 3(n-99) for the last number (999) its 2889
  • And so on

From this we can deduce the equation 189 + 3(n-99) == x, where x is our number of digits. We can solve that thus:

  • 189 + 3(n-99) == 1578 {subtract 189}
  • 3(n-99) == 1389 {div by 3}
  • n-99 == 463 {add 99}
  • 463+99 == 562

To translate this back in to code we will first of all need a few helper functions: digitLen tells us the length in digits of an Integer and to int which takes a list of Integers and coverts back into a single Integer

digitLen :: Integer -> Integer digitLen = genericLength . show toInt :: [Integer] -> Integer toInt = read . concatMap show

With those in place we can implement the function D (which we need to call d, this being Haskell).

d :: Integer -> Integer d n = if nLen == 1 then n else d max' + nLen * (n - max') where nLen = digitLen n max' = nines (nLen - 1) nines x = toInt $ genericReplicate x 9

If our length is 1, then just return n. If its longer, then take a string of 9s of length 1 less than the length of n and call it max' (to avoid a name clash with max from the Prelude). Add that to the length of n times n - max'. This is nice because its cost is proportional to the length of n in digits.

With d in place, we can think about how to approach the rest of the problem. Here is what we do.

  1. Work out the top number in the interval that our target number (n) lies in — 9, 99, 999 and so on. We call this intervalTop.
  2. Take the number of digits in n and call that x
  3. Get previous intervalTop — for 99 it would be 9 — and call it y
  4. Let z be D(y)
  5. If there is a solution then (n-z) mod x + y will be divisible by 9
  6. Finally, the solution, if there is one will be (n-z) / x + y

This is significantly easier to write in Haskell than in English.

solveWithMaths :: Integer -> Integer solveWithMaths n = if ok then sol else error $ "No solution for " ++ show n ++ " digits" where int = intervalTop n x = digitLen int y = int `div` 10 z = d y sol = (n-z) `div` x + y ok = ((n-z) `mod` x + y) `mod` 9 == 0 intervalTop :: Integer -> Integer intervalTop n = top 9 where top ns = if n < d ns then ns else top (ns * 10 + 9)

Let’s see how it performs

PageNumbers> solveWithMaths 12121220 1890332 (0.00 secs, 1609016 bytes)

Great! From 1.84 seconds to less than a tenth of a second.

Just for shits and giggles, let’s try a stupidly big book. I dunno, maybe they printed an epub of Google or something like that.

PageNumbers> solveWithMaths 12121220371837217381273127382132137218371283712231232132132132312543536547568787687697687687644131530446053207833251443424658529551929918223906906916594013370146824168372902137621599987082 (0.12 secs, 180892608 bytes)

What did I learn?

The initial implementation was enough to get a quick answer. There is nothing wrong with writing some code like that to throw away. You wouldn’t want it in production. It needs deciphering and it solves only the specific problem.

The first proper implementation was significantly simpler, quicker to write, easier to explain and probably a bit easier to maintain than the second version. If you are confident you’re only going to have to deal with books of a few thousand pages, you can save a lot of time and effort using simpler code. As Don Knuth said, Premature optimization is the root of all evil.

Sometimes, however, we need things to be nicely optimized and to do as little work as possible. The mathsy implementation was only a little bit more complex and allowed us to deal with very long numbers. This is the approach to take if you are going to need to deal with huge books or perhaps with lots of requests.

There is no doubt that the mathsy version could be further optimized, but the point is that sometimes a simpler implementation is enough and sometimes not.

As it turns out there is no substitute for programming by thinking.


Comments

  • Be respectful. You may want to read the comment guidelines before posting.
  • You can use Markdown syntax to format your comments. You can only use level 5 and 6 headings.
  • You can add class="your language" to code blocks to help highlight.js highlight them correctly.

Privacy note: This form will forward your IP address, user agent and referrer to the Akismet, StopForumSpam and Botscout spam filtering services. I don’t log these details. Those services will. I do log everything you type into the form. Full privacy statement.