My Book Picker (and Lister)

2018 Version

[Update 2019-11-11: Sources moved to GitHub; verbose flag added to picking script; HTML listing script includes stack weights and probabilities, and indicates whether book is owned on Kindle.]

This is an updated version of a "geekery" post from last year. I've made substantial changes to the script it describes since then. I'm leaving the former (and much simpler) version in place, but also wanted to show off my new version.

But one thing hasn't changed at all: it's another example of the mental aberration that causes me to write Perl scripts to deal with life's little everyday irritants. In this case, two of them:

  1. I noticed that I had a lot of books on my shelves, acquired long ago, that I never got around to reading. Either because (a) they were dauntingly long and dense (I'm thinking about Infinite Jest by David Foster Wallace); or because (b) they just fell through the cracks. Both poor excuses, but there you are.

  2. I sometimes want to methodically read a series of books in a particular order.

In other words, I needed a way to bring diligence and organization to my previously chaotic and sloppy reading habits.

I think of what I came up with as the "To-Be-Read" (hereafter TBR) database. That's a slightly lofty title, but anyway:

The high-level view: all the TBR books are in zero or more stacks, each stack containing zero or more titles. Each stack is maintained in the order I want to read the books therein. (This goes back to the issue mentioned above: sometimes a series really "should" be read in publication order, for example C.J. Box's novels featuring protagonist Joe Pickett.)

So picking a book to read involves (a) choosing an "eligible" stack; and (b) "popping" the top book from the chosen stack. Very computer science-y.

The interesting part is the "choosing an eligible stack" step. There are a number of possible ways to do it. But first, more details on "eligibility".

The major problem with the previous version of this script was that too often it would pick a book "too soon" after I'd read something off the same stack. (An issue mentioned in last year's post.) As it turns out, I wanted to let some time go by between picks from the same stack. (For example, at least 30 days between books by Heinlein. Too much of a good thing, too soon…)

So: in this version, each stack has an "age": the time that's elapsed since I previously picked a book from that stack. And a "minimum age", the amount of time that must elapse after a pick before that stack becomes eligible again.

Another minor difference: I don't actually own some of the books in some of the stacks yet. I want to read them someday. But I'm waiting, typically for the price to come down, either via the Barnes & Noble remainder table or the Amazon used market. I'm RetiredOnAFixedIncome, after all.

So an eligible stack is one that:

  • is non-empty;
  • has a top book that I own;
  • is older than its specified minimum age.

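The three tests above can be sketched in a few lines. This is Python rather than the Perl the actual script is written in, and the `Stack` class, its field names, and the `>=` comparison are assumptions made for illustration, not taken from the script.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Stack:
    # Hypothetical in-memory form of one stack; field names are illustrative.
    name: str
    books: list          # (title, owned) pairs, top of the stack first
    min_age_days: int    # minimum days between picks from this stack
    last_picked: date    # date of the most recent pick


def age_days(stack, today):
    """Days elapsed since a book was last picked from this stack."""
    return (today - stack.last_picked).days


def is_eligible(stack, today):
    """Apply the three tests: non-empty, top book owned, old enough."""
    if not stack.books:
        return False
    _title, owned = stack.books[0]
    if not owned:
        return False
    # Assumption: a stack exactly at its minimum age counts as eligible.
    return age_days(stack, today) >= stack.min_age_days
```

Only the top book's ownership matters here: an unowned title further down does not block the stack until it reaches the top.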
OK, so how do we choose among eligible stacks? Possibilities:

  1. Pick the "oldest" stack; the one for which it's been the longest time since a book from it was previously picked.
  2. Pick the highest stack, the one with the most titles therein. (Because it needs the most work, I guess.)
  3. Just pick a stack at random.
  4. Pick a random stack weighted by stack height. That is, any stack can be picked, but one with eight titles in it is twice as likely to be picked as one with four titles. (This was the algorithm used in the previous version.)
  5. Pick a random stack, weighted by age. That is, a stack that's 90 days old is twice as likely to be picked as a 45-day-old one.
  6. But what I'm doing is a combination of the last two: the stack-weighting function is the stack height times the stack age. So (for example) a 120-day-old stack with 5 titles is twice as likely to be picked as a 50-day-old stack with 6 titles, because 120 * 5 = 600 and 50 * 6 = 300. This is totally arbitrary, but it seems to work for me so far.
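To make the arithmetic in option 6 concrete, here is a small Python sketch (not the Perl script itself) that turns height-times-age weights into pick probabilities. The function names and the dictionary layout are made up for the example.

```python
from fractions import Fraction


def weight(height, age_days):
    """Option 6: stack height multiplied by stack age in days."""
    return height * age_days


def pick_probabilities(stacks):
    """Map each stack name to its chance of being picked.

    `stacks` maps a name to a (height, age_days) pair. At least one
    stack must have a positive weight, or the total would be zero.
    """
    weights = {name: weight(h, a) for name, (h, a) in stacks.items()}
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


# The example from the text: weights 600 and 300 out of a total of 900.
example = pick_probabilities({"older": (5, 120), "newer": (6, 50)})
```

With only those two stacks eligible, `example["older"]` is 2/3 and `example["newer"]` is 1/3, which is the two-to-one ratio described above.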

Here's my current take on scripting that.

Each stack is implemented as a comma-separated values (CSV) file, headerless, one line per book, each line containing two fields:

  1. The book title;
  2. Whether I own the book yet (1/0 = yes/no).

For example, here's the current content of moore.csv, containing the to-be-read books of Christopher Moore:

"The Serpent of Venice",1
"Secondhand Souls",1
Noir,0

I.e., three books, the first two owned, the third one, Noir, unpurchased as yet. (I'll get it someday, and edit the file to change the 0 to 1.)

[Added: in addition to 0/1, 'K' indicates that the book's Kindle version is owned. This is just a convenience in case I go looking for it long after actually buying it.]
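Reading a stack file in this format takes very little code. The sketch below uses Python's standard csv module in place of Perl's Text::CSV; the function names are hypothetical, and counting 'K' as "owned" is my reading of the note above rather than something confirmed by the script.

```python
import csv
import io


def read_stack(lines):
    """Parse headerless stack rows into (title, flag) pairs, top book first."""
    books = []
    for row in csv.reader(lines):
        title, flag = row  # exactly two fields per line are expected
        books.append((title, flag))
    return books


def top_is_owned(books):
    """True if the stack is non-empty and its top book is flagged 1 or K."""
    # Assumption: 'K' (Kindle copy owned) counts as owned.
    return bool(books) and books[0][1] in ("1", "K")


# The moore.csv content shown above, supplied as an in-memory file.
sample = io.StringIO('"The Serpent of Venice",1\n"Secondhand Souls",1\nNoir,0\n')
moore = read_stack(sample)
```

Here `moore[0]` is `("The Serpent of Venice", "1")`, so `top_is_owned(moore)` is true. The csv module returns every field as a string, which is why the flags are compared against "1" and "K".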

There is a "master" CSV file, stacks.csv. It has a header (for some reason that I forget). Each non-header line contains data for a single stack:

  1. The (nice human-readable) stack name;
  2. The stack ID (corresponding to the name of the stack file);
  3. The minimum time, in days, that should elapse between consecutive picks from that stack;
  4. The date when a book was most recently picked from the stack.

As I type, here's what it looks like:

name,id,minage,lastpicked
"Chronicles of Amber",amber,42,2018-04-15
"C.J. Box",box,30,2018-06-16
"Michael Connelly",connelly,30,2018-06-22
"Continental Op",continental_op,30,2018-06-09
"Conservative Lit 101",conservative_lit_101,60,2017-09-07
"Elmore Leonard",elmore,30,2018-06-28
"Dick Francis",francis,30,2018-04-20
"General Fiction",genfic,30,2018-06-13
"Steve Hamilton",hamilton,30,2018-04-29
"Robert A. Heinlein",heinlein,30,2018-06-19
Monkeewrench,monkeewrench,30,2018-05-28
"Christopher Moore",moore,30,2018-04-23
Mystery,mystery,30,2018-01-04
Non-Fiction,nonfic,30,2018-07-01
"Lee Child",reacher,30,2017-12-29
"Science Fiction",sci-fi,30,2018-05-30
Spenser,spenser,30,2017-05-01
"Don Winslow",winslow,30,2018-03-02

No comments from the peanut gallery about my lack of literary taste, please.
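For readers who want to experiment, the master file shown above can be read and turned into stack ages roughly like this. It is a Python sketch using the standard csv and datetime modules (the real script uses Text::CSV and Time::Piece); the column names come from the header line, and the function names are invented.

```python
import csv
import io
from datetime import date


def read_master(lines):
    """Parse stacks.csv rows (with header) into dicts with typed fields."""
    stacks = []
    for row in csv.DictReader(lines):
        stacks.append({
            "name": row["name"],
            "id": row["id"],
            "minage": int(row["minage"]),
            "lastpicked": date.fromisoformat(row["lastpicked"]),
        })
    return stacks


def age_days(stack, today):
    """Days since the stack's last pick."""
    return (today - stack["lastpicked"]).days


# Two rows copied from the listing above, supplied as an in-memory file.
sample = io.StringIO(
    "name,id,minage,lastpicked\n"
    '"Chronicles of Amber",amber,42,2018-04-15\n'
    "Spenser,spenser,30,2017-05-01\n"
)
stacks = read_master(sample)
```

Measured on 2018-07-03, the Amber stack is 79 days old and the Spenser stack is 428 days old, so both are past their minimum ages.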

Picking a random stack according to a weighting function isn't hard. I'd pseudocode the algorithm like this:

Given: N eligible stacks (indexed 0..N-1), with W[i] being the calculated weight of the i-th stack (assumed integer) …

Let T be the total weight, W[0] + W[1] + ⋯ + W[N-1]

Pick a random integer r between 0 and T-1, inclusive.

p = 0
while (r >= W[p])
     r -= W[p]
     p++

… and on loop exit p will index the stack picked.
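A runnable version of that pseudocode, as a Python sketch rather than the Perl in the script. The `rng` parameter is an addition for the example: it lets a test substitute a fixed "random" number.

```python
import random


def pick_index(weights, rng=random):
    """Return an index chosen with probability proportional to its weight.

    `weights` is a list of non-negative integers with a positive total.
    `rng` is anything with a randrange(n) method (hypothetical parameter).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("no stack has a positive weight")
    r = rng.randrange(total)  # integer in 0 .. total-1
    p = 0
    # Walk along the weights, consuming r, until it falls inside slot p.
    while r >= weights[p]:
        r -= weights[p]
        p += 1
    return p
```

With weights `[600, 300]`, any r from 0 through 599 selects index 0 and any r from 600 through 899 selects index 1, so the first stack is picked twice as often.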

To anticipate CS pedants: I know this is O(N) and using a binary search instead could make it O(log N). In practice, it's plenty fast enough. And other steps in the process are O(N) anyway.
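For the curious, the binary-search variant might look like the following Python sketch, built on the standard bisect and itertools modules. The function names are invented. Building the running totals is itself O(N), so this only pays off when many picks reuse the same weights, which is not how the picker is used.

```python
import bisect
import itertools


def build_cumulative(weights):
    """Running totals of the weights, e.g. [600, 300] -> [600, 900]."""
    return list(itertools.accumulate(weights))


def pick_index_bisect(cumulative, r):
    """Return the first index whose running total exceeds r.

    `r` is an integer in 0 .. cumulative[-1] - 1, drawn by the caller.
    """
    return bisect.bisect_right(cumulative, r)
```

For the same r, this returns the same index as the linear loop in the pseudocode, zero-weight entries included.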

Enough foreplay! The "picking" script, bookpicker, is here. Notes:

  • Specifying the -v "verbose" flag will output a list of each stack's pick-probabilities.

  • The Text::CSV Perl module is used for reading/writing CSV files. The Time::Piece and Time::Seconds modules are invaluable for doing the simple age calculations and comparisons.

  • Normally you just run the script with no arguments or options; output is the title and the name of the picked stack.

  • The user is responsible for maintaining the CSV files; no blank/duplicate lines, etc. I use My Favorite Editor (vim), but CSVs are also editable with Your Favorite Spreadsheet.

  • For the "picked" stack, the script rewrites the stack file with the picked title removed. The old stack is saved with a .old appended to the name. The stacks.csv file is also updated with today's date in the last-picked field for the picked stack.

  • The weighting function and random number generation are constrained to integer values; I think it would work without that, but who wants to worry about rounding errors? Not I.
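The file bookkeeping described in the notes above (pop the top title, keep a .old copy, stamp the pick date) can be sketched as follows. This is Python, not the Perl of bookpicker, and the function names `pop_top` and `mark_picked` are hypothetical.

```python
import csv
import shutil


def pop_top(stack_path):
    """Remove and return the top row of a stack file, keeping a .old backup."""
    with open(stack_path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        raise ValueError("stack is empty")
    shutil.copyfile(stack_path, stack_path + ".old")  # save the old stack
    with open(stack_path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows[1:])
    return rows[0]


def mark_picked(master_path, stack_id, today):
    """Set the lastpicked field of one stack in stacks.csv to today's date."""
    with open(master_path, newline="") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames
        rows = list(reader)
    for row in rows:
        if row["id"] == stack_id:
            row["lastpicked"] = today.isoformat()
    with open(master_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
```

One difference from a hand-edited file: Python's csv writer only quotes fields that need it, so a title like "Secondhand Souls" is written back without its quotation marks. The data is unchanged, only the quoting.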

I also have a couple of scripts to list out the contents of the to-be-read database.

  1. A script that produces plain text output (on stdout) is here.

  2. A script that produces an HTML page and displays it in my browser (Google Chrome) is here. It uses text color to signify eligible/ineligible stacks and owned/unowned books. Sample output (again, comments on my literary taste, or lack thereof, are not welcome) is here.

    The HTML::Template module is used to make output generation easier, and the template used for that is here.

    Getting it to show up in my browser is accomplished via chromix-too server/client/extension; if you don't have it, it's pretty easy to do something else instead.

Whew! I feel better getting this off my chest.


Last Modified 2019-11-11 7:26 AM EDT

URLs du Jour

2018-07-03


  • Proverbs 11:16 sounds like it could be the basis for a good blues song:

    16 A kindhearted woman gains honor,
        but ruthless men gain only wealth.

    This goes double if the kindhearted woman is named Ruth.

    But… only wealth? A lot of guys will take that deal.


  • At NRO, Robert Stein has A Modest Proposal for ‘Draining the Swamp’. Robert details, convincingly, how elected officials gallop to the watering trough once out of office.

    So here’s my suggestion: Once someone is elected to federal office — the House, Senate, or White House — they will get that office’s pay for life, guaranteed, plus inflation, no matter how soon they retire or how long they linger in office. However, all other income (except for withdrawals from previously accumulated retirement funds and Social Security) will be taxed at 100 percent.

    No speech fees, no lobbying, no consulting, no corporate boards, no book deals, no film deals, no university positions. No other jobs, either. Basically, no nothing. Unless, of course, you just want to work as a labor of love, in which case be my guest.

    I probably wouldn't go for that in practice—Stein admits he might not either—because, hey, it's a free country. But I'll admit that the "high eight figures" deal that Netflix gave the Obamas seems like little more than (as Robert puts it) a "postdated bribe".


  • At the Daily Signal, David Harsanyi espies The Next Phase of Our National Moral Panic:

    It looks as if the next phase of our ginned-up national moral panic will feature the public shunning and harassment of people we disagree with. And in a free country, even the pretend oppressed can kick imaginary Nazis out of their establishments, as we saw when the co-owner of The Red Hen in Lexington, Virginia, booted White House press secretary Sarah Huckabee Sanders from her restaurant.

    Certainly, politicos don’t deserve safe spaces from peaceful protest or confrontation. You want to make their lives miserable, humiliate them, and show everyone how principled and right-thinking you are? By all means, stop them from having those chimichangas. That’ll teach ’em.

    But don’t fool yourself into self-idealization. You’re no budding Martin Luther King. No matter what you think of President Donald Trump, you’re still an insufferable jerk. You’re just a member of a blindered tribalist mob, imbued with a false sense of certitude that allows you to justify incivility. That is to say, you’re like a Twitter troll made real.

    To repeat a Tyler Cowen quote from yesterday: "There is no better venue for politeness than commerce." Especially commerce conducted in public before witnesses.


  • Hey, it's almost the Fourth! Time to get prepared. Ira Stoll, at Reason, describes How the Declaration of Independence Explains Political News in 2018.

    The founders of the United States of America didn't just declare independence from Great Britain. They wrote a statement explaining their reasoning. Two-hundred-and-forty-two years later, we're navigating some of the same issues.

    President Trump's immigration crackdown? The Declaration of Independence complained that King George III "has endeavoured (sic) to prevent the population of these States; for that purpose obstructing the Laws for Naturalization of Foreigners; refusing to pass others to encourage their migrations hither."

    President Trump's tariff threats and the risks they may pose to international trade? The Declaration of Independence had faulted George III "for cutting off our Trade with all parts of the world."

    President Trump's encouraging Justice Anthony Kennedy to resign so Trump could reshape the Supreme Court? The Declaration criticized George III for having "made Judges dependent on his Will alone, for the tenure of their offices."

    OK, that last one was kind of a stretch, Ira.


  • Charles C. W. Cooke has our Tweet du Jour:

    I've quoted Jonah Goldberg on this before: we "like our Constitution like our beef jerky — cold, dead, tough to chew through."


  • And you'll want to take Mark J. Perry's Carpe Diem quiz on the Declaration of Independence. I got 11 out of 14, thanks to some semi-educated lucky guesses. See how you do.


Last Modified 2024-01-25 9:20 AM EDT