flwyd: (daemon tux hexley)
It's October, which means I'm now allowing myself to start scheming about Advent of Code 2024. The last four years I've taken it as an opportunity to learn a new language. This year for some reason I'm inspired to learn a stack-oriented language, so I've been playing with PostScript. The world has generally moved on from PostScript: it gave birth to the wildly popular PDF format, which dropped the "anything goes in this Turing-complete document presentation language" attitude, and even the latest version of macOS Preview.app can't read PostScript files. But if you set aside the fact that a majority of the language builtins are designed for working with graphics, text, fonts, and printer hardware, it's a pretty slick language. You've basically got two things available: an operand stack holding data, and a dictionary stack for dynamic name lookup. Both the programmer and the computer use essentially the same model for evaluating code: read a token; if it's data, push it on the stack; if it's executable, run it, reading arguments from the operand stack and pushing results back onto the same stack.

Programming language tutorials, particularly for functional languages, have a strong affinity for calculating the n'th Fibonacci number, but let's generate the whole sequence by just calling the same "function" over and over again (== prints the source representation of the object on top of the stack, and GS> is the Ghostscript prompt):
GS> /fib { 2 copy add } bind def
GS> [ 1 1 fib fib fib fib ] ==
[1 1 2 3 5 8]
GS> [ 1 1 15 { fib } repeat ] ==
[1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597]

Stack-based languages can have pretty simple syntax rules. That [ ] array syntax isn't "parse elements between the brackets as an array." Instead, the [ operator pushes a mark onto the stack and the ] operator builds an array with all the items on the stack until the first mark. Everything between the brackets is just normal PostScript code being executed on the stack. The following code builds an array of increasingly nested empty arrays by copying the whole array so far:
GS> [ 5 { counttomark [ exch 1 add copy pop ] } repeat ] ==
[[] [[]] [[] [[]]] [[] [[]] [[] [[]]]] [[] [[]] [[] [[]]] [[] [[]] [[] [[]]]]]]
% Or if you prefer some non-empty values to help read
GS> [ 42 5 { counttomark [ exch 1 add copy pop ] } repeat ] ==
[42 [42] [42 [42]] [42 [42] [42 [42]]] [42 [42] [42 [42]] [42 [42] [42 [42]]]] [42 [42] [42 [42]] [42 [42] [42 [42]]] [42 [42] [42 [42]] [42 [42] [42 [42]]]]]]

I've heard that some functional programming devotees might think of the non-negative numbers as a sequence of nested empty lists…

In PostScript, procedures are just arrays with the executable bit set, and you can construct and manipulate them as arrays. {} is syntax for an executable array, [] is syntax for a regular array, and cvx converts its operand to executable:
GS> { (hello world\n) print } exec
hello world
GS> % same thing:
GS> [(hello world\n) (print) cvx] cvx exec
hello world

Since a procedure is just a mutable array of things to do, you can replace things in that array. Here's a self-modifying procedure that increments the first value inside itself, similar to a static int in a function in C:
GS> /selfmod { 1 (caller #) print == currentdict /selfmod get dup 0 get 1 add 0 exch put } bind def
GS> selfmod
caller #1
GS> selfmod
caller #2
GS> selfmod
caller #3

Page 115 of Thinking in PostScript defines a procedure by name including some initialization code; it then redefines itself without the initialization code the first time it's run. (The fact that the book is a PDF file instead of a PostScript one says something about the status of PostScript in the modern world.)

While PostScript has a couple hundred operators that are part of the language, a lot of them are for doing things like drawing on the page and selecting fonts. The parts of the "standard library" for working with strings and arrays are pretty spartan: even concatstrings is a Ghostscript addition, and I think people copy/pasted their own concatenation functions all over the place for years. Since my Advent of Code interests don't include "Let's re-implement parsing a variable-width text file into fixed-size mutable strings each night," I've been getting some practice with the language by writing some core library functions that I'm certain will be used frequently.

I was about to work on the functional programming standbys like map and reduce, but I realized that when arrays are constructed by "run all the code between brackets, then move everything from the stack between those brackets into an array," a separate function for map would just be extra noise: the basic iteration operator does the trick. Here's the PostScript equivalent of myarray.map { x -> x + 1 } and mydict.mapValues { x -> x * x }.
% Add 1 to each element of an array:
[ myarray { 1 add } forall ]
% Create a dictionary with the same keys as mydict, but square the values:
<< mydict { dup mul } forall >>

reduce (or fold, depending on your dialect) is similarly straightforward if you supply the accumulator yourself rather than seeding it with the first array value. Here's myarray.reduce(0) { x, acc -> x + acc } (sum) and an implementation of myarray.all { x -> x % 2 == 0 } (are all items even?).
0 myarray { add } forall
true myarray { 2 mod 0 eq and } forall

None of these need special handling for empty collections. They also work for iterating through the bytes of a string (though there doesn't seem to be a short way to construct a string from an array of bytes). The all implementation becomes any by switching true to false and and to or. all becomes none by adding not before the and. Short-circuiting can be added with dup not { exit } if after the and (or dup { exit } if after the or).
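For comparison, here's the same fold-shaped all/any pattern sketched in Go, with an early break playing the role of exit inside forall (the function and variable names are mine, not from any library):

```go
package main

import "fmt"

// all mirrors the PostScript `true myarray { 2 mod 0 eq and } forall`
// pattern: fold a boolean accumulator over the items, bailing out early
// (the `dup not { exit } if` trick) once the answer can no longer change.
func all(items []int, pred func(int) bool) bool {
	acc := true
	for _, x := range items {
		acc = acc && pred(x)
		if !acc { // short-circuit, like exiting the forall loop
			break
		}
	}
	return acc
}

// any is the same fold seeded with false and combined with or.
func any(items []int, pred func(int) bool) bool {
	acc := false
	for _, x := range items {
		acc = acc || pred(x)
		if acc {
			break
		}
	}
	return acc
}

func main() {
	even := func(x int) bool { return x%2 == 0 }
	fmt.Println(all([]int{2, 4, 6}, even), all([]int{2, 3}, even)) // true false
	fmt.Println(any([]int{1, 3, 4}, even), any([]int{1, 3}, even)) // true false
}
```

Both functions return the seed value for an empty slice, matching the "no special handling for empty collections" property above.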

Thunderdome API

Friday, June 30th, 2023 07:26 pm
flwyd: Go gopher (go gopher mascot)
// Compares a to b, returns the larger number.
math.max(a, b) {
  return a > b ? a : b
}

// a and b fight a cage match, returns the worthier number.
mad.max(a, b) {
  return a ⚔️ b
}
flwyd: Go gopher (go gopher mascot)
I got involved in amateur radio in 2021 in part because it was a hobby I could do during pandemic stay-at-home periods that didn't involve staring at a computer screen. Getting a home station set up that's capable of transmitting on HF bands is kind of complicated, and involves making significant decisions like attaching masts to the roof and drilling holes through a wall. Fortunately, hams have come up with ways to encourage radio operation outdoors, with a temporary station setup. Parks on the Air awards fake Internet points to hams who set up in a state or national park and make contact with other amateur radio operators. I've found POTA to be a great motivation for getting out of the house and exploring some interesting places, while also improving radio skills and experimenting with emergency communication setup options.

After "activating" a park, the operator submits a log file of all their contacts to the Parks on the Air website so everyone can get credit. These log files use the ADIF format, commonly used for ham radio log interchange. ADIF uses a fairly simple format that predates general-purpose schemes like XML and JSON. Fields are encoded with a name, length, optional type, and value like <CALL:5:S>WT0RJ <NAME:6>Trevor <EOR>. These files are reasonably easy to read, but can be tricky to write by hand, since counting the number of characters in a string that's longer than about 7 is easy to get wrong. Hams typically use a logging program which can export an ADIF file for upload. But I don't really want to take a laptop on a camping trip, so I keep my Parks on the Air logs on paper in a notebook. Rather than transcribe that log directly into ADIF format I decided to enter them in Google Sheets, which makes it easy to fill in repeated fields like my park identifier, state, and the radio frequency where I called CQ. I then export the sheet to a CSV file.

There are a handful of CSV-to-ADIF converters that folks have published, but I wasn't particularly impressed by any of them. So of course I decided to make my own open source program. During the busy summer period that conversion was all it could do. Then after spending November organizing books and not spreading COVID and December staying up late writing Advent of Code, I was motivated to spend some time outside. Since Colorado got a lot of snow and cold air in January, "Go for a long hike" wasn't very attractive, but "brush the snow off the end of a picnic table and make ham radio contacts" was totally feasible. (Also "operate from the truck when the wind is too cold".) And since I wasn't camping, I had plenty of computer time to add more logging features in the evening, with park activations each weekend to provide insight on feature usefulness.

I decided to call the program ADIF Multitool, a "Swiss Army knife" that combines lots of single-use log processing tools into a convenient package. It's a command-line tool, written in the Go programming language, following the Unix pipeline philosophy. Several adifmt commands can be chained together; each prints a log file to standard output which is then consumed as standard input by the next command. My latest POTA CSV file had just the fields that varied for each contact; I then added all the repetitive details about my own station, fixed data formats, inferred some field values from others, ensured all fields matched the specification, and saved it as an ADI file with this pipeline:
adifmt edit --remove-blank --add qso_date=20230225 --add submode=USB \
    --add my_pota_ref=K-1213 --add my_lat=39.644388 --add my_lon=-104.846254 \
    --add my_state=CO --add my_country="United States of America" \
    --add operator=WT0RJ mylog.csv \
  | adifmt fix \
  | adifmt infer --fields station_callsign,mode,band \
    --fields my_gridsquare,my_gridsquare_ext \
    --fields pota_ref,sota_ref,my_sig_info,dxcc,my_dxcc \
  | adifmt validate \
  | adifmt save --overwrite-existing ~/WT0RJ@K-1213-20230225.adi

Using Go for this tool has been a fairly positive experience. Go's I/O APIs provided useful levels of abstraction for handling the ADIF spec while also making it easy to provide my own fixed-string "filesystem" in test cases. Go strings are byte arrays, which is great for working with data interchange formats, while also providing robust Unicode support, which is great for working with user-provided text. The lightweight type system made it pretty easy to implement command line options like "a repeatable key=value flag that collects into a map." The field-tag approach to marshaling and unmarshaling XML and JSON took some mental stretching to get used to (the simplistic examples in the package documentation contribute to this), but in practice it's quite nice and avoids the need to interact with a SAX or DOM parser. One of my main motivations for picking Go was the ability to generate code, which allowed me to convert the XML version of the ADIF specification into Go code for every data type, field, and enumeration value; when the next version of the spec is released I'll just need to run go generate again to pick up all of the additions and changes. Another expected benefit that I haven't tried yet: I can build releases for Windows and other operating systems I don't have access to, and distribute them without worrying that users might not have the latest version of Python or Java installed.

Advent of Elixir

Thursday, December 29th, 2022 12:18 am
flwyd: (java logo)
Another December, another adventure with Advent of Code (previously: 2021, 2020). This year I used it as an opportunity to learn Elixir. Since an elixir is (per Wikipedia) "a sweet liquid used for medical purposes, to be taken orally and intended to cure one's illness," I decided to write reflections each day about something interesting about Elixir the language and about some sort of elixir or other beverage.

Elixir is built on top of the Erlang programming language and runtime system. Erlang was designed by Ericsson for running large-scale telephone networks, so the language goals include distributed communication, high availability/uptime, fault tolerance, and avoidance of data corruption. I had heard that Erlang was nice for building concurrent software but had a reputation for confusing syntax, so I was interested to learn about Elixir, which was created by a notable member of the Ruby on Rails community and brought a syntax and coding style that's more familiar to developers who've used "functional style" in modern imperative languages. Some of the notable features of Elixir (via Erlang) are:
  • All data structures are immutable
  • Functions are dispatched and assignments are made based on pattern matching against the structure of values
  • Code is generally written in a functional programming style


Elements of functional programming have been coming to mainstream languages like JavaScript, Java, Python, and Ruby over the last decade and change, largely around handling collections of data. Stepping through an array by incrementing indices and modifying a result is falling out of vogue in favor of constructs like items.map(item => item.price).filter(price => price > 0).reduce((accumulator, price) => accumulator + price) (and convenience methods like sum() to avoid writing common reduce functions). Although I'm quite familiar with this style, I've always used it within the context of an imperative language, where I can write a linear function flow and drop into functional style for processing collections, then imperatively act on the results. Writing the whole program in a functional approach required switching my brain a bit, particularly since the Advent of Code problems are often specified by explaining a series of instructions which map more literally to an imperative program than a functional one. I also noticed that functional and recursive thinking is more challenging when I'm sleep deprived after a few weeks of late night programming :-)

Structural pattern matching is a very interesting approach, and allows logic flows to be written without a lot of if-else statements. Elixir's main data types are (with some simplification) numbers, strings, atoms, tuples, (singly-linked) lists, and maps. Pattern matching can be used to declare different "function heads", like this example which computes the absolute value:
def abs(x) when x < 0, do: -1 * x
def abs(x), do: x
or this one which computes the sum of distances of a list of 2D points from the origin:
# base case: sum of an empty list is zero
def distance_sum([]), do: 0
# sum of a list is distance of the first point plus the sum of the rest of the list
def distance_sum([{x, y} | tail]), do: abs(x) + abs(y) + distance_sum(tail)
By default, Elixir programs crash if there isn't a structural match for a value that's being matched, which gave me some confidence when writing code that makes assumptions about the Advent of Code input: "if it doesn't look like one of the cases I've covered, the program will crash and it will be very obvious" rather than "who knows what will happen if I get unexpected input." Since pattern matching works on fixed-length string prefixes, I was able to get through more than two weeks of AoC without using any regular expressions, including some neat string pattern matching logic like this.

Immutable-only data structures were generally nice to work with, and avoid a whole category of bugs related to modifying things at the wrong time, or trying to treat an object as both the new state and the old one (common for Advent of Code problems where you're moving a bunch of things around one step at a time). The downside of immutable data structures is that using something like a cache isn't particularly straightforward. The general functional programming approach to a cache is a function which "adds" to a map by creating a copy of the old map with the new value added, passing that new cache value to each recursive call. But since later calls may add even more items to the cache (for example in a depth-first traversal of a graph), the recursive functions also need to return their modified cache, which ends up with a lot of bookkeeping. I also discovered that (no great surprise) creating a new map a million times with a million items in it generates a lot of memory pressure, slowing down runtime. Fortunately Erlang offers a concurrent "mutable" table with efficient access which I discussed for day 16.

The other occasion where immutability became a challenge was when I built a doubly-linked list as a circular buffer using an Agent to keep each node's mutable state. Elixir's normally speedy execution didn't do well with this design, taking about a minute to run. Interestingly, I didn't realize the problem description had an unstated assumption about how looping is handled, so while I was racking my brain to find the bug I implemented the same algorithm in Go using mutable data structures. The Go version ran orders of magnitude faster, suggesting there's a significant cost for communicating between Erlang processes (even though it's significantly cheaper than OS-level inter-process communication would be).
I haven't yet taken the time to implement an Elixir version without Agents which recreates the whole circular list each time, or a Go version which uses goroutines to mimic the Agent behavior to compare concurrency overhead.

Overall I was pretty happy with Elixir as a language. The standard library is somewhat spartan (there's no priority queue, for example), but you can get a lot of mileage from Enum and the other core modules. The documentation is pretty good, though I did get caught up in one case where an example used low-valued integers and indices together and I confused the argument order. I'm still getting used to all functions being defined on modules rather than being able to call "methods" on "objects" (so Map.update(my_map, some_key, new_value) rather than my_map.update(some_key, new_value)), but Elixir's pipeline syntax makes up for it: input |> Enum.map(some_func) |> Enum.map(other_func) |> Enum.sum(). In contrast to Raku, I hardly spent any time debugging code where my assumption of what it was doing didn't match what the language was actually doing; such cases were mostly caught by the compiler or at runtime by a failed structural match. Compile times are generally under one second, as was runtime for most solutions, so much so that I would worry I'd hit an infinite loop if my program ran for more than five seconds. This strikes a good balance for rapid development in a contest-like environment.

Going into the month, one of my main goals was to go to bed at a reasonable hour and not get burnt out. It turns out that "get a working solution, then clean up the code, come up with a brewing or beverage metaphor, then write a few paragraphs about an interesting facet of the program, then post it all to Reddit" wasn't a very effective way of reliably going to bed. In the first week or so, when the problems are pretty easy, I would get all excited and stay up past 2am reading and chatting about other people's solutions. So when it came time for the hard problems in the third week, I was already running on not enough sleep, and sometimes stayed up past three (my worst was a 5am bedtime), only to spend half the next day at work thinking about other approaches to the problem. Fortunately I didn't have a major breakdown at the end of the cycle like I did last year, partly because there wasn't anything as frustrating as 2021 day 24 and partly because I didn't have extra pent-up frustration with the programming language.
flwyd: (java logo)
Advent of Code is an annual programming challenge. Every night at midnight EST from December 1st through 25th, a new problem is posted to the website, along with a participant-specific input file. Participants can use any programming language they want to solve the problem, using their input, and see if the output of their program matches the expected value. Once that's done, a second part of the problem becomes available. Part 2 uses the same input file and is generally a variation on the first part, such as changing the goal from "find the maximum" to "find the sum of all" or increasing the number of times an operation needs to be performed so that an inefficient algorithm will be far too slow. Problems are usually straightforward at the beginning of the month and get more challenging as the month progresses. There's a stats page measuring time-to-completion and a top-100 leaderboard, but no prizes; many folks pursue their own goals like learning a new language or minimizing program runtime which are somewhat at odds with quick finishes.

After having lots of fun and learning Kotlin with Advent of Code in December 2020, I decided to use Raku for this past year's edition (with code shared on GitHub). Raku is the new name of Perl 6, a "member of the Perl family of languages" which famously took a decade and a half of community experimentation before a finalized version was released. Perl, in turn, is a language originally focused on working with collections of text files which is famously easy to write or hard to read, depending on who you ask. Raku keeps many of Perl's basic syntactic elements—like scalar, array, and hash variables identified by $, @, and % sigils—but also brings to bear a lot of modern programming language developments like class-based object orientation, concurrency, type constraints, multi-dispatch, and just-in-time compilation. Raku is also what one might call Unicode-forward. Most languages these days allow Unicode literals in strings, and most languages made since the '90s allow Unicode letters in program identifiers. Raku takes this significantly further. First, Raku strings are sequences of graphemes rather than just bytes or Unicode code points, so both the single-code-point and the "combining diacritics" variants of "é" are identical in Raku. Second, Unicode characters aren't just limited to letters: the language and standard library provide $thing ∈ ($set1 ∪ $set2) for set operations, @words».uc to upper-case each element of an array, ⚛++ for atomic-increment, 25 ** ½ as another way to perform a square root, and quoting strings with arbitrary characters like say q༺hello world༻.uc. Additionally, Raku took one of Perl's big selling points in the '80s and '90s (terse and powerful regular expressions) and evolved them into grammars which are easier to read, modify, and compose, and likely also faster. 
This grammar support is what drew my interest to Raku, since I started a hobby project that involves parsing a small amount of information from source code written in a large number of languages, and the ability to quickly write but still maintain textual matchers would make that project more pleasant.

One of the driving principles in the design of Raku (and Perl before it) is There's more than one way to do it (TMTOWTDI). Another principle (repeated frequently by people who post in help forums, it seems) is that programming in Raku should be fun. Several times while working on an Advent of Code solution I tried to do something in a way that looked both elegant and reasonable, only to find out that There's More Than One Way To Do It, But Some Of Those Ways Are Wrong. (Furthermore, since TMTOWTDI, the documentation usually doesn't say "This isn't a good tool for X, use Y instead.") For example, I spent at least half an hour trying to understand why my memoization code didn't seem to be caching my Pair objects, despite all my experiments with the interactive interpreter looking like things should work just fine. It turns out that foo => 42 creates a Pair which uses value equality for both key and value, but my $key = 'foo'; my $val = 42; $key => $val creates a Pair that holds on to the $val scalar (but not the key scalar) and thus uses (half) reference equality. The documentation explains this behavior in an aside and explains that .freeze will result in an immutable Pair, but it's easy to encounter Pairs in documentation that doesn't mention that, and the half hour of WTF was not at all fun. (Another Way To Do It would be implementing a 2-value tuple class myself, which wouldn't have added fancypants scalar handling.) This discovery also reduced my confidence in the language a bit: when I look at a block of code, am I sure that my variables work as values, or might something else hold on to this funny scalar reference?

Another un-fun discovery was that for @first ∩ @second { … } doesn't iterate through the shared elements of two arrays, but instead iterates through Pairs, where the value is always True. I was aware that many set implementations are implemented as a hash table where the keys are set elements and the values are a placeholder value like a boolean. But most languages hide this implementation detail and present a set as akin to an unordered list which doesn't allow duplicate values. The workaround is easy (call .keys on the set when iterating over it), and it provides a nice symmetry with Bags (multisets) and Mixes (weighted sets), but it was still a big surprise. This was made worse by Raku's gradual typing discipline and implicit conversions; I think I was putting the set elements into a Hash, which converts keys to strings by default, so rather than a compile or runtime error complaining that I was using a Pair where a Str was expected I got a Hash with keys like foo<tab>True rather than just foo. Iteration and automatic conversion also combine in un-fun ways because the base class for most objects implements all the List methods by converting non-iterable objects into single-element lists. I would find it a lot more fun if the Raku compiler would tell me that sub answer(--> Int) { 42 } ; for answer() { … } was attempting to iterate over a non-iterable (maybe I changed answer from a List to an Int and forgot to update all the callers) rather than silently iterating over a single element. This annoyance is compounded by the fact that scalar references to iterable types (like a sequence, list, or array) are turned into single-element lists in this context, so changing my $x = nums => (1, 2, 3, 4); .say for $x.value (which prints four lines with one number each) to my $x = (1, 2, 3, 4); .say for $x changes the output to print a single line with four numbers and a pair of parentheses. 
This makes changing the shape of data structures while developing a program (like adding or removing a wrapper class) create surprising effects that aren't caught by the compiler. And maybe it's just me, but I think programming is more fun when the computer quickly tells you when you make a mistake, rather than losing an hour of sleep because you were debugging code that looked reasonable.

Working in Raku did have some fun elements. Contrary to my complaints about automatic conversions for container types, the ability to seamlessly work with numbers parsed from text was pretty nice for Advent of Code problems. Having a Complex number type that works as a hash key made several 2D grid traversal problems fairly convenient. And I even got to leverage Raku's nice Unicode property support on a problem about open/close punctuation balancing. I opted to use Raku this year as a way to try out grammars, and they were generally nice. They're way more readable than Perl regular expressions and the "Perl-compatible" regex implementations that have come along in the past three decades, and the ability to supply an Actions class to parse the relevant data from a match and bubble it up makes working with the result of a complex parse much nicer. Grammars are generally overkill for AoC inputs, but I found the structure pretty nice. The one downside to Raku grammars was the lack of helpful information when a parse failed. Unlike a typical programming language compiler that outputs the line and column where input didn't match expectations, a failed grammar parse just returns null, even though Raku has a Failure type that allows conveying additional information. So when my parsing rules were wrong I generally had to put output statements in the Action class and manually inspect what the next token should've been.

In contrast to last year, when I mostly focused on implementing the code and learning the language, I spent a lot of time on the social side of AoC this year. Last year I participated a bit in the Reddit community to provide hints to folks who were stuck. This year, after solving the problem and then participating in Google's group chat about the day's challenge, I frequently spent a couple hours on Reddit, reading through the "solutions megathread" and checking out people's visualizations. This meant a lot of 3am bedtimes this December, one late night following another. Coupled with trying to actually get some work done, I spent far too much time staring at a screen this December. Also, unlike 2020, there were social reasons to leave the house this year—friends were amused that I was programming in Vim over SSH from a smartphone, semi-drunk at a holiday party, while also chatting with folks. (There were just a couple small bugs by the time I got home, and Vim's terse modal editing proved to be a nicer phone-based development environment than I'd expected.)

Three weeks of late nights—including an all-nighter for day 19 because I'd incorrectly assumed that rotations in 3D were commutative—definitely caught up with me. I was pretty burned out by the time I got stuck and went to bed on day 23, which is kind of an interesting problem with a lot of little fussy ways to introduce bugs. I was extra toasty on day 24 (the night leading into Christmas Eve) when I discovered that what I thought would be a reasonable solution—a modified binary search—didn't work at all, because in the possible solution space of about 20 trillion numbers, fewer than ten are even potentially the right answer. This fact about the input file wasn't at all clear from the problem statement, and the frustration was intensified by the fact that (contrary to every other AoC problem I've seen) there was no realistic example and expected output to experiment with. The fact that a seemingly reasonable solution can run for hours without providing any insight about the problem (other than "valid values are sparse"), and that (as far as I can tell) an efficient solution to this otherwise NP-complete problem requires making assumptions based on a single input file, pretty much soured me on what had otherwise been an enjoyable month. That one day's problem (following a couple late nights) made me strongly question ever participating in "live" (i.e. during December) AoC again, which isn't a good feeling to have on Christmas Eve.

Raku's final un-fun factor played a role in this burnout too: slow execution speed. I'd seen folks who hang out in Raku help communities warn that Raku performance isn't great, but I figured it would be fine for Advent of Code, which has lots of participants working in languages like Python and more tortured environments like Bash or Google Sheets. But on days 19, 23, and 24 I discovered that my Raku code would spend tens of minutes running on the example input before producing a wrong answer, which is not a good situation in an "implement a program before bedtime" challenge. To more quickly test hypotheses and spot bugs, I reimplemented those days in Go. The Go language is far more verbose and has many fewer features than Raku, but I could implement a Go solution and try it five times in the time it would take to run my Raku code twice. My day 19 solution in Go—using the same algorithm and only slightly optimized code paths—was about a hundred times faster than the Raku implementation; I recall noticing that one Go run took 45 seconds while Raku took 45 minutes. I spent more time optimizing the runtime of the day 23 solution (due to some discussion in the group chat at work) and ended up with a 2.5 second solution in Go and a 68 minute solution in Raku. I even spent some time with the Raku profiler (which amusingly produced about a gigabyte of output for 45 seconds of runtime and had to be analyzed with sqlite because the HTML profiler output crashes any browser tab) and was only able to get a maximum 10% speedup after tweaking all of the hot code paths under my control. Two orders of magnitude in runtime is difficult to make up with even the most amazing language expressiveness.

Advent of Kotlin

Saturday, December 26th, 2020 07:14 pm
flwyd: (java logo)
Each December for the past several years, Advent of Code has presented a series of 25 daily programming challenges, with each problem adding to a Christmas-themed narrative. I think I'd seen references to AoC in the past but hadn't paid it any mind. This year, my team at work is evaluating Kotlin for adoption in our Android Java codebase, so a small daily excuse to get experience with the language seemed promising. Plus, there's a global pandemic, so it's not like I've got any holiday parties to attend.

The event was more fun than I'd anticipated. Challenges are released at midnight America/New_York each night, and there's a time-to-completion leaderboard, so there's a competitive aspect to get the juices flowing. This wasn't great for health, though—on a couple nights I started programming at 10pm America/Denver while already tired and didn't go to bed until 3am, whether because I was too sleep deprived to effectively debug or because I was having fun giving hints on the contest's subreddit. Mostly it was fun because the problems are small enough to do in one sitting and often involve an interesting algorithm. Lots of participants give themselves an additional challenge, like using a different programming language each day or using an unusual or challenging language—I saw someone posting solutions in the m4 macro language and some folks using Excel. Lots of folks create visualizations of their algorithm solving the problem; several of this year's challenges were based on Conway's Game of Life, which naturally lends itself to interesting visualizations.

My experience with Kotlin was a bit mixed. Kotlin is a programming language designed to run on the Java Virtual Machine and play well with Java code, but with a more expressive syntax and some features informed by two decades of programming language evolution since Java came into the world. It is perhaps most widely used in the Android ecosystem where some of its features help cover for poor Android framework design and API choices and where its coroutine concurrency model is a better fit for client application programming than Java is. Kotlin can also run in JavaScript and iOS environments, offering a hope of cross-platform shared logic. I've seen enough cross-platform efforts fail to be widely adopted to be skeptical on this front, though.

Using Kotlin for Advent of Code offered several benefits over Java. First, the heavy type inference and reduced repetition and boilerplate cut down the number of symbols that had to be typed, which is nice for short programs, particularly in a contest with Fake Internet Points for programming quickly. The standard library provides a lot of handy utilities like ranges, a typed Pair class, and check/require (functions which concisely throw an exception if the program is in an invalid state), for which Java needs a library like Guava. when blocks were also handy in many AoC puzzles, and a lot friendlier than a chain of if/else conditions. Kotlin's fluent collection transformations (filter, map, sum, and friends) feel a little more expressive than Java Streams, and I found multiple occasions where "potentially infinite sequences" were helpful. Coroutines (which power sequences) are, I think, Kotlin's biggest selling point, and while most Advent of Code problems don't particularly benefit from concurrency, I found yielding values from a recursive function easier to implement and reason about than accumulating a list that gets returned up the chain.
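To illustrate that last point, here's a small sketch (the tree type and names are mine, not from any particular puzzle): a recursive function can yield results lazily through a sequence builder instead of threading an accumulator list through every recursive call.

```kotlin
// Sketch: enumerate all paths in a small tree, yielding each path as it's
// found rather than building and merging lists in each recursive call.
data class Node(val name: String, val children: List<Node> = emptyList())

fun paths(node: Node, prefix: String = ""): Sequence<String> = sequence {
    val here = if (prefix.isEmpty()) node.name else "$prefix/${node.name}"
    yield(here)
    for (child in node.children) {
        yieldAll(paths(child, here))
    }
}

fun main() {
    val tree = Node("root", listOf(Node("a", listOf(Node("b"))), Node("c")))
    println(paths(tree).toList())  // [root, root/a, root/a/b, root/c]
}
```

Because the sequence is lazy, a caller that only needs the first match can stop consuming without the recursion ever visiting the rest of the tree.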

I'm not entirely won over on Kotlin, though. My first gripe is that the language is at risk of falling into the C++ and Perl trap wherein the language provides multiple ways to do very similar things, so two pieces of code which do the same thing can look very different. This in turn can create a cognitive impediment when reading code written by a different programmer or team. One example of this is the distinction between properties and no-arg methods. In Kotlin, one writes list.size as a property but list.isEmpty() as a method, and I've been unable to find guidance on when to use one rather than the other for read-only state.
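To make the gripe concrete, here's a toy class of my own invention (not from any library) where both styles compile and behave identically; nothing in the language steers you toward one or the other:

```kotlin
// A read-only count can be exposed either way; the call sites differ
// only by a pair of parentheses.
class Bag(private val items: List<Int>) {
    val size: Int                      // property style, like List.size
        get() = items.size
    fun isEmpty(): Boolean =           // method style, like List.isEmpty()
        items.isEmpty()
}

fun main() {
    val bag = Bag(listOf(1, 2, 3))
    println(bag.size)       // 3
    println(bag.isEmpty())  // false
}
```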

Second, one of Kotlin's selling points is nicer handling of nulls, since nullability is part of a type definition (String? is nullable, String is not). This is handy, and reduces boilerplate, particularly with null-happy APIs like Android. But it also means the compiler forces you to handle null cases which you know semantically can't occur, such as calling .max() on a collection that you know is not empty. This leads to a proliferation of method name pairs, one of which throws an exception and one of which returns null (elementAt/elementAtOrNull/elementAtOrDefault, getValue/get/getOrDefault, maxBy/maxByOrNull, maxWith/maxWithOrNull…). This also isn't entirely consistent within the standard library: list[5] throws an exception if the list has fewer than six elements, but map[5] returns null if that key is not present. The need for "OrDefault" method variants also seems a bit odd when the language also provides the Elvis operator (?:) for null-coalescing.
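A short sketch of both halves of this complaint (the values here are my own examples):

```kotlin
fun main() {
    val scores = listOf(3, 1, 4)
    // We know scores is non-empty, but maxOrNull's return type is Int?,
    // so the compiler makes us handle null anyway; the Elvis operator
    // supplies a fallback.
    val best = scores.maxOrNull() ?: 0
    println(best)  // 4

    // The indexing inconsistency: map indexing has "OrNull" semantics...
    val byKey = mapOf("a" to 1)
    println(byKey["b"])  // null
    // ...but listOf(1, 2, 3)[5] would throw IndexOutOfBoundsException.
}
```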

Third, the impression that Kotlin is basically Java with nicer syntax can lead to unpleasant surprises when the Kotlin standard library has a slightly different implementation from a similar method in Java. For example, in Java, String.split with an empty argument returns an array of one-character strings: "cake".split("") is the same as new String[] {"c", "a", "k", "e"}. The same behavior holds true in JavaScript, Python, Perl, and perhaps dates back to AWK. Kotlin, on the other hand, returns a list with empty strings at the beginning and end: "cake".split("") is the same as listOf("", "c", "a", "k", "e", ""). What's worse, the behavior of splitting on an empty string or pattern is not documented in Kotlin, so I don't know if it's a bug or an intentional choice.
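A quick demonstration of the Kotlin side, along with two idiomatic ways to get the Java/Python/Perl behavior instead:

```kotlin
fun main() {
    // Kotlin's split on an empty delimiter includes empty strings at
    // both ends:
    println("cake".split(""))  // [, c, a, k, e, ]

    // Two ways to get an array-of-one-character-strings instead:
    println("cake".map { it.toString() })  // [c, a, k, e]
    println("cake".chunked(1))             // [c, a, k, e]
}
```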

This brings up another of my Kotlin complaints: documentation. There are plenty of valid complaints about Java's verbosity, but the clarity and completeness of Javadoc in the Java world is wonderful. I very rarely have to read the code in the JDK or a widely-used library to understand how it will handle a particular input. (The same cannot be said for Ruby, for example.) Kotlin seems to prefer more terse documentation and rarely gives sample code, so you're often left to figure it out yourself, experimentally. The Kotlin web interface for API documentation also has some notable room for improvement, like proper handling of "Open in new tab" clicks.

My final Kotlin complaint that cropped up during Advent of Code is a sneaky one. One of Kotlin's neat features is extension methods: you can define a method on a type defined by someone else, like operator fun Pair<Int, Int>.plus(other: Pair<Int, Int>) = Pair(first + other.first, second + other.second). This can help the readability of code with several steps by chaining all method calls from top to bottom, whereas Java would end up with a mix of static method calls wrapped around method chains. This feature, however, comes with a major downside: extension methods are resolved statically against the declared type of the receiver. They are not dispatched dynamically, despite having syntax identical to dynamically dispatched methods. A concrete example I ran into: a function which checks the neighboring cells of a 2-D grid used the following code:
fun checkNeighbors(x: Int, y: Int) {
  for (i in (x-1..x+1).intersect(0 until height)) {
    for (j in (y-1..y+1).intersect(0 until width)) {
      // do something with grid[i][j]
    }
  }
}

This expresses "go through all the cells from above left to below right while staying inside the grid bounds" by using the intersection of pairs of ranges. Unfortunately, this is an O(n^2) algorithm because intersect is defined as an extension method of Iterable, so it runs through all width columns for each height row, even though at most three of each are relevant. I could write a specialized IntRange.intersect(other: IntRange) = IntRange(max(start, other.start), min(endInclusive, other.endInclusive)) extension method, and it would improve the complexity in this code to O(1). But if someone passed an IntRange to a method declared to take an Iterable or a ClosedRange, an intersect call on that argument would use the inefficient generic version. This contrasts with Java 8's similar mechanism, default methods on an interface, which allow implementations to provide a specialized version dispatched at runtime.
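Here's a self-contained sketch of that trap. The specialized extension below is one I'd write by hand (it's not in the standard library); the point is that the very same runtime object gets a different intersect implementation depending only on its declared type:

```kotlin
import kotlin.math.max
import kotlin.math.min

// Hand-written O(1) range intersection; shadows the generic version only
// when the receiver's *declared* type is IntRange.
fun IntRange.intersect(other: IntRange): IntRange =
    max(first, other.first)..min(last, other.last)

fun main() {
    val asRange: IntRange = 0..10
    val asIterable: Iterable<Int> = asRange  // same object, wider declared type

    // Resolved statically: an IntRange receiver picks the specialized version...
    println(asRange.intersect(5..15))     // 5..10
    // ...but an Iterable receiver silently falls back to the stdlib version,
    // which walks both ranges element by element and builds a Set.
    println(asIterable.intersect(5..15))  // [5, 6, 7, 8, 9, 10]
}
```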

Returning circularly to the "too many ways to do the same thing" problem, here are some efficient ways to write that grid code in Kotlin:
for (i in (x-1).coerceAtLeast(0)..(x+1).coerceAtMost(height-1)) {
  for (j in (y-1).coerceAtLeast(0)..(y+1).coerceAtMost(width-1)) {

for (i in (0 until height).let { (x-1).coerceIn(it)..(x+1).coerceIn(it) }) {
  for (j in (0 until width).let { (y-1).coerceIn(it)..(y+1).coerceIn(it) }) {

for (i in x-1..x+1) {
  if (i in 0 until height) {
    for (j in y-1..y+1) {
      if (j in 0 until width) {

for (i in (x-1..x+1).filter { it in 0 until height }) {
  for (j in (y-1..y+1).filter { it in 0 until width }) {

but I'm really not sure which is the most idiomatic.
flwyd: (escher drawing hands)
Come, let us hasten to a higher plane,
Where dyads tread the fairy fields of Venn,
Their indices bedecked from one to n,
Commingled in an endless Markov chain!
Come, every frustum longs to be a cone,
And every vector dreams of matrices.
Hark to the gentle gradient of the breeze:
It whispers of a more ergodic zone.

In Riemann, Hilbert or in Banach space
Let superscripts and subscripts go their ways.
Our asymptotes no longer out of phase,
We shall encounter, counting, face to face.

I'll grant thee random access to my heart,
Thou'lt tell me all the constants of thy love;
And so we two shall all love's lemmas prove,
And in our bound partition never part.

For what did Cauchy know, or Christoffel,
Or Fourier, or any Boole or Euler,
Wielding their compasses, their pens and rulers,
Of thy supernal sinusoidal spell?

Cancel me not—for what then shall remain?
Abscissas, some mantissas, modules, modes,
A root or two, a torus and a node:
The inverse of my verse, a null domain.

Ellipse of bliss, converse, O lips divine!
The product of our scalars is defined!
Cyberiad draws nigh, and the skew mind
cuts capers like a happy haversine.

I see the eigenvalue in thine eye,
I hear the tender tensor in thy sigh.
Bernoulli would have been content to die,
Had he but known such a² cos 2φ!
Early in my experience with Unix-like systems I discovered fortune. This program would occasionally provide me with a clever passage attributed to "Stanislaw Lem, 'Cyberiad'". "Who is this Stanislaw Lem fellow, and what is a Cyberiad?" I wondered. And then, because it was the mid-90s and search engines didn't exist yet, I did nothing.

A few years later, I started collecting quotes to add to my random signature program. A great many of them came from fortune, since it gave me a quip every time I logged in or out. The first Cyberiad quote that made it onto the list was "The brilliant Cerebron, attacking the problem analytically, discovered three distinct kinds of dragon: the mythical, the chimerical, and the purely hypothetical. They were all, one might say, nonexistent, but each nonexisted in an entirely different way." Different modes of nonexistence: a fantastic puzzle for a philosophy minor like me. I wanted to find and read this book.

There are a few books and authors I keep in the back of my mind for eventual purchase. It gives me direction when I find myself in a bookstore: check the D section of Classics for The Vicomte de Bragelonne, check the A section of Sci-Fi for the HHGTTG radio series scripts, check the L section of Sci-Fi for Stanisław Lem… You would think it wouldn't be too hard to find a book by "the most widely read science fiction writer in the world," yet ten years went by without finding one of his books between Le Guin and Lewis. I was even a beta tester for Google Books on Android tablets, but couldn't buy an electronic Cyberiad. (It's available now, though.) Tantalizingly, Google ran a fantastic narrative doodle based on The Cyberiad. I finally found a copy when I chanced to stop in to Red Letter Books in Boulder, enticed by a book about mangoes on the shelf out front. "Before I buy this, I need to see if they happen to have any Lem." Sure enough, my Quixotic quest found its goal, wedged in a dense shelf of mass market paperbacks.

The Cyberiad is a book of short stories about machines who build machines. The central character is Trurl, a constructor. He and his good friend Klapaucius the constructor build all manner of robots and devices, often on commission from rulers of distant worlds. Unlike the science fiction school led by Asimov, the engineering details of the machines and their scientific mechanism of action are of little importance. The stories are not about the machines but about the philosophical considerations and allegorical implications of such a device in a world not entirely dissimilar from ours. The first story, How The World Was Saved, concerns a machine that can create anything starting with N. After creating concrete and abstract nouns, they ask the machine to do Nothing, whereupon it starts to eliminate the universe.

Originally written in Polish, the book has a lot of rhymes and wordplay with sciency terms which works surprisingly well in translation (to English, at least). The sidebar to the right has a poem produced by Trurl's Electronic Bard. Lem has a great facility for technical naming in a way that's fun rather than dry: The second, newer trail was opened up by the Imperium Myrapoclean, whose turboservoslaves carved a tunnel six billion miles in length through the heart of the Great Glossaurontus itself.

What I like best about The Cyberiad is how it resonates with my experience as a constructor of sorts. The book was written in 1967, when hardware was still the king of technology, before we realized that software eats the world. Yet the story Trurl's Machine and other passages describe the foibles of building, debugging, and otherwise producing a computer program better than any software-focused essay I've read. Throughout the book, Trurl displays the three cardinal virtues of the programmer: laziness, impatience, and hubris. If more tales were added to the Cyberiad today, perhaps the constructors would be programs which write other programs.

All makers and builders and coders and creators would do well to read The Cyberiad: Fables for the Cybernetic Age. This hypermedia book report claims the book inspired Will Wright to create SimCity; what might it do for you? Acquire it in cybernetic digital form or via a musty-bookstore-quest for a well-loved copy not so overpriced as these.
flwyd: (dogcow moof!)
I'm working on code for a Burning Man art project. I opted to do it in Ruby because it's got Perl-like text parsing capabilities and it's fun to use. Despite being a little unfamiliar with the idioms, I've been able to write and improve code quickly, without getting bogged down in object factories, dependency injection, long-term reuse, or build systems that pull five dozen other projects.

Today I started generating graphs (the art part of the project). I'm using gnuplot and my goodness is it fun! The official documentation is a little hard to navigate, but the examples provide a quick path to graphs that, were I to attempt them in a spreadsheet, would result in much fist-shaking.

One sign of quality that both ruby and gnuplot display is that as you get familiar with them, you can often guess which words to use, even if you've never seen documentation explaining what they do. It's tough to hit the sweet spot between English and formalism, but these two do a pretty good job.
flwyd: (spiral staircase to heaven)
Musing about Apple's mistake of pissing off developers with the App Store, Paul Graham wonders what a smart phone would have to offer to make developers want it instead of an iPhone. He wants to see someone make a software development environment worth using on a handheld device. It would have to be good enough that you'd want to program on your phone, with its small screen and limited keyboard,[1] rather than on a laptop or desktop with huge monitors and ergonomic keyboards.[2]

After using Eclipse for five years, I can't imagine doing significant Java development (particularly debugging) on a handheld device. Mobile device user interfaces are all about presenting only the immediately relevant information while an IDE[3] is really good at providing a bunch of information at once.

I still use a 24x80 text editor when writing in scripting languages, partly because the problems are usually small enough to fit in just a few files and the languages are high-level enough that a significant amount of code fits in a terminal window. I can imagine writing Python or Ruby on an iPhone, or even Lisp if the editor had good parenthesis-matching. Perl and Haskell might be frustratingly symbol-heavy. But just because someone could write Python in a text editor on a handheld device, I don't think they would. It's just not as enjoyable as using a full keyboard on your desk.

A successful development environment on a mobile device should make use of the unique features a handheld offers. Typing on a phone may be less fun than using a laptop, but what if you could program with gestures? Tilt your phone to the left to generate an assignment. Move it in a circle to create a loop. Touch a variable to reference it again. Use multi-touch zooming and scrolling to select a function from an API.

Historically, attempts at visual programming languages haven't yielded many successes at abstraction levels lower than GUI layout and UML, but I think a graphical language designed from the ground up to fit a handheld environment could be quite elegant. Text is very good at representing symbolic computation, but it's not the only way. Constraints often lead to beautiful art; if the crutch of an easy keyboard weren't available, I'll bet someone would come up with something really clever. A lot of programmers are good at spatial reasoning, so let's try breaking nonlinear programs out of the linear medium of text.

Developing on a handheld would obviously be handy when you're working on an application for that device. Make a change to the tilt response code, then tilt your device to test it. Take a picture[4] and feed it straight to a test case. With lots of sensor input, mobile devices seem like an ideal platform for neural nets and other fuzzy techniques. On the development side, you could present an example to the device and immediately classify it, quickly building a model.

These are all hand-wavy suggestions of possible directions. I've hardly ever used a cell phone, let alone developed a mobile app, so my conceptions may be off base. Yet I think my key message is valid: to make handheld development attractive, it needs to be approached from the ground up. The metadeveloper needs to focus on solving a class of problems[5] and find ways to do so using things small computing devices are really good at. I've focused on sensor input here, but mobile devices have other unique features. Would dictating pseudocode to your phone be useful? Maybe there are ways to take advantage of location information and remote networking in programming. In the end, I think successful handheld development will be as different from desktop IDE development as that is from simple interactive text editors as that is from programming with punchcards.


[1] Not owning a smart phone, I'm assuming that typing on a handheld device is more of a chore than using a full keyboard. I know Apple and others have done some clever things with predictive text, automatic spelling correction, and other techniques to make typing easier on a phone. Unmodified, I suspect those would not work well when programming, where function names deliberately deviate from proper spelling and punctuation has very special meanings that differ between languages. You could build language models for programming languages, of course, but you could also implant such models in IDEs (most already do) and use a full keyboard. I think dropping the keyboard metaphor would be preferable for handheld input. Why not leverage the motion and orientation sensors and use arm and wrist motions to "paint" your words in thin air? This seems like it would work particularly well for Chinese. Nonetheless, I don't think it would be faster than using a desktop keyboard.

[2] I actually hate ergonomic keyboards; I'm most comfortable on a laptop or Apple keyboard. The most RSI-inducing computer input I've done is use the scroll wheel on a cheap Microsoft mouse. But the key (hah!) feature of an extended keyboard for programming in rich environments is hot keys, which are hard to accomplish with a keyboard interface like the iPhone's.

[3] For the non-programmers in the crowd, IDE stands for Integrated Development Environment. Here's a screenshot of the information overload I'm talking about.

[4] Why do phones come standard with (crappy) cameras these days? My camera can't make low-quality phone calls; I don't need my phone to take low-quality pictures. The benefit I see to cell phone cameras is (a) you've always got it on you and (b) it's immediately connected to software and the network. This makes it ideal for things like taking a picture of a barcode and comparison shopping instantly. The picture quality only has to be good enough to make out black and white lines.

[5] The best programming languages are the result of somebody thinking "There's got to be a better way" to solve a set of problems he was working on. Perl is really good at working with text because that was the itch bugging Larry Wall. Dennis Ritchie created C to work on Unix and it's still the most popular language for writing operating systems. Guido adds new features to Python by trying out the syntax on a sample of existing code and deciding which produces the most readable result.
flwyd: (dogcow moof!)
More gems from 1986's Programmers At Work, this one from Butler Lampson:
That’s why I think the idea of computer literacy is such a rotten one. By computer literacy I mean learning to use the current generation of BASIC and word-processing programs. That has nothing to do with reality. It’s true that a lot of jobs now require BASIC programming, but the notion that BASIC is going to be fundamental to your ability to function in the information-processing society of the twenty-first century is complete balderdash. There probably won’t be any BASIC in the twenty-first century.

It's the 21st Century now, and the surviving BASIC dialect is Visual Basic, which is more different from mid-80s BASIC than it is alike. The heart of BASIC is to make it easy for people without a strong computer background to write programs. Depending on your perspective, this may be good or bad; BASIC and Visual Basic have been home to some truly groan-worthy code, but also let people accomplish many straightforward tasks more effectively. As the number of computer users has grown exponentially in the last few decades, the percentage of people who know a programming language has dropped significantly. In the 1970s, perhaps half of computer users in academic or research environments could write a program and most businesses that owned a computer had someone who could program it to some degree. Today, we've realized that programming well takes a style of thinking that doesn't come naturally to a lot of people, in addition to an investment of time in understanding the ins and outs of specific systems. We've shown that it's more effective to have experts in programming learn new domains and write programs targeted to those domains than to have experts in domains learn how to program.

Lampson's bigger point is also insightful, but in a way it's wrong. It's true that the details of almost no program used widely in 1986 are relevant today[1]. The specific syntax of Microsoft BASIC, the keystroke shortcuts of WordPerfect for DOS, and the location of hidden items in King's Quest are all irrelevant today. But folks like me who learned how to use computers before we learned how to drive have a cognitive model of computer interaction that's a lot more flexible and successful than folks in my parents' generation who get confused about the web and have no hope for social media. The medium is the message.

[1] Amusingly enough, this isn't as true for programmers. The C programming language, the vi and emacs text editors, and Unix-like operating systems have all evolved significantly in the last 25 years, but if you knew how to accomplish something back in the day, you can still do it now. Not to mention COBOL, the illness in the sleeper zombie army of legacy code.

Protip

Thursday, March 19th, 2009 03:35 pm
flwyd: (java logo)
If you're wondering why you have no data, it's entirely possible you haven't written that part of the code yet.

Ruby Soup

Thursday, November 20th, 2008 12:06 am
flwyd: (java logo)
I had a thought at work the other day. Java has a strict one-parent-class rule. Philosophically, this is a good match in domains like taxonomic biology (a phylum is in one kingdom, a genus is in one family). But it's a terrible match in domains like cooking. A soup inherits flavors from many sources. You can add meat to what was originally a vegetable stew. Ruby is a much better language for programming a curry.
flwyd: (java logo)
I created a fairly simple programming exercise for job candidates. It's not trivial, but it's not super hard. Someone who's written programs before and can figure out the solution should be able to write the program in two hours or less. The problem description says "Your program should read from standard input (stdin) and write to standard output (stdout). Sample input and output are available."

I'm rather dismayed by the number of submissions which don't read from standard input or output. The most common violation is using a hard-coded path like C:\input.txt (tip: I'm not running Windows, I don't have a C: drive; even if I did, I wouldn't have the sample input there). Other violations include requiring filenames on the command line (not terrible), prompting for all values interactively in such a way that I can't just run cat input.txt | program, and one submission in PL/SQL that hard-coded the sample input as a bunch of INSERT statements. Tip: I put "use stdin and stdout" in the instructions so that you didn't have to bother with all the file opening and closing details. Also, do they not teach students to run their programs before submitting them? Running a submission on input bundled with the problem shouldn't throw an error before producing any output. Maybe students don't know how to use a command line environment any more and I/O redirection is a foreign concept.
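For the record, here's the minimal shape the instructions were asking for, sketched in Kotlin (any language would do; the solve function is a placeholder for the actual problem logic):

```kotlin
// Hypothetical skeleton of a submission: all input from stdin, all output
// to stdout, so the grader can simply run `cat input.txt | program`.
fun solve(lines: List<String>): String =
    lines.size.toString()  // placeholder for the real problem logic

fun main() {
    // readLine returns null at end of input, which terminates the sequence.
    val lines = generateSequence(::readLine).toList()
    println(solve(lines))
}
```

No file paths, no interactive prompts: the program reads until end of input and prints its answer.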

Do today's computer science students really not know what standard input and output are? Do they really have assignments that say "Read this file from C:\Homework\Problem1"? My hope was to create an evaluation script that would run several input files through the submitted programs and report a correct answer rate. But when a correct program is hardly any more likely than a broken one to read from stdin, I can't even write a script capable of getting an answer.

Punks.
flwyd: (daemon tux hexley)
My comment on tongodeon's insightful post about "elitist" charges:

Now that I think about it, George W. Bush embodies the three cardinal virtues of the programmer: Laziness, impatience, and hubris.

* He was a C student at Yale, made a big deal of how infrequently the Texas legislature met, and spent a lot of presidential time vacationing on his ranch even while Katrina loomed.
* He argued that Florida shouldn't recount every vote, lobbied for a second land war in Asia before finishing his first, invaded Iraq before weapons inspectors could finish their search, and declared the mission accomplished at least five years before the war's end.
* He governs as if he got a majority of the vote (the first time), routinely adds "I won't respect this part of this law" when signing bills, claimed to be in tune with God's will when invading another country (despite the position of the Pope and many other church heads), and cites executive privilege instead of letting his staff participate in congressional inquiries.

I don't think his regular expression skills are very good, though.


Also, Democracy Now's highlights from the DNC and RNC are each ten or so minutes well spent. Listen to the audio or watch the video; the transcript doesn't capture the whole feeling.
Page generated Wednesday, April 23rd, 2025 04:48 pm
Powered by Dreamwidth Studios