It's October, which means I'm now allowing myself to start scheming about Advent of Code 2024. For the last four years I've taken it as an opportunity to learn a new language. This year, for some reason, I'm inspired to learn a stack-oriented language, so I've been playing with PostScript. The world has generally moved on from PostScript: it gave birth to the wildly popular PDF format, which kept rather less of the "anything goes in this Turing-complete document presentation language" spirit, and even the latest version of macOS Preview.app can't read PostScript files. But if you set aside the fact that a majority of the language builtins are designed for working with graphics, text, fonts, and printer hardware, it's a pretty slick language. You've basically got two things available: an operand stack of data and a dictionary stack for dynamic name lookup. Both the programmer and the computer use basically the same model for evaluating code: read a token; if it's data, push it on the stack; if it's executable, run it, reading arguments from the operand stack and writing results back to the same stack.
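
For example, evaluating 3 4 add pushes 3, pushes 4, then executes add, which pops both numbers and pushes their sum:
GS> 3 4 add ==
7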

Programming language tutorials, particularly for functional languages, have a strong affinity for calculating the nth Fibonacci number, but let's generate the whole sequence by just calling the same "function" over and over again (== means "pop the object on top of the stack and print its source representation," and GS> is the Ghostscript prompt):
GS> /fib { 2 copy add } bind def
GS> [ 1 1 fib fib fib fib ] ==
[1 1 2 3 5 8]
GS> [ 1 1 15 { fib } repeat ] ==
[1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597]

Stack-based languages can have pretty simple syntax rules. That [ ] array syntax isn't "parse the elements between the brackets as an array." Instead, the [ operator pushes a mark onto the stack and the ] operator builds an array from all the items on the stack down to the first mark. Everything between the brackets is just normal PostScript code being executed on the stack. The following code builds an array of increasingly nested empty arrays by copying the whole array so far:
GS> [ 5 { counttomark [ exch 1 add copy pop ] } repeat ] ==
[[] [[]] [[] [[]]] [[] [[]] [[] [[]]]] [[] [[]] [[] [[]]] [[] [[]] [[] [[]]]]]]
% Or, if you prefer, with a non-empty value to make the nesting easier to read:
GS> [ 42 5 { counttomark [ exch 1 add copy pop ] } repeat ] ==
[42 [42] [42 [42]] [42 [42] [42 [42]]] [42 [42] [42 [42]] [42 [42] [42 [42]]]] [42 [42] [42 [42]] [42 [42] [42 [42]]] [42 [42] [42 [42]] [42 [42] [42 [42]]]]]]

I've heard that some functional programming devotees might think of the non-negative numbers as a sequence of nested empty lists…

In PostScript, procedures are just arrays with the executable bit set, and you can construct and manipulate them as arrays. {} is syntax for an executable array, [] is syntax for a regular array, and cvx converts its operand to executable:
GS> { (hello world\n) print } exec
hello world
GS> % same thing:
[(hello world\n) (print) cvx] cvx exec
hello world

Since a procedure is just a mutable array of things to do, you can replace things in that array. Here's a self-modifying procedure that increments the first value inside itself, similar to a static int in a function in C:
GS> /selfmod { 1 (caller #) print == currentdict /selfmod get dup 0 get 1 add 0 exch put } bind def
GS> selfmod
caller #1
GS> selfmod
caller #2
GS> selfmod
caller #3

Page 115 of Thinking in PostScript defines a procedure by name, including some initialization code; the procedure then redefines itself without the initialization code the first time it's run. (The fact that the book is a PDF file rather than a PostScript one says something indicative about the status of PostScript in the modern world.)

While PostScript has a couple hundred operators that are part of the language, a lot of them are for things like drawing on the page and selecting fonts. The part of the "standard library" for working with strings and arrays is pretty spartan: even concatstrings is a Ghostscript addition, and I think people copy/pasted their own concatenation function all over the place for years. Since my Advent of Code interests don't include "let's re-implement parsing a variable-width text file into fixed-size mutable strings each night," I've been getting some practice with the language by writing core library functions that I'm certain I'll use frequently.
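
For the record, here's a sketch of what such a hand-rolled concatenation might look like (my own take, not a canonical version): allocate a string big enough for both operands, then copy each into place.
% (a) (b) concat (ab)
/concat {                                    % (a) (b)
  exch dup length 2 index length add string  % (b) (a) new
  dup dup 4 2 roll copy length               % (b) new new lenA
  4 -1 roll putinterval                      % new, with (b) stored at offset lenA
} bind def
% (foo) (bar) concat == prints (foobar)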

I was about to work on the functional programming standbys like map and reduce, but I realized that, with arrays constructed by "run all the code between brackets, then move everything from the stack between those brackets into an array," a separate map function would just be extra noise; the basic iteration operator does the trick. Here's the PostScript equivalent of myarray.map { x -> x + 1 } and mydict.mapValues { x -> x * x }:
% Add 1 to each element of an array:
[ myarray { 1 add } forall ]
% Create a dictionary with the same keys as mydict, but square the values:
<< mydict { dup mul } forall >>

reduce (or fold, depending on your dialect) is similarly straightforward if you supply the initial accumulator rather than using the first array value. Here's myarray.reduce(0) { x, acc -> x + acc } (sum) and an implementation of myarray.all { x -> x % 2 == 0 } (are all items even?):
0 myarray { add } forall
true myarray { 2 mod 0 eq and } forall

None of these need special handling for empty collections. They also work to iterate through bytes of a string (though there doesn't seem to be a short way to construct a string from an array of bytes). The all implementation can become an any by switching true to false and and to or. all becomes none by adding not before and. Short-circuiting can be added with dup { exit } if after the and/or.
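
For example, a short-circuiting any asking "does myarray contain an even number?" stops at the first even element:
% accumulator starts false; exit the forall as soon as it turns true
false myarray { 2 mod 0 eq or dup { exit } if } forall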

Advent of Elixir

Thursday, December 29th, 2022 12:18 am
Another December, another adventure with Advent of Code (previously: 2021, 2020). This year I used it as an opportunity to learn Elixir. Since an elixir is (per Wikipedia) "a sweet liquid used for medical purposes, to be taken orally and intended to cure one's illness," I decided to write reflections each day about something interesting about Elixir the language and about some sort of elixir or other beverage.

Elixir is built on top of the Erlang programming language and runtime system. Erlang was designed by Ericsson for running large-scale telephone networks, so the language goals include distributed communication, high availability/uptime, fault tolerance, and avoidance of data corruption. I had heard that Erlang was nice for building concurrent software but had a reputation for confusing syntax, so I was interested to learn about Elixir, which was created by a notable member of the Ruby on Rails community and brought a syntax and coding style that's more familiar to developers who've used "functional style" in modern imperative languages. Some of the notable features of Elixir (via Erlang) are:
  • All data structures are immutable
  • Functions are dispatched and assignments are made based on structural typing
  • Code is generally written in a functional programming style


Elements of functional programming have been coming to mainstream languages like JavaScript, Java, Python, and Ruby over the last decade and change, largely around handling collections of data. Stepping through an array by incrementing indices and modifying a result is falling out of vogue in favor of constructs like items.map(item => item.price).filter(price => price > 0).reduce((p, accumulator) => accumulator + p) (and convenience methods like sum() to avoid writing common reduce functions). Although I'm quite familiar with this style, I've always used it within the context of an imperative language, where I can write a linear function flow, drop into functional style for processing collections, then imperatively act on the results. Writing the whole program in a functional approach required switching my brain a bit, particularly since Advent of Code problems are often specified as a series of instructions which map more literally to an imperative program than a functional one. I also noticed that functional and recursive thinking is more challenging when I'm sleep-deprived after a few weeks of late-night programming :-)

Structural typing is a very interesting approach, and allows logic flows to be written without a lot of if-else statements. Elixir's main data types are (with some simplification) numbers, strings, atoms, tuples, (singly-linked) lists, and maps. Pattern matching can be used to declare different "function heads", like this example which computes the absolute value:
def abs(x) when x < 0, do: -1 * x
def abs(x), do: x
or this one, which computes the sum of (Manhattan) distances of a list of 2D points from the origin:
# base case: sum of an empty list is zero
def distance_sum([]), do: 0
# sum of a list is distance of the first point plus the sum of the rest of the list
def distance_sum([{x, y} | tail]), do: abs(x) + abs(y) + distance_sum(tail)
By default, Elixir programs crash if there isn't a structural match for a value, which gave me some confidence when writing code that makes assumptions about the Advent of Code input: "if it doesn't look like one of the cases I've covered, the program will crash and it will be very obvious," rather than "who knows what will happen if I get unexpected input." Since pattern matching works on fixed-length string prefixes, I got through more than two weeks of AoC without using any regular expressions, including some neat string pattern matching logic like this.
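
That matching looks something like this sketch (the command strings here are hypothetical, not from an actual puzzle):
# dispatch on fixed-length string prefixes instead of a regex
def parse("turn on " <> rest), do: {:on, rest}
def parse("turn off " <> rest), do: {:off, rest}
def parse("toggle " <> rest), do: {:toggle, rest}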

Immutable-only data structures were generally nice to work with: they avoid a whole category of bugs related to modifying things at the wrong time, or treating an object as both the new state and the old one (common in Advent of Code problems where you're moving a bunch of things around one step at a time). The downside of immutable data structures is that something like a cache isn't particularly straightforward. The general functional programming approach to a cache is a function which "adds" to a map by creating a copy of the old map plus the new entry, and passes that new cache value to each recursive call. But since later calls may add even more items to the cache (for example, in a depth-first traversal of a graph), the recursive functions also need to return their modified cache, which ends up requiring a lot of bookkeeping. I also discovered that (no great surprise) creating a new map a million times with a million items in it generates a lot of memory pressure, slowing down runtime. Fortunately Erlang offers a concurrent "mutable" table with efficient access, which I discussed for day 16. The other occasion where immutability became a challenge was when I built a doubly-linked list as a circular buffer using an Agent to hold each node's mutable state. Elixir's normally speedy execution didn't do well with this design, taking about a minute to run. Interestingly, I didn't realize the problem description had an unstated assumption about how looping is handled, so while I was racking my brain to find the bug I implemented the same algorithm in Go using mutable data structures. The Go version ran orders of magnitude faster, suggesting there's a significant cost for communicating between Erlang processes (even though it's significantly cheaper than OS-level inter-process communication would be). I haven't yet taken the time to implement an Elixir version without Agents which recreates the whole circular list each time, or a Go version which uses goroutines to mimic the Agent behavior, to compare concurrency overhead.
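
Back to caching: the ETS approach looks roughly like this minimal sketch (:my_cache and expensive_computation/1 are placeholder names):
# create the shared table once, e.g. at startup
:ets.new(:my_cache, [:named_table, :set, :public])

# look up a cached value, computing and storing it on a miss
def cached(key) do
  case :ets.lookup(:my_cache, key) do
    [{^key, value}] -> value
    [] ->
      value = expensive_computation(key)
      :ets.insert(:my_cache, {key, value})
      value
  end
end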

Overall I was pretty happy with Elixir as a language. The standard library is somewhat spartan (there's no priority queue, for example), but you can get a lot of mileage from Enum and the other core modules. The documentation is pretty good, though I did get caught up in one case where an example used low-valued integers as both values and indices and I confused the order. I'm still getting used to all functions being defined on modules rather than calling "methods" on "objects" (so Map.put(my_map, some_key, new_value) rather than my_map.put(some_key, new_value)), but Elixir's pipeline syntax makes up for it: input |> Enum.map(some_func) |> Enum.map(other_func) |> Enum.sum(). In contrast to Raku, I hardly spent any time debugging code where my assumption of what it was doing didn't match what the language was actually doing; such cases were mostly caught by the compiler or at runtime by a failed structural match. Compile times were generally under one second, as were runtimes for most solutions, so much so that I would worry I'd hit an infinite loop if my program ran for more than five seconds. This strikes a good balance for rapid development in a contest-like environment.

Going into the month, one of my main goals was to go to bed at a reasonable hour and not get burnt out. It turns out that "get a working solution, then clean up the code, then come up with a brewing or beverage metaphor, then write a few paragraphs about an interesting facet of the program, then post it all to Reddit" wasn't a very effective way to get to bed reliably. In the first week or so, when the problems are pretty easy, I would get all excited and stay up past 2am reading and chatting about other people's solutions. So when the hard problems arrived in the third week, I was already running on not enough sleep, and sometimes stayed up past three (my worst was a 5am bedtime), only to spend half the next day at work thinking about other approaches to the problem. Fortunately I didn't have a major breakdown at the end of the cycle like I did last year, partly because there wasn't anything as frustrating as 2021 day 24 and partly because I didn't have extra pent-up frustration with the programming language.
Advent of Raku
Advent of Code is an annual programming challenge. Every night at midnight EST from December 1st through 25th, a new problem is posted to the website, along with a participant-specific input file. Participants can use any programming language they want to solve the problem, using their input, and see if the output of their program matches the expected value. Once that's done, a second part of the problem becomes available. Part 2 uses the same input file and is generally a variation on the first part, such as changing the goal from "find the maximum" to "find the sum of all" or increasing the number of times an operation needs to be performed so that an inefficient algorithm will be far too slow. Problems are usually straightforward at the beginning of the month and get more challenging as the month progresses. There's a stats page measuring time-to-completion and a top-100 leaderboard, but no prizes; many folks pursue their own goals like learning a new language or minimizing program runtime which are somewhat at odds with quick finishes.

After having lots of fun and learning Kotlin with Advent of Code in December 2020, I decided to use Raku for this past year's edition (with code shared on GitHub). Raku is the new name of Perl 6, a "member of the Perl family of languages" which famously took a decade and a half of community experimentation before a finalized version was released. Perl, in turn, is a language originally focused on working with collections of text files, and is famously easy to write or hard to read, depending on who you ask. Raku keeps many of Perl's basic syntactic elements—like scalar, array, and hash variables identified by $, @, and % sigils—but also brings to bear a lot of modern programming language developments like class-based object orientation, concurrency, type constraints, multi-dispatch, and just-in-time compilation. Raku is also what one might call Unicode-forward. Most languages these days allow Unicode literals in strings, and most languages made since the '90s allow Unicode letters in program identifiers. Raku takes this significantly further. First, Raku strings are sequences of graphemes rather than just bytes or Unicode code points, so the single-code-point and "combining diacritics" variants of "é" are identical in Raku. Second, Unicode characters aren't just limited to letters: the language and standard library provide $thing ∈ ($set1 ∪ $set2) for set operations, @words».uc to upper-case each element of an array, ⚛++ for atomic increment, 25 ** ½ as another way to take a square root, and quoting strings with arbitrary delimiters like say q༺hello world༻.uc. Additionally, Raku took one of Perl's big selling points in the '80s and '90s (terse and powerful regular expressions) and evolved them into grammars, which are easier to read, modify, and compose, and likely also faster. This grammar support is what drew my interest to Raku: I've started a hobby project that involves parsing a small amount of information from source code written in a large number of languages, and the ability to quickly write but still maintain textual matchers would make that project more pleasant.

One of the driving principles in the design of Raku (and Perl before it) is There's More Than One Way To Do It (TMTOWTDI). Another principle (repeated frequently by people who post in help forums, it seems) is that programming in Raku should be fun. Several times while working on an Advent of Code solution I tried to do something in a way that looked both elegant and reasonable, only to find out that There's More Than One Way To Do It, But Some Of Those Ways Are Wrong. (Furthermore, since TMTOWTDI, the documentation usually doesn't say "this isn't a good tool for X, use Y instead.") For example, I spent at least half an hour trying to understand why my memoization code didn't seem to be caching my Pair objects, despite all my experiments in the interactive interpreter looking like things should work just fine. It turns out that foo => 42 creates a Pair which uses value equality for both key and value, but my $key = 'foo'; my $val = 42; $key => $val creates a Pair that holds on to the $val scalar (but not the key scalar) and thus uses (half) reference equality. The documentation explains this behavior in an aside and notes that .freeze will produce an immutable Pair, but it's easy to encounter Pairs in documentation that doesn't mention that, and the half hour of WTF was not at all fun. (Another Way To Do It would be implementing a 2-value tuple class myself, which wouldn't have added fancypants scalar handling.) This discovery also reduced my confidence in the language a bit: when I look at a block of code, am I sure that my variables work as values, or might something else hold on to a funny scalar reference?
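
A condensed illustration of the gotcha, as I understand it:
my $val = 42;
my $pair = answer => $val;
$val = 43;
say $pair.value;  # 43: the Pair holds the $val container, not a snapshot
$pair.freeze;
$val = 44;
say $pair.value;  # still 43 once frozen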

Another un-fun discovery was that for @first ∩ @second { … } doesn't iterate through the shared elements of the two arrays, but instead iterates through Pairs whose value is always True. I was aware that many set types are implemented as a hash table where the keys are set elements and the values are a placeholder like a boolean, but most languages hide this implementation detail and present a set as akin to an unordered list which doesn't allow duplicate values. The workaround is easy (call .keys on the set when iterating over it), and it provides a nice symmetry with Bags (multisets) and Mixes (weighted sets), but it was still a big surprise. This was made worse by Raku's gradual typing discipline and implicit conversions: I think I was putting the set elements into a Hash, which converts keys to strings by default, so rather than a compile-time or runtime error complaining that I was using a Pair where a Str was expected, I got a Hash with keys like foo<tab>True rather than just foo. Iteration and automatic conversion also combine in un-fun ways because the base class for most objects implements all the List methods by converting non-iterable objects into single-element lists. I would find it a lot more fun if the Raku compiler told me that sub answer(--> Int) { 42 } ; for answer() { … } was attempting to iterate over a non-iterable (maybe I changed answer from returning a List to an Int and forgot to update the callers) rather than silently iterating over a single element. This annoyance is compounded by the fact that scalar references to iterable types (like a sequence, list, or array) are also turned into single-element lists in this context, so changing my $x = nums => (1, 2, 3, 4); .say for $x.value (which prints four lines with one number each) to my $x = (1, 2, 3, 4); .say for $x changes the output to a single line with four numbers and a pair of parentheses. This makes changing the shape of data structures while developing a program (like adding or removing a wrapper class) create surprising effects that aren't caught by the compiler. And maybe it's just me, but I think programming is more fun when the computer quickly tells you when you make a mistake, rather than when you lose an hour of sleep debugging code that looked reasonable.
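
In code, the surprise and the workaround look like this:
my @first = <a b c>;
my @second = <b c d>;
.say for @first ∩ @second;         # prints "b => True" and "c => True" (in some order)
.say for (@first ∩ @second).keys;  # prints just "b" and "c"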

Working in Raku did have some fun elements. Contrary to my complaints about automatic conversions for container types, the ability to seamlessly work with numbers parsed from text was pretty nice for Advent of Code problems. Having a Complex number type that works as a hash key made several 2D grid traversal problems fairly convenient. And I even got to leverage Raku's nice Unicode property support on a problem about open/close punctuation balancing. I opted to use Raku this year as a way to try out grammars, and they were generally pleasant. They're way more readable than Perl regular expressions and the "Perl-compatible" regex implementations that have come along in the past three decades, and the ability to supply an Actions class that pulls the relevant data out of a match and bubbles it up makes working with the result of a complex parse much nicer. Grammars are generally overkill for AoC inputs, but I appreciated the structure. The one downside to Raku grammars was the lack of helpful information when a parse failed. Unlike a typical programming language compiler that outputs the line and column where input didn't match expectations, a failed grammar parse just returns Nil, even though Raku has a Failure type that allows conveying additional information. So when my parsing rules were wrong I generally had to put output statements in the Actions class and manually inspect what the next token should have been.
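
For a taste, here's a toy grammar in the shape an AoC input might call for (the grammar and input are made up for illustration):
grammar Commands {
    token TOP { <command>+ %% \n }
    token command { <verb> ' ' <count> }
    token verb { on | off }
    token count { \d+ }
}
my $parsed = Commands.parse("on 3\noff 42");
say $parsed<command>[1]<count>;  # 「42」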

Contrary to last year, when I mostly focused on implementing the code and learning the language, I spent a lot of time on the social side of AoC this year. Last year I participated a bit in the Reddit community to provide hints to folks who were stuck. This year, after solving the problem and then participating in Google's group chat about the day's challenge, I frequently spent a couple hours on Reddit, reading through the "solutions megathread" and checking out people's visualizations. This meant a lot of 3am bedtimes this December, one late night following another. Coupled with trying to actually get some work done, I spent far too much time staring at a screen this month. Also, unlike 2020, there were social reasons to leave the house this year—friends were amused that I was programming in Vim over SSH from a smartphone, semi-drunk at a holiday party, in between chatting with folks. (There were just a couple small bugs left by the time I got home, and Vim's terse modal editing proved to be a nicer phone-based development environment than I'd expected.)

Three weeks of late nights—including an all-nighter for day 19 because I'd incorrectly assumed that rotations in 3D were commutative—definitely caught up with me. I was pretty burned out by the time I got stuck and went to bed on day 23, which is kind of an interesting problem with a lot of little fussy ways to introduce bugs. I was extra toasty on day 24 (the night leading into Christmas Eve) when I discovered that what I thought would be a reasonable solution—a modified binary search—didn't work at all, because in the possible solution space of about 20 trillion numbers, fewer than ten are even potentially the right answer. This fact about the input file wasn't at all clear from the problem statement, and the frustration was intensified by the fact that (contrary to every other AoC problem I've seen) there was no realistic example and expected output to experiment with. The fact that a seemingly reasonable solution can run for hours without providing any insight about the problem (other than "valid values are sparse"), and that (as far as I can tell) an efficient solution to this otherwise NP-complete problem requires making assumptions based on a single input file, pretty much soured me on what had otherwise been an enjoyable month. That one day's problem (following a couple late nights) made me strongly question ever participating in "live" (i.e. during December) AoC again, which isn't a good feeling to have on Christmas Eve.

Raku's final un-fun factor played a role in this burnout too: slow execution speed. I'd seen folks who hang out in Raku help communities warn that Raku performance isn't great, but I figured it would be fine for Advent of Code, which has lots of participants working in languages like Python and more tortured environments like Bash or Google Sheets. But on days 19, 23, and 24 I discovered that my Raku code would spend tens of minutes running on the example input before producing a wrong answer, which is not a good situation in an "implement a program before bedtime" challenge. To test wrong hypotheses and spot bugs more quickly, I reimplemented those days in Go. The Go language is far more verbose and has many fewer features than Raku, but I could implement a Go solution and try it five times in the time it would take to run my Raku code twice. My day 19 solution in Go—using the same algorithm and only slightly optimized code paths—was about a hundred times faster than the Raku implementation; I recall one Go run taking 45 seconds while Raku took 45 minutes. I spent more time optimizing the runtime of the day 23 solution (due to some discussion in the group chat at work) and ended up with a 2.5-second solution in Go and a 68-minute solution in Raku. I even spent some time with the Raku profiler (which amusingly produced about a gigabyte of output for 45 seconds of runtime and had to be analyzed with sqlite because the HTML profiler output crashes any browser tab) and was only able to eke out a 10% speedup after playing with all of the hot code paths under my control. Two orders of magnitude in runtime is difficult to make up with even the most amazing language expressiveness.

Advent of Kotlin

Saturday, December 26th, 2020 07:14 pm
Each December for the past several years, Advent of Code has presented a series of 25 daily programming challenges, with each problem adding to a Christmas-themed narrative. I think I'd seen references to AoC in the past but hadn't paid it any mind. This year, my team at work is evaluating Kotlin for adoption in our Android Java codebase, so a small daily excuse to get experience with the language seemed promising. Plus, there's a global pandemic, so it's not like I've got any holiday parties to attend.

The event was more fun than I'd anticipated. Challenges are released at midnight America/New_York each night, and there's a time-to-completion leaderboard, so there's a competitive aspect to get the juices flowing. This wasn't great for health, though—on a couple nights I started programming at 10pm America/Denver while already tired and didn't go to bed until 3am, whether because I was too sleep-deprived to debug effectively or because I was having fun giving hints on the contest's subreddit. Mostly it was fun because the problems are small enough to do in one sitting and often involve an interesting algorithm. Lots of participants give themselves an additional challenge, like using a different programming language each day or using an unusual or challenging language—I saw someone posting solutions in the m4 macro language and some folks using Excel. Lots of folks create visualizations of their algorithm solving the problem; several of this year's challenges were based on Conway's Game of Life, which naturally offers interesting visualizations.

My experience with Kotlin was a bit mixed. Kotlin is a programming language designed to run on the Java Virtual Machine and play well with Java code, but with a more expressive syntax and some features informed by two decades of programming language evolution since Java came into the world. It is perhaps most widely used in the Android ecosystem, where some of its features help cover for poor Android framework design and API choices, and where its coroutine concurrency model is a better fit for client application programming than Java's. Kotlin can also run in JavaScript and iOS environments, offering a hope of cross-platform shared logic; I've seen enough cross-platform efforts fail to be widely adopted to be skeptical on this front, though.

Using Kotlin for Advent of Code offered several benefits over Java. First, the heavy type inference and lower repetition and boilerplate reduced the number of symbols that had to be typed, which is nice for short programs, particularly ones with Fake Internet Points awarded for programming quickly. The standard library provides a lot of handy utilities like ranges, a typed Pair class, and check/require (functions which concisely throw an exception if the program is in an invalid state), for which Java needs a library like Guava. when blocks were also handy in many AoC puzzles, and a lot friendlier than a chain of if/else conditions. Kotlin's fluent collection transformations (filter, map, sum, and friends) feel a little more expressive than Java Streams, and I found multiple occasions where "potentially infinite sequences" were helpful. Coroutines (which power sequences) are, I think, Kotlin's biggest selling point, and while most Advent of Code problems don't particularly benefit from concurrency, I found yielding values from a recursive function easier to implement and reason about than accumulating a list that gets returned up the chain.
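
Here's a sketch of that yielding pattern (the Node class is a hypothetical stand-in, not from an actual solution):
// a hypothetical binary tree node
class Node<T>(val value: T, val left: Node<T>? = null, val right: Node<T>? = null)

// lazily yield values in order rather than accumulating and returning lists
fun <T> inorder(node: Node<T>?): Sequence<T> = sequence {
    if (node != null) {
        yieldAll(inorder(node.left))
        yield(node.value)
        yieldAll(inorder(node.right))
    }
}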

I'm not entirely won over on Kotlin, though. My first gripe is that the language is at risk of falling into the C++ and Perl trap of providing multiple ways to do very similar things, so two pieces of code which do the same thing can look very different. This in turn creates a cognitive impediment when reading code written by a different programmer or team. One example is the distinction between properties and no-arg methods: one writes list.size as a property but list.isEmpty() as a method, and I've been unable to find guidance on when to use one rather than the other for read-only state.

Second, one of Kotlin's selling points is nicer handling of nulls, since nullability is part of a type definition (String? is nullable, String is not). This is handy, and reduces boilerplate, particularly with null-happy APIs like Android. But it also means the compiler forces you to handle null cases which you know semantically can't occur, such as calling .max() on a collection that you know is not empty. This leads to a proliferation of method name pairs, one of which throws an exception and one of which returns null (elementAt/elementAtOrNull/elementAtOrDefault, getValue/get/getOrDefault, maxBy/maxByOrNull, maxWith/maxWithOrNull…). This also isn't entirely consistent within the standard library: list[5] throws an exception if the list has fewer than six elements, but map[5] returns null if that key is not present. The need for "OrDefault" method variants also seems a bit odd when the language also provides the Elvis operator (?:) for null-coalescing.
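
A few of those pairs in action:
val list = listOf(1, 2, 3)
list.elementAt(5)        // throws IndexOutOfBoundsException
list.elementAtOrNull(5)  // returns null
val map = mapOf("a" to 1)
map["b"]                 // returns null
map.getValue("b")        // throws NoSuchElementException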

Third, the impression that Kotlin is basically Java with nicer syntax can lead to unpleasant surprises when the Kotlin standard library has a slightly different implementation from a similar method in Java. For example, in Java, String.split with an empty argument returns an array of single-character strings: "cake".split("") is the same as new String[] {"c", "a", "k", "e"}. The same behavior holds in JavaScript and Perl, and perhaps dates back to AWK. Kotlin, on the other hand, returns an array with empty strings at the beginning and end: "cake".split("") is the same as arrayOf("", "c", "a", "k", "e", ""). What's worse, the behavior of splitting on an empty string or pattern is not documented in Kotlin, so I don't know if it's a bug or an intentional choice.

This brings up another of my Kotlin complaints: documentation. There are plenty of valid complaints about Java's verbosity, but the clarity and completeness of Javadoc in the Java world is wonderful. I very rarely have to read the code in the JDK or a widely-used library to understand how it will handle a particular input. (The same cannot be said for Ruby, for example.) Kotlin seems to prefer more terse documentation and rarely gives sample code, so you're often left to figure it out yourself, experimentally. The Kotlin web interface for API documentation also has some notable room for improvement, like proper handling of "Open in new tab" clicks.

My final Kotlin complaint that cropped up during Advent of Code is a sneaky one. One of Kotlin's neat features is extension methods: you can define a method on a type defined by someone else, like operator fun Pair<Int, Int>.plus(other: Pair<Int, Int>) = Pair(first + other.first, second + other.second). This can help the readability of multi-step code by chaining all method calls from top to bottom, whereas Java would end up with a mix of static method calls wrapped around method chains. The feature, however, comes with a major downside: extension methods are resolved statically against the declared type of the receiver. They are not dispatched dynamically, despite having syntax identical to dynamically dispatched methods. A concrete example I ran into: a function which checks the neighboring cells of a 2-D grid used the following code:
fun checkNeighbors(x: Int, y: Int) {
  for (i in (x-1..x+1).intersect(0 until height)) {
    for (j in (y-1..y+1).intersect(0 until width)) {
      // do something with grid[i][j]
    }
  }
}

This expresses "go through all the cells from above left to below right while staying inside the grid bounds" by using the intersection of pairs of ranges. Unfortunately, this is an O(n^2) algorithm: intersect is defined as an extension method on Iterable, so it runs through all width columns for each of the height rows, even though at most three of each are relevant. I could write a specialized IntRange.intersect(other: IntRange) = IntRange(max(start, other.start), min(endInclusive, other.endInclusive)) extension method, which would improve the complexity of this code to O(1). But if someone passed an IntRange to a method declared to take an Iterable or a ClosedRange, an intersect call on that argument would still use the inefficient generic version. This contrasts with Java 8's similar mechanism, default methods on an interface, which allow implementations to provide a specialized version dispatched at runtime.

Returning circularly to the "too many ways to do the same thing" problem, here are some efficient ways to write that grid code in Kotlin:
for (i in (x-1).coerceAtLeast(0)..(x+1).coerceAtMost(height-1)) {
  for (j in (y-1).coerceAtLeast(0)..(y+1).coerceAtMost(width-1)) {

for (i in (0 until height).let { (x-1).coerceIn(it)..(x+1).coerceIn(it) }) {
  for (j in (0 until width).let { (y-1).coerceIn(it)..(y+1).coerceIn(it) }) {

for (i in x-1..x+1) {
  if (i in 0 until height) {
    for (j in y-1..y+1) {
      if (j in 0 until width) {

for (i in (x-1..x+1).filter((0 until height)::contains)) {
  for (j in (y-1..y+1).filter((0 until width)::contains)) {

but I'm really not sure which is the most idiomatic.