The Tactical Amulet Extraction Bot: June 2009

Friday, June 12, 2009

Synchronizing with NetHack

One of the nastiest problems TAEB has to solve is also one of the most mundane. If you've ever played a game over a network or on a heavily loaded computer, you are probably familiar with lag and the problems it creates. While NetHack, as a turn-based game, does not suffer from directly lag-caused deaths, it is still very difficult to make moves sensibly when the responses are delayed.

Bots like TAEB also suffer from lag, indeed much worse than people do. Twenty milliseconds is quite unnoticeable for a human but is an eternity for a bot, and even at that timescale issues of finding the end of a turn are normal. Furthermore, bots are much less able to recover from partial updates than people are, due to the lack of sensible error-recovery.

Presumably, the player just exited a full-screen menu or changed dungeon level. A full screen redraw for NetHack is larger than the MTU of most networks, and due to lag, the second packet has been delayed significantly. (This doesn't require weird conditions; TCP normally waits until data has been successfully received before sending more.) As a result, the lower half of the current frame is black. TAEB, on seeing this frame, makes a couple of damaging inferences:

The lower half of the level has turned into solid rock. This is an especially nasty case if the redraw was caused by stairs, as TAEB interprets large areas of new stone to indicate a brand new unexplored level, corrupting the dungeon graph.
The bottom line is all spaces and unparsable, resulting in error spam and no stats.

Clearly, bots need a reliable way to tell when NetHack is done sending output for the turn. There have been several approaches to this problem over the years.

Consistently slow

One of the oldest and simplest approaches is simply to sleep for one second or so after every move, to wait for NetHack. This works, most of the time. It has two major flaws; first it is very slow, second it is unreliable - if lag spikes above 1 second, the bot may see an incomplete frame. Nevertheless, it is very simple to implement and has been used in several efforts.

Avoiding the issue

Another approach which has been historically used, most famously by the Angband Borg and Rog-o-matic, is to link the bot into the game's input routines. In theory, this is foolproof, but it carries dangers of a different sort, in addition to the obvious restriction to local games.

If done incautiously, the linking can affect game mechanics or leak information, thus compromising the validity of the bot effort. Modifying the game requires a very in-depth knowledge of NetHack's implementation, which is very unclean and non-transparent in places. Furthermore, requiring the bot to be linked into NetHack greatly complicates the build process; the bot cannot be built or distributed as a self-contained object unless all of NetHack is duplicated, prohibitive effort for many starting projects. As such, the linking approach is very rarely used.

We want TAEB to play in tournaments on public servers, which precludes linking.

Inline pings

One approach which is often proposed, but sadly does not work, is the inline ping. Simply send an invalid command after every real command, and wait for the error. Unfortunately, there is no command which is invalid in every context in NetHack. Not even control-r to redraw the screen; it fails in prompts and menus.

Telnet ping/pong

Invented by Sartak for NetHack fairly early on, and the most common method in use today (in particular, TAEB uses this), this relies on telnet as a side channel. A bot connects to a telnet server using a raw TCP socket, and implements the protocol itself; after every command is sent, the bot attempts to negotiate an invalid telnet stream option (DO 99).

When a telnet daemon receives the packet containing the command, it sees an option it does not support and responds in failure (WONT 99), then passes the command off to NetHack. Shortly afterward, NetHack's response is responded to the bot. Thus, the bot receives NetHack's response very shortly after the "pong". In effect, the telnet options are used as a side channel to dynamically measure network lag. Since under normal conditions network lag is the dominant component of total lag, this gives the bot a very good idea of when the output should be expected.

The reliability of this approach is improved by kernel packet coalescing, also known as Nagle's algorithm. When the pong is sent, the network implementation at the server waits a short time to potentially send a larger packet; if packet sending is low-priority, this will additionally tend to be delayed until NetHack is done calculating. With the pong and the response generally in the same packet, they are guaranteed to arrive at the bot at exactly the same time.

While good enough for bots, this approach is not quite foolproof. If NetHack takes a long time to reply to a command, due to system load, slow execution (there are several quadratic algorithms in the game, most notably monster movement and item stack sorting), or simply being swapped out, it is possible for the kernel to time out the attempt to coalesce, and the pong will arrive significantly before the output. This has been the source of several unreproducible errors in TAEB.

This approach is also limited to telnet servers. It does not help at all with lag as it occurs on heavily-loaded local systems, and does not work well with other network protocols. In principle SSH could be used, however the vastly higher complexity of the SSH protocol makes a custom implementation prohibitively difficult.

Expectations and Confusion

Will Noble (futilius)'s bridey works on an entirely different principle from the other bots. Instead of relying on out-of-band data to find complete frames, bridey is more like a human in that it waits for a frame that looks complete. The code which generates actions also provides a predicate for a sensible-looking result; fully filled bottom line, cursor on the character, etc. When an action is run, bridey waits for a frame that satisfies the predicate. To guard against programming oversights, the predicates are made partially dependant on wait time, and become more lenient as more data has had a chance to arrive.

Actually, bridey combines this with other mechanisms above; the time, as it is used in predicates, is measured in telnet ping/pong cycles. For instance, the move action has a final timeout of 4 cycles. This allows the reliability of the expect system to partially stack on the reliability of ping/pong. bridey, like TAEB, uses simple sleeps on the local interface.

Abusing POSIX Job Control for Fun and Profit

In 2008 I developed a novel method, which is currently being adopted by TAEB for the local interface. NetHack is single threaded and makes very little use of asynchronous calls, so when it is blocked on tty input it cannot possibly generate output until more keys arrive. So, if we could somehow find out when NetHack blocks, we would solve the synchronization problem permanently - unlike the methods above, this is immune to system load.

On the surface, finding out when a process blocks would seem to require low-level, privileged, and generally highly nonportable code. However, we are not the first user of the information. It is possible in UNIX to move processes between the foreground and background at will; if it were possible for background processes to read keyboard input, chaos would result. The traditional solution is for the kernel to stop any process when it attempts to read in the background. Stops, like deaths, result in wait() events; this is necessary in order that the shell may continue them.

With this insight, the solution seems fairly obvious - simply put NetHack in the background, then when it stops, let it read. There are a few complications in this, though. First, it's not possible to simply shovel data into a stopped reader; it has to be briefly continued in the foreground. How long is briefly? Depends on... system load. Fortunately, there is a trick we can use to eliminate inaccuracy - check for data on the virtual keyboard, with ourselves in the foreground. If we see anything, the slave needs to read more, and we re-continue it, using exponential backoff. Of course, data typed is not always immediately available due to line-buffering and various forms of special-key processing - but that applies to us too, so we still get the right answer.

Even with that settled, POSIX only allows a process to use job control if it lives inside the pseudo-terminal. That means, we need to fork a slave in order to do the actual manipulation, and communicate using pipes.

This is very unusual activity terminal driver, and it has exposed two bugs in the BSD kernel so far. First, in all known versions of BSD, a process which reads in the background does not actually receive the stop signal until it returns to user mode; but it returns to user mode only once the wait for terminal data completes! In regular usage, this is masked, because when you type bg all waiting threads are woken, but it was a killer for us; the eventual workaround I found was to briefly change the terminal blocking mode, thus forcing the kernel to reconsider all active blocks. Finding this bug, and the workaround for it, took an annoyingly long amount of time.

Another bug the project found was that the BSD kernel does a one-second sleep whenever a process that is already in the background tries to read. The reasons for this are quite unclear, but it has the effect of killing performance of this approach. Fortunately, recent versions of Darwin (OS 10.5.7 is known to work) have a rewritten sleep system, and this call is no more.

A comparatively minor issue was found in Linux; attempting to read from an idle psuedoterminal master fails with a "No such device" error, instead of the EOF return that would be expected of devices with pipe semantics. I'm not sure if this is a bug, but it did require workaround code.

This method is implemented in the IO::Pty::HalfDuplex perl module.

Variations on the theme

Job control abuse works pretty well on Linux and Darwin, but what if you're on a different platform? Well, there's always low-level, privileged, and generally non-portable code. As seems to often be the case with hacks like this, it usually takes the form of a debugger API abuse; set breakpoints on the functions that can read from the terminal, and arrange to capture output somehow. These approaches share a common liability; debugger interfaces are highly security-sensitive and are often in some way disabled.

On most UNIXes (including the BSDs) there is a system call ptrace which fits the bill perfectly. Unfortunately, while the style of ptrace is virtually identical everywhere, no two UNIXes implement the call in exactly the same way. ptrace implements security by refusing to trace processes of other users and denying user and group changes while active; this is a real bother for us, as NetHack is normally installed setgid - it escalates its group to one that can write to the saves and scoreboard.
Windows has a debugger interface which is well documented and consistent across versions. Nice? Well, Windows also has, last time I counted, 30 different documented ways to read from the console. Writing the correct traps to catch them all would be interesting, to say the least.
Ironically, the best chances seem to exist on DOS. Programs are debugged by directly hooking the operating system, allowing for a fully consistent interface, and there are very few paths for keyboard reading. Unfortunately, CPAN on DOS is non-functional, and installing any large Perl system without it is extremely tedious.
Departing from the trend of debuggers, it is also possible on the BSDs to simply read the process status. Compared to the ptrace method, this has the advantage of working on segid processes, but requires polling.

I plan to implement most of these in IO::Pty::HalfDuplex.

Conclusion

In the field of bot writing, even the simplest things can prove almost insurmountably complex. This is very evident in dealing with lag, both network and loading. Lag-related problems are a major cause of unreproducable faults in even the most sophisticated NetHack bots. Many of the simplest approaches to controlling lag are deeply flawed and will make debugging difficult. Let this be a lesson to everyone who wants to start from scratch. Fortunately, you don't have to; TAEB is designed to be a framework, to support additional AIs easily.

Sunday, June 7, 2009

Anatomy of a Step

Several people have asked to see more articles about TAEB's architecture. Even after working with the codebase for a year and a half, I am still very happy with the design. My previous attempts at writing a NetHack bot lasted only a month or two before collapsing under their limitations. However, I think we finally got it right with TAEB, so I enjoy sharing what we have created.

To introduce the important components of TAEB, I am going to walk through TAEB's "main loop". We call each iteration of the main loop a "step". Each step vaguely resembles iterations of NetHack's own main loop. Note that this is completely unrelated to NetHack's turn counter. Due to the speed system, command repetition ("search for twenty turns"), and even paralysis, steps do not correspond with turns. The only similarity is that both counters increase monotonically. Today, we care only about steps, not turns.

Input

The first thing TAEB does is read input from NetHack. This can involve reading from a socket (as in TAEB::Interface::Telnet) or reading from a pseudo-terminal (as in TAEB::Interface::Local). I will eventually write a post about the mechanics of simply deciding when NetHack is done sending data. It is not at all trivial. (update: here's that post)

NetHack prints a stream of characters. Since it provides a full text user interface with two-dimensional maps and colors, NetHack prints escape sequences. These escape sequences encode commands such as "change the pen color to red" (\e[31m) or "go to cell (5, 12)" (\e[12;5H). We run all input through Term::VT102 to parse and handle these escape sequences. Term::VT102 then lets us ask high-level questions like "what is the text of row 23?" or "what color is cell (59, 6)"? We have a subclass TAEB::VT that lets us ask additional questions, such as "is the text 'Hit return to continue:' present on the terminal?"

We now have something resembling what the player would see (a two-dimensional, colored block of text). The next task is to understand what is on the screen. Consider the following NetHack terminal:

As a human, you can figure out a lot of what is going on, even if you have never played NetHack. You can see that the player is in combat with a goblin. From the bottom lines, you can guess that the player is stuffed with food, and probably the character's name. If you have played NetHack, you can recognize several more things. The character is a wizard (evidenced by the Evoker title). There is a general store in the top left room. The character is currently on the bottom of the screen, with a goblin just north of him. You can identify the seven rooms of the level, and the features of the dungeon (like stairs, a fountain, and many doors).

Such analysis is the job of two components, TAEB::ScreenScraper and TAEB::World::Cartographer. The Cartographer analyzes the map to populate internal level and tile information, while the ScreenScraper handles other input (mostly English text). The ScreenScraper runs first, parsing messages to publish announcements. The rest of the system does not have to worry about the content of messages, each component just listens for particular high-level events.

Thankfully, English text appears in fixed places on the screen; TAEB never has to guess which text is prose and which represents the map. Most of the time, text only appears on the top and bottom lines. However, menus can appear in more places on the screen.

Here we have an "identify" menu. The ScreenScraper can detect that there is a menu by looking at the text preceding the cursor. If it is "(end)" or "(# of #)" then TAEB knows a menu is on the screen.

Menus are interesting because they demand input from the rest of the system. The ScreenScraper does not know what items should be identified — it has to ask the AI component. Since there can be many menus in a particular step (such as identifying several items, one at a time), the ScreenScraper has its own loop for parsing different kinds of input. The various kinds of input include menus, prompts ("Eat what?"), location requests ("To what position do you want to be teleported?"), and ordinary top-line messages.

The ScreenScraper and NetHack may communicate several times before the next step is started. What is vital is that the ScreenScraper leaves NetHack in "action mode". There are no unresolved prompts, menus, --More-- messages, etc. This means that the Cartographer can assume that the cursor is on TAEB. This invariant makes TAEB cope perfectly with having a different character glyph due to polymorph or invisibility. Other bots (such as "moomaster") have struggled with this, since they guessed that the white @ on the screen was the character.

Since the ScreenScraper leaves NetHack in action mode, the Cartographer knows that every character between lines 2 and 22 inclusive are the map. If there were a menu on screen, TAEB would get massively confused, thinking there are suddenly hundreds of monsters floating in a void on the right side of the screen.

The Cartographer looks at each cell on the virtual terminal to update each tile in TAEB's internal map. It uses cell glyph and color to determine what is on each tile. For example, a gray { is a sink.¹ The sink is mapped to a TAEB::World::Tile::Sink object. This sink object of course knows things every tile knows, such as how many times it has been stepped on by the character, what items are on the tile, and what engraving is on the tile. In addition, sinks know whether they have produced a black pudding, ring, or foocubus through kicking.

One curiosity is that the map is updated after we publish messages from the ScreenScraper. If TAEB moves and receives the message "You see here an opal ring", the ScreenScraper sees that and announces that there is an opal ring on the current tile. However, since the Cartographer has not run, the current tile has not been updated yet, so the item would be incorrectly added to the previous tile. The crude solution we currently use is to freeze the publisher until after the map has been updated. We are not yet sure of a better solution to this.

Output

At this point, the map has been updated, all input from NetHack has been parsed, and the appropriate announcements have been published. We then redraw the screen for the users watching the bot play. We also check if the user typed a key. TAEB has many debug commands, such as ~ to open up an interactive REPL to inspect and change TAEB's internal state.

Finally, it is time to involve the AI. TAEB simply asks the AI "what now?" This can involve arbitrarily complex calculations. One of my favorite features of TAEB's design is that its AI is pluggable. TAEB requires only a few methods be implemented by the AI, and we are working to trim that down to just "what now?"

There are currently two AIs being developed: Behavioral (by most of the core team) and Planar (by ais523). Both are worthy of many posts. We hope to entice more developers to work on TAEB AIs! One of our future projects will be Interhack on top of TAEB. The intelligence of a TAEB::AI::Interhack would not actually be "artificial", of course.

Since TAEB is a framework, the AI will inevitably be tightly-bound to TAEB (though, as explained, the converse is not true). The AI is expected to call all sorts of methods on TAEB objects, such as "am I currently blind?" (TAEB->is_blind), "do I know the identify spell?" (TAEB->find_spell("identify")), "is there a fountain on this level?" (TAEB->current_level->has_type("fountain")), and so on. We think this is a huge boon to developers seeking to write NetHack bots — they can focus solely on the interesting bits of artificial intelligence. We handle the programmatic NetHack.

The AI tells TAEB what to do each step through TAEB::Action objects. These abstract away the details of interacting with prompts and menus. The AI can say:

    sub next_action {
        # ...
        if (my $lizard_corpse = TAEB->find_item("lizard corpse")) {
            return TAEB::Action::Eat->new(
                food => $lizard_corpse,
            );
        }
        # ...
    }

The action will take care of responding to the "Eat what?" prompt with $lizard_corpse's inventory letter. The action will also respond to unexpected prompts, such as "There is a fox corpse here; eat it?". This declarative nature pervades TAEB's design to great effect.

Finally, we send the keystrokes for the current action (such as "e" for eat) to NetHack.

The action will participate in the next iteration of the main loop to respond to prompts and listen for events caused by the action. For example, if TAEB receives the message "You stop eating the lizard corpse", then the action will know not to remove the lizard corpse from the inventory.

Conclusion

The design of TAEB's main loop has had a significant impact on its (relative) longevity. The design of handling all of the input from NetHack as soon as possible drove much of the system's design in very positive ways. It encouraged the reification of Actions, which has been immeasurably useful. It even encouraged the addition of the pubsub system and its later enhancement to announcements. Though there are still some lingering quirks caused by this main loop design (such as teleportation traps), none seem insurmountable.

Footnotes

TAEB remaps some dungeon features to avoid conflicts; in core NetHack, sinks are gray #, but so are corridors. Eventually we may be able to remove some remappings due to increasing Cartographer sophistication, but there's no rush. TAEB does not change too many characters as to render its NetHack screens unfamiliar. (back)

The Tactical Amulet Extraction Bot

Friday, June 12, 2009

Synchronizing with NetHack

Consistently slow

Avoiding the issue

Inline pings

Telnet ping/pong

Expectations and Confusion

Abusing POSIX Job Control for Fun and Profit

Variations on the theme

Conclusion

Sunday, June 7, 2009

Anatomy of a Step

Input

Output

Conclusion

Footnotes

Developers

Blog Archive

Tags