Saturday, August 22, 2009

Planar - a TAEB AI

TAEB::AI::Planar is a NetHack AI, written as a plugin for TAEB. (TAEB the framework does not contain AI-specific logic itself; instead, it aims to be a general platform that can be used both by AIs and by other programs that want to interact with NetHack; for instance, it would be plausible to use TAEB to implement an Interhack-like program.) The first TAEB AI, historically, is the one that is now called Behavioral (and which was originally just part of TAEB itself before it was split out); most TAEBs that you will see on, for instance, run that AI. Planar is a newer attempt to create an AI, by me, Alex Smith (ais523) and Stefan O’Rear (sorear); at present, it’s used by the bots TAEB523 and sortaeb523 (which run from much the same codebase, but are generally configured differently). Planar’s best score so far is 34179, in the large room at the end of the second level of Sokoban, which was scored locally on my computer; it died walking over a cockatrice corpse when blind and gloveless, the dangers of which it hadn’t been told about.

Planar treats NetHack as a pathfinding exercise; just as a computer would use a pathfinding algorithm to determine the shortest route from one room to another, Planar uses pathfinding algorithms to determine the best ‘route’ from its current situation to its eventual goal. The AI is based around resources, which are things that the AI can spend, can use, or has to be careful not to run out of (for instance, gold or hitpoints), threats, which are things that can be mitigated or fixed, but could cause plans to be more dangerous or which prevent them working correctly if not fixed (for instance, monsters or a trap on the current square), and plans, which are specific goals that it aims for, along with an idea of how to accomplish them.

Here is how Planar goes about deciding what to do on any given turn. (Note that although Planar remembers what it was thinking on previous turns, it recalculates most things every turn to allow for new information, rather than blindly continuing with its previous plan.) Each stage of the calculation is accompanied by a picture showing an actual situation from a game of NetHack that Planar has played (in fact, they’re all from the same game), and the text explains how Planar went about dealing with it.



The ‘start’ of Planar’s main loop (that is, when it starts dealing with the current turn, rather than the previous turn), is checking for threats that might make plans more risky. In the picture above, TAEB (the @ sign which appears inverse-video in this screenshot because it has a blinking cursor on it) has decided that it wants to eat the corpse (the % sign) on the ground, probably because it’s relatively hungry and wants to get to the corpse before it rots. Although in theory it could walk directly to the corpse and eat it, there are two potential problems; to the west is a nymph (a lowercase n) who would steal its items if given a chance, and to the east is a golem (an apostrophe) who, being a hostile monster, wants to beat the poor TAEB up. (Incidentally, Planar always asks for more information, if it’s available and doesn’t cost a turn, in order to analyse the situation as well as possible; for instance, if the material the golem was made of (relevant because it determines how dangerous it is) were undeducible from its colour on-screen, it would ask the framework to request NetHack to give more details, in this case by sending a farlook command to examine it remotely in detail.) The two monsters here are threats; the nymph is a threat to the TAEB’s items (which will be a source of resources such as nutrition, damage-potential, and armour), whereas the golem will be a threat to the TAEB’s hitpoints (a resource in themselves). Note that all threats, as well as everything else in Planar, are quantified so that it knows their exact value. (There are two ways to represent the value of things inside Planar; one is as a list of amounts of resources, the other is as a number that attempts to calculate the total value of all the resources in question by using exchange rates. These alter according to the amounts of the resources in question; for instance, normally one hitpoint is considerably more valuable than one unit of nutrition, but that rate would be rather different if the TAEB were healthy but fainting from hunger.) In this case, it would estimate the amount of theft the nymph would likely carry out (and as any NetHack player knows, that’s generally more than you can easily replace); and it would calculate the amount of damage the golem could deal per turn in mêlée combat (taking the worst-case scenario; Planar is just as scared of the Random Number Generator as any human would be).

The threat-check will quickly try to work out how quickly, on average, the monsters could get to each square on the map, and assign a threat function to those squares that indicates how damaging the threats could be if we tried to route to those squares at a particular moment. This is shown in the tactical map above; colder colours represent safer squares, with warmer colours being squares that are more threatening. So, for instance, going to the west past the nymph without dealing with it first is likely to be suicidal, thus the whole area there is filled with magenta (the worst possible colour); likewise, going to the east past the golem without dealing with it first is also a bad idea. The amount of threat that actually matters depends on when we finish business on the square in question; the map above shows the amount of threat (together with the amount of tactical cost, which is insignificant on that diagram) on each square at the first moment we can reach it, but carrying through time-consuming plans on those squares would give the monsters more time to catch up. As a result, the nearest unexplored corridor is a safe blue, as the TAEB could outrun the monsters along it (golems are rather slow, and the TAEB has a head-start over the equally speedy nymph); but the second-nearest unexplored corridor ends in a red square, because although the TAEB could get away from the golem eventually, the golem would get a few hits in as it walked past. So, if it was going exploring, it would use the safer first path.

However, just now the TAEB isn’t exploring; it’s hungry. The corpse shows up as brown on the above map, but that map (which was generated from the tactical planning stage) is just taking into account the length of time the corpse takes to walk to; eating the corpse will also take some time, and the threat function will give a higher threat value for a plan involving eating the corpse once the time taken is known (which will be calculated in the strategic planning stage).

There’s one other important task carried out by threat checking; for each square on the map (to be precise, the current level; as routing for other levels is cached, any threat calculation for them would be ignored anyway), it tags the square with information on how to mitigate the threat or threats that can get there. So for instance, there will be a plan to Mitigate the golem (i.e. kill it, eliminate it, or drive it away somehow) in order to make it safer to eat the corpse. (After it’s killed the golem, it’ll likely see that the nymph is also problematic, and it will try to get rid of her too, probably by throwing sharp pointy objects at her. This is another outcome of the threat check; it’s rather dangerous to attack a nymph in mêlée if you can help it, and the squares further away from her would show a much lower risk as a result, meaning that projectiles would be favoured over melee weapons.)

The bottom of the screen shows the decisions that the other stages of the loop have made; strategic planning has decided that it’s still worth eating the corpse, that getting rid of the golem first is a good idea, and that mêlée is the best way to do it; and tactical planning has decided that the best way to route near the golem is to walk east of it until the square immediately to its west. The chain of plans involved can be seen on the penultimate line of the screen; the final (l) on the screen is added by the framework, which knows that pressing l is the correct way to walk to the east (whereas the AI would just have returned a “walk east” action to the framework).

Tactical planning


Once all the threats have been placed on the map, the next step is to work out tactics for the turn. Again, this is done by working out how expensive (in terms of resources) it is to move to any given square on the map; but this time, it’s our own routing we worry about, rather than that of the monsters.

The diagram above shows our TAEB just after it’s killed another monster, again wanting to eat it. This time, the monsters around are less threatening; there’s just an iguana (the nearby colon) to worry about. Although this particular turn came to the same conclusion as the last one analysed (kill the monster so you can eat the corpse in peace), tactical planning does not care about what the TAEB eventually ends up doing, but rather, how best to get to each square on the map.

This is done in terms of tactical plans; a single tactical plan contains a suggestion of how to get from one square to another square, together with information on how risky it would be (taken from the threat map and from information on the cost of the action required). Plans in Planar are very specific; as opposed to Behavioral’s behaviours, which are few in number and general (such as FixStatus and Defend), a plan in Planar contains information about exactly what is to be done (such as ‘move from (12,10) to (13,10) by opening the door on (13,10) then walking there’, which would have a cost in time based on how much time it would likely take to force the door open). It would be entirely common for multiple plans to be considered; for instance, in that example with the door, Planar would also be considering kicking that door down, which could be faster if, say, it were approaching the door from the diagonal (as a broken doorway can be entered diagonally, but one with a door in can’t be).

Tactical planning is a form of routing, in physical NetHack space; however, unlike most other AIs, a large amount of AI logic is done during tactical planning. For instance, Behavioral has a behaviour to open doors so that they don’t block routing later; in Planar, opening doors is only done if they’re in the way. Likewise, Planar will push boulders (trying not to block corridors in the process) and even tunnel through walls if that’s necessary to go where it wants to go, or if that’s the fastest way; the tactical planning stage will have a plan all ready for any strategic plan that needs to move to the square, for whatever reason. The actual routing is done via Dijkstra’s algorithm, in order to gain routing information for the whole map at once.

The diagram above shows routing costs, as determined by tactical planning, to the whole level; as opposed to the previous example, TAEB523 now has a pickaxe, and thus can route into the walls if it wants to. Note that many of the nearby walls are considerably more expensive to route to than the corridors and rooms near them; but the distant walls have comparable costs to the nearby rooms. The reason for this is that digging nearby takes a sufficiently long time that the iguana will catch up to the TAEB and attack; but digging distantly would happen in the future, after the iguana had already been outrun.

At the end of the tactical planning stage, therefore, it’s possible to quickly and efficiently answer any query from the strategic planning stage about how to get somewhere, and how risky doing so would be. (Incidentally, in Planar, ‘risky’ is a technical term meaning ‘difficult, expensive, or both’.) As a result, huge numbers of strategic scenarios can be tried, to determine which one works best.

Strategic planning


Strategic planning is the heart of Planar; threat checks and tactical planning are simply calculating information so that strategic planning can refer to it quickly, whereas strategic planning decides what, overall, the bot is actually going to do.

To start with, there are several standard plans that work on improving Planar’s resource situation; for instance, eating when hungry, collecting ammo, and picking up useful items. (An item is useful enough to be picked up if its value to us, say as a source of food or as a backup armour in case the one we’re already carrying around turns out to be cursed, is higher than the drawbacks, such as its weight.) This gradual improvement of the character is known as resource conversion; if the TAEB can manage a net gain in resources, it does so, ignoring the main plan for a while (and going for the largest gains first). Therefore, there’s no need for every major plan to take the details of things like feeding and shopping into account; it’ll happen automatically without a specific request to suppress it. If none of the resource-conversion plans can produce an improvement, though, the main plan takes over; this is a user-configurable overall goal that the bot is trying to accomplish. In the example above, for instance, the main plan is SlowDescent, which explores the dungeon from the top down, thoroughly exploring each level before going onto the next one.

In order to determine how to accomplish the goal in question, strategic planning basically does routing in plan-space, again using Dijkstra’s algorithm. In this case, the steps on the route are plans that make other plans possible. So, for instance, the chain of plans shown in the picture above are SlowDescent | by exploring [this level] | by exploring [a specific square] – by walking towards it; the final plan there is a tactical plan, the rest are strategic. In order to come to this decision, Planar will start by asking SlowDescent if it can be accomplished directly; the answer in this case is always ‘no’, because there is no associated action (it’s a “metaplan”, which is defined in terms of plans rather than in terms of actions). As a result, it’ll look for plans that make up parts of the slow descent, or make it easier to accomplish; in this case, the plans in question will be exploring level 1, exploring level 2, exploring level 3, and exploring level 4, in that preference order. (There’s no problem with more than one plan having the same preference; in that case, Planar would simply take the cheapest, leading to a “nearest first” exploration of the dungeon. However, SlowDescent is specifically trying to explore from the top downwards.)

Each of the level exploration plans are likewise metaplans, which suggest exploring particular locations on the level in question. Although Planar would like to explore on the first or second floors, if possible, those floors were apparently explored out at the time; so as the next-best option, it would consider the exploration spots on the third floor. (The picture above is showing the cache of exploration spots on the level, cached for speed; warm colours show squares in need of exploration, cool colours showing squares that have already been explored.) This time, all the exploration spots have the same preference level; so the least risky exploration is taken, which in this case, with no visible threats, is the nearest. (The tactical planning stage will already have the risk of reaching each of the exploration spots, as well as everywhere else in the dungeon, ready-calculated so that it can be given very quickly when requested; so despite the large number of exploration spots shown, strategic planning will not take much time.)

Before enacting the plan in question, various checks are done. Plans are skipped if they require more of any resource than is available to spend; so, for instance, exploring one square is normally fine, but if that square is next to a demon prince and we’re low on hitpoints, it will be considered far too risky in terms of hitpoints to attempt. (A different plan would be favoured, therefore; probably running away to carry out business in another part of the dungeon.) Likewise, something can be too risky in terms of anything else. The alternative plans suggested by threat check will also be considered, to see if eliminating the threat first and then carrying out the plan will be a better idea than just carrying out the plan directly. A plan can be suppressed by the previous turn’s plan, to avoid losing sight of a slightly-longer-term goal in favour of a short-term goal (so, for instance, it will unequip a shield in order to wield a two-handed digging tool, even though normally improving its armour would be more urgent than digging through a wall). Finally, a plan might not be enacted due to success measurement deciding that it is doomed to fail, or create an oscillation, based on observations rather than on known information.

Success measurement


Once Planar has decided on a plan, it will find out what action is associated with it, and tell the framework that that’s the desired action for the turn. After the framework has processed the results of the action and asked for the next one, though, Planar does after-the-fact processing on the current turn before moving on to consider the next turn’s plan. In particular, it’s trying to determine whether the plan worked as expected or not, and whether it’s stuck in an oscillation.

Each plan comes with a test to determine whether or not it accomplished what it set out to do. For instance, a plan to open a door will check to see if the door is now open; if it isn’t, it will check to see if it was for a known reason (doors in NetHack sometimes randomly fail to open, which is not a problem), or if it was for an unknown reason (for instance, Planar doesn’t yet know it can’t open a door if it’s been forced into jackal form by a werejackal, but it can determine that it tried to open a door, it didn’t work, and it doesn’t know why). If a plan fails unexpectedly, it won’t be retried for some time, which increases sharply the more times the plan in question has failed in a row. (There’s an exception to this; if something happens to make Planar think that the situation has changed, for instance if a tile relevant to the plan in question changes glyph or one of the plans marked as part of or an alternative to the plan succeeds, it will try again instantly.) This happens for both strategic and tactical plans; as always, failure will simply cause it to seek alternatives. In this way, Planar can get around deficiencies in either its own, or the framework’s, understanding of NetHack; this ability to determine what works by experiment will helpfully aid it in handling unexpected situations (whereas Behavioral normally responds to such situations with an infinite loop as it tries a doomed action over and over again).

More subtly, a plan can seem to be working correctly, but leave TAEB stuck in an infinite loop; this is the dreaded “oscillation” that’s so common in NetHack-playing AIs. The picture above shows TAEB near a throne room (full of letters representing monsters); it looks like a good place to explore from afar, but upon approaching more closely, it’s possible to see that it’s full of monsters. Planar tends to be cautious around monsters, especially when there are alternative plans available; so it would walk away from the throne room, forget the location of the monsters (neither the TAEB framework nor Planar currently has any way to track the locations of monsters that are out-of-sight), walk towards it again, and repeat. The repeated indecision in which plan to accomplish is detected in success measurement; and as a result, the success measurement deliberately makes up its mind to try one plan or the other, by temporarily interfering with both the strategic planning and tactical planning stages to help force a particular decision. In the map above, therefore, most of the routing map is unexpectedly grey, rather than the usual blue; this is because success measurement has detected an oscillation and decided to allow only plans that move towards the throne room for a few turns (by causing routing to fail when moving away from it). As a result, the TAEB has decided to look at the items on the floor in the room instead; and has concluded that the safest way to do that is to slaughter the inhabitants of the throne room, picking them off at range using projectiles. (The messages at the top show that the previous turn, it threw something at a monster and killed it; it wants to do that again, but needs to reposition itself first.)