Friday, September 25, 2015

hackerrank Mr K marsh

This puzzle was quite the mind bender for me. I didn't know that performance was one of the requirements so my first attempt worked for the most part but failed some of the test cases due to timeouts. This was only my second challenge outside of the warmups.

The Timed Out Method


Briefly, my first attempt was simply iterate through each 'cell' and within each cell I iterated through all the possible perimeters. As it iterated, I just kept the largest, allowable perimeter. This of course had many, many loops. 1 loop to check so that is O(n^2), then another loop within the loop so another O(n^2) which would make it O(n^4). Then a final loop to validate the perimeter... which for simplicity say another n-order so at most O(n^5). Of course, I tried to cut corners by looking for largest perimeter and skip some loops, but that will be at best be O(n^5 / x) where x is some fixed value which will not outperform the size of the data.

Comparing my solution to a couple that I found on-line, I iterate through rows instead of columns. This is quite minor as long as you remain consistent which dimension goes to which, then they are essentially the same except for the terminology used in the notations or variables. A bigger difference is that I iterate from the largest potential perimeter. I don't think this made a huge difference depending on how the data is provided. Worst case O is the same either way.

Accepted Method


I attempted to write how I reached the need for each step in the code but this turned out to be very messy and tedious and likely very boring to read. But I hope this explanation would be more helpful than just reading the code.

Another matrix is used to calculate the number of marshes. This is used to determine if the top border or bottom border contains a marsh. This matrix contains the accumulation of marshes per row. Thus 'x.x.x.x.....x' will become '112233444445'. The reason for this is to easily identify that 11 is a set, 22 is a set, and so on and so forth. When comparing the left part of the perimeter to the right, if the numbers are different that means that a marsh exists. But when the iteration reaches (for example) as 4 for the top left corner while the top right corner is on the last 4, then a border is found without having to iterate through each cell to find a marsh.

The first logical loop is to identify the left and right side of the perimeter. I chose to start with the far left and the far right (the other solutions start with the far left and the second column which is fine too). This requires two loops to identify all possible widths with different starting points.

The next loop is to identify the height of the perimeter. I started with the top most and bottom most. But as I debugged, I found that the subset perimeters are always true. Thus if I started with the smallest height, I eliminate one of the loops.

Depending on the loop, this determines whether the perimeter should be calculated or not. Initially, we assume we have no rows. When we finally iterate to a partial row with no marshes, we set the initial row.

Iterate to the next row, if the row has no marshes, then calculate the perimeter (store only if it is larger than a previous calculation). If the row contains an x, then do not calculate the perimeter but continue to iterate because it is still a potential side. If the end points of the row contains an x, then that is the end of the iteration for that top boundary by resetting the initial row. Continue to iterate to find other potential loops. Logically, the checks should be in reverse order (ie marshes as endpoints, marshes in boundary, then no marshes).

Hopefully that would be of some help while going over the other codes online.

Attempt to Illustrate the Algorithm

First find top part of the fence. If the land is:
x.x.x.x.....x
x.x...x.....x
x.x...x..x..x
x.x...x.....x
x.x...x..x..x

You'll find one acceptable top in line one, column 8-12. Then look if the next row is a potential bottom. If a potential bottom, record the row. If not acceptable, then do not record as potential row. If To be efficient, check if the left-most and right-most are marsh-free. If marsh-free, check if the two values are the same. This will reduce a loop to check all spaces are marsh-free.

x.x.x.x.....x
x.x...x.....x (potential row)
x.x...x..x..x (not potential row)
x.x...x.....x (potential row)
x.x...x..x..x (not potential row)

After looping throw all rows, you'll find that row #4 will create the largest perimeter for this iteration. If the perimeter is larger than the last perimeter (always true for the first set), record new perimeter value.

x.x.x.x.....x
x.x...x.....x
x.x...x..x..x
x.x...x.....x
x.x...x..x..x

Repeat above steps for all other valid top rows. I did not include any logic to exclude subsets but one could be included to make the algorithm faster (efficiency is debatable as it probably requires another matrix).

There is one other major set but you'll find is smaller so will not be recorded.

x.x.x.x.....x
x.x...x.....x
x.x...x..x..x
x.x...x.....x
x.x...x..x..x

Edit (10/31/2015):
Added an illustration portion to hopefully better show what is going on.

Reference

https://www.hackerrank.com/challenges/mr-k-marsh