Today, AI isn't just limited to voice assistants or smart homes; it's venturing into the realm of coding. Numerous AI tools, primarily powered by LLMs, are emerging to generate code. GitHub's Copilot was one of the pioneers in this space. Yet, like all code, it's essential that AI-generated scripts are checked and validated to ensure they function correctly. This aligns with the traditional approach in professional coding, where every piece of code is reviewed by peers. What makes LLMs stand out is their vast exposure to a plethora of code samples online. This vast knowledge helps them predict the next step in a coding sequence. Surprisingly, they can even tackle challenging algorithmic problems, like the knapsack issue, a famous NP-hard problem. But can the AI solve the problem for given input?

Knapsack problem

Given a set of items, each with a weight and a value, determine the number of each item to include in a knapsack so that the total weight doesn't exceed a given weight capacity while maximizing the total value.

Program Guesses the Output of Another Program

American mathematician Alonzo Church found problems, that can not be decided by algorithms. Predicting whether a program will run indefinitely or eventually stop is undecidable. This means that based on its code, a program can't determine if another program will terminate. This is known as the halting problem. A primary complication related to this issue is self-reference, reminiscent of dilemmas such as Gödel's incompleteness theorem and the computation of Kolmogorov complexity. Suppose we have a hypothetical function H(p, i) that indicates whether program p will terminate for input i in a finite amount of time. Then consider a function, g.

def g(x):
    if h(i,i):
        loop_forever()
    else:
        return

Now, reflect on the paradox of g(g). If h(g, g) returns true, indicating termination in finite time, it would lead to a never-ending loop, which contradicts its output. Conversely, if h(g, g) is false, suggesting the function runs indefinitely, it would terminate, which is also contradictory. Due to this paradox, we cannot assume the existence of a program that reliably predicts another program's termination within a finite timeframe.

Can ChatGPT Guess The Output of Another Program?

I gently asked the ChatGPT to write a Python code, that solves the knapsack problem. It writes the correct code. After I asked to guess the output of the code it wrote on given inputs.

My experiments are not well designed evaluation. I had 9 different size knapsack problems. The first 3 problem had maximum 10 items, the others were much bigger problems with thousands of items. ChatGPT imidiately started to interpret the big input as a totally different data, like temprature values. I had to prompt engineer to keep its focus on the original problem. After that it still failed with finding the correct solution. So I just tested one big input.

You find the 3 small problems at the end of the blogpost. The 3-item problems were succesfully solved by ChatGPT, but the 10-item problem was failed. The predicted sum of units was over the capacity and I tried to explain this to it, but the 2nd or 3rd trials were also fails.

Problem 1:

  • Capacity: 4 units

    Items:

    1. Value = 1,
      Weight = 4

    2. Value = 2,
      Weight = 5

    3. Value = 3,
      Weight = 1

Problem 2:

  • Capacity: 3 units

    Items:

    1. Value = 1,
      Weight = 4

    2. Value = 2,
      Weight = 5

    3. Value = 3,
      Weight = 6

Problem 3:

  • Capacity: 50 units

    Items:

    1. Value = 17,
      Weight = 36

    2. Value = 14,
      Weight = 33

    3. Value = 2,
      Weight = 3

    4. Value = 10,
      Weight = 18

    5. Value = 14,
      Weight = 42

    6. Value = 5,
      Weight = 45

    7. Value = 12,
      Weight = 46

    8. Value = 10,
      Weight = 46

    9. Value = 19,
      Weight = 20

    10. Value = 5,
      Weight = 28

You can fit only the item 3 in the Problem 1, so the value you can put in the knapsack is 3. In the Problem 2 each item has bigger weight, than the capacity, so the result is 0. Problem 3 is left as an exercise for the reader.