# NP (complexity)

Is
P = NP ? |

In computational complexity theory, **NP** is one of the most fundamental complexity classes.
The abbreviation **NP** refers to "nondeterministic polynomial time."

Intuitively, **NP** is the set of all decision problems for which the instances where the answer is "yes" have efficiently verifiable proofs of the fact that the answer is indeed "yes". More precisely, these proofs have to be *verifiable* in polynomial time by a deterministic Turing machine.
In an equivalent formal definition, **NP** is the set of decision problems where the "yes"-instances can be accepted in polynomial time by a non-deterministic Turing machine. The equivalence of the two definitions follows from the fact that an algorithm on such a non-deterministic machine consists of two phases, the first of which consists of a guess about the solution, which is generated in a non-deterministic way, while the second consists of a deterministic algorithm that verifies or rejects the guess as a valid solution to the problem.^{[2]}

The complexity class **P** is contained in **NP**, but **NP** contains many important problems, the hardest of which are called **NP**-complete problems, whose solutions are sufficient to deal with any other NP problem in polynomial time. The most important open question in complexity theory, the **P** = **NP** problem, asks whether polynomial time algorithms actually exist for **NP**-complete, and by corollary, all **NP** problems. It is widely believed that this is not the case.^{[3]}

## Formal definition

The complexity class **NP** can be defined in terms of NTIME as follows:

Alternatively, **NP** can be defined using deterministic Turing machines as verifiers. A language *L* is in **NP** if and only if there exist polynomials *p* and *q*, and a deterministic Turing machine *M*, such that

- For all
*x*and*y*, the machine*M*runs in time*p*(|*x*|*)*on input*(x,y)* - For all
*x*in*L*, there exists a string*y*of length*q*(|*x*|) such that*M(x,y)*= 1 - For all
*x*not in*L*and all strings*y*of length*q*(|*x*|),*M(x,y)*= 0

## Introduction

Many natural computer science problems are covered by the class **NP**.
In particular, the decision versions of many interesting search problems and optimization problems are contained in **NP**.

### Verifier-based definition

In order to explain the verifier-based definition of **NP**, let us consider the subset sum problem:
Assume that we are given some integers, such as {−7, −3, −2, 5, 8}, and we wish to know whether some of these integers sum up to zero. In this example, the answer is "yes", since the subset of integers {−3, −2, 5} corresponds to the sum (−3) + (−2) + 5 = 0. The task of deciding whether such a subset with sum zero exists is called the *subset sum problem*.

To answer if some of the integers add to zero we can create an algorithm which obtains all the possible subsets. As the number of integers that we feed into the algorithm becomes larger, the number of subsets grows exponentially and so does the computation time. However, notice that, if we are given a particular subset (often called a *certificate*), we can easily check or *verify* whether the subset sum is zero, by just summing up the integers of the subset. So if the sum is indeed zero, that particular subset is the *proof* or witness for the fact that the answer is "yes". An algorithm that verifies whether a given subset has sum zero is called *verifier*. A problem is said to be in **NP** if there exists a verifier for the problem that executes in polynomial time. In case of the subset sum problem, the verifier needs only polynomial time, for which reason the subset sum problem is in **NP**.

The "no"-answer version of this problem is stated as: "given a finite set of integers, does every non-empty subset have a nonzero sum?". The verifier-based definition of **NP** does *not* require an easy-to-verify certificate for the "no"-answers. The class of problems with such certificates for the "no"-answers is called **co-NP**. In fact, it is an open question whether all problems in **NP** also have certificates for the "no"-answers and thus are in **co-NP**.

### Machine-definition

Equivalent to the verifier-based definition is the following characterization: **NP** is the set of decision problems solvable by a non-deterministic Turing machine that runs in polynomial time. (This means that there is an accepting computation path if a word is in the language – **co-NP** is defined dually with rejecting paths.) This definition is equivalent to the verifier-based definition because a non-deterministic Turing machine could solve an NP problem in polynomial time by non-deterministically selecting a certificate and running the verifier on the certificate. Similarly, if such a machine exists, then a polynomial time verifier can naturally be constructed from it.

### Examples

This is an incomplete list of problems that are in **NP**.

- All problems in
**P**(For, given a certificate for a problem in**P**, we can ignore the certificate and just solve the problem in polynomial time. Alternatively, a deterministic Turing machine is also trivially a non-deterministic Turing machine that just happens to not use any non-determinism.) - The decision problem version of the integer factorization problem: given integers
*n*and*k*, is there a factor*f*with 1 <*f*<*k*and*f*dividing*n*? - The graph isomorphism problem of determining whether two graphs can be drawn identically
- All NP-complete problems, e.g.:
- A variant of the travelling salesman problem, where we want to know if there is a route of some length that goes through all the nodes in a certain network
- The boolean satisfiability problem, where we want to know whether or not a certain formula in propositional logic with boolean variables is true for some value of the variables

## Why some NP problems are hard to solve

Because of the many important problems in this class, there have been extensive efforts to find polynomial-time algorithms for problems in **NP**. However, there remain a large number of problems in **NP** that defy such attempts, seeming to require super-polynomial time. Whether these problems are not decidable in polynomial time is one of the greatest open questions in computer science (see **P** = **NP** problem for an in-depth discussion).

An important notion in this context is the set of NP-complete decision problems, which is a subset of **NP** and might be informally described as the "hardest" problems in **NP**. If there is a polynomial-time algorithm for even *one* of them, then there is a polynomial-time algorithm for *all* the problems in **NP**. Because of this, and because dedicated research has failed to find a polynomial algorithm for any **NP**-complete problem, once a problem has been proven to be **NP**-complete this is widely regarded as a sign that a polynomial algorithm for this problem is unlikely to exist.

However, in practical uses, instead of spending computational resources looking for an optimal solution, a good enough (but potentially suboptimal) solution may often be found in polynomial time. Also, the real life applications of some problems are easier than their theoretical equivalents. For example, inputs to the general Travelling salesman problem need not obey the triangle inequality, unlike real road networks.

## Equivalence of definitions

The two definitions of **NP** as the class of problems solvable by a nondeterministic Turing machine (TM) in polynomial time and the class of problems verifiable by a deterministic Turing machine in polynomial time are equivalent. The proof is described by many textbooks, for example Sipser's *Introduction to the Theory of Computation*, section 7.3.

To show this, first suppose we have a deterministic verifier. A nondeterministic machine can simply nondeterministically run the verifier on all possible proof strings (this requires only polynomially many steps because it can nondeterministically choose the next character in the proof string in each step, and the length of the proof string must be polynomially bounded). If any proof is valid, some path will accept; if no proof is valid, the string is not in the language and it will reject.

Conversely, suppose we have a nondeterministic TM called A accepting a given language L. At each of its polynomially many steps, the machine's computation tree branches in at most a constant number of directions. There must be at least one accepting path, and the string describing this path is the proof supplied to the verifier. The verifier can then deterministically simulate A, following only the accepting path, and verifying that it accepts at the end. If A rejects the input, there is no accepting path, and the verifier will never accept.

## Relationship to other classes

**NP** contains all problems in **P**, since one can verify any instance of the problem by simply ignoring the proof and solving it. **NP** is contained in **PSPACE**—to show this, it suffices to construct a **PSPACE** machine that loops over all proof strings and feeds each one to a polynomial-time verifier. Since a polynomial-time machine can only read polynomially many bits, it cannot use more than polynomial space, nor can it read a proof string occupying more than polynomial space (so we do not have to consider proofs longer than this). **NP** is also contained in EXPTIME, since the same algorithm operates in exponential time.

The complement of **NP**, co-NP, contains those problems which have a simple proof for *no* instances, sometimes called counterexamples. For example, primality testing trivially lies in co-NP, since one can refute the primality of an integer by merely supplying a nontrivial factor. **NP** and co-NP together form the first level in the polynomial hierarchy, higher only than **P**.

**NP** is defined using only deterministic machines. If we permit the verifier to be probabilistic (this however, is not necessarily a BPP machine ^{[4]} ), we get the class **MA** solvable using an Arthur-Merlin protocol with no communication from Merlin to Arthur.

**NP** is a class of decision problems; the analogous class of function problems is FNP.

The only known strict inclusions came from the space hierarchy theorem and the time hierarchy theorem, and respectively they are **NP** **NEXPTIME** and **NP** **EXPSPACE**.

## Other characterizations

In terms of descriptive complexity theory, **NP** corresponds precisely to the set of languages definable by existential second-order logic (Fagin's theorem).

**NP** can be seen as a very simple type of interactive proof system, where the prover comes up with the proof certificate and the verifier is a deterministic polynomial-time machine that checks it. It is complete because the right proof string will make it accept if there is one, and it is sound because the verifier cannot accept if there is no acceptable proof string.

A major result of complexity theory is that **NP** can be characterized as the problems solvable by probabilistically checkable proofs where the verifier uses O(log *n*) random bits and examines only a constant number of bits of the proof string (the class **PCP**(log *n*, 1)). More informally, this means that the **NP** verifier described above can be replaced with one that just "spot-checks" a few places in the proof string, and using a limited number of coin flips can determine the correct answer with high probability. This allows several results about the hardness of approximation algorithms to be proven.

## Example

The decision version of the travelling salesman problem is in **NP**. Given an input matrix of distances between *n* cities, the problem is to determine if there is a route visiting all cities with total distance less than *k*.

A proof certificate can simply be a list of the cities. Then verification can clearly be done in polynomial time by a deterministic Turing machine. It simply adds the matrix entries corresponding to the paths between the cities.

A non-deterministic Turing machine can find such a route as follows:

- At each city it visits it "guesses" the next city to visit, until it has visited every vertex. If it gets stuck, it stops immediately.
- At the end it verifies that the route it has taken has cost less than
*k*in O(*n*) time.

One can think of each guess as "forking" a new copy of the Turing machine to follow each of the possible paths forward, and if at least one machine finds a route of distance less than *k*, that machine accepts the input. (Equivalently, this can be thought of as a single Turing machine that always guesses correctly)

A binary search on the range of possible distances can convert the decision version of Traveling Salesman to the optimization version, by calling the decision version repeatedly (a polynomial number of times).

## References

- ↑ R. E. Ladner "On the structure of polynomial time reducibility," J.ACM, 22, pp. 151–171, 1975. Corollary 1.1. ACM site.
- ↑ Alsuwaiyel, M. H.:
*Algorithms: Design Techniques and Analysis*, p. 283 - ↑ {{#invoke:Citation/CS1|citation |CitationClass=journal }}
- ↑ https://complexityzoo.uwaterloo.ca/Complexity_Zoo:E#existsbpp

- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.
*Introduction to Algorithms*, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 34.2: Polynomial-time verification, pp. 979–983. - {{#invoke:citation/CS1|citation

|CitationClass=book }} Sections 7.3–7.5 (The Class NP, NP-completeness, Additional NP-complete Problems), pp. 241–271.

- David Harel, Yishai Feldman. Algorithmics: The Spirit of Computing, Addison-Wesley, Reading, MA, 3rd edition, 2004.

## External links

- American Scientist primer on traditional and recent complexity theory research: "Accidental Algorithms"

{{#invoke: Navbox | navbox }}