Tate conjecture: Difference between revisions

en>TakuyaMurata
en>Niceguyedc
m WPCleaner v1.34 - Repaired 1 link to disambiguation page - (You can help) - James Milne
 
In [[computer science]], an '''inverted index''' (also referred to as '''postings file''' or '''inverted file''') is an [[index (database)|index data structure]] storing a mapping from content, such as words or numbers, to its locations in a [[Table (database)|database file]], or in a document or a set of documents. The purpose of an inverted index is to allow fast [[full text search]]es, at a cost of increased processing when a document is added to the database. The inverted file may be the database file itself, rather than its [[Index (database)|index]]. It is the most popular data structure used in [[document retrieval]] systems,<ref>{{Harvnb |Zobel|Moffat|Ramamohanarao|1998| Ref=none }}</ref> and is used on a large scale in, for example, [[search engine]]s. Several significant general-purpose [[Mainframe computer|mainframe]]-based [[database management systems]] have used inverted list architectures, including [[ADABAS]], [[DATACOM/DB]], and [[Model 204]].
 
There are two main variants of inverted indexes: A '''record level inverted index''' (or '''inverted file index''' or just '''inverted file''') contains a list of references to documents for each word. A '''word level inverted index''' (or '''full inverted index''' or '''inverted list''') additionally contains the positions of each word within a document.<ref name="isbn0-201-39829-X-p192">{{Harvnb |Baeza-Yates|Ribeiro-Neto|1999| p=192 | Ref=BYR99 }}</ref> The latter form offers more functionality (like [[phrase search]]es), but needs more time and space to be created.
 
==Example==
 
Given the texts
 
 T[0] = "it is what it is"
 T[1] = "what is it"
 T[2] = "it is a banana"
 
we have the following inverted file index (where the integers in the set notation brackets refer to the indexes (or keys) of the text symbols, <code>T[0]</code>, <code>T[1]</code> etc.):
 
 "a":      {2}
 "banana": {2}
 "is":     {0, 1, 2}
 "it":     {0, 1, 2}
 "what":   {0, 1}
 
A term search for the terms
<code>"what"</code>, <code>"is"</code> and <code>"it"</code> would give the set
<math>\{0,1\} \cap \{0,1,2\} \cap \{0,1,2\} = \{0,1\}</math>.
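
The record level index and the term search above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names <code>build_inverted_index</code> and <code>search_all</code> are invented for this example, and words are assumed to be separated by whitespace.

```python
from collections import defaultdict


def build_inverted_index(texts):
    """Map each word to the set of indexes of the texts that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(texts):
        for word in text.split():  # assumption: whitespace tokenization
            index[word].add(doc_id)
    return index


def search_all(index, terms):
    """Return the indexes of the texts containing every term."""
    postings = [index.get(term, set()) for term in terms]
    # A conjunctive term search is the intersection of the posting sets.
    return set.intersection(*postings) if postings else set()


T = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(T)
result = search_all(index, ["what", "is", "it"])
print(sorted(result))  # [0, 1]
```

Sets are used for the postings because a record level index only needs to know whether a word occurs in a text, not how often or where.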
 
With the same texts, we get the following full inverted index, where the pairs are document numbers and local word numbers. Like the document numbers, local word numbers also begin with zero. So, <code>"banana": {(2, 3)}</code> means the word "banana" is in the third document (<code>T[2]</code>), and it is the fourth word in that document (position 3).
 
 "a":      {(2, 2)}
 "banana": {(2, 3)}
 "is":     {(0, 1), (0, 4), '''(1, 1)''', (2, 1)}
 "it":     {(0, 0), (0, 3), '''(1, 2)''', (2, 0)}
 "what":   {(0, 2), '''(1, 0)'''}
 
If we run a phrase search for <code>"what is it"</code>, every word of the phrase is found in both document 0 and document 1, but the words occur consecutively only in document 1.
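
A phrase search over a full inverted index can be sketched as follows. The names <code>build_positional_index</code> and <code>phrase_search</code> are hypothetical, and whitespace tokenization is again assumed. The idea is to start from every occurrence of the first word and keep only those starting positions where each following word appears at the expected offset in the same document.

```python
from collections import defaultdict


def build_positional_index(texts):
    """Map each word to a list of (document number, local word number) pairs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(texts):
        for pos, word in enumerate(text.split()):
            index[word].append((doc_id, pos))
    return index


def phrase_search(index, phrase):
    """Return the documents in which the words of the phrase occur consecutively."""
    words = phrase.split()
    if not words:
        return set()
    # Candidate starting points: every occurrence of the first word.
    candidates = set(index.get(words[0], []))
    for offset, word in enumerate(words[1:], start=1):
        occurrences = set(index.get(word, []))
        # Keep a start only if this word sits exactly `offset` places later.
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in occurrences}
    return {d for d, _ in candidates}


T = ["it is what it is", "what is it", "it is a banana"]
pos_index = build_positional_index(T)
hits = phrase_search(pos_index, "what is it")
print(sorted(hits))  # [1]
```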
 
==Applications==
 
The inverted index [[data structure]] is a central component of a typical [[Index (search engine)|search engine indexing algorithm]]. A goal of a search engine implementation is to optimize the speed of the query: find the documents where word X occurs. Once a [[Search engine indexing#The forward index|forward index]], which stores the list of words for each document, has been built, it is inverted to produce an inverted index. Querying the forward index directly would require iterating sequentially through every document and every word in it to find the matching documents. The time, memory, and processing resources needed for such a query are not always practical. Instead of listing the words per document, as the forward index does, the inverted index lists the documents per word.
 
With the inverted index created, the query can now be resolved by jumping to the word id (via [[random access]]) in the inverted index.
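
The difference between the two structures can be illustrated with a small Python sketch. The forward index below is a made-up example (the three texts from the Example section), and the function names <code>scan_forward_index</code> and <code>invert</code> are assumptions for illustration only.

```python
from collections import defaultdict

# Forward index: document number -> list of words in that document.
forward_index = {
    0: ["it", "is", "what", "it", "is"],
    1: ["what", "is", "it"],
    2: ["it", "is", "a", "banana"],
}


def scan_forward_index(forward_index, word):
    """Answer a query from the forward index by visiting every document."""
    return {doc_id for doc_id, words in forward_index.items() if word in words}


def invert(forward_index):
    """Turn 'words per document' into 'documents per word'."""
    inverted = defaultdict(set)
    for doc_id, words in forward_index.items():
        for word in words:
            inverted[word].add(doc_id)
    return dict(inverted)


inverted_index = invert(forward_index)

# Both approaches give the same answer, but the inverted index
# answers with a single dictionary lookup instead of a full scan.
scanned = scan_forward_index(forward_index, "what")
looked_up = inverted_index["what"]
print(sorted(scanned), sorted(looked_up))  # [0, 1] [0, 1]
```

A Python dictionary stands in here for the random-access lookup by word; a production system would typically use an on-disk dictionary and compressed posting lists instead.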
 
In pre-computer times, [[Concordance (publishing)|concordances]] to important books were manually assembled.  These were effectively inverted indexes with a small amount of accompanying commentary that required a tremendous amount of effort to produce.
 
In bioinformatics, inverted indexes are very important in the [[sequence assembly]] of short fragments of sequenced DNA. One way to find the source of a fragment is to search for it against a reference DNA sequence. A small number of mismatches (due to differences between the sequenced DNA and reference DNA, or errors) can be accounted for by dividing the fragment into smaller fragments: at least one subfragment is likely to match the reference DNA sequence exactly. The matching requires constructing an inverted index of all substrings of a certain length from the reference DNA sequence. Since human DNA contains more than 3 billion base pairs, and the index must store a DNA substring for every position together with a 32-bit integer for the position itself, the storage requirement for such an inverted index would probably be in the tens of gigabytes.
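
The subfragment matching described above can be sketched on a toy scale. Everything in this example is an assumption made for illustration: the reference string, the substring length of 4, and the function names <code>build_kmer_index</code> and <code>candidate_positions</code>. Real tools use far more compact index layouts than a Python dictionary.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_positions(index, fragment, k):
    """Split the fragment into non-overlapping length-k subfragments and
    return the reference positions where the whole fragment might start."""
    candidates = set()
    for offset in range(0, len(fragment) - k + 1, k):
        seed = fragment[offset:offset + k]
        for hit in index.get(seed, []):
            start = hit - offset  # where the full fragment would begin
            if start >= 0:
                candidates.add(start)
    return candidates


reference = "ACGTACGGTCAATGCA"  # toy reference sequence
kmer_index = build_kmer_index(reference, 4)

# The fragment differs from reference[6:14] ("GGTCAATG") in one base,
# so only its first subfragment "GGTC" matches the reference exactly.
fragment = "GGTCTATG"
positions = candidate_positions(kmer_index, fragment, 4)
print(sorted(positions))  # [6]
```

Each candidate position would then be verified by comparing the full fragment against the reference at that position while tolerating a few mismatches.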
 
==See also==
* [[Index (search engine)]]
* [[Reverse index]]
* [[Vector space model]]
 
== Bibliography ==
*{{cite book |last= Knuth |first= D. E. |authorlink= Donald Knuth |title= [[The Art of Computer Programming]] |publisher= [[Addison-Wesley]] |edition= Third |year= 1997 |origyear= 1973 |location= [[Reading, Massachusetts]] |isbn= 0-201-89685-0 |ref= Knu97 |chapter= 6.5. Retrieval on Secondary Keys}}
*{{cite journal|last= Zobel |first= Justin |coauthors= Moffat, Alistair; Ramamohanarao, Kotagiri |date=December 1998 |title= Inverted files versus signature files for text indexing |journal= ACM Transactions on Database Systems |volume= 23 |issue= 4 |pages= 453–490 |publisher= [[Association for Computing Machinery]] |location= New York |doi= 10.1145/296854.277632 |url= |accessdate= }}
*{{cite journal|last= Zobel |first= Justin |coauthors= Moffat, Alistair |date=July 2006 |title= Inverted Files for Text Search Engines |journal= ACM Computing Surveys |volume= 38 |issue= 2 |pages= 6 |publisher= [[Association for Computing Machinery]] |location= New York |doi= 10.1145/1132956.1132959 |url= |accessdate= }}
*{{cite book |last= Baeza-Yates | first = Ricardo |authorlink=Ricardo Baeza-Yates |coauthors=Ribeiro-Neto, Berthier |title= Modern information retrieval |publisher= Addison-Wesley Longman |location= [[Reading, Massachusetts]] |year= 1999 |isbn= 0-201-39829-X |oclc= |doi= |ref= BYR99 |page= 192 }}
*{{cite journal |last= Luk | first = Robert | coauthors=W. Lam | title=Efficient in-memory extensible inverted file | journal = Information Systems | volume = 32 | issue = 5 | pages = 733–754 | year = 2007 | doi = 10.1016/j.is.2006.06.001}}
*{{cite journal |last= Salton | first = Gerard |coauthors=Fox, Edward A.; Wu, Harry |title= Extended Boolean information retrieval |publisher= ACM |year= 1983 |journal = Commun. ACM |volume = 26 |issue = 11 |doi= 10.1145/182.358466 |pages= 1022 }}
*{{cite book |title=Information Retrieval: Implementing and Evaluating Search Engines |url=http://www.ir.uwaterloo.ca/book/ |publisher=MIT Press |year=2010 |location=Cambridge, Massachusetts |isbn= 978-0-262-02651-2 |author8=Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack}}
 
==References==
* {{Harvnb |Knuth|1997| pp=560–563 of section 6.5: ''Retrieval on Secondary Keys'' | Ref= Knu97 }}
{{Reflist}}
 
==External links==
*[http://www.nist.gov/dads/HTML/invertedIndex.html NIST's Dictionary of Algorithms and Data Structures: inverted index]
*[http://mg4j.dsi.unimi.it Managing Gigabytes for Java] - A free full-text search engine for large document collections written in Java.
*[http://lucene.apache.org/java/docs/ Lucene] - Apache Lucene is a full-featured text search engine library written in Java.
*[http://sphinxsearch.com/ Sphinx Search] - Open source high-performance, full-featured text search engine library used by craigslist and others employing an inverted index.
*[http://rosettacode.org/wiki/Inverted_Index Example implementations] on [[Rosetta Code]]
* [http://www.vision.caltech.edu/malaa/software/research/image-search/ Caltech Large Scale Image Search Toolbox]: a Matlab toolbox implementing Inverted File Bag-of-Words image search.
 
[[Category:Data management]]
[[Category:Search algorithms]]
[[Category:Database index techniques]]
[[Category:Substring indices]]

Latest revision as of 13:12, 26 December 2014
