'''Data synchronization''' is the process of establishing consistency between [[data]] in a source and a target data store, in both directions, and of continuously harmonizing that data over time. It is fundamental to a wide variety of applications, including [[file synchronization]] and mobile device synchronization (for example, for [[Personal Digital Assistant|PDAs]]).<ref name=Agarwal2002>{{cite journal
 | author = Agarwal, S.
 | coauthors = Starobinski, D.; [[Ari Trachtenberg]]
 | year = 2002
 | title = On the scalability of data synchronization protocols for PDAs and mobile devices
 | journal = IEEE Network
 | volume = 16
 | issue = 4
 | pages = 22–28
 | url = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1020232&isnumber=21950
 | accessdate = 2007-07-27
 | doi = 10.1109/MNET.2002.1020232
}}</ref>

==File-based solutions==

Tools for [[file synchronization]], [[version control]] ([[Concurrent Versions System|CVS]], [[Subversion (software)|Subversion]], etc.), [[distributed filesystem]]s ([[Coda (file system)|Coda]], etc.), and [[mirror (computing)|mirroring]] ([[rsync]], etc.) all attempt to keep sets of files synchronized. However, only version control and file synchronization tools can handle modifications made to more than one copy of the files.

* [[File synchronization]] is commonly used for home backups on external [[hard drive]]s and for updating files carried on [[USB flash drives]]. Because the automatic process skips files that are already identical, it can save considerable time compared with a manual copy, and it is also less error-prone.<ref>{{cite paper |author=A. Tridgell|authorlink=Andrew Tridgell |title=Efficient algorithms for sorting and synchronization |url=http://samba.org/~tridge/phd_thesis.pdf |date=February 1999 |version=PhD thesis |publisher=The Australian National University}}</ref>
* [[Version control]] tools are designed for situations where more than one user may modify the same file at the same time, whereas file synchronizers are optimized for situations where only one copy of a file is edited at a time. For this reason, although version control tools can be used for file synchronization, dedicated synchronization programs incur less [[Computational overhead|overhead]].
* [[Distributed filesystem]]s can also be seen as keeping multiple copies of a file synchronized. This normally requires the devices storing the files to be permanently connected, but some distributed file systems, such as [[Coda (file system)|Coda]], allow disconnected operation followed by reconciliation. The merging facilities of a distributed file system are typically more limited than those of a version control system, because most file systems do not keep a version graph.
* [[Mirror (computing)|Mirroring]]: a mirror is an exact copy of a data set. On the Internet, a mirror site is an exact copy of another Internet site. Mirror sites are most commonly used to provide multiple sources of the same information, and they are particularly valuable for providing reliable access to large downloads.
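
The time saving described for file synchronization comes from not re-copying files that already match. Below is a minimal one-way sketch in Python. It assumes that files are compared first by size and then by a SHA-256 content hash. Real tools such as rsync also use modification times and block-level checksums, and they handle deletions and two-way changes, none of which this sketch attempts. The function names `file_digest`, `is_identical` and `sync_one_way` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_identical(src: Path, dst: Path) -> bool:
    """Cheap size check first; hash the contents only if the sizes match."""
    if not dst.is_file() or dst.stat().st_size != src.stat().st_size:
        return False
    return file_digest(src) == file_digest(dst)


def sync_one_way(source: Path, target: Path) -> list[str]:
    """Copy every regular file under `source` into `target`, skipping files
    that already exist there with identical contents.

    Returns the relative paths (POSIX style) that were actually copied.
    Deletions in `source` are not propagated in this sketch.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        if is_identical(src, dst):
            continue  # already up to date: nothing to transfer
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also tries to preserve metadata such as mtime
        copied.append(rel.as_posix())
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        src_dir, dst_dir = Path(a), Path(b)
        (src_dir / "notes.txt").write_text("hello")
        (src_dir / "sub").mkdir()
        (src_dir / "sub" / "data.bin").write_bytes(b"\x00\x01")
        print(sync_one_way(src_dir, dst_dir))  # ['notes.txt', 'sub/data.bin']
        print(sync_one_way(src_dir, dst_dir))  # [] (everything already identical)
```

The size check is done first because it costs a single metadata lookup. The file is hashed only when the sizes match, so files that have obviously changed are never read twice.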

Synchronization is also useful in [[encryption]], for example for synchronizing [[Public-key cryptography|public key]] servers.<ref>[http://sks.dnsalias.net sks.dnsalias.net]</ref>

==Theoretical models==
Several theoretical models of data synchronization exist in the research literature, and the problem is also related to that of [[Slepian–Wolf coding]] in [[information theory]]. The models are classified according to how they treat the data to be synchronized.

===Unordered data===
The problem of synchronizing unordered data (also known as the '''set reconciliation problem''') is modeled as an attempt to compute the symmetric difference <math>S_A \oplus S_B = (S_A - S_B) \cup (S_B - S_A)</math> between two remote sets <math>S_A</math> and <math>S_B</math> of ''b''-bit numbers.<ref name="minksy-trachtenberg-zippel">
{{cite journal
 | author = Minsky, Y.
 | coauthors = [[Ari Trachtenberg]]; Zippel, R.
 | year = 2003
 | title = Set reconciliation with nearly optimal communication complexity
 | journal = IEEE Transactions on Information Theory
 | volume = 49
 | issue = 9
 | pages = 2213–2218
 | url = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1226606
 | accessdate = 2007-07-27
 | doi = 10.1109/TIT.2003.815784
}}</ref> Typical solutions to this problem include:

;Wholesale transfer: All data is transferred to one host, which then performs the comparison locally.
;Timestamp synchronization: Every change to the data is marked with a timestamp. Synchronization proceeds by transferring all data whose timestamp is later than that of the previous synchronization.<ref>[http://www.pumatech.com/enterprise/wp-1.html Palm developer knowledgebase manuals]</ref>
;Mathematical synchronization: The data are treated as mathematical objects (for example, a set can be represented by its characteristic polynomial), and synchronization corresponds to a mathematical process.<ref name="minksy-trachtenberg-zippel"/><ref>{{cite conference |author=[[Ari Trachtenberg]] |coauthors=D. Starobinski and S. Agarwal |title=Fast PDA Synchronization Using Characteristic Polynomial Interpolation |booktitle=IEEE INFOCOM 2002 |doi=10.1109/INFCOM.2002.1019402 |url=http://people.bu.edu/staro/infocom02pda.pdf }}</ref><ref>Y. Minsky and A. Trachtenberg, "Scalable set reconciliation", Allerton Conference on Communication, Control, and Computing, October 2002.</ref>
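
The first two strategies can be sketched in a few lines of Python. This is a minimal illustration, not a protocol from the cited sources. It assumes that each record is a hypothetical `(value, timestamp)` pair keyed by an ID, that timestamps come from a shared clock, and that conflicts are resolved by keeping the newer copy. Deletions and concurrent conflicting edits, which real synchronizers must handle, are ignored. The names `wholesale_difference`, `changes_since` and `apply_changes` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical record store: record ID -> (value, timestamp of last change).
Store = Dict[str, Tuple[str, int]]


def wholesale_difference(local: set, remote: set) -> set:
    """Wholesale transfer: the entire remote set is shipped to this host,
    which computes the symmetric difference (S_A - S_B) | (S_B - S_A) locally."""
    return (local - remote) | (remote - local)  # equivalent to local ^ remote


def changes_since(store: Store, last_sync: int) -> Store:
    """Timestamp synchronization: select only records changed after last_sync."""
    return {key: rec for key, rec in store.items() if rec[1] > last_sync}


def apply_changes(store: Store, changes: Store) -> None:
    """Merge incoming records, keeping whichever copy has the newer timestamp."""
    for key, (value, ts) in changes.items():
        if key not in store or store[key][1] < ts:
            store[key] = (value, ts)


if __name__ == "__main__":
    print(sorted(wholesale_difference({1, 2, 3, 5}, {2, 3, 4})))  # [1, 4, 5]

    pda: Store = {"alice": ("555-0100", 1), "bob": ("555-0199", 5)}
    desktop: Store = {"alice": ("555-0100", 1)}
    last_sync = 2  # time of the previous synchronization
    delta = changes_since(pda, last_sync)  # only "bob" changed after time 2
    apply_changes(desktop, delta)
    print(desktop == pda)  # True
```

Wholesale transfer sends the whole set even when the symmetric difference is tiny. Timestamp synchronization sends only the changed records, but it depends on reliable timestamps and on remembering when the last synchronization happened. The mathematical methods cited above instead aim for communication that grows with the size of the difference rather than with the size of the sets.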

===Ordered data===
In this case, two remote strings <math>\sigma_A</math> and <math>\sigma_B</math> need to be reconciled. Typically, it is assumed that these strings differ by at most a fixed number of '''edits''' (i.e. character insertions, deletions, or modifications). Data synchronization is then the process of reducing the [[edit distance]] between <math>\sigma_A</math> and <math>\sigma_B</math> to the ideal distance of zero. This model applies to all filesystem-based synchronization, where the data is ordered. Many [[#File-based solutions|practical applications]] of this model are discussed or referenced above.
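
The edit distance in this model can be computed with the standard dynamic-programming recurrence. The sketch below assumes that each of the three edit types (insertion, deletion, modification) costs one unit, which gives the Levenshtein distance. It only measures how far apart two strings are. A reconciliation protocol must also exchange enough information to close that gap without sending a whole string, which this snippet does not attempt. The function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```

Only the previous row of the table is kept, so memory use is proportional to the length of `b` rather than to the product of the two lengths.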

It is sometimes possible to transform the problem into one of unordered data through a process known as [[w-shingling|shingling]], which splits each string into overlapping substrings of a fixed length, called ''shingles''.<ref>{{cite journal |author=S. Agarwal |coauthors=V. Chauhan and [[Ari Trachtenberg]] |date=November 2006 |title=Bandwidth efficient string reconciliation using puzzles |journal=IEEE Transactions on Parallel and Distributed Systems |volume=17 |issue=11 |pages=1217–1225 |doi=10.1109/TPDS.2006.148 |url=http://ipsit.bu.edu/documents/puzzles_journal.pdf |accessdate=2007-05-23}}</ref>
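
A minimal sketch of the transformation step follows. It assumes a fixed shingle length ''k'' (''k'' = 3 in the example is an arbitrary choice) and treats the shingles as a multiset, because a substring can occur more than once. A single-character change affects at most ''k'' shingles on each side, so similar strings yield a small multiset difference that a set reconciliation method could then exchange. Rebuilding the target string from the reconciled shingles is a separate step that this sketch does not attempt. The names `shingles` and `shingle_difference` are hypothetical.

```python
from collections import Counter


def shingles(s: str, k: int) -> Counter:
    """Return the multiset of all overlapping length-k substrings of s."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shingle_difference(a: str, b: str, k: int) -> Counter:
    """Multiset symmetric difference of the two strings' k-shingles.

    Counter subtraction drops non-positive counts, so (A - B) + (B - A)
    keeps exactly the shingles that one side holds more copies of.
    """
    sa, sb = shingles(a, k), shingles(b, k)
    return (sa - sb) + (sb - sa)


if __name__ == "__main__":
    diff = shingle_difference("synchronize", "synchronise", 3)
    print(sorted(diff))  # ['ise', 'ize', 'nis', 'niz']
```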

==See also==

* [[SyncML]], a standard used mainly for calendar, contact and email synchronization
* [[Synchronization (computer science)]]

== Notes ==
{{reflist}}

{{DEFAULTSORT:Data Synchronization}}
[[Category:Data synchronization|*]]
[[Category:Fault-tolerant computer systems]]