Serialization: Difference between revisions
{{Short description|Conversion process for computer data}} |
|||
{{for|the term in publishing|serial}} |
|||
{{About|data structure encoding}} |
|||
{{External links|date=August 2024}} |
|||
{{otheruses}} |
|||
[[File:Serialization.jpg|thumb|upright=1.35|Flow diagram]] |
|||
In computing, '''serialization''' (or '''serialisation''', also referred to as '''[[pickling]]''' in [[Python (programming language)|Python]]) is the process of translating a [[data structure]] or [[object (computer science)|object]] state into a format that can be stored (e.g. [[computer file|files]] in [[secondary storage devices]], [[data buffer]]s in primary storage devices) or transmitted (e.g. [[data stream]]s over [[computer networks]]) and reconstructed later (possibly in a different computer environment).<ref>{{ cite web |
|||
| first = Marshall | last = Cline |
|||
| url = http://www.parashift.com/c++-faq-lite/serialize-overview.html |
|||
| title = C++ FAQ: "What's This "Serialization" Thing All About?" |
|||
| archive-url = https://web.archive.org/web/20150405013606/http://isocpp.org/wiki/faq/serialization |
|||
| archive-date = 2015-04-05 |
|||
| quote = It lets you take an object or group of objects, put them on a disk or send them through a wire or wireless transport mechanism, then later, perhaps on another computer, reverse the process, resurrecting the original object(s). The basic mechanisms are to flatten object(s) into a one-dimensional stream of bits, and to turn that stream of bits back into the original object(s).}}</ref> When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of [[reference (computer science)|references]], this process is not straightforward. Serialization of [[object (computer science)|object]]s does not include any of their associated [[Method (computer science)|methods]] with which they were previously linked. |
|||
This process of serializing an object is also called [[Marshalling (computer science)|marshalling]] an object in some situations.<ref>{{Cite web|url=https://ruby-doc.org/core-3.0.2/Marshal.html|title=Module: Marshal (Ruby 3.0.2)|website=ruby-doc.org|access-date=25 July 2021}}</ref><ref name=ocaml>{{Cite web |title=Marshal |author= |website=OCaml |date= |access-date=25 July 2021 |url= https://ocaml.org/enwiki/api/Marshal.html}}</ref><ref>{{Cite web|url=https://docs.python.org/3/library/pickle.html?highlight=marshalling#module-pickle|title=Python 3.9.6 documentation - Python object serialization —pickle|website=Documentation - The Python Standard Library }}</ref> The opposite operation, extracting a data structure from a series of bytes, is '''deserialization''' (also called '''unserialization''' or '''[[unmarshalling]]'''). |
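The round trip described above can be sketched with Python's standard <code>pickle</code> module. This is a minimal illustration only; the <code>original</code> dictionary and its contents are made up for the example.

```python
import pickle

# Hypothetical object state: a nested structure of built-in types.
original = {"id": 7, "tags": ["a", "b"], "position": (3, 4)}

# Serialize: object state -> bytes that can be stored or transmitted.
payload = pickle.dumps(original)

# Deserialize: bytes -> a new object with the same state.
clone = pickle.loads(payload)

same_state = clone == original    # semantically identical
same_object = clone is original   # but a distinct object (a clone)
```

Here <code>same_state</code> is true while <code>same_object</code> is false: the result is a clone with equal state, not the original object itself.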
|||
In [[computer science]], '''serialization''' has several distinct meanings. |
|||
In networking equipment hardware, the part that is responsible for serialization and deserialization is commonly called [[SerDes]]. |
|||
In the context of [[concurrency control]], '''serialization''' means to force one-at-a-time access. For example, a single-threaded [[ActiveX]] server can process only one request at a time; thus requests are queued and executed in the order they are made. |
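The one-at-a-time behaviour can be sketched in Python with a queue drained by a single worker thread. This is only an analogy to the single-threaded server described above; the names <code>requests</code>, <code>handled</code> and <code>worker</code> are hypothetical.

```python
import queue
import threading

requests = queue.Queue()   # pending requests, first in, first out
handled = []               # order in which requests were processed

def worker():
    # A single worker handles one request at a time, in arrival order.
    while True:
        item = requests.get()
        if item is None:   # sentinel: no more requests
            break
        handled.append(item)

thread = threading.Thread(target=worker)
thread.start()
for i in range(5):
    requests.put(f"request-{i}")
requests.put(None)
thread.join()
```

Because only one thread consumes the queue, the requests are executed strictly in the order they were made.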
|||
==Uses== |
|||
In the context of data storage and transmission, '''serialization''' is the process of saving an [[object (computer science)|object]] onto a storage medium (such as a [[computer file | file]] or a memory buffer) or transmitting it across a [[computer network | network]] connection link (such as a [[Socket#Computer sockets | socket]]), either as a series of bytes or in some human-readable format such as [[XML]]. The series of bytes or the format can be used to re-create an object that is identical in its internal state to the original object (in effect, a clone). This type of serialization is used mostly to transport an object across a network, to persist objects to a file or database, or to distribute identical objects to several applications or locations. |
|||
Uses of serialization include: |
|||
* serializing data for transfer across wires and networks ([[messaging]]). |
|||
* storing data (in [[database]]s, on [[hard disk drive]]s). |
|||
* [[remote procedure call]]s, e.g., as in [[SOAP]]. |
|||
* distributing objects, especially in [[component-based software engineering]] such as [[Component Object Model|COM]], [[CORBA]], etc. |
|||
* detecting changes in time-varying data. |
|||
For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different [[hardware architecture]] should be able to reliably reconstruct a serialized data stream, regardless of [[endianness]]. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture-independent format means preventing the problems of [[byte ordering]], memory layout, or simply different ways of representing data structures in different [[programming language]]s. |
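The byte-ordering issue can be illustrated with Python's standard <code>struct</code> module. This is a small sketch; the value <code>0x12345678</code> is arbitrary.

```python
import struct

value = 0x12345678

native = struct.pack("=I", value)   # byte order of the machine running this
big = struct.pack(">I", value)      # fixed big-endian ("network") order
little = struct.pack("<I", value)   # fixed little-endian order

# A stream written with an explicit byte order decodes to the same value
# on any architecture, whereas the native form depends on the host.
decoded = struct.unpack(">I", big)[0]
```

The <code>native</code> bytes equal either <code>big</code> or <code>little</code> depending on the host, which is why a raw memory copy is not portable.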
|||
* This process of serializing an object is also called '''deflating''' an object or '''marshalling''' an object. |
|||
* The opposite operation, extracting a data structure from a series of bytes, is '''deserialization''' (which is also called '''inflating''' or '''unmarshalling'''). |
|||
Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end, and reconstructed. In many applications, this linearity is an asset, because it enables simple, common I/O interfaces to be utilized to hold and pass on the state of an object. In applications where higher performance is an issue, it can make sense to expend more effort to deal with a more complex, non-linear storage organization. |
|||
== Uses == |
|||
Even on a single machine, primitive [[pointer (computer programming)|pointer]] objects are too fragile to save because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called ''[[unswizzling]]'' or ''pointer unswizzling'', where direct pointer references are converted to references based on name or position. The deserialization process includes an inverse step called ''[[pointer swizzling]]''. |
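A position-based form of unswizzling and swizzling can be sketched in Python as follows. The <code>Node</code> class and the helper names <code>unswizzle</code> and <code>swizzle</code> are hypothetical, chosen only for this illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None   # direct in-memory reference (a "pointer")

def unswizzle(nodes):
    # Replace each direct reference with the position of its target.
    index_of = {id(node): i for i, node in enumerate(nodes)}
    return [
        {"name": node.name,
         "next": None if node.next is None else index_of[id(node.next)]}
        for node in nodes
    ]

def swizzle(records):
    # Rebuild the objects first, then turn positions back into references.
    nodes = [Node(record["name"]) for record in records]
    for node, record in zip(nodes, records):
        if record["next"] is not None:
            node.next = nodes[record["next"]]
    return nodes

a, b, c = Node("a"), Node("b"), Node("c")
a.next, b.next, c.next = b, c, a          # a small cycle
records = unswizzle([a, b, c])            # safe to store: no addresses
restored = swizzle(records)               # new objects, same shape
```

The stored records contain only names and positions, so they remain valid wherever the objects are later loaded in memory.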
|||
Serialization has a number of advantages. It provides: |
|||
Since both serializing and deserializing can be driven from common code (for example, the ''Serialize'' function in [[Microsoft Foundation Classes]]), it is possible for the common code to do both at the same time, and thus, 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy because differences can be detected on the fly, a technique called differential execution. This is useful in the programming of user interfaces whose contents are time-varying — graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things. |
|||
* a simple and robust way to make objects [[persistence (computer science)|persistent]] |
|||
* a method of issuing [[remote procedure call]]s, e.g., as in [[SOAP]] |
|||
* a method for distributing objects, especially in [[software componentry]] such as [[Component Object Model|COM]], [[CORBA]], etc. |
|||
==Drawbacks== |
|||
For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different hardware architecture should be able to reliably reconstruct a serialized data stream, regardless of [[endianness]]. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture-independent format avoids the problems of [[byte ordering]], memory layout, or simply different ways of representing data structures in different [[programming language]]s. |
|||
Serialization breaks the opacity of an [[abstract data type]] by potentially exposing private implementation details. Trivial implementations which serialize all data members may violate [[encapsulation (object-oriented programming)|encapsulation]].<ref>{{cite web|last=S. Miller|first=Mark|title=Safe Serialization Under Mutual Suspicion|url=http://erights.org/data/serial/jhu-paper/intro.html|work=ERights.org|quote=Serialization, explained below, is an example of a tool for use by objects within an object system for operating on the graph they are embedded in. This seems to require violating the encapsulation provided by the pure object model.}}</ref> |
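The encapsulation concern can be sketched in Python. The <code>Account</code> class, its <code>_pin</code> field and the <code>to_public_dict</code> method are hypothetical names used only for illustration.

```python
import json

class Account:
    def __init__(self, owner, pin):
        self.owner = owner
        self._pin = pin   # intended to be a private implementation detail

    def to_public_dict(self):
        # Explicit serialization: the class decides what leaves the object.
        return {"owner": self.owner}

acct = Account("alice", "1234")

# Trivial approach: dump every data member, private or not.
naive = json.dumps(vars(acct), sort_keys=True)

# Controlled approach: only the state the class chooses to expose.
explicit = json.dumps(acct.to_public_dict(), sort_keys=True)
```

The naive output contains <code>_pin</code>, while the explicit form does not.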
|||
To discourage competitors from making compatible products, publishers of [[proprietary software]] often keep the details of their programs' serialization formats a [[trade secret]]. Some deliberately [[obfuscated code|obfuscate]] or even [[encryption|encrypt]] the serialized data. Yet, interoperability requires that applications be able to understand each other's serialization formats. Therefore, [[RMI-IIOP|remote method call]] architectures such as [[CORBA]] define their serialization formats in detail. |
|||
In some forms, however, serialization has the disadvantage that because the encoding of the data is serial, merely extracting one part of the data structure that is serialized means that the entire object must be reconstructed or read before this can be done. The serialization capabilities in the [[Cocoa (API)|Cocoa]] framework, <tt>NSKeyedArchiver</tt>, alleviate the problem somewhat by allowing an object to be archived with each instance variable of the object accessible by using a key. |
|||
Many institutions, such as archives and libraries, attempt to [[future proof]] their [[backup]] archives—in particular, [[database dump]]s—by storing them in some relatively [[human-readable]] serialized format. |
|||
Even on a single machine, primitive [[pointer]] objects are too fragile to save, because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called ''[[unswizzling]]'' or ''pointer unswizzling'' and the deserialization process includes a step called ''[[pointer swizzling]]''. |
|||
==Serialization formats== |
|||
== Consequences == |
|||
{{Main|Comparison of data serialization formats}} |
|||
The [[Xerox Network Systems]] Courier technology in the early 1980s influenced the first widely adopted standard. [[Sun Microsystems]] published the [[External Data Representation]] (XDR) in 1987.<ref>{{cite journal |title= XDR: External Data Representation Standard |author= Sun Microsystems |journal= RFC 1014 |year= 1987 |publisher=Network Working Group |url= http://tools.ietf.org/html/rfc1014 |access-date= July 11, 2011 }}</ref> XDR is an [[open format]], and standardized as [https://tools.ietf.org/html/std67 STD 67] (RFC 4506). |
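Two of the basic XDR encodings can be sketched in Python: 32-bit integers are written in big-endian order, and strings are written as a 4-byte length followed by the bytes, zero-padded to a multiple of four. The helper names below are hypothetical and cover only these two types, not the full standard.

```python
import struct

def xdr_int(n: int) -> bytes:
    # Signed 32-bit integer, big-endian.
    return struct.pack(">i", n)

def xdr_string(s: str) -> bytes:
    # Unsigned length, then the bytes, zero-padded to a 4-byte boundary.
    data = s.encode("ascii")
    padding = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def xdr_read_string(buf: bytes, offset: int = 0):
    # Returns the decoded string and the offset of the next item.
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    text = buf[start:start + length].decode("ascii")
    return text, start + length + (-length) % 4

encoded = xdr_string("hello")
decoded, next_offset = xdr_read_string(encoded)
```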
|||
In the late 1990s, a push to provide an alternative to the standard serialization protocols started: [[XML]], an [[SGML]] subset, was used to produce a human-readable [[binary-to-text encoding|text-based encoding]]. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. It has the disadvantage of losing the more compact, byte-stream-based encoding, but by this point larger storage and transmission capacities made file size less of a concern than in the early days of computing. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in [[Ajax (programming)|Ajax]] web applications. XML is an open format, and standardized as a [https://www.w3.org/TR/xml11/ W3C recommendation]. |
|||
Serialization, however, breaks the [[opacity]] of an [[abstract data type]] by potentially exposing private implementation details. To discourage competitors from making compatible products, publishers of [[proprietary software]] often keep the details of their programs' serialization formats a [[trade secret]]. Some deliberately [[obfuscation|obfuscate]] or even [[encryption|encrypt]] the serialized data. |
|||
[[JSON]] is a lightweight plain-text alternative to XML, and is also commonly used for client-server communication in web applications. JSON is based on [[JavaScript syntax]], but is independent of JavaScript and supported in many other programming languages. JSON is an open format, standardized as [https://tools.ietf.org/html/std90 STD 90] ({{IETF RFC|8259}}), [http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf ECMA-404], and [https://www.iso.org/standard/71616.html ISO/IEC 21778:2017]. |
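As an illustration of JSON support outside JavaScript, a round trip through Python's standard <code>json</code> module might look like this. The <code>record</code> contents are invented for the example.

```python
import json

record = {"name": "Ada", "languages": ["en", "fr"], "active": True, "manager": None}

# Serialize to plain text; sort_keys makes the output deterministic.
text = json.dumps(record, sort_keys=True)

# Deserialize back into native data structures.
restored = json.loads(text)

# JSON has fewer types than the host language: a tuple comes back as a list.
tuple_round_trip = json.loads(json.dumps((1, 2)))
```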
|||
Yet, interoperability requires that applications be able to understand the serialization of each other. Therefore [[RMI-IIOP|remote method call]] architectures such as [[CORBA]] define their serialization formats in detail and often provide methods of checking the consistency of any serialized stream when converting it back into an object. |
|||
[[YAML]] is a strict superset of JSON and includes additional features such as data type tags, support for cyclic data structures, indentation-sensitive syntax, and multiple forms of scalar data quoting. YAML is an open format. |
|||
== Human-readable serialization == |
|||
[[Property list]]s are used for serialization by [[NeXTSTEP]], [[GNUstep]], [[macOS]], and [[iOS]] [[Software framework|frameworks]]. ''Property list'', or ''p-list'' for short, doesn't refer to a single serialization format but instead several different variants, some human-readable and one binary. |
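The existence of several variants can be observed with Python's standard <code>plistlib</code> module, which writes both an XML and a binary form. This is a small sketch; the <code>settings</code> dictionary is made up.

```python
import plistlib

settings = {"name": "Example", "version": 3, "enabled": True}

# The same data in two of the property list variants.
xml_form = plistlib.dumps(settings, fmt=plistlib.FMT_XML)        # human-readable
binary_form = plistlib.dumps(settings, fmt=plistlib.FMT_BINARY)  # compact binary

# loads() detects the variant automatically.
from_xml = plistlib.loads(xml_form)
from_binary = plistlib.loads(binary_form)
```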
|||
In the late [[1990s]], a push began to provide an alternative to the standard serialization protocol, one that uses [[XML]] and produces a human-readable encoding. Such an encoding could be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language, but this has the disadvantage of losing the more compact, byte-stream-based encoding, which is generally more practical. A future solution to this dilemma could be transparent compression schemes (see [[binary XML]]). |
|||
For large volume scientific datasets, such as satellite data and output of numerical climate, weather, or ocean models, specific binary serialization standards have been developed, e.g. [[Hierarchical Data Format|HDF]], [[netCDF]] and the older [[GRIB]]. |
|||
== Programming language support == |
|||
==Programming language support== |
|||
Several [[object-oriented programming]] languages directly support ''object serialization'' (or ''object archival''), either by [[syntactic sugar]] elements or providing a standard [[interface (computing)|interface]] for doing so. |
|||
Several [[object-oriented programming]] languages directly support ''object serialization'' (or ''object archival''), either by [[syntactic sugar]] elements or providing a standard [[interface (computing)|interface]] for doing so. The languages which do so include [[Ruby programming language|Ruby]], [[Smalltalk]], [[Python (programming language)|Python]], [[PHP]], [[Objective-C]], [[Delphi (programming language)|Delphi]], [[Java (programming language)|Java]], and the [[.NET Framework|.NET]] family of languages. There are also libraries available that add serialization support to languages that lack native support for it. |
|||
===C and C++=== |
|||
Some of these programming languages are [[Ruby programming language|Ruby]], [[Smalltalk]], [[Python programming language|Python]], [[Objective-C]], [[Java programming language|Java]], and the [[.NET Framework|.NET]] family of languages. |
|||
[[C (programming language)|C]] and [[C++]] do not provide serialization as any sort of high-level construct, but both languages support writing any of the built-in [[C data types|data types]], as well as [[plain old data]] [[struct (C programming language)|structs]], as binary data. As such, it is usually trivial to write custom serialization functions. Moreover, compiler-based solutions, such as the ODB [[object–relational mapping|ORM]] system for C++ and the [[gSOAP]] toolkit for C and C++, are capable of automatically producing serialization code with few or no modifications to class declarations. Other popular serialization frameworks are Boost.Serialization<ref>{{cite web|url=http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/index.html|title=Serialization|website=www.boost.org}}</ref> from the [[Boost C++ Libraries|Boost Framework]], the S11n framework,<ref>{{cite web|url=http://s11n.net/|title=s11n.net: object serialization/persistence in C++|first=stephan|last=beal|website=s11n.net}}</ref> and Cereal.<ref>{{cite web|url=https://uscilab.github.io/cereal/|title=cereal Docs - Main|website=uscilab.github.io}}</ref> [[Microsoft Foundation Class Library|MFC framework]] (Microsoft) also provides serialization methodology as part of its Document-View architecture. |
|||
===CFML=== |
|||
There are also libraries available that add serialization support to languages that lack native support for it. |
|||
[[CFML]] allows data structures to be serialized to [[WDDX]] with the <code>[https://wikidocs.adobe.com/wiki/display/coldfusionen/cfwddx <cfwddx>]</code> tag and to [[JSON]] with the [https://wikidocs.adobe.com/wiki/display/coldfusionen/serializejson SerializeJSON()] function. |
|||
===Delphi=== |
||
[[Delphi (programming language)|Delphi]] provides a built-in mechanism for serialization of components (also called persistent objects), which is fully integrated with its [[Integrated development environment|IDE]]. The component's contents are saved to a DFM file and reloaded on-the-fly. |
|||
===Go=== |
|||
In the [[.NET Framework|.NET]] languages, classes can be serialized and deserialized by adding the <code>Serializable</code> attribute to the class. |
|||
[[Go (programming language)|Go]] natively supports unmarshalling/marshalling of [[JSON]] and [[XML]] data.<ref>{{Cite web |title=Package encoding |author= |website=pkg.go.dev |date=12 July 2021 |url= https://pkg.go.dev/encoding}}</ref> There are also third-party modules that support [[YAML]]<ref>{{cite web |url=https://github.com/go-yaml/yaml |title=GitHub - YAML support for the Go language|website=GitHub|date= |author= |accessdate= 25 July 2021}}</ref> and [[Protocol Buffers]].<ref>{{Cite web|title=proto · pkg.go.dev|url=https://pkg.go.dev/google.golang.org/protobuf/proto|access-date=2021-06-22|website=pkg.go.dev}}</ref> Go also supports ''Gobs''.<ref>{{Cite web |title=gob package - encoding/gob - pkg.go.dev |url=https://pkg.go.dev/encoding/gob |access-date=2022-03-04 |website=pkg.go.dev}}</ref> |
|||
===Haskell=== |
|||
<code> |
|||
In Haskell, serialization is supported for types that are members of the Read and Show [[type class]]es. Every type that is a member of the <code>Read</code> type class defines a function that will extract the data from the string representation of the dumped data. The <code>Show</code> type class, in turn, contains the <code>show</code> function from which a string representation of the object can be generated. The programmer need not define the functions explicitly; merely declaring a type to be deriving Read or deriving Show, or both, can make the compiler generate the appropriate functions for many cases (but not all: function types, for example, cannot automatically derive Show or Read). The auto-generated instance for Show also produces valid source code, so the same Haskell value can be generated by running the code produced by show in, for example, a Haskell interpreter.<ref>{{cite web|url=http://hackage.haskell.org/package/base-4.6.0.1/docs/Text-Show.html#t:Show|access-date=15 January 2014 |title=Text.Show Documentation}}</ref> For more efficient serialization, there are Haskell libraries that allow high-speed serialization in binary format, e.g. [http://hackage.haskell.org/package/binary binary]. |
|||
'VB Example |
|||
<Serializable()> Class Employee |
|||
</code> |
|||
<code> |
|||
// C# Example |
|||
[Serializable] |
|||
class Employee |
|||
</code> |
|||
===Java=== |
|||
If new members are added to a serializable class, they can be tagged with the <code>OptionalField</code> attribute to allow previous versions of the object to be deserialized without error. This attribute affects only deserialization, and prevents the runtime from throwing an exception if a member is missing from the serialized stream. A member can also be marked with the <code>NonSerialized</code> attribute to indicate that it should not be serialized. |
|||
Java provides automatic serialization which requires that the object be [[Marker interface pattern|marked]] by implementing the {{Javadoc:SE|package=java.io|java/io|Serializable}} [[interface (Java)|interface]]. Implementing the interface marks the class as "okay to serialize", and Java then handles serialization internally. There are no serialization methods defined on the <code>Serializable</code> interface, but a serializable class can optionally define methods with certain special names and signatures that if defined, will be called as part of the serialization/deserialization process. The language also allows the developer to override the serialization process more thoroughly by implementing another interface, the {{Javadoc:SE|java/io|Externalizable}} interface, which includes two special methods that are used to save and restore the object's state.<br /> There are three primary reasons why objects are not serializable by default and must implement the <code>Serializable</code> interface to access Java's serialization mechanism.<br />Firstly, not all objects capture useful semantics in a serialized state. For example, a {{Javadoc:SE|java/lang|Thread}} object is tied to the state of the current [[JVM]]. There is no context in which a deserialized <code>Thread</code> object would maintain useful semantics.<br />Secondly, the serialized state of an object forms part of its class' compatibility contract. Maintaining compatibility between versions of serializable classes requires additional effort and consideration. Therefore, making a class serializable needs to be a deliberate design decision and not a default condition.<br />Lastly, serialization allows access to non-[[Transient (computer programming)|transient]] private members of a class that are not otherwise accessible. 
Classes containing sensitive information (for example, a password) should be neither serializable nor externalizable.<ref name=Bloch>{{cite book | title= "Effective Java: Programming Language Guide" |last=Bloch| first=Joshua| publisher=Addison-Wesley | edition=third | isbn=978-0134685991| year=2018}}</ref>{{rp|339–345}} The standard encoding method uses a recursive graph-based translation of the object's class descriptor and serializable fields into a byte stream. [[Primitive data type|Primitive]]s as well as non-transient, non-static referenced objects are encoded into the stream. Each object that is referenced by the serialized object via a field that is not marked as <code>transient</code> must also be serialized; and if any object in the complete graph of non-transient object references is not serializable, then serialization will fail. The developer can influence this behavior by marking objects as transient, or by redefining the serialization for an object so that some portion of the reference graph is truncated and not serialized.<br /> Java does not use a constructor to serialize objects. It is possible to serialize Java objects through [[JDBC]] and store them in a database.<ref>{{cite web|url=https://asktom.oracle.com/pls/apex/f?p=100:11:0::::p11_question_id:1285601748584|title=Ask TOM "Serializing Java Objects into the database (and ge..."|website=asktom.oracle.com}}</ref> While [[Swing (Java)|Swing]] components do implement the Serializable interface, they are not guaranteed to be portable between different versions of the Java Virtual Machine. As such, a Swing component, or any component which inherits from it, may be serialized to a byte stream, but it is not guaranteed that this will be re-constitutable on another machine. |
|||
===JavaScript=== |
|||
To modify the default deserialization (for example, to automatically initialize a member marked <code>NonSerialized</code>), the class must implement the <code>IDeserializationCallback</code> interface and define the <code>IDeserializationCallback.OnDeserialization</code> method. |
|||
Since ECMAScript 5.1,<ref>{{cite web|url=https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON#Specifications|title=JSON|website=MDN Web Docs|access-date=22 March 2018}}</ref> [[JavaScript]] has included the built-in <code>JSON</code> object and its methods <code>JSON.parse()</code> and <code>JSON.stringify()</code>. Although JSON is originally based on a subset of JavaScript,<ref>{{cite web|url=http://www.json.org/|title=JSON|website=www.json.org|access-date=22 March 2018}}</ref> there are boundary cases where JSON is not valid JavaScript. Specifically, JSON allows the [[Unicode#Newlines|Unicode line terminators]] {{unichar|2028|LINE SEPARATOR}} and {{unichar|2029|PARAGRAPH SEPARATOR}} to appear unescaped in quoted strings, while ECMAScript 2018 and older does not.<ref name="json-2028">{{cite web|url=http://timelessrepo.com/json-isnt-a-javascript-subset|title=JSON: The JavaScript subset that isn't|author=Holm, Magnus|date=15 May 2011|publisher=The timeless repository|access-date=23 September 2016|archive-date=13 May 2012|archive-url=https://web.archive.org/web/20120513012409/http://timelessrepo.com/json-isnt-a-javascript-subset|url-status=dead}}</ref><ref>{{cite web|url=https://tc39.github.io/proposal-json-superset/|title=TC39 Proposal: Subsume JSON|date=22 May 2018|publisher=ECMA TC39 committee}}</ref> See [[JSON#Data portability issues|the main article on JSON]]. |
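The boundary case can also be observed from Python's standard <code>json</code> module, which accepts an unescaped {{unichar|2028|LINE SEPARATOR}} inside a JSON string and can emit one when ASCII-only output is turned off. This is a sketch for illustration; the sample <code>text</code> is arbitrary.

```python
import json

text = "line one\u2028line two"

# Default output is ASCII-only, so U+2028 is written as an escape sequence.
escaped = json.dumps(text)

# With ensure_ascii=False the character appears unescaped, which is valid
# JSON but was not valid inside a string literal in ECMAScript 2018 and older.
raw = json.dumps(text, ensure_ascii=False)

# Both forms are valid JSON and decode to the same string.
from_escaped = json.loads(escaped)
from_raw = json.loads(raw)
```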
|||
===Julia=== |
|||
Objects may be serialized in binary format for deserialization by other [[.NET Framework|.NET]] applications. The framework also provides the <code>SoapFormatter</code> and <code>XmlSerializer</code> objects to support serialization in human-readable, cross-platform XML. |
|||
[[Julia (programming language)|Julia]] implements serialization through the <code>serialize()</code> / <code>deserialize()</code> functions,<ref>{{Cite web |title=Serialization |author= |website=The Julia Language |date= |access-date=25 July 2021 |url= https://docs.julialang.org/en/v1/stdlib/Serialization/}}</ref> intended to work within the same version of Julia, and/or instance of the same system image.<ref>{{cite web|url=https://github.com/JuliaLang/julia/commit/bb67ff25e2799b27d10877692f74bae66ccc0270#commitcomment-6403498|title=faster and more compact serialization of symbols and strings · JuliaLang/julia@bb67ff2|website=GitHub}}</ref> The <code>HDF5.jl</code> package offers a more stable alternative, using a documented format and common library with wrappers for different languages,<ref>{{cite web|url=https://github.com/JuliaIO/HDF5.jl|title=HDF5.jl: Saving and loading data in the HDF5 file format|date=20 August 2017|via=GitHub}}</ref> while the default serialization format is reported to have been designed with maximal performance for network communication in mind.<ref>{{cite web|url=https://stackoverflow.com/a/24968971/2668831|title=Julia: how stable are serialize() / deserialize()|website=stackoverflow.com|date=2014}}</ref> |
|||
===Lisp=== |
|||
Generally a [[Lisp (programming language)|Lisp]] data structure can be serialized with the functions "<code>read</code>" and "<code>print</code>". A variable foo containing, for example, a list of arrays would be printed by <code>(print foo)</code>. Similarly, an object can be read from a stream named s by <code>(read s)</code>. These two parts of the Lisp implementation are called the Printer and the Reader. The output of "<code>print</code>" is human-readable; it uses lists delimited by parentheses, for example: {{code|(4 2.9 "x" y)|lisp}}. In many types of Lisp, including [[Common Lisp]], the printer cannot represent every type of data because it is not clear how to do so. In Common Lisp, for example, the printer cannot print CLOS objects. Instead, the programmer may write a method on the generic function <code>print-object</code>, which will be invoked when the object is printed. This is somewhat similar to the method used in Ruby. Lisp code itself is written in the syntax of the reader, called read syntax. Most languages use separate and different parsers to deal with code and data; Lisp uses only one. A file containing Lisp code may be read into memory as a data structure, transformed by another program, then possibly executed or written out, such as in a [[read–eval–print loop]]. Not all readers/writers support cyclic, recursive or shared structures. |
|||
===.NET=== |
||
[[.NET]] has several serializers designed by [[Microsoft]]. There are also many serializers by third parties. More than a dozen serializers are discussed and tested [http://geekswithblogs.net/LeonidGaneline/archive/2015/05/06/serializers-in-.net.-v.2.aspx here]<ref>{{cite web|title=.NET Serializers|url=http://geekswithblogs.net/LeonidGaneline/archive/2015/05/06/serializers-in-.net.-v.2.aspx|quote=There are many kinds of serializers; they produce very compact data very fast. There are serializers for messaging, for data stores, for marshaling objects. What is the best serializer in .NET?}}</ref> and [https://aumcode.github.io/serbench here].<ref>{{cite web|url=https://aumcode.github.io/serbench|title=SERBENCH by aumcode|website=aumcode.github.io}}</ref>
|||
===OCaml=== |
|||
In the [[Objective-C]] programming language, serialization (most commonly known as ''archival'') is achieved by overriding the write: and read: methods in the Object root class. (NB This is in the GNU runtime variant of Objective-C. In the NeXT-style runtime, the implementation is very similar.) |
|||
[[OCaml]]'s standard library provides marshalling through the <code>Marshal</code> module<ref name=ocaml/> and the Pervasives functions <code>output_value</code> and <code>input_value</code>. While OCaml programming is statically type-checked, uses of the <code>Marshal</code> module may break type guarantees, as there is no way to check whether an unmarshalled stream represents objects of the expected type. In OCaml it is difficult to marshal a function or a data structure which contains a function (e.g. an object which contains a method), because executable code in functions cannot be transmitted across different programs. (There is a flag to marshal the code position of a function, but it can only be unmarshalled in exactly the same program.) The standard marshalling functions can preserve sharing and handle cyclic data, which can be configured by a flag. |
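What preserving sharing and handling cyclic data means can be illustrated with Python's <code>pickle</code> module, which behaves analogously for built-in containers. This sketch does not use OCaml's <code>Marshal</code> itself; the variable names are hypothetical.

```python
import json
import pickle

shared = [1, 2, 3]
pair = [shared, shared]    # the same list referenced twice (sharing)

cyclic = []
cyclic.append(cyclic)      # a list that contains itself (a cycle)

pair_copy = pickle.loads(pickle.dumps(pair))
cyclic_copy = pickle.loads(pickle.dumps(cyclic))

# A plain-text format such as JSON keeps the values but not the sharing.
json_copy = json.loads(json.dumps(pair))

# The JSON encoder refuses the cyclic structure with a ValueError.
try:
    json.dumps(cyclic)
    json_handles_cycle = True
except ValueError:
    json_handles_cycle = False
```

After the pickle round trip, both elements of <code>pair_copy</code> are still one and the same list and <code>cyclic_copy</code> still contains itself, whereas the JSON copy holds two independent lists.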
|||
===Perl=== |
||
Several [[Perl]] modules available from [[CPAN]] provide serialization mechanisms, including <code>Storable</code>, <code>JSON::XS</code> and <code>FreezeThaw</code>. Storable includes functions to serialize and deserialize Perl data structures to and from files or Perl scalars. In addition to serializing directly to files, <code>Storable</code> includes the <code>freeze</code> function to return a serialized copy of the data packed into a scalar, and <code>thaw</code> to deserialize such a scalar. This is useful for sending a complex data structure over a [[network socket]] or storing it in a database. When serializing structures with <code>Storable</code>, there are network-safe functions that always store their data in a format that is readable on any computer at a small cost in speed. These functions are named <code>nstore</code>, <code>nfreeze</code>, etc. There are no "n" functions for deserializing these structures; the regular <code>thaw</code> and <code>retrieve</code> deserialize structures serialized with the "<code>n</code>" functions and their machine-specific equivalents. |
|||
===PHP=== |
|||
The following example demonstrates two independent programs, a "sender", who takes the current time (as per <tt>[http://www.opengroup.org/onlinepubs/007908799/xsh/time.html time]</tt> in the [[C standard library]]), archives it and prints the archived form to the standard output, and a "receiver" which decodes the archived form, reconstructs the time and prints it out. |
|||
[[PHP]] originally implemented serialization through the built-in <code>serialize()</code> and <code>unserialize()</code> functions.<ref>{{cite web|url=http://ca.php.net/manual/en/language.oop5.serialization.php|title=PHP: Object Serialization - Manual|website=ca.php.net}}</ref> PHP can serialize any of its data types except resources (file pointers, sockets, etc.). The built-in <code>unserialize()</code> function is often dangerous when used on completely untrusted data.<ref>{{cite web|last=Esser|first=Stephen|title=Shocking News in PHP Exploitation|url=http://www.suspekt.org/2009/11/28/shocking-news-in-php-exploitation/|work=Suspekt...|date=2009-11-28|url-status=dead|archive-url=https://web.archive.org/web/20120106034257/http://www.suspekt.org/2009/11/28/shocking-news-in-php-exploitation/|archive-date=2012-01-06}}</ref> For objects, two "[[Magic (programming)|magic]] methods" can be implemented within a class: <code>__sleep()</code>, which is called from within <code>serialize()</code>, and <code>__wakeup()</code>, which is called from within <code>unserialize()</code>. They can clean up and restore an object, respectively. For example, it may be desirable to close a database connection on serialization and restore the connection on deserialization; this functionality would be handled in these two magic methods. They also permit the object to pick which properties are serialized. Since PHP 5.1, there is an object-oriented serialization mechanism for objects, the <code>Serializable</code> interface.<ref name="Serializable">{{cite web|url=http://www.php.net/manual/en/class.serializable.php|title=PHP: Serializable - Manual|website=www.php.net}}</ref> |
|||
===Prolog=== |
|||
When compiled, we get a sender program and a receiver program. If we just execute the sender program, we will get out a serialization that looks like: |
|||
[[Prolog]]'s ''term'' structure, which is the only data structure of the language, can be serialized out through the built-in predicate <code>write_term/3</code> and serialized-in through the built-in predicates <code>read/1</code> and <code>read_term/2</code>. The resulting stream is uncompressed text (in some encoding determined by configuration of the target stream), with any free variables in the term represented by placeholder variable names. The predicate <code>write_term/3</code> is standardized in the [[Prolog#ISO Prolog|ISO Specification for Prolog]] (ISO/IEC 13211-1) on pages 59 ff. ("Writing a term, § 7.10.5"). Therefore it is expected that terms serialized-out by one implementation can be serialized-in by another without ambiguity or surprises. In practice, implementation-specific extensions (e.g. SWI-Prolog's dictionaries) may use non-standard term structures, so interoperability may break in edge cases. As examples, see the corresponding manual pages for SWI-Prolog,<ref>{{cite web|url=https://www.swi-prolog.org/pldoc/man?section=termrw|title="Term reading and writing"|website=www.swi-prolog.org}}</ref> SICStus Prolog,<ref>{{cite web|url=https://sicstus.sics.se/sicstus/docs/latest4/html/sicstus.html/mpg_002dref_002dwrite_005fterm.html#mpg_002dref_002dwrite_005fterm|title="write_term/[2,3]"|website=sicstus.sics.se}}</ref> GNU Prolog.<ref>{{cite web|url=http://gprolog.org/manual/html_node/gprolog038.html|title="Term input/output"|website=gprolog.org}}</ref> Whether and how serialized terms received over the network are checked against a specification (after deserialization from the character stream has happened) is left to the implementer. Prolog's built-in [[Prolog syntax and semantics#Definite clause grammars|Definite Clause Grammars]] can be applied at that stage. |
|||
GNU TypedStream 1D@îC¡ |
|||
(with a NULL character after the 1). If we pipe the two programs together, as <tt> sender | receiver</tt>, we get |
|||
received 1089356705 |
|||
showing the object was serialized, sent, and reconstructed properly. |
|||
===Python=== |
|||
In essence, the sender and receiver programs could be distributed across a network connection, providing distributed object capabilities. |
|||
The core general serialization mechanism is the <code>pickle</code> [[Python (programming language)#Libraries|standard library]] module, alluding to the database systems term ''pickling''<ref>{{cite journal|last1=Herlihy|first1=Maurice|last2=Liskov|first2=Barbara|author-link1=Maurice Herlihy|author-link2=Barbara Liskov|title=A Value Transmission Method for Abstract Data Types|journal=[[ACM Transactions on Programming Languages and Systems]]|date=October 1982|volume=4|issue=4|pages=527–551|doi=10.1145/69622.357182|url=http://cs.brown.edu/~mph/HerlihyL82/p527-herlihy.pdf|issn=0164-0925|oclc=67989840|citeseerx=10.1.1.87.5301|s2cid=8126961}}</ref><ref>{{cite book|last1=Birrell|first1=Andrew|last2=Jones|first2=Mike|last3=Wobber|first3=Ted|title=Proceedings of the eleventh ACM Symposium on Operating systems principles - SOSP '87 |chapter=A simple and efficient implementation of a small database |date=November 1987|volume=11|issue=5|pages=149–154|doi=10.1145/41457.37517|isbn=089791242X |issn=0163-5980|oclc=476062921|quote=Our implementation makes use of a mechanism called “pickles”, which will convert between any strongly typed data structure and a representation of that structure suitable for storing in permanent disk files. The operation Pickle.Write takes a pointer to a strongly typed data structure and delivers buffers of bits for writing to the disk. Conversely Pickle.Read reads buffers of bits from the disk and delivers a copy of the original data structure.(*) This conversion involves identifying the occurrences of addresses in the structure, and arranging that when the structure is read back from disk the addresses are replaced with addresses valid in the current execution environment. The pickle mechanism is entirely automatic: it is driven by the run-time typing structures that are present for our garbage collection mechanism. ... (*) Pickling is quite similar to the concept of marshalling in remote procedure calls. 
But in fact our pickling implementation works only by interpreting at run-time the structure of [[dynamically typed]] values, while our RPC implementation works only by generating code for the marshalling of statically typed values. Each facility would benefit from adding the mechanisms of the other, but that has not yet been done.|citeseerx=10.1.1.100.1457|s2cid=12908261}}</ref><ref>{{cite web|last1=van Rossum|first1=Guido|author-link1=Guido van Rossum|title=Flattening Python Objects|url=http://legacy.python.org/workshops/1994-11/FlattenPython.html|website=Python Programming Language – Legacy Website|publisher=[[Python Software Foundation]]|access-date=6 April 2017|location=[[Delaware]], United States|date=1 December 1994|quote=Origin of the name 'flattening': Because I want to leave the original 'marshal' module alone, and Jim complained that 'serialization' also means something totally different that's actually relevant in the context of [[Concurrent computing|concurrent]] access to persistent objects, I'll use the term 'flattening' from now on. ... (The Modula-3 system uses the term 'pickled' data for this concept. They have probably solved all problems already, and in a type-safe manner :-)}}</ref> to describe data serialization (''unpickling'' for ''deserializing''). Pickle uses a simple [[Stack (abstract data type)|stack]]-based [[virtual machine]] that records the instructions used to reconstruct the object. It is a cross-version, [https://docs.python.org/library/pickle.html#pickle-protocol customisable] but unsafe (not secure against erroneous or malicious data) serialization format. Malformed or maliciously constructed data may cause the deserializer to import arbitrary modules and instantiate any object.<ref name=autogenerated1>{{cite web|url=https://docs.python.org/2/library/pickle.html|title=11.1. 
pickle — Python object serialization — Python 2.7.14rc1 documentation|website=docs.python.org}}</ref><ref>{{cite web|url=https://docs.python.org/release/3.0.1/library/pickle.html#pickle-restrict|title=pickle — Python object serialization — Python v3.0.1 documentation|website=docs.python.org}}</ref> The standard library also includes modules serializing to standard data formats: <code>[https://docs.python.org/library/json.html json]</code> (with built-in support for basic scalar and collection types and able to support arbitrary types via [https://docs.python.org/library/json.html#encoders-and-decoders encoding and decoding hooks]), <code>[https://docs.python.org/library/plistlib.html plistlib]</code> (with support for both binary and XML [[property list]] formats), and <code>[https://docs.python.org/library/xdrlib.html xdrlib]</code> (with support for the External Data Representation (XDR) standard as described in RFC 1014). Finally, it is recommended that an object's <code>[https://docs.python.org/reference/datamodel.html#object.__repr__ __repr__]</code> be evaluable in the right environment, making it a rough match for Common Lisp's <code>[http://www.lispworks.com/documentation/HyperSpec/Body/f_pr_obj.htm print-object]</code>. Not all object types can be pickled automatically, especially ones that hold [[operating system]] resources like [[file handle]]s, but users can register custom "reduction" and construction functions to support the pickling and unpickling of arbitrary types. Pickle was originally implemented as the pure Python <code>pickle</code> module, but, in versions of Python prior to 3.0, the <code>cPickle</code> module (also a built-in) offered improved performance (up to 1000 times faster<ref name=autogenerated1 />). The <code>cPickle</code> module was adapted from the [[Unladen Swallow]] project. 
In Python 3, users should always import the standard version, which attempts to import the accelerated version and falls back to the pure Python version.<ref>{{cite web|url=https://docs.python.org/release/3.1.5/whatsnew/3.0.html|title=What's New In Python 3.0 — Python v3.1.5 documentation|website=docs.python.org}}</ref> |
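The safety concern described above can be illustrated with a short Python sketch. It follows the "restricting globals" approach from the Python documentation: a subclass of <code>pickle.Unpickler</code> overrides <code>find_class</code> so that only an allow-list of names can be resolved while loading. The names <code>SAFE_BUILTINS</code>, <code>RestrictedUnpickler</code> and <code>restricted_loads</code>, and the contents of the allow-list, are assumptions chosen for illustration; this is a sketch of one mitigation, not a complete defence against hostile input.

```python
import builtins
import io
import pickle

# Assumption for this sketch: only these builtin names may be reconstructed.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allow-list."""

    def find_class(self, module, name):
        # find_class is consulted whenever the pickle stream asks for a global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Hypothetical helper: like pickle.loads, but with the allow-list applied."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# A harmless payload that only needs builtins.range round-trips normally.
restored = restricted_loads(pickle.dumps([1, 2, range(15)]))

# A hand-written protocol 0 payload that asks the unpickler to look up
# os.system; the lookup is rejected before anything is called.
hostile = b"cos\nsystem\n(S'echo hello world'\ntR."
try:
    restricted_loads(hostile)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

With the plain <code>pickle.loads</code>, the second payload would import <code>os</code> and call <code>os.system</code>, which is why pickled data from untrusted sources should not be loaded directly.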
|||
===R=== |
||
[[R (programming language)|R]] has the function <code>dput</code>, which writes an ASCII text representation of an R object to a file or connection. A representation can be read from a file using <code>dget</code>.<ref>[http://stat.ethz.ch/R-manual/R-patched/library/base/html/dput.html R manual]</ref> More specifically, the function <code>serialize</code> serializes an R object to a connection, the output being a raw vector coded in hexadecimal format. The <code>unserialize</code> function reads an object from a connection or a raw vector.<ref>[http://stat.ethz.ch/R-manual/R-patched/library/base/html/serialize.html R manual]</ref> |
|||
===REBOL=== |
|||
#import <objc/Object.h> |
|||
[[REBOL]] will serialize to file (<code>save/all</code>) or to a <code>string!</code> (<code>mold/all</code>). Strings and files can be deserialized using the [[Type polymorphism|polymorphic]] <code>load</code> function. <code>RProtoBuf</code> provides cross-language data serialization in R, using [[Protocol Buffers]].<ref>{{Cite journal |arxiv = 1401.7372|last1 = Eddelbuettel|first1 = Dirk|title = RProtoBuf: Efficient Cross-Language Data Serialization in R|journal = Journal of Statistical Software|volume = 71|issue = 2|last2 = Stokely|first2 = Murray|last3 = Ooms|first3 = Jeroen|year = 2014|doi = 10.18637/jss.v071.i02 |doi-access = free|s2cid = 36239952}}</ref> |
|||
#import <time.h> |
|||
#import <stdio.h> |
|||
@interface Sender : Object |
|||
{ |
|||
time_t current_time; |
|||
} |
|||
- (id) setTime; |
|||
- (time_t) time; |
|||
- (id) send; |
|||
- (id) read: (TypedStream *) s; |
|||
- (id) write: (TypedStream *) s; |
|||
@end |
|||
===Ruby=== |
||
[[Ruby programming language|Ruby]] includes the standard module <code>[http://www.ruby-doc.org/core/classes/Marshal.html Marshal]</code> with two methods, <code>dump</code> and <code>load</code>, akin to the standard Unix utilities <code>[[dump (Unix)|dump]]</code> and <code>[[restore (program)|restore]]</code>. These methods serialize to the standard class <code>String</code>, that is, effectively to a sequence of bytes. Some objects cannot be serialized (doing so would raise a <code>TypeError</code> exception): bindings, procedure objects, instances of class IO, singleton objects and interfaces. If a class requires custom serialization (for example, it requires certain cleanup actions done on dumping / restoring), it can be done by implementing two methods: <code>_dump</code> and <code>_load</code>. The [[instance method]] <code>_dump</code> should return a <code>String</code> object containing all the information necessary to reconstitute objects of this class and all referenced objects up to a maximum depth given as an integer parameter (a value of -1 implies that depth checking should be disabled). The [[class method]] <code>_load</code> should take a <code>String</code> and return an object of this class. |
|||
===Rust=== |
|||
#import "Sender.h" |
|||
<code>[https://serde.rs/ Serde]</code> is the most widely used library, or crate, for serialization in [[Rust programming language|Rust]]. |
|||
@implementation Sender |
|||
- (id) setTime |
|||
{ |
|||
//Set the time |
|||
current_time = time(NULL); |
|||
return self; |
|||
} |
|||
- (time_t) time; |
|||
{ |
|||
return current_time; |
|||
} |
|||
- (id) write: (TypedStream *) stream |
|||
{ |
|||
/* |
|||
*Write the superclass to the stream. |
|||
*We do this so we have the complete object hierarchy, |
|||
*not just the object itself. |
|||
*/ |
|||
[super write:stream]; |
|||
/* |
|||
*Write the current_time out to the stream. |
|||
*time_t is typedef for an integer. |
|||
*The second argument, the string "i", specifies the types to write |
|||
*as per the @encode directive. |
|||
*/ |
|||
objc_write_types(stream, "i", &current_time); |
|||
return self; |
|||
} |
|||
- (id) read: (TypedStream *) stream |
|||
{ |
|||
/* |
|||
*Do the reverse to write: - reconstruct the superclass... |
|||
*/ |
|||
[super read:stream]; |
|||
/* |
|||
*And reconstruct the instance variables from the stream... |
|||
*/ |
|||
objc_read_types(stream, "i", &current_time); |
|||
return self; |
|||
} |
|||
- (id) send |
|||
{ |
|||
//Convenience method to do the writing. We open stdout as our byte stream |
|||
TypedStream *s = objc_open_typed_stream(stdout, OBJC_WRITEONLY); |
|||
//Write the object to the stream |
|||
[self write:s]; |
|||
//Finish up - close the stream. |
|||
objc_close_typed_stream(s); |
|||
return self; |
|||
} |
|||
@end |
|||
===Smalltalk=== |
||
In general, non-recursive and non-sharing objects can be stored and retrieved in a human-readable form using the <code>storeOn:</code>/<code>readFrom:</code> protocol. The <code>storeOn:</code> method generates the text of a Smalltalk expression which – when evaluated using <code>readFrom:</code> – recreates the original object. This scheme is special, in that it uses a procedural description of the object, not the data itself. It is therefore very flexible, allowing for classes to define more compact representations. However, in its original form, it does not handle cyclic data structures or preserve the identity of shared references (i.e. two references to a single object will be restored as references to two equal, but not identical copies). For this, various portable and non-portable alternatives exist. Some of them are specific to a particular Smalltalk implementation or class library. There are several ways in [[Squeak|Squeak Smalltalk]] to serialize and store objects. The easiest and most used are <code>storeOn:/readFrom:</code> and binary storage formats based on <code>SmartRefStream</code> serializers. In addition, bundled objects can be stored and retrieved using <code>ImageSegments</code>. Both provide a so-called "binary-object storage framework", which supports serialization into and retrieval from a compact binary form. Both handle cyclic, recursive and shared structures, storage/retrieval of class and [[metaclass]] info and include mechanisms for "on the fly" object migration (i.e. to convert instances which were written by an older version of a class with a different object layout). The APIs are similar (storeBinary/readBinary), but the encoding details are different, making these two formats incompatible. However, the Smalltalk/X code is open source and free and can be loaded into other Smalltalks to allow for cross-dialect object interchange. Object serialization is not part of the ANSI Smalltalk specification. 
As a result, the code to serialize an object varies by Smalltalk implementation. The resulting binary data also varies. For instance, a serialized object created in Squeak Smalltalk cannot be restored in [[Ambrai Smalltalk]]. Consequently, various applications that do work on multiple Smalltalk implementations that rely on object serialization cannot share data between these different implementations. These applications include the MinneStore object database<ref>{{Cite web|url=http://minnestore.sourceforge.net/|archive-url=https://web.archive.org/web/20080511234145/http://minnestore.sourceforge.net/|archive-date=11 May 2008|website=SourceForge|title=MinneStore version 2}}</ref> and some [[Remote procedure call|RPC]] packages. A solution to this problem is SIXX,<ref>{{Cite web|url=http://www.mars.dti.ne.jp/~umejava/smalltalk/sixx/index.html|title=What's new|access-date=25 July 2021|website=SIXX - Smalltalk Instance eXchange in XML|date=23 January 2010}}</ref> which is a package for multiple Smalltalks that uses an [[XML]]-based format for serialization. |
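The difference between expression-based storage and a graph-aware binary format, as described for <code>storeOn:</code>/<code>readFrom:</code> above, can be sketched in Python rather than Smalltalk. This is only an analogy under stated assumptions: <code>repr</code> with <code>ast.literal_eval</code> stands in for writing and re-evaluating an expression, and <code>pickle</code> stands in for a binary object storage framework. All variable names are illustrative.

```python
import ast
import pickle

shared = [1, 2, 3]
container = {"a": shared, "b": shared}  # two references to one list

# Expression-style round trip: write the value as source text, evaluate it.
text = repr(container)
from_text = ast.literal_eval(text)
text_keeps_identity = from_text["a"] is from_text["b"]

# Graph-aware round trip: pickle records that both keys refer to one object.
from_pickle = pickle.loads(pickle.dumps(container))
pickle_keeps_identity = from_pickle["a"] is from_pickle["b"]

# A cyclic structure (a list that contains itself) also survives pickling.
cyclic = []
cyclic.append(cyclic)
restored_cycle = pickle.loads(pickle.dumps(cyclic))
cycle_survives = restored_cycle[0] is restored_cycle
```

After the textual round trip the two entries are equal but no longer the same object, whereas the pickled copy keeps the sharing and the cycle intact.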
|||
===Swift=== |
|||
#import "Sender.h" |
|||
The [[Swift (programming language)|Swift]] standard library provides two protocols, <code>Encodable</code> and <code>Decodable</code> (composed together as <code>Codable</code>), which allow instances of conforming types to be serialized to or deserialized from [[JSON]], [[property list]]s, or other formats.<ref>{{cite web|url=https://github.com/apple/swift-evolution/blob/master/proposals/0166-swift-archival-serialization.md|title=Swift Archival & Serialization|website=www.github.com|date=2018-12-02}}</ref> Default implementations of these protocols can be generated by the compiler for types whose stored properties are also <code>Decodable</code> or <code>Encodable</code>. |
|||
int |
|||
main(void) |
|||
{ |
|||
Sender *s = [Sender new]; |
|||
[s setTime]; |
|||
[s send]; |
|||
return 0; |
|||
} |
|||
===PowerShell=== |
||
[[PowerShell]] implements serialization through the [[Shell builtin|built-in]] cmdlet <code>Export-CliXML</code>. <code>Export-CliXML</code> serializes .NET objects and stores the resulting XML in a file. To reconstitute the objects, use the <code>Import-CliXML</code> cmdlet, which generates a deserialized object from the XML in the exported file. Deserialized objects, often known as "property bags", are not live objects; they are snapshots that have properties, but no methods. Two-dimensional data structures can also be (de)serialized in [[Comma-separated values|CSV]] format using the built-in cmdlets <code>Import-CSV</code> and <code>Export-CSV</code>. |
|||
#import <objc/Object.h> |
|||
#import "Sender.h" |
|||
@interface Receiver : Object |
|||
{ |
|||
Sender *t; |
|||
} |
|||
- (id) receive; |
|||
- (id) print; |
|||
@end |
|||
===== Receiver.m ===== |
|||
#import "Receiver.h" |
|||
@implementation Receiver |
|||
- (id) receive |
|||
{ |
|||
//Open stdin as our stream for reading. |
|||
TypedStream *s = objc_open_typed_stream(stdin, OBJC_READONLY); |
|||
//Allocate memory for, and instantiate the object from reading the stream. |
|||
t = <nowiki>[[Sender alloc] read:s]</nowiki>; |
|||
objc_close_typed_stream(s); |
|||
return self; |
|||
} |
|||
- (id) print |
|||
{ |
|||
fprintf(stderr, "received %d\n", [t time]); |
|||
return self; |
|||
} |
|||
@end |
|||
===== Receiver.c ===== |
|||
#import "Receiver.h" |
|||
int |
|||
main(void) |
|||
{ |
|||
Receiver *r = [Receiver new]; |
|||
[r receive]; |
|||
[r print]; |
|||
return 0; |
|||
} |
|||
=== Java === |
|||
Java provides automatic serialization which requires only that the object be [[Marker interface pattern|marked]] by implementing the {{Javadoc:SE|package=java.io|java/io|Serializable}} interface. Implementing the interface marks the class as "okay to serialize," and Java then handles serialization internally. There are no serialization methods defined on the <code>Serializable</code> interface, but a serializable class can optionally define methods with certain special names and signatures that, if defined, are called as part of the serialization/deserialization process. The language also allows the developer to override the serialization process more thoroughly by implementing another interface, the {{Javadoc:SE|java/io|Externalizable}} interface, which includes two special methods that are used to save and restore the object's state. |
|||
There are three primary reasons why objects are not serializable by default and must implement the <code>Serializable</code> interface to access Java's serialization mechanism. |
|||
# Not all objects capture useful semantics in a serialized state. For example, a {{Javadoc:SE|java/lang|Thread}} object is tied to the state of the current [[JVM]]. There is no context in which a deserialized <code>Thread</code> object would maintain useful semantics. |
|||
# The serialized state of an object forms part of its class's compatibility contract. Maintaining compatibility between versions of serializable classes requires additional effort and consideration. Therefore, making a class serializable needs to be a deliberate design decision and not a default condition. |
|||
# Serialization allows access to non-transient private members of a class that are not otherwise accessible. Classes containing sensitive information (for example, a password) should not be serializable or externalizable. |
|||
The standard encoding method uses a simple translation of the fields into a byte stream. Primitives as well as non-transient, non-static referenced objects are encoded into the stream. Each object that is referenced by the serialized object and not marked as <tt>transient</tt> must also be serialized; and if any object in the complete graph of non-transient object references is not serializable, then serialization will fail. The developer can influence this behavior by marking objects as transient, or by redefining the serialization for an object so that some portion of the reference graph is truncated and not serialized. |
|||
=== ColdFusion === |
|||
[[ColdFusion]] allows data structures to be serialized to [[WDDX]] with the [http://livedocs.macromedia.com/coldfusion/6.1/htmldocs/tags-c20.htm <cfwddx>] tag. |
|||
=== OCaml === |
|||
[[OCaml]]'s standard library provides marshalling through the <tt>Marshal</tt> module. While OCaml programming is statically type-checked, uses of the <tt>Marshal</tt> module may break type guarantees, as there is no way to check whether an unmarshalled stream represents objects of the expected type. |
|||
=== Perl === |
|||
Several [[Perl]] modules available from [[CPAN]] provide serialization mechanisms, including Storable and FreezeThaw. |
|||
Storable includes functions to serialize and deserialize Perl data structures to and from files or Perl scalars. |
|||
use Storable; |
|||
# Create a hash with some nested data structures |
|||
my %struct = ( text => 'Hello, world!', list => [1, 2, 3] ); |
|||
# Serialize the hash into a file |
|||
store \%struct, 'serialized'; |
|||
# Read the data back later |
|||
my $newstruct = retrieve 'serialized'; |
|||
In addition to serializing directly to files, Storable includes the <tt>freeze</tt> function to return a serialized copy of the data packed into a scalar, and <tt>thaw</tt> to deserialize such a scalar. This is useful for sending a complex data structure over a network socket or storing it in a database. |
|||
When serializing structures with Storable, there are network-safe functions that always store their data in a format that is readable on any computer, at a small cost in speed. These functions are named <tt>nstore</tt>, <tt>nfreeze</tt>, etc. There are no "n" functions for deserializing these structures: the regular <tt>thaw</tt> and <tt>retrieve</tt> deserialize structures serialized with either the "n" functions or their machine-specific equivalents. |
|||
=== C++ === |
|||
The [[Boost library]] includes a library for serializing [[C++]] data structures. [[XML Data Binding]] implementations, such as [http://codesynthesis.com/products/xsd/ XML Schema to C++ data binding compiler], provide serialization/deserialization of C++ objects to/from [[XML]] and binary formats. |
|||
The [[Microsoft Foundation Class Library]] has comprehensive support for binary serialization and deserialization of objects. |
|||
=== Python === |
|||
[[Python programming language|Python]] implements serialization through the built-in <code>[http://docs.python.org/lib/module-pickle.html pickle]</code>, and to a lesser extent, the older <code>[http://docs.python.org/lib/module-marshal.html marshal]</code> modules. Marshal does offer the ability to serialize Python code objects, unlike pickle. |
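The remark that <code>marshal</code> can serialize code objects while <code>pickle</code> cannot may be checked with a small sketch. The expression being compiled and the variable names are illustrative assumptions, and the <code>marshal</code> format is specific to the Python version that wrote it, so this is not a portable storage technique.

```python
import marshal
import pickle

# Compile a small expression into a code object.
code = compile("x * 2", "<example>", "eval")

# marshal can turn the code object into bytes and back again.
blob = marshal.dumps(code)
restored = marshal.loads(blob)
result = eval(restored, {"x": 21})

# pickle refuses to serialize the same code object.
try:
    pickle.dumps(code)
    pickle_accepts_code = True
except (TypeError, pickle.PicklingError):
    pickle_accepts_code = False
```

As with pickled data, marshalled bytes from an untrusted source should not be loaded or evaluated.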
|||
=== PHP === |
|||
[[PHP]] implements serialization through the built-in 'serialize' and 'unserialize' functions. PHP can serialize any of its datatypes except resources (file pointers, sockets, etc.). |
|||
For objects (as of at least PHP 4) there are two "magic methods" that can be implemented within a class, <tt>__sleep()</tt> and <tt>__wakeup()</tt>, which are called from within serialize() and unserialize(), respectively, and which can clean up and restore an object. For example, it may be desirable to close a database connection on serialization and restore the connection on unserialization; this functionality would be handled in these two magic methods. They also permit the object to pick which properties are serialized. |
|||
=== REBOL === |
|||
[[REBOL]] will serialize to file (<code>save/all</code>) or to a <code>string!</code> (<code>mold/all</code>). Strings and files can be deserialized using the [[Type polymorphism|polymorphic]] <code>load</code> function. |
|||
=== Ruby === |
|||
[[Ruby programming language|Ruby]] includes the standard module <code>[http://www.ruby-doc.org/core/classes/Marshal.html Marshal]</code> with two methods, <code>dump</code> and <code>load</code> (<code>restore</code> is an alias of <code>load</code>), akin to the standard Unix utilities [[dump (program)|dump]] and [[restore (program)|restore]]. These methods serialize to the standard class <code>String</code>, that is, effectively to a sequence of bytes. |
|||
Some objects can't be serialized (doing so would raise <code>TypeError</code> exception): |
|||
* bindings, |
|||
* procedure objects, |
|||
* instances of class IO, |
|||
* singleton objects. |
|||
If a class requires custom serialization (for example, it requires certain cleanup actions done on dumping / restoring), it can be done by implementing 2 methods: <code>_dump</code> and <code>_load</code>. The instance method <code>_dump</code> should return a <code>String</code> object containing all the information necessary to reconstitute objects of this class and all referenced objects up to a maximum depth given as an integer parameter (a value of -1 implies that depth checking should be disabled). The class method <code>_load</code> should take a <code>String</code> and return an object of this class. |
|||
class Klass |
|||
def initialize(str) |
|||
@str = str |
|||
end |
|||
def sayHello |
|||
@str |
|||
end |
|||
end |
|||
o = Klass.new("hello\n") |
|||
data = Marshal.dump(o) |
|||
obj = Marshal.load(data) |
|||
obj.sayHello  # => "hello\n" |
|||
=== Smalltalk === |
|||
==== Squeak Smalltalk ==== |
|||
There are several ways in [[Squeak|Squeak Smalltalk]] to serialize and store objects. The easiest and most used method will be shown below. Other classes of interest in Squeak for serializing objects are SmartRefStream and ImageSegment. |
|||
To store a Dictionary (sometimes called a [[hash map]] in other languages) containing some nonsense data of varying types into a file named "data.obj": |
|||
| data rr | |
|||
data := Dictionary new. |
|||
data at: #Meef put: 25; |
|||
at: 23 put: 'Amanda'; |
|||
at: 'Small Numbers' put: #(0 1 2 3 four). |
|||
rr := ReferenceStream fileNamed: 'data.obj'. |
|||
rr nextPut: data; close. |
|||
To restore the Dictionary object stored in "data.obj" and bring up an inspector on it: |
|||
| restoredData rr | |
|||
rr := ReferenceStream fileNamed: 'data.obj'. |
|||
restoredData := rr next. |
|||
restoredData inspect. |
|||
rr close. |
|||
==== Other Smalltalk dialects==== |
|||
Object serialization is not part of the ANSI Smalltalk specification. As a result, the code to serialize an object varies by Smalltalk implementation. The resulting binary data also varies. For instance, a serialized object created in Squeak Smalltalk cannot be restored in [[Ambrai Smalltalk]]. Consequently, various applications that do work on multiple Smalltalk implementations that rely on object serialization cannot share data between these different implementations. These applications include the MinneStore object database [http://minnestore.sourceforge.net/] and some [[Remote procedure call|RPC]] packages. A solution to this problem is SIXX [http://www.mars.dti.ne.jp/~umejava/smalltalk/sixx/index.html], which is a package for multiple Smalltalks that uses an [[XML]]-based format for serialization. |
|||
=== Lisp === |
|||
Generally, a [[Lisp]] data structure can be serialized with the functions "read" and "print". A variable foo containing, for example, a list of arrays would be printed by (print foo). Similarly, an object can be read back from a stream s by (read s). These two parts of the Lisp implementation are called the Printer and the Reader. The output of "print" is human-readable; it uses lists delimited by parentheses, for example (4 2.9 "x" y). |
|||
In many types of Lisp, including [[Common Lisp]], the printer cannot represent every type of data because it is not clear how to do so. In Common Lisp, for example, the printer cannot print CLOS objects. Instead, the programmer may write a method on the generic function print-object, which is invoked when the object is printed. This is somewhat similar to the method used in Ruby. |
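The idea of a user-defined printer method whose output the reader can consume has a rough counterpart in Python's <code>__repr__</code> convention. The following is a sketch in Python rather than Lisp; the class <code>Point</code> and its fields are hypothetical, and <code>eval</code> stands in for the reader, so it must only ever be applied to trusted text.

```python
class Point:
    """Hypothetical class whose printed form can be evaluated again."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Plays the role of a print-object method: emit readable source text.
        return f"Point({self.x!r}, {self.y!r})"

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2.5)
text = repr(p)  # the "printed" form

# "Read" the text back by evaluating it in a namespace that knows Point.
q = eval(text, {"Point": Point})
```

The round trip yields an equal but distinct object, which is the same behaviour as printing and re-reading a Lisp list.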
|||
Lisp code itself is written in the syntax of the reader, called read syntax. Most languages use separate and different parsers to deal with code and data; Lisp uses only one. A file containing Lisp code may be read into memory as a data structure, transformed by another program, then possibly executed or written out. See [[REPL]]. |
|||
== See also == |
|||
* [[Commutation (telemetry)]] |
|||
* [[Comparison of data serialization formats]] |
|||
* [[Container format]] |
|||
* [[Hibernate (Java)]] |
* [[XML Schema (W3C)|XML Schema]] |
|||
* [[Basic Encoding Rules]] |
|||
* [[Google Protocol Buffers]] |
|||
* [[Wikidata|Wikibase]] |
|||
* [[Apache Avro]] |
|||
==References== |
|||
{{reflist|30em}} |
|||
For Java: |
|||
==External links== |
|||
* {{Javadoc:SE-guide|serialization|Java Object Serialization documentation}} |
* [https://web.archive.org/web/20070122194723/http://java.sun.com/j2se/1.4.2/docs/guide/serialization/index.html Java 1.4 Object Serialization documentation]. |
|||
* [http://www.macchiato.com/columns/Durable4.html Durable Java: Serialization] {{webarchive |url=https://web.archive.org/web/20051125013312/http://www.macchiato.com/columns/Durable4.html |date=25 November 2005 }} |
|||
* [http://rpbourret.com/xml/XMLDataBinding.htm XML Data Binding Resources] |
|||
* [http://dev.simantics.org/index.php/Org.simantics.databoard Databoard] - Binary serialization with partial and random access, type system, RPC, type adaption, and text format |
|||
{{Data exchange}} |
|||
* {{Javadoc:SE-guide|serialization/spec/serialTOC.html|Java Object Serialization Specification}} |
|||
* [http://www.macchiato.com/columns/Durable4.html Durable Java: Serialization] |
|||
* [http://rpbourret.com/xml/XMLDataBinding.htm XML Data Binding Resources] |
|||
{{Authority control}} |
|||
[[Category:Programming constructs]] |
|||
[[Category:Data structures]] |
|||
[[Category:Data serialization formats]] |
|||
[[Category:Data serialization formats|*]] |
|||
[[als:Serialisierung]] |
|||
[[Category:Persistence]] |
|||
[[ca:Serialització]] |
|||
[[de:Serialisierung]] |
|||
[[es:Serialización]] |
|||
[[fr:Sérialisation]] |
|||
[[gl:Serialización]] |
|||
[[ja:シリアライズ]] |
|||
[[pl:Serializacja]] |
|||
[[ru:Сериализация]] |
Latest revision as of 01:10, 18 November 2024
In computing, serialization (or serialisation, also referred to as pickling in Python) is the process of translating a data structure or object state into a format that can be stored (e.g. files in secondary storage devices, data buffers in primary storage devices) or transmitted (e.g. data streams over computer networks) and reconstructed later (possibly in a different computer environment).[1] When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of objects does not include any of their associated methods with which they were previously linked.
This process of serializing an object is also called marshalling an object in some situations.[2][3][4] The opposite operation, extracting a data structure from a series of bytes, is deserialization (also called unserialization or unmarshalling).
In networking equipment hardware, the part that is responsible for serialization and deserialization is commonly called SerDes.
Uses
Uses of serialization include:
- serializing data for transfer across wires and networks (messaging).
- storing data (in databases, on hard disk drives).
- remote procedure calls, e.g., as in SOAP.
- distributing objects, especially in component-based software engineering such as COM, CORBA, etc.
- detecting changes in time-varying data.
For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different hardware architecture should be able to reliably reconstruct a serialized data stream, regardless of endianness. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture-independent format means preventing the problems of byte ordering, memory layout, or simply different ways of representing data structures in different programming languages.
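The byte-ordering point can be illustrated with a minimal Python sketch. The record layout here (a 32-bit unsigned identifier followed by a 64-bit float) and the names RECORD, encode and decode are assumptions made for the example, not part of any standard format. Fixing the byte order explicitly, rather than copying the native memory layout, is what lets a machine of any endianness decode the result.

```python
import struct

# Hypothetical record layout: 32-bit unsigned id, then a 64-bit float.
# "!" selects network (big-endian) byte order with standard sizes and
# no padding, so the encoding does not depend on the host architecture.
RECORD = struct.Struct("!Id")


def encode(record_id: int, reading: float) -> bytes:
    """Serialize one record into an architecture-independent byte string."""
    return RECORD.pack(record_id, reading)


def decode(data: bytes) -> tuple:
    """Rebuild (record_id, reading) from bytes produced by encode()."""
    return RECORD.unpack(data)


payload = encode(1, 2.5)
print(len(payload), decode(payload))
```

Using the native format character "@" instead of "!" would reproduce the in-memory layout, including platform-specific padding and byte order, which is the faster but non-portable approach the paragraph above warns about.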
Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end, and reconstructed. In many applications, this linearity is an asset, because it enables simple, common I/O interfaces to be utilized to hold and pass on the state of an object. In applications where higher performance is an issue, it can make sense to expend more effort to deal with a more complex, non-linear storage organization.
Even on a single machine, primitive pointer objects are too fragile to save because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called unswizzling or pointer unswizzling, where direct pointer references are converted to references based on name or position. The deserialization process includes an inverse step called pointer swizzling.
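As a rough illustration of unswizzling and swizzling, the following Python sketch replaces object references with list positions on the way out and restores them on the way in. The Node class and the record layout are hypothetical and chosen only for the example; real serializers use more elaborate reference tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical linked node; `next` is a direct in-memory reference."""
    value: str
    next: Optional["Node"] = None


def unswizzle(nodes):
    """Replace direct references with positions in the `nodes` list."""
    position = {id(node): i for i, node in enumerate(nodes)}
    return [
        {"value": n.value,
         "next": None if n.next is None else position[id(n.next)]}
        for n in nodes
    ]


def swizzle(records):
    """Inverse step: turn positions back into direct references."""
    nodes = [Node(r["value"]) for r in records]
    for node, record in zip(nodes, records):
        if record["next"] is not None:
            node.next = nodes[record["next"]]
    return nodes


# Two nodes that refer to each other, forming a cycle.
a = Node("a")
b = Node("b", next=a)
a.next = b

records = unswizzle([a, b])
restored = swizzle(records)
```

Because the records hold positions rather than addresses, they remain valid wherever the objects are later reloaded in memory, and the cycle between the two nodes is reconstructed intact.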
Since both serializing and deserializing can be driven from common code (for example, the Serialize function in Microsoft Foundation Classes), it is possible for the common code to do both at the same time, and thus, 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy because differences can be detected on the fly, a technique called differential execution. This is useful in the programming of user interfaces whose contents are time-varying — graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.
Drawbacks
Serialization breaks the opacity of an abstract data type by potentially exposing private implementation details. Trivial implementations which serialize all data members may violate encapsulation.[5]
To discourage competitors from making compatible products, publishers of proprietary software often keep the details of their programs' serialization formats a trade secret. Some deliberately obfuscate or even encrypt the serialized data. Yet, interoperability requires that applications be able to understand each other's serialization formats. Therefore, remote method call architectures such as CORBA define their serialization formats in detail.
Many institutions, such as archives and libraries, attempt to future proof their backup archives—in particular, database dumps—by storing them in some relatively human-readable serialized format.
Serialization formats
The Xerox Network Systems Courier technology in the early 1980s influenced the first widely adopted standard. Sun Microsystems published the External Data Representation (XDR) in 1987.[6] XDR is an open format, and standardized as STD 67 (RFC 4506).
In the late 1990s, a push to provide an alternative to the standard serialization protocols started: XML, an SGML subset, was used to produce a human-readable text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. It has the disadvantage of losing the more compact, byte-stream-based encoding, but by this point larger storage and transmission capacities made file size less of a concern than in the early days of computing. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in Ajax web applications. XML is an open format, and standardized as a W3C recommendation.
JSON is a lightweight plain-text alternative to XML, and is also commonly used for client-server communication in web applications. JSON is based on JavaScript syntax, but is independent of JavaScript and supported in many other programming languages. JSON is an open format, standardized as STD 90 (RFC 8259), ECMA-404, and ISO/IEC 21778:2017.
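A minimal sketch of a JSON round trip in Python follows. JSON has no native date type, so the example assumes a hypothetical convention of wrapping dates in an object tagged "__date__"; the tag name and the helper functions are illustrative only, not part of the JSON standard.

```python
import json
from datetime import date


def encode_extra(obj):
    """Called by json.dumps for objects it cannot encode natively."""
    if isinstance(obj, date):
        # Hypothetical tagging convention, not part of JSON itself.
        return {"__date__": obj.isoformat()}
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def decode_extra(mapping):
    """Called by json.loads for every decoded JSON object."""
    if "__date__" in mapping:
        return date.fromisoformat(mapping["__date__"])
    return mapping


record = {"name": "probe", "launched": date(2024, 11, 18), "tags": ["a", "b"]}
text = json.dumps(record, default=encode_extra, sort_keys=True)
restored = json.loads(text, object_hook=decode_extra)
```

The intermediate value is plain text, so it can be inspected by a person or parsed by a program written in another language, provided both sides agree on the tagging convention.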
YAML is a strict superset of JSON and includes additional features such as data type tags, support for cyclic data structures, indentation-sensitive syntax, and multiple forms of scalar data quoting. YAML is an open format.
Property lists are used for serialization by NeXTSTEP, GNUstep, macOS, and iOS frameworks. Property list, or p-list for short, doesn't refer to a single serialization format but instead several different variants, some human-readable and one binary.
For large volume scientific datasets, such as satellite data and output of numerical climate, weather, or ocean models, specific binary serialization standards have been developed, e.g. HDF, netCDF and the older GRIB.
Programming language support
Several object-oriented programming languages directly support object serialization (or object archival), either by syntactic sugar elements or providing a standard interface for doing so. The languages which do so include Ruby, Smalltalk, Python, PHP, Objective-C, Delphi, Java, and the .NET family of languages. There are also libraries available that add serialization support to languages that lack native support for it.
C and C++
C and C++ do not provide serialization as any sort of high-level construct, but both languages support writing any of the built-in data types, as well as plain old data structs, as binary data. As such, it is usually straightforward to write custom serialization functions. Moreover, compiler-based solutions, such as the ODB ORM system for C++ and the gSOAP toolkit for C and C++, are capable of automatically producing serialization code with few or no modifications to class declarations. Other popular serialization frameworks are Boost.Serialization[7] from the Boost Framework, the S11n framework,[8] and Cereal.[9] The Microsoft Foundation Classes (MFC) framework also provides a serialization mechanism as part of its Document-View architecture.
CFML
CFML allows data structures to be serialized to WDDX with the <cfwddx> tag and to JSON with the SerializeJSON() function.
Delphi
Delphi provides a built-in mechanism for serialization of components (also called persistent objects), which is fully integrated with its IDE. The component's contents are saved to a DFM file and reloaded on-the-fly.
Go
Go natively supports unmarshalling/marshalling of JSON and XML data.[10] There are also third-party modules that support YAML[11] and Protocol Buffers.[12] Go also supports Gobs.[13]
Haskell
In Haskell, serialization is supported for types that are members of the Read and Show type classes. Every type that is a member of the Read type class defines a function that will extract the data from the string representation of the dumped data. The Show type class, in turn, contains the show function from which a string representation of the object can be generated. The programmer need not define the functions explicitly: merely declaring a type to be deriving Read or deriving Show, or both, can make the compiler generate the appropriate functions for many cases (but not all: function types, for example, cannot automatically derive Show or Read). The auto-generated instance for Show also produces valid source code, so the same Haskell value can be generated by running the code produced by show in, for example, a Haskell interpreter.[14] For more efficient serialization, there are Haskell libraries that allow high-speed serialization in binary format, e.g. binary.
Java
Java provides automatic serialization which requires that the object be marked by implementing the java.io.Serializable interface. Implementing the interface marks the class as "okay to serialize", and Java then handles serialization internally. There are no serialization methods defined on the Serializable interface, but a serializable class can optionally define methods with certain special names and signatures that, if defined, will be called as part of the serialization/deserialization process. The language also allows the developer to override the serialization process more thoroughly by implementing another interface, the Externalizable interface, which includes two special methods that are used to save and restore the object's state.
There are three primary reasons why objects are not serializable by default and must implement the Serializable interface to access Java's serialization mechanism.
Firstly, not all objects capture useful semantics in a serialized state. For example, a Thread object is tied to the state of the current JVM. There is no context in which a deserialized Thread object would maintain useful semantics.
Secondly, the serialized state of an object forms part of its class' compatibility contract. Maintaining compatibility between versions of serializable classes requires additional effort and consideration. Therefore, making a class serializable needs to be a deliberate design decision and not a default condition.
Lastly, serialization allows access to non-transient private members of a class that are not otherwise accessible. Classes containing sensitive information (for example, a password) should not be serializable nor externalizable.[15]: 339–345
The standard encoding method uses a recursive graph-based translation of the object's class descriptor and serializable fields into a byte stream. Primitives as well as non-transient, non-static referenced objects are encoded into the stream. Each object that is referenced by the serialized object via a field that is not marked as transient must also be serialized; and if any object in the complete graph of non-transient object references is not serializable, then serialization will fail. The developer can influence this behavior by marking fields as transient, or by redefining the serialization for an object so that some portion of the reference graph is truncated and not serialized.
Java does not use constructors to serialize objects. It is possible to serialize Java objects through JDBC and store them into a database.[16] While Swing components do implement the Serializable interface, they are not guaranteed to be portable between different versions of the Java Virtual Machine. As such, a Swing component, or any component which inherits from it, may be serialized to a byte stream, but it is not guaranteed that it can be reconstituted on another machine.
JavaScript
Since ECMAScript 5.1,[17] JavaScript has included the built-in JSON object and its methods JSON.parse() and JSON.stringify(). Although JSON is originally based on a subset of JavaScript,[18] there are boundary cases where JSON is not valid JavaScript. Specifically, JSON allows the Unicode line terminators U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to appear unescaped in quoted strings, while ECMAScript 2018 and older does not.[19][20] See the main article on JSON.
Julia
Julia implements serialization through the serialize() / deserialize() functions,[21] intended to work within the same version of Julia and/or an instance of the same system image.[22] The HDF5.jl package offers a more stable alternative, using a documented format and a common library with wrappers for different languages,[23] while the default serialization format is suggested to have been designed with maximal performance for network communication in mind.[24]
Lisp
Generally a Lisp data structure can be serialized with the functions "read" and "print". A variable foo containing, for example, a list of arrays would be printed by (print foo). Similarly, an object can be read from a stream named s by (read s). These two parts of the Lisp implementation are called the Printer and the Reader. The output of "print" is human readable; it uses lists delimited by parentheses, for example: (4 2.9 "x" y).
In many types of Lisp, including Common Lisp, the printer cannot represent every type of data because it is not clear how to do so. In Common Lisp, for example, the printer cannot print CLOS objects. Instead, the programmer may write a method on the generic function print-object, which will be invoked when the object is printed. This is somewhat similar to the method used in Ruby.
Lisp code itself is written in the syntax of the reader, called read syntax. Most languages use separate and different parsers to deal with code and data; Lisp uses only one. A file containing Lisp code may be read into memory as a data structure, transformed by another program, then possibly executed or written out, such as in a read–eval–print loop. Not all readers/writers support cyclic, recursive or shared structures.
.NET
.NET has several serializers designed by Microsoft. There are also many serializers by third parties. More than a dozen serializers are discussed and tested in the cited comparisons.[25][26]
OCaml
OCaml's standard library provides marshalling through the Marshal module[3] and the Pervasives functions output_value and input_value. While OCaml programming is statically type-checked, uses of the Marshal module may break type guarantees, as there is no way to check whether an unmarshalled stream represents objects of the expected type. In OCaml it is difficult to marshal a function or a data structure which contains a function (e.g. an object which contains a method), because executable code in functions cannot be transmitted across different programs. (There is a flag to marshal the code position of a function, but it can only be unmarshalled in exactly the same program.) The standard marshalling functions can preserve sharing and handle cyclic data, which can be configured by a flag.
Perl
Several Perl modules available from CPAN provide serialization mechanisms, including Storable, JSON::XS and FreezeThaw. Storable includes functions to serialize and deserialize Perl data structures to and from files or Perl scalars. In addition to serializing directly to files, Storable includes the freeze function to return a serialized copy of the data packed into a scalar, and thaw to deserialize such a scalar. This is useful for sending a complex data structure over a network socket or storing it in a database. When serializing structures with Storable, there are network-safe functions that always store their data in a format that is readable on any computer, at a small cost in speed. These functions are named nstore, nfreeze, etc. There are no "n" functions for deserializing these structures; the regular thaw and retrieve deserialize structures serialized with either the "n" functions or their machine-specific equivalents.
PHP
PHP originally implemented serialization through the built-in serialize() and unserialize() functions.[27] PHP can serialize any of its data types except resources (file pointers, sockets, etc.). The built-in unserialize() function is often dangerous when used on completely untrusted data.[28] For objects, there are two "magic methods" that can be implemented within a class, __sleep() and __wakeup(), that are called from within serialize() and unserialize(), respectively, and that can clean up and restore an object. For example, it may be desirable to close a database connection on serialization and restore the connection on deserialization; this functionality would be handled in these two magic methods. They also permit the object to pick which properties are serialized. Since PHP 5.1, there is an object-oriented serialization mechanism for objects, the Serializable interface.[29]
Prolog
Prolog's term structure, which is the only data structure of the language, can be serialized out through the built-in predicate write_term/3 and serialized in through the built-in predicates read/1 and read_term/2. The resulting stream is uncompressed text (in some encoding determined by configuration of the target stream), with any free variables in the term represented by placeholder variable names. The predicate write_term/3 is standardized in the ISO Specification for Prolog (ISO/IEC 13211-1) on pages 59 ff. ("Writing a term, § 7.10.5"). Therefore it is expected that terms serialized out by one implementation can be serialized in by another without ambiguity or surprises. In practice, implementation-specific extensions (e.g. SWI-Prolog's dictionaries) may use non-standard term structures, so interoperability may break in edge cases. As examples, see the corresponding manual pages for SWI-Prolog,[30] SICStus Prolog,[31] GNU Prolog.[32] Whether and how serialized terms received over the network are checked against a specification (after deserialization from the character stream has happened) is left to the implementer. Prolog's built-in Definite Clause Grammars can be applied at that stage.
Python
The core general serialization mechanism is the pickle standard library module, alluding to the database systems term pickling[33][34][35] to describe data serialization (unpickling for deserializing). A pickle is a sequence of instructions for a simple stack-based virtual machine, which are executed to reconstruct the object. It is a cross-version, customisable but unsafe (not secure against erroneous or malicious data) serialization format. Malformed or maliciously constructed data may cause the deserializer to import arbitrary modules and instantiate any object.[36][37]
The standard library also includes modules serializing to standard data formats:
- json (with built-in support for basic scalar and collection types and able to support arbitrary types via encoding and decoding hooks).
- plistlib (with support for both binary and XML property list formats).
- xdrlib (with support for the External Data Representation (XDR) standard as described in RFC 1014).
Finally, it is recommended that an object's __repr__ be evaluable in the right environment, making it a rough match for Common Lisp's print-object.
Not all object types can be pickled automatically, especially ones that hold operating system resources like file handles, but users can register custom "reduction" and construction functions to support the pickling and unpickling of arbitrary types. Pickle was originally implemented as the pure Python pickle module, but, in versions of Python prior to 3.0, the cPickle module (also a built-in) offered improved performance (up to 1000 times faster[36]). The cPickle module was adapted from the Unladen Swallow project. In Python 3, users should always import the standard version, which attempts to import the accelerated version and falls back to the pure Python version.[38]
R
R has the function dput which writes an ASCII text representation of an R object to a file or connection. A representation can be read from a file using dget.[39] More specifically, the function serialize serializes an R object to a connection, the output being a raw vector coded in hexadecimal format. The unserialize function reads an object from a connection or a raw vector.[40] RProtoBuf provides cross-language data serialization in R, using Protocol Buffers.[41]
REBOL
REBOL will serialize to file (save/all) or to a string! (mold/all). Strings and files can be deserialized using the polymorphic load function.
Ruby
Ruby includes the standard module Marshal with two methods, dump and load, akin to the standard Unix utilities dump and restore. These methods serialize to the standard class String, that is, they effectively become a sequence of bytes. Some objects cannot be serialized (doing so would raise a TypeError exception): bindings, procedure objects, instances of class IO, singleton objects and interfaces. If a class requires custom serialization (for example, it requires certain cleanup actions done on dumping / restoring), it can be done by implementing two methods: _dump and _load. The instance method _dump should return a String object containing all the information necessary to reconstitute objects of this class and all referenced objects up to a maximum depth given as an integer parameter (a value of -1 implies that depth checking should be disabled). The class method _load should take a String and return an object of this class.
Rust
Serde is the most widely used library, or crate, for serialization in Rust.
Smalltalk
In general, non-recursive and non-sharing objects can be stored and retrieved in a human-readable form using the storeOn:/readFrom: protocol. The storeOn: method generates the text of a Smalltalk expression which, when evaluated using readFrom:, recreates the original object. This scheme is special, in that it uses a procedural description of the object, not the data itself. It is therefore very flexible, allowing for classes to define more compact representations. However, in its original form, it does not handle cyclic data structures or preserve the identity of shared references (i.e. two references to a single object will be restored as references to two equal, but not identical copies). For this, various portable and non-portable alternatives exist. Some of them are specific to a particular Smalltalk implementation or class library.
There are several ways in Squeak Smalltalk to serialize and store objects. The easiest and most used are storeOn:/readFrom: and binary storage formats based on SmartRefStream serializers. In addition, bundled objects can be stored and retrieved using ImageSegments.
Both provide a so-called "binary-object storage framework", which supports serialization into and retrieval from a compact binary form. Both handle cyclic, recursive and shared structures, storage/retrieval of class and metaclass info, and include mechanisms for "on the fly" object migration (i.e. to convert instances which were written by an older version of a class with a different object layout). The APIs are similar (storeBinary/readBinary), but the encoding details are different, making these two formats incompatible. However, the Smalltalk/X code is open source and free and can be loaded into other Smalltalks to allow for cross-dialect object interchange.
Object serialization is not part of the ANSI Smalltalk specification. As a result, the code to serialize an object varies by Smalltalk implementation. The resulting binary data also varies. For instance, a serialized object created in Squeak Smalltalk cannot be restored in Ambrai Smalltalk. Consequently, applications that run on multiple Smalltalk implementations and rely on object serialization cannot share data between these different implementations. These applications include the MinneStore object database[42] and some RPC packages. A solution to this problem is SIXX,[43] which is a package for multiple Smalltalks that uses an XML-based format for serialization.
Swift
The Swift standard library provides two protocols, Encodable and Decodable (composed together as Codable), which allow instances of conforming types to be serialized to or deserialized from JSON, property lists, or other formats.[44] Default implementations of these protocols can be generated by the compiler for types whose stored properties are also Decodable or Encodable.
PowerShell
PowerShell implements serialization through the built-in cmdlet Export-CliXML. Export-CliXML serializes .NET objects and stores the resulting XML in a file. To reconstitute the objects, use the Import-CliXML cmdlet, which generates a deserialized object from the XML in the exported file. Deserialized objects, often known as "property bags", are not live objects; they are snapshots that have properties, but no methods. Two-dimensional data structures can also be (de)serialized in CSV format using the built-in cmdlets Import-CSV and Export-CSV.
See also
- Commutation (telemetry)
- Comparison of data serialization formats
- Container format
- Hibernate (Java)
- XML Schema
- Basic Encoding Rules
- Google Protocol Buffers
- Wikibase
- Apache Avro
References
- ^ Cline, Marshall. "C++ FAQ: "What's This "Serialization" Thing All About?"". Archived from the original on 2015-04-05.
It lets you take an object or group of objects, put them on a disk or send them through a wire or wireless transport mechanism, then later, perhaps on another computer, reverse the process, resurrecting the original object(s). The basic mechanisms are to flatten object(s) into a one-dimensional stream of bits, and to turn that stream of bits back into the original object(s).
- ^ "Module: Marshal (Ruby 3.0.2)". ruby-doc.org. Retrieved 25 July 2021.
- ^ a b "Marshal". OCaml. Retrieved 25 July 2021.
- ^ "Python 3.9.6 documentation - Python object serialization —pickle". Documentation - The Python Standard Library.
- ^ S. Miller, Mark. "Safe Serialization Under Mutual Suspicion". ERights.org.
Serialization, explained below, is an example of a tool for use by objects within an object system for operating on the graph they are embedded in. This seems to require violating the encapsulation provided by the pure object model.
- ^ Sun Microsystems (1987). "XDR: External Data Representation Standard". RFC 1014. Network Working Group. Retrieved July 11, 2011.
- ^ "Serialization". www.boost.org.
- ^ beal, stephan. "s11n.net: object serialization/persistence in C++". s11n.net.
- ^ "cereal Docs - Main". uscilab.github.io.
- ^ "Package encoding". pkg.go.dev. 12 July 2021.
- ^ "GitHub - YAML support for the Go language". GitHub. Retrieved 25 July 2021.
- ^ "proto · pkg.go.dev". pkg.go.dev. Retrieved 2021-06-22.
- ^ "gob package - encoding/gob - pkg.go.dev". pkg.go.dev. Retrieved 2022-03-04.
- ^ "Text.Show Documentation". Retrieved 15 January 2014.
- ^ Bloch, Joshua (2018). "Effective Java: Programming Language Guide" (third ed.). Addison-Wesley. ISBN 978-0134685991.
- ^ "Ask TOM "Serializing Java Objects into the database (and ge..."". asktom.oracle.com.
- ^ "JSON". MDN Web Docs. Retrieved 22 March 2018.
- ^ "JSON". www.json.org. Retrieved 22 March 2018.
- ^ Holm, Magnus (15 May 2011). "JSON: The JavaScript subset that isn't". The timeless repository. Archived from the original on 13 May 2012. Retrieved 23 September 2016.
- ^ "TC39 Proposal: Subsume JSON". ECMA TC39 committee. 22 May 2018.
- ^ "Serialization". The Julia Language. Retrieved 25 July 2021.
- ^ "faster and more compact serialization of symbols and strings · JuliaLang/julia@bb67ff2". GitHub.
- ^ "HDF5.jl: Saving and loading data in the HDF5 file format". 20 August 2017 – via GitHub.
- ^ "Julia: how stable are serialize() / deserialize()". stackoverflow.com. 2014.
- ^ ".NET Serializers".
There are many kinds of serializers; they produce very compact data very fast. There are serializers for messaging, for data stores, for marshaling objects. What is the best serializer in .NET?
- ^ "SERBENCH by aumcode". aumcode.github.io.
- ^ "PHP: Object Serialization - Manual". ca.php.net.
- ^ Esser, Stephen (2009-11-28). "Shocking News in PHP Exploitation". Suspekt... Archived from the original on 2012-01-06.
- ^ "PHP: Serializable - Manual". www.php.net.
- ^ ""Term reading and writing"". www.swi-prolog.org.
- ^ ""write_term/[2,3]"". sicstus.sics.se.
- ^ ""Term input/output"". gprolog.org.
- ^ Herlihy, Maurice; Liskov, Barbara (October 1982). "A Value Transmission Method for Abstract Data Types" (PDF). ACM Transactions on Programming Languages and Systems. 4 (4): 527–551. CiteSeerX 10.1.1.87.5301. doi:10.1145/69622.357182. ISSN 0164-0925. OCLC 67989840. S2CID 8126961.
- ^ Birrell, Andrew; Jones, Mike; Wobber, Ted (November 1987). "A simple and efficient implementation of a small database". Proceedings of the eleventh ACM Symposium on Operating systems principles - SOSP '87. Vol. 11. pp. 149–154. CiteSeerX 10.1.1.100.1457. doi:10.1145/41457.37517. ISBN 089791242X. ISSN 0163-5980. OCLC 476062921. S2CID 12908261.
Our implementation makes use of a mechanism called "pickles", which will convert between any strongly typed data structure and a representation of that structure suitable for storing in permanent disk files. The operation Pickle.Write takes a pointer to a strongly typed data structure and delivers buffers of bits for writing to the disk. Conversely Pickle.Read reads buffers of bits from the disk and delivers a copy of the original data structure.(*) This conversion involves identifying the occurrences of addresses in the structure, and arranging that when the structure is read back from disk the addresses are replaced with addresses valid in the current execution environment. The pickle mechanism is entirely automatic: it is driven by the run-time typing structures that are present for our garbage collection mechanism. ... (*) Pickling is quite similar to the concept of marshalling in remote procedure calls. But in fact our pickling implementation works only by interpreting at run-time the structure of dynamically typed values, while our RPC implementation works only by generating code for the marshalling of statically typed values. Each facility would benefit from adding the mechanisms of the other, but that has not yet been done.
- ^ van Rossum, Guido (1 December 1994). "Flattening Python Objects". Python Programming Language – Legacy Website. Delaware, United States: Python Software Foundation. Retrieved 6 April 2017.
Origin of the name 'flattening': Because I want to leave the original 'marshal' module alone, and Jim complained that 'serialization' also means something totally different that's actually relevant in the context of concurrent access to persistent objects, I'll use the term 'flattening' from now on. ... (The Modula-3 system uses the term 'pickled' data for this concept. They have probably solved all problems already, and in a type-safe manner :-)
- ^ a b "11.1. pickle — Python object serialization — Python 2.7.14rc1 documentation". docs.python.org.
- ^ "pickle — Python object serialization — Python v3.0.1 documentation". docs.python.org.
- ^ "What's New In Python 3.0 — Python v3.1.5 documentation". docs.python.org.
- ^ R manual. http://stat.ethz.ch/R-manual/R-patched/library/base/html/dput.html
- ^ R manual. http://stat.ethz.ch/R-manual/R-patched/library/base/html/serialize.html
- ^ Eddelbuettel, Dirk; Stokely, Murray; Ooms, Jeroen (2014). "RProtoBuf: Efficient Cross-Language Data Serialization in R". Journal of Statistical Software. 71 (2). arXiv:1401.7372. doi:10.18637/jss.v071.i02. S2CID 36239952.
- ^ "MinneStore version 2". SourceForge. Archived from the original on 11 May 2008.
- ^ "What's new". SIXX - Smalltalk Instance eXchange in XML. 23 January 2010. Retrieved 25 July 2021.
- ^ "Swift Archival & Serialization". www.github.com. 2018-12-02.
External links
- Java Object Serialization documentation
- Java 1.4 Object Serialization documentation.
- Durable Java: Serialization Archived 25 November 2005 at the Wayback Machine
- XML Data Binding Resources
- Databoard - Binary serialization with partial and random access, type system, RPC, type adaption, and text format