Talk:Flat-file database
This is the talk page for discussing improvements to the Flat-file database article. This is not a forum for general discussion of the article's subject.
definition inaccurate now
Someone went ahead and merged "flat file" with "flat file database", which I suppose is a step toward the greater good, but:
A flat file database is not always a single-file database. A single-file flat file database is a special type of flat file database.
PmWiki is a flat-file database wiki, for example. It uses one page per wiki entry, and sorts the entries using file names in the filesystem. TiddlyWiki, on the other hand, is a flat-file database wiki that holds everything in one file. Sorting is accomplished by tag and location in the file's structure.
-maxnort 96.245.14.103 (talk) 20:35, 17 September 2013 (UTC)
- As I recall the word "flat" in "flat file" was used to contrast with hierarchical data stores. That contrast may have seemed dated for a while, but now seems relevant again as data stores with more hierarchical or graph-like structures have regained popularity. Ericfluger (talk) 20:17, 31 May 2015 (UTC)
Hyphen (removed)
Why the hyphen in the noun? Deb 18:19, 3 Apr 2004 (UTC)
Beats the hell out of me. Edit boldly, as they say. Paul Beardsell 16:23, 8 Jun 2004 (UTC)
Graphic example
This page lacks any example to illustrate the concept, thus it hews more closely to theory than to the thing itself. I shall edit an upgrade at: User:Xiong/Flat file database. — Xiong (talk) 14:30, 2005 Mar 25 (UTC)
The upgrade is live. — Xiong (talk) 22:42, 2005 Mar 25 (UTC)
Repeatable fields, and Boolean logic
One improvement of flat file databases was repeatable fields. Around 1970, some DBMS included "repeatable fields". In a bibliographic application, multiple subject headings may be needed to describe a document. Before repeatable fields, the designer would have to establish as many separate subject-heading fields as the maximum number expected to be needed. Many of these fixed-length fields would be unused in most of the records, thereby wasting storage space. Repeatable fields minimized such waste. Also, Boolean logic could be used to query/search flat file databases. Before the advent of relational DBMS, many bibliographic databases were flat file databases. Should this be mentioned in this article? AnonUser 02:02, 15 October 2005 (UTC)
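A minimal sketch of the two ideas above (repeatable fields and Boolean queries), in Python. The record layout (fields separated by "|", repeated subject headings separated by ";"), the sample records, and the names `parse_record` and `search` are assumptions made for illustration; this does not describe any particular 1970s DBMS.

```python
# Hypothetical flat-file layout: one record per line, fields separated by "|",
# and a repeatable "subjects" field holding any number of values split by ";".
RECORDS = """\
D1|Sample title one|databases;indexing
D2|Sample title two|algorithms;sorting;searching
D3|Sample title three|databases;cataloguing
"""


def parse_record(line):
    """Split one line into a dict; the repeatable field becomes a set."""
    doc_id, title, subjects = line.rstrip("\n").split("|")
    return {"id": doc_id, "title": title, "subjects": set(subjects.split(";"))}


def search(records, all_of=(), any_of=(), none_of=()):
    """Boolean query over subject headings: AND, OR and NOT respectively."""
    hits = []
    for rec in records:
        subj = rec["subjects"]
        if not set(all_of) <= subj:
            continue  # AND: every required heading must be present
        if any_of and not (set(any_of) & subj):
            continue  # OR: at least one of these headings must be present
        if set(none_of) & subj:
            continue  # NOT: no excluded heading may be present
        hits.append(rec["id"])
    return hits


records = [parse_record(line) for line in RECORDS.splitlines()]
print(search(records, all_of=["databases"], none_of=["indexing"]))
```

Because the subjects field can hold any number of values, no record carries unused fixed-length subject slots, which is the storage saving described above.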
My rationale
I'm going to go ahead and be bold. Excel is not a flat-file database, so Excel examples do not belong. FileMaker bills itself as a fully-fledged database tool, so it's not a flat-file database either. They might be able to import/export to a flat format, but this is not native to them. The article, as it was, read like an advertisement for FileMaker, providing instructions on how to use it and how to make it look a bit like a flat file database. --Sam Pointon 17:37, 5 June 2006 (UTC)
Flat-file != Relational DB ?
I find this quote strange: "The data are flat as in a sheet of paper, in contrast to more complex models such as a relational database." If I was to make a simple DB I'd just use more than one file to create some relations. Or is it imperative that a flat-file is just one file?
I ask because I have no idea of how it works, but it seems to be a simple workaround.
File1
id  name   team
1   Amy    Blues
2   Bob    Reds
3   Chuck  Blues
4   Dick   Blues
5   Ethel  Reds
6   Fred   Blues
7   Gilly  Blues
8   Hank   Reds
File2
team   arena
Blues  le Grand Bleu
Reds   Super Smirnoff Stadium
Query
Suppose I want to date Amy - what arena do I go to?
team = "SELECT team FROM File1 WHERE name='Amy';"
arena = "SELECT arena FROM File2 WHERE team='" + team + "';"
Of course I'd have to build a wrapper between SQL and the flat-files but that should be quite easy.
PER9000 07:06, 4 August 2006 (UTC)
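A minimal sketch of the lookup described above, in Python, without any SQL wrapper. The tab delimiter and the helper names `read_table` and `lookup` are assumptions for illustration (the post does not say how the files are delimited, and arena names contain spaces).

```python
import csv
import io

# The two files from the example, written here as tab-separated text.
# The delimiter is an assumption; the post does not specify one.
FILE1 = (
    "id\tname\tteam\n"
    "1\tAmy\tBlues\n2\tBob\tReds\n3\tChuck\tBlues\n4\tDick\tBlues\n"
    "5\tEthel\tReds\n6\tFred\tBlues\n7\tGilly\tBlues\n8\tHank\tReds\n"
)
FILE2 = "team\tarena\nBlues\tle Grand Bleu\nReds\tSuper Smirnoff Stadium\n"


def read_table(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def lookup(rows, want, where_col, where_val):
    """Rough equivalent of: SELECT want FROM rows WHERE where_col = where_val."""
    return [row[want] for row in rows if row[where_col] == where_val]


players = read_table(FILE1)
teams = read_table(FILE2)

# Two lookups chained by hand, mirroring the two queries in the post.
team = lookup(players, "team", "name", "Amy")[0]
arena = lookup(teams, "arena", "team", team)[0]
print(arena)
```

The relation between the two files lives only in the shared `team` values; nothing in the files themselves enforces it.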
- I also oppose this formulation: "A flat file database is described by a very simple database model, where all the information is stored in text files." - You can model arbitrarily complex databases as flat-file databases. The thing that is simple about it is that it uses files with raw text instead of some complex format. I or someone else should make this article more neutral and perhaps incorporate the small example I just made. Perhaps someone with a deeper insight into why it is more inefficient to store a db as flat files (pointers, hard disks, that kind of stuff) should write a little about this. PER9000 07:26, 4 August 2006 (UTC)
- Yet another insight on my part: In database they talk about a Flat model and not a Flat file database. To me this is/was/should be (I don't know any more) two separate things. That may explain my frustration. Also it is not clear to me that we must have only one table but many files (perhaps a metaphor to a phonebook - one table, many pages?) PER9000 07:35, 4 August 2006 (UTC)
- Right the original article is with relation to the "flat" database model. I add a section where "flat files" are used as data stores of a relational database, using the above example. --ANONYMOUS COWARD0xC0DE 07:04, 8 February 2007 (UTC)
I agree. The preceding comments indicate some room for clarification in the article content. I will modify the introductory paragraph for clarity, possibly other items as well. dr.ef.tymac 23:55, 29 November 2006 (UTC)
- This subject matter area is rife with disagreements about definitions. There can be conflicts between formal and casual use. There are also reasonable disagreements between very well informed people. AFAIK the historical origin of the term "flat" was to contrast with hierarchical data stores. It was about structure (or lack of it) rather than representation. So yeah, a flat file could be a table, and typically was, but it could also be a group of key-value pairs as long as there is no graph structure expressed. We can create a table including pointers from one row to another. By the definition I've just given that would not be a flat file. However, in current popular use it might be considered a flat file.
- Strictly speaking, relational databases work with abstractions called relations and tuples that can be represented by tables and rows. SQL databases work with tables and rows. A relational database purist would probably call that confusing the map with the territory. SQL advocates would probably say that's how it's done in practice rather than theory and it's close enough. It's certainly quite practical to restrict, project and join ordinary text tables with common POSIX-style tools, and there are more formal systems for doing so, like shsql. Whether or not that's really relational depends on which of those views you subscribe to. (I personally try to sidestep the whole debate by referring to SQL databases as just that and reserving the term "relational" where it very clearly fits, but that's not always practical. I don't really care that much, just don't wanna start flame wars.)
- So my feeling is that it's good to say what you mean without jargon when it's practical, and when introducing terms it's probably helpful to spell out not only what you mean, but to acknowledge alternate definitions to avoid confusion, and if practical explain how you made your choice. I think it's good to be diplomatic about this stuff and say things like "for the purpose of this discussion we're defining it this way" rather than trying to present an absolute universal definition of anything.
- Hope there was something helpful in there somewhere, Ericfluger (talk) 21:30, 31 May 2015 (UTC)
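To make the "restrict, project and join ordinary text tables" point concrete, here is a minimal sketch in Python rather than POSIX tools. The function names, the comma delimiter and the sample data are assumptions for illustration; this is not how shsql or any other named system is implemented.

```python
def parse(text):
    """Read comma-separated text with a header line (no quoting support)."""
    header, *lines = text.strip().splitlines()
    cols = header.split(",")
    return [dict(zip(cols, line.split(","))) for line in lines]


def restrict(rows, predicate):
    """Keep only the rows for which the predicate holds (SQL's WHERE)."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Keep only the named columns and drop duplicate rows."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


def join(left, right, column):
    """Natural join on a single shared column."""
    index = {}
    for r in right:
        index.setdefault(r[column], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[column], [])]


PLAYERS = "name,team\nAmy,Blues\nBob,Reds\nChuck,Blues\n"
ARENAS = "team,arena\nBlues,le Grand Bleu\nReds,Super Smirnoff Stadium\n"

players, arenas = parse(PLAYERS), parse(ARENAS)
blues = restrict(join(players, arenas, "team"), lambda r: r["team"] == "Blues")
print(project(blues, ["arena"]))
```

Note that `project` removes duplicate rows, which is the relational (set) behaviour; a plain SQL `SELECT` without `DISTINCT` would keep them, which is one small instance of the relations-versus-tables distinction discussed above.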
Merge Flat file
Recommend merging the article Flat file with this one. What do you think? Thanks. SqlPac 04:26, 17 May 2007 (UTC)
Flat File Database is a collection of flat files. A Flat File may or may not be (part of) a database. The "Flat File Database" article should remain separate from the "Flat File" article because people looking for information about flat files may or may not want all the information about databases --- But I do believe the articles should reference/link to one another. My considered opinion. Please be kind, this is my first submission into Wikipedia.org. (previously posted Marion 18:26, 15 August 2007 (UTC)Eisforeverything on another discussion by mistake) Marion 18:37, 21 August 2007 (UTC)Eisforeverything (aka Marion)
I think they should be merged, for a rationale see Talk:Flat file. Adrianwn (talk) 17:13, 13 August 2008 (UTC)
Flat File and Flat File Database Topics should be kept separate
Flat File Database is a collection of flat files. A Flat File may or may not be (part of) a database.
The "Flat File Database" article should remain separate from the "Flat File" article because people looking for information about flat files may or may not want all the information about databases --- But I do believe the articles should reference/link to one another.
My considered opinion. Please be kind, this is my first submission into Wikipedia.org.
Marion 18:26, 15 August 2007 (UTC)Eisforeverything
I'd like to delete this; I created it as a new discussion by mistake. Marion 18:39, 21 August 2007 (UTC)
I think the article "flat file" and "flat file database" should be kept as two separate articles. Stolkin 19:01, 23 October 2007 (UTC)
Graphic Comment
I'm not sure that the blurb under the first graphic on the page actually makes any sense - it gives "one of several typical uses for a flat file database" as being convertible to a fully-fledged relational database.
Firstly, this doesn't really make sense as a use in and of itself. Moreover, "converting to" might be less representative of real usage than "converting into a format importable into" (or whatever)?
Rswarbrick (talk) 20:33, 3 February 2008 (UTC)
- The last part is now gone, a reasonable way to address the issues here, it would seem. dr.ef.tymac (talk) 03:20, 4 February 2008 (UTC)
Coinage
I would like to discuss adding two paragraphs on the origination of the term "flat file". The term flat file was coined in 1971 on the campus of Modesto Junior College by the founder of the computer club. At that time data sets were described in the same way that they were stored in a computer on magnetic rings, i.e. two dimensional array or three dimensional array. The coiner decided that a shorter term with fewer syllables might get accepted as a substitute. The term flat file was decided upon because it had a physical image and because comic research found that people enjoy hearing and saying words with "f" sounds and "p" sounds (think of all the four letter words that start with either of these two letters). The new term was disseminated using the phrase "… flat file, you know, a two dimensional array." A transfer to UC Berkeley along with membership in the campus computer club spread the term on that campus during 1971 – 1975. Later a position with the Fireman's Insurance Company, which trained a high percentage of the SF Bay Area COBOL programmers, helped spread the term throughout the local region. Please feel free to edit/change/correct as necessary. David E. Mould (talk) 15:16, 21 October 2010 (UTC)
Do you have any kind of reliable sources for your claims? Please see WP:V and WP:OR for further information. – Adrian Willenbücher (talk) 21:36, 25 October 2010 (UTC)
I don't have a reliable source for my claims. I don't know how to document something like this. Does anybody have advice / experience they can provide? David E. Mould (talk) 16:51, 9 November 2010 (UTC)
Does anyone object to me adding the above paragraph as an opinion? David E. Mould (talk) 21:02, 16 March 2011 (UTC)
- Everything in the article should be supportable by a reliable source, so if all you have is opinion or first-hand experience, then it shouldn't be included. As Adrian Willenbücher pointed you to above, you should read WP:Verifiability and WP:No original research for explanations of why we have those requirements. VernoWhitney (talk) 21:24, 16 March 2011 (UTC)
I suggest that the origin of the phrase "flat file" might be particularly difficult to adequately document. I further suggest that it may very well have arisen independently in multiple locations. For example, when IBM introduced VSAM in 1970 (or early '70s) the term flat file rapidly came into daily use in IBM mainframe shops to describe a non-VSAM file, meaning a non-indexed, or non-keyed, file. Or, as the article states, a file in which "There are no structural relationships between the records". (personal experience). Perhaps a statement in the article to this effect might be more accurate. Merligren (talk) 20:53, 22 April 2011 (UTC)
Flat File Indexing (Instantaneous Random Access to Any Record)
Flat file databases have historically been used for sequential processing of textual data records, often in large volumes and with large records. Their weakness is that records cannot be accessed randomly. Random access can be added by reading the entire flat file and loading the record offsets (in bytes) into a hash table in the program, for later random access while the program is running. But this has to be done each time the program is run, because the hash table must be rebuilt each time. A user is not likely to appreciate or tolerate such a wait, especially if the flat file is very large (millions of records).
To solve this problem, the record offsets may be loaded and maintained within a persistent, public domain, external, binary SDBM file (of key/value pairs) tied to an "in memory" program hash table. That way, the record offsets are immediately available (at launch) to the program accessing the flat file database. The file pointer can be set to any record offset (in bytes) within the opened flat file, for instantaneous random access to any record. Supported are Unique Primary Keys, Unique Alternate Keys, and Alternate Keys with Duplicates (see example code below). Records can be accessed by arbitrary keys, since the Key in the Key/Value pair can be made up of one or more fields and/or partial fields contained within the flat file records. The Value in the Key/Value pair holds the record's byte offset relative to the top or end of the file. This can be a positive or negative integer. Some systems may support double integers. This is ISAM, NoSQL, embedded database technology, free to implement and distribute with no limitations. Many programming languages (such as Perl) have built-in SDBM database support.
Also, flat file record edits are made "in place", either overwriting an entire record with the record changes made in memory, or overwriting only one or more fields. You can build relational databases with this dual/tandem database methodology, with both LOOKUPS (to eliminate redundant data) and one-to-many parent/child record relationships, the child records being maintained within separate flat file(s).
- REDIRECT dbm
#-- For a Flat File with Primary Unique Index on Social Security Number, a useful Alternate Key with Duplicates
#-- may also be needed to track down a record when the Social Security Number is not known.
#--               YYYYMMDD
#-- Key example:  BirthDate|LastNameFirst4Chars|FirstNameInitial|StateCode
#--               "19591219|Will|K|TX"
#-- $KEY without a Seq Nbr is used to increment the number of records saved to the database
#-- having a particular ALT KEY w/DUPS - in this example: "19591219|Will|K|TX"

$KEY=$BirthDate . "|" . $LastNameFirst4Chars . "|" . $FirstNameInitial . "|" . $StateCode;
$Hash{$KEY}=0;

#-- Now index the first record encountered in the Flat File database with this particular ALT KEY w/DUPS
$num_recs = $Hash{$KEY};
$num_recs++; #-- i.e. one(1)
$Hash{$KEY}=$num_recs;
$newKEY=$KEY . "|" . $num_recs; #-- produces: "19591219|Will|K|TX|1"
$Hash{$newKEY}=$offset; #-- $offset (assumed name): the byte offset of the Flat File record just indexed

#-- Now index the second record encountered in the Flat File database with this particular ALT KEY w/DUPS
$num_recs = $Hash{$KEY};
$num_recs++; #-- i.e. two(2)
$Hash{$KEY}=$num_recs;
$newKEY=$KEY . "|" . $num_recs; #-- produces: "19591219|Will|K|TX|2"
$Hash{$newKEY}=$offset; #-- $offset (assumed name): the byte offset of the Flat File record just indexed

#-- and so on...
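For readers who want to try the offset-index idea end to end, here is a minimal sketch in Python. It uses the standard library's `dbm.dumb` as a stand-in for SDBM (a different on-disk format, chosen only because it is always available), covers a unique primary key only, and the file names, sample records and the functions `build_index` and `fetch` are assumptions for illustration.

```python
import dbm.dumb
import os
import tempfile


def build_index(data_path, index_path, key_of):
    """Scan the flat file once and store key -> byte offset in a dbm file."""
    with open(data_path, "rb") as data, dbm.dumb.open(index_path, "n") as index:
        while True:
            offset = data.tell()  # byte offset of the record about to be read
            line = data.readline()
            if not line:
                break
            index[key_of(line)] = str(offset).encode()


def fetch(data_path, index_path, key):
    """Seek straight to the record for `key` without scanning the file."""
    with dbm.dumb.open(index_path, "r") as index, open(data_path, "rb") as data:
        data.seek(int(index[key].decode()))
        return data.readline().rstrip(b"\n")


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "people.txt")
    index_path = os.path.join(tmp, "people_index")
    with open(data_path, "wb") as f:
        f.write(b"111|Ada|TX\n222|Bob|CA\n333|Cy|TX\n")

    # The key here is the first "|"-separated field of each record.
    build_index(data_path, index_path, key_of=lambda line: line.split(b"|")[0])

    # The index persists on disk, so a later run could call fetch() directly.
    record = fetch(data_path, index_path, b"222")
    print(record)
```

The alternate-key-with-duplicates scheme from the Perl snippet could be layered on top by storing a counter under the bare key and the offsets under `key|1`, `key|2`, and so on; that part is not shown here.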
Assorted Semi-boo-Boos:
(Sorry I can't be more thorough, but I'm pressed for time.)
During a quick read I noticed some statements that seemed a bit off, or maybe just confusing, and that could use either correction or clarification. For example, the section on contemporary use refers to several legacy products. Also, address books and SQLite are both given as examples of flat file storage. Address books are structured in various ways. Some are flat files, but others are not. As I recall, BBDB has some hierarchical structure, and vCard, which is used for persistence as well as data exchange, is flexible enough to adapt to various structures. A vCard file can be flat, but it can also be a graph. SQLite is a surprisingly full-featured SQL database that can be used to manage flat files but is capable of much, much more. I have a feeling that some of this stuff may be accidental artifacts of editing rather than misunderstanding, and that a once-over to clean up and clarify would probably help a lot. 20:35, 31 May 2015 (UTC) — Preceding unsigned comment added by Ericfluger (talk • contribs)
- Agreed. SQLite is backed by a file, but it does its own indexing and in-place updates on top of that. I will remove SQLite as an example if nobody objects soon. --Damian Yerrick (talk) 16:16, 12 June 2015 (UTC)
- SQL Server stores its data in one file. Should we list it as a flat file database as well? — Preceding unsigned comment added by 213.30.149.61 (talk) 08:21, 19 June 2015 (UTC)
I think SQLite should probably count as a flatfile database. According to SQLite's website, "[t]he complete state of an SQLite database is usually contained [in] a single file on disk called the 'main database file'" (The SQLite Database File Format). That's straight from the SQLite webpage. Claystu (talk) 16:55, 25 June 2015 (UTC)
- The lead section of this article states that "the file must be read in its entirety into the computer's memory" and that changes are made by modifying the data in memory and then writing the entire database "out in its entirety to the host's file system." Though an SQLite database file appears as a single binary file to the host operating system, SQLite modifies the file in place a block at a time and does its own on-disk indexing. The only time SQLite rewrites the whole file is during a VACUUM statement. --Damian Yerrick (talk) 15:19, 12 July 2015 (UTC)
litereplica creator here. SQLite is NOT a flat-file database. The data is stored in pages of fixed size using a B-tree and only the required pages are read to memory. When a transaction is completed only the modified pages are written to disk as well. So please remove this reference to SQLite as it is really wrong! — Preceding unsigned comment added by Kroggen (talk • contribs) 05:24, 27 August 2016 (UTC)
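The difference in update style discussed in this thread can be sketched as follows, in Python. The first function follows the lead section's description (read everything, change it in memory, write everything back); the second hands the change to SQLite through the standard `sqlite3` module. The file names, the `people` table and both function names are assumptions for illustration, and the code does not itself demonstrate which pages SQLite writes.

```python
import os
import sqlite3
import tempfile


def flat_file_update(path, key, new_value):
    """Read the entire file, change one record in memory, write it all back."""
    with open(path, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("|") for line in f]
    for row in rows:
        if row[0] == key:
            row[1] = new_value
    with open(path, "w", encoding="utf-8") as f:
        f.writelines("|".join(row) + "\n" for row in rows)


def sqlite_update(path, key, new_value):
    """Ask SQLite to change one row; SQLite decides what to write to disk."""
    con = sqlite3.connect(path)
    try:
        with con:  # commits the transaction on success
            con.execute("UPDATE people SET team = ? WHERE name = ?", (new_value, key))
    finally:
        con.close()


with tempfile.TemporaryDirectory() as tmp:
    flat_path = os.path.join(tmp, "people.txt")
    db_path = os.path.join(tmp, "people.db")

    with open(flat_path, "w", encoding="utf-8") as f:
        f.write("Amy|Blues\nBob|Reds\n")
    flat_file_update(flat_path, "Bob", "Blues")
    with open(flat_path, encoding="utf-8") as f:
        flat_result = f.read()

    con = sqlite3.connect(db_path)
    with con:
        con.execute("CREATE TABLE people (name TEXT PRIMARY KEY, team TEXT)")
        con.executemany("INSERT INTO people VALUES (?, ?)",
                        [("Amy", "Blues"), ("Bob", "Reds")])
    con.close()
    sqlite_update(db_path, "Bob", "Blues")
    con = sqlite3.connect(db_path)
    sqlite_result = con.execute("SELECT name, team FROM people ORDER BY name").fetchall()
    con.close()

print(flat_result, sqlite_result)
```

Both paths end with the same logical data; the argument in this thread is about how the bytes on disk get there.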
fixed width
"Simple" things should not always be taken for granted. More could be written about so called "fixed-width formatted" files (maybe in its own article). — Preceding unsigned comment added by 2001:6B0:E:4B42:0:0:0:206 (talk) 16:29, 28 June 2016 (UTC)
External links modified
Hello fellow Wikipedians,
I have just modified one external link on Flat file database. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
- Added archive https://web.archive.org/web/20090320001015/http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf to http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf
When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).
Cheers.—InternetArchiveBot (Report bug) 18:05, 21 July 2016 (UTC)
Don't have to be in memory, can be relational, more than a single file
Flat file databases don't have to be read into memory and can meet the base level definition of relational. /rdb, RDB and Nosql, just to name a few, use text files and support at least join, project and select. They're programmed in perl and/or awk plus Unix utilities. There are others written in PHP, Ruby, etc.
Is a single file with rows and columns a database? I've always had trouble with the idea that something like an address book is a database.
Anyway, this article really needs to be reworked unless there's a separate one for relational text databases. Jhart (talk) 23:33, 21 September 2016 (UTC)
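As a small illustration of the "don't have to be read into memory" point, here is a sketch in Python (not the perl/awk of the tools named above) of a select and project that stream over a text table one line at a time. The tab delimiter, sample data and function names are assumptions for illustration and do not reflect how /rdb, RDB or Nosql are implemented.

```python
import io


def select(lines, predicate, sep="\t"):
    """Yield matching rows one at a time; only the current line is held in memory."""
    header_line = next(lines, None)
    if header_line is None:
        return  # empty input: no header, no rows
    header = header_line.rstrip("\n").split(sep)
    for line in lines:
        row = dict(zip(header, line.rstrip("\n").split(sep)))
        if predicate(row):
            yield row


def project(rows, columns):
    """Yield each row reduced to the named columns, again one at a time."""
    for row in rows:
        yield {c: row[c] for c in columns}


# io.StringIO stands in for an open text file; any line iterator would do.
source = io.StringIO("name\tteam\nAmy\tBlues\nBob\tReds\nChuck\tBlues\n")
result = list(project(select(source, lambda r: r["team"] == "Blues"), ["name"]))
print(result)
```

Because both functions are generators, they can be chained like a Unix pipeline, and the file is never loaded as a whole.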