
Wikipedia:Articles for deletion/Very large database

From Wikipedia, the free encyclopedia
The following discussion is an archived debate of the proposed deletion of the article below. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.

The result was keep. Merging, if desirable, can be discussed on the talk page. -- Patar knight - chat/contributions 03:46, 8 October 2018 (UTC)

Very large database

This is just a dictionary definition: ‘a VLDB is a very large database, but we cannot say how large is very large’ Dirk Beetstra T C 03:46, 28 September 2018 (UTC)

Note: This discussion has been included in the list of Computing-related deletion discussions. North America1000 06:01, 28 September 2018 (UTC)
  • Delete Merge to Big data#Architecture. Not notable; NOTDICT. I don't see the benefit of a merge, as there's very little useful writing here. Enterprisey (talk!) 07:58, 28 September 2018 (UTC)
    Struck NOTDICT after a bit of thought. I stand by my "delete" vote: I agree with Dirk that we could also merge this into Database. I cannot see anything useful that can be written here that wouldn't also belong in the Database article. Enterprisey (talk!) 22:19, 28 September 2018 (UTC)
    Changed to Merge, per the good points made below. Enterprisey (talk!) 00:09, 30 September 2018 (UTC)
  • Keep The topic is notable; for example, here's a book about one aspect of it: Mining Very Large Databases with Parallel Processing. To understand the general nature of this topic, see Building the biggest scientific databases. Notice that this uses synonyms such as "biggest" and "extremely large", so we should include those terms too. This demonstrates that the dictionary definition argument for deletion is nonsense. We are not dealing with a particular word here and there is no dictionary content – etymology, grammar and the like. Per WP:DICDEF, what we have here is a stub for a broad concept. Note that the page used to be larger but much of the content was split to form the VLDB page. That's about an annual conference which is devoted to this topic – the 45th will be next year in Los Angeles. They publish their proceedings each year and so that's another stack of potential sources. As it happens, I shall myself be attending a Wikimedia event tomorrow in Cambridge organised by Charles Matthews. This will discuss data-mining scientific sources and how Wikidata might help with this. I shall cite this discussion to demonstrate how we have our work cut out for us because Wikipedia's coverage of such topics is feeble. Thanks for the timely example. Andrew D. (talk) 17:57, 28 September 2018 (UTC)
    • @Andrew Davidson: So you can give me a definition of a very large database? Per the article, it is a database with a lot of records/data (well .. dôh). When does a database qualify as a very large database, and when is a large database not very large anymore? I understand that at some point a db becomes difficult to handle (though that depends on the storage methodology and on what one wants to get out of it). Compare it to a very large dictionary .. whether it contains 1,000 words or 1,000,000,000 words, you'll be able to find 'database' quickly knowing how it is sorted. It is just that you need a bigger shelf and a stronger table.
    (yes, I know, the article was split, the conference is a notable article in itself. And I can see how there would be a conference on how to shelve something big, and how to organize it, but I don't think the subject discussed there warrants its own article). --Dirk Beetstra T C 18:40, 28 September 2018 (UTC)
    Looking at a PDF of the first book, I see nothing that can be used in this article outside of trivial statements such as how running standard algorithms on large data sets takes a long time. The article originally from SLAC Today is similarly uninstructive, and I would argue that neither provide coverage above the passing level. The article alludes to a "Very Large Database conference", from which sources may be found, but in the absence of those I don't think there are enough sources to support an article. Enterprisey (talk!) 19:11, 28 September 2018 (UTC)
Note: This debate has been included in the Article Rescue Squadron's list of content for rescue consideration. Andrew D. (talk) 21:07, 28 September 2018 (UTC)
  • Delete (Came here from ARS.) Dicdef. The fact that one can find sources "about this topic" is meaningless, as the same could be said about all sorts of words and phrases, and we are not a dictionary. The veiled call to engage in SYNTH above ("broad topic") is reminiscent of Andrew's comments for which he's been called out in other AFDs here and here, for example. Hijiri 88 (やや) 22:29, 28 September 2018 (UTC)
  • Comment. I'm not sure yet whether I'm a keep on this, but the criticism that there is no exact definition is very much a red herring. Nobody would argue that a supermarket is "just" a large grocery store and shouldn't have a separate article. But I doubt if there is any formal definition in square feet or number of products that define a supermarket. There is a definition in the article, it is one that contains so much data "that it requires special processing". It is easy to see that, beyond a certain size, simple or naive sorting and searching algorithms are going to fail. Or at least not return results in a sensible time. The exact borderline is going to be fuzzy, depend on application, user expectation, and it will be constantly shifting as technology improves. The existence of specific conferences for this is kind of compelling for notability. SpinningSpark 23:37, 28 September 2018 (UTC)
    • @Spinningspark: 'Large grocery stores that stock significant amounts of non-food products, such as clothing and household items, are called supermarkets.' ... a massive shop that only sells food items would still be a grocery (better: a very large grocery), not a supermarket. --Dirk Beetstra T C 00:27, 29 September 2018 (UTC)
      • Not sure if I go along with that definition (and it's not sourced). In Britain, supermarkets primarily sell groceries, and nobody says "grocery store" for anything. But in any case, that is tangential. My point remains valid even if my analogy is full of faults. SpinningSpark 00:42, 29 September 2018 (UTC)
          • @Spinningspark: I was indeed looking further myself, and don't know whether my comment holds further water. Coming from Dutch, a grocer ('groenteboer') was really distinctly different from a supermarket. But I am not sure where I would go to a grocery during my time in the UK. In the UK I would go to a minimart (a 'very small supermarket' by analogy), as I now can here in KSA. I was however surprised to see that that is a redirect to Superette (a word I have never seen before).
    My problem here is, as mentioned below, that the only distinction here is plain size .. I am trying to compare this with other contexts. Regular computer vs. supercomputer is not the same, grocery vs. supermarket does not feel the same, .. dictionary vs. very large dictionary (same concept, we do not do that). A VLDB needs other architecture .. but except for overkill there is no reason why a very small database would not have the same architecture. --Dirk Beetstra T C 04:41, 29 September 2018 (UTC)
  • Comment. My name having been invoked above, I note that Google Books shows numerous hits for VLDB. It does seem possible that the term is a buzzword, particularly as there are not so many recent hits. But it should also be noted that in the world of Moore's law, one would not expect the term "very large" to be exactly quantified: exponential growth doesn't work like that. There was a "Very Large Databases Conference" series that ran to at least 35 editions. Conclusion: it was a perfectly valid engineering concept at the time of the article's creation in 2005. [1] equates it to an "enterprise class database" primarily for big data. Actually that could usefully be written conversely: unless you need a VLDB you don't have big data. "VLDB" may now mean a 5 TB range [2]. My conclusion is that the article could well be redirected to Big data#Architecture. Charles Matthews (talk) 03:47, 29 September 2018 (UTC)
@Charles Matthews: I guess that my 'problem' here is the distinction between 'database' and 'very large database'. The only difference is plainly size, and it only becomes a VLDB when its users need to use techniques other than those they would use on a regular database (and a properly designed database may never become a VLDB ..). --Dirk Beetstra T C 04:41, 29 September 2018 (UTC)
<shrug> We are talking about engineering issues, and whether there is a qualitative difference between regular and "very large" databases. As a member of my family says, if your data fits on a laptop, it isn't "big data". If your database requires no special work for the application, it isn't "very large". We may be agreeing here. I don't see anything wrong if someone who searches here for "very large database" finds a historical section about big data. It's obviously a moving target. Charles Matthews (talk) 04:51, 29 September 2018 (UTC)
  • If the article actually had some details of the techniques and hardware used for VLDB with some helpful links to other Wikipedia articles or external papers then I would be at keep. If someone so expands the article during the course of this AfD then I will change to keep. But as it stands, I think our best option is Charles Matthews suggestion of a merge and redirect to Big data#Architecture. Certainly, Very large database and VLDB should lead somewhere on Wikipedia. Completely redlinking it would be bad. SpinningSpark 08:03, 29 September 2018 (UTC)
  • Keep - VLDB has had a 40+ year history before "Big Data" and usage only appears to decline in about 2005 (Google Ngrams, Google Trends). Some of the early history can start with RAND/ARPA VLDB program and the VLDB Conference. Burroughs specified a computer to handle VLDB. As mentioned by Spinningspark, seems like this article could include the major innovations to support VLDB, mining the conference and awards. Also, "big data" has its 3+ V's but it doesn't define how big they have to be to be considered "big data" instead of just "data". My (worthless) opinion is VLDB generally refers to the biggest DBs (like top500) and next-gen DBs. StrayBolt (talk) 00:15, 30 September 2018 (UTC)
  • Redirect this article is a (very bad) WP:DICTDEF with no other content. It is a "term of art" used in the industry and redirecting somewhere is appropriate. power~enwiki (π, ν) 04:22, 2 October 2018 (UTC)
  • Merge into Database. This doesn't work as a stand-alone article because of its stubbish WP:DICDEF nature, but including this in the Database article would improve it. Reyk YO! 07:22, 2 October 2018 (UTC)
  • Keep or Speedy keep: The VLDB article has potential for growth, and I expect that at some point it may receive a one-liner from the storage section of the Database article, and possibly from one other location in it. Offering some WP:ORIGINAL lines of thought: the special processing referred to may relate to conventional DBMS management techniques becoming stretched, so that specialist approaches and configurations are required to manage the scenario holistically. Technically, file size limits of 2 TB, 8 TB and 32 TB have been issues, as has the number of files a DBMS can support. Backup/recovery and maintenance will also become issues. All of these are best discussed in a stand-alone article. I know Oracle used the VLDB term a lot, but I am not so sure about other vendors. There have even been whole conferences on it in 2005 and 2018! I partly suspect that, due to some of the issues an RDBMS can have with large data volumes, and partly due to fashion, a 'Big data' approach rather than a 'VLDB' approach might be the currently preferred one. In any event, AfDs tend to be minimally disruptive for the nominator but can be disproportionately more so for a rescue or a non-minimalistic merge. And AfD is a bad place for discussing merges. In this case a Template:Missing information could have been a precursor step to going to AfD (not perfect, but okay). So speedy keep and adding the Missing information template is fine. Adding the computing WikiProject to the talk page would also have been useful. Djm-leighpark (talk) 11:24, 2 October 2018 (UTC)
  • Keep as of current state, which I understand to be a version that has seen substantial editing since the nomination - if the editor gets round to actually providing sources for the many many unsourced statements they added. Otherwise this is a cardboard effort. --Elmidae (talk · contribs) 12:54, 4 October 2018 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.