Talk:Entity–attribute–value model

How about the Drawbacks?

Maps very poorly to the relational model.

The primary objective of the schema is to map "real life" concepts (i.e., heterogeneous, sparsely populated attributes; what you'd find in a patient's chart) instead of coercing "real life" into a relational model. You can still leverage relational concepts through the methods used to pivot, join, and perform localized normalization. Cowbert (talk) 01:54, 2 May 2008 (UTC)

  • Does not normalize well.
  • Incurs overhead because it often requires data to be implicitly or explicitly converted to/from character format.
  • Many implementations waste space by providing several columns of different data types to allow for the different data types a value might be.
  • You have to manually re-create a lot of the functionality built into a relational database, such as check constraints and foreign key constraints.

That is why in production you spin off EAV tables based on data types. EAV/CR directly addresses this issue: either the class table or the attribute table (if you are not using classes) stores metadata about which EAV table holds that attribute's values. So you'd have EAV_int, EAV_real, EAV_char, EAV_date, etc. tables. When you enumerate the attributes for the entity, your business logic queries this metadata to determine which EAV table to query for the value (for example, patient_hemoglobin has a metadata column 'eav_table' pointing to eav_real). Also, the constraints are now primarily metadata-driven, instead of defined in the schema itself. Again, the primary goal is to avoid having to coerce a certain representational system into a strict relational schema, whereas the arguments that seem to be made are that "we must stick with a relational schema" even though the whole point of the exercise is that a column-modeled schema poorly supports large and/or dynamic numbers of sparsely populated attributes (I can do a SELECT attribid, value FROM eavtable WHERE entityid = foo vs. tossing all the null fields if I have a table where the attributes are column-modeled [or worse, multiple tables with tons of sparse columns each that I'm going to have to index]). Finally, it has been admitted that the main difficulty more or less lies with attribute-centric queries (i.e., I *do* want to know every patient who is running a fever today, which brings up the necessity of pivoting the attribute over into a columnar model or doing some self-joins). Cowbert (talk) 01:54, 2 May 2008 (UTC)
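
A minimal sketch of the metadata-driven dispatch described above; the attribute_metadata table and its columns are illustrative, not from any specific implementation:

-- Step 1: look up which typed EAV table holds this attribute's values.
SELECT eav_table FROM attribute_metadata WHERE attrib_name = 'patient_hemoglobin';
-- returns 'eav_real'

-- Step 2: the business logic substitutes that table name into the value query.
SELECT attribid, value FROM eav_real WHERE entityid = 42;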

The fact that vendors impose "limitations" and design databases poorly is not an advantage. By the way, the example given in the article is very EASILY converted to a properly normalized relational model. Perhaps someone should come up with a better example that is NOT easily converted to a normalized relational model? As an example:

Patient Table:
[Primary Key is (Patient_ID)]
Patient_ID
Patient_Name


Patient_Test Table:
[Primary Key is (Patient_ID, Test_ID)]
Patient_ID
Test_ID
Date
Test_Type
Quantity
Unit_of_Measure

Test_Type would be related to another table containing the various tests that could be performed, and Unit_of_Measure could relate to another table with UOM identifiers and possibly a conversion factor for a common UOM.
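
For illustration, a minimal sketch of those two lookup tables; the names and surrogate keys are assumptions, since the comment does not specify them:

CREATE TABLE Test_Type (
    Test_Type_ID INT PRIMARY KEY,
    Test_Name    VARCHAR(50)
);

CREATE TABLE Unit_Of_Measure (
    UOM_ID            INT PRIMARY KEY,
    UOM_Label         VARCHAR(20),
    Conversion_Factor REAL  -- multiplier to convert this unit into a common UOM
);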

Patient
Patient_ID | Patient_Name
1          | Patient XYZ

Patient_Test
Patient_ID | Test_ID | Date     | Test_Type  | Quantity | Unit_of_Measure
1          | 1       | 1/5/1998 | Hemoglobin | 12.5     | gm/dl
1          | 2       | 1/5/1998 | Potassium  | 4.9      | Meq/L


This is what the normalized EAV schema of the above would look like:

-- One row per patient.
CREATE TABLE patient_table (
    patientid  INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name  VARCHAR(50),
    dob        DATE
    -- ...
);

-- One row per clinical event (a test performed on a patient at a point in time).
CREATE TABLE test_events (
    eventid    INT PRIMARY KEY,
    patientid  INT REFERENCES patient_table (patientid),
    event_date DATE
);

-- Maps each attribute datatype to the name of the EAV table holding its values.
CREATE TABLE eav_types (
    datatype       INT PRIMARY KEY,
    eav_table_name VARCHAR(30)
);

-- Attribute metadata: one row per test type, pointing at its value table.
CREATE TABLE test_metadata (
    testid               INT PRIMARY KEY,
    test_type            VARCHAR(50),
    test_datatype        INT REFERENCES eav_types (datatype),
    test_unit_of_measure VARCHAR(20)
);

-- Values for real-valued attributes; eav_int, eav_char, eav_date, etc. would look the same.
CREATE TABLE eav_real (
    eventid INT REFERENCES test_events (eventid),
    testid  INT REFERENCES test_metadata (testid),
    value   REAL
);

OK. So now for the data and metadata:

-- Patient (remaining column values elided):
INSERT INTO patient_table (patientid, first_name) VALUES (1, ...);

INSERT INTO test_events (eventid, patientid, event_date) VALUES (1, 1, '1998-01-05');
INSERT INTO test_events (eventid, patientid, event_date) VALUES (2, 1, '1998-01-05');

INSERT INTO eav_types (datatype, eav_table_name) VALUES (1, 'eav_real');

INSERT INTO test_metadata (testid, test_type, test_datatype, test_unit_of_measure)
    VALUES (1, 'hemoglobin', 1, 'g/dL');
INSERT INTO test_metadata (testid, test_type, test_datatype, test_unit_of_measure)
    VALUES (2, 'potassium', 1, 'meq/L');

INSERT INTO eav_real (eventid, testid, value) VALUES (1, 1, 12.5);
INSERT INTO eav_real (eventid, testid, value) VALUES (2, 2, 4.9);

Cowbert (talk) 01:54, 2 May 2008 (UTC)
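
With the schema and rows above in place, an entity-centric retrieval (all results for patient 1) is a straightforward three-way join. A minimal sketch using only the tables defined above:

SELECT e.event_date, m.test_type, v.value, m.test_unit_of_measure
FROM test_events e
JOIN eav_real v ON v.eventid = e.eventid
JOIN test_metadata m ON m.testid = v.testid
WHERE e.patientid = 1;
-- 1998-01-05 | hemoglobin | 12.5 | g/dL
-- 1998-01-05 | potassium  |  4.9 | meq/L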

Downsides

Seem like good points, but should be written in 3rd person, and should be sourced! —Dfass 05:48, 24 December 2006 (UTC)

I agree. It is self-referential (referring to a previous version of the same article) in a way that is inappropriate. It also seems that many of the points are already discussed in other sections, and the information in this section could be integrated there. Nimrand 21:04, 1 June 2007 (UTC)
Just wanted to pop in and agree with the disappointment with the Downsides section. It feels abundantly unprofessional and not up to the calibre I would normally expect from Wikipedia's coverage of this topic. More specifically, its placement, wording, and lack of sourcing make it sound like regressive sniping. Write some published papers on this and then cite them; until then (IMO) it shouldn't even be here...


Drawback number 2 (scalability for large datasets) can be greatly mitigated by using two queries to retrieve the data, as sketched after the list below.

  • The 1st query does the filtering and sorting of the records: here you self-join ONLY the attributes that are used in the filters and the ORDER BY. The ordered result of this query creates an empty array with placeholders for the entities.
  • The 2nd query is a regular SELECT (or a UNION when the attributes are split across several tables by type), filtered by the entity ids received from the 1st query. The result of this query is fetched into the array created from the 1st query, which keeps the result correctly sorted.
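
A minimal sketch of this two-query approach. The eav table and its columns (entity_id, attrib_id, value) are assumed for illustration; attribute 1 is the filter and attribute 2 the sort key:

-- Query 1: self-join only the attributes used for filtering and sorting,
-- returning nothing but the ordered entity ids.
SELECT f.entity_id
FROM eav f
JOIN eav s ON s.entity_id = f.entity_id AND s.attrib_id = 2
WHERE f.attrib_id = 1 AND f.value = 'wanted'
ORDER BY s.value;

-- Query 2: fetch every attribute for just those entities. The application
-- slots each row into the placeholder array built from query 1, so the
-- final result keeps the sort order.
SELECT entity_id, attrib_id, value
FROM eav
WHERE entity_id IN (...);  -- the ids returned by query 1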


I've removed the downsides section entirely for now. The article needs a downsides section, to be sure, but the one it had was written in a personal, opinionated, and unprofessional manner. I think it's better to leave it out until someone can rewrite it without original research.--Fyedernoggersnodden (talk) 01:36, 22 March 2010 (UTC)

Accuracy of this article

The article makes a lot of unsubstantiated claims (which may or may not be true), many of which seem to stem from the desire to constantly compare EAV to traditional/flat/SQL databases. We can leave them tagged as {{fact}} for a while, but there are a lot of them, and I was hoping someone would provide sources or just remove them.

Perhaps we could split the article into an article about EAV, and a separate article comparing it to other models. In other words, we might consider separating the factual and useful information about EAV from the subjective and oft-disputed comparisons to other models.

rename

Rename to Entity-Attribute-Value model. Proper nouns in this context, always capitalised. Andy Dingley (talk) 18:32, 25 July 2008 (UTC)

I agree. A simple Google search for entity attribute value will return items in which all three are capitalized. 76.120.35.164 (talk) 06:40, 2 September 2008 (UTC)

Biased

This article seems awfully biased towards a specific domain (medicine) and specific works/people.

It's like an example has overtaken the whole article...

Tail wagging the dog syndrome. —Preceding unsigned comment added by 173.15.94.125 (talk) 15:30, 8 October 2009 (UTC)

Because the first commercial application of it was in that domain, and it is the primary model used for data requirements within that domain. Feel free to talk about it in a non-medical domain with real examples if you have any (AFAIK, I have not seen EAV papers on, or implementations for, GIS or operational-management domains "out in the field"; many domains now directly use ORM-based strategies and abstract out the schema entirely...) Cowbert (talk) 13:06, 15 October 2009 (UTC)
The Magento e-commerce system would be a good source of non-medical examples. It uses an EAV model to store very complex and variable data. —Preceding unsigned comment added by 158.158.240.230 (talk) 14:06, 19 October 2009 (UTC)
The article appears to examine one implementation of EAV systems and point out its flaws. The Large Record and repeating-group models used in many modern Model 204 implementations remove the requirement for large numbers of joins and allow for a very efficient method of recording large datasets of time-related data, requiring one group instance (housing only the actually changing data) instead of a complete, separately related table entry for each new datum. This results in a net reduction in the number of joins required to retrieve data. —Preceding unsigned comment added by 115.187.253.93 (talk) 13:49, 7 March 2010 (UTC)

The last bits are also biased too much towards Microsoft solutions and, which is even worse, relational database solutions. There is a multitude of software today that offers sparse schemas and would solve those problems in no time, but of course it will take a few years until the industry fully embraces it. --200.139.90.98 (talk) 20:05, 25 June 2010 (UTC)

POV / Metaphor

I tagged this as "POV" because the following two phrases just don't belong in a serious discussion of technical concepts, but express a particular author/editor's point of view, and clearly fail the "WP:SOAP" test:

Metadata is so important to the operation of a production EAV system that in its absence, one is essentially trying to drive a car without brakes or a steering wheel.

The correctness of the metadata contents, in terms of the intended system behavior, is critical enough that the contents constitute the system's "crown jewels".

TML (talk) 06:40, 26 March 2010 (UTC)


The offending sentences have been removed, since the point they emphasized was already stated earlier in the text, and I agree there is no need for additional dramatics. However, to argue that the criticality of metadata to the operation of an EAV system is a POV is like asserting that the moon is made of green cheese. The schemas of production EMRs (EpicCare, Cerner) contain a very large metadata sub-schema for the very purpose of ensuring consistency. Please also take the trouble to inspect the design of the Department of Veterans Affairs' freely available VistA before jumping to the POV conclusion.

Finally, I quote from the XML chapter of Ben-Gan et al.'s "SQL Server 2008 T-SQL Programming" (Microsoft Press); this book is cited in the article. The author of this chapter, Dejan Sarka, is NOT a particular fan of EAV, and points out that the *drawback* of an EAV design is the need to create a metadata infrastructure. To quote:

"The so-called open schema solution is quite popular. In this solution, you have the main Products table with common attributes. Then you add an Entity-Attribute-Value (EAV) table, which has three columns: product ID, attribute name, and attribute value. This is a completely dynamic solution, and is quite simple. However, with this solution you cannot easily use declarative constraints. With a single EAV table, the attribute value column is either character or variant data type. In addition, you have to prevent the insertion of meaningless attributes, or binding attributes to products for which the attributes are meaningless. You can create an EAV table for every data type of attributes you need to solve the data type problems. You can add an additional table that holds valid combinations of products and attributes. You can add additional constraints using triggers, middle tier code, or client code.

"Whatever you do, your solution is not that simple anymore. Your database design does not reflect the business problem anymore; you do not design a database from logical point of view, you design it from a physical point of view. Because of this last fact, I would hardly call this solution relational. Note that I am not saying that if you use such a solution, you have to modify it. In the past, this was probably the best option you had if you needed dynamic schema. However, nowadays I prefer the solution I am going to introduce at the end of this section."

(He goes on to introduce an alternative, based on XML, which is discussed subsequently in the Wikipedia article, in the section "Alternative XML")
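
For concreteness, the three-column open-schema table described in the quote amounts to something like the following; the table and column names are assumed (only the three columns come from the quote), and SQL_VARIANT is SQL Server's variant type:

-- The "main Products table" with common attributes (sketch).
CREATE TABLE Products (
    ProductID   INT PRIMARY KEY,
    ProductName VARCHAR(100)
);

-- The dynamic part: one row per (product, attribute) pair.
CREATE TABLE ProductAttributes (
    ProductID      INT REFERENCES Products (ProductID),
    AttributeName  VARCHAR(100),
    AttributeValue SQL_VARIANT  -- character or variant type, per the quote
);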


Prakash Nadkarni (talk) 06:40, 9 April 2010 (UTC)