Talk:Entity–attribute–value model

How about the Drawbacks?

Maps very poorly to the relational model,

The primary objective of the schema is to map "real life" concepts (i.e. heterogenous sparsely populated attributes; what you'd find in a patient's chart) instead of coercing "real life" into a relational model. You are still able to leverage relational concepts because of the methods you need to use to pivot, join, as well as perform localized normalization. Cowbert (talk) 01:54, 2 May 2008 (UTC)[reply]

Does not normalize well, Incurs overhead because it often requires data be implicitly or explicitly converted to/from character format, Many implementations waste space by providing several columns of different data types to allow for the different data types a value might be, You have to manually re-create a lot of the functionality built-in to a relational database, such as check constraints and foreign key constraints.

That is why in production you spin off EAV tables based on data types. EAV/CR directly addresses this issue: either the class table or the attribute table (if you are not using classes) stores metadata about which EAV table has all of those attribute's values. So you'd have an EAV_int, EAV_real, EAV_char, EAV_date etc. tables. When you enumerate the attribute for the entity, your business logic queries this metadata to determine which eav table to query for the value (example, patient_hemoglobin has a metadata column of 'eav_table' pointing to eav_real). Also, the constraints are now primarily metadata driven, instead of defined in the schema itself - again the primary goal is avoid having to coerce a certain representational system into a strict relational schema, where as the arguments that seem to be made are that "we must stick with relational schema" even though the whole point of the exercise is that a column-moduled schema poorly supports large and/or dynamic numbers of sparsely populated attributes (I can do a SELECT attribid, value FROM eavtable where entityid = foo vs. tossing all the null fields if I have a table where the attributes are column modeled [or worse, multiple tables of tons of sparse columns each that I'm going to have to index]). Finally, it has been admitted that the main difficulty more or less lies with attribute-centric queries (i.e. I *do* want to know every patient who is running a fever today, which brings up the necessity to pivot the attribute over into a columnar model or do some self-joins)) Cowbert (talk) 01:54, 2 May 2008 (UTC)[reply]

The fact that vendors impose "limitations" and design databases poorly is not an advantage. By the way, the example given in the article is very EASILY converted to a properly normalized relational model. Perhaps someone should come up with a better example that is NOT easily converted to a normalized relational model? As an example:

Patient Table:

[Primary Key is (Patient_ID)]

Patient_ID

Patient_Name

Patient_Test Table:

[Primary Key is (Patient_ID, Test_ID)]

Patient_ID

Test_ID

Date

Test_Type

Quantity

Unit_of_Measure

Test_Type would be related to another table containing the various tests that could be performed, and Unit_of_Measure could relate to another table with UOM identifiers and possibly a conversion factor for a common UOM.

Patient
Patient_ID	Patient_Name
1	Patient XYZ

Patient_Test
Patient_ID	Test_ID	Date	Test_Type	Quantity	Unit_of_Measure
1	1	1/5/1998	Hemoglobin	12.5	gm/dl
1	2	1/5/1998	Potassium	4.9	Meq/L

This is how the normalized eav schema of the above would look like:


table test_events {

eventid integer primary key,

patientid integer foreign key references patient_table(patientid),

timestamp date

}



test_metadata {

testid int pkey,

test_type varchar(),

test_datatype int fkey references eav_types,

test_unit_of_measure varchar()

}



eav_types {

datatype int pkey,

eav_table_name varchar()

}



patient_table {

patientid int pkey,

first_name varchar,

last_name varchar,

dob date

....

}



eav_real {

eventid int fkey references test_events(eventid),

testid int fkey references test_metatdata(testid),

value real

}



ok. So now for the data/meta:



patient_table VALUES { patientid = 1, first_name =... } 
 
patient_event { eventid = 1, patientid = 1, date = 1/5/1998 }
patient_event { eventid = 2, patientid = 1, date = 1/5/1998 }


test_metadata { testid = 1, test_type = 'hemoglobin', test_datatype = 1, test_unit_of_measure = 'g/dL'}
test_metadata { testid = 2, test_type = 'potassium', test_datatype = 1, test_unit_of_measure = 'meq/L'}


eav_types { datatype = 1, eav_table_name = 'eav_real'}


eav_real { eventid = 1, testid = 1, value = 12.5 }
eav_real { eventid = 2, testid = 2, value = 4.9 }

 Cowbert (talk) 01:54, 2 May 2008 (UTC)
[reply]

Downsides

Seem like good points, but should be written in 3rd person, and should be sourced! —Dfass 05:48, 24 December 2006 (UTC)[reply]

I agree. It is self-referential (referring to a previous version of the same article) in a way that is inappropriate. It also seems that many of the points are already discussed in other sections, and the information in this section could be integrated there. Nimrand 21:04, 1 June 2007 (UTC)[reply]

Just wanted to pop in and agree with the disappointment /w the Downsides section. It feels abundantly unprofessional and not up to the calibre I would normally expect from coverage of this topic from Wikipedia. More specifically, its placement, wording, and lack of sourcing make it sound like regressive sniping. Write some published papers on this and then cite it, until then (IMO) it shouldn't even be here...

The drawback number 2 (scalability for large datasets) can be greatly improved by using 2 queries for retrieving the data.

1st is for filtering and sorting the records, here you would self join ONLY attributes that are used in filters and order by. Ordered result of this query creates empty array with placeholders for entities.
2nd is regular SELECT (or UNION when attributes are splited to few tables by type) that is filtered by entity id, which were received from 1st query. Result from this query is being fetched into array created by 1st query, which makes the result correctly sorted.

Accuracy of this article

The article makes a lot of unsubstantiated claims (which may or may not be true), many of which seem to stem from the desire to constantly compare EAV to traditional/flat/SQL databases. We can leave them tagged as {{fact}} for a while, but there's a lot of it and I was hoping someone would provide sources or just remove it.

Perhaps we could split the article into an article about EAV, and a separate article comparing it to other models. In other words, we might consider separating the factual and useful information about EAV from the subjective and oft-disputed comparisons to other models.

rename

Rename to Entity-Attribute-Value model. Proper nouns in this context, always capitalised. Andy Dingley (talk) 18:32, 25 July 2008 (UTC)[reply]

I agree. A simple Google search for entity attribute value will return items in which all three are capitalized. 76.120.35.164 (talk) 06:40, 2 September 2008 (UTC)[reply]

Biased

This article seems awfully biased towards specific domain (medicine) and specific works/people.

It's like an example has over-taken the whole article...

Tail wagging the dog syndrome. —Preceding unsigned comment added by 173.15.94.125 (talk) 15:30, 8 October 2009 (UTC)[reply]

Because the first commercial application of it was in that domain and is the primary model used for data requirements within that domain. Feel free to talk about it in a non-medical domain with real examples if you have any (AFAIK, I have not seen EAV papers on or implementations for GIS or operational management domains "out in the field"; many domains now directly use ORM-based strategies and completely abstract out the schema entirely...) Cowbert (talk) 13:06, 15 October 2009 (UTC)[reply]

The Magento e-commerce system would be a good source of non-medical examples. It uses a very EAV to store very complex and variable data. —Preceding unsigned comment added by 158.158.240.230 (talk) 14:06, 19 October 2009 (UTC)[reply]

The article appears to examine one implementation of EAV systems, and point out its flaws. The Large Record and repeating group models used in many modern Model 204 implementations removes requirement for large numbers of joins and allows for a very efficent method to record large datasets of time related data, requiring one group instance (only to house the actually changing data) instead of a complete, separately related table entry for each new datum. This results in a net reduction in the number of joins required to retrieve data.