User talk:Undermedia: Difference between revisions
::: For #2, the weight is only used when calculating the trendline, and that election data isn't part of the calculation. The election data points from there are plotted separately as two points (dot and diamond) on lines 170 and 171. For the trendline, the election data's "sample size" is added on line 70 (this is hard to read, but gsub is substituting any blanks in sample size, and the election is the only line in the table with a blank, so effectively it is just entering a really large sample size (99999999) for the election). For #3, if we don't want the confidence band to get narrower and smoother as we add more data, then we need to slowly lower the alpha/span value as we add more data. We could probably just put this in the code. For example, currently we have about 80 datapoints, with a `span` of 0.35, so each point on the trendline is using the nearest 0.35 * 80 = 28 datapoints. If we wanted to hold constant the nearest 28 datapoints (or whatever value we choose), we could set span = 28/numDataPoints. Does that make sense? [[User:Galneweinhaw|galneweinhaw]] ([[User talk:Galneweinhaw|talk]]) 22:41, 8 May 2017 (UTC)
:::: Interesting info on the meaning of the confidence band of a LOESS regression: https://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression/82632
::::: Your proposal to set span = x/numDataPoints sounds clever. If anything, I might suggest we set a larger number than 28; quickly assessing the graphs for the 2011 and 2015 elections with their respective α values and total number of polls, it looks like the trendlines for the 2011 graph were using roughly 35 polls while the ones for 2015 were using roughly 45 polls. Plus, the one editor of the 43rd election page who has so far commented on the new graph appears to be lamenting that the trendlines seem too sensitive to a recent poll showing a tie for the lead when other recent pollsters have been showing a larger gap. I'll play around with different numbers to assess the effect on the trendlines and propose some "constant"; I'm thinking probably somewhere around 30–35 polls. I do realize, however, that this constant should probably generally be lower for graphs of pre-campaign periods than for those of campaign periods, since polls are released much less frequently during the former and you don't want to give too much weight to polls that are distant in time. [[User:Undermedia|Undermedia]] ([[User talk:Undermedia#top|talk]]) 00:41, 9 May 2017 (UTC)
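The span = x/numDataPoints idea discussed in this thread can be sketched in a few lines. This is only an illustration written in Python, not an excerpt from the R script; the function name `span_for_fixed_window` and the default of 30 polls per fit are placeholders chosen for the example.

```python
def span_for_fixed_window(num_data_points, polls_per_fit=30):
    """Return the LOESS span (alpha) that keeps each local fit on roughly
    `polls_per_fit` polls, however many polls are in the table.

    Hypothetical helper for illustration; not taken from the R script.
    """
    if num_data_points <= 0:
        raise ValueError("need at least one data point")
    # The span is a fraction of the data set, so it is capped at 1.
    return min(1.0, polls_per_fit / num_data_points)


# With about 80 polls, 28 polls per fit corresponds to a span of 0.35.
print(span_for_fixed_window(80, polls_per_fit=28))   # 0.35
# With twice as many polls, the span halves and the window stays at 28 polls.
print(span_for_fixed_window(160, polls_per_fit=28))  # 0.175
```

A lower target could be passed for pre-campaign graphs and a higher one for campaign graphs, in line with the concern about sparse polling raised above.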
== 43rd Canada election polling "lead" ==
Revision as of 00:42, 9 May 2017
Welcome
Welcome!
Hello, Undermedia, and welcome to Wikipedia! Thank you for your contributions. I hope you like Wikipedia and decide to stay. Here are some pages that you might find helpful:
- The five pillars of Wikipedia
- How to edit a page
- Help pages
- Tutorial
- How to write a great article
- Manual of Style
I hope you enjoy editing here and being a Wikipedian! Please sign your name on talk pages using four tildes (~~~~); this will automatically produce your name and the date. If you need help, check out Wikipedia:Questions, ask me on my talk page, or place {{helpme}} on your talk page and ask your question there. Again, welcome! - Ahunt (talk) 16:48, 25 September 2011 (UTC)
Barnstar
The Barnstar of Diligence
For exceptional diligence in tracking down polling results and their references for 42nd Canadian federal election. Great work! - Ahunt (talk) 21:33, 2 May 2012 (UTC)
Another Barnstar
The Tireless Contributor Barnstar
For diligent work in adding mountains of data on sample size and polling methods on Opinion polling in the 42nd Canadian federal election. - Ahunt (talk) 15:36, 2 August 2013 (UTC)
Barnstar
Thank you for the Barnstar! Glad my participation on the election articles is helpful. - Ahunt (talk) 18:36, 2 August 2013 (UTC)
Poll in Ontario
Hello, is this poll official enough to be listed in the poll section of the 41st Ontario general election article? I'm asking because you seem to know a lot about the opinion polls, and I haven't seen Ipolitics in the polling firms yet, so I am not sure if it is a reliable source. Merci beaucoup!
Votre pseudonyme ici (talk) 21:10, 9 April 2014 (UTC)
- Yup, that's an EKOS poll, and I see it's now been posted on the EKOS website. They most often do polls for iPolitics but also occasionally for other news outlets. Thanks for the heads-up! Undermedia (talk) 23:46, 9 April 2014 (UTC)
- All right perfect, no problem and thanks for adding it.
Votre pseudonyme ici (talk) 00:53, 10 April 2014 (UTC)
Libertarians? - Ontario election
I have started a discussion about the Libertarians in the candidate section of the Ontario election article, I thought you might want to join in. Me-123567-Me (talk) 04:40, 11 May 2014 (UTC)
Ontario general election, 2014 – Ipsos poll numbers
I undid your edit as the numbers you entered don't match the poll. I changed it back here to match the Ipsos figures as reported in the Press Release and the Detailed Tables. Also note that I removed the 2% figure listed for Green as they weren't even included in that poll, with Ipsos only showing Some others (including Green) polling at a combined 4%.
I note that in your edit comment you said “I've justified this presentation 3 times over and am yet to hear a single actual counterargument.”; but I looked and couldn't find any comments anywhere to justify the figures you came up with. Perhaps you can make your "presentation" once more, below; and, if I agree with you, I'll happily revert my undo of your edit myself.
Cheers — Who R you? Talk 02:06, 12 May 2014 (UTC)
- I had attempted to explain it in the edit summary field, but with limited space it perhaps wasn't so obvious or clear. Basically, given the difficulties Canadian pollsters have experienced in accurately calling a few recent elections with historically low voter turnouts, several of them have started reporting a second set of results among "likely voters" in addition to their regular results among all eligible decided voters. Incidentally, this has been standard practice among U.S. pollsters for many years. Canadian pollsters who have adopted this approach now include EKOS, Abacus Data, Ipsos Reid and Angus Reid. Thus, it has become the convention to show these "likely voters" results, when available, on Wiki election pages, including the recent BC (EKOS polls), NS (Abacus Data polls) and QC (EKOS, Ipsos Reid and Angus Reid polls) elections. In all cases, the "likely voters" results have proven to come closer to the actual election results than their corresponding "all voters" results, so it appears to be a sound practice. It is also worth noting that the renowned U.S. poll aggregator FiveThirtyEight bases its election projections on "likely voters" polling results, as well as Canadian spinoff ThreeHundredEight. For the Ipsos poll in question, the "likely voters" results I've been entering are given lower down in the linked-to text, under the heading "PC Voters Most Likely to Show Up to Vote". They are also given under the same heading in the PDF press release, as well as in the detailed tables under the column "COMMITTED". As for the Green results, that's a bit of a toss-up IMO, but technically you're correct, so I don't mind if we leave them out. -Undermedia (talk) 16:29, 12 May 2014 (UTC)
- Conversation continued on article's Talk page.
Meanwhile, I have to give you the warning below (but I'm not escalating it to the Administrator's noticeboard at this point)— Just following protocol that says I should warn you 1st… — Who R you? Talk 05:12, 14 May 2014 (UTC)
Hi Undermedia
I've reverted yet another of your changes in Ontario 2014 (discussion), that makes 2 or 3 I think; but I hope there's no hard feelings. I know from personal experience that being reverted sometimes stings a little. Just wanted to mention that I respect the obvious work & effort you've put in, there & elsewhere, particularly pertaining to Canadian elections. Hope you'll keep up the good work & that our minor differences of opinion on a few issues are viewed as just that, with both of us working towards a better WP. Cheers — Who R you? Talk 20:58, 22 May 2014 (UTC)
- Nah, there's never any hard feelings. I think I started taking for granted that I had assumed the de facto lead in updating and maintaining opinion polling sections on a bunch of Canadian election articles, so I'd gotten a little dictatorial about consistency in presentation and wasn't used to being challenged. But of course that's all part of the normal game on WP. Cheers. -Undermedia (talk) 21:13, 22 May 2014 (UTC)
Incorrect numbers in graphic
Not sure if you saw it, so I direct your attention to the comment I left here. Esn (talk) 06:37, 17 May 2014 (UTC)
- Thanks. The graph's been frozen since we started debating on the Ontario election Talk page whether to show "all voters" or "likely voters" results in the polling section. Right now it's showing the latter for the Ipsos poll, but I'll likely change it shortly if we don't reach a new consensus soon. You're also most welcome to join in on that discussion if you wish. -Undermedia (talk) 12:05, 17 May 2014 (UTC)
Another Barnstar
The Canada Barnstar of National Merit
As a Wikipedian and a Canadian voter, I thank you for your tireless and invaluable service. Shawn in Montreal (talk) 16:01, 21 August 2014 (UTC)
Latest Leger poll
Hi. Thanks again for your great work on the federal polling article. I believe we're still missing the latest Leger, which did include a federal component. See sidebar at right. Shawn in Montreal (talk) 13:50, 15 December 2014 (UTC)
- Good work spotting that Léger poll! I've added it. -Undermedia (talk) 13:59, 15 December 2014 (UTC)
Hi Undermedia. Thank you for the tireless work you have put in updating Canadian polling. Another poll from Leger came out on Feb 2 and is not yet included. - Preceding unsigned comment added by Mikemikem (talk • contribs) 19:35, 6 February 2015 (UTC)
42nd Canadian Federal Election Polling Graph
Hey there, I was wondering how to do a graph like yours as seen in the 42nd Canadian Federal Election Page.
I don't exactly know how to plot the polling data and the trend line as separate series.
Cheers if you can help, DestinationAlan (talk) 06:40, 26 February 2015 (UTC)
- So I'm guessing you got as far as creating a table in Excel with a column for each party and a row for each poll, identified by date, and then creating a "marked scatter" plot from those data? What you need to do next is add a trendline for each party, i.e. each data series. Not sure how this is done in Windows, if that's what you happen to have, but on Mac you can right-click on any individual data point from each of the parties and choose "Add trendline...". Then among the various trendline options, you can select "Moving average" and set the number of data points used to calculate the average. Hope this helps; let me know if you require more tips! -Undermedia (talk) 13:52, 26 February 2015 (UTC)
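As a rough illustration of what a "Moving average" trendline computes, here is a minimal Python sketch. Python is used only for illustration and is not part of Excel. The sketch assumes the trendline averages each poll with the polls immediately before it and starts once a full window is available; that is believed to match Excel's behaviour but is not confirmed in this thread. The function name and the sample numbers are made up.

```python
def moving_average(values, window):
    """Average each value with the `window - 1` values before it.

    Positions that do not yet have a full window are returned as None,
    so the result lines up one-to-one with the input polls.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)  # not enough earlier polls yet
        else:
            chunk = values[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages


# Made-up results for one party, oldest poll first.
party_results = [30, 33, 36, 33, 30]
print(moving_average(party_results, 3))  # [None, None, 33.0, 34.0, 33.0]
```

A larger window gives a smoother line that reacts more slowly to any single poll.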
Ah yes, I was playing around with that and got the moving average and trendline in. Though, how do you make a scatterplot with an x-axis that lists the dates? I keep on getting numbers on the x-axis. DestinationAlan (talk) 05:29, 27 February 2015 (UTC)
- Hi Alan. Make sure the cells in your date column are specifically formatted as date cells: select all the cells containing the dates and right-click, choose "Format Cells...", and in the "Number" tab under "Category", select "Date", then select one of the various date formats available and make sure all your dates are consistently typed out in this format. You may also have to specifically set your graph's horizontal axis to display dates: right-click on the axis and choose "Format Axis...", and in the "Scale" section under "Horizontal axis type", select "Date"; you'll then be able to set various options for minimum, maximum, base unit, interval, etc. Again, these instructions are for the Mac version of Excel and may differ on Windows. Cheers. -Undermedia (talk) 13:43, 27 February 2015 (UTC)
It's very similar on windows but Australian Polling tend to have different dates like "20-24 February 2015" or "September 2015" or "Sep-Oct 2011". How do I cater that to a general category? Also, I formatted the x-axis to do dates but it says 1901. How do I edit that? DestinationAlan (talk) 22:43, 27 February 2015 (UTC)
- I forgot to mention that in addition to setting the axis to "Date" in the "Scale" section of the "Format Axis" window, you should also go to the "Number" section and set that to "Date" as well; then just as your data table cells, you can select a format to display the dates in along the axis. For example, in my case the dates in my data table are typed in the "dd-mmm-yy" format, while the graph axis is set to display just "mmm-yy", with the major unit set to 1 month. I'm afraid that's about all the advice I've got. As for recording the dates in a consistent manner, I guess you'll just have to figure out something that works. Unless Australian pollsters simply aren't as scrupulous when it comes to disclosing their methodology, the exact field dates of a survey can usually be found somewhere in the detailed report on the poll or the news article reporting on it; that is the case for every single poll listed on the Canadian election polling page. -Undermedia (talk) 15:30, 28 February 2015 (UTC)
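One way to record mixed date labels consistently is to reduce each one to a single calendar date before typing it into the spreadsheet. The Python sketch below is only an illustration: the function name `poll_end_date` is hypothetical, and the choice to use the last field day (or the last day of the final month when no day is given) is an assumption, not a convention agreed on this page.

```python
import calendar
import re
from datetime import date

# English month names and their three-letter abbreviations.
MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    MONTHS[name] = number
    MONTHS[name[:3]] = number


def poll_end_date(label):
    """Reduce a label such as "20-24 February 2015" to one date.

    Assumption: the poll is dated by its last field day; if the label has
    no day number, the last day of the final month mentioned is used.
    An unrecognised month name raises KeyError.
    """
    tokens = re.findall(r"[A-Za-z]+|\d+", label)
    year = int(tokens[-1])
    words = [t.lower() for t in tokens[:-1] if t.isalpha()]
    days = [int(t) for t in tokens[:-1] if t.isdigit()]
    month = MONTHS[words[-1]]
    day = days[-1] if days else calendar.monthrange(year, month)[1]
    return date(year, month, day)


print(poll_end_date("20-24 February 2015").isoformat())  # 2015-02-24
print(poll_end_date("September 2015").isoformat())       # 2015-09-30
print(poll_end_date("Sep-Oct 2011").isoformat())         # 2011-10-31
```

The resulting dates can then be typed into the date-formatted column in whichever single format the spreadsheet uses.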
Apologies for not replying, been really busy! I haven't been able to look at my graphs in a while. I was about to ask you, would you be able to send the spreadsheet file somehow? Also, how do you do things "month by month", so for example, you have polling data for Jan-15, then Feb-15, then Mar-15, etc.? DestinationAlan (talk) 02:01, 25 March 2015 (UTC)
42nd Canadian Federal Election Polling
Yeesh. Hard to keep up with the subtly destructive changes to this chart. If you need a hand, let me know! Pinkville (talk) 00:30, 11 April 2015 (UTC)
Polls
Hey! First off, thanks for all the time and hard work you invest in maintaining the Canada opinion polls page. I feel that when a pollster releases a report using likely voters, we should use those numbers. That way, whenever somebody clicks a link to a PDF, they see that the wiki numbers match the top-line numbers of the press release. Pollsters work hard to be as accurate as possible, and Angus was the most accurate in the 2011 election according to 308.com. I feel we should respect their methods and use the top-line numbers they release. Mikemikem (talk) 16:03, 4 July 2015 (UTC)
- Yeah, I think I'm also leaning towards simply going with any given poll's top-line results; obviously seems most intuitive. The eligible/likely inconsistency is a bit bothersome though. In the last Ontario election we actually created two separate tables and then, ironically, likely voter results from the pollsters who released them all turned out to be farther from the actual election results than their respective plain eligible voter results. Maybe we could simply identify "EV" vs. "LV" results in the current table by putting in a footnote or something. I'll start a discussion soon. Cheers, Undermedia (talk) 16:28, 4 July 2015 (UTC)
Hi,
You appear to be eligible to vote in the current Arbitration Committee election. The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to enact binding solutions for disputes between editors, primarily related to serious behavioural issues that the community has been unable to resolve. This includes the ability to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail. If you wish to participate, you are welcome to review the candidates' statements and submit your choices on the voting page. For the Election committee, MediaWiki message delivery (talk) 16:52, 24 November 2015 (UTC)
ArbCom Elections 2016: Voting now open!
Hello, Undermedia. Voting in the 2016 Arbitration Committee elections is open from Monday, 00:00, 21 November through Sunday, 23:59, 4 December to all unblocked users who have registered an account before Wednesday, 00:00, 28 October 2016 and have made at least 150 mainspace edits before Sunday, 00:00, 1 November 2016.
The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.
If you wish to participate in the 2016 election, please review the candidates' statements and submit your choices on the voting page. MediaWiki message delivery (talk) 22:08, 21 November 2016 (UTC)
I agree that the major events lines may be too arbitrary, but I have seen it implemented elsewhere on Wikipedia. Can we perhaps reserve it for major-major events, like, for example, Mulcair's loss of the Leadership Review, and Trudeau's decision to abandon his electoral reform pledge? Kiteinthewind Leave a message! 05:09, 4 April 2017 (UTC)
- I suggest you start a topic on the Talk section of the opinion polling page to at least properly discuss this and try to reach a consensus with the other editors before overhauling the polling tables. The other issue is that we may at some point swap the graph for a more sophisticated one based on local regressions that factor in varying poll sample sizes (like this one), which automatically reads the data directly from the table using some R code, and breaking the table up by year is likely to prevent it from working, or at the very least significantly complicate the coding. So I'm going to reverse those changes as well, and again suggest that you work towards a consensus in the Talk section if you believe the changes are really worthwhile. A handful of dedicated editors have worked together over several years (going back several elections) to fashion the opinion polling page into its current state, so it is only fair to seek their consensus for major formatting changes like this. Cheers, Undermedia (talk) 12:50, 4 April 2017 (UTC)
Generating plots with R
If you're interested in trying to get my code working, the easiest way would be to install RStudio. You should just be able to run my code turn-key without any problems, and it should produce the plot, then save it as SVG. If you get it installed, let me know and I'll walk you through how to run the code. It would be good to have two of us available to update the plots, especially during the polling madness of the election period, and you are much more active here than I am. galneweinhaw (talk) 19:43, 30 April 2017 (UTC)
- Thanks, galneweinhaw. I've got it installed. What do we do next? I've noticed you've put the finishing touches on the 43rd federal election pre-campaign graph. Looks great. Since nobody else has weighed in on the Talk page discussion, I might just go ahead and trial the new graph alongside the current one on the main article page to get people's attention, and invite them to go to the Talk page to help decide whether we adopt it. Cheers, Undermedia (talk) 16:14, 3 May 2017 (UTC)
- OK, see if you can get this to work. I'm probably using a different version of R studio (0.99.467) and a different OS (Linux Mint 17.3) than you, but hopefully this will work:
- File > New Project (choose "New Directory", then "Empty Project", then browse to where you want to save the project and give it a name). The project should open once it has been created.
- File > R Script. You should get an "untitled1" text file open up.
- Copy all the code from here and paste it into your empty untitled1 doc.
- In the top right of the doc/script, you should see three buttons, click on "Source" which will run the script from top to bottom. You may see some errors/messages in the console below.
- If everything works out, you should see the plot appear. You can resize the windows to get the plot the size you want it to be.
- At the top of the chart, click Export > Save as Image. Give it a name and it will save to your project directory. You probably want SVG format.
- Let me know how it goes! galneweinhaw (talk) 05:56, 7 May 2017 (UTC)
- Thanks for the instructions, galneweinhaw! The GUI of the version I installed (R 3.4.0 GUI 1.70 El Capitan build 7338 for macOS) indeed appears to be very different from the one you have, but after some wrestling around with it, I think I'm now almost set. I was able to save the code as a "Source File", and by going to File menu > "Source File..." and opening the saved file, it appears to run the script from top to bottom as you described. At first, the console kept telling me that this or that package required to run the script was missing, so I used the "Package Installer" dialog box to successively search for and install each required package until it no longer reported any missing package errors. Now when I run it a window titled "Quartz 2 [*]" appears displaying the plot, with square dimensions by default. I can indeed manually click and drag to resize the window and alter the plot's dimensions, but I'm wondering if there's a way to specify the dimensions directly in the code so that it automatically/consistently displays at the desired dimensions each time the code is re-run to add new polls? The other thing is that I appear to only have the option to save the plot as a PDF (using the general "File menu > Save As..." command; there is no option to export in the chart window itself, just a title bar and the plot). Is an additional package required to add image exporting functionalities? If I can get these couple of bits figured out, then I'll likely have a few questions for you about the particularities of the code itself, but overall I think I'm very close to having it all set up. Thanks again for all your help! A lot of work has obviously gone into writing this code. Cheers, Undermedia (talk) 16:29, 7 May 2017 (UTC)
- Just to confirm, you are using RStudio, right? https://www.rstudio.com/products/rstudio/download/ The latest is version 1.0.143 for all OSes, so I just want to make sure we're using the same GUI. (Meanwhile, I'll see if I can add saving the plot as SVG into the code.) galneweinhaw (talk) 16:48, 7 May 2017 (UTC)
- Oops, I'm an idiot... When I first installed and attempted to run RStudio, I got an error message saying that I first needed to install R (as stated on the RStudio download page), so I did, and now upon reading your above reply, I realized that I had been running the R app, not RStudio! Will try again with RStudio and report back. I'm assuming everything will run smoothly now... Stay tuned! Undermedia (talk) 16:59, 7 May 2017 (UTC)
- Well, it's good you did that, because even with R Studio getting a consistently sized plot was frustrating, and now I've figured out how to just save the plot right from the code. So, thanks! Here's the updated code. Happy to discuss the code itself with you! galneweinhaw (talk) 17:12, 7 May 2017 (UTC)
- Thanks again, galneweinhaw. Running the latest version of the code in RStudio, I'm now getting the following warning messages which appear to be preventing the plot from being displayed/saved:
- 1: In svg(filename = "PollsPlot.svg", width = 15, height = 7, pointsize = 12) :
- unable to load shared object '/Library/Frameworks/R.framework/Resources/library/grDevices/libs//cairo.so':
- dlopen(/Library/Frameworks/R.framework/Resources/library/grDevices/libs//cairo.so, 6): Library not loaded: /opt/X11/lib/libcairo.2.dylib
- Referenced from: /Library/Frameworks/R.framework/Resources/library/grDevices/libs//cairo.so
- Reason: image not found
- 2: In svg(filename = "PollsPlot.svg", width = 15, height = 7, pointsize = 12) :
- failed to load cairo DLL
- Looks like I may be missing a font package ("cairo") or something??? Undermedia (talk) 19:19, 7 May 2017 (UTC)
- Right. You need the Cairo graphics library. Try installing it and hopefully that's all you need: https://www.cairographics.org/download/ galneweinhaw (talk) 19:30, 7 May 2017 (UTC)
- Alright, apparently on Mac, Cairo must be installed via MacPorts, which itself requires Xcode (4.5 GB download), so this is gonna take a little while. I'll let you know how it goes. Undermedia (talk) 19:48, 7 May 2017 (UTC)
- Sigh. So I downloaded and installed Xcode, then installed MacPorts, and finally installed Cairo as per the instructions at the link you provided. From what I can tell in Terminal, everything installed successfully, but RStudio is still giving me the same error. In relation to the error message, I can confirm that the file '/Library/Frameworks/R.framework/Resources/library/grDevices/libs//cairo.so' does exist on my computer, however the file '/opt/X11/lib/libcairo.2.dylib' does not. In fact, there isn't even an 'X11' directory inside the 'opt' directory. Sorry, I generally consider myself to be a decently intelligent individual, but admittedly computer programming stuff at this level is a bit out of my league. Sorry to be using up so much of your time trying to get this working. Undermedia (talk) 22:37, 7 May 2017 (UTC)
- No problem. I wish I knew more about Macs (I know nothing...). Check this post out. You may need to install X11 (XQuartz?) as it doesn't ship by default on Macs anymore? http://stackoverflow.com/questions/38952427/include-cairo-r-on-a-mac galneweinhaw (talk) 22:57, 7 May 2017 (UTC)
- Specifically this: "You should download X11 for Mac, which is called XQuartz. It doesn't ship with OS X any more, so you have to download it separately from: https://www.xquartz.org/" — Preceding unsigned comment added by Galneweinhaw (talk • contribs) 22:58, 7 May 2017 (UTC)
- It's working!!! You're one clever character, galneweinhaw! I've run out of time to spend on this for today, but next up I'll likely be bugging you with a few questions about the code to help me fully wrap my head around it. Specifically, I'd like to learn enough to be able to use it to generate graphs for other opinion polling pages on Wikipedia. Be back in touch soon. Thanks again for all your help and hard work! Undermedia (talk) 00:30, 8 May 2017 (UTC)
Hi again, galneweinhaw. So now a few questions about the code itself, which I invite you to answer at your convenience:
- From what I can tell, the code does not use the data in the Margin of Error column of the Wikipedia table, but rather calculates the margin of error based on each poll's sample size using the formula error = 1/sqrt(sample size). This has been my understanding all along, but I was wondering if you could confirm?
- Further to my last question, line 106 appears to show that the margin of error associated with the results of the last election has been manually set (which makes sense since there's no sample size data for the last election from which to calculate the margin of error using the standard formula), but I'm not sure how to interpret it, i.e. "(0,5)". I figure setting it to 0, though logical in principle, would potentially cause a problem on line 127 with the formulas "size = 1/error" and "weight = 1/error", since you'd have a denominator value of 0. Could you please explain?
- I've figured out that the parameter that controls the "sensitivity" of the trend lines and width of the 95% CI ribbons, and that you refer to as "alpha" in the graph's caption, is "span" on line 141. I've also noticed that the value of this parameter has varied among the different election graphs you've made (e.g. 0.35 for this one, 0.4 for the 2015 election campaign graph, and 0.45 for the 2011 election campaign graph). Do you set this value purely arbitrarily based on what seems to look best on the graph, or do you have some objective method of determining the appropriate value?
Thanks again! Cheers, Undermedia (talk) 19:23, 8 May 2017 (UTC)
- Happy to!
- Yes.
- rep.int(0,5) is just creating an array of 5 zeros. One MOE data point for each party.
- I've picked this based on aesthetics, because it makes a huge difference depending on how dense the data is and how much data there is. But its meaning is precise: "The smoothing parameter, α, is the fraction of the total number of data points n that is used in each local fit." (cite) That means its appearance will change as we add more data (the trend will smooth out) if we don't adjust the alpha/span. For an objective value of alpha, we could do a generalised cross-validation to determine the best value, but that might be moving us into original research. galneweinhaw (talk) 21:19, 8 May 2017 (UTC)
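To make the arithmetic in questions 1 and 2 concrete, here is a minimal sketch in Python (not the R script itself) of the error = 1/sqrt(sample size) and weight = 1/error formulas quoted above. The function names `margin_of_error` and `loess_weight` are hypothetical and chosen only for illustration; the actual script may structure this differently.

```python
import math


def margin_of_error(sample_size):
    """Approximate margin of error as 1/sqrt(n), the formula quoted in the thread."""
    if sample_size <= 0:
        # A blank or zero sample size would otherwise give a division by zero.
        raise ValueError("sample size must be positive")
    return 1.0 / math.sqrt(sample_size)


def loess_weight(sample_size):
    """Weight = 1/error, which simplifies to sqrt(n): larger polls count for more."""
    return 1.0 / margin_of_error(sample_size)


if __name__ == "__main__":
    # Illustrative sample sizes only, not real polls.
    for n in (1000, 1500, 4000):
        print(n, round(margin_of_error(n), 4), round(loess_weight(n), 1))
```

Under these formulas a poll of 1,000 respondents gets an error of about 0.032 (roughly 3.2 points), and the weight grows with the square root of the sample size rather than with the sample size itself.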
- Thanks for your responses! Regarding #2, I'm glad to now understand the syntax, though I'm still not clear on why assigning an error of 0 doesn't cause a problem with the "size" and "weight" formulas on line 127. Regarding #3, does that mean we will likely find ourselves adjusting the α value between now and next election as more polls are added? How have you proceeded in the past, for example with the last two election campaign graphs? Did you choose an α value at the start and stick with it, or did you in fact adjust it as more polls were added to the graphs as the campaigns wore on? Undermedia (talk) 22:06, 8 May 2017 (UTC)
- For #2, the weight is only used when calculating the trendline, and the election data isn't part of that calculation. The election data points are plotted separately as two points (dot and diamond) on lines 170 and 171. For the trendline, the election data's "sample size" is added on line 70 (this is hard to read, but gsub is substituting any blanks in sample size, and the election is the only line in the table with a blank, so effectively it is just entering a really large sample size (99999999) for the election). For #3, if we don't want the confidence band to get narrower and smoother as we add more data, then we need to slowly lower the alpha/span value as we add more data. We could probably just put this in the code. For example, currently we have about 80 datapoints, with `span` of 0.35, so each point on the trendline is using the nearest 0.35 * 80 = 28 datapoints. If we wanted to hold constant the nearest 28 datapoints (or whatever value we choose), we could set span = 28/numDataPoints. Does that make sense? galneweinhaw (talk) 22:41, 8 May 2017 (UTC)
- Interesting info on the meaning of the confidence band of a LOESS regression: https://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression/82632
- Your proposal to set span = x/numDataPoints sounds clever. If anything I might suggest we set a larger number than 28; quickly assessing the graphs for the 2011 and 2015 elections with their respective α values and total number of polls, it looks like the trendlines for the 2011 graph were using roughly 35 polls while the ones for 2015 were using roughly 45 polls. Plus, the one editor of the 43rd election page who has so far commented on the new graph appears to be lamenting that the trendlines seem too sensitive to a recent poll showing a tie for the lead when other recent pollsters have been showing a larger gap. I'll play around with different numbers to assess the effect on the trendlines and propose some "constant"; I'm thinking probably in the neighbourhood of 30–35 polls. I do realize however that this constant should probably generally be lower for graphs of pre-campaign periods than for those of campaign periods, since polls are released much less frequently during the former and you don't want to give too much weight to polls that are distant in time. Undermedia (talk) 00:41, 9 May 2017 (UTC)
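The span = x/numDataPoints idea discussed above can be sketched as follows. This is a Python illustration rather than the R script, and the helper name `span_for_fixed_window` is hypothetical; capping the result at 1.0 is an added assumption, so that the span stays a valid fraction when there are fewer polls than the chosen window.

```python
def span_for_fixed_window(points_per_fit, num_data_points):
    """Return the span that keeps about points_per_fit polls in each local fit."""
    if num_data_points <= 0:
        raise ValueError("need at least one data point")
    # Assumption: the span is a fraction of the data, so it is capped at 1.0.
    return min(1.0, points_per_fit / num_data_points)


if __name__ == "__main__":
    # 28 polls per local fit, as in the example above (0.35 * 80 = 28).
    for n in (80, 100, 150):
        print(n, round(span_for_fixed_window(28, n), 3))
```

With 80 polls this reproduces the current span of 0.35. As more polls are added the span shrinks, so each point on the trendline keeps using roughly the same number of neighbouring polls.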
43rd Canada election polling "lead"
Hello,
I've noticed you are the main contributor to the opinion polling page for the 43rd Canadian election, and are very knowledgeable about it.
When I click to organize polls by "lead", a poll with a lead of "4" is positioned higher than a poll with a lead of "30", and "9" is higher than "19", because "4" is higher than the "3" in "30" and "9" is higher than the "1" in "19".
Are you able to fix this (if it is a quick fix)? I am not aware of how to do it.
All of your timely contributions are greatly appreciated! Mikemikem (talk) 04:01, 3 May 2017 (UTC)
Never mind, looks like someone else fixed it! Mikemikem (talk) 06:20, 3 May 2017 (UTC)
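The ordering described above is what happens when values are compared as text, character by character, rather than as numbers. A small Python sketch with the four leads mentioned in the report shows the difference; it illustrates the mechanism only and says nothing about how the table was actually fixed.

```python
# The four "lead" values mentioned above, stored as text.
leads = ["4", "30", "9", "19"]

# Text sort: compares character by character, so "4" beats "30" and "9" beats "19".
as_text = sorted(leads, reverse=True)

# Numeric sort: converts each value to an integer before comparing.
as_numbers = sorted(leads, key=int, reverse=True)

print(as_text)
print(as_numbers)
```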