The new updated list of CH DVDs (data miner)
#451
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
OK, I finally have the newest data mine. I have emailed the file to basaro, who I'm sure will add it to his website. If someone really needs it now, email me and I can send it. I fixed the earlier errors with the enrollment vs. members field, and the title field now handles commas; before, a comma in a title forced some of your data to get offset. The header is now included. Price is not there yet. Also be aware that this includes all item numbers over a range, which means that other things such as Xbox/PS2 games and VHS tapes are included. The CH site has been up and down a little overnight, so there is a possibility of missed items, but that cannot be helped. Hope this is beneficial to all.
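For anyone building a similar export, here is a minimal Python sketch of the comma problem described above. It is not the script used for the data mine; the column names and the sample row are made up for illustration. Quoting the text fields keeps a comma inside a title from being read as a field separator.

```python
import csv
import io

# Hypothetical rows; the column names and values are for illustration only.
rows = [
    ["Selection", "Type", "Title"],
    ["0000001", "Enrollment", "Example Title, The"],
]

# Write the rows with every text field wrapped in quotes.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
writer.writerows(rows)
text = buf.getvalue()

# A CSV-aware reader keeps the title as one field.
parsed = list(csv.reader(io.StringIO(text)))

# Splitting on every comma breaks the title in two and shifts later fields.
naive = text.splitlines()[1].split(",")

print(len(parsed[1]), len(naive))
```

The quoted file reads back with three fields per row, while the naive split yields four for the same line, which is the kind of offset mentioned above.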
#452
DVD Talk Platinum Edition
Join Date: Jul 2003
Location: New Hampshire
Posts: 3,096
#453
Member
Join Date: Nov 2004
Posts: 233
Thanks, AlfB & basaro!
Browsing the site the other day I noticed just by chance that CH had added the remainder of the Poirot TV series volumes, so I'm looking forward to poring over this file to see what else I've missed lately.
#455
Cool New Member
Join Date: Oct 2005
Posts: 33
This is AWESOME...Great work guys!!!
Just wanted to give you a heads up that the # Discs field shows everything as 1 disc.
Thanks again for your efforts...I've been hoping someone would pick this back up!!
#456
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
It seems there is a bug in the CH website. I got the information from the pages that are accessed without logging in, and they seem to show 1 disc for everything I tried. Example:
Gladiator Signature Edition Sel #2520807 is supposed to be 2 discs but shows 1.
Looking at the logged-in pages gets the correct number of discs. I use the logged-out pages because they make it simple to tell enrollment selections from members-only selections. I'll see what I can do. If I start using the members pages, I cannot tell if it is an enrollment. If I use both, the time to get the data will basically double, and it already takes two 10-12 hour sessions. Anyone know of a single page where you can determine enrollment or not as well as the correct number of discs?
#457
DVD Talk Platinum Edition
Join Date: Jul 2003
Location: New Hampshire
Posts: 3,096
I'm not sure what address will get the correct info, but having the enrollment info is A LOT more important than the #discs, so if you have to decide between one or the other, stick with the way it is now.
I was also browsing through the newest data, and noticed that it even had all the old titles that CH doesn't carry anymore, including Miramax, Disney, and Fox titles. Maybe there is another site url to read the data from which will not have these titles either. If we can figure out how to get these titles off the datamine, that would also be a good thing.
Try this URL. It has the correct #discs, and it does not include the unavailable Miramax, Disney, etc. titles.
DVDtalk keeps messing up the url I paste in so you'll have to build it from this:
http://
www.
columbiahouse.com
/sa/80/ch/productDetail.do?itemID=1744904&club=3
The only problem with this url is that it does not include all the future titles which aren't available for pre-order yet. Not sure if we can have our cake and eat it too!?
We're getting there
Cheers
Last edited by basaro; 12-19-05 at 11:15 AM.
#458
DVD Talk Ultimate Edition
Thanks for the update.
I'll repost this so it's easier for others to cut and paste
http://www.columbiahouse.com/sa/80/ch/productDetail.do?itemID=1744904&club=3
#459
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
Sorry for not getting back before now. I have been looking at several things, including the history of the data mine back through bga's version. Those of you who are more knowledgeable about Java/Perl/web pages in general may have some input here, but what I have found is that the CH page has some links that are dynamic, in that they are not the same each time. They also include a sessionID on the logged-in site that is different each time you log in. There is something in the page that distinguishes my script from the actual page being loaded and run in a browser, and I have not found a way around it. This is what keeps me from getting info such as price, pre-order status and a correct number of discs. The CH server somehow knows that I am not the page requesting the info and doesn't send it back. I will keep looking, but so far the current list is the best I can do. This is the first time I have done any Perl/web programming, so it might just be the learning curve. In any case, I am still looking and can continue to keep the file we already have updated, but there is not much more I can add to it in its present form. Be aware that there is a lot of extraneous info there, such as DVD players and DVDs that are no longer available. Sorry, but I have no control over that.
#461
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
I have a late Christmas or an early New Year's gift for all. I have been working hard on a new way to get the data from CH and have got something working. I also have my own site for it. Go get it and see what you think. Let me know of errors and suggestions. I know the number of discs is still wrong but cannot do anything about it yet. I have put a lot of time into this trying to guess what people would want, so there is quite a bit of new stuff here. Go to the website and you will see a listing of all the fields in the file.
http://webpages.charter.net/alfb/CH/
Merry Christmas and a Happy New Year to all.
#462
Senior Member
Originally Posted by AlfB
I have a late Christmas or an early New Year's gift for all. I have been working hard on a new way to get the data from CH and have got something working. I also have my own site for it. Go get it and see what you think. Let me know of errors and suggestions. I know the number of discs is still wrong but cannot do anything about it yet. I have put a lot of time into this trying to guess what people would want, so there is quite a bit of new stuff here. Go to the website and you will see a listing of all the fields in the file.
http://webpages.charter.net/alfb/CH/
Merry Christmas and a Happy New Year to all.
Thanks.
#463
DVD Talk Hall of Fame
Thanks AlfB. Much appreciated.
I am still trying to come to grips with the "Selection Available" category. Some of the ones listed as "No" are, in fact, available from the Members site. For example, 0756205 (12 Angry Men). Might it have something to do with dropping the leading zeros from the selection number when searching the Members site?
It does appear that most of the ones not available are either older listings that just haven't been cleared from the Columbia House database or non-standard entries. Some have non-standard selection numbers (beginning with a 5 or a 6; CH is only up to 4xxxxxx).
For my workaround, I sorted the list by "Selection Available" descending (puts "Yes" first), then "Selection Number". I then selected the list from the end of the truncated selection number section (line 10366) on up and re-sorted by enrollment status and title. Those 10,365 ought to be the ones available from the members site, I would guess.
#464
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
You are correct. This is available. Here is how it works: I have the original data mine that was posted recently, whose data came from the enrollment site. It came up with numerous selections that just did not exist or were no longer available to actually purchase. The new data mine takes that list and gets the data from the members site; if a selection is not available on the members site, the new data mine shows it as unavailable. During this data mine I was still debugging the program, so it could be an error caused by that, or by something else as you have suggested. I will look into it to see what the problem could be. Keep an eye out for this type of thing, as this is a new program and could still have bugs. Do all the ones that you see as available but that are shown as unavailable have leading zeros?
#465
DVD Talk Hall of Fame
Originally Posted by AlfB
...Do all the ones that you see as available but that are shown as unavailable have leading zeros?
The ones labeled "members" tend to not be available. The leading zero titles labeled "enrollment" do tend to be available, so far as I have seen.
#466
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
Thanks lizard, you led me right to it. It was the sel #s with leading 0s. I am updating the data mine now to fix it. It was a number/string conversion formatting problem. Never thought about the leading 0s.
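A small Python sketch of this kind of number/string conversion problem, for illustration only. It is not the actual fix, and the 7-digit width is an assumption inferred from selection numbers quoted in this thread (0756205, "4xxxxxx"). Converting a selection number to an integer drops the leading zeros, so they have to be padded back on when the value is turned into text again.

```python
# Assumed width of a selection number, inferred from examples in this thread.
SELECTION_WIDTH = 7


def format_selection(number):
    """Return the selection number as text, zero-padded to the assumed width."""
    return str(int(number)).zfill(SELECTION_WIDTH)


# Converting to a number silently loses the leading zero.
as_number = int("0756205")
print(as_number)

# Padding when converting back to text restores it.
print(format_selection(as_number))
```

Keeping selection numbers as text from start to finish avoids the problem altogether, since nothing is ever stripped.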
#467
DVD Talk Hall of Fame
Originally Posted by AlfB
Thanks lizard, you led me right to it. It was the sel #s with leading 0s. I am updating the data mine now to fix it. It was a number/string conversion formatting problem. Never thought about the leading 0s.
In the case of your list, I just blundered upon the realization that it was the leading zero titles that seemed to be available even when your program recorded them as not available. And the reason seemed apparent.
#469
Senior Member
Join Date: May 2003
Posts: 335
Thanks for the list. I noticed that it contains a number of Grateful Dead-related DVDs (e.g. View from the Vault, Truckin' Up to Buffalo etc.) that I don't find at the Columbia House site. No big deal, just pointing it out. Or might these be available via phone?
#470
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
Originally Posted by Marvin
Thanks for the list. I noticed that it contains a number of Grateful Dead-related DVDs (e.g. View from the Vault, Truckin' Up to Buffalo etc.) that I don't find at the columbia house site. No big deal, just pointing it out. Or might these be available via phone?
Last edited by AlfB; 01-01-06 at 05:13 PM.
#472
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
I suggest you use something that can import a comma-delimited file. I use Excel and Access, but there are many options. Using one of them makes viewing the data much easier, especially if it allows sorting the data like Excel and Access do.
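If a spreadsheet is not handy, a few lines of Python can do the same import and sort. This is only a sketch: the column names and rows below are hypothetical, and the real file's field names are the ones listed on the website.

```python
import csv
import io

# Hypothetical sample; the real file's column names may differ.
SAMPLE = '''Selection,Type,Title
"0000002","Members","Beta"
"0000001","Enrollment","Alpha, The"
"0000003","Enrollment","Gamma"
'''


def load_rows(handle):
    """Read a comma-delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(handle))


rows = load_rows(io.StringIO(SAMPLE))

# Sort by enrollment status first, then by title.
rows.sort(key=lambda row: (row["Type"], row["Title"]))
titles = [row["Title"] for row in rows]
print(titles)
```

To read a downloaded file instead of the inline sample, pass `open(path, newline="")` to `load_rows`, where `path` is wherever the file was saved.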
#473
DVD Talk Platinum Edition
Join Date: Jul 2003
Location: New Hampshire
Posts: 3,096
There are still lots of titles that show up without the leading zeros.
Here's a few:
1620059,"2600","Members","Doctor Zhivago 30Th Anniversary Edition","",1,"3:20","Warner Home Video","2001","Widescreen","","","","","","No","","","",
1616682,"207704","Enrollment","The Dead Zone","R",1,"1:43","Paramount","2000","Widescreen","","","","","","No","","","",
1616683,"447904","Enrollment","Pet Sematary","R",1,"1:42","Paramount","2000","Widescreen","","","","","","No","","","",
1616685,"629501","Members","Round Midnight","R",1,"2:11","Warner Home Video","2001","Widescreen","","","","","","No","","","",
1616936,"202101","Enrollment","An Officer and a Gentleman","R",1,"2:04","Paramount","2000","Widescreen","","","","","","No","","","",
1616937,"206607","Enrollment","The Odd Couple","G",1,"1:45","Paramount","2000","Widescreen","","","","","","No","","","",
1616938,"410506","Members","The Conversation","PG",1,"1:53","Paramount","2000","","","","","","","No","","","",
1616939,"432104","Members","The Untouchables","R",1,"1:59","Paramount","2001","Widescreen","","","","","","No","","","",
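Rows like these can be found automatically. The Python sketch below rests on two assumptions that are not confirmed anywhere in the thread: that the second field is the selection number and the fourth is the title (inferred from the rows above), and that a full selection number has 7 digits. The first two sample rows are copied from the list above; the third is a hypothetical, shortened row added to show a number that passes.

```python
import csv
import io

# Assumed width of a full selection number (e.g. 0756205).
SELECTION_WIDTH = 7

SAMPLE = '''1620059,"2600","Members","Doctor Zhivago 30Th Anniversary Edition","",1,"3:20","Warner Home Video","2001","Widescreen","","","","","","No","","","",
1616682,"207704","Enrollment","The Dead Zone","R",1,"1:43","Paramount","2000","Widescreen","","","","","","No","","","",
0,"0756205","","12 Angry Men"
'''


def short_selections(handle, column=1):
    """Return (selection, title) pairs whose selection number is too short."""
    flagged = []
    for row in csv.reader(handle):
        # Skip blank lines, then compare the text length to the assumed width.
        if row and len(row[column]) < SELECTION_WIDTH:
            flagged.append((row[column], row[3]))
    return flagged


flagged = short_selections(io.StringIO(SAMPLE))
for selection, title in flagged:
    print(selection, title)
```

This flags "2600" and "207704" and leaves "0756205" alone. A number that is legitimately shorter than 7 digits would be flagged too, so the result is a list of candidates to check rather than a list of confirmed errors.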
#474
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
Oops. I thought I got them all but obviously not. I have changed the code such that the next time any of the ones that have leading zeros are updated, they will be corrected. Thanks Brian.
#475
Member
Join Date: Jan 2003
Location: Simpsonville SC
Posts: 185
OK, latest data mine is available on my website. I think I have all the leading 0s fixed now. Let me know if not. I have done an enrollment update as well for those that have changed since the original download a couple of weeks ago. The enrollmentchange field tells you which selection changed to/from enrollment since the last download. Enjoy.
http://webpages.charter.net/alfb/CH
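For readers who want to compare two downloads themselves, here is one way an enrollment-change check could be sketched in Python. This is an assumption about the general idea, not a description of how the enrollmentchange field is actually computed; the selection numbers and statuses below are made up.

```python
def enrollment_changes(previous, current):
    """Map selection number to (old status, new status) where the status changed."""
    changes = {}
    for selection, new_status in current.items():
        old_status = previous.get(selection)
        # Selections that are new in the current download are not counted as changes.
        if old_status is not None and old_status != new_status:
            changes[selection] = (old_status, new_status)
    return changes


# Hypothetical snapshots keyed by selection number.
previous = {"0000001": "Enrollment", "0000002": "Members", "0000003": "Members"}
current = {
    "0000001": "Members",
    "0000002": "Members",
    "0000003": "Enrollment",
    "0000004": "Enrollment",
}

changes = enrollment_changes(previous, current)
for selection, (old, new) in sorted(changes.items()):
    print(selection, old, "->", new)
```

Keying both snapshots by the selection number as text keeps any leading zeros intact during the comparison.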