Hello,
I've noticed that there are many duplicate galleries, in the thousands, for a specific sponsor I'm importing sets from. The "Check DB for Dupes" setting is set to yes.
I believe the problem is that this specific tube sponsor uses different URLs, both with "www" and without, and that's why dupe content is being added.
1) Can you think of a solution to prevent this?
2) As there are thousands of galleries with dupes that have already gone through rotation, is there any way to delete the current duplicates from my database?
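One way to picture the "www" versus non-"www" mismatch is to normalize each source URL before comparing it. The following is only a minimal sketch in Python, not the script's actual dupe check: the function name `normalize_source_url` and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_source_url(url: str) -> str:
    """Return a canonical form of url for duplicate comparison.

    Lowercases the host and drops a leading "www." so that both
    spellings of the same gallery URL compare equal. The fragment
    is discarded; path and query are kept unchanged.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Keep an explicit port if one was given.
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme.lower(), host, parts.path, parts.query, ""))


# Hypothetical URLs: the same gallery with and without "www".
seen = set()
for candidate in (
    "http://example.com/videos/123/some-title/",
    "http://www.example.com/videos/123/some-title/",
):
    key = normalize_source_url(candidate)
    print("duplicate" if key in seen else "new", key)
    seen.add(key)
```

With this kind of key, the second URL would be reported as a duplicate of the first even though the raw strings differ.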
prevent sponsor with different urls being added as dupes
Re: prevent sponsor with different urls being added as dupes
Do you have this option on?
Check for dupe thumbs
The script calculates an MD5 checksum for each created thumb.
If we already have the same MD5, this usually means that the thumb was created
from the same content,
for example, an FHG with the same content but a different design/URL.
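The idea behind this option can be sketched roughly as follows. This is an illustration in Python under assumptions, not the script's own code: `thumb_md5` and `ThumbIndex` are hypothetical names, and the index lives in memory rather than in the database.

```python
import hashlib


def thumb_md5(data: bytes) -> str:
    """Return the MD5 checksum of the thumb's bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


class ThumbIndex:
    """Hypothetical in-memory stand-in for the stored thumb checksums."""

    def __init__(self):
        self._by_md5 = {}  # md5 hex string -> id of the first thumb seen

    def add(self, thumb_id: int, data: bytes):
        """Register a thumb.

        Returns the id of an existing thumb with the same checksum,
        or None if this thumb is new (in which case it is recorded).
        """
        digest = thumb_md5(data)
        existing = self._by_md5.get(digest)
        if existing is not None:
            return existing
        self._by_md5[digest] = thumb_id
        return None


index = ThumbIndex()
print(index.add(1, b"same image bytes"))   # first time: new
print(index.add(2, b"same image bytes"))   # same bytes: reports thumb 1
```

Note that a checksum only matches byte-identical files, so a thumb that was re-encoded or resized differently would not be caught by this kind of check.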
Don't forget to run script update
Re: prevent sponsor with different urls being added as dupes
I had found that setting yesterday before I made this post. It was set to NO and I changed it to YES. I'm not sure yet whether it will prevent the issue. There is also another setting, Check for dupe description, and that was already set to YES. I don't use the description field for my importing, as I only use the title. Maybe it would be helpful to have a setting for checking dupes by title as well?
What about my 2nd question: is there no way to find dupes with the same URL and delete them from the database?
Re: prevent sponsor with different urls being added as dupes
Looks like the thumb MD5 dupe feature is working.
I checked the gallery_grabber.log and noticed a few issues.
There are over 5,000 occurrences of the phrase "Looks like we already have this thumb". This is a lot. So the original URL dupe check isn't working for this sponsor. The tube is txxx.
Looking closer at the source URLs for already created galleries, they aren't using the original source URL for the tube gallery. A non-domain URL is somehow being created for the source URL.
They look like this, "http://1c0ee2da9e6cb4e3743c598fac5ecd9c/", with a different one for each gallery. I checked other tube sponsors and this is the only one I'm having this problem with.
Here is an excerpt from the error log:
Code:
2017-06-21 01:16:15: Processing http://431cb1abe65a51fe731d331239ce0078/ (425632) (0.54480195045471, 0.0023047924041748)
2017-06-21 01:16:15: Gallery description is empty: Update with 'admin added' (0.54583501815796, 0.0010290145874023)
2017-06-21 01:16:15: Content type: 1 (0.54843211174011, 0.0025930404663086)
2017-06-21 01:16:15: Creating thumb (320x180) (Crop profile: 1) (0.54894304275513, 0.00050592422485352)
2017-06-21 01:16:15: Downloading img http://txxx.com/get_file/0/1bbedaa6738e6776e285bff5347c27ed/4140000/4140031/screenshots/5.jpg (../tmp/425632/tmp//751924.jpg) (0.54908108711243, 0.00013399124145508)
2017-06-21 01:16:15: Dupe check 70afad2e70b1f92ab66005714eff3e61 (0.86377501487732, 0.31469297409058)
2017-06-21 01:16:15: Grab: http://431cb1abe65a51fe731d331239ce0078/ : Looks like we already have this thumb (id: 733099 md5: 70afad2e70b1f92ab66005714eff3e61), skip... (0.86423993110657, 0.0004580020904541)
2017-06-21 01:16:15: Can not create thumb from http://txxx.com/get_file/0/1bbedaa6738e6776e285bff5347c27ed/4140000/4140031/screenshots/5.jpg () (0.86920309066772, 0.0049607753753662)
2017-06-21 01:16:15: No thumbs were created (0.86929106712341, 8.6069107055664E-5)
2017-06-21 01:16:15: Cleanup tmp folder ../tmp/425632 (0.86935591697693, 6.1988830566406E-5)
2017-06-21 01:16:15: Deleting gallery (0.86954498291016, 0.00018811225891113)
2017-06-21 01:16:15: Delete gallery 425632 from /domain.com/tcms/bin/gallery_grabber.php (0.86957693099976, 2.9087066650391E-5)
2017-06-21 01:16:15: Delete gallery content 425/632 (0.9338071346283, 0.064228057861328)
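A count like the one mentioned above could be reproduced by scanning the log for the skip message. The sketch below is in Python and assumes the line format shown in the excerpt; the function name `count_thumb_skips` is hypothetical, and the two sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Matches the skip message and captures the thumb id and MD5 from it.
SKIP_RE = re.compile(
    r"Looks like we already have this thumb "
    r"\(id: (?P<id>\d+) md5: (?P<md5>[0-9a-f]{32})\)"
)


def count_thumb_skips(lines):
    """Return (total skips, Counter mapping md5 -> skips) for log lines."""
    per_md5 = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            per_md5[match.group("md5")] += 1
    return sum(per_md5.values()), per_md5


sample = [
    "2017-06-21 01:16:15: Dupe check 70afad2e70b1f92ab66005714eff3e61 "
    "(0.86377501487732, 0.31469297409058)",
    "2017-06-21 01:16:15: Grab: http://431cb1abe65a51fe731d331239ce0078/ : "
    "Looks like we already have this thumb (id: 733099 md5: "
    "70afad2e70b1f92ab66005714eff3e61), skip... "
    "(0.86423993110657, 0.0004580020904541)",
]

total, per_md5 = count_thumb_skips(sample)
print(total, per_md5.most_common(5))

# For a real log, pass an open file instead (the path is an assumption):
# with open("gallery_grabber.log") as fh:
#     total, per_md5 = count_thumb_skips(fh)
```

Grouping by MD5 shows whether a few thumbs account for most of the skips or whether the skips are spread over many different galleries.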
Re: prevent sponsor with different urls being added as dupes
How do you import galleries? Which pattern do you use?
Re: prevent sponsor with different urls being added as dupes
Sorry, I was going to post a pattern, but wasn't sure if you needed it.
example:
Code:
url|title|duration|date|thumb|embed|group|tags
http://txxx.com/videos/4257815/chloe-carter-in-solo-movie-amkingdom/|Chloe Carter in Solo Movie - AmKingdom|479|2017-06-22 10:58:00|http://txxx.com/get_file/0/f3c3c057fbb1c7d3baa376fad4358789/4257000/4257815/screenshots/9.jpg|<iframe width="1280" height="745" src="http://txxx.com/embed/4257815" frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>|Solo Girl,Masturbation,Teens,Small Tits,Tattoos|AmKingdom
Re: prevent sponsor with different urls being added as dupes
OK, so you are sure there's no such thumb in the DB, and it says this thumb already exists?
Re: prevent sponsor with different urls being added as dupes
The thumb does exist. And dupe galleries are getting added.
There is another problem I mentioned above: the source URLs stored for the galleries don't contain the correct source URL.
Example:
Code:
Source url:
http://txxx.com/videos/4257815/chloe-carter-in-solo-movie-amkingdom/
Source url being added to gallery's after they are imported:
http://1c0ee2da9e6cb4e3743c598fac5ecd9c/
So this is why there are so many dupe galleries being added. The script checks the source URL before import and doesn't find it, because it's been changed, and then adds a duplicate gallery.
Let me know if you don't understand and I'll try to explain more.
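To find which galleries are affected, the stored source URLs could be checked for this pattern. In both examples in this thread the host part is 32 hexadecimal characters, which is the same shape as an MD5 digest; whether the script actually stores an MD5 there is not confirmed here. A minimal Python sketch, with `has_hash_like_host` as a hypothetical helper name:

```python
import re
from urllib.parse import urlsplit

# 32 lowercase hex characters, the same shape as an MD5 hex digest.
HEX32_RE = re.compile(r"[0-9a-f]{32}")


def has_hash_like_host(url: str) -> bool:
    """Return True if the URL's host is exactly 32 hex characters."""
    host = urlsplit(url).hostname or ""
    return HEX32_RE.fullmatch(host) is not None


for url in (
    "http://txxx.com/videos/4257815/chloe-carter-in-solo-movie-amkingdom/",
    "http://1c0ee2da9e6cb4e3743c598fac5ecd9c/",
):
    print(has_hash_like_host(url), url)
```

Running a check like this over the stored source URLs would separate galleries that kept their real source URL from those that received a replaced one.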
Re: prevent sponsor with different urls being added as dupes
I've tried to add this line 2 times, and it says "duplicate".
How do you test it?
Re: prevent sponsor with different urls being added as dupes
How can I delete duplicate galleries that have the same thumb MD5?
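As a starting point only, the selection step could look like the Python sketch below. It assumes the gallery id and thumb MD5 pairs have already been exported from the database; the real table and column names are not given in this thread, so nothing here touches the database, and `duplicate_gallery_ids` and the sample rows are hypothetical. It only lists candidates, keeping the lowest gallery id for each MD5.

```python
from collections import defaultdict


def duplicate_gallery_ids(rows):
    """Return gallery ids that share a thumb MD5 with a lower-numbered gallery.

    rows is an iterable of (gallery_id, thumb_md5) pairs. For each MD5
    the lowest gallery id is kept and the rest are returned, sorted.
    """
    by_md5 = defaultdict(list)
    for gallery_id, md5 in rows:
        by_md5[md5].append(gallery_id)

    to_delete = []
    for ids in by_md5.values():
        ids.sort()
        to_delete.extend(ids[1:])  # everything after the first is a dupe
    return sorted(to_delete)


# Hypothetical export: galleries 1, 3 and 4 share one thumb MD5.
rows = [(1, "aaa"), (2, "bbb"), (3, "aaa"), (4, "aaa")]
print(duplicate_gallery_ids(rows))
```

It would be sensible to back up the database and review the resulting list before deleting anything, since two galleries with the same thumb are usually, but not always, the same content.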