Page 1 of 2 12 LastLast
Results 1 to 15 of 20

 

Thread: Relational or Flatfile Database?

  1. #1
    Registered User

    Status
    Offline
    Join Date
    Feb 2006
    Location
    Gillingham
    Posts
    510
    Thanks
    0
    Thanked 4 Times in 1 Post


    I'm creating (yet another) product feed database script and I'm looking for ideas help from the forum.

    I've searched through various websites for the answer but can't find one anywhere.

    Is it better to use a flatfile or a relational database?

    Most of the articles I've read refer to various gurus of the 70's and the benefits of the relational database. But is this still relevant to today? I remember 'back in the day' running an employees database on an 8086 machine running at 8mhz and stored on a 360k floppy disc. Things have changed considerably since then, computer speeds have in increased by a factor of 500 and storage by a factor of hundreds of thousands. But the old '70s theories still prevail are they still relevant?

    The main table I have designed is shown in the following mysql dump.
    Code:
    CREATE TABLE `products` (
      `Merchant_ID` int(7) NOT NULL default '0',
      `Product_ID` varchar(30) NOT NULL default '0',
      `Name` varchar(100) NOT NULL default '',
      `Brand` varchar(30) default NULL,
      `Short_Description` varchar(255) NOT NULL default '',
      `Long_Description` text,
      `Product_URL` varchar(255) NOT NULL default '',
      `Image_URL` varchar(255) NOT NULL default '',
      `Thumbnail_URL` varchar(255) NOT NULL default '',
      `RRP` decimal(8,2) NOT NULL default '0.00',
      `Offer_Price` decimal(8,2) NOT NULL default '0.00',
      `Promo_Text` text,
      `Delivery` varchar(255) default NULL,
      `Category_ID` int(6) NOT NULL default '0',
      `Delete_Flag` tinyint(1) NOT NULL default '0',
      `currency` char(3) NOT NULL default 'GBP',
      PRIMARY KEY  (`Product_ID`),
      KEY `Name` (`Name`),
      KEY `Category_ID` (`Category_ID`),
      KEY `Merchant_ID` (`Merchant_ID`)
    ) ENGINE=MyISAM ;
    (I also have separate category tables for various networks linked on the Category_ID and a merchant table).

    If I wish to add industry specific data (eg holidays cellphone cds etc) would I
    be better off adding the relevant fields to the main table or would it be better to have a separate table for each industry link on the Product_ID?

    Any advice / help would be appreciated.

    Regards
    They came for my 404 and I said nothing

  2. #2
    data muncher

    Status
    Offline
    Join Date
    Sep 2004
    Location
    Berlin
    Posts
    2,475
    Thanks
    0
    Thanked 0 Times in 0 Posts
    You opened up a whole can of worms, it is good to really put some thought into a database structure, i think you relational database thing is neither here nor there nowadays. So just carry on using mysql the way that everyone else does but onto your table structure.

    There is no right or wrong way, you really do have to make your own mind up depending on what it is you are trying to do and if you are trying to have lots of things then one setup will never be as good as another,

    you say one table over several tables, how do you seperate products into different tables when one merchant sells thousands of items across 6 different product types? how can you seperate them when the merchant doesnt have a category structure to identify them.

    You want one table with lots of data, how much data, we run mysql with millions of products in one table but we did a lot of custom tuning with both the server and mysql to get it to work/behave. What about indexes?

    For some jobs it would be better to have each field as a seperate table but then you have more strain on your machine doing searches in one table to get id's and then going to other tables to get the information (double the queries) so you see the list is endless.

    There are some clever ways that you can have one table for ALL types of data, it does not matter what product groups there are but that is probably not the kind of programming you would get a lot of support with on forums and programmers as they probably would not understand what you are doing and why.

    Everything depends on everything, processor, hard drive speed, memory buffers, caches, tuning so you have to know what it is sat on and how much it will be used and if your designing a script for others to use then you need to presume that they will all be on shared hosting all with timeout limits on processes and with very little memory so it makes the job even harder, you cant have large tables because there is not enough resource and you cant have many tables because mysql will have too many tables open in memory.

    I am not going to get into database consultation as its costs us many thousands of pounds to get where we are with ours and i would say its only now we really really understand how to push mysql hard like a ***** but if you are starting off i would say just put it all in one table and make some money, accept that the fact that it only becomes a problem when you are making money and then you have enough cash to fix the problem as they come about.

    Also bear in mind that generally performance problems associated with large tables, providing the code is ok can generally be solved by just getting a bigger server or faster drives most of the time.
    Nothing to see here...

  3. #3
    Spartacus's Avatar
    [yet-to-be] freed slave

    Status
    Offline
    Join Date
    Sep 2006
    Location
    Sheffield
    Posts
    316
    Thanks
    6
    Thanked 5 Times in 5 Posts
    Use a relational database, such as MySQL. They're still as valid today as way back when. With reference to the performance issues mentioned above, If you're not looking for an industrial strength solution then number of tables may not be an issue.

    If you performance is an issue then you would need consider denormalisation of your data, search indexes and other stuff that escapes me at the moment. Denormalisation creates redundancy and the potential for inconsistencies, so that’s where the design comes in.

    Flat files are OK for noddy applications but when it comes to searching and sorting this can be a pain. Much easier to write a bit of MySQL. I think you can use SQL to query files (vague memory), but it certainly isn’t widely used.

    There are also issues with security when using files, and you need to consider locking and have the same redundancy issues as denormalised tables.

    In my view it’s better to dive in build yourself a logical data model, normalise if you can and use it as the basis of your table design. Once you have tried to right a few SQL statements with tricky joins in you’ll soon realise how far to go with normailisation.

    Alternatively keep it simple and table numbers to a minimum.
    Last edited by Spartacus; 19-10-06 at 02:32 PM.
    Here endeth the post

    http://www.tenuouslinks.co.uk (Social networking for the anti-social )

  4. #4
    Spartacus's Avatar
    [yet-to-be] freed slave

    Status
    Offline
    Join Date
    Sep 2006
    Location
    Sheffield
    Posts
    316
    Thanks
    6
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by Donk View Post
    If I wish to add industry specific data (eg holidays cellphone cds etc) would I
    be better off adding the relevant fields to the main table or would it be better to have a separate table for each industry link on the Product_ID?
    If it's a one to one relationship with the rest of the data on the table add it to the main table.
    Here endeth the post

    http://www.tenuouslinks.co.uk (Social networking for the anti-social )

  5. #5
    Registered User

    Status
    Offline
    Join Date
    Feb 2006
    Location
    Gillingham
    Posts
    510
    Thanks
    0
    Thanked 4 Times in 1 Post
    Thanks for your detailed and timely reply, there is a lot of information to take in.

    As you say to really produce a large site requires a dedicated server, also to manage a feed with millions of products requires more resources than just a 'one man band'.

    I am well aware too of the problems of shared hosting. I have been using two hosts. Up until yesterday one was cheap (used for many small sites) and the other was efficient (used for one larger site). Due to the vagaries of shared hosting and the hosts trying to 'optimise their ROI' today It would appear that the only way I can diferentiate the hosts now is that one is cheap and the other isn't.

    With my present limited resources I think will have to stick with the smaller sites for the time being but at least have the database and coding in place which will allow me to expand without having to start again.

    Spartacus my question wasn't teally to do storing the files as csv/text but how they are used in MySql. It would appear from scripts available on the web that the majority of users do not use the SQL capabilities of the MySql engine but rely on the coding in PHP to create the relationships.

    The use of one table for all types of products I think would create a fairly cumbersome table. I was looking at a mobile phone feed which had about 20 extra fields that are specific to the phone industry, also a holiday feed witha about 10 extra. For CDs you would need to add artist, genre, tracks etc. then add to that mortgages, loans, betting etc the list goes on. My main table would grow from the present 20 (approx) fields to a hudred or more many of which would require indexing.

    I need to consider all the different factors before I commit to much time into what has become a major project.

    Thanks for your input
    They came for my 404 and I said nothing

  6. #6
    data muncher

    Status
    Offline
    Join Date
    Sep 2004
    Location
    Berlin
    Posts
    2,475
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Donk View Post
    My main table would grow from the present 20 (approx) fields to a hudred or more many of which would require indexing.
    Thats what i was talking about when being clever about things, you dont need hundreds of fields just for every different eventuality and feature type, some things remain a constant throughout the datas, colour for example, you dont need to add a field just for colour just because it is important for one product type, make one field called attributes

    Colour:Value,height:value,width:value,depth:value, weight:value, etc etc etc

    You dont need to have all of that in each field but you can get all the information into one field and then just deal with one field looking for strings within it. You talk about keeping small but i seem to be able to manage a few million queries a day in this manner.

    Also that allows you to run searches for "colour:red" inside that field quite easily when you want to find red items etc. just the same as if you had a dedicated field.

    Ok i am not suggesting that this is one of the perfect ways to do it but it is a good example of thinking more rather than thinking that 1 field should just have 1 bit of info that is normalised in it.
    Nothing to see here...

  7. #7
    Registered User

    Status
    Offline
    Join Date
    Aug 2003
    Posts
    2,448
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I'd definately go with MySQL over flat file just because it is so much more flexible and forgiving (though just throwing faster servers at a problem is rarely the best move!). Flat file formats can perform much faster than MySQL but you really need to know what route you will be taking before starting constructing them.

    There was a series on Oreilly back in May (thanks Google desktop for finding it!) that discussed 'Database war stories' at companies like Bloglines, Flickr, Findory. I haven't reread them now but I think they were quite good. This links to the one about Findory, that uses a combination of the two, and that links to the other articles in the series.

    I pretty much always use MySQL, it's just so much easier - though I have been pulling my hair out for the last couple of days on one select which is now 360 lines long.

  8. #8
    Spartacus's Avatar
    [yet-to-be] freed slave

    Status
    Offline
    Join Date
    Sep 2006
    Location
    Sheffield
    Posts
    316
    Thanks
    6
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by Donk View Post
    The use of one table for all types of products I think would create a fairly cumbersome table. I was looking at a mobile phone feed which had about 20 extra fields that are specific to the phone industry, also a holiday feed witha about 10 extra. For CDs you would need to add artist, genre, tracks etc. then add to that mortgages, loans, betting etc the list goes on. My main table would grow from the present 20 (approx) fields to a hudred or more many of which would require indexing.

    I think I understand you better now and agree with your approach. Just FYI I tend to use an object-oriented aproach. So I would have say a CD object which is stored on a CDs table, such that each attribute of the object represents a column on the table. I suppose if I had a cassette object that had mostly the same fields I would consider using a generalised audio odject and use a single table to store both types of data. You know you've gone wrong when there are null values everywhere.

    So when you refer to flat files do you mean a single record/row on the table?

    Good luck anyway
    Last edited by Spartacus; 19-10-06 at 04:45 PM. Reason: Reread original post
    Here endeth the post

    http://www.tenuouslinks.co.uk (Social networking for the anti-social )

  9. #9
    Spartacus's Avatar
    [yet-to-be] freed slave

    Status
    Offline
    Join Date
    Sep 2006
    Location
    Sheffield
    Posts
    316
    Thanks
    6
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by Donk View Post
    It would appear from scripts available on the web that the majority of users do not use the SQL capabilities of the MySql engine but rely on the coding in PHP to create the relationships.
    Just one more thing ...

    I agree with this to. Try to go this way myself as it suits my OO approach. Sometimes however you just can't get round it. Suppose you had supplier records on another table an the supplier went bust , so you set the supplier to suspended. It could be useful then to join the products table to the supplier table in order to determine which products to no longer display at the time of the query.
    Here endeth the post

    http://www.tenuouslinks.co.uk (Social networking for the anti-social )

  10. #10
    Registered User

    Status
    Offline
    Join Date
    Feb 2006
    Location
    Gillingham
    Posts
    510
    Thanks
    0
    Thanked 4 Times in 1 Post
    I didn't think I'd get so many replies to this question thanks all for the input.

    Quote Originally Posted by Rich View Post
    I pretty much always use MySQL, it's just so much easier - though I have been pulling my hair out for the last couple of days on one select which is now 360 lines long.
    Just a suggestion but I've just a completed a small site (20k records max) for a customer and dragged in the whole table in one go.
    PHP Code:
    $productarray[]="";
     
    $result mysql_query($sql) or die("Query failed : " mysql_error());
    while(
    $productarray[]=mysql_fetch_assoc($result));
    array_pop($productarray);
    unset(
    $productarray[0]); 
    Then I did all the queries/sorting/searching in PHP it ran a lot quicker than I originally expected. (The first and last line were to make the array base 1 rather than base 0 and the pop function is there because the while loops one too many times.)

    Pricethat I do like the idea of storing the data in groups within a field it make a lot of sense and some of the fields can then store different data depending on the industry. I have used a similar technique in the category mapping using the scheme 123,234,567 etc where 123 is the grandfather 234 is the father and 567 is the actual base category so you can do a quick search or record count rather than have to recurse the the whole category map.
    They came for my 404 and I said nothing

  11. #11
    Spartacus's Avatar
    [yet-to-be] freed slave

    Status
    Offline
    Join Date
    Sep 2006
    Location
    Sheffield
    Posts
    316
    Thanks
    6
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by Donk View Post
    dragged in the whole table in one go..
    If you load the whole table into an array before searching and sorting, are you not worried about the size of the memory footprint on the server?
    Here endeth the post

    http://www.tenuouslinks.co.uk (Social networking for the anti-social )

  12. #12
    Registered User

    Status
    Offline
    Join Date
    Feb 2006
    Location
    Gillingham
    Posts
    510
    Thanks
    0
    Thanked 4 Times in 1 Post
    No it doesn't seem to cause a problem.

    I use hosted servers so I can't really check.

    But running on my local computer's testing server it didn't slow it down.

    I've attached the screen dumps for my Windows task manager the first shows the state of the computer while resting and the second I tried to capture the peak of a page loading. (bear in in that my computer is acting as client as well as server.)

    I was using a Webgains/Yesasia feed - 10MB and 18000 records just searching two of the fields for one keyword and displaying the 10 page of the records.

    I wouldn't like to try it with PriceThat's feed tho
    Attached Images Attached Images
    They came for my 404 and I said nothing

  13. #13
    data muncher

    Status
    Offline
    Join Date
    Sep 2004
    Location
    Berlin
    Posts
    2,475
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Spartacus View Post
    If you load the whole table into an array before searching and sorting, are you not worried about the size of the memory footprint on the server?
    If you are worried about that then dont store the whole table as an array in the first place
    Nothing to see here...

  14. #14
    Spartacus's Avatar
    [yet-to-be] freed slave

    Status
    Offline
    Join Date
    Sep 2006
    Location
    Sheffield
    Posts
    316
    Thanks
    6
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by pricethat View Post
    If you are worried about that then dont store the whole table as an array in the first place
    That's why I use MySQL to do my searching and sorting. I may have misunderstood what Donk is doing, but it concerns me that a large array may be being stored for every user. This could lead to scalability problems going forward.

    Not a problem for my sites 'co I don't get any traffic
    Here endeth the post

    http://www.tenuouslinks.co.uk (Social networking for the anti-social )

  15. #15
    data muncher

    Status
    Offline
    Join Date
    Sep 2004
    Location
    Berlin
    Posts
    2,475
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Spartacus View Post
    Not a problem for my sites 'co I don't get any traffic
    That is though a topic that probably has more relevancy to design predictions and how you go about completing a project or at least getting something to work.

    This subject is the techy geeks war, waged in the fires of hell for eternity, you will always be able to discuss one way over another and etc etc. In context with affiliate marketing i would say yes that it is important to think about things properly and make some decent planning but dont spend all your life on it and get carried away, sometimes you have to say, its not the best i could probably of done that better here and here but its going to allow me to make some money.

    Lets face it, even the skilled amongst us here are not going to win any national prizes for programming and its not going to pay our wages whether we use proper class systems in our scripts, or whether our databases are 100% optimised for the queries.

    Whilst i can make suggestions that make sense and other people equally i dont think there is anyone in here with 100% technically right anything so whilst an argument or suggestion might seem valid to one side, it certainly wont be to another and no matter what route you take it will always make for new adventures along the way.

    We have problems with some of our queries running on multimillion record tables, some could call it a fatal scalability problem???? Not really, im not making millions from the millions of records i have already, i am certainly not going to make any more by having another 4 million records so its of no consequence to me. If i need to scale upwards then there is always options available, bigger servers, putting the database on one server and http server on another or even re writing the whole code but the important thing to remember is it is only a problem when i am actually earning money!!

    We can all sit here all day discussing mysql etc, but if you can do that and keep a level head with and say i need to earn money, then suddenly you become less bogged down with the technical aspects of it and give it a reasonable consideration rather than full scale project planning.

    Thats me done for one day, gonna earn some money ;-)
    Nothing to see here...

Page 1 of 2 12 LastLast


Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. mobile phone database
    By DutchRomano in forum Affiliate Marketing Lounge
    Replies: 4
    Last Post: 17-10-08, 12:17 PM
  2. Replies: 6
    Last Post: 21-01-07, 03:30 PM
  3. Euphoria database files?
    By lowndsy in forum Affiliate Marketing Lounge
    Replies: 1
    Last Post: 02-08-05, 08:41 AM
  4. Affiliate links stored in a database
    By Nigel in forum Programming
    Replies: 4
    Last Post: 07-01-04, 01:46 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
To Top

Content Relevant URLs by vBSEO 3.5.0 RC2