In cancer research, investigators have traditionally analyzed genetic data, patient medical records, and other clinical data such as MRI images or pathology reports. Cancer has largely been regarded as arising from genetic factors. It is becoming increasingly clear, however, that the primary drivers of many kinds of cancer are not solely genetic, and that environment and lifestyle are key contributing factors as well.
By ‘lifestyle’ factors, we mean aspects such as diet, exercise habits and other physical activity, weight and body mass index (BMI), use of alcohol, tobacco, or other substances, sleep habits, and stress and anxiety. Many interesting studies evaluating the impact of such lifestyle factors have appeared in recent years. In breast cancer, for instance, a number of risk factors have been identified in the pathogenesis of breast tumors, and many of these are linked to nutrition and lifestyle, such as alcohol consumption, obesity, and eating patterns (Study Link). Several epidemiological studies (Study Link, Study Link) have provided convincing evidence that alcohol consumption is an important risk factor for both the incidence of and mortality from breast cancer. On the other hand, soybean products have been shown to act as cancer-preventive agents in rodents and other animals (Study Link).
Interestingly, soy products, the predominant dietary source of isoflavones (which belong to the family of phytoestrogens), have been a staple of the Asian diet for centuries. Studies investigating the relationship between soy food intake after a breast cancer diagnosis and health status have reported a slightly protective effect, especially among the Asian population (Study Link). There have been almost 200 publications in the last 10 years, with the scope of research spanning genetic risk factors, late effects from treatment, comorbidities, second malignant neoplasms, reproductive health, psychosocial outcomes, long-term health, and lifestyle behaviors.
Getting Lifestyle Data: Social Media
While lifestyle attributes are important, this data is typically hard to obtain because it is rarely part of patient medical records or other clinical data. Many attributes of a particular individual are also dynamic and evolving; exercise schedules, diet, and other habits can and do change with time, place, and other context. A promising source for gathering such information is social media. A person’s posts, conversations, comments, likes and dislikes, and other expressions often provide valuable information from which lifestyle attributes can be derived. However, gathering such information at scale is a data science challenge.
TC-PAD: Lifestyle Data in a Database
TeraCrunch has developed TC-PAD (Personal Attributes Database), a comprehensive solution that provides a structured database of key personal attributes describing an individual’s lifestyle. TC-PAD requires each individual to authorize access to their social media feeds and profile, from which it then retrieves content. It applies sophisticated natural language and semantic understanding algorithms (part of the TeraCrunch Socratez text understanding suite) to synthesize a variety of lifestyle attributes for each individual, and delivers this data as a structured database. This is an “on-demand” solution: once authorization credentials have been provided, a lifestyle database can be created virtually instantly for a cohort of individuals.
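To make the idea of turning free-text posts into structured lifestyle records more concrete, here is a minimal Python sketch. It is not TC-PAD or the Socratez suite, whose internals are not described here. It assumes posts have already been retrieved with each individual’s authorization, and it swaps in plain keyword matching as a stand-in for real natural language and semantic understanding. The lexicon, the function names `extract_attributes` and `build_database`, and the row layout are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical keyword lexicon: a crude stand-in for semantic analysis.
LEXICON = {
    "exercise": ["gym", "workout", "yoga", "jog", "jogging", "run", "running"],
    "alcohol": ["beer", "wine", "whiskey", "cocktail"],
    "tobacco": ["cigarette", "cigarettes", "smoking", "vape"],
    "sleep": ["insomnia", "sleepless", "nap"],
}


def extract_attributes(posts):
    """Count, per lifestyle attribute, how many posts mention a lexicon keyword."""
    counts = defaultdict(int)
    for text in posts:
        # Lowercase and split into alphabetic tokens.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for attribute, keywords in LEXICON.items():
            # A post counts at most once per attribute.
            if tokens & set(keywords):
                counts[attribute] += 1
    return dict(counts)


def build_database(feeds):
    """Flatten {person_id: [posts]} into one row per (person, attribute)."""
    rows = []
    for person_id, posts in sorted(feeds.items()):
        for attribute, n in sorted(extract_attributes(posts).items()):
            rows.append(
                {"person_id": person_id, "attribute": attribute, "post_count": n}
            )
    return rows
```

Calling `build_database({"p1": ["Morning run then yoga", "Glass of wine tonight"]})` would yield one row for `alcohol` and one for `exercise`, each with a `post_count` of 1. A real system would also need to handle negation, sarcasm, and changes over time, which keyword counts cannot capture.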