Korean J Urol Oncol > Volume 19(4); 2021 > Article
Choi, Kim, Lim, Lee, Kim, Kim, Lee, Cho, You, Jeong, Song, Hong, Kim, Ahn, and Hong: Construction of a Retrospective Cohort to Observe 10-Year Urologic Cancer Treatment Trends at the Biggest Medical Center of South Korea



To construct a urologic cancer database using a standardized, reproducible method, and to assess preliminary characteristics of this cohort.

Materials and Methods

Patients with prostate, bladder, and kidney cancers who were enrolled with diagnostic codes in the electronic medical record (EMR) at Asan Medical Center from 2007–2016 were included. Research Electronic Data Capture (REDCap) was used to design the Asan Medical Center-Urologic Cancer Database (AMC-UCD). The process included developing a data dictionary, applying branching logic, mapping clinical data warehouse structures, alpha testing, clinical record summary testing, creating “standards of procedure,” importing data, and entering data. Descriptive statistics were used to identify rates of surgeries and numbers of patients.


Clinical variables (n=407) were selected to develop a data dictionary from REDCap. In total, 20,198 urologic cancer patients visited our institution from 2007–2016 (bladder cancer, 4,616; kidney cancer, 5,750; prostate cancer, 10,330). The overall numbers of patients and surgeries increased over time, with robotic surgeries rapidly growing over a decade. The most common treatment for urologic cancer was surgery, followed by chemotherapy and radiation therapy.


Using a standardized method, the AMC-UCD fosters multidisciplinary research. This constructed database provides access to clinical statistics to effectively assist research. Preliminary data should be refined through EMR chart review. The successful organization of data from 2007–2016 provides a framework for future periods of investigation and prospective models.


With the widespread dissemination of electronic medical record (EMR) systems, many medical institutions began storing clinical data in databases in the late 1990s and 2000s.1–3 The transition away from paper has accelerated data storage, and large quantities of clinical data have accumulated with EMR adoption.4 Recently, attention to the secondary use of these clinical data to improve clinical care has grown.5–8
The fast-growing quantity of clinical data makes reused clinical data a candidate source for “big data.”4 Big data enables researchers to easily explore data, generate research questions, and determine study feasibility.9 In the field of urology, there is a desire to use past data to guide clinical decisions for the future.10
With emerging evidence of the benefits of multidisciplinary research in cancer care,11 numerous multidisciplinary studies are being conducted in various departments on one topic, such as urological cancer.12,13 However, researchers still use databases that differ between project units, researchers, or departments. This practice increases the risk of information leakage in the process of multidisciplinary studies with collaborators.14 Further, the resulting data are redundant and contain many missing values because they are extracted manually, with a higher likelihood of human error.15
A disease group-specific clinical database may help compensate for EMR limitations, as well as provide a more readily accessible means of research.16 Such a database can be made available to researchers in the various departments that study the same disease group, for example urologic cancer. This study aims to (1) construct the Asan Medical Center-Urologic Cancer Database (AMC-UCD) using a standardized and reproducible method, and (2) identify preliminary characteristics of this cohort.


The administrative procedures for registry planning took place from 2016–2018. We developed a retrospective cohort using Research Electronic Data Capture (REDCap) from July 2018–December 2018 with the methods shown in Fig. 1. The cohort included patients with prostate cancer (C61), bladder cancer (C67 and D09), and kidney cancer (C64). Only patients enrolled with diagnostic codes at Asan Medical Center between 2007 and 2016 were included.
Fig. 1.
Process flow chart. IRB: Institutional Review Board, CRF: case report form, CDW: clinical data warehouse, REDCap: Research Electronic Data Capture, SOP: standard of procedure.
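
The inclusion criteria described above can be illustrated with a short filter. This is a minimal sketch, not the extraction actually run against the CDW: the record layout, the patient identifiers, and the exact day boundaries of the 2007–2016 window are assumptions made for illustration. Only the diagnostic codes come from the text.

```python
from datetime import date

# Diagnostic code prefixes named in the text, mapped to a cancer class.
COHORT_CODES = {"C61": "prostate", "C67": "bladder", "D09": "bladder", "C64": "kidney"}

# Assumed day boundaries for "between 2007 and 2016".
STUDY_START, STUDY_END = date(2007, 1, 1), date(2016, 12, 31)


def cancer_classes(diagnoses):
    """Return the cancer classes for one patient's (code, enrollment date) pairs."""
    classes = set()
    for code, enrolled_on in diagnoses:
        label = COHORT_CODES.get(code[:3].upper())
        if label is not None and STUDY_START <= enrolled_on <= STUDY_END:
            classes.add(label)
    return classes


def build_cohort(patients):
    """Keep only patients with at least one qualifying diagnosis."""
    cohort = {}
    for patient_id, diagnoses in patients.items():
        classes = cancer_classes(diagnoses)
        if classes:
            cohort[patient_id] = classes
    return cohort


# Hypothetical records, for illustration only.
example_patients = {
    "P1": [("C61", date(2010, 5, 3))],
    "P2": [("C67.9", date(2008, 1, 20)), ("C64", date(2015, 11, 2))],
    "P3": [("C61", date(2017, 2, 1))],    # outside the study window
    "P4": [("C34.1", date(2012, 7, 9))],  # not a cohort diagnosis
}
cohort = build_cohort(example_patients)
```

Returning a set of classes per patient keeps patients with 2 or more cancers in every group they belong to, which is why the per-cancer counts can sum to more than the cohort size.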

1. Development of a Data Dictionary

We developed a case report form (CRF) to define the scope and level of detail of the variables to be collected in the registry. Because data requirements differ, the scope of each variable was defined by considering the efficiency of data migration from the EMR to the group-specific database. Because a CRF cannot be used to create a data structure by itself, we then created a data dictionary from the CRF to carry out physical modeling of the database.

2. Applying Branching Logic

The branching logic provided by REDCap gives researchers the option of showing or hiding input fields. Because the registry consists of several input forms covering the patient's initial information, treatment information, and follow-up information, the data collection instrument must change dynamically by cancer type. With branching logic, it is possible to show only the input forms that the researchers need to fill out. For example, when entering a patient who has only kidney cancer, there is no need to show the input fields for bladder cancer and prostate cancer. If these fields are present, they can confuse researchers, worsen the data entry experience, and slow data entry. Therefore, we applied branching logic to show the data fields specific to each cancer type.
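
The effect of this branching can be sketched outside REDCap as a visibility rule attached to each field. The snippet below is an illustration only: it does not use REDCap's own branching-logic syntax, and the field names and the `show_for` attribute are hypothetical.

```python
# Each hypothetical field lists the cancer types for which it is shown.
# show_for=None marks a common field that is always visible.
FIELDS = [
    {"name": "diagnosis_date", "show_for": None},
    {"name": "turb_outside_hospital", "show_for": {"bladder"}},
    {"name": "fuhrman_grade", "show_for": {"kidney"}},
    {"name": "psa_at_diagnosis", "show_for": {"prostate"}},
]


def visible_fields(cancer_types):
    """Return the names of the fields to display for a patient's cancer types."""
    selected = set(cancer_types)
    return [
        field["name"]
        for field in FIELDS
        if field["show_for"] is None or field["show_for"] & selected
    ]


kidney_only = visible_fields({"kidney"})
kidney_and_prostate = visible_fields({"kidney", "prostate"})
```

For a patient with kidney cancer only, the bladder and prostate fields are hidden, while a patient with 2 cancers sees the fields for both.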

3. Mapping With Clinical Data Warehouse Structure

When we constructed the cohort, we planned to link data from the clinical data warehouse (CDW) to it, so we built the database with future interoperability in mind. The data types and data names used for the cohort were therefore made compatible with the CDW, and we created a REDCap database structure that reflects, as much as possible, the data types used in the EMR and order communication systems.

4. Alpha Test

We structured the REDCap form through the data dictionary, installed it on the test server, and tested the function of REDCap. We conducted an alpha test with a total of 6 people, including 3 residents, 1 fellow, 1 staff professor, and 1 research coordinator. The objective of this test was to confirm that (1) no data in the clinical records were left uncaptured by the form, (2) there were no technical errors while entering data into REDCap, and (3) there were no spelling errors.
In addition, the most important requirement for the REDCap form is that clinical records can be captured comprehensively. To determine whether the clinical records were sufficiently covered, we categorized 3 types of treatment (surgery, chemotherapy, radiotherapy) for each of the 3 cancers included in our cohort, giving 9 subcategories in total, and extracted samples from each for the test. We recruited clinicians, assigned 5 samples per clinician, and asked them to input the patient records into REDCap.

5. Clinical Record Summary Test

The clinical records of the cohort were summarized by the clinical research coordinator, so the reliability of the data depends on how well the coordinator summarizes each record. It was therefore important to evaluate whether the coordinator abstracted clinical records properly. As in the alpha test, we asked the coordinator to enter the records of 2 patients per subgroup (18 cases in total). A professor with more than 10 years of experience in urology then entered the records of the same patients to serve as the answer label. Because the entered records themselves could have been wrong, the coordinator and the clinician reviewed the 18 cases together to produce the final answer data set. Finally, the coordinator's records were compared with this answer data set.
Accuracy was calculated as the number of data items correctly entered by the coordinator divided by the number of data items in the answer data set, multiplied by 100 to give a percentage. This study aimed for a clinical record quality of at least 95%. The coordinator in this study exceeded 95%, which secured the reliability of the clinical records.
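
The accuracy measure described above can be written out as a short calculation. This is a minimal sketch under stated assumptions: records are represented as dictionaries keyed by variable name, an item counts as correct only when it matches the answer data set exactly, and the variable names are hypothetical.

```python
def record_accuracy(entered, answer):
    """Percentage of answer-set items that the coordinator entered identically."""
    if not answer:
        raise ValueError("answer data set is empty")
    correct = sum(1 for key, value in answer.items() if entered.get(key) == value)
    return 100.0 * correct / len(answer)


def meets_target(entered, answer, target=95.0):
    """Check the entry against the 95% quality target stated in the text."""
    return record_accuracy(entered, answer) >= target


# Hypothetical example: 20 items in the answer set, 1 entered incorrectly.
answer_set = {f"var_{i}": "a" for i in range(20)}
coordinator_entry = dict(answer_set, var_7="b")
accuracy = record_accuracy(coordinator_entry, answer_set)
```

Items missing from the coordinator's entry count as incorrect, because the denominator is the answer data set rather than the entered record.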

6. Create a Standard of Procedure Document

To ensure data quality, we developed a standard of procedure (SOP) document for data entry in case 2 or more research coordinators participated in the study. Each participating research coordinator was also subjected to the clinical record summary test. Even so, an SOP was needed to ensure that data from the same clinical record were consistent when 2 or more coordinators entered them. The SOP included information on all data contained in REDCap (including label name, variable name, and code value) and where to find the data in the EMR. It also provided guidance on various situations that could be encountered in the medical records.
The role of managing the SOP was delegated to the research coordinator. In addition, we stated in the SOP document that whenever the version of REDCap is changed, the SOP must also be updated. After this, 2 clinical professors reviewed the SOP and confirmed it.

7. Data Importation

Considering the number of variables included in REDCap and the number of subjects in our cohort, it would have been a large burden to enter data into REDCap by summarizing the medical records manually. To avoid this, we preprocessed the data that could be easily fitted to the REDCap data structure and imported them into REDCap. We extracted the subjects' data from the CDW according to the range of the cohort and preprocessed them with R ver. 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria). Then, using the data import function of REDCap, the data were loaded for all subjects in the cohort.
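
The preprocessing step can be illustrated as reshaping CDW rows into an import-ready CSV. The study used R for this step; the sketch below uses Python purely for illustration, and the CDW column names, the REDCap variable names, the sex coding, and the date format are all assumptions rather than the study's actual mapping.

```python
import csv
import io
from datetime import datetime

# Hypothetical coding of a categorical variable in the data dictionary.
SEX_CODES = {"M": "1", "F": "2"}

# Hypothetical target variables; the record identifier comes first.
IMPORT_COLUMNS = ["record_id", "sex", "diagnosis_date"]


def to_import_row(cdw_row):
    """Map one CDW row (assumed column names) onto the target variables."""
    diagnosed = datetime.strptime(cdw_row["dx_date"], "%Y%m%d").date()
    return {
        "record_id": cdw_row["patient_key"],
        "sex": SEX_CODES[cdw_row["sex"]],
        "diagnosis_date": diagnosed.isoformat(),
    }


def build_import_csv(cdw_rows):
    """Return CSV text with one row per subject."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=IMPORT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in cdw_rows:
        writer.writerow(to_import_row(row))
    return buffer.getvalue()


import_csv = build_import_csv(
    [{"patient_key": "A001", "sex": "M", "dx_date": "20100315"}]
)
```

Raising a `KeyError` on an unknown sex code is deliberate in this sketch: an unmapped value stops the conversion instead of silently producing a blank cell.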

8. Data Entry

Except for preprocessed data that were already uploaded, the clinical research coordinator directly input the parts requiring medical context. Data were searched and recorded using the developed entry SOP through an anonymous chart review in CDW.

9. Analysis Methods

Descriptive statistical analyses were performed to identify the rates of surgeries and the numbers of patients according to age over 10 years, as well as the proportions of surgeries. Analysis of variance and Student t-tests were performed using R ver. 3.5.1 for comparisons of continuous variables. Categorical variables were analyzed using chi-square testing. Because there were patients with 2 or more cancers, we did not compare the statistical differences between groups.
To identify the broad characteristics of the cohort, we analyzed data using natural language processing with several rules. To find surgeries performed mainly in the Urology Department of our institute, surgery methods were classified by instrument and resection type (radical vs. partial) using the names of surgeries. In addition, we identified chemotherapy and radiation therapy data over the entire visit period of each patient.
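
Rule-based classification of surgery names of the kind described above can be sketched with a few ordered keyword rules. The actual rules used in the study are not given in the text, so the patterns, the rule order, and the fallback to "open" when no instrument keyword matches are all assumptions for illustration.

```python
import re

# Ordered rules: the first matching pattern decides the instrument category.
# "robot" is checked before "laparoscop", and "hand-assisted" before plain
# laparoscopic, so that more specific names are not misfiled.
INSTRUMENT_RULES = [
    ("robotic", re.compile(r"robot", re.IGNORECASE)),
    ("hand-assisted laparoscopic", re.compile(r"hand[- ]assisted", re.IGNORECASE)),
    ("laparoscopic", re.compile(r"laparoscop", re.IGNORECASE)),
    ("transurethral", re.compile(r"transurethral|\bTUR", re.IGNORECASE)),
]


def classify_surgery(name):
    """Return (instrument, resection type) inferred from a surgery name."""
    instrument = "open"  # assumed default when no keyword matches
    for label, pattern in INSTRUMENT_RULES:
        if pattern.search(name):
            instrument = label
            break
    if re.search(r"partial", name, re.IGNORECASE):
        resection = "partial"
    elif re.search(r"radical", name, re.IGNORECASE):
        resection = "radical"
    else:
        resection = None
    return instrument, resection


robotic_case = classify_surgery("Robot-assisted laparoscopic radical prostatectomy")
hals_case = classify_surgery("Hand-assisted laparoscopic radical nephrectomy")
```

Because the rules depend entirely on how surgeries are named in the source system, results from such a classifier would still need the chart review that the text calls for.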

10. Ethics

The study protocol was approved by the Institutional Review Board at Asan Medical Center (2018–0941). The need for informed consent was waived because of the retrospective nature of the study.


We defined the data requirements for REDCap development at the CRF definition stage. Furthermore, 407 data variables were selected for development of the data dictionary (Table 1).
Table 1.
The variables selected for development of the data dictionary
Cancer class & status   Clinical stage Treatment Survival & last follow-up
Is the patient diagnosed without other cancers?
Other solid tumor diagnosis code (ICD-10 code)
Diagnosis code (each bladder, kidney, prostate)
Common Birth
Diagnosis date
Body mass index
Clinical T stage
Clinical N stage
Clinical M stage
Treatment modality
Operation date
Method by instrument
Operation start time
Operation end time
Neoadjuvant therapy
Adjuvant therapy
Pathologic data
Histological type
WHO grade
Size of tumor
Location of tumor
Pathological T stage
Pathological N Stage
Lymphovascular invasion
Lymph node metastasis site
Focal therapy
1st regimen
1st initial dose
1st cycle
1st regimen start date
1st regimen end date
2nd regimen
3rd regimen (with more variables)
Definitive RT
Total dose
Start date
End date
Other Is the patient receiving follow-up care?
Death Death date Follow-up date
  Bladder cancer Has TURB been performed at another hospital? NMIBC TURB
Previous NUx
Reason for operation
Diversion of operation
Pelvic LND site
Combine operation
Rectal injury
Result of frozen biopsy (with more variables)
Urothelial carcinoma variant
Variant percentage
Soft tissue margin
Urethral margin positive
Urethral margin site
Ureter margin positive
Ureter margin site
Ureter orifice involvement
Ureter orifice site (with more variables)
    BCG installation  
  Renal cancer Performed kidney biopsy
Biopsy result
Biopsy date
Cytoreductive operation tumor location
Grossly change
Adhesive change
Ureter change Approach
Adrenalectomy (with more variables)
Papillary type
Fuhrman grade
ISUP grade
Renal vein invasion
Perirenal invasion
Sarcomatoid change
Surgical margin
Resected weight (with more variables)
    Ablation therapy
Embolization (with more variables)
  Prostate cancer PSA value at diagnosis
Histological type
Primary GS at diagnosis
Secondary GS
GS sum
GS grade group
Number of positive bx core
Number of total bx
Maximal percentage of positive biopsy (with more variables)
Nerve sparing
Frozen biopsy
Leakage test
Primary GS
Secondary GS
GS sum
GS grade group
Tumor percentage
Tertiary Gleason grade 5
Intraductal carcinoma component
Seminal vesicle invasion (with more variables)
  Adjuvant RT
Definitive RT
Neoadjuvant ADT
Adjuvant ADT
Focal therapy
Surveillance (with more variables)

ADT: androgen deprivation therapy, CCRT: concurrent chemoradiation therapy, RT: radiation therapy, WHO: World Health Organization, NUx: nephroureterectomy, NMIBC TURB: nonmuscle invasive bladder cancer transurethral resection of tumor, MIBC TURB: muscle invasive bladder cancer transurethral resection of tumor, PSA: prostate-specific antigen, Pelvic LND site: pelvic lymph node dissection site, GS: Gleason score, ISUP grade: International Society of Urological Pathology.

1. Characteristics of the Cohort

Approximately 20% of our cohort had bladder cancer (including overlap with other cancers). Most patients were male (81.1%), and transurethral resection of the bladder (TURB) was performed at a high rate among surgical treatments for bladder cancer (87.5%). Most radical cystectomies were performed as open surgeries (99.6%).
Among kidney cancer patients, the average age (56.2±13.8 years) was relatively low compared with those of the other 2 cancer cohorts. In addition, the proportions of radical nephrectomies and partial nephrectomies were almost equal (50.1% vs. 49.9%, respectively). Open surgery was the most frequently used operation method (50.1%), although various other methods were also used.
In the prostate cancer group, the proportion of patients was the highest among urologic cancer patients (51.1%). The largest share of surgeries was performed robotically (n=3,888), followed by open surgery (n=1,832) (Table 2).
Table 2.
The characteristics of urologic cancer patients at our institute from 2007–2016
Variable Cancer class (n=20,198), multiple
Bladder cancer patients (n=4,616) Renal cancer patients (n=5,750) Prostate cancer patients (n=10,330)
 Male 3,742 (81.1) 3,978 (69.2) 10,330 (100)
 Female 874 (18.9) 1,772 (30.8) NA
Height (cm) 164.33±8.3 163.52±14.9 166.14±6.78
Weight (kg) 65.55±11.0 66.77±12.31 67.49±9.05
BMI (kg/m2) 24.22±3.2 24.57±3.43 24.39±2.81
Age at diagnosis (yr) 64.73±12.0 56.20±13.8 67.37±8.23
 Radical 692 (91.0) 2,044 (50.1) 6,021 (100)
 Partial 68 (9.0) 2,039 (49.9) 0 (0)
Operation method      
 Open 757 (12.3) 2,045 (50.1) 1,832 (30.4)
 Laparoscopic 2 (0) 781 (19.1) 0 (0)
 Hand-assisted laparoscopic surgery 0 (0) 616 (15.1) 0 (0)
 Robotic 1 (0) 641 (15.7) 3,888 (64.6)
 Transurethral 5,361 (87.5) NA 302 (5.0)
Radiotherapy and chemotherapy      
 Chemotherapy 2,664 (57.7) 1,633 (28.4) 4,232 (41.0)
 Radiotherapy 399 (8.6) 485 (8.4) 1,355 (13.1)

Values are presented as number (%) or mean±standard deviation.

NA: not available.

The number of surgeries performed. Percentage is rate of performed operations.

The number of patients who underwent chemotherapy or radiotherapy treatment for overall visit.

2. Treatment Trends

The treatment methods were classified, and the surgical procedures and the number of patients were examined. The numbers of patients in each cancer group tended to increase steadily over the measured decade, and patients diagnosed with prostate cancer were the majority (Fig. 2A).
Fig. 2.
(A) Trends in the number of urinary cancer patients. (B) Trends in the number of surgeries for bladder cancer. (C) Trends in the number of surgery types for kidney cancer. (D) Trends in the number of surgery methods for kidney cancer. (E) Trends in the operation methods for prostate cancer. (F) Trends in the operation methods of urologic surgeries. TURB: transurethral resection of the bladder, HALS: hand-assisted laparoscopic surgery, TURP: transurethral resection of the prostate.
For the treatment of bladder cancer, radical cystectomy was performed more commonly than partial cystectomy, and the number of bladder resections did not increase over the measured decade. However, the number of TURB procedures steadily increased (Fig. 2B).
For the treatment of kidney cancer, radical nephrectomy rates increased more than partial nephrectomy rates (Fig. 2C). In terms of method, laparotomy use increased steadily over the measured decade, and robotic surgery has increased at a constant rate since 2012. Indeed, by 2007, the number of robotic surgeries increased dramatically (Fig. 2D).
For the treatment of prostate cancer, the number of open surgeries and transurethral resections of the prostate did not significantly change, but the number of robotic surgeries increased at a constant rate (Fig. 2E).
Of the surgical approaches in the cohort, transurethral surgeries were the most common, followed by robotic and open surgeries. The numbers of laparoscopic surgeries and hand-assisted laparoscopic surgeries were each less than 200 per year (Fig. 2F).

3. Treatment Modality Proportion

The most common treatment was surgery, excluding patients who had just been diagnosed and had not yet received treatment. However, the ratios of treatment modalities varied for each urologic cancer. Unlike in kidney and prostate cancer, in bladder cancer the number of patients undergoing both surgery and chemotherapy (n=2,068) was higher than the number of patients who underwent surgery only (n=1,111) (Fig. 3).
Fig. 3.
Treatment modality proportions. OP: operation, CTx: chemotherapy, RT: radiotherapy.


We ultimately constructed a cohort of 20,198 urologic cancer patients for which 407 clinical variables were analyzed at Asan Medical Center from 2007–2016. The overall number of patients and surgeries increased over the measured decade, and robotic surgeries showed rapid growth. The most common treatment for urologic cancer was surgery, followed by chemotherapy and radiation therapy. In addition, we confirmed that there were numerous patients who had just been diagnosed at Asan Medical Center but not yet treated.
Through discussions with related departments, we compared data capture tools and databases. The tools and databases compared were ABLE (in-house CDW), REDCap, common data model, and clinical information system. Among these, we chose REDCap because it can securely process data by granting authority to each account, and the database setting is designed to be flexible and efficient.17 It was also possible to generate and modify the CRF in real time. In addition, we were in the process of developing a linkage function with ABLE, and we expected that REDCap as a data capture tool would reduce the labor of data input.
We checked whether researchers had individually collected data sets that could be evaluated for use in development. Secondary use of data and the ability to load existing data could shorten the time needed to build a cohort. However, the data collected individually by researchers could not be guaranteed to be complete, and using them would have required more resources. Therefore, a new registry was constructed that did not require the use of existing data sets.
From 2007–2016, the numbers of surgeries and patients continuously increased. This does not mean that cancer rates increased. In Korea, the age-standardized incidence rates of prostate, kidney, and bladder cancers were 25.5 (males only), 6.0, and 4.4 per 100,000, respectively.18 The annual percentage changes were stable for prostate and kidney cancers, and for bladder cancer the annual percentage change decreased slightly (−1.4%).18 Therefore, our results are likely related to the growing capacity to accommodate cancer patients at Asan Medical Center. A new building was opened in May 2008, and an additional robotic machine was introduced in July 2007. In addition, the difference between the numbers of patients and surgeries was likely caused by multiple surgeries in single patients, especially for TURB.
Several limitations are present in this study. The exact numbers of urologic cancer patients and their treatments require validation, because the numbers in this study were checked only against the diagnostic codes entered in the EMR. However, our preliminary results likely do reflect an accurate trend over the measured decade. Furthermore, the differences between these numbers and previously entered data could be used to confirm the differences between the diagnostic codes in the EMR and real medical services. Such differences may be related to entry error or the medical insurance system.19 Finally, selection bias cannot be avoided with the retrospective model. Our institution is the biggest medical center in Korea, and many patients were transferred here from other hospitals.


Using a standardized method, the AMC-UCD fosters multidisciplinary research. This constructed database provides access to clinical statistics to effectively assist research. Preliminary data should be refined through EMR chart review. The successful organization of data from 2007–2016 provides a framework for future periods of investigation and prospective models.

Conflict of Interest

The authors claim no conflicts of interest.


1. Williams F, Boren SA. The role of the electronic medical record (EMR) in care delivery development in developing countries: a systematic review. Inform Prim Care 2008;16:139–45
2. Danciu I, Cowan JD, Basford M, Wang X, Saip A, Osgood S, et al. Secondary use of clinical data: the Vanderbilt approach. J Biomed Inform 2014;52:28–35
3. Chae YM, Yoo KB, Kim ES, Chae H. The adoption of electronic medical records and decision support systems in Korea. Healthc Inform Res 2011;17:172–7
4. Ross MK, Wei W, Ohno-Machado L. "Big data" and the electronic health record. Yearb Med Inform 2014;9:97–104
5. Hribar MR, Read-Brown S, Goldstein IH, Reznick LG, Lombardi L, Parikh M, et al. Secondary use of electronic health record data for clinical workflow analysis. J Am Med Inform Assoc 2018;25:40–6
6. McCullough JM, Zimmerman FJ, Bell DS, Rodriguez HP. Local public health department adoption and use of electronic health records. J Public Health Manag Pract 2015;21:E20–8
7. Birkhead GS, Klompas M, Shah NR. Uses of electronic health records for public health surveillance to advance public health. Annu Rev Public Health 2015;36:345–59
8. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2014;2:3
9. Lee ES, Black RA, Harrington RD, Tarczy-Hornoch P. Characterizing secondary use of clinical data. AMIA Jt Summits Transl Sci Proc 2015;2015:92–6
10. Ghani KR, Zheng K, Wei JT, Friedman CP. Harnessing big data for health care and research: are urologists ready? Eur Urol 2014;66:975–7
11. Lamb BW, Jalil RT, Sevdalis N, Vincent C, Green JS. Strategies to improve the efficiency and utility of multidisciplinary team meetings in urology cancer care: a survey study. BMC Health Serv Res 2014;14:377
12. Hricak H, Choyke PL, Eberhardt SC, Leibel SA, Scardino PT. Imaging prostate cancer: a multidisciplinary perspective. Radiology 2007;243:28–53
13. Choi SY, Yoon CG. Urologic diseases in Korean military population: a 6-year epidemiological review of medical records. J Korean Med Sci 2017;32:135–42
14. In H, Bilimoria KY, Stewart AK, Wroblewski KE, Posner MC, Talamonti MS, et al. Cancer recurrence: an important but missing variable in national cancer registries. Ann Surg Oncol 2014;21:1520–9
15. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform 2009;42:377–81
16. Prokosch HU, Ganslandt T. Perspectives for medical informatics: reusing the electronic medical record for clinical research. Methods Inf Med 2009;48:38–44
17. Kragelund SH, Kjærsgaard M, Jensen-Fangel S, Leth RA, Ank N. Research electronic data capture (REDCap®) used as an audit tool with a built-in database. J Biomed Inform 2018;81:112–8
18. Korea Central Cancer Registry, National Cancer Center. Annual report of cancer statistics in Korea in 2015 [Internet]. Sejong (Korea), Ministry of Health and Welfare. 2017, [cited 2019 Jun 5]. Available from: http://www.mohw.go.kr/react/gm/sgm0701vw.jsp?PAR_MENU_ID=13&MENU_ID=1304080401&CONT_SEQ=357197.
19. Rodziewicz TL, Houseman B, Hipskind JE. Medical error prevention. StatPearls. Treasure Island (FL), StatPearls Publishing. 2020
Copyright © The Korean Urological Oncology Society and The Korean Prostate Society.